CN116500986A - Method and system for generating priority scheduling rule of distributed job shop - Google Patents


Info

Publication number: CN116500986A
Application number: CN202310439782.8A
Authority: CN (China)
Prior art keywords: time, scheduling rule, graph, factory, distributed job
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 李新宇, 黄江平, 高亮
Current Assignee: Huazhong University of Science and Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202310439782.8A

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00: Programme-control systems
    • G05B 19/02: Programme-control systems electric
    • G05B 19/418: Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B 19/41865: Total factory control characterised by job scheduling, process planning, material flow
    • G05B 2219/00: Program-control systems
    • G05B 2219/30: Nc systems
    • G05B 2219/32: Operator till task planning
    • G05B 2219/32252: Scheduling production, machining, job shop
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of shop scheduling, and particularly discloses a method and a system for generating a priority scheduling rule for a distributed job shop. The method comprises the following steps: constructing a scheduling rule generation model for making decisions on the distributed job shop scheduling problem, wherein the problem is represented as a disjunctive graph: each factory corresponds to one sub-disjunctive graph, and the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents both the factory allocation and the sequencing of processes within each factory, with each node of the disjunctive graph carrying its allocated factory information; solving the disjunctive graph through a Markov decision model, in which features of the disjunctive graph are extracted by a graph neural network and action decisions are made by an actor network; and training the scheduling rule generation model on a pre-acquired data set, updating the parameters of the graph neural network and the actor network to obtain a trained scheduling rule generation model. The invention realizes priority scheduling rule generation for distributed job shops, with good performance and generalization.

Description

Method and system for generating priority scheduling rule of distributed job shop
Technical Field
The invention belongs to the field of shop scheduling, and particularly relates to a method and a system for generating a priority scheduling rule for a distributed job shop.
Background
Production scheduling is an important link in a manufacturing system and directly affects the profit and competitiveness of an enterprise. Distributed manufacturing has become one of the important development directions of the manufacturing industry: it offers high flexibility, rapid response, and high reliability, can cope with urgent production demands, supports customized, low-cost, and small-batch production, and reduces the dependence of production on the environment. The distributed job shop scheduling problem (Distributed Job Shop Scheduling Problem, DJSP) is a typical representative in equipment manufacturing: each plant is treated as one job shop, and the process routes of different workpieces may differ. It mainly contains 2 sub-problems, the shop allocation of workpieces and the scheduling of processes on the machines in each shop, to meet different production requirements, as shown in fig. 1.
Priority dispatch rules (Priority Dispatch Rule, PDR) are a classical heuristic method that has been widely used in practical production. A PDR is intuitive, fast, and easy to understand, and has been applied to various scheduling problems. The advantages of PDRs are even more evident for complex production scenarios lacking prior knowledge. A good PDR is built on rich domain knowledge and is continuously refined through trial and error. In addition, the performance of a PDR is greatly affected by the scale of the problem. Therefore, designing a general priority scheduling rule generation method with self-learning and self-evolution capability is very important for solving complex and variable production scheduling problems.
Deep reinforcement learning (Deep Reinforcement Learning, DRL) combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and is an artificial intelligence method closer to the human way of thinking. DRL is autonomous, can learn optimal action selection, and reacts to the environment in real time. In addition, it has strong generalization capability and high solving speed, so it is worthwhile to explore its application in the field of shop scheduling.
Graph theory is widely applied in the field of shop scheduling; the disjunctive-graph representation of a solution to a scheduling problem clearly illustrates the constraint relationships among the processes of the same workpiece. In the field of deep learning, the graph neural network (Graph Neural Network, GNN) is a network structure that operates directly on graph data, and has been successfully applied to problems such as decentralized wireless resource allocation, remaining-useful-life prediction of industrial equipment, and voltage stability control in power systems.
Disclosure of Invention
Aiming at the defects or improvement demands of the prior art, the invention provides a method and a system for generating a priority scheduling rule for a distributed job shop, and aims to provide a priority scheduling rule generation method with self-learning and self-evolution capability and strong universality, so as to realize priority scheduling rule generation for distributed job shops.
To achieve the above object, according to a first aspect of the present invention, there is provided a method for generating a priority scheduling rule for a distributed job shop, including the steps of:
constructing a scheduling rule generation model for deciding a distributed job shop scheduling problem, wherein:
representing the distributed job shop scheduling problem as a disjunctive graph: each factory corresponds to one sub-disjunctive graph, and the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents the factory allocation and the sequencing of processes within each factory, with each node of the disjunctive graph containing its allocated factory information;
solving the disjunctive graph through a Markov decision model: the Markov decision model updates the disjunctive graph through multiple decisions, gradually completing all nodes in the disjunctive graph to obtain a final solution; in the decision process, features of the disjunctive graph are extracted by a graph neural network, and action decisions are made by an actor network;
training the constructed scheduling rule generation model on a pre-acquired data set, iteratively updating the parameters of the graph neural network and the actor network to obtain a trained scheduling rule generation model;
and generating scheduling rules for the distributed job shop through the trained scheduling rule generation model.
As a further preference, in the scheduling rule generation model, the factory allocation of each workpiece is determined first, and the disjunctive graph is then solved through the Markov decision model; the factory allocation method for the workpieces is:
calculating the total processing time required for each workpiece to complete all of its processes, and sorting the workpieces in ascending order of total processing time; then placing the first f workpieces into the f factories in sequence, where f is the total number of factories;
for the remaining workpieces, calculating the total processing time of all workpieces already in each factory, and allocating the foremost workpiece in the current order to the factory with the minimum total processing time; this process is repeated until the factory allocation of all workpieces is completed.
As a further preference, the Markov decision model includes state features at each moment in the decision process;
at any decision point t, the nodes of the disjunctive graph contain 5 features, and these 5 features form the state feature. The 5 features are:
1) The processing time p_ji of the process O_ji corresponding to node v;
2) A binary variable b(v, s_t): when the process O_ji corresponding to node v has been scheduled at time t, b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) The factory fac(v, s_t) of the process O_ji corresponding to node v: if a workpiece is assigned to factory k, the factory feature of all processes of that workpiece at time t is fac(v, s_t) = k;
4) The lower bound c_LB(O_ji, s_t) on the completion time of the process O_ji corresponding to node v:
when O_ji is the first process of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of O_ji;
when O_ji is not the first process of workpiece J_j, two cases are distinguished: if O_ji has finished processing at time t, the estimated lower bound equals the actual completion time; otherwise the lower bound is computed as c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where c_LB(O_j,i-1, s_t) is the estimated completion-time lower bound of the process preceding O_ji and p_ji is the processing time of O_ji;
5) The earliest release time est(O_ji, s_t) of the process O_ji corresponding to node v:
in the initial state s_0: when O_ji is the first process of workpiece J_j, its earliest release time is 0; when O_ji is not the first process of J_j, the earliest release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where est(O_j,i-1, s_0) is the earliest release time of the preceding process O_j,i-1 and p_j,i-1 is its processing time;
in a state s_t with t ≠ 0: when O_ji has finished processing, its earliest release time is 0; when O_ji has not been processed and the preceding process O_j,i-1 has finished, the earliest release time of O_ji equals the completion time of O_j,i-1; when O_ji has not been processed and O_j,i-1 has not finished, the earliest release time is the earliest release time of O_j,i-1 plus its processing time p_j,i-1.
As a further preference, the Markov decision model also includes actions, state transitions, and rewards at each moment in the decision process, specifically:
Actions: the action space A_t at decision point t is the set formed by the next process of every unfinished workpiece;
State transition: the process selected as the action is inserted at a feasible position such that its earliest start processing time is minimized;
Reward: at decision point t, if action a_t corresponds to a process in factory k, the current reward is R(s_t, a_t) = C_k(s_t) - C_k(s_t+1), where R(s_t, a_t) is the reward value obtained by selecting action a_t in state s_t, C_k(s_t) is the maximum completion time of factory k at time t, and C_k(s_t+1) is the maximum completion time of factory k at time t+1 after taking action a_t.
As a further preference, the way the 5 features are combined into a state feature is:
each feature is an [n, m] matrix and is reshaped into a one-dimensional [n×m, 1] matrix, where n is the total number of workpieces and m is the number of processes per workpiece; the one-dimensional matrices corresponding to the 5 features are concatenated into an [n×m, 5] matrix, which is then normalized to obtain the state feature.
As a further preference, in the decision process, features of the disjunctive graph are extracted by the graph neural network and input to the actor network; the actor network scores each action, the probability of each action being selected is computed from the scores with a softmax function, and the action with the highest selection probability is output.
As a further preference, the scheduling rule generation model is trained with an Adam optimizer.
As a further preference, when training the scheduling rule generation model, the network loss function is computed with an asynchronous advantage actor-critic network, the learning rate is dynamically adjusted by the Adam optimizer, and the parameters of the graph neural network and the actor network are updated.
According to a second aspect of the present invention, there is provided a priority scheduling rule generating system of a distributed job shop, comprising a processor for executing the above-mentioned priority scheduling rule generating method of the distributed job shop.
According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described priority scheduling rule generation method of a distributed job shop.
In general, compared with the prior art, the above technical solution conceived by the present invention mainly has the following technical advantages:
1. According to the characteristics of the distributed job shop, the invention designs a disjunctive-graph representation method and further establishes the relationship between the disjunctive-graph representation of the distributed job shop scheduling problem and the graph neural network, thereby constructing a scheduling rule generation model that makes real-time scheduling decisions by observing the current scheduling environment and providing a new perspective for solving scheduling problems; at the same time, the time needed to solve the problem is shortened and the solution quality is improved, so the production efficiency of enterprises can be remarkably improved.
2. Compared with existing priority scheduling rule methods, the invention has strong self-learning and self-evolution capability, higher solving efficiency, and stronger optimization ability; compared with existing meta-heuristic algorithms, the method is simple, easy to understand, has strong generalization and stability, and is applicable to different scheduling environments.
3. The invention designs factory allocation rules so that the factory allocation of workpieces is performed in advance, and then, according to the characteristics of the distributed problem, designs matching state features, reward mechanisms, and other elements of the Markov decision model, thereby realizing efficient and accurate generation of priority scheduling rules for the distributed job shop.
Drawings
FIG. 1 is a schematic view of a factory distribution in a distributed job shop according to an embodiment of the present invention;
fig. 2 (a) and (b) are disjunctive graphs of a solution of a distributed job shop according to an embodiment of the present invention;
FIGS. 3 (a) - (c) are schematic diagrams of the action space in the Markov decision model according to an embodiment of the present invention;
fig. 4 (a) and (b) are schematic views of state transitions in a markov decision model according to an embodiment of the present invention;
fig. 5 (a) - (c) are graphs of the comparison results on the TA dataset with 2 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 6 (a) - (c) are graphs of the comparison results on the TA dataset with 3 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 7 (a) - (c) are graphs of the comparison results on the TA dataset with 4 factories against classical scheduling rules, meta-heuristic algorithms, and reinforcement learning algorithms, respectively, according to an embodiment of the present invention;
fig. 8 is a flowchart of a method for generating a priority scheduling rule in a distributed job shop according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The method for generating the priority scheduling rule of the distributed job shop provided by the embodiment of the invention, as shown in fig. 8, comprises the following steps:
s1, constructing a scheduling rule generation model for deciding a scheduling problem of a distributed job shop; in the scheduling rule generation model, the scheduling problem of the distributed job shop is expressed as a disjunctive graph, and the disjunctive graph is solved through a Markov decision model; the method specifically comprises the following steps:
(1) Solution representation method
In the shop scheduling problem, the disjunctive graph is a classical solution representation. A disjunctive graph is a directed graph G = (V, C ∪ D), where V is the node set; V contains the process nodes of the workpieces and 2 dummy nodes {S, T} with processing time 0, and all processing tasks start from node S and end at node T. C is the set of conjunctive arcs, representing the precedence constraints among the processes of the same workpiece. D is the set of disjunctive arcs, representing the processing order of the processes on the same machine.
Aiming at the DJSP, the invention provides a spliced disjunctive-graph representation in which each node contains basic node information, such as the processing time and earliest start time of its corresponding process, as well as additional information, namely the factory in which the process is performed. The factory allocation information in the disjunctive graph reflects the factory allocation scheme of the DJSP.
For example, for an instance with 2 factories and 6 workpieces, the workpiece processing information is shown in Table 1. The disjunctive graph of the initial information is shown in fig. 2 (a), and fig. 2 (b) shows one feasible solution, with workpieces 1, 2, and 3 processed in factory 1 (first three rows) and workpieces 4, 5, and 6 processed in factory 2 (second three rows). The processing order of the processes on the same machine is marked by dashed arrows of the same color; for example, processes O_11, O_22, and O_31 are processed in sequence on machine M_2, while the process precedence within the same workpiece is marked by black arrows.
Table 1: processing time and processing machine information of the processes
(2) Factory allocation rules
Factory allocation rules are used to solve the first sub-problem of the DJSP, the shop allocation of workpieces. First, all workpieces are sorted in ascending order of their total processing time, and the first f workpieces (f being the total number of factories) are placed into the f factories in sequence; for each remaining workpiece, the total processing time of all workpieces already in each factory is calculated, and the foremost workpiece in the current order is allocated to the factory with the minimum total processing time; this process is repeated until the factory allocation of all workpieces is completed.
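The allocation rule above can be sketched in Python. This is an illustrative sketch only; the data layout (a list of per-workpiece operation-time lists) is an assumption, not the patent's implementation.

```python
def assign_factories(processing_times, f):
    """Assign each workpiece to one of f factories, balancing total workload.

    processing_times[j] is the list of processing times of workpiece j's
    operations (assumed layout). Returns a dict: workpiece index -> factory.
    """
    totals = [sum(ops) for ops in processing_times]
    # Sort workpiece indices in ascending order of total processing time.
    order = sorted(range(len(totals)), key=lambda j: totals[j])
    factory_of = {}
    load = [0.0] * f
    # The first f workpieces seed the f factories, one each.
    for k, j in enumerate(order[:f]):
        factory_of[j] = k
        load[k] += totals[j]
    # Each remaining workpiece goes to the currently least-loaded factory.
    for j in order[f:]:
        k = min(range(f), key=lambda q: load[q])
        factory_of[j] = k
        load[k] += totals[j]
    return factory_of
```

With 4 workpieces of total times 1, 2, 3, 4 and f = 2, workpieces 1 and 2 seed the factories and the rest alternate to the lighter-loaded one.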
(3) Markov decision model
The DJSP is a sequential decision problem and can be formulated as a Markov decision process. A Markov decision process can be represented by a five-tuple (S, A, P, γ, R): S is the set of states, A the set of actions, P the dynamics model, γ a discount factor between 0 and 1, and R the reward function.
At any decision point t, the agent observes the current environment state s_t ∈ S and selects an action a_t ∈ A according to a given policy π(S → A); the agent then enters a new state s_t+1 with probability p(s_t+1 | s_t, a_t) and obtains a real-time reward r_t ∈ R.
The invention represents the solution of the DJSP as a disjunctive graph, and the Markov model is established as follows:
State: the disjunctive graph G(t) = (V, C ∪ D_c(t) ∪ D_u(t)) at any decision point t reflects the state of the current solution. D_c(t) contains the disjunctive arcs that have been given a direction, and D_u(t) contains the disjunctive arcs still without direction, with D_c(t) ∪ D_u(t) = D. In the initial state of the DJSP, D_c(t) = ∅; when the DJSP has been fully scheduled, D_u(t) = ∅. The set V includes all processes.
Any node v in the set V contains the following 5 features:
1) The processing time p_ji of the process O_ji corresponding to node v;
2) A binary variable b(v, s_t): when the process O_ji corresponding to node v has been scheduled at time t, b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) The factory fac(v, s_t) of the process O_ji corresponding to node v: if workpiece J_j is allocated to factory k, the factory feature of all processes of the workpiece at time t is fac(v, s_t) = k, k ∈ [1, f];
4) The lower bound c_LB(v, s_t) on the completion time of the process O_ji corresponding to node v:
when i = 1, i.e. O_ji is the first process of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of O_ji;
when i ≠ 1, i.e. O_ji is not the first process of J_j, two cases are distinguished: if O_ji has finished processing at time t, the estimated lower bound of the completion time equals the actual completion time; otherwise the lower bound is computed by the formula c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where O_j,i-1 is the process preceding O_ji and p_ji is the processing time of O_ji;
5) The earliest release time est(O_ji, s_t) of the process O_ji corresponding to node v:
in the initial state s_0: when i = 1, i.e. O_ji is the first process of workpiece J_j, the earliest release time is 0, i.e. est(O_j1, s_0) = 0; when i ≠ 1, i.e. O_ji is not the first process of J_j, the initial release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where O_j,i-1 is the process preceding O_ji and p_j,i-1 is its processing time;
in a state s_t (t ≠ 0): if O_ji has finished processing, the earliest release time is set to 0, i.e. est(O_ji, s_t) = 0; when O_ji has not been processed and the preceding process O_j,i-1 has finished, the earliest release time of O_ji equals the completion time of O_j,i-1, i.e. est(O_ji, s_t) = c_LB(O_j,i-1, s_t); otherwise, if the preceding process O_j,i-1 has not finished, the earliest release time is the earliest release time of O_j,i-1 plus its processing time p_j,i-1, i.e. est(O_ji, s_t) = est(O_j,i-1, s_t) + p_j,i-1.
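The two recursive quantities c_LB and est can be written down directly from the case analysis above. This is an illustration only; the container names (p, done, finish, release) are hypothetical, and workpieces are assumed to be released at time 0.

```python
def completion_lower_bound(j, i, p, done, finish, release):
    """Lower bound c_LB on the completion time of operation i of workpiece j.

    p[j][i]: processing time of O_ji; done[j][i]: scheduled flag;
    finish[j][i]: actual completion time if scheduled (hypothetical layout).
    """
    if done[j][i]:
        return finish[j][i]            # scheduled: bound = actual completion
    if i == 0:
        return release[j] + p[j][0]    # first operation: release time + p_j1
    # Otherwise: predecessor's bound plus own processing time.
    return completion_lower_bound(j, i - 1, p, done, finish, release) + p[j][i]

def earliest_release(j, i, p, done, finish, release):
    """Earliest release time est of operation i of workpiece j at state s_t."""
    if done[j][i]:
        return 0                       # already processed: set to 0
    if i == 0:
        return release[j]              # first operation (release assumed 0)
    if done[j][i - 1]:
        # Predecessor finished: release at its completion time.
        return completion_lower_bound(j, i - 1, p, done, finish, release)
    # Predecessor not finished: shift its release by its processing time.
    return earliest_release(j, i - 1, p, done, finish, release) + p[j][i - 1]
```

For a single workpiece with times (3, 2, 4) whose first operation finished at time 3, the bound on the last operation is 3 + 2 + 4 = 9 and its earliest release is 3 + 2 = 5.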
These 5 features represent the state of the DJSP at any time: the five [n, m] feature matrices are each reshaped into an [n×m, 1] matrix and concatenated into an [n×m, 5] matrix, which is then normalized by a normalization formula to obtain the state features.
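The reshape-and-concatenate step can be sketched with numpy. Min-max scaling per column is an assumption here; the patent's exact normalization formula is not reproduced in this excerpt.

```python
import numpy as np

def build_state_features(features):
    """Assemble the [n*m, 5] state-feature matrix from five [n, m] matrices.

    Columns are min-max scaled to [0, 1] (an assumed normalization).
    """
    cols = [f.reshape(-1, 1) for f in features]   # each [n, m] -> [n*m, 1]
    state = np.concatenate(cols, axis=1)          # -> [n*m, 5]
    lo, hi = state.min(axis=0), state.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    return (state - lo) / np.where(hi > lo, hi - lo, 1.0)
```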
The actions are as follows: number of workpiecesFor n, the DJSP problem of m number of work pieces each contains n×m number of work pieces, one work piece is selected by each decision point agent. According to the definition of DJSP problem, every workpiece can only finish the processing of one procedure at any time. Therefore, the operation space a at time t t Is a set formed by the next working procedure of all unfinished workpieces, and the action space is gradually reduced along with the gradual processing of the workpieces. As shown in fig. 3 (a), step O 11 ,O 21 And O 31 Has been scheduled, so the action space is { O } 12 ,O 22 ,O 32 ,O 41 ,O 51 ,O 61 -a }; as shown in FIG. 3 (b), in the step O 22 Selected and scheduling is completed, then the action space becomes { O } 12 ,O 23 ,O 32 ,O 41 ,O 51 ,O 61 -a }; as shown in FIG. 3 (c), if step O 23 Scheduled, the action space becomes { O } 12 ,O 32 ,O 41 ,O 51 ,O 61 And the size of the motion space is reduced by one.
State transition: when an agent selects an action, it is necessary to determine its earliest start time so that the completion time of the current process is minimized. As shown in FIG. 4, step O circled with a solid line 11 ,O 21 And O 31 The scheduling has been completed and the remaining unfinished scheduled steps are circled with dashed lines. As shown in fig. 4 (a), in state s 3 Action a 3 =O 22 Is from the state space { O 12 ,O 22 ,O 32 ,O 41 ,O 51 ,O 61 Selected from the group. Procedure O 22 The earliest start processing time of (2) is 3, i.e. procedure O 21 Is completed in machine M 2 In the process O 22 Earlier than procedure O 31 When the processing time is minimized, the state of the extracted graph is changed from that shown in fig. 4 (a) to that shown in fig. 4 (b), step O 31 The starting time of (2) is changed from 2 to 6.
Rewarding: the reward is a feedback signal of the environment indicating the quality of the decision made by the agent at the current decision point. The goal of reinforcement learning is to make the current rewards earned by the agent as large as possible, hopefullyIs the largest. Wherein the bonus function at time t is defined as R (s t ,a t )=H(s t )-H(s t+1 ) H (·) is a quality assessment of the different states. In the invention, the maximum completion time of DJSP directly reflects the production efficiency of workshops. By combining the distributed characteristics of DJSP, a reward function based on the maximum completion time of each factory is designed. The maximum completion time of each plant is equal to the maximum completion time of all the processes in the plant, i.e. C k =max(C LB (O ji )|fac(O ji )=k),k∈[1,f]. When all the working procedures are scheduled, the maximum completion time is C max =max(C k ). At decision point t, if action a t In the corresponding process, H(s) t )=C k (s t ). The reward function of the invention is R (s t ,a t )=C k (s t )-C k (s t+1 ). As can be seen by the reward function, maximizing the jackpot equates to minimizing the maximum completion time.
(4) GNN-based policies
A conventional PDR selects a process for scheduling with probability 1 at each step. The invention instead designs a policy π(a_t | s_t) that outputs a probability distribution over the action space. To optimize the policy, the policy parameters θ are determined through training to obtain the policy π_θ(a_t | s_t) with optimal parameters.
Graph embedding: an embedding is a compressed representation of information, and the invention employs the Graph Isomorphism Network (GIN), a graph neural network, to extract features of the disjunctive graph. Given a graph G = (V, C ∪ D), through K iterations GIN computes a p-dimensional embedding vector for every node in the set V. The node-update formula of GIN is h_v^(k) = MLP^(k)((1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1)), where h_v^(k) is the compressed information of node v after k iterations, h_v^(0) is the raw feature of node v, MLP^(k) is a multi-layer perceptron network whose parameters θ_k are obtained at the k-th iteration, ε^(k) is a learnable parameter, and N(v) is the neighborhood of node v.
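One GIN update iteration can be sketched in numpy. As an assumption, a single linear layer with ReLU stands in for MLP^(k); the patent does not fix the MLP architecture in this excerpt.

```python
import numpy as np

def gin_layer(h, adj, W, eps):
    """One GIN iteration: h_v <- MLP((1 + eps) * h_v + sum of neighbor h_u).

    h: [|V|, p] node embeddings; adj: [|V|, |V|] 0/1 adjacency matrix;
    W: weight matrix standing in for the MLP; eps: learnable scalar.
    """
    agg = (1.0 + eps) * h + adj @ h   # self term plus neighborhood sum
    return np.maximum(agg @ W, 0.0)   # MLP stand-in: linear layer + ReLU
```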
Information aggregation: the node information after K iterations is aggregated with an average pooling function, h_g(s_t) = (1/|V|) · Σ_{v ∈ V} h_v^(K). In any state s_t, the raw feature of a process O_ji (O_ji ∈ V) is the 5-dimensional vector formed by the normalized values of its 5 features, obtained from the normalization formula. After K iterations, the embedding of process O_ji is h_{O_ji}^(K), and the global graph embedding h_g(s_t) is obtained from the average pooling function.
Action selection: the action space is a set composed of the current processes of all the unfinished machined workpieces. The invention is thatThe probability distribution of the action space is calculated by using an Actor Network (Actor Network) composed of two layers of multi-layer perceptrons. Integrating node embedded information and global embedded information to obtainAnd inputting the probability into an Actor Network, scoring each action by the Actor Network, calculating the probability of each action being selected by a softmax function, and selecting the action with the highest probability for output.
Reinforcement learning method: the Actor-Critic network is a highly effective reinforcement learning method that combines the policy gradient with temporal-difference learning. The Actor Network is a policy function that learns a policy so that the agent obtains as high a reward from the environment as possible. The Critic Network is a value function used to evaluate the quality of the current policy. Based on the value function, the Actor-Critic network updates its parameters once per step. The invention adopts the Asynchronous Advantage Actor-Critic (A3C) network to realize self-learning of the policy and thereby update the network parameters.
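The advantage-based losses that A3C-style training optimizes can be sketched as below; this is a synchronous single-worker illustration, and the discount factor and the absence of entropy regularization are simplifying assumptions:

```python
# Sketch of the advantage actor-critic loss terms:
# advantage A_t = R_t - V(s_t); actor loss = -mean(log pi(a_t|s_t) * A_t);
# critic loss = mean(A_t^2). Pure-NumPy illustration of the losses only.
import numpy as np

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """log_probs/values/rewards: per-step arrays over one episode."""
    returns = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):    # discounted return R_t
        running = rewards[t] + gamma * running
        returns[t] = running
    adv = returns - values                     # advantage estimate A_t
    actor_loss = -(log_probs * adv).mean()     # policy-gradient term
    critic_loss = (adv ** 2).mean()            # value-regression term
    return actor_loss, critic_loss

al, cl = a2c_losses(np.log([0.5, 0.4]), np.array([1.0, 0.5]), np.array([0.0, 1.0]))
print(round(float(al), 3), round(float(cl), 5))  # -> 0.226 0.12505
```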
S2, training the constructed scheduling rule generation model according to the pre-acquired data set to obtain a trained scheduling rule generation model; the method specifically comprises the following steps:
(5) Model training
The scheduling rule generation model is trained with the Adam optimizer.
For DJSPs of different scales, a separate model is trained for each scale; each problem scale is iterated 1000 times, with one parameter update of the model per iteration. Every 10 iterations, the model is validated on 50 pre-designed fixed test cases; if the current model improves the average of the 50 test-case results, the current model is saved.
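The training schedule above can be sketched as follows; `train_step`, `evaluate` and `save` are hypothetical stand-ins for the actual update, validation and checkpointing code:

```python
def train(model, scales, train_step, evaluate, save, n_iters=1000):
    for scale in scales:                      # one model per problem scale
        best_avg = float("inf")
        for it in range(1, n_iters + 1):
            train_step(model, scale)          # one parameter update (Adam)
            if it % 10 == 0:                  # validate on the 50 fixed cases
                avg = evaluate(model, scale)
                if avg < best_avg:            # average improved -> save model
                    best_avg = avg
                    save(model, scale)

# Toy run with stub callbacks counting what happens.
calls = {"updates": 0, "saves": 0}
averages = iter([120.0, 110.0, 115.0, 105.0] + [106.0] * 6)

def stub_train(model, scale):
    calls["updates"] += 1

def stub_eval(model, scale):
    return next(averages)

def stub_save(model, scale):
    calls["saves"] += 1

train(None, ["10x10"], stub_train, stub_eval, stub_save, n_iters=100)
print(calls)  # -> {'updates': 100, 'saves': 3}: saved only when the average improved
```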
(6) Model effect verification
The saved model can be directly used for solving the DJSP problem, so that the real-time scheduling requirement is met.
To verify the practical application effect of the invention, the TA standard benchmark set is selected for simulation tests, and the invention is compared with 8 classical PDRs, 5 meta-heuristic algorithms and 3 RL algorithms.
Figs. 5-7 show the comparison results of the invention against the other algorithms in solving the TA datasets with 2, 3 and 4 factories. As the results in the figures show, the GNN- and RL-based PDR generation method can solve DJSPs of different scales and has clear advantages over traditional PDRs, meta-heuristic algorithms and related RL algorithms.
Besides verifying the effectiveness of the invention through the designed experiments, its generalization is verified by solving large-scale problems with models trained on problems of different scales. The experimental results are shown in Table 2; the data in the table show that, for the same problem, the objective function values obtained by different models differ little, so the invention generalizes well.
Table 2 model generalization verification data
The response variable in Figs. 5-7 is the relative percentage increase (RPI), computed as RPI = (Method_sol - Best_sol) / Best_sol × 100, where Method_sol is the maximum completion time of the current instance obtained by algorithm Method, and Best_sol is the best value of the maximum completion time of the current instance over all compared algorithms.
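For concreteness, the RPI computation can be sketched in one line (the values below are made-up makespans, not experimental data):

```python
def rpi(method_sol, best_sol):
    """Relative percentage increase of an algorithm's makespan over the best one."""
    return (method_sol - best_sol) / best_sol * 100.0

print(rpi(1100, 1000))  # -> 10.0 (10% worse than the best compared algorithm)
```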
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A method for generating a priority scheduling rule of a distributed job shop is characterized by comprising the following steps:
constructing a scheduling rule generation model for making decisions on the distributed job shop scheduling problem, wherein:
representing the distributed job shop scheduling problem as a disjunctive graph: each factory corresponds to one sub-disjunctive graph; the sub-disjunctive graphs of all factories are spliced to obtain a disjunctive graph that represents both the factory assignment and the sequencing of the operations within each factory, and each node of the disjunctive graph contains its assigned factory information;
solving the disjunctive graph through a Markov decision model: the Markov decision model updates the disjunctive graph through multiple decisions, gradually completing all nodes in the graph to obtain the final solution; during the decision process, features of the disjunctive graph are extracted by a graph neural network, and action decisions are made by an actor network;
training the constructed scheduling rule generation model according to a pre-acquired data set, iteratively updating the parameters of the graph neural network and the actor network, to obtain a trained scheduling rule generation model;
and generating priority scheduling rules for the distributed job shop through the trained scheduling rule generation model.
2. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein in the scheduling rule generation model, the factory allocation of each workpiece is determined first, and the disjunctive graph is then solved by the Markov decision model; the factory allocation method for the workpieces comprises:
respectively calculating, for each workpiece, the total processing time required to complete all of its operations, and sorting the workpieces in ascending order of total processing time; then placing the first f workpieces into the f factories in turn, f being the total number of factories;
for the remaining workpieces, calculating the total processing time of all workpieces in each factory, and allocating the foremost workpiece in the current order to the factory with the smallest total processing time; this process is repeated until the factory allocation of all workpieces is completed.
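A sketch of this allocation heuristic (the `proc_times` input format, a list of per-operation processing times for each workpiece, is an illustrative assumption):

```python
# Factory allocation of claim 2: sort workpieces by total processing time
# ascending, seed the f factories with the first f, then greedily assign the
# rest to the least-loaded factory.
def assign_factories(proc_times, f):
    totals = [sum(ops) for ops in proc_times]
    order = sorted(range(len(proc_times)), key=lambda j: totals[j])  # ascending
    assignment = {}
    load = [0.0] * f                          # total processing time per factory
    for rank, j in enumerate(order):
        k = rank if rank < f else min(range(f), key=lambda x: load[x])
        assignment[j] = k
        load[k] += totals[j]
    return assignment

# Three workpieces with totals 5, 2 and 10; two factories.
print(assign_factories([[3, 2], [1, 1], [5, 5]], 2))  # -> {1: 0, 0: 1, 2: 0}
```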
3. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein the Markov decision model includes a state feature at each moment of the decision process;
in the disjunctive graph at any decision point t, each node contains 5 features, and the 5 features constitute the state feature; the 5 features are specifically:
1) the processing time p_ji of the operation O_ji corresponding to node v;
2) a binary variable b(v, s_t): if the operation O_ji corresponding to node v has been scheduled at time t, then b(v, s_t) = 1; otherwise b(v, s_t) = 0;
3) the factory fac(v, s_t) of the operation O_ji corresponding to node v: if a workpiece is assigned to factory k, the factory feature of every operation of that workpiece at time t is denoted fac(v, s_t) = k;
4) the lower bound c_LB(O_ji, s_t) of the completion time of the operation O_ji corresponding to node v;
when operation O_ji is the first operation of workpiece J_j, the estimated lower bound equals the release time of workpiece J_j plus the processing time of operation O_ji;
when operation O_ji is not the first operation of workpiece J_j, the following is judged: if operation O_ji has finished processing at time t, the estimated lower bound of its completion time equals its actual completion time; otherwise, the lower bound is computed as c_LB(O_ji, s_t) = c_LB(O_j,i-1, s_t) + p_ji, where c_LB(O_j,i-1, s_t) is the estimated completion-time lower bound of the operation preceding O_ji and p_ji is the processing time of O_ji;
5) the earliest release time est(O_ji, s_t) of the operation O_ji corresponding to node v;
in the initial state s_0: when O_ji is the first operation of workpiece J_j, its earliest release time is 0; when O_ji is not the first operation of J_j, its earliest release time is est(O_ji, s_0) = est(O_j,i-1, s_0) + p_j,i-1, where est(O_j,i-1, s_0) is the earliest release time of the preceding operation O_j,i-1 and p_j,i-1 is the processing time of O_j,i-1;
in a state s_t with t ≠ 0: when O_ji has finished processing, its earliest release time is 0; when O_ji has not been processed and its preceding operation O_j,i-1 has finished processing, the earliest release time of O_ji equals the completion time of O_j,i-1; when O_ji has not been processed and O_j,i-1 has not finished processing, the earliest release time of O_ji is the earliest release time of O_j,i-1 plus the processing time p_j,i-1 of O_j,i-1.
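The recursive completion-time lower bound of feature 4) can be sketched for a single workpiece as follows (the `done_at` mapping of finished operations to their actual completion times is an illustrative assumption):

```python
# Lower bound of feature 4): first operation -> workpiece release time plus its
# own processing time; later operations -> predecessor's bound plus own
# processing time; finished operations -> their actual completion time.
def completion_lb(release, proc, done_at):
    """release: workpiece release time; proc: list of processing times;
    done_at: dict op_index -> actual completion time for finished operations."""
    lb = []
    for i, p in enumerate(proc):
        if i in done_at:
            lb.append(done_at[i])             # finished: actual completion time
        elif i == 0:
            lb.append(release + p)            # first operation of the workpiece
        else:
            lb.append(lb[i - 1] + p)          # c_LB(O_ji) = c_LB(O_j,i-1) + p_ji
    return lb

# Workpiece released at 0 with 3 operations; the first finished at time 3.
print(completion_lb(0, [3, 2, 4], {0: 3}))  # -> [3, 5, 9]
```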
4. The method for generating a priority scheduling rule for a distributed job shop according to claim 3, wherein the Markov decision model further includes an action, a state transition and a reward at each moment of the decision process; specifically:
action: the action space A_t at decision point t is the set of the next operations of all unfinished workpieces;
state transition: the operation selected by the action is inserted into a feasible position such that its earliest start time is minimized;
reward: at decision point t, if action a_t corresponds to an operation in factory k, the current reward is R(s_t, a_t) = C_k(s_t) - C_k(s_{t+1}), where R(s_t, a_t) is the reward value obtained by selecting action a_t in state s_t, C_k(s_t) is the maximum completion time of factory k at time t, and C_k(s_{t+1}) is the maximum completion time of factory k at time t+1, after action a_t has been taken.
5. The method for generating a priority scheduling rule for a distributed job shop according to claim 3, wherein the 5 features are combined into the state feature as follows:
each feature matrix has shape [n, m] and is reshaped into a one-dimensional matrix of shape [n×m, 1], where n is the total number of workpieces and m is the total number of operations per workpiece; the one-dimensional matrices corresponding to the 5 features are spliced into a matrix of shape [n×m, 5], which is then normalized to obtain the state feature.
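A sketch of this assembly step; since the normalization formula itself is not given in this excerpt, column-wise min-max normalization is an assumption:

```python
# Splice five [n, m] feature matrices into a normalized [n*m, 5] state feature.
import numpy as np

def build_state_features(features):
    """features: list of five [n, m] arrays -> normalized [n*m, 5] array."""
    cols = [f.reshape(-1, 1) for f in features]      # each flattened to [n*m, 1]
    x = np.concatenate(cols, axis=1)                 # spliced into [n*m, 5]
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # avoid divide-by-zero
    return (x - lo) / span                           # min-max per column (assumed)

n, m = 2, 3
feats = [np.arange(n * m, dtype=float).reshape(n, m) + i for i in range(5)]
out = build_state_features(feats)
print(out.shape, out.min(), out.max())  # -> (6, 5) 0.0 1.0
```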
6. The method for generating a priority scheduling rule for a distributed job shop according to claim 1, wherein during the decision process, features of the disjunctive graph are extracted by a graph neural network and input into the actor network; the actor network scores each action, the probability of each action being selected is computed from the scores by a softmax function, and the action with the highest selection probability is output.
7. A method of generating a priority scheduling rule for a distributed job shop as claimed in any one of claims 1 to 6, wherein the scheduling rule generation model is trained by an Adam optimizer.
8. The method for generating a priority scheduling rule for a distributed job shop according to claim 7, wherein when training the scheduling rule generation model, the network loss function is calculated through the asynchronous advantage actor-critic network, the learning rate is dynamically adjusted by the Adam optimizer, and the parameters of the graph neural network and the actor network are updated.
9. A priority scheduling rule generating system for a distributed job shop, comprising a processor for executing the priority scheduling rule generating method for a distributed job shop as claimed in any one of claims 1-8.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the priority scheduling rule generating method of a distributed job shop according to any one of claims 1-8.
CN202310439782.8A 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop Pending CN116500986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310439782.8A CN116500986A (en) 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop

Publications (1)

Publication Number Publication Date
CN116500986A true CN116500986A (en) 2023-07-28

Family

ID=87316053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310439782.8A Pending CN116500986A (en) 2023-04-20 2023-04-20 Method and system for generating priority scheduling rule of distributed job shop

Country Status (1)

Country Link
CN (1) CN116500986A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116957172A (en) * 2023-09-21 2023-10-27 山东大学 Dynamic job shop scheduling optimization method and system based on deep reinforcement learning
CN116993028A (en) * 2023-09-27 2023-11-03 美云智数科技有限公司 Workshop scheduling method and device, storage medium and electronic equipment
CN117057569A (en) * 2023-08-21 2023-11-14 重庆大学 Non-replacement flow shop scheduling method and device based on neural network
CN117555306A (en) * 2024-01-11 2024-02-13 天津斯巴克斯机电有限公司 Digital twinning-based multi-production-line task self-adaptive scheduling method and system

Similar Documents

Publication Publication Date Title
CN116500986A (en) Method and system for generating priority scheduling rule of distributed job shop
Luo et al. Energy-efficient scheduling for multi-objective flexible job shops with variable processing speeds by grey wolf optimization
CN112734172B (en) Hybrid flow shop scheduling method based on time sequence difference
CN107767022B (en) Production data driven dynamic job shop scheduling rule intelligent selection method
CN111756653B (en) Multi-coflow scheduling method based on deep reinforcement learning of graph neural network
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN111353646B (en) Steelmaking flexible scheduling optimization method, system, medium and equipment with switching time
CN114565247B (en) Workshop scheduling method, device and system based on deep reinforcement learning
CN111160755B (en) Real-time scheduling method for aircraft overhaul workshop based on DQN
CN112348314A (en) Distributed flexible workshop scheduling method and system with crane
Du et al. Collaborative optimization of service scheduling for industrial cloud robotics based on knowledge sharing
CN115454005A (en) Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
Pol et al. Global Reward Design for Cooperative Agents to Achieve Flexible Production Control under Real-time Constraints.
CN117331700B (en) Computing power network resource scheduling system and method
CN114611897A (en) Intelligent production line self-adaptive dynamic scheduling strategy selection method
Iklassov et al. On the Study of Curriculum Learning for Inferring Dispatching Policies on the Job Shop Scheduling.
CN116562584A (en) Dynamic workshop scheduling method based on Conv-lasting and generalization characterization
CN116796964A (en) Method for solving job shop scheduling problem based on generation countermeasure imitation study
CN116300756A (en) Double-target optimal scheduling method and system for flexible manufacturing workshop with transportation robot
CN117057528A (en) Distributed job shop scheduling method based on end-to-end deep reinforcement learning
CN116151581A (en) Flexible workshop scheduling method and system and electronic equipment
CN115933568A (en) Multi-target distributed hybrid flow shop scheduling method
Wei et al. Composite rules selection using reinforcement learning for dynamic job-shop scheduling
Marchesano et al. Deep Reinforcement Learning Approach for Maintenance Planning in a Flow-Shop Scheduling Problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination