CN114912826A - Flexible job shop scheduling method based on multilayer deep reinforcement learning - Google Patents

Flexible job shop scheduling method based on multilayer deep reinforcement learning Download PDF

Info

Publication number
CN114912826A
CN114912826A (Application CN202210603831.2A)
Authority
CN
China
Prior art keywords
graph
model
reinforcement learning
decision
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210603831.2A
Other languages
Chinese (zh)
Inventor
Li Xiaoxia (李小霞)
Zeng Zhengqi (曾正祺)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong Agricultural University
Original Assignee
Huazhong Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong Agricultural University filed Critical Huazhong Agricultural University
Priority to CN202210603831.2A priority Critical patent/CN114912826A/en
Publication of CN114912826A publication Critical patent/CN114912826A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Manufacturing & Machinery (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a flexible job shop scheduling method based on multilayer deep reinforcement learning, which comprises the following two parts. P1, deep reinforcement learning model part: the deep learning component adopts a graph neural network that takes the disjunctive graph as input and extracts graph features, so that an effective feature representation of the problem is obtained. The reinforcement learning component is based on a Markov decision model; a decision scheme for the flexible job shop scheduling problem is obtained through the repeated decision process of the model, and the objective is optimized by maximizing the reward value. P2, training algorithm part: the model is trained with an asynchronous advantage actor-critic algorithm; sample-collection tasks are distributed to multiple sub-threads, each of which makes decisions and generates samples independently, and each sub-thread decides several problems at the same time to generate multiple decision trajectories, so that uncorrelated high-quality samples are generated rapidly to optimize the model and the final model is obtained quickly.

Description

Flexible job shop scheduling method based on multilayer deep reinforcement learning
Technical Field
The invention relates to the field of combination optimization, in particular to a flexible job shop scheduling method based on multilayer deep reinforcement learning.
Background
The flexible job shop scheduling problem, in which the same workpiece may have multiple processing routes and the same process may be executed on any of several machines, is an important extension of the job shop scheduling problem and is regarded as NP-hard; this flexibility greatly increases the complexity of the problem. Finding the optimal solution of the flexible job shop scheduling problem in the shortest possible time is therefore of great significance in combinatorial optimization. At present, the main methods for solving the flexible job shop scheduling problem are scheduling rules and meta-heuristic algorithms. Scheduling rules prioritize processes and machines and can therefore obtain solutions quickly.
However, the scheduling results obtained with scheduling rules are usually far from optimal, and a fixed rule does not suit diverse processing environments. Compared with scheduling rules, meta-heuristic algorithms search for the optimal solution over many iterations and can obtain good results, but they require long computation times, have no generalization ability, and must be re-initialized and re-iterated whenever the problem changes. Machine learning has been applied to many fields as a new method and has achieved good results, so applying machine learning to the flexible job shop scheduling problem is a new research direction. Deep reinforcement learning is a branch of machine learning; after sufficient training, its model can be used directly for decision making, and the flexible job shop scheduling problem can likewise be expressed as a sequential decision problem. The design of the deep reinforcement learning model is therefore an important part of such a method.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a flexible job shop scheduling method based on multilayer deep reinforcement learning, aiming at the defects in the prior art. The flexible job shop scheduling problem is represented by a disjunctive graph, a graph neural network is used to extract features, the states, actions and rewards corresponding to the problem are designed to establish a Markov decision model, a hierarchical decision model is designed to divide the flexible job shop scheduling problem into the two sub-problems of process sequencing and machine selection, and the asynchronous advantage actor-critic algorithm is used to train the model quickly and effectively.
The technical scheme adopted by the invention for solving the technical problem is as follows:
the invention provides a flexible job shop scheduling method based on multilayer deep reinforcement learning, which is characterized in that a deep reinforcement learning model is established for a flexible shop scheduling problem, the deep reinforcement learning model is trained, the flexible shop scheduling problem is solved through the trained deep reinforcement learning model, and an optimal scheduling scheme is output; the method comprises the following two parts:
P1, deep reinforcement learning model part: the deep reinforcement learning model is used to make decisions for the flexible job shop scheduling problem; the problem is represented as a disjunctive graph, and solving it is treated as the process of orienting the disjunctive arcs. The deep learning component adopts a graph neural network that takes the disjunctive graph as input and extracts graph features, so that an effective feature representation of the problem is obtained. The reinforcement learning component is based on a Markov decision model in which the states, actions and rewards corresponding to the problem are designed, and the hierarchical decision model takes the corresponding action according to the state features. A decision scheme for the flexible job shop scheduling problem is obtained through the repeated decision process of the model, and the objective is optimized by maximizing the reward value;
P2, training algorithm part: the deep reinforcement learning model is trained with a multi-thread, multi-trajectory asynchronous advantage actor-critic algorithm; sample-collection tasks are distributed to multiple sub-threads, each of which makes decisions and generates samples independently, and each sub-thread decides several problems at the same time to generate multiple decision trajectories, so that uncorrelated high-quality samples are generated rapidly to optimize the model and the final model is obtained quickly. The trained model supports fast solving of the flexible job shop scheduling problem and generalizes to problems of different scales. The optimal scheduling scheme of the flexible job shop is output through the trained deep reinforcement learning model and handed to the flexible job shop for execution.
Further, a specific method for obtaining the characteristics of the disjunctive graph in the P1 deep reinforcement learning model part of the present invention is as follows:
step 1.1, obtaining the disjunctive Graph representation Graph from the flexible job shop scheduling problem;
step 1.2, determining the node information according to the disjunctive arcs in the disjunctive graph;
step 1.3, obtaining the Feature of the disjunctive graph by using the disjunctive graph as the input of the graph neural network.
Further, the disjunctive graph in step 1.1 of the present invention is defined as follows:
the disjunctive graph of the flexible job shop scheduling problem is described as a given graph G = (O, C, D), where O is the set of all process nodes o together with two virtual process nodes S and E, which represent the start and end of the schedule, respectively. C is the set of conjunctive arcs, C = {⟨v, w⟩ | v, w ∈ O}, where the two processes represented by v and w belong to the same workpiece; ⟨v, w⟩ ∈ C means there is a conjunctive arc from node v to node w, a one-way arc that guarantees the precedence constraint between processes of the same workpiece, i.e. s_tv < s_tw, where s_tv is the machining start time of the process represented by node v. D is the set of disjunctive arcs, D = {⟨v, w⟩ | v, w ∈ O}, where each disjunctive arc is a bidirectional arc indicating that the processes of the connected nodes v and w can be processed on the same machine. The final goal is to determine the directions of all disjunctive arcs while making the maximum completion time as short as possible. The number of processes of each workpiece in the flexible job shop scheduling problem may differ; when converting to the disjunctive graph, if the number of processes of a workpiece is smaller than the maximum number of processes, a "0" process node is appended at the end of that workpiece to keep the graph structure uniform; the processing time of the "0" process is not counted, and it can be processed on all machines.
Further, the method for calculating the node information in step 1.2 of the present invention specifically includes:
step 1.2.1, randomly selecting the execution time of each process on one of its executable machines as the estimated execution time of that process;
step 1.2.2, ignoring the not-yet-oriented disjunctive arc constraints, processing each process in order according to the conjunctive arc constraints and the already-oriented disjunctive arcs, and calculating the completion time of each process as its node information (a sketch of this estimate follows below).
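As a concrete illustration of the estimate in step 1.2.2, the sketch below picks a random executable machine for each not-yet-scheduled process and propagates finish times along the conjunctive arcs only; following already-oriented disjunctive arcs, as the patent also does, is omitted here for brevity, and all names are illustrative:

```python
import random

def estimate_completion_times(jobs, scheduled=None):
    """Estimate each process's completion time, ignoring unoriented disjunctive arcs.
    jobs: list of jobs, each a list of {machine: processing_time} dicts.
    scheduled: optional {(job, process): processing_time} for processes already fixed."""
    scheduled = scheduled or {}
    finish = {}
    for j, ops in enumerate(jobs):
        t = 0.0
        for i, machines in enumerate(ops):
            if (j, i) in scheduled:
                proc = scheduled[(j, i)]                         # actual processing time
            else:
                proc = machines[random.choice(list(machines))]   # estimated: random executable machine
            t += proc                                            # conjunctive-arc precedence only
            finish[(j, i)] = t
    return finish

# usage with the two-job instance from the previous sketch
jobs = [[{0: 3, 1: 5}, {1: 2, 2: 4}], [{0: 4, 2: 6}, {0: 2, 1: 3}, {2: 5}]]
print(estimate_completion_times(jobs))
```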
Further, the specific method for calculating the neural network characteristics of the graph in the step 1.3 of the present invention is as follows:
step 1.3.1, inputting the node information and the arc relations into the k-th layer of the graph neural network to calculate the node representations, with k = 1; the node representation calculation formula is as follows:
a graph isomorphism network structure is adopted, and K update iterations are executed to calculate a p-dimensional embedding of each node v ∈ V; the update at the k-th layer is expressed as
h_v^(k) = MLP^(k)( (1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1) )
where h_v^(k) is the feature representation of node v at layer k, MLP is a multi-layer perceptron, ε^(k) is a learnable scalar, and N(v) is the set of all nodes connected to node v;
step 1.3.2, pooling the node representations to obtain the graph representation, using average pooling; k = k + 1;
step 1.3.3, executing step 1.3.1 and step 1.3.2 in a loop K times;
step 1.3.4, applying a linear transformation in the output layer to the final graph representation to obtain the output Feature.
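For reference, a minimal numpy sketch of the K-layer GIN-style update and average pooling described above; the two-layer MLP, the layer sizes and the adjacency-matrix representation are assumptions of this sketch, and pooling is applied only once after the final layer for brevity, whereas the steps above pool after every layer:

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2       # two linear layers with a ReLU in between

def gin_graph_feature(h, adj, params, eps, K=3):
    """h: (n, p) node features; adj: (n, n) 0/1 adjacency; params: list of per-layer MLP weights."""
    for k in range(K):
        agg = (1.0 + eps[k]) * h + adj @ h            # (1 + eps) * h_v + sum over neighbours
        h = mlp(agg, *params[k])                      # layer-k node representations
    return h.mean(axis=0)                             # average pooling -> graph representation

# toy usage with random weights
rng = np.random.default_rng(0)
n, p = 5, 8
h = rng.normal(size=(n, p))
adj = (rng.random((n, n)) < 0.4).astype(float)
params = [(rng.normal(size=(p, p)) * 0.1, np.zeros(p),
           rng.normal(size=(p, p)) * 0.1, np.zeros(p)) for _ in range(3)]
feature = gin_graph_feature(h, adj, params, eps=np.zeros(3))
print(feature.shape)   # (8,)
```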
Further, the decision making process in the P1 deep reinforcement learning model part of the present invention is as follows:
step 2.1, calculating the selection probability of each process with the obtained Feature as the input of the decision model;
step 2.2, greedily selecting the process o with the maximum probability;
step 2.3, selecting the machine m most suitable for the selected process according to the scheduling rule;
step 2.4, taking the combination of the selected process o and machine m as the action (o, m) in the current state, executing the state transition to obtain a new state, updating the disjunctive graph, and saving the old state, the new state and the reward value as a sample;
step 2.5, repeatedly executing step 1.2 to step 2.4 until all processes have been selected;
step 2.6, obtaining the final decision scheme through the repeated decisions of the model; the reinforcement learning is based on a Markov decision model, defined as follows:
State: the corresponding graph structure is obtained from the input test set or training set; the machining processes of the workpieces are the nodes of the graph, and the machining precedence relations between processes are the arcs. The node information includes the completion time of each process; the arcs in the graph are directed arcs, and the arc information includes the machining order of the processes on a machine, i.e. for two process nodes connected by an arc, the second process (the one pointed to by the arc) is executed after the first process is completed in the decision scheme. The state also includes the basic problem information, including whether each workpiece can be processed on each machine and the corresponding processing time;
Action: one action is defined as determining a process o of a certain workpiece and allocating a machine m to it, represented as (o, m); the state is used as the input of the deep reinforcement learning model, its features are extracted through the graph neural network and then input into the decision model to obtain the process selection probability distribution, the process is selected greedily according to the obtained probabilities, and a suitable machine is selected for it by the scheduling rule;
State transition: the state is updated according to the selected action; the arc relations and the node information of the graph are updated according to the process and machine corresponding to the action, i.e. arcs in the directed graph are added or modified, and the completion times of the processes are updated to form the new state;
Reward: the difference between the maximum completion times (computed with the estimated processing times) of the schemes corresponding to the disjunctive graphs before and after one state transition is used as the immediate reward of the decision, and the immediate rewards of all decisions are summed to give the cumulative reward.
Further, the specific calculation method of the scheduling rule in step 2.3 of the present invention is as follows:
step 2.3.1, determining the set of executable machines S_m of the selected process o;
step 2.3.2, normalizing the time each machine in the set needs to process the selected process to obtain the value f1 as index 1;
step 2.3.3, normalizing the number of processes already processed on each machine in the set to obtain the value f2 as index 2;
step 2.3.4, adding index 1 and index 2 to obtain the final index (f1 + f2);
step 2.3.5, determining a machine from the set according to the final index; the selected machine is one with a short processing time and a small number of already processed processes, as illustrated in the sketch below.
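A compact sketch of this two-index rule; the exact normalization (dividing by the largest value in the set) and the tie-breaking are assumptions of the sketch, since the text only states that the processing time and the processed-process count are normalized and summed:

```python
def select_machine(proc_time, processed_count):
    """proc_time: {machine: time to process the selected process on that machine};
    processed_count: {machine: number of processes already processed on that machine};
    returns the machine with the smallest sum of the two normalized indices."""
    machines = list(proc_time)
    t_max = max(proc_time.values()) or 1.0
    n_max = max(processed_count.get(m, 0) for m in machines) or 1.0
    def final_index(m):
        f1 = proc_time[m] / t_max                   # index 1: normalized processing time
        f2 = processed_count.get(m, 0) / n_max      # index 2: normalized processed-process count
        return f1 + f2
    return min(machines, key=final_index)

# usage: the process can run on machines 0, 1 and 2; machine 1 is fast but heavily loaded
print(select_machine({0: 6, 1: 3, 2: 5}, {0: 1, 1: 4, 2: 2}))   # -> 0
```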
Further, the state transition process in the Markov decision model of the present invention is specifically as follows:
step 3.1, judging the processing feasibility of the selected process on the selected machine according to the state and the action;
step 3.2, determining the sequence of processes already processed on the selected machine according to the disjunctive graph;
step 3.3, determining the processing time of the selected process on the selected machine;
judging whether the selected process can be inserted into an idle period of the selected machine before its existing processes; if so, executing step 3.4, otherwise executing step 3.7;
step 3.4, calculating the earliest possible machining start time of the selected process and the idle periods of the selected machine, and determining the insertion position of the selected process in the processed process sequence;
step 3.5, modifying the arc relations of the disjunctive graph according to the insertion position, and deleting the other disjunctive arcs connected to the selected process node;
step 3.6, updating the node information and finishing the state transition;
step 3.7, determining the start time of the process and appending it to the end of the processed process sequence;
step 3.8, determining the direction of the disjunctive arc of the selected process on the selected machine, and deleting the other disjunctive arcs connected to the selected process node;
step 3.9, updating the node information and finishing the state transition.
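The insertion test that branches between step 3.4 and step 3.7 can be illustrated as follows; this is a simplified sketch using explicit interval lists, whereas in the model this bookkeeping is carried by the disjunctive graph itself:

```python
def try_insert(busy_intervals, earliest_start, proc_time):
    """busy_intervals: sorted list of (start, end) periods already occupied on the machine.
    Returns the start time of a feasible idle gap before the existing processes, or None."""
    prev_end = 0.0
    for start, end in busy_intervals:
        gap_start = max(prev_end, earliest_start)
        if gap_start + proc_time <= start:        # the process fits into this idle period
            return gap_start
        prev_end = end
    return None                                    # no gap: append after the last process instead

# usage: machine busy on [0, 3) and [6, 10); a 2-unit process ready at time 1
print(try_insert([(0, 3), (6, 10)], earliest_start=1, proc_time=2))   # -> 3
```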
Further, the calculation process of the immediate reward and the cumulative reward in the Markov decision model of the present invention is as follows:
step 4.1, calculating the maximum completion time T_s of the old state;
step 4.2, calculating the maximum completion time T_{s+1} of the new state;
step 4.3, calculating the immediate reward value T_s - T_{s+1}.
The cumulative reward is calculated as:
r_t = T_{s_t} - T_{s_{t+1}}
R = Σ_t r_t = T_{s_1} - T_{s_end}
where R is the cumulative reward value, T_{s_1} is the maximum completion time corresponding to the initial state, and T_{s_end} is the maximum completion time of the final scheme; since T_{s_1} is a fixed value determined by the problem information and does not change with the decisions, maximizing the cumulative reward is equivalent to minimizing the maximum completion time of the final scheme.
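In code, the telescoping of the immediate rewards into the cumulative reward is immediate (illustrative sketch):

```python
def immediate_reward(makespan_old, makespan_new):
    return makespan_old - makespan_new             # r_t = T_{s_t} - T_{s_{t+1}}

def cumulative_reward(makespans):
    """makespans: estimated maximum completion times of the states visited in one episode."""
    rewards = [immediate_reward(a, b) for a, b in zip(makespans, makespans[1:])]
    # telescoping sum: R = T_{s_1} - T_{s_end}
    assert abs(sum(rewards) - (makespans[0] - makespans[-1])) < 1e-9
    return sum(rewards)

print(cumulative_reward([100.0, 90.0, 85.0, 80.0]))   # -> 20.0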
Further, the training process in the P2 training algorithm part of the present invention specifically includes:
step 5.1, generating one main thread and T sub-threads, and initializing the training round counter Count = 0;
step 5.2, copying the parameters of the main thread model to each sub-thread model;
step 5.3, starting each sub-thread and initializing its number of training rounds;
step 5.4, each sub-thread generating U problems;
step 5.5, each sub-thread solving flexible job shop scheduling problems through the deep reinforcement learning model and generating samples;
step 5.6, after sample collection is completed, optimizing the main thread model parameters by gradient descent and setting Count = Count + 1; the update formulas are:
θ ← θ + α ∇_θ log π(a_t | s_t; θ) A(s_t, a_t)
θ_v ← θ_v - α_v ∇_{θ_v} (R_t - V(s_t; θ_v))^2
where π is the actor network, i.e. the graph neural network together with the decision network, θ is its parameter set, V is the critic network with parameters θ_v, and A(s_t, a_t) = R_t - V(s_t; θ_v) is the advantage function estimated from the critic output and the reward values;
step 5.7, judging whether the maximum number of training rounds T_c has been reached; if not, executing steps 5.4 to 5.6; if reached, finishing the training and saving the main thread model parameters.
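A condensed, runnable sketch of this multi-thread, multi-trajectory training scheme follows; the model is reduced to a bare parameter vector and the per-trajectory gradients are random placeholders, so only the thread structure is meant to mirror the steps above, and a real implementation would compute the actor and critic gradients given above from the collected samples:

```python
import threading
import numpy as np

class SharedModel:
    """Toy stand-in for the main-thread actor-critic parameters (illustrative only)."""
    def __init__(self, dim=8):
        self.params = np.zeros(dim)
        self.lock = threading.Lock()
        self.rounds = 0

def worker(shared, U, max_rounds, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    while True:
        with shared.lock:
            if shared.rounds >= max_rounds:
                return
            local = shared.params.copy()          # copy the main-thread parameters
        # decide U problems at once -> U decision trajectories -> one placeholder gradient each
        grads = [rng.normal(size=local.shape) for _ in range(U)]
        grad = np.mean(grads, axis=0)             # stand-in for the actor/critic gradients above
        with shared.lock:                         # asynchronous update of the main-thread model
            if shared.rounds >= max_rounds:
                return
            shared.params -= lr * grad
            shared.rounds += 1

def train(T=4, U=4, max_rounds=100):
    shared = SharedModel()
    threads = [threading.Thread(target=worker, args=(shared, U, max_rounds, 0.01, i))
               for i in range(T)]
    for t in threads: t.start()
    for t in threads: t.join()
    return shared

print(train().rounds)   # -> 100
```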
The invention has the following beneficial effects:
1. Deep learning extracts the problem features, can take the internal structure of the problem into account, and adapts to changes in different manufacturing environments. The trained model obtains a good result in a short time and can be used to solve flexible job shop scheduling problems of different scales without retraining, so the method has strong generalization ability.
2. The flexible job shop scheduling problem is decomposed into the two sub-problems of process sequencing and machine selection, which are solved with a hierarchical structure; this reduces the complexity of the problem and the computation time, and the cooperation of the neural network model with a scheduling rule also reduces the structural complexity of the overall model.
3. The asynchronous advantage actor-critic algorithm is used for training; the multi-thread, multi-trajectory training method greatly shortens the training time, and the samples used for training come from the model's decision processes on different flexible job shop problems at the same moment, so the samples are uncorrelated and effective.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of a flexible job shop scheduling method based on multi-layer deep reinforcement learning according to the present invention;
FIG. 2 is a disjunctive graph model of a 3 × 3 flexible shop scheduling problem;
FIG. 3 shows the flexible job shop scheduling benchmark instance MK01.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The framework of the invention is a training framework of a multi-thread and multi-track based hierarchical deep reinforcement learning model as shown in FIG. 1. Taking a training process as an example, the method comprises the following specific steps:
As shown in FIG. 1, the training method is based on the asynchronous advantage actor-critic algorithm; a multi-thread, multi-trajectory method is adopted to train the model. The main thread model is the model to be optimized, the model parameters of the sub-threads are copied from the main thread model, each sub-thread makes decisions on several problems at the same time to generate multiple decision trajectories, and each decision trajectory is obtained from the deep reinforcement learning model.
The specific implementation steps of the deep-learning feature extraction process of the deep reinforcement learning model in P1 are as follows:
step 1, generating a 20 × 10 flexible job shop scheduling problem according to the set scale, initializing it and recording the problem information, converting the problem into a disjunctive graph structure, calculating the node information to obtain the initial state, and setting the decision time t = 0;
when the processing order and the processing machine of the process corresponding to a node have not yet been determined, one machine is randomly selected from the machines that can process it, and the finish time of the process on the selected machine is used as the estimated processing time; once the processing order and the machine of a process have been determined, the actual processing time is used as the node information.
Step 2, inputting the disjunctive graph into the graph neural network to calculate the state Feature;
the specific implementation steps of the state feature calculation of the neural network of the graph described in P1 are as follows:
step 1, inputting the node information and the arc relations into the k-th layer of the graph neural network to calculate the node representations, with k = 1;
the node representations are calculated with the graph isomorphism network update given above:
h_v^(k) = MLP^(k)( (1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1) )
step 2, pooling the node representations to obtain the graph representation, using average pooling; k = k + 1;
step 3, executing step 1 and step 2 in a loop K times, with K = 3;
step 4, obtaining the output state Feature from the graph representation through a multi-layer linear network transformation.
The state features are then passed through the hierarchical decision network to output an action; an action consists of a process and a machine, the process decision uses a linear neural network, and the machine is selected by the scheduling rule.
The specific implementation steps of the hierarchical decision network output action in P1 are as follows:
step 1, calculating the process selection probabilities from the obtained Feature, which is the input of the decision model;
step 2, greedily selecting the process o_t with the maximum probability;
step 3, selecting the most suitable machine m_t for the selected process according to the scheduling rule;
step 4, combining the selected process o_t and machine m_t into the action (o_t, m_t) in the current state;
In the above technical solution, the scheduling rule of the hierarchical decision network is calculated as follows:
step 1, determining the set of executable machines S_m of the selected process o;
step 2, normalizing the time P_om each machine in the set needs to process the selected process to obtain the value f1 as index 1;
step 3, normalizing the number N_m of processes already processed on each machine in the set to obtain the value f2 as index 2;
step 4, adding index 1 and index 2 to obtain the final index (f1 + f2);
step 5, determining the machine m_t from the set S_m according to the final index; the selected machine is one with a short processing time and a small number of already processed processes.
The sample collection procedure described in P2 is as follows:
step 1, executing the state transition to obtain a new state, and updating the disjunctive graph;
step 2, calculating the reward value, and saving the old state, the new state and the reward value as a sample, with t = t + 1;
step 3, judging whether the decision is finished; if t < 200, returning to step 2 of the feature extraction process; if t = 200, the decision ends;
and 4, obtaining a final scheme.
In the above technical solution, the specific steps of the state transition during sample collection are as follows:
step 1, judging the processing feasibility of the selected process on the selected machine according to the current state and the selected action;
step 2, determining the sequence M_sec of processes already processed on the selected machine according to the disjunctive graph;
step 3, determining the processing time P_om of the selected process on the selected machine and the earliest possible start time T_o of the selected process;
calculating the largest idle period MT of the machine after time T_o and judging whether the selected process can be inserted into an idle period of the selected machine before its existing processes; if MT > P_om, executing step 4; if MT < P_om, executing step 7;
step 4, determining the insertion position of the selected process in the processed process sequence M_sec according to the earliest possible machining start time of the selected process and the idle periods of the selected machine;
step 5, modifying the arc relations of the disjunctive graph according to the insertion position, and deleting the other disjunctive arcs connected to the selected process node;
step 6, updating the node information and finishing the state transition;
step 7, determining the idle time T_m of the machine, taking max(T_o, T_m) as the start time, and appending the process to the end of the processed process sequence M_sec;
step 8, determining the direction of the disjunctive arc of the selected process on the selected machine, and deleting the other disjunctive arcs connected to the selected process node;
step 9, updating the node information and finishing the state transition.
In the above technical solution, the calculation process of the immediate reward during sample collection is as follows:
step 1, calculating the maximum completion time T_s of the old state;
step 2, calculating the maximum completion time T_{s+1} of the new state;
step 3, calculating the reward value T_s - T_{s+1}.
The total training flow described in P2 is as follows:
step 1, generating one main thread and T sub-threads, with T = 4;
step 2, copying the parameters of the main thread model as the model parameters of each sub-thread;
step 3, each thread making decisions on U problems at the same time, with U = 4;
step 4, initializing the sub-thread, with the number of training rounds set to 0;
step 5, making decisions and collecting samples according to the feature extraction and sample collection flow;
step 6, optimizing the main thread model parameters with the gradient descent method according to the samples, with Count = Count + 1;
step 7, repeating steps 2 to 6 until the training counter Count reaches the maximum number of training rounds T_c, then ending the sub-thread, with T_c = 10000.
In the above technical solution, the optimization formulas of the gradient descent method are:
θ ← θ + α ∇_θ log π(a_t | s_t; θ) A(s_t, a_t)
θ_v ← θ_v - α_v ∇_{θ_v} (R_t - V(s_t; θ_v))^2
where π is the actor network, i.e. the graph neural network together with the decision network, θ is its parameter set, V is the critic network with parameters θ_v, and A(s_t, a_t) = R_t - V(s_t; θ_v) is the advantage function estimated from the critic output and the reward values.
the above technical solution describes the general framework of the present invention in a training process, and the following describes the proposed flexible workshop scheduling method based on hierarchical reinforcement learning by taking a solving process as an example, after model training is completed, through the process of solving MK 01. The method comprises the following specific steps:
step 1, loading model parameters;
step 2, converting MK01 into a disjunctive graph to obtain the initial state, with t = 0;
step 3, calculating the disjunctive graph through the graph neural network to obtain the state features;
step 4, inputting the state features into the decision network to obtain the process selection probabilities and select a process, determining the processing machine according to the scheduling rule, and combining the selected process and machine into an action;
step 5, executing the state transition to obtain a new state, with t = t + 1;
step 6, judging whether the decision is finished; if t < 60, returning to step 3; if t = 60, finishing the decision and outputting the solution.
The maximum completion time of the result obtained by the above solving process is 52, and the specific actions are selected as follows:
(0,5),(30,0),(1,1),(18,1),(24,5),(36,2),(48,0),(42,0),(54,5),(6,2),(12,5),(7,2),(8,0),(13,4),(25,5),(9,5),(26,5),(2,4),(3,0),(10,1),(11,0),(4,3),(27,2),(28,4),(5,2),(14,2),(15,2),(16,1),(17,4),(19,3),(20,1),(21,2),(22,4),(23,0),(49,3),(50,5),(51,2),(52,4),(53,5),(31,4),(32,3),(33,0),(34,2),(35,3),(37,1),(38,3),(39,3),(40,3),(41,0),(29,0),(43,5),(44,3),(45,0),(46,5),(47,5),(55,5),(56,5),(57,0),(58,2),(59,3)。
wherein the first value of each action is a selected process, the first process from the first workpiece is process 0, the second process from the first workpiece is process 1, and so on until the last process from the last workpiece is process 59; the second value is the processing machine selected for the process.
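Assuming MK01 contains 10 workpieces padded to 6 processes each (60 processes in total, consistent with the 60 decisions above), an action can be decoded as follows; the helper is purely illustrative and not part of the patent:

```python
def decode_action(action, ops_per_job=6):
    """Decode an (process index, machine) pair from the result list above.
    Assumes the processes are numbered job by job, with each job padded to ops_per_job."""
    process_index, machine = action
    workpiece, process_in_job = divmod(process_index, ops_per_job)
    return workpiece, process_in_job, machine

print(decode_action((30, 0)))   # -> (5, 0, 0): process 0 of workpiece 5 on machine 0
```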
According to this implementation case, the training algorithm in the technical scheme can train the model quickly and efficiently and obtain a model suitable for solving the flexible job shop scheduling problem; the hierarchical deep reinforcement learning model in the technical scheme can solve the flexible job shop scheduling problem quickly and obtain good optimization results; and the trained model can be used directly to solve flexible job shop scheduling problems of different scales and has good generalization performance.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (10)

1. A flexible job shop scheduling method based on multilayer deep reinforcement learning is characterized in that for a flexible shop scheduling problem, a deep reinforcement learning model is established and trained, the flexible shop scheduling problem is solved through the trained deep reinforcement learning model, and an optimal scheduling scheme is output; the method comprises the following two parts:
P1, deep reinforcement learning model part: the deep reinforcement learning model is used to make decisions for the flexible job shop scheduling problem; the problem is represented as a disjunctive graph, and solving it is treated as the process of orienting the disjunctive arcs. The deep learning component adopts a graph neural network that takes the disjunctive graph as input and extracts graph features, so that an effective feature representation of the problem is obtained. The reinforcement learning component is based on a Markov decision model in which the states, actions and rewards corresponding to the problem are designed, and the hierarchical decision model takes the corresponding action according to the state features. A decision scheme for the flexible job shop scheduling problem is obtained through the repeated decision process of the model, and the objective is optimized by maximizing the reward value;
P2, training algorithm part: the deep reinforcement learning model is trained with a multi-thread, multi-trajectory asynchronous advantage actor-critic algorithm; sample-collection tasks are distributed to multiple sub-threads, each of which makes decisions and generates samples independently, and each sub-thread decides several problems at the same time to generate multiple decision trajectories, so that uncorrelated high-quality samples are generated rapidly to optimize the model and the final model is obtained quickly. The trained model supports fast solving of the flexible job shop scheduling problem and generalizes to problems of different scales. The optimal scheduling scheme of the flexible job shop is output through the trained deep reinforcement learning model and handed to the flexible job shop for execution.
2. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 1, wherein the specific method for obtaining the disjunctive graph features in the P1 deep reinforcement learning model part is as follows:
step 1.1, obtaining the disjunctive Graph representation Graph from the flexible job shop scheduling problem;
step 1.2, determining the node information according to the disjunctive arcs in the disjunctive graph;
step 1.3, obtaining the Feature of the disjunctive graph by using the disjunctive graph as the input of the graph neural network.
3. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 2, wherein the disjunctive graph in step 1.1 is defined as follows:
the disjunctive graph of the flexible job shop scheduling problem is described as a given graph G = (O, C, D), where O is the set of all process nodes o together with two virtual process nodes S and E, which represent the start and end of the schedule, respectively. C is the set of conjunctive arcs, C = {⟨v, w⟩ | v, w ∈ O}, where the two processes represented by v and w belong to the same workpiece; ⟨v, w⟩ ∈ C means there is a conjunctive arc from node v to node w, a one-way arc that guarantees the precedence constraint between processes of the same workpiece, i.e. s_tv < s_tw, where s_tv is the machining start time of the process represented by node v. D is the set of disjunctive arcs, D = {⟨v, w⟩ | v, w ∈ O}, where each disjunctive arc is a bidirectional arc indicating that the processes of the connected nodes v and w can be processed on the same machine. The final goal is to determine the directions of all disjunctive arcs while making the maximum completion time as short as possible. The number of processes of each workpiece in the flexible job shop scheduling problem may differ; when converting to the disjunctive graph, if the number of processes of a workpiece is smaller than the maximum number of processes, a "0" process node is appended at the end of that workpiece to keep the graph structure uniform; the processing time of the "0" process is not counted, and it can be processed on all machines.
4. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 2, wherein the calculation method of the node information in the step 1.2 is specifically as follows:
step 1.2.1, randomly selecting the execution time of each process on one of its executable machines as the estimated execution time of that process;
step 1.2.2, ignoring the not-yet-oriented disjunctive arc constraints, processing each process in order according to the conjunctive arc constraints and the already-oriented disjunctive arcs, and calculating the completion time of each process as its node information.
5. The flexible job shop scheduling method based on multilayer deep reinforcement learning according to claim 2, wherein the specific method for calculating the neural network characteristics of the graph in the step 1.3 is as follows:
step 1.3.1, inputting the node information and the arc relations into the k-th layer of the graph neural network to calculate the node representations, with k = 1; the node representation calculation formula is as follows:
a graph isomorphism network structure is adopted, and K update iterations are executed to calculate a p-dimensional embedding of each node v ∈ V; the update at the k-th layer is expressed as
h_v^(k) = MLP^(k)( (1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1) )
where h_v^(k) is the feature representation of node v at layer k, MLP is a multi-layer perceptron, ε^(k) is a learnable scalar, and N(v) is the set of all nodes connected to node v;
step 1.3.2, pooling the node representations to obtain the graph representation, using average pooling; k = k + 1;
step 1.3.3, executing step 1.3.1 and step 1.3.2 in a loop K times;
step 1.3.4, applying a linear transformation in the output layer to the final graph representation to obtain the output Feature.
6. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 2, wherein the decision making process in the P1 deep reinforcement learning model part is as follows:
step 2.1, calculating the selection probability of each process with the obtained Feature as the input of the decision model;
step 2.2, greedily selecting the process o with the maximum probability;
step 2.3, selecting the machine m most suitable for the selected process according to the scheduling rule;
step 2.4, taking the combination of the selected process o and machine m as the action (o, m) in the current state, executing the state transition to obtain a new state, updating the disjunctive graph, and saving the old state, the new state and the reward value as a sample;
step 2.5, repeatedly executing step 1.2 to step 2.4 until all processes have been selected;
step 2.6, obtaining the final decision scheme through the repeated decisions of the model; the reinforcement learning is based on a Markov decision model, defined as follows:
state: the method comprises the steps of obtaining a corresponding graph structure through an input test set or a training set, using machining processes of workpieces as nodes of a graph, wherein the machining sequence relation of the processes is an arc, node information comprises the completion time of the processes, the arc in the graph is a directed arc, the arc information comprises the machining sequence of the processes on a machine, namely, two process nodes connected by the arc execute a second process pointed by the arc after a first process is completed in a decision scheme. The status also includes basic problem information, including whether each workpiece can be processed on different machines and the time corresponding to the processing;
and (4) Action: defining a primary action as a process o of determining a certain workpiece and allocating a machine m for the workpiece, wherein the process o is represented as (o, m), the state is used as the input of a deep reinforcement learning model, the characteristics of the process are extracted through a graph neural network and then input into a decision model to obtain process selection probability distribution, the workpiece is selected by the obtained probability greedy and the process is determined, and a proper machine is selected for the workpiece by a relevant scheduling rule;
and (3) state conversion: updating the state according to the selected action, updating the arc relation and the node information of the graph according to the working procedure and the machine corresponding to the action, namely adding or modifying the arc in the directed graph, and updating the completion time of the working procedure to be used as a new state;
reward: and taking the difference of the maximum completion time of the corresponding schemes of the analysis graphs before and after one state conversion as the timely reward of the decision, and summing the instant rewards of each decision as the accumulated reward according to the estimated processing time.
7. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the specific calculation method of the scheduling rule in the step 2.3 is as follows:
step 2.3.1, determining the set of executable machines S_m of the selected process o;
step 2.3.2, normalizing the time each machine in the set needs to process the selected process to obtain the value f1 as index 1;
step 2.3.3, normalizing the number of processes already processed on each machine in the set to obtain the value f2 as index 2;
step 2.3.4, adding index 1 and index 2 to obtain the final index (f1 + f2);
step 2.3.5, determining a machine from the set according to the final index; the selected machine is one with a short processing time and a small number of already processed processes.
8. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the state transition process in the Markov decision model is specifically:
step 3.1, judging the processing feasibility of the selected process on the selected machine according to the state and the action;
step 3.2, determining the sequence of processes already processed on the selected machine according to the disjunctive graph;
step 3.3, determining the processing time of the selected process on the selected machine;
judging whether the selected process can be inserted into an idle period of the selected machine before its existing processes; if so, executing step 3.4, otherwise executing step 3.7;
step 3.4, calculating the earliest possible machining start time of the selected process and the idle periods of the selected machine, and determining the insertion position of the selected process in the processed process sequence;
step 3.5, modifying the arc relations of the disjunctive graph according to the insertion position, and deleting the other disjunctive arcs connected to the selected process node;
step 3.6, updating the node information and finishing the state transition;
step 3.7, determining the start time of the process and appending it to the end of the processed process sequence;
step 3.8, determining the direction of the disjunctive arc of the selected process on the selected machine, and deleting the other disjunctive arcs connected to the selected process node;
step 3.9, updating the node information and finishing the state transition.
9. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the computation process of the instantaneous reward and the cumulative reward in the Markov decision model is as follows:
step 4.1, calculating the maximum completion time T_s of the old state;
step 4.2, calculating the maximum completion time T_{s+1} of the new state;
step 4.3, calculating the immediate reward value T_s - T_{s+1}.
The cumulative reward is calculated as:
r_t = T_{s_t} - T_{s_{t+1}}
R = Σ_t r_t = T_{s_1} - T_{s_end}
where R is the cumulative reward value, T_{s_1} is the maximum completion time corresponding to the initial state, and T_{s_end} is the maximum completion time of the final scheme; since T_{s_1} is a fixed value determined by the problem information and does not change with the decisions, maximizing the cumulative reward is equivalent to minimizing the maximum completion time of the final scheme.
10. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 1, wherein the training process in the P2 training algorithm part is specifically as follows:
step 5.1, generating one main thread and T sub-threads, and initializing the training round counter Count = 0;
step 5.2, copying the parameters of the main thread model to each sub-thread model;
step 5.3, starting each sub-thread and initializing its number of training rounds;
step 5.4, each sub-thread generating U problems;
step 5.5, each sub-thread solving flexible job shop scheduling problems through the deep reinforcement learning model and generating samples;
step 5.6, after sample collection is completed, optimizing the main thread model parameters by gradient descent and setting Count = Count + 1; the update formulas are:
θ ← θ + α ∇_θ log π(a_t | s_t; θ) A(s_t, a_t)
θ_v ← θ_v - α_v ∇_{θ_v} (R_t - V(s_t; θ_v))^2
where π is the actor network, i.e. the graph neural network together with the decision network, θ is its parameter set, V is the critic network with parameters θ_v, and A(s_t, a_t) = R_t - V(s_t; θ_v) is the advantage function estimated from the critic output and the reward values;
step 5.7, judging whether the maximum number of training rounds T_c has been reached; if not, executing steps 5.4 to 5.6; if reached, finishing the training and saving the main thread model parameters.
CN202210603831.2A 2022-05-30 2022-05-30 Flexible job shop scheduling method based on multilayer deep reinforcement learning Pending CN114912826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210603831.2A CN114912826A (en) 2022-05-30 2022-05-30 Flexible job shop scheduling method based on multilayer deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210603831.2A CN114912826A (en) 2022-05-30 2022-05-30 Flexible job shop scheduling method based on multilayer deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114912826A true CN114912826A (en) 2022-08-16

Family

ID=82771105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210603831.2A Pending CN114912826A (en) 2022-05-30 2022-05-30 Flexible job shop scheduling method based on multilayer deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114912826A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200026264A1 (en) * 2018-02-07 2020-01-23 Jiangnan University Flexible job-shop scheduling method based on limited stable matching strategy
US20210081787A1 (en) * 2019-09-12 2021-03-18 Beijing University Of Posts And Telecommunications Method and apparatus for task scheduling based on deep reinforcement learning, and device
CN112631214A (en) * 2020-11-27 2021-04-09 西南交通大学 Flexible job shop batch scheduling method based on improved invasive weed optimization algorithm
CN113792924A (en) * 2021-09-16 2021-12-14 郑州轻工业大学 Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MENG Binbin; WU Yan: "Research on Distributed Machine Learning Task Scheduling Algorithms for Cloud Computing", Journal of Xi'an University of Arts and Science (Natural Science Edition), no. 01, 15 January 2020 (2020-01-15) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293623A (en) * 2022-08-17 2022-11-04 海尔数字科技(青岛)有限公司 Training method and device for production scheduling model, electronic equipment and medium
CN116414093A (en) * 2023-04-13 2023-07-11 暨南大学 Workshop production method based on Internet of things system and reinforcement learning
CN116414093B (en) * 2023-04-13 2024-01-16 暨南大学 Workshop production method based on Internet of things system and reinforcement learning
CN116993028A (en) * 2023-09-27 2023-11-03 美云智数科技有限公司 Workshop scheduling method and device, storage medium and electronic equipment
CN116993028B (en) * 2023-09-27 2024-01-23 美云智数科技有限公司 Workshop scheduling method and device, storage medium and electronic equipment
CN117973635A (en) * 2024-03-28 2024-05-03 中科先进(深圳)集成技术有限公司 Decision prediction method, electronic device, and computer-readable storage medium
CN117973635B (en) * 2024-03-28 2024-06-07 中科先进(深圳)集成技术有限公司 Decision prediction method, electronic device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN114912826A (en) Flexible job shop scheduling method based on multilayer deep reinforcement learning
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
CN112734172A (en) Hybrid flow shop scheduling method based on time sequence difference
CN110458326B (en) Mixed group intelligent optimization method for distributed blocking type pipeline scheduling
CN114707881A (en) Job shop adaptive scheduling method based on deep reinforcement learning
CN115454005A (en) Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN105786610B (en) The method that computation-intensive task is unloaded into Cloud Server
CN115130789A (en) Distributed manufacturing intelligent scheduling method based on improved wolf optimization algorithm
CN114565247A (en) Workshop scheduling method, device and system based on deep reinforcement learning
CN116466659A (en) Distributed assembly flow shop scheduling method based on deep reinforcement learning
CN111353646A (en) Steel-making flexible scheduling optimization method with switching time, system, medium and equipment
CN111061565B (en) Two-section pipeline task scheduling method and system in Spark environment
CN116500986A (en) Method and system for generating priority scheduling rule of distributed job shop
CN115640898A (en) Large-scale flexible job shop scheduling method based on DDQN algorithm
CN117314055A (en) Intelligent manufacturing workshop production-transportation joint scheduling method based on reinforcement learning
CN116774657A (en) Dynamic scheduling method for remanufacturing workshop based on robust optimization
CN117057528A (en) Distributed job shop scheduling method based on end-to-end deep reinforcement learning
CN116562584A (en) Dynamic workshop scheduling method based on Conv-lasting and generalization characterization
CN110705844A (en) Robust optimization method of job shop scheduling scheme based on non-forced idle time
CN115034615A (en) Method for improving feature selection efficiency in genetic programming scheduling rule for job shop scheduling
CN115016405A (en) Process route multi-objective optimization method based on deep reinforcement learning
CN114219274A (en) Workshop scheduling method adapting to machine state based on deep reinforcement learning
CN117872999A (en) Method for solving scheduling problem of flexible job shop based on hybrid reinforcement learning
CN116500994B (en) Dynamic multi-target scheduling method for low-carbon distributed flexible job shop
CN117519030B (en) Distributed assembly blocking flow shop scheduling method based on hyper-heuristic reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination