CN111985672A - Single-piece job shop scheduling method for multi-Agent deep reinforcement learning - Google Patents
- Publication number
- CN111985672A CN111985672A CN202010380488.0A CN202010380488A CN111985672A CN 111985672 A CN111985672 A CN 111985672A CN 202010380488 A CN202010380488 A CN 202010380488A CN 111985672 A CN111985672 A CN 111985672A
- Authority
- CN
- China
- Prior art keywords
- action
- job shop
- probability
- scheduling
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a single-piece job shop scheduling method based on multi-Agent deep reinforcement learning, addressing the facts that the single-piece job shop scheduling problem has complex constraints and a large solution space, and that traditional mathematical programming algorithms and meta-heuristic algorithms cannot solve large-scale job shop scheduling problems quickly. First, a communication mechanism among multiple agents is designed, and the single-piece job shop scheduling problem is modeled as a multi-Agent reinforcement learning problem. Second, a deep neural network is constructed to extract the shop state, and on this basis a job shop action selection mechanism is designed, realizing the interaction between the workpieces being machined and the shop environment. Third, a reward function is designed to evaluate the whole scheduling decision, and the scheduling decision is updated with a policy gradient algorithm to obtain a better scheduling result. Finally, the performance of the algorithm is evaluated and verified on a standard data set. The method can solve the single-piece job shop scheduling problem and enriches the methodology of the job shop scheduling problem.
Description
Technical Field
The invention relates to the field of shop scheduling; the problem studied is the single-piece job shop scheduling problem, the most common scheduling problem in production.
Background
Manufacturing is a pillar industry of China. Modern manufacturing enterprises have many production links and complex cooperation relations, so reasonable production scheduling is important for improving production efficiency, reducing cost and shortening the production cycle. The job-shop scheduling problem (JSP) is the most common shop scheduling problem; it reflects the mapping between manufacturing tasks and resource allocation under constraints such as shop materials and processes.
The JSP is hard to solve and is a typical NP-hard problem. Common solution approaches fall into exact optimization methods and meta-heuristic algorithms. Optimization methods model and solve the job shop scheduling problem through mathematical programming and, depending on the problem variant, can be formulated as integer programming, mixed integer programming or dynamic programming. Meta-heuristic algorithms obtain a near-optimal solution in a short time through iterative improvement and, depending on the optimization strategy, include local search, tabu search, simulated annealing, genetic algorithms, particle swarm optimization, artificial neural network algorithms and others.
In recent years, the rise of deep reinforcement learning has provided a new way to approach the JSP. In 2017, Irwan used deep reinforcement learning to train a model with decision-making capability and successfully solved the NP-hard traveling salesman problem (TSP). Hanjun Dai applied deep reinforcement learning to combinatorial optimization. In 2019, Xiao et al. used deep reinforcement learning to solve the flow shop scheduling problem. Research on applying deep reinforcement learning to the JSP is still scarce in China.
Disclosure of Invention
The purpose of the invention is: to realize distributed reinforcement-learning modeling of the single-piece job shop scheduling problem, neural-network-based scheduling decisions, and scheduling-decision optimization based on the policy gradient algorithm.
To achieve this aim, the technical scheme of the invention provides a single-piece job shop scheduling method based on multi-Agent deep reinforcement learning, characterized by comprising the following steps:
inputting the global state S into a neural network model, the neural network model outputting the probability P of each workpiece being machined; when the neural network model outputs the probabilities, a probability function P = f(a, S_i | θ_i) oriented to the job shop scheduling process is adopted, where P denotes the probability of executing action a in shop state S_i and θ_i denotes the weights corresponding to each action in state S_i; these weights are used to set the selection probability of workpieces that cannot currently be machined and workpieces whose machining is finished to zero:

P(a | S_i; θ_i) = e^{θ_a} / Σ_{x ∈ S_i} e^{θ_x}

where θ_a denotes the weight corresponding to action a in state S_i, θ_x denotes the weight corresponding to action x in state S_i, and x ∈ S_i indicates that x ranges over all actions executable in state S_i;
step 3, selecting a workpiece to machine according to the shop state extracted by the neural network model:
when selecting an action according to the probability P, the action selection mechanism combines greedy selection of the maximum-probability action, a = argmax(P), with sampling an action from the probability distribution, a = random(P), thereby adding uncertainty to the current optimal decision; the mechanism has a manually set hyper-parameter c and a randomly generated number d ∈ (0,1): when d > c, the workpiece with maximum probability is selected for machining, and when d < c, the workpiece is sampled according to the probability distribution, namely:

a = argmax(P) if d > c;  a ~ P if d < c;
step 4, designing the multi-Agent interaction mechanism of the job shop to realize the interaction between the workpieces being machined and the shop environment:
when Ag_i machines operation O_{a,b}, a ∈ A_i, then after operation O_{a,b} is completed, the local action set A_i of Ag_i becomes A_i := A_i − a, and that of Ag_{i′} (i′ = γ(O_{a,b+1})) is expanded to A_{i′} := A_{i′} + a; the action transfer function σ_i is accordingly defined as:

σ_i(A_k) = A_i − a if k = i;  A_{i′} + a if k = i′;  A_k otherwise

where a denotes the workpiece corresponding to operation O_{a,b}, b denotes the machine tool corresponding to operation O_{a,b}, γ(O_{a,b}) denotes the processing time corresponding to operation O_{a,b}, and k ranges over all machine tools in the job shop scheduling problem;
step 5, designing a reward function to evaluate the whole scheduling decision, and updating the scheduling decision by updating the weight parameters of the neural network with a policy gradient algorithm.
Preferably, in step 1, the reward R is denoted R(S, a, S′), representing the reward value obtained when executing action a in state S leads to state S′.
Preferably, in step 1, the local state S_i is represented by the local action set A_i of Ag_i, i.e. the workpieces waiting to be machined on the machine tool, together with their corresponding processing times T(A_i); that is, S_i = A_i ∪ T(A_i).
Preferably, in step 2, the neural network model is composed of an input layer, hidden layers and an output layer, wherein:

input layer: the job shop state S_i is converted into vector form and output to the first hidden layer h_1; the tanh activation function is used from the input layer to hidden layer h_1, with W_1 and b_1 denoting the weights and biases of the first hidden layer h_1, giving:

h_1 = tanh(W_1 S_i + b_1)

hidden layers: the number of nodes of each hidden layer is set to 20, and no activation function is used from the last hidden layer to the output layer; between hidden layers:

h_N = tanh(W_N h_{N−1} + b_N)

where h_N denotes the N-th hidden layer, and W_N and b_N denote the weights and biases of the N-th hidden layer h_N;

output layer: each node θ_a of the output layer corresponds to the probability that action a of Ag is selected; the number of output layer nodes is set to n.
Preferably, in step 5, the policy gradient algorithm updates the policy according to the return function J(θ) after an episode is completed, comprising:

the function J(θ) denotes the weighted reward obtained on reaching the final state S_f after T steps; the weighting factor γ^t depends on the time step t and the discount factor γ, and G_t denotes the weighted reward over the T steps, J(θ) being the expectation of G_t; for the delayed-reward characteristic of the JSP, r(t) is always 0 during scheduling, and when scheduling completes the reward value is assigned as −C_max according to the JSP objective min(C_max); with γ = 1 in the formula:

J(θ) = E[G_t] = E[Σ_{t=0}^{T} γ^t r(t)] = −C_max

differentiating the return function J(θ) with respect to the action probability parameter θ_a gives the function gradient g_a:

g_a = ∂J(θ)/∂θ_a

where ∂J(θ)/∂θ_a denotes the partial derivative with respect to the action probability parameter θ_a;

having obtained the function gradient g_a, the action probability parameter θ_a of Ag_i is updated by:

θ_a := θ_a + μ_N g_a

where μ_N ∈ R denotes the update rate and N the number of updates;

after the probability parameters θ_a are updated, the Adadelta optimizer is invoked via the back-propagation principle to update the neural network weights W, completing the update of the whole policy.
Drawings
FIG. 1 is a multi-Agent reinforcement learning model;
FIG. 2 is a deep neural network model structure;
FIG. 3 is a Policy Gradient flow chart;
FIG. 4 is a graph of FT06 objective function;
FIG. 5 is an optimal solution to the FT06 problem;
FIG. 6 is a flow chart of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The invention provides a single job shop scheduling method for multi-Agent deep reinforcement learning, which comprises the following steps:
As shown in fig. 1, the multi-Agent reinforcement learning model of the present invention includes the following contents:
Ag = {Ag_1, …, Ag_i, …, Ag_m}, in which Ag_i denotes the i-th machine tool, i = 1, …, m, and m is the total number of machine tools.

S = {S_1, …, S_i, …, S_m} is the global state composed of the local states S_i of all Ag_i.

A = {A_1, …, A_i, …, A_m} is the global action set composed of the local action sets A_i of all Ag_i; a_i denotes the action executed by Ag_i at the current time.

P is the probability transition matrix over states S; the transition function P(S′ | S, a) denotes the probability that executing action a in state S leads to state S′.

R is the reward function; R(S, a, S′) denotes the reward value obtained when executing action a in state S leads to state S′.

γ is the discount factor, γ ∈ [0,1].

In the multi-Agent reinforcement learning job shop scheduling process, the global state S is factorized into the local states S_i of the m Ag_i, which are input in turn into the multi-Agent reinforcement learning system; the system outputs the action a_i currently executed by Ag_i, changes the global state S and obtains the reward R; this process is repeated until all Ag_i complete their machining tasks. The local state S_i is represented by the local action set A_i of Ag_i, i.e. the workpieces waiting to be machined on the machine tool, together with their corresponding processing times T(A_i); that is, S_i = A_i ∪ T(A_i).
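As an illustration of the factorized interaction loop above, the following minimal Python sketch models each machine tool as an agent holding a local action set of waiting workpieces with their processing times. The class names, the shortest-processing-time placeholder policy, and the example data are assumptions for illustration only, not part of the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One machine tool Ag_i; `actions` is the local action set A_i
    mapping each waiting workpiece id to its processing time T(A_i)."""
    actions: dict = field(default_factory=dict)

    def local_state(self):
        # S_i = A_i together with the corresponding processing times
        return dict(self.actions)

def schedule(agents):
    """Let each agent in turn pick a waiting workpiece until all local
    action sets are empty (here with a shortest-time placeholder policy
    standing in for the learned neural-network policy)."""
    order = []
    while any(ag.actions for ag in agents):
        for ag in agents:
            if ag.actions:
                job = min(ag.actions, key=ag.actions.get)
                order.append(job)
                del ag.actions[job]
    return order

agents = [Agent({1: 3, 2: 5}), Agent({3: 2})]
print(schedule(agents))  # [1, 3, 2]
```

In the patent's method the placeholder policy is replaced by the neural-network action selection of steps 2 and 3.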
Step 2, constructing a neural network model and extracting the shop state.
In this embodiment, the neural network model is composed of an input layer, a hidden layer, and an output layer.
An input layer: the number of nodes of the input layer is set to 10, and the state S of the job shop is setiConversion to vector mode output SiTo the first hidden layer h1Inputting layer to hidden layer h1The tanhx activation function is adopted in the method, W1and b1Respectively representing a first hidden layer h1The weights and offsets of (c) are then:
h1=tanh(W1Si+b1)
hiding the layer: the number of nodes of the hidden layer is set to be 20, and an activation function is not used from the hidden layer to the output layer, and the method comprises the following steps:
hN=tanh(WNhN-1+bN)
in the formula, hNDenotes the Nth hidden layer, WNAnd bNRespectively represent the Nth hidden layer hNWeight and bias of
An output layer: each node theta of the output layeraThe number of output layer nodes is set to n, corresponding to the probability that the action a of Ag is selected.
In this embodiment, the global state S is input into the neural network model, and the neural network model outputs the probability P of each workpiece being machined; a tanh activation function is used between the input layer and the hidden layer. When the neural network model outputs the probabilities, workpieces that are finished or that cannot currently be machined must be prevented from being selected during job shop scheduling, and the probabilities must sum to 1. The invention therefore designs, based on the Softmax function, a probability function P = f(a, S_i | θ_i) for the job shop scheduling process, where P denotes the probability of executing action a in shop state S_i and θ_i denotes the weight corresponding to each action in state S_i; these weights set the selection probability of workpieces that cannot currently be machined and workpieces whose machining is finished to zero:

P(a | S_i; θ_i) = e^{θ_a} / Σ_{x ∈ S_i} e^{θ_x}

where θ_a denotes the weight corresponding to action a in state S_i, θ_x denotes the weight corresponding to action x in state S_i, and x ∈ S_i indicates that x ranges over all actions executable in state S_i.
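The masked Softmax above can be sketched as follows; the function name and the Boolean feasibility mask are illustrative assumptions:

```python
import numpy as np

def masked_softmax(theta, feasible):
    """Softmax over the action weights theta_a; actions marked infeasible
    (finished workpieces, workpieces not yet machinable) get probability 0,
    and the remaining probabilities sum to 1."""
    theta = np.asarray(theta, dtype=float)
    mask = np.asarray(feasible, dtype=bool)
    # subtract the max over feasible entries for numerical stability
    exp = np.where(mask, np.exp(theta - theta[mask].max()), 0.0)
    return exp / exp.sum()

p = masked_softmax([1.0, 2.0, 0.5], [True, True, False])
# p[2] is exactly 0 and p sums to 1
```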
Step 3, selecting the workpiece to machine according to the shop state extracted by the neural network model.
When selecting an action according to the probability P, in order to ensure that the scheduling policy can converge while retaining the ability to escape local optima, the action selection mechanism combines greedy selection of the maximum-probability action, a = argmax(P), with sampling an action from the probability distribution, a = random(P), adding uncertainty to the current optimal decision. The mechanism has a manually set hyper-parameter c and a randomly generated number d ∈ (0,1): when d > c, the workpiece with maximum probability is selected for machining; when d < c, the workpiece is sampled according to the probability distribution, namely:

a = argmax(P) if d > c;  a ~ P if d < c.
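The action selection mechanism can be sketched as follows, assuming P is given as a normalized probability vector; the function name is an assumption:

```python
import numpy as np

def select_action(p, c, rng):
    """Greedy/sampled mix: draw d in (0,1); take argmax(P) when d > c,
    otherwise sample an action index from the distribution P."""
    d = rng.random()                      # randomly generated d in (0,1)
    if d > c:
        return int(np.argmax(p))          # exploit: maximum-probability action
    return int(rng.choice(len(p), p=p))   # explore: sample according to P
```

A smaller c makes the policy greedier; c = 1 reduces to pure sampling.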
and 4, designing a multi-Agent interaction mechanism of the job shop to realize interaction between the workshop processing workpiece and the workshop environment.
The state S_i in job shop scheduling changes with the local action set A_i, so the local state S_i follows the local action set A_i; the invention therefore establishes a communication mechanism between agents by defining an action transfer function. When Ag_i machines operation O_{a,b}, a ∈ A_i, then after operation O_{a,b} is completed, the local action set A_i of Ag_i becomes A_i := A_i − a, and that of Ag_{i′} (i′ = γ(O_{a,b+1})) is expanded to A_{i′} := A_{i′} + a. The invention accordingly defines the action transfer function σ_i:

σ_i(A_k) = A_i − a if k = i;  A_{i′} + a if k = i′;  A_k otherwise

where a denotes the workpiece corresponding to operation O_{a,b}, b denotes the machine tool corresponding to operation O_{a,b}, γ(O_{a,b}) denotes the processing time corresponding to operation O_{a,b}, and k ranges over all machine tools in the job shop scheduling problem.
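The effect of the action transfer function σ_i on the local action sets can be sketched as follows; representing the sets as a dict from machine index to workpiece set, and the `None` convention for a fully machined workpiece, are illustrative assumptions:

```python
def transfer(action_sets, a, i, i_next):
    """sigma_i: after machine i completes the current operation of
    workpiece a, remove a from A_i; if the workpiece has a following
    operation on machine i_next, append a to A_{i_next}."""
    new_sets = {k: set(v) for k, v in action_sets.items()}
    new_sets[i].discard(a)               # A_i := A_i - a
    if i_next is not None:               # None: workpiece a is fully machined
        new_sets[i_next].add(a)          # A_{i'} := A_{i'} + a
    return new_sets

sets = transfer({0: {1, 2}, 1: {3}}, 1, 0, 1)
print(sets)  # {0: {2}, 1: {1, 3}}
```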
Step 5, designing a reward function to evaluate the whole scheduling decision, and updating the scheduling decision by updating the weight parameters of the neural network with a policy gradient algorithm. As shown in fig. 3, the core of the policy gradient algorithm is to update the policy according to the return function J(θ) after an episode is completed, comprising:

the function J(θ) denotes the weighted reward obtained on reaching the final state S_f after T steps; the weighting factor γ^t depends on the time step t and the discount factor γ, and G_t denotes the weighted reward over the T steps, J(θ) being the expectation of G_t. For the delayed-reward characteristic of the JSP, r(t) is always 0 during scheduling; when scheduling completes, the reward value is assigned as −C_max according to the JSP objective min(C_max), and with γ = 1:

J(θ) = E[G_t] = E[Σ_{t=0}^{T} γ^t r(t)] = −C_max

The policy update follows the direction of steepest ascent of the expected return, so the return function J(θ) is differentiated with respect to the action probability parameter θ_a to obtain the function gradient g_a:

g_a = ∂J(θ)/∂θ_a

where ∂J(θ)/∂θ_a denotes the partial derivative with respect to θ_a. Having obtained the function gradient g_a, the action probability parameter θ_a of Ag_i is updated by:

θ_a := θ_a + μ_N g_a

where μ_N ∈ R denotes the update rate and N the number of updates.

After the probability parameters θ_a are updated, the Adadelta optimizer is invoked via the back-propagation principle to update the neural network weights W, completing the update of the whole policy.
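A minimal sketch of the policy update θ_a := θ_a + μ_N g_a, using the standard log-likelihood (REINFORCE-style) gradient of a Softmax policy as the estimate of g_a and the terminal reward −C_max; holding the action distribution fixed within one episode and the function name are simplifying assumptions:

```python
import numpy as np

def reinforce_update(theta, chosen_actions, reward, lr):
    """One policy-gradient step: g_a = reward * d log softmax(theta)_a / d theta,
    summed over the actions chosen in the episode, then theta := theta + lr * g."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    p = np.exp(theta) / np.exp(theta).sum()   # current action probabilities
    for a in chosen_actions:
        one_hot = np.eye(len(theta))[a]
        grad += reward * (one_hot - p)        # gradient of log softmax at a
    return theta + lr * grad

# with reward -C_max < 0, frequently chosen actions are discouraged
theta = reinforce_update(np.zeros(3), chosen_actions=[0, 0, 1], reward=-4.0, lr=0.01)
```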
Claims (5)
1. A single job shop scheduling method for multi-Agent deep reinforcement learning is characterized by comprising the following steps:
step 1, performing distributed modeling on a job shop scheduling environment by adopting a multi-Agent method;
in the multi-Agent reinforcement learning job shop scheduling process, the global state S is factorized into the local states S_i of the m Ag_i, which are input in turn into the multi-Agent reinforcement learning system; the system outputs the action a_i currently executed by Ag_i, changes the global state S and obtains the reward R; this process is repeated until all Ag_i complete their machining tasks, wherein Ag_i corresponds to the i-th machine tool, i = 1, …, m, m is the total number of machine tools, S_i is the local state of Ag_i, S = {S_1, …, S_i, …, S_m}, and A_i is the local action set of Ag_i;
step 2, constructing a neural network model and extracting a workshop state;
inputting the global state S into a neural network model, the neural network model outputting the probability P of each workpiece being machined; when the neural network model outputs the probabilities, a probability function P = f(a, S_i | θ_i) oriented to the job shop scheduling process is adopted, where P denotes the probability of executing action a in shop state S_i and θ_i denotes the weights corresponding to each action in state S_i; these weights are used to set the selection probability of workpieces that cannot currently be machined and workpieces whose machining is finished to zero:

P(a | S_i; θ_i) = e^{θ_a} / Σ_{x ∈ S_i} e^{θ_x}

where θ_a denotes the weight corresponding to action a in state S_i, θ_x denotes the weight corresponding to action x in state S_i, and x ∈ S_i indicates that x ranges over all actions executable in state S_i;
step 3, selecting a workpiece to machine according to the shop state extracted by the neural network model:
when selecting an action according to the probability P, the action selection mechanism combines greedy selection of the maximum-probability action, a = argmax(P), with sampling an action from the probability distribution, a = random(P), thereby adding uncertainty to the current optimal decision; the mechanism has a manually set hyper-parameter c and a randomly generated number d ∈ (0,1): when d > c, the workpiece with maximum probability is selected for machining, and when d < c, the workpiece is sampled according to the probability distribution, namely:

a = argmax(P) if d > c;  a ~ P if d < c;
step 4, designing the multi-Agent interaction mechanism of the job shop to realize the interaction between the workpieces being machined and the shop environment:
when Ag_i machines operation O_{a,b}, a ∈ A_i, then after operation O_{a,b} is completed, the local action set A_i of Ag_i becomes A_i := A_i − a, and that of Ag_{i′} (i′ = γ(O_{a,b+1})) is expanded to A_{i′} := A_{i′} + a; the action transfer function σ_i is accordingly defined as:

σ_i(A_k) = A_i − a if k = i;  A_{i′} + a if k = i′;  A_k otherwise

where a denotes the workpiece corresponding to operation O_{a,b}, b denotes the machine tool corresponding to operation O_{a,b}, γ(O_{a,b}) denotes the processing time corresponding to operation O_{a,b}, and k ranges over all machine tools in the job shop scheduling problem;
step 5, designing a reward function to evaluate the whole scheduling decision, and updating the scheduling decision by updating the weight parameters of the neural network with a policy gradient algorithm.
2. The single-piece job shop scheduling method of multi-Agent deep reinforcement learning according to claim 1, wherein in step 1 the reward R is denoted R(S, a, S′), representing the reward value obtained when executing action a in state S leads to state S′.
3. The single-piece job shop scheduling method of multi-Agent deep reinforcement learning according to claim 1, wherein in step 1 the local state S_i is represented by the local action set A_i of Ag_i, i.e. the workpieces waiting to be machined on the machine tool, together with their corresponding processing times T(A_i); that is, S_i = A_i ∪ T(A_i).
4. The single-piece job shop scheduling method of multi-Agent deep reinforcement learning according to claim 1, wherein in step 2 the neural network model consists of an input layer, hidden layers and an output layer, wherein:
Input layer: converts the job shop state S_i into vector form and outputs S_i to the first hidden layer h_1; the tanh activation function is used from the input layer to the hidden layer h_1, where W_1 and b_1 respectively denote the weight and bias of the first hidden layer h_1:
h_1 = tanh(W_1 S_i + b_1)
Hidden layers: the number of nodes in each hidden layer is set to 20, and no activation function is used from the last hidden layer to the output layer; the hidden layers satisfy:
h_N = tanh(W_N h_{N-1} + b_N)
where h_N denotes the Nth hidden layer, and W_N and b_N respectively denote the weight and bias of the Nth hidden layer h_N;
Output layer: each node θ_a of the output layer corresponds to the probability that action a of Ag_i is selected, and the number of output layer nodes is set to n.
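A minimal sketch of the policy network in claim 4 follows: tanh hidden layers of 20 nodes, a linear output layer, and a softmax turning the output nodes θ_a into action probabilities. The layer sizes at the input/output, the random weight initialisation, and the pure-Python implementation are illustrative assumptions standing in for a real deep-learning framework:

```python
import math
import random

random.seed(0)  # deterministic illustrative weights

def layer(n_in, n_out):
    """One fully connected layer with small random weights, zero bias."""
    W = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def affine(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(x, hidden, out):
    h = x
    for W, b in hidden:                  # h_k = tanh(W_k h_{k-1} + b_k)
        h = [math.tanh(v) for v in affine(W, b, h)]
    theta = affine(*out, h)              # linear output layer (no activation)
    m = max(theta)                       # softmax over the theta_a nodes
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [v / z for v in e]            # selection probability per action

hidden = [layer(4, 20), layer(20, 20)]   # two hidden layers of 20 nodes
out = layer(20, 3)                       # n = 3 output nodes (assumed)
probs = forward([0.0, 1.0, 3.0, 2.0], hidden, out)
```

The softmax is one common way to read the θ_a nodes as a probability distribution; the patent text does not specify the normalisation used.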
5. The single-piece job shop scheduling method of multi-Agent deep reinforcement learning according to claim 4, wherein in step 5 the policy gradient algorithm updates the policy according to the return function J(θ) after the policy rollout is completed, comprising:
The function J(θ) denotes the weighted reward obtained on reaching the final state S_f after T steps, where the weighting factor γ^t depends on the time step t and the discount factor γ, G_t denotes the weighted reward accumulated over the T steps, and J(θ) denotes the weighted average of these rewards. Owing to the temporal character of the reward in the JSP problem, r(t) is always 0 during the scheduling process; only when scheduling ends is the reward value assigned as −C_max according to the JSP objective function min(C_max), and with γ = 1:
Differentiating the return function J(θ) with respect to the action probability parameter θ_a yields the function gradient g_a, comprising:
where ∇_{θ_a} denotes the partial derivative with respect to the action probability θ_a; after the function gradient g_a is obtained, the action probability parameter θ_a of Ag_i is updated as follows:
θ_a := θ_a + μ_N g_a
where μ_N ∈ R denotes the update rate and N denotes the number of updates;
After the update of the probability parameter θ_a is completed, the Adadelta optimizer is called, using the back-propagation principle, to update the neural network weight parameters W, completing the update of the whole policy.
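The terminal-reward policy-gradient update of claim 5 can be sketched as follows. A small table of action logits stands in for the network's output parameters θ_a, the Adadelta step on the network weights W is omitted, and the episode data are made up; this is REINFORCE with the return G = −C_max applied once at the end of the episode, with γ = 1, as the claim describes:

```python
import math

# Illustrative stand-in for the output-layer parameters theta_a.
theta = [0.0, 0.0, 0.0]

def softmax(t):
    m = max(t)
    e = [math.exp(v - m) for v in t]
    z = sum(e)
    return [v / z for v in e]

def episode_update(actions_taken, c_max, mu=0.1):
    """REINFORCE: r(t) = 0 until the end, then G = -C_max (gamma = 1)."""
    G = -c_max                               # reward only at the final state
    for a in actions_taken:
        p = softmax(theta)
        for j in range(len(theta)):          # grad of log pi(a) wrt theta_j
            grad = (1.0 if j == a else 0.0) - p[j]
            theta[j] += mu * grad * G        # theta_a := theta_a + mu_N g_a

episode_update([0, 2], c_max=10.0)           # assumed episode and makespan
```

Because the return is −C_max, actions chosen in long-makespan schedules are made less probable, which drives the policy toward minimising C_max.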
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010380488.0A CN111985672B (en) | 2020-05-08 | 2020-05-08 | Single-piece job shop scheduling method for multi-Agent deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985672A true CN111985672A (en) | 2020-11-24 |
CN111985672B CN111985672B (en) | 2021-08-27 |
Family
ID=73441772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010380488.0A Active CN111985672B (en) | 2020-05-08 | 2020-05-08 | Single-piece job shop scheduling method for multi-Agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985672B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102571570A (en) * | 2011-12-27 | 2012-07-11 | 广东电网公司电力科学研究院 | Network flow load balancing control method based on reinforcement learning |
CN103248693A (en) * | 2013-05-03 | 2013-08-14 | 东南大学 | Large-scale self-adaptive composite service optimization method based on multi-agent reinforced learning |
CN108282587A (en) * | 2018-01-19 | 2018-07-13 | 重庆邮电大学 | Mobile customer service dialogue management method under being oriented to strategy based on status tracking |
CN108573303A (en) * | 2018-04-25 | 2018-09-25 | 北京航空航天大学 | It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly |
CN110084375A (en) * | 2019-04-26 | 2019-08-02 | 东南大学 | A kind of hierarchy division frame based on deeply study |
CN110648049A (en) * | 2019-08-21 | 2020-01-03 | 北京大学 | Multi-agent-based resource allocation method and system |
CN110691422A (en) * | 2019-10-06 | 2020-01-14 | 湖北工业大学 | Multi-channel intelligent access method based on deep reinforcement learning |
CN110991972A (en) * | 2019-12-14 | 2020-04-10 | 中国科学院深圳先进技术研究院 | Cargo transportation system based on multi-agent reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Ji Jing: "Modeling and Optimization of Job Shop Scheduling under Local Perception", China Master's Theses Full-text Database, Economics and Management Sciences * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112150088A (en) * | 2020-11-26 | 2020-12-29 | 深圳市万邑通信息科技有限公司 | Huff-puff flexible intelligent assembly logistics path planning method and system |
CN112700099A (en) * | 2020-12-24 | 2021-04-23 | 亿景智联(北京)科技有限公司 | Resource scheduling planning method based on reinforcement learning and operation research |
CN112598309A (en) * | 2020-12-29 | 2021-04-02 | 浙江工业大学 | Job shop scheduling method based on Keras |
CN112598309B (en) * | 2020-12-29 | 2022-04-19 | 浙江工业大学 | Job shop scheduling method based on Keras |
CN112884239A (en) * | 2021-03-12 | 2021-06-01 | 重庆大学 | Aerospace detonator production scheduling method based on deep reinforcement learning |
CN112884239B (en) * | 2021-03-12 | 2023-12-19 | 重庆大学 | Space detonator production scheduling method based on deep reinforcement learning |
CN113093673A (en) * | 2021-03-31 | 2021-07-09 | 南京大学 | Method for optimizing workshop operation schedule by using mean field action value learning |
CN113222253A (en) * | 2021-05-13 | 2021-08-06 | 珠海埃克斯智能科技有限公司 | Scheduling optimization method, device and equipment and computer readable storage medium |
CN113222253B (en) * | 2021-05-13 | 2022-09-30 | 珠海埃克斯智能科技有限公司 | Scheduling optimization method, device, equipment and computer readable storage medium |
CN113361915A (en) * | 2021-06-04 | 2021-09-07 | 聪明工厂有限公司 | Flexible job shop scheduling method based on deep reinforcement learning and multi-agent graph |
Also Published As
Publication number | Publication date |
---|---|
CN111985672B (en) | 2021-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111985672B (en) | Single-piece job shop scheduling method for multi-Agent deep reinforcement learning | |
CN104268722B (en) | Dynamic flexible job-shop scheduling method based on multi-objective Evolutionary Algorithm | |
CN108846570B (en) | Method for solving resource-limited project scheduling problem | |
CN113792924A (en) | Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network | |
CN113011612B (en) | Production and maintenance scheduling method and system based on improved wolf algorithm | |
Ueda et al. | An emergent synthesis approach to simultaneous process planning and scheduling | |
CN115454005A (en) | Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene | |
CN115130789A (en) | Distributed manufacturing intelligent scheduling method based on improved wolf optimization algorithm | |
CN112348314A (en) | Distributed flexible workshop scheduling method and system with crane | |
Cao et al. | An adaptive multi-strategy artificial bee colony algorithm for integrated process planning and scheduling | |
Xue et al. | Estimation of distribution evolution memetic algorithm for the unrelated parallel-machine green scheduling problem | |
Bekker | Applying the cross-entropy method in multi-objective optimisation of dynamic stochastic systems | |
CN113139747A (en) | Method for reordering coating of work returning vehicle based on deep reinforcement learning | |
CN113406939A (en) | Unrelated parallel machine dynamic hybrid flow shop scheduling method based on deep Q network | |
Li et al. | An improved whale optimisation algorithm for distributed assembly flow shop with crane transportation | |
CN112488543B (en) | Intelligent work site intelligent scheduling method and system based on machine learning | |
Yan et al. | A job shop scheduling approach based on simulation optimization | |
Kim | Permutation-based elitist genetic algorithm using serial scheme for large-sized resource-constrained project scheduling | |
Nugraheni et al. | Hybrid Metaheuristics for Job Shop Scheduling Problems. | |
Han et al. | Research on optimization method of routing buffer linkage based on Q-learning | |
CN112734286B (en) | Workshop scheduling method based on multi-strategy deep reinforcement learning | |
Fujii et al. | Integration of process planning and scheduling using multi-agent learning | |
CN117950379A (en) | Intelligent workshop real-time rescheduling method based on deep circulation Q network | |
CN117215275B (en) | Large-scale dynamic double-effect scheduling method for flexible workshop based on genetic programming | |
CN116384602A (en) | Multi-target vehicle path optimization method, system, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||