CN116500994B - Dynamic multi-target scheduling method for low-carbon distributed flexible job shop - Google Patents

Dynamic multi-target scheduling method for low-carbon distributed flexible job shop

Info

Publication number
CN116500994B
CN116500994B (application CN202310494027.XA)
Authority
CN
China
Prior art keywords
workpiece
point
scheduling
rescheduling
energy consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310494027.XA
Other languages
Chinese (zh)
Other versions
CN116500994A (en)
Inventor
陈光柱
陈懿
廖晓鹃
侯英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology filed Critical Chengdu University of Technology
Priority to CN202310494027.XA priority Critical patent/CN116500994B/en
Publication of CN116500994A publication Critical patent/CN116500994A/en
Application granted granted Critical
Publication of CN116500994B publication Critical patent/CN116500994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 - Programme-control systems
    • G05B19/02 - Programme-control systems electric
    • G05B19/418 - Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/41865 - Total factory control characterised by job scheduling, process planning, material flow
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 - Program-control systems
    • G05B2219/30 - Nc systems
    • G05B2219/32 - Operator till task planning
    • G05B2219/32252 - Scheduling production, machining, job shop

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • General Factory Administration (AREA)

Abstract

The invention discloses a dynamic multi-objective scheduling method for a low-carbon distributed flexible job shop, belonging to the field of shop scheduling. A dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop is established according to its low-carbon job scheduling requirements; a state space of the low-carbon distributed flexible job shop is constructed, an action space of composite scheduling rules is designed, and an instant reward function and a round reward function are proposed; a Rainbow DQN deep reinforcement learning algorithm is proposed to solve the dynamic multi-objective scheduling planning model; the Rainbow agent constantly interacts with the scheduling environment to obtain a better scheduling rule at each scheduling point. The method improves the decision-making efficiency of manufacturing enterprises, adaptively and rapidly generates better solutions, effectively reduces the loss caused by delay time, reduces energy consumption, and exhibits robustness and generalization.

Description

Dynamic multi-target scheduling method for low-carbon distributed flexible job shop
Technical Field
The invention relates to the field of workshop scheduling, and in particular to a dynamic multi-objective scheduling method for a low-carbon distributed flexible job shop based on deep reinforcement learning.
Background
In recent years, with globalization and the development of science and technology, many manufacturing enterprises have gradually shifted from the traditional single job shop mode to a distributed job shop mode, thereby reducing labor and raw material costs and improving production efficiency. Compared with the traditional flexible job shop scheduling problem, the distributed flexible job shop breaks the limitation of a single job shop. Each workpiece may be transported to multiple job shops at different locations, and each process may be assigned to different equipment for processing. The distributed production mode is therefore closer to the actual production environment. Because distributed flexible job shops face more complex and diverse unexpected events, rescheduling from scratch when dynamic scheduling is required in an emergency is time-consuming, demands substantial expert experience, and can hardly satisfy a real-time production environment that requires high scheduling quality. In addition, low-carbon manufacturing is a new scheduling mode that has drawn increasing attention from academia and engineering owing to rising energy costs and worsening environmental pollution.
Flexible job shop scheduling is an NP-hard problem, and its extended variants are even more complex; owing to the uncertainty of the scheduling process, the scheduling objective shifts from solution optimality to fast, reasonable solutions. The low-carbon distributed flexible job shop is characterized by numerous constraints, a complex and changeable environment, and strong dynamics. Traditional job shop scheduling strategies emphasize economic factors such as completion time and equipment utilization, while ignoring energy and environmental factors such as the energy consumed during processing and transportation. In the field of distributed job shop scheduling, most conventional scheduling models do not allow workpieces to move between workshops. It should also be noted that the scheduling algorithms widely used by manufacturing enterprises are heuristics; although heuristics are fast, their scheduling quality deteriorates as the scheduling scale increases.
Disclosure of Invention
Therefore, in view of the defects or improvement demands of the prior art, the invention provides a dynamic multi-objective scheduling method for a low-carbon distributed flexible job shop based on deep reinforcement learning, in which a Rainbow agent continuously interacts with the scheduling environment under the Rainbow DQN framework to obtain a better scheduling rule at each rescheduling point or decision point. An offline-trained scheduling strategy is adopted, so that the time-consuming part is confined to offline training, new scheduling problems are solved quickly online, and better solutions are generated adaptively during application.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A dynamic multi-target scheduling method for a low-carbon distributed flexible job shop comprises the following steps:
S1, establishing a dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop according to the low-carbon job scheduling requirements of the low-carbon distributed flexible job shop;
S2, constructing a state space of the low-carbon distributed flexible job shop, designing an action space of composite scheduling rules, and proposing an instant reward function and a round reward function;
S3, proposing a Rainbow DQN deep reinforcement learning algorithm, and solving the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop.
Specifically, in the step S1, the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop is composed of a multi-objective function and a series of constraint conditions:
The multi-objective function includes a total delay time function and a total energy consumption function. The total delay time function is calculated from the workpiece cut-off time, the time required to complete the workpiece processes, and the decision variables. The total energy consumption function is calculated from the processing energy consumption, the equipment idle energy consumption and the transportation energy consumption.
The series constraint is set as follows:
(1) A process can only be performed on one piece of equipment in one workshop;
(2) Each process can be processed only after its workpiece has arrived;
(3) The completion time of a process equals its start time plus its processing time;
(4) The processes of each workpiece must follow the given precedence order;
(5) Processes of different workpieces on the same device must be performed sequentially;
(6) Inter-workshop and inter-device transportation times are not considered simultaneously: when the inter-workshop transportation time is considered, the inter-device transportation time is ignored;
Specifically, in step S2, the state space of the low-carbon distributed flexible job shop is constructed, the action space of the composite scheduling rules is designed, and an instant reward function and a round reward function are proposed. The state space comprehensively reflects the production state at a rescheduling point or decision point and contains 19 items of state information of the low-carbon distributed flexible job shop, including the predicted delay rate, the actual delay rate, the predicted weighted delay rate, the average utilization of all equipment in all job shops, the standard deviation of equipment utilization, the average completion rate of all workpieces, the average completion rate of all processes, the standard deviation of the workpiece completion rates, the energy consumption index of all completed processes, and the simplified completion time of the last process processed on each device at the rescheduling point. Based on the state space, 7 workpiece selection rules and 6 equipment allocation rules are set, and a total of 42 composite scheduling rules is obtained through their Cartesian product. The 10 rules with the best average results are selected as the action space. The instant reward function comprises an economic index, an energy consumption index and an equipment index. The economic index is calculated from the actual delay rate, the predicted weighted delay rate, the predicted delay rate, the average equipment utilization, the minimum delay time and the current delay time. The energy consumption index is calculated from the lowest total energy consumption and the current total energy consumption. The equipment index gives a negative reward based on the simplified completion time of the last process on all equipment and feeds it back to the agent, so that the agent converges faster and achieves a better convergence effect. The round reward function produces a negative value: the low-carbon distributed flexible job shop computes the total delay time and the total energy consumption of each training round, and the larger these two values are, the larger the penalty fed back from the scheduling environment to the agent.
Specifically, step S3 proposes a Rainbow DQN deep reinforcement learning algorithm to solve the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop; the Rainbow DQN deep reinforcement learning algorithm comprises a Rainbow agent and the scheduling environment of the low-carbon distributed flexible job shop; the interaction of the Rainbow agent with the scheduling environment of the low-carbon distributed flexible job shop is a discrete-time Markov decision process model.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a mathematical model of a low-carbon distributed flexible job shop aimed at real shop scheduling problems; according to the characteristics of such shops it remedies some gaps in the scheduling field and extends flexible job shop scheduling research in a practical sense;
(2) The Rainbow DQN algorithm used in the invention selects a composite scheduling rule according to the current and future state information of the low-carbon distributed flexible job shop, and balances the two optimization objectives according to their constraint conditions;
(3) Compared with a single composite scheduling rule and the standard DQN, the proposed Rainbow DQN can perceive the state of the low-carbon distributed flexible job shop at a rescheduling point or decision point and select a better scheduling rule to achieve high production efficiency and low energy consumption;
(4) The disclosed deep-reinforcement-learning-based scheduling method for the low-carbon distributed flexible job shop comprises three parts: the simulation environment, offline training and practical application. In offline training the deep reinforcement learning agent interacts with the scheduling environment and learns scheduling knowledge from the exchanged composite scheduling rules and state information; in practical application the agent directly uses the scheduling knowledge saved during offline training and provides fast, reasonable scheduling schemes for new scheduling instances from the scheduling environment.
Drawings
FIG. 1 is a diagram of the Rainbow DQN architecture in the low-carbon distributed flexible job shop;
FIG. 2 lists the parameters of the mathematical model;
FIG. 3 lists the relevant parameters of the formulas in the state space;
FIG. 4 shows the training curve of the total delay time;
FIG. 5 shows the training curve of the total energy consumption.
Detailed Description
The above-described aspects are further described below with reference to the drawings and examples. Embodiments of the present invention include, but are not limited to, the following examples.
The implementation of the invention mainly comprises the steps of establishing a dynamic multi-objective scheduling planning model of a low-carbon distributed flexible job shop, establishing a state space of the low-carbon distributed flexible job shop, designing an action space of a composite scheduling rule, providing an instant rewarding function and a round rewarding function, and providing a Rainbow DQN deep reinforcement learning algorithm to solve the model. The method comprises the following specific steps:
S1, according to the low-carbon job scheduling requirements of the low-carbon distributed flexible job shop, establishing the multi-objective function and a series of constraint conditions of the dynamic multi-objective scheduling planning model, whose main purposes are to reduce the total delay time of the processing process and to reduce the total energy consumption. This step includes establishing the multi-objective function and establishing the series of constraints; FIG. 2 contains the parameters of the mathematical model:
(1) The multi-objective function includes a total delay time function and a total energy consumption function:
The total delay time function TT is calculated from the cut-off time D_i and the completion time CT_i of each workpiece J_i, where N is the total number of workpieces:
TT = Σ_{i=1}^{N} max(CT_i - D_i, 0)
The total energy consumption function TE is calculated from the equipment processing energy consumption, the equipment idle energy consumption and the workpiece transportation energy consumption;
TE=procE+idleE+transE
where procE denotes the equipment processing energy consumption, idleE denotes the equipment idle energy consumption, and transE denotes the workpiece transportation energy consumption;
procE is represented as the sum, over all workpieces J_i, processes O_ij, workshops f and devices k, of x_fk,ij · pe_fk · pt_fk,ij, where pt_fk,ij represents the processing time of process O_ij on device k in workshop f, pe_fk represents the unit processing energy consumption of device k in workshop f, x_fk,ij is a 0-1 decision variable indicating whether process O_ij is performed on device k in workshop f, F is the total number of workshops, M_f is the total number of devices in workshop f, and n_i is the total number of processes of workpiece J_i;
idleE is represented as the sum, over all devices and over all pairs of consecutive processes on a device, of ie_fk · y_ij,gh · x_fk,gh · (S_g,h - C_i,j), where ie_fk denotes the unit idle energy consumption of device k in workshop f, S_g,h denotes the start time of process O_gh, C_i,j denotes the end time of process O_ij, y_ij,gh is a 0-1 decision variable determining whether the successor process of O_ij is O_gh, x_fk,gh is a 0-1 decision variable indicating whether process O_gh is performed on device k in workshop f, n_i is the total number of processes of workpiece J_i, and n_g is the total number of processes of workpiece J_g;
transE is represented as the sum, over all workpiece movements, of te multiplied by the corresponding transportation time and the corresponding 0-1 decision variable, where te denotes the unit transportation energy consumption between workshops/devices, transF_fu denotes the transportation time of a workpiece from workshop f to workshop u, transM_lk denotes the transportation time of a workpiece from device l to device k, one 0-1 decision variable determines whether a workpiece is transported from workshop f to workshop u, and another 0-1 decision variable determines whether a workpiece is transported from device l to device k within the same workshop;
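As an illustration only, the total energy consumption model above can be evaluated as in the following minimal Python sketch; the data layout and the names (total_energy, pe, ie, te, ops, gaps, trips) are assumptions made for the example and are not part of the patent.

```python
# Minimal sketch (assumed data layout, not the patent's notation): each scheduled
# process is recorded as (shop f, device k, processing time); idle gaps and
# transport moves carry their durations.
def total_energy(operations, idle_gaps, transports, pe, ie, te):
    """TE = procE + idleE + transE."""
    procE = sum(pe[f][k] * t for f, k, t in operations)      # processing energy
    idleE = sum(ie[f][k] * gap for f, k, gap in idle_gaps)   # idle energy
    transE = sum(te * d for d in transports)                 # transportation energy
    return procE + idleE + transE

# Example: one shop with two devices
pe = [[2.0, 1.5]]                      # unit processing energy pe_fk
ie = [[0.4, 0.3]]                      # unit idle energy ie_fk
ops = [(0, 0, 5.0), (0, 1, 3.0)]       # two scheduled processes
gaps = [(0, 1, 2.0)]                   # one idle gap on device 1
trips = [1.0]                          # one transport move
print(total_energy(ops, gaps, trips, pe, ie, te=0.8))   # 14.5 + 0.6 + 0.8 = 15.9
```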
(2) The series of constraints include:
① A process can only be performed on one piece of equipment in one workshop;
② Each process can be processed only after its workpiece has arrived, where A_i represents the arrival time of workpiece J_i;
③ The completion time C_i,j of process O_ij must equal its start time S_i,j plus its processing time;
④ The processes of each workpiece must follow the given precedence order;
⑤ Processes of different workpieces on the same device must be performed sequentially;
⑥ Inter-workshop and inter-device transportation times are not considered simultaneously: when the inter-workshop transportation time is considered, the inter-device transportation time is ignored.
S2, constructing a state space of the low-carbon distributed flexible job shop, designing an action space of a compound scheduling rule, and providing an instant rewarding function and a round rewarding function.
(1) State space of low-carbon distributed flexible job shop (reference is made to FIGS. 2 and 3 for details of parameters):
The state space of the low-carbon distributed flexible job shop comprehensively reflects the production state of the rescheduling point or the decision point, and contains 19 pieces of state information of the low-carbon distributed flexible job shop. The state space of the low-carbon distributed flexible job shop includes the predicted delay rate, the actual delay rate, the predicted weighted delay rate, the average utilization rate of all devices in all job shops, the standard deviation of the device utilization rate, the average completion rate of all workpieces, the average completion rate of all processes, the standard deviation of the completion rate of all workpieces, the energy consumption index of all completed processes, and the simplified completion time of the last process processed on the device at the rescheduling point or decision point.
① Predicted delay rate Tard_e(t):
Tard_e(t) = |TardJ_e(t)| / |UcompJ(t)|
where TardJ_e(t) represents the set of workpieces predicted to be delayed at rescheduling point or decision point t, UcompJ(t) represents the set of workpieces whose processing is not completed at rescheduling point or decision point t, the conditions NPO_i(t) < n_i and EDT_i(t) > 0 determine whether a workpiece is predicted to be delayed at rescheduling point or decision point t, NPO_i(t) represents the number of processes of workpiece J_i completed at rescheduling point or decision point t, and EDT_i(t) represents the predicted delay time of workpiece J_i at rescheduling point or decision point t;
② Actual delay rate Tard_a(t):
Tard_a(t) = |TardJ_a(t)| / |UcompJ(t)|
where TardJ_a(t) represents the set of workpieces actually delayed at rescheduling point or decision point t; a workpiece is judged to be actually delayed at the rescheduling point or decision point when NPO_i(t) < n_i and the completion time of the machined processes of workpiece J_i already exceeds its cut-off time D_i;
③ Predicted weighted delay rate WTard_e(t): the predicted delay of each workpiece is weighted by W_i, the weight of workpiece J_i, and the rate is computed over the unfinished workpieces. The predicted time required to machine the remaining processes of workpiece J_i at rescheduling point or decision point t is calculated from the average processing time of each remaining process O_ij over all available equipment in all workshops; the remaining estimated transit time of workpiece J_i at the rescheduling point or decision point is calculated from the average transportation time from the equipment where workpiece J_i completed its last process to the equipment where the next process is processed, with F_i,j denoting the set of workshops in which process O_ij can be processed;
④ Average utilization UR_ave(t) of all devices in all workshops at rescheduling point or decision point t: UR_ave(t) is the mean of the device utilizations UR_fk(t), where UR_fk(t) represents the utilization of device k in workshop f at rescheduling point or decision point t, calculated from the completion time of the last process on device k in workshop f;
⑤ Standard deviation UR_std(t) of the device utilizations at rescheduling point or decision point t;
⑥ Average completion rate CRO_ave(t) of all processes at rescheduling point or decision point t;
⑦ Average completion rate CRJ_ave(t) of all workpieces at rescheduling point or decision point t, where CRJ_i(t) represents the completion rate of workpiece J_i at rescheduling point or decision point t;
⑧ Standard deviation CRJ_std(t) of all workpiece completion rates at rescheduling point or decision point t;
⑨ Energy consumption index ECI(t) of the processes completed at rescheduling point or decision point t: ECI(t) is calculated from the minimum energy consumption required to complete a process at rescheduling point or decision point t, where the minimum is taken over the set of equipment within workshop f that can process the process O_ij, the maximum energy consumption required to complete the process at rescheduling point or decision point t, and an intermediate value of the energy consumption required to complete the process at rescheduling point or decision point t;
⑩ Simplified completion time RCTM_fk(t) of the last process processed on device k in workshop f at rescheduling point or decision point t: RCTM_fk(t) contains the state information of all devices and is calculated from the completion time of the last process processed on device k in workshop f at the rescheduling point or decision point and from T_cur, the average completion time of the last processes over all devices in all workshops;
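The following minimal Python sketch illustrates how a few of the 19 state features above could be computed at a rescheduling point; the data structures and names (jobs, util, n_ops, n_done, est_delay) are assumptions for illustration, not the patent's notation.

```python
import statistics

def state_features(jobs, util):
    """Compute a handful of the state features at a rescheduling/decision point.

    jobs: list of dicts with 'n_ops' (n_i), 'n_done' (NPO_i(t)), 'est_delay' (EDT_i(t))
    util: list of device utilizations UR_fk(t) over all shops
    """
    unfinished = [j for j in jobs if j['n_done'] < j['n_ops']]
    tard_e = (sum(1 for j in unfinished if j['est_delay'] > 0) / len(unfinished)
              if unfinished else 0.0)                        # predicted delay rate Tard_e(t)
    crj = [j['n_done'] / j['n_ops'] for j in jobs]           # per-workpiece completion rates
    return {
        'Tard_e': tard_e,
        'UR_ave': statistics.mean(util),                     # average device utilization
        'UR_std': statistics.pstdev(util),                   # its standard deviation
        'CRJ_ave': statistics.mean(crj),
        'CRJ_std': statistics.pstdev(crj),
    }

print(state_features(
    jobs=[{'n_ops': 4, 'n_done': 2, 'est_delay': 3.0},
          {'n_ops': 3, 'n_done': 3, 'est_delay': 0.0}],
    util=[0.8, 0.6, 0.7]))
```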
(2) Action space of the composite scheduling rules (for details of parameters reference is made to fig. 2 and 3):
Based on the state space of the low-carbon distributed flexible job shop, 7 workpiece selection rules and 6 device allocation rules are set, and a total of 42 composite scheduling rules is then obtained through their Cartesian product. The 10 rules with the best average results are selected as the action space of the composite scheduling rules (a code sketch of this construction is given after the rule lists below).
The work piece selection rules are as follows:
① Workpiece selection rule 1: at the rescheduling point or decision point t, if TardJ a (t) is not an empty set, selecting the largest EDT i(t)·Wi in the actual delayed workpiece set as the next scheduling procedure, wherein W i represents the weight of the workpiece J i, namely the machining emergency degree; if TardJ a (t) is empty, selecting the next scheduling procedure with the smallest average relaxation time ST i (t) in the unfinished workpiece set;
② Workpiece selection rule 2: at a rescheduling point or decision point t, if TardJ a (t) is not an empty set, selecting the largest EDT i(t)·Wi in the actual delay workpiece set as the next scheduling procedure; if TardJ a (t) is empty, selecting the minimum critical ratio CR (i) in the unfinished workpiece set as the next scheduling procedure;
③ Workpiece selection rule 3: based on T cur, sequencing the workpieces according to the expected weighted delay EDT i(t)·Wi, and selecting the process with the largest EDT i(t)·Wi value as the next scheduling process; if there are multiple identical values, randomly selecting one;
④ Workpiece selection rule 4: randomly selecting one workpiece from the unfinished workpiece set;
⑤ Workpiece selection rule 5: at rescheduling point or decision point t, if TardJ_a(t) is not an empty set, the workpiece with the largest weighted-delay critical ratio in the actually delayed workpiece set is selected as the next scheduling procedure; if TardJ_a(t) is empty, the workpiece with the smallest critical ratio in the unfinished workpiece set is selected as the next scheduling procedure;
⑥ Workpiece selection rule 6: selecting a workpiece with the lowest completion rate CRJ i (t) in the unfinished workpiece set at a rescheduling point or a decision point t;
⑦ Workpiece selection rule 7: at rescheduling point or decision point t, priorities are assigned according to the due dates of the workpieces; the earlier the due date, the higher the processing priority, and the workpiece with the earliest due date in the unfinished workpiece set is selected;
The device allocation rules are as follows:
① Device allocation rule 1: the earliest available device m_k is selected, taking into account the transportation time from the preceding process to device m_k in workshop f;
② Device allocation rule 2: the available device with the lowest energy consumption (transportation, processing and idle energy consumption) is selected;
③ Device allocation rule 3: the available device with the lowest device utilization is selected;
④ Device allocation rule 4: the available device with the shortest processing time is selected;
⑤ Device allocation rule 5: the available device on which the previous process finishes earliest is selected;
⑥ Device allocation rule 6: the available device with the fewest uses among all the processing procedures of the next round is selected;
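As referenced above, a minimal Python sketch of how the 42 composite rules could be enumerated and the action space selected follows; the rule labels and the avg_result mapping are hypothetical placeholders, not the patent's implementation.

```python
from itertools import product

workpiece_rules = [f"W{i}" for i in range(1, 8)]            # 7 workpiece selection rules
device_rules = [f"D{k}" for k in range(1, 7)]               # 6 device allocation rules
composite_rules = list(product(workpiece_rules, device_rules))
assert len(composite_rules) == 42                           # Cartesian product: 7 x 6

def select_action_space(avg_result, top_k=10):
    """Keep the top_k composite rules with the best (lowest) average objective value."""
    return sorted(composite_rules, key=lambda rule: avg_result[rule])[:top_k]
```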
(3) Reward function
The reward function evaluates whether the action selected by the network's policy is good by feeding the reward value back to the Rainbow agent. As described above, the dynamic multi-objective function of the low-carbon distributed flexible job shop is to minimize the total delay time and the total energy consumption. The reward function of the invention therefore comprises an instant reward function and a round reward function:
Reward=Rt+ER
① Instant reward function R_t: the instant reward function comprises an economic index, an energy consumption index and an equipment index;
The reward eco_t of the economic index is calculated by the basic formula:
eco_t = eco_tarda + eco_wtard + eco_tarde + eco_ur + eco_tardc
where eco_tarda gives a corresponding reward value according to the actual delay rate Tard_a(t) of the current state and the next state; eco_wtard gives a corresponding reward value according to the predicted weighted delay rate WTard_e(t) of the current state and the next state; eco_tarde gives a corresponding reward value according to the predicted delay rate Tard_e(t) of the current state and the next state; eco_ur gives a corresponding reward value according to the average device utilization UR_ave(t) of the current state and the next state; eco_tardc calculates a reward value from the minimum total delay time minTard and the current total delay time currentTard during training; t denotes the current rescheduling point or decision point, and t+1 denotes the next rescheduling point or decision point;
The reward ene_t of the energy consumption index is calculated by the basic formula:
ene_t = ene_ECI + ene_CE
where ene_ECI calculates its reward value according to the energy consumption index ECI(t) of the current state and the next state, and ene_CE is calculated from the minimum total energy consumption minEnergy and the current total energy consumption currentEnergy during training;
The instant reward of a rescheduling point or decision point is formed as the weighted sum of the economic index and the energy consumption index, with the parameter β ∈ [0,1] balancing the two:
R_t = β·eco_t + (1-β)·ene_t
The equipment index RCTM_fk(t) is also calculated: it gives a negative reward based on the simplified completion time of the last process on all devices and feeds this strongly correlated negative reward back to the agent, so that the agent converges faster and achieves a better convergence effect;
② Round prize function ER:
Round rewards are a negative value; after the agent selects actions and the scheduling process is completed in the shop environment, the total delay time CT_episode and the total energy consumption TE_episode are produced;
The larger the total delay time and total energy consumption, the greater the penalty the environment feeds back to the agent.
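A minimal Python sketch of the reward composition described above follows; the penalty weights w_delay and w_energy of the round reward are assumed values for illustration only and are not specified by the patent.

```python
def instant_reward(eco_t, ene_t, beta=0.5):
    """R_t = beta * eco_t + (1 - beta) * ene_t, with beta in [0, 1]."""
    return beta * eco_t + (1.0 - beta) * ene_t

def round_reward(total_delay, total_energy, w_delay=0.01, w_energy=0.001):
    """ER: always non-positive; larger totals mean a larger penalty."""
    return -(w_delay * total_delay + w_energy * total_energy)

# Reward fed back to the agent = instant reward at the decision point + end-of-round penalty
print(instant_reward(eco_t=1.2, ene_t=-0.4, beta=0.6) + round_reward(250.0, 9000.0))
```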
S3, a Rainbow DQN deep reinforcement learning algorithm is proposed to solve the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop; the Rainbow DQN deep reinforcement learning algorithm comprises a Rainbow agent and the scheduling environment of the low-carbon distributed flexible job shop. As shown in FIG. 1, the scheduling environment in the Rainbow DQN produces workpieces through cooperative production among different low-carbon flexible job shops. It contains multiple low-carbon flexible workshops at different geographic locations, which may contain different numbers and types of machines. All workpieces are distributed to the processing machines of the different low-carbon flexible job shops according to a predetermined or inherent sequence of operations. All the processes can be completed in the same workshop or transferred to different workshops. The Rainbow DQN takes the information of the workpieces and the equipment as input; the prediction network provides a predicted value for the state observation and transmits it to the Rainbow agent, which outputs the learned action. The scheduling environment then performs this action, and the experience is stored in the prioritized experience replay buffer and sampled for learning. The interaction between the Rainbow agent and the scheduling environment of the low-carbon distributed flexible job shop is a discrete-time Markov decision process model; in the interaction of the discrete-time Rainbow agent with the scheduling environment, at time t, the solving process is as follows:
(1) The Rainbow agent observes the state of the environment and obtains an observation s_t ∈ s, where s represents the state space set of the low-carbon distributed flexible job shop;
(2) The Rainbow agent selects an action a_t ∈ a according to the observation, where a is the action space set of the composite scheduling rules; as the number of iterations increases, the randomness of the Rainbow agent's action selection gradually decreases and the probability of selecting actions from the prioritized experience replay buffer gradually increases; the noisy network is resampled before each action selection (i.e. at the beginning of a round), scheduling is carried out with the fixed noisy network, and the noise of the neural network is updated when the scheduling finishes, so as to improve the action exploration capability of the deep reinforcement learning model. A standard linear layer of the neural network is expressed as:
y=ωx+b
where x is the input, ω is the weight matrix and b is the bias. The improved noisy linear layer is defined as:
y = (μ_ω + σ_ω ⊙ ε_ω)x + (μ_b + σ_b ⊙ ε_b)
where μ_ω and μ_b are the means that the parameters ω and b obey, σ_ω and σ_b represent the variances introduced by the noise, and ε is random noise of the same dimension; the noisy weights and noisy biases are denoted ω = μ_ω + σ_ω ⊙ ε_ω and b = μ_b + σ_b ⊙ ε_b, respectively.
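A PyTorch sketch of such a noisy linear layer is shown below for illustration; it uses independent Gaussian noise and assumed initialisation values, and the class name NoisyLinear is not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b)."""
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.mu_b = nn.Parameter(torch.zeros(out_features))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma_init))
        self.register_buffer("eps_w", torch.zeros(out_features, in_features))
        self.register_buffer("eps_b", torch.zeros(out_features))
        self.reset_noise()

    def reset_noise(self):
        # Resampled before each action selection, as described above
        self.eps_w.normal_()
        self.eps_b.normal_()

    def forward(self, x):
        return F.linear(x, self.mu_w + self.sigma_w * self.eps_w,
                        self.mu_b + self.sigma_b * self.eps_b)
```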
(3) The environment gives the value Reward of the reward function to the Rainbow agent according to the action it selected, and enters the next state s_t+1; after the Rainbow agent obtains the reward value, the experience tuple (s_t, a_t, Reward, s_t+1) of this round of scheduling is sampled and stored in the prioritized experience replay buffer;
The absolute value of the temporal-difference error is calculated from the Q values output by the evaluation network and the target network of the Rainbow DQN and is used to measure the learning priority. The larger the temporal-difference error, the more the sample needs to be learned, i.e. the higher its priority. The priority of an experience is proportional to its temporal-difference error; the experiences in the experience pool are ordered by the absolute value of the temporal-difference error, and the prioritized experience replay buffer replays high-error experiences more frequently.
p_t = |r_t+1 + γ_t+1 · Q(s_t+1, a_t+1; θ^-) - Q(s_t, a_t; θ)|^ω
where p_t denotes the priority of the experience, r_t+1 denotes the acquired single-step reward, γ_t+1 denotes the discount coefficient, s_t+1 denotes the state at the next moment, a_t+1 denotes the selected action, θ denotes the evaluation network parameters, θ^- denotes the target network parameters, and ω denotes the hyper-parameter determining the shape of the priority distribution.
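For illustration, the priority and sampling probability of prioritized experience replay can be sketched as follows; the function names and the example numbers are assumptions.

```python
def td_priority(r_next, gamma_next, q_target_next, q_eval_sa, omega=0.6):
    """p_t = |r_{t+1} + gamma_{t+1} * Q_target(s_{t+1}, a_{t+1}) - Q_eval(s_t, a_t)| ** omega."""
    return abs(r_next + gamma_next * q_target_next - q_eval_sa) ** omega

def sampling_probs(priorities):
    """Experiences with larger TD error are replayed more frequently."""
    total = sum(priorities)
    return [p / total for p in priorities]

print(sampling_probs([td_priority(1.0, 0.99, 2.0, 1.5),
                      td_priority(0.0, 0.99, 1.0, 1.0)]))
```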
(4) The Rainbow DQN is provided with an evaluation network and a target network, the two network structures are completely consistent, the parameter updating frequency of the evaluation network is 1 step, and the target network parameters are updated into the parameters of the evaluation network after 200 steps, namely the two network parameters are the same, so that the convergence of the scheduling result is realized.
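The discrete-time interaction of steps (1) to (4) can be sketched as the following Python loop; the env, agent and replay interfaces are assumed for illustration and do not reproduce the patent's implementation.

```python
def run_episode(env, agent, replay, target_sync=200):
    """One training round of the agent-environment interaction described above."""
    state = env.reset()                                   # (1) observe the shop state s_t
    done, step = False, 0
    while not done:
        action = agent.select_action(state)               # (2) choose a composite rule a_t
        next_state, reward, done = env.step(action)       # (3) reward and next state
        priority = agent.td_error(state, action, reward, next_state)
        replay.add((state, action, reward, next_state), priority)
        agent.learn(replay.sample())                      # evaluation network updated every step
        if step % target_sync == 0:
            agent.sync_target()                           # (4) target network copied every 200 steps
        state, step = next_state, step + 1
```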
In addition, the network structure of the Rainbow DQN fuses the network structures of Double DQN and Dueling DQN. Double DQN alleviates the overestimation problem of Q-learning by decoupling action selection from target Q-value calculation: the algorithm constructs two action-value functions, the agent determines the action through the evaluation network, and the target network is used to estimate the value of that action when computing the target.
Y_t = r_t+1 + γ · Q(s_t+1, argmax_a Q(s_t+1, a; θ); θ^-)
where Y_t denotes the target Q value, r_t+1 denotes the acquired single-step reward, γ denotes the discount coefficient, s_t+1 denotes the next-moment state, a denotes an action in the action space, θ denotes the evaluation network parameters, θ^- denotes the target network parameters, and Q(s_t+1, a; θ) denotes the Q value of action a in the next-moment state calculated by the evaluation network.
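A PyTorch sketch of the Double DQN target computation is given below for illustration; eval_net and target_net are assumed callables returning batched Q values.

```python
import torch

@torch.no_grad()
def double_dqn_target(reward, gamma, next_state, eval_net, target_net):
    """Y_t = r_{t+1} + gamma * Q_target(s_{t+1}, argmax_a Q_eval(s_{t+1}, a))."""
    best_action = eval_net(next_state).argmax(dim=1, keepdim=True)     # chosen by evaluation net
    q_next = target_net(next_state).gather(1, best_action).squeeze(1)  # valued by target net
    return reward + gamma * q_next
```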
Dueling DQN introduces two value-calculation branches, one predicting the state value and the other predicting the state-dependent action advantage values. The state value function only predicts whether a state is good or bad, while the action advantage function only predicts the importance of each action in that state.
Q(s_t, a_t; θ, α, β) = V(s_t; θ, β) + A(s_t, a_t; θ, α) - (1/|a|)·Σ_a' A(s_t, a'; θ, α)
where θ denotes the shared neural network parameters, α and β denote the network parameters of the action advantage function and the state value function respectively, V(s_t; θ, β) is the state value function and outputs a scalar, A(s_t, a_t; θ, α) is the action advantage function and outputs a vector whose length equals the size of the action space, and the subtracted term is the mean of all action advantage values in the current state, used to keep the two branches identifiable and thereby increase the output stability of the action advantage function and the state value function.
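A PyTorch sketch of the dueling aggregation is shown below; the layer sizes are placeholders and the class name DuelingHead is an assumption, not the patent's network.

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    def __init__(self, hidden, n_actions):
        super().__init__()
        self.value = nn.Linear(hidden, 1)              # state value branch V
        self.advantage = nn.Linear(hidden, n_actions)  # action advantage branch A

    def forward(self, h):
        v = self.value(h)
        a = self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)     # subtract the mean advantage
```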
The Rainbow DQN adopts the idea of multi-step reinforcement learning: N-step returns replace single-step returns, so that the target value in the early training stage is estimated more accurately and the training speed is increased. The loss function L_N-step is as follows:
L_N-step = ( Σ_{k=1}^{N} γ^{k-1}·r_t+k + γ^N·max_a' Q(s_t+N, a'; θ^-) - Q(s_t, a_t; θ) )²
where γ is the discount coefficient, r_t+k is the reward obtained by the (t+k)-th action, θ^- denotes the parameters of the target network, and the gradient of the loss is back-propagated only to the parameters θ of the online network.
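For illustration, the N-step target that enters the loss can be computed as in the following sketch; the argument names are assumptions.

```python
def n_step_target(rewards, gamma, bootstrap_q):
    """sum_{k=1..N} gamma^(k-1) * r_{t+k}  +  gamma^N * max_a' Q_target(s_{t+N}, a')."""
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))   # rewards = [r_{t+1}, ..., r_{t+N}]
    return g + (gamma ** len(rewards)) * bootstrap_q

print(n_step_target([1.0, 0.5, 0.2], gamma=0.99, bootstrap_q=3.0))
```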
The training curves of the total delay time and the total energy consumption are shown in FIG. 4 and FIG. 5 respectively, where the light area represents the upper and lower bounds of the optimal objective value over multiple training runs and the dark curve represents the average over the runs. It can be seen that in the initial stage of training the scheduling strategy, action selection tends toward exploration, the two optimization objective values stay at a high level, and most of the selected action strategies cannot complete normal scheduling tasks. However, as the number of training episodes increases, erroneous action selections are gradually replaced by excellent actions; the final total delay time falls to about 250 and the total energy consumption to about 9000. The experimental results show that, under changing machine states, the Rainbow agent can select excellent actions at a rescheduling point or decision point through self-learning so as to optimize the total delay time and total energy consumption objective values. This verifies the feasibility and effectiveness of the method and the model in solving the dynamic multi-objective problem of the low-carbon distributed flexible job shop.
Other matters not described in detail are known in the prior art.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (2)

1. The dynamic multi-target scheduling method for the low-carbon distributed flexible job shop is characterized by comprising the following steps of:
S1: establishing the multi-objective function and a series of constraint conditions of the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop according to its low-carbon job scheduling requirements;
(1) The multi-objective function includes a total delay time function and a total energy consumption function:
The total delay time function TT is calculated from the cut-off time D_i and the completion time CT_i of each workpiece J_i, where N is the total number of workpieces:
TT = Σ_{i=1}^{N} max(CT_i - D_i, 0)
The total energy consumption function TE is calculated from the equipment processing energy consumption, the equipment idle energy consumption and the workpiece transportation energy consumption;
TE=procE+idleE+transE
where procE denotes the equipment processing energy consumption, idleE denotes the equipment idle energy consumption, and transE denotes the workpiece transportation energy consumption;
procE is represented as the sum, over all workpieces J_i, processes O_ij, workshops f and devices k, of x_fk,ij · pe_fk · pt_fk,ij, where pt_fk,ij represents the processing time of process O_ij on device k in workshop f, pe_fk represents the unit processing energy consumption of device k in workshop f, x_fk,ij is a 0-1 decision variable indicating whether process O_ij is performed on device k in workshop f, F is the total number of workshops, M_f is the total number of devices in workshop f, and n_i is the total number of processes of workpiece J_i;
idleE is represented as the sum, over all devices and over all pairs of consecutive processes on a device, of ie_fk · y_ij,gh · x_fk,gh · (S_g,h - C_i,j), where ie_fk denotes the unit idle energy consumption of device k in workshop f, S_g,h denotes the start time of process O_gh, C_i,j denotes the end time of process O_ij, y_ij,gh is a 0-1 decision variable determining whether the successor process of O_ij is O_gh, x_fk,gh is a 0-1 decision variable indicating whether process O_gh is performed on device k in workshop f, n_i is the total number of processes of workpiece J_i, and n_g is the total number of processes of workpiece J_g;
transE is represented as the sum, over all workpiece movements, of te multiplied by the corresponding transportation time and the corresponding 0-1 decision variable, where te denotes the unit transportation energy consumption between workshops/devices, transF_fu denotes the transportation time of a workpiece from workshop f to workshop u, transM_lk denotes the transportation time of a workpiece from device l to device k, one 0-1 decision variable determines whether a workpiece is transported from workshop f to workshop u, and another 0-1 decision variable determines whether a workpiece is transported from device l to device k within the same workshop;
(2) The series of constraints include:
① A process can only be performed on one piece of equipment in one workshop;
② Each process can be processed only after its workpiece has arrived, where A_i represents the arrival time of workpiece J_i;
③ The completion time C_i,j of process O_ij must equal its start time S_i,j plus its processing time;
④ The processes of each workpiece must follow the given precedence order;
⑤ Processes of different workpieces on the same device must be performed sequentially;
⑥ Inter-workshop and inter-device transportation times are not considered simultaneously: when the inter-workshop transportation time is considered, the inter-device transportation time is ignored;
S2: constructing a state space of a low-carbon distributed flexible job shop, designing an action space of a composite scheduling rule, and providing an instant rewarding function and a round rewarding function;
(1) State space of low-carbon distributed flexible job shop
① Predicted delay rate Tard_e(t):
Tard_e(t) = |TardJ_e(t)| / |UcompJ(t)|
where TardJ_e(t) represents the set of workpieces predicted to be delayed at rescheduling point or decision point t, UcompJ(t) represents the set of workpieces whose processing is not completed at rescheduling point or decision point t, the conditions NPO_i(t) < n_i and EDT_i(t) > 0 determine whether a workpiece is predicted to be delayed at rescheduling point or decision point t, NPO_i(t) represents the number of processes of workpiece J_i completed at rescheduling point or decision point t, and EDT_i(t) represents the predicted delay time of workpiece J_i at rescheduling point or decision point t;
② Actual delay rate Tard_a(t):
Tard_a(t) = |TardJ_a(t)| / |UcompJ(t)|
where TardJ_a(t) represents the set of workpieces actually delayed at rescheduling point or decision point t; a workpiece is judged to be actually delayed at the rescheduling point or decision point when NPO_i(t) < n_i and the completion time of the machined processes of workpiece J_i already exceeds its cut-off time D_i;
③ Predicted weighted delay rate WTard_e(t): the predicted delay of each workpiece is weighted by W_i, the weight of workpiece J_i, i.e. its machining urgency, and the rate is computed over the unfinished workpieces; the predicted time required to machine the remaining processes of workpiece J_i at rescheduling point or decision point t is calculated from the average processing time of each remaining process O_ij over all available equipment in all workshops; the remaining estimated transit time of workpiece J_i at the rescheduling point or decision point is calculated from the average transportation time from the equipment where workpiece J_i completed its last process to the equipment where the next process is processed, with F_i,j denoting the set of workshops in which process O_ij can be processed;
④ Average utilization UR_ave(t) of all devices in all workshops at rescheduling point or decision point t: UR_ave(t) is the mean of the device utilizations UR_fk(t), where UR_fk(t) represents the utilization of device k in workshop f at rescheduling point or decision point t, calculated from the completion time of the last process on device k in workshop f;
⑤ Standard deviation UR_std(t) of the device utilizations at rescheduling point or decision point t;
⑥ Average completion rate CRO_ave(t) of all processes at rescheduling point or decision point t;
⑦ Average completion rate CRJ_ave(t) of all workpieces at rescheduling point or decision point t, where CRJ_i(t) represents the completion rate of workpiece J_i at rescheduling point or decision point t;
⑧ Standard deviation CRJ_std(t) of all workpiece completion rates at rescheduling point or decision point t;
⑨ Energy consumption index ECI(t) of the processes completed at rescheduling point or decision point t: ECI(t) is calculated from the minimum energy consumption required to complete a process at rescheduling point or decision point t, where the minimum is taken over the set of equipment within the workshop that can process the process O_ij, the maximum energy consumption required to complete the process at rescheduling point or decision point t, and an intermediate value of the energy consumption required to complete the process at rescheduling point or decision point t;
⑩ Simplified completion time RCTM_fk(t) of the last process processed on device k in workshop f at rescheduling point or decision point t: RCTM_fk(t) contains the state information of all devices and is calculated from the completion time of the last process processed on device k in workshop f at the rescheduling point or decision point and from T_cur, the average completion time of the last processes over all devices in all workshops;
(2) Action space of composite scheduling rule
The workpiece selection rules are as follows:
① Workpiece selection rule 1: at rescheduling point or decision point t, if TardJ_a(t) is not an empty set, the workpiece with the largest EDT_i(t)·W_i in the actually delayed workpiece set is selected as the next scheduling procedure, where W_i represents the weight of workpiece J_i, i.e. its machining urgency; if TardJ_a(t) is empty, the workpiece with the smallest average slack time ST_i(t) in the unfinished workpiece set is selected as the next scheduling procedure;
② Workpiece selection rule 2: at a rescheduling point or decision point t, if TardJ a (t) is not an empty set, selecting the largest EDT i(t)·Wi in the actual delay workpiece set as the next scheduling procedure; if TardJ a (t) is empty, selecting the minimum critical ratio CR (i) in the unfinished workpiece set as the next scheduling procedure;
③ Workpiece selection rule 3: based on T cur, sequencing the workpieces according to the expected weighted delay EDT i(t)·Wi, and selecting the process with the largest EDT i(t)·Wi value as the next scheduling process; if there are multiple identical values, randomly selecting one;
④ Workpiece selection rule 4: randomly selecting one workpiece from the unfinished workpiece set;
⑤ Workpiece selection rule 5: at rescheduling point or decision point t, if TardJ_a(t) is not an empty set, the workpiece with the largest weighted-delay critical ratio in the actually delayed workpiece set is selected as the next scheduling procedure; if TardJ_a(t) is empty, the workpiece with the smallest critical ratio in the unfinished workpiece set is selected as the next scheduling procedure;
⑥ Workpiece selection rule 6: selecting a workpiece with the lowest completion rate CRJ i (t) in the unfinished workpiece set at a rescheduling point or a decision point t;
⑦ Workpiece selection rule 7: at rescheduling point or decision point t, priorities are assigned according to the due dates of the workpieces; the earlier the due date, the higher the processing priority, and the workpiece with the earliest due date in the unfinished workpiece set is selected;
The device allocation rules are as follows:
① Device allocation rule 1: the earliest available device m_k is selected, taking into account the transportation time from the preceding process to device m_k in workshop f;
② Device allocation rule 2: the available device with the lowest energy consumption is selected;
③ Device allocation rule 3: the available device with the lowest device utilization is selected;
④ Device allocation rule 4: the available device with the shortest processing time is selected;
⑤ Device allocation rule 5: the available device on which the previous process finishes earliest is selected;
⑥ Device allocation rule 6: the available device with the fewest uses among all the processing procedures of the next round is selected;
(3) Reward function
The reward functions include an instant reward function and a round reward function;
Reward = R_t + ER
① Instant reward function R_t: the instant reward function comprises an economic index, an energy consumption index and an equipment index;
The reward eco_t of the economic index is calculated by the basic formula:
eco_t = eco_tarda + eco_wtard + eco_tarde + eco_ur + eco_tardc
where eco_tarda gives a corresponding reward value according to the actual delay rate Tard_a(t) of the current state and the next state; eco_wtard gives a corresponding reward value according to the predicted weighted delay rate WTard_e(t) of the current state and the next state; eco_tarde gives a corresponding reward value according to the predicted delay rate Tard_e(t) of the current state and the next state; eco_ur gives a corresponding reward value according to the average device utilization UR_ave(t) of the current state and the next state; eco_tardc calculates a reward value from the minimum total delay time minTard and the current total delay time currentTard during training; t denotes the current rescheduling point or decision point, and t+1 denotes the next rescheduling point or decision point;
The reward ene_t of the energy consumption index is calculated by the basic formula:
ene_t = ene_ECI + ene_CE
where ene_ECI calculates its reward value according to the energy consumption index ECI(t) of the current state and the next state, and ene_CE is calculated from the minimum total energy consumption minEnergy and the current total energy consumption currentEnergy during training;
The instant reward of a rescheduling point or decision point is formed as the weighted sum of the economic index and the energy consumption index, with the parameter β ∈ [0,1] balancing the two:
R_t = β·eco_t + (1-β)·ene_t
The equipment index RCTM_fk(t) is also calculated: it gives a negative reward based on the simplified completion time of the last process on all devices and feeds this strongly correlated negative reward back to the agent, so that the agent converges faster and achieves a better convergence effect;
② Round prize function ER:
Round rewards are a negative value; after the agent selects actions and the scheduling process is completed in the shop environment, the total delay time CT_episode and the total energy consumption TE_episode are produced;
The larger the total delay time and total energy consumption, the greater the penalty the environment feeds back to the agent;
S3: and a Rainbow DQN deep reinforcement learning algorithm is provided, and a dynamic multi-target scheduling planning model of the low-carbon distributed flexible job shop is solved.
2. The dynamic multi-objective scheduling method for a low-carbon distributed flexible job shop according to claim 1, wherein in step S3 a Rainbow DQN deep reinforcement learning algorithm is proposed to solve the dynamic multi-objective scheduling planning model of the low-carbon distributed flexible job shop; the Rainbow DQN deep reinforcement learning algorithm comprises a Rainbow agent and the scheduling environment of the low-carbon distributed flexible job shop; the interaction between the Rainbow agent and the scheduling environment of the low-carbon distributed flexible job shop is a discrete-time Markov decision process model; in the interaction of the discrete-time Rainbow agent with the scheduling environment, at time t, the solving process is as follows:
(1) The Rainbow agent observes the state of the environment and obtains an observation s_t ∈ s, where s represents the state space set of the low-carbon distributed flexible job shop;
(2) The Rainbow agent selects an action a_t ∈ a according to the observation, where a is the action space set of the composite scheduling rules; as the number of iterations increases, the randomness of the Rainbow agent's action selection gradually decreases and the probability of selecting actions from the prioritized experience replay buffer gradually increases;
(3) The environment gives the value Reward of the reward function to the Rainbow agent according to the action it selected, and enters the next state s_t+1; after the Rainbow agent obtains the reward value, the experience tuple (s_t, a_t, Reward, s_t+1) of this round of scheduling is sampled and stored in the prioritized experience replay buffer;
(4) The Rainbow DQN is provided with an evaluation network and a target network, the two network structures are completely consistent, the parameter updating frequency of the evaluation network is 1 step, and the target network parameters are updated into the parameters of the evaluation network after 200 steps, namely the two network parameters are the same, so that the convergence of the scheduling result is realized.
CN202310494027.XA 2023-05-05 2023-05-05 Dynamic multi-target scheduling method for low-carbon distributed flexible job shop Active CN116500994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310494027.XA CN116500994B (en) 2023-05-05 2023-05-05 Dynamic multi-target scheduling method for low-carbon distributed flexible job shop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310494027.XA CN116500994B (en) 2023-05-05 2023-05-05 Dynamic multi-target scheduling method for low-carbon distributed flexible job shop

Publications (2)

Publication Number Publication Date
CN116500994A CN116500994A (en) 2023-07-28
CN116500994B true CN116500994B (en) 2024-05-03

Family

ID=87322721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310494027.XA Active CN116500994B (en) 2023-05-05 2023-05-05 Dynamic multi-target scheduling method for low-carbon distributed flexible job shop

Country Status (1)

Country Link
CN (1) CN116500994B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149987A (en) * 2020-09-17 2020-12-29 清华大学 Multi-target flexible job shop scheduling method and device based on deep reinforcement learning
US11144847B1 (en) * 2021-04-15 2021-10-12 Latent Strategies LLC Reinforcement learning using obfuscated environment models
US11403538B1 (en) * 2020-11-05 2022-08-02 Arthur AI, Inc. Methods and apparatus for generating fast counterfactual explanations for black-box models using reinforcement learning
WO2022206265A1 (en) * 2021-04-02 2022-10-06 河海大学 Method for parameter calibration of hydrological forecasting model based on deep reinforcement learning
CN115454005A (en) * 2022-09-29 2022-12-09 河海大学常州校区 Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN115640898A (en) * 2022-10-27 2023-01-24 西南交通大学 Large-scale flexible job shop scheduling method based on DDQN algorithm

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10606898B2 (en) * 2017-04-19 2020-03-31 Brown University Interpreting human-robot instructions
KR102251316B1 (en) * 2019-06-17 2021-05-12 (주)브이엠에스 솔루션스 Reinforcement learning and simulation based dispatching method within a factory, and an apparatus thereof

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149987A (en) * 2020-09-17 2020-12-29 清华大学 Multi-target flexible job shop scheduling method and device based on deep reinforcement learning
US11403538B1 (en) * 2020-11-05 2022-08-02 Arthur AI, Inc. Methods and apparatus for generating fast counterfactual explanations for black-box models using reinforcement learning
WO2022206265A1 (en) * 2021-04-02 2022-10-06 河海大学 Method for parameter calibration of hydrological forecasting model based on deep reinforcement learning
US11144847B1 (en) * 2021-04-15 2021-10-12 Latent Strategies LLC Reinforcement learning using obfuscated environment models
CN115454005A (en) * 2022-09-29 2022-12-09 河海大学常州校区 Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN115640898A (en) * 2022-10-27 2023-01-24 西南交通大学 Large-scale flexible job shop scheduling method based on DDQN algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-resource scheduling in a discrete manufacturing workshop considering material handling; 肖蒙; China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology II; 2023-01-31; pp. C029-765 *
Research on key technologies of a manufacturing execution system for large petroleum equipment; 白凯建; China Master's Theses Full-text Database (Electronic Journal), Engineering Science and Technology I; 2022-03-31; pp. B019-646 *

Also Published As

Publication number Publication date
CN116500994A (en) 2023-07-28

Similar Documents

Publication Publication Date Title
CN112734172B (en) Hybrid flow shop scheduling method based on time sequence difference
CN112884239B (en) Space detonator production scheduling method based on deep reinforcement learning
CN110609531B (en) Workshop scheduling method based on digital twin
CN112149987A (en) Multi-target flexible job shop scheduling method and device based on deep reinforcement learning
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN115454005A (en) Manufacturing workshop dynamic intelligent scheduling method and device oriented to limited transportation resource scene
CN116542445A (en) Intelligent scheduling method and system for equipment manufacturing workshop based on deep reinforcement learning
CN111160755B (en) Real-time scheduling method for aircraft overhaul workshop based on DQN
CN111353646B (en) Steelmaking flexible scheduling optimization method, system, medium and equipment with switching time
CN114707881A (en) Job shop adaptive scheduling method based on deep reinforcement learning
CN110264079A (en) Hot-rolled product qualitative forecasting method based on CNN algorithm and Lasso regression model
CN113406939A (en) Unrelated parallel machine dynamic hybrid flow shop scheduling method based on deep Q network
CN114565247A (en) Workshop scheduling method, device and system based on deep reinforcement learning
CN107357267B (en) The method for solving mixed production line scheduling problem based on discrete flower pollination algorithm
Li et al. A novel milling parameter optimization method based on improved deep reinforcement learning considering machining cost
Hosseinian et al. An energy-efficient mathematical model for the resource-constrained project scheduling problem: an evolutionary algorithm
Zhao et al. A drl-based reactive scheduling policy for flexible job shops with random job arrivals
CN116500994B (en) Dynamic multi-target scheduling method for low-carbon distributed flexible job shop
CN112686693A (en) Method, system, equipment and storage medium for predicting marginal electricity price of electric power spot market
CN116644902A (en) Multi-target dynamic flexible job shop scheduling method related to energy consumption based on deep reinforcement learning
Yuan et al. A multi-agent double Deep-Q-network based on state machine and event stream for flexible job shop scheduling problem
CN117314055A (en) Intelligent manufacturing workshop production-transportation joint scheduling method based on reinforcement learning
CN115860435A (en) Power equipment preventive maintenance dynamic flexible scheduling method and system with AGV
CN114219274A (en) Workshop scheduling method adapting to machine state based on deep reinforcement learning
CN114004065A (en) Transformer substation engineering multi-objective optimization method based on intelligent algorithm and environmental constraints

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant