CN116562584A - Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization - Google Patents

Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization

Info

Publication number
CN116562584A
Authority
CN
China
Prior art keywords
scheduling
time
network
workpiece
conv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310600842.XA
Other languages
Chinese (zh)
Inventor
刘海滨
夏铭浩
李明飞
王龙
董浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202310600842.XA priority Critical patent/CN116562584A/en
Publication of CN116562584A publication Critical patent/CN116562584A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316Sequencing of tasks or work
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04Manufacturing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Manufacturing & Machinery (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • General Factory Administration (AREA)

Abstract

The invention discloses a dynamic workshop scheduling method based on Conv-Dueling and generalization characterization. The method first adopts a multidimensional matrix to represent equipment and workpiece states, and designs a composite reward function to guide the convergence of the algorithm. A Conv-Dueling network model is provided that takes the multidimensional state matrix as input and the scheduling rule values as output, selecting the optimal scheduling rule at different rescheduling decision points. The network model consists of a feature extraction network, a state (value) network, and an advantage network, and realizes globally optimal scheduling. Verification under both static and dynamic conditions shows that the network model achieves a good optimization effect. The dynamic workshop scheduling method provided by the invention reduces the maximum completion time, improves the on-time completion rate, and reduces the total delay time while ensuring robustness and stability; its comprehensive scheduling performance is superior to that of existing scheduling methods.

Description

Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization
Technical Field
The invention belongs to the field of dynamic production scheduling decisions and is used for material scheduling tasks in flexible job shops; in particular, it relates to a dynamic scheduling method based on deep reinforcement learning with Conv-Dueling and generalization characterization.
Background
Material scheduling technology in the flexible job shop refers to the real-time monitoring, scheduling, and optimization of shop materials using computer technology and artificial intelligence algorithms. Its purpose is to improve the utilization efficiency of workshop materials and to reduce waste and cost in the production process, thereby maximizing production benefit. Material scheduling techniques can be applied in many fields, including manufacturing, logistics, and warehousing. In the manufacturing industry, material scheduling technology can optimize the production flow, improve production efficiency and quality, and reduce cost. However, because the real environment is complex and contains many disturbance factors, most workshop scheduling algorithms solve poorly and struggle to meet production requirements such as high efficiency, on-time completion, and stability. Therefore, developing a dynamic shop material scheduling algorithm with high scheduling performance is an urgent problem to be solved.
In recent years, most solutions to the multi-objective flexible job shop scheduling problem (MFJSP) of a production shop assume a static production environment in which the processing information of equipment and workpieces in the shop is completely known; the multiple disturbance factors present in actual production are not considered, so a fixed scheduling scheme is output and is not changed during the whole production process. However, many dynamic events disturb actual production, such as the insertion of new orders, equipment faults, and changes in workpiece processing time, which are uncertain and unavoidable. When the original static scheduling scheme is executed, these randomly occurring disturbances cause severe deviation from the expected results, greatly reducing the on-time task completion rate and production efficiency. The dynamic multi-objective flexible job shop scheduling problem (DMFJSP) aims to complete all scheduling tasks rapidly, on time, and with low delay; it is oriented toward complex task information constraints and real-time uncertain disturbance scenarios in a manufacturing shop, and researching a dynamic optimal scheduling solution is of great significance to production and processing in modern manufacturing.
In recent years, more and more scholars have turned to task scheduling algorithms based on artificial neural networks, exploiting the advantages of deep reinforcement learning to improve the robustness of material scheduling systems and to complete scheduling tasks efficiently. In order to select the most appropriate scheduling rule at each rescheduling time point, the dynamic multi-objective flexible job shop scheduling problem can be regarded as a Markov decision process. Under the constraints of processing information such as workpieces, equipment, operations, and processing times, together with uncertain disturbance events, the agent should comprehensively use the current production state information and select an optimal scheduling rule. However, few studies have considered the uncertain dynamic disturbance events that occur in real production environments, or the multiple objectives of production scheduling that must be met while handling such disturbances, so that scheduling tasks can be completed efficiently.
Disclosure of Invention
Aiming at the above problems, the invention provides a dynamic workshop scheduling method based on Conv-Dueling and generalization characterization.
The dynamic workshop scheduling method based on Conv-Dueling and generalization characterization disclosed by the invention comprises the following steps:
Step A: determine the scheduling problem of the dynamic flexible job shop.
The invention addresses a dynamic multi-objective flexible job shop scheduling problem comprising multiple dynamic events and multiple objectives. The disturbance events include the insertion of production orders, variations in operation processing times, and equipment failures. The three objectives are minimizing the maximum completion time (makespan), maximizing the workpiece on-time completion rate, and minimizing the workpiece delay time.
First, a logical scheduling formulation of the JSSP is established, in which lowercase letters denote indexes and uppercase letters denote sets. Assume that the flexible shop scheduling problem contains a set of workpieces J = {J_1, J_2, ..., J_j} and a set of machines M = {M_1, M_2, ..., M_m}, where each workpiece J_j comprises one or more processing operations O = {O_1, O_2, ..., O_i}, e.g., turning, milling, planing, welding. Each workpiece must be processed in a fixed operation sequence; each operation can be processed on several machines and has a different processing time P_ji on different machines. Scheduling reasonably assigns all workpieces to the machines for processing, with the objectives of minimizing the maximum completion time, maximizing the workpiece on-time completion rate, and minimizing the total workpiece delay time.
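For reference, the three objectives can be written compactly as follows; the notation (completion time C_j, due date D_j, n workpieces, indicator function 1[·]) is introduced here only for illustration and is not taken verbatim from the patent:

\min \; C_{\max} = \max_{j} C_j \qquad \text{(makespan)}
\max \; \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}[C_j \le D_j] \qquad \text{(on-time completion rate)}
\min \; \sum_{j=1}^{n} \max(0,\; C_j - D_j) \qquad \text{(total workpiece delay time)}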
Workpiece insertion refers to workpieces that must be added beyond the initially planned tasks of the shop production schedule, for example because of production shortfalls, new task requirements, and similar situations.
Equipment failure is an unavoidable and randomly occurring disturbance event in actual production. There are several failure types, each with a different repair time.
Processing time variation refers to operations that finish earlier or later than their specified processing time because of factors such as differences in worker proficiency in operating the equipment or equipment problems during production.
Step B: convert the dynamic flexible job shop scheduling problem.
(a) State feature design
In order to fully exploit deep learning to extract features from the raw input, the proposed state space is composed of a multidimensional matrix containing workpiece and equipment state information. The matrix strengthens the mapping between state features and the action space: it completely expresses the information on which a machine acts, and it supports fast training of the neural network and better convergence, so the agent can more easily make optimal action decisions. The multidimensional state matrix uses different scheduling feature information as different channels of an image; each channel has the machine index as its length, the operation sequence as its width, and the number of workpieces as its height. The scheduling features considered include workpiece, operation, machine, processing time, due date, and current time information. Each element is normalized by the overall maximum processing time. If an operation has been assigned to a machine, that machine is in a processing state; the corresponding element stores the remaining processing time of the operation on that machine, and the remaining elements in the row are 0. The rightmost processing-time channel of the image stores a value obtained as a weighted combination of the processing time, the due date, and the current time, expressing the multiple kinds of time information more completely.
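A minimal sketch of how such a multi-channel state tensor might be assembled, assuming three channels (processing times, remaining time of assigned operations, weighted time information) and illustrative weights; the exact channel layout and weights of the patent are not reproduced here:

import numpy as np

def build_state_matrix(proc_time, remaining, assigned, due_dates, current_time,
                       w=(0.5, 0.3, 0.2)):
    # proc_time, remaining, assigned: arrays of shape (n_jobs, n_ops, n_machines)
    # due_dates: array of shape (n_jobs,); current_time: scalar
    # The channel layout and the weights w are illustrative assumptions.
    t_max = max(float(proc_time.max()), 1e-9)              # normalize by overall max processing time
    ch_proc = proc_time / t_max                             # processing-time channel
    ch_busy = np.where(assigned, remaining / t_max, 0.0)    # remaining time of assigned operations
    due = np.broadcast_to(due_dates[:, None, None], proc_time.shape)
    ch_time = (w[0] * proc_time + w[1] * due + w[2] * current_time) / t_max
    return np.stack([ch_proc, ch_busy, ch_time], axis=0)    # (channels, jobs, ops, machines)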
(b) Action set design
Nine well-performing scheduling rules are designed by comprehensively considering information factors such as processing time, workpiece completion rate, waiting time, due date, arrival time, and idle time (an illustrative sketch of such a rule-based action set is given below).
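The nine rules themselves are listed in Table 1 of the patent and are not reproduced in this text; the snippet below only sketches how a rule-based action set of this kind is commonly encoded, using classic dispatching rules (SPT, LPT, EDD, FIFO, minimum slack) as hypothetical stand-ins rather than the patent's actual nine rules:

from dataclasses import dataclass

@dataclass
class Job:                      # minimal illustrative job record (assumed fields)
    next_op_time: float         # processing time of the next operation
    remaining_time: float       # total remaining processing time
    due_date: float
    arrival_time: float

# Classic dispatching rules used here as stand-ins for the patent's nine rules.
def spt(jobs, now):   return min(jobs, key=lambda j: j.next_op_time)      # shortest processing time
def lpt(jobs, now):   return max(jobs, key=lambda j: j.next_op_time)      # longest processing time
def edd(jobs, now):   return min(jobs, key=lambda j: j.due_date)          # earliest due date
def fifo(jobs, now):  return min(jobs, key=lambda j: j.arrival_time)      # first in, first out
def mslack(jobs, now):                                                    # minimum slack
    return min(jobs, key=lambda j: j.due_date - now - j.remaining_time)

ACTIONS = [spt, lpt, edd, fifo, mslack]   # the patent's action set contains nine such rules
# The agent's discrete action a_t indexes one rule, which then picks the
# waiting operation to dispatch at the current rescheduling point.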
(c) Reward function design
The invention aims to jointly consider minimizing the maximum completion time, minimizing the delay time, and maximizing the on-time completion rate. A composite reward combining a main-line task and branch-line tasks is therefore adopted: the branch-line reward is designed to guide the agent to learn toward the optimal action, while the main-line reward gives positive or negative feedback for success or failure when one training episode is completed, alleviating both the difficulty of convergence with sparse rewards and the tendency of dense rewards to cause local optima. The main-line reward function is given in equation (1).
where R and R_b are reward values set after repeated experiments, c_r is the workpiece on-time completion rate, d_r is the workpiece failure rate, j_t is the current processing time step, Max_t is the processing time step threshold, and r is the target completion rate index.
The branch-line reward is shown in formula (2):
reward2 = -(j_l / m_s) * μ    (2)
where j_l is the number of unfinished workpiece tasks, m_s is the total number of machines, and μ is a weight coefficient.
The total reward is shown in formula (3), where α is a weight coefficient:
reward = reward1 + α * reward2    (3)
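A minimal sketch of the composite reward under stated assumptions: the branch-line reward and the total reward follow equations (2) and (3) literally, while the main-line term reward1 is only stubbed out with an assumed threshold form, since equation (1) is not reproduced in the text; all constants are illustrative:

def branch_reward(unfinished_jobs: int, total_machines: int, mu: float = 1.0) -> float:
    # Branch-line reward, equation (2): reward2 = -(j_l / m_s) * mu
    return -(unfinished_jobs / total_machines) * mu

def main_reward(on_time_rate: float, r_target: float = 0.8,
                R: float = 10.0, R_b: float = -10.0) -> float:
    # Stand-in for equation (1), whose exact form is not given in the text:
    # positive feedback R when the target on-time rate is reached at the end
    # of an episode, negative feedback R_b otherwise (assumed shape).
    return R if on_time_rate >= r_target else R_b

def total_reward(reward1: float, reward2: float, alpha: float = 0.5) -> float:
    # Composite reward, equation (3): reward = reward1 + alpha * reward2
    return reward1 + alpha * reward2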
Step C: the Conv-lasting scheduling algorithm optimally solves the scheduling problem of the large-scale flexible job shop.
In the training phase, the Conv-Dueling network adopts a deep convolutional neural network architecture. Specifically, the Conv-Dueling network takes the multidimensional state matrix containing workpiece and equipment processing information as input and the predicted Q value of each scheduling action as output; it obtains reward feedback, performs continuous trial-and-error learning through interaction with the environment, and finally obtains a globally better solution while maximizing the accumulated reward value.
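Before the step-by-step procedure, a minimal PyTorch sketch of a Conv-Dueling head of the kind described above may be helpful: a feature extraction network followed by separate state-value and advantage streams, combined as Q = V + A - mean(A). The layer sizes, the assumption that the multi-channel state is presented as a 2-D image per channel, and the nine output actions are illustrative choices, not the patent's exact architecture:

import torch
import torch.nn as nn

class ConvDueling(nn.Module):
    # Feature-extraction CNN plus separate value and advantage streams (dueling aggregation).
    def __init__(self, in_channels: int = 3, n_actions: int = 9):
        super().__init__()
        self.features = nn.Sequential(                         # feature extraction network
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.value = nn.Sequential(                            # state (value) network
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(                        # advantage network
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)             # predicted Q value per scheduling rule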
step 1: initializing the memory pool capacity as D, batch mini_batch, action cost function q and target cost functionInitializing the parameters of the target network and the estimated network, wherein the learning rate is alpha, the discount rate is gamma.
Step 2: resetting the scheduling context at the beginning of each round to obtain an initial state S 0
Step 3: at time t<At any time of T, the agent selects an action a from the action space according to the observed state t Execution is performed wherein T is equal to the total process time step. The action selection is based on the proposed epsilon-decrementing strategy.
Step 4: after the action is executed, the action with the highest priority in the equipment processing workpiece list is scheduled preferentially, and then the instant rewards r are observed t And next state s t+1
Step 5: data (S) t ,a t ,r t ,S t+1 ) The memory is stored in the memory pool D, the experience memory amount is detected, and if the maximum memory amount of the experience pool is exceeded, new experience is learned instead of old experience. The conversion for a given sample performs a loss calculation from the q value and the target value.
Step 6: network parameters are converted by all samplesIs updated by changing the cumulative weight value of (c). To ensure stable convergence of the training process, the weights of the target q network are replaced by the weights of the target q network periodicallyThe weight of the network.
Step 7: and (3) judging whether all work procedures of the case are scheduled, if yes, entering the next round, and if no, continuing to execute the step (3).
Step 8: and judging whether the round is ended, if so, outputting a better scheduling model, and if not, continuing to execute the step 2.
The beneficial technical effects of the invention are as follows:
(1) The method designs a multidimensional matrix containing workpiece and equipment state information as the state representation, which completely expresses the information on which a machine acts and supports fast training of the neural network with better convergence, so the agent can more easily make optimal action decisions; meanwhile, the method designs the reward function to accelerate algorithm convergence.
(2) The method designs a Conv-Dueling network model that takes the multidimensional state matrix as input and the scheduling rule values as output, and selects the optimal scheduling rule at different rescheduling decision points based on those values. The network model consists of a feature extraction network, a state network, and an advantage network, realizing globally optimal scheduling. The network model achieves good optimization results in both static and dynamic cases.
Drawings
FIG. 1 is a general scheduling flow chart of an implementation of the present invention.
FIG. 2 is a diagram showing the whole training process of Conv-Dueling network model in the implementation of the present invention.
FIG. 3 is a diagram of the state S_0 before a scheduling state transition in accordance with an embodiment of the present invention.
FIG. 4 is a diagram of the state S_1 after scheduling in accordance with an embodiment of the present invention.
FIG. 5 is a graph of the on-time completion rate of a work piece for a scheduling agent learning process.
FIG. 6 is a diagram of the reward and penalty records during the scheduling agent's learning process.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples.
This section innovatively proposes a dynamic scheduling method based on deep-reinforcement-learning Conv-Dueling and generalization characterization, providing a new and effective approach for solving the problem. First, the scheduling problem of the dynamic flexible job shop is modeled digitally, and the state features, action space, and reward function are defined. Second, the network model is trained without supervision using the D3QN algorithm; finally, according to the multi-constraint and multi-disturbance production environment information, the multidimensional workpiece state representation matrix is taken as input, features are extracted and decisions are made by the Conv-Dueling network model, and the optimal scheduling rule is output.
The invention relates to a dynamic scheduling method based on Conv-Dueling and generalization characterization of deep reinforcement learning, which comprises the following steps:
Step A: determine the scheduling problem of the dynamic flexible job shop.
First, a logical scheduling formulation of the JSSP is established, in which lowercase letters denote indexes and uppercase letters denote sets. Assume that the flexible shop scheduling problem contains a set of workpieces J = {J_1, J_2, ..., J_j} and a set of machines M = {M_1, M_2, ..., M_m}, where each workpiece J_j comprises one or more processing operations O = {O_1, O_2, ..., O_i}, e.g., turning, milling, planing, welding. Each workpiece must be processed in a fixed operation sequence; each operation can be processed on several machines and has a different processing time P_ji on different machines. Scheduling reasonably assigns all workpieces to the machines for processing, with the objectives of minimizing the maximum completion time, maximizing the workpiece on-time completion rate, and minimizing the total workpiece delay time.
This research aims to handle unpredictable dynamic disturbance events, such as order insertion, machine failure, and processing time variation, under the multi-constraint relations between processing equipment and workpieces during production scheduling, assigning each operation O_{i,j} of the workpieces to a suitable machine M_m at the appropriate time, so that all scheduling tasks are completed efficiently and superior overall performance is obtained in terms of time, resource utilization, and so on. The overall flow of the production scheduling process is shown in fig. 1. The dynamic disturbance events shown in fig. 1 are handled as follows.
Workpiece insertion refers to workpieces that must be added beyond the initially planned tasks of the shop production schedule, for example because of production shortfalls, new task requirements, and similar situations.
Equipment failure is an unavoidable and randomly occurring disturbance event in actual production. There are several failure types, each with a different repair time.
Processing time variation refers to operations that finish earlier or later than their specified processing time because of factors such as differences in worker proficiency in operating the equipment or equipment problems during production.
And (B) step (B): conversion of scheduling problems
a) State feature design
In order to fully utilize deep learning to extract features from the raw input, the state space proposed by the invention is composed of a multidimensional matrix containing workpiece and equipment state information. The matrix strengthens the mapping between state features and the action space: it completely expresses the information on which a machine acts, and it supports fast training of the neural network and better convergence, so the agent can more easily make optimal action decisions. The multidimensional state matrix uses different scheduling feature information as different channels of an image; each channel has the machine index as its length, the operation sequence as its width, and the number of workpieces as its height. The scheduling features considered include workpiece, operation, machine, processing time, due date, and current time information. Each element is normalized by the overall maximum processing time. If an operation has been assigned to a machine, that machine is in a processing state; the corresponding element stores the remaining processing time of the operation on that machine, and the remaining elements in the row are 0. As shown in fig. 3 and fig. 4, the rightmost processing-time channel of the image stores a value obtained as a weighted combination of the processing time, the due date, and the current time, expressing the multiple kinds of time information more completely.
b) Action space design
In order to overcome the limitation that a single scheduling rule is not suitable for diverse scheduling scenarios, an appropriate scheduling rule is selected through deep reinforcement learning according to the current environment state. Too few scheduling rules make it hard for the agent to achieve globally optimal scheduling in complex and varied environments, while too many scheduling rules make the agent spend excessive time in learning and fail to meet the goal of real-time, efficient scheduling. Therefore, nine well-performing scheduling rules are designed by comprehensively considering information factors such as processing time, workpiece completion rate, waiting time, due date, arrival time, and idle time, as shown in Table 1.
Table 1 action set table.
c) Reward function design
The invention aims to jointly consider minimizing the maximum completion time, minimizing the delay time, and maximizing the on-time completion rate. A composite reward combining a main-line task and branch-line tasks is therefore adopted: the branch-line reward is designed to guide the agent to learn toward the optimal action, while the main-line reward gives positive or negative feedback for success or failure when one training episode is completed, alleviating both the difficulty of convergence with sparse rewards and the tendency of dense rewards to cause local optima. The main-line reward function is given in equation (4).
where R and R_b are reward values set after repeated experiments, c_r is the workpiece on-time completion rate, d_r is the workpiece failure rate, j_t is the current processing time step, Max_t is the processing time step threshold, and r is the target completion rate index.
The branch-line reward is shown in formula (5):
reward2 = -(j_l / m_s) * μ    (5)
where j_l is the number of unfinished workpiece tasks, m_s is the total number of machines, and μ is a weight coefficient.
The total reward is shown in equation (6), where α is a weight coefficient:
reward = reward1 + α * reward2    (6)
Step C: optimized solution of the large-scale flexible job shop scheduling problem with the Conv-Dueling (D3QN) scheduling algorithm
The scheduling agent selects an appropriate scheduling rule according to the workshop environment state, and sorts and assigns the workpieces to the processing equipment. When the environment state of the workshop changes, a corresponding reward value is given according to the reward function; a high reward value means the chosen scheduling rule is efficient in that situation, and a negative reward value means it is not. Through continuous trial-and-error learning and interaction with the environment, the scheduling agent obtains a globally better solution while obtaining the maximum cumulative reward value.
In the training phase, the Conv-Dueling network adopts a deep convolutional neural network architecture. Specifically, the Conv-Dueling network takes the multidimensional state matrix containing workpiece and equipment processing information as input and the predicted Q value of each scheduling action as output. The specific steps are as follows:
step 1: initializing the memory pool capacity as D, batch mini_batch, action cost function q and target cost functionInitializing the parameters of the target network and the estimated network, wherein the learning rate is alpha, the discount rate is gamma.
Step 2: resetting the scheduling context at the beginning of each round to obtain an initial state S 0
Step 3: at time t<At any time of T, the agent selects an action a from the action space according to the observed state t Execution is performed wherein T is equal to the total process time step. The action selection is based on the proposed epsilon-decrementing strategy.
Step 4: after the action is executed, the action with the highest priority in the equipment processing workpiece list is scheduled preferentially, and then the instant rewards r are observed t And next state s t+1
Step 5: data (S) t ,a t ,r t ,S t+1 ) The memory is stored in the memory pool D, the experience memory amount is detected, and if the maximum memory amount of the experience pool is exceeded, new experience is learned instead of old experience. The conversion for a given sample performs a loss calculation from the q value and the target value.
Step 6: the network parameters are updated by accumulated weight value changes over all sample transitions. To ensure stable convergence of the training process, the weights of the target q network are replaced by the weights of the target q network periodicallyThe weight of the network.
Step 7: and (3) judging whether all work procedures of the case are scheduled, if yes, entering the next round, and if no, continuing to execute the step (3).
Step 8: and judging whether the round is ended, if so, outputting a better scheduling model, and if not, continuing to execute the step 2.
(1) Design of experiment
The data instances used in this test were randomly generated: the initial number of workpieces is 20, and the number of operations per workpiece is random. The number of machines is 10 or 20, and the processing capabilities of the machines are random. The number of dynamic disturbance events is 30, 50, or 80, and their types are random. As in the training phase, the relevant parameters of the workpieces and machines are random. The correct choice of hyper-parameters strongly influences the agent's learning ability and the algorithm's performance, but the hyper-parameters have wide ranges and are difficult to tune; the invention sets the relevant parameters according to general principles, as shown in Table 2. In total there are 30 test case combinations and 50 runs. The test code was written in the Python programming language and run with Python 3.8.12.
Table 2 training hyper-parameters
(2) Analysis of experimental results
The reward convergence of D3QN over the first 2000 training episodes is shown in fig. 5; as training proceeds, the rewards of all three algorithms converge to their maxima. The oscillation after convergence is mainly caused by the small probability of random action selection. Among the three algorithms, DQN performs worst, with the lowest learning efficiency and slow convergence. DDQN is clearly improved and converges faster, but is still inferior to the D3QN algorithm. D3QN has the best convergence and stability: the dueling network structure, the deep double-Q network, and the multidimensional state space with the convolutional neural network mitigate the effect of overestimating action values. The convergence of the workpiece on-time completion rate for the different algorithms is shown in fig. 6; all three deep reinforcement learning algorithms converge, at 0.85, 0.8, and 0.7 respectively. D3QN converges more stably to a higher value, while the other two algorithms fail to learn better scheduling rules at the rescheduling points and therefore cannot obtain a higher on-time completion rate. Both figures show that, under multiple constraints and multiple disturbances, the dynamic scheduling method based on deep-reinforcement-learning Conv-Dueling and generalization characterization proposed by the invention learns more efficient scheduling rules.
To verify that the proposed D3QN-based scheduling algorithm outperforms the DDQN-based scheduling algorithm, the following experiment was designed. Under different experimental parameter settings of m, n_add, and E_ave, multiple groups of experimental data were randomly generated to simulate different task scheduling situations in the production process; on each group of experimental data, the proposed scheduling algorithm and each scheduling rule were each repeated 50 times. The means and standard deviations of the total completion time, task completion rate, and total delay time obtained by each method are shown in Table 3, where the best results are indicated in bold. To ensure fairness between the algorithms, DDQN uses the action set and reward function of the proposed scheduling algorithm, and the state features of DDQN are divided into 9 discrete states using a neural network with a self-organizing map (SOM) layer. The table shows that the optimal scheduling rules selected at the rescheduling points by the D3QN-based scheduling algorithm have better expectation and robustness in achieving the multiple objectives than the DDQN-based scheduling algorithm.
TABLE 3 average and standard deviation values of results of the scheduling algorithm and DDQN scheduling algorithm after 50 runs

Claims (4)

1. The dynamic workshop scheduling method based on Conv-Dueling and generalization characterization is characterized by comprising the following steps:
step A, determining the flexible job shop scheduling problem;
firstly, establishing a logical scheduling formulation of the JSSP, wherein lowercase letters represent indexes and uppercase letters represent sets; the flexible shop scheduling problem contains a set of workpieces J = {J_1, J_2, ..., J_j} and a set of machines M = {M_1, M_2, ..., M_m}, wherein each workpiece J_j comprises one or more processing operations O = {O_1, O_2, ..., O_i}; each workpiece is processed in a fixed operation sequence, each operation can be processed on several machines and has a different processing time P_ji on different machines; scheduling reasonably assigns all workpieces to the machines for processing, with the objectives of minimizing the maximum completion time, maximizing the workpiece on-time completion rate, and minimizing the total workpiece delay time;
step B: converting the scheduling problem of the dynamic flexible job shop;
(a) Designing state characteristics;
the state space is formed by a multidimensional matrix containing workpiece and equipment state information; the multidimensional state matrix takes different scheduling feature information as different channels of an image, wherein each channel has the machine index as its length, the operation sequence as its width, and the number of workpieces as its height; the scheduling feature information considered comprises the workpiece, operation, machine, processing time, due date, and current time; each element is normalized by the overall maximum processing time; if an operation has been assigned to a machine, that machine is in a processing state; the rightmost processing-time channel of the image represents a value obtained as a weighted combination of the processing time, the due date, and the current time, expressing the multiple kinds of time information completely;
(b) Designing an action set;
selecting nine scheduling rules by comprehensively considering the processing time, workpiece completion rate, waiting time, due date, arrival time, and idle time information factors, as shown in Table 1;
TABLE 1 action set table
(c) Designing a reward function;
adopting a composite reward combining a main-line task and branch-line tasks, designing the branch-line reward to guide the agent to learn toward the optimal action, and having the main-line reward give positive or negative feedback for success or failure when one training episode is completed;
step C: the Conv-Dueling scheduling algorithm optimally solves the scheduling problem of the large-scale flexible job shop; in the training phase, the Conv-Dueling network adopts a deep convolutional neural network architecture.
2. The dynamic workshop scheduling method based on Conv-Dueling and generalization characterization according to claim 1, characterized in that the main-line reward function is as in equation (1);
where R and R_b are reward values set after repeated experiments, c_r is the workpiece on-time completion rate, d_r is the workpiece failure rate, j_t is the current processing time step, Max_t is the processing time step threshold, and r is the target completion rate index;
the branch-line reward is shown in formula (2):
reward2 = -(j_l / m_s) * μ    (2)
where j_l is the number of unfinished workpiece tasks, m_s is the total number of machines, and μ is a weight coefficient;
the total reward is represented by equation (3), wherein α is a weight coefficient:
reward = reward1 + α * reward2    (3).
3. The dynamic workshop scheduling method based on Conv-Dueling and generalization characterization according to claim 1, wherein the Conv-Dueling network takes the multidimensional state matrix containing workpiece and equipment processing information as input and the predicted Q value of scheduling actions as output, obtains reward feedback, continuously performs trial-and-error learning through interaction with the environment, and finally obtains a globally better solution while maximizing the accumulated reward value.
4. The dynamic workshop scheduling method based on Conv-Dueling and generalization characterization according to claim 1, wherein the Conv-Dueling scheduling algorithm solves the dynamic flexible workshop scheduling problem as follows:
step 1: initializing the memory pool capacity as D, batch mini_batch, action cost function q and target cost functionInitializing parameters of a target network and an estimated network, wherein the learning rate is alpha, the discount rate is gamma;
step 2: resetting the scheduling context at the beginning of each round to obtain an initial state S 0
Step 3: at time t<At any time of T, the agent selects an action a from the action space according to the observed state t Performing, wherein T is equal to the total processing time step; action selection is based on the proposed epsilon-decremental strategy;
step 4: after the action is executed, the action with the highest priority in the equipment processing workpiece list is scheduled preferentially, and then the instant rewards r are observed t And next state s t+1
Step 5: data (S) t ,a t ,r t ,S t+1 ) The experience memory is stored in a memory pool D, the experience memory is detected, and if the maximum memory of the experience pool is exceeded, new experience is learned instead of old experience; performing loss calculation from the q value and the target value by conversion of given samples;
step 6: the network parameters are updated by the accumulated weight value change on all sampling conversions; to ensure stable convergence of the training process, the weights of the target q network are replaced by the weights of the target q network periodicallyA weight of the network;
step 7: judging whether all work procedures of the case are scheduled to be completed, if yes, entering the next round, and if no, continuing to execute the step 3;
step 8: and judging whether the round is ended, if so, outputting the scheduling model, and if not, continuing to execute the step 2.
CN202310600842.XA 2023-05-25 2023-05-25 Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization Pending CN116562584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310600842.XA CN116562584A (en) Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310600842.XA CN116562584A (en) Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization

Publications (1)

Publication Number Publication Date
CN116562584A 2023-08-08

Family

ID=87487858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310600842.XA CN116562584A (en) Dynamic workshop scheduling method based on Conv-Dueling and generalization characterization

Country Status (1)

Country Link
CN (1) CN116562584A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993135A (en) * 2023-09-27 2023-11-03 中南大学 Multi-stage sequencing and reservation scheduling method and device based on waiting time constraint
CN116993135B (en) * 2023-09-27 2024-02-02 中南大学 Multi-stage sequencing and reservation scheduling method and device based on waiting time constraint

Similar Documents

Publication Publication Date Title
CN112734172B (en) Hybrid flow shop scheduling method based on time sequence difference
Leon et al. Strength and adaptability of problem-space based neighborhoods for resource-constrained scheduling
CN108694502B (en) Self-adaptive scheduling method for robot manufacturing unit based on XGboost algorithm
CN110222938B (en) Short-term peak-load regulation scheduling collaborative optimization method and system for cascade hydropower station group
CN116562584A (en) Dynamic workshop scheduling method based on Conv-lasting and generalization characterization
CN111160755B (en) Real-time scheduling method for aircraft overhaul workshop based on DQN
CN101901426A (en) Dynamic rolling scheduling method based on ant colony algorithm
CN112836974B (en) Dynamic scheduling method for multiple field bridges between boxes based on DQN and MCTS
CN110458326B (en) Mixed group intelligent optimization method for distributed blocking type pipeline scheduling
CN116500986A (en) Method and system for generating priority scheduling rule of distributed job shop
CN105373845A (en) Hybrid intelligent scheduling optimization method of manufacturing enterprise workshop
CN116700176A (en) Distributed blocking flow shop scheduling optimization system based on reinforcement learning
CN109034540B (en) Machine tool sequence arrangement dynamic prediction method based on work-in-process flow
CN116151581A (en) Flexible workshop scheduling method and system and electronic equipment
WO2024113585A1 (en) Intelligent interactive decision-making method for discrete manufacturing system
CN112488543A (en) Intelligent work site shift arrangement method and system based on machine learning
CN117117850A (en) Short-term electricity load prediction method and system
CN116720703A (en) AGV multi-target task scheduling method and system based on deep reinforcement learning
CN114219274A (en) Workshop scheduling method adapting to machine state based on deep reinforcement learning
Chiu et al. A GA embedded dynamic search algorithm over a Petri net model for an fms scheduling
CN115933568A (en) Multi-target distributed hybrid flow shop scheduling method
CN116011723A (en) Intelligent dispatching method and application of coking and coking mixed flow shop based on Harris eagle algorithm
CN114912826B (en) Flexible job shop scheduling method based on multilayer deep reinforcement learning
CN114545884B (en) Equivalent parallel machine dynamic intelligent scheduling method based on enhanced topological neural evolution
CN114399152B (en) Method and device for optimizing comprehensive energy scheduling of industrial park

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination