CN114154821A - Intelligent scheduling dynamic scheduling method based on deep reinforcement learning - Google Patents

Intelligent scheduling dynamic scheduling method based on deep reinforcement learning

Info

Publication number
CN114154821A
CN114154821A CN202111390067.7A CN202111390067A
Authority
CN
China
Prior art keywords
time
intelligent
reinforcement learning
production line
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111390067.7A
Other languages
Chinese (zh)
Inventor
宇文东方
万光华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Shenfuzhi Technology Co ltd
Original Assignee
Xiamen Shenfuzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Shenfuzhi Technology Co ltd filed Critical Xiamen Shenfuzhi Technology Co ltd
Priority to CN202111390067.7A priority Critical patent/CN114154821A/en
Publication of CN114154821A publication Critical patent/CN114154821A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316 Sequencing of tasks or work
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Manufacturing & Machinery (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of intelligent scheduling and discloses an intelligent scheduling dynamic scheduling method based on deep reinforcement learning, which comprises the following steps: 1) reading information; 2) processing data; 3) building a deep reinforcement learning framework; 4) considering the starting time and ending time of each process; 5) splitting the order ending time into the individual processes. The deep reinforcement learning framework uses an Asynchronous Advantage Actor Critic (A3C) model, which requires not only that the reward value be maximized but also that the entropy of each selected action be maximized; this randomizes the strategy so that the output probability of each action is dispersed as much as possible rather than concentrated on one action. With the A3C deep learning framework, the solving speed is high, which can support a factory's requirement of running intelligent scheduling twice a day.

Description

Intelligent scheduling dynamic scheduling method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of intelligent scheduling, in particular to an intelligent scheduling dynamic scheduling method based on deep reinforcement learning.
Background
In the prior art, intelligent scheduling dynamic scheduling methods are mostly based on exact optimization methods and approximation/heuristic algorithms. In recent years, many scholars have also begun to use deep reinforcement learning to solve various dynamic scheduling problems, including the intelligent production dynamic scheduling problem. The exact optimization methods mainly include mixed integer linear programming (MILP), branch-and-bound, Lagrangian relaxation, and the like; approximation/heuristic methods were originally introduced because of their small computational cost and ease of implementation, and mainly include priority dispatching rules (PDR), neural networks (NN), and neighborhood search (NS), where neighborhood search covers approximate optimization methods that can be called meta-heuristics, such as tabu search (TS), genetic algorithms (GA), and simulated annealing (SA). The exact optimization methods are mainly limited by problem scale: since an n×m intelligent scheduling problem has up to (n!)^m possible solutions, computing an exact solution is computationally infeasible for large-scale problems.
At present, research on deep reinforcement learning (DRL) models for the intelligent scheduling dynamic scheduling problem has developed rapidly, and deep reinforcement learning is widely applied to solving various dynamic scheduling problems. Compared with traditional heuristic priority dispatching rules, such models are more flexible: a reinforcement learning environment can model stochastic decisions and flexible problems, such as non-deterministic operation re-entry, serial-parallel ordering among processes, alternative production lines for a process, and alternative machines on a production line. However, most of these methods are still at the theoretical research stage; they cannot model the complex constraints of real factory requirements, and cannot provide an intelligent production scheduling dynamic scheduling method that meets real factory requirements in the face of random machine shutdowns, stochastic processing times, order deadlines, and the like. In addition, real factory requirements generally call for Advanced Planning and Scheduling (APS), in which short-term plans and medium-term plans are modeled separately while ensuring both the accuracy of the short-term plan and the fast solution of the long-term plan, which is also a field that current mainstream deep reinforcement learning models cannot cover.
Therefore, an intelligent scheduling dynamic scheduling method based on deep reinforcement learning is provided.
Disclosure of Invention
The invention aims to provide an intelligent scheduling dynamic scheduling method based on deep reinforcement learning, so as to address the difficulty, noted in the background art, of achieving real-time, autonomous and unmanned intelligent scheduling in a factory.
In order to achieve this purpose, the invention provides the following technical scheme: an intelligent scheduling dynamic scheduling method based on deep reinforcement learning, comprising the following steps:
S1: reading the orders received by the factory at the current moment, the material quantities, the worker shift calendars, and the production calendars of the production lines;
S2: processing the read raw data, and distinguishing short-term plans from long-term plans according to the order delivery dates and the availability of the required materials;
S3: building a deep reinforcement learning framework, and inputting and training production line, process and capacity feature vectors to obtain the target policy network of the target intelligent agent;
S4: considering the starting time and ending time of each process and the time calendar of each production line and machine;
S5: splitting the order ending time into the individual processes.
Further, in S1, the order data includes the quantity of each required product and the product delivery deadline; each product needs to go through several processes, the processes have a certain serial-parallel order, switching machines or materials on a production line requires a certain equipment changeover time, and there is usually a minimum waiting time or maximum waiting time constraint between serial processes.
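By way of a non-limiting illustration only (the field names below are assumptions, not part of the disclosure), the order and process data read in S1 could be organized roughly as follows:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Process:
    process_id: str
    duration_min: int                                        # processing time in minutes
    predecessors: List[str] = field(default_factory=list)    # serial-parallel order among processes
    changeover_min: int = 0                                   # equipment changeover time when switching machine/material
    min_wait_min: Optional[int] = None                        # minimum waiting time after the preceding process
    max_wait_min: Optional[int] = None                        # maximum waiting time after the preceding process

@dataclass
class Order:
    order_id: str
    product: str
    quantity: int
    due_time_min: int                                         # product delivery deadline (minutes from schedule start)
    urgent: bool = False                                      # static attribute used later when designing the reward
    processes: List[Process] = field(default_factory=list)
```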
Further, in S2, the short-term plan requires fine scheduling in units of minutes, and all processes of this part of the orders are fully assigned to production lines; the orders of the long-term plan only need an evaluation of resources such as material quantities, production lines, machines and capacity, with an early warning given when a resource bottleneck exists and a rough scheduling result in units of days provided when there is no bottleneck. The numbers of workers, machines and production lines at each time node are then calculated, and a resource time axis in units of minutes is generated by combining the worker shift calendars and the production-line production calendars.
Further, in S3, an Asynchronous Advantage Actor Critic (A3C) model is used; in addition to requiring the reward value to be maximized, the entropy of each selected action is also required to be maximized. This randomizes the strategy so that the output probability of each action is dispersed as much as possible rather than concentrated on one action.
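As an illustrative sketch only (PyTorch is assumed; the network sizes, coefficient values and variable names are not specified by the disclosure), an entropy-regularized actor-critic objective of the kind described above could look roughly like this:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)   # action logits
        self.value_head = nn.Linear(hidden, 1)             # state value

    def forward(self, state):
        h = self.shared(state)
        return self.policy_head(h), self.value_head(h)

def a3c_loss(model, state, action, ret, entropy_coef=0.01, value_coef=0.5):
    """Policy-gradient loss with a value (critic) term and an entropy bonus
    that keeps the action probabilities dispersed rather than concentrated."""
    logits, value = model(state)
    dist = torch.distributions.Categorical(logits=logits)
    advantage = ret - value.squeeze(-1)
    policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()          # squared error between return and value
    entropy = dist.entropy().mean()               # maximized entropy of the selected actions
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# example: batch of 4 states with 10 features and 5 candidate actions
model = ActorCritic(state_dim=10, n_actions=5)
loss = a3c_loss(model, torch.randn(4, 10), torch.randint(0, 5, (4,)), torch.randn(4))
loss.backward()
```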
Further, in S3, the scheduling target of the deep neural network in the target policy network at the current moment is obtained: the production line, process and capacity state feature vectors are processed and then fed into a classification function to obtain the selection probability corresponding to each optimization target.
Further, for the constraints described in S1, the earliest start time and the expected end time of each process are introduced, and all constraints are converted into time axes on the processes and production lines for unified control and updating.
Further, S4 comprises the following steps (an illustrative sketch follows the list):
S41: first, for processes with preceding-process requirements, the start time is initialized to a large value;
S42: when all preceding processes are completed, the start time is updated to the maximum ending time of all the preceding processes;
S43: second, for cases with a minimum waiting time or maximum waiting time constraint, after the preceding process finishes, the start time is updated to the ending time of the preceding process plus the minimum/maximum waiting time.
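A minimal sketch of the time-axis update rules S41-S43, assuming each process knows how many predecessors it has and which waiting-time constraints apply (all names are illustrative, not from the disclosure):

```python
INF = 10**9  # "a large value" used to initialize constrained start times (S41)

def earliest_start(n_predecessors, pred_end_times, min_wait=None):
    """Earliest start time of a process, following S41/S42/S43 (illustrative only)."""
    if n_predecessors == 0:
        return 0                              # no preceding process: may start at once
    if len(pred_end_times) < n_predecessors:
        return INF                            # S41: some preceding process not finished yet
    start = max(pred_end_times)               # S42: maximum ending time of all preceding processes
    if min_wait is not None:
        start += min_wait                     # S43: add the minimum waiting time
    return start

def latest_start(pred_end_times, max_wait):
    """Latest allowed start when a maximum waiting time constraint exists (S43)."""
    return max(pred_end_times) + max_wait

# example: two predecessors ending at minute 120 and 150, minimum wait of 10 minutes
print(earliest_start(2, [120, 150], min_wait=10))   # -> 160
```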
Further, in S5, since the deep reinforcement learning model needs to update the reward value repeatedly, each order should be completed before its delivery date as far as possible, and the importance of the order must also be taken into account, the total available time is divided into an available time for each process according to a certain rule, and the reward function is designed according to static attributes such as whether the order is urgent.
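The splitting of an order's deadline into per-process due times and an urgency-weighted reward could be sketched as follows; the proportional-to-duration rule and the weights are assumptions, since the disclosure only says "according to a certain rule":

```python
def split_due_times(order_due, durations):
    """Split the total available time into a due time for each process,
    here proportionally to processing time (one possible 'certain rule')."""
    total = sum(durations)
    due_times, elapsed = [], 0
    for d in durations:
        elapsed += order_due * d / total
        due_times.append(elapsed)
    return due_times

def step_reward(finish_time, process_due, urgent, late_penalty=1.0, urgent_factor=2.0):
    """Penalize finishing a process after its split due time, weighting
    urgent orders more heavily (illustrative static-attribute design)."""
    lateness = max(0.0, finish_time - process_due)
    weight = urgent_factor if urgent else 1.0
    return -weight * late_penalty * lateness

# example: a 480-minute order with three processes of 60, 120 and 60 minutes
print(split_due_times(480, [60, 120, 60]))   # -> [120.0, 360.0, 480.0]
```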
Further, the offline training comprises the following steps:
S01: treating each production line as an intelligent agent and generating its target policy network;
S02: updating the reward function network used for the reward value;
S03: storing the state feature vector of each intermediate state, and initializing the parameters of each network.
In each training cycle, a new training environment is randomly generated and A3C is used to pre-train all intelligent agents offline. An optimal process-production line allocation scheme is generated from the target policy network of each agent; for each production line, target decision states such as the latest end time, the idle-time ratio, and whether each process's end time is later than expected are considered to generate the reward function network; the target state value network and the state feature vector of the target agent are updated through a mean squared error (MSE) loss function; and this process continues until the allocation schemes of all processes finally meet the use requirements.
Further, in the intelligent scheduling process, the scheduling feature vector of the production lines at the current moment is read first, and the currently executable process vectors are screened according to the preceding processes and material availability. The production line and process vectors are then taken as input and trained in the deep reinforcement learning agent network to obtain the process-production line assignment rule at the current moment. Next, it is judged whether all processes have been allocated to a production line: if not, the time is updated according to the time-axis movement rule, the agent reward network is updated according to the reward value, and the production lines and processes are updated according to the completed tasks before entering a new process-production line assignment round; if so, it is further judged whether the maximum number of iterations has been reached or the objective function has converged: if yes, the deep reinforcement learning intelligent scheduling result is output; if not, the production line scheduling feature vector is read again, until the deep reinforcement learning intelligent scheduling result is output.
Compared with the prior art, the invention has the beneficial effects that:
1. according to the intelligent scheduling method based on the deep reinforcement learning, a deep reinforcement learning frame is built, an Asynchronous Advantage Actor critical (A3C) model is used, the maximum reward value is required, the maximum entropy output by selecting an action each time is also required, the strategy is randomized through the method, the probability of each output action is dispersed as far as possible instead of being concentrated on one action, the solving speed is high by using the deep learning frame of A3C, and the use requirement that a factory does intelligent scheduling twice a day can be supported.
2. The intelligent scheduling dynamic scheduling method based on deep reinforcement learning considers the starting time and ending time of each process and the time calendar of each production line and machine, introduces the earliest start time and expected end time of each process, and converts all constraints into time axes on the processes and production lines for unified control and updating. First, for processes with preceding-process requirements, the start time is initialized to a large value, and when all preceding processes are completed, the start time is updated to the maximum ending time of all the preceding processes; second, for cases with a minimum or maximum waiting time constraint, after the preceding process finishes, the start time is updated to the ending time of the preceding process plus the minimum/maximum waiting time. In this way the real requirements of a factory can be fully considered: there is a certain serial-parallel order among processes, switching machines or materials on a production line requires a certain equipment changeover time, serial processes usually have minimum or maximum waiting time constraints, and each order has a deadline and a priority.
3. The intelligent scheduling dynamic scheduling method based on deep reinforcement learning processes the read raw data, first distinguishing short-term plans from long-term plans according to the order delivery dates and the availability of required materials. The short-term plan requires fine scheduling in units of minutes, and all processes of this part of the orders are fully assigned to production lines; the orders of the long-term plan only need an evaluation of resources such as material quantities, production lines, machines and capacity, with an early warning given when a resource bottleneck exists and a rough scheduling result in units of days provided otherwise. The numbers of workers, machines and production lines at each time node are then calculated, and a resource time axis in units of minutes is generated by combining the worker shift calendars and the production-line production calendars. Orders can thus be distinguished in line with Advanced Planning and Scheduling (APS), with short-term and medium-term plans modeled separately while guaranteeing both the accuracy of the short-term plan and the fast solution of the long-term plan, which can greatly shorten the time needed to solve the factory's intelligent scheduling problem.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is an overall flowchart of the intelligent scheduling method based on deep reinforcement learning according to the present invention;
FIG. 2 is a flow chart of the intelligent scheduling method based on deep reinforcement learning according to the present invention;
FIG. 3 is a flowchart of the offline training of the intelligent scheduling method based on deep reinforcement learning according to the present invention;
FIG. 4 is a flowchart of a time control method of the intelligent scheduling dynamic scheduling method based on deep reinforcement learning according to the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application; obviously, the described embodiments are only some embodiments of the present application, rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
In addition, the term "plurality" shall mean two or more.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, the intelligent scheduling dynamic scheduling method based on deep reinforcement learning comprises the following steps:
S1: reading the orders received by the factory at the current moment, the material quantities, the worker shift calendars, and the production calendars of the production lines;
S2: processing the read raw data, and distinguishing short-term plans from long-term plans according to the order delivery dates and the availability of the required materials;
S3: building a deep reinforcement learning framework, and inputting and training production line, process and capacity feature vectors to obtain the target policy network of the target intelligent agent;
S4: considering the starting time and ending time of each process and the time calendar of each production line and machine;
S5: splitting the order ending time into the individual processes.
In S1, the order data includes the quantity of each required product and the product delivery deadline; each product needs to go through several processes, the processes have a certain serial-parallel order, switching machines or materials on a production line requires a certain equipment changeover time, and there is usually a minimum waiting time or maximum waiting time constraint between serial processes.
In S2, the short-term plan requires fine scheduling in units of minutes, and all processes of this part of the orders are fully assigned to production lines; the orders of the long-term plan only need an evaluation of resources such as material quantities, production lines, machines and capacity, with an early warning given when a resource bottleneck exists and a rough scheduling result in units of days provided when there is no bottleneck. The numbers of workers, machines and production lines at each time node are then calculated, and a resource time axis in units of minutes is generated by combining the worker shift calendars and the production-line production calendars.
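A minimal sketch of generating a minute-level resource time axis from a worker shift calendar and a production-line production calendar, as described for S2; the calendar format, combination rule and function names are assumptions, not part of the disclosure:

```python
def minutes_available(calendar, horizon_min):
    """Expand a calendar given as (start_min, end_min) shifts into a per-minute
    0/1 availability array over the scheduling horizon."""
    available = [0] * horizon_min
    for start, end in calendar:
        for t in range(max(0, start), min(horizon_min, end)):
            available[t] = 1
    return available

def resource_time_axis(worker_shifts, line_calendar, horizon_min):
    """Per-minute capacity of a production line: the line must be open and at
    least one worker on shift (a simplified combination rule)."""
    workers = minutes_available(worker_shifts, horizon_min)
    line_open = minutes_available(line_calendar, horizon_min)
    return [w * l for w, l in zip(workers, line_open)]

# example: one 8-hour shift (minutes 0-480) on a line that closes for 30 minutes at noon
axis = resource_time_axis([(0, 480)], [(0, 240), (270, 480)], horizon_min=480)
print(sum(axis))   # -> 450 available minutes
```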
In S3, an Asynchronous Advantage Actor Critic (A3C) model is used, which requires the maximum reward value and the maximum entropy of each selected action, so that the strategy is randomized and the output probability of each action is dispersed as much as possible rather than concentrated on one action; the scheduling target of the deep neural network in the target policy network at the current moment is obtained by processing the production line, process and capacity state feature vectors and feeding them into a classification function, which yields the selection probability corresponding to each optimization target.
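One common way to realize the "classification function" above is a softmax over scores for each optimization target; the sketch below is an assumption about that choice, with illustrative feature dimensions and a single linear scoring layer standing in for the trained network:

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def target_probabilities(line_feat, process_feat, capacity_feat, W, b):
    """Score each optimization target (e.g. makespan, tardiness, idle time) from the
    concatenated state features and normalize the scores into selection probabilities."""
    state = np.concatenate([line_feat, process_feat, capacity_feat])
    scores = W @ state + b
    return softmax(scores)

# example with 3 candidate optimization targets and a 6-dimensional state
rng = np.random.default_rng(0)
probs = target_probabilities(rng.normal(size=2), rng.normal(size=2), rng.normal(size=2),
                             W=rng.normal(size=(3, 6)), b=np.zeros(3))
print(probs, probs.sum())           # probabilities over the 3 targets, summing to 1
```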
Referring to fig. 1 and 4, for the constraints described in S1, the earliest start time and the expected end time of each process are introduced, and all constraints are converted into time axes on the processes and production lines for unified control and updating.
S4 comprises the following steps:
S41: first, for processes with preceding-process requirements, the start time is initialized to a large value;
S42: when all preceding processes are completed, the start time is updated to the maximum ending time of all the preceding processes;
S43: second, for cases with a minimum waiting time or maximum waiting time constraint, after the preceding process finishes, the start time is updated to the ending time of the preceding process plus the minimum/maximum waiting time.
In S5, since the deep reinforcement learning model needs to update the reward value repeatedly, each order should be completed before its delivery date as far as possible, and the importance of the order must also be taken into account, the total available time is divided into an available time for each process according to a certain rule, and the reward function is designed according to static attributes such as whether the order is urgent.
Referring to fig. 1 and 3, the off-line training includes the steps of:
S01: treating each production line as an intelligent agent and generating its target policy network;
S02: updating the reward function network used for the reward value;
S03: storing the state feature vector of each intermediate state, and initializing the parameters of each network.
In each training cycle, a new training environment is randomly generated and A3C is used to pre-train all intelligent agents offline. An optimal process-production line allocation scheme is generated from the target policy network of each agent; for each production line, target decision states such as the latest end time, the idle-time ratio, and whether each process's end time is later than expected are considered to generate the reward function network; the target state value network and the state feature vector of the target agent are updated through a mean squared error (MSE) loss function; and this process continues until the allocation schemes of all processes finally meet the use requirements.
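The MSE update of the target state-value network from stored intermediate-state feature vectors could look roughly as follows; PyTorch, the network sizes and the variable names are assumptions, not part of the disclosure:

```python
import torch
import torch.nn as nn

# value network of one production-line agent (sizes are illustrative)
value_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
mse = nn.MSELoss()

def update_value_network(state_features, observed_returns):
    """One mean-squared-error update of the target state-value network from a batch
    of stored intermediate-state feature vectors and their observed returns."""
    predicted = value_net(state_features).squeeze(-1)
    loss = mse(predicted, observed_returns)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# example batch: 16 stored state feature vectors (dimension 8) and their returns
states = torch.randn(16, 8)
returns = torch.randn(16)
print(update_value_network(states, returns))
```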
Referring to fig. 2, in the intelligent scheduling process, the scheduling feature vector of the production lines at the current moment is read first, and the currently executable process vectors are screened according to the preceding processes and material availability. The production line and process vectors are then taken as input and trained in the deep reinforcement learning agent network to obtain the process-production line assignment rule at the current moment. Next, it is judged whether all processes have been allocated to a production line: if not, the time is updated according to the time-axis movement rule, the agent reward network is updated according to the reward value, and the production lines and processes are updated according to the completed tasks before entering a new process-production line assignment round; if so, it is further judged whether the maximum number of iterations has been reached or the objective function has converged: if yes, the deep reinforcement learning intelligent scheduling result is output; if not, the production line scheduling feature vector is read again, until the deep reinforcement learning intelligent scheduling result is output.
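A highly simplified skeleton of this dispatch loop is sketched below; a trivial earliest-available-line rule stands in for the trained agent network, and all data fields and names are illustrative assumptions:

```python
def dispatch(processes, n_lines, max_iters=1000):
    """Skeleton of the fig. 2 loop: repeatedly screen executable processes,
    assign one to a production line, advance time, and stop when all are assigned."""
    line_free_at = [0] * n_lines                 # per-line time axis
    finished = {}                                # process id -> end time
    remaining = {p["id"]: p for p in processes}
    for _ in range(max_iters):
        if not remaining:                        # all processes allocated to a line
            break
        # screen currently executable processes (all predecessors finished)
        ready = [p for p in remaining.values()
                 if all(q in finished for q in p["pred"])]
        if not ready:
            break                                # the full method would advance time here
        p = ready[0]                             # the agent's assignment rule goes here
        line = min(range(n_lines), key=lambda i: line_free_at[i])
        start = max(line_free_at[line],
                    max([finished[q] for q in p["pred"]], default=0))
        end = start + p["dur"]
        line_free_at[line] = end
        finished[p["id"]] = end
        del remaining[p["id"]]
    return finished

jobs = [{"id": "A", "dur": 30, "pred": []},
        {"id": "B", "dur": 20, "pred": ["A"]},
        {"id": "C", "dur": 25, "pred": []}]
print(dispatch(jobs, n_lines=2))   # -> {'A': 30, 'B': 50, 'C': 55}
```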
In summary, the invention provides an intelligent production scheduling dynamic scheduling method based on deep reinforcement learning, comprising the following steps: S1: reading the orders received by the factory at the current moment, the material quantities, the worker shift calendars, and the production calendars of the production lines; S2: processing the read raw data, and distinguishing short-term plans from long-term plans according to the order delivery dates and the availability of the required materials; S3: building a deep reinforcement learning framework, and inputting and training production line, process and capacity feature vectors to obtain the target policy network of the target intelligent agent; S4: considering the starting time and ending time of each process and the time calendar of each production line and machine; S5: splitting the order ending time into the individual processes. The deep reinforcement learning framework uses an Asynchronous Advantage Actor Critic (A3C) model, which requires not only the maximum reward value but also the maximum entropy of each selected action; this randomizes the strategy so that the output probability of each action is dispersed as much as possible rather than concentrated on one action, the solving speed of the A3C deep learning framework is high, and the use requirement of a factory performing intelligent scheduling twice a day can be supported. The method considers the starting time and ending time of each process and the time calendar of each production line and machine, introduces the earliest start time and expected end time of each process, and converts all constraints into time axes on the processes and production lines for unified control and updating: first, for processes with preceding-process requirements, the start time is initialized to a large value, and when all preceding processes are completed, the start time is updated to the maximum ending time of all the preceding processes; second, for cases with a minimum or maximum waiting time constraint, after the preceding process finishes, the start time is updated to the ending time of the preceding process plus the minimum/maximum waiting time. In this way the real requirements of a factory can be fully considered: there is a certain serial-parallel order among processes, switching machines or materials on a production line requires a certain equipment changeover time, serial processes usually have minimum or maximum waiting time constraints, and each order has a deadline and a priority. Finally, the read raw data is processed, and short-term plans are distinguished from long-term plans according to the order delivery dates and the availability of required materials: the short-term plan requires fine scheduling in units of minutes, and all processes of this part of the orders are fully assigned to production lines; the orders of the long-term plan only need an evaluation of resources such as material quantities, production lines, machines and capacity, with an early warning given when a resource bottleneck exists and a rough scheduling result in units of days provided otherwise; the numbers of workers, machines and production lines at each time node are then calculated, and a resource time axis in units of minutes is generated by combining the worker shift calendars and the production-line production calendars. Orders can thus be distinguished in line with Advanced Planning and Scheduling (APS), with short-term and medium-term plans modeled separately while guaranteeing both the accuracy of the short-term plan and the fast solution of the long-term plan, which can greatly shorten the time needed to solve the factory's intelligent scheduling problem.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. The intelligent scheduling dynamic scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
S1: reading the orders received by the factory at the current moment, the material quantities, the worker shift calendars, and the production calendars of the production lines;
S2: processing the read raw data, and distinguishing short-term plans from long-term plans according to the order delivery dates and the availability of the required materials;
S3: building a deep reinforcement learning framework, and inputting and training production line, process and capacity feature vectors to obtain the target policy network of the target intelligent agent;
S4: considering the starting time and ending time of each process and the time calendar of each production line and machine;
S5: splitting the order ending time into the individual processes.
2. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 1, wherein: in S1, the order data includes the quantity of each required product and the product delivery deadline; each product needs to go through several processes, the processes have a certain serial-parallel order, switching machines or materials on a production line requires a certain equipment changeover time, and there is usually a minimum waiting time or maximum waiting time constraint between serial processes.
3. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 2, wherein: in S2, the short-term plan requires fine scheduling in units of minutes, and all processes of this part of the orders are fully assigned to production lines; the orders of the long-term plan only need an evaluation of material quantities, production lines, machines and capacity resources, with an early warning given when a resource bottleneck exists and a rough scheduling result in units of days provided when there is no bottleneck; the numbers of workers, machines and production lines at each time node are calculated, and a resource time axis in units of minutes is generated by combining the worker shift calendars and the production-line production calendars.
4. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 3, wherein: in S3, an Asynchronous Advantage Actor Critic (A3C) model is used; in addition to requiring the maximum reward value, the maximum entropy of each selected action is also required, so that the strategy is randomized and the output probability of each action is dispersed as much as possible rather than concentrated on one action.
5. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 4, wherein: in S3, the scheduling target of the deep neural network in the target policy network at the current moment is obtained, and the production line, process and capacity state feature vectors are processed and then fed into a classification function to obtain the selection probability corresponding to each optimization target.
6. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 5, wherein: for the constraints described in S1, the earliest start time and the expected end time of each process are introduced, and all constraints are converted into time axes on the processes and production lines for unified control and updating.
7. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 6, wherein S4 comprises the following steps:
S41: first, for processes with preceding-process requirements, the start time is initialized to a large value;
S42: when all preceding processes are completed, the start time is updated to the maximum ending time of all the preceding processes;
S43: second, for cases with a minimum waiting time or maximum waiting time constraint, after the preceding process finishes, the start time is updated to the ending time of the preceding process plus the minimum/maximum waiting time.
8. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 7, wherein: in S5, since the deep reinforcement learning model needs to update the reward value repeatedly, each order should be completed before its delivery date as far as possible, and the importance of the order must also be taken into account, the total available time is divided into an available time for each process according to a certain rule, and the reward function is designed according to static attributes such as whether the order is urgent.
9. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 8, wherein: the off-line training comprises the following steps:
S01: treating each production line as an intelligent agent and generating its target policy network;
S02: updating the reward function network used for the reward value;
S03: storing the state feature vector of each intermediate state, and initializing the parameters of each network.
In each training cycle, a new training environment is randomly generated and A3C is used to pre-train all intelligent agents offline; an optimal process-production line allocation scheme is generated from the target policy network of each agent; a reward function network is generated according to target decision states considered for each production line, such as the latest end time, the idle-time ratio and whether the process end time is later than expected; the target state value network and the state feature vector of the target agent are updated through a mean squared error (MSE) loss function; and this process continues until the allocation schemes of all processes finally meet the use requirements.
10. The intelligent production scheduling dynamic scheduling method based on deep reinforcement learning of claim 9, wherein: in the intelligent scheduling process, the scheduling feature vector of the production lines at the current moment is read first, and the currently executable process vectors are screened according to the preceding processes and material availability; the production line and process vectors are then taken as input and trained in the deep reinforcement learning agent network to obtain the process-production line assignment rule at the current moment; it is then judged whether all processes have been allocated to a production line: if not, the time is updated according to the time-axis movement rule, the agent reward network is updated according to the reward value, and the production lines and processes are updated according to the completed tasks before entering a new process-production line assignment round; if so, it is further judged whether the maximum number of iterations has been reached or the objective function has converged: if yes, the deep reinforcement learning intelligent scheduling result is output; if not, the production line scheduling feature vector is read again, until the deep reinforcement learning intelligent scheduling result is output.
CN202111390067.7A 2021-11-22 2021-11-22 Intelligent scheduling dynamic scheduling method based on deep reinforcement learning Pending CN114154821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111390067.7A CN114154821A (en) 2021-11-22 2021-11-22 Intelligent scheduling dynamic scheduling method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111390067.7A CN114154821A (en) 2021-11-22 2021-11-22 Intelligent scheduling dynamic scheduling method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114154821A (en) 2022-03-08

Family

ID=80457290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111390067.7A Pending CN114154821A (en) 2021-11-22 2021-11-22 Intelligent scheduling dynamic scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114154821A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115167136A (en) * 2022-07-21 2022-10-11 中国人民解放军国防科技大学 Intelligent agent control method based on deep reinforcement learning and conditional entropy bottleneck
CN116307251A (en) * 2023-04-12 2023-06-23 哈尔滨理工大学 Work schedule optimization method based on reinforcement learning
CN116307251B (en) * 2023-04-12 2023-09-19 哈尔滨理工大学 Work schedule optimization method based on reinforcement learning
CN116151599A (en) * 2023-04-21 2023-05-23 湖南维胜科技有限公司 Scheduling data processing method based on deep reinforcement learning
CN117391423A (en) * 2023-12-11 2024-01-12 东北大学 Multi-constraint automatic scheduling method for chip high multilayer ceramic package substrate production line
CN117391423B (en) * 2023-12-11 2024-03-22 东北大学 Multi-constraint automatic scheduling method for chip high multilayer ceramic package substrate production line
CN117634859A (en) * 2024-01-26 2024-03-01 清云小筑(北京)创新技术有限公司 Resource balance construction scheduling method, device and equipment based on deep reinforcement learning
CN117631633A (en) * 2024-01-26 2024-03-01 四川琪达实业集团有限公司 Flexible control system and method for clothing customization production line
CN117634859B (en) * 2024-01-26 2024-04-12 清云小筑(北京)创新技术有限公司 Resource balance construction scheduling method, device and equipment based on deep reinforcement learning
CN117631633B (en) * 2024-01-26 2024-04-19 四川琪达实业集团有限公司 Flexible control system and method for clothing customization production line

Similar Documents

Publication Publication Date Title
CN114154821A (en) Intelligent scheduling dynamic scheduling method based on deep reinforcement learning
Nakasuka et al. Dynamic scheduling system utilizing machine learning as a knowledge acquisition tool
Wang et al. Application of reinforcement learning for agent-based production scheduling
Jones et al. Survey of job shop scheduling techniques
US20210278825A1 (en) Real-Time Production Scheduling with Deep Reinforcement Learning and Monte Carlo Tree Research
O'GRADY et al. An intelligent cell control system for automated manufacturing
Chiu et al. A learning-based methodology for dynamic scheduling in distributed manufacturing systems
CN109270904A (en) A kind of flexible job shop batch dynamic dispatching optimization method
Littman et al. Reinforcement learning: A survey
Qu et al. A centralized reinforcement learning approach for proactive scheduling in manufacturing
CN114503038A (en) Method and apparatus for self-learning manufacturing schedule for flexible manufacturing system using state matrix
McAllister et al. Rescheduling penalties for economic model predictive control and closed-loop scheduling
Palacio et al. A Q-Learning algorithm for flexible job shop scheduling in a real-world manufacturing scenario
CN112488542B (en) Intelligent material scheduling method and system for intelligent building site based on machine learning
Rovithakis et al. Application of a neural-network scheduler on a real manufacturing system
Eberts et al. Distributed planning of collaborative production
Varghese et al. Dynamic spatial block arrangement scheduling in shipbuilding industry using genetic algorithm
Asadi-Zonouz et al. A hybrid unconscious search algorithm for mixed-model assembly line balancing problem with SDST, parallel workstation and learning effect
Libosvar Hierarchies in production management and control: A survey
Michelini et al. Integrated management of concurrent shopfloor operations
Kádár Intelligent approaches to manage changes and disturbances in manufacturing systems
Martinez Solving batch process scheduling/planning tasks using reinforcement learning
Workneh et al. Deep Q Network Method for Dynamic Job Shop Scheduling Problem
WO2024028485A1 (en) Artificial intelligence control and optimization of agent tasks in a warehouse
Sanoff et al. Integrated information processing for production scheduling and control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination