CN116485119A - Scheduling method, scheduling device, scheduling equipment and storage medium - Google Patents
Scheduling method, scheduling device, scheduling equipment and storage medium
- Publication number: CN116485119A
- Application number: CN202310358190.3A
- Authority: CN (China)
- Prior art keywords: training, scheduling, scheduled, information, equipment
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06316—Sequencing of tasks or work
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a scheduling method, device, equipment and storage medium. The scheduling method comprises: determining information to be scheduled, where the information to be scheduled indicates information about the equipment to be scheduled and information about the workpieces to be scheduled; initializing a Tetris online environment according to the information to be scheduled; and interacting with the Tetris online environment according to a pre-trained scheduling model to obtain a scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled. The scheduling model is trained with a deep reinforcement learning method using the balanced equipment utilization rate as the index, and the scheduling plan for each piece of equipment to be scheduled comprises information about the workpieces assigned to that equipment and the processing order of those workpieces. The scheduling scheme of the embodiments of the invention can improve both the balanced equipment utilization rate and the scheduling efficiency.
Description
Technical Field
The present invention relates to the field of intelligent scheduling technologies, and in particular to a scheduling method, device, equipment and storage medium.
Background
As market competition intensifies, processing and production plants must adjust to the actual capacity of the enterprise's manufacturing resources and to dynamic changes in equipment capacity, inventory and production progress. Scheduling optimization and monitoring of the manufacturing process have therefore become indispensable for improving an enterprise's core competitiveness.
As production scale and product complexity keep increasing, manual scheduling is less and less able to meet the demands of production scheduling, while traditional solver-based methods and genetic algorithms suffer from problems such as excessively long scheduling computation time and a low balanced equipment utilization rate.
Disclosure of Invention
The invention provides a scheduling method, device, equipment and storage medium to solve the technical problems of excessively long scheduling computation time and low balanced equipment utilization rate in the related art.
According to an aspect of the present invention, there is provided a scheduling method including:
determining information to be scheduled, where the information to be scheduled indicates information about the equipment to be scheduled and information about the workpieces to be scheduled;
initializing a Tetris online environment according to the information to be scheduled;
interacting with the Tetris online environment according to a pre-trained scheduling model to obtain a scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled; the scheduling model is trained with a deep reinforcement learning method using the balanced equipment utilization rate as the index, and the scheduling plan for each piece of equipment to be scheduled includes information about the workpieces to be scheduled that are assigned to it and the processing order of those workpieces.
In the method described above, interacting with the Tetris online environment according to the pre-trained scheduling model to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled includes:
acquiring intermediate state information output by the Tetris online environment according to the information to be scheduled;
inputting the intermediate state information into the scheduling model; acquiring a policy action generated by the scheduling model according to the intermediate state information; feeding the policy action back to the Tetris online environment; acquiring new intermediate state information output by the Tetris online environment according to the policy action; repeating this step until all workpieces to be scheduled have been scheduled; and determining the scheduling plan for each piece of equipment to be scheduled from the policy actions output by the scheduling model.
In the method described above, the intermediate state information includes: the type of the currently scheduled workpiece, the position of the currently scheduled workpiece, the load of each piece of equipment to be scheduled, information about the currently unscheduled workpieces, and information about the most recent action;
the policy action includes: the identifier of the selected workpiece that has not yet been scheduled, and the identifier of the equipment to be scheduled that corresponds to that workpiece.
In the method described above, the information about the equipment to be scheduled includes the number and types of the equipment to be scheduled, and the information about the workpieces to be scheduled includes the number and types of the workpieces to be scheduled;
the information to be scheduled further includes: a mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
In the method described above, when two workpieces to be scheduled that are adjacent in the processing order are of different types, the scheduling plan further includes a changeover operation between those two adjacent workpieces.
In the method described above, before interacting with the Tetris online environment according to the pre-trained scheduling model to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled, the method further includes:
initializing a Tetris training environment according to scheduling information for training;
acquiring new training intermediate state information output by the Tetris training environment according to the scheduling information for training;
inputting the new training intermediate state information into a training scheduling model; acquiring a training policy action that is either generated randomly or generated by the training scheduling model according to the new training intermediate state information; feeding the training policy action back to the Tetris training environment; acquiring updated new training intermediate state information output by the Tetris training environment according to the training policy action; updating the training scheduling model at a set frequency with a stochastic gradient descent algorithm according to the training intermediate features output by the Tetris training environment; and repeating this step until the training scheduling model meets a preset design requirement, the training scheduling model that meets the preset design requirement being used as the scheduling model;
wherein the training intermediate features comprise: the new training intermediate state information, the old training intermediate state information, a reward value and the current policy action, the reward value being determined according to the balanced equipment utilization rate.
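The training step above can be sketched in Python as a data-collection loop. This is a minimal sketch under stated assumptions, not the patented implementation: the function and parameter names (`collect_and_train`, `env_reset`, `env_step`, `legal_actions`, `model_action`, `update_model`, `epsilon`, `update_every`, `batch_size`) are all hypothetical, the choice between a random action and a model action is made with a fixed probability `epsilon`, a fixed episode count stands in for the "preset design requirement", and the stochastic gradient descent update itself is left to the caller-supplied `update_model` callback.

```python
import random
from collections import deque, namedtuple

# One "training intermediate feature": old state, action, reward, new state.
Transition = namedtuple("Transition", "old_state action reward new_state")


def collect_and_train(env_reset, env_step, legal_actions, model_action,
                      update_model, episodes=10, epsilon=0.1,
                      update_every=4, batch_size=8, seed=0):
    """Collect transitions and call update_model at a set frequency.

    All callables are supplied by the caller (hypothetical interface):
      env_reset() -> state
      env_step(state, action) -> (new_state, reward, done)
      legal_actions(state) -> non-empty list of actions
      model_action(state, actions) -> action chosen by the training model
      update_model(batch) -> performs one update (e.g. one SGD step)
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)   # stored training intermediate features
    steps = 0
    updates = 0
    for _ in range(episodes):       # stand-in for "until the design requirement is met"
        state = env_reset()
        done = False
        while not done:
            actions = legal_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)            # randomly generated action
            else:
                action = model_action(state, actions)   # model-generated action
            new_state, reward, done = env_step(state, action)
            buffer.append(Transition(state, action, reward, new_state))
            state = new_state
            steps += 1
            # Update the model at a set frequency once enough data exists.
            if steps % update_every == 0 and len(buffer) >= batch_size:
                update_model(rng.sample(list(buffer), batch_size))
                updates += 1
    return buffer, updates
```

Passing the environment and the model as plain callables keeps the sketch independent of any particular reinforcement learning library; in practice `update_model` would wrap whatever optimizer the training scheduling model uses.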
In the method described above, the reward value is calculated as follows:
when the balanced equipment utilization rate U1 before the current policy action is greater than the balanced equipment utilization rate U2 after the current policy action, the reward value is a first value; wherein U1 = (number of changeovers in the old training intermediate state, combined with the equipment load difference of the old training intermediate state) / number of workpieces already scheduled in the old training intermediate state, and U2 = (number of changeovers in the new training intermediate state, combined with the equipment load difference of the new training intermediate state) / number of workpieces already scheduled in the new training intermediate state;
when U1 is equal to U2, the reward value is a second value;
when U1 is less than U2, the reward value is a third value;
the magnitude relation among the first value, the second value and the third value is: the first value > the second value > the third value.
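The three-way reward rule above can be illustrated with a short Python sketch. Several points are assumptions rather than facts from the text: the numerator of U is taken here to be the product of the changeover count and the equipment load difference, the concrete reward values 1.0, 0.0 and -1.0 are placeholders (the text fixes only their ordering), and the function names `balance_metric` and `reward_value` are hypothetical.

```python
def balance_metric(changeovers, load_difference, scheduled_count):
    """Return U for one intermediate state.

    ASSUMPTION: the numerator is changeovers * load_difference; the source
    only states that both quantities enter the numerator.
    """
    if scheduled_count <= 0:
        raise ValueError("at least one workpiece must already be scheduled")
    return changeovers * load_difference / scheduled_count


def reward_value(u_before, u_after, first=1.0, second=0.0, third=-1.0):
    """Map the change in U to a reward; only first > second > third is given."""
    if not first > second > third:
        raise ValueError("reward values must satisfy first > second > third")
    if u_before > u_after:       # U1 > U2: the action lowered U
        return first
    if u_before == u_after:      # U1 == U2 (exact comparison, as in the text)
        return second
    return third                 # U1 < U2: the action raised U
```

Because the highest reward is given when U decreases, U behaves as a quantity to be driven down: fewer changeovers and a smaller load difference per scheduled workpiece.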
According to another aspect of the present invention, there is provided a scheduling apparatus comprising:
a first determining module, configured to determine information to be scheduled, where the information to be scheduled indicates information about the equipment to be scheduled and information about the workpieces to be scheduled;
an initialization module, configured to initialize a Tetris online environment according to the information to be scheduled;
a second determining module, configured to interact with the Tetris online environment according to a pre-trained scheduling model to obtain a scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled; the scheduling model is trained with a deep reinforcement learning method using the balanced equipment utilization rate as the index, and the scheduling plan for each piece of equipment to be scheduled includes information about the workpieces to be scheduled that are assigned to it and the processing order of those workpieces.
According to still another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the scheduling method of any one of the embodiments of the present invention.
According to yet another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute the scheduling method according to any one of the embodiments of the present invention.
According to the technical solution of the embodiments of the invention, a scheduling model, trained in advance by a deep reinforcement learning method with improving the balanced equipment utilization rate as the index, interacts with the Tetris game to produce a scheduling plan for each piece of equipment to be scheduled. This applies a scheduling algorithm that combines deep reinforcement learning with the Tetris game to the field of production scheduling. Because the scheduling model is trained with improving the balanced equipment utilization rate as the index, the scheduling plans it outputs can improve the balanced equipment utilization rate; and because the scheduling process is carried out by artificial intelligence, scheduling efficiency is also higher.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a scheduling method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a scheduling plan output by the scheduling method according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of the interaction between the scheduling model and the Tetris online environment;
FIG. 4 is a flow chart of a scheduling method according to a second embodiment of the present invention;
FIG. 5 is a schematic diagram of the game interface before and after the current policy action in the embodiment shown in FIG. 4;
FIG. 6 is a flow chart of a specific training method for the scheduling model in the embodiment shown in FIG. 4;
FIG. 7 is a framework diagram of the principle of deep reinforcement learning;
FIG. 8 is a schematic structural diagram of a scheduling device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another scheduling device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device implementing a scheduling method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "on-line," "training," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
FIG. 1 is a flow chart of a scheduling method according to an embodiment of the present invention. The method is applicable to scheduling workpieces in a factory and may be performed by a scheduling device. The scheduling device may be implemented in hardware and/or software and may be configured in a computer device such as a server. As shown in FIG. 1, the method comprises the following steps:
Step 101: determine the information to be scheduled.
The information to be scheduled indicates information about the equipment to be scheduled and information about the workpieces to be scheduled.
The scheduling method of this embodiment can be used to schedule workpieces in a factory. For example, the factory may be an auto parts processing factory, a household appliance parts processing factory, or the like. Scheduling in this embodiment means determining which workpieces to be scheduled are processed on each piece of equipment to be scheduled, and in what order.
Optionally, the information about the equipment to be scheduled in this embodiment includes the number and types of the equipment to be scheduled. The information about the workpieces to be scheduled includes the number and types of the workpieces to be scheduled.
Optionally, the information to be scheduled further includes a mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
It can be understood that when the equipment to be scheduled is general-purpose equipment, the information to be scheduled need not include the mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
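As an illustration of what the information to be scheduled might look like in code, the following Python sketch groups the equipment, the workpieces and the optional mapping relation into one container. The class name `SchedulingInput`, its field names and the example equipment types are hypothetical and are not taken from the patent; an empty mapping is treated here as the general-purpose case in which any equipment may process any workpiece.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SchedulingInput:
    """Hypothetical container for the information to be scheduled."""
    equipment: dict                 # equipment id -> equipment type
    workpieces: Counter             # workpiece type -> quantity
    # Optional mapping relation: workpiece type -> ids of equipment able to
    # process it. Types without an entry may go to any equipment.
    compatibility: dict = field(default_factory=dict)

    def eligible_equipment(self, workpiece_type):
        """Return the set of equipment ids allowed for this workpiece type."""
        if workpiece_type not in self.workpieces:
            raise KeyError(f"unknown workpiece type: {workpiece_type}")
        return set(self.compatibility.get(workpiece_type, self.equipment))


# Example: three pieces of equipment, three workpiece types; only type C
# is restricted to specific equipment.
info = SchedulingInput(
    equipment={"E1": "press", "E2": "press", "E3": "lathe"},
    workpieces=Counter({"A": 2, "B": 2, "C": 1}),
    compatibility={"C": {"E3"}},
)
```

Keeping the mapping optional mirrors the text: with general-purpose equipment the mapping can simply be omitted.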
Step 102: initialize the Tetris online environment according to the information to be scheduled.
Step 103: interact with the Tetris online environment according to the pre-trained scheduling model to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled.
The scheduling model is trained with a deep reinforcement learning method using the balanced equipment utilization rate as the index. The scheduling plan for each piece of equipment to be scheduled includes information about the workpieces to be scheduled that are assigned to it and the processing order of those workpieces.
Modern processing and production enterprises have many production stages, complex collaboration relationships, highly continuous production, a need for balanced production, low equipment utilization and rapidly changing conditions. How to study and analyze the factors that affect production under these conditions, and how to adopt a suitable scheduling plan for each situation, is a technical problem currently faced.
Tetris (literally "Russian blocks") is a classic video game whose basic rule is to move, rotate and place the various pieces that the game automatically generates so that they form one or more complete rows, which are then cleared to score points. Its goal of improving the utilization of the playing field closely resembles the goal of improving the balanced equipment utilization rate in a factory. Therefore, this embodiment performs scheduling with a Tetris online environment and a scheduling model trained by a deep reinforcement learning method.
In step 102, the Tetris online environment may be initialized based on a Tetris generation script and the information to be scheduled.
In step 103, the pre-trained scheduling model interacts with the Tetris online environment to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled. The scheduling model is trained with a deep reinforcement learning method using improvement of the balanced equipment utilization rate as the index. The scheduling plans it outputs can therefore improve the balanced equipment utilization rate, and because the scheduling process is carried out by artificial intelligence, scheduling efficiency is also higher.
Deep reinforcement learning, an artificial intelligence technology, combines the perception capability of deep learning with the decision-making capability of reinforcement learning. It can produce control decisions directly from the features of the input data and is an artificial intelligence method that comes closer to the way humans think.
Deep learning has strong perception capability but lacks decision-making capability, whereas reinforcement learning has decision-making capability but does not address perception. Combining the two makes their strengths complementary and offers an approach to the perception and decision-making problems of complex systems.
In this embodiment, applying an algorithm that combines the Tetris game with deep reinforcement learning to production scheduling coordinates internal and external resources and organizes scheduling efficiently. This renews the traditional approach to solving scheduling problems, brings recent artificial intelligence research results into actual industrial production, and improves both the efficiency of the scheduling algorithm and the balanced equipment utilization rate. It addresses the low equipment utilization and unbalanced production faced by current scheduling algorithms, and helps manufacturers reduce production costs, shorten production cycles and deliver on time.
Optionally, when two workpieces to be scheduled that are adjacent in the processing order are of different types, the scheduling plan further includes a changeover operation between those two adjacent workpieces.
FIG. 2 is a schematic diagram of a scheduling plan output by the scheduling method according to an embodiment of the invention. The scheduling plan may be presented in the form of a Tetris game interface. As shown in FIG. 2, equipment 1 to 7 to be scheduled is shown along the bottom of the game interface, and A, B and C denote different types of workpieces. The output scheduling plan includes the number, types and processing order of the workpieces to be scheduled assigned to each piece of equipment to be scheduled. Taking equipment 1 in FIG. 2 as an example, it is assigned four workpieces to be scheduled: two workpieces of type A and two workpieces of type B. Because workpieces A and B are of different types, a changeover operation is included between an adjacent workpiece A and workpiece B.
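The changeover rule can be illustrated with a few lines of Python. This is a sketch only: the marker string `"changeover"` and the function name `with_changeovers` are hypothetical, and the processing order A, A, B, B for equipment 1 is an assumption, since the text states only that equipment 1 holds two workpieces of type A and two of type B.

```python
CHANGEOVER = "changeover"  # hypothetical marker, not a name from the source


def with_changeovers(sequence):
    """Insert a changeover marker between adjacent workpieces of different types."""
    plan = []
    for workpiece_type in sequence:
        # plan[-1] is always a workpiece type here, never the marker,
        # because every marker is immediately followed by a workpiece.
        if plan and plan[-1] != workpiece_type:
            plan.append(CHANGEOVER)
        plan.append(workpiece_type)
    return plan


# Equipment 1 from FIG. 2, with an assumed processing order.
equipment_1_plan = with_changeovers(["A", "A", "B", "B"])
# equipment_1_plan == ["A", "A", "changeover", "B", "B"]
```

A sequence that alternates types, such as A, B, A, B, would incur three changeovers instead of one, which is why the processing order matters for the equipment load.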
One possible implementation of step 103 is as follows: acquire the intermediate state information output by the Tetris online environment according to the information to be scheduled; input the intermediate state information into the scheduling model; acquire the policy action generated by the scheduling model according to the intermediate state information; feed the policy action back to the Tetris online environment; acquire the new intermediate state information output by the Tetris online environment according to the policy action; repeat this step until all workpieces to be scheduled have been scheduled; and determine the scheduling plan for each piece of equipment to be scheduled from the policy actions output by the scheduling model.
FIG. 3 is a schematic diagram of the interaction between the scheduling model and the Tetris online environment. As shown in FIG. 3, the Tetris online environment supplies intermediate state information to the scheduling model, and the scheduling model feeds policy actions back to the Tetris online environment. This cycle repeats until all workpieces to be scheduled have been scheduled, that is, until every workpiece to be scheduled has been assigned to a piece of equipment to be scheduled. It can be understood that the set of policy actions output by the scheduling model constitutes the scheduling plan for each piece of equipment to be scheduled.
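The interaction cycle of FIG. 3 can be sketched in Python as follows. Everything here is a toy stand-in under stated assumptions: `ToyTetrisEnv` is a hypothetical, greatly simplified environment that tracks only one column per piece of equipment, and `least_loaded_policy` is a hand-written placeholder used where the trained scheduling model would be called. Neither reflects the actual environment or model of the patent.

```python
from collections import Counter


class ToyTetrisEnv:
    """Toy stand-in for the Tetris online environment (hypothetical)."""

    def __init__(self, equipment_ids, workpieces):
        self.columns = {e: [] for e in equipment_ids}  # one column per equipment
        self.pending = Counter(workpieces)             # unscheduled count per type

    def state(self):
        """Return a simplified intermediate state."""
        return {
            "loads": {e: len(col) for e, col in self.columns.items()},
            "pending": dict(self.pending),
        }

    def step(self, action):
        """Apply a policy action (workpiece type, equipment id)."""
        workpiece_type, equipment_id = action
        if self.pending[workpiece_type] <= 0:
            raise ValueError("no unscheduled workpiece of that type")
        self.pending[workpiece_type] -= 1
        self.columns[equipment_id].append(workpiece_type)
        return self.state()

    def done(self):
        return sum(self.pending.values()) == 0


def least_loaded_policy(state):
    """Placeholder for the trained model: next pending type, least-loaded equipment."""
    workpiece_type = next(t for t, n in sorted(state["pending"].items()) if n > 0)
    equipment_id = min(state["loads"], key=lambda e: (state["loads"][e], e))
    return workpiece_type, equipment_id


def run_schedule(env, policy):
    """Loop state -> action -> new state until every workpiece is scheduled."""
    state = env.state()
    actions = []
    while not env.done():
        action = policy(state)
        actions.append(action)
        state = env.step(action)
    return env.columns, actions
```

The returned `columns` dictionary plays the role of the scheduling plan (workpieces per equipment, in processing order), while `actions` is the sequence of policy actions from which that plan was built.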
In one possible implementation, the scheduling model may determine, from the intermediate state information supplied by the Tetris online environment, whether all workpieces to be scheduled have been scheduled.
It should be noted that the scheduling model may output the scheduling plan in real time: each time one workpiece to be scheduled has been assigned to a piece of equipment to be scheduled, that assignment is output, which meets the needs of real-time production. Alternatively, the scheduling model may output the complete scheduling plan after all workpieces to be scheduled have been scheduled.
Optionally, the intermediate state information in this embodiment includes: the type of each currently scheduled workpiece, the position of each currently scheduled workpiece, the load of each equipment to be scheduled, information on the currently unscheduled workpieces, and information on the previous action. The load of each equipment to be scheduled may be represented by two-dimensional picture information of the scheduled workpieces. Here, the two-dimensional picture information refers to the Russian block game interface diagram output by the Russian block online environment, similar to the diagram shown in FIG. 2. The load of each equipment to be scheduled refers to the number of workpieces assigned to that equipment, or a value derived from that number. If changeover operations are included between workpieces, the load of the equipment to be scheduled also includes the number of changeover operations, or a value derived from that number. The position of a currently scheduled workpiece refers to its position in the Russian block game interface diagram output by the Russian block online environment. A coordinate system may be established in the game interface diagram to characterize the position of each scheduled workpiece. The information on the currently unscheduled workpieces may include their types and numbers. The information on the previous action may include the identification of the unscheduled workpiece selected in the previous action and the identification of the equipment to be scheduled corresponding to that workpiece. The actions here refer to strategy actions.
Optionally, the intermediate state information may further include: the mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
Optionally, the strategy action includes: the identification of the selected unscheduled workpiece and the identification of the equipment to be scheduled corresponding to that workpiece. The identification may be a number or the like that uniquely represents the workpiece or the equipment to be scheduled. It will be appreciated that, after receiving a strategy action output by the scheduling model, the Russian block online environment selects the corresponding workpiece and places it, in the manner of the Russian block game, after the end workpiece of the selected equipment for subsequent production. If the new workpiece and the end workpiece are of the same type, production can continue without interruption. If they are not of the same type, a changeover operation is added between the new workpiece and the end workpiece.
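The interaction described above can be illustrated with a minimal sketch. Everything named here is hypothetical and not part of the embodiment: `ToyOnlineEnv` is a toy stand-in for the Russian block online environment, `CHANGEOVER` is an assumed marker for a changeover operation, and `least_loaded_policy` is a hand-written rule that merely takes the place of the trained scheduling model.

```python
CHANGEOVER = "CHANGEOVER"  # assumed marker for a changeover operation


class ToyOnlineEnv:
    """Toy stand-in for the online environment (illustrative only)."""

    def __init__(self, workpieces, equipment_ids):
        self.unscheduled = dict(workpieces)              # workpiece id -> type
        self.queues = {e: [] for e in equipment_ids}     # types and changeovers
        self.plan = {e: [] for e in equipment_ids}       # ordered workpiece ids
        self.last_action = None

    def state(self):
        # A reduced form of the intermediate state information.
        return {
            "loads": {e: len(q) for e, q in self.queues.items()},
            "unscheduled": dict(self.unscheduled),
            "last_action": self.last_action,
        }

    def step(self, action):
        workpiece_id, equipment_id = action
        wp_type = self.unscheduled.pop(workpiece_id)
        queue = self.queues[equipment_id]
        # Add a changeover when the new workpiece differs from the end workpiece.
        if queue and queue[-1] != wp_type:
            queue.append(CHANGEOVER)
        queue.append(wp_type)
        self.plan[equipment_id].append(workpiece_id)
        self.last_action = action
        return self.state()

    def done(self):
        return not self.unscheduled


def least_loaded_policy(state):
    """Placeholder for the scheduling model: lowest id onto least-loaded equipment."""
    workpiece_id = min(state["unscheduled"])
    equipment_id = min(state["loads"], key=lambda e: (state["loads"][e], e))
    return workpiece_id, equipment_id


def run_online(env, policy):
    state = env.state()
    while not env.done():
        state = env.step(policy(state))
    return env.plan, env.queues


plan, queues = run_online(
    ToyOnlineEnv({1: "A", 2: "A", 3: "E"}, ["dev1", "dev2"]), least_loaded_policy
)
# plan == {"dev1": [1, 3], "dev2": [2]}; dev1's queue is ["A", CHANGEOVER, "E"]
```

The collected per-equipment lists play the role of the scheduling plan: each holds the workpieces assigned to one equipment in processing order.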
According to the scheduling method provided by this embodiment, the scheduling plan of each equipment to be scheduled is obtained through interaction between the Russian block online environment and a scheduling model trained in advance by a deep reinforcement learning method with the equipment balance utilization rate as the index. This applies a scheduling algorithm that combines deep reinforcement learning with the Russian block game to the field of production scheduling. Because the scheduling model is trained with improving the equipment balance utilization rate as the index, the scheduling plan it outputs can improve the equipment balance utilization rate; and because the scheduling process is carried out by artificial intelligence, scheduling efficiency is also higher.
FIG. 4 is a flowchart of a scheduling method according to a second embodiment of the present invention. This embodiment describes the scheduling method in detail on the basis of the embodiment shown in FIG. 1 and its various alternative implementations. As shown in FIG. 4, the method comprises the following steps:
Step 401: Train the scheduling model by a deep reinforcement learning method with the equipment balance utilization rate as the index.
FIG. 7 is a schematic diagram of the deep reinforcement learning framework. As shown in FIG. 7, the deep reinforcement learning architecture includes two basic modules: an agent and an environment.
The agent is a user-defined neural network model that decides its next action by perceiving the observation information and the reward fed back by the environment. The environment is a digital simulation of the actual industrial scene. At the initial stage it receives data such as the number of equipment, the product types and the number of workpieces in order to initialize itself; it then updates its internal parameters on receiving each decision action of the agent, and outputs a high-dimensional feature as observation feedback together with a reward value for the action. The environment in this embodiment is the Russian block training environment.
One possible implementation of step 401 is as follows: initialize the Russian block training environment according to the training scheduling information; obtain the new training intermediate state information output by the Russian block training environment according to the training scheduling information; input the new training intermediate state information into the training scheduling model; obtain a randomly generated training strategy action, or a training strategy action generated by the training scheduling model according to the new training intermediate state information; feed the training strategy action back to the Russian block training environment; obtain the updated new training intermediate state information output by the Russian block training environment according to the training strategy action; and, at a set frequency, update the training scheduling model by a stochastic gradient descent algorithm according to the training intermediate features output by the Russian block training environment. These steps are repeated until the training scheduling model meets the preset design requirement, and the training scheduling model that meets the preset design requirement is taken as the scheduling model. The training intermediate features comprise: the new training intermediate state information, the old training intermediate state information, a reward value and the current action strategy, wherein the reward value is determined according to the equipment balance utilization rate.
The training scheduling information in this embodiment refers to the scheduling information used in the training phase. It can be understood that, similar to the embodiment shown in FIG. 1, the training scheduling information may include: the number of training equipment to be scheduled, the types of training equipment to be scheduled, the number of training workpieces to be scheduled, the types of training workpieces to be scheduled, and the mapping relation between the training equipment to be scheduled and the training workpieces to be scheduled.
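As an illustration only, the training scheduling information listed above could be held in a small container such as the one below. The class name, field names and, in particular, the reading of the mapping relation as "which workpiece types each equipment may process" are assumptions made for this sketch; the text does not fix the meaning of the mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSchedulingInfo:
    """Hypothetical container for the data used to initialize the training environment."""

    equipment_types: dict   # equipment id -> equipment type
    workpiece_types: dict   # workpiece id -> workpiece type
    mapping: dict           # equipment id -> set of workpiece types (assumed meaning)

    def __post_init__(self):
        # Reject a mapping that mentions equipment missing from equipment_types.
        unknown = set(self.mapping) - set(self.equipment_types)
        if unknown:
            raise ValueError(f"mapping refers to unknown equipment: {sorted(unknown)}")

    @property
    def equipment_count(self):
        return len(self.equipment_types)

    @property
    def workpiece_count(self):
        return len(self.workpiece_types)

    def allowed_equipment(self, workpiece_id):
        """Equipment ids that may process the given workpiece under the assumed mapping."""
        wp_type = self.workpiece_types[workpiece_id]
        return sorted(e for e, types in self.mapping.items() if wp_type in types)


info = TrainingSchedulingInfo(
    equipment_types={"dev1": "press", "dev2": "press"},
    workpiece_types={1: "A", 2: "E"},
    mapping={"dev1": {"A", "E"}, "dev2": {"A"}},
)
```

The numbers of equipment and workpieces are derived from the dictionaries rather than stored separately, so they cannot disagree with the listed types.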
The Russian block training environment in this embodiment is similar to the Russian block online environment in the embodiment shown in FIG. 1 and is generated in a similar way; the difference lies in the data input during initialization.
Similar to the intermediate state information in the embodiment shown in FIG. 1, the new training intermediate state information may include: the type of each currently scheduled training workpiece, the position of each currently scheduled training workpiece, the load of each training equipment to be scheduled, information on the currently unscheduled training workpieces, and information on the previous training action. Similarly, the new training intermediate state information may further include: the mapping relation between the training workpieces to be scheduled and the training equipment to be scheduled.
Similar to the strategy action in the embodiment shown in FIG. 1, the training strategy action may include: the identification of the selected unscheduled training workpiece and the identification of the training equipment to be scheduled corresponding to that workpiece.
It should be noted that the Russian block training environment inputs the new training intermediate state information to the training scheduling model, after which a training strategy action is obtained. According to a preset exploration rate, the training strategy action is either generated by the training scheduling model from the new training intermediate state information or generated randomly.
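One common way to use an exploration rate is epsilon-greedy selection, sketched below. This is an assumption about how the exploration rate might be applied, not a statement of the embodiment's exact rule; the function name and the representation of the model output as a dictionary of action values are hypothetical.

```python
import random


def select_training_action(action_values, exploration_rate, rng=random):
    """Epsilon-greedy choice among (workpiece id, equipment id) actions.

    action_values: dict mapping each legal action to the value the training
    scheduling model currently estimates for it (assumed representation).
    """
    actions = list(action_values)
    if rng.random() < exploration_rate:
        return rng.choice(actions)                 # randomly generated action
    return max(actions, key=action_values.get)     # action chosen by the model


values = {(1, "dev1"): 0.2, (1, "dev2"): 0.7, (2, "dev1"): -0.1}
greedy = select_training_action(values, exploration_rate=0.0)
# greedy == (1, "dev2")
```

With an exploration rate of 0 the model's preferred action is always used; with a rate of 1 every action is random.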
During the interaction between the training scheduling model and the Russian block training environment, the training scheduling model is updated at a set frequency by a stochastic gradient descent algorithm according to the training intermediate features output by the Russian block training environment. During this interaction, if the training scheduling model has been updated, the updated model is used to output training strategy actions; otherwise, the existing model continues to be used. It should be noted that the process of updating the training scheduling model is independent of the process of its interaction with the Russian block training environment.
Updating the training scheduling model means updating its parameters. The update can be implemented with an existing algorithm, for example the deep Q-learning algorithm (Deep Q-Network, DQN).
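For orientation, the quantity that a DQN-style update drives toward zero is the temporal-difference error. The sketch below computes it for a single transition in plain Python; the discount factor `gamma` and the function names are assumptions for illustration, and a real implementation would backpropagate this error through the neural network with stochastic gradient descent.

```python
def td_target(reward, next_action_values, gamma=0.9, terminal=False):
    """Standard Q-learning target: r + gamma * max over next-state action values."""
    if terminal or not next_action_values:
        return reward
    return reward + gamma * max(next_action_values)


def td_error(current_value, reward, next_action_values, gamma=0.9, terminal=False):
    """Difference between the target and the model's current estimate."""
    return td_target(reward, next_action_values, gamma, terminal) - current_value


target = td_target(1, [0.5, 2.0], gamma=0.5)
# target == 2.0
```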
Updating the training scheduling model requires the training intermediate features output by the Russian block training environment. The training intermediate features comprise: the new training intermediate state information, the old training intermediate state information, a reward value and the current action strategy, wherein the reward value is determined according to the equipment balance utilization rate. The content of the old training intermediate state information is similar to that of the new training intermediate state information and is not described again here. The current action strategy refers to the action strategy corresponding to the new training intermediate state information.
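DQN implementations commonly keep such features in a replay buffer and sample mini-batches from it for each update. The text does not say that the embodiment does so; the buffer below is therefore an assumed, minimal illustration, with hypothetical names throughout.

```python
import random
from collections import deque, namedtuple

# One training intermediate feature: old state, action, reward, new state.
Transition = namedtuple("Transition", ["old_state", "action", "reward", "new_state"])


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def push(self, old_state, action, reward, new_state):
        self._items.append(Transition(old_state, action, reward, new_state))

    def sample(self, batch_size, rng=random):
        # Never ask for more items than the buffer currently holds.
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self):
        return len(self._items)


buffer = ReplayBuffer(capacity=2)
buffer.push("s0", (1, "dev1"), 0, "s1")
buffer.push("s1", (2, "dev2"), 1, "s2")
buffer.push("s2", (3, "dev1"), -1, "s3")   # evicts the first transition
```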
In one possible implementation, the reward value is calculated as follows: when the equipment balance utilization rate U1 before the current action strategy is greater than the equipment balance utilization rate U2 after the current action strategy, the reward value is a first value; when U1 is equal to U2, the reward value is a second value; when U1 is less than U2, the reward value is a third value. Here, U1 = (number of changeovers in the old training intermediate state is the equipment load difference of the old training intermediate state) / number of workpieces already scheduled in the old training intermediate state, and U2 = (number of changeovers in the new training intermediate state is the equipment load difference of the new training intermediate state) / number of workpieces already scheduled in the new training intermediate state. The magnitude relation among the three values is: first value > second value > third value. Illustratively, the first value may be 1, the second value may be 0, and the third value may be -1. In the actual training process, specific values of the first value, the second value and the third value can be selected according to actual conditions.
The training intermediate state in the above formula refers to the state corresponding to the training intermediate state information, which may be characterized by the game interface output by the Russian block training environment. The number of changeovers in the above formula refers to the total number of changeover operations across all equipment to be scheduled. The equipment load difference refers to the difference between the number of workpieces on the equipment to be scheduled with the most workpieces and the number of workpieces on the equipment to be scheduled with the fewest. It should be noted that the number of workpieces on an equipment to be scheduled counts changeover operations as well. The number of workpieces already scheduled refers to the total number of workpieces that have been scheduled.
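The quantities defined above can be sketched as follows. The queue representation and the `CHANGEOVER` marker are assumptions for illustration. The sketch takes U1 and U2 as inputs rather than computing them, so it covers only the three-way comparison and the two ingredients whose definitions are given explicitly.

```python
CHANGEOVER = "CHANGEOVER"  # assumed marker for a changeover operation


def load_difference(queues):
    """Largest minus smallest equipment load; changeovers count toward the load."""
    loads = [len(queue) for queue in queues.values()]
    return max(loads) - min(loads)


def changeover_count(queues):
    """Total number of changeover operations across all equipment."""
    return sum(queue.count(CHANGEOVER) for queue in queues.values())


def reward_value(u_before, u_after, first=1, second=0, third=-1):
    """Three-way reward: first > second > third, using the illustrative values."""
    if u_before > u_after:
        return first
    if u_before == u_after:
        return second
    return third


queues = {"dev1": ["A", CHANGEOVER, "E"], "dev2": ["A"]}
# load_difference(queues) == 2, changeover_count(queues) == 1
```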
FIG. 5 is a schematic diagram of the game interface before and after the current action strategy in the embodiment shown in FIG. 4. As shown in FIG. 5, the upper diagram is the game interface output by the Russian block training environment before the current action strategy, which characterizes the old training intermediate state; the lower diagram is the game interface output by the Russian block training environment after the current action strategy, which characterizes the new training intermediate state. The upper diagram of FIG. 5 also shows how the equipment load difference is determined. Illustratively, the current action strategy places workpiece A after the end workpiece of equipment 4. Since the end workpiece of equipment 4 is workpiece E, which differs in type from workpiece A, a changeover operation is added.
FIG. 6 is a flow chart of a specific training method for the scheduling model in the embodiment shown in FIG. 4. As shown in fig. 6, the training method includes the following steps:
Step 601: Determine the training scheduling information.
The training scheduling information includes: the types and number of training equipment to be scheduled, the types and number of training workpieces to be scheduled, and the mapping relation between the training equipment to be scheduled and the training workpieces to be scheduled.
Step 602: Set the parameters of the Russian block training environment according to the training scheduling information, and initialize the Russian block training environment.
Specifically, the types and number of training equipment to be scheduled, the types and number of training workpieces to be scheduled, and the mapping relation between the training equipment to be scheduled and the training workpieces to be scheduled are set through the interfaces of the Russian block training environment.
Step 603: Obtain the Russian block training environment.
Step 604: The Russian block training environment generates training intermediate features.
It should be noted that, in the first round of training, an action may be randomly generated to interact with the Russian block training environment.
Step 605: Perform the training computation on the training scheduling model.
Step 606: Determine whether the set number of training iterations has been completed.
Step 607: If not, the training scheduling model generates a strategy action, or a strategy action is randomly generated, according to the exploration rate, and the strategy action is output to the Russian block training environment.
Step 608: If the set number of training iterations has been completed, take the training scheduling model as the trained scheduling model.
In the training process of this embodiment, the reward value is determined from the equipment balance utilization rate, so that the scheduling plan output by the trained scheduling model can improve the equipment balance utilization rate.
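The control flow of steps 603 to 608 can be sketched as a loop with a fixed number of training steps and a model update at a set frequency. `StubEnv` and `StubAgent` are hypothetical placeholders that only make the loop runnable; they do not implement the Russian block training environment or a neural network.

```python
import random


class StubEnv:
    """Hypothetical environment: three workpieces, one equipment, reward always 0."""

    def reset(self):
        self.remaining = [1, 2, 3]
        return tuple(self.remaining)

    def legal_actions(self):
        return [(w, "dev1") for w in self.remaining]

    def step(self, action):
        self.remaining.remove(action[0])
        return tuple(self.remaining), 0, not self.remaining


class StubAgent:
    """Hypothetical agent that records transitions and counts its updates."""

    def __init__(self):
        self.memory, self.updates = [], 0

    def act(self, state, legal_actions):
        return legal_actions[0]

    def remember(self, *transition):
        self.memory.append(transition)

    def update(self):
        self.updates += 1   # a real agent would run a gradient step here


def train(env, agent, total_steps, update_every, exploration_rate, rng=random):
    state = env.reset()
    for step in range(1, total_steps + 1):            # step 606: fixed step budget
        if rng.random() < exploration_rate:           # step 607: explore or exploit
            action = rng.choice(env.legal_actions())
        else:
            action = agent.act(state, env.legal_actions())
        new_state, reward, done = env.step(action)    # step 604: new features
        agent.remember(state, action, reward, new_state)
        if step % update_every == 0:                  # step 605: update at set frequency
            agent.update()
        state = env.reset() if done else new_state
    return agent                                      # step 608: trained model


agent = train(StubEnv(), StubAgent(), total_steps=10, update_every=4, exploration_rate=0.0)
# agent.updates == 2 and len(agent.memory) == 10
```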
Step 402: Determine the information to be scheduled.
The information to be scheduled indicates the information of the equipment to be scheduled and the information of the workpieces to be scheduled.
Step 403: Initialize the Russian block online environment according to the information to be scheduled.
Step 404: Interact with the Russian block online environment according to the pre-trained scheduling model to obtain the scheduling plan of each equipment to be scheduled output by the scheduling model.
The scheduling model is trained by a deep reinforcement learning method with the equipment balance utilization rate as the index. The scheduling plan of each equipment to be scheduled includes the information of the workpieces to be scheduled corresponding to that equipment and the processing sequence of those workpieces.
The implementation processes and technical principles of step 402, step 403 and step 404 are similar to those of step 101, step 102 and step 103 respectively, and are not described again here.
The scheduling method provided by this embodiment combines artificial intelligence technology with production scheduling. By using the perception and decision-making capability of deep reinforcement learning, it improves equipment utilization and production balance and shortens the computation time of the scheduling process compared with traditional scheduling algorithms. For large-scale production, equipment utilization is improved while production balance is maintained, so that overall production efficiency is improved.
The embodiments of the invention have the following advantages: by using a scheduling method that combines the Russian block game with deep reinforcement learning, a factory improves the equipment utilization efficiency of its production workshop while keeping the equipment load balanced. The production process of the factory is optimized as a whole, production cost is reduced and economic benefit is improved. The method also promotes the development and application of intelligent factories by combining artificial intelligence with industrial production, raising the level of intelligence in industrial production.
Fig. 8 is a schematic structural diagram of a production scheduling device according to an embodiment of the present invention. The device can be arranged in electronic equipment such as a server. As shown in fig. 8, the scheduling device provided in this embodiment includes the following modules: a first determination module 81, an initialization module 82 and a second determination module 83.
The first determining module 81 is configured to determine information to be scheduled.
The information to be scheduled indicates the information of the equipment to be scheduled and the information of the workpieces to be scheduled.
The initialization module 82 is configured to initialize the Russian block online environment according to the information to be scheduled.
The second determining module 83 is configured to interact with the Russian block online environment according to the pre-trained scheduling model, so as to obtain the scheduling plan of each equipment to be scheduled output by the scheduling model.
The scheduling model is trained by a deep reinforcement learning method with the equipment balance utilization rate as the index. The scheduling plan of each equipment to be scheduled includes the information of the workpieces to be scheduled corresponding to that equipment and the processing sequence of those workpieces.
The second determining module 83 is specifically configured to: obtain the intermediate state information output by the Russian block online environment according to the information to be scheduled; input the intermediate state information into the scheduling model; obtain the strategy action generated by the scheduling model according to the intermediate state information; feed the strategy action back to the Russian block online environment; obtain the new intermediate state information output by the Russian block online environment according to the strategy action; repeat these steps until all workpieces to be scheduled have been scheduled; and determine the scheduling plan of each equipment to be scheduled according to the strategy actions output by the scheduling model.
Optionally, the intermediate state information includes: the type of each currently scheduled workpiece, the position of each currently scheduled workpiece, the load of each equipment to be scheduled, information on the currently unscheduled workpieces, and information on the previous action.
Optionally, the strategy action includes: the identification of the selected unscheduled workpiece and the identification of the equipment to be scheduled corresponding to that workpiece.
Optionally, the information of the equipment to be scheduled includes the number and types of the equipment to be scheduled, and the information of the workpieces to be scheduled includes the number and types of the workpieces to be scheduled.
Optionally, the information to be scheduled further includes: the mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
Optionally, when two workpieces to be scheduled that are adjacent in the processing sequence are of different types, the scheduling plan further includes: a changeover (mold change) operation between the two workpieces that are adjacent in the processing sequence.
The scheduling device provided by the embodiment of the invention can execute the scheduling method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 9 is a schematic structural view of another production scheduling device according to an embodiment of the present invention. As shown in fig. 9, this embodiment further includes a training module 91 based on the embodiment shown in fig. 8 and various alternative implementations.
Optionally, the training module 91 is specifically configured to: initialize the Russian block training environment according to the training scheduling information; obtain the new training intermediate state information output by the Russian block training environment according to the training scheduling information; input the new training intermediate state information into the training scheduling model; obtain a randomly generated training strategy action, or a training strategy action generated by the training scheduling model according to the new training intermediate state information; feed the training strategy action back to the Russian block training environment; obtain the updated new training intermediate state information output by the Russian block training environment according to the training strategy action; at a set frequency, update the training scheduling model by a stochastic gradient descent algorithm according to the training intermediate features output by the Russian block training environment; repeat these steps until the training scheduling model meets the preset design requirement; and take the training scheduling model that meets the preset design requirement as the scheduling model.
The training intermediate features comprise: the new training intermediate state information, the old training intermediate state information, a reward value and the current action strategy, wherein the reward value is determined according to the equipment balance utilization rate.
Optionally, the reward value is calculated as follows: when the equipment balance utilization rate U1 before the current action strategy is greater than the equipment balance utilization rate U2 after the current action strategy, the reward value is a first value; when U1 is equal to U2, the reward value is a second value; when U1 is less than U2, the reward value is a third value.
The magnitude relation among the first value, the second value and the third value is: first value > second value > third value. Here, U1 = (number of changeovers in the old training intermediate state is the equipment load difference of the old training intermediate state) / number of workpieces already scheduled in the old training intermediate state, and U2 = (number of changeovers in the new training intermediate state is the equipment load difference of the new training intermediate state) / number of workpieces already scheduled in the new training intermediate state.
The scheduling device provided by the embodiment of the invention can execute the scheduling method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 10 is a schematic structural diagram of an electronic device implementing a production scheduling method according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 10, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above.
In some embodiments, the scheduling method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the scheduling method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the scheduling method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS (virtual private server) services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited in this respect.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method of scheduling, comprising:
determining information to be scheduled, the information to be scheduled indicating information of the equipment to be scheduled and information of the workpieces to be scheduled;
initializing a Tetris online environment according to the information to be scheduled;
interacting with the Tetris online environment according to a pre-trained scheduling model to obtain a scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled; wherein the scheduling model is trained by a deep reinforcement learning method with the balanced utilization rate of the equipment as the index, and the scheduling plan of each piece of equipment to be scheduled comprises information of the corresponding workpieces to be scheduled and the processing order of those workpieces.
2. The method according to claim 1, wherein the interacting with the Tetris online environment according to the pre-trained scheduling model to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled comprises:
acquiring intermediate state information output by the Tetris online environment according to the information to be scheduled;
inputting the intermediate state information into the scheduling model, acquiring a policy action generated by the scheduling model according to the intermediate state information, feeding the policy action back to the Tetris online environment, and acquiring new intermediate state information output by the Tetris online environment according to the policy action; repeating this step until all workpieces to be scheduled have been scheduled; and determining the scheduling plan of each piece of equipment to be scheduled according to the policy actions output by the scheduling model.
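The interaction loop of claim 2 can be illustrated with a minimal sketch. Everything named here (`ToyEnv`, `greedy_policy`, `build_plan`, the dictionary-based state) is a hypothetical stand-in chosen for illustration: the environment is a toy load tracker rather than the Tetris online environment, and a least-loaded heuristic takes the place of the trained deep reinforcement learning model.

```python
from collections import defaultdict


class ToyEnv:
    """Hypothetical stand-in for the Tetris online environment."""

    def __init__(self, workpieces, capable):
        # workpieces: {workpiece_id: processing_time} (assumed input shape)
        # capable: {workpiece_id: [equipment_id, ...]} (mapping relation)
        self.remaining = dict(workpieces)
        self.capable = capable
        equipment = sorted({e for ids in capable.values() for e in ids})
        self.load = {e: 0 for e in equipment}

    def state(self):
        # Intermediate state: unscheduled workpieces and per-equipment load.
        return {"unscheduled": sorted(self.remaining), "load": dict(self.load)}

    def step(self, action):
        # Apply one policy action and return the new intermediate state.
        workpiece, equipment = action
        if equipment not in self.capable[workpiece]:
            raise ValueError(f"{equipment} cannot process {workpiece}")
        self.load[equipment] += self.remaining.pop(workpiece)
        return self.state()

    def done(self):
        return not self.remaining


def greedy_policy(state, capable):
    """Placeholder for the trained model: least-loaded capable equipment."""
    workpiece = state["unscheduled"][0]
    equipment = min(capable[workpiece], key=lambda e: (state["load"][e], e))
    return workpiece, equipment


def build_plan(env, policy):
    """Repeat state -> action -> new state until every workpiece is scheduled."""
    plan = defaultdict(list)  # equipment_id -> workpieces in processing order
    state = env.state()
    while not env.done():
        action = policy(state, env.capable)
        state = env.step(action)
        plan[action[1]].append(action[0])
    return dict(plan)


workpieces = {"w1": 3, "w2": 2, "w3": 2}
capable = {"w1": ["m1", "m2"], "w2": ["m1", "m2"], "w3": ["m2"]}
plan = build_plan(ToyEnv(workpieces, capable), greedy_policy)
print(plan)  # {'m1': ['w1'], 'm2': ['w2', 'w3']}
```

The per-equipment lists in `plan` correspond to the scheduling plan of each piece of equipment: which workpieces it receives and in what order.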
3. The method of claim 2, wherein the intermediate state information comprises: the type of the workpiece currently being scheduled, the position of the workpiece currently being scheduled, the load of each piece of equipment to be scheduled, information of the workpieces not yet scheduled, and information of the most recent action;
the policy action comprises: an identifier of a selected workpiece that has not yet been scheduled, and an identifier of the equipment to be scheduled that corresponds to that workpiece.
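One possible in-memory representation of the state and action fields listed in claim 3 is sketched below. The class and field names are assumptions, and so is the encoding of "position" as an (equipment identifier, start offset) pair; the claim does not fix any of these.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PolicyAction:
    workpiece_id: str  # identifier of the selected, not-yet-scheduled workpiece
    equipment_id: str  # identifier of the equipment assigned to that workpiece


@dataclass
class IntermediateState:
    current_type: Optional[str]                    # type of the workpiece being scheduled
    current_position: Optional[Tuple[str, float]]  # assumed: (equipment, start offset)
    equipment_load: Dict[str, float]               # load of each piece of equipment
    unscheduled: List[str]                         # workpieces not yet scheduled
    last_action: Optional[PolicyAction] = None     # most recent action


def apply_action(state, action, workpiece_type, duration):
    """Return the next state after placing one workpiece (input state is not mutated)."""
    if action.workpiece_id not in state.unscheduled:
        raise ValueError(f"{action.workpiece_id} is not awaiting scheduling")
    load = dict(state.equipment_load)
    start = load[action.equipment_id]
    load[action.equipment_id] = start + duration
    return IntermediateState(
        current_type=workpiece_type,
        current_position=(action.equipment_id, start),
        equipment_load=load,
        unscheduled=[w for w in state.unscheduled if w != action.workpiece_id],
        last_action=action,
    )


s0 = IntermediateState(None, None, {"m1": 0.0, "m2": 0.0}, ["w1", "w2"])
s1 = apply_action(s0, PolicyAction("w1", "m2"), workpiece_type="A", duration=3.0)
print(s1)
```

Returning a fresh state object keeps the old and new states available side by side, which is convenient when both are needed to compute a reward.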
4. A method according to any one of claims 1 to 3, wherein the information of the equipment to be scheduled comprises the number and types of the equipment to be scheduled, and the information of the workpieces to be scheduled comprises the number and types of the workpieces to be scheduled;
the information to be scheduled further comprises: a mapping relation between the equipment to be scheduled and the workpieces to be scheduled.
5. A method according to any one of claims 1 to 3, wherein, when two workpieces to be scheduled that are adjacent in the processing order are of different types, the scheduling plan further comprises: a type changeover operation between the two adjacent workpieces.
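The changeover rule of claim 5 can be sketched as a single pass over one equipment's processing sequence. The function name `insert_changeovers` and the tuple format of the plan entries are hypothetical choices made for this illustration.

```python
def insert_changeovers(sequence, type_of):
    """Insert a changeover entry between adjacent workpieces of different types.

    sequence: workpiece ids in processing order for one piece of equipment.
    type_of:  {workpiece_id: type} lookup (assumed input shape).
    """
    plan = []
    previous = None
    for workpiece in sequence:
        if previous is not None and type_of[previous] != type_of[workpiece]:
            # Adjacent workpieces differ in type, so a changeover is required.
            plan.append(("changeover", type_of[previous], type_of[workpiece]))
        plan.append(("process", workpiece))
        previous = workpiece
    return plan


types = {"w1": "A", "w2": "A", "w3": "B"}
print(insert_changeovers(["w1", "w2", "w3"], types))
# [('process', 'w1'), ('process', 'w2'), ('changeover', 'A', 'B'), ('process', 'w3')]
```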
6. A method according to any one of claims 1 to 3, wherein, before the interacting with the Tetris online environment according to the pre-trained scheduling model to obtain the scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled, the method further comprises:
initializing a Tetris training environment according to training scheduling information;
acquiring new training intermediate state information output by the Tetris training environment according to the training scheduling information;
inputting the new training intermediate state information into a training scheduling model; acquiring a training policy action that is either generated randomly or generated by the training scheduling model according to the new training intermediate state information; feeding the training policy action back to the Tetris training environment; acquiring updated new training intermediate state information output by the Tetris training environment according to the training policy action; updating the training scheduling model at a set frequency, using a stochastic gradient descent algorithm, according to the training intermediate features output by the Tetris training environment; and repeating this step until the training scheduling model meets a preset design requirement, whereupon the training scheduling model is used as the scheduling model;
wherein the training intermediate features comprise: the new training intermediate state information, the old training intermediate state information, a reward value, and the current action policy, the reward value being determined according to the equipment balanced utilization rate.
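The training procedure of claim 6 can be sketched in a heavily simplified form. The sketch below makes several assumptions that are not in the source: a linear Q-function replaces the deep network, a fixed episode count stands in for the "preset design requirement", the reward is a plain load-gap comparison rather than the formula of claim 7, and all names and hyperparameters (`train`, `epsilon`, `update_every`, and so on) are hypothetical.

```python
import random

DURATIONS = [3, 2, 2, 1]  # hypothetical workpiece processing times
N_EQUIPMENT = 2


def features(loads):
    total = sum(loads) or 1
    return [l / total for l in loads] + [1.0]  # normalized loads plus a bias term


def q_values(weights, feats):
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


def reward(old_loads, new_loads):
    # Simplified balance reward (assumption): compare the load gap before and after.
    old_gap = max(old_loads) - min(old_loads)
    new_gap = max(new_loads) - min(new_loads)
    return 1.0 if old_gap > new_gap else (0.0 if old_gap == new_gap else -1.0)


def train(episodes=200, epsilon=0.2, lr=0.05, gamma=0.9,
          update_every=4, batch=8, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * (N_EQUIPMENT + 1) for _ in range(N_EQUIPMENT)]
    buffer, steps = [], 0
    for _ in range(episodes):
        loads = [0] * N_EQUIPMENT  # initialize the training environment
        for i, duration in enumerate(DURATIONS):
            old = features(loads)
            if rng.random() < epsilon:  # randomly generated training action
                action = rng.randrange(N_EQUIPMENT)
            else:  # action generated by the training model
                qs = q_values(weights, old)
                action = qs.index(max(qs))
            new_loads = list(loads)
            new_loads[action] += duration
            done = i == len(DURATIONS) - 1
            # Training intermediate features: old state, action, reward, new state.
            buffer.append((old, action, reward(loads, new_loads),
                           features(new_loads), done))
            loads = new_loads
            steps += 1
            if steps % update_every == 0:  # update at a set frequency
                for s, a, r, s2, d in rng.sample(buffer, min(batch, len(buffer))):
                    target = r if d else r + gamma * max(q_values(weights, s2))
                    error = target - q_values(weights, s)[a]
                    for k, f in enumerate(s):
                        weights[a][k] += lr * error * f  # SGD step on squared TD error
    return weights


def greedy_rollout(weights):
    """Schedule every workpiece with the learned weights; return final loads."""
    loads = [0] * N_EQUIPMENT
    for duration in DURATIONS:
        qs = q_values(weights, features(loads))
        loads[qs.index(max(qs))] += duration
    return loads


weights = train()
print(greedy_rollout(weights))
```

Mixing random and model-generated actions keeps the training model exploring placements it would not otherwise try, while sampling past transitions from a buffer decouples the update frequency from the step at which each transition was observed.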
7. The method of claim 6, wherein the reward value is calculated as follows:
when the equipment balanced utilization rate U1 before the current action policy is greater than the equipment balanced utilization rate U2 after the current action policy, the reward value is a first value; wherein U1 = (number of times of change of the old training intermediate state is the equipment load difference of the old training intermediate state) / (number of workpieces already scheduled in the old training intermediate state), and U2 = (number of times of change of the new training intermediate state is the equipment load difference of the new training intermediate state) / (number of workpieces already scheduled in the new training intermediate state);
when U1 is equal to U2, the reward value is a second value;
when U1 is smaller than U2, the reward value is a third value;
the magnitude relation among the first value, the second value and the third value is as follows: the first value > the second value > the third value.
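The three-way reward rule of claim 7 can be sketched as follows. Two points are assumptions rather than statements of the source: the numerator of U is taken to be the product of the number of type changes and the equipment load difference (the operator is not legible in the text above), and U is defined as 0 when no workpiece has been scheduled yet. The concrete reward values are also hypothetical; the claim only requires first > second > third.

```python
# Hypothetical reward values; the claim only requires FIRST > SECOND > THIRD.
FIRST, SECOND, THIRD = 1.0, 0.0, -1.0


def balance_utilization(changeovers, load_difference, scheduled):
    """Equipment balanced utilization rate U for one intermediate state.

    ASSUMPTION: the numerator multiplies the two terms; the source text does
    not state the operator clearly. ASSUMPTION: U is 0 before anything has
    been scheduled, to avoid dividing by zero.
    """
    if scheduled == 0:
        return 0.0
    return changeovers * load_difference / scheduled


def reward(old_state, new_state):
    u1 = balance_utilization(**old_state)  # before the current action
    u2 = balance_utilization(**new_state)  # after the current action
    if u1 > u2:
        return FIRST
    if u1 == u2:
        return SECOND
    return THIRD


old = {"changeovers": 2, "load_difference": 4, "scheduled": 4}  # U1 = 2.0
new = {"changeovers": 2, "load_difference": 3, "scheduled": 5}  # U2 = 1.2
print(reward(old, new))  # 1.0
```

Under this reading a lower U is better, so an action that reduces U (U1 greater than U2) earns the largest reward.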
8. A scheduling device, comprising:
a first determining module, configured to determine information to be scheduled, the information to be scheduled indicating information of the equipment to be scheduled and information of the workpieces to be scheduled;
an initialization module, configured to initialize a Tetris online environment according to the information to be scheduled;
a second determining module, configured to interact with the Tetris online environment according to a pre-trained scheduling model to obtain a scheduling plan, output by the scheduling model, for each piece of equipment to be scheduled; wherein the scheduling model is trained by a deep reinforcement learning method with the balanced utilization rate of the equipment as the index, and the scheduling plan of each piece of equipment to be scheduled comprises information of the corresponding workpieces to be scheduled and the processing order of those workpieces.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the scheduling method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the scheduling method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310358190.3A CN116485119A (en) | 2023-04-04 | 2023-04-04 | Scheduling method, scheduling device, scheduling equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310358190.3A CN116485119A (en) | 2023-04-04 | 2023-04-04 | Scheduling method, scheduling device, scheduling equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116485119A (en) | 2023-07-25 |
Family
ID=87218689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310358190.3A Pending CN116485119A (en) | 2023-04-04 | 2023-04-04 | Scheduling method, scheduling device, scheduling equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485119A (en) |
2023-04-04: CN application CN202310358190.3A (patent/CN116485119A/en), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||