CN112633772B - Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop
- Publication number
- CN112633772B (application CN202110006953.9A)
- Authority
- CN
- China
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, comprising the following steps: acquiring the current dye vat processing state data of the dyeing workshop, acquiring task arrival data, and preprocessing the state data; establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling; establishing a state parameter model, a scheduling action model and a scheduling feedback model of the textile fabric dyeing workshop scheduling environment; training the scheduling model, optimizing the parameters of the neural network model by gradient descent, and obtaining through training the optimal scheduling strategy for textile fabric dyeing workshop scheduling; and deploying the obtained intelligent dyeing workshop scheduling model in the dyeing workshop, scheduling according to the real-time production conditions and task arrivals of the workshop. The invention realizes dynamic scheduling of the dyeing workshop through multi-agent deep reinforcement learning and is suited to the current dynamic, order-oriented production environment.
Description
Technical Field
The invention relates to a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, used for production scheduling of textile fabric dyeing workshops and scheduling optimization in the textile fabric production process, and belongs to the field of production planning.
Background
China is the world's largest producer and exporter of textiles and garments, and the continuous, stable growth of textile and garment exports is crucial to foreign exchange reserves, the international balance of payments, a stable Renminbi exchange rate, social employment and the sustainable development of China's textile industry. With the continuous improvement of living standards and the popularization of the internet, textile consumption exhibits individualized and diversified demands, and textile companies need to optimize their production operations to improve productivity and quality and to meet customers' customization and delivery-date requirements. Fabric production can be divided into weaving and dyeing-finishing; in the dyeing-finishing stage the dyeing process takes long and its pollutant emission level is high, making it the most critical link of textile manufacturing. By reasonably arranging the dyeing order of cloth through dyeing workshop scheduling, on-time delivery of products can be effectively promoted and the material cost of color switching reduced, further raising enterprise customer satisfaction and reducing environmental pollution; this is a key problem that the textile industry urgently needs to solve.
Current dyeing workshop scheduling methods mainly adopt static scheduling and cannot meet the quick-response requirement brought by the shift to order-oriented production; moreover, when traditional reinforcement learning optimizes a scheduling target it attends only to the real-time information of the workshop and neglects historical dynamic information. Therefore, on the basis of existing dyeing workshop scheduling research and the PPO reinforcement learning algorithm, and aiming at the scheduling problem of a dyeing workshop with dynamically arriving tasks, an MA-RPPO reinforcement learning algorithm with batching agents and cylinder-bank agents is designed, and an efficient scheduling strategy is established through interactive training of the agents with the workshop.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: tasks of a textile fabric dyeing workshop arrive dynamically with orders, the workshop needs to make quick and effective scheduling responses to its real-time production state and task arrivals, and existing scheduling methods cannot achieve quick response and efficient scheduling.
In order to solve the above problems, the technical scheme of the invention provides a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, comprising the following steps:
s1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring the task arrival data, and preprocessing the state data
S2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling
S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment
S4, training the model of the textile fabric dyeing workshop dispatching, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain the optimal dispatching strategy of the textile fabric dyeing workshop dispatching
And S5, deploying the obtained intelligent dye shop scheduling model in the dye shop, and scheduling according to the real-time production condition and the task arrival condition of the shop.
Preferably, the step S1 further includes: performing one-hot encoding on the color system of the task arrival data, and integer encoding and normalization on the color numbers and similar fields, to realize a mathematical expression of the dyeing tasks; and proportionally scaling time-related parameters such as the idle time of the dye vats.
Preferably, in S2, the multi-agent deep reinforcement learning dyeing workshop scheduling model specifically includes a batching agent, a cylinder-bank agent, and an information interaction center module integrating a memory-and-prediction function with an agent interaction function. The batching agent takes the communication information of the interaction center and batching-related information as input and outputs a batching scheduling decision; the cylinder-bank agent takes the communication information of the interaction center and bank-related information as input and outputs a cylinder-bank scheduling decision; and the information interaction center module records the state changes of the workshop and the scheduling history of the two agents, providing the memory and prediction information to the agent making the next scheduling decision.
Preferably, in S3, the state parameter model is the input parameters of the agents, including task state parameters, batch state parameters and dye vat processing state parameters. The scheduling action model is the set of scheduling actions output by the agents, comprising a batching action set and a cylinder-bank action set. The scheduling feedback model is a quantitative evaluation of scheduling quality, a stepwise feedback model equivalent to the scheduling objective function; a mutual discounted-feedback mechanism between the batching agent and the cylinder-bank agent enables the agents to minimize the total tardiness of the workshop schedule.
Preferably, S4 includes model training of the two agents and the information interaction center; the multi-agent deep reinforcement learning model interacts with the dyeing workshop to obtain a large amount of scheduling experience, and training proceeds by gradient descent. Training the batching agent yields the optimal batching strategy parameter combination, training the cylinder-bank agent yields the optimal cylinder-bank strategy parameter combination, and the parameters of the interaction center are updated synchronously with those of the two agents. Parameter updates are adjusted according to the evaluation value of the scheduling feedback model, and the gradient is clipped to increase training stability.
Preferably, in S5, the dyeing workshop multi-agent deep reinforcement learning model is deployed into a real dyeing workshop production management system. Through a hybrid driving mechanism of event triggering and time-window triggering, when scheduling is triggered the batching agent makes batching decisions and the cylinder-bank agent makes cylinder-bank decisions according to the actual production conditions of the dyeing workshop. Before scheduling, an agent obtains the memory-and-prediction interaction information and the real-time workshop state from the information interaction center; after scheduling, it sends the pre-scheduling workshop state and the corresponding scheduling decision back to the information interaction center. After a scheduling command is issued, the validity of the scheduling decision is first checked; if valid, the decision is executed, the workshop state is updated, the evaluation value of the scheduling feedback model is obtained, the state, action and reward are recorded, and the model parameters are adjusted periodically.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides the dyeing workshop scheduling method aiming at the problem of scheduling the textile dyeing workshop, and has important engineering value. Off-time delivery of an order may reduce customer satisfaction and thus market competitiveness for the business. Excessive color switching material consumption can cause the production cost of enterprises to rise, is contrary to the sustainable development concept of green production, and is not beneficial to the increase of the public confidence of the enterprises. Therefore, the scheduling of the dyeing workshop is optimized, the postponed time and the color switching cost of the product can be realized, the pollutant emission level can be further reduced, the credit and the public confidence of an enterprise are improved, and the method has important engineering practical application value.
Drawings
FIG. 1 is a schematic flow chart of the multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop;
FIG. 2 is a schematic diagram showing a dynamic scheduling process of a dyeing plant;
FIG. 3 is a schematic diagram of data preprocessing;
FIG. 4 is a schematic diagram showing the overall structure of the MA-RPPO model;
FIG. 5 is a schematic diagram illustrating the logic of the operation of the LSTM module;
FIG. 6 is a schematic diagram of the interaction between the MA-RPPO reinforcement learning agent and the workshop.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
The multi-agent reinforcement learning scheduling method of the embodiment of the application schedules a dyeing workshop using multi-agent reinforcement learning. The problem comprises two sub-problems, batching and cylinder banking, and an agent is designed for each. The two agents make scheduling responses dynamically according to the real-time state of the workshop, their cooperative relationship is realized through an information interaction center, and the information interaction center additionally memorizes and predicts the workshop state. Through training, the two agents obtain the optimal batching strategy and cylinder-bank strategy respectively, and the cooperation of the two strategies minimizes the total tardiness of the dyeing workshop schedule, meeting the dynamic-response requirement of order-oriented production.
Specifically, please refer to fig. 1, which is a flowchart of a multi-agent deep reinforcement learning scheduling method for a dyeing workshop according to an embodiment of the present application. The multi-agent deep reinforcement learning scheduling method for the dyeing workshop comprises the following steps:
and S1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring task arrival data, and preprocessing the state data.
In step S1: the state data preprocessing is as follows. Part of the raw data is characterized by character strings or by numbers with special meanings, such as the numbers of cloth lots and colors; character strings cannot participate in computation, and letting the numbering values participate directly can cause gradient vanishing or gradient explosion during network updating. Therefore, before the experiment, the following features in the data are first encoded: color number, color system, and lot number. Commonly used encoding methods are binary encoding, one-hot encoding and integer encoding. Color numbers within the same color system are integer-encoded by color depth and normalized; color systems, and lot numbers in different color systems, are one-hot encoded; the urgent-task type parameter is integer-encoded and 0-1 normalized, as shown in formula (1). Because some observed state features, such as waiting time, grow as time passes and can cause gradient vanishing or explosion, the time-related state parameters are scaled by a scale-factor method to reduce the order-of-magnitude differences between feature dimensions, with the scaling factor set to b_t. An example of data preprocessing is shown in FIG. 3.
x_norm = x / x_max   (1)
where x_norm is the value after normalization, x is the integer code before normalization, and x_max is the maximum integer code.
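As a rough illustrative sketch (not part of the patent; the field sizes and the scale factor are assumptions), the preprocessing described above can be expressed as:

```python
# Illustrative sketch of the S1 preprocessing: one-hot encoding for
# categorical fields, 0-1 normalization of integer codes per formula (1),
# and scale-factor reduction of time-related features. All sizes and the
# scale factor b_t = 1000 are assumed purely for illustration.

def one_hot(index, size):
    """One-hot encode a categorical index (e.g. a color system or lot)."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def normalize_integer_code(x, x_max):
    """0-1 normalization of an integer code: x_norm = x / x_max."""
    return x / x_max

def scale_time(t, b_t=1000.0):
    """Scale a time-related feature (e.g. dye vat idle time) by factor b_t."""
    return t / b_t

# Example task: color system 2 of 5, color-depth code 3 with maximum 9,
# and 480 minutes of accumulated waiting time.
task_features = (one_hot(2, 5)
                 + [normalize_integer_code(3, 9)]
                 + [scale_time(480.0)])
```

The concatenated feature vector is what the agents would consume as task state input.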
S2, establishing a deep reinforcement learning multi-agent model for the textile fabric dyeing workshop scheduling.
In step S2: the established deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling is an MA-RPPO multi-agent model comprising two PPO agents, for batching and cylinder banking. Each agent has a scheduling strategy module (Actor) that maps the workshop state to a batching or cylinder-bank decision through a deep neural network; the structure is shown in FIG. 4. The batching Actor and the cylinder-bank Actor schedule sequentially through a dynamic scheduling mechanism, interacting with the dyeing workshop environment and learning from scheduling experience to optimize the scheduling strategy. The two agents share a global Critic and a global LSTM network. The Critic is an action-value function realizing, through a deep neural network, the mapping from the overall workshop state and a scheduling decision to a scheduling evaluation. On this basis, the reinforcement learning multi-agent model improves the joint optimization of the two sub-problems of dyeing workshop scheduling and the handling of the problem's dynamics.
In this embodiment, a dynamic information fusion mechanism is designed to handle the dynamic arrival of dyeing tasks with orders and the dynamic change of the workshop processing environment. The LSTM module takes the historical workshop states and scheduling records as input, encoding and memorizing them to fuse the historical dynamic information, and outputs a one-dimensional matrix that provides key workshop dynamics for agent scheduling. As shown in FIG. 5, the global state vector and the scheduling decision are concatenated and input into the LSTM network, which transmits information through the hidden states h and c; the input and output of the LSTM unit can be represented as follows:
m_{dc-1} = LSTM(h_{dc-2}, c_{dc-2}, [s_{dc-1}, a_{dc-1}]; ψ)   (2)
The interaction vector m_{dc-1} in the above equation is a one-dimensional vector encoding the historical workshop state records and prediction information; s_{dc-1} is the input workshop state, a_{dc-1} is the scheduling decision at the previous step, ψ is the LSTM network parameter, and h_{dc-2} and c_{dc-2} are internal hidden states of the LSTM. The memory and prediction functions of the LSTM realize the fusion of the workshop's dynamic information.
In this embodiment, an agent interaction mechanism is designed to realize the cooperative relationship between the two dyeing workshop scheduling agents. Interaction between the agents is achieved through the LSTM-centered scheduling-decision input and interaction-vector output. At the (dc-1)-th scheduling, the LSTM records the workshop state and scheduling decision and outputs the vector m_{dc-1} to the agent making the dc-th decision; as shown by the red path in FIG. 4, a message m realizes one round of interaction between batching and cylinder banking.
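A minimal stand-in for the information interaction center's interface (the patent's module is an LSTM with hidden states h and c; here an exponential moving average over the concatenated state-action vector merely imitates the memory role, purely for illustration):

```python
# Stand-in for the information interaction center of equation (2). The real
# module is an LSTM; this exponential moving average over [state, action]
# only illustrates the record-then-message interface between the two agents.

class InteractionCenter:
    def __init__(self, dim, decay=0.9):
        self.m = [0.0] * dim      # interaction vector m_{dc-1}
        self.decay = decay        # how strongly older history is retained

    def record(self, state, action_vec):
        """Record (s_{dc-1}, a_{dc-1}) and update the interaction vector."""
        x = state + action_vec    # concatenation [s, a]
        self.m = [self.decay * mi + (1.0 - self.decay) * xi
                  for mi, xi in zip(self.m, x)]

    def message(self):
        """Interaction vector handed to the agent making the next decision."""
        return list(self.m)
```

Each agent would call `message()` before deciding and `record()` after, so one agent's decision reaches the other through the shared memory.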
The two agents have different functions and observe different information, so the matrix input at scheduling time is a different subset of the global state for each. Specifically, the observed state of the batching agent comprises the to-be-batched task state f_1 and the batch state f_2, while the observed state of the cylinder-bank agent comprises the batch state f_2 and the dye vat state f_3. The locally observed state matrix of the batching agent is designed as follows:
s_B = [f_1, f_2]   (3)
The locally observed state matrix of the cylinder-bank agent is designed as follows:
s_S = [f_2, f_3]   (4)
Thus, at the dc-th scheduling, agent A_i makes its decision from the local workshop state and the dynamic information m_{dc-1}. The scheduling policy of agent A_i is approximated by two deep neural networks with parameters θ_i, which respectively form the batching policy function π_0 and the cylinder-bank policy π_1. With the global workshop state s_dc and the scheduling decision a_dc of agent A_i, the global Critic, approximated by a deep neural network with parameter φ, may be denoted V(m_{dc-1}, s_dc, a_dc; φ, ψ).
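The partial observations in equations (3) and (4) amount to selecting different subsets of the global state; a trivial sketch (flat-list layouts are assumed for illustration):

```python
# The batching agent observes [f1, f2]; the cylinder-bank agent observes
# [f2, f3]. With f1, f2, f3 as flat feature lists, the local observations
# are concatenations of the relevant subsets of the global state.

def batching_obs(f1, f2, f3):
    """s_B = [f1, f2]: to-be-batched task state plus batch state."""
    return f1 + f2

def bank_obs(f1, f2, f3):
    """s_S = [f2, f3]: batch state plus dye vat state."""
    return f2 + f3
```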
S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of the textile fabric dyeing workshop scheduling environment.
In step S3: the agent makes scheduling decisions depending on the workshop state information, through which it perceives the dynamic changes of the workshop environment. A state matrix F_dye is designed from the state features related to the workshop scheduling constraints and the optimization target. Dyeing workshop scheduling mainly involves three objects — tasks, batches and dye vats — so the workshop state is described by F_dye = [f_1, f_2, f_3], where f_1 = [f_{1,1}, …, f_{1,n}] is the state of the tasks to be batched, with f_{1,j} = [f_{1,j,1}, …, f_{1,j,8}] the feature vector of task J_j; f_2 = [f_{2,1}, …, f_{2,b}] is the batch state, with f_{2,k} = [f_{2,k,1}, …, f_{2,k,9}] the feature vector of batch B_k; and f_3 = [f_{3,1}, …, f_{3,m}] is the dye vat state, with f_{3,i} = [f_{3,i,1}, …, f_{3,i,6}] the feature vector of dye vat M_i.
In step S3: the scheduling action model is the set of dyeing workshop scheduling decisions executable in the different workshop states. It mainly comprises a batching scheduling decision space and a cylinder-bank scheduling decision space.
1) Batching scheduling decision space. q batch buffers are set; a batching decision adds the current task to be batched into one of the buffers or postpones batching. The batching decision space is defined as:
scheduling decision 1: selecting a kth batch buffer
a=k(0≤k<q) (5)
Scheduling decision 2: wait for
a=q (6)
When a batching decision selects a batch buffer, the current task to be batched is added to that buffer; when waiting is selected, the task postpones batching. If the batch the agent joins the task to is incompatible or would exceed the maximum batch capacity, batching fails, which is equivalent to scheduling decision 2.
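A hedged sketch of applying a batching decision, with the compatibility and capacity checks collapsing to the waiting action (the predicates are caller-supplied assumptions, not the patent's rules):

```python
# Applying a batching decision a in {0..q}: a < q selects buffer a, a == q
# waits. An incompatible or over-capacity join fails and is treated as
# waiting, mirroring the fallback described in the text above.

def apply_batching(a, q, task, buffers, compatible, capacity_ok):
    """Return True if the task was added to buffer a, False if it waits."""
    if a < q and compatible(task, buffers[a]) and capacity_ok(task, buffers[a]):
        buffers[a].append(task)
        return True
    return False
```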
2) Cylinder-bank scheduling decision space. A cylinder-bank decision selects one of the batch buffers and matches it with equipment for processing; its decision space is defined identically to the batching decision space. Selecting a batch buffer matches the batch with equipment, starts dyeing production, and empties that buffer; selecting waiting means no batch is chosen for processing, and a failed equipment match likewise results in waiting. To reduce tardiness, the equipment-matching rule selects, among the set of dye vats meeting the capacity requirement, the vat with the smallest switching time.
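The equipment-matching rule can be sketched as follows (the `(vat_id, capacity, switch_time)` tuple layout is an assumption for illustration):

```python
# Equipment matching per the rule in the text: among dye vats whose capacity
# meets the batch, choose the one with the smallest color-switching time.

def match_vat(batch_size, vats):
    """Return the feasible (vat_id, capacity, switch_time) tuple with the
    smallest switch time, or None if no vat has enough capacity."""
    feasible = [v for v in vats if v[1] >= batch_size]
    if not feasible:
        return None  # no feasible vat: the bank decision degrades to waiting
    return min(feasible, key=lambda v: v[2])

# Example: three vats; a 300 kg batch fits M2 and M3, and M3 switches faster.
vats = [("M1", 200, 5.0), ("M2", 400, 12.0), ("M3", 500, 8.0)]
```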
In step S3: the scheduling feedback model is a scheduling reward function that decomposes the objective function by scheduling steps to realize stepwise rewards.
1) Batching scheduling reward function
The batching reward r_B is the opposite number of the accumulated tardiness generated during the scheduling period by all tasks to be batched, where the scheduling time in the formula denotes the time at which the dc-th batching is performed and sw_j is the flag bit of a task waiting for batching or cylinder banking.
2) Cylinder-bank scheduling reward function
The cylinder-bank reward r_S is the opposite number of the accumulated tardiness generated during the scheduling period by the buffered batch tasks and the tasks in processing, where the scheduling time in the formula denotes the time at which the dc-th cylinder banking is performed and sp_j is the status flag bit of a task that has not yet been cylinder-banked, or has been banked but not finished.
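Both rewards are negatives of tardiness accumulated over flagged tasks; a sketch under an assumed `(due_date, flag)` task layout (not the patent's data structure):

```python
# Stepwise reward sketch: the reward is the opposite number of the tardiness
# accumulated by flagged tasks at the scheduling time t_now. The flag plays
# the role of sw_j / sp_j above; the tuple layout is illustrative only.

def step_reward(t_now, tasks):
    """Negative accumulated tardiness over tasks whose flag bit is set."""
    tardiness = sum(max(0.0, t_now - due) for due, flag in tasks if flag)
    return -tardiness

# Two flagged tasks (due at 10 and 25) and one unflagged, finished task.
tasks = [(10.0, 1), (25.0, 1), (5.0, 0)]
```

At time 20 only the first task is late (by 10), so the reward would be -10.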
And S4, training the model of the textile fabric dyeing workshop dispatching, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain the optimal dispatching strategy of the textile fabric dyeing workshop dispatching.
In step S4: when scheduling is triggered, the agent first observes the dyeing workshop state s, then selects a scheduling decision a from the executable decision set according to the state information, and receives a reward value r according to the decision's contribution to the objective; the cyclic rolling interaction shown in FIG. 6 is executed continuously. A large amount of scheduling experience data is obtained through this interaction, and the agent updates the model in a data-driven way to optimize the scheduling strategy. The parameter-updating method of model training in the traditional PPO algorithm [15] is improved in two respects. (1) The LSTM network, the Actors and the Critic are updated globally to realize synchronous optimization of the LSTM network and the agents: the LSTM output is the input of the Actor and Critic networks, and their gradients are propagated back into the preceding LSTM network during updating, realizing global parameter optimization. (2) Because the batching and cylinder-bank agents optimize the same objective, the reward values of batching and cylinder banking are globally discounted, so that the two are mutually correlated and influence each other:
where Q(s_dc, a_dc) is the global accumulated discounted reward value obtained by selecting scheduling decision a_dc in state s_dc. Driven by rolling events and a rolling time window, scheduling is performed continuously until all tasks are completed, yielding a large amount of dyeing workshop scheduling interaction data <s,a,r>, which is stored; the parameters are then updated by a gradient descent method, and through continuous iteration the policy function mapping workshop states to scheduling decisions is optimized.
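As a non-limiting illustration of the global discounting in (2), the following Python sketch discounts the summed batch and cylinder bank rewards backward over scheduling steps. The discount factor gamma and the example reward values are assumptions for illustration, not values fixed by the invention.

```python
# Sketch of the global discounted return: batch and cylinder bank rewards
# are summed per scheduling step and discounted together, so each agent's
# Q-value reflects both objectives.

def global_discounted_returns(batch_rewards, bank_rewards, gamma=0.99):
    """Q(s_dc, a_dc) for every step dc, discounting r_B + r_S jointly."""
    rewards = [rb + rs for rb, rs in zip(batch_rewards, bank_rewards)]
    returns = [0.0] * len(rewards)
    running = 0.0
    for dc in reversed(range(len(rewards))):
        running = rewards[dc] + gamma * running
        returns[dc] = running
    return returns

# Example: two scheduling steps with negative hold-off-time rewards.
q = global_discounted_returns([-1.0, -2.0], [-0.5, -0.5], gamma=0.9)
# q[1] = -2.5; q[0] = -1.5 + 0.9 * (-2.5) = -3.75
```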
And S5, deploying the obtained intelligent dye shop scheduling model in the dye shop, and scheduling according to the real-time production condition and the task arrival condition of the shop.
In step S5: the proposed dyeing workshop scheduling agent model is deployed in the dyeing workshop and arranges the production of pending tasks in real time according to the real-time state of the workshop. As new tasks arrive and the processing progress of the workshop changes, new tasks must be assigned to idle equipment in time, and this process repeats until all tasks are processed. Fig. 2 shows the dynamic scheduling process for dyeing workshop scheduling according to the present invention. As shown on the left side of Fig. 2, the batch sub-cycle and the cylinder bank sub-cycle are executed in sequence, with deliberate waiting allowed in the scheduling strategy to achieve the optimization objective; as shown on the right side of Fig. 2, the process rolls forward continuously through the hybrid triggering method combining events with time windows. Reasonable waiting of task orders in a dynamic production environment can therefore effectively reduce task completion times.
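As a non-limiting sketch of the hybrid event/time-window triggering shown on the right side of Fig. 2, the helper below fires a scheduling round either when a workshop event occurs (e.g. a task arrival or a vat becoming idle) or when the rolling time window expires, whichever comes first. The event list and window length are illustrative assumptions.

```python
# Hybrid trigger: scheduling fires at the earlier of the next workshop
# event and the expiry of the rolling time window.

def next_trigger(now, event_times, window):
    """Return (time, reason) of the next scheduling trigger after `now`."""
    window_deadline = now + window
    future_events = [t for t in event_times if t > now]
    if future_events and min(future_events) <= window_deadline:
        return min(future_events), "event"
    return window_deadline, "time_window"

# A new task arrives at t=3 while the window would only expire at t=10:
t, reason = next_trigger(0.0, event_times=[3.0, 15.0], window=10.0)
# -> fires at t=3.0 because of the event
```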
Claims (10)
1. A multi-agent depth reinforcement learning and scheduling method for a textile fabric dyeing workshop is characterized by comprising the following steps:
s1, acquiring current dye vat processing state data and task arrival data of the dyeing workshop, and preprocessing the state data;
s2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling, wherein the deep reinforcement learning multi-agent model comprises batch agents, cylinder arrangement agents and an information interaction center module integrating memory, prediction and agent interaction functions;
the batch agent takes as input the communication information of the interaction center module and the batch-related information, and outputs a batch scheduling decision; the cylinder bank agent takes as input the communication information of the interaction center module and the cylinder-bank-related information, and outputs a cylinder bank scheduling decision; the information interaction center module is responsible for recording the state changes of the dyeing workshop and the scheduling histories of the two agents, and sends memory and prediction information to the agent making the next scheduling decision;
s3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment;
the state parameter model constitutes the input parameters of the agents and comprises the task state parameters, the batch state parameters, and the processing state parameters of the dye vats; the scheduling action model is the set of scheduling actions output by the agents and comprises a batch action set and a cylinder bank action set; the scheduling feedback model is a step-feedback model equivalent to the scheduling objective function, used for mutually discounted feedback between the batch agent and the cylinder bank agent, so that the agents realize workshop scheduling with minimum total delay time;
S4, training the textile fabric dyeing workshop scheduling model, wherein the model training comprises training the two agents and the information interaction center module; the multi-agent deep reinforcement learning model interacts with the dyeing workshop, the parameters of the neural network model are optimized by a gradient descent method, and the optimal scheduling strategy for textile fabric dyeing workshop scheduling is obtained through training;
S5, deploying the dyeing workshop multi-agent deep reinforcement learning model into a real dyeing workshop production management system; when scheduling is triggered by a hybrid driving mechanism of event triggering and time-window triggering, the batch agent makes batch scheduling decisions according to the actual production conditions of the dyeing workshop, and the cylinder bank agent makes cylinder bank scheduling decisions according to the actual production conditions of the dyeing workshop; before scheduling, an agent obtains the interaction information with memory and prediction, together with the real-time workshop state, from the information interaction center, and after scheduling it sends the pre-scheduling workshop state and the corresponding scheduling decision to the information interaction center; after a scheduling command is issued, the validity of the scheduling decision is first checked; if valid, the scheduling decision is executed, the workshop state is updated, the evaluation value of the scheduling feedback model is obtained, the state, action and reward are recorded, and the model parameters are adjusted periodically.
2. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the preprocessing in S1 comprises integer-coding and normalizing the color numbers within the same color system according to color depth, one-hot encoding the different color systems and cloth batch numbers, and normalizing the emergency-task types, to realize a mathematical representation of the dyeing tasks; the time-related state parameters are scaled by a scale-factor method, so that the difference in magnitude between the feature dimensions is reduced.
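As a non-limiting sketch of the preprocessing in claim 2, the following encodes one dyeing task: the color depth within a color system is integer-coded and normalized, the color system is one-hot encoded, and the time feature is scaled by a scale factor. The specific color-system list, depth range, and scale factor are illustrative assumptions.

```python
# Task-feature preprocessing sketch: one-hot color system, normalized
# color depth, scale-factor-scaled arrival time.

COLOR_SYSTEMS = ["red", "blue", "yellow"]   # assumed color systems
MAX_DEPTH = 10                              # assumed number of depth levels

def encode_task(color_system, depth_code, arrival_time, time_scale=1e-3):
    one_hot = [1.0 if cs == color_system else 0.0 for cs in COLOR_SYSTEMS]
    depth = depth_code / MAX_DEPTH          # normalized color depth
    scaled_time = arrival_time * time_scale # scale-factor method
    return one_hot + [depth, scaled_time]

vec = encode_task("blue", 7, 3600.0)
```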
3. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: in step S4, the batch agent is trained to obtain the optimal batch strategy parameter combination, and the cylinder bank agent is trained to obtain the optimal cylinder bank strategy parameter combination; the parameters of the interaction center are updated synchronously with the parameters of the two agents, the parameters are adjusted according to the evaluation value of the scheduling feedback model when updated, and the gradient is clipped to increase training stability.
4. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent are each provided with a scheduling strategy module, and the mapping from workshop states to batch or cylinder bank decisions is realized through a deep neural network; the batch scheduling strategy module and the cylinder bank scheduling strategy module schedule sequentially through a dynamic scheduling mechanism, and optimize their scheduling strategies by interacting with the dyeing workshop environment and learning from scheduling experience.
5. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent share a global behavior-value function and a global LSTM network; the LSTM network takes the historical states and scheduling records of the workshop as input for encoding and memorizing, realizes the fusion of historical dynamic information, and outputs a one-dimensional vector that provides key workshop dynamic information for the scheduling of the batch agent and the cylinder bank agent;
the input and output of the LSTM network are represented as follows:
m_{dc-1} = LSTM(h_{dc-2}, c_{dc-2}, [s_{dc-1}, a_{dc-1}]; ψ)
where the interaction vector m_{dc-1} is a one-dimensional vector encoding the historical workshop state records and the prediction information, s_{dc-1} is the input workshop state, a_{dc-1} is the scheduling decision at the previous moment, ψ is the LSTM network parameter, and h_{dc-2} and c_{dc-2} are internal hidden states of the LSTM;
the fusion of workshop dynamic information is realized through the memory and prediction functions of the LSTM network; interaction between the batch agent and the cylinder bank agent is realized through LSTM-centered scheduling-decision input and interaction-vector output: the LSTM network records the workshop state and scheduling decision of the (dc-1)-th scheduling and outputs the vector m_{dc-1} to the agent making the dc-th decision, so that the batch agent and the cylinder bank agent interact through the interaction vector;
the observed state of the batched agents comprises a task state f to be batched1And batch status f2The observed states of the bank agents include a batch state f2And the state f of the dye vat3The state matrix for local observation of the batch of the intelligent agent is designed as follows:
s_B = [f_1, f_2]
the state matrix of the cylinder bank agent's local observation is designed as:
s_S = [f_2, f_3]
agent A at dc schedulingiConsidering local state of workshop during decision makingAnd dynamic information mdc-1Scheduling, agent AiCan be expressed asAnd by having a parameter thetaiThe two deep neural networks respectively form a group batch strategy function pi0And bank strategy pi1Carrying out approximation; global state of the workshop is sdcAgent AiThe execution scheduling decision is adcThe global behavior-value function, approximated by a deep neural network with the parameter φ, may be represented as V (m)dc-1,sdc,adc;φ,ψ)。
6. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the state parameter model establishes a state matrix F_dye, designed according to the state features related to the workshop scheduling constraints and the optimization objective, to describe the workshop state: F_dye = [f_1, f_2, f_3], where f_1 = [f_{1,1}, …, f_{1,n}] is the to-be-batched task state, with f_{1,j} = [f_{1,j,1}, …, f_{1,j,8}] representing the feature vector of task J_j; f_2 = [f_{2,1}, …, f_{2,b}] is the batch state, with f_{2,k} = [f_{2,k,1}, …, f_{2,k,9}] representing the feature vector of batch B_k; and f_3 = [f_{3,1}, …, f_{3,m}] is the dye vat state, with f_{3,i} = [f_{3,i,1}, …, f_{3,i,6}] representing the feature vector of dye vat M_i.
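As a non-limiting sketch of the state matrix in claim 6, the helper below allocates f_1 (8 features per pending task), f_2 (9 features per batch), and f_3 (6 features per dye vat), and composes the two local observations. The counts and the zero placeholder values are illustrative assumptions.

```python
# F_dye = [f1, f2, f3]; the batch agent observes [f1, f2] and the
# cylinder bank agent observes [f2, f3].

def build_state(n_tasks, n_batches, n_vats):
    f1 = [[0.0] * 8 for _ in range(n_tasks)]    # task features f_{1,j}
    f2 = [[0.0] * 9 for _ in range(n_batches)]  # batch features f_{2,k}
    f3 = [[0.0] * 6 for _ in range(n_vats)]     # dye vat features f_{3,i}
    return {"f1": f1, "f2": f2, "f3": f3}

F_dye = build_state(n_tasks=5, n_batches=3, n_vats=2)
s_B = F_dye["f1"] + F_dye["f2"]  # batch agent's local observation
s_S = F_dye["f2"] + F_dye["f3"]  # cylinder bank agent's local observation
```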
7. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling action model comprises a batch scheduling decision space and a cylinder scheduling decision space;
batch scheduling decision space: setting a group batch buffer area with the quantity of q, adding a current task to be batched into a certain group batch buffer area or suspending the group batch by a group batch scheduling decision, wherein a group batch scheduling decision space is defined as:
scheduling decision 1: selecting a kth batch buffer
a = k, where 0 ≤ k < q
Scheduling decision 2: wait for
a=q
When a batch scheduling decision is made, a batch buffer area is selected, then the current task to be batched is added into the batch buffer area, and if the task is selected to wait, the task suspends batching; if the batches added by the batch agent to the task are incompatible or exceed the maximum batch capacity, the batch fails, and the result is equal to scheduling decision 2;
cylinder bank scheduling decision space: a cylinder bank scheduling decision selects a batch buffer, matches the batch to equipment, performs dyeing production, and empties the batch buffer; if waiting is selected, no batch is chosen; if equipment matching fails, waiting is executed instead; to reduce tardiness, the equipment matching rule is to select, from the set of dye vats meeting the capacity requirement, the dye vat with the smallest switching time.
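As a non-limiting sketch of the two decision spaces in claim 7, the snippet below degrades an infeasible batch decision to "wait" (a = q) when the chosen buffer is incompatible or over capacity, and implements the equipment-matching rule of picking the smallest-switching-time vat among those with sufficient capacity. The batch/vat field names and compatibility test (same color) are illustrative assumptions.

```python
Q_BUFFERS = 3  # number of batch buffers; a == Q_BUFFERS means "wait"

def apply_batch_decision(a, task, buffers, max_capacity):
    """Add `task` to buffer `a`, or fall back to wait on failure."""
    if a == Q_BUFFERS:
        return Q_BUFFERS                      # explicit wait
    buf = buffers[a]
    compatible = all(t["color"] == task["color"] for t in buf)
    if not compatible or sum(t["size"] for t in buf) + task["size"] > max_capacity:
        return Q_BUFFERS                      # failed batching == wait
    buf.append(task)
    return a

def match_vat(batch_size, vats):
    """Smallest-switching-time vat among those meeting the capacity need."""
    ok = [v for v in vats if v["capacity"] >= batch_size]
    return min(ok, key=lambda v: v["switch_time"])["id"] if ok else None

buffers = [[{"color": "blue", "size": 40}], [], []]
a = apply_batch_decision(0, {"color": "red", "size": 10}, buffers, 100)
# incompatible color -> decision degrades to wait (a == 3)
vat = match_vat(50, [{"id": 1, "capacity": 60, "switch_time": 5},
                     {"id": 2, "capacity": 80, "switch_time": 2}])
# -> vat 2 (smallest switching time among vats that fit)
```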
8. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling feedback model comprises a batch scheduling reward function and a cylinder bank scheduling reward function;
batch scheduling reward function:
the batch reward r_B is the negative of the hold-off time generated by all to-be-batched tasks in the scheduling period,
where the time variable denotes the moment at which the dc-th batching is performed, and sw_j is the flag bit of tasks to be batched or to be assigned to a cylinder;
cylinder bank scheduling reward function:
the cylinder bank reward r_S is the negative of the accumulated hold-off time generated by the buffer-zone tasks and the in-batch processing tasks in the scheduling period.
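As a non-limiting sketch of the reward models in claim 8, the snippet below computes a reward as the negative hold-off (tardiness) time accumulated by the relevant flagged tasks at the scheduling moment. The due-date and flag field names stand in for the formula details omitted above and are illustrative assumptions.

```python
# Step reward sketch: reward = -(sum of hold-off times of flagged tasks).

def hold_off(t_now, task):
    """Tardiness of one task at scheduling time t_now."""
    return max(0.0, t_now - task["due"])

def batch_reward(t_dc, tasks):
    """r_B: negative hold-off of tasks still flagged as to-be-batched."""
    return -sum(hold_off(t_dc, t) for t in tasks if t["flag"])

tasks = [{"due": 5.0, "flag": 1}, {"due": 20.0, "flag": 1},
         {"due": 1.0, "flag": 0}]  # third task already arranged
r_B = batch_reward(10.0, tasks)
# -> -(10 - 5) - 0 = -5.0
```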
9. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: when scheduling is triggered, the agent first observes the dyeing workshop state s, then selects a scheduling decision a from the executable decision set according to the state information and the contribution of the decision to the objective (the reward value r), and the interaction between the agent and the workshop is executed cyclically to advance the processing progress; a large amount of scheduling experience data is thereby obtained, and the agent updates its model in a data-driven way to optimize the scheduling strategy.
10. The multi-agent deep reinforcement learning scheduling method for textile fabric dyeing workshop as claimed in claim 1, wherein the parameter updating method for model training in S4 includes:
step 1, carrying out global updating on an LSTM network, a scheduling policy module and a behavior-value function to realize synchronous optimization of the LSTM network and an intelligent agent;
and 2, carrying out global discount on the reward values of the batch and the cylinder bank, and realizing the correlation and influence between the batch and the cylinder bank:
where Q(s_dc, a_dc) is the global accumulated discounted reward value obtained by selecting scheduling decision a_dc in state s_dc; scheduling is performed continuously, driven by rolling events and a rolling time window, to obtain a large amount of dyeing workshop scheduling interaction data <s,a,r>, which is stored until all tasks are completed; the parameters are updated by a gradient descent method, and continuous iteration realizes optimization of the policy function from workshop states to scheduling decisions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110006953.9A CN112633772B (en) | 2021-01-05 | 2021-01-05 | Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633772A CN112633772A (en) | 2021-04-09 |
CN112633772B true CN112633772B (en) | 2021-12-10 |
Family
ID=75291395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110006953.9A Active CN112633772B (en) | 2021-01-05 | 2021-01-05 | Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112633772B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113780839B (en) * | 2021-09-15 | 2023-08-22 | 湖南视比特机器人有限公司 | Evolutionary sorting job scheduling method and system based on deep reinforcement learning |
CN113837628B (en) * | 2021-09-16 | 2022-12-09 | 中国钢研科技集团有限公司 | Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning |
CN113780883A (en) * | 2021-09-26 | 2021-12-10 | 无锡唯因特数据技术有限公司 | Production workshop scheduling method and device and storage medium |
CN114154821A (en) * | 2021-11-22 | 2022-03-08 | 厦门深度赋智科技有限公司 | Intelligent scheduling dynamic scheduling method based on deep reinforcement learning |
CN114219274A (en) * | 2021-12-13 | 2022-03-22 | 南京理工大学 | Workshop scheduling method adapting to machine state based on deep reinforcement learning |
CN116842856B (en) * | 2023-09-04 | 2023-11-14 | 长春工业大学 | Industrial process optimization method based on deep reinforcement learning |
CN117726160B (en) * | 2024-02-09 | 2024-04-30 | 厦门碳基翱翔数字科技有限公司 | Textile flow management method and system based on virtual reality and evolution reinforcement learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779220A (en) * | 2016-12-20 | 2017-05-31 | 浙江中控研究院有限公司 | A kind of steel-making continuous casting hot rolling integrated scheduling method and system |
CN112101773A (en) * | 2020-09-10 | 2020-12-18 | 齐鲁工业大学 | Task scheduling method and system for multi-agent system in process industry |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10096053B2 (en) * | 2012-11-05 | 2018-10-09 | Cox Communications, Inc. | Cloud solutions for organizations |
KR102251316B1 (en) * | 2019-06-17 | 2021-05-12 | (주)브이엠에스 솔루션스 | Reinforcement learning and simulation based dispatching method within a factory, and an apparatus thereof |
- 2021-01-05: CN202110006953.9A filed in China; granted as CN112633772B (active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779220A (en) * | 2016-12-20 | 2017-05-31 | 浙江中控研究院有限公司 | A kind of steel-making continuous casting hot rolling integrated scheduling method and system |
CN112101773A (en) * | 2020-09-10 | 2020-12-18 | 齐鲁工业大学 | Task scheduling method and system for multi-agent system in process industry |
Non-Patent Citations (2)
Title |
---|
"A two-phase approach to solve the synchronized bin-forklift scheduling problem";HACHEMI N EL等;《Journal of Intelligent Manufacturing》;20150510;第651-657页 * |
"多Agent 动态调度方法在染色车间调度中的应用";徐新黎;《计算机集成制造系统》;20100315;第16卷(第3期);第611-620页 * |
Also Published As
Publication number | Publication date |
---|---|
CN112633772A (en) | 2021-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112633772B (en) | Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop | |
CN110298589A (en) | Based on heredity-ant colony blending algorithm dynamic Service resource regulating method | |
CN104866898B (en) | A kind of Solving Multi-objective Flexible Job-shop Scheduling method based on collaboration mixing artificial fish-swarm model | |
CN105959401B (en) | A kind of manufacturing service supply-demand mode and dynamic dispatching method based on super-network | |
CN107506956A (en) | Based on improvement particle cluster algorithm supply chain production and transport coordinated dispatching method and system | |
CN109816243A (en) | Cloud towards dynamic task perception manufactures cotasking dispatching method | |
CN111199272A (en) | Adaptive scheduling method for intelligent workshop | |
Zhou et al. | Bi-objective grey wolf optimization algorithm combined Levy flight mechanism for the FMC green scheduling problem | |
CN108805403A (en) | A kind of job-shop scheduling method based on improved adaptive GA-IAGA | |
Honghong et al. | The application of adaptive genetic algorithms in FMS dynamic rescheduling | |
CN114565247B (en) | Workshop scheduling method, device and system based on deep reinforcement learning | |
CN109872091A (en) | A kind of Job Scheduling method and device based on ant group algorithm | |
CN111260181A (en) | Workshop self-adaptive production scheduling device based on distributed intelligent manufacturing unit | |
CN107146039A (en) | The customized type mixed-model assembly production method and device of a kind of multiple target Collaborative Control | |
CN108803519A (en) | A kind of method that empire's Competitive Algorithms of improvement solve Flexible Job-shop Scheduling Problems | |
CN111665808A (en) | Production scheduling plan optimization method based on genetic algorithm | |
CN106327053A (en) | Method for constructing textile process recommendation models based on multi-mode set | |
CN113033928A (en) | Design method, device and system of bus shift scheduling model based on deep reinforcement learning | |
CN112488543A (en) | Intelligent work site shift arrangement method and system based on machine learning | |
CN111369130B (en) | Distributed self-adaptive production line reconstruction method based on semantic data and knowledge reasoning | |
CN110059908A (en) | New workpiece weight method for optimizing scheduling based on self-adapted genetic algorithm | |
Liu et al. | An improved nondominated sorting genetic algorithm-ii for multi-objective flexible job-shop scheduling problem considering worker assignments | |
Esquivel et al. | Parameter settings and representations in Pareto-based optimization for job shop scheduling | |
CN114881301A (en) | Simulation scheduling method and system for production line, terminal device and storage medium | |
CN107590616A (en) | Improved empire's Competitive Algorithms solve Flexible Job-shop Scheduling Problems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||