CN112633772B - Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop - Google Patents


Info

Publication number
CN112633772B
CN112633772B (application CN202110006953.9A)
Authority
CN
China
Prior art keywords
scheduling
batch
agent
workshop
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110006953.9A
Other languages
Chinese (zh)
Other versions
CN112633772A (en)
Inventor
张洁
贺俊杰
张朋
郑鹏
王明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN202110006953.9A
Publication of CN112633772A
Application granted
Publication of CN112633772B

Classifications

    • G06Q 10/0631 — Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06N 3/045 — Combinations of networks
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 — Learning methods
    • G06Q 50/04 — Manufacturing
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Manufacturing & Machinery (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Coloring (AREA)
  • Treatment Of Fiber Materials (AREA)

Abstract

The invention provides a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, comprising the following steps: acquiring the current dye vat processing state data of the dyeing workshop and the task arrival data, and preprocessing the state data; establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling; establishing a state parameter model, a scheduling action model, and a scheduling feedback model of the scheduling environment; training the scheduling model, optimizing the neural network parameters by gradient descent to obtain the optimal scheduling strategy; and deploying the resulting intelligent scheduling model in the dyeing workshop, where it schedules according to the workshop's real-time production conditions and task arrivals. The invention realizes dynamic scheduling of the dyeing workshop through multi-agent deep reinforcement learning and suits the dynamic production environments of present-day order-driven manufacturing.

Description

Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop
Technical Field
The invention relates to a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, used for production scheduling of the dyeing workshop and scheduling optimization in the textile fabric production process, and belongs to the field of production planning.
Background
China is the world's largest producer and exporter of textiles and garments, and the continued, stable growth of textile and garment exports is crucial for safeguarding foreign exchange reserves, balancing international payments, stabilizing the Renminbi exchange rate, sustaining employment, and supporting the sustainable development of China's textile industry. With rising living standards and the spread of the internet, textile consumption has become personalized and diversified, and textile companies must optimize production operations to improve capacity and quality and to meet customers' customization and delivery-date requirements. Fabric production divides into weaving and dyeing-and-finishing; within the dyeing-and-finishing stage, the dyeing process has long processing times and high pollutant emissions and is the most critical link in textile manufacturing. Reasonably sequencing the dyeing of cloth through dyeing workshop scheduling can effectively promote on-time delivery, reduce the material cost of color switching, further improve customer satisfaction, and reduce environmental pollution; it is a key problem the textile industry urgently needs to solve.
Current dyeing workshop scheduling methods are mainly static and cannot meet the quick-response demands brought by the shift to order-driven production; moreover, traditional reinforcement learning, when optimizing a scheduling objective, attends only to the workshop's real-time information and neglects historical dynamic information. Therefore, building on existing dyeing workshop scheduling research and the PPO reinforcement learning algorithm, and targeting the scheduling of a dyeing workshop with dynamically arriving tasks, the MA-RPPO reinforcement learning algorithm with a batching agent and a cylinder bank agent is designed, and an efficient scheduling strategy is established through interactive training between the agents and the workshop.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: tasks in a textile fabric dyeing workshop arrive dynamically with orders, so the workshop must respond quickly and effectively to its real-time production state and task arrivals, and existing scheduling methods cannot provide such quick response and efficient scheduling.
In order to solve this problem, the technical scheme of the invention is a multi-agent deep reinforcement learning scheduling method for a textile fabric dyeing workshop, which comprises the following steps:
S1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring the task arrival data, and preprocessing the state data;
S2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling;
S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of the textile fabric dyeing workshop scheduling environment;
S4, training the model for textile fabric dyeing workshop scheduling, optimizing the parameters of the neural network model by a gradient descent method, and obtaining the optimal scheduling strategy;
And S5, deploying the obtained intelligent dye shop scheduling model in the dye shop, and scheduling according to the real-time production condition and the task arrival condition of the shop.
Preferably, the step S1 further includes: applying one-hot encoding to the color system of the task arrival data, applying integer encoding and normalization to the color numbers and similar fields to obtain a mathematical representation of the dyeing tasks, and proportionally scaling time-related parameters such as the idle time of the dye vats.
Preferably, in S2, the multi-agent deep reinforcement learning dye shop scheduling model specifically comprises a batching agent, a cylinder bank agent, and an information interaction center module that integrates a memory-and-prediction function with an agent interaction function. The batching agent outputs a batching scheduling decision from the interaction center's communication information and batch-related information; the cylinder bank agent outputs a bank scheduling decision from the interaction center's communication information and vat-related information; and the information interaction center module records the workshop's state changes and both agents' scheduling histories, providing memory and prediction information to whichever agent makes the next scheduling decision.
Preferably, in S3, the state parameter model provides the agents' inputs, including task state parameters, batch state parameters, and the processing state parameters of the dye vats. The scheduling action model is the agents' output scheduling action set, comprising a batching action set and a cylinder bank action set. The scheduling feedback model quantitatively evaluates scheduling quality; it is a stepwise feedback model equivalent to the scheduling objective function, with a mutual discounted-feedback mechanism between the batching agent and the cylinder bank agent so that the agents minimize the total delay time of the workshop schedule.
Preferably, S4 includes model training for the two agents and the information interaction center: the multi-agent deep reinforcement learning model interacts with the dyeing workshop to accumulate a large amount of scheduling experience, and the models are trained by gradient descent. Training the batching agent yields the optimal batching strategy parameters, training the cylinder bank agent yields the optimal bank strategy parameters, and the interaction center's parameters are updated synchronously with those of both agents. Parameter updates are adjusted according to the scheduling feedback model's evaluation value, and gradients are clipped to increase the stability of training.
Preferably, in S5, the dyeing workshop multi-agent deep reinforcement learning model is deployed into a real dyeing workshop production management system. Through a hybrid driving mechanism combining event triggering and time-window triggering, when scheduling is triggered the batching agent makes batching decisions and the cylinder bank agent makes bank decisions according to the workshop's actual production conditions. Before scheduling, an agent obtains the memory-and-prediction interaction information and the real-time workshop state from the information interaction center; after scheduling, it sends the pre-scheduling workshop state and the corresponding decision back to the center. After a scheduling command is issued, the validity of the decision is first checked; if valid, the decision is executed, the workshop state is updated, the scheduling feedback model's evaluation value is obtained, the state, action, and reward are recorded, and the model parameters are adjusted periodically.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides the dyeing workshop scheduling method aiming at the problem of scheduling the textile dyeing workshop, and has important engineering value. Off-time delivery of an order may reduce customer satisfaction and thus market competitiveness for the business. Excessive color switching material consumption can cause the production cost of enterprises to rise, is contrary to the sustainable development concept of green production, and is not beneficial to the increase of the public confidence of the enterprises. Therefore, the scheduling of the dyeing workshop is optimized, the postponed time and the color switching cost of the product can be realized, the pollutant emission level can be further reduced, the credit and the public confidence of an enterprise are improved, and the method has important engineering practical application value.
Drawings
FIG. 1 is a schematic flow chart of a multi-agent depth reinforcement learning scheduling method for a textile fabric dyeing workshop;
FIG. 2 is a schematic diagram showing a dynamic scheduling process of a dyeing plant;
FIG. 3 is a schematic diagram of data preprocessing;
FIG. 4 is a schematic diagram showing the overall structure of the MA-RPPO model;
FIG. 5 is a schematic diagram illustrating the logic of the operation of the LSTM module;
FIG. 6 is a schematic diagram of the interaction between the MA-RPPO reinforcement learning agent and the workshop.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
The multi-agent reinforcement learning scheduling method of this embodiment applies multi-agent reinforcement learning to dyeing workshop scheduling, which comprises two sub-problems: the batching sub-problem and the cylinder bank sub-problem. An agent is designed for each sub-problem so that the two agents respond dynamically to the workshop's real-time state; their cooperation is realized through an information interaction center, which also memorizes and predicts the workshop state. Through training, the two agents obtain the optimal batching strategy and bank strategy respectively, and the cooperation of the two strategies minimizes the total delay time of the dyeing workshop schedule, meeting the dynamic-response requirements of order-driven production.
Specifically, please refer to fig. 1, which is a flowchart of a multi-agent deep reinforcement learning scheduling method for a dyeing workshop according to an embodiment of the present application. The multi-agent deep reinforcement learning scheduling method for the dyeing workshop comprises the following steps:
and S1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring task arrival data, and preprocessing the state data.
In step S1: the state data preprocessing comprises the following steps: because the original data part is characterized by character strings or numbers with special meanings, such as the numbers of cloth lots, colors and the like, the character strings cannot participate in the operation, and the direct participation of the numbered numbers in the operation can cause the phenomena of gradient disappearance, gradient explosion and the like in the network updating process. Therefore, the following features in the data should be first encoded before the experiment is performed, including: color number, color system, lot number. Commonly used encoding methods are binary encoding, one-hot encoding and integer encoding. The color numbers in the same color system are subjected to integer coding and normalization according to the depth of the color, the color systems and the lot numbers in different color systems are subjected to independent hot coding, the emergency task type parameters are the integer codes, and the emergency task type parameters are subjected to 0-1 normalization, as shown in a formula (1). Because the state characteristics of partial observation, such as waiting time and other parameters, increase along with the change of time, the gradient disappears or the gradient explodes, the state parameters related to the time are scaled by adopting a scale factor method, the order of magnitude difference of each characteristic dimension is reduced, and the scaling scale factor is set as bt. An example of data pre-processing is shown in figure 3.
x_norm = x / x_max (1)
where x_norm is the value after normalization, x is the integer code value before normalization, and x_max is the maximum value of the integer code.
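As a concrete illustration of the S1 preprocessing, the following sketch one-hot encodes an assumed color-system vocabulary, normalizes an integer color-depth code per formula (1), and scales a time feature by a factor b_t. The feature names, vocabulary, and values are hypothetical; the patent does not publish its actual encoding tables.

```python
COLOR_SYSTEMS = ["red", "blue", "yellow"]  # assumed color-system vocabulary

def one_hot(value, vocabulary):
    """One-hot encode a categorical feature such as the color system."""
    vec = [0.0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1.0
    return vec

def normalize_integer_code(code, max_code):
    """0-1 normalization of an integer code, per formula (1): x_norm = x / x_max."""
    return code / max_code

def scale_time(t, b_t=1000.0):
    """Scale a time-related feature by the factor b_t to limit its magnitude."""
    return t / b_t

# Example: a dyeing task with color system "blue", color-depth code 7 of 10,
# and 2500 time units of accumulated waiting time.
task_features = (
    one_hot("blue", COLOR_SYSTEMS)
    + [normalize_integer_code(7, 10)]
    + [scale_time(2500.0)]
)
print(task_features)  # [0.0, 1.0, 0.0, 0.7, 2.5]
```

The resulting vector mixes encoded categorical features with scaled continuous ones, which is what keeps the feature magnitudes comparable during network training.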
S2, establishing a deep reinforcement learning multi-agent model for the textile fabric dyeing workshop scheduling.
In step S2: the established deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling is an MA-RPPO reinforcement learning multi-agent model and comprises two PPO agents in batch and cylinder arrangement, each agent is provided with a scheduling strategy module Actor, mapping from a workshop state to batch or cylinder arrangement is realized through a deep neural network, and the structure is shown in FIG. 4. And the batch Actor and the cylinder bank Actor carry out sequential scheduling through a dynamic scheduling mechanism, and interact with the dyeing workshop environment and learn a scheduling experience optimization scheduling strategy. Two agents share a global Critic and a global LSTM network. Critic is a 'behavior-value' function, and the mapping from the overall state of a workshop and scheduling decision to scheduling evaluation is realized through a deep neural network. On the basis, the reinforcement learning multi-agent improves the overall optimization and problem dynamics of two sub-problems of the dyeing workshop scheduling.
In this embodiment, to handle the dynamic arrival of dyeing tasks with orders and the dynamic change of the workshop processing environment, a dynamic information fusion mechanism is designed. The LSTM module takes the workshop's historical states and scheduling records as input, encodes and memorizes them to fuse the historical dynamic information, and outputs a one-dimensional vector that supplies key workshop dynamics to the scheduling agents. As shown in FIG. 5, the global state vector and the scheduling decision are concatenated and fed into the LSTM network, which passes information through its hidden states h and c; the LSTM unit's input and output can be expressed as:
m_{dc-1} = LSTM(h_{dc-2}, c_{dc-2}, [s_{dc-1}, a_{dc-1}]; ψ) (2)
In this equation, the interaction vector m_{dc-1} is a one-dimensional vector encoding the historical workshop state records and prediction information, s_{dc-1} is the input workshop state, a_{dc-1} is the scheduling decision at the previous step, ψ denotes the LSTM network parameters, and h_{dc-2} and c_{dc-2} are the LSTM's internal hidden states. The memory and prediction functions of the LSTM thus fuse the workshop's dynamic information.
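To make equation (2) concrete, here is a minimal, self-contained LSTM cell in plain Python, reduced to a single scalar feature: the previous hidden state (h, c) is combined with the concatenated [state, action] input to produce an interaction output m. The weights are arbitrary placeholders, not the trained parameters ψ of the patent's model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h_prev, c_prev, w):
    """One scalar LSTM step: x stands for the concatenated [s, a] input
    (reduced to one feature); returns (m, h, c), where the output m plays
    the role of the interaction vector of equation (2)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g   # new memory cell: kept history plus new input
    h = o * math.tanh(c)     # new hidden state, also emitted as m
    return h, h, c

# Placeholder weights (all 0.5) just to run one step deterministically.
w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
m, h, c = lstm_cell(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
print(-1.0 < m < 1.0)  # True: the interaction output stays bounded
```

In the patent's model the same recurrence runs over vectors, so that each scheduling step's (state, decision) pair is folded into (h, c) and summarized for the next agent.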
In this embodiment, to realize the cooperative relationship between the two scheduling agents, an agent interaction mechanism is designed. Interaction between the agents is achieved through the LSTM-centered input of scheduling decisions and output of interaction vectors. At the (dc-1)-th scheduling step, the LSTM records the workshop state and scheduling decision and outputs the vector m_{dc-1}, which is provided to the agent making the dc-th decision; through this interaction vector, shown as the red path in FIG. 4, the message m_{dc-1} realizes the interaction between one batching step and one banking step.
The two agents have different functions and need to observe different information, so the matrix input at scheduling time is a different subset of the global state for each. Specifically, the batching agent's observed state comprises the to-be-batched task state f_1 and the batch state f_2, while the bank agent's observed state comprises the batch state f_2 and the dye vat state f_3. The locally observed state matrix of the batching agent is designed as:
s_B = [f_1, f_2] (3)
the state matrix of the local observation of the bank cylinder intelligent agent is designed as follows:
s_S = [f_2, f_3] (4)
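The two local observations can be sketched as slices of the global state: the batching agent sees [f_1, f_2] and the bank agent sees [f_2, f_3]. The feature values below are made up for illustration; the patent specifies only the dimensions (8 features per task, 9 per batch, 6 per vat).

```python
# Global state F_dye = [f1, f2, f3]; each fi is a list of feature vectors.
f1 = [[0.1] * 8]   # one task-to-batch feature vector (8 features per task)
f2 = [[0.2] * 9]   # one batch feature vector (9 features per batch)
f3 = [[0.3] * 6]   # one dye-vat feature vector (6 features per vat)

s_B = f1 + f2      # batching agent's local observation, equation (3)
s_S = f2 + f3      # bank agent's local observation, equation (4)

print(len(s_B), len(s_S))  # 2 2
```

Note that f_2 appears in both observations: the batch buffers are the shared object through which the two agents' decisions couple.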
thus agent A is scheduled dc timesiConsidering local state of workshop during decision making
Figure GDA0003333158850000061
And dynamic information mdc-1Scheduling, agent AiCan be expressed as
Figure GDA0003333158850000062
And by having a parameter thetaiThe two deep neural networks respectively form a group batch strategy function pi0And bank strategy pi1An approximation is made. Global state of the workshop is sdcAgent AiPerforming scheduling decision adcGlobal critic, approximated by a deep neural network with parameter φ, may be denoted as V (m)dc-1,sdc,adc;φ,ψ)。
S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of the textile fabric dyeing workshop scheduling environment.
In step S3: the state parameter model agent makes a scheduling decision depending on the state information of the workshop, and senses the dynamic change of the workshop environment through the state information. Designing a state matrix F according to state characteristics related to workshop scheduling constraint and optimization targetdye. The dyeing workshop scheduling mainly comprises three objects of tasks, batches and dye vats, becauseThis is with Fdye=[f1,f2,f3]Describing the state of the plant, wherein f1=[f1,1,…,f1,n]Is the status of the task to be batched, and f1,j=[f1,j,1,…,f1,j,8]Representing task JjThe feature vector of (2); f. of2=[f2,1,…,f2,b]Is in a batch state, wherein f2,k=[f2,k,1,…,f2,k,9]Represents batch BkIs a feature vector of3=[f3,1,…,f3,m]In a dye vat state, wherein f3,i=[f3,i,1,…,f3,i,6]Indicating dye vat MiThe feature vector of (2).
In step S3: the scheduling action model is a scheduling decision set of a dyeing workshop scheduling decision space which can be executed in different states of the workshop. The method mainly comprises a batch scheduling decision space and a cylinder bank scheduling decision space.
1) Batching decision space. q batch buffers are set up; a batching decision either adds the current task to be batched into one of the buffers or postpones batching. The batching decision space is defined as:
Scheduling decision 1: select the k-th batch buffer
a = k (0 ≤ k < q) (5)
Scheduling decision 2: wait
a = q (6)
When a batching decision selects a buffer, the current task is added to that buffer; when waiting is selected, batching of the task is postponed. If the batch the agent tries to join is incompatible with the task or would exceed the maximum batch capacity, batching fails, which is equivalent to decision 2.
2) Cylinder bank decision space. A bank decision selects one of the batch buffers to be matched with equipment for processing; its decision space is defined identically to the batching decision space. Selecting a buffer matches that batch to equipment, starts dyeing production, and empties the buffer; selecting wait means no batch is chosen for processing, which is also the outcome when equipment matching fails. To reduce tardiness, the equipment matching rule selects, among the dye vats meeting the capacity requirement, the vat with the smallest switching time.
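The two decision spaces and the stated matching rule can be sketched as follows. The buffer data structures and the "same color system" compatibility test are illustrative assumptions; the patent only states that an incompatible or over-capacity join is equivalent to the wait decision, and that banking picks the feasible vat with the smallest switching time.

```python
Q = 3  # number of batch buffers (q in the text)

def apply_batch_decision(a, task, buffers, max_capacity):
    """Batching decision: join `task` to buffer a, or wait if a == Q;
    an infeasible join is equivalent to waiting (decision 2)."""
    if a >= Q:
        return "wait"
    buf = buffers[a]
    same_color = all(t["color"] == task["color"] for t in buf)  # assumed rule
    fits = sum(t["size"] for t in buf) + task["size"] <= max_capacity
    if same_color and fits:
        buf.append(task)
        return "joined"
    return "wait"  # failed join, equivalent to decision 2

def match_vat(batch_size, vats):
    """Among vats meeting the capacity requirement, pick the one with the
    smallest color-switching time; None models a matching failure (wait)."""
    feasible = [v for v in vats if v["capacity"] >= batch_size]
    return min(feasible, key=lambda v: v["switch_time"]) if feasible else None

buffers = [[], [], []]
apply_batch_decision(0, {"color": "blue", "size": 40}, buffers, max_capacity=100)
r = apply_batch_decision(0, {"color": "red", "size": 30}, buffers, max_capacity=100)
print(r)  # wait  (incompatible color -> treated as the wait decision)

vats = [{"capacity": 50, "switch_time": 5}, {"capacity": 80, "switch_time": 2}]
print(match_vat(40, vats)["switch_time"])  # 2
```

Treating infeasible actions as waits keeps the action space fixed at q+1 decisions, which simplifies the agents' output layers.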
In step S3: the dispatching feedback model is a dispatching reward function, and decomposes the objective function according to the dispatching steps to realize step-by-step reward.
1) Batching reward function
The batching reward r_B is the negative of the cumulative delay time ΔTD_dc^B generated by all tasks awaiting batching during the scheduling period:
r_B = -ΔTD_dc^B (7)
where t_dc^B denotes the time at which the dc-th batching decision is performed and sw_j is the flag bit marking task J_j as waiting for batching or for cylinder banking.
2) Cylinder bank reward function
The bank reward r_S is the negative of the cumulative delay time ΔTD_dc^S generated during the scheduling period by the tasks in the batch buffers and the tasks in process:
r_S = -ΔTD_dc^S (8)
where t_dc^S denotes the time at which the dc-th bank decision is performed and sp_j is the status flag bit of a task that has not yet been banked, or has been banked but not yet finished.
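A simple sketch of both step rewards as the negative of accumulated delay. The exact accumulation formulas appear only as images in the original, so this assumes a plain tardiness form (time elapsed past each pending task's due date, summed over the tasks the relevant flag marks as pending); due dates and flags are illustrative.

```python
def cumulative_delay(now, tasks, flag):
    """Sum of tardiness max(now - due, 0) over tasks whose flag bit is set."""
    return sum(max(now - t["due"], 0) for t in tasks if t[flag])

def batch_reward(now, tasks):
    """r_B: negative delay of tasks still awaiting batching (flag sw)."""
    return -cumulative_delay(now, tasks, "sw")

def bank_reward(now, tasks):
    """r_S: negative delay of tasks unbanked or banked-but-unfinished (flag sp)."""
    return -cumulative_delay(now, tasks, "sp")

tasks = [
    {"due": 10, "sw": True,  "sp": True},   # already 15 units late at t=25
    {"due": 30, "sw": False, "sp": True},   # not yet due at t=25
]
print(batch_reward(25, tasks), bank_reward(25, tasks))  # -15 -15
```

Because both rewards are negated delays of the same underlying tasks, maximizing either agent's return pushes toward the shared objective of minimizing total workshop delay.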
And S4, training the model of the textile fabric dyeing workshop dispatching, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain the optimal dispatching strategy of the textile fabric dyeing workshop dispatching.
In step S4: when the scheduling is triggered, the agent firstly observes the dyeing workshop state s, then selects a scheduling decision a in an executable scheduling decision set according to the state information, and continuously executes the circular progress advancing processing shown in the figure 6 according to the contribution of the scheduling decision to the target and the reward value r, and obtains a large amount of scheduling experience data through the interaction, and the agent updates the model by a data-driven method to realize the optimization of the scheduling strategy. Parameter updating method for model training in traditional PPO algorithm[15]The improvement is made. (1) And carrying out global updating on the LSTM network, the Actor and the criticic to realize synchronous optimization of the LSTM network and the agent. And the LSTM network output is the input of the Actor network and the criticic network, and the gradient of the Actor network and the criticic network during updating is transmitted back to the prefix LSTM network to realize global parameter optimization. (2) And global discount is carried out on the reward values of the batch and the cylinder bank due to the same objective of batch and cylinder bank intelligent agent optimization, so that the mutual correlation and influence between the batch and the cylinder bank are realized:
Q(s_dc, a_dc) = Σ_k γ^k r_{dc+k}
where Q(s_dc, a_dc) is the global accumulated discounted reward obtained by selecting scheduling decision a_dc in state s_dc, γ is the discount factor, and the sum runs over the subsequent rewards of both agents. Driven by rolling events and rolling time windows, scheduling continues and a large amount of dyeing workshop interaction data ⟨s, a, r⟩ is stored until all tasks are completed; the parameters are then updated by gradient descent, and continued iteration optimizes the strategy function from workshop state to scheduling decision.
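The global discounting step can be sketched as a standard backward pass over the interleaved reward sequence of both agents, so that each decision's value Q(s_dc, a_dc) reflects all later batching and banking rewards. The discount factor and the reward sequence are illustrative.

```python
def global_discounted_returns(rewards, gamma=0.9):
    """Q_dc = sum_k gamma^k * r_{dc+k}, computed backwards in O(n) over the
    shared reward sequence of the batching and banking agents."""
    q = [0.0] * len(rewards)
    running = 0.0
    for k in reversed(range(len(rewards))):
        running = rewards[k] + gamma * running
        q[k] = running
    return q

# Interleaved batch/bank rewards from one scheduling episode (made up):
rewards = [-1.0, -2.0, 0.0, -4.0]
print(global_discounted_returns(rewards))
```

Discounting over the joint sequence, rather than per agent, is what couples the two agents' updates: an early batching decision is credited (or penalized) for the banking rewards it later enables.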
And S5, deploying the obtained intelligent dye shop scheduling model in the dye shop, and scheduling according to the real-time production condition and the task arrival condition of the shop.
In step S5: the proposed dyeing workshop scheduling intelligent agent model is deployed in the dyeing workshop, and production of tasks to be processed is arranged in real time according to the real-time state of the workshop. With the arrival of a new task and the change of the processing progress of a workshop, the new task needs to be arranged on idle equipment in time for processing, and the process is continuously repeated until all tasks are processed. Fig. 2 shows a dynamic scheduling process for scheduling a dyeing workshop according to the present invention. The batch sub-cycle and the bank sub-cycle are executed in sequence during the process as shown on the left side of fig. 2, with waiting in the scheduling strategy to achieve the target optimization being considered, and rolling continuously through the hybrid triggering method of events in combination with time windows as shown on the right side of fig. 2. Therefore, the reasonable waiting of the task order in the dynamic production environment can effectively reduce the completion time of the task.

Claims (10)

1. A multi-agent deep reinforcement learning and scheduling method for a textile fabric dyeing workshop, characterized by comprising the following steps:
s1, acquiring current dye vat processing state data and task arrival data of the dyeing workshop, and preprocessing the state data;
s2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling, wherein the deep reinforcement learning multi-agent model comprises batch agents, cylinder arrangement agents and an information interaction center module integrating memory, prediction and agent interaction functions;
the batching agent outputs a batching scheduling decision by inputting the communication information and the batching related information of the interaction center module; the cylinder bank intelligent agent outputs a cylinder bank scheduling decision by inputting the communication information of the interaction center module and the cylinder bank related information; the information interaction center module is responsible for recording the state change of the dyeing workshop and the dispatching history of the two intelligent agents, and sending memory and prediction information to the intelligent agent for next dispatching decision;
s3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment;
the state parameter model is input parameters of an agent, and comprises task state parameters, batch state parameters and processing state parameters of the dye vat; the scheduling action model is an output scheduling action set of the agent and comprises a batch action set and a cylinder bank action set; the scheduling feedback model is a step feedback model equivalent to a scheduling objective function and is used for mutual discount feedback between the batch intelligent agents and the cylinder bank intelligent agents, so that the total delay time of the intelligent agents for realizing workshop scheduling is minimized;
s4, training a model scheduled in a textile fabric dyeing workshop, wherein the model training comprises model training of two intelligent agents and an information interaction center module, interaction is carried out between a multi-intelligent-agent deep reinforcement learning model and the dyeing workshop, parameter optimization is carried out on a neural network model by using a gradient descent method, and an optimal scheduling strategy for the textile fabric dyeing workshop scheduling is obtained through training;
s5, deploying the dyeing workshop multi-agent deep reinforcement learning model into a real dyeing workshop production management system, and performing batch scheduling decision by the batch agents according to the actual production condition of the dyeing workshop through a mixed driving mechanism of event triggering and time window triggering when scheduling is triggered, wherein the cylinder bank agents perform cylinder bank scheduling decision according to the actual production condition of the dyeing workshop; before the intelligent agent is scheduled, acquiring interaction information with memory and prediction and a real-time state of a workshop from an information interaction center, and after the intelligent agent is scheduled, sending the workshop state before scheduling and a corresponding scheduling decision to the information interaction center; after the dispatching command is issued, firstly checking the legality of the dispatching decision, if so, executing the dispatching decision, updating the state of the workshop, acquiring the evaluation value of a dispatching feedback model, recording the state, the action and the reward, and regularly adjusting the model parameters.
2. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the preprocessing in S1 comprises carrying out integer coding and normalization on color numbers within the same color system in the task arrival data according to color depth, carrying out one-hot coding on different color systems and cloth batch numbers, and normalizing the emergency task type, so as to realize the mathematical expression of dyeing tasks; the time-related state parameters are scaled by a scale-factor method, reducing the order-of-magnitude difference between feature dimensions.
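The preprocessing of this claim (integer coding and normalization of color depth within a color system, one-hot coding of the color system, scale-factor scaling of time features) might be sketched as below; all dimensions, names and the scale factor are illustrative assumptions.

```python
def one_hot(index, size):
    # One-hot coding of a categorical feature (e.g. color system).
    v = [0.0] * size
    v[index] = 1.0
    return v

def encode_task(color_system, depth_rank, n_systems, max_depth, due_in, scale=1e-3):
    # Integer-coded color depth normalized within its color system,
    # one-hot color system, and a time feature shrunk by a scale factor
    # so that feature dimensions have comparable magnitudes.
    return one_hot(color_system, n_systems) + [depth_rank / max_depth, due_in * scale]
```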
3. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: in the step S4, the training of the batch agents aims to obtain the optimal batching strategy parameter combination, and the training of the cylinder bank agents aims to obtain the optimal cylinder bank strategy parameter combination; the parameters of the interaction center are updated synchronously with the parameters of the two agents, the parameters are adjusted according to the evaluation value of the scheduling feedback model during updating, and the gradient is clipped to increase the stability of training.
4. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent are respectively provided with a scheduling strategy module, and mapping from a workshop state to batch or cylinder bank is realized through a deep neural network; the batch scheduling strategy module and the cylinder bank scheduling strategy module carry out sequential scheduling through a dynamic scheduling mechanism, and interact with the dyeing workshop environment and learn scheduling experience to optimize the scheduling strategy.
5. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent share a global behavior-value function and a global LSTM network; the LSTM network takes the historical workshop states and scheduling records as input for encoding and memorizing, realizes the fusion of historical dynamic information, and outputs a one-dimensional matrix that provides key workshop dynamic information for the scheduling of the batch agent and the cylinder bank agent;
the input and output of the LSTM network are represented as follows:
m_{dc−1} = LSTM(h_{dc−2}, c_{dc−2}, [s_{dc−1}, a_{dc−1}]; ψ)
where the interaction vector m_{dc−1} is a one-dimensional vector encoding the historical workshop state records and prediction information, s_{dc−1} is the input workshop state, a_{dc−1} is the scheduling decision at the previous moment, ψ is the LSTM network parameter, and h_{dc−2} and c_{dc−2} are internal hidden states of the LSTM;
the fusion of workshop dynamic information is realized through the memory and prediction functions of the LSTM network; interaction between the batch agent and the cylinder bank agent is realized through the LSTM-centered scheduling-decision input and interaction-vector output: at the (dc−1)-th scheduling, the LSTM network records the workshop state and the scheduling decision, and outputs the vector m_{dc−1} to the agent making the dc-th decision, so that the batch agent and the cylinder bank agent interact through the interaction vector;
the observed state of the batched agents comprises a task state f to be batched1And batch status f2The observed states of the bank agents include a batch state f2And the state f of the dye vat3The state matrix for local observation of the batch of the intelligent agent is designed as follows:
sB=[f1,f2]
the state matrix of the local observation of the bank cylinder intelligent agent is designed as follows:
sS=[f2,f3]
agent A at dc schedulingiConsidering local state of workshop during decision making
Figure FDA0003333158840000031
And dynamic information mdc-1Scheduling, agent AiCan be expressed as
Figure FDA0003333158840000032
And by having a parameter thetaiThe two deep neural networks respectively form a group batch strategy function pi0And bank strategy pi1Carrying out approximation; global state of the workshop is sdcAgent AiThe execution scheduling decision is adcThe global behavior-value function, approximated by a deep neural network with the parameter φ, may be represented as V (m)dc-1,sdc,adc;φ,ψ)。
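The shared-LSTM structure of this claim, with policy heads and a global value head reading the interaction vector, can be sketched in PyTorch. All layer sizes and names are assumptions, and folding the action into the value input rather than passing it separately is a simplification, not the patent's exact architecture; the point shown is that actor and critic gradients both flow back into the shared LSTM (the global update of step 1).

```python
import torch
import torch.nn as nn

class SchedulingNet(nn.Module):
    # psi: shared LSTM over [state, previous action] history;
    # theta_0 / theta_1: batching and cylinder-bank policy heads;
    # phi: global behavior-value head.
    def __init__(self, state_dim=16, action_dim=6, hidden=64,
                 n_batch_actions=6, n_bank_actions=6):
        super().__init__()
        self.lstm = nn.LSTMCell(state_dim + action_dim, hidden)
        self.batch_actor = nn.Linear(hidden + state_dim, n_batch_actions)
        self.bank_actor = nn.Linear(hidden + state_dim, n_bank_actions)
        self.critic = nn.Linear(hidden + state_dim, 1)

    def forward(self, s, a_prev, hc):
        h, c = self.lstm(torch.cat([s, a_prev], dim=-1), hc)
        m = h  # interaction vector m_{dc-1}, shared by both agents
        x = torch.cat([m, s], dim=-1)
        return self.batch_actor(x), self.bank_actor(x), self.critic(x), (h, c)
```

A single backward pass through either head updates ψ together with θ_i or φ, which is how the synchronous global optimization described in S4 would be obtained.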
6. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the state parameter model establishes a state matrix F_dye, designed according to the state features related to the workshop scheduling constraints and the optimization target, to describe the workshop state, F_dye = [f_1, f_2, f_3], where f_1 = [f_{1,1}, …, f_{1,n}] is the to-be-batched task state, with f_{1,j} = [f_{1,j,1}, …, f_{1,j,8}] representing the feature vector of task J_j; f_2 = [f_{2,1}, …, f_{2,b}] is the batch state, with f_{2,k} = [f_{2,k,1}, …, f_{2,k,9}] representing the feature vector of batch B_k; and f_3 = [f_{3,1}, …, f_{3,m}] is the dye vat state, with f_{3,i} = [f_{3,i,1}, …, f_{3,i,6}] representing the feature vector of dye vat M_i.
7. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling action model comprises a batch scheduling decision space and a cylinder scheduling decision space;
batch scheduling decision space: q batch buffers are set; a batch scheduling decision adds the current to-be-batched task to one of the batch buffers or suspends batching, and the batch scheduling decision space is defined as:
scheduling decision 1: select the k-th batch buffer
a = k, where 0 ≤ k < q
scheduling decision 2: wait
a = q
when a batch scheduling decision is made, a batch buffer is selected and the current to-be-batched task is added to it; if waiting is selected, the task suspends batching; if the batch to which the batch agent adds the task is incompatible or exceeds the maximum batch capacity, batching fails, and the result is equivalent to scheduling decision 2;
cylinder bank scheduling decision space: a batch buffer is selected, the batch is matched to equipment for dyeing production, and the batch buffer is emptied; if waiting is selected, no batch is chosen; if equipment matching fails, waiting is executed instead; to reduce tardiness, the equipment matching rule is set to select, from the set of dye vats meeting the capacity requirement, the dye vat with the smallest switching time.
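The equipment matching rule at the end of this claim (among capacity-feasible dye vats, choose the one with the smallest switching time, or fall back to waiting) admits a direct sketch. Representing a vat as a (capacity, switching_time) tuple is an assumption for illustration.

```python
def match_vat(batch_volume, vats):
    # vats: list of (capacity, switching_time) tuples (assumed shape).
    # Keep only vats whose capacity fits the batch; if none fit,
    # matching fails and the bank decision degenerates to "wait".
    feasible = [v for v in vats if v[0] >= batch_volume]
    if not feasible:
        return None
    # Among feasible vats, pick the smallest switching (changeover) time.
    return min(feasible, key=lambda v: v[1])
```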
8. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling feedback model comprises a batch scheduling reward function and a cylinder bank scheduling reward function;
batch scheduling reward function:
the batching reward r^B is the opposite number of the hold-off time W^B_dc generated by all to-be-batched tasks during the scheduling period:
r^B_dc = −W^B_dc
W^B_dc = Σ_{j=1}^{n} sw_j · (t^B_dc − t^B_{dc−1})
where t^B_dc denotes the time at which the dc-th batching is performed, and sw_j is the flag bit of a task that is still to be batched or cylinder-banked;
cylinder bank scheduling reward function:
the cylinder bank reward r^S is the opposite number of the accumulated hold-off time W^S_dc generated during the scheduling period by the tasks in the batch buffers and the tasks in processing:
r^S_dc = −W^S_dc
W^S_dc = Σ_{j=1}^{n} sp_j · (t^S_dc − t^S_{dc−1})
where t^S_dc denotes the time at which the dc-th cylinder banking is performed, and sp_j is the status flag bit of a task that has not been cylinder-banked, or has been cylinder-banked but not yet finished.
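Both reward functions of this claim are negative accumulated hold-off times over a scheduling period, differing only in which flag (sw_j for batching, sp_j for cylinder banking) marks a task as held up. A single generic sketch covers both; the function and parameter names are illustrative.

```python
def holdoff_reward(flags, t_prev, t_now):
    # flags: sw_j (batching) or sp_j (cylinder banking) -- 1 while task
    # J_j is still held up during the period, 0 otherwise.
    # The reward is the negative total hold-off time accumulated over
    # the scheduling period [t_prev, t_now].
    return -sum(flags) * (t_now - t_prev)
```

Maximizing this reward minimizes accumulated hold-off time, which is how the step feedback stays equivalent to the total-tardiness scheduling objective.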
9. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: when scheduling is triggered, the agent first observes the dyeing workshop state s, then selects a scheduling decision a from the executable scheduling decision set according to the state information, and obtains a reward value r according to the contribution of the scheduling decision to the objective; the interaction between the agent and the workshop is executed cyclically to advance the processing progress, a large amount of scheduling experience data is obtained, and the agent updates the model by a data-driven method, thereby realizing scheduling strategy optimization.
10. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that the parameter updating method for model training in S4 comprises:
step 1, carrying out global updating on an LSTM network, a scheduling policy module and a behavior-value function to realize synchronous optimization of the LSTM network and an intelligent agent;
and 2, carrying out global discount on the reward values of the batch and the cylinder bank, and realizing the correlation and influence between the batch and the cylinder bank:
Figure FDA0003333158840000053
wherein Q(s)dc,adc) Is in a state sdcThe global accumulated discount reward value obtained by the lower selection scheduling decision adc; the method comprises the steps of continuously scheduling through a rolling event and rolling time window drive to obtain a large amount of dyeing workshop scheduling interaction data<s,a,r>And storing the parameters until all tasks are completed, updating the parameters by adopting a gradient descent method, and continuously iterating to realize the strategy function optimization from the workshop state to the scheduling decision.
CN202110006953.9A 2021-01-05 2021-01-05 Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop Active CN112633772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006953.9A CN112633772B (en) 2021-01-05 2021-01-05 Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop


Publications (2)

Publication Number Publication Date
CN112633772A CN112633772A (en) 2021-04-09
CN112633772B true CN112633772B (en) 2021-12-10

Family

ID=75291395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006953.9A Active CN112633772B (en) 2021-01-05 2021-01-05 Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop

Country Status (1)

Country Link
CN (1) CN112633772B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780839B (en) * 2021-09-15 2023-08-22 湖南视比特机器人有限公司 Evolutionary sorting job scheduling method and system based on deep reinforcement learning
CN113837628B (en) * 2021-09-16 2022-12-09 中国钢研科技集团有限公司 Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning
CN113780883A (en) * 2021-09-26 2021-12-10 无锡唯因特数据技术有限公司 Production workshop scheduling method and device and storage medium
CN114154821A (en) * 2021-11-22 2022-03-08 厦门深度赋智科技有限公司 Intelligent scheduling dynamic scheduling method based on deep reinforcement learning
CN114219274A (en) * 2021-12-13 2022-03-22 南京理工大学 Workshop scheduling method adapting to machine state based on deep reinforcement learning
CN116842856B (en) * 2023-09-04 2023-11-14 长春工业大学 Industrial process optimization method based on deep reinforcement learning
CN117726160B (en) * 2024-02-09 2024-04-30 厦门碳基翱翔数字科技有限公司 Textile flow management method and system based on virtual reality and evolution reinforcement learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779220A (en) * 2016-12-20 2017-05-31 浙江中控研究院有限公司 A kind of steel-making continuous casting hot rolling integrated scheduling method and system
CN112101773A (en) * 2020-09-10 2020-12-18 齐鲁工业大学 Task scheduling method and system for multi-agent system in process industry

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10096053B2 (en) * 2012-11-05 2018-10-09 Cox Communications, Inc. Cloud solutions for organizations
KR102251316B1 (en) * 2019-06-17 2021-05-12 (주)브이엠에스 솔루션스 Reinforcement learning and simulation based dispatching method within a factory, and an apparatus thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779220A (en) * 2016-12-20 2017-05-31 浙江中控研究院有限公司 A kind of steel-making continuous casting hot rolling integrated scheduling method and system
CN112101773A (en) * 2020-09-10 2020-12-18 齐鲁工业大学 Task scheduling method and system for multi-agent system in process industry

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A two-phase approach to solve the synchronized bin-forklift scheduling problem";HACHEMI N EL等;《Journal of Intelligent Manufacturing》;20150510;第651-657页 *
"多Agent 动态调度方法在染色车间调度中的应用";徐新黎;《计算机集成制造系统》;20100315;第16卷(第3期);第611-620页 *


Similar Documents

Publication Publication Date Title
CN112633772B (en) Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop
CN110298589A (en) Based on heredity-ant colony blending algorithm dynamic Service resource regulating method
CN104866898B (en) A kind of Solving Multi-objective Flexible Job-shop Scheduling method based on collaboration mixing artificial fish-swarm model
CN105959401B (en) A kind of manufacturing service supply-demand mode and dynamic dispatching method based on super-network
CN107506956A (en) Based on improvement particle cluster algorithm supply chain production and transport coordinated dispatching method and system
CN109816243A (en) Cloud towards dynamic task perception manufactures cotasking dispatching method
CN111199272A (en) Adaptive scheduling method for intelligent workshop
Zhou et al. Bi-objective grey wolf optimization algorithm combined Levy flight mechanism for the FMC green scheduling problem
CN108805403A (en) A kind of job-shop scheduling method based on improved adaptive GA-IAGA
Honghong et al. The application of adaptive genetic algorithms in FMS dynamic rescheduling
CN114565247B (en) Workshop scheduling method, device and system based on deep reinforcement learning
CN109872091A (en) A kind of Job Scheduling method and device based on ant group algorithm
CN111260181A (en) Workshop self-adaptive production scheduling device based on distributed intelligent manufacturing unit
CN107146039A (en) The customized type mixed-model assembly production method and device of a kind of multiple target Collaborative Control
CN108803519A A method for solving flexible job-shop scheduling problems with an improved imperialist competitive algorithm
CN111665808A (en) Production scheduling plan optimization method based on genetic algorithm
CN106327053A (en) Method for constructing textile process recommendation models based on multi-mode set
CN113033928A (en) Design method, device and system of bus shift scheduling model based on deep reinforcement learning
CN112488543A (en) Intelligent work site shift arrangement method and system based on machine learning
CN111369130B (en) Distributed self-adaptive production line reconstruction method based on semantic data and knowledge reasoning
CN110059908A (en) New workpiece weight method for optimizing scheduling based on self-adapted genetic algorithm
Liu et al. An improved nondominated sorting genetic algorithm-ii for multi-objective flexible job-shop scheduling problem considering worker assignments
Esquivel et al. Parameter settings and representations in Pareto-based optimization for job shop scheduling
CN114881301A (en) Simulation scheduling method and system for production line, terminal device and storage medium
CN107590616A Solving flexible job-shop scheduling problems with an improved imperialist competitive algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant