CN112633772B

CN112633772B - Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop

Info

Publication number: CN112633772B
Application number: CN202110006953.9A
Authority: CN
Inventors: 张洁; 贺俊杰; 张朋; 郑鹏; 王明
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2021-01-05
Filing date: 2021-01-05
Publication date: 2021-12-10
Anticipated expiration: 2041-01-05
Also published as: CN112633772A

Abstract

The invention provides a multi-agent deep reinforcement learning and dispatching method for a textile fabric dyeing workshop, which comprises the following steps of: and acquiring current dye vat processing state data of the dyeing workshop, acquiring task arrival data, and preprocessing the state data. And establishing a deep reinforcement learning multi-agent model for the dispatching of the textile fabric dyeing workshop. And establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment. Training a model for dispatching the textile fabric dyeing workshop, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain an optimal dispatching strategy for dispatching the textile fabric dyeing workshop. And deploying the obtained intelligent scheduling model of the dyeing workshop in the dyeing workshop, and scheduling according to the real-time production condition and the task arrival condition of the workshop. The invention realizes the dynamic scheduling of the dyeing workshop through the multi-agent deep reinforcement learning, and is suitable for the current dynamic production environment facing order type production.

Description

Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop

Technical Field

The invention relates to a multi-agent deep reinforcement learning and scheduling method for a textile fabric dyeing workshop, which is used for production scheduling of the textile fabric dyeing workshop and scheduling optimization in the textile fabric production process and belongs to the field of production planning.

Background

China is the biggest world textile garment production and export country, and the continuous and stable growth of textile garment export is crucial to ensuring foreign exchange storage, international balance of collection and balance, stable Renminbi exchange rate, and solving the sustainable development of social employment and textile industry in China. With the continuous improvement of the social living standard and the popularization of the internet, the consumption of textiles presents individual requirements and diversified requirements, and a textile company needs to optimize production operation so as to improve the productivity and quality and meet the requirements of customization and delivery date of customers. The production of fabrics can divide into and weave and dyeing and finishing two parts, and wherein the dyeing process time of dyeing and finishing stage is long, pollutant discharge level is high, is the most crucial link of textile manufacturing, through the dyeing process order of the reasonable arrangement cloth of dyeing workshop scheduling, can effectually promote punctual the delivery of product and reduce the colour and switch over material consumption cost, further promotes the customer satisfaction of enterprise and reduces environmental pollution, is the key problem that present textile industry waited to solve urgently.

The current dyeing workshop scheduling method mainly adopts static scheduling, cannot meet the quick response requirement brought by order-oriented production transformation, and only pays attention to real-time information of a workshop and lacks of consideration on historical dynamic information when the traditional reinforcement learning optimizes a scheduling target. Therefore, on the basis of the existing dyeing workshop scheduling research and PPO reinforcement learning algorithm, aiming at the problem of scheduling of the dyeing workshop with dynamically arrived tasks, the MA-RPPO reinforcement learning algorithm with the batched intelligent bodies and the cylinder arrangement intelligent bodies is designed, and an efficient scheduling strategy is established through interactive training of the intelligent bodies and the workshop.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the tasks of the textile fabric dyeing workshop dynamically arrive along with the orders, the workshop needs to make quick and effective dispatching response aiming at the production state of the workshop and the arrival condition of the tasks in real time, and the existing dispatching method cannot realize quick response and efficient dispatching.

In order to solve the problems, the technical scheme of the invention is to provide a multi-agent depth reinforcement learning scheduling method for a textile fabric dyeing workshop, which comprises the following steps:

s1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring the task arrival data, and preprocessing the state data

S2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling

S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment

S4, training the model of the textile fabric dyeing workshop dispatching, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain the optimal dispatching strategy of the textile fabric dyeing workshop dispatching

And S5, deploying the obtained intelligent dye shop scheduling model in the dye shop, and scheduling according to the real-time production condition and the task arrival condition of the shop.

Preferably, the step S1 further includes: the method comprises the steps of carrying out independent thermal coding on a color system of task arrival data, carrying out integer coding and normalization processing on color numbers and the like, realizing mathematical expression of dyeing tasks, and carrying out proportional scaling on time related parameters such as idle time of a dye vat.

Preferably, in S2, the multi-agent deep reinforcement learning dye shop scheduling model specifically includes a batch agent, a bank agent, and an information interaction center module integrating a memory and prediction function and an agent interaction function. The batch agent outputs a batch scheduling decision by inputting the communication information and the batch related information of the interaction center, the cylinder bank agent outputs a cylinder bank scheduling decision by inputting the communication information and the cylinder bank related information of the interaction center, the information interaction center module is responsible for recording the state change of a workshop and the scheduling history of the two agents, and the sent memory and prediction information is provided for the agent of the next scheduling decision.

Preferably, in S3, the state parameter model is input parameters of the agent, including task state parameters, batch state parameters, and processing state parameters of the dye vat. The scheduling action model is an output scheduling action set of the agent, and comprises a batch action set and a cylinder bank action set. The scheduling feedback model is a quantitative evaluation on the scheduling quality degree, is a step feedback model equivalent to a scheduling objective function, and has a mutual discount feedback mechanism between the batch agent and the cylinder bank agent, so that the agent can minimize the total delay time of the workshop scheduling.

Preferably, the S4 includes model training of two agents and an information interaction center, and the model training is performed mainly by interacting the multi-agent deep reinforcement learning model with the dyeing workshop to obtain a large amount of scheduling experience, and by using a gradient descent method. The training of the batched agents is to obtain the optimal batched strategy parameter combination, the training of the cylinder-arrangement agents is to obtain the optimal cylinder-arrangement strategy parameter combination, and the parameters of the interaction center are updated synchronously with the parameters of the two agents. And adjusting according to the evaluation value of the scheduling feedback model during parameter updating, and clipping the gradient to increase the stability of training.

Preferably, in S5, the dyeing workshop multi-agent deep reinforcement learning model is deployed into a real dyeing workshop production management system, and through a hybrid driving mechanism of event triggering and time window triggering, when scheduling triggering is performed, the batching agents make batching scheduling decisions according to the actual production conditions of the dyeing workshop, and the cylinder scheduling agents make cylinder scheduling decisions according to the actual production conditions of the dyeing workshop. Before the intelligent agent is scheduled, interactive information with memory and prediction and the real-time state of a workshop are obtained from the information interaction center, and after the intelligent agent is scheduled, the workshop state before the scheduling and a corresponding scheduling decision are sent to the information interaction center. After the dispatching command is issued, firstly checking the legality of the dispatching decision, if so, executing the dispatching decision, updating the state of the workshop, acquiring the evaluation value of a dispatching feedback model, recording the state, the action and the reward, and regularly adjusting the model parameters.

Compared with the prior art, the invention has the beneficial effects that:

the invention provides the dyeing workshop scheduling method aiming at the problem of scheduling the textile dyeing workshop, and has important engineering value. Off-time delivery of an order may reduce customer satisfaction and thus market competitiveness for the business. Excessive color switching material consumption can cause the production cost of enterprises to rise, is contrary to the sustainable development concept of green production, and is not beneficial to the increase of the public confidence of the enterprises. Therefore, the scheduling of the dyeing workshop is optimized, the postponed time and the color switching cost of the product can be realized, the pollutant emission level can be further reduced, the credit and the public confidence of an enterprise are improved, and the method has important engineering practical application value.

Drawings

FIG. 1 is a schematic flow chart of a multi-agent depth reinforcement learning scheduling method for a textile fabric dyeing workshop;

FIG. 2 is a schematic diagram showing a dynamic scheduling process of a dyeing plant;

FIG. 3 is a schematic diagram of data preprocessing;

FIG. 4 is a schematic diagram showing the overall structure of the MA-RPPO model;

FIG. 5 is a schematic diagram illustrating the logic of the operation of the LSTM module;

FIG. 6 is a schematic diagram of the interaction between the MA-RPPO reinforcement learning agent and the workshop.

Detailed Description

In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.

The multi-agent reinforcement learning scheduling method of the embodiment of the application uses a multi-agent reinforcement learning technology in the reinforcement learning field to schedule a dyeing workshop, and comprises the following two sub-problems: the method comprises the steps of batching subproblems and arranging subproblems, designing an intelligent body aiming at the two subproblems respectively, enabling the two intelligent bodies to dynamically carry out dispatching response according to the real-time state of a workshop, realizing the cooperation relationship between the two intelligent bodies through an information interaction center, and simultaneously enabling the information interaction center to have the functions of memorizing and predicting the state of the workshop. The two agents respectively obtain the optimal batching strategy and the cylinder arranging strategy through training, and the total delay time of the dyeing workshop scheduling is minimized through the cooperation of the two strategies, so that the dynamic response requirement facing the order type production is met.

Specifically, please refer to fig. 1, which is a flowchart of a multi-agent deep reinforcement learning scheduling method for a dyeing workshop according to an embodiment of the present application. The multi-agent deep reinforcement learning scheduling method for the dyeing workshop comprises the following steps:

and S1, acquiring the current dye vat processing state data of the dyeing workshop, acquiring task arrival data, and preprocessing the state data.

In step S1: the state data preprocessing comprises the following steps: because the original data part is characterized by character strings or numbers with special meanings, such as the numbers of cloth lots, colors and the like, the character strings cannot participate in the operation, and the direct participation of the numbered numbers in the operation can cause the phenomena of gradient disappearance, gradient explosion and the like in the network updating process. Therefore, the following features in the data should be first encoded before the experiment is performed, including: color number, color system, lot number. Commonly used encoding methods are binary encoding, one-hot encoding and integer encoding. The color numbers in the same color system are subjected to integer coding and normalization according to the depth of the color, the color systems and the lot numbers in different color systems are subjected to independent hot coding, the emergency task type parameters are the integer codes, and the emergency task type parameters are subjected to 0-1 normalization, as shown in a formula (1). Because the state characteristics of partial observation, such as waiting time and other parameters, increase along with the change of time, the gradient disappears or the gradient explodes, the state parameters related to the time are scaled by adopting a scale factor method, the order of magnitude difference of each characteristic dimension is reduced, and the scaling scale factor is set as bt. An example of data pre-processing is shown in figure 3.

In the above formula x^normRepresenting the value after normalization, x representing the integer code value before normalization, x_maxIs the maximum value of the integer code.

S2, establishing a deep reinforcement learning multi-agent model for the textile fabric dyeing workshop scheduling.

In step S2: the established deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling is an MA-RPPO reinforcement learning multi-agent model and comprises two PPO agents in batch and cylinder arrangement, each agent is provided with a scheduling strategy module Actor, mapping from a workshop state to batch or cylinder arrangement is realized through a deep neural network, and the structure is shown in FIG. 4. And the batch Actor and the cylinder bank Actor carry out sequential scheduling through a dynamic scheduling mechanism, and interact with the dyeing workshop environment and learn a scheduling experience optimization scheduling strategy. Two agents share a global Critic and a global LSTM network. Critic is a 'behavior-value' function, and the mapping from the overall state of a workshop and scheduling decision to scheduling evaluation is realized through a deep neural network. On the basis, the reinforcement learning multi-agent improves the overall optimization and problem dynamics of two sub-problems of the dyeing workshop scheduling.

In the embodiment, in order to solve the influence of dynamic arrival of the dyeing task along with the order and dynamic change of the workshop processing environment, a dynamic information fusion mechanism is designed. The LSTM module inputs the historical state of the workshop and the scheduling record to carry out coding and memory, fusion of historical dynamic information is achieved, and a one-dimensional matrix is output to provide key workshop dynamic information for intelligent agent scheduling. As shown in fig. 5, after the global state vector and the scheduling decision are spliced, the global state vector and the scheduling decision are input into the LSTM network to transmit information through hidden states h and c, and the input and output of the LSTM unit can be represented as follows:

m_dc-1＝LSTM(h_dc-2,c_dc-2,[s_dc-1,a_dc-1]；ψ) (2)

interaction vector m in the above equation_dc-1Is a one-dimensional vector, is a code of historical plant status records and prediction information, s_dc-1Is an input plant status, a_dc-1Is the scheduling decision at the previous moment, psi is the LSTM network parameter, h_dc-2And c_dc-2Are all internal hidden states of the LSTM. And the fusion of dynamic information of the workshop is realized through the memory and prediction functions of the LSTM.

In this embodiment, in order to solve the cooperative relationship between the two agents for scheduling the dyeing workshop, an agent interaction mechanism is designed. Interaction between agents is achieved through LSTM-centric scheduling decision input and interaction vector output. At the time of dc-1 scheduling, the LSTM records the workshop state and scheduling decision and outputs a vector m_dc-1The agent for dc sub-decision is provided with interaction between agents by an interaction vector, shown as red path in fig. 4, by a message m₁The interaction between one-time batching and cylinder arrangement is realized.

The two different agents have different functions and different information to be observed, so that the matrix input during agent scheduling is different subsets of the global state. Specifically, the observed state of the batching agent comprises a task state f to be batched₁And batch status f₂And the observed states of the bank agents include batch state f₂And the state f of the dye vat₃. The state matrix for the local observation of the design batch of agents is as follows:

s^B＝[f₁,f₂] (3)

the state matrix of the local observation of the bank cylinder intelligent agent is designed as follows:

s^S＝[f₂,f₃] (4)

thus agent A is scheduled dc timesⁱConsidering local state of workshop during decision making

And dynamic information m_dc-1Scheduling, agent AⁱCan be expressed as

And by having a parameter thetaⁱThe two deep neural networks respectively form a group batch strategy function pi⁰And bank strategy pi¹An approximation is made. Global state of the workshop is s_dcAgent AⁱPerforming scheduling decision a_dcGlobal critic, approximated by a deep neural network with parameter φ, may be denoted as V (m)_dc-1,s_dc,a_dc；φ,ψ)。

S3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of the textile fabric dyeing workshop scheduling environment.

In step S3: the state parameter model agent makes a scheduling decision depending on the state information of the workshop, and senses the dynamic change of the workshop environment through the state information. Designing a state matrix F according to state characteristics related to workshop scheduling constraint and optimization target_dye. The dyeing workshop scheduling mainly comprises three objects of tasks, batches and dye vats, becauseThis is with F_dye＝[f₁,f₂,f₃]Describing the state of the plant, wherein f₁＝[f_1,1,…,f_1,n]Is the status of the task to be batched, and f_1,j＝[f_1,j,1,…,f_1,j,8]Representing task J_jThe feature vector of (2); f. of₂＝[f_2,1,…,f_2,b]Is in a batch state, wherein f_2,k＝[f_2,k,1,…,f_2,k,9]Represents batch B_kIs a feature vector of₃＝[f_3,1,…,f_3,m]In a dye vat state, wherein f_3,i＝[f_3,i,1,…,f_3,i,6]Indicating dye vat M_iThe feature vector of (2).

In step S3: the scheduling action model is a scheduling decision set of a dyeing workshop scheduling decision space which can be executed in different states of the workshop. The method mainly comprises a batch scheduling decision space and a cylinder bank scheduling decision space.

1) A batch scheduling decision space. Setting a group batch buffer area with the quantity of q, adding a current task to be batched into a certain group batch buffer area or suspending the group batch by a group batch scheduling decision, wherein a group batch scheduling decision space is defined as:

scheduling decision 1: selecting a kth batch buffer

a＝k(0≤k<q) (5)

Scheduling decision 2: wait for

a＝q (6)

And when a batch scheduling decision is made, a batch buffer area is selected, the current task to be batched is added into the batch buffer area, and when the task is selected to wait, the task suspends batching. If the batch to which the batching agent joins the task is incompatible or exceeds the maximum batch capacity, the batching fails, and the result is equivalent to action 2.

2) And (5) scheduling a decision space in a cylinder bank. And the cylinder bank scheduling decision selects one batch buffer area from the batch buffer areas, and the batch buffer area is matched with the equipment for processing and has the same spatial definition as the batch scheduling decision. Selecting a batch buffer area, performing equipment matching on the batch, performing dyeing production, and emptying the batch buffer area; selecting wait means that no lot is selected for processing. And waiting for the equipment matching failure. To reduce the drag, the equipment matching rules are set to select the dye vat with the smallest switching time among the dye vat set meeting the capacity requirement.

In step S3: the dispatching feedback model is a dispatching reward function, and decomposes the objective function according to the dispatching steps to realize step-by-step reward.

1) Group scheduling reward function

Group reward r^BThe hold-off time generated for all the tasks to be batched in the scheduling period

The opposite number of (c):

wherein

Denotes the time, sw, at which the dc-th batch is performed_jAnd (4) waiting for grouping or arranging the mark position of the cylinder for the task.

2) Cylinder bank scheduling reward function

Cylinder bank reward r^SThe accumulated hold-off time generated by the buffer zone tasks and the processing tasks in the batch in the scheduling period

The opposite number of (c).

Wherein

Indicating the time, sp, at which the cylinder bank of the second dc-th time is to be performed_jThe status flag bit is a status flag bit of a task which is not arranged with cylinders or is arranged with cylinders and is not finished.

And S4, training the model of the textile fabric dyeing workshop dispatching, optimizing parameters of the neural network model by using a gradient descent method, and training to obtain the optimal dispatching strategy of the textile fabric dyeing workshop dispatching.

In step S4: when the scheduling is triggered, the agent firstly observes the dyeing workshop state s, then selects a scheduling decision a in an executable scheduling decision set according to the state information, and continuously executes the circular progress advancing processing shown in the figure 6 according to the contribution of the scheduling decision to the target and the reward value r, and obtains a large amount of scheduling experience data through the interaction, and the agent updates the model by a data-driven method to realize the optimization of the scheduling strategy. Parameter updating method for model training in traditional PPO algorithm^[15]The improvement is made. (1) And carrying out global updating on the LSTM network, the Actor and the criticic to realize synchronous optimization of the LSTM network and the agent. And the LSTM network output is the input of the Actor network and the criticic network, and the gradient of the Actor network and the criticic network during updating is transmitted back to the prefix LSTM network to realize global parameter optimization. (2) And global discount is carried out on the reward values of the batch and the cylinder bank due to the same objective of batch and cylinder bank intelligent agent optimization, so that the mutual correlation and influence between the batch and the cylinder bank are realized:

wherein Q(s)_dc,a_dc) Is in a state s_dcLower selection scheduling decision a_dcThe global accumulated discount prize value obtained. The method comprises the steps of continuously scheduling through a rolling event and rolling time window drive to obtain a large amount of dyeing workshop scheduling interaction data<s,a,r>And storing the parameters until all tasks are completed, updating the parameters by adopting a gradient descent method, and continuously iterating to realize the strategy function optimization from the workshop state to the scheduling decision.

In step S5: the proposed dyeing workshop scheduling intelligent agent model is deployed in the dyeing workshop, and production of tasks to be processed is arranged in real time according to the real-time state of the workshop. With the arrival of a new task and the change of the processing progress of a workshop, the new task needs to be arranged on idle equipment in time for processing, and the process is continuously repeated until all tasks are processed. Fig. 2 shows a dynamic scheduling process for scheduling a dyeing workshop according to the present invention. The batch sub-cycle and the bank sub-cycle are executed in sequence during the process as shown on the left side of fig. 2, with waiting in the scheduling strategy to achieve the target optimization being considered, and rolling continuously through the hybrid triggering method of events in combination with time windows as shown on the right side of fig. 2. Therefore, the reasonable waiting of the task order in the dynamic production environment can effectively reduce the completion time of the task.

Claims

1. A multi-agent depth reinforcement learning and scheduling method for a textile fabric dyeing workshop is characterized by comprising the following steps:

s1, acquiring current dye vat processing state data and task arrival data of the dyeing workshop, and preprocessing the state data;

s2, establishing a deep reinforcement learning multi-agent model for textile fabric dyeing workshop scheduling, wherein the deep reinforcement learning multi-agent model comprises batch agents, cylinder arrangement agents and an information interaction center module integrating memory, prediction and agent interaction functions;

the batching agent outputs a batching scheduling decision by inputting the communication information and the batching related information of the interaction center module; the cylinder bank intelligent agent outputs a cylinder bank scheduling decision by inputting the communication information of the interaction center module and the cylinder bank related information; the information interaction center module is responsible for recording the state change of the dyeing workshop and the dispatching history of the two intelligent agents, and sending memory and prediction information to the intelligent agent for next dispatching decision;

s3, establishing a state parameter model, a scheduling action model and a scheduling feedback model of a textile fabric dyeing workshop scheduling environment;

the state parameter model is input parameters of an agent, and comprises task state parameters, batch state parameters and processing state parameters of the dye vat; the scheduling action model is an output scheduling action set of the agent and comprises a batch action set and a cylinder bank action set; the scheduling feedback model is a step feedback model equivalent to a scheduling objective function and is used for mutual discount feedback between the batch intelligent agents and the cylinder bank intelligent agents, so that the total delay time of the intelligent agents for realizing workshop scheduling is minimized;

s4, training a model scheduled in a textile fabric dyeing workshop, wherein the model training comprises model training of two intelligent agents and an information interaction center module, interaction is carried out between a multi-intelligent-agent deep reinforcement learning model and the dyeing workshop, parameter optimization is carried out on a neural network model by using a gradient descent method, and an optimal scheduling strategy for the textile fabric dyeing workshop scheduling is obtained through training;

s5, deploying the dyeing workshop multi-agent deep reinforcement learning model into a real dyeing workshop production management system, and performing batch scheduling decision by the batch agents according to the actual production condition of the dyeing workshop through a mixed driving mechanism of event triggering and time window triggering when scheduling is triggered, wherein the cylinder bank agents perform cylinder bank scheduling decision according to the actual production condition of the dyeing workshop; before the intelligent agent is scheduled, acquiring interaction information with memory and prediction and a real-time state of a workshop from an information interaction center, and after the intelligent agent is scheduled, sending the workshop state before scheduling and a corresponding scheduling decision to the information interaction center; after the dispatching command is issued, firstly checking the legality of the dispatching decision, if so, executing the dispatching decision, updating the state of the workshop, acquiring the evaluation value of a dispatching feedback model, recording the state, the action and the reward, and regularly adjusting the model parameters.

2. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the preprocessing in the S1 comprises the steps of carrying out integer coding and normalization on color numbers in the same color system in the task arrival data according to the depth of colors, carrying out unique hot coding on different color systems and cloth batch numbers, and carrying out normalization on the types of emergency tasks to realize the mathematical expression of the dyeing tasks; and scaling the state parameters related to the time by adopting a scale factor method, so that the order difference of each characteristic dimension is reduced.

3. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: in the step S4, the training of the batched agents is to obtain an optimal batched strategy parameter combination, the training of the cylinder bank agents is to obtain an optimal cylinder bank strategy parameter combination, the parameters of the interaction center are updated synchronously with the parameters of the two agents, the parameters are adjusted according to the evaluation value of the scheduling feedback model when updated, and the gradient is cut to increase the stability of the training.

4. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent are respectively provided with a scheduling strategy module, and mapping from a workshop state to batch or cylinder bank is realized through a deep neural network; the batch scheduling strategy module and the cylinder bank scheduling strategy module carry out sequential scheduling through a dynamic scheduling mechanism, and interact with the dyeing workshop environment and learn scheduling experience to optimize the scheduling strategy.

5. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the batch agent and the cylinder bank agent share a global behavior-value function and a global LSTM network, the LSTM network inputs the historical state and the scheduling record of the workshop for coding and memorizing, the fusion of historical dynamic information is realized, and a one-dimensional matrix is output to provide key workshop dynamic information for the scheduling of the batch agent and the cylinder bank agent;

the input and output of the LSTM network are represented as follows:

m_dc-1＝LSTM(h_dc-2,c_dc-2,[s_dc-1,a_dc-1]；ψ)

wherein the interaction vector m_dc-1Is a one-dimensional vector, is a code of historical plant status records and prediction information, s_dc-1Is an input plant status, a_dc-1Is the scheduling decision at the previous moment, psi is the LSTM network parameter, h_dc-2And c_dc-2Are all internal hidden states of the LSTM;

the fusion of dynamic information of a workshop is realized through the memory and prediction functions of an LSTM network; the interaction between the batched intelligent bodies and the cylinder bank intelligent bodies is realized through the dispatching decision input and the interactive vector output which take the LSTM network as the center, the LSTM network records the workshop state and the dispatching decision at the time of dc-1 dispatching, and outputs a vector m_dc-1The method is provided for an agent making a decision for the dc time, and interaction between a batch agent and an agent for arranging cylinders is realized through an interaction vector;

the observed state of the batched agents comprises a task state f to be batched₁And batch status f₂The observed states of the bank agents include a batch state f₂And the state f of the dye vat₃The state matrix for local observation of the batch of the intelligent agent is designed as follows:

s^B＝[f₁,f₂]

s^S＝[f₂,f₃]

agent A at dc schedulingⁱConsidering local state of workshop during decision making

And dynamic information m_dc-1Scheduling, agent AⁱCan be expressed as

And by having a parameter thetaⁱThe two deep neural networks respectively form a group batch strategy function pi⁰And bank strategy pi¹Carrying out approximation; global state of the workshop is s_dcAgent AⁱThe execution scheduling decision is a_dcThe global behavior-value function, approximated by a deep neural network with the parameter φ, may be represented as V (m)_dc-1,s_dc,a_dc；φ,ψ)。

6. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the state parameter model establishes a state matrix F designed according to the state characteristics related to the workshop scheduling constraint and the optimization target_dyeDescription of the plant status, F_dye＝[f₁,f₂,f₃]Wherein f is₁＝[f_1,1,…,f_1,n]For the status of the task to be batched, f_1,j＝[f_1,j,1,…,f_1,j,8]Representing task J_jThe feature vector of (2); f. of₂＝[f_2,1,…,f_2,b]Is in a batch state, wherein f_2,k＝[f_2,k,1,…,f_2,k,9]Represents batch B_kCharacteristic vector of f₃＝[f_3,1,…,f_3,m]In a dye vat state, wherein f_3,i＝[f_3,i,1,…,f_3,i,6]Indicating dye vat M_iThe feature vector of (2).

7. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling action model comprises a batch scheduling decision space and a cylinder scheduling decision space;

batch scheduling decision space: setting a group batch buffer area with the quantity of q, adding a current task to be batched into a certain group batch buffer area or suspending the group batch by a group batch scheduling decision, wherein a group batch scheduling decision space is defined as:

scheduling decision 1: selecting a kth batch buffer

a is k, wherein k is more than or equal to 0 and less than q

Scheduling decision 2: wait for

a＝q

When a batch scheduling decision is made, a batch buffer area is selected, then the current task to be batched is added into the batch buffer area, and if the task is selected to wait, the task suspends batching; if the batches added by the batch agent to the task are incompatible or exceed the maximum batch capacity, the batch fails, and the result is equal to scheduling decision 2;

cylinder bank scheduling decision space: selecting a batch buffer area, performing equipment matching on the batch, performing dyeing production, and emptying the batch buffer area; if the selection is waiting, the selection is not carried out in any batch; waiting for execution to wait if the equipment matching fails; to reduce the drag, the equipment matching rules are set to select the dye vat with the smallest switching time among the dye vat set meeting the capacity requirement.

8. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: the scheduling feedback model comprises a cylinder scheduling reward function of the batch scheduling reward function; and

batch scheduling reward function:

The opposite number of (c):

wherein

Denotes the time, sw, at which the dc-th batch is performed_jThe flag bit of the cylinder to be batched or arranged is a task;

cylinder bank scheduling reward function:

The opposite of (d);

wherein

9. The multi-agent deep reinforcement learning and scheduling method for the textile fabric dyeing workshop as claimed in claim 1, characterized in that: when scheduling is triggered, the agent firstly observes the dyeing workshop state s, then selects a scheduling decision a in an executable scheduling decision set according to state information, and continuously and circularly executes interaction between the agent and the workshop to advance the processing progress according to the contribution of the scheduling decision to a target and a reward value r, so as to obtain a large amount of scheduling experience data, and the agent updates a model by a data-driven method, thereby realizing scheduling strategy optimization.

10. The multi-agent deep reinforcement learning scheduling method for textile fabric dyeing workshop as claimed in claim 1, wherein the parameter updating method for model training in S4 includes:

step 1, carrying out global updating on an LSTM network, a scheduling policy module and a behavior-value function to realize synchronous optimization of the LSTM network and an intelligent agent;

and 2, carrying out global discount on the reward values of the batch and the cylinder bank, and realizing the correlation and influence between the batch and the cylinder bank:

wherein Q(s)_dc,a_dc) Is in a state s_dcThe global accumulated discount reward value obtained by the lower selection scheduling decision adc; the method comprises the steps of continuously scheduling through a rolling event and rolling time window drive to obtain a large amount of dyeing workshop scheduling interaction data<s,a,r>And storing the parameters until all tasks are completed, updating the parameters by adopting a gradient descent method, and continuously iterating to realize the strategy function optimization from the workshop state to the scheduling decision.