CN114281050B

CN114281050B - Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop

Info

Publication number: CN114281050B
Application number: CN202111650352.8A
Authority: CN
Inventors: 韩忠华; 卞旭升; 常大亮
Original assignee: Shenyang Institute of Automation of CAS; Shenyang Jianzhu University
Current assignee: Shenyang Institute of Automation of CAS; Shenyang Jianzhu University
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2024-06-07
Anticipated expiration: 2041-12-30
Also published as: CN114281050A

Abstract

A Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, relating to the field of production optimization in the process manufacturing industry. A model of the tumbling and ligation process section is established so that the actual conditions of the section can be simulated. Production data of the workshop are acquired, the Q-learning parameters are defined, and a self-growing Q table recording action-time-Q values is generated, with each newly encountered state recorded in the Q table. The method targets the characteristics of a dynamic production line: the start commands of the tumbling pots in the tumbling process are controlled by a Q table obtained through repeated interactive training of Q-learning against the workshop model, thereby solving the production optimization problem of the tumbling and ligation process section in a process manufacturing workshop.

Description

Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop

Technical Field

The invention relates to production optimization in the process manufacturing industry, and in particular aims to solve the production optimization problem of the tumbling (rolling and kneading) and ligation process section in a process manufacturing workshop.

Background

The process manufacturing industry refers to industries in which raw materials undergo a series of processes such as mixing, separation, forming or chemical reaction that change their physical and chemical properties, adding value to the raw materials and yielding products with specific physical and chemical properties and specific uses. Industries such as food, colored building materials and petroleum include plants with typical process manufacturing characteristics. In the food industry, a ham sausage production workshop is a typical process manufacturing workshop: the raw materials are processed in the raw material area into minced product and emulsion, which are tumbled (rolled and kneaded) into ham sausage meat stuffing; the ligation process then turns the stuffing into semi-finished ham sausages, which are sterilized and packaged into finished ham sausages. In the tumbling and ligation process section, the various meat stuffings are tumbled in tumbling pots. The stuffing produced after a tumbling pot has worked for a period of time is temporarily stored in the tumbling pot discharge bin and is transported by an AGV trolley to a ligation line feed bin. In actual production, one ligation line, which comprises several ligation machines, is taken as one production unit. The ligation line draws meat stuffing from its feed bin, the ligation machines fill it into casings, and the casings are sealed with aluminum clips at both ends.
The meat stuffing produced by the tumbling process enters the discharge bin and is transferred to the ligation lines by AGV trolleys running on fixed tracks. To reduce transfer time and simplify AGV dispatching, each AGV trolley serves only its corresponding ligation lines, so a tumbling pot discharge bin cannot feed every ligation line feed bin. Because the ligation lines produce ham sausages of different specifications, the weight processed per unit time differs from line to line, and so does the rate at which each line consumes meat stuffing. Since a tumbling pot produces by the whole pot, if the start times and start order of the tumbling pots are not arranged properly, some tumbling pots will be left with meat stuffing that cannot be fully consumed at the end of production. When this happens, enterprises often transfer the remaining stuffing manually from the tumbling pot discharge bin to the corresponding ligation line feed bin to avoid leftovers, which greatly reduces production efficiency, lowers the automation level of the whole workshop, and seriously affects the production capacity of the enterprise. Therefore, at the production planning stage, the start order and start times of the tumbling pots should be controlled according to the processing speeds and the correspondence between tumbling pot discharge bins and ligation line feed bins, so as to reduce the amount of meat stuffing left in the tumbling pots.
Because ham sausage production is a dynamic process in which the state of the production line changes continuously over time, the production optimization problem of the tumbling and ligation process section is complex, and an effective solution method is needed.

As shown in Fig. 2, a single tumbling pot cannot serve all ligation lines: the tumbling pots work by the whole pot, the ligation lines consume material at different rates, and the tumbling pots and ligation lines are not fully connected. As a result, when production ends, some tumbling pots still hold residual meat stuffing while some ligation lines still have unfinished work tasks.

Drawing up the production plan of the tumbling and ligation process section means providing a reasonable start order and start times for the tumbling equipment, so the production optimization process is one of making intelligent decisions continuously over time according to the working state of the equipment. Reinforcement learning is mainly applied to problems that involve interaction and sequential decision making. It imitates the way humans learn: after an action or decision is executed, a reward is obtained according to its effect, and learning proceeds through continuous interaction with the environment until the goal is reached. Q-learning is one of the most commonly used reinforcement learning algorithms. It uses a Q table to record the states encountered in the environment, the available actions, and the return of each action in each state, which makes it widely applicable and practical. The invention therefore provides a Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop.

Disclosure of Invention

The invention aims to solve the problems of rolling and ligating process sections in a process manufacturing workshop, and provides a production optimization method based on Q learning.

In order to achieve the above purpose, the method for optimizing the rolling and ligating process section production in the flow manufacturing workshop based on Q learning comprises the following steps:

Step 1, a rolling and ligating process section model of a flow manufacturing workshop is built, and simulation production can be carried out on actual rolling and ligating process section conditions of the production workshop.

The rolling and ligating process section model of the process manufacturing workshop comprises the following parameters.

Table 1 Model parameter table

The model constraint relationship is as follows:

(1) Station information constraint

In the tumbling process, the total number of tumbling pots equals the total number of tumbling discharge bins, and each tumbling pot corresponds only to its own discharge bin. The correspondence is as follows.

P_T＝M_T (1)

Here formula (1) states that the total number of tumbling pots equals the total number of tumbling discharge bins. In formula (2), WS_T,iCanOutT_c indicates whether the i-th tumbling pot WS_T,i can discharge into the c-th tumbling discharge bin.

In the ligation process, the total number of ligation lines equals the total number of ligation feed bins, and each ligation line corresponds only to its own feed bin. The correspondence is as follows.

M_L＝P_L (3)

Here formula (3) states that the total number of ligation lines equals the total number of ligation feed bins. In formula (4), WS_T,jCanOutL_r indicates whether the j-th ligation line WS_T,j can draw material from the r-th ligation feed bin.

(2) Production relationship constraints

This constraint describes the production relationship within the tumbling and ligation process section of the process manufacturing workshop: the meat stuffing in each tumbling discharge bin can be fed only to some of the ligation feed bins. The general constraint is as follows.

In the formula, T_cCanInL_r is the flag indicating whether tumbling discharge bin T_c can feed ligation feed bin L_r. When L_r belongs to the set B_Tc, i.e. the set of ligation feed bins connected to the c-th tumbling discharge bin, T_c can feed L_r; otherwise T_c cannot feed L_r.
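
One plausible way to write this membership rule is given below. The formula image is not reproduced in this text, so the 0/1 encoding of the flag is an assumption made here for illustration only.

```latex
T_c\mathrm{CanIn}L_r =
\begin{cases}
1, & L_r \in B_{T_c} \\
0, & L_r \notin B_{T_c}
\end{cases}
```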

(3) Ligature work task constraints

The work task that each ligation line completes per minute depends on the weight of meat stuffing in its corresponding tumbling discharge bin. The relationship is as follows.

Here RWWS_L,r(t+1) is the remaining work task of ligation feed bin r at the next moment. The formula distinguishes three cases. When the weight of meat stuffing remaining in the tumbling discharge bin corresponding to ligation feed bin r is greater than the rate at which r consumes meat stuffing, the remaining task of ligation line r at the next moment is its remaining task at the previous moment minus the ligation speed of line r. When that weight is greater than 0 but less than the consumption rate, the remaining task at the next moment is the remaining task at the previous moment minus the weight of meat stuffing remaining in the tumbling discharge bin. In all other cases, the remaining task of ligation line r at the next moment equals its remaining task at the previous moment.
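
The three cases above can be sketched as a small function. This is a minimal illustration rather than the patented model: the function name `next_remaining_task` and its argument names are hypothetical, one time step is assumed to be one minute, and the boundary case where the stock exactly equals the speed falls under "other cases" exactly as the text is worded.

```python
def next_remaining_task(remaining_task, bin_stock, speed):
    """Remaining work task of one ligation line after one time step.

    remaining_task: task still to do at time t (tons)
    bin_stock: stuffing left in the corresponding tumbling discharge bin (tons)
    speed: rate at which the line consumes stuffing (tons per step)
    """
    if bin_stock > speed:
        # Enough stuffing upstream: the line works at its full ligation speed.
        return remaining_task - speed
    if 0 < bin_stock < speed:
        # Less than one step's worth is left: the line consumes what remains.
        return remaining_task - bin_stock
    # All other cases (empty bin, or stock exactly equal to speed as worded).
    return remaining_task


# Illustrative values: a 30 t task and a speed of 0.0036 t/min.
print(next_remaining_task(30.0, 1.0, 0.0036))    # full-speed case
print(next_remaining_task(30.0, 0.002, 0.0036))  # partial-stock case
print(next_remaining_task(30.0, 0.0, 0.0036))    # empty bin, task unchanged
```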

(4) Ham sausage residual quantity constraint

After production ends, the ham sausage residual quantity of the tumbling and ligation process section equals the sum of the meat stuffing remaining in all tumbling discharge bins, which in turn equals the sum of the remaining work tasks of all ligation lines. The relationship is as follows.

Here MDL is the ham sausage residual quantity; the first summation is the total meat stuffing remaining in the tumbling discharge bins at the end of production, and the second summation is the total remaining work task of the ligation lines at the end of production.
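
As a rough illustration of this balance, the sketch below computes MDL from the bin residues and checks it against the task residues. The function name `residual_quantity`, the tolerance, and the sample numbers are all hypothetical and are not taken from the patent.

```python
import math


def residual_quantity(bin_residuals, task_residuals, tol=1e-9):
    """Return MDL and check that both sides of the balance agree.

    bin_residuals: stuffing left in each tumbling discharge bin (tons)
    task_residuals: unfinished work task of each ligation line (tons)
    """
    mdl_from_bins = sum(bin_residuals)
    mdl_from_tasks = sum(task_residuals)
    if not math.isclose(mdl_from_bins, mdl_from_tasks, abs_tol=tol):
        raise ValueError("bin residue and task residue disagree")
    return mdl_from_bins


# Made-up end-of-production values for three bins and four ligation lines.
mdl = residual_quantity([0.02, 0.0, 0.03], [0.0, 0.05, 0.0, 0.0])
print(mdl)
```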

Step 2, obtain the production data of the process manufacturing workshop, including the numbers of stations M_T and M_L in the tumbling process and the ligation process, the set B_Tc of ligation feed bins to which each tumbling pot discharge bin delivers meat stuffing, the work task amount RWOper_T(0) of each ligation line, and the set VOper_L of rates at which the ligation lines consume meat stuffing.

Step 3, define the Q-learning parameters, namely the state S during production, the action A corresponding to each state, the return R obtained after an action is taken, and the number of iterations.

Step 3.1, define the state S in Q-learning, i.e. the real-time state of the process manufacturing workshop. The invention specifies that the real-time state S(t) consists of the working states of the stations of the tumbling process and the material level states of the tumbling discharge bins, i.e. S(t) = RSOper_L(t) & RMP(t).

Step 3.2, define the action A corresponding to each state, i.e. whether each station of the tumbling process starts.

Step 3.3, define the return R. When the next state is the final state, i.e. production cannot continue, the return is defined by the residual quantity of ham sausage meat stuffing: when the residual quantity equals 0, the largest positive value of R is given; when the residual quantity is greater than 0, the return is computed by the following formula.

R＝-kMDL(k＞0) (8)

Where k is the amplification factor.
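
The terminal return can be sketched as follows. The defaults `k=1000` and a reward of 5000 for zero residue are the values used later in the embodiment. The function name `terminal_return` and the small tolerance used to decide that the residue "equals 0" are assumptions added here.

```python
def terminal_return(mdl, k=1000.0, zero_residual_reward=5000.0, tol=1e-9):
    """Return R at the end of a production run.

    mdl: residual quantity of meat stuffing (tons), must be non-negative
    k: amplification factor from formula (8), k > 0
    """
    if mdl < -tol:
        raise ValueError("residual quantity cannot be negative")
    if abs(mdl) <= tol:
        # No residue: give the fixed positive reward.
        return zero_residual_reward
    # Residue left: penalty proportional to the residue, R = -k * MDL.
    return -k * mdl


print(terminal_return(0.0))   # 5000.0
print(terminal_return(0.05))  # a penalty of about -50
```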

Step 3.4, define the number of iterations, i.e. the number of Q-learning training runs. One iteration is completed each time the agent finishes one production run in the tumbling and ligation process section of the workshop.

Step 4, initialize the Q table: generate a self-growing Q table that records action-time-Q values, and record each new state in the Q table whenever it is encountered.
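
A self-growing Q table of this kind might be held in a dictionary that gains a row the first time a state is seen. The sketch below is one possible layout, not the one used by the inventors: the helper names, the state encoding as a pair of tuples, the `n_T` / `n_F` action labels (borrowed from the embodiment's notation), and the initial Q value of 0 are all assumptions.

```python
def make_actions(n_pots):
    """Action labels such as '1_T' (start pot 1) and '1_F' (do not start it)."""
    return [f"{pot}_{flag}" for pot in range(1, n_pots + 1) for flag in ("T", "F")]


def ensure_state(q_table, state, actions, initial_q=0.0):
    """Add a row for `state` if it has not been seen before, then return it."""
    if state not in q_table:
        q_table[state] = {action: initial_q for action in actions}
    return q_table[state]


q_table = {}
actions = make_actions(3)

# State = (pot working states, discharge bin material states).
s0 = ((0, 0, 0), (0, 0, 0))
ensure_state(q_table, s0, actions)
ensure_state(q_table, s0, actions)                      # known state, no growth
ensure_state(q_table, ((1, 0, 0), (0, 0, 0)), actions)  # new state, table grows
print(len(q_table))  # 2
```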

Step 5, apply Q-learning: initialize the state according to the workshop state S defined in step 3 and generate the corresponding actions, with the time t initialized to 0. Store the initial state S(0) of the production line and its corresponding actions A(0) in the Q table.

Step 6, for the current production line state S(t), use the Q table to select the tumbling station whose start gives the highest return R. If several actions A share the highest return, select one at random from the set formed by those actions to obtain A(t).
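
Greedy selection with random tie-breaking, as described in this step, can be sketched as below. The function name `select_action` and the sample Q values are hypothetical; the row is assumed to be a mapping from action label to Q value.

```python
import random


def select_action(q_row, rng=random):
    """Pick the action with the highest Q value, breaking ties at random."""
    best_q = max(q_row.values())
    best_actions = [action for action, q in q_row.items() if q == best_q]
    return rng.choice(best_actions)


# Two actions tie for the highest value, so either may be returned.
row = {"1_T": 2.0, "2_T": 2.0, "3_T": -1.0}
choice = select_action(row, random.Random(0))
print(choice)
```

Passing a seeded `random.Random` instance, as in the usage line, makes a training run repeatable without changing the selection rule.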

Step 7, interact with the workshop environment: using the established workshop model, simulate production with the action A(t) selected for state S(t) at time t to obtain the state S(t+1) at the next moment.

Step 8, check the working state of the tumbling pots. If all tumbling pots are working in state S(t+1) at time t+1, return to step 7; if there is an idle tumbling pot, go to step 9.

Step 9, determine whether S(t+1) is a termination state, i.e. all equipment in the tumbling process has reached its production task amount and the tumbling discharge bins can no longer feed the ligation process. If S(t+1) is a termination state, assign the feedback value R defined in step 3, increment the iteration count by 1, and go to step 12; if S(t+1) is not a termination state, go to step 10.

Step 10, determine whether the workshop state S(t+1) at time t+1 already exists among the states recorded in the Q table. If not, record S(t+1) in the Q table together with its corresponding actions A(t+1). If so, return to step 6.

Step 11, for the action A(t) selected in step 7, find the largest Q value among the action Q values of S(t+1) and use it as the feedback value R.

Step 12, update the Q value of the action A(t) selected in S(t) using the return R obtained in step 9 or step 11, and increment the time t by 1. The return is fed back according to the following formula:

Here R(S(t), A(t)) is the Q value of the current state-action pair itself, the maximum term is the highest return among all actions in the next state reached by taking the action, and γ is the attenuation (discount) coefficient. If this step was reached from step 9, go to step 13. If this step was reached from step 11, return to step 6.
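
Because the update formula itself is not reproduced in this text, the sketch below shows only one possible reading of steps 9, 11 and 12, and should not be taken as the patented formula. It assumes that a terminal transition writes the terminal return directly, and that a non-terminal transition writes γ times the largest Q value of the next state, which is the standard tabular Q-learning target with a learning rate of 1 and no intermediate reward. The function name `update_q` and the default `gamma=0.9` are hypothetical.

```python
def update_q(q_table, state, action, next_state=None, terminal_return=None, gamma=0.9):
    """Write a new Q value for (state, action) and return it.

    If terminal_return is given (arrival from step 9), it is used directly.
    Otherwise (arrival from step 11) the value is bootstrapped from the best
    Q value of next_state, discounted by gamma.
    """
    if terminal_return is not None:
        target = terminal_return
    else:
        target = gamma * max(q_table[next_state].values())
    q_table[state][action] = target
    return target


# Tiny made-up table with two states and two actions each.
q = {
    "s0": {"1_T": 0.0, "1_F": 0.0},
    "s1": {"1_T": 10.0, "1_F": 2.0},
}
update_q(q, "s0", "1_T", next_state="s1", gamma=0.5)  # 0.5 * 10.0
update_q(q, "s1", "1_F", terminal_return=-50.0)       # terminal penalty
print(q["s0"]["1_T"], q["s1"]["1_F"])  # 5.0 -50.0
```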

Step 13, determine whether the number of iterations has reached the preset value. If not, return to step 5; if so, output the current Q table as the Q-learning production optimization result for the tumbling and ligation process section of the process manufacturing workshop.

The beneficial technical effects of the invention are as follows: with this optimization method, the residual quantity of ham sausage meat stuffing can be kept between 0 and 0.05 tons when scheduling production, which effectively avoids the increased production cost, large quantities of leftover material, and reduced enterprise profit caused by large residues of meat stuffing.

Drawings

Fig. 1 is a flow chart of the method of the present invention.

FIG. 2 is a diagram showing a series relationship between a tumbling process and a ligating process in the prior art.

FIG. 3 is a diagram showing the arrangement of station information in the rolling and ligating process steps according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are to be considered in an illustrative sense only and are not intended to limit the invention.

Examples

A process manufacturing workshop rolling and ligating process section production optimization method based on Q learning,

TABLE 1 model parameter Table

The model constraint relationship is as follows:

(1) Station information constraint

P_T＝M_T (1)

M_L＝P_L (3)

(2) Production relationship constraints

(3) Ligature work task constraints

(4) Ham sausage residual quantity constraint

In this embodiment, the tumbling and ligation process section has the following production characteristics.

As shown in the figure, the tumbling and ligation process section has three tumbling pots and four ligation lines, i.e. M_T = 3 and M_L = 4. Each tumbling pot corresponds to one tumbling discharge bin and each ligation line corresponds to one ligation feed bin, i.e. P_T = 3 and P_L = 4. Any tumbling discharge bin T_c can feed only two ligation feed bins, i.e. N_Tc = 2 for 0 < c ≤ M_T, and those two ligation feed bins are adjacent: B_T1 = [L1, L2], B_T2 = [L2, L3], B_T3 = [L3, L4].
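
The connectivity of this embodiment can be written down directly as data. In the sketch below, the sets are the ones listed above, while the dictionary layout, the helper name `can_feed`, and the 0/1 return value are assumptions made for illustration.

```python
# Ligation feed bins reachable from each tumbling discharge bin (from the embodiment).
B_T = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}}


def can_feed(c, r, connectivity=B_T):
    """Return 1 if discharge bin T_c can feed ligation feed bin L_r, else 0."""
    return 1 if r in connectivity.get(c, set()) else 0


# Invert the relation: which discharge bins can supply each ligation feed bin?
feeders = {r: sorted(c for c in B_T if can_feed(c, r)) for r in range(1, 5)}
print(feeders)  # {1: [1], 2: [1, 2], 3: [2, 3], 4: [3]}
```

The inverted view shows that lines 1 and 4 each depend on a single tumbling pot, which is why the start order of the pots matters for the residue.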

In this embodiment, the day plan task and task allocation are shown in the following table.

Table 2 day planning task parameters

Ham sausage specification (unit weight)	Planned task amount (tons)	Ligation speed (tons/min)	Task allocation
38 g	30	0.0036	Ligation line 1
40 g	24	0.0038	Ligation line 2
50 g	36	0.0046	Ligation line 3
60 g	24	0.0057	Ligation line 4

This gives RWOper_T(0) = [30, 24, 36, 24] and VOper_L = [0.0036, 0.0038, 0.0046, 0.0057].
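
A quick calculation on these two vectors shows how unevenly the lines would finish. The sketch assumes each line runs uninterrupted at its nominal speed, which the real section need not satisfy because feeding depends on the tumbling pots; the variable names are hypothetical.

```python
tasks = [30, 24, 36, 24]                   # RWOper_T(0), tons
speeds = [0.0036, 0.0038, 0.0046, 0.0057]  # VOper_L, tons per minute

total_task = sum(tasks)
# Minutes each line would need if it never waited for material.
minutes_needed = [task / speed for task, speed in zip(tasks, speeds)]

for line, minutes in enumerate(minutes_needed, start=1):
    print(f"Ligation line {line}: about {minutes:.0f} min of uninterrupted work")
print("Total planned task:", total_task, "tons")
```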

Step 3.3, define the return R. When the next state is the final state, i.e. production cannot continue, the return is defined by the residual quantity of ham sausage meat stuffing: when the residual quantity equals 0, R = 5000; when the residual quantity is greater than 0, the return is computed by the following formula.

R＝-1000MDL (k＝1000) (8)

Step 3.4, define the number of iterations as 50, i.e. the number of Q-learning training runs. One iteration is completed each time the agent finishes one production run in the tumbling and ligation process section of the workshop.

Step 4, initialize the Q table: generate a self-growing Q table that records action-time-Q values, and record each new state in the Q table whenever it is encountered. The following table is the initialized Q table.

Here RSOper_L(t) is the start state of each station in the tumbling process, where 0 means idle and 1 means started. RMP(t) is the state of each tumbling pot discharge bin, where 0 means no material and 1 means material present. In an action, the number is the tumbling pot number, T means start the pot, and F means do not start it. For example, 1_T means starting tumbling pot 1 in the current state.

Step 5, initialize the state according to the workshop state S defined in step 3 and generate the corresponding actions, with the time t initialized to 0. Store the initial state S(0) of the production line and its corresponding actions A(0) in the Q table.

Here R(S(t), A(t)) is the Q value of the current state-action pair itself, and the maximum term is the highest return among all actions in the next state reached by taking the action. If this step was reached from step 9, go to step 13. If this step was reached from step 11, return to step 6.

Finally, according to the Q learning-based rolling and ligating process section production optimization method in the process manufacturing workshop, a converged Q table is obtained, and the result is as follows:

Table: partial results of the Q table for the Q-learning-based multi-series production optimization problem of the ham sausage tumbling and ligation process section

As the Q table shows, in this embodiment, when all tumbling pots are idle and all tumbling pot discharge bins are empty, tumbling pot 1 should be started; when only tumbling pot 2 is idle and only tumbling discharge bin 2 holds material, tumbling pot 2 should be started. The Q table records in detail the return of every state the workshop encounters during production and of each corresponding action. When Q-learning is used to optimize the multi-series production problem of the tumbling and ligation process section, it suffices to query the Q table with the current production state of the workshop: find the equivalent production line state, i.e. the one whose tumbling pot state set and tumbling pot discharge bin state set match element by element, then take the action command with the highest feedback value among the corresponding tumbling pot commands as the start command for the current tumbling process. Repeatedly querying the state and selecting the optimal action ensures that every action works toward reducing the residual meat stuffing, so that the residual quantity in the final result is minimized. With the solution provided by the invention, the residual quantity of ham sausage meat stuffing can be kept between 0 and 0.05 tons when scheduling production, which effectively avoids the increased production cost, large quantities of leftover material, and reduced enterprise profit caused by large residues of meat stuffing.
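
The lookup described in this paragraph can be sketched as a dictionary query followed by an arg-max. The Q values in the sample table are invented for illustration and are not results from the patent; the function name `lookup_start_command` and the state encoding are likewise hypothetical.

```python
def lookup_start_command(q_table, pot_states, bin_states):
    """Return the action with the highest Q value for the matching state.

    A state matches when both the pot state set and the discharge bin state
    set are equal element by element. Returns None for an unseen state.
    """
    state = (tuple(pot_states), tuple(bin_states))
    row = q_table.get(state)
    if row is None:
        return None
    return max(row, key=row.get)


# Made-up Q values for the all-idle, all-empty state.
trained_q = {
    ((0, 0, 0), (0, 0, 0)): {"1_T": 12.0, "2_T": 3.0, "3_T": -5.0},
}
print(lookup_start_command(trained_q, [0, 0, 0], [0, 0, 0]))  # 1_T
print(lookup_start_command(trained_q, [1, 1, 1], [0, 0, 0]))  # None
```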

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can be modified, or some or all of their technical features can be replaced with equivalents, and that such modifications and substitutions do not depart from the spirit of the invention, which is defined by the following claims.

Claims

1. A Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, characterized by comprising the following steps:

Step 1, establishing a rolling and binding procedure section model of a flow manufacturing workshop, and simulating the actual rolling and binding procedure section condition of a production workshop;

step 2, obtaining production data of a flow manufacturing workshop;

step 3, defining Q learning parameters;

step 4, initializing a Q table, generating a self-growing Q table for recording action-time-Q values, and recording a new state in the Q table every time the new state is encountered;

step 5, initializing a state according to the process manufacturing workshop state S defined in the step 3, and generating corresponding actions, wherein the initialization time t is 0;

step 6, selecting which station of the rolling process starts to give a high return R according to the Q table for the current production line state S (t), and if a plurality of actions A exist in the highest value of the return, randomly selecting the actions A from action sets constructed by the actions A to obtain A (t);

Step 7, accessing the flow workshop environment, namely simulating production, according to the action A (t) selected by the corresponding state S (t) at the moment t through the established flow workshop model to obtain the state S (t+1) at the next moment;

step 8, judging the working state of the rolling pot, and if the rolling pot is in the working state in the state S (t+1) at the moment t+1, returning to the step 7; if the idle rolling pot exists, jumping to the step 9;

Step 9, judging whether the state S (t+1) is a termination state, namely, all equipment in the rolling and kneading process reach the production task quantity, and a discharging bin in the rolling and kneading process cannot feed materials to the ligation process, if the state S (t+1) is the termination state, giving a feedback value R defined in the step 3, recording and adding 1 to the iteration times, and jumping to the step 12; if the state S (t+1) is not the termination state, jumping to the step 10;

Step 10, judging whether the state S (t+1) of the process workshop at the time of t+1 exists in the state recorded by the Q table, if not, recording the state S (t+1) in the Q table and storing the corresponding action A (t+1), and if so, returning to the step 6;

Step 11, according to the action A (t) selected in the step 7, finding the largest Q value in the corresponding action Q values according to S (t+1) to be used as a feedback value R;

step 12, updating the Q value corresponding to the action A (t) selected by the S (t) according to the return R obtained in the step 9 or the step 11, and recording the time t plus 1, wherein the return is updated in a feedback way according to the following formula;

wherein R(S(t), A(t)) is the Q value of the current state-action pair itself, and the maximum term is the highest return among all actions in the next state reached by taking the action; if this step was reached from step 9, go to step 13, and if this step was reached from step 11, return to step 6;

And step 13, judging whether the iteration times reach a preset value, if not, returning to the step 5, and if so, outputting the Q table at the moment as a production optimization result of the rolling and ligating process section of the Q learning process manufacturing workshop.

2. The Q-learning-based process manufacturing shop roll ligation process segment production optimization method is characterized by comprising the following steps of: the rolling and ligating process section model of the flow manufacturing workshop is in model constraint relation, and comprises station information constraint, production relation constraint, ligature work task constraint and ham sausage residual quantity constraint.

3. The Q-learning-based process manufacturing shop roll ligation process segment production optimization method is characterized by comprising the following steps of: the station information constraint is that in the rolling procedure, the total number of rolling pots is equal to the total number of rolling discharge bins, each rolling pot only corresponds to the respective rolling pot discharge bin, and the corresponding relation is as follows:

P_T＝M_T (1)

Wherein formula (1) represents that the total number of tumbling pots is equal to the total number of tumbling discharge bins, and in formula (2), WS_T,iCanOutT_c represents whether the i-th tumbling pot WS_T,i can discharge into the c-th tumbling discharge bin;

in the ligation process, the total number of ligatures is equal to the total number of ligature feeding bins, each ligature only corresponds to the ligature feeding bin, and the corresponding relation is as follows;

M_L＝P_L (3)

Wherein formula (3) represents that the total number of ligation lines is equal to the total number of ligation feed bins, and in formula (4), WS_T,jCanOutL_r represents whether the j-th ligation line WS_T,j can draw material from the r-th ligation feed bin.

4. The Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, characterized in that: the production relationship constraint is the constraint on the production relationship within the tumbling and ligation process section of the process manufacturing workshop, namely that the ham sausage meat stuffing in each tumbling discharge bin can be fed only to some of the ligation feed bins, and the general constraint is as follows:

5. The Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, characterized in that: the ligation line work task constraint relates the work task completed by each ligation line per minute to the weight of ham sausage meat stuffing in its corresponding tumbling discharge bin, and the relationship is as follows:

Wherein RWWS_L,r(t+1) represents the remaining work task of ligation feed bin r at the next moment, and the formula means that when the weight of meat stuffing remaining in the tumbling discharge bin corresponding to ligation feed bin r is greater than the rate at which ligation feed bin r consumes meat stuffing, the remaining task of ligation line r at the next moment is its remaining task at the previous moment minus the ligation speed of line r; when that weight is greater than 0 and less than the rate at which ligation feed bin r consumes meat stuffing, the remaining task of ligation line r at the next moment is its remaining task at the previous moment minus the weight of meat stuffing remaining in the tumbling discharge bin; in other cases, the remaining task of ligation line r at the next moment is equal to its remaining task at the previous moment.

6. The Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, characterized in that: the ham sausage residual quantity constraint is that, after production ends, the ham sausage residual quantity of the tumbling and ligation process section equals the sum of the meat stuffing remaining in all tumbling discharge bins and equals the sum of the remaining work tasks of all ligation lines, and the relationship is as follows:

wherein MDL is the ham sausage residual quantity, the first summation represents the sum of the meat stuffing remaining in each tumbling discharge bin at the end of production, and the second summation represents the sum of the remaining work tasks of each ligation line at the end of production.

7. The Q-learning-based process manufacturing shop roll ligation process segment production optimization method is characterized by comprising the following steps of: the Q learning parameters comprise a state S in production, an action A corresponding to the state, a return R after the action is made and the iteration times, and the steps are as follows:

Step 3.1, defining the state S in Q learning, namely the real-time state of the process manufacturing workshop, wherein the real-time state S(t) of the process manufacturing workshop consists of the working states of the stations of the tumbling process and the material level states of the tumbling discharge bins, i.e. S(t) = RSOper_L(t) & RMP(t);

step 3.2, defining an action A corresponding to each state, namely whether each station of the rolling process starts;

step 3.3 defining a return R, wherein the next state is the final state, i.e. the return is defined according to the remaining amount of the ham sausage meat stuffing when the production cannot be continued, and when the remaining amount of the ham sausage meat stuffing is equal to 0, r=5000; when the residual amount of the ham sausage meat stuffing is more than 0, the feedback is updated according to the following formula;

R＝-1000MDL (k＝1000) (8)

step 3.4 defines the iteration number as 50, namely the number of Q learning training, and the iteration is one after the intelligent agent is produced once in the rolling and binding working procedure section of the flow workshop.