CN114281050B - Q learning-based process manufacturing workshop rolling and binding process section production optimization method - Google Patents


Info

Publication number
CN114281050B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111650352.8A
Other languages
Chinese (zh)
Other versions
CN114281050A (en)
Inventor
韩忠华
卞旭升
常大亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Shenyang Jianzhu University
Original Assignee
Shenyang Institute of Automation of CAS
Shenyang Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS, Shenyang Jianzhu University filed Critical Shenyang Institute of Automation of CAS
Priority to CN202111650352.8A priority Critical patent/CN114281050B/en
Publication of CN114281050A publication Critical patent/CN114281050A/en
Application granted granted Critical
Publication of CN114281050B publication Critical patent/CN114281050B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Meat, Egg Or Seafood Products (AREA)

Abstract

A Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop, relating to the field of production optimization in the process manufacturing industry. A model of the tumbling and ligation process section is established so that the actual conditions of that section can be simulated. Production data of the workshop are acquired, the Q-learning parameters are defined, and a self-growing Q table recording action-time-Q values is generated, with each newly encountered state recorded in the Q table. The method is designed for the characteristics of a dynamic production line: through repeated interactive training against the workshop model, Q learning produces a Q table that controls the start commands of the tumbling pots in the tumbling process, thereby solving the production optimization problem of the tumbling and ligation process section in a process manufacturing workshop.

Description

Q learning-based process manufacturing workshop rolling and binding process section production optimization method
Technical Field
The invention relates to production optimization in the process manufacturing industry, and aims to solve the production optimization problem of the tumbling and ligation process section in a process manufacturing workshop.
Background
The process manufacturing industry refers to industries in which raw materials undergo a series of processes such as mixing, separation, forming, or chemical reaction that change their physical and chemical properties, adding value and yielding products with specific physical and chemical properties and specific uses. Industries such as food, colored building materials, and petroleum include plants with typical process manufacturing characteristics. In the food industry, a ham sausage production workshop is a typical process manufacturing workshop: raw materials are processed in the raw material area into a minced product and an emulsion, these are tumbled to produce ham sausage meat stuffing, the stuffing is made into semi-finished ham sausage in the ligation process, and the semi-finished product is sterilized and packaged to produce finished ham sausage. In the tumbling and ligation process section, the various ham sausage materials are tumbled in tumbling pots. After a pot has worked for a period of time, the meat stuffing it produces is temporarily stored in the pot's discharging bin and then transported by an AGV trolley to a ligation line feeding bin. In actual production, one ligation line is treated as a production unit and comprises several ligation machines. The ligation line draws meat stuffing from its feeding bin, the ligation machines fill the stuffing into casings, and both ends of each casing are sealed with aluminum clips.
The meat stuffing produced by the tumbling process enters a discharging bin and is transferred to the ligation lines by AGV trolleys running on fixed tracks. To reduce transfer time and simplify AGV dispatching, each AGV trolley serves only its assigned ligation lines, so a given tumbling pot discharging bin cannot feed every ligation line feeding bin. Because the ham sausage specifications and models produced on the ligation lines differ, the weight processed per unit time differs from line to line, and the lines consume meat stuffing at different rates. Because a tumbling pot produces a whole pot at a time, if the start times and start order of the tumbling pots are not arranged reasonably, some pots will hold meat stuffing that cannot be fully consumed when production ends. When this happens, enterprises usually avoid leftover stuffing by manually transferring the remaining stuffing from the tumbling pot discharging bin to a suitable ligation line feeding bin, which greatly reduces production efficiency, lowers the automation level of the whole workshop, and seriously affects production capacity. Therefore, at the production planning stage, the start order and start times of the tumbling pots should be controlled according to the processing speeds and the correspondence between tumbling pot discharging bins and ligation line feeding bins, so as to reduce the residual amount of meat stuffing in the tumbling pots.
Because ham sausage production is a dynamic process in which the state of the production line changes continuously over time, the production optimization problem of the tumbling and ligation process section is complex, and an effective solution method is needed.
As shown in Fig. 2, a single tumbling pot cannot serve all ligation lines: the tumbling pots work in whole-pot batches, the ligation lines consume material at different speeds, and the pots and ligation lines are not fully connected. As a result, when production ends, some tumbling pots still hold residual meat stuffing while some ligation lines still have unfinished production tasks.
Compiling the production plan of the tumbling and ligation process section means providing a reasonable start order and start times for the tumbling equipment; this production optimization is a process of making intelligent decisions continuously over time according to the working state of the equipment. Reinforcement learning is applied mainly to problems involving interaction and sequential decision making. It imitates the way humans learn: after an action or decision is executed, a reward is obtained according to its effect, and learning proceeds through continued interaction with the environment until the goal is reached. Q learning is among the most commonly used reinforcement learning algorithms. It uses a Q table to record the states encountered in the environment, the range of actions, and the return of each action in each state, which makes it highly general and practical. The invention therefore provides a Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop.
Disclosure of Invention
The invention aims to solve the production optimization problem of the tumbling and ligation process section in a process manufacturing workshop, and provides a production optimization method based on Q learning.
To achieve this purpose, the Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop comprises the following steps:
Step 1: Establish a model of the tumbling and ligation process section of the process manufacturing workshop, so that the actual conditions of that section can be simulated.
The model of the tumbling and ligation process section comprises the following parameters.
Table 1 Model parameter table
The model constraint relationship is as follows:
(1) Station information constraint
In the tumbling process, the total number of tumbling pots equals the total number of tumbling discharging bins, and each tumbling pot corresponds only to its own discharging bin. The correspondence is as follows.
$P_T = M_T$ (1)
Formula (1) states that the total number of tumbling pots equals the total number of tumbling discharging bins. In formula (2), $WS_{T,i}CanOutT_c$ indicates whether the i-th tumbling pot $WS_{T,i}$ can discharge into the c-th tumbling discharging bin.
In the ligation process, the total number of ligation lines equals the total number of ligation line feeding bins, and each ligation line corresponds only to its own feeding bin. The correspondence is as follows.
$M_L = P_L$ (3)
Formula (3) states that the total number of ligation lines equals the total number of ligation line feeding bins. In formula (4), $WS_{T,j}CanOutL_r$ indicates whether the j-th ligation line $WS_{T,j}$ can draw material from the r-th ligation line feeding bin.
(2) Production relationship constraints
In the tumbling and ligation process section of the process manufacturing workshop, the meat stuffing in each tumbling discharging bin can be fed only to a subset of the ligation line feeding bins. The general constraint is as follows.
In the formula, $T_cCanInL_r$ is the flag indicating whether tumbling discharging bin $T_c$ can feed ligation line feeding bin $L_r$. When $L_r$ belongs to the set $B_{T_c}$, the set of ligation line feeding bins connected to the c-th tumbling discharging bin, $T_c$ can feed $L_r$; otherwise it cannot.
(3) Ligature work task constraints
The work each ligation line completes in each minute depends on the weight of meat stuffing remaining in its corresponding tumbling discharging bin. The relationship is as follows.
Here $RW_{WS_{L,r}}(t+1)$ is the remaining work task of ligation line r at the next moment. The formula distinguishes three cases. When the weight of meat stuffing remaining in the tumbling discharging bin corresponding to ligation line feeding bin r is greater than the speed at which the line consumes stuffing, the remaining task at the next moment is the remaining task at the previous moment minus the ligation speed of line r. When that weight is greater than 0 but less than the consumption speed, the remaining task at the next moment is the remaining task at the previous moment minus the weight of stuffing remaining in the discharging bin. In all other cases, the remaining task at the next moment equals the remaining task at the previous moment.
(4) Ham sausage residual quantity constraint
After production ends, the total meat stuffing remaining in the tumbling discharging bins of the tumbling and ligation process section equals the total work task remaining on the ligation lines. The relationship is as follows.
Here MDL is the residual amount of ham sausage meat stuffing; the first sum in the formula is the total meat stuffing remaining in all tumbling discharging bins at the end of production, and the second sum is the total work task remaining on all ligation lines at the end of production.
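The three cases of constraint (3) can be sketched in Python. This is a minimal illustration of the verbal description only, not the patent's formula (6); the function and parameter names are hypothetical, one call is assumed to represent one minute, and the boundary case in which the bin holds exactly one minute of consumption is assigned to the first case as an assumption.

```python
def next_remaining_task(remaining_task, bin_weight, speed):
    """Remaining work task (tons) of one ligation line after one time step.

    remaining_task: task left on the line at the previous moment (tons)
    bin_weight:     meat stuffing left in the corresponding discharging bin (tons)
    speed:          stuffing the line consumes per time step (tons)
    """
    if bin_weight >= speed:
        # Enough stuffing for a full step (the == case is an assumption).
        return remaining_task - speed
    if 0 < bin_weight < speed:
        # Only the stuffing left in the bin can be processed.
        return remaining_task - bin_weight
    # No stuffing available: the task does not advance.
    return remaining_task


print(next_remaining_task(30.0, 0.0, 0.0036))  # 30.0
```

The function deliberately ignores how the bin weight itself changes, since that bookkeeping is part of the workshop model rather than of this constraint.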
Step 2: Obtain production data of the process manufacturing workshop, including the numbers of stations $M_T$ and $M_L$ in the tumbling process and the ligation process, the set $B_{T_c}$ of ligation line feeding bins to which each tumbling pot discharging bin delivers meat stuffing, the work task amount $RWOper_T(0)$ of each ligation line, and the set $VOper_L$ of speeds at which each ligation line consumes meat stuffing.
Step 3: Define the Q-learning parameters: the state S during production, the action A corresponding to each state, the return R after an action is taken, and the number of iterations.
Step 3.1: Define the state S in Q learning, that is, the real-time state of the process manufacturing workshop. In the invention, the real-time state S(t) consists of the working states of the stations of the tumbling process and the material level states of the tumbling discharging bins, i.e. $S(t) = RSOper_L(t)\ \&\ RMP(t)$.
Step 3.2: Define the action A corresponding to each state, that is, whether each station of the tumbling process starts.
Step 3.3: Define the return R. When the next state is a terminal state, that is, when production cannot continue, the return is defined by the residual amount of meat stuffing: when the residual amount equals 0, the maximum positive R value is given; when the residual amount is greater than 0, the return is given by the following formula.
$R = -k \cdot MDL \quad (k > 0)$ (8)
where k is the amplification factor.
Step 3.4: Define the number of iterations, that is, the number of Q-learning training runs. Each time the agent completes one production run in the tumbling and ligation process section, one iteration is counted.
Step 4: Initialize the Q table. A self-growing Q table recording action-time-Q values is generated, and each time a new state is encountered it is recorded in the Q table.
Step 5: Apply Q learning. Initialize the state according to the workshop state S defined in step 3 and generate the corresponding actions, with the time t initialized to 0. Store the initial production line state S(0) and its corresponding actions A(0) in the Q table.
Step 6: For the current production line state S(t), use the Q table to select which station of the tumbling process to start so as to obtain the highest return R. If several actions share the highest return, select one at random from the set formed by those actions to obtain A(t).
Step 7: Access the workshop environment, that is, simulate production: apply the action A(t) selected for state S(t) at time t to the established workshop model to obtain the state S(t+1) at the next moment.
Step 8: Check the working state of the tumbling pots. If in state S(t+1) at time t+1 the tumbling pots are all working, return to step 7; if there is an idle tumbling pot, go to step 9.
Step 9: Determine whether S(t+1) is a terminal state, that is, whether all equipment in the tumbling process has reached its production task amount and the tumbling discharging bins can no longer feed the ligation process. If S(t+1) is a terminal state, assign the feedback value R defined in step 3, increment the iteration count by 1, and go to step 12; otherwise go to step 10.
Step 10: Determine whether the workshop state S(t+1) at time t+1 already exists among the states recorded in the Q table. If not, record S(t+1) in the Q table together with its corresponding actions A(t+1). If so, return to step 6.
Step 11: For the action A(t) executed in step 7, find the largest Q value among the actions recorded for S(t+1) and take it as the feedback value R.
Step 12: Update the Q value of the action A(t) selected in S(t) according to the return R obtained in step 9 or step 11, and increment the time t by 1. The update follows the formula below.
where R(S(t), A(t)) is the Q value of the current state-action pair itself, the maximum term is the highest return among all actions available in the next state reached after taking the action, and γ is the attenuation coefficient. If this step was reached from step 9, go to step 13; if it was reached from step 11, return to step 6.
Step 13: Determine whether the number of iterations has reached the preset value. If not, return to step 5; if so, output the Q table as the Q-learning production optimization result for the tumbling and ligation process section of the process manufacturing workshop.
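Steps 4 and 6 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the class name `GrowingQTable`, its method names, the initial Q value of 0.0, and the action labels are all hypothetical, and states are assumed to be hashable values such as tuples.

```python
import random


class GrowingQTable:
    """Q table that gains a row the first time a state is encountered (step 4)."""

    def __init__(self, initial_q=0.0):
        # initial_q is an assumed starting value; the source does not state one.
        self.initial_q = initial_q
        self.rows = {}  # state -> {action: Q value}

    def ensure_state(self, state, actions):
        """Record `state` with its available actions if it is new; return its row."""
        if state not in self.rows:
            self.rows[state] = {action: self.initial_q for action in actions}
        return self.rows[state]

    def select_action(self, state, rng=random):
        """Pick the action with the highest Q value, breaking ties at random (step 6)."""
        row = self.rows[state]
        best = max(row.values())
        candidates = [action for action, q in row.items() if q == best]
        return rng.choice(candidates)


table = GrowingQTable()
state = ((0, 0), (0, 0))  # hypothetical state: (pot states, bin states)
row = table.ensure_state(state, ["start_pot_1", "start_pot_2"])
row["start_pot_2"] = 1.5  # pretend training has raised this Q value
print(table.select_action(state))  # start_pot_2
```

A dictionary keyed by state keeps the table only as large as the set of states actually visited, which matches the self-growing behaviour described in step 4. Selection here is greedy apart from the random tie-break, as step 6 describes.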
The beneficial technical effects of the invention are as follows: with this optimization method, the residual amount of meat stuffing can be kept between 0 and 0.05 ton when scheduling production, which effectively avoids the increased production cost, large quantities of leftover material, and reduced enterprise profit caused by large amounts of residual meat stuffing.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram showing a series relationship between a tumbling process and a ligating process in the prior art.
FIG. 3 is a diagram showing the arrangement of station information in the rolling and ligating process steps according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The specific embodiments described herein are to be considered in an illustrative sense only and are not intended to limit the invention.
Examples
A Q-learning-based production optimization method for the tumbling and ligation process section of a process manufacturing workshop comprises the following steps.
Step 1: Establish a model of the tumbling and ligation process section of the process manufacturing workshop, so that the actual conditions of that section can be simulated.
The model of the tumbling and ligation process section comprises the following parameters.
Table 1 Model parameter table
The model constraint relationship is as follows:
(1) Station information constraint
In the tumbling process, the total number of tumbling pots equals the total number of tumbling discharging bins, and each tumbling pot corresponds only to its own discharging bin. The correspondence is as follows.
$P_T = M_T$ (1)
Formula (1) states that the total number of tumbling pots equals the total number of tumbling discharging bins. In formula (2), $WS_{T,i}CanOutT_c$ indicates whether the i-th tumbling pot $WS_{T,i}$ can discharge into the c-th tumbling discharging bin.
In the ligation process, the total number of ligation lines equals the total number of ligation line feeding bins, and each ligation line corresponds only to its own feeding bin. The correspondence is as follows.
$M_L = P_L$ (3)
Formula (3) states that the total number of ligation lines equals the total number of ligation line feeding bins. In formula (4), $WS_{T,j}CanOutL_r$ indicates whether the j-th ligation line $WS_{T,j}$ can draw material from the r-th ligation line feeding bin.
(2) Production relationship constraints
In the tumbling and ligation process section of the process manufacturing workshop, the meat stuffing in each tumbling discharging bin can be fed only to a subset of the ligation line feeding bins. The general constraint is as follows.
In the formula, $T_cCanInL_r$ is the flag indicating whether tumbling discharging bin $T_c$ can feed ligation line feeding bin $L_r$. When $L_r$ belongs to the set $B_{T_c}$, the set of ligation line feeding bins connected to the c-th tumbling discharging bin, $T_c$ can feed $L_r$; otherwise it cannot.
(3) Ligature work task constraints
The work each ligation line completes in each minute depends on the weight of meat stuffing remaining in its corresponding tumbling discharging bin. The relationship is as follows.
Here $RW_{WS_{L,r}}(t+1)$ is the remaining work task of ligation line r at the next moment. The formula distinguishes three cases. When the weight of meat stuffing remaining in the tumbling discharging bin corresponding to ligation line feeding bin r is greater than the speed at which the line consumes stuffing, the remaining task at the next moment is the remaining task at the previous moment minus the ligation speed of line r. When that weight is greater than 0 but less than the consumption speed, the remaining task at the next moment is the remaining task at the previous moment minus the weight of stuffing remaining in the discharging bin. In all other cases, the remaining task at the next moment equals the remaining task at the previous moment.
(4) Ham sausage residual quantity constraint
After production ends, the total meat stuffing remaining in the tumbling discharging bins of the tumbling and ligation process section equals the total work task remaining on the ligation lines. The relationship is as follows.
Here MDL is the residual amount of ham sausage meat stuffing; the first sum in the formula is the total meat stuffing remaining in all tumbling discharging bins at the end of production, and the second sum is the total work task remaining on all ligation lines at the end of production.
Step 2: Obtain production data of the process manufacturing workshop, including the numbers of stations $M_T$ and $M_L$ in the tumbling process and the ligation process, the set $B_{T_c}$ of ligation line feeding bins to which each tumbling pot discharging bin delivers meat stuffing, the work task amount $RWOper_T(0)$ of each ligation line, and the set $VOper_L$ of speeds at which each ligation line consumes meat stuffing.
In this embodiment, the tumbling and ligation process section has the following production characteristics.
As shown in the figure, the tumbling and ligation process section has three tumbling pots and four ligation lines, i.e. $M_T = 3$, $M_L = 4$. Each tumbling pot corresponds to one tumbling discharging bin and each ligation line corresponds to one ligation line feeding bin, i.e. $P_T = 3$ and $P_L = 4$. Any tumbling discharging bin $T_c$ can feed only two ligation line feeding bins, i.e. $N_{T_c} = 2$ for $0 < c \le M_T$, and only two adjacent feeding bins are connected, i.e. $B_{T_1} = [L_1, L_2]$, $B_{T_2} = [L_2, L_3]$, $B_{T_3} = [L_3, L_4]$.
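The feeding relationships of this embodiment can be written down as a small lookup structure. The sketch below is only an illustration: the names `B_T`, `can_feed`, and `suppliers` are hypothetical, and bins are identified by their integer indices c and r.

```python
# Feeding sets from the embodiment: discharging bin c -> ligation line feeding bins.
B_T = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}}


def can_feed(c, r, feed_sets=B_T):
    """Return 1 if discharging bin T_c can feed ligation line feeding bin L_r, else 0."""
    return 1 if r in feed_sets.get(c, set()) else 0


def suppliers(r, feed_sets=B_T):
    """List the discharging bins that can feed ligation line feeding bin L_r."""
    return sorted(c for c, bins in feed_sets.items() if r in bins)


for r in range(1, 5):
    print(f"L{r} is fed by bins {suppliers(r)}")
```

With these sets, feeding bins L1 and L4 each depend on a single discharging bin, while L2 and L3 are each shared by two, which is the partial connectivity that makes the start order of the pots matter.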
In this embodiment, the daily planned tasks and their allocation are shown in the following table.
Table 2 Daily planned task parameters
Ham sausage specification (unit weight) | Planned task amount (ton) | Ligation speed (ton/min) | Task allocation
38 g | 30 | 0.0036 | Ligation line 1
40 g | 24 | 0.0038 | Ligation line 2
50 g | 36 | 0.0046 | Ligation line 3
60 g | 24 | 0.0057 | Ligation line 4
This gives $RWOper_T(0) = [30, 24, 36, 24]$ and $VOper_L = [0.0036, 0.0038, 0.0046, 0.0057]$.
Step 3: Define the Q-learning parameters: the state S during production, the action A corresponding to each state, the return R after an action is taken, and the number of iterations.
Step 3.1: Define the state S in Q learning, that is, the real-time state of the process manufacturing workshop. In the invention, the real-time state S(t) consists of the working states of the stations of the tumbling process and the material level states of the tumbling discharging bins, i.e. $S(t) = RSOper_L(t)\ \&\ RMP(t)$.
Step 3.2: Define the action A corresponding to each state, that is, whether each station of the tumbling process starts.
Step 3.3: Define the return R. When the next state is a terminal state, that is, when production cannot continue, the return is defined by the residual amount of meat stuffing: when the residual amount equals 0, R = 5000; when the residual amount is greater than 0, the return is given by the following formula.
$R = -1000 \cdot MDL$ (8), that is, k = 1000.
Step 3.4: Set the number of iterations, that is, the number of Q-learning training runs, to 50. Each time the agent completes one production run in the tumbling and ligation process section, one iteration is counted.
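The return defined in step 3.3 of this embodiment can be sketched as a small function. The function name and parameter names are hypothetical; the values 5000 and 1000 are the ones given above, and comparing the residual amount with exactly zero is a simplification that a real simulation might replace with a tolerance.

```python
def terminal_return(mdl_ton, zero_reward=5000.0, k=1000.0):
    """Return R at a terminal state, given the residual meat stuffing MDL in tons."""
    if mdl_ton < 0:
        raise ValueError("residual amount cannot be negative")
    if mdl_ton == 0:
        # No stuffing left over: the fixed positive return.
        return zero_reward
    # Stuffing left over: penalty proportional to the residual amount, R = -k * MDL.
    return -k * mdl_ton


print(terminal_return(0.0))  # 5000.0
print(terminal_return(2.0))  # -2000.0
```

Under these values, a residual of 2 tons gives a return of -2000, so the penalty grows linearly with the amount left over while a perfectly emptied section earns the fixed positive return.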
Step 4: Initialize the Q table. A self-growing Q table recording action-time-Q values is generated, and each time a new state is encountered it is recorded in the Q table. The following table is the initialized Q table.
Here $RSOper_L(t)$ is the start state of each station in the tumbling process, where 0 means idle and 1 means running. $RMP(t)$ is the state of each tumbling pot discharging bin, where 0 means the bin holds no material and 1 means it holds material. In an action label, the number is the tumbling pot number, T means start the pot, and F means do not start it. For example, 1_T means that tumbling pot 1 is started in the current state.
Step 5: Initialize the state according to the workshop state S defined in step 3 and generate the corresponding actions, with the time t initialized to 0. Store the initial production line state S(0) and its corresponding actions A(0) in the Q table.
Step 6: For the current production line state S(t), use the Q table to select which station of the tumbling process to start so as to obtain the highest return R. If several actions share the highest return, select one at random from the set formed by those actions to obtain A(t).
Step 7: Access the workshop environment, that is, simulate production: apply the action A(t) selected for state S(t) at time t to the established workshop model to obtain the state S(t+1) at the next moment.
Step 8: Check the working state of the tumbling pots. If in state S(t+1) at time t+1 the tumbling pots are all working, return to step 7; if there is an idle tumbling pot, go to step 9.
Step 9: Determine whether S(t+1) is a terminal state, that is, whether all equipment in the tumbling process has reached its production task amount and the tumbling discharging bins can no longer feed the ligation process. If S(t+1) is a terminal state, assign the feedback value R defined in step 3, increment the iteration count by 1, and go to step 12; otherwise go to step 10.
Step 10: Determine whether the workshop state S(t+1) at time t+1 already exists among the states recorded in the Q table. If not, record S(t+1) in the Q table together with its corresponding actions A(t+1). If so, return to step 6.
Step 11: For the action A(t) executed in step 7, find the largest Q value among the actions recorded for S(t+1) and take it as the feedback value R.
Step 12: Update the Q value of the action A(t) selected in S(t) according to the return R obtained in step 9 or step 11, and increment the time t by 1. The update follows the formula below.
where R(S(t), A(t)) is the Q value of the current state-action pair itself, and the maximum term is the highest return among all actions available in the next state reached after taking the action. If this step was reached from step 9, go to step 13; if it was reached from step 11, return to step 6.
Step 13: Determine whether the number of iterations has reached the preset value. If not, return to step 5; if so, output the Q table as the Q-learning production optimization result for the tumbling and ligation process section of the process manufacturing workshop.
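The state and action notation used for the Q table in this embodiment can be made concrete with a brief sketch. The helper names `encode_state` and `action_label` are hypothetical, and the label form `1_F` is inferred from the T/F convention described for the Q table rather than quoted from it.

```python
def encode_state(pot_running, bin_has_material):
    """Build a hashable key for S(t) from the pot flags and the bin flags (0 or 1)."""
    return (tuple(int(x) for x in pot_running),
            tuple(int(x) for x in bin_has_material))


def action_label(pot_number, start):
    """Label such as '1_T' (start pot 1) or '1_F' (do not start pot 1)."""
    return f"{pot_number}_{'T' if start else 'F'}"


# All three pots idle and all three discharging bins empty.
initial_state = encode_state([0, 0, 0], [0, 0, 0])
actions = [action_label(i, s) for i in (1, 2, 3) for s in (True, False)]
print(initial_state)  # ((0, 0, 0), (0, 0, 0))
print(actions)        # ['1_T', '1_F', '2_T', '2_F', '3_T', '3_F']
```

Using tuples makes each state directly usable as a dictionary key, so two production line states are treated as equivalent exactly when their pot flags and bin flags match element by element.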
Finally, applying the Q-learning-based production optimization method for the tumbling and ligation process section of the process manufacturing workshop yields a converged Q table, with the following result:
Table Q Partial results of the Q table for the Q-learning-based multi-series production optimization of the ham sausage tumbling and ligation process section
As Table Q shows, in this embodiment, when all tumbling pots are idle and all tumbling pot discharging bins are empty, tumbling pot 1 should be started; when only tumbling pot 2 is idle and only tumbling discharging bin 2 holds material, tumbling pot 2 should be started. The Q table records in detail the return of every state the workshop encounters during production and of each corresponding action. When Q learning is used to optimize the multi-series production problem of the tumbling and ligation process section, it is only necessary to query the Q table with the current production state of the workshop: find the equivalent production line state, that is, the one whose tumbling pot state set and tumbling pot discharging bin state set match element by element, then find the action command with the highest feedback value among the corresponding tumbling pot commands and use it as the start command for the current tumbling process. Selecting the optimal action by repeatedly querying the state ensures that every action aims to reduce the residual amount of meat stuffing, so that the final residual amount is minimized. With the solution provided by the invention, the residual amount of meat stuffing can be kept between 0 and 0.05 ton when scheduling production, which effectively avoids the increased production cost, large quantities of leftover material, and reduced enterprise profit caused by large amounts of residual meat stuffing.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the invention as defined by the following claims.

Claims (7)

1. A Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized by comprising the following steps:
Step 1, establishing a model of the rolling and ligating process section of the process manufacturing workshop to simulate the actual conditions of that section in the production workshop;
Step 2, obtaining production data of the process manufacturing workshop;
Step 3, defining the Q learning parameters;
Step 4, initializing the Q table by generating a self-growing Q table that records action-time-Q values, a new state being recorded in the Q table each time it is encountered;
Step 5, initializing the state according to the process manufacturing workshop state S defined in step 3, generating the corresponding actions, and initializing the time t to 0;
Step 6, for the current production line state S(t), selecting according to the Q table the rolling process station whose start gives the highest return R; if several actions A share the highest return, selecting one at random from the action set formed by those actions to obtain A(t);
Step 7, executing the action A(t) selected for the state S(t) at time t in the process workshop environment, that is, simulating production with the established process workshop model, to obtain the state S(t+1) at the next time;
Step 8, judging the working state of the rolling pots: if the rolling pots are all in the working state in the state S(t+1) at time t+1, returning to step 7; if an idle rolling pot exists, jumping to step 9;
Step 9, judging whether the state S(t+1) is a termination state, that is, all equipment in the rolling process has reached its production task quantity and the discharge bins of the rolling process can no longer feed the ligation process; if S(t+1) is a termination state, giving the feedback value R defined in step 3, incrementing the iteration count by 1, and jumping to step 12; if S(t+1) is not a termination state, jumping to step 10;
Step 10, judging whether the state S(t+1) of the process workshop at time t+1 already exists among the states recorded in the Q table; if not, recording the state S(t+1) in the Q table and storing the corresponding actions A(t+1); if so, returning to step 6;
Step 11, for the action A(t) applied in step 7, finding the largest Q value among the action Q values corresponding to S(t+1) and using it as the feedback value R;
Step 12, updating the Q value corresponding to the action A(t) selected for S(t) according to the return R obtained in step 9 or step 11, and incrementing the time t by 1, the return being updated by feedback according to the following formula:
wherein R(S(t), A(t)) is the Q value of the current state-action pair itself, and the maximum term is the highest of all action returns available in the next state reached by taking the action; if this step was reached from step 9, jumping to step 13, and if it was reached from step 11, returning to step 6;
Step 13, judging whether the iteration count has reached the preset value; if not, returning to step 5; if so, outputting the Q table at that moment as the Q-learning production optimization result for the rolling and ligating process section of the process manufacturing workshop.
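The loop of steps 4 to 13 can be illustrated with a minimal tabular Q-learning sketch. Several things here are assumptions and not part of the claim: the update uses the standard Q-learning rule with a learning rate `ALPHA` and discount factor `GAMMA` (the source text does not reproduce its update formula), step 8 (waiting for an idle rolling pot) is omitted, and the environment is a toy stand-in in which the state is a number of stuffing units left and each action consumes a batch of 2 or 3 units. Only the terminal returns (5000 for zero remainder, -1000 per remaining unit) follow the return definition given in the claims.

```python
import random
from collections import defaultdict

ALPHA = 0.5  # learning rate: assumed, not stated in the source
GAMMA = 0.9  # discount factor: assumed, not stated in the source

def choose_action(q_table, state, actions, rng):
    """Step 6: pick the action with the highest Q value, ties broken at random."""
    row = q_table[state]
    for a in actions:
        row.setdefault(a, 0.0)  # step 10: a new state/action is recorded with Q = 0
    best = max(row[a] for a in actions)
    return rng.choice([a for a in actions if row[a] == best])

def train(reset, actions_of, step, episodes=50, seed=0):
    rng = random.Random(seed)
    q_table = defaultdict(dict)               # step 4: self-growing Q table
    for _ in range(episodes):                 # step 13: fixed number of iterations
        state, done = reset(), False          # step 5: initial state
        while not done:
            action = choose_action(q_table, state, actions_of(state), rng)
            next_state, reward, done = step(state, action)   # step 7: simulate
            if done:
                target = reward               # step 9: terminal return
            else:
                next_actions = actions_of(next_state)
                for a in next_actions:
                    q_table[next_state].setdefault(a, 0.0)
                # step 11: best Q value available in the next state
                target = GAMMA * max(q_table[next_state][a] for a in next_actions)
            # step 12: move Q(S(t), A(t)) toward the target
            q_table[state][action] += ALPHA * (target - q_table[state][action])
            state = next_state
    return q_table

# Toy stand-in environment (hypothetical): state = units of stuffing left.
def reset():
    return 7

def actions_of(state):
    return [batch for batch in (2, 3) if batch <= state]

def step(state, batch):
    remaining = state - batch
    if remaining >= 2:                        # another batch can still be started
        return remaining, 0.0, False
    reward = 5000.0 if remaining == 0 else -1000.0 * remaining
    return remaining, reward, True

q_table = train(reset, actions_of, step)
```

Because unseen actions start at Q = 0 and a bad ending drives the responsible action negative, the greedy rule with random tie-breaking is enough to steer this toy problem away from batch sequences that leave a remainder; a real workshop model would replace `reset`, `actions_of`, and `step`.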
2. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the model of the rolling and ligating process section of the process manufacturing workshop is defined by model constraint relations, which comprise a station information constraint, a production relation constraint, a ligature work task constraint, and a ham sausage remaining amount constraint.
3. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the station information constraint is that, in the rolling process, the total number of rolling pots equals the total number of rolling discharge bins and each rolling pot corresponds only to its own rolling pot discharge bin, the corresponding relations being as follows:
PT=MT (1)
wherein formula (1) states that the total number of rolling pots equals the total number of rolling discharge bins, and in formula (2), $WS_{T,i}CanOutT_c$ indicates whether the i-th rolling pot $WS_{T,i}$ can discharge into the c-th rolling discharge bin;
in the ligation process, the total number of ligatures equals the total number of ligature feeding bins and each ligature corresponds only to its own ligature feeding bin, the corresponding relations being as follows:
ML=PL (3)
wherein formula (3) states that the total number of ligatures equals the total number of ligature feeding bins, and in formula (4), $WS_{T,j}CanOutL_r$ indicates whether the j-th ligature $WS_{T,j}$ can draw material from the r-th ligature feeding bin.
4. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the production relation constraint governs the production relations within the rolling and ligating process section of the process manufacturing workshop, namely that the ham sausage meat stuffing in each rolling discharge bin can be fed only to some of the ligature feeding bins, the general constraint being as follows:
wherein $T_cCanInL_r$ is the indicator of whether the rolling discharge bin $T_c$ can feed the ligature feeding bin $L_r$: when $L_r$ belongs to the set $BT_c$, that is, the set of ligature feeding bins connected to the c-th rolling discharge bin, the rolling discharge bin $T_c$ can feed the ligature feeding bin $L_r$; otherwise it cannot.
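The feed indicator described in this claim amounts to a set-membership test. The sketch below assumes a hypothetical layout dictionary `BT` that maps each rolling discharge bin index c to the set of ligature feeding bin indices connected to it; the function name and the layout are illustrative only.

```python
def can_feed(c, r, connected):
    """Return 1 if rolling discharge bin c may feed ligature feeding bin r, else 0.

    `connected` maps a discharge bin index to the set of feeding bin
    indices wired to it (the role played by the set BT_c in the claim).
    """
    return 1 if r in connected.get(c, set()) else 0

# Hypothetical layout: bin 1 feeds ligature bins 1 and 2, bin 2 feeds 2 and 3.
BT = {1: {1, 2}, 2: {2, 3}}

print(can_feed(1, 2, BT))  # 1
print(can_feed(1, 3, BT))  # 0
```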
5. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the ligature work task constraint limits the weight of ham sausage meat stuffing that each ligature draws per minute from its corresponding rolling discharge bin, the corresponding relation being as follows:
wherein $RWWS_{L,r}(t+1)$ denotes the remaining work task of ligature r at the next time. The formula means: when the weight of ham sausage meat stuffing remaining in the rolling discharge bin corresponding to ligature feeding bin r is greater than the rate at which ligature feeding bin r consumes stuffing, the remaining task of ligature r at the next time is its remaining task at the previous time minus the ligation speed of ligature r; when that remaining weight is greater than 0 but less than the consumption rate, the remaining task of ligature r at the next time is its remaining task at the previous time minus the weight of stuffing remaining in the rolling discharge bin; in all other cases, the remaining task of ligature r at the next time equals its remaining task at the previous time.
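The three cases of the remaining-task relation can be written as a small function. This is a sketch of the verbal description only, since the formula itself is not reproduced in this text. The function and parameter names are hypothetical, and one boundary choice is an assumption: the text speaks of "greater than" and "less than", and this sketch treats the case where the bin weight exactly equals the consumption rate like the first case.

```python
def next_remaining_task(remaining_task, bin_weight, speed):
    """Remaining work task of one ligature at time t+1.

    remaining_task: remaining task of the ligature at time t
    bin_weight:     stuffing left in the corresponding rolling discharge bin
    speed:          stuffing the ligature consumes per time step
    """
    if bin_weight >= speed:        # enough stuffing (equality case is an assumption)
        return remaining_task - speed
    if 0 < bin_weight < speed:     # bin nearly empty: only what is left gets ligated
        return remaining_task - bin_weight
    return remaining_task          # empty bin: no progress in this time step

print(next_remaining_task(10.0, 5.0, 2.0))  # 8.0
print(next_remaining_task(10.0, 1.5, 2.0))  # 8.5
print(next_remaining_task(10.0, 0.0, 2.0))  # 10.0
```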
6. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the ham sausage remaining amount constraint is that, after production ends, the remaining amount of ham sausage in the rolling and ligating process section equals the sum of the ham sausage meat stuffing remaining in each rolling discharge bin and equals the sum of the remaining work tasks of each ligature, the relation being as follows:
wherein MDL is the remaining amount of ham sausage, the first summation is the total ham sausage meat stuffing remaining in the rolling discharge bins at the end of production, and the second summation is the total remaining work task of the ligatures at the end of production.
7. The Q-learning-based production optimization method for the rolling and ligating process section of a process manufacturing workshop, characterized in that: the Q learning parameters comprise the state S in production, the action A corresponding to the state, the return R after the action is taken, and the number of iterations, defined in the following steps:
Step 3.1, defining the state S in Q learning, namely the real-time state of the process manufacturing workshop; the real-time state S(t) consists of the working states of all stations in the rolling process and the material level states of the discharge bins of the rolling process, i.e., $S(t) = RSOper_L(t) \& RMP(t)$;
Step 3.2, defining the action A corresponding to each state, namely whether each station of the rolling process is started;
Step 3.3, defining the return R: when the next state is the termination state, that is, when production cannot continue, the return is defined by the remaining amount of ham sausage meat stuffing; when the remaining amount equals 0, R = 5000; when the remaining amount is greater than 0, the return is given by the following formula:
R=-1000MDL(k>0) (8)
Step 3.4, defining the number of iterations as 50, namely the number of Q learning training episodes; one iteration is completed each time the agent finishes one production run in the rolling and ligating process section of the process workshop.
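The terminal return of step 3.3 and formula (8) can be sketched as follows. The function name is hypothetical, the remaining amount MDL is taken to be expressed in tons (the unit used in the description), and the condition printed as (k>0) in formula (8) is read here as "remaining amount greater than 0", which is an interpretation of the surrounding text.

```python
def terminal_return(mdl):
    """Return R for a termination state, given the remaining stuffing MDL."""
    if mdl < 0:
        raise ValueError("remaining amount cannot be negative")
    if mdl == 0:
        return 5000.0          # no stuffing left: fixed positive return
    return -1000.0 * mdl       # formula (8): penalty proportional to the remainder

print(terminal_return(0))    # 5000.0
print(terminal_return(0.5))  # -500.0
```

With these values, a run that leaves even a small remainder scores far below a run that leaves none, which is what pushes the learned Q table toward start sequences with zero remainder.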
CN202111650352.8A 2021-12-30 2021-12-30 Q learning-based process manufacturing workshop rolling and binding process section production optimization method Active CN114281050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111650352.8A CN114281050B (en) 2021-12-30 2021-12-30 Q learning-based process manufacturing workshop rolling and binding process section production optimization method

Publications (2)

Publication Number Publication Date
CN114281050A CN114281050A (en) 2022-04-05
CN114281050B CN114281050B (en) 2024-06-07

Family

ID=80878671

Country Status (1)

Country Link
CN (1) CN114281050B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105519858A (en) * 2014-10-21 2016-04-27 嘉兴御庄园食品有限公司 Making method of ready-to-eat convenient flavored rice in soft package
WO2016169287A1 (en) * 2015-04-20 2016-10-27 海安县申菱电器制造有限公司 Productivity allocation method for mixed production line
CN110443412A (en) * 2019-07-18 2019-11-12 华中科技大学 The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process
CN110636523A (en) * 2019-09-20 2019-12-31 中南大学 Millimeter wave mobile backhaul link energy efficiency stabilization scheme based on Q learning
WO2021052589A1 (en) * 2019-09-19 2021-03-25 Siemens Aktiengesellschaft Method for self-learning manufacturing scheduling for a flexible manufacturing system and device
CN113761732A (en) * 2021-08-30 2021-12-07 浙江工业大学 Method for modeling and optimizing one-class multi-disturbance workshop flexible scheduling based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant