CN115061444B - Real-time optimization method for process parameters integrating probability network and reinforcement learning - Google Patents
- Publication number
- CN115061444B (application CN202210989613.7A)
- Authority
- CN
- China
- Prior art keywords
- value
- process parameters
- network
- model
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
- G05B19/41865—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by job scheduling, process planning, material flow
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/32—Operator till task planning
- G05B2219/32252—Scheduling production, machining, job shop
Abstract
The invention discloses a real-time optimization method for process parameters that integrates a probability network and reinforcement learning, comprising the following steps: collecting process parameter data of a production system, and preprocessing, processing and splitting the collected data into data sets; constructing a state transition model of adjacent time intervals in the production process based on the preprocessed data; building, with reinforcement learning, an intelligent agent model capable of outputting the manually controllable parameter data of the production process; and fusing and applying the state transition model and the intelligent agent model to realize real-time optimization and output of process parameters in the production process. The invention divides the process parameters into control variables, influence variables and target values, combines them organically, and recommends the controllable process parameters of the production process in real time, ensuring continuous and efficient operation of the production process, reducing cost and improving efficiency.
Description
Technical Field
The invention relates to the technical field of process parameter optimization in production processes, and in particular to a real-time optimization method for process parameters that fuses a probability network and reinforcement learning.
Background
The rapid development of the Internet of Things and big-data technology has driven the development and application of a new generation of intelligent manufacturing and provides a new paradigm for optimizing process parameters in the production process. Process parameter optimization predicts in advance the parameters that should be fed into the production system in the next time period, ensuring continuous and efficient operation of the production process and promoting cost reduction and efficiency improvement of the production system.
Current parameter optimization methods are implemented with either optimization algorithms or artificial intelligence algorithms. Although both can solve for a set of optimal process parameters under different targets, they have shortcomings. Parameter optimization models built on optimization algorithms depend heavily on the logical relationship between parameters and targets, so the resulting models are static and lack disturbance resistance and transferability; when the parameter types or targets change, the algorithm of the original model no longer applies, and convergence of the solving process is slow and time-consuming. Most parameter optimization models built on artificial intelligence algorithms ignore the temporal relationships in the data and cannot search for optimal process parameters along the time sequence, so the constructed models easily diverge from the real operation of the system.
To overcome the shortcomings of existing parameter optimization methods based on optimization algorithms and artificial intelligence algorithms, the present method combines the strengths of both: the probabilistic neural network adapts to the data distribution and has a high fault tolerance, while the model trained by reinforcement learning adapts well to the environment and receives positive feedback from the target. The training process additionally considers the temporal relationships in the data, thereby ensuring continuous and efficient operation of the production process, reducing cost and improving efficiency.
Disclosure of Invention
This section summarizes some aspects of embodiments of the invention and briefly introduces some preferred embodiments. In this section, as well as in the abstract and the title of this application, simplifications or omissions may be made to avoid obscuring their purpose; such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned problems.
Therefore, the technical problem solved by the invention is as follows: prior-art methods for optimizing the process parameters of a production system are overly dependent on experience, have low prediction efficiency, and are insufficiently fused with production targets.
In order to solve the above technical problems, the invention provides the following technical scheme: a real-time optimization method for process parameters fusing a probability network and reinforcement learning, comprising the following steps: collecting process parameter data of a production system, and preprocessing, processing and splitting the process parameter data into data sets; constructing a state transition model of adjacent time intervals in the production process based on the preprocessed process parameter data; building, with reinforcement learning, an intelligent agent model capable of outputting the manually controllable parameter data of the production process; and fusing and applying the state transition model and the intelligent agent model to realize real-time optimization and output of process parameters in the production process.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the collection of the process parameter data comprises collecting, at equal time intervals, the control variables, the influence variables and the actual production target values of the production process;
the control variables comprise process parameters which can be directly adjusted manually in the production process;
the influence variables comprise process parameters generated by the influence of manually input control variables on the production system;
the actual target value of production comprises a production target which is completed by the production system at a certain time interval.
As an optimal scheme of the method for optimizing the technological parameters of the fusion probability network and the reinforcement learning in real time, the method comprises the following steps: the pre-processing and processing of the process parameter data includes,
the preprocessing of the process parameter data comprises processing of abnormal samples, filling of null values and standardization of data;
and the processing of the process parameter data comprises taking the difference of the actual production target value between two adjacent time intervals as the new target value, and then aggregating the sample data of several time intervals along the time sequence, the aggregation mode being mean aggregation.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the dividing of the process parameter data comprises splitting the preprocessed and processed data set into a training set, a validation set and a test set in a certain proportion.
As a preferred scheme of the process parameter real-time optimization method for integrating the probability network and the reinforcement learning, the method comprises the following steps: the construction of the state transition model includes,
constructing a probabilistic neural network by using the divided training set;
solving the influence variables and actual production target value of the immediately-following time-interval state (the state of the next time interval, given the state of the current time interval);
obtaining a state transition function and a reward function that express with high fidelity how the actual production target value changes with the state transition.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the construction of the probabilistic neural network comprises the setting of a loss function and the training of a probabilistic neural network model;

the log prediction probability is calculated as

$$\log f_\theta(\mathbb{D}) = \sum_{n=1}^{N} \log f_\theta(x_n)$$

where $\mathbb{D}=\{x_1,\ldots,x_N\}$ represents the set of training data and $f_\theta$ the density function of the probabilistic neural network model;

the output of the trained probabilistic neural network model is a Gaussian distribution parameterized by a diagonal covariance, whose density function is calculated as

$$f_\theta(x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\!\Big(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\Big)$$

substituting into the log prediction probability and simplifying, the loss function is

$$\mathcal{L} = (x-\mu)^{T}\Sigma^{-1}(x-\mu) + \log|\Sigma|$$

where $\mu$ is the mean vector of each attribute, $T$ denotes the matrix transpose, $\Sigma$ the diagonal covariance matrix, $\Sigma^{-1}$ its inverse, $k$ the number of features in $x$, and $|\Sigma|$ the determinant of the diagonal covariance matrix.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the solving of the influence variables and target values of the immediately-following time-interval state comprises selecting a probabilistic neural network sub-model from the model library, solving the difference of the influence variables and target values between adjacent time intervals, and solving the influence variables of the immediately-following time interval;

the selection of the probabilistic neural network sub-model comprises randomly selecting a sub-model from the learned model library to obtain the mean vector and diagonal covariance matrix it outputs;

the difference of the influence variables and target values between adjacent time intervals is solved as

$$\Delta x \sim \mathcal{N}(\mu, \sigma^{2})$$

where $\Delta x$ is the difference between the parameter values of the current state and those of the immediately-following state, the results drawn from the distribution $\mathcal{N}(\mu,\sigma^{2})$ form a random data set, and $\sigma$ represents the standard deviation;

the solving further comprises randomly sampling several $\Delta x$ values from the data set obeying $\mathcal{N}(\mu,\sigma^{2})$ and taking their average as the standardized parameter values of the immediately-following state;

the solving of the influence-variable parameters of the immediately-following time interval comprises adding the solved difference to the parameter values of the current time, performing inverse standardization according to the standardization applied to the collected process parameters, and verifying the performance of the constructed state transition model with the training set.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the construction of the intelligent agent model comprises the design of the actions that cause state transitions and the design of the rewards caused by the actions;

the design of the actions causing state transitions comprises differencing each control variable between adjacent time intervals and taking the median $m$ of all the differences; at each moment each control variable may individually be increased or decreased by $m$, which is defined as an action $a$; the action space of a state transition therefore contains a number of elements determined by $n$, the number of control variables;

the design of the rewards caused by the actions comprises taking the target-value difference of transitioning from the current state $s$ to the next state $s'$ under a given action $a$ as the reward $r$; each time a control variable is changed the target value changes correspondingly, and the changed value is the reward for changing the control variable.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the learning process of the intelligent agent model comprises

searching for the minimum value of the TD error and setting that minimum as the target;

the TD error is calculated as

$$\delta = r + \gamma\,Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$$

where $Q(s_t,a_t)$ represents the expected gain obtained by applying action $a_t$ in state $s_t$, $\gamma$ represents the discount factor, and $Q(s_{t+1},a_{t+1})$ represents the expected gain obtained by applying action $a_{t+1}$ in state $s_{t+1}$;

the reinforcement learning algorithm inputs the influence variables of the current time interval into the state-value network and the policy network, and generates through loop iteration the action corresponding to the maximum reward value; all the states, actions and rewards constitute the policy network.
As a preferred scheme of the real-time process parameter optimization method fusing a probability network and reinforcement learning: the real-time optimization and output of the process parameters in the production process comprises,
collecting the process parameters of the production process in real time in units of fixed time intervals, selecting the number of time intervals to aggregate according to the actual business requirements and the time sequence, and processing and aggregating the sample data composed of the selected influence variables and target values;
inputting the processed and aggregated data into the constructed state transition model, and outputting a difference value between an influence variable and a target value;
and inputting the influence variable of the current time interval into the trained strategy network, outputting a control variable, and realizing the fusion and application of the state transition model and the intelligent agent model in the actual production process.
The invention has the following beneficial effects: the real-time process parameter optimization method fusing a probability network and reinforcement learning makes full use of historical process parameter data, divides the process parameters into control variables, influence variables and target values, combines them organically, and recommends the controllable process parameters of the production process in real time, ensuring continuous and efficient operation of the production process while reducing cost and improving efficiency. Compared with traditional parameter optimization models, the method has stronger disturbance resistance and transferability, fits real application scenarios better, recommends parameters more effectively, and is applicable to most types of production systems.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The following drawings are clearly only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without inventive effort. Wherein:
FIG. 1 is a flowchart illustrating a method for real-time optimization of process parameters by fusion of a probabilistic network and reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a diagram comparing the prediction effects on 100 selected groups of coal-consumption data in the real-time process parameter optimization method fusing a probability network and reinforcement learning according to the second embodiment of the present invention;
FIG. 3 is a diagram showing the implementation effect of the control variables recommended by the agent in the real-time process parameter optimization method fusing a probability network and reinforcement learning according to the second embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments of the present invention are described in detail below with reference to the accompanying figures. The described embodiments are clearly only a part, not all, of the embodiments of the present invention; all other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention shall fall within its protection scope.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; however, the present invention may also be practiced otherwise than as specifically described herein. Those skilled in the art can generalize without departing from the spirit and scope of the present invention, and the present invention is therefore not limited by the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1, for an embodiment of the present invention, a method for real-time optimization of process parameters by fusion of a probability network and reinforcement learning is provided, including:
s1: collecting technological parameter data of a production system, and carrying out operations of preprocessing, processing and dividing a data set on the technological parameter data. It should be noted that:
collecting process parameter data comprises collecting control variables, influence variables and actual production target values of a production process at equal time intervals; the control variables comprise process parameters which can be directly adjusted manually in the production process; the influence variables comprise process parameters generated by the influence of manually input control variables on the production system; the actual target value of production includes a production target that the production system has completed at certain time intervals.
Further, the preprocessing of the process parameter data comprises processing of abnormal samples, filling of null values and standardization of data;
further, the processing of the process parameter data comprises that the difference between two adjacent time intervals for producing the actual target value is used as a new target value, then sample data of a plurality of time intervals in the time sequence are aggregated, and the aggregation mode is average value aggregation;
furthermore, the dividing of the process parameter data comprises dividing the new preprocessed and processed data set into a training set, a verification set and a test set according to a certain proportion.
S2: and constructing a state transition model of adjacent time intervals in the production process based on the preprocessed process parameter data. It should be noted that:
the method comprises the steps of firstly, building a probabilistic neural network by using a divided training set, then solving an influence variable of a time interval state (a state of a next time interval in the current time interval state) and a production actual target value, and finally obtaining a state transfer function and a reward function which can express that the production actual target value changes along with state transfer in a high-fidelity mode.
Further, the establishment of the probabilistic neural network comprises the setting of a loss function and the training of a probabilistic neural network model;

the log prediction probability is calculated as

$$\log f_\theta(\mathbb{D}) = \sum_{n=1}^{N} \log f_\theta(x_n)$$

where $\mathbb{D}=\{x_1,\ldots,x_N\}$ represents the set of training data and $f_\theta$ the density function of the probabilistic neural network model;

the output of the trained probabilistic neural network model is a Gaussian distribution parameterized by a diagonal covariance, with density

$$f_\theta(x) = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\!\Big(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\Big)$$

substituting into the log prediction probability and simplifying, the loss function is

$$\mathcal{L} = (x-\mu)^{T}\Sigma^{-1}(x-\mu) + \log|\Sigma|$$

where $\mu$ is the mean vector of each attribute, $T$ denotes the matrix transpose, $\Sigma$ the diagonal covariance matrix, $\Sigma^{-1}$ its inverse, $k$ the number of features in $x$, and $|\Sigma|$ the determinant of the diagonal covariance matrix;

the input of the probabilistic neural network model is the attributes of a training data set obtained from the intelligent agent model, and the output is the mean and diagonal covariance matrix of the distribution obeyed by the influence-variable differences; several well-performing probabilistic neural network sub-models are built to form a model library, and the smaller the loss value on the validation data set, the better the trained probabilistic neural network model;
further, the solving of the influence variables and the target values of the state of the time interval after the tightening comprises the selection of a probabilistic neural network sub-model in a model library, the solving of the difference value of the influence variables and the target values of the adjacent time intervals and the solving of the influence variables of the time interval after the tightening;
the selection of the probabilistic neural network submodel comprises randomly selecting a submodel from a learned model library to obtain a mean vector and a diagonal covariance matrix output by the submodel;
the calculation of the solution for the difference between the adjacent time interval influencing variables and the target value includes,
wherein,the difference between the parameter value representing the current state and the value of the immediately subsequent state parameter,representing complianceDistribution of (2)Solution result formationThe random data set of (a) is,represents the standard deviation;
it should be noted that the solution of the difference between the influencing variable and the target value of the adjacent time interval is based on complianceDistribution of (2)Solving a random data set formed by the result, the data set being defined asThen, thenIs also obeyedIs distributed andsetting upFrom the distribution of obeysIs randomly paired in the data setSampling to obtain a plurality of samplesTaking the average value as the parameter value of the normalized state after tightening;
the parameter solving of the time interval influence variables comprises the steps of adding the solved difference value to the parameter value in the current time, carrying out inverse standardization according to a standardization mode of the collected process parameters, and verifying the performance of the constructed state transition model by utilizing a training set.
S3: by usingAnd (3) building an intelligent agent model capable of outputting artificial controllable parameter data in the production process by reinforcement learning. It should be noted that:
constructing an intelligent agent model, wherein the intelligent agent model comprises an action design causing state transition and an action-caused reward design;
the action design for causing state transition comprises making difference between adjacent time intervals of each control variable, and taking median of all valuesCan be increased for each control variable individually at each instantA value is defined as an actionThe action space during a state transition contains elements of,Is the number of control variables;
the action-induced reward design includes the target-value difference r(s, a) as the reward for a given action a transitioning from the current state s to the next state s'; each time a control variable is changed the target value changes correspondingly, and this change in the target value is the reward after the control variable changes.
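Assuming each of the n control variables is offset by ±δ in every joint action (the patent only fixes the size of the space at 2^n, so the exact pair of per-variable offsets here is an assumption), the action space can be enumerated as:

```python
from itertools import product

def build_action_space(n_vars, delta):
    """Enumerate the 2**n joint actions: each control variable is
    adjusted by +delta or -delta in every action (the per-variable
    offset pair is an illustrative assumption)."""
    return [list(combo) for combo in product((+delta, -delta), repeat=n_vars)]

actions = build_action_space(6, 0.5)   # 6 control variables -> 64 actions
```

For the six control variables of the embodiment this yields the 64-element action space stated there.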
The learning process of the intelligent agent model comprises,
searching the minimum value of the TD error, and setting the minimum value of the TD error as a target;
the TD error is calculated as,
Q(s,a)=r(s,a)+γmax(Q(s',a'))
wherein Q(s, a) represents the expected return obtained by applying action a in state s, γ represents the discount coefficient, and Q(s', a') represents the expected return obtained by applying action a' in state s'.
Using the Q-Learning reinforcement learning algorithm, the influencing variables of the current time interval are input into the state value network and the policy network, and the action a corresponding to the maximum reward value r(s, a) is generated through loop iteration; all states, actions and rewards of this action constitute the policy network.
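A minimal tabular sketch of the Q-Learning update described above, using the TD target r(s, a) + γ·max Q(s', a') and an ε-greedy exploration rule. The learning rate α and the discretized state/action identifiers are illustrative assumptions not fixed by the patent:

```python
import random
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, gamma=0.98, alpha=0.1):
    """One tabular update toward the TD target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Explore with probability eps, otherwise pick the greedy action."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)           # Q-table over (state, action) pairs
actions = [0, 1, 2, 3]           # hypothetical discrete action ids
q_learning_step(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
```

The γ = 0.98 and ε = 0.1 defaults mirror the discount coefficient and exploration probability used in the embodiment.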
S4: and fusing and applying the state transition model and the intelligent agent model to realize the real-time optimization and output of the process parameters in the production process. It should be noted that:
the real-time optimization and output of the process parameters in the production process comprises,
collecting the technological parameters of the production process in real time by taking each fixed time interval as a unit, selecting the number of time intervals to be aggregated according to the actual business requirements and the time sequence, and carrying out data processing and aggregation on the sample data consisting of the selected influence variables and the target values;
inputting the processed and aggregated data into the constructed state transition model, and outputting a difference value between an influence variable and a target value;
and inputting the influence variable of the current time interval into the trained strategy network, outputting the control variable, and realizing the fusion and application of the state transition model and the intelligent agent model in the actual production process.
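The three steps of S4 can be sketched as one loop iteration; all names here are illustrative stand-ins for the two trained models, not interfaces defined by the patent:

```python
def optimize_step(raw_samples, transition_model, policy_net, agg_n=10):
    """One pass of the described real-time loop: aggregate the latest
    fixed-interval samples, predict the influence-variable/target-value
    differences with the state transition model, then ask the policy
    network for recommended control variables."""
    # mean-aggregate the last agg_n fixed-interval samples
    window = raw_samples[-agg_n:]
    state = [sum(col) / len(col) for col in zip(*window)]
    delta = transition_model(state)      # predicted differences
    controls = policy_net(state)         # recommended control variables
    return delta, controls

# Toy stand-ins for the two trained models:
delta, controls = optimize_step(
    raw_samples=[[1.0, 2.0]] * 12,
    transition_model=lambda s: [x * 0.0 for x in s],
    policy_net=lambda s: [0.5 for _ in s],
)
```

In deployment the two lambdas would be replaced by the trained state transition model and policy network, called once per aggregation window.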
It should be noted that, in view of the slow convergence of the solving process, the low prediction efficiency and the insufficient integration with the production target in the prior art, the method adopts a probabilistic neural network, which quickly fits the data distribution and has a high fault tolerance, together with a reinforcement-learning model that adapts well to the environment and provides positive feedback toward the target. The time-series relationship of the data is taken into account during model training, overcoming the shortcomings of most existing methods based on artificial-intelligence algorithms: the model fitting and transfer effects are better, the dynamically recommended process parameters target the optimization objective more directly and better match reality, the continuous and efficient operation of the production process is ensured, costs are reduced and efficiency is increased.
The real-time process-parameter optimization method fusing a probability network and reinforcement learning provided by the invention makes full use of historical process-parameter data, divides the process parameters into control variables, influencing variables and target values, and combines them organically to recommend the controllable process parameters of the production process in real time, ensuring continuous and efficient operation while reducing cost and improving efficiency. Compared with traditional parameter-optimization models, the method has stronger disturbance resistance and transferability, better fits real application scenarios, yields better parameter recommendations, and is applicable to most types of production systems.
Example 2
Referring to fig. 2 and 3, a second embodiment of the present invention differs from the first in that it provides a verification test of the real-time optimization method for process parameters fusing a probability network and reinforcement learning. To verify and explain the technical effects of the method, this embodiment compares a conventional technical scheme with the method of the present invention and evaluates the test results by scientific demonstration.
Taking the rotary kiln system of the Tai-Gai base in Taiyuan City as an example, data are collected with 1 minute as the unit time. The collected process parameters include: control variables — head coal, tail coal, grate speed, high-temperature fan high-voltage variable-frequency setting, kiln-head exhaust high-voltage variable-frequency setting and kiln-tail exhaust high-voltage variable-frequency setting; influencing variables — secondary air temperature, decomposing furnace temperature, clinker temperature 2, kiln head cover negative pressure, kiln tail negative pressure and decomposing furnace outlet temperature; and the target value — coal consumption. The finally acquired process-parameter data of the rotary kiln production system are multi-dimensional time-series data.
Then, carrying out operations of preprocessing, processing and dividing a data set on the collected process parameter data:
pretreatment:
1) Processing abnormal samples: to avoid the influence of outliers on the subsequent modeling process, the 3σ principle is used to set abnormal samples to null; for each parameter only values within [μ_i − 3σ_i, μ_i + 3σ_i] are retained, where μ_i denotes the mean of the i-th parameter data and σ_i denotes the standard deviation of the i-th parameter data; parameter values outside this range are replaced with null values.
2) Filling of null values: the mean of the six neighbouring values (three before and three after) is used for filling; for example, if the value of the secondary air temperature at the 10th minute is null, the average of its values at the 7th, 8th, 9th, 11th, 12th and 13th minutes is filled into the 10th time interval.
3) Standardization of the data: all historical parameter data are standardized to obey the N(0, 1) distribution.
Processing: the difference between two adjacent time intervals of the target value is taken as the new target value, and the sample data of 10 adjacent minutes are then aggregated in the time sequence, the aggregation mode being the mean value.
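The preprocessing and processing steps above can be sketched with pandas. The 3σ rule, neighbour-mean filling, z-score standardization, target differencing and 10-row mean aggregation follow the description, while the column names and the centred-rolling approximation of the 3-before/3-after fill are assumptions:

```python
import numpy as np
import pandas as pd

def preprocess(df, target_col, agg_n=10, window=3):
    """Sketch of the described pipeline (column names are illustrative):
    3-sigma outliers -> NaN, NaN filled with the mean of neighbouring
    values, z-score standardization, target differenced over adjacent
    intervals, then mean-aggregation over agg_n consecutive rows."""
    df = df.copy()
    # 1) replace values outside mean +/- 3*std with NaN
    mu, sd = df.mean(), df.std()
    df = df.where((df >= mu - 3 * sd) & (df <= mu + 3 * sd))
    # 2) fill NaN with a centred rolling mean of neighbouring values
    #    (approximates the described 3-before / 3-after fill)
    df = df.fillna(df.rolling(2 * window + 1, center=True,
                              min_periods=1).mean())
    # 3) z-score standardization toward N(0, 1)
    df = (df - df.mean()) / df.std()
    # 4) adjacent-interval difference of the target value
    df[target_col] = df[target_col].diff()
    # 5) aggregate agg_n consecutive rows by their mean
    return df.groupby(np.arange(len(df)) // agg_n).mean()
```

Each output row then corresponds to one aggregated 10-minute sample ready for the state transition model.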
Dividing the data set: the preprocessed and processed data are divided in a 6:2:2 ratio into a training set, a validation set and a test set.
A state transition model is then constructed. The divided training set is used to learn and train the probabilistic neural network model, with the input layer set to 14, the output layer to 8, the pattern layer to 200, the summation layer to 8 and the batch size to 256; the Adam optimizer is adopted with a learning rate of 0.001 and 1000 epochs. The output is a diagonal covariance matrix ∑ formed by the standard deviations of the parameters and a row vector μ formed by the means of the parameters; the probabilistic neural network model with the smallest loss p value on the validation set is selected.
According to the formula ΔX=μ+ε*σ, ε is randomly sampled 5 times from the N(0, 1) distribution to obtain 5 ΔX values, whose average is taken as the predicted state-parameter difference. The prediction of the influencing variables and the target value is the parameter value at the current time plus the solved difference ΔX, followed by inverse standardization according to the standardization applied to the collected process parameters; the performance of the constructed state transition model is verified with the test set.
Table 1: evaluation comparison of the influencing variables and the target value.
Table 1 shows the evaluation comparison of different algorithms on the influencing variables and the target value (the evaluation index is the mean square error MSE; the smaller its value, the better the model). The probabilistic neural network prediction model trained with the loss function constructed by the method performs better, and its advantage in prediction effect is obvious.
The prediction effect on coal consumption is the basis of the recommended control variables, since coal consumption represents the reward obtained when an action causing a state transition occurs. The prediction of 100 randomly selected groups of coal consumption shows that the predicted values are very close to the actual values, as shown in fig. 1.
Then, an agent model capable of outputting manually controllable parameter data of the production process is constructed with reinforcement learning. Since the number of control variables in this embodiment is 6, the action space during the state transition process contains 2^6 = 64 elements; the following table lists the parameter value corresponding to the action of each control variable.
Table 2: the action of each control variable.
Using the Q-Learning reinforcement learning algorithm with the influencing variables as input, a table of the rewards r(s, a) produced by applying each action a in the current state s is trained, yielding the optimal policy network; the discount coefficient γ is set to 0.98 and the exploration probability to 0.1.
Finally, the control variables of the previous state are input into the constructed intelligent agent model and the control-variable values of the next state are output; the coal consumption of the rotary kiln production system caused by the control variables recommended by the agent is compared with the coal consumption of the next state predicted by the state transition model. Fig. 2 shows the implementation effect of the recommended control variables: over 100 selected groups of recommended process-parameter data, the optimized coal-consumption value per unit time is 0.2893; the control variables recommended by the agent conform to the process of the actual production system, and excellent control variables can be recommended to optimize the target coal consumption in real time.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.
Claims (9)
1. A real-time optimization method for process parameters fusing a probability network and reinforcement learning is characterized by comprising the following steps:
collecting technological parameter data of a production system, and carrying out operations of preprocessing, processing and dividing a data set on the technological parameter data;
constructing a state transition model of adjacent time intervals in the production process based on the preprocessed process parameter data;
the construction of the state transition model includes,
constructing a probability network by using the divided training set;
solving the influencing variables of the state of the immediately following time interval and the actual production target value;
acquiring a state transition function and a reward function which express with high fidelity how the actual production target value changes with state transitions; building an intelligent agent model capable of outputting manually controllable parameter data of the production process by using Q-Learning reinforcement learning;
and fusing and applying the state transition model and the intelligent agent model to realize the real-time optimization and output of the process parameters in the production process.
2. The method for optimizing process parameters of fusion probability network and reinforcement learning in real time as claimed in claim 1, wherein: the collection of the process parameter data comprises collecting control variables, influence variables and actual production target values of the production process at equal time intervals;
the control variables comprise process parameters which can be manually and directly adjusted in the production process;
the influence variables comprise process parameters generated by the influence of manually input control variables on the production system;
the actual target value of production comprises a production target which is completed by the production system at a certain time interval.
3. The method for optimizing process parameters of fusion probability network and reinforcement learning in real time as claimed in claim 2, wherein: the pre-processing and processing of the process parameter data includes,
the preprocessing of the process parameter data comprises processing of abnormal samples, filling of null values and standardization of data;
and the processing of the process parameter data comprises differencing the actual production target value over two adjacent time intervals, taking the difference as a new target value, and then aggregating the sample data of a plurality of time intervals in the time sequence.
4. The method for optimizing process parameters of fusion probability network and reinforcement learning in real time as claimed in any one of claims 1 to 3, wherein: the dividing of the process parameter data comprises dividing a new preprocessed and processed data set into a training set, a verification set and a test set according to a certain proportion.
5. The method for optimizing process parameters in real time for fusion of a probabilistic network and reinforcement learning according to claim 4, wherein: the establishment of the probability network comprises the setting of a loss function and the training of a probability network model;
setting the loss function loss p as the logarithmic prediction probability;
the calculation of the log-prediction probability includes,
loss p =-logf(X)
wherein X represents a training data set, and f (X) represents a density function of the probability network model;
the output of the training of the probabilistic network model is a gaussian distribution parameterized by diagonal covariance;
the calculation of the density function of the probabilistic network model includes,
the computation of the loss function after substituting in the logarithmic prediction probability and simplifying includes,
loss p =(X-μ) T ∑ -1 (X-μ)+(2π) k |∑|
where μ represents the mean vector of each attribute, T represents the transpose of the matrix, Σ represents the diagonal covariance matrix, Σ -1 Represents the inverse of the diagonal covariance matrix, k represents the number of features in X, and Σ | represents the determinant of the diagonal covariance matrix.
6. The method for optimizing process parameters in real time for fusion of a probabilistic network and reinforcement learning according to claim 5, wherein: the solving of the influence variables and the target values of the state of the tight time interval comprises the selection of a probability network submodel in a model library, the solving of the difference value of the influence variables and the target values of the adjacent time intervals and the solving of the influence variables of the tight time interval;
the selection of the probability network submodel comprises randomly selecting a submodel from a learned model library to obtain a mean vector and a diagonal covariance matrix output by the submodel;
the calculation of the solution of the adjacent time interval influencing variable and target value difference comprises,
ΔX=μ+ε*σ
where Δ X represents the difference between the current state parameter values and those of the immediately following state, ε represents random data obeying the N (0, 1) distribution, and σ represents the standard deviation;
the solving of the difference between the influencing variables and the target value of adjacent time intervals comprises randomly sampling ε from a data set obeying the N (0, 1) distribution to obtain a plurality of Δ X values, and taking their average to obtain the standardized parameter values of the immediately following state;
the parameter solving for the influencing variables of the immediately following time interval comprises adding the solved difference to the parameter value at the current time, performing inverse standardization according to the standardization applied to the collected process parameters, and verifying the performance of the constructed state transition model with the training set.
7. The method for optimizing process parameters in real time for fusion of a probabilistic network and reinforcement learning according to claim 6, wherein: the construction of the intelligent agent model comprises an action design causing state transition and an action-caused reward design;
the action design for causing state transitions comprises differencing each control variable over adjacent time intervals and taking the median δ of all values; the δ value that can be independently added to each control variable at each moment is defined as an action a; the action space in the state transition process contains 2^n elements, where n is the number of control variables;
the reward design caused by actions comprises the reward value r (s, a), the target-value difference for a given action a transitioning from the current state s to the next state s'; each time a control variable is changed the target value changes correspondingly, and this change in the target value is the reward after the control variable changes.
8. The method for optimizing process parameters in real time for fusion of a probabilistic network and reinforcement learning according to claim 7, wherein: the learning process of the intelligent agent model comprises,
searching the minimum value of the TD error, and setting the minimum value of the TD error as a target;
the TD error is calculated as,
Q(s,a)=r(s,a)+γmax(Q(s',a'))
wherein Q (s, a) represents the revenue expectation obtained by applying action a in the s state, gamma represents the discount coefficient, and Q (s ', a') represents the revenue expectation obtained by applying action a 'in the s' state;
and inputting the influencing variables of the current time interval into the state value network and the policy network by using the Q-Learning reinforcement learning algorithm, and generating the action a corresponding to the maximum reward value r (s, a) through loop iteration, wherein all states, actions and rewards of the action a constitute the policy network.
9. The method for optimizing process parameters in real time for fusion of a probabilistic network and reinforcement learning according to claim 8, wherein: the real-time optimization and output of the process parameters in the production process comprises,
collecting the technological parameters of the production process in real time by taking each fixed time interval as a unit, selecting the number of time intervals to be aggregated according to the actual business requirements and the time sequence, and carrying out data processing and aggregation on the sample data consisting of the selected influence variables and the target value;
inputting the processed and aggregated data into the constructed state transition model, and outputting a difference value between an influence variable and a target value;
and inputting the influence variable of the current time interval into the trained strategy network, outputting a control variable, and realizing the fusion and application of the state transition model and the intelligent agent model in the actual production process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210989613.7A CN115061444B (en) | 2022-08-18 | 2022-08-18 | Real-time optimization method for process parameters integrating probability network and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115061444A CN115061444A (en) | 2022-09-16 |
CN115061444B true CN115061444B (en) | 2022-12-09 |
Family
ID=83208015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210989613.7A Active CN115061444B (en) | 2022-08-18 | 2022-08-18 | Real-time optimization method for process parameters integrating probability network and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115061444B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115439021B (en) * | 2022-10-26 | 2023-03-24 | 江苏新恒基特种装备股份有限公司 | Metal strengthening treatment quality analysis method and system |
CN118642375A (en) * | 2024-08-14 | 2024-09-13 | 南通理工学院 | Self-adaptive control method and system for pyrolysis temperature and oxygen concentration of rotary kiln |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114692310A (en) * | 2022-04-14 | 2022-07-01 | 北京理工大学 | Virtual-real integration-two-stage separation model parameter optimization method based on Dueling DQN |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11327475B2 (en) * | 2016-05-09 | 2022-05-10 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for intelligent collection and analysis of vehicle data |
US11965946B2 (en) * | 2020-12-04 | 2024-04-23 | Max-Planck-Gesellschaft Zur Foerderung Der Wissenschaften E. V. | Machine learning based processing of magnetic resonance data, including an uncertainty quantification |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114692310A (en) * | 2022-04-14 | 2022-07-01 | 北京理工大学 | Virtual-real integration-two-stage separation model parameter optimization method based on Dueling DQN |
Non-Patent Citations (4)
Title |
---|
An incremental probabilistic neural network for regression and reinforcement learning tasks; Milton Roberto Heinen and Paulo Martins Engel; Docin; 2017-01-11; full text *
Parameterized circuit optimization algorithm based on reinforcement learning; Tang Changcheng; China Masters' Theses Full-text Database, Information Science and Technology; April 2020 (No. 4); full text *
Multi-robot collaborative navigation based on deep reinforcement learning; Zhou Shizheng; China Masters' Theses Full-text Database, Information Science and Technology; August 2019 (No. 8); full text *
A collaborative Q-V value function approximation model based on adaptive normalized RBF networks; Liu Quan et al.; Chinese Journal of Computers; July 2015; Vol. 38, No. 7, pp. 1386-1396 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115061444B (en) | Real-time optimization method for process parameters integrating probability network and reinforcement learning | |
CN109993270A (en) | Lithium ion battery residual life prediction technique based on grey wolf pack optimization LSTM network | |
CN116596044B (en) | Power generation load prediction model training method and device based on multi-source data | |
CN112884236B (en) | Short-term load prediction method and system based on VDM decomposition and LSTM improvement | |
CN112967088A (en) | Marketing activity prediction model structure and prediction method based on knowledge distillation | |
CN112910690A (en) | Network traffic prediction method, device and equipment based on neural network model | |
CN115688913A (en) | Cloud-side collaborative personalized federal learning method, system, equipment and medium | |
CN110991621A (en) | Method for searching convolutional neural network based on channel number | |
CN111860787A (en) | Short-term prediction method and device for coupling directed graph structure flow data containing missing data | |
CN112270442A (en) | IVMD-ACMPSO-CSLSTM-based combined power load prediction method | |
CN110929958A (en) | Short-term traffic flow prediction method based on deep learning parameter optimization | |
CN114548591A (en) | Time sequence data prediction method and system based on hybrid deep learning model and Stacking | |
CN115828990A (en) | Time-space diagram node attribute prediction method for fused adaptive graph diffusion convolution network | |
CN113449919B (en) | Power consumption prediction method and system based on feature and trend perception | |
CN113627594A (en) | One-dimensional time sequence data amplification method based on WGAN | |
CN117668743A (en) | Time sequence data prediction method of association time-space relation | |
CN112381591A (en) | Sales prediction optimization method based on LSTM deep learning model | |
CN116822722A (en) | Water level prediction method, system, device, electronic equipment and medium | |
Wang et al. | Time series prediction with incomplete dataset based on deep bidirectional echo state network | |
CN114781699B (en) | Reservoir water level prediction and early warning method based on improved particle swarm Conv1D-Attention optimization model | |
CN114997464A (en) | Popularity prediction method based on graph time sequence information learning | |
CN112667394B (en) | Computer resource utilization rate optimization method | |
CN109117491B (en) | Agent model construction method of high-dimensional small data fusing expert experience | |
CN114118567B (en) | Power service bandwidth prediction method based on double-channel converged network | |
CN114841472B (en) | GWO optimization Elman power load prediction method based on DNA hairpin variation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||