CN113325721B - Model-free adaptive control method and system for industrial system

Info

Publication number
CN113325721B
CN113325721B (application CN202110877921.6A)
Authority
CN
China
Prior art keywords
control
monitoring data
data
model
reinforcement learning
Prior art date
Legal status
Active
Application number
CN202110877921.6A
Other languages
Chinese (zh)
Other versions
CN113325721A (en)
Inventor
罗远哲
刘瑞景
赵爱民
李玉琼
耿云晓
刘志明
易文军
任光远
靳晓栋
Current Assignee
Zhongchao Weiye Beijing Business Data Technology Service Co ltd
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Original Assignee
Zhongchao Weiye Beijing Business Data Technology Service Co ltd
Beijing China Super Industry Information Security Technology Ltd By Share Ltd
Priority date
Filing date
Publication date
Application filed by Zhongchao Weiye Beijing Business Data Technology Service Co ltd, Beijing China Super Industry Information Security Technology Ltd By Share Ltd filed Critical Zhongchao Weiye Beijing Business Data Technology Service Co ltd
Priority to CN202110877921.6A priority Critical patent/CN113325721B/en
Publication of CN113325721A publication Critical patent/CN113325721A/en
Application granted granted Critical
Publication of CN113325721B publication Critical patent/CN113325721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention relates to a model-free self-adaptive control method and system for an industrial system. The method comprises the following steps: acquiring historical monitoring data of various devices in an industrial process; generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment; constructing a prediction simulation model according to the historical monitoring data; training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate a trained reinforcement learning-based control model; acquiring current monitoring data; and inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system, and outputting the optimal set target of the industrial system. The invention can greatly reduce the trial and error cost and obtain a more effective intelligent control strategy.

Description

Model-free adaptive control method and system for industrial system
Technical Field
The invention relates to the field of industrial intelligent control and reinforcement learning control, in particular to a model-free self-adaptive control method and system for an industrial system.
Background
In recent years, the rapid development of modern science and technology has driven the development of the industrial field, and the informatization, automation and intelligentization of industry are maturing day by day. As the scale of industrial production keeps expanding, achieving unmanned intelligent control in complex industrial scenes, further reducing labor cost and the skill-training cost of operators, moving away from reliance on human experience, and realizing more accurate and reliable intelligent control strategies have become key problems to be solved urgently. Traditional intelligent control technology is only suitable for simple industrial environments. In actual industrial production, complex industrial environments contain a large number of sensors producing monitoring data, and traditional intelligent control technology cannot make good use of the latent features of these data. A control method based on machine learning, by contrast, can learn the variation laws of the monitoring data, has a certain learning and generalization capacity, can extract the objective laws of the production environment from the monitoring data, and can summarize experience and knowledge that human experts cannot discover.
In machine-learning-based control, a typical approach is a control-law learning method based on a Reinforcement Learning (RL) algorithm. Reinforcement learning can learn the transition law of monitoring values in a complex industrial environment from data, requires no domain expert to design control rules, and is therefore suitable for complex industrial scenes. Incremental learning on top of reinforcement learning gives the control model adaptive capability and brings it closer to actual industrial production conditions in practical application. Reinforcement learning has been widely applied in various industrial fields, for example in power grid emergency control strategy research [Liuwei, Zdongxia, Wangxingying, Houjinxiu, Liuliping. Research on power grid emergency control strategy based on deep reinforcement learning [J]. Proceedings of the CSEE, 2018, 38(01): 109-]. In existing actual industrial production control, however, the control strategy must be trained and tested in the industrial environment to obtain an adaptive model with good performance, and the trial-and-error cost and research-and-development cost are too high.
Disclosure of Invention
The invention aims to provide a model-free self-adaptive control method and system for an industrial system, which aim to solve the problems of high trial and error cost and high research and development cost.
In order to achieve the purpose, the invention provides the following scheme:
a model-free adaptive control method for an industrial system, comprising:
acquiring historical monitoring data of various devices in an industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data comprises an object controlled in the production process;
generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment;
constructing a prediction simulation model according to the historical monitoring data;
training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate a trained reinforcement learning-based control model;
acquiring current monitoring data;
and inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system, and outputting the optimal set target of the industrial system.
Optionally, generating a control instruction set by using the controllable class data specifically includes:
defining a piece of monitoring data s = (s(control), s(state), s(env), s(goal)), wherein the monitoring data is a piece of the historical monitoring data S or of the current monitoring data; s(control) is the controllable variable of the controllable class data in any piece of the monitoring data; s(state) is the system state quantity of the state class data in any piece of the monitoring data; s(env) is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; s(goal) is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data of a continuous time period; N is the size of the historical monitoring data set; control denotes controllable class data, state denotes state class data, env denotes environmental noise class data, and goal denotes target output class data;
collecting the controllable variables s(control) from the historical monitoring data S to generate N control instructions;
reducing the scale of the N control instructions by clustering, determining the optimal number of cluster centers k by using the Bayesian information criterion, and taking all the cluster centers {c_1, c_2, ..., c_k} as the action instructions of the reinforcement learning-based control model to generate the control instruction set.
Optionally, constructing a prediction simulation model according to the historical monitoring data specifically includes:
constructing a plurality of prediction models so that each variable in the system state quantity and target predicted state output quantity at the next moment s'(state, goal) is predicted independently; for each univariate prediction, a prediction model is constructed with the LightGBM algorithm, with the maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting;
dividing the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data used as the validation set determines the hyper-parameters of the optimal prediction model;
integrating the plurality of prediction models into a prediction simulation model which, according to the controllable variable and environmental noise quantity s'(control, env) given by the controller and the system state quantity and target current state output quantity s(state, goal) in the historical monitoring data, predicts the next state.
Optionally, based on the control instruction set, training a reinforcement learning-based control model according to the prediction simulation model and generating the trained reinforcement learning-based control model specifically includes:
constructing a reinforcement learning-based control model, and acquiring the current monitoring data s(control, state, env, goal), a set control target value human(goal), and the environmental noise quantity at the next moment s'(env) in the historical monitoring data;
inputting the current monitoring data s(control, state, env, goal) and the set control target value human(goal) into the reinforcement learning-based control model, taking the k output profit values, one per control instruction, as probability weights for sampling, and sampling one control instruction s'(control) from the control instruction set;
predicting the system state quantity and target predicted state output quantity at the next moment s'(state, goal) with the prediction simulation model, according to the current monitoring data s(control, state, env, goal) and the control instruction s'(control);
calculating a decision reward r according to the set control target value human(goal) and the target output quantity at the next moment s'(goal);
training the reinforcement learning-based control model with a Q-Learning-based temporal difference loss function, based on the decision reward r, the current monitoring data s(control, state, env, goal), the control instruction s'(control), and the system state quantity and target predicted state output quantity at the next moment s'(state, goal), so that given the current monitoring data s(control, state, env, goal) the reinforcement learning-based control model outputs the control instruction s'(control) that maximizes the future accumulated reward;
replacing the current monitoring data s(control, state, env, goal) with the monitoring data of the next moment s'(control, state, env, goal), and continuing to train the reinforcement learning-based control model until its average reward no longer increases, thereby determining the trained reinforcement learning-based control model.
Optionally, the temporal difference loss function is:

Q(s, a) ← Q(s, a) + α[ r + γ·max_{a'} Q(s', a') - Q(s, a) ]

wherein γ is the cumulative discount value; s is the system state quantity and target current state output quantity at the current moment s(state, goal); s' is the system state quantity and target predicted state output quantity at the next moment s'(state, goal); a is the sampled control instruction s'(control); a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) indicates the optimal long-term profit obtained in the future by the control strategy when the system state is s and the executed control instruction is a; Q(s', a') indicates the long-term profit obtainable in the future by the control strategy when the system state is s' and the executed control instruction is a'; the single-step control profit obtained when the system state s evolves into s' under control a is r; the network output value Q(s, a) is optimized accordingly, giving the optimized result Q*(s, a) of the temporal difference loss function.
An industrial system model-free adaptive control system, comprising:
the historical monitoring data acquisition module is used for acquiring historical monitoring data of various devices in the industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data comprises an object controlled in the production process;
the control instruction set generating module is used for generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment;
the prediction simulation model building module is used for building a prediction simulation model according to the historical monitoring data;
the trained reinforcement learning-based control model determining module is used for training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate the trained reinforcement learning-based control model;
the current monitoring data acquisition module is used for acquiring current monitoring data;
and the self-adaptive control module is used for inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system and outputting the optimal set target of the industrial system.
Optionally, the control instruction set generating module specifically includes:
a parameter definition unit, configured to define a piece of monitoring data s = (s(control), s(state), s(env), s(goal)), wherein the monitoring data is a piece of the historical monitoring data S or of the current monitoring data; s(control) is the controllable variable of the controllable class data in any piece of the monitoring data; s(state) is the system state quantity of the state class data in any piece of the monitoring data; s(env) is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; s(goal) is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data of a continuous time period; N is the size of the historical monitoring data set; control denotes controllable class data, state denotes state class data, env denotes environmental noise class data, and goal denotes target output class data;
a control instruction generation unit, configured to collect the controllable variables s(control) from the historical monitoring data S to generate N control instructions;
a control instruction set generation unit, configured to reduce the scale of the N control instructions by clustering, determine the optimal number of cluster centers k by using the Bayesian information criterion, and take all the cluster centers {c_1, c_2, ..., c_k} as the action instructions of the reinforcement learning-based control model to generate the control instruction set.
Optionally, the prediction simulation model building module specifically includes:
a prediction model construction unit, configured to construct a plurality of prediction models so that each variable in the system state quantity and target predicted state output quantity at the next moment s'(state, goal) is predicted independently, wherein for each univariate prediction a prediction model is constructed with the LightGBM algorithm, with the maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting;
a dividing unit, configured to divide the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data used as the validation set determines the hyper-parameters of the optimal prediction model;
a prediction simulation model construction unit, configured to integrate the plurality of prediction models into a prediction simulation model according to the controllable variable and environmental noise quantity s'(control, env) given by the controller and the system state quantity and target current state output quantity s(state, goal) in the historical monitoring data.
Optionally, the trained reinforcement learning-based control model determining module specifically includes:
a reinforcement learning-based control model construction unit, configured to construct a reinforcement learning-based control model and acquire the current monitoring data s(control, state, env, goal), a set control target value human(goal), and the environmental noise quantity at the next moment s'(env) in the historical monitoring data;
a control instruction sampling unit, configured to input the current monitoring data s(control, state, env, goal) and the set control target value human(goal) into the reinforcement learning-based control model, take the k output profit values, one per control instruction, as probability weights for sampling, and sample one control instruction s'(control) from the control instruction set;
a prediction unit, configured to predict the system state quantity and target predicted state output quantity at the next moment s'(state, goal) with the prediction simulation model, according to the current monitoring data s(control, state, env, goal) and the control instruction s'(control);
a decision reward calculation unit, configured to calculate a decision reward r according to the set control target value human(goal) and the target output quantity at the next moment s'(goal);
a training unit, configured to train the reinforcement learning-based control model with a Q-Learning-based temporal difference loss function, based on the decision reward r, the current monitoring data s(control, state, env, goal), the control instruction s'(control), and the system state quantity and target predicted state output quantity at the next moment s'(state, goal), so that given the current monitoring data s(control, state, env, goal) the reinforcement learning-based control model outputs the control instruction s'(control) that maximizes the future accumulated reward;
a trained reinforcement learning-based control model determining unit, configured to replace the current monitoring data s(control, state, env, goal) with the monitoring data of the next moment s'(control, state, env, goal) and continue training the reinforcement learning-based control model until its average reward no longer increases, thereby determining the trained reinforcement learning-based control model.
Optionally, the temporal difference loss function is:

Q(s, a) ← Q(s, a) + α[ r + γ·max_{a'} Q(s', a') - Q(s, a) ]

wherein γ is the cumulative discount value; s is the system state quantity and target current state output quantity at the current moment s(state, goal); s' is the system state quantity and target predicted state output quantity at the next moment s'(state, goal); a is the sampled control instruction s'(control); a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) indicates the optimal long-term profit obtained in the future by the control strategy when the system state is s and the executed control instruction is a; Q(s', a') indicates the long-term profit obtainable in the future by the control strategy when the system state is s' and the executed control instruction is a'; the single-step control profit obtained when the system state s evolves into s' under control a is r; the network output value Q(s, a) is optimized accordingly, giving the optimized result Q*(s, a) of the temporal difference loss function.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: the invention provides a model-free adaptive control method and system for an industrial system, which directly use the sensing monitoring data of the industrial system to establish a prediction simulation model for deducing the environment state, obtain a set of control instructions during data preprocessing, and finally learn a control strategy with a reinforcement learning method on the basis of the prediction simulation model, training the reinforcement learning-based control model and generating the trained reinforcement learning-based control model, which outputs the optimal set target of the industrial system. Training and testing of the control strategy in the real industrial environment are therefore not needed, which greatly reduces the trial-and-error cost. Moreover, even if the actual industrial equipment that generated the training data did not exhibit good control performance, a more effective intelligent control strategy than the existing control system or algorithm can still be obtained by learning control experience with the model-free adaptive control method or system for an industrial system provided by the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flow chart of a model-free adaptive control method for an industrial system according to the present invention;
FIG. 2 is a technical framework diagram of a model-free adaptive control method for an industrial system according to the present invention;
FIG. 3 is a schematic diagram of a predictive simulation model according to the present invention;
FIG. 4 is a schematic diagram of a reinforcement learning network structure according to the present invention;
FIG. 5 is a block diagram of a model-free adaptive control system for an industrial system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a model-free self-adaptive control method and system for an industrial system, which can greatly reduce the trial and error cost and obtain a more effective intelligent control strategy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of a model-free adaptive control method for an industrial system according to the present invention, and as shown in fig. 1, the model-free adaptive control method for an industrial system includes:
step 101: acquiring historical monitoring data of various devices in an industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data includes objects controlled in the production process.
Firstly, classifying and defining various monitoring data collected in an industrial process from devices such as sensors, motor devices, valve switches and the like, and specifically classifying the monitoring data into the following four types:
1) controllable class: production parameters which allow direct control in the industrial field, such as the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter, the rotating speed of a pump and the like which can be controlled in industrial production, are classified into controllable variables, which are hereinafter referred to as control.
2) The state class: variables that reflect the system state but cannot be directly controlled, such as the pipeline pressure in industrial production; the pressure value of a pipeline cannot be set directly and can only be controlled indirectly, for example by adjusting the pump speed of one section of the pipeline to change the flow. Such variables are hereinafter referred to as state.
3) The environmental noise class: variables that are not determined internally by the production system but only from the outside, including product information from the previous process and external environmental influences such as temperature and humidity. Such variables are hereinafter abbreviated as env.
4) The target output class: the objects controlled in the production process, which are often the key objects influencing quality and cost in the production process. Such variables are hereinafter referred to as goal.
In practical applications, as shown in fig. 2, sensors first need to be installed at key nodes of the production process to measure the system state quantity s(state, goal) and the environmental noise quantity s(env); the controllable quantity s(control) in the production process can generally be obtained directly from the on-site control system. After data collection is completed, the different time series need to be aligned in time, for which a linear interpolation method or a Gaussian process method can be adopted. Assume that the aligned sequence length is N.
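As an illustration of this alignment step, the following sketch (assuming pandas, a common 10-second grid, and hypothetical tag names) resamples each monitoring series onto one time grid and fills gaps by linear interpolation; it is one possible realization, not the patented procedure itself.

import pandas as pd

def align_series(raw, freq="10s"):
    """Align heterogeneously sampled monitoring series onto one time grid.

    raw  : dict mapping a tag name to a time-indexed pd.Series
    freq : common sampling interval of the aligned sequence
    """
    frames = []
    for tag, series in raw.items():
        s = series.sort_index()
        # resample onto the common grid, then interpolate linearly in time
        s = s.resample(freq).mean().interpolate(method="time")
        frames.append(s.rename(tag))
    aligned = pd.concat(frames, axis=1).dropna()
    return aligned  # the number of rows is the aligned sequence length N

# Hypothetical usage with one tag from each of the four data classes:
# aligned = align_series({"flow_valve_opening": v, "pipeline_pressure": p,
#                         "ambient_temperature": t, "underflow_concentration": c})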
Step 102: generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment.
The step 102 specifically includes: defining a piece of monitoring data s = (s(control), s(state), s(env), s(goal)), wherein the monitoring data is a piece of the historical monitoring data S or of the current monitoring data; s(control) is the controllable variable of the controllable class data in any piece of the monitoring data; s(state) is the system state quantity of the state class data in any piece of the monitoring data; s(env) is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; s(goal) is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data of a continuous time period; N is the size of the historical monitoring data set; control denotes controllable class data, state denotes state class data, env denotes environmental noise class data, and goal denotes target output class data. The controllable variables s(control) are collected from the historical monitoring data S to generate N control instructions. The scale of the N control instructions is reduced by clustering, the optimal number of cluster centers k is determined by using the Bayesian information criterion, and all the cluster centers {c_1, c_2, ..., c_k} are taken as the action instructions of the reinforcement learning-based control model to generate the control instruction set.
In practical application, after data collection and data alignment are completed, the controllable parameters in the original monitoring data are extracted. In this embodiment the controllable parameters comprise the opening of a flow valve and the rotating speeds of 2 frequency converters and 1 pump, and the data of these controllable units are exported together with the monitoring data of the other sensors, giving the N control instructions expressed as shown in formula 1:

A = {s(control)_1, s(control)_2, ..., s(control)_N}    (1)

where s(control)_i represents the controllable part of the i-th record in the monitoring data. The N control instructions obtained through step 101 form a set that is too large for the reinforcement learning model to decide over and that may contain a large number of similar or identical instructions, so the present invention employs clustering to reduce its size.
Specifically, a K-means clustering algorithm is adopted to aggregate similar control instructions into clusters, and only the cluster centers are used as the control instructions available for selection by the reinforcement learning model. Because different input items have different dimensions, a normalization method must be applied before clustering so that the instruction distances in the action set are meaningful:

s(control)_i := (s(control)_i - mean) / std    (2)

where mean is the mean of all data entries and std is the standard deviation of all data entries. The number of cluster centers k is chosen with reference to the Bayesian Information Criterion (BIC) value; the larger the BIC value, the better the clustering effect. BIC is defined as shown in formula 3:
BIC = L - (k/2) * ln(N)    (3)

where L is the sum of the likelihood values of all data points for the class to which they belong. The optimal number of clusters is obtained by comparing the BIC values for different numbers of clusters k:

k = argmax over components of BIC(kmeans(A, components))    (4)
where components denotes the candidate number of cluster centers and kmeans denotes running the K-means clustering algorithm. Finally, the mean of the instructions in each cluster is used to represent one instruction, so that k control instructions are obtained for the given data set:

Actions = {c_1, c_2, ..., c_k}    (5)
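A minimal sketch of this instruction-set reduction, assuming scikit-learn. Because the exact BIC expression of formulas (3) and (4) is not reproduced here, the sketch substitutes scikit-learn's GaussianMixture.bic (where lower is better) as the model-selection criterion and then takes the K-means cluster centers at the selected k as the action set.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def build_action_set(controls, k_candidates=range(2, 21), seed=0):
    """controls: (N, d) array of raw control instructions s(control)_1..N."""
    mean, std = controls.mean(axis=0), controls.std(axis=0) + 1e-8
    x = (controls - mean) / std                                  # formula (2): normalization

    # choose k with an information criterion (sklearn's BIC: lower is better)
    bic = {k: GaussianMixture(n_components=k, random_state=seed).fit(x).bic(x)
           for k in k_candidates}
    k_best = min(bic, key=bic.get)

    km = KMeans(n_clusters=k_best, n_init=10, random_state=seed).fit(x)
    centers = km.cluster_centers_ * std + mean                   # de-normalized cluster centers
    return centers                                               # the action set Actions, shape (k_best, d)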
Step 103: and constructing a prediction simulation model according to the historical monitoring data.
The step 103 specifically includes: constructing a plurality of prediction models so that each variable in the system state quantity and target predicted state output quantity at the next moment s'(state, goal) is predicted independently; for each univariate prediction, a prediction model is constructed with the LightGBM algorithm, with the maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting; dividing the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data used as the validation set determines the hyper-parameters of the optimal prediction model; and integrating the plurality of prediction models into a prediction simulation model according to the controllable variable and environmental noise quantity s'(control, env) given by the controller and the system state quantity and target current state output quantity s(state, goal) in the historical monitoring data.
In practical applications, as shown in fig. 3, multiple prediction models need to be constructed so that each variable in s'(state, goal) is predicted independently, and finally all the independent models are integrated together as a complete system simulation prediction model.
For the prediction of each univariate, a LightGBM algorithm is adopted to construct a prediction model, the maximum number of leaves num _ leaves is 10, the learning rate is 0.8, the feature screening proportion feature _ fraction is 0.9, and the l2 regular term is adopted to reduce overfitting.
The historical monitoring data was divided into 7:3, with 30% of the data being used as validation set to determine the optimal model hyper-parameters.
For each predicted dependent variable in s'(state, goal), the corresponding model is built, and all the models are integrated to construct a simulation model of the industrial process: given the control quantity and environmental noise quantity s'(control, env) supplied by the controller and the current system state quantity s(state, goal), the simulation model predicts the new s'(state, goal).
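A sketch of the per-variable LightGBM predictors and of their integration into a one-step simulator, using the hyper-parameters quoted above; the column names, the split helper and the number of boosting rounds are assumptions, not values from the patent.

import lightgbm as lgb

PARAMS = {"objective": "regression", "num_leaves": 10, "learning_rate": 0.8,
          "feature_fraction": 0.9, "lambda_l2": 1.0, "verbosity": -1}

def fit_simulator(df, input_cols, target_cols, n_rounds=200):
    """Train one LightGBM model per next-step variable in s'(state, goal).

    df holds, per row, the inputs (s(state, goal), s'(control), s'(env)) and,
    as targets, the state/goal variables shifted to the next time step.
    """
    split = int(len(df) * 0.7)                       # 7:3 train/validation split
    train, valid = df.iloc[:split], df.iloc[split:]
    models = {}
    for col in target_cols:                          # independent univariate models
        dtrain = lgb.Dataset(train[input_cols], label=train[col])
        dvalid = lgb.Dataset(valid[input_cols], label=valid[col], reference=dtrain)
        models[col] = lgb.train(PARAMS, dtrain, num_boost_round=n_rounds,
                                valid_sets=[dvalid],
                                callbacks=[lgb.early_stopping(20, verbose=False)])
    return models

def simulate_step(models, x_row):
    """Predict s'(state, goal) for one input row (current state plus chosen control)."""
    return {col: float(m.predict(x_row)[0]) for col, m in models.items()}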
Step 104: and training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate the trained reinforcement learning-based control model.
The step 104 specifically includes: constructing a reinforcement learning-based control model, and acquiring the current monitoring data s(control, state, env, goal), a set control target value human(goal), and the environmental noise quantity at the next moment s'(env) in the historical monitoring data; inputting the current monitoring data s(control, state, env, goal) and the set control target value human(goal) into the reinforcement learning-based control model, taking the k output profit values, one per control instruction, as probability weights for sampling, and sampling one control instruction s'(control) from the control instruction set; predicting the system state quantity and target predicted state output quantity at the next moment s'(state, goal) with the prediction simulation model, according to the current monitoring data s(control, state, env, goal) and the control instruction s'(control); calculating a decision reward r according to the set control target value human(goal) and the target output quantity at the next moment s'(goal); training the reinforcement learning-based control model with a Q-Learning-based temporal difference loss function, based on the decision reward r, the current monitoring data s(control, state, env, goal), the control instruction s'(control), and the system state quantity and target predicted state output quantity at the next moment s'(state, goal), so that given the current monitoring data s(control, state, env, goal) the reinforcement learning-based control model outputs the control instruction s'(control) that maximizes the future accumulated reward; and replacing the current monitoring data s(control, state, env, goal) with the monitoring data of the next moment s'(control, state, env, goal), and continuing to train the reinforcement learning-based control model until its average reward no longer increases, thereby determining the trained reinforcement learning-based control model.
The temporal difference loss function is:

Q(s, a) ← Q(s, a) + α[ r + γ·max_{a'} Q(s', a') - Q(s, a) ]

wherein the temporal difference loss function is an iterative optimization function based on Q-learning; γ is the cumulative discount value, set to 0.95; s is the system state quantity and target current state output quantity at the current moment s(state, goal); s' is the system state quantity and target predicted state output quantity at the next moment s'(state, goal); a is the sampled control instruction s'(control); a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) indicates the optimal long-term profit obtained in the future by the control strategy when the system state is s and the executed control instruction is a; Q(s', a') indicates the long-term profit obtainable in the future by the control strategy when the system state is s' and the executed control instruction is a'. Using the iterative Bellman equation and the collected system state evolution data, namely that the system state s evolves into s' under control a with single-step control profit r, the network output value Q(s, a) is optimized to obtain Q*(s, a).
In practical application, a control model based on reinforcement learning can be trained by using a prediction simulation model constructed by a plurality of LightGBM models, and the specific steps are as follows:
Constructing a deep-neural-network-based reinforcement learning model, which, as shown in fig. 4, takes as input the current system state quantity s(control, state, env, goal) and the set control target value human(goal) and outputs a predicted reward Q(s, a_i) for each action, where a_i is the i-th control instruction in the control instruction set Actions, i ∈ {1, ..., n}, n is the number of control instructions, and the set control target value human(goal) is the target value of goal set by a human operator.
The network consists of fully connected layers, ReLU nonlinear activation layers, a noisy linear layer and a softmax normalization layer. A state-value branch V and an action-advantage estimation branch A are introduced into the network; experimental verification shows that this network design improves the accuracy of the action-value estimation.
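A simplified PyTorch sketch of a network with the value branch V and advantage branch A described above. It replaces the noisy linear layer of the embodiment with an ordinary nn.Linear and omits the softmax (which is applied later when sampling actions), so it illustrates only the dueling structure, not the exact network of fig. 4; the hidden width is an assumption.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Estimates Q(s, human(goal), a_i) for the k clustered control instructions."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),      # +1 input for human(goal)
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)                     # state-value branch V
        self.advantage = nn.Linear(hidden, n_actions)         # action-advantage branch A

    def forward(self, state, goal):
        h = self.backbone(torch.cat([state, goal], dim=-1))
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)           # dueling combination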
The reward of the reinforcement learning model is defined piecewise according to the difference between the predicted value of the target parameter and the artificially set value, where the difference value e is

e = | s'(goal) - human(goal) |
the calculation method of the prize is as shown in table 1.
TABLE 1 reward definition Table
Figure DEST_PATH_IMAGE076
Reward for
Figure 933728DEST_PATH_IMAGE077
10
Figure DEST_PATH_IMAGE078
6
Figure 539022DEST_PATH_IMAGE079
2
Figure DEST_PATH_IMAGE080
0
The segmentation criteria of the difference values are related to the industrial scenario applied by the embodiment, and the parameters have no universality in different industrial scenarios.
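A sketch of the piecewise reward, with threshold values chosen purely for illustration, since the actual bands are scenario-dependent as noted above:

def decision_reward(goal_pred, goal_set, bands=(0.5, 1.0, 2.0)):
    """Piecewise reward from the deviation e = |s'(goal) - human(goal)|.

    The three thresholds in `bands` are illustrative placeholders; table 1
    only fixes the reward levels 10, 6, 2 and 0 for increasingly large e.
    """
    e = abs(goal_pred - goal_set)
    if e < bands[0]:
        return 10.0
    if e < bands[1]:
        return 6.0
    if e < bands[2]:
        return 2.0
    return 0.0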
Training the reinforcement learning network: reinforcement learning proceeds in units of episodes, and in order to accelerate model training a parallelization technique is adopted so that the model processes and learns from several episodes simultaneously, the degree of parallelism being batch_size = 32. The specific training procedure is as follows:
Randomly fetch batch_size pieces of current monitoring data s(control, state, env, goal) from the actual production data; these state parameters, which describe the current production situation, are used as the starting state of each training episode. Set an artificially specified control target value human(goal); the present invention is exemplified by industrial thickener underflow concentration control, with the target set to 67.
For each time step, the reinforcement learning network takes as input the state parameters s(control, state, env, goal) and the control target value human(goal) and outputs a matrix of size batch_size by k, where k is the number of cluster centers and each value in a row represents the future long-term discounted profit brought by selecting the corresponding control input. The profit values are then converted into a probability distribution over actions with a softmax function, a control command s'(control) is sampled from this distribution, and the system state at the next moment is predicted with the prediction simulation model.
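A small sketch of this profit-weighted action sampling, assuming the batch of Q-value rows comes from a network such as the DuelingQNet sketched above:

import numpy as np

def sample_actions(q_values, rng=None):
    """q_values: (batch_size, k) predicted long-term profits per control instruction.

    Returns one sampled action index per episode in the batch.
    """
    rng = rng or np.random.default_rng()
    z = q_values - q_values.max(axis=1, keepdims=True)        # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])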
s(control, state, env, goal) and s'(control) are then used as inputs of the obtained prediction simulation model to predict the system state quantity and target quantity s'(state, goal) at the next moment.
The decision reward r is computed from the artificially set target value human(goal) and the predicted s'(goal). Based on the target current state output quantity s(state, goal), the target predicted state output quantity s'(state, goal), the reward r and the control input s'(control), the control model parameters are trained with a Q-Learning-based temporal difference loss function, so that, given s(control, state, env, goal), the reinforcement learning model outputs the s'(control) whose reward r is as large as possible. The temporal difference loss function is expressed as:

Q(s, a) ← Q(s, a) + α[ r + γ·max_{a'} Q(s', a') - Q(s, a) ]    (6)
wherein γ is the cumulative discount value, set to 0.95; s is the system state quantity and target current state output quantity at the current moment s(state, goal); s' is the system state quantity and target predicted state output quantity at the next moment s'(state, goal); a is the sampled control instruction s'(control); a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) indicates the optimal long-term profit obtained in the future by the control strategy when the system state is s and the executed control instruction is a; Q(s', a') indicates the long-term profit obtainable in the future by the control strategy when the system state is s' and the executed control instruction is a'. Using the iterative Bellman equation and the collected system state evolution data, namely that the system state s evolves into s' under control a with single-step control profit r, the network output value Q(s, a) is optimized to obtain Q*(s, a).
s(control, state, env, goal) is then replaced with s'(control, state, env, goal) and the reinforcement learning-based control model is trained repeatedly; when the average reward obtained by the model no longer increases over 50 consecutive iterations of training, the model parameters have reached a converged state.
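The following sketch ties these steps together: one update computes the Q-Learning temporal difference target of formula (6) for a batch of simulated transitions and regresses the network toward it. It builds on the DuelingQNet, sample_actions, simulate_step and decision_reward sketches above; the tensor shapes, the optimizer choice and the done-flag handling are assumptions.

import torch
import torch.nn.functional as F

GAMMA = 0.95   # cumulative discount value, as in the embodiment

def td_update(qnet, optimizer, s, goal, a_idx, r, s_next, goal_next, done):
    """One Q-Learning step: Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)].

    The learning rate alpha is carried by the optimizer.
    s, s_next: (B, state_dim); goal, goal_next: (B, 1); a_idx: (B,) long indices
    into the action set; r, done: (B,) floats.
    """
    q_sa = qnet(s, goal).gather(1, a_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = qnet(s_next, goal_next).max(dim=1).values
        target = r + GAMMA * (1.0 - done) * q_next
    loss = F.mse_loss(q_sa, target)        # temporal difference loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)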
Step 105: and acquiring current monitoring data.
Step 106: and inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system, and outputting the optimal set target of the industrial system.
The trained reinforcement learning model is deployed on a DCS engineer station or a high-performance computing server at the industrial site, and the model inference program is deployed as a Web service supporting access via a RESTful protocol.
Industrial system state quantities s(control, state, env, goal), such as sensor monitoring values, controllable unit state values and external environment quantities, are read at regular intervals using an industrial control system data acquisition protocol such as OPC UA, with the same data acquisition interval as during control model training. s(control, state, env, goal) and the artificially set target value are input into the control model; among the k candidate actions in the solution result, the command with the largest estimated future potential profit is selected and written into the control system via an industrial control protocol to complete the control.
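A minimal sketch of this deployment, assuming Flask for the RESTful inference service; reading the state over OPC UA and writing the chosen command back are represented only by comments, since the concrete tag addresses and protocol client depend on the site.

from flask import Flask, jsonify, request
import numpy as np
import torch

app = Flask(__name__)
qnet = None      # trained DuelingQNet from the sketches above, loaded at start-up
ACTIONS = None   # (k, d) array of clustered control instructions

@app.route("/infer", methods=["POST"])
def infer():
    payload = request.get_json()                     # e.g. {"state": [...], "goal": 67.0}
    s = torch.tensor([payload["state"]], dtype=torch.float32)
    g = torch.tensor([[payload["goal"]]], dtype=torch.float32)
    with torch.no_grad():
        q = qnet(s, g).numpy()[0]
    best = int(np.argmax(q))                         # command with the largest estimated profit
    return jsonify({"action_index": best, "command": ACTIONS[best].tolist()})

# A site-specific collector would poll the sensors (for example over OPC UA) at the
# training-time acquisition interval, POST the state to /infer, and write the returned
# command back to the control system via the industrial control protocol.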
Fig. 5 is a structural diagram of a model-free adaptive control system of an industrial system according to the present invention, and as shown in fig. 5, a model-free adaptive control system of an industrial system includes:
a historical monitoring data acquisition module 501, configured to acquire historical monitoring data of various devices in an industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data includes objects controlled in the production process.
A control instruction set generating module 502, configured to generate a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment.
The control instruction set generating module 502 specifically includes: a parameter definition unit, configured to define a piece of monitoring data s = (s(control), s(state), s(env), s(goal)), wherein the monitoring data is a piece of the historical monitoring data S or of the current monitoring data; s(control) is the controllable variable of the controllable class data in any piece of the monitoring data; s(state) is the system state quantity of the state class data in any piece of the monitoring data; s(env) is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; s(goal) is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data of a continuous time period; N is the size of the historical monitoring data set; control denotes controllable class data, state denotes state class data, env denotes environmental noise class data, and goal denotes target output class data; a control instruction generation unit, configured to collect the controllable variables s(control) from the historical monitoring data S to generate N control instructions; and a control instruction set generation unit, configured to reduce the scale of the N control instructions by clustering, determine the optimal number of cluster centers k by using the Bayesian information criterion, and take all the cluster centers {c_1, c_2, ..., c_k} as the action instructions of the reinforcement learning-based control model to generate the control instruction set.
And a predictive simulation model constructing module 503, configured to construct a predictive simulation model according to the historical monitoring data.
The prediction simulation model building module 503 specifically includes: a prediction model construction unit, configured to construct a plurality of prediction models so that each variable in the system state quantity and target predicted state output quantity at the next moment s'(state, goal) is predicted independently, wherein for each univariate prediction a prediction model is constructed with the LightGBM algorithm, with the maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting; a dividing unit, configured to divide the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data used as the validation set determines the hyper-parameters of the optimal prediction model; and a prediction simulation model construction unit, configured to integrate the plurality of prediction models into a prediction simulation model according to the controllable variable and environmental noise quantity s'(control, env) given by the controller and the system state quantity and target current state output quantity s(state, goal) in the historical monitoring data.
A trained reinforcement learning based control model determining module 504, configured to train a reinforcement learning based control model according to the predictive simulation model based on the control instruction set, and generate the trained reinforcement learning based control model.
The trained reinforcement learning-based control model determination module 504 specifically includes:
a reinforcement learning-based control model construction unit for constructing a reinforcement learning-based control model and acquiring the current monitoring data s, a set control target value, and the environmental noise quantity at the next moment in the historical monitoring data;
a control instruction sampling unit for inputting the current monitoring data s and the set control target value into the reinforcement learning-based control model, taking the profit value output for each control instruction as a probability weight for sampling, and sampling one control instruction a from the control instruction set;
a prediction unit for predicting, by using the prediction simulation model, the system state quantity and the target predicted state output quantity at the next moment according to the current monitoring data s and the control instruction a;
a decision reward calculation unit for calculating a decision reward r according to the set control target value and the target output quantity at the next moment;
a training unit for training the reinforcement Learning-based control model with a Q-Learning-based temporal-difference loss function based on the decision reward r, the current monitoring data s, the control instruction a, and the system state quantity and the target predicted state output quantity at the next moment, so that the reinforcement learning-based control model outputs, after observing the current monitoring data s, the control instruction a that maximizes the future accumulated reward;
and a trained reinforcement learning-based control model determination unit for replacing the current monitoring data s with the monitoring data s' at the next moment, and training the reinforcement learning-based control model until the average reward of the reinforcement learning-based control model no longer increases, thereby determining the trained reinforcement learning-based control model.
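To make the training flow of this module concrete, here is a minimal, assumption-laden sketch: states are discretized by a user-supplied encode() helper, the Q function is kept as a lookup table rather than a network, the decision reward is taken as the negative deviation of the predicted target output from the set target value, and simulate_step() stands in for the prediction simulation model; none of these names, the reward form, or the learning-rate value are specified by the patent. The temporal-difference loss that drives the update inside the loop is spelled out after the sketch.

# Illustrative training loop (tabular Q, softmax sampling of the clustered control
# instructions, reward assumed to be the negative deviation from the set target).
import numpy as np
from collections import defaultdict

GAMMA = 0.95   # cumulative discount value (as stated in the description)
ALPHA = 0.1    # learning rate (assumed value)

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def train_controller(actions, env_trace, init_data, goal_target, simulate_step, encode,
                     episodes=200):
    """actions: the clustered control instructions; env_trace: next-moment noise from history."""
    q = defaultdict(lambda: np.zeros(len(actions)))   # Q(s, a) lookup table
    avg_rewards = []
    for _ in range(episodes):
        s, total = init_data, 0.0
        for env_next in env_trace:
            key = encode(s)
            a_idx = np.random.choice(len(actions), p=softmax(q[key]))   # profit values as weights
            s_next = simulate_step(s, actions[a_idx], env_next)         # prediction simulation model
            r = -abs(s_next["goal"] - goal_target)                      # decision reward (assumed form)
            # Q-Learning temporal-difference update
            q[key][a_idx] += ALPHA * (r + GAMMA * np.max(q[encode(s_next)]) - q[key][a_idx])
            total += r
            s = s_next                                                  # next-moment data replaces current
        avg_rewards.append(total / len(env_trace))
        # stop once the average reward no longer increases
        if len(avg_rewards) >= 20 and np.mean(avg_rewards[-10:]) <= np.mean(avg_rewards[-20:-10]):
            break
    return q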
The temporal-difference loss function is:

Q(s, a) ← Q(s, a) + α·[r + γ·max_{a'} Q(s', a') - Q(s, a)]

wherein γ is the cumulative discount value, set to 0.95; s is the system state quantity and the target current state output quantity at the current moment; s' is the system state quantity and the target predicted state output quantity at the next moment; a is the sampled control instruction; a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) denotes the optimal long-term gain that the control strategy can obtain in the future when the system state is s and the executed control instruction is a; Q(s', a') denotes the long-term gain that the control strategy can obtain in the future when the system state is s' and the executed control instruction is a'. Using the iterative Bellman equation and the collected system-state evolution data, namely that the system state s evolves into s' under the control instruction a with a single-step control benefit r, the network output value Q(s, a) is optimized to obtain the optimized result of the temporal-difference loss function.
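A single numeric pass through this update, with illustrative values only, shows how the network output value moves toward the bootstrapped target:

# One worked temporal-difference step (all numbers illustrative).
alpha, gamma = 0.1, 0.95
q_sa, r = 2.0, -0.5            # current Q(s, a) and single-step control benefit
q_next = [1.0, 3.0, 2.5]       # Q(s', a') over the selectable control inputs a'
q_sa += alpha * (r + gamma * max(q_next) - q_sa)
print(round(q_sa, 3))          # 2.035 = 2.0 + 0.1 * (-0.5 + 0.95 * 3.0 - 2.0)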
And a current monitoring data obtaining module 505, configured to obtain current monitoring data.
And the adaptive control module 506 is used for inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system and outputting the optimal set target of the industrial system.
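Once training has converged, on-line control reduces to looking up the learned value function. A minimal sketch, reusing the assumed encode() helper and Q table from the training sketch above:

# Greedy deployment step: apply the instruction with the highest learned long-term gain.
import numpy as np

def control_step(q, actions, current_monitoring_data, encode):
    values = q[encode(current_monitoring_data)]
    return actions[int(np.argmax(values))]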
To address the limitation that traditional intelligent control techniques are only suited to simple industrial environments, the invention provides a control method that combines machine learning and reinforcement learning. Relying on its strong self-learning and generalization capabilities, the method mines the objective laws of the production environment from the monitoring data and converts them into an intelligent control strategy with good control precision, so that it no longer depends on manual intervention by field experts and control experts.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and core concept of the invention; meanwhile, a person skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (8)

1. A model-free adaptive control method for an industrial system, comprising:
acquiring historical monitoring data of various devices in an industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data comprises an object controlled in the production process;
generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment;
constructing a prediction simulation model according to the historical monitoring data;
the building of the prediction simulation model according to the historical monitoring data specifically comprises the following steps:
constructing a plurality of prediction models so that each variable in the system state quantity and the target predicted state output quantity at the next moment is predicted independently; for each univariate prediction, a LightGBM prediction model is constructed with a maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting;
dividing the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data serving as the validation set is used for determining the hyper-parameters of the optimal prediction model;
integrating the plurality of prediction models into a prediction simulation model according to the controllable variable and the environmental noise quantity given by the controller and the system state quantity and the target current state output quantity in the historical monitoring data;
training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate a trained reinforcement learning-based control model;
acquiring current monitoring data;
and inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system, and outputting the optimal set target of the industrial system.
2. The model-free adaptive control method for industrial systems according to claim 1, wherein the generating a set of control instructions using the controllable class data comprises:
defining a piece of monitoring data s (control, state, env, goal), wherein the monitoring data is a piece of the historical monitoring data S or the current monitoring data; control is the controllable variable of the controllable class data in any piece of the monitoring data; state is the system state quantity of the state class data in any piece of the monitoring data; env is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; goal is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data over a continuous time period; and N is the size of the historical monitoring data set;
collecting the controllable variables control from the historical monitoring data S to generate N control instructions;
narrowing the N control instructions by clustering, determining the optimal number of cluster centers k by using the Bayesian information criterion, and taking all the cluster centers as action instructions of the reinforcement learning-based control model to generate the control instruction set.
3. The model-free adaptive control method for the industrial system according to claim 2, wherein the training the reinforcement learning-based control model according to the predictive simulation model based on the control instruction set to generate the trained reinforcement learning-based control model specifically comprises:
constructing a reinforcement learning-based control model, and acquiring the current monitoring data s, a set control target value, and the environmental noise quantity at the next moment in the historical monitoring data;
inputting the current monitoring data s and the set control target value into the reinforcement learning-based control model, taking the profit value output for each control instruction as a probability weight for sampling, and sampling one control instruction a from the control instruction set;
predicting, by using the prediction simulation model, the system state quantity and the target predicted state output quantity at the next moment according to the current monitoring data s and the control instruction a;
calculating a decision reward r according to the set control target value and the target output quantity at the next moment;
training the reinforcement learning-based control model with a Q-Learning-based temporal-difference loss function based on the decision reward r, the current monitoring data s, the control instruction a, and the system state quantity and the target predicted state output quantity at the next moment, so that the reinforcement learning-based control model outputs, after observing the current monitoring data s, the control instruction a that maximizes the future accumulated reward;
and replacing the current monitoring data s (control, state, env, goal) with the monitoring data s' (control, state, env, goal) at the next moment, and training the reinforcement learning-based control model until the average reward of the reinforcement learning-based control model no longer increases, thereby determining the trained reinforcement learning-based control model.
4. The model-free adaptive control method for industrial systems according to claim 3, wherein the temporal-difference loss function is:

Q(s, a) ← Q(s, a) + α·[r + γ·max_{a'} Q(s', a') - Q(s, a)]

wherein γ is the cumulative discount value; s is the system state quantity and the target current state output quantity at the current moment; s' is the system state quantity and the target predicted state output quantity at the next moment; a is the sampled control instruction; a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) denotes the optimal long-term gain that the control strategy can obtain in the future when the system state is s and the executed control instruction is a; Q(s', a') denotes the long-term gain that the control strategy can obtain in the future when the system state is s' and the executed control instruction is a'; and the network output value Q(s, a) is optimized according to the evolution of the system state s into s' under the control instruction a with the single-step control benefit r, so as to obtain the optimized result of the temporal-difference loss function.
5. A model-free adaptive control system for an industrial system, comprising:
the historical monitoring data acquisition module is used for acquiring historical monitoring data of various devices in the industrial process; the historical monitoring data comprises controllable data, state data, environmental noise data and target output data; the controllable data comprises the opening degree of a flow valve, the opening degree of a valve, the rotating speed of a frequency converter and the rotating speed of a pump; the state class data comprises pipeline pressure in industrial production; the environmental noise data comprises product information, temperature and humidity of the previous process; the target output class data comprises an object controlled in the production process;
the control instruction set generating module is used for generating a control instruction set by using the controllable class data; the control instruction set comprises a plurality of control instructions generated at the next moment;
the prediction simulation model building module is used for building a prediction simulation model according to the historical monitoring data;
the prediction simulation model building module specifically comprises:
a prediction model construction unit for constructing a plurality of prediction models so that each variable in the system state quantity and the target predicted state output quantity at the next moment is predicted independently, wherein for each univariate prediction a LightGBM prediction model is constructed with a maximum number of leaves num_leaves of 10, a learning rate of 0.8, a feature screening proportion feature_fraction of 0.9, and an l2 regularization term to reduce overfitting;
a dividing unit for dividing the historical monitoring data into a training set and a validation set at a ratio of 7:3, wherein the 30% of the historical monitoring data serving as the validation set is used for determining the hyper-parameters of the optimal prediction model;
a prediction simulation model construction unit for integrating the plurality of prediction models into a prediction simulation model according to the controllable variable and the environmental noise quantity given by the controller and the system state quantity and the target current state output quantity in the historical monitoring data;
the trained reinforcement learning-based control model determining module is used for training a reinforcement learning-based control model according to the prediction simulation model based on the control instruction set to generate the trained reinforcement learning-based control model;
the current monitoring data acquisition module is used for acquiring current monitoring data;
and the self-adaptive control module is used for inputting the current monitoring data into the trained control model based on reinforcement learning, adaptively controlling the production process of the industrial system and outputting the optimal set target of the industrial system.
6. The model-free adaptive control system for industrial systems according to claim 5, wherein the control instruction set generation module specifically comprises:
a parameter definition unit for defining a piece of monitoring data s (control, state, env, goal), wherein the monitoring data is a piece of the historical monitoring data S or the current monitoring data; control is the controllable variable of the controllable class data in any piece of the monitoring data; state is the system state quantity of the state class data in any piece of the monitoring data; env is the environmental noise quantity of the environmental noise class data in any piece of the monitoring data; goal is the target output quantity of the target output class data in any piece of the monitoring data; S is the historical monitoring data over a continuous time period; and N is the size of the historical monitoring data set;
a control instruction generation unit for collecting the controllable variables control from the historical monitoring data S to generate N control instructions;
a control instruction set generation unit for narrowing the N control instructions by clustering, determining the optimal number of cluster centers k by using the Bayesian information criterion, and taking all the cluster centers as action instructions of the reinforcement learning-based control model to generate the control instruction set.
7. The model-free adaptive control system for industrial systems according to claim 6, wherein the trained reinforcement learning-based control model determination module specifically comprises:
a reinforcement learning-based control model construction unit for constructing a reinforcement learning-based control model and acquiring the current monitoring data s, a set control target value, and the environmental noise quantity at the next moment in the historical monitoring data;
a control instruction sampling unit for inputting the current monitoring data s and the set control target value into the reinforcement learning-based control model, taking the profit value output for each control instruction as a probability weight for sampling, and sampling one control instruction a from the control instruction set;
a prediction unit for predicting, by using the prediction simulation model, the system state quantity and the target predicted state output quantity at the next moment according to the current monitoring data s and the control instruction a;
a decision reward calculation unit for calculating a decision reward r according to the set control target value and the target output quantity at the next moment;
a training unit for training the reinforcement learning-based control model with a Q-Learning-based temporal-difference loss function based on the decision reward r, the current monitoring data s, the control instruction a, and the system state quantity and the target predicted state output quantity at the next moment, so that the reinforcement learning-based control model outputs, after observing the current monitoring data s, the control instruction a that maximizes the future accumulated reward;
and a trained reinforcement learning-based control model determination unit for replacing the current monitoring data s with the monitoring data s' at the next moment, and training the reinforcement learning-based control model until the average reward of the reinforcement learning-based control model no longer increases, thereby determining the trained reinforcement learning-based control model.
8. The model-free adaptive control system for industrial systems according to claim 7, wherein the temporal-difference loss function is:

Q(s, a) ← Q(s, a) + α·[r + γ·max_{a'} Q(s', a') - Q(s, a)]

wherein γ is the cumulative discount value; s is the system state quantity and the target current state output quantity at the current moment; s' is the system state quantity and the target predicted state output quantity at the next moment; a is the sampled control instruction; a' is a control input value available for selection in state s'; α is the learning rate of the reinforcement learning-based control model; Q is the reinforcement learning network; Q(s, a) denotes the optimal long-term gain that the control strategy can obtain in the future when the system state is s and the executed control instruction is a; Q(s', a') denotes the long-term gain that the control strategy can obtain in the future when the system state is s' and the executed control instruction is a'; and the network output value Q(s, a) is optimized according to the evolution of the system state s into s' under the control instruction a with the single-step control benefit r, so as to obtain the optimized result of the temporal-difference loss function.
CN202110877921.6A 2021-08-02 2021-08-02 Model-free adaptive control method and system for industrial system Active CN113325721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110877921.6A CN113325721B (en) 2021-08-02 2021-08-02 Model-free adaptive control method and system for industrial system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110877921.6A CN113325721B (en) 2021-08-02 2021-08-02 Model-free adaptive control method and system for industrial system

Publications (2)

Publication Number Publication Date
CN113325721A CN113325721A (en) 2021-08-31
CN113325721B true CN113325721B (en) 2021-11-05

Family

ID=77426815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110877921.6A Active CN113325721B (en) 2021-08-02 2021-08-02 Model-free adaptive control method and system for industrial system

Country Status (1)

Country Link
CN (1) CN113325721B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114428462B (en) * 2022-04-06 2022-06-24 蘑菇物联技术(深圳)有限公司 Method, equipment and medium for dynamically controlling controlled system based on MPC algorithm
CN117252111B (en) * 2023-11-15 2024-02-23 中国电建集团贵阳勘测设计研究院有限公司 Active monitoring method for hidden danger and dangerous case area of dyke
CN117331339B (en) * 2023-12-01 2024-02-06 南京华视智能科技股份有限公司 Coating machine die head motor control method and device based on time sequence neural network model
CN117473514B (en) * 2023-12-28 2024-03-15 华东交通大学 Intelligent operation and maintenance method and system of industrial control system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008647B (en) * 2014-06-12 2016-02-10 北京航空航天大学 A kind of road traffic energy consumption quantization method based on motor-driven vehicle going pattern
CN109871010B (en) * 2018-12-25 2022-03-22 南方科技大学 Method and system based on reinforcement learning
JP7225923B2 (en) * 2019-03-04 2023-02-21 富士通株式会社 Reinforcement learning method, reinforcement learning program, and reinforcement learning system
CN109947567B (en) * 2019-03-14 2021-07-20 深圳先进技术研究院 Multi-agent reinforcement learning scheduling method and system and electronic equipment
CN110187727B (en) * 2019-06-17 2021-08-03 武汉理工大学 Glass melting furnace temperature control method based on deep learning and reinforcement learning
CN113126576B (en) * 2019-12-31 2022-07-29 北京国双科技有限公司 Energy consumption optimization model construction method for gathering and transportation system and energy consumption control method for gathering and transportation system
CN111505943B (en) * 2020-06-03 2022-08-16 国电科学技术研究院有限公司 Steam turbine flow characteristic optimization method based on full-stroke modeling

Also Published As

Publication number Publication date
CN113325721A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN113325721B (en) Model-free adaptive control method and system for industrial system
CN116757534B (en) Intelligent refrigerator reliability analysis method based on neural training network
Lindemann et al. Anomaly detection and prediction in discrete manufacturing based on cooperative LSTM networks
CN109992921B (en) On-line soft measurement method and system for thermal efficiency of boiler of coal-fired power plant
Zhang et al. Automatic deep extraction of robust dynamic features for industrial big data modeling and soft sensor application
CN113554466B (en) Short-term electricity consumption prediction model construction method, prediction method and device
CN114282443B (en) Residual service life prediction method based on MLP-LSTM supervised joint model
Tian et al. Time-delay compensation method for networked control system based on time-delay prediction and implicit PIGPC
CN113219871B (en) Curing room environmental parameter detecting system
CN115271186B (en) Reservoir water level prediction and early warning method based on delay factor and PSO RNN Attention model
CN114218872A (en) Method for predicting remaining service life based on DBN-LSTM semi-supervised joint model
CN114819102A (en) GRU-based air conditioning equipment fault diagnosis method
CN112735541A (en) Sewage treatment water quality prediction method based on simple circulation unit neural network
CN111160659A (en) Power load prediction method considering temperature fuzzification
CN114119273A (en) Park comprehensive energy system non-invasive load decomposition method and system
CN115204491A (en) Production line working condition prediction method and system based on digital twinning and LSTM
CN115128978A (en) Internet of things environment big data detection and intelligent monitoring system
CN115062528A (en) Prediction method for industrial process time sequence data
CN113705897A (en) Product quality prediction method and system for industrial copper foil production
CN117668743A (en) Time sequence data prediction method of association time-space relation
CN116305985A (en) Local intelligent ventilation method based on multi-sensor data fusion
JPH11296204A (en) Multivariable process control system
CN114415503B (en) Temperature big data internet of things detection and intelligent control system
CN115879369A (en) Coal mill fault early warning method based on optimized LightGBM algorithm
CN114995248A (en) Intelligent maintenance and environmental parameter big data internet of things system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant