EP3872432A1 - Method, apparatus and electronic device for constructing reinforcement learning model - Google Patents
- Publication number
- EP3872432A1 (application number EP21164660.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- feed amount
- coal feed
- calciner
- reinforcement learning
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- C—CHEMISTRY; METALLURGY
- C04—CEMENTS; CONCRETE; ARTIFICIAL STONE; CERAMICS; REFRACTORIES
- C04B—LIME, MAGNESIA; SLAG; CEMENTS; COMPOSITIONS THEREOF, e.g. MORTARS, CONCRETE OR LIKE BUILDING MATERIALS; ARTIFICIAL STONE; CERAMICS; REFRACTORIES; TREATMENT OF NATURAL STONE
- C04B7/00—Hydraulic cements
- C04B7/36—Manufacture of hydraulic cements in general
- C04B7/361—Condition or time responsive control in hydraulic cement manufacturing processes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F27—FURNACES; KILNS; OVENS; RETORTS
- F27B—FURNACES, KILNS, OVENS, OR RETORTS IN GENERAL; OPEN SINTERING OR LIKE APPARATUS
- F27B7/00—Rotary-drum furnaces, i.e. horizontal or slightly inclined
- F27B7/20—Details, accessories, or equipment peculiar to rotary-drum furnaces
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F27—FURNACES; KILNS; OVENS; RETORTS
- F27B—FURNACES, KILNS, OVENS, OR RETORTS IN GENERAL; OPEN SINTERING OR LIKE APPARATUS
- F27B7/00—Rotary-drum furnaces, i.e. horizontal or slightly inclined
- F27B7/20—Details, accessories, or equipment peculiar to rotary-drum furnaces
- F27B7/42—Arrangement of controlling, monitoring, alarm or like devices
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F27—FURNACES; KILNS; OVENS; RETORTS
- F27D—DETAILS OR ACCESSORIES OF FURNACES, KILNS, OVENS, OR RETORTS, IN SO FAR AS THEY ARE OF KINDS OCCURRING IN MORE THAN ONE KIND OF FURNACE
- F27D19/00—Arrangements of controlling devices
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F27—FURNACES; KILNS; OVENS; RETORTS
- F27D—DETAILS OR ACCESSORIES OF FURNACES, KILNS, OVENS, OR RETORTS, IN SO FAR AS THEY ARE OF KINDS OCCURRING IN MORE THAN ONE KIND OF FURNACE
- F27D21/00—Arrangements of monitoring devices; Arrangements of safety devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- F—MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
- F27—FURNACES; KILNS; OVENS; RETORTS
- F27D—DETAILS OR ACCESSORIES OF FURNACES, KILNS, OVENS, OR RETORTS, IN SO FAR AS THEY ARE OF KINDS OCCURRING IN MORE THAN ONE KIND OF FURNACE
- F27D19/00—Arrangements of controlling devices
- F27D2019/0096—Arrangements of controlling devices involving simulation means, e.g. of the treating or charging step
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/08—Thermal analysis or thermal optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/14—Force analysis or force optimisation, e.g. static or dynamic forces
Definitions
- the present disclosure relates to the field of data processing technology, particularly to the field of big data and deep learning technology, more particularly to a method, apparatus and electronic device for constructing a reinforcement learning model, and to a computer readable storage medium.
- Embodiments of the present disclosure propose a method, apparatus and electronic device for constructing a reinforcement learning model.
- the embodiments also relate to a computer readable storage medium.
- embodiments of the present disclosure provide a method for constructing a reinforcement learning model, comprising: establishing a first simulation model between a calciner coal feed amount and a calciner temperature; establishing a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; establishing a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and constructing a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- embodiments of the present disclosure provide an apparatus for constructing a reinforcement learning model, comprising: a first simulation model establishing unit, configured to establish a first simulation model between a calciner coal feed amount and a calciner temperature; a second simulation model establishing unit, configured to establish a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; a prediction model establishing unit, configured to establish a prediction model among an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and a reinforcement learning model construction unit, configured to construct a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage apparatus, storing one or more programs thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided by the first aspect.
- embodiments of the present disclosure provide a computer-readable medium, storing a computer program thereon, wherein the program, when executed by a processor, causes the processor to implement the method provided by the first aspect.
- embodiments of the present disclosure provide a computer program product including a computer program, where the computer program, when executed by a processing apparatus, implements the method provided by the first aspect.
- in the method and apparatus for constructing a reinforcement learning model, the electronic device, and the computer readable storage medium provided by the embodiments of the present disclosure, the first simulation model between the calciner coal feed amount and the calciner temperature is first established, along with the second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature; the prediction model among the under-grate pressure, the calciner temperature output by the first simulation model, the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model, and the free calcium content is then established; and finally the reinforcement learning model that represents the association between the coal feed amount (including the calciner coal feed amount and the kiln head coal feed amount) and the free calcium content is constructed according to the preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model.
- the present disclosure introduces the concept of reinforcement learning into a cement calcination scenario.
- the reinforcement learning model that may represent the corresponding relationship between the input coal feed amount and the free calcium content of a final product under the influence of a plurality of parameters is constructed.
- unlike other machine learning models, which act as compensators, the reinforcement learning model is more compatible with the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, its strong generalization ability allows it to be applied more simply to other similar scenarios.
- Fig. 1 shows an exemplary system architecture 100 to which the embodiments of the method, apparatus and electronic device for constructing a reinforcement learning model, and computer readable storage medium of the present disclosure may be applied.
- the system architecture 100 may include sensors 101, 102, and 103, a network 104, a server 105, and a coal feeding device 106.
- the network 104 is used to provide a communication link medium between the sensors 101, 102, and 103 and the server 105, and between the server 105 and the coal feeding device 106.
- the network 104 may include various connection types, such as wired or wireless communication links, or optical fiber cables.
- Various types of information acquired by the sensors 101, 102, and 103 may be sent to the server 105 through the network 104, and the server 105 may generate control instructions based on the received information after processing and then issue the control instructions to the coal feeding device 106 through the network 104.
- the above communication may be implemented by various applications installed on the sensors 101, 102, and 103, the server 105, and the coal feeding device 106, such as information transmission applications, coal feed optimization control applications, or control instruction sending and receiving applications.
- the sensors 101, 102, and 103 are physical components (such as pressure sensors, temperature sensors, current sensors) installed in relevant positions of cement calcination-related devices (such as calciners, clinker kilns) to receive actual signals generated by actual devices.
- the sensors 101, 102, and 103 may also be virtual components provided on virtual related devices of cement calcination, to receive parameters or simulation parameters predetermined in the test scenarios.
- the server 105 may be hardware or software.
- When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server; when the server is software, it may be implemented as a plurality of software or software modules, or as a single software or software module, which is not limited herein.
- the coal feeding device 106 may be embodied as a physical device such as a coal conveyor belt or a coal conveyor. In a virtual test scenario, it may be directly replaced by a virtual device having a controlled coal conveying capacity.
- the server 105 may provide various services through various built-in applications.
- a coal feed optimization control application that may provide a coal feed optimization control service in cement calcination may be used as an example.
- the server 105 operates the coal feed optimization control application and may achieve the following effects: first, receive an instruction for a target free calcium content required for cement clinker production of the present batch; then, input the target free calcium content into a pre-constructed reinforcement learning model that represents a corresponding relationship between a coal feed amount and a free calcium content to obtain a theoretical coal feed amount output by the reinforcement learning model; next, issue a corresponding coal feed amount instruction to the coal feeding device 106 using a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount included in the theoretical coal feed amount.
- the reinforcement learning model used by the server 105 in the above process may be constructed based on the following method: first, receiving a large amount of historical data on the calciner coal feed amount, calciner temperature, kiln head coal feed amount, kiln current, secondary air temperature, smoke chamber temperature and under-grate pressure from the sensors 101, 102, and 103 through the network 104; then, establishing a first simulation model between the calciner coal feed amount and the calciner temperature, and establishing a second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature; then, establishing a prediction model among: the under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and finally, constructing the reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model.
- the parameters used to construct the simulation models and the prediction model, such as the calciner coal feed amount, the calciner temperature, the kiln head coal feed amount, the kiln current, the secondary air temperature, the smoke chamber temperature, and the under-grate pressure, may be acquired from the sensors 101, 102, and 103, or may be stored locally in the server 105 in various forms such as logs or production inspection data reports. When the server 105 detects that the data have been stored locally, it may choose to acquire the data directly from local storage; in that case, the process of generating the reinforcement learning model may not require the sensors 101, 102, and 103 or the network 104.
- the method for constructing a reinforcement learning model provided in the subsequent embodiments of the present disclosure is generally performed by the server 105 having strong computing power and more computing resources. Accordingly, the apparatus for constructing a reinforcement learning model is generally also provided in the server 105.
- It should be understood that the number of sensors, networks, servers and coal feeding devices in Fig. 1 is merely illustrative. Depending on the implementation needs, there may be any number of sensors, networks, servers and coal feeding devices.
- FIG. 2 is a flowchart of a method for constructing a reinforcement learning model according to an embodiment of the present disclosure.
- a flow 200 includes the following steps:
- Step 201: establishing a first simulation model between a calciner coal feed amount and a calciner temperature.
- This step aims to establish the first simulation model between the calciner coal feed amount and the calciner temperature by an executing body of the method for constructing a reinforcement learning model (for example, the server 105 shown in Fig. 1 ).
- the first simulation model is used to represent a corresponding relationship between the calciner coal feed amount and the calciner temperature.
- In order to establish the first simulation model that may represent this corresponding relationship, a large amount of historical calciner coal feed amount data and corresponding historical calciner temperature data are required as sample data to participate in training and construction of the simulation model. The first simulation model may be expressed as y(k) = a·y(k-1) + b·u(k-1), where:
- y(k) is the calciner temperature at time k;
- y(k-1) and u(k-1) are respectively the calciner temperature and the calciner coal feed amount at time k-1 (that is, the moment preceding time k);
- a and b are respectively undetermined coefficients, whose particular values may be obtained by calculation using the least squares method based on historical data. For example, in a certain experimental scenario, a is 0.983 and b is 0.801.
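Using the coefficients given above (a = 0.983, b = 0.801 in one experimental scenario), the least-squares identification of the first simulation model can be sketched as follows. The data here are synthetic stand-ins for the historical sensor records; the units and value ranges are assumptions for illustration, not taken from the patent.

```python
import numpy as np

# Synthetic historical data (assumption: real data would come from plant sensors).
rng = np.random.default_rng(0)
true_a, true_b = 0.983, 0.801
u = rng.uniform(8.0, 12.0, size=200)  # calciner coal feed amount (t/h, assumed range)
y = np.empty(201)
y[0] = 870.0                          # initial calciner temperature (assumed)
for k in range(200):
    # First simulation model dynamics: y(k) = a*y(k-1) + b*u(k-1) plus noise
    y[k + 1] = true_a * y[k] + true_b * u[k] + rng.normal(0, 0.05)

# Least-squares fit of the undetermined coefficients a and b
X = np.column_stack([y[:-1], u])      # regressors: y(k-1), u(k-1)
t = y[1:]                             # targets: y(k)
(a, b), *_ = np.linalg.lstsq(X, t, rcond=None)
print(f"a={a:.3f}, b={b:.3f}")
```

With enough historical samples, the recovered coefficients converge to the underlying values, which is how the example values a = 0.983 and b = 0.801 would be obtained in practice.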
- Step 202: establishing a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature.
- This step aims to establish the second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature by the executing body.
- the second simulation model is used to represent a corresponding relationship between the kiln head coal feed amount and the kiln current, the secondary air temperature, and the smoke chamber temperature.
- In order to establish the second simulation model that may represent this corresponding relationship, a large amount of historical kiln head coal feed amount data and corresponding historical kiln current, historical secondary air temperature, and historical smoke chamber temperature data are required as sample data to participate in training and construction of the simulation model. The second simulation model may be constructed in the same form as the above formula.
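The same least-squares identification extends to the second simulation model, which has three outputs (kiln current, secondary air temperature, smoke chamber temperature) driven by the kiln head coal feed amount. The sketch below fits a multi-output linear model; all coefficients, initial values, and value ranges are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Illustrative only: coefficients and data are synthetic, not from the patent.
rng = np.random.default_rng(1)
A = np.diag([0.95, 0.97, 0.96])      # per-output autoregressive coefficients
b = np.array([2.0, 1.5, 1.2])        # effect of kiln head coal feed on each output
u = rng.uniform(5.0, 9.0, size=300)  # kiln head coal feed amount (assumed range)
# State: [kiln current, secondary air temperature, smoke chamber temperature]
s = np.empty((301, 3))
s[0] = [850.0, 1100.0, 1050.0]       # assumed initial operating point
for k in range(300):
    s[k + 1] = A @ s[k] + b * u[k] + rng.normal(0, 0.1, size=3)

# Multi-output least squares: fit S(k) = W.T @ [S(k-1); u(k-1)] in one call
X = np.column_stack([s[:-1], u])     # (300, 4) regressors
W, *_ = np.linalg.lstsq(X, s[1:], rcond=None)  # (4, 3) coefficient matrix
print(np.round(W.T, 3))              # each row: fitted coefficients per output
```

`np.linalg.lstsq` accepts a matrix of targets, so the three controlled variables are identified jointly from one pass over the historical data.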
- adjustable variables mainly include: feed amount, calciner coal feed amount, kiln head coal feed amount, kiln speed, high temperature fan speed, grate cooler speed
- controlled variables mainly include: calciner outlet temperature, calciner outlet pressure, secondary air temperature, tertiary air temperature, kiln burning zone temperature, kiln head negative pressure, kiln tail temperature, smoke chamber temperature, kiln current, under-grate pressure, and vertical weight.
- the controlled variable refers to a variable that may not be directly adjusted, but may be affected by an adjustable variable.
- the free calcium content is mainly related to the calciner temperature, the kiln current, the secondary air temperature, the smoke chamber temperature, and the under-grate pressure, and these variables are mainly determined by the three adjustable parameters: the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure.
- the three adjustable variables may be mainly considered: the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure
- four controlled variables may be mainly considered: the calciner temperature, the kiln current, the secondary air temperature, and the smoke chamber temperature
- one final target variable may be considered: the free calcium content.
- the construction of the simulation models that represent parameter changes related to the coal feed amount during cement calcination is indispensable. Therefore, the executing body constructs, through step 201, the first simulation model that represents the corresponding relationship between the controlled variable (the calciner temperature) and the adjustable variable (the calciner coal feed amount), and constructs, through step 202, the second simulation model that represents the corresponding relationship between the controlled variables (the kiln current, the secondary air temperature, and the smoke chamber temperature) and the adjustable variable (the kiln head coal feed amount).
- Step 203: establishing a prediction model among an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content.
- this step aims to establish, by the executing body, the prediction model among the under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and the free calcium content.
- As analyzed above, it may be considered that the clinker quality index (the free calcium content) is mainly affected by the five controlled variables of under-grate pressure, calciner temperature, kiln current, secondary air temperature, and smoke chamber temperature. Therefore, in this step, the prediction model between the above five controlled variables and the free calcium content is established; that is, the generated prediction model may predict a value of the free calcium content based on actual values of the given five controlled variables.
- the establishing of the above prediction model requires a large amount of historical data to participate in training, so as to find a more accurate relationship of the influence of the controlled variables on the free calcium content. This may be achieved using various models or algorithms in which multiple input parameters predict a unique output parameter, such as an SVM (Support Vector Machine), a neural network, or a tree model, which is not limited herein; the model or algorithm may be selected based on all possible influencing factors in actual application scenarios.
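As a minimal illustration of such a prediction model, the sketch below fits a linear regressor from the five controlled variables to the free calcium content (an SVM, neural network, or tree model could be substituted). All data, coefficient values, and units are synthetic assumptions, not values from the patent.

```python
import numpy as np

# Synthetic stand-in for historical plant records; ranges are illustrative only.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([
    rng.uniform(860, 900, n),    # calciner temperature (deg C, assumed)
    rng.uniform(800, 900, n),    # kiln current (A, assumed)
    rng.uniform(1050, 1150, n),  # secondary air temperature (deg C, assumed)
    rng.uniform(1000, 1100, n),  # smoke chamber temperature (deg C, assumed)
    rng.uniform(4.5, 5.5, n),    # under-grate pressure (kPa, assumed)
])
# Hypothetical ground truth: free calcium falls as calcination temperatures rise.
w_true = np.array([-0.004, -0.001, -0.002, -0.001, 0.1])
y = X @ w_true + 8.5 + rng.normal(0, 0.02, n)  # free calcium content (%)

# Linear prediction model fitted by least squares with an intercept term.
Xb = np.column_stack([X, np.ones(n)])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict_free_calcium(controlled_vars):
    """Predict free calcium content from the five controlled variables."""
    return np.append(controlled_vars, 1.0) @ w
```

In the constructed reinforcement learning model, `predict_free_calcium` plays the role of the prediction model: it maps the outputs of the two simulation models plus the under-grate pressure to the quality index.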
- the large amount of historical data as samples required to construct the models in the above steps may all come from acquisition of various sensors (such as the sensors 101, 102, and 103 shown in Fig. 1 ) installed on relevant devices used in clinker calcination.
- the under-grate pressure may be acquired by a pressure sensor installed in a grate cooler
- the kiln current may be acquired by a current sensor installed in the kiln head
- temperature sensors of different performance and models may be selected based on actual temperature ranges.
- Step 204: constructing a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model.
- this step aims to establish the reinforcement learning model that represents the association between the coal feed amount and the free calcium content according to the preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model by the executing body.
- the free calcium content is affected by the five controlled variables of under-grate pressure, calciner temperature, kiln current, secondary air temperature, and smoke chamber temperature, and these controlled variables are in turn controlled by the two adjustable variables, namely, the calciner coal feed amount and the kiln head coal feed amount.
- the reinforcement learning model that may represent the association between the coal feed amount and the free calcium content is constructed according to the reinforcement learning model architecture.
- Reinforcement learning is also known as encouragement learning, evaluation learning, or enhancement learning.
- Reinforcement learning is one of the paradigms and methodologies of machine learning. It is used to describe and solve the problem of an agent maximizing returns or achieving a particular goal through learning strategies during interaction with the environment. Different from other deep learning algorithms that simulate biological neural networks, in reinforcement learning the agent learns by "trial and error", obtaining reward signals that guide its behavior through interaction with the environment, with the aim of maximizing the reward for the agent. Reinforcement learning also differs from supervised learning in connectionist learning, mainly in the reinforcement signals.
- a reinforcement signal provided by the environment in reinforcement learning is an evaluation of the quality of a generated action (usually a scalar signal), rather than telling the reinforcement learning system (RLS) how to generate a correct action. Because the external environment provides little information, the RLS must rely on its own experience to learn. In this way, the RLS gains knowledge in an action-evaluation environment and improves its action plans to adapt to the environment. Deep learning models may also be used in reinforcement learning to form deep reinforcement learning (DRL), which has a better effect.
- Actor-Critic (e.g., Advantage Actor-Critic, A2C), PPO (Proximal Policy Optimization), TRPO (Trust Region Policy Optimization), and other reinforcement learning model architectures having different characteristics may be used to construct the reinforcement learning model that represents the corresponding relationship between the coal feed amount and the free calcium content required in this step.
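The following sketch shows how the simulation models and the prediction model can together serve as the environment for a reinforcement learning agent: the action is the pair of coal feed amounts, the state holds the controlled variables, and the reward is the negative distance of the predicted free calcium content from a target. A random-search agent stands in for an actor-critic learner such as A2C or PPO; every coefficient, range, and value here is an illustrative assumption, not a value from the patent.

```python
import numpy as np

class CalcinationEnv:
    """Toy environment: the simulation models advance the state, the prediction
    model scores it. All coefficients are illustrative assumptions."""

    def __init__(self, target_fc=1.0):
        self.target_fc = target_fc
        self.reset()

    def reset(self):
        # [calciner temp, kiln current, secondary air temp, smoke chamber temp]
        self.state = np.array([870.0, 850.0, 1100.0, 1050.0])
        return self.state.copy()

    def _predict_fc(self, s, under_grate_pressure=5.0):
        # Stand-in prediction model (linear; an SVM or neural net could be used).
        w = np.array([-0.004, -0.001, -0.002, -0.001])
        return s @ w + 0.1 * under_grate_pressure + 8.5

    def step(self, action):
        calciner_coal, kiln_head_coal = action
        s = self.state
        # First simulation model: calciner temperature dynamics.
        s[0] = 0.983 * s[0] + 0.801 * calciner_coal
        # Second simulation model: kiln current / secondary air / smoke chamber.
        s[1:] = 0.96 * s[1:] + np.array([2.0, 1.5, 1.2]) * kiln_head_coal
        # Reward: negative distance of predicted free calcium from the target.
        reward = -abs(self._predict_fc(s) - self.target_fc)
        return s.copy(), reward

# Random-search stand-in for the learning agent (A2C/PPO/TRPO would go here).
env = CalcinationEnv(target_fc=1.0)
rng = np.random.default_rng(3)
best_action, best_reward = None, -np.inf
for _ in range(500):
    env.reset()
    action = rng.uniform([10.0, 30.0], [20.0, 45.0])  # coal feed ranges (assumed)
    reward = 0.0
    for _ in range(50):            # let the dynamics settle toward steady state
        _, reward = env.step(action)
    if reward > best_reward:
        best_action, best_reward = action, reward
print("best coal feeds:", np.round(best_action, 2), "reward:", round(best_reward, 3))
```

In a real application the random-search loop would be replaced by the chosen architecture's policy and value updates, with the same environment interface.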
- the method for constructing a reinforcement learning model introduces the concept of reinforcement learning into a cement calcination scenario: based on the established simulation models and the prediction model, and under the reinforcement learning architecture, the reinforcement learning model that may represent the corresponding relationship between the input coal feed amount and the free calcium content of a final product under the influence of a plurality of parameters is constructed.
- unlike other machine learning models, which act as compensators, the reinforcement learning model is more compatible with the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, its strong generalization ability allows the present disclosure to be applied more simply to other similar scenarios.
- the existing technology may not meet the needs of the complex cement calcination scenario, because PID control considers only system deviations, mainly tracking system setpoints, and does not support multi-objective optimization of clinker quality and energy consumption in the cement calcination scenario.
- due to the plurality of parameters that must be controlled in real time in the cement production process, it is also difficult for MPC to achieve unified real-time control of these parameters.
- the generalization ability of MPC is poor.
- the models need to be re-established each time.
- FIG. 3 is a flowchart of another method for constructing a reinforcement learning model according to an embodiment of the present disclosure.
- a flow 300 includes the following steps:
- the above steps 301-304 are the same as steps 201-204 as shown in Fig. 2 .
- the above steps may be summarized as a construction process of the reinforcement learning model. For contents of the same parts, reference may be made to the corresponding parts of the previous embodiment, and detailed description thereof will be omitted.
- Step 305 receiving a target free calcium content given in a target scenario
- this step aims to receive the target free calcium content given by a user in the target scenario by the executing body.
- This step serves as the first step in using the reinforcement learning model to guide the coal feed amount during cement calcination, that is, acquiring a set clinker quality index.
- Step 306 determining a theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model
- this step aims to determine the theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model by the executing body. That is, since the reinforcement learning model may represent the corresponding relationship between the coal feed amount and the free calcium content, given the target free calcium content, the corresponding theoretical coal feed amount may be inversely derived from the corresponding relationship, where the theoretical coal feed amount includes a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount.
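The inverse derivation described above can be sketched as a simple search over candidate coal feed amounts. The `predict_fcao` callable below is a hypothetical stand-in for the composed simulation and prediction pipeline, and all names and numbers are illustrative, not taken from the disclosure:

```python
def theoretical_coal_feed(target_fcao, predict_fcao, candidates):
    """Pick the coal feed amount whose predicted free calcium content is
    closest to the target; ties are broken toward the smaller feed."""
    return min(candidates,
               key=lambda feed: (abs(predict_fcao(feed) - target_fcao), feed))

# toy monotone model: feeding more coal lowers the free calcium content
predict = lambda feed: 3.0 - 0.2 * feed
candidates = [0.5 * i for i in range(41)]  # 0.0 .. 20.0 (arbitrary units)
best = theoretical_coal_feed(1.2, predict, candidates)
```

In practice the theoretical coal feed amount would be split into the theoretical calciner coal feed amount and the theoretical kiln head coal feed amount; the one-dimensional search here only illustrates the inverse use of the learned relationship.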
- Step 307 guiding a calciner coal feeding operation and a kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount.
- this step aims to instruct the calciner coal feeding operation and the kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount by the executing body.
- a coal feeding device (such as the coal feeding device 106 shown in Fig. 1 ) is controlled to feed a corresponding amount of coal to the calciner and the kiln head.
- since this embodiment of the present disclosure includes all the technical features of the previous embodiment (that is, the construction steps of the reinforcement learning model), it also has all the beneficial effects of the previous embodiment.
- this embodiment of the present disclosure also provides, through steps 305 to 307, a solution for guiding the coal feed amount based on the constructed reinforcement learning model, so as to guide the coal input in the cement calcination process with a reasonable coal feed amount: feeding as little coal as possible while ensuring the clinker quality as far as possible, to reduce costs and increase efficiency. The coal saved is also equivalent to a corresponding reduction in carbon dioxide emitted to the atmosphere, which is conducive to building an environmentally friendly enterprise.
- Temperature control includes, but is not limited to, all effective means such as physical cooling or reduction of coal feed amount.
- An entire process of cement calcination may be seen in the schematic diagram of an apparatus given in the upper left corner of Fig. 4 .
- the entire process involves many controlled parameters, such as the calciner coal feed amount and the kiln head coal feed amount, and these parameters directly affect the quality of the clinker, that is, the free calcium content.
- an enterprise usually requires that the free calcium content is between 0.5% and 1.5%.
- the free calcium content in the modeling process of the embodiments of the present disclosure is adjusted to be between 1% and 1.5%, to reduce production costs as much as possible while ensuring quality.
- the process parameters are adjusted based on the reinforcement learning model to reduce coal consumption while ensuring quality.
- the entire modeling process is very complicated. The following is a detailed introduction to the various parts of the construction process of the reinforcement learning model that the server is responsible for:
- the free calcium content is measured about once an hour in the production process. Because it is required to control and adjust the coal feed amount and other parameters in real time, it is necessary to establish the real-time prediction model on the free calcium content. Since the free calcium content is mainly related to the calciner temperature, the kiln current, the secondary air temperature, the smoke chamber temperature and the under-grate pressure, the established model is:
- Free calcium content = f(calciner temperature, kiln current, secondary air temperature, smoke chamber temperature, under-grate pressure). In an experiment, a large amount of historical data is used to fit f. In the present embodiment, this large amount of historical data is used to construct the prediction model through neural networks.
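The shape of the fitting step can be sketched as follows. The disclosure fits f with neural networks; the gradient-descent least-squares fit below is only a linear stand-in to illustrate the five-input, one-output form, and it assumes inputs normalized to comparable scales:

```python
def fit_predictor(samples, targets, lr=0.5, epochs=20000):
    """Linear stand-in for the neural network prediction model:
    free_calcium ~ f(calciner_temp, kiln_current, secondary_air_temp,
                     smoke_chamber_temp, under_grate_pressure).
    Fits weights and bias by batch gradient descent on squared error."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    m = len(samples)
    for _ in range(epochs):
        gw, gb = [0.0] * n, 0.0
        for x, y in zip(samples, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(n):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / m for wi, gi in zip(w, gw)]
        b -= lr * gb / m
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
```

A neural network would replace the linear form with stacked nonlinear layers, but the interface — historical process samples in, a real-time free calcium prediction out — is the same.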
- it is necessary to construct the simulation models of the cement calcination process, that is, models of how the controlled variables such as the calciner temperature, the kiln current, the secondary air temperature, or the smoke chamber temperature change during calcination after the coal feed amount is adjusted.
- a first-order inertia element plus a pure time-delay (lag) element is often selected to simulate a complex industrial system having large inertia and pure lag.
- the calciner temperature is mainly related to the calciner coal feed amount
- the kiln current, the secondary air temperature, and the smoke chamber temperature are mainly related to the kiln head coal feed amount.
- a system model of the calciner temperature with respect to the calciner coal feed amount may be established, and a system model of the kiln current, the secondary air temperature, and the smoke chamber temperature with respect to the kiln head coal feed amount may be established.
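The first-order-inertia-plus-pure-lag structure above can be sketched in discrete time as follows; the gain, time constant, and delay values used are illustrative placeholders, not parameters identified in the disclosure:

```python
from collections import deque

class FirstOrderLagModel:
    """First-order inertia element with a pure time delay (dead time):
    y[k+1] = y[k] + (dt / tau) * (K * u[k - d] - y[k])."""
    def __init__(self, gain, tau, delay_steps, dt=1.0, y0=0.0):
        self.gain, self.tau, self.dt = gain, tau, dt
        self.y = y0
        self.buf = deque([0.0] * delay_steps)  # queue of delayed inputs

    def step(self, u):
        if self.buf:                 # apply the pure lag, if any
            self.buf.append(u)
            u = self.buf.popleft()
        self.y += (self.dt / self.tau) * (self.gain * u - self.y)
        return self.y

# e.g., a hypothetical calciner-temperature response to the calciner coal feed
calciner_temp = FirstOrderLagModel(gain=2.0, tau=5.0, delay_steps=3)
response = [calciner_temp.step(1.0) for _ in range(200)]  # unit step input
```

In the same way, one instance per controlled variable would give the system models of the kiln current, the secondary air temperature, and the smoke chamber temperature with respect to the kiln head coal feed amount, with gains and time constants fitted from historical step-response data.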
- the present embodiment uses an Actor-critic reinforcement learning model, uses the three adjustable parameters: calciner coal feed amount, kiln head coal feed amount, and under-grate pressure as an Action of the reinforcement learning model, and temporarily ignores other parameters in the calcination process.
- the purpose is to ensure that the final free calcium content is between 1% and 1.5% and, at the same time, that when the feed amount is set at a certain level, the calciner coal feed amount and the kiln head coal feed amount are as small as possible.
- since the measurement standard of coal consumption is total coal feed amount/feed amount, and it is assumed that the feed rate is fixed, that is, the feed amount per unit time is fixed, the coal consumption only needs to consider the calciner coal feed amount and the kiln head coal feed amount.
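The optimization objective just described — keep the free calcium content inside the 1%–1.5% band while feeding as little coal as possible — can be sketched as a reward function. The band-penalty shape and the `coal_weight` trade-off coefficient are illustrative assumptions, not values from the disclosure:

```python
def reward(free_calcium, calciner_coal, kiln_head_coal,
           lo=1.0, hi=1.5, coal_weight=0.1):
    """Rewards the Action (coal feeds, under-grate pressure) through its
    outcome: a bonus when free calcium lies in [lo, hi], a penalty growing
    with the distance to the band otherwise, minus a cost proportional to
    the total coal fed (feed rate assumed fixed, as in the text)."""
    if lo <= free_calcium <= hi:
        quality = 1.0
    else:
        quality = -min(abs(free_calcium - lo), abs(free_calcium - hi))
    return quality - coal_weight * (calciner_coal + kiln_head_coal)
```

With such a reward, the Actor-Critic agent is pushed both toward the quality band and toward the smaller coal feed amounts within it.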
- the data processing step at the bottom of Fig. 4 is the sample-based parameter update process of the Actor-critic reinforcement learning model.
- a sample is collected from each actual Action (the parameters may be denoted S_t, a_t, r_t, S_(t+1), etc.).
- these samples are stored in a memory database in the form of tuples.
- some data are selected from the memory database by sampling to update the parameters of the Actor-critic reinforcement learning model.
- the effectiveness and usability of the Actor-critic reinforcement learning model are maintained using this update method.
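The memory database and sampling step can be sketched as a standard replay buffer of (S_t, a_t, r_t, S_(t+1)) tuples; the capacity and batch size here are illustrative:

```python
import random
from collections import deque

class ReplayMemory:
    """Stores (S_t, a_t, r_t, S_t+1) transition tuples and returns random
    mini-batches for updating the Actor-critic model parameters."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # oldest samples evicted first

    def store(self, state, action, rew, next_state):
        self.buf.append((state, action, rew, next_state))

    def sample(self, batch_size):
        # uniform random sampling breaks the temporal correlation of samples
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

mem = ReplayMemory(capacity=100)
for t in range(5):
    mem.store(("s", t), ("a", t), float(t), ("s", t + 1))
batch = mem.sample(3)
```

Each sampled mini-batch would then drive one gradient update of the actor and critic networks, which is the update loop shown at the bottom of Fig. 4.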
- the minimized coal feed amount may subsequently be determined based on a given free calcium content, so as to reduce costs and increase benefits.
- the present disclosure provides an embodiment of an apparatus for constructing a reinforcement learning model, and the apparatus embodiment corresponds to the method embodiment as shown in Fig. 2 .
- the apparatus may be particularly applied to various electronic devices.
- an apparatus 500 for constructing a reinforcement learning model of the present embodiment may include: a first simulation model establishing unit 501, a second simulation model establishing unit 502, a prediction model establishing unit 503 and a reinforcement learning model construction unit 504.
- the first simulation model establishing unit 501 is configured to establish a first simulation model between a calciner coal feed amount and a calciner temperature.
- the second simulation model establishing unit 502 is configured to establish a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature.
- the prediction model establishing unit 503 is configured to establish a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content.
- the reinforcement learning model construction unit 504 is configured to construct a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount including the calciner coal feed amount and the kiln head coal feed amount.
- for the particular processing of the first simulation model establishing unit 501, the second simulation model establishing unit 502, the prediction model establishing unit 503 and the reinforcement learning model construction unit 504 of the apparatus 500 for constructing a reinforcement learning model, and the technical effects thereof, reference may be made to the relevant descriptions of steps 201-204 in the corresponding embodiment of Fig. 2, respectively, and detailed description thereof will be omitted.
- the apparatus 500 for constructing a reinforcement learning model may further include:
- the apparatus 500 for constructing a reinforcement learning model may further include:
- the apparatus 500 for constructing a reinforcement learning model may further include:
- the reinforcement learning model construction unit 504 may include: an A2C reinforcement learning model construction subunit, configured to construct the reinforcement learning model that represents the association between the coal feed amount and the free calcium content according to an Actor-Critic reinforcement learning model architecture.
- the A2C reinforcement learning model construction subunit may be further configured to:
- as an apparatus embodiment, the present embodiment corresponds to the above method embodiment.
- the apparatus for constructing a reinforcement learning model provided in this embodiment of the present disclosure introduces the concept of reinforcement learning into a cement calcination scenario: based on the established simulation models and the prediction model, under the reinforcement learning architecture, a reinforcement learning model is constructed that represents the corresponding relationship between the input coal feed amount, under the influence of a plurality of parameters, and the free calcium content of the final product.
- since the reinforcement learning model differs in its characteristics from other machine learning models, it is better suited to the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, the strong generalization ability of the reinforcement learning model enables it to be more simply applied in other similar scenarios.
- the present disclosure also provides an electronic device for constructing a reinforcement learning model and a readable storage medium.
- Fig. 6 shows a block diagram of an electronic device suitable for implementing the method for constructing a reinforcement learning model according to an embodiment of the present disclosure.
- the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
- the electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing apparatuses.
- the components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the present disclosure described and/or claimed herein.
- the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces.
- the various components are connected to each other using different buses, and may be installed on a common motherboard or in other methods as needed.
- the processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface).
- if desired, a plurality of processors and/or a plurality of buses may be used together with a plurality of memories.
- a plurality of electronic devices may be connected, each device providing some of the necessary operations, for example, as a server array, a set of blade servers, or a multi-processor system.
- one processor 601 is used as an example.
- the memory 602 is a non-transitory computer readable storage medium provided by the present disclosure.
- the memory stores instructions executable by at least one processor, so that the at least one processor performs the method for constructing a reinforcement learning model provided by the present disclosure.
- the non-transitory computer readable storage medium of the present disclosure stores computer instructions for causing a computer to perform the method for constructing a reinforcement learning model provided by the present disclosure.
- the memory 602 may be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as program instructions/modules corresponding to the method for constructing a reinforcement learning model in the embodiments of the present disclosure (for example, the first simulation model establishing unit 501, the second simulation model establishing unit 502, the prediction model establishing unit 503 and the reinforcement learning model construction unit 504 as shown in Fig. 5 ).
- the processor 601 executes the non-transitory software programs, instructions, and modules stored in the memory 602 to execute various functional applications and data processing of the server, that is, to implement the method for constructing a reinforcement learning model in the foregoing method embodiments.
- the memory 602 may include a storage program area and a storage data area, where the storage program area may store an operating system and at least one function required application program; and the storage data area may store data created by the use of the electronic device according to the method for constructing a reinforcement learning model, etc.
- the memory 602 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
- the memory 602 may optionally include memories remotely provided with respect to the processor 601, and these remote memories may be connected to the electronic device of the method for constructing a reinforcement learning model through a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.
- the electronic device of the method for constructing a reinforcement learning model may further include: an input apparatus 603 and an output apparatus 604.
- the processor 601, the memory 602, the input apparatus 603, and the output apparatus 604 may be connected through a bus or in other methods. In Fig. 6 , connection through a bus is used as an example.
- the input apparatus 603 may receive input digital or character information, and generate key signal inputs related to user settings and function control of the electronic device of the method for constructing a reinforcement learning model, such as touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick and other input apparatuses.
- the output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, LED), a tactile feedback apparatus (for example, a vibration motor), and the like.
- the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
- Various embodiments of the systems and technologies described herein may be implemented in digital electronic circuit systems, integrated circuit systems, dedicated ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system that includes at least one programmable processor.
- the programmable processor may be a dedicated or general-purpose programmable processor, and may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit the data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
- These computing programs include machine instructions for the programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages.
- The terms "machine readable medium" and "computer readable medium" refer to any computer program product, device, and/or apparatus (for example, magnetic disk, optical disk, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to the programmable processor, including a machine readable medium that receives machine instructions as machine readable signals.
- The term "machine readable signal" refers to any signal used to provide machine instructions and/or data to the programmable processor.
- the systems and technologies described herein may be implemented on a computer, the computer has: a display apparatus for displaying information to the user (for example, CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing apparatus (for example, mouse or trackball), and the user may use the keyboard and the pointing apparatus to provide input to the computer.
- Other types of apparatuses may also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and any form (including acoustic input, voice input, or tactile input) may be used to receive input from the user.
- the systems and technologies described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., application server), or a computing system that includes frontend components (for example, a user computer having a graphical user interface or a web browser, through which the user may interact with the implementations of the systems and the technologies described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components.
- the components of the system may be interconnected by any form or medium of digital data communication (e.g., communication network). Examples of the communication network include: local area networks (LAN), wide area networks (WAN) and the Internet.
- the computing system may include a client and a server.
- the client and the server are generally far from each other and usually interact through the communication network.
- the relationship between the client and the server is generated by computer programs that run on the corresponding computer and have a client-server relationship with each other.
- the server may be a cloud server, also known as a cloud computing server or a cloud host.
- the server is a host product in the cloud computing service system that solves the defects of difficult management and weak business scalability in traditional physical host and virtual private server (VPS) services.
- the concept of reinforcement learning is introduced into a cement calcination scenario: based on the established simulation models and the prediction model, under the reinforcement learning architecture, a reinforcement learning model is constructed that represents the corresponding relationship between the input coal feed amount and the free calcium content of the final product under the influence of a plurality of parameters.
- since the reinforcement learning model differs in its characteristics from other machine learning models, it is better suited to the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, the strong generalization ability of the reinforcement learning model enables it to be more simply applied in other similar scenarios.
Abstract
Description
- The present disclosure relates to the field of data processing technology, particularly to the field of big data and deep learning technology, and more particularly to a method, apparatus and electronic device for constructing a reinforcement learning model, and relates to a computer readable storage medium.
- There are three main stages in the production process of cement: raw material mining and grinding, calcination of raw material to clinker, and clinker reprocessing. The calcination of raw material to clinker is a very complicated process, and the costs of the coal and electricity consumed in the process are very high. In the calcination process, the main consumption is of coal and electricity, of which coal consumption accounts for the largest proportion. Thus, how to reasonably manage and control the coal feed amount in the calcination stage is the key to decreasing cost and increasing efficiency in the cement industry.
- Embodiments of the present disclosure propose a method, apparatus and electronic device for constructing a reinforcement learning model. The embodiments also relate to a computer readable storage medium.
- In a first aspect, embodiments of the present disclosure provide a method for constructing a reinforcement learning model, comprising: establishing a first simulation model between a calciner coal feed amount and a calciner temperature; establishing a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; establishing a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and constructing a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- In a second aspect, embodiments of the present disclosure provide an apparatus for constructing a reinforcement learning model, comprising: a first simulation model establishing unit, configured to establish a first simulation model between a calciner coal feed amount and a calciner temperature; a second simulation model establishing unit, configured to establish a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; a prediction model establishing unit, configured to establish a prediction model among an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and a reinforcement learning model construction unit, configured to construct a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage apparatus, storing one or more programs thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided by the first aspect.
- In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium, storing a computer program thereon, wherein the program, when executed by a processor, causes the processor to implement the method provided by the first aspect.
- In a fifth aspect, embodiments of the present disclosure provide a computer program product including a computer program, where the computer program, when executed by a processing apparatus, implements the method provided by the first aspect.
- The method and apparatus for constructing a reinforcement learning model, electronic device, and computer readable storage medium provided by the embodiments of the present disclosure first establish the first simulation model between the calciner coal feed amount and the calciner temperature, and the second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature; then establish the prediction model among the under-grate pressure, the calciner temperature output by the first simulation model, the kiln current, the secondary air temperature and the smoke chamber temperature output by the second simulation model, and the free calcium content; and finally construct the reinforcement learning model that represents the association between the coal feed amount and the free calcium content according to the preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model, the coal feed amount including the calciner coal feed amount and the kiln head coal feed amount.
- Different from the existing technology that may not meet the needs of a complex scenario of cement calcination, the present disclosure introduces the concept of reinforcement learning into a cement calcination scenario. Based on the established simulation models and the prediction model, under the reinforcement learning architecture, the reinforcement learning model that may represent the corresponding relationship between the input coal feed amount and the free calcium content of a final product under the influence of a plurality of parameters is constructed. In addition, since the reinforcement learning model is different from the compensator characteristics of other machine learning models, it is more compatible with the complex and multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate, and at the same time a strong generalization ability of the reinforcement learning model may also be more simply applied to other similar scenarios.
- It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be easily understood by the following description.
- By reading the detailed description of non-limiting embodiments with reference to the following accompanying drawings, other features, objectives and advantages of the present disclosure will become more apparent:
- Fig. 1 is an exemplary system architecture in which the present disclosure may be implemented;
- Fig. 2 is a flowchart of a method for constructing a reinforcement learning model according to an embodiment of the present disclosure;
- Fig. 3 is a flowchart of another method for constructing a reinforcement learning model according to an embodiment of the present disclosure;
- Fig. 4 is a schematic flowchart of the method for constructing a reinforcement learning model in an application scenario according to an embodiment of the present disclosure;
- Fig. 5 is a structural block diagram of an apparatus for constructing a reinforcement learning model according to an embodiment of the present disclosure; and
- Fig. 6 is a block diagram of an electronic device suitable for implementing the method for constructing a reinforcement learning model according to an embodiment of the present disclosure.
- The present disclosure will be further described in detail below with reference to the accompanying drawings and embodiments. It may be understood that the embodiments described herein are only used to explain the relevant disclosure, but not to limit the disclosure. In addition, it should be noted that, for ease of description, only the parts related to the relevant disclosure are shown in the accompanying drawings.
- It should be noted that the embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments.
-
Fig. 1 shows an exemplary system architecture 100 to which the embodiments of the method, apparatus and electronic device for constructing a reinforcement learning model, and the computer readable storage medium, of the present disclosure may be applied. - As shown in
Fig. 1 , the system architecture 100 may include sensors, a network 104, a server 105, and a coal feeding device 106. The network 104 is used to provide a communication link medium between the sensors and the server 105, and between the server 105 and the coal feeding device 106. The network 104 may include various connection types, such as wired or wireless communication links, or optic fibers. - Various types of information acquired by the
sensors may be sent to the server 105 through the network 104, and the server 105 may generate control instructions based on the received information after processing, and then issue the control instructions to the coal feeding device 106 through the network 104. The above communication may be implemented by various applications installed on the sensors, the server 105, and the coal feeding device 106, such as information transmission applications, coal feed optimization control applications, or control instruction sending and receiving applications. - Typically, the
sensors may be various types of sensors installed on the relevant devices used in clinker calcination. The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server; when the server 105 is software, it may be implemented as a plurality of pieces of software or software modules, or as a single piece of software or software module, which is not limited herein. In an actual scenario, the coal feeding device 106 may be embodied as a physical device such as a coal conveyor belt or a coal conveyor. In a virtual test scenario, it may be directly replaced by a virtual device having a controlled coal conveying capacity. - The
server 105 may provide various services through various built-in applications. Taking as an example a coal feed optimization control application that may provide a coal feed optimization control service in cement calcination, the server 105 runs the coal feed optimization control application and may achieve the following effects: first, receiving an instruction specifying a target free calcium content required for cement clinker production of the present batch; then, inputting the target free calcium content into a pre-constructed reinforcement learning model that represents a corresponding relationship between a coal feed amount and a free calcium content, to obtain a theoretical coal feed amount output by the reinforcement learning model; next, issuing a corresponding coal feed amount instruction to the coal feeding device 106 using the theoretical calciner coal feed amount and the theoretical kiln head coal feed amount included in the theoretical coal feed amount. - The reinforcement learning model used by the
server 105 in the above process may be constructed by the following method: first, receiving a large amount of historical data on the calciner coal feed amount, the calciner temperature, the kiln head coal feed amount, the kiln current, the secondary air temperature, the smoke chamber temperature and the under-grate pressure from the sensors through the network 104; then, establishing a first simulation model between the calciner coal feed amount and the calciner temperature, and establishing a second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature; then, establishing a prediction model among: the under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and finally, constructing the reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model. - It should be noted that the parameters such as the calciner coal feed amount, the calciner temperature, the kiln head coal feed amount, the kiln current, the secondary air temperature, the smoke chamber temperature, and the under-grate pressure used to construct the simulation models and the prediction model, in addition to being acquired from the
sensors through the network 104, may also be stored in advance in the server 105 in various forms such as logs or production inspection data reports. Therefore, when the server 105 detects that the data have already been stored locally, it may choose to acquire the data directly from local storage. In this case, the process of generating the reinforcement learning model may not require the sensors or the network 104. - Since the construction of the simulation models, the prediction model, and the reinforcement learning model based on a large number of parameters occupies considerable computing resources and demands strong computing power, the method for constructing a reinforcement learning model provided in the subsequent embodiments of the present disclosure is generally performed by the
server 105, which has strong computing power and abundant computing resources. Accordingly, the apparatus for constructing a reinforcement learning model is generally also provided in the server 105. - It should be understood that the number of sensors, networks, servers and coal feeding devices in
Fig. 1 is merely illustrative. Depending on the implementation needs, there may be any number of sensors, networks, servers and coal feeding devices. - With reference to
Fig. 2, Fig. 2 is a flowchart of a method for constructing a reinforcement learning model according to an embodiment of the present disclosure. A flow 200 includes the following steps: -
Step 201, establishing a first simulation model between a calciner coal feed amount and a calciner temperature; - This step aims to establish the first simulation model between the calciner coal feed amount and the calciner temperature by an executing body of the method for constructing a reinforcement learning model (for example, the
server 105 shown in Fig. 1 ). - The first simulation model is used to represent a corresponding relationship between the calciner coal feed amount and the calciner temperature. In order to construct a first simulation model that can represent this corresponding relationship, a large number of historical calciner coal feed amounts and the corresponding historical calciner temperature data are required as samples to participate in the training and construction of the simulation model. For example, the first simulation model that represents the corresponding relationship between the calciner coal feed amount and the calciner temperature may be constructed in the form of the following formula:
y(k) = a·y(k-1) + b·u(k-1)
- In the formula, y(k) is the calciner temperature at time k, and y(k-1) and u(k-1) are respectively the calciner temperature and the calciner coal feed amount at time k-1 (that is, the moment immediately preceding time k); a and b are undetermined coefficients whose particular values may be obtained by calculation with the least squares method on historical data. For example, in a certain experimental scenario, a is 0.983 and b is 0.801.
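As an illustrative sketch (not part of the patent), the undetermined coefficients a and b of such a first-order model can be estimated from historical sensor data with ordinary least squares. The data below are synthetic, generated from the example coefficients cited above; all units and magnitudes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "historical" data generated from the example coefficients
# a = 0.983, b = 0.801 cited in the text, plus measurement noise.
n = 500
u = 10.0 + rng.normal(0.0, 0.5, n)        # calciner coal feed amount
y = np.empty(n)
y[0] = 870.0                               # initial calciner temperature
for k in range(1, n):
    y[k] = 0.983 * y[k - 1] + 0.801 * u[k - 1] + rng.normal(0.0, 0.1)

# Least squares fit of y(k) = a*y(k-1) + b*u(k-1).
X = np.column_stack([y[:-1], u[:-1]])      # regressors at time k-1
a_hat, b_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(round(a_hat, 3), round(b_hat, 3))    # should be close to 0.983 and 0.801
```

The same fitting procedure applies to the second simulation model, one output variable at a time.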
- Step 202, establishing a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature;
- In this step, the executing body establishes the second simulation model among the kiln head coal feed amount, the kiln current, the secondary air temperature, and the smoke chamber temperature.
- Different from the first simulation model, the second simulation model is used to represent the corresponding relationship between the kiln head coal feed amount and the kiln current, the secondary air temperature, and the smoke chamber temperature. In order to construct a second simulation model that can represent this corresponding relationship, a large number of historical kiln head coal feed amounts and the corresponding historical kiln currents, historical secondary air temperatures, and historical smoke chamber temperatures are required as sample data to participate in the training and construction of the simulation model. The second simulation model may also be constructed in the same form as the above formula.
- It should be noted that the executing body needs to construct the first simulation model and the second simulation model through
step 201 and step 202 respectively because, in a cement clinker calcination process, the adjustable variables mainly include: the feed amount, the calciner coal feed amount, the kiln head coal feed amount, the kiln speed, the high temperature fan speed, and the grate cooler speed; and the controlled variables mainly include: the calciner outlet temperature, the calciner outlet pressure, the secondary air temperature, the tertiary air temperature, the kiln burning zone temperature, the kiln head negative pressure, the kiln tail temperature, the smoke chamber temperature, the kiln current, the under-grate pressure, and the vertical weight. A controlled variable is a variable that may not be adjusted directly, but may be affected by an adjustable variable. - All of the above variables ultimately act on the quality index of the calcined finished product: the free calcium content. Therefore, in order to ensure the clinker quality of the finished product, it is necessary to monitor these variables during the entire calcination, so as to estimate the quality of the calcined clinker product from them. Investigation shows that the free calcium content is mainly related to the calciner temperature, the kiln current, the secondary air temperature, the smoke chamber temperature, and the under-grate pressure, and that these variables are mainly determined by three adjustable parameters: the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure. Therefore, given that the present disclosure mainly focuses on the coal consumption caused by coal feeding and on the clinker quality (i.e., the free calcium content), three adjustable variables are mainly considered: the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure; four controlled variables are mainly considered: the calciner temperature, the kiln current, the secondary air temperature, and the smoke chamber temperature; and one final target variable is considered: the free calcium content.
- In order to optimize the parameter adjustment of the coal feed amount using the reinforcement learning model, the construction of simulation models that represent the parameter changes related to the coal feed amount during cement calcination is indispensable. Therefore, the executing body constructs the first simulation model, which represents the corresponding relationship between the controlled variable (the calciner temperature) and the adjustable variable (the calciner coal feed amount), through
step 201, and constructs the second simulation model, which represents the corresponding relationship between the controlled variables (the kiln current, the secondary air temperature, and the smoke chamber temperature) and the adjustable variable (the kiln head coal feed amount), through step 202. -
Step 203, establishing a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; - On the basis of
step 201 and step 202, this step aims for the executing body to establish the prediction model among the under-grate pressure, the calciner temperature output by the first simulation model, the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model, and the free calcium content. - As described in
step 202, it may be considered that the clinker quality index, the free calcium content, is mainly affected by the five controlled variables: the under-grate pressure, the calciner temperature, the kiln current, the secondary air temperature, and the smoke chamber temperature. Therefore, in this step, the prediction model between these five controlled variables and the free calcium content is established; that is, the generated prediction model may predict a value of the free calcium content based on actual values of the given five controlled variables.
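The patent leaves the concrete choice of prediction model open (SVM, neural network, or tree model). As a minimal, hypothetical sketch of the neural-network option, the snippet below fits a tiny one-hidden-layer network on synthetic data; the linear generating rule, the feature normalization, and every coefficient are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training set: 5 normalized controlled variables -> free calcium (%).
# Feature order: calciner temp, kiln current, secondary air temp,
# smoke chamber temp, under-grate pressure.
X = rng.uniform(0.0, 1.0, (2000, 5))
true_w = np.array([-1.2, -0.4, -0.3, -0.5, 0.2])   # invented generating rule
y = 1.8 + X @ true_w + rng.normal(0.0, 0.02, 2000)

# One-hidden-layer network trained with full-batch gradient descent.
W1 = rng.normal(0.0, 0.5, (5, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    pred = (h @ W2 + b2).ravel()             # predicted free calcium
    err = pred - y
    dW2 = h.T @ err[:, None] / len(X)        # backpropagated gradients
    db2 = err.mean()
    dh = (err[:, None] @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((pred - y) ** 2))
print("training MSE:", round(mse, 4))
```

In practice an off-the-shelf library model would replace this hand-written loop; the sketch only shows the input/output shape of the prediction problem.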
- The large amount of historical data as samples required to construct the models in the above steps may all come from acquisition of various sensors (such as the
sensors shown in Fig. 1 ) installed on relevant devices used in clinker calcination. For example, the under-grate pressure may be acquired by a pressure sensor installed in the grate cooler, the kiln current may be acquired by a current sensor installed at the kiln head, and for the various temperatures, temperature sensors of different performance and models may be selected based on the actual temperature ranges. -
Step 204, constructing a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model. - On the basis of
step 203, this step aims for the executing body to establish the reinforcement learning model that represents the association between the coal feed amount and the free calcium content according to the preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model.
- Reinforcement learning (RL), also known as encouragement learning, evaluation learning, or enhancement learning, is one of the paradigms and methodologies of machine learning. Reinforcement learning is used to describe and solve the problem that an agent maximizes returns or achieves a particular goal during interaction with the environment through learning strategies. Different from other neural network deep learning algorithms that simulate biological neural networks, the reinforcement learning algorithm is that the agent learns in a "trial and error" method, obtains a reward guide behavior through interaction with the environment, aims to maximize the reward for the agent. Reinforcement learning is different from supervised learning in connectionist learning, which is mainly manifested in reinforcement signals. A reinforcement signal provided by the environment in reinforcement learning is an evaluation of the quality of a generated action (usually a scalar signal), rather than telling a reinforcement learning system (RLS) how to generate a correct action. Because an external environment provides little information, RLS must rely on its own experience to learn. Using the method, RLS gains knowledge in an action-evaluation environment and improves action plans to adapt to the environment. Deep learning models may also be used in reinforcement learning to form deep reinforcement learning (DRL) having a better effect.
- Actor-critic (A2C), PPO, TRPO, and other reinforcement learning model architectures having different characteristics may be used to construct the reinforcement learning model that may represent the corresponding relationship between the coal feed amount and the free calcium content required in this step.
- Different from the existing technology that may not meet the needs of a complex scenario of cement calcination, the method for constructing a reinforcement learning model provided in the embodiment of the present disclosure introduces the concept of reinforcement learning into a cement calcination scenario, based on the established simulation models and the prediction model, under the reinforcement learning architecture, the reinforcement learning model that may represent the corresponding relationship between the input coal feed amount under the influence of a plurality of parameters and the free calcium content of a final product is constructed. In addition, since the reinforcement learning model is different from the compensator characteristics of other machine learning models, it is more compatible with the complex and multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate, and at the same time a strong generalization ability of the reinforcement learning model may also be more simply applied to the present disclosure in other similar scenarios.
- The existing technology may not meet the needs of a complex scenario of cement calcination, because PID control only considers system deviations, mainly is to track system setting values, but does not support a multi-objective optimization of clinker quality and energy consumption in the cement calcination scenario. On the other hand, due to the real-time control of a plurality of parameters involved in the cement production process, it is also difficult for MPC to achieve unified real-time control of the plurality of parameters. At the same time, the generalization ability of MPC is poor. For a calcination system of similar scenarios, the models need to be re-established each time.
- With reference to
Fig. 3, Fig. 3 is a flowchart of another method for constructing a reinforcement learning model according to an embodiment of the present disclosure. A flow 300 includes the following steps: - Step 301: establishing a first simulation model between a calciner coal feed amount and a calciner temperature;
- Step 302: establishing a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature;
- Step 303: establishing a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content;
- Step 304: constructing a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model;
- The above steps 301-304 are the same as steps 201-204 as shown in
Fig. 2 . The above steps may be summarized as a construction process of the reinforcement learning model. For contents of the same parts, reference may be made to the corresponding parts of the previous embodiment, and detailed description thereof will be omitted. - Step 305: receiving a target free calcium content given in a target scenario;
- On the basis of constructing the available reinforcement learning model in
step 304, this step aims for the executing body to receive the target free calcium content given by a user in the target scenario. This step serves as the first step in using the reinforcement learning model to guide the coal feed amount during cement calcination, that is, acquiring the set clinker quality index. - Step 306: determining a theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model;
- On the basis of
step 305, this step aims for the executing body to determine the theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model. That is, since the reinforcement learning model represents the corresponding relationship between the coal feed amount and the free calcium content, given the target free calcium content, the corresponding theoretical coal feed amount may be derived inversely from this corresponding relationship, where the theoretical coal feed amount includes a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount. - Step 307: guiding a calciner coal feeding operation and a kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount.
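The inverse use of the model in step 306, from a target free calcium content back to a coal feed amount, can be sketched as a search over candidate actions. Here `predict_free_calcium` is a hypothetical stand-in for the trained simulation and prediction models; its linear form and all numbers are invented for the example:

```python
import itertools

def predict_free_calcium(calciner_coal, kiln_head_coal):
    # Hypothetical stand-in for the trained simulation + prediction models:
    # more coal -> hotter calcination -> lower free calcium content (%).
    return 3.0 - 0.12 * calciner_coal - 0.08 * kiln_head_coal

def theoretical_coal_feed(target_fc, lo=5.0, hi=15.0, step=0.1, tol=0.05):
    """Return the (calciner, kiln head) coal feed pair with the smallest
    total coal whose predicted free calcium is within `tol` of the target."""
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    best = None
    for c, k in itertools.product(grid, grid):
        if abs(predict_free_calcium(c, k) - target_fc) <= tol:
            if best is None or c + k < best[0] + best[1]:
                best = (c, k)
    return best

feed = theoretical_coal_feed(target_fc=1.2)
print(feed)
```

A trained actor network would normally output this action directly; the grid search only illustrates the inverse relationship the model represents.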
- On the basis of step 306, this step aims to instruct the calciner coal feeding operation and the kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount by the executing body. For example, a coal feeding device (such as the
coal feeding device 106 shown in Fig. 1 ) is controlled to feed a corresponding amount of coal to the calciner and the kiln head. - Since this embodiment of the present disclosure includes all the technical features of the previous embodiment (that is, the construction steps of the reinforcement learning model), it has all the beneficial effects of the previous embodiment. On this basis, this embodiment of the present disclosure also provides a solution on how to guide the coal feed amount based on the constructed reinforcement learning model through
steps 305 to 307, so as to guide the input coal amount in the cement calcination process by giving a reasonable coal feed amount: putting in as little coal as possible while ensuring the clinker quality, so as to reduce costs and increase efficiency. The coal saved is also equivalent to a corresponding reduction in the carbon dioxide emitted to the atmosphere, which helps the enterprise to be environmentally friendly. - On the basis of the previous embodiment, although the above controlled variables are mainly affected by the adjustable variables, cement calcination is a very complicated process: many other sudden or unavoidable factors may change some controlled variables and in turn affect the clinker quality. Therefore, the following solution may also be used to determine whether other means are required to adjust the controlled variables:
for the calciner temperature: - acquiring a current calciner temperature, and determining a simulated calciner coal feed amount corresponding to the current calciner temperature based on the first simulation model; and
- adjusting the calciner temperature based on the sign (plus or minus) of a first difference between the simulated calciner coal feed amount and the theoretical calciner coal feed amount, in response to the first difference exceeding a first preset threshold.
- Similarly, for the kiln current, the secondary air temperature, and the smoke chamber temperature:
- acquiring a current kiln current, a current secondary air temperature, and a current smoke chamber temperature, and determining a simulated kiln head coal feed amount corresponding to the current kiln current, the current secondary air temperature, and the current smoke chamber temperature based on the second simulation model; and
- adjusting the kiln current, the secondary air temperature, and the smoke chamber temperature based on a second difference between the simulated kiln head coal feed amount and the theoretical kiln head coal feed amount, in response to the second difference exceeding a second preset threshold.
- Temperature control includes, but is not limited to, all effective means such as physical cooling or reduction of coal feed amount.
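The threshold logic above can be expressed as a small helper; the function name, units, and threshold values here are illustrative, not from the patent:

```python
def needs_adjustment(simulated_feed, theoretical_feed, threshold):
    """Compare the coal feed amount implied by current sensor readings
    (via the simulation model) with the theoretical coal feed amount;
    return (exceeds_threshold, sign_of_difference)."""
    diff = simulated_feed - theoretical_feed
    if abs(diff) <= threshold:
        return False, 0
    return True, 1 if diff > 0 else -1

# Example: the simulated calciner coal feed implied by the current
# calciner temperature is well above the theoretical feed, so the
# temperature should be adjusted downward per the sign of the difference.
print(needs_adjustment(simulated_feed=12.4, theoretical_feed=10.0, threshold=1.0))
```

The same check applies to the second difference, using the simulated and theoretical kiln head coal feed amounts.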
- To deepen understanding, the present disclosure also provides an implementation solution in combination with an application scenario. Reference may be made to the schematic diagram as shown in
Fig. 4 . - An entire process of cement calcination may be seen in the schematic diagram of an apparatus given in the upper left corner of
Fig. 4 . First, the raw material is fed, and then it sequentially goes through the four processes of preheater preheating, calciner heating, rotary kiln calcination, and grate cooler cooling to produce clinker. The entire process involves many controlled parameters, such as the calciner coal feed amount or the kiln head coal feed amount, and these parameters directly affect the quality of the clinker, that is, the free calcium content. In actual production, an enterprise usually requires the free calcium content to be between 0.5% and 1.5%. Mechanism research has found that a low free calcium content results from an excessively high calcination temperature, which causes overburning, and that a higher coal feed amount corresponds to higher coal consumption. Therefore, in order to ensure low coal consumption under the premise of qualified quality, the free calcium content in the modeling process of the embodiments of the present disclosure is targeted to be between 1% and 1.5%, to reduce production costs as much as possible while ensuring quality.
- The free calcium content is measured about once an hour in the production process. Because it is required to control and adjust the coal feed amount and other parameters in real time, it is necessary to establish the real-time prediction model on the free calcium content. Since the free calcium content is mainly related to the calciner temperature, the kiln current, the secondary air temperature, the smoke chamber temperature and the under-grate pressure, the established model is:
- Free calcium content = f (calciner temperature, kiln current, secondary air temperature, smoke chamber temperature, under-grate pressure); in an experiment, a large amount of historical data are used to fit f. In the present embodiment, the large amount of historical data are used to construct the prediction model through neural networks.
- To adjust the parameters using the reinforcement learning model, it is necessary to construct the simulation models in the cement calcination process. That is, after the coal feed amount is adjusted, how the controlled variables such as the calciner temperature, the kiln current, the secondary air temperature, or the smoke chamber temperature may change during calcination. In the industry, a first-order inertia model plus a hysteresis link are often selected to simulate a complex industrial system having large inertia and pure lag. By consulting relevant professional information, the calciner temperature is mainly related to the calciner coal feed amount, and the kiln current, the secondary air temperature, and the smoke chamber temperature are mainly related to the kiln head coal feed amount. A system model of the calciner temperature with respect to the calciner coal feed amount may be established, and a system model of the kiln current, the secondary air temperature, and the smoke chamber temperature with respect to the kiln head coal feed amount may be established.
- With the simulation models and the prediction model constructed in the above steps, the reinforcement learning model may be easily established. The present embodiment uses an Actor-critic reinforcement learning model, uses the three adjustable parameters: calciner coal feed amount, kiln head coal feed amount, and under-grate pressure as an Action of the reinforcement learning model, and temporarily ignores other parameters in the calcination process. The purpose is to ensure that a final free calcium content is between 1%-1.5%, and at the same time, when the feed amount is set at a certain level, the calciner coal feed amount and the kiln head coal feed amount should be as little as possible. Since the measurement standard of coal consumption is total coal feed amount/feed amount, it is assumed that the speed of the feed amount is fixed, that is, the feed amount per unit time is fixed, so the coal consumption only needs to consider the calciner coal feed amount and the kiln head coal feed amount.
- Model details are as follows:
- Action: is a three-dimensional vector, three-dimension continuous action, which is the calciner coal feed amount, the kiln head coal feed amount, and an under- grate pressure value. That is, these three parameters are output for control at every moment;
- State: is a 14-dimensional (10-dimensional, after cutting some of the parameters corresponding to t-2) vector, which is calciner temperature t-2 (may be cut), t-1, a value at time t, the kiln current, the secondary air temperature and smoke chamber temperature t-2 (may be cut), t-1, a value at time t, a current value of the under-grate pressure, and a prediction value of the free calcium content given by a free calcium content prediction model constructed through the above steps. After each execution of an Action, State updates through the simulation environment;
- Reward (reward value): since the purpose is to reduce coal consumption while ensuring quality, Reward is divided into two parts, that is, whether the free calcium content is within a target value range and a current coal feed amount. That is, Reward = -(kiln head coal feed amount + calciner coal feed amount)+100∗I_({1%≤actual free calcium content≤1.5%}). Here, I is an indicative function, when 1%≤actual free calcium content≤1.5%, the value of I is 1, otherwise the value of I is 0.
- It may be seen from the above Reward formula that when the free calcium content meets the standard, the less the total coal feed amount, the greater the value of Reward.
- A data processing step at the bottom of
Fig. 4 is the parameter update process of the Actor-critic reinforcement learning model based on samples. First, a sample is obtained from each actual Action (its elements may be denoted S_t, a_t, r_t, S_{t+1}, etc.). Then, these samples are stored in a memory database in the form of tuples. Next, some data are selected from the memory database by sampling to update the parameters of the Actor-critic reinforcement learning model. The effectiveness and usability of the Actor-critic reinforcement learning model are maintained through this update method.
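The sampling-based update described above relies on a memory database of transition tuples. A minimal sketch, where the tuple layout follows the description and everything else (capacity, batch size, class name) is illustrative:

```python
import random
from collections import deque

class ReplayMemory:
    """Memory database of (S_t, a_t, r_t, S_{t+1}) tuples as described
    above; the oldest transitions are discarded once capacity is reached."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch for updating the actor-critic networks.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

memory = ReplayMemory(capacity=100)
for t in range(5):
    memory.store(state=[t], action=[0.1 * t], reward=-float(t), next_state=[t + 1])
batch = memory.sample(3)
print(len(batch))
```

Sampling minibatches from such a buffer, rather than always using the most recent transition, decorrelates the updates and keeps older experience usable.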
- With further reference to
Fig. 5 , as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for constructing a reinforcement learning model, and the apparatus embodiment corresponds to the method embodiment as shown in Fig. 2 . The apparatus may be particularly applied to various electronic devices. - As shown in
Fig. 5 , an apparatus 500 for constructing a reinforcement learning model of the present embodiment may include: a first simulation model establishing unit 501, a second simulation model establishing unit 502, a prediction model establishing unit 503 and a reinforcement learning model construction unit 504. The first simulation model establishing unit 501 is configured to establish a first simulation model between a calciner coal feed amount and a calciner temperature. The second simulation model establishing unit 502 is configured to establish a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature. The prediction model establishing unit 503 is configured to establish a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content. The reinforcement learning model construction unit 504 is configured to construct a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount including the calciner coal feed amount and the kiln head coal feed amount. - In the present embodiment, in the
apparatus 500 for constructing a reinforcement learning model, for the particular processing and the technical effects of the first simulation model establishing unit 501, the second simulation model establishing unit 502, the prediction model establishing unit 503 and the reinforcement learning model construction unit 504, reference may be made to the relevant descriptions of steps 201-204 in the corresponding embodiment of Fig. 2 respectively, and detailed description thereof will be omitted. - In some alternative implementations of the present embodiment, the
apparatus 500 for constructing a reinforcement learning model may further include: - a given parameter receiving unit, configured to receive a target free calcium content given in a target scenario;
- a theoretical coal feed amount determination unit, configured to determine a theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model; where, the theoretical coal feed amount includes a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount; and
- a coal feeding operation instruction unit, configured to guide a calciner coal feeding operation and a kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount.
- In some alternative implementations of the present embodiment, the
apparatus 500 for constructing a reinforcement learning model may further include: - a simulated calciner temperature determination unit, configured to acquire a current calciner temperature, and determine a simulated calciner coal feed amount corresponding to the current calciner temperature based on the first simulation model; and
- a first adjusting unit, configured to adjust the calciner temperature based on a plus or minus of a first difference between the simulated calciner coal feed amount and the theoretical calciner coal feed amount, in response to the first difference exceeding a first preset threshold.
- In some alternative implementations of the present embodiment, the
apparatus 500 for constructing a reinforcement learning model may further include: - a simulated kiln head coal feed amount determination unit, configured to acquire a current kiln current, a current secondary air temperature, and a current smoke chamber temperature, and determine a simulated kiln head coal feed amount corresponding to the current kiln current, the current secondary air temperature, and the current smoke chamber temperature based on the second simulation model; and
- a second adjusting unit, configured to adjust the kiln current, the secondary air temperature, and the smoke chamber temperature based on a second difference between the simulated kiln head coal feed amount and the theoretical kiln head coal feed amount, in response to the second difference exceeding a second preset threshold.
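How the three models held by these units compose, with the simulation models feeding process variables into the prediction model, can be sketched as follows; every callable here is a hypothetical stand-in, not the patented models:

```python
# Assumed-interface sketch of the model chain: the first simulation model
# maps the calciner coal feed amount to a calciner temperature, the second
# maps the kiln head coal feed amount to the kiln current, secondary air
# temperature and smoke chamber temperature, and the prediction model maps
# those variables plus the under-grate pressure to a free calcium content.
def predict_free_calcium(calciner_coal, kiln_head_coal, under_grate_pressure,
                         first_sim, second_sim, prediction_model):
    calciner_temp = first_sim(calciner_coal)
    kiln_current, secondary_air_temp, smoke_chamber_temp = second_sim(kiln_head_coal)
    return prediction_model(under_grate_pressure, calciner_temp,
                            kiln_current, secondary_air_temp,
                            smoke_chamber_temp)
```

With toy linear stand-ins for the three models, the chain runs end to end from the two coal feed amounts to a predicted free calcium value.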
- In some alternative implementations of the present embodiment, the reinforcement learning
model construction unit 504 may include:
an A2C reinforcement learning model construction subunit, configured to construct the reinforcement learning model that represents the association between the coal feed amount and the free calcium content according to an Actor-Critic reinforcement learning model architecture. - In some alternative implementations of the present embodiment, the A2C reinforcement learning model construction subunit may further include:
- an Action configuration module, configured to construct the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure as an Action represented by a three-dimensional vector;
- a State configuration module, configured to construct a State represented by a ten-dimensional vector by at least using each of the following as a dimension: a calciner temperature, a kiln current, a secondary air temperature, and a smoke chamber temperature at a previous moment; a calciner temperature, a kiln current, a secondary air temperature, a smoke chamber temperature and an under-grate pressure at a current moment; and a prediction value of the free calcium content output by the prediction model; wherein, after each execution of an Action, the State is updated through a preset simulation environment;
- a Reward configuration module, configured to determine a Reward indicating whether the output prediction value of the free calcium content is within a preset target value range, and indicating a current coal feed amount; and
- an A2C reinforcement learning model construction module, configured to construct the reinforcement learning model that represents the association between the coal feed amount and the free calcium content, based on the Action, the State and the Reward.
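As a toy illustration of the Action/State wiring (a deliberately simplified linear stand-in, not the A2C networks of the disclosure), an actor mapping the ten-dimensional State to the three-dimensional Action and a critic scoring the State might look like:

```python
import numpy as np

# Simplified linear stand-in (an assumption for illustration only): the
# actor maps the ten-dimensional State to the three-dimensional Action
# (calciner coal feed, kiln head coal feed, under-grate pressure); the
# critic estimates a scalar state value used when computing advantages.
rng = np.random.default_rng(0)

class LinearActorCritic:
    def __init__(self, state_dim=10, action_dim=3):
        self.w_actor = rng.normal(scale=0.1, size=(action_dim, state_dim))
        self.w_critic = rng.normal(scale=0.1, size=state_dim)

    def act(self, state):
        return self.w_actor @ state          # deterministic 3-dim action

    def value(self, state):
        return float(self.w_critic @ state)  # scalar state-value estimate
```

In a real A2C setup both maps would be neural networks and the actor would parameterize a stochastic policy; the linear version only shows the shapes the modules above agree on.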
- The present embodiment corresponds to the above method embodiment as the apparatus embodiment. Different from the existing technology, which may not meet the needs of a complex cement calcination scenario, the apparatus for constructing a reinforcement learning model provided in this embodiment of the present disclosure introduces the concept of reinforcement learning into the cement calcination scenario. Based on the established simulation models and the prediction model, under the reinforcement learning architecture, a reinforcement learning model is constructed that may represent the corresponding relationship between the input coal feed amount, under the influence of a plurality of parameters, and the free calcium content of the final product. In addition, since the reinforcement learning model is different from the compensator characteristics of other machine learning models, it is more compatible with the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, the strong generalization ability of the reinforcement learning model allows it to be applied more simply to other similar scenarios.
- According to an embodiment of the present disclosure, the present disclosure also provides an electronic device for constructing a reinforcement learning model and a readable storage medium.
-
Fig. 6 shows a block diagram of an electronic device suitable for implementing the method for constructing a reinforcement learning model according to an embodiment of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the present disclosure described and/or claimed herein. - As shown in
Fig. 6 , the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces. The various components are connected to each other using different buses, and may be installed on a common motherboard or in other manners as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphic information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, a plurality of processors and/or a plurality of buses may be used together with a plurality of memories if desired. Similarly, a plurality of electronic devices may be connected, with each device providing some of the necessary operations (for example, as a server array, a set of blade servers, or a multi-processor system). In Fig. 6 , one processor 601 is used as an example. - The
memory 602 is a non-transitory computer readable storage medium provided by the present disclosure. The memory stores instructions executable by at least one processor, so that the at least one processor performs the method for constructing a reinforcement learning model provided by the present disclosure. The non-transitory computer readable storage medium of the present disclosure stores computer instructions for causing a computer to perform the method for constructing a reinforcement learning model provided by the present disclosure. - The
memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as the program instructions/modules corresponding to the method for constructing a reinforcement learning model in the embodiments of the present disclosure (for example, the first simulation model establishing unit 501, the second simulation model establishing unit 502, the prediction model establishing unit 503 and the reinforcement learning model construction unit 504 as shown in Fig. 5 ). The processor 601 executes the non-transitory software programs, instructions, and modules stored in the memory 602 to execute various functional applications and data processing of the server, that is, to implement the method for constructing a reinforcement learning model in the foregoing method embodiments. - The
memory 602 may include a storage program area and a storage data area, where the storage program area may store an operating system and an application program required by at least one function; and the storage data area may store data created by the use of the electronic device according to the method for constructing a reinforcement learning model, etc. In addition, the memory 602 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices. In some embodiments, the memory 602 may optionally include memories remotely located with respect to the processor 601, and these remote memories may be connected through a network to the electronic device implementing the method for constructing a reinforcement learning model. Examples of the above network include but are not limited to the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. - The electronic device of the method for constructing a reinforcement learning model may further include: an
input apparatus 603 and an output apparatus 604. The processor 601, the memory 602, the input apparatus 603, and the output apparatus 604 may be connected through a bus or by other methods. In Fig. 6 , connection through a bus is used as an example. - The
input apparatus 603 may receive input digital or character information, and generate key signal inputs related to user settings and function control of the electronic device of the method for constructing a reinforcement learning model; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick and other input apparatuses. The output apparatus 604 may include a display device, an auxiliary lighting apparatus (for example, an LED), a tactile feedback apparatus (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen. - Various embodiments of the systems and technologies described herein may be implemented in digital electronic circuit systems, integrated circuit systems, dedicated ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system that includes at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor, and may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit the data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
- These computing programs (also referred to as programs, software, software applications, or code) include machine instructions of the programmable processor and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine readable medium" and "computer readable medium" refer to any computer program product, device, and/or apparatus (for example, a magnetic disk, optical disk, memory, or programmable logic device (PLD)) used to provide machine instructions and/or data to the programmable processor, including a machine readable medium that receives machine instructions as machine readable signals. The term "machine readable signal" refers to any signal used to provide machine instructions and/or data to the programmable processor.
- In order to provide interaction with a user, the systems and technologies described herein may be implemented on a computer that has: a display apparatus for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing apparatus (for example, a mouse or trackball), with which the user may provide input to the computer. Other types of apparatuses may also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
- The systems and technologies described herein may be implemented in a computing system that includes backend components (e.g., as a data server), or a computing system that includes middleware components (e.g., application server), or a computing system that includes frontend components (for example, a user computer having a graphical user interface or a web browser, through which the user may interact with the implementations of the systems and the technologies described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., communication network). Examples of the communication network include: local area networks (LAN), wide area networks (WAN) and the Internet.
- The computing system may include a client and a server. The client and the server are generally remote from each other and usually interact through the communication network. The relationship between the client and the server is generated by computer programs that run on the corresponding computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or a cloud host; it is a host product in the cloud computing service system, and solves the defects of difficult management and weak business scalability in traditional physical host and virtual private server (VPS) services.
- According to the technical solution of the embodiments of the present disclosure, the concept of reinforcement learning is introduced into a cement calcination scenario. Based on the established simulation models and the prediction model, under the reinforcement learning architecture, a reinforcement learning model is constructed that may represent the corresponding relationship between the input coal feed amount, under the influence of a plurality of parameters, and the free calcium content of the final product. In addition, since the reinforcement learning model is different from the compensator characteristics of other machine learning models, it is more compatible with the complex, multi-parameter cement calcination scenario, making the determined corresponding relationship more accurate; at the same time, the strong generalization ability of the reinforcement learning model allows it to be applied more simply to other similar scenarios.
- It should be understood that the various forms of processes shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in different orders. As long as the desired results of the technical solution disclosed in the present disclosure may be achieved, no limitation is made herein.
- The above particular embodiments do not constitute limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement and improvement made within the spirit and principle of the present disclosure shall be included in the protection scope of the present disclosure.
Claims (15)
- A method for constructing a reinforcement learning model, the method comprising: establishing (201) a first simulation model between a calciner coal feed amount and a calciner temperature; establishing (202) a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; establishing (203) a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and constructing (204) a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- The method according to claim 1, further comprising: receiving (305) a target free calcium content given in a target scenario; determining (306) a theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model; wherein, the theoretical coal feed amount comprises a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount; and guiding (307) a calciner coal feeding operation and a kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount.
- The method according to claim 2, further comprising: acquiring a current calciner temperature, and determining a simulated calciner coal feed amount corresponding to the current calciner temperature based on the first simulation model; and adjusting the calciner temperature based on a plus or minus of a first difference between the simulated calciner coal feed amount and the theoretical calciner coal feed amount, in response to the first difference exceeding a first preset threshold.
- The method according to claim 2, further comprising: acquiring a current kiln current, a current secondary air temperature, and a current smoke chamber temperature, and determining a simulated kiln head coal feed amount corresponding to the current kiln current, the current secondary air temperature, and the current smoke chamber temperature based on the second simulation model; and adjusting the kiln current, the secondary air temperature, and the smoke chamber temperature based on a second difference between the simulated kiln head coal feed amount and the theoretical kiln head coal feed amount, in response to the second difference exceeding a second preset threshold.
- The method according to any one of claims 1-4, wherein, the constructing (204) comprises:
constructing the reinforcement learning model according to an Actor-Critic reinforcement learning model architecture. - The method according to claim 5, wherein, the constructing the reinforcement learning model according to an Actor-Critic reinforcement learning model architecture comprises: constructing the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure as an Action represented by a three-dimensional vector; constructing a State represented by a ten-dimensional vector by at least using each of the following as a dimension: a calciner temperature, a kiln current, a secondary air temperature, and a smoke chamber temperature at a previous moment; a calciner temperature, a kiln current, a secondary air temperature, a smoke chamber temperature and an under-grate pressure at a current moment; and a prediction value of the free calcium content output by the prediction model; wherein, after each execution of an Action, the State is updated through a preset simulation environment; determining a Reward indicating whether the output prediction value of the free calcium content is within a preset target value range, and indicating a current coal feed amount; and constructing the reinforcement learning model that represents the association between the coal feed amount and the free calcium content, based on the Action, the State and the Reward.
- An apparatus for constructing a reinforcement learning model, the apparatus comprising: a first simulation model establishing unit (501), configured to establish a first simulation model between a calciner coal feed amount and a calciner temperature; a second simulation model establishing unit (502), configured to establish a second simulation model among a kiln head coal feed amount, a kiln current, a secondary air temperature, and a smoke chamber temperature; a prediction model establishing unit (503), configured to establish a prediction model among: an under-grate pressure; the calciner temperature output by the first simulation model; the kiln current, the secondary air temperature, and the smoke chamber temperature output by the second simulation model; and a free calcium content; and a reinforcement learning model construction unit (504), configured to construct a reinforcement learning model that represents an association between a coal feed amount and the free calcium content according to a preset reinforcement learning model architecture, using the first simulation model, the second simulation model, and the prediction model; the coal feed amount comprising the calciner coal feed amount and the kiln head coal feed amount.
- The apparatus according to claim 7, further comprising: a given parameter receiving unit, configured to receive a target free calcium content given in a target scenario; a theoretical coal feed amount determination unit, configured to determine a theoretical coal feed amount corresponding to the target free calcium content using the reinforcement learning model; wherein, the theoretical coal feed amount comprises a theoretical calciner coal feed amount and a theoretical kiln head coal feed amount; and a coal feeding operation instruction unit, configured to guide a calciner coal feeding operation and a kiln head coal feeding operation in the target scenario based on the theoretical coal feed amount.
- The apparatus according to claim 8, further comprising: a simulated calciner temperature determination unit, configured to acquire a current calciner temperature, and determine a simulated calciner coal feed amount corresponding to the current calciner temperature based on the first simulation model; and a first adjusting unit, configured to adjust the calciner temperature based on a plus or minus of a first difference between the simulated calciner coal feed amount and the theoretical calciner coal feed amount, in response to the first difference exceeding a first preset threshold.
- The apparatus according to claim 8, further comprising: a simulated kiln head coal feed amount determination unit, configured to acquire a current kiln current, a current secondary air temperature, and a current smoke chamber temperature, and determine a simulated kiln head coal feed amount corresponding to the current kiln current, the current secondary air temperature, and the current smoke chamber temperature based on the second simulation model; and a second adjusting unit, configured to adjust the kiln current, the secondary air temperature, and the smoke chamber temperature based on a second difference between the simulated kiln head coal feed amount and the theoretical kiln head coal feed amount, in response to the second difference exceeding a second preset threshold.
- The apparatus according to any one of claims 7-10, wherein, the reinforcement learning model construction unit (504) comprises:
an A2C reinforcement learning model construction subunit, configured to construct the reinforcement learning model according to an Actor-Critic reinforcement learning model architecture. - The apparatus according to claim 11, wherein, the A2C reinforcement learning model construction subunit further comprises: an Action configuration module, configured to construct the calciner coal feed amount, the kiln head coal feed amount, and the under-grate pressure as an Action represented by a three-dimensional vector; a State configuration module, configured to construct a State represented by a ten-dimensional vector by at least using each of the following as a dimension: a calciner temperature, a kiln current, a secondary air temperature, and a smoke chamber temperature at a previous moment; a calciner temperature, a kiln current, a secondary air temperature, a smoke chamber temperature and an under-grate pressure at a current moment; and a prediction value of the free calcium content output by the prediction model; wherein, after each execution of an Action, the State is updated through a preset simulation environment; a Reward configuration module, configured to determine a Reward indicating whether the output prediction value of the free calcium content is within a preset target value range, and indicating a current coal feed amount; and an A2C reinforcement learning model construction module, configured to construct the reinforcement learning model that represents the association between the coal feed amount and the free calcium content, based on the Action, the State and the Reward.
- An electronic device, comprising: at least one processor (601); and a memory (602) communicatively connected to the at least one processor (601); wherein the memory (602) stores instructions executable by the at least one processor (601), and the instructions, when executed by the at least one processor (601), cause the at least one processor (601) to perform the method for constructing a reinforcement learning model according to any one of claims 1-6.
- A non-transitory computer readable storage medium, storing computer instructions, the computer instructions being used to cause a computer to perform the method for constructing a reinforcement learning model according to any one of claims 1-6.
- A computer program product comprising a computer program, the computer program, when executed by a processor (601), implementing the method according to any one of claims 1-6.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010948561.XA CN112100916B (en) | 2020-09-10 | 2020-09-10 | Method, device, electronic equipment and medium for constructing reinforcement learning model |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3872432A1 true EP3872432A1 (en) | 2021-09-01 |
EP3872432B1 EP3872432B1 (en) | 2023-06-21 |
Family
ID=73750827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21164660.9A Active EP3872432B1 (en) | 2020-09-10 | 2021-03-24 | Method, apparatus and electronic device for constructing reinforcement learning model |
Country Status (5)
Country | Link |
---|---|
US (1) | US20210216686A1 (en) |
EP (1) | EP3872432B1 (en) |
JP (1) | JP7257436B2 (en) |
KR (1) | KR102506122B1 (en) |
CN (1) | CN112100916B (en) |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1014240A1 (en) | 1998-12-17 | 2000-06-28 | Siemens Aktiengesellschaft | A system of case-based reasoning for sensor prediction in a technical process, especially in a cement kiln, method and apparatus therefore |
ATE292098T1 (en) * | 1999-11-04 | 2005-04-15 | Pretoria Portland Cement | CONTROL SYSTEM FOR A KILN SYSTEM |
US8095479B2 (en) * | 2006-02-28 | 2012-01-10 | Hitachi, Ltd. | Plant control apparatus and method having functions of determining appropriate learning constraint conditions |
US7660639B2 (en) * | 2006-03-27 | 2010-02-09 | Hitachi, Ltd. | Control system for control subject having combustion unit and control system for plant having boiler |
JP6636358B2 (en) * | 2015-09-30 | 2020-01-29 | 太平洋セメント株式会社 | How to predict the quality or manufacturing conditions of fly ash cement |
JP6639988B2 (en) | 2016-03-29 | 2020-02-05 | 太平洋セメント株式会社 | Prediction method of manufacturing conditions of cement clinker |
CN106570244B (en) * | 2016-10-25 | 2019-07-30 | 浙江邦业科技股份有限公司 | One-dimensional simulation method for predicting cement rotary kiln clinker quality |
CN110187727B (en) * | 2019-06-17 | 2021-08-03 | 武汉理工大学 | Glass melting furnace temperature control method based on deep learning and reinforcement learning |
CN111061149B (en) * | 2019-07-01 | 2022-08-02 | 浙江恒逸石化有限公司 | Circulating fluidized bed coal saving and consumption reduction method based on deep learning prediction control optimization |
CN110981240B (en) * | 2019-12-19 | 2022-04-08 | 华东理工大学 | Calcination process optimization method and system |
2020
- 2020-09-10 CN CN202010948561.XA patent/CN112100916B/en active Active

2021
- 2021-03-24 EP EP21164660.9A patent/EP3872432B1/en active Active
- 2021-03-29 US US17/215,932 patent/US20210216686A1/en active Pending
- 2021-03-29 JP JP2021055392A patent/JP7257436B2/en active Active
- 2021-04-20 KR KR1020210051391A patent/KR102506122B1/en active IP Right Grant
Non-Patent Citations (1)
Title |
---|
Zhou Xiaojie et al., "Supervisory Control for Rotary Kiln Temperature Based on Reinforcement Learning", Lecture Notes in Control and Information Sciences, vol. 344, Berlin, DE, 1 January 2006, pp. 428-437, ISSN: 0170-8643, XP055820193, DOI: 10.1007/978-3-540-37256-1_49 * |
Also Published As
Publication number | Publication date |
---|---|
CN112100916A (en) | 2020-12-18 |
JP7257436B2 (en) | 2023-04-13 |
JP2022023775A (en) | 2022-02-08 |
KR102506122B1 (en) | 2023-03-03 |
EP3872432B1 (en) | 2023-06-21 |
US20210216686A1 (en) | 2021-07-15 |
CN112100916B (en) | 2023-07-25 |
KR20210052412A (en) | 2021-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3872432A1 (en) | Method, apparatus and electronic device for constructing reinforcement learning model | |
JP6530783B2 (en) | Machine learning device, control device and machine learning program | |
Jin et al. | Zeroing neural networks: A survey | |
US20220100156A1 (en) | Control system database systems and methods | |
US9740186B2 (en) | Monitoring control system and control device | |
CN110481536B (en) | Control method and device applied to hybrid electric vehicle | |
US10564617B2 (en) | Plant control device, plant control method, and recording medium | |
US10809701B2 (en) | Network adapted control system | |
CN111966361B (en) | Method, device, equipment and storage medium for determining model to be deployed | |
US11675570B2 (en) | System and method for using a graphical user interface to develop a virtual programmable logic controller | |
CN115016353A (en) | Monitoring management system for remote control equipment | |
CN110297460A (en) | Thermal walking update the system and computer | |
CN112859601B (en) | Robot controller design method, device, equipment and readable storage medium | |
CN104081298A (en) | System and method for automated handling of a workflow in an automation and/or electrical engineering project | |
CN103558762A (en) | Method for implementing immune genetic PID controller based on graphic configuration technology | |
CN112828896B (en) | Household intelligent accompanying robot and execution control method thereof | |
Labib | Machine Learning-Based Framework to Predict Single and Multiple Daylighting Simulation Outputs Using Neural Networks | |
US11709471B2 (en) | Distributed automated synthesis of correct-by-construction controllers | |
Zhao et al. | Application Research of Intelligent Pneumatic Control System in Industrial Automation | |
He et al. | Embedded Dynamic Intelligent Algorithm in Computer Software Testing | |
Xu | Design and Implementation of Embedded Intelligent Control System Software Programming Based on Genetic Algorithms | |
Criollo et al. | DIGITAL SHADOW SUPPORTED BY AUGMENTED REALITY AND CLOUD FOR THE MONITORING OF AN INDUSTRIAL PROCESS | |
KR20230108341A (en) | Data collection analysis module, operation method of data collection analysis module and programmable logic controller | |
Huang et al. | Ant Colony Algorithm Theory and Its Application in Control Engineering | |
CN115560604A (en) | Control method, device, medium and equipment for material flow regulating valve of blast furnace charging bucket |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20210324 |
|
AK | Designated contracting states |
Kind code of ref document: A1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20220411 |
|
REG | Reference to a national code |
Ref country code: DE
Ref legal event code: R079
Ref document number: 602021002983
Country of ref document: DE
Free format text: PREVIOUS MAIN CLASS: F27B0007200000
Ipc: F27B0007420000 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06N 3/08 20060101ALI20221004BHEP
Ipc: G06N 3/04 20060101ALI20221004BHEP
Ipc: G06N 3/00 20060101ALI20221004BHEP
Ipc: F27D 21/00 20060101ALI20221004BHEP
Ipc: F27D 19/00 20060101ALI20221004BHEP
Ipc: F27B 7/20 20060101ALI20221004BHEP
Ipc: C04B 7/36 20060101ALI20221004BHEP
Ipc: F27B 7/42 20060101AFI20221004BHEP |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20230111 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: DE
Ref legal event code: R096
Ref document number: 602021002983
Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: AT
Ref legal event code: REF
Ref document number: 1581158
Country of ref document: AT
Kind code of ref document: T
Effective date: 20230715 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL
Ref legal event code: MP
Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: NO
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230921 |
|
REG | Reference to a national code |
Ref country code: AT
Ref legal event code: MK05
Ref document number: 1581158
Country of ref document: AT
Kind code of ref document: T
Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: NL
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: LV
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: LT
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: HR
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: GR
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230922 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20231021 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: SK
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: RO
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: PT
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20231023

Ref country code: IS
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20231021

Ref country code: ES
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: EE
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: CZ
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621

Ref country code: AT
Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT
Effective date: 20230621 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
REG | Reference to a national code |
Ref country code: DE
Ref legal event code: R097
Ref document number: 602021002983
Country of ref document: DE |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE
Payment date: 20240130
Year of fee payment: 4 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230621 |
|
26N | No opposition filed |
Effective date: 20240322 |