CN113885328A - Nuclear power tracking control method based on integral reinforcement learning - Google Patents
Nuclear power tracking control method based on integral reinforcement learning
- Publication number: CN113885328A (application CN202111212559.7A)
- Authority
- CN
- China
- Prior art keywords
- nuclear power
- evaluation network
- iteration
- strategy
- tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Abstract
The invention discloses a nuclear power tracking control method based on integral reinforcement learning, which comprises the following steps: selecting an initial strategy, initializing relevant parameters, and selecting an initial power point and an expected power point; starting global iteration and then local iteration, training an evaluation network by a policy-iteration integral reinforcement learning algorithm and correcting the network weights, wherein the evaluation network is used to approximate a tracking error performance index function; evaluating the performance of the current tracking error control system by means of the evaluation network weights, selecting an optimal control strategy through the execution process, and minimizing the total cost of one global iteration; judging whether the current local iteration is finished, and if not, returning to the local iteration, otherwise updating the iterative performance index function and the tracking control law to obtain an optimal tracking control strategy; completing the global strategy iteration to obtain the optimal tracking control strategy, tracking to the expected power point, and calculating the total cost. The invention can thus continuously learn and adjust the current strategy so as to track to the expected power point.
Description
Technical Field
The embodiment of the invention relates to the technical field of power control of nuclear power units, in particular to a nuclear power tracking control method based on integral reinforcement learning.
Background
In recent years, the greenhouse effect and air pollution caused by coal-fired power generation have become increasingly serious, and coal reserves are declining year by year. Nuclear energy, as a clean energy source, has the advantages of no pollution and low transportation cost; it has attracted wide attention from many countries and has been applied and popularized in the power generation industry. The safety of nuclear power systems is also a constant concern in many fields, so the problem of power regulation has become a focus. A stable, safe and efficient power control method for nuclear power units is particularly important for the whole nuclear power industry.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a nuclear power tracking control method based on integral reinforcement learning, which at least partially solves the above problems.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a nuclear power tracking control method based on integral reinforcement learning, the method comprising:
S1: selecting an initial strategy, initializing relevant parameters, and selecting an initial power point and an expected power point;
S2: performing global iteration, and updating an iterative tracking error performance index function according to an iterative control sequence to obtain an optimal tracking error performance index function;
S3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal error control strategy by using the optimal tracking error performance index function;
S4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the control law to obtain the optimal tracking error performance index function;
S5: completing the global strategy iteration to obtain an optimal tracking control strategy, tracking to an expected power point, and calculating the total cost.
Compared with the prior art, the technical scheme at least has the following beneficial effects:
the embodiment of the invention constructs a self-learning power tracking controller based on the adaptive dynamic programming algorithm through neural networks; it can continuously learn, adjust, and adapt to different nuclear power states through real-time operation, and can track the operating points of different nuclear power units.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, illustrate embodiments of the invention, and together with the description serve to explain the invention without unduly limiting it. It is obvious that the drawings in the following description are only some embodiments, and that a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of a nuclear power system model shown in accordance with an exemplary embodiment;
fig. 2 is a flowchart illustrating a nuclear power generating unit power tracking control method based on integral reinforcement learning according to an exemplary embodiment.
Detailed Description
In order to more clearly illustrate the objects, technical solutions and advantages of the present invention, the present invention is further described in detail below with reference to the accompanying drawings in combination with specific examples.
Adaptive dynamic programming, proposed by Paul J. Werbos, has developed rapidly since the 1980s. It is mainly used to overcome the "curse of dimensionality" in dynamic programming, and its concrete approach is to solve the problem through repeated iterative optimization. In recent years, adaptive dynamic programming algorithms have shown great advantages in solving optimal control problems. The adaptive dynamic programming method generally uses an actor-critic structure and neural networks to approximate the tracking error performance index function and the control strategy, gradually approaches the analytic solution of the equation by an iterative method, and finally converges to the optimal tracking error performance index function and the optimal tracking control strategy.
The adaptive dynamic programming method uses a function approximation structure (such as a neural network) to approximate the tracking error performance index function and the control strategy in the dynamic programming equation so as to satisfy the principle of optimality, thereby obtaining the optimal error control and the optimal tracking error performance index function of the system. The adaptive dynamic programming structure mainly comprises a dynamic system, an execution (actor) network, and an evaluation (critic) network. The evaluation network approximates the optimal cost function and provides an evaluation signal that guides the execution network to generate optimal control. After the output of the execution network acts on the dynamic system, the rewards/punishments generated at different stages of the dynamic system influence the evaluation network, which in turn guides the execution network to update its control strategy, so that the total cost (i.e., the sum of the rewards/punishments) reaches the optimum.
The integral reinforcement learning adaptive dynamic programming method does not depend on a system model: the weights of the controller and evaluator neural networks are adjusted based on the system states generated in real time and the corresponding control actions. The integral reinforcement learning adaptive dynamic programming method can therefore run online, and the controller and evaluator neural networks finally converge iteratively to the optimal control strategy and the optimal tracking error performance index function. The method is particularly suitable for solving optimal control problems online for linear or nonlinear continuous systems.
FIG. 1 is a schematic diagram of a nuclear power system to which an embodiment of the present invention is applied, schematically illustrating the reaction heat transfer model of the nuclear power system. The nuclear power system consists of a reactor and two cooling loops, where Q denotes heat transfer only and has no other meaning for the nuclear power system model. The nuclear power system comprises five system states: the power percentage represents the percentage of generated power of the system (the full-load generated power is 2500 MW); the delayed neutron concentration represents the relative concentration of delayed neutrons in the reactor vessel of the nuclear power system; the reactor core temperature is the average temperature of the reactor core (also denoted T_f); the coolant output temperature represents the average temperature of the coolant inside the nuclear power system; and the reactivity represents the reactivity change of the nuclear power system caused by the up-and-down movement of the control rod. The system uses only the movement speed of the control rod as the control signal: when the control rod moves up or down at a certain speed, the reaction inside the reactor core changes accordingly. The faster the control rod moves upward, the more intense the reaction becomes; moving the control rod downward has the opposite effect.
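To make the five-state structure concrete, the following is a one-delayed-group, point-kinetics-style Python sketch with the control rod speed as the only input. Every numeric parameter below is an assumed placeholder for illustration; none of these values is disclosed by the invention.

```python
import numpy as np

# One-delayed-group point-kinetics-style sketch of the five-state model above.
# State x = [power fraction, delayed neutron concentration, core temperature,
#            coolant outlet temperature, control rod reactivity].
# Input z_r = control rod speed. All parameter values below are placeholders.
BETA, GEN_TIME, LAM = 0.0065, 1e-4, 0.08  # delayed fraction, generation time, decay const.
A_F, A_C = -5e-5, -2e-5                   # fuel/coolant temperature feedback (assumed)
G_R = 1e-3                                # reactivity worth per unit rod speed (assumed)
TF0, TC0 = 300.0, 290.0                   # reference temperatures (assumed)

def plant_step(x, z_r, dt=0.01):
    """Advance the five-state model one Euler step under rod speed z_r."""
    n_r, c_r, T_f, T_c, rho = x
    rho_net = rho + A_F * (T_f - TF0) + A_C * (T_c - TC0)  # feedback reactivity
    dn = (rho_net - BETA) / GEN_TIME * n_r + LAM * c_r
    dc = BETA / GEN_TIME * n_r - LAM * c_r
    dTf = 10.0 * n_r - 0.1 * (T_f - T_c)                   # placeholder heat balance
    dTc = 0.1 * (T_f - T_c) - 0.05 * (T_c - TC0)           # placeholder coolant balance
    drho = G_R * z_r                                       # rod speed drives reactivity
    return np.asarray(x, dtype=float) + dt * np.array([dn, dc, dTf, dTc, drho])
```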
As shown in fig. 2, an embodiment of the present invention provides a method for tracking and controlling the power of a nuclear power system based on integral reinforcement learning, where the method may include steps S1 to S5.
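Before detailing the individual steps, the following self-contained sketch shows how steps S1 to S5 nest as loops. Everything in it, including the quadratic critic features, the learning rate, the sampling scheme, and the candidate control set, is an illustrative assumption made for orientation, not the patent's implementation.

```python
import numpy as np

# Illustrative skeleton of the S1-S5 loop structure. All numeric values and
# the quadratic critic features are assumptions made for this sketch.
def utility(xe, ue, alpha=1.0, beta=0.1):
    """Quadratic utility of tracking error xe and error control ue."""
    return alpha * float(xe @ xe) + beta * float(ue) ** 2

def track_power(step, x0, xd, controls, n_global=10, n_local=30):
    """step(x, u) -> next state; controls: finite candidate control set (S1)."""
    rng = np.random.default_rng(0)
    w = np.zeros(len(x0))                                # S1: critic weights
    J = lambda xe: float(w @ (xe * xe))                  # critic with quadratic features
    for _ in range(n_global):                            # S2: global iteration
        for _ in range(n_local):                         # S3: local critic training
            xe = rng.normal(scale=0.1, size=len(x0))     # sampled tracking error
            u = min(controls, key=lambda c: J(step(xd + xe, c) - xd))
            target = utility(xe, u) + J(step(xd + xe, u) - xd)
            w += 0.01 * (target - J(xe)) * (xe * xe)     # S4: correct critic weights
    x, total = np.asarray(x0, dtype=float), 0.0          # S5: run learned strategy
    for _ in range(100):
        u = min(controls, key=lambda c: J(step(x, c) - xd))
        total += utility(x - xd, u)
        x = step(x, u)
    return x, total
```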
S1: the initialization parameters include: nuclear power system parameters, evaluation network parameters, global iteration duration, integral time constant, local iteration duration, convergence accuracy and target parameters; the nuclear power system parameters are nuclear power model system parameters, and the model comprises five system input and output states.
The nuclear power system model mainly comprises the neutron reaction equations inside the reactor core, two temperature feedback models of the reactor, and the reactivity equation of the control rod. In studies of reactor characteristics, control-rod-based control methods are often used: the control rod has a very strong neutron absorbing capacity, its movement speed is easily controlled, it is convenient to operate, and it acts on the reactivity with high accuracy. Its influence on reactivity can be exerted in two ways: a change in position and a change in velocity.
In addition, an initial power operating point and a desired power operating point need to be selected, and an initial stabilizing control strategy is determined. The following parameters are also initialized: the global training time-step, the local iteration time-step, the neural network structure (such as the number of input nodes, hidden nodes, and output layer nodes), and the neural network weights.
Illustratively, the structure of the evaluation network is set to 5-15-1, where 5 is the number of input nodes, 15 is the number of hidden nodes, and 1 is the number of output nodes of the evaluation network; the number of hidden nodes can be adjusted empirically to obtain the best approximation effect, and the convergence accuracy is defined as 1.0 × 10^(-2).
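A minimal sketch of such a 5-15-1 evaluation network follows; the tanh hidden activation and the random weight initialization are assumptions, since the text does not specify them.

```python
import numpy as np

# Minimal 5-15-1 evaluation (critic) network sketch. The 5-15-1 sizes follow
# the text above; the tanh activation and the initialization are assumptions.
class CriticNet:
    def __init__(self, n_in=5, n_hidden=15, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

    def hidden(self, x):
        """Hidden-layer activations for a 5-vector input."""
        return np.tanh(self.W1 @ x)

    def forward(self, x):
        """x: 5-vector of error states -> scalar estimate of J_e."""
        return float(self.W2 @ self.hidden(x))

critic = CriticNet()
print(critic.forward(np.zeros(5)))  # J_e estimate at zero tracking error
```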
In the execution stage, the embodiment of the invention uses a simplified finite-dimensional control variable, i.e., a finite, predetermined set of nuclear power operating points is set for tracking.
In practical application, the selection of the initial working condition point and the expected working condition point can be set according to actual requirements, wherein the power model and parameter setting of the nuclear power unit also need to have practical significance.
S2: when global training is carried out, updating an iterative tracking error performance index function according to an iterative control sequence so as to obtain an optimal tracking error performance index function;
specifically, according to the requirement of the integral reinforcement learning method of the controller, weight initialization training work needs to be performed on the evaluation network.
Training the evaluation network by the integral reinforcement learning algorithm: the input values of the evaluation network include the five states x(t) of the nuclear power unit operating point, the five states x_d(t) of the desired operating point, and the nuclear power unit tracking error control strategy u_e(t); the output value is the tracking error performance index function J_e(t), abbreviated below as the J function. The optimal tracking error control strategy u_e(t) is approximated from the tracking error performance index function obtained by the evaluation network.
The weight initialization of the evaluation network is performed within the global iteration. Preferably, the weight value can be initialized again when global iteration starts each time, so that the convergence of the evaluation network is better ensured on the basis of ensuring the stability and the convergence speed of the evaluation network, and an optimal tracking control strategy of the power of the nuclear power system can be found as soon as possible.
In the execution stage, the input data of the evaluation network are the difference x_e(t) between the five state outputs x(t) of the nuclear power unit and the desired power point x_d(t), together with the optimal tracking error control strategy u_e(t) obtained from the trained evaluation network; the output data of the evaluation network is the tracking error performance index function J_e(t).
According to the Bellman equation, the output J_e(t+T) of the evaluation network at the next integration instant and the utility function U(t) are used to compute the output data J_e(t) at the current moment, according to the following formula:
J_e(t) = ∫_t^(t+T) U(x_e(τ), u_e(τ)) dτ + J_e(t+T)
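Numerically, the target above can be assembled as in the following sketch, which assumes trapezoidal integration of the sampled utility over [t, t+T]; that integration rule is an implementation choice, not something the text prescribes.

```python
import numpy as np

# Sketch of the integral Bellman target J_e(t) = ∫_t^{t+T} U dτ + J_e(t+T).
# Trapezoidal integration of the sampled utility is an assumption of this sketch.
def bellman_target(utility_samples, dt, J_next):
    """utility_samples: U(τ) sampled every dt over [t, t+T]; J_next: critic at t+T."""
    u = np.asarray(utility_samples, dtype=float)
    integral = dt * (0.5 * u[0] + u[1:-1].sum() + 0.5 * u[-1])
    return integral + J_next

# e.g. T = 1 split into 10 substeps of dt = 0.1:
print(bellman_target([0.50, 0.45, 0.41, 0.37, 0.34, 0.31,
                      0.29, 0.27, 0.25, 0.24, 0.23], 0.1, 2.0))
```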
The following example describes in detail the process of obtaining the optimal tracking error performance index function.
At time t, let x(t) be the five input-output states of the nuclear power unit, x_d(t) the desired power point, x_e(t) the system tracking error, and u_e(t) the tracking error control strategy. The error control system can be defined as:
x_e(t+1) = f(x(t) - x_d(t), u_e(t), t)
wherein f can be derived from a nuclear power unit power model. The utility function is defined as follows:
U(t) = α[x_e(t)]^2 + β[u_e(t)]^2
wherein α and β are constants, and u_e(t) is the difference between the control law of the nuclear power unit at the current time and the desired operating control law. The utility function U(t) represents the sum, at time t, of the utility of the deviation between the current and desired operating points of the nuclear power unit and the utility of the control rod control law.
We give a new form of the utility function:
U(x_e(t), u_e(t)) = x_e^T(t) Q x_e(t) + u_e^T(t) R u_e(t)
wherein Q and R are positive definite matrices. Our global tracking error performance index function can then be defined as:
J_e(x_e(t)) = ∫_t^∞ U(x_e(τ), u_e(τ)) dτ
The Hamiltonian equation can be derived as follows:
H(x_e, u_e, ∇J_e) = U(x_e(t), u_e(t)) + (∇J_e)^T ẋ_e(t), with ∇J_e = ∂J_e/∂x_e
For error dynamics affine in the control, ẋ_e = f(x_e) + g(x_e) u_e, the optimal tracking error control law can be expressed as:
u_e*(t) = -(1/2) R^(-1) g^T(x_e(t)) ∂J_e*/∂x_e
Where i = 0, 1, 2, …, the error tracking control law can be obtained by the following equation:
u_e^(i+1)(t) = -(1/2) R^(-1) g^T(x_e(t)) ∂J_e^(i)/∂x_e
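The quadratic utility and the iteration-indexed improvement step can be sketched as follows. The values of Q, R, the input channel g, and the finite-difference gradient of the critic are all illustrative assumptions, not values fixed by the invention.

```python
import numpy as np

# Sketch of the quadratic utility U = x_e^T Q x_e + u_e^T R u_e and of one
# improvement step u^(i+1) = -(1/2) R^(-1) g^T dJ/dx_e. Q, R, g are assumed.
Q = np.eye(5)
R = np.array([[0.1]])
g = np.array([[0.0], [0.0], [0.0], [0.0], [1.0]])  # assumed: rod speed enters via reactivity

def utility(xe, ue):
    return float(xe @ Q @ xe + ue @ R @ ue)

def grad_J(J, xe, eps=1e-5):
    """Central finite-difference gradient of the critic J at xe (an implementation choice)."""
    grad = np.zeros_like(xe)
    for k in range(len(xe)):
        d = np.zeros_like(xe)
        d[k] = eps
        grad[k] = (J(xe + d) - J(xe - d)) / (2 * eps)
    return grad

def improve_policy(J, xe):
    """One policy improvement step u_e^(i+1)(x_e)."""
    return -0.5 * np.linalg.inv(R) @ g.T @ grad_J(J, xe)

# usage with a toy quadratic critic J(x_e) = x_e^T x_e:
J = lambda xe: float(xe @ xe)
xe = np.array([0.1, 0.0, 0.0, 0.0, 0.05])
print(improve_policy(J, xe), utility(xe, improve_policy(J, xe)))
```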
S3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal error control strategy by using the optimal tracking error performance index function;
Under the condition of a given initial stable control strategy, set the initial control law u_e^0 = 0, let the integration duration T = 1, and select 30 steps as the local training iteration length.
The tracking error performance index function update rule is as follows:
J_e^(i)(x_e(t)) = ∫_t^(t+T) U(x_e(τ), u_e^(i)(τ)) dτ + J_e^(i)(x_e(t+T))
The optimal error control law update rule is as follows:
u_e^(i+1)(t) = -(1/2) R^(-1) g^T(x_e(t)) ∂J_e^(i)/∂x_e
Then, the weight of the evaluation network is updated to approximate the optimal tracking error performance index function.
Wherein, the updating rule is as follows:
W_CL = -(X^T X)^(-1) (X^T Y)
wherein X is the matrix of inner-product differences of the evaluation network over each integration interval, Y is the vector of utility function values approximated by the network, and W_CL is the weight of the evaluation network.
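A sketch of this batch least-squares correction of the evaluation network weights follows, keeping the sign convention stated above; the ridge term is a numerical safeguard added here for invertibility and is not part of the stated rule.

```python
import numpy as np

# Batch least-squares update of the critic weights, following the rule
# W_CL = -(X^T X)^{-1} (X^T Y) as stated above. X stacks one row per sample
# (e.g. hidden-activation differences between t and t+T); Y stacks the
# corresponding integrated utility values. The sign convention follows the text.
def update_critic_weights(X, Y, ridge=1e-6):
    """X: (n_samples, n_hidden), Y: (n_samples, 1). ridge is an added safeguard."""
    XtX = X.T @ X + ridge * np.eye(X.shape[1])  # regularize for invertibility (assumed)
    return -np.linalg.solve(XtX, X.T @ Y)

# usage with dummy data: 30 local steps, 15 hidden nodes
X = np.random.default_rng(1).normal(size=(30, 15))
Y = np.random.default_rng(2).normal(size=(30, 1))
W_CL = update_critic_weights(X, Y)
print(W_CL.shape)  # (15, 1)
```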
Since the error control strategy and the tracking error performance index function change with the weights of the controller and evaluator neural networks, adjusting those weights amounts to updating the error control strategy and the tracking error performance index function. In the execution stage, the finite control variables are substituted into the optimal tracking error performance index function approximated by the evaluation network.
The optimal error control strategy is obtained approximately from the tracking error performance index function given by the evaluation network, and the control variable that minimizes the optimal tracking error performance index function is selected as the optimal tracking error control strategy:
u_e*(t) = arg min_{u_e} J_e(x_e(t), u_e), with u_e ranging over the finite control set.
the evaluation network is used for approximating an optimal tracking error performance index function, evaluating the performance of the nuclear power control rod system by using the evaluation network weight, and selecting an optimal tracking control strategy through an execution flow to minimize the total tracking error cost of global training.
S4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the error control law to obtain the optimal tracking error performance index function;
specifically, after local iteration is completed, whether the current iteration number reaches an iteration threshold value is determined, and if yes, an iterative tracking error performance index function and an error control law are updated to obtain an optimal tracking error performance index function and an optimal error control strategy.
If not, go to step S3; otherwise, step S5 is executed.
S5: completing the global strategy iteration to obtain the optimal tracking error control strategy, tracking to the desired power point, and calculating the total cost (tracking error and control rod control cost).
Calculation of the total cost requires substituting the optimal tracking error control strategy into the actual model; since the utility function U(x_e, u_e) depends on the actual model, the total cost can be approximated by the resulting optimal tracking error performance index function J_e*.
Although the steps in this embodiment are described in the foregoing sequence, those skilled in the art will understand that, in order to achieve the effect of this embodiment, the different steps need not be executed in such a sequence, and may be executed simultaneously (in parallel) or in an inverted sequence, and these simple changes are all within the protection scope of the present invention. The technical solutions provided by the embodiments of the present invention are described in detail above. Although specific examples have been employed herein to illustrate the principles and practice of the invention, the foregoing descriptions of embodiments are merely provided to assist in understanding the principles of embodiments of the invention; also, it will be apparent to those skilled in the art that variations may be made in the embodiments and applications of the invention without departing from the spirit and scope of the invention.
It should be noted that the flowcharts mentioned herein are not limited to the forms shown herein, and may be divided and/or combined.
It should be noted that: the numerals and text in the figures are only used to illustrate the invention more clearly and are not to be considered as an undue limitation of the scope of the invention.
The present invention is not limited to the above-described embodiments, and any variations, modifications, or alterations that may occur to one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.
Claims (8)
1. A nuclear power system power tracking control method based on integral reinforcement learning, characterized by comprising the following steps:
S1: selecting an initial strategy, initializing relevant parameters, and selecting an initial power point and an expected power point;
S2: performing global iteration, and updating an iterative tracking error performance index function according to an iterative control sequence to obtain an optimal tracking error performance index function;
S3: performing local iteration, training an evaluation network by using an integral reinforcement learning algorithm, correcting the weight of the evaluation network, and obtaining an optimal tracking control strategy by using the optimal tracking error performance index function;
S4: judging whether the current local iteration is finished, if not, returning to the local iteration step, otherwise, updating the iterative tracking error performance index function and the tracking control law to obtain the optimal tracking error performance index function;
S5: completing the global strategy iteration to obtain an optimal tracking control strategy, tracking to an expected power point, and calculating the total cost.
2. The method according to claim 1, wherein in the step S1, the initialization parameters comprise: nuclear power system parameters, evaluation network parameters, global iteration duration, integral time constant, local iteration duration, convergence accuracy and target parameters; the nuclear power system parameters are nuclear power model system parameters, and the model comprises five system input and output states.
3. The method according to claim 2, wherein the structure of the evaluation network is set to 5-15-1 and the convergence accuracy is defined as 1.0 × 10^(-2), wherein 5 is the number of input nodes of the evaluation network, 15 is the number of hidden nodes of the evaluation network, and 1 is the number of output nodes of the evaluation network.
4. The method of claim 1, wherein the step S1 further comprises selecting an initial control strategy, wherein the error control strategy is obtained by a conventional PID or MPC strategy, so as to obtain an initial stable control law.
5. The method of claim 1, wherein in step S3, the input data of the evaluation network comprise the 5 operating states x(t) of the nuclear power unit, the tracking error value x_e(t) between x(t) and the desired power operating point x_d(t), and the tracking control strategy u_e(t) of the nuclear power control rods; the output data of the evaluation network comprise the tracking error performance index function J_e(t);
according to the Bellman equation, the output J_e(t+T) of the evaluation network at the next integration instant and the utility function U(t) are used to calculate the output data J_e(t) at the current moment by the following formula:
J_e(t) = ∫_t^(t+T) U(x_e(τ), u_e(τ)) dτ + J_e(t+T)
wherein x_e(t) is the tracking error value between the 5 operating states x(t) of the nuclear power unit and the desired power operating point x_d(t); the utility function U(t) represents the sum, at time t, of the utilities of the tracking error value x_e(t) and of the tracking control strategy u_e(t) of the nuclear power control rods.
6. The method of claim 5, wherein the utility function U(t) is calculated by:
U(t) = α[x_e(t)]^2 + β[u_e(t)]^2
wherein α and β are constants, and u_e(t) is the difference between the control law of the nuclear power unit at the current time and the desired operating control law.
7. The method of claim 1, wherein in the step S3, the input data of the execution phase of the evaluation network includes relative power coefficient of the nuclear power plant to be controlled, relative concentration of delayed neutrons, average temperature of the reactor core, average temperature of coolant, and reactivity of control rods; the output data of the execution stage of the evaluation network comprises an optimal tracking control strategy; and the optimal tracking control strategy is obtained approximately according to a tracking error performance index function obtained by the evaluation network.
8. The method according to claim 1, wherein in the step S3, the update rule of the evaluation network is as follows:
W_CL = -(X^T X)^(-1) (X^T Y)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111212559.7A CN113885328A (en) | 2021-10-18 | 2021-10-18 | Nuclear power tracking control method based on integral reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113885328A true CN113885328A (en) | 2022-01-04 |
Family
ID=79003527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111212559.7A Pending CN113885328A (en) | 2021-10-18 | 2021-10-18 | Nuclear power tracking control method based on integral reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113885328A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103217899A (en) * | 2013-01-30 | 2013-07-24 | 中国科学院自动化研究所 | Q-function self-adaptation dynamic planning method based on data |
CN104022503A (en) * | 2014-06-18 | 2014-09-03 | 中国科学院自动化研究所 | Electric-energy optimal control method for intelligent micro-grid with energy storage device |
CN105843037A (en) * | 2016-04-11 | 2016-08-10 | 中国科学院自动化研究所 | Q-learning based control method for temperatures of smart buildings |
US20190384237A1 (en) * | 2018-06-13 | 2019-12-19 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for Data-Driven Output Feedback Control |
CN111650830A (en) * | 2020-05-20 | 2020-09-11 | 天津大学 | Four-rotor aircraft robust tracking control method based on iterative learning |
CN111679577A (en) * | 2020-05-27 | 2020-09-18 | 北京交通大学 | Speed tracking control method and automatic driving control system of high-speed train |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114967445A (en) * | 2022-04-29 | 2022-08-30 | 中国科学院自动化研究所 | Nuclear power system control method and device |
CN114880942A (en) * | 2022-05-23 | 2022-08-09 | 西安交通大学 | Nuclear reactor power and axial power distribution reinforcement learning decoupling control method |
CN114880942B (en) * | 2022-05-23 | 2024-03-12 | 西安交通大学 | Nuclear reactor power and axial power distribution reinforcement learning decoupling control method |
CN117075588A (en) * | 2023-10-18 | 2023-11-17 | 北京网藤科技有限公司 | Safety prediction fitting method and system for industrial automation control behaviors |
CN117075588B (en) * | 2023-10-18 | 2024-01-23 | 北京网藤科技有限公司 | Safety prediction fitting method and system for industrial automation control behaviors |
CN118494790A (en) * | 2024-07-15 | 2024-08-16 | 北京易动宇航科技有限公司 | Ammonia working medium thruster thrust stability control method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113885328A (en) | Nuclear power tracking control method based on integral reinforcement learning | |
CN113868961A (en) | Power tracking control method based on adaptive value iteration nuclear power system | |
CN109901403A (en) | A kind of face autonomous underwater robot neural network S control method | |
CN115933410B (en) | Optimal tracking control method for double-time-scale coal-fired power generation system based on Q learning | |
CN104991444B (en) | Non-linearity PID self-adaptation control method based on Nonlinear Tracking Differentiator | |
Taeib et al. | Tuning optimal PID controller | |
Gouadria et al. | Comparison between self-tuning fuzzy PID and classic PID controllers for greenhouse system | |
CN117970782B (en) | Fuzzy PID control method based on fish scale evolution GSOM improvement | |
CN116755409A (en) | Coal-fired power generation system coordination control method based on value distribution DDPG algorithm | |
CN112615364A (en) | Novel wide-area intelligent cooperative control method for power grid stability control device | |
Kostadinov et al. | Online weight-adaptive nonlinear model predictive control | |
Chidrawar et al. | Generalized predictive control and neural generalized predictive control | |
CN114722693A (en) | Optimization method of two-type fuzzy control parameter of water turbine regulating system | |
Yu et al. | A Knowledge-based reinforcement learning control approach using deep Q network for cooling tower in HVAC systems | |
CN117389132A (en) | Heating system multi-loop PID intelligent setting system based on cloud edge end cooperation | |
CN116594288A (en) | Control method and system based on longhorn beetle whisker fuzzy PID | |
CN115327890B (en) | Method for optimizing main steam pressure of PID control thermal power depth peak shaving unit by improved crowd searching algorithm | |
CN116880191A (en) | Intelligent control method of process industrial production system based on time sequence prediction | |
Lee et al. | Value function-based approach to the scheduling of multiple controllers | |
Berger et al. | Neurodynamic programming approach for the PID controller adaptation | |
El Aoud et al. | Intelligent control for a greenhouse climate | |
CN118192249B (en) | Boiler turbine system load control method based on experience-oriented Q learning | |
CN118494790B (en) | Ammonia working medium thruster thrust stability control method and system | |
Liu | On a method of single neural PID feedback compensation control | |
Yordanova | Robust stability of single input fuzzy system for control of industrial plants with time delay |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |