CN116540543A - Multi-target control optimization method and device for nuclear steam supply system

Info

Publication number: CN116540543A
Application number: CN202310558207.XA
Authority: CN
Inventors: 张天昊; 黄晓津; 董哲
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2023-05-17
Filing date: 2023-05-17
Publication date: 2023-08-04

Abstract

The invention discloses a multi-objective control optimization method and device for a nuclear steam supply system. The method comprises the following steps: determining input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in a nuclear steam supply system; determining a reward function based on the plurality of control targets to be optimized and the input parameters and output parameters of the optimizer; loading a preset deep learning network model and the reward function into the optimizer, and configuring an event triggering mechanism for the optimizer; when execution of the optimizer is triggered, controlling the optimizer to perform multiple iterations to obtain a multi-objective control optimization strategy, the multi-objective control optimization strategy being a strategy for attaining a preset cumulative reward function value; and controlling the optimizer to perform a control optimization operation based on the multi-objective control optimization strategy. The method can optimize the performance of a plurality of control targets of the nuclear steam supply system while ensuring the operating stability and robustness of the nuclear steam supply system.

Description

Multi-target control optimization method and device for nuclear steam supply system

Technical Field

The invention relates to the technical field of automatic control of nuclear power stations, in particular to a multi-target control optimization method and device of a nuclear steam supply system.

Background

With the global demand for energy resources and carbon emission reduction, nuclear fission energy is receiving more and more attention because it is clean and has a high energy density. The nuclear steam supply system (Nuclear Steam Supply System, NSSS) is the system in a nuclear power plant that generates steam using nuclear fission energy, and mainly consists of a fission reactor and a steam generator. A typical nuclear steam supply system transfers the heat generated by the fission reactor through the steam generator to the secondary loop, producing a steam flow that drives a thermal load device for power generation or cogeneration.

Control of existing nuclear steam supply systems is almost always realized with traditional control methods, such as state feedback control and sliding mode control. However, because of the complex coupling between the fission reactor and the steam generator, human intervention by operators is often required, which increases the operators' task load. To reduce human intervention, it is desirable to optimize the control techniques of the nuclear steam supply system.

Current control optimization techniques for nuclear steam supply systems rely on predictions of the system dynamics model. However, due to the characteristics of nonlinearity and strong coupling of the nuclear steam supply system, the model prediction result is poor in accuracy, so that the optimization performance and the system operation stability are difficult to ensure, and the process of optimizing the control technology cannot realize simultaneous optimization of a plurality of control targets.

Therefore, there is a need for a multi-objective control optimization method for a nuclear steam supply system that is independent of the dynamic model of the system.

Disclosure of Invention

In view of the above, the embodiment of the invention provides a multi-objective control optimization method and device for a nuclear steam supply system, so as to solve the problem of poor control optimization technical effect of the current nuclear steam supply system.

According to a first aspect, an embodiment of the present invention provides a multi-objective control optimization method for a nuclear steam supply system, the method including:

determining input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in a nuclear steam supply system; the optimizer is an optimization program for solving the problem of multi-objective control optimization;

determining a reward function based on a plurality of control targets to be optimized, input parameters and output parameters of the optimizer;

loading a preset deep learning network model and the reward function into the optimizer, and configuring an event triggering mechanism for the optimizer; the event triggering mechanism is a mechanism for triggering the optimizer to execute according to a predetermined operation state of the nuclear steam supply system;

when execution of the optimizer is triggered, controlling the optimizer to perform multiple iterations so as to obtain a multi-objective control optimization strategy; the multi-objective control optimization strategy is a strategy for attaining a preset cumulative reward function value;

and controlling the optimizer to execute a control optimization operation based on the multi-objective control optimization strategy so as to optimize the plurality of control targets.

In some embodiments, after determining the reward functions corresponding to the plurality of control targets based on the input parameters and the output parameters of the optimizer, the method further includes:

modeling the multi-objective control optimization problem as a Markov decision process, and defining a five-tuple consisting of an input state space, an action space, a state transition function, a discount factor and an initial state distribution; wherein the input state space is a parameter set determined based on the input parameters; the action space is a parameter set determined based on the output parameters; the state transition function is a function for determining the state of the current time step according to the state of the previous time step and the selected action; the discount factor is used to characterize the importance of the reward value obtained at each time step in the cumulative reward;

determining a deep reinforcement learning algorithm based on the Markov decision process;

and determining the preset deep learning network model based on a deep reinforcement learning algorithm.

In some embodiments, the predetermined deep learning network model includes a first network and a second network; the controlling the optimizer to perform a plurality of iterations to obtain a multi-objective control optimization strategy includes:

initializing network parameters of the first network and network parameters of the second network;

initializing the state space to obtain a current state;

executing a first processing procedure for the current time step; the first processing procedure includes: selecting an action in the action space based on the current state according to the strategy output by the first network, executing the state transition function to obtain the state of the next time step, and acquiring the reward value of the current time step according to the reward function;

accumulating the reward value of each time step based on the discount factor, and determining, by the second network, whether the current cumulative reward value is less than a preset cumulative reward function value;

in the case that the cumulative reward value is less than the preset cumulative reward function value, respectively updating the network parameters of the first network and the second network based on the reward value of the current time step to obtain an updated preset deep learning network model, and re-executing the first processing procedure based on the updated preset deep learning network model in the next time step;

and calling the preset deep learning network model to generate the multi-objective control optimization strategy in the case that the cumulative reward value is greater than or equal to the preset cumulative reward function value.

In some embodiments, after loading the preset deep learning network model and the reward function into the optimizer and configuring an event trigger mechanism for the optimizer, the method further includes:

determining the running state of the nuclear steam supply system according to a preset performance evaluation index at each time step;

controlling the nuclear steam supply system to send out an operation early warning prompt under the condition that the operation state is a first state;

controlling the start of execution of the optimizer in the case that the running state is the second state;

and waiting for entering the next time step in the case that the running state is the third state.
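The three-way dispatch above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say here how the performance evaluation index maps to the first, second and third states, so the threshold rule in `classify_state` (and the names `OperatingState`, `classify_state`, `on_time_step`) are hypothetical.

```python
from enum import Enum


class OperatingState(Enum):
    FIRST = 1   # issue an operation early-warning prompt
    SECOND = 2  # start execution of the optimizer
    THIRD = 3   # wait for the next time step


def classify_state(abs_deviation: float, steady_tol: float, transient_tol: float) -> OperatingState:
    """Hypothetical performance index: compare |deviation| with two tolerances."""
    if abs_deviation > transient_tol:
        return OperatingState.FIRST
    if abs_deviation > steady_tol:
        return OperatingState.SECOND
    return OperatingState.THIRD


def on_time_step(state: OperatingState) -> str:
    """Return the action the method associates with each operating state."""
    if state is OperatingState.FIRST:
        return "warn"
    if state is OperatingState.SECOND:
        return "run_optimizer"
    return "wait"
```

Keeping the classification separate from the dispatch means a different performance evaluation index can be swapped in without touching the state-to-action mapping, which is the only part taken directly from the text.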

In some embodiments, the input parameters include a deviation of an actual value corresponding to the control target from a reference value, a system parameter of the nuclear steam supply system, and an operating parameter; the output parameter is a preset correction amount corresponding to the control target.

In some embodiments, the step of determining a reward function based on the input parameters and the output parameters of the optimizer comprises:

aiming at each control target, obtaining a difference value of an input parameter corresponding to the control target and a steady-state maximum allowable error corresponding to the control target, and a ratio of an absolute value of the difference value to the steady-state maximum allowable error;

and determining the reward function based on the ratio corresponding to each control target.

In some embodiments, the deep reinforcement learning algorithm includes any one of a deep Q-network learning algorithm, a deep deterministic policy gradient algorithm, and an actor-critic algorithm.

According to a second aspect, an embodiment of the present invention provides a multi-objective control optimization apparatus for a nuclear steam supply system, the apparatus comprising:

the parameter determining module is used for determining input parameters and output parameters of the optimizer based on a plurality of control targets to be optimized in the nuclear steam supply system; the optimizer is an optimization program for solving the problem of multi-objective control optimization;

the function determining module is used for determining a reward function based on a plurality of control targets to be optimized, input parameters and output parameters of the optimizer;

the optimization configuration module is used for loading a preset deep learning network model and the reward function into the optimizer and configuring an event triggering mechanism for the optimizer; the event triggering mechanism is a mechanism for triggering the optimizer to execute according to a predetermined operation state of the nuclear steam supply system;

the iteration module is used for controlling the optimizer to perform multiple iterations when execution of the optimizer is triggered, so as to obtain a multi-objective control optimization strategy; the multi-objective control optimization strategy is a strategy for attaining a preset cumulative reward function value;

and the optimization execution module is used for controlling the optimizer to execute a control optimization operation based on the multi-objective control optimization strategy so as to optimize the plurality of control targets.

According to a third aspect, an embodiment of the present invention provides a computer device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the multi-objective control optimization method of a nuclear steam supply system as described in the first aspect.

According to a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the multi-objective control optimization method of a nuclear steam supply system as described in the first aspect.

The technical scheme of the invention has the following advantages:

The invention provides a multi-objective control optimization method and device for a nuclear steam supply system, wherein the method comprises the following steps: determining input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in a nuclear steam supply system, the optimizer being an optimization program for solving the multi-objective control optimization problem; determining a reward function based on the plurality of control targets to be optimized and the input parameters and output parameters of the optimizer; loading a preset deep learning network model and the reward function into the optimizer, and configuring an event triggering mechanism for the optimizer, the event triggering mechanism being a mechanism for triggering execution of the optimizer according to a predetermined operation state of the nuclear steam supply system; when execution of the optimizer is triggered, controlling the optimizer to perform multiple iterations to obtain a multi-objective control optimization strategy, the multi-objective control optimization strategy being a strategy for attaining a preset cumulative reward function value; and controlling the optimizer to perform a control optimization operation based on the multi-objective control optimization strategy. The method can optimize the performance of a plurality of control targets of the nuclear steam supply system while guaranteeing the operating stability and robustness of the nuclear steam supply system, thereby improving the control optimization effect and the operating efficiency of the nuclear steam supply system.

Drawings

To describe the embodiments of the invention or the prior-art solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention, and those skilled in the art can obtain other drawings from them without inventive effort:

Fig. 1 is a flowchart of a multi-objective control optimization method of a nuclear steam supply system according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a nuclear steam supply system according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of multi-objective control optimization of a nuclear steam supply system according to an embodiment of the present invention.

Fig. 4 is a flowchart of another multi-objective control optimization method for a nuclear steam supply system according to an embodiment of the present invention.

Fig. 5 is a flowchart of a method for obtaining a multi-objective control optimization strategy according to an embodiment of the present invention.

Fig. 6 is a flowchart of a multi-objective control optimization method of a nuclear steam supply system according to another embodiment of the present invention.

Fig. 7 is an exemplary diagram of the effect of a multi-objective control optimization method for a nuclear steam supply system according to an embodiment of the present invention.

Fig. 8 is a diagram illustrating an exemplary effect of a multi-objective control optimization method for a nuclear steam supply system according to another embodiment of the present invention.

Fig. 9 is a schematic structural diagram of a multi-objective control optimizing apparatus of a nuclear steam supply system according to an embodiment of the present invention.

Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

Fig. 1 is a flowchart of a multi-objective control optimization method of a nuclear steam supply system according to an embodiment of the present invention. As shown in fig. 1, the multi-objective control optimization method of the nuclear steam supply system includes steps S1 to S5.

Step S1, determining input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in the nuclear steam supply system.

The nuclear steam supply system (Nuclear Steam Supply System, NSSS) is the system in a nuclear power station that generates steam by utilizing nuclear fission energy, and mainly comprises a fission reactor and a steam generator.

The control targets are indicators of the nuclear steam supply system that need to be optimized, including but not limited to: thermal power, outlet steam temperature, outlet steam pressure, etc.

In some embodiments, among the control targets, the thermal power may be expressed as a fraction of reactor full power (RFP), with the transient maximum allowable error set to 5% and the steady-state maximum allowable error set to 0.5%. The outlet steam temperature may be expressed in degrees Celsius, with the transient maximum allowable error set to 2 ℃ and the steady-state maximum allowable error set to 0.2 ℃.

Fig. 2 is a schematic structural diagram of a nuclear steam supply system according to an embodiment of the present invention. As shown in fig. 2, the operator manually adjusts the load demand (load demand), which is fed into the nuclear steam supply system as the system input. The load demand characterizes the load of the nuclear steam supply system, that is, the ratio of the actual power generation of the nuclear steam supply system to its rated power generation. Using a static mapping table that takes thermodynamic and physical design principles into account, the system maps the load demand to controller reference values (reference values of controllers), such as the reference values of the neutron flux controller (neutron flux controller), the primary loop temperature controller (primary temperature controller), and the secondary loop temperature controller (secondary temperature controller). The controllers adjust actuators (actuators) such as control rods, pumps, blowers, and feedwater pumps according to the reference values, so that system outputs such as the thermal power and the outlet steam temperature (outlet steam temperature) reach their reference values and thereby meet the load demand. To ensure efficient and safe operation of the nuclear steam supply system, the thermal power of the fission reactor and the outlet steam temperature of the steam generator must be controlled precisely. Stable closed-loop control can be achieved at present; however, because the nuclear steam supply system is nonlinear and strongly coupled, the reference value trajectory provided by the static mapping table reflects only the reference steady-state values of the process variables, and the transient performance of the system control needs further improvement.

The optimizer is an optimization program for solving the multi-objective control optimization problem.

In the embodiment of the invention, in order to enable the optimizer to be directly applied to a nuclear steam supply system of a nuclear power plant, the input parameters of the optimizer should be directly obtained or inferred from the nuclear steam supply system of the nuclear power plant and include the operation parameters related to the control targets.

The input parameters of the optimizer include: the deviation of the actual value corresponding to each control target from its reference value, and the system parameters and operating parameters of the nuclear steam supply system. The output parameter of the optimizer is a preset correction amount corresponding to the control target.

In one embodiment, taking the case where the control targets include thermal power and outlet steam temperature as an example, the input parameters of the optimizer include the deviation $\Delta n$ of the actual thermal power value from its reference value and the deviation $\Delta T_s$ of the actual outlet steam temperature value from its reference value. Also taken as input parameters of the optimizer are the measured value of the reactor reactivity $\rho$ (a system parameter) and the associated flow $G$, temperature $T$, and pressure $P$ (operating parameters). In addition, to optimize the control performance of the nuclear steam supply system, the first-derivative correction of the load demand is selected as the output of the optimizer. Further, considering the safe operation index of the nuclear power plant, the output of the optimizer is limited to [-10%, 10%] FP/min.

Fig. 3 is a schematic diagram of multi-objective control optimization of a nuclear steam supply system according to an embodiment of the present invention. As shown in fig. 3, taking the case where the control targets include thermal power and outlet steam temperature as an example, the original load demand (Original load demand) and the related operational parameters (Related operational parameters) obtained from the nuclear steam supply system, such as the deviation $\Delta n$ of the actual thermal power value from its reference value, the deviation $\Delta T_s$ of the actual outlet steam temperature value from its reference value, the measured reactor reactivity $\rho$, and the associated flow $G$, temperature $T$, and pressure $P$, are input into the optimizer (Optimizer) to obtain the first-derivative correction of the original load demand (First derivative revision of the load) output by the optimizer. Adding the first-derivative correction to the original load demand yields the modified load demand (Modified load demand). The modified load demand is applied to the nuclear steam supply system to realize multi-objective control optimization of the nuclear steam supply system, for example of the thermal power (Thermal power) control process and the outlet steam temperature (Outlet steam temperature) control process.
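The output limit and the load-demand modification can be illustrated with a short sketch. It assumes, beyond what the text states, that the optimizer output is a rate in fractions of full power per minute and that it acts on the load demand over one time step of `dt_min` minutes; the function and constant names are hypothetical.

```python
RATE_LIMIT_FP_PER_MIN = 0.10  # the [-10%, 10%] FP/min output limit from the text


def clip(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def modified_load_demand(original_demand: float, rate_correction: float, dt_min: float) -> float:
    """Apply the clipped first-derivative correction to the original load demand.

    original_demand: load demand as a fraction of full power.
    rate_correction: optimizer output in FP/min (clipped before use).
    dt_min: time step in minutes (assumption: the correction acts over one step).
    """
    safe_rate = clip(rate_correction, -RATE_LIMIT_FP_PER_MIN, RATE_LIMIT_FP_PER_MIN)
    return original_demand + safe_rate * dt_min
```

For example, a requested correction of 0.25 FP/min is clipped to 0.10 FP/min, so a load demand of 0.8 becomes about 0.9 after a one-minute step.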

Step S2, determining a reward function based on the plurality of control targets to be optimized and the input parameters and output parameters of the optimizer.

The reward function is a function for determining the reward value obtained in a given state or for taking a specific action.

In one embodiment, the step of determining the reward function based on the plurality of control targets to be optimized and the input parameters and output parameters of the optimizer comprises the following steps one and two.

Step one, aiming at each control target, obtaining a difference value of an input parameter corresponding to the control target and a steady-state maximum allowable error corresponding to the control target, and a ratio of an absolute value of the difference value to the steady-state maximum allowable error.

And step two, determining a reward function based on the ratio corresponding to each control target.

In step two, the negative of the sum of the products of each ratio and its corresponding coefficient may be taken as the reward function.

Taking thermal power and outlet steam temperature as the control targets, for example, the reward function is set as $r_1$ below.

$$r_1 = -\left(\varepsilon_n \beta_n + \varepsilon_{T_s} \beta_{T_s}\right)$$

where $\varepsilon_n = |\Delta n - \epsilon_n| / \epsilon_n$ is the ratio of the thermal power control deviation to the steady-state maximum allowable error $\epsilon_n$ of the thermal power; the thermal power control deviation $|\Delta n - \epsilon_n|$ is the absolute value of the difference between the input parameter corresponding to the thermal power (the deviation $\Delta n$ of the actual thermal power value from its reference value) and the steady-state maximum allowable error $\epsilon_n$ corresponding to the thermal power. Likewise, $\varepsilon_{T_s} = |\Delta T_s - \epsilon_{T_s}| / \epsilon_{T_s}$ is the ratio of the outlet steam temperature control deviation to the steady-state maximum allowable error $\epsilon_{T_s}$ of the outlet steam temperature; the outlet steam temperature control deviation $|\Delta T_s - \epsilon_{T_s}|$ is the absolute value of the difference between the input parameter corresponding to the outlet steam temperature (the deviation $\Delta T_s$ of the actual outlet steam temperature value from its reference value) and the steady-state maximum allowable error $\epsilon_{T_s}$ corresponding to the outlet steam temperature. The coefficients corresponding to the thermal power and the outlet steam temperature are $\beta_n$ and $\beta_{T_s}$, respectively.
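As a hedged illustration, the reward described in steps one and two can be computed as below for the two-target case. The default tolerances reuse the steady-state maximum allowable errors quoted earlier (0.5% of full power and 0.2 ℃); the weights `beta_n` and `beta_ts` default to 1.0 purely as an assumption, since the text gives no values, and the function names are hypothetical.

```python
def tolerance_ratio(deviation: float, steady_tol: float) -> float:
    """Step one: |deviation - tolerance| / tolerance for a single control target."""
    if steady_tol <= 0:
        raise ValueError("steady-state tolerance must be positive")
    return abs(deviation - steady_tol) / steady_tol


def reward(delta_n: float, delta_ts: float,
           tol_n: float = 0.005, tol_ts: float = 0.2,
           beta_n: float = 1.0, beta_ts: float = 1.0) -> float:
    """Step two: negative weighted sum of the per-target ratios."""
    eps_n = tolerance_ratio(delta_n, tol_n)      # thermal power term
    eps_ts = tolerance_ratio(delta_ts, tol_ts)   # outlet steam temperature term
    return -(eps_n * beta_n + eps_ts * beta_ts)
```

With the formula taken literally, the reward is never positive and reaches zero when each deviation equals its steady-state tolerance; larger deviations give more negative rewards in proportion to the weights.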

Step S3, loading a preset deep learning network model and the reward function into the optimizer, and configuring an event triggering mechanism for the optimizer.

The preset deep learning network model is a pre-selected deep learning network model.

The event triggering mechanism is a mechanism for triggering the execution of the optimizer according to a predetermined operation state of the nuclear steam supply system.

According to the embodiment of the invention, the deep reinforcement learning-based optimizer is added into the nuclear steam supply system of the nuclear power station, so that an operator can be assisted to optimize the control performance of the nuclear steam supply system according to the operation parameters of the system, and the task load of the operator is reduced.

In addition, the event triggering mechanism described above determines the working range of the optimizer. Although a control optimizer based on deep reinforcement learning can acquire an optimal strategy through trial-and-error exploration without depending on a system dynamic model, this also increases the risk that, during training and execution, the optimizer adopts a harmful strategy that negatively affects the safe operation of the nuclear power plant. The embodiment of the invention therefore introduces the event triggering mechanism to determine the working range of the optimizer, thereby ensuring that the optimizer optimizes the plurality of control targets of the nuclear steam supply system on the premise of safe and stable operation of the reactor.

Step S4, when execution of the optimizer is triggered, controlling the optimizer to perform multiple iterations to obtain a multi-objective control optimization strategy.

The multi-objective control optimization strategy is a strategy for attaining a preset cumulative reward function value, which generally refers to the maximum cumulative reward value obtainable from the reward function after a number of iterations.

According to the embodiment of the invention, an optimizer capable of improving the performance of a plurality of control targets is obtained automatically through the deep reinforcement learning algorithm; no personnel are needed in the optimization process, which raises the level of intelligence of the nuclear steam supply system.

Step S5, controlling the optimizer to execute a control optimization operation based on the multi-objective control optimization strategy so as to optimize the plurality of control targets.

Wherein the control optimization operation includes: the reference values of the corresponding controllers in the system are dynamically modified according to the operating parameters of the nuclear steam supply system to optimize the control performance of the nuclear steam supply system.

In summary, the multi-objective control optimization method of a nuclear steam supply system provided by the embodiment of the invention first determines input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in the nuclear steam supply system, the optimizer being an optimization program for solving the multi-objective control optimization problem; determines a reward function based on the plurality of control targets to be optimized and the input parameters and output parameters of the optimizer; and loads a preset deep learning network model and the reward function into the optimizer and configures an event triggering mechanism for the optimizer, the event triggering mechanism being a mechanism for triggering execution of the optimizer according to a predetermined operation state of the nuclear steam supply system. Then, when execution of the optimizer is triggered, the optimizer is controlled to perform multiple iterations to obtain a multi-objective control optimization strategy, which is a strategy for attaining a preset cumulative reward function value. Finally, the optimizer is controlled to perform a control optimization operation based on the multi-objective control optimization strategy so as to optimize the plurality of control targets.

Fig. 4 is a flowchart of another multi-objective control optimization method for a nuclear steam supply system according to an embodiment of the present invention. As shown in fig. 4, after determining the reward functions corresponding to the plurality of control targets based on the input parameters and the output parameters of the optimizer (step S2), the method further includes steps S6 to S7.

Step S6, modeling the multi-objective control optimization problem as a Markov decision process, and defining a five-tuple consisting of the input state space, the action space, the state transition function, the discount factor and the initial state distribution.

The input state space is a parameter set determined based on the input parameters, each input parameter corresponding to one state in the state space; the action space is a parameter set determined based on the output parameters, each output parameter corresponding to one action in the action space; the state transition function is a function for determining the state of the current time step from the state of the previous time step and the selected action; the discount factor is used to characterize the importance of the reward value obtained at each time step in the cumulative reward. It follows that the cumulative reward depends on the actions selected by the optimizer, that is, on the strategy the optimizer adopts. In other words, the goal of the optimizer is to learn, starting from the initial state distribution, the strategy that maximizes the cumulative reward.
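The role of the discount factor can be made concrete with a small sketch of a discounted cumulative reward. The function name `discounted_return` and the symbol `gamma` for the discount factor are conventional choices, not names taken from the text.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * rewards[k], computed backwards for numerical simplicity."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount factor must lie in [0, 1]")
    total = 0.0
    for r in reversed(rewards):
        # Each step back in time discounts everything accumulated so far once more.
        total = r + gamma * total
    return total
```

With `gamma = 0.5`, three rewards of 1.0 accumulate to 1 + 0.5 + 0.25 = 1.75, so later rewards count for less; with `gamma = 1.0` every time step is weighted equally.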

Step S7, determining a deep reinforcement learning algorithm based on the Markov decision process, and determining the preset deep learning network model based on the deep reinforcement learning algorithm.

The deep reinforcement learning algorithm is a reinforcement learning algorithm capable of solving the Markov decision process, and the preset deep learning network model is the deep learning network model corresponding to the deep reinforcement learning algorithm.

In one embodiment, the deep reinforcement learning algorithm includes any one of a deep Q-network learning algorithm, a deep deterministic policy gradient algorithm, and an actor-critic algorithm.

In one embodiment, where the deep reinforcement learning algorithm is an actor-critic algorithm, the preset deep learning network model is a deep learning network model comprising an actor network and a critic network.

In one embodiment, the preset deep learning network model is described taking the actor-critic algorithm as an example. The actor-critic algorithm here is an algorithmic model that uses a maximum entropy objective to formulate a stochastic strategy.

The structure of the preset deep learning network model is as follows.

Wherein $\pi^*$ is the optimal policy; $\arg\max(\cdot)$ returns the argument at which the objective function attains its maximum; $\mathbb{E}$ denotes expectation; $t$ denotes a time step; $s$ denotes a state, and $s_t$ the state at time step $t$; $a$ denotes an action, and $a_t$ the action performed at time step $t$; $\rho_\pi$ denotes the state-action distribution induced by the policy; $i$ runs from $t$ to infinity and counts time steps; $r$ denotes the reward function; $\alpha$ is the temperature coefficient, which balances the policy entropy term.

The maximum-entropy objective may be expressed in terms of an optimal soft Q-function, which is shown below.

Wherein $r$ denotes the reward function; $\mathbb{E}$ denotes expectation; $k$ takes values in $[1, \infty)$ and counts time steps.

In the optimal soft Q-function, $\pi^*$ is the optimal soft policy.

Wherein $V^*(s_t)$ is the soft state value function.

Wherein $A$ denotes the action space containing all actions, i.e., $A$ is the set of all $a_t$.
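The equation images are not reproduced in this text. For orientation only, the standard maximum-entropy reinforcement learning expressions that match the symbol definitions above are sketched below. The placement of the discount factor $\gamma$ and the entropy symbol $\mathcal{H}$ are assumptions taken from the usual soft Q-learning formulation, not from the original figures.

```latex
% Maximum-entropy objective (assumed standard form)
\pi^{*} = \arg\max_{\pi}\;
  \mathbb{E}_{(s_i, a_i) \sim \rho_{\pi}}
  \left[ \sum_{i=t}^{\infty} \gamma^{\,i-t}
  \Big( r(s_i, a_i) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_i)\big) \Big) \right]

% Optimal soft Q-function
Q^{*}(s_t, a_t) = r(s_t, a_t) +
  \mathbb{E}_{(s_{t+1}, \dots) \sim \rho_{\pi^{*}}}
  \left[ \sum_{k=1}^{\infty} \gamma^{k}
  \Big( r(s_{t+k}, a_{t+k}) + \alpha\, \mathcal{H}\big(\pi^{*}(\cdot \mid s_{t+k})\big) \Big) \right]

% Optimal soft policy and soft state value function
\pi^{*}(a_t \mid s_t) = \exp\!\left( \tfrac{1}{\alpha}
  \big( Q^{*}(s_t, a_t) - V^{*}(s_t) \big) \right),
\qquad
V^{*}(s_t) = \alpha \log \int_{A}
  \exp\!\left( \tfrac{1}{\alpha}\, Q^{*}(s_t, a') \right) \mathrm{d}a'
```

In these forms, a larger $\alpha$ gives more weight to entropy and therefore to exploration, while $\alpha \to 0$ recovers the ordinary reward-maximizing objective.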

Fig. 5 is a flowchart of a method for obtaining a multi-objective control optimization strategy according to an embodiment of the present invention. In one embodiment, the preset deep learning network model includes a first network and a second network. As shown in fig. 5, the step of performing a plurality of iterations by the control optimizer to obtain a multi-objective control optimization strategy (step S4) includes: step S41 to step S46.

Step S41, initializing network parameters of the first network and network parameters of the second network.

In one embodiment, where the preset deep learning network model corresponds to an actor-critic algorithm, the first network is the actor network and the second network is the critic network.

Step S42, initializing a state space to obtain a current state.

The state space refers to the input state space defined when the multi-objective control optimization problem is modeled as a Markov decision process; the input state space is a parameter set determined based on the input parameters, and each input parameter corresponds to one state in the state space.

Step S43, a first processing procedure is executed for the current time step.

The first processing procedure includes: selecting an action in the action space based on the current state according to the policy output by the first network, executing the state transition function to obtain the state at the next time step, and obtaining the reward value of the current time step according to the reward function.

Wherein the action is contained in the action space, the action space being a parameter set determined based on the output parameters, each output parameter corresponding to one action in the action space; the state transition function is a function for determining the state at the current time step from the state at the previous time step and the selected action; the discount factor characterizes the weight of the reward value obtained at each time step in the cumulative reward.

Step S44, accumulating the reward value of each time step based on the discount factor, and determining, by the second network, whether the current cumulative reward value is less than the preset cumulative reward function value.

Step S45, in the case where the cumulative reward value is less than the preset cumulative reward function value, updating the network parameters of the first network and the second network respectively based on the reward of the current time step to obtain an updated preset deep learning network model, and re-executing the first processing procedure at the next time step based on the updated preset deep learning network model.

In this embodiment, when the network parameters of the first network and the second network are updated, the soft Q-function parameter $\theta$ may be iteratively updated using the Bellman residual, and its gradient may be calculated by the following equation 1.

Wherein $\nabla$ denotes the gradient; $\pi_\phi$ denotes the first network and $\phi$ its network parameters; $Q_\theta$ denotes the second network and $\theta$ its network parameters; $Q_{\bar{\theta}}$ denotes the historical network corresponding to the second network and $\bar{\theta}$ its historical network parameters. After the network parameters of the first network and the second network are initialized, network $Q_{\bar{\theta}}$ and network $Q_\theta$ are identical. During the first processing procedure, $Q_\theta$ is iterated continuously according to equation 2 below, whereas $Q_{\bar{\theta}}$ is updated according to a certain rule (e.g., set equal to $Q_\theta$ every N time steps) as $Q_\theta$ is continuously updated.
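The example rule mentioned above, in which the historical network is set equal to the second network every N time steps, can be sketched as follows. The function name `sync_target` and the dictionary representation of network parameters are hypothetical simplifications; a real implementation would copy tensors of a deep learning framework instead.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def sync_target(online: Params, target: Params, step: int, period: int) -> bool:
    """Copy the second network's parameters into the historical network every `period` steps.

    Returns True when a copy was made, False otherwise.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if step % period != 0:
        return False
    for name, values in online.items():
        # Copy the list so later updates of the online network do not alias the target.
        target[name] = list(values)
    return True
```

Between synchronizations the historical network stays fixed, which keeps the target used in the Bellman residual from moving at every update of the second network.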

The stochastic policy can be updated by minimizing the expected KL divergence (Kullback-Leibler divergence), the gradient of which can be calculated by the following equation 2.

Wherein the reparameterization technique $a_t = f_\phi(\epsilon_t; s_t)$ can be employed so that gradients can be back-propagated to the policy parameters $\phi$.
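A minimal sketch of the reparameterization $a_t = f_\phi(\epsilon_t; s_t)$ is given below. It assumes a Gaussian policy with tanh squashing, which is a common choice but is not stated in the embodiment; `mean` and `log_std` stand for the outputs of the first network for state $s_t$, and the function name is hypothetical.

```python
import math
import random
from typing import Optional


def reparameterized_action(mean: float, log_std: float,
                           noise: Optional[float] = None) -> float:
    """Compute a_t = tanh(mean + exp(log_std) * eps), with eps drawn from N(0, 1).

    The randomness lives entirely in `noise`, so the action is a deterministic,
    differentiable function of the policy outputs `mean` and `log_std`.
    """
    if noise is None:
        noise = random.gauss(0.0, 1.0)
    return math.tanh(mean + math.exp(log_std) * noise)
```

Because the noise is sampled independently of the policy parameters, a framework with automatic differentiation can propagate the gradient of the loss through the action back to those parameters.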

Step S46, calling the preset deep learning network model to generate the multi-objective control optimization strategy in the case where the cumulative reward value is greater than or equal to the preset cumulative reward function value.
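The control flow of steps S41 to S46 can be sketched as a simple loop. This is only an outline under stated assumptions: the function `run_optimizer` and all of its arguments are hypothetical, the network parameters are represented by a single opaque `params` value, and the actor and critic gradient updates are replaced by a caller-supplied `update` callback.

```python
from typing import Any, Callable, Tuple


def run_optimizer(
    params: Any,                                   # S41: initialized network parameters
    state: float,                                  # S42: initial state
    select_action: Callable[[Any, float], float],  # policy of the first network
    transition: Callable[[float, float], float],   # state transition function
    reward_fn: Callable[[float, float], float],    # reward function
    update: Callable[[Any, float, float, float, float], Any],  # placeholder for S45
    discount: float,
    target_return: float,                          # preset cumulative reward function value
    max_steps: int,
) -> Tuple[Any, float, int]:
    """Iterate until the discounted cumulative reward reaches the preset value."""
    cumulative = 0.0
    weight = 1.0
    for step in range(max_steps):
        # S43: first processing procedure for the current time step.
        action = select_action(params, state)
        next_state = transition(state, action)
        reward = reward_fn(next_state, action)
        # S44: accumulate the reward using the discount factor.
        cumulative += weight * reward
        weight *= discount
        # S46: stop once the preset cumulative reward function value is reached.
        if cumulative >= target_return:
            return params, cumulative, step + 1
        # S45: otherwise update the parameters and continue with the next step.
        params = update(params, state, action, reward, next_state)
        state = next_state
    return params, cumulative, max_steps
```

The `max_steps` bound is an added safeguard so that the sketch terminates even if the preset value is never reached; the embodiment itself does not mention such a limit.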

In the embodiment of the invention, an optimizer capable of improving the performance of a plurality of control targets is obtained automatically through the deep reinforcement learning algorithm, and the optimization process requires no human intervention, so that the level of intelligence of the nuclear steam supply system is increased.

Fig. 6 is a flowchart of a multi-objective control optimization method of a nuclear steam supply system according to another embodiment of the present invention. As shown in fig. 6, after loading the preset deep learning network model and the reward function into the optimizer and configuring the event trigger mechanism for the optimizer (step S5), the method further includes: step S8-step S11.

Step S8, determining the operating state of the nuclear steam supply system according to the preset performance evaluation index at each time step.

Wherein the preset performance evaluation index is an index for indicating whether the nuclear steam supply system reaches an expected state. The preset performance evaluation index may be set according to an actual application process of the nuclear steam supply system, which is not specifically limited in the embodiment of the present invention.

In one embodiment, the operating state of the nuclear steam supply system is one of three states: a first state, a second state, and a third state. The first state is a "poor" state, indicating that the operating state of the nuclear steam supply system has not reached the expected state and that its degree of deviation from the expected state is greater than the preset deviation degree. The second state is a "medium" state, indicating that the operating state of the nuclear steam supply system has not reached the expected state and that its degree of deviation from the expected state is less than the preset deviation degree. The third state is an "excellent" state, indicating that the operating state of the nuclear steam supply system has reached the expected state.

Step S9, controlling the nuclear steam supply system to issue an operation early-warning prompt in the case where the operating state is the first state.

In the case where the operating state is the first state, the operating state of the nuclear steam supply system deviates greatly from the expected state; the operation is unstable and may pose a safety risk. Therefore, control optimization cannot be performed in the first state, and a conservative strategy is adopted to resolve the safety risk first: the nuclear steam supply system is controlled to issue an operation early-warning prompt so that its operators can deal with the safety risk.

Step S10, when the operation state is the second state, the start execution of the optimizer is controlled.

Wherein, in the case where the operating state is the second state, the operating state of the nuclear steam supply system is close to the expected state but still requires further optimization. Because the operating state of the nuclear steam supply system is relatively mild in this case, a stochastic policy can be adopted to explore as much as possible so as to improve the control performance. The optimizer can thus be started, and the multi-objective control optimization is realized.

Step S11, when the operation state is the third state, waiting for entering the next time step.

In the case where the operating state is the third state, the operating state of the nuclear steam supply system has reached the expected state and control optimization need not continue. A stabilization strategy is adopted at this point: the current operating state is maintained while waiting to enter the next time step.
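Steps S8 to S11 amount to a three-way dispatch on the operating state, which can be sketched as follows. The names `OperatingState`, `classify`, and `dispatch`, the scalar `deviation` input, and the tolerance `reached_tol` used to decide that the expected state has been reached are all assumptions made for illustration. The case in which the deviation exactly equals the preset deviation degree is not specified in the text and is treated here as the second state.

```python
from enum import Enum


class OperatingState(Enum):
    POOR = "poor"            # first state
    MEDIUM = "medium"        # second state
    EXCELLENT = "excellent"  # third state


def classify(deviation: float, reached_tol: float, preset_deviation: float) -> OperatingState:
    """Map a deviation from the expected state to one of the three operating states."""
    magnitude = abs(deviation)
    if magnitude <= reached_tol:
        return OperatingState.EXCELLENT
    if magnitude > preset_deviation:
        return OperatingState.POOR
    return OperatingState.MEDIUM


def dispatch(state: OperatingState) -> str:
    """Return the action taken at this time step for the given operating state."""
    if state is OperatingState.POOR:
        return "warn_operators"   # S9: issue the operation early-warning prompt
    if state is OperatingState.MEDIUM:
        return "run_optimizer"    # S10: start execution of the optimizer
    return "wait"                 # S11: wait for the next time step
```

Keeping the classification separate from the dispatch makes it straightforward to swap in whatever performance evaluation index a particular plant uses without touching the trigger logic.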

In the embodiment of the invention, the working range of the optimizer is determined by introducing an event-driven mechanism, so that the optimizer is ensured to optimize a plurality of control targets of the nuclear steam supply system on the premise of safe and stable operation of the reactor.

As shown in FIG. 7, the abscissa is the simulation time in seconds, and the ordinate is the relative nuclear power, where 1 is full power and 0.7 is 70% of full power. The solid line is the relative nuclear power setpoint curve, the lowest dot-dash line is the relative nuclear power curve before optimization, and the dotted line is the relative nuclear power curve after optimization. As can be seen from FIG. 7, in steady state the optimized relative nuclear power curve almost coincides with the setpoint curve; that is, the optimized relative nuclear power reaches the expected setpoint, and optimized control is achieved.

As shown in FIG. 8, the abscissa is the simulation time in seconds, and the ordinate is the outlet steam temperature in degrees Celsius. The dotted line is the outlet steam temperature curve before optimization, and the dash-dotted line is the outlet steam temperature curve after optimization. In general, 575 °C is taken as the target value of the outlet steam temperature. Comparing the curves before and after optimization shows that, in the period 8000-8250 s, the maximum deviation of the optimized outlet steam temperature from 575 °C is clearly smaller than the maximum deviation of the pre-optimization outlet steam temperature from 575 °C. The multi-target control optimization method of the nuclear steam supply system provided by the embodiment of the invention can therefore realize control optimization of the outlet steam temperature in the nuclear steam supply system.
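The comparison made for FIG. 8 is a maximum-deviation measurement over a time window, which can be sketched as below. The function name `max_deviation` is hypothetical, and no data from the figures is reproduced here; the function simply takes sampled times and values supplied by the caller.

```python
from typing import Sequence


def max_deviation(times: Sequence[float], values: Sequence[float],
                  target: float, start: float, end: float) -> float:
    """Largest absolute deviation from `target` among samples with start <= t <= end."""
    window = [abs(v - target) for t, v in zip(times, values) if start <= t <= end]
    if not window:
        raise ValueError("no samples in the requested window")
    return max(window)
```

Applying the function to the outlet steam temperature samples before and after optimization, with a target of 575 and a window of 8000 to 8250, would give the two amplitudes that the description compares.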

Fig. 9 is a schematic structural diagram of a multi-objective control optimizing apparatus of a nuclear steam supply system according to an embodiment of the present invention. As shown in fig. 9, the apparatus includes: a parameter determination module 91, a function determination module 92, an optimization configuration module 93, an iteration module 94 and an optimization execution module 95.

The parameter determination module 91 is configured to determine input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in the nuclear steam supply system, where the optimizer is an optimization program for solving a multi-target control optimization problem.

The function determining module 92 is configured to determine a reward function based on a plurality of control targets to be optimized, input parameters and output parameters of the optimizer.

In one embodiment, the function determination module 92 is specifically configured to: for each control target, obtain the difference between the input parameter corresponding to the control target and the steady-state maximum allowable error corresponding to the control target, and the ratio of the absolute value of the difference to the steady-state maximum allowable error; and determine the reward function based on the ratio corresponding to each control target.
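The per-target ratio described above can be sketched as follows, reading the text literally. How the ratios are combined into a single reward is not specified in this passage; the negative weighted sum used in `reward` below, as well as the function names and the optional `weights` argument, are assumptions made only for illustration.

```python
from typing import Optional, Sequence


def error_ratio(value: float, max_allowed_error: float) -> float:
    """Ratio of |value - max_allowed_error| to the steady-state maximum allowable error."""
    if max_allowed_error <= 0:
        raise ValueError("max_allowed_error must be positive")
    return abs(value - max_allowed_error) / max_allowed_error


def reward(values: Sequence[float], max_allowed_errors: Sequence[float],
           weights: Optional[Sequence[float]] = None) -> float:
    """Combine the per-target ratios into one scalar (assumed: negative weighted sum)."""
    ratios = [error_ratio(v, e) for v, e in zip(values, max_allowed_errors)]
    if weights is None:
        weights = [1.0] * len(ratios)
    return -sum(w * r for w, r in zip(weights, ratios))
```

Dividing by the steady-state maximum allowable error makes the terms dimensionless, so control targets with different units, such as relative power and steam temperature, can be combined in one reward.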

In one embodiment, the apparatus further comprises a function determination module for modeling the multi-objective control optimization problem as a Markov decision process, defining a five-tuple consisting of the input state space, the action space, the state transition function, the discount factor, and the initial state distribution, and determining the preset deep learning network model based on the Markov decision process. Wherein the input state space is a parameter set determined based on the input parameters; the action space is a parameter set determined based on the output parameters; the state transition function is a function for determining the state at the current time step from the state at the previous time step and the selected action; the discount factor characterizes the weight of the reward value obtained at each time step in the cumulative reward.

The optimizing configuration module 93 is configured to load the preset deep learning network model and the reward function into the optimizer, and configure an event triggering mechanism for the optimizer, where the event triggering mechanism is a mechanism for triggering the execution of the optimizer according to a predetermined operation state of the nuclear steam supply system.

In one embodiment, the apparatus further comprises a status monitoring module and a monitoring processing module.

The state monitoring module is used for determining the operation state of the nuclear steam supply system according to the preset performance evaluation index at each time step.

The monitoring processing module is used for controlling the nuclear steam supply system to issue an operation early-warning prompt in the case where the operating state is the first state; for starting execution of the optimizer in the case where the operating state is the second state; and for waiting to enter the next time step in the case where the operating state is the third state.

The iteration module 94 is configured to control the optimizer to perform multiple iterations when execution of the optimizer is triggered, so as to obtain a multi-objective control optimization strategy, where the multi-objective control optimization strategy is a strategy that attains the preset cumulative reward function value.

In one embodiment, the preset deep learning network model includes a first network and a second network; the iteration module 94 is specifically configured to: initialize the network parameters of the first network and the network parameters of the second network; initialize the state space to obtain a current state; execute a first processing procedure for the current time step, the first processing procedure comprising selecting an action in the action space based on the current state according to the policy output by the first network, executing the state transition function to obtain the state at the next time step, and obtaining the reward value of the current time step according to the reward function; accumulate the reward value of each time step based on the discount factor and determine whether the current cumulative reward value is less than the preset cumulative reward function value; in the case where the cumulative reward value is less than the preset cumulative reward function value, update the network parameters of the first network and the second network respectively based on the reward of the current time step, and re-execute the first processing procedure at the next time step based on the updated preset deep learning network model; and, in the case where the cumulative reward value is greater than or equal to the preset cumulative reward function value, call the preset deep learning network model to generate the multi-objective control optimization strategy.

The optimization execution module 95 is configured to control the optimizer to execute a control optimization operation based on the multi-objective control optimization strategy, so as to optimize the plurality of control objectives.

The embodiment of the invention provides a multi-target optimization control device, wherein the parameter determination module is used for determining input parameters and output parameters of an optimizer based on a plurality of control targets to be optimized in a nuclear steam supply system, the optimizer being an optimization program for solving the multi-objective control optimization problem; the function determining module is used for determining a reward function based on the input parameters and the output parameters of the optimizer; the optimizing configuration module is used for loading a preset deep learning network model and the reward function into the optimizer and configuring an event triggering mechanism for the optimizer, the event triggering mechanism being a mechanism for triggering execution of the optimizer according to a predetermined operating state of the nuclear steam supply system; the iteration module is used for controlling the optimizer to perform multiple iterations when execution of the optimizer is triggered, so as to obtain a multi-objective control optimization strategy, the multi-objective control optimization strategy being a strategy that attains the preset cumulative reward function value; and the optimizing execution module is used for loading the multi-objective control optimization strategy into the optimizer so that the optimizer can execute the control optimization operation based on the multi-objective control optimization strategy. In this way, performance optimization can be performed on a plurality of control targets of the nuclear steam supply system while its operating stability and robustness are ensured, which improves both the control optimization effect and the operating efficiency of the nuclear steam supply system.

Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 10, the computer device may include a processor 1001 and a memory 1002, where the processor 1001 and the memory 1002 may be connected by a bus or otherwise, as exemplified by a bus connection in fig. 10.

The processor 1001 may be a central processing unit (Central Processing Unit, CPU). The processor 1001 may also be a chip such as other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or a combination thereof.

The memory 1002 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to a multi-objective control optimization method of a nuclear steam supply system in an embodiment of the present invention. The processor 1001 executes various functional applications of the processor and data processing by running non-transitory software programs, instructions, and modules stored in the memory 1002, that is, implements the multi-objective control optimization method of the nuclear steam supply system in the above-described method embodiment.

Memory 1002 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the processor 1001, and the like. In addition, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 1002 may optionally include memory located remotely from processor 1001, such remote memory being connectable to processor 1001 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more modules are stored in the memory 1002 that, when executed by the processor 1001, perform the multi-objective control optimization method of the nuclear steam supply system in the embodiment shown in fig. 1.

The details of the above computer device may be understood correspondingly with respect to the corresponding relevant descriptions and effects in the embodiment shown in fig. 1, which are not repeated here.

It will be appreciated by those skilled in the art that implementing all or part of the above-described embodiment method may be implemented by a computer program to instruct related hardware, where the program may be stored in a computer readable storage medium, and the program may include the above-described embodiment method when executed. Wherein the storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a Hard Disk (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the kind described above.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.

Claims

1. A method of multi-objective control optimization of a nuclear steam supply system, the method comprising:

2. The method of claim 1, further comprising, after determining the reward function corresponding to the plurality of control targets based on the plurality of control targets to be optimized, the input parameters and the output parameters of the optimizer:

and the preset deep learning network model is based on a deep reinforcement learning algorithm.

3. The method of claim 2, wherein the predetermined deep learning network model comprises a first network and a second network; the controlling the optimizer to perform a plurality of iterations to obtain a multi-objective control optimization strategy includes:

initializing the state space to obtain a current state;

4. A method according to any one of claims 1-3, wherein after loading a pre-set deep learning network model and the reward function into the optimizer and configuring an event trigger mechanism for the optimizer, further comprising:

5. The method according to claim 1, wherein the input parameters include a deviation of an actual value corresponding to the control target from a reference value, a system parameter of the nuclear steam supply system, and an operation parameter; the output parameter is a preset correction amount corresponding to the control target.

6. The method according to any one of claims 1 or 5, wherein the step of determining a reward function based on a plurality of control targets to be optimized, input parameters and output parameters of the optimizer, comprises:

7. The method of claim 2, wherein the deep reinforcement learning algorithm comprises any one of a deep Q-network learning algorithm, a deep deterministic policy gradient algorithm, and an actor-critic algorithm.

8. A multi-objective control optimizing apparatus for a nuclear steam supply system, the apparatus comprising:

the function determining module is used for determining a reward function based on the input parameter and the output parameter of the optimizer;

and the optimization execution module is used for loading the multi-objective control optimization strategy into the optimizer so that the optimizer can execute control optimization operation based on the multi-objective control optimization strategy to realize optimization of a plurality of control objectives.

9. A computer device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the steps of the multi-objective control optimization method of the nuclear steam supply system of any one of claims 1-7.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the multi-objective control optimization method of a nuclear steam supply system according to any one of claims 1-7.