CN113435129B

CN113435129B - Method and device for optimizing control strategy of desuperheating water valve and desuperheating water valve

Info

Publication number: CN113435129B
Application number: CN202110978308.3A
Authority: CN
Inventors: 张超; 张海恩; 刘昌鑫; 高耸屹; 刘泽琳; 徐亮
Original assignee: Nanqi Xiance Nanjing Technology Co ltd
Current assignee: Nanqi Xiance Nanjing Technology Co ltd
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2021-11-23
Anticipated expiration: 2041-08-25
Also published as: CN113435129A

Abstract

The embodiment of the invention discloses a method and a device for optimizing a desuperheating water valve control strategy, and a desuperheating water valve. The method comprises: obtaining original action data of the desuperheating water valve; constructing a state-action sequence data set of the desuperheating water valve based on the original action data; constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, the simulated environment model comprising a desuperheating water valve control strategy model and a unit state transition strategy model; and performing optimization interaction between the desuperheating water valve control strategy model and the unit state transition strategy model for a preset number of optimization iterations to obtain an optimized desuperheating water valve control strategy model. The embodiment of the invention solves the technical problem in the prior art that, when the desuperheating water valve is controlled with a PID control strategy, system delay readily leads to over-regulation and system instability. It achieves automatic, intelligent control of the desuperheating water valve while taking the system delay into account, and avoids the harm caused by over-regulation.

Description

Method and device for optimizing control strategy of desuperheating water valve and desuperheating water valve

Technical Field

The embodiment of the invention relates to the technical field of temperature-reducing water valve control, in particular to a method and a device for optimizing a temperature-reducing water valve control strategy and a temperature-reducing water valve.

Background

To promote ultra-low emissions and energy-saving retrofits of coal-fired power plants, comprehensive energy-saving and environmental retrofits of coal-fired boilers are being implemented, so improving the overall economic indicators of thermal power generation is of great importance. A key problem of intelligent optimization in a thermal power generating unit is how to rapidly reduce the temperature of superheated steam in the boiler, keep the superheater outlet temperature within a constraint range, and ensure that the pipe wall temperature does not exceed the allowable working temperature.

Too high or too low a superheater temperature can significantly affect the safety and economy of a thermal power plant. Too high a superheated steam temperature may cause metal damage to the superheater, steam piping, and high pressure parts of the steam turbine, and too low a superheated steam temperature may reduce the thermal efficiency of the entire power plant and may affect the safety and economy of the steam turbine.

The temperature of the superheated steam is controlled by a desuperheating water valve. The desuperheating water valve control strategy commonly used in thermal power generation today is a PID control strategy, which works as follows: when the steam temperature exceeds the target temperature value, the valve opening of the desuperheating water valve is increased so that more desuperheating water is sprayed and the steam temperature drops rapidly. However, this approach derives the valve control strategy from only a small number of observed parameters of the thermal generator set, so over-regulation occurs easily. Because control of the valve opening is subject to system delay, the over-regulation adversely affects system stability for a period of time afterwards and causes the steam temperature to oscillate.

Disclosure of Invention

The embodiment of the invention provides a method and a device for optimizing a desuperheating water valve control strategy, and a desuperheating water valve, and solves the technical problem in the prior art that, when a PID (proportional-integral-derivative) control strategy is used to control the desuperheating water valve, system delay readily leads to over-regulation and system instability.

The embodiment of the invention provides an optimization method of a temperature-reducing water valve control strategy, which comprises the following steps:

acquiring original action data of a temperature-reducing water valve, wherein the original action data comprises instrument data in the running process of a thermal generator set and control parameters of the temperature-reducing water valve;

constructing a state-action sequence dataset of the desuperheating water valve based on the raw action data;

constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model;

and performing optimization interaction of preset optimization iteration times on the temperature reduction water valve control strategy model and the unit state transition strategy model to obtain the optimized temperature reduction water valve control strategy model, wherein the optimization target of the optimization interaction is that the current temperature of the superheated steam regulated by the temperature reduction water valve is within a preset target temperature range.

Further, in the optimization interaction process of the preset number of optimization iterations, a given initialization state is used as the input value of the 1st optimization interaction, and the output value of the n-th optimization interaction is used as the input value of the (n+1)-th optimization interaction, where n is greater than or equal to 1.

Further, the optimizing interaction between the temperature-reducing water valve control strategy model and the unit state transition strategy model comprises:

inputting the output value $s_m$ of the m-th optimization interaction into the desuperheating water valve control strategy model as an input value to obtain an output value $a_m$, wherein $s_m$ represents the unit state of the thermal generator set at time m, $a_m$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_m$, and m is less than or equal to the preset number of optimization iterations;

constructing the input value $s_m$ and the output value $a_m$ into a state vector $[s_m, a_m]$;

inputting the state vector $[s_m, a_m]$ into the unit state transition strategy model to obtain the output value $s_{m+1}$ of the (m+1)-th optimization interaction, wherein $s_{m+1}$ represents the unit state of the thermal generator set at time m+1;

and after the optimization interaction of the preset optimization iteration times is completed, obtaining the optimized temperature-reducing water valve control strategy model.

Further, in the process of constructing the simulated environment model, the constructing the simulated environment model of the thermal generator set through the reinforcement learning algorithm based on the state-action sequence data set comprises:

building the simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, and building a discriminator model through a neural network, wherein the simulated environment model comprises the desuperheating water valve control strategy model and the unit state transition strategy model;

inputting a preset given unit state into the simulation environment model to generate a simulation state action sequence data set in a simulation environment;

inputting the simulated state action sequence data set and the state-action sequence data set of the thermal generator set under the real environment operation into the discriminator model to generate a confidence coefficient of the simulated environment model;

updating the network parameters of the temperature-reducing water valve control strategy model by using the generated confidence as an optimization target;

and repeating the action of updating the network parameters of the desuperheating water valve control strategy model for a preset maximum number of sampling iterations to obtain the optimized simulated environment model.

Further, the constructing a state-action sequence dataset for the desuperheating water valve based on the raw action data comprises:

constructing the state-action sequence data set of the desuperheating water valve based on the original action data and the formula $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$, wherein D represents the state-action sequence data set, $s_T$ represents the unit state of the thermal generator set at time T, $a_T$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_T$, and T is an integer greater than or equal to 0.

Further, after obtaining the optimized temperature-reducing water valve control strategy model, the method further includes:

and controlling the work of the temperature-reducing water valve based on the optimized temperature-reducing water valve control strategy model.

The embodiment of the invention also provides an optimization device for the control strategy of the desuperheating water valve, which comprises:

the data acquisition unit is used for acquiring original action data of the desuperheating water valve;

the data construction unit is used for constructing a state-action sequence data set of the temperature reduction water valve based on the original action data;

the model construction unit is used for constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model;

and the model optimization unit is used for carrying out optimization interaction on the temperature-reducing water valve control strategy model and the unit state transfer strategy model to obtain the optimized temperature-reducing water valve control strategy model, wherein the optimization interaction times are preset optimization iteration times, and the optimization target of the optimization interaction is that the current temperature of the superheated steam regulated by the temperature-reducing water valve is within a preset target temperature range.

The embodiment of the invention also provides a desuperheating water valve which comprises the optimization device for the control strategy of the desuperheating water valve in any embodiment.

The embodiment of the invention also provides equipment for optimizing the desuperheating water valve control strategy, which comprises:

one or more processors;

a storage device for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for optimizing a desuperheating water valve control strategy described in any of the above embodiments.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, wherein the program is executed by a processor to implement the optimization method of the desuperheating water valve control strategy according to any of the above embodiments.

Drawings

FIG. 1 is a flow chart of a method for optimizing a desuperheating water valve control strategy provided by an embodiment of the present invention;

FIG. 2 is a flow chart of another method for optimizing a desuperheating water valve control strategy provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart of yet another method for optimizing a desuperheating water valve control strategy provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart of a further method for optimizing a desuperheating water valve control strategy provided by an embodiment of the present disclosure;

FIG. 5 is a flow chart of still another method for optimizing a desuperheating water valve control strategy provided by an embodiment of the present disclosure;

FIG. 6 is a block diagram of an optimization apparatus for a desuperheating water valve control strategy according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an optimization device of a desuperheating water valve control strategy according to an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

It should be noted that the terms "first", "second", and the like in the description and claims of the present invention and the accompanying drawings are used for distinguishing different objects, and are not used for limiting a specific order. The following embodiments of the present invention may be implemented individually, or in combination with each other, and the embodiments of the present invention are not limited in this respect.

Fig. 1 is a flowchart of an optimization method of a temperature-reducing water valve control strategy according to an embodiment of the present invention.

As shown in fig. 1, the optimization method of the temperature-reducing water valve control strategy specifically includes the following steps:

S101, acquiring original action data of the desuperheating water valve, wherein the original action data comprises instrument data recorded during operation of the thermal generator set and control parameters of the desuperheating water valve.

S102, constructing a state-action sequence data set of the temperature-reduction water valve based on the original action data.

Specifically, instrument data recorded by each instrument during operation of the unit and control parameters of the desuperheating water valve are acquired from a real thermal power generating unit, and the instrument data and the control parameters are constructed into a state-action sequence data set $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$, where $s_T$ represents the state of the thermal generator set obtained from the instrument data recorded by each instrument of the unit at time T, $a_T$ represents the valve opening of the desuperheating water valve at time T, and T is an integer greater than or equal to 0. When T = 0, $s_0$ represents the state of the thermal generator set obtained from the instrument data recorded by each instrument at the initial moment, and $s_{T+1}$ represents the state of each instrument of the unit at the next moment after the action $a_T$ is applied.
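The construction of D can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each unit state is a tuple of instrument readings and each action is a single valve opening, and the names `build_state_action_sequence` and `transitions` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, Union

State = Tuple[float, ...]  # instrument readings at one time step (assumed layout)
Action = float             # valve opening at one time step (assumed scalar)


def build_state_action_sequence(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Union[State, Action]]:
    """Interleave logged data into D = [s_0, a_0, ..., s_{T-1}, a_{T-1}, s_T]."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected T+1 states and T actions")
    sequence: List[Union[State, Action]] = []
    for state, action in zip(states, actions):  # zip stops after a_{T-1}
        sequence.append(state)
        sequence.append(action)
    sequence.append(states[-1])  # the closing state s_T has no following action
    return sequence


def transitions(
    sequence: Sequence[Union[State, Action]]
) -> Iterator[Tuple[State, Action, State]]:
    """Yield (s_t, a_t, s_{t+1}) triples from an interleaved sequence."""
    for i in range(0, len(sequence) - 2, 2):
        yield sequence[i], sequence[i + 1], sequence[i + 2]


# Illustrative values only: one temperature reading per state.
logged_states = [(540.0,), (538.0,), (536.5,)]
logged_actions = [0.3, 0.2]
D = build_state_action_sequence(logged_states, logged_actions)
```

Splitting D into (state, action, next state) triples is one convenient form for fitting a model that predicts the next unit state; the source does not prescribe this representation.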

S103, constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulated environment model comprises a temperature reduction water valve control strategy model and a unit state transition strategy model.

Specifically, after a state-action sequence data set of the thermal power generator set is obtained, a simulated environment model of the thermal power generator set is constructed by using a reinforcement learning algorithm, wherein the simulated environment model comprises a temperature reduction water valve control strategy model and a unit state transfer strategy model, and the temperature reduction water valve control strategy model is used for simulating a temperature reduction water valve controller and the output of the temperature reduction water valve controller is the valve opening of a temperature reduction water valve; the unit state transition strategy model is used for predicting the unit operation state at the next moment according to the unit operation state of the thermal generator set at the current moment and the valve opening of the temperature reduction water valve, wherein the unit operation state comprises a superheated steam temperature value.

S104, performing optimization interaction between the desuperheating water valve control strategy model and the unit state transition strategy model for the preset number of optimization iterations to obtain an optimized desuperheating water valve control strategy model, wherein the optimization target of the optimization interaction is that the desuperheating water valve adjusts the current temperature of the superheated steam to be within a preset target temperature range.

Specifically, to obtain the optimized desuperheating water valve control strategy model, a reward function is set before the optimization process, and maximizing the reward is then taken as the optimization target. Under normal conditions, the optimization target is set so that the desuperheating water valve adjusts the current temperature of the superheated steam to be within the preset target temperature range; for example, the optimization target may be that the current temperature of the superheated steam is always kept within plus or minus 5 ℃ of the target temperature preset for the working condition.
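As a small illustration of the example target above, the following Python sketch checks whether a steam temperature lies within a tolerance band around the target. The function names and the default tolerance of 5 ℃ (taken from the example in the text) are assumptions for illustration; the source does not say how the band is evaluated in practice.

```python
from typing import Iterable


def within_target_band(
    steam_temp_c: float, target_temp_c: float, tolerance_c: float = 5.0
) -> bool:
    """Return True if the steam temperature is within +/- tolerance of the target."""
    return abs(steam_temp_c - target_temp_c) <= tolerance_c


def fraction_in_band(
    temps_c: Iterable[float], target_temp_c: float, tolerance_c: float = 5.0
) -> float:
    """Share of samples that stay inside the band (a hypothetical summary metric)."""
    temps = list(temps_c)
    if not temps:
        raise ValueError("no temperature samples given")
    hits = sum(within_target_band(t, target_temp_c, tolerance_c) for t in temps)
    return hits / len(temps)
```

A metric like `fraction_in_band` could be used to compare a learned strategy against an existing controller on the same simulated trajectory, although the source does not describe such an evaluation.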

Optionally, in the optimization interaction process of the preset optimization iteration number, a given initialization state is used as an input value of the 1 st optimization interaction, an output value of the nth optimization interaction is used as an input value of the (n + 1) th optimization interaction, and n is larger than or equal to 1.

Illustratively, based on the set optimization target, a random initialization state s of the thermal generator set is given. The initialization state s is used as the input value of the 1st optimization interaction and is input into the desuperheating water valve control strategy model, which outputs a control behavior, namely the valve opening of the desuperheating water valve when the unit state is s. The unit state transition strategy model then outputs the unit state at the next moment according to the current unit state and the control behavior output by the desuperheating water valve control strategy model. From the 2nd optimization interaction onward, each output value of the unit state transition strategy model is used as the input value of the desuperheating water valve control strategy model in the next optimization interaction; that is, the unit state at the next moment output by the unit state transition strategy model is input into the desuperheating water valve control strategy model. After the optimization interaction between the two models has been carried out for the preset number of optimization iterations, the optimized desuperheating water valve control strategy model is obtained, and the desuperheating water valve is controlled to work according to it.
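The interaction loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: both models are represented as plain Python callables, and `toy_policy` and `toy_transition` are made-up stand-ins with invented coefficients, not the learned models. How the control strategy model's parameters are updated from the collected interactions is not shown, because the text does not specify the update rule.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Policy = Callable[[State], float]             # s_m -> a_m (valve opening)
Transition = Callable[[State, float], State]  # (s_m, a_m) -> s_{m+1}
Step = Tuple[State, float, State]


def rollout(
    policy: Policy,
    transition: Transition,
    initial_state: State,
    num_iterations: int,
) -> List[Step]:
    """Alternate the two models for a preset number of optimization interactions."""
    trajectory: List[Step] = []
    state = initial_state  # input value of the 1st interaction
    for _ in range(num_iterations):
        action = policy(state)                  # control strategy model output
        next_state = transition(state, action)  # transition model output
        trajectory.append((state, action, next_state))
        state = next_state  # output of interaction n feeds interaction n+1
    return trajectory


def toy_policy(state: State) -> float:
    """Hypothetical stand-in: opening grows with temperature above 540, clamped to [0, 1]."""
    return min(1.0, max(0.0, 0.1 * (state[0] - 540.0)))


def toy_transition(state: State, opening: float) -> State:
    """Hypothetical stand-in: steam heats by 1 per step and cools by 4 per unit opening."""
    return (state[0] + 1.0 - 4.0 * opening,)


trajectory = rollout(toy_policy, toy_transition, (545.0,), num_iterations=3)
```

Because every step runs inside the simulated environment, a candidate strategy can be evaluated over many steps, including delayed effects of earlier valve actions, without touching the real unit.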

The embodiment of the invention solves the technical problems of easy over-regulation and unstable system caused by the delay of the system when the temperature-reducing water valve is controlled by adopting a PID control strategy in the prior art, realizes the technical effect of automatic and intelligent control of the temperature-reducing water valve on the basis of considering the delay of the system, and avoids the harm caused by over-regulation.

On the basis of the above technical solutions of the present invention, fig. 2 is a flowchart of another method for optimizing a temperature-reducing water valve control strategy provided in an embodiment of the present disclosure, and as shown in fig. 2, the step S104 specifically includes:

S1041, inputting the output value $s_m$ of the m-th optimization interaction into the desuperheating water valve control strategy model as an input value to obtain an output value $a_m$, wherein $s_m$ represents the unit state of the thermal generator set at time m, $a_m$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_m$, and m is less than or equal to the preset number of optimization iterations.

Specifically, taking one optimization interaction as an example, suppose that after the m-th optimization interaction the output value of the unit state transition strategy model is $s_m$. $s_m$ is input into the desuperheating water valve control strategy model as an input value, which starts the (m+1)-th optimization interaction, and the desuperheating water valve control strategy model outputs the valve opening $a_m$ of the desuperheating water valve for the unit state $s_m$.

S1042, constructing the input value $s_m$ and the output value $a_m$ into a state vector $[s_m, a_m]$.

Specifically, during the (m+1)-th optimization interaction, the input value $s_m$ and the output value $a_m$ of the desuperheating water valve control strategy model are constructed into a state vector $[s_m, a_m]$, which is used as the input of the unit state transition strategy model in the (m+1)-th optimization interaction.

S1043, inputting the state vector $[s_m, a_m]$ into the unit state transition strategy model to obtain the output value $s_{m+1}$ of the (m+1)-th optimization interaction, wherein $s_{m+1}$ represents the unit state of the thermal generator set at time m+1.

Specifically, the constructed state vector $[s_m, a_m]$ is input into the unit state transition strategy model to obtain the output value $s_{m+1}$ of the (m+1)-th optimization interaction; the output value $s_{m+1}$ can then be used as the input value of the desuperheating water valve control strategy model in the (m+2)-th optimization interaction.

It should be noted that the output value $s_{m+1}$ includes the superheated steam temperature value. The superheated steam temperature value $T_{\text{steam}}$ is obtained from the output value $s_{m+1}$, the difference between $T_{\text{steam}}$ and the preset target temperature value $T_{\text{preset}}$ is computed, and the absolute value of the difference is negated to obtain $-\lvert T_{\text{steam}} - T_{\text{preset}} \rvert$, which is taken as the reward of the optimization interaction. The optimization target of the optimization interaction is to maximize this reward.
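The reward described above translates directly into code. The sketch below is illustrative: the function names are hypothetical, and summing the per-step rewards over a trajectory is an assumption, since the text defines only the reward of a single optimization interaction.

```python
from typing import Iterable


def reward(superheated_steam_temp: float, preset_target_temp: float) -> float:
    """Negated absolute deviation of the steam temperature from the preset target."""
    return -abs(superheated_steam_temp - preset_target_temp)


def total_reward(steam_temps: Iterable[float], preset_target_temp: float) -> float:
    """Sum of per-step rewards over a trajectory (aggregation is an assumption)."""
    return sum(reward(t, preset_target_temp) for t in steam_temps)
```

The reward is never positive and equals zero only when the steam temperature matches the target exactly, so maximizing it drives the temperature toward the target from either side.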

S1044, obtaining the optimized desuperheating water valve control strategy model after the optimization interaction of the preset number of optimization iterations is completed.

Specifically, the optimized desuperheating water valve control strategy model is obtained through optimizing interaction of the preset optimization iteration times of the desuperheating water valve control strategy model and the unit state transfer strategy model, and then the desuperheating water valve can be controlled to work according to the optimized desuperheating water valve control strategy model.

On the basis of the above technical solutions of the present invention, fig. 3 is a flowchart of a further optimization method for a temperature-reducing water valve control strategy provided in the embodiment of the present disclosure, as shown in fig. 3, in a process of constructing a simulation environment model, the step S103 specifically includes:

S1031, constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, and constructing a discriminator model through a neural network, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model.

Specifically, in the process of building the simulation environment model, in order to enable the built simulation environment model to be more fit with the real operating environment of the thermal generator set, a discriminator model is built through a neural network, and the discriminator model is used for assisting in optimizing the simulation environment model.

S1032, inputting the preset given unit state into the simulated environment model to generate a simulated state-action sequence data set in the simulated environment.

Illustratively, the given unit state is preset to $s_t$. The preset given unit state $s_t$ is input into the simulated environment model, and a simulated state-action sequence data set $D'_{sub}$ generated in the simulated environment model is obtained through the desuperheating water valve control strategy model and the unit state transition strategy model, where $D'_{sub} = [s_t, a'_t, s'_{t+1}, a'_{t+1}, s'_{t+2}, \ldots, s'_{t+N-1}, a'_{t+N-1}, s'_{t+N}]$.

S1033, inputting the simulated state-action sequence data set $D'_{sub}$ and the state-action sequence data set $D_{sub}$ of the thermal generator set operating in the real environment into the discriminator model to generate the confidence of the simulated environment model.

Specifically, the state-action sequence data set $D_{sub}$ of the thermal generator set operating in the real environment from the unit state $s_t$ is acquired, where $D_{sub} = [s_t, a_t, s_{t+1}, a_{t+1}, s_{t+2}, \ldots, s_{t+N-1}, a_{t+N-1}, s_{t+N}]$. The simulated state-action sequence data set $D'_{sub}$ and the real-environment state-action sequence data set $D_{sub}$ are input into the discriminator model, which verifies whether the simulated environment model is similar to the real environment and generates the confidence of the simulated environment model.
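One way to picture the discriminator step is the Python sketch below. It rests on several assumptions that the text does not confirm: each transition is flattened into a small feature vector (here, temperature change over one step and valve opening), a single-layer logistic model stands in for the neural-network discriminator, and the confidence is taken as the mean discriminator score on the simulated samples. All names and data values are hypothetical.

```python
import math
from typing import List, Sequence

Features = Sequence[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Single-layer stand-in for the neural-network discriminator (assumption)."""

    def __init__(self, num_features: int) -> None:
        self.weights = [0.0] * num_features
        self.bias = 0.0

    def score(self, features: Features) -> float:
        """Estimated probability that a sample came from the real unit."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def fit(
        self,
        real: List[Features],
        simulated: List[Features],
        epochs: int = 200,
        learning_rate: float = 0.1,
    ) -> None:
        """Stochastic gradient descent on log loss: real -> 1, simulated -> 0."""
        samples = [(f, 1.0) for f in real] + [(f, 0.0) for f in simulated]
        for _ in range(epochs):
            for features, label in samples:
                error = self.score(features) - label
                self.weights = [
                    w - learning_rate * error * x
                    for w, x in zip(self.weights, features)
                ]
                self.bias -= learning_rate * error


def model_confidence(disc: LogisticDiscriminator, simulated: List[Features]) -> float:
    """Mean score on simulated samples; higher means harder to tell from real data."""
    return sum(disc.score(f) for f in simulated) / len(simulated)


# Made-up features: (temperature change over one step, valve opening).
real_samples = [(-1.0, 0.5), (-0.8, 0.4), (-1.2, 0.6)]
poor_simulation = [(1.0, 0.5), (0.8, 0.4), (1.2, 0.6)]  # wrong sign of the effect

discriminator = LogisticDiscriminator(num_features=2)
discriminator.fit(real_samples, poor_simulation)
```

In this toy data the simulated environment heats the steam when the valve opens, which the discriminator learns to flag, so the confidence of the poor simulation ends up low.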

S1034, updating the network parameters of the desuperheating water valve control strategy model by using the generated confidence as an optimization target.

Specifically, based on the generated confidence, the discriminator model updates the network parameters of the desuperheating water valve control strategy model using a supervised learning algorithm, with maximizing the confidence as the optimization target.

S1035, repeating the action of updating the network parameters of the desuperheating water valve control strategy model for a preset maximum number of sampling iterations to obtain the optimized simulated environment model.

Specifically, steps S1032, S1033, and S1034 are repeated for the preset maximum number of sampling iterations, so that whether the simulated environment model is similar to the real environment is verified continuously and the network parameters of the desuperheating water valve control strategy model are updated according to the verification result (i.e., the confidence). A desuperheating water valve control strategy model that fits the real environment more closely is finally obtained, that is, the optimized simulated environment model is obtained.
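The repeat-until-budget structure of steps S1032 to S1034 can be sketched as below. This is a deliberately simplified toy, not the method of the text: the control strategy model is reduced to a single hypothetical gain parameter, the confidence is a closed-form similarity score rather than a trained discriminator's output, and the update uses finite-difference gradient ascent in place of the supervised learning algorithm mentioned above.

```python
import math
from typing import List


def simulate_actions(gain: float, temps: List[float], setpoint: float) -> List[float]:
    """S1032 stand-in: valve openings produced by a one-parameter strategy model."""
    return [gain * (t - setpoint) for t in temps]


def confidence(simulated: List[float], real: List[float]) -> float:
    """S1033 stand-in: equals 1.0 when simulated actions match the real ones exactly."""
    mse = sum((x - y) ** 2 for x, y in zip(simulated, real)) / len(real)
    return math.exp(-mse)


def fit_gain(
    temps: List[float],
    real_actions: List[float],
    setpoint: float,
    max_sampling_iterations: int,
    learning_rate: float = 0.01,
    eps: float = 1e-4,
) -> float:
    """Repeat sample / score / update for a preset maximum number of iterations."""
    gain = 0.0
    for _ in range(max_sampling_iterations):
        up = confidence(simulate_actions(gain + eps, temps, setpoint), real_actions)
        down = confidence(simulate_actions(gain - eps, temps, setpoint), real_actions)
        gradient = (up - down) / (2.0 * eps)  # central finite difference
        gain += learning_rate * gradient      # S1034 stand-in: ascend the confidence
    return gain


# Made-up logged data generated by a "real" controller with gain 0.1.
temps = [545.0, 543.0, 542.0, 541.0]
real_actions = [0.1 * (t - 540.0) for t in temps]
learned_gain = fit_gain(temps, real_actions, setpoint=540.0, max_sampling_iterations=200)
```

With these toy values the learned gain approaches the gain that generated the logged actions, which mirrors the stated goal of making the simulated strategy model fit the behavior observed in the real environment.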

On the basis of the above technical solutions of the present invention, fig. 4 is a flowchart of a further method for optimizing a temperature-reduced water valve control strategy provided in an embodiment of the present disclosure, as shown in fig. 4, the step S102 specifically includes:

S1021, constructing the state-action sequence data set of the desuperheating water valve based on the original action data and the formula $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$, wherein D represents the state-action sequence data set, $s_T$ represents the unit state of the thermal generator set at time T, $a_T$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_T$, and T is an integer greater than or equal to 0.

Specifically, the original action data comprises instrument data recorded by each instrument during operation of the thermal generator set and control parameters of the desuperheating water valve. The state $s_T$ of the thermal generator set can be obtained from the instrument data, and the control parameters of the desuperheating water valve include the valve opening $a_T$. The instrument data and the control parameters are constructed into the state-action sequence data set $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$. When T = 0, $s_0$ represents the state of the thermal generator set obtained from the instrument data recorded by each instrument at the initial moment.

On the basis of the above technical solutions of the present invention, fig. 5 is a flowchart of a further method for optimizing a temperature-reduced water valve control strategy provided in an embodiment of the present disclosure, as shown in fig. 5, after S104, the method further includes:

S105, controlling the operation of the desuperheating water valve based on the optimized desuperheating water valve control strategy model.

Specifically, after the optimized desuperheating water valve control strategy model is obtained, the desuperheating water valve is controlled to work according to the optimized desuperheating water valve control strategy model, so that the over-regulation of the desuperheating water valve is avoided, and the superheated steam is stabilized within a preset target temperature range.

In the embodiment of the invention, the optimal desuperheating water valve control strategy model is obtained in the simulated environment model by setting the optimization target and utilizing the reinforcement learning algorithm to carry out interaction between the desuperheating water valve control strategy model and the unit state transfer strategy model in the simulated environment model. The embodiment of the invention considers the system time delay of the temperature-reducing water valve control, avoids the over-adjustment of the temperature-reducing water valve, enables the superheated steam to be stabilized within the preset target temperature range, simultaneously does not need to manually adjust the parameters of the controller, and can realize fully intelligent automatic learning and iterative optimization.

Fig. 6 is a block diagram of an optimization apparatus for a desuperheating water valve control strategy according to an embodiment of the present invention. As shown in Fig. 6, the optimization apparatus includes:

a data acquisition unit 61, configured to acquire original action data of the desuperheating water valve;

a data construction unit 62, configured to construct a state-action sequence data set of the desuperheating water valve based on the original action data;

a model construction unit 63, configured to construct a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model;

and a model optimization unit 64, configured to perform optimization interaction between the desuperheating water valve control strategy model and the unit state transition strategy model to obtain an optimized desuperheating water valve control strategy model, wherein the number of optimization interactions is a preset number of optimization iterations, and the optimization target of the optimization interaction is that the temperature of the superheated steam regulated by the desuperheating water valve is within a preset target temperature range.

Optionally, when the model optimization unit 64 performs the optimization interaction for the preset number of optimization iterations, a given initialization state is used as the input value of the 1st optimization interaction, and the output value of the n-th optimization interaction is used as the input value of the (n+1)-th optimization interaction, where n is greater than or equal to 1.

Optionally, the model optimization unit 64 is specifically configured to:

inputting the output value $s_m$ of the $m$-th optimization interaction, as an input value, into the desuperheating water valve control strategy model to obtain an output value $a_m$, wherein $s_m$ represents the unit state of the thermal generator set at time $m$, and $a_m$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_m$;

constructing the input value $s_m$ and the output value $a_m$ into a state vector $[s_m, a_m]$;

inputting the state vector $[s_m, a_m]$ into the unit state transition strategy model to obtain the output value $s_{m+1}$ of the $(m+1)$-th optimization interaction, wherein $s_{m+1}$ represents the unit state of the thermal generator set at time $m+1$.
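The chained interaction between the two models can be sketched as a simple loop. The example below is an illustrative assumption, not the embodiment's implementation: `toy_policy` and `toy_transition` are hypothetical hand-written stand-ins for the learned control strategy model and unit state transition strategy model, and the unit state is reduced to a single steam temperature value.

```python
from typing import Callable, List, Tuple

State = List[float]
Step = Tuple[State, float, State]  # (s_m, a_m, s_{m+1})


def optimization_rollout(
    policy: Callable[[State], float],
    transition: Callable[[List[float]], State],
    s_init: State,
    n_iterations: int,
) -> List[Step]:
    """Chain the two models: s_m -> a_m -> [s_m, a_m] -> s_{m+1}."""
    trajectory: List[Step] = []
    s = list(s_init)                   # given initialization state
    for _ in range(n_iterations):
        a = policy(s)                  # valve opening for the current state
        s_next = transition(s + [a])   # state vector [s_m, a_m] -> s_{m+1}
        trajectory.append((s, a, s_next))
        s = s_next                     # output of this interaction feeds the next
    return trajectory


# Hypothetical stand-ins for the two learned models (assumptions only).
def toy_policy(s: State) -> float:
    # Open the valve in proportion to the excess over 540 degrees, clipped to [0, 1].
    return min(1.0, max(0.0, 0.05 * (s[0] - 540.0)))


def toy_transition(sa: List[float]) -> State:
    temp, opening = sa
    # Spray water cools the steam; the constant +1.0 models heat input.
    return [temp - 10.0 * opening + 1.0]


trajectory = optimization_rollout(toy_policy, toy_transition, [545.0], 3)
```

Each element of `trajectory` holds the input state, the valve opening, and the predicted next state of one optimization interaction, so the next state of one step is the input state of the following step.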

Optionally, the model building unit 63 includes:

a model construction subunit, configured to construct a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, and to construct a discriminator model through a neural network, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model;

the data set generating subunit is used for inputting the preset given unit state into the simulation environment model and generating a simulation state action sequence data set in the simulation environment;

the confidence coefficient generation subunit is used for inputting the simulation state action sequence data set and the state-action sequence data set of the thermal generator set in the real environment operation into a discriminator model to generate the confidence coefficient of the simulation environment model;

the parameter updating subunit is used for updating the network parameters of the temperature-reducing water valve control strategy model by using the generated confidence coefficient as an optimization target;

and an iteration execution subunit, configured to repeat the update of the network parameters of the desuperheating water valve control strategy model for a preset maximum number of sampling iterations to obtain the optimized simulated environment model.
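The cooperation of the subunits above can be illustrated by the following loop. It is a heavily simplified sketch under stated assumptions: the neural-network discriminator is replaced by a fixed hand-written scoring function `confidence`, the control strategy model is reduced to a single hypothetical parameter `theta`, and the parameter update is a plain hill-climbing step rather than a gradient-based network update. None of these names or choices come from the source.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (unit state s, valve opening a), scalars for brevity


def simulate(theta: float, states: List[float]) -> List[Pair]:
    """Toy control strategy a = theta * s, used to generate simulated pairs."""
    return [(s, theta * s) for s in states]


def confidence(sim: List[Pair], real: List[Pair]) -> float:
    """Stand-in discriminator: a score in (0, 1], higher when the simulated
    pairs resemble the real pairs. The source uses a neural network here."""
    err = sum((a_sim - a_real) ** 2
              for (_, a_sim), (_, a_real) in zip(sim, real)) / len(real)
    return math.exp(-err)


def fit_policy(real: List[Pair], theta: float,
               max_iters: int, step: float = 0.05) -> float:
    """Repeat the confidence-driven update for a preset number of iterations."""
    states = [s for s, _ in real]
    for _ in range(max_iters):
        best = confidence(simulate(theta, states), real)
        for cand in (theta - step, theta + step):
            c = confidence(simulate(cand, states), real)
            if c > best:               # keep the candidate with higher confidence
                theta, best = cand, c
    return theta


real_data = [(1.0, 0.5), (2.0, 1.0)]   # hypothetical recorded pairs, a = 0.5 * s
theta_fit = fit_policy(real_data, theta=0.0, max_iters=20)
```

In this toy setting the fitted parameter moves toward the value that generated the recorded pairs, mirroring how the confidence serves as the optimization target for the control strategy model.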

Optionally, the data constructing unit 62 is specifically configured to:

based on the original action data and the formula $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$, constructing a state-action sequence data set of the desuperheating water valve, wherein $D$ represents the state-action sequence data set, $s_T$ represents the unit state of the thermal generator set at time $T$, $a_T$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_T$, and $T$ is an integer greater than or equal to 0.
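The interleaved layout of $D$ can be illustrated with a short sketch. The function name `build_state_action_sequence` and the representation of a unit state as a tuple of floats are assumptions made for this example; the source does not fix a data representation.

```python
from typing import List, Sequence, Tuple, Union

State = Tuple[float, ...]   # unit state s_t; the fields are assumed for illustration
Action = float              # valve opening a_t


def build_state_action_sequence(
    states: Sequence[State], actions: Sequence[Action]
) -> List[Union[State, Action]]:
    """Interleave states and actions as D = [s_0, a_0, ..., s_{T-1}, a_{T-1}, s_T]."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected T + 1 states and T actions")
    dataset: List[Union[State, Action]] = []
    for s, a in zip(states, actions):  # zip stops after the T actions
        dataset.append(s)
        dataset.append(a)
    dataset.append(states[-1])         # the sequence ends with the final state s_T
    return dataset


# Hypothetical records: steam temperature only, with two valve openings.
D = build_state_action_sequence([(540.0,), (538.5,), (537.0,)], [0.30, 0.25])
```

For $T$ recorded actions the resulting list has $2T + 1$ entries, alternating states and valve openings and ending with the state $s_T$.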

Optionally, the apparatus further includes, for use after the model optimization unit 64 obtains the optimized desuperheating water valve control strategy model:

a desuperheating water valve control unit, configured to control the operation of the desuperheating water valve based on the optimized desuperheating water valve control strategy model.

The apparatus provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments. For brevity, where the apparatus embodiments do not mention a detail, reference may be made to the corresponding content in the method embodiments.

The optimization method of the temperature-reducing water valve control strategy provided by the embodiment of the invention has the same technical characteristics as the optimization device of the temperature-reducing water valve control strategy provided by the embodiment, so that the same technical problems can be solved, and the same technical effect can be achieved.

The desuperheating water valve provided by the embodiment of the invention comprises an optimization device of a desuperheating water valve control strategy in the embodiment, so that the desuperheating water valve provided by the embodiment of the invention also has the beneficial effects described in the embodiment, and details are not repeated here.

Fig. 7 is a schematic structural diagram of an optimization apparatus for a desuperheating water valve control strategy according to an embodiment of the present invention. As shown in Fig. 7, the optimization apparatus includes a processor 71, a memory 72, an input device 73 and an output device 74. The optimization apparatus may include one or more processors 71, and one processor 71 is taken as an example in Fig. 7. The processor 71, the memory 72, the input device 73 and the output device 74 of the optimization apparatus may be connected by a bus or in other ways, and connection by a bus is taken as an example in Fig. 7.

The memory 72 is a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the optimization method of the desuperheating water valve control strategy in the embodiment of the present invention (for example, the data acquisition unit 61, the data construction unit 62, the model construction unit 63, and the model optimization unit 64 in the optimization apparatus of the desuperheating water valve control strategy). The processor 71 executes the various functional applications and data processing of the optimization apparatus by running the software programs, instructions and modules stored in the memory 72, that is, it implements the optimization method of the desuperheating water valve control strategy described above.

The memory 72 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 72 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 72 may further include memory located remotely from the processor 71, and such remote memory may be connected to the optimization apparatus of the desuperheating water valve control strategy via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 73 may be used to receive entered numerical or character information and generate key signal inputs related to user settings and function controls of the optimization apparatus of the desuperheating water valve control strategy. The output device 74 may include a display device such as a display screen.

Embodiments of the present invention also provide a storage medium containing computer-executable instructions that, when executed by a computer processor, perform a method of optimizing a desuperheating water valve control strategy.

Specifically, the optimization method of the temperature-reducing water valve control strategy comprises the following steps:

acquiring original action data of a temperature-reducing water valve, wherein the original action data comprises instrument data in the operation process of a thermal generator set and control parameters of the temperature-reducing water valve;

constructing a state-action sequence data set of the temperature-reducing water valve based on the original action data;

constructing a simulation environment model of the thermal generator set through a reinforcement learning algorithm based on a state-action sequence data set, wherein the simulation environment model comprises a desuperheating water valve control strategy model and a unit state transfer strategy model;

and performing optimization interaction, for a preset number of optimization iterations, between the desuperheating water valve control strategy model and the unit state transition strategy model to obtain an optimized desuperheating water valve control strategy model, wherein the optimization target of the optimization interaction is that the desuperheating water valve regulates the current temperature of the superheated steam to within a preset target temperature range.

Of course, the embodiments of the present invention provide a storage medium containing computer-executable instructions, which are not limited to the operations of the method described above, but can also perform related operations in the optimization method of the temperature-reduced water valve control strategy provided by any embodiment of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the above embodiment of the optimization apparatus, the included units and modules are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from each other, and are not used to limit the protection scope of the present invention.

In the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical or an electrical connection; a direct connection, an indirect connection through intervening media, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.

Finally, it should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and the technical principles applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method of optimizing a desuperheating water valve control strategy, comprising:

the constructing a state-action sequence dataset for the desuperheating water valve based on the raw action data comprises:

based on the raw action data and the formula $D = [s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}, s_T]$, constructing a state-action sequence dataset for the desuperheating water valve, wherein $D$ represents the state-action sequence dataset, $s_T$ represents the unit state of the thermal generator set at time $T$, $a_T$ represents the valve opening of the desuperheating water valve when the unit state of the thermal generator set is $s_T$, and $T$ is an integer greater than or equal to 0;

constructing a simulated environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulated environment model comprises a desuperheating water valve control strategy model and a unit state transition strategy model, and the unit state transition strategy model is used for predicting to obtain the unit operation state at the next moment according to the unit operation state of the thermal generator set at the current moment and the valve opening of a desuperheating water valve;

performing optimization interaction, for a preset number of optimization iterations, between the desuperheating water valve control strategy model and the unit state transition strategy model to obtain the optimized desuperheating water valve control strategy model, wherein the optimization target of the optimization interaction is that the current temperature of the superheated steam regulated by the desuperheating water valve is within a preset target temperature range, and the optimization interaction is realized through the following process: inputting a set initialization state, as the input value of the 1st optimization interaction, into the desuperheating water valve control strategy model; inputting the current unit state and the control behavior output by the desuperheating water valve control strategy model into the unit state transition strategy model to obtain an output value of the unit state transition strategy model, namely the unit state at the next moment; and, from the 2nd optimization interaction onward, taking each output value of the unit state transition strategy model as the input value of the desuperheating water valve control strategy model in the next optimization interaction.

2. The optimization method according to claim 1, wherein during the optimization interaction for the preset number of optimization iterations, a given initialization state is used as the input value of the 1st optimization interaction, the output value of the n-th optimization interaction is used as the input value of the (n+1)-th optimization interaction, and n is greater than or equal to 1.

3. The optimization method according to claim 2, wherein the optimization interaction between the desuperheating water valve control strategy model and the unit state transition strategy model comprises:

4. The optimization method according to claim 1, wherein in constructing the simulated environment model, constructing the simulated environment model of the thermal power plant through a reinforcement learning algorithm based on the state-action sequence dataset comprises:

5. The optimization method according to claim 1, further comprising, after obtaining the optimized desuperheating water valve control strategy model:

6. An apparatus for optimizing a desuperheating water valve control strategy, comprising:

a data acquisition unit, configured to acquire original action data of the desuperheating water valve, wherein the original action data comprises instrument data during operation of a thermal generator set and control parameters of the desuperheating water valve;

the model construction unit is used for constructing a simulation environment model of the thermal generator set through a reinforcement learning algorithm based on the state-action sequence data set, wherein the simulation environment model comprises a temperature reduction water valve control strategy model and a unit state transition strategy model, and the unit state transition strategy model is used for predicting the unit operation state at the next moment according to the unit operation state at the current moment of the thermal generator set and the valve opening of a temperature reduction water valve;

a model optimization unit, configured to perform optimization interaction between the desuperheating water valve control strategy model and the unit state transition strategy model to obtain the optimized desuperheating water valve control strategy model, wherein the number of optimization interactions is a preset number of optimization iterations, the optimization target of the optimization interaction is that the current temperature of the superheated steam regulated by the desuperheating water valve is within a preset target temperature range, and the optimization interaction is realized through the following process: inputting a set initialization state, as the input value of the 1st optimization interaction, into the desuperheating water valve control strategy model; inputting the current unit state and the control behavior output by the desuperheating water valve control strategy model into the unit state transition strategy model to obtain an output value of the unit state transition strategy model, namely the unit state at the next moment; and, from the 2nd optimization interaction onward, taking each output value of the unit state transition strategy model as the input value of the desuperheating water valve control strategy model in the next optimization interaction.

7. A desuperheating water valve comprising means for optimizing a desuperheating water valve control strategy as claimed in claim 6.

8. An optimization apparatus of a desuperheating water valve control strategy, comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of optimizing a desuperheating water valve control strategy as claimed in any one of claims 1 to 5.

9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out a method of optimizing a desuperheating water valve control strategy as defined in any one of claims 1-5.