CN115933380A

CN115933380A - Training method of strategy prediction model of power generation process and related equipment

Info

Publication number: CN115933380A
Application number: CN202211436526.5A
Authority: CN
Inventors: 魏庆来; 高爱国; 尚勇; 那士博; 程相; 宋睿卓
Original assignee: State Grid Corp of China SGCC; North China Electric Power Research Institute Co Ltd; Institute of Automation of Chinese Academy of Science
Current assignee: State Grid Corp of China SGCC; North China Electric Power Research Institute Co Ltd; Institute of Automation of Chinese Academy of Science
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-04-07

Abstract

The invention provides a training method and related equipment of a strategy prediction model in a power generation process, wherein the method comprises the following steps: initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network; acquiring historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model; training the intermediate strategy prediction model based on historical power generation data to obtain an optimized evaluation network, a control network and an interference network; and determining whether the trained strategy prediction model is converged, and obtaining the strategy prediction model when convergence is determined. Through mutual limitation among networks, the influence of control and interference factors on power generation is introduced, and the accuracy of power generation strategy prediction is improved.

Description

Training method of strategy prediction model of power generation process and related equipment

Technical Field

The invention relates to the technical field of optimal control of a power generation process, in particular to a training method and device of a strategy prediction model of the power generation process, electronic equipment and a storage medium.

Background

Power generation technology continues to develop, but thermal power generation remains the most common means of generating power in industry. To generate power more safely and reasonably, automatic control technology plays a major role in the thermal power generation process, improving the safety of power generation and the reasonable distribution of power resources to a certain extent.

However, conventional automatic control technology controls parameters relatively coarsely. For example, the adjustment and setting modes of parameters are preset, which ignores the influence that changes in the environment and the power generation scenario have on power generation. Such preset modes limit the power generation benefit and, when the environment changes, introduce potential safety hazards into the power generation process.

Disclosure of Invention

The invention provides a training method, a training device, electronic equipment and a storage medium of a strategy prediction model in a power generation process, which are used for solving the defect of rough parameter control in the power generation process in the prior art, realizing dynamic adjustment of a power generation strategy, improving the operation safety and the power generation efficiency of power generation equipment and having better power generation control effect.

The invention provides a training method of a strategy prediction model in a power generation process, which comprises the following steps:

initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network;

obtaining historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model;

training the intermediate strategy prediction model based on historical power generation data to obtain the optimized evaluation network, the optimized control network and the optimized interference network;

and determining whether the trained strategy prediction model is converged or not, and obtaining the strategy prediction model when convergence is determined.

According to the training method of the strategy prediction model in the power generation process, provided by the invention, the model network is subjected to weight optimization according to the historical power generation data to obtain the optimized intermediate strategy prediction model, and the training method comprises the following steps:

based on time sequence, acquiring state information corresponding to each moment in the historical power generation data, and grouping and associating the state information to obtain a plurality of groups of state groups, wherein each state group comprises two pieces of state information at adjacent moments;

taking the plurality of groups of state groups as a training set of the model network, and training the model network;

and when the training is determined to be completed, obtaining an intermediate strategy prediction model based on the model network after the weight optimization.

According to the training method of the strategy prediction model of the power generation process provided by the invention, the training of the model network by taking the plurality of groups of state groups as the training set of the model network comprises the following steps:

dividing state information in the plurality of groups of state groups into input states and output states based on a time sequence;

inputting the input state into the model network to obtain a prediction state corresponding to the input state;

and calculating a state error according to the predicted state and the output state, and determining that the training is finished when the state error is smaller than a preset error value.

According to the training method of the strategy prediction model in the power generation process, the training of the intermediate strategy prediction model is performed based on the historical power generation data to obtain the optimized evaluation network, the optimized control network and the optimized interference network, and the training method comprises the following steps:

inputting the historical power generation data, based on time labels, into the control network, the interference network and a first evaluation network of the evaluation network respectively, so as to obtain a control output, an interference output and a first evaluation output respectively;

inputting the control output, the interference output and the historical power generation data into the optimized model network, and outputting to obtain model output;

inputting the model output into a second evaluation network of the evaluation network, and outputting to obtain a second evaluation output;

calculating to obtain an evaluation loss value of the evaluation network according to the first evaluation output and the second evaluation output, and obtaining a control loss value corresponding to the control network and an interference loss value corresponding to the interference network according to an optimal value function;

and obtaining the optimized evaluation network, the optimized control network and the optimized interference network according to the evaluation loss value, the control loss value and the interference loss value.

According to the training method of the strategy prediction model of the power generation process, the control loss value corresponding to the control network and the interference loss value corresponding to the interference network are obtained according to the optimal value function, and the training method comprises the following steps:

performing derivation on the control network according to an optimal value function to obtain a first derivation result, and calculating a control loss value corresponding to the control network according to the first derivation result and the control output;

and performing derivation on the interference network according to the optimal value function to obtain a second derivation result, and calculating an interference loss value corresponding to the interference network according to the second derivation result and the interference output.

According to the training method of the strategy prediction model of the power generation process, obtaining the optimized evaluation network, the optimized control network and the optimized interference network according to the evaluation loss value, the control loss value and the interference loss value comprises the following steps:

comparing the evaluation loss value with a loss threshold value, determining whether the evaluation network, the control network and the interference network are optimized and completed, and obtaining the optimized evaluation network, the optimized control network and the optimized interference network when determining that the optimization is completed;

wherein the determining whether optimization is complete comprises:

if the evaluation loss value is less than or equal to the loss threshold value, determining that the optimization is completed;

and if the evaluation loss value is larger than the loss threshold value, determining that the optimization is not completed.

According to the training method of the strategy prediction model of the power generation process, the method further comprises the following steps:

receiving an input initial state in response to a power generation prediction instruction;

obtaining a control strategy and an interference strategy in the initial state based on the trained strategy prediction model;

and controlling and adjusting power generation based on the control strategy and the interference strategy.

The invention also provides a training device of a strategy prediction model of a power generation process, which comprises the following components:

the initial adjusting module is used for initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network;

the first optimization module is used for acquiring historical power generation data and carrying out weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model;

the second optimization module is used for training the intermediate strategy prediction model based on historical power generation data to obtain the optimized evaluation network, the optimized control network and the optimized interference network;

and the convergence judging module is used for determining whether the trained strategy prediction model converges and obtaining the strategy prediction model when convergence is determined.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the training method of the strategy prediction model of the power generation process.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of training a strategic prediction model of a power generation process as described in any of the above.

The invention provides a training method, a device, electronic equipment and a storage medium of a strategy prediction model in a power generation process. A corresponding power generation strategy prediction model is constructed based on a power generation system, and a plurality of different networks are set. After the weights of each network are initialized, the weights of the model network, which outputs the next state, are trained and optimized first; training the model network separately reduces the influence of the other networks on it and improves its robustness and prediction accuracy. When this optimization training is completed, the weights of the remaining networks are optimized by taking the strategy prediction model as a whole, and the mutual limitation among the control network, the interference network and the evaluation network improves the robustness and training efficiency of the model. Finally, a convergence judgment ensures that the model meets the use requirement. By introducing control and interference factors, a better overall power generation control effect is achieved, the stable operation of equipment is guaranteed, and the accuracy of power generation strategy prediction is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for training a strategic prediction model of a power generation process provided by the present invention;

FIG. 2 is a schematic diagram of a strategy prediction model provided by the present invention;

FIG. 3 is a flow diagram of a process for deriving an intermediate policy prediction model provided by the present invention;

FIG. 4 is a flow chart illustrating a process for training a model network provided by the present invention;

FIG. 5 is a flow chart illustrating a process for training an intermediate strategy prediction model provided by the present invention;

FIG. 6 is a schematic structural diagram of a training apparatus for a strategy prediction model of a power generation process provided by the present invention;

FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The training method of the strategy prediction model of the power generation process of the present invention is described below with reference to fig. 1 to 5.

FIG. 1 is a flow chart of a method for training a strategy prediction model of a power generation process according to the present invention. As shown in fig. 1, the method includes:

step 101, initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network.

In order to adjust power generation in a timely and accurate manner during use, an adjustment strategy needs to be determined in advance, and power generation is then adjusted according to the set adjustment strategy, the actual situation and the power generation state. Specifically, when the adjustment strategy of power generation is determined, the adjustment strategy in the power generation system is optimized through adaptive dynamic programming of the power generation system. The power generation system can be modeled, and in order to predict the power generation strategy, a corresponding strategy prediction model can be constructed and initialized, and then optimized and trained for subsequent use.

The constructed strategy prediction model is shown in fig. 2, which is a schematic structural diagram of the strategy prediction model provided by the present invention. The strategy prediction model includes a control network, an interference network, a model network and evaluation networks, where the number of evaluation networks is set to two. In the strategy prediction model, the control network and the interference network respectively supply a reasonable control quantity and interference quantity to the power generation strategy, so that the obtained power generation strategy better suits actual application scenarios and use conditions.

The network structure of each network in the constructed strategy prediction model can be set and adjusted according to actual requirements, for example by setting different network structures for different networks. In an embodiment, for example, the network structure of the model network may be set to 8-17-4, where 8 is the number of nodes in the input layer, 17 is the number of nodes in the hidden layer, and 4 is the number of nodes in the output layer; the network structures of the control network and the interference network may be set to 4-8-2, where 4 is the number of nodes in the input layer, 8 is the number of nodes in the hidden layer, and 2 is the number of nodes in the output layer; and the network structure of the evaluation network may be set to 4-8-1, where 4 is the number of nodes in the input layer, 8 is the number of nodes in the hidden layer, and 1 is the number of nodes in the output layer.

Meanwhile, according to the use requirements of different networks, corresponding neural networks can be adopted when each network is constructed, such as model networks, control networks, interference networks and/or evaluation networks established based on BP neural networks.
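The example structures above can be sketched in code. The following is a minimal illustration, not the patented implementation: it assumes plain fully connected networks with tanh hidden layers and a linear output layer, and it assumes that all four networks (not only the model network) start from weights drawn at random from roughly the range (-1, 1). The helper names `init_network` and `forward` are hypothetical.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Create (weight, bias) pairs for a fully connected network.

    Values are drawn uniformly from the half-open interval [-1, 1),
    an assumed stand-in for random initialization within (-1, 1).
    """
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weight = rng.uniform(-1.0, 1.0, size=(n_in, n_out))
        bias = rng.uniform(-1.0, 1.0, size=n_out)
        params.append((weight, bias))
    return params

def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output (assumed activations)."""
    h = np.asarray(x, dtype=float)
    for weight, bias in params[:-1]:
        h = np.tanh(h @ weight + bias)
    weight, bias = params[-1]
    return h @ weight + bias

rng = np.random.default_rng(0)
networks = {
    "model": init_network([8, 17, 4], rng),         # 8-17-4
    "control": init_network([4, 8, 2], rng),        # 4-8-2
    "interference": init_network([4, 8, 2], rng),   # 4-8-2
    "evaluation": init_network([4, 8, 1], rng),     # 4-8-1
}
```

Keeping each network as a list of `(weight, bias)` pairs makes it easy to give different networks different structures, as the description allows.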

It should be noted that, for the strategy prediction model, optimization training needs to be performed on all networks during training, but after the optimization training is finished, when strategy prediction is performed, it is only necessary to obtain a control quantity and an interference quantity in an actual input state according to the state, and then strategy adjustment is performed according to the obtained control quantity and interference quantity.

When the strategy prediction model of the power generation system is constructed, system characteristics can be obtained based on the constructed strategy prediction model, such as the system satisfying a specific change rule, including but not limited to a linear condition or a nonlinear condition. Here, taking the constructed power generation system as a discrete-time random linear time-invariant system as an example, the system at this time is:

where $x_k \in \mathbb{R}^n$ and $x_{k+1} \in \mathbb{R}^n$ are states of the system, with $x_k$ the previous state and $x_{k+1}$ the next adjacent state, and $u_k \in \mathbb{R}^p$ and $v_k \in \mathbb{R}^q$ are the control quantity and the disturbance quantity, respectively,

is a random system matrix with appropriate dimensions:

In the above equation, a deterministic real matrix $A \in \mathbb{R}^{n \times n}$, a control input matrix $B \in \mathbb{R}^{n \times p}$, an interference input matrix $C \in \mathbb{R}^{n \times q}$ and random interference input sequences are given.

Their mean value may be set to 0 and their variance to 1. The three random interference input sequences are mutually independent and uncorrelated.

In the power generation system, control and interference are considered to exist simultaneously and to act on the system together. Random system matrices are present in the state, control and interference terms to simulate the noise that exists in a real power generation process, so that subsequent prediction is more accurate.
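The system equations themselves are not reproduced in this text, so the following is only a sketch of one way a system of this kind could be simulated. It assumes, purely for illustration, that each random system matrix is the corresponding deterministic matrix scaled by a zero-mean, unit-variance scalar noise term. The function name `simulate_step`, the parameter `noise_scale`, and the dimensions n = 4, p = 2, q = 2 are hypothetical choices, not values stated for the system.

```python
import numpy as np

def simulate_step(x_k, u_k, v_k, A, B, C, rng, noise_scale=0.05):
    """One step of an assumed stochastic linear system.

    Three independent standard normal scalars (mean 0, variance 1) perturb
    the state, control and interference matrices. The multiplicative form
    used here is an assumption for illustration only.
    """
    alpha, beta, gamma = rng.standard_normal(3)
    A_k = A * (1.0 + noise_scale * alpha)   # random state matrix
    B_k = B * (1.0 + noise_scale * beta)    # random control input matrix
    C_k = C * (1.0 + noise_scale * gamma)   # random interference input matrix
    return A_k @ x_k + B_k @ u_k + C_k @ v_k

rng = np.random.default_rng(0)
n, p, q = 4, 2, 2                            # assumed dimensions
A = 0.9 * np.eye(n)
B = rng.uniform(-1.0, 1.0, size=(n, p))
C = rng.uniform(-1.0, 1.0, size=(n, q))
x_next = simulate_step(np.ones(n), np.zeros(p), np.zeros(q), A, B, C, rng)
```

Setting `noise_scale` to zero recovers the deterministic system built from A, B and C, which is a convenient check when experimenting.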

Since the power generation system is a linear quadratic system, its performance index can be set as:

where $F = F^T$, $R = R^T$ and $S = S^T$ are positive definite matrices.

The purpose of the performance index is mainly to find a Nash equilibrium solution $(u^*(x_k), v^*(x_k))$ that minimizes the performance index with respect to the control and maximizes the performance index with respect to the interference.

In addition, there are also value functions corresponding to the performance indexes, and after defining and obtaining the performance indexes, the value functions corresponding to the performance indexes can be obtained, specifically:

where $u_k$ is the control in the optimal case, $v_k$ is the interference in the worst case, $E(\cdot)$ denotes the mathematical expectation, and $P = P^T > 0$ is the unique positive definite matrix.

And then in the subsequent optimization and training process, the optimization condition of the network can be judged based on the performance index or the value function, and whether the optimization is completed or not is determined.

step 102, acquiring historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model.

After the construction and initialization of the strategic prediction model is completed, it will be optimized and trained. Specifically, when training is performed, corresponding historical power generation data are obtained first, then the initialized strategy prediction model is optimized and trained according to the obtained historical power generation data, and here, weight optimization, namely optimization training, is performed on a model network in the strategy prediction model first.

The whole strategy prediction model includes a plurality of networks, which may be the same or different in structure and which play different roles: the model network is used for predicting the next adjacent system state in the optimization process, the control network for determining the control quantity, the interference network for determining the interference quantity, and the evaluation network for determining whether the whole model is optimized.

During the optimization training, one or some of the networks may be separately optimized according to the actual situation, for example, during the training optimization process of the model, the model network is independently optimized, specifically, referring to fig. 3, fig. 3 is a flowchart of the process for obtaining the intermediate policy prediction model provided by the present invention. Wherein the process comprises:

step 301, acquiring state information corresponding to each moment in historical power generation data based on a time sequence, and grouping and associating the state information to obtain a plurality of groups of state groups, wherein each state group comprises two pieces of state information at adjacent moments;

step 302, taking a plurality of groups of state groups as a training set of the model network, and training the model network;

step 303, when the training is determined to be finished, obtaining an intermediate strategy prediction model based on the model network after the weight optimization.

The model network is used for obtaining the next state from the current state, so it is optimized according to the state data in the historical power generation data during training. Specifically, when historical power generation data are obtained, the state information in the historical power generation data is obtained. Each piece of state information corresponds to a moment, and the moments have a definite order. The obtained state information is grouped based on the time sequence to obtain a plurality of groups of state groups, which are used as a training set to train the model network and optimize its weights. When the optimization is completed, the intermediate strategy prediction model is obtained.

Illustratively, each piece of data in the historical power generation data corresponds to a unique moment, and the data have a definite order. If the historical power generation data include 100 pieces of data, ordered by time as $G_1, G_2, \ldots, G_n, \ldots, G_{100}$, where $n < 100$, then, depending on the grouping mode, the obtained groups may be $(G_1, G_2), (G_3, G_4), \ldots, (G_{n-1}, G_n), \ldots, (G_{99}, G_{100})$, or may be $(G_1, G_2), (G_2, G_3), \ldots, (G_{n-1}, G_n), \ldots, (G_{99}, G_{100})$, which is not particularly limited.

After the grouping of the historical data is completed, the model network is trained with the training set obtained by grouping. The weights of the model network are initialized in advance: after the network structure of the model network is determined, its weights can be initialized randomly within the range (-1, 1), and the initialized model network is then optimized and trained.
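The two grouping modes in this example can be sketched as follows. This is a minimal illustration that uses string labels in place of real state vectors; the function name `pair_states` and its `overlapping` flag are hypothetical.

```python
def pair_states(states, overlapping=True):
    """Group time-ordered states into (earlier, later) pairs of adjacent moments.

    overlapping=False gives (G1, G2), (G3, G4), ...
    overlapping=True  gives (G1, G2), (G2, G3), ...
    """
    stride = 1 if overlapping else 2
    return [(states[i], states[i + 1])
            for i in range(0, len(states) - 1, stride)]

history = [f"G{i}" for i in range(1, 101)]      # G1 ... G100 in time order
disjoint_groups = pair_states(history, overlapping=False)
sliding_groups = pair_states(history, overlapping=True)
```

For 100 states, the disjoint mode produces 50 state groups and the sliding mode produces 99, so the sliding mode yields roughly twice as many training samples from the same history.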

Referring to fig. 4, fig. 4 is a flowchart illustrating a process of training a model network according to the present invention, wherein the process includes:

step 401, dividing state information in a plurality of groups of state groups into input states and output states based on time sequence;

step 402, inputting an input state into a model network to obtain a prediction state corresponding to the input state;

step 403, calculating a state error according to the predicted state and the output state, and determining that the training is finished when the state error is smaller than a preset error value.

After the grouping of the historical data is completed, each state group comprises two pieces of state data. The states in each state group can be marked based on time sequence as an input state and an output state respectively. The input states are then input into the model network to obtain a predicted state corresponding to each input state, and the model network is optimized by calculating the state error between the predicted state and the output state. The optimization of the model network is determined to be complete when the obtained state error is smaller than a preset error value.

For each obtained state group, the two states are set as the input state and the output state of the model network according to their time sequence. For example, for the state group $(G_1, G_2)$, $G_1$ is set as the input state and $G_2$ as the output state, where $G_2$ is the actual next state following $G_1$. When the model network is optimized, the input state is input into the model network, which outputs a predicted state. The predicted state is then compared with the output state to determine whether the model network is optimized: a state error is obtained from the predicted state and the output state, and when the state error is small (smaller than the preset error value), the model network is considered optimized; otherwise, optimization and training continue.

It should be noted that whether the optimization is completed can be determined in various manners. For example, the error between the predicted state and the output state may be calculated, and the optimization determined to be complete when the error is less than 0.01; alternatively, a similarity value between the predicted state and the output state may be used, and the optimization determined to be complete when the similarity is greater than a set threshold.
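The training loop of steps 401 to 403 might look like the sketch below. It is an illustration under several assumptions: the model network is a one-hidden-layer network trained by plain gradient descent (backpropagation), the state error is the mean squared error, and the historical states are replaced by a synthetic, slowly decaying rotation in four dimensions. The names `train_model_network`, `lr` and `max_epochs` are hypothetical; the threshold 0.01 follows the example in the text.

```python
import numpy as np

def train_model_network(inputs, targets, hidden=17, lr=0.02,
                        max_epochs=3000, error_threshold=0.01, seed=0):
    """Train until the mean squared state error falls below error_threshold
    or max_epochs is reached. Returns the weights and the error history."""
    rng = np.random.default_rng(seed)
    w1 = rng.uniform(-1.0, 1.0, (inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.uniform(-1.0, 1.0, (hidden, targets.shape[1]))
    b2 = np.zeros(targets.shape[1])
    errors = []
    for _ in range(max_epochs):
        h = np.tanh(inputs @ w1 + b1)            # hidden activations
        predicted = h @ w2 + b2                  # predicted states
        diff = predicted - targets               # predicted minus output state
        errors.append(float(np.mean(diff ** 2)))
        if errors[-1] < error_threshold:         # training is finished
            break
        # Backpropagate the mean squared error through both layers.
        g_out = 2.0 * diff / diff.size
        g_hidden = (g_out @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (inputs.T @ g_hidden)
        b1 -= lr * g_hidden.sum(axis=0)
    return (w1, b1, w2, b2), errors

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Synthetic stand-in for 100 time-ordered historical states (an assumption).
A_true = np.zeros((4, 4))
A_true[:2, :2] = 0.99 * rotation(0.1)
A_true[2:, 2:] = 0.99 * rotation(0.3)
states = [np.array([1.0, 0.0, 1.0, 0.0])]
for _ in range(99):
    states.append(A_true @ states[-1])
states = np.array(states)

# Input states are the earlier members of each pair, output states the later.
params, errors = train_model_network(states[:-1], states[1:])
```

The error history makes it possible to inspect whether training stopped because the threshold was met or because the epoch limit was reached.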

And when the optimization of the model network is completed, determining the weight of the model network, and then adjusting the model network in the strategy prediction model based on the obtained weight to obtain an intermediate strategy prediction model.

step 103, training the intermediate strategy prediction model based on the historical power generation data to obtain an optimized evaluation network, control network and interference network.

After the model network is optimized to obtain the intermediate strategy prediction model, the other networks of the strategy prediction model are optimized and trained, specifically, when the other networks are optimized and trained, the training is performed based on historical power generation data, and the weight optimization and adjustment of the control network, the interference network and the evaluation network are realized by training the intermediate strategy prediction model.

When the intermediate strategy prediction model is trained, the weight optimization of the control network, the interference network and the evaluation network is synchronously performed, at the moment, three networks needing the weight optimization are taken as a training whole, the robustness of the model is improved through the limitation among the networks, and meanwhile, the accuracy of the strategy prediction is also improved.

Referring to fig. 5, fig. 5 is a schematic flowchart of a process for training an intermediate policy prediction model provided in the present invention, where the process includes:

step 501, inputting historical power generation data, based on time tags, into the control network, the interference network and a first evaluation network of the evaluation network respectively, to obtain a control output, an interference output and a first evaluation output respectively;

step 502, inputting control output, interference output and historical power generation data into an optimized model network, and outputting to obtain model output;

step 503, inputting the model output into a second evaluation network of the evaluation network, and outputting to obtain a second evaluation output;

step 504, calculating to obtain an evaluation loss value of the evaluation network according to the first evaluation output and the second evaluation output, and obtaining a control loss value corresponding to the control network and an interference loss value corresponding to the interference network according to an optimal value function;

and 505, obtaining an optimized evaluation network, a control network and an interference network according to the evaluation loss value, the control loss value and the interference loss value.

When the intermediate strategy prediction model is trained, historical power generation data are used as the input of a control network, an interference network and a first evaluation network and are respectively output to obtain a control output, an interference output and a first evaluation output, then the control output and the interference output are input into the model network together, meanwhile, historical power generation data are also input into the model network, the model network is processed to output to obtain a model output, then the model output is used as the input of a second evaluation network to output to obtain a second evaluation output, further, a price loss value of the evaluation network is obtained according to the first evaluation output and the second evaluation output, a control loss value of the control network and an interference loss value of the interference network are respectively obtained according to an optimal value function, and finally, the control network, the interference network and the evaluation network are subjected to weight optimization according to the obtained loss values.
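The data flow of steps 501 to 503 can be sketched as a single forward pass. This is an illustration only: the networks are randomly initialized stand-ins using the example structures 4-8-2, 4-8-1 and 8-17-4, and feeding the model network the concatenation of state, control output and interference output is an assumption inferred from 4 + 2 + 2 = 8. The names `init_net`, `forward` and `joint_forward` are hypothetical.

```python
import numpy as np

def init_net(sizes, rng):
    """Random (weight, bias) pairs for a fully connected network."""
    return [(rng.uniform(-1.0, 1.0, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(net, x):
    """tanh hidden layers, linear output layer (assumed activations)."""
    h = np.asarray(x, dtype=float)
    for w, b in net[:-1]:
        h = np.tanh(h @ w + b)
    w, b = net[-1]
    return h @ w + b

rng = np.random.default_rng(0)
control_net = init_net([4, 8, 2], rng)
interference_net = init_net([4, 8, 2], rng)
first_evaluation_net = init_net([4, 8, 1], rng)
second_evaluation_net = init_net([4, 8, 1], rng)
model_net = init_net([8, 17, 4], rng)   # stands in for the optimized model network

def joint_forward(x_k):
    control_out = forward(control_net, x_k)                 # step 501
    interference_out = forward(interference_net, x_k)       # step 501
    first_eval = forward(first_evaluation_net, x_k)         # step 501
    model_in = np.concatenate([x_k, control_out, interference_out], axis=-1)
    model_out = forward(model_net, model_in)                # step 502
    second_eval = forward(second_evaluation_net, model_out) # step 503
    return control_out, interference_out, first_eval, model_out, second_eval

batch = rng.uniform(-1.0, 1.0, (5, 4))   # five hypothetical historical states
u, v, e1, x_next, e2 = joint_forward(batch)
```

The loss computation and weight updates of steps 504 and 505 would consume `e1`, `e2`, `u` and `v`; they are left out here because the text does not give their exact formulas.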

In the actual training process, the output of the evaluation network is an approximate function. When the historical power generation data is used as the input of the first evaluation network, the approximate function corresponding to each state is obtained. When the historical power generation data reaches the second evaluation network after being processed by the control network, the interference network and the model network, the model network predicts and outputs the next state based on the current state, so the approximate function corresponding to the next state of each state is obtained. For the evaluation network, the corresponding loss function is therefore the difference between the two approximate functions, specifically $e_{ci} = V_i(X_k) - V_{i+1}(X_k)$, where $V_i(X_k)$ is the output of the first evaluation network and $V_{i+1}(X_k)$ is the output of the second evaluation network.
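As a minimal sketch of this loss, the following Python computes the per-state error as the difference between the two evaluation outputs. The reduction to a single scalar (the mean of half the squared error) is an assumption added for illustration, since the description only defines the difference itself; the function names and sample values are hypothetical.

```python
def evaluation_errors(first_outputs, second_outputs):
    """Per-state error: first evaluation output minus second evaluation output."""
    if len(first_outputs) != len(second_outputs):
        raise ValueError("both evaluation outputs must cover the same states")
    return [v1 - v2 for v1, v2 in zip(first_outputs, second_outputs)]

def evaluation_loss(errors):
    """Assumed scalar loss: mean of 0.5 * e^2 (not stated in the source)."""
    return sum(0.5 * e * e for e in errors) / len(errors)

first = [16.0, 4.0]   # hypothetical first evaluation outputs
second = [4.0, 1.0]   # hypothetical second evaluation outputs
errors = evaluation_errors(first, second)
loss = evaluation_loss(errors)
```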

Determining the loss values of the control network and the interference network comprises: differentiating with respect to the control network according to the optimal value function to obtain a first derivation result, and calculating a control loss value corresponding to the control network according to the first derivation result and the control output; and differentiating with respect to the interference network according to the optimal value function to obtain a second derivation result, and calculating an interference loss value corresponding to the interference network according to the second derivation result and the interference output.

The optimal value function is defined as follows. A policy $\pi$ is said to be better than or equal to a policy $\pi'$ if its expected return is greater than or equal to that of $\pi'$ in all states; in other words, $\pi \geq \pi'$ if and only if $v_{\pi}(s) \geq v_{\pi'}(s)$ for all states $s \in S$. An optimal policy is one that is better than or equal to all other policies. There may be more than one optimal policy, but they are denoted uniformly as $\pi_*$, and they share the same state-value function, namely the optimal value function.

Therefore, the control network and the interference network are optimized according to the optimal value function: the optimal value functions corresponding to the control network and the interference network are determined respectively, each is differentiated, and the control loss value corresponding to the control network and the interference loss value corresponding to the interference network are calculated from the respective derivation results and the corresponding network outputs.
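The description does not give the formula that combines the derivation result with the network output, so the following Python is only one possible reading, offered as a sketch. It assumes a hypothetical value function that is convex in the control and concave in the interference, differentiates it numerically at the network outputs, and takes half the squared derivative as the loss, so that both losses vanish at the saddle point. The value function, the loss form and all names are assumptions.

```python
def central_difference(f, x, h=1e-5):
    """Numerical derivative of a scalar function f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def value_of(u, w):
    """Hypothetical value as a function of control u and interference w.

    Convex in u and concave in w, so (u, w) = (1.0, 0.5) is a saddle point.
    """
    return (u - 1.0) ** 2 - (w - 0.5) ** 2

def control_loss(u, w):
    # First derivation result: dV/du evaluated at the control output u.
    grad_u = central_difference(lambda a: value_of(a, w), u)
    return 0.5 * grad_u ** 2   # assumed loss form

def interference_loss(u, w):
    # Second derivation result: dV/dw evaluated at the interference output w.
    grad_w = central_difference(lambda b: value_of(u, b), w)
    return 0.5 * grad_w ** 2   # assumed loss form

loss_u = control_loss(2.0, 0.5)        # control output away from the saddle
loss_w = interference_loss(1.0, 1.5)   # interference output away from the saddle
```

Under this reading, driving both losses toward zero moves the control and interference outputs toward a point where neither can improve on the value unilaterally.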

Whether the control network, the interference network and the evaluation network have been optimized can be judged by analyzing the obtained loss values, specifically: comparing the evaluation loss value with a loss threshold value to determine whether the optimization of the evaluation network, the control network and the interference network is completed, and obtaining the optimized evaluation network, the optimized control network and the optimized interference network when the optimization is determined to be completed; wherein determining whether optimization is complete comprises: if the evaluation loss value is less than or equal to the loss threshold value, determining that the optimization is completed; and if the evaluation loss value is larger than the loss threshold value, determining that the optimization is not completed.

During training, whether each network has been trained and optimized is determined according to its loss value, and the parameter information of each network is fixed once optimization is determined to be complete, so that when training of the whole model is complete, the power generation strategy of the power generation process can be predicted. To determine whether training optimization is complete, the loss value of each network is compared with the corresponding threshold: when the loss value is less than or equal to the set threshold, optimization is determined to be complete; otherwise, optimization needs to continue.

In addition, whether optimization is completed can also be determined from a total loss value of the strategy prediction model obtained from the evaluation loss value, the control loss value and the interference loss value. Specifically, the intermediate strategy prediction model is optimized and trained as a whole, so the evaluation loss value, the control loss value and the interference loss value can be summed to obtain the total loss value of the strategy prediction model, and the total loss value is compared with a preset value to determine whether the optimized control network, interference network and evaluation network have been obtained. If so, the three networks are obtained directly; otherwise, step 501 is performed again until the loss value meets the set condition, for example, until the total loss value is less than or equal to the preset value.
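The total-loss criterion can be sketched as a simple loop in Python. Here `train_step` is a hypothetical callable standing in for one pass of steps 501 to 505, and `max_rounds` is a safety cap added for the sketch; neither name comes from the description.

```python
def total_loss(evaluation_loss, control_loss, interference_loss):
    """Sum the three loss values into one total for the whole model."""
    return evaluation_loss + control_loss + interference_loss

def train_until_optimized(train_step, preset_value, max_rounds=1000):
    """Repeat from step 501 until the total loss is <= preset_value.

    train_step is a hypothetical callable that performs one pass of
    steps 501-505 and returns (evaluation, control, interference) losses.
    max_rounds is an added safety cap, not part of the source.
    """
    for round_index in range(1, max_rounds + 1):
        losses = train_step()
        if total_loss(*losses) <= preset_value:
            return round_index, True   # optimization completed
    return max_rounds, False           # cap reached before the condition held

# Toy train_step whose three losses halve on every call.
state = {"scale": 1.0}

def toy_step():
    state["scale"] *= 0.5
    s = state["scale"]
    return (0.4 * s, 0.2 * s, 0.2 * s)

rounds, done = train_until_optimized(toy_step, preset_value=0.15)
```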

When the optimization adjustment is needed, the weights of the networks except the model network need to be updated, and step 501 is executed after the update. When updating the weights of the networks, the specific updating method may be as follows:

for the evaluation network, the weight update rule is:

for the control network, the weight update rule is:

for an interfering network, the weight update rule is:

When the currently optimized weights do not meet the set condition, the weights of the control network, the interference network and the evaluation network are updated according to the weight update rules above, and the determination is then made again.

Step 104, determining whether the trained strategy prediction model is converged, and obtaining the strategy prediction model when convergence is determined.

After the weight optimization of the networks in the strategy prediction model is completed, whether the trained strategy prediction model has converged is determined, and the strategy prediction model is obtained when convergence is determined. Specifically, whether training is finished is determined according to the actual training result. The conventional approach is to determine whether the trained strategy prediction model has converged; a test of the model can also be added to the convergence judgment, so that a usable strategy prediction model is obtained only when the model has converged and meets the test requirement.

For the convergence judgment, a number of training iterations may be preset, and convergence is determined when the number of completed iterations reaches the preset number. At that point the weights obtained by each network are retained; otherwise, training continues by executing step 103.

In addition, after convergence is determined, the prediction accuracy of the strategy prediction model can be acquired, and whether the model can be used is then determined according to the prediction accuracy. To check the accuracy, historical data of the power generation process can be acquired by sampling and used as a test set for an accuracy test of the converged strategy prediction model, so that a strategy prediction model usable for strategy prediction is obtained when the accuracy meets a set condition (for example, the accuracy is higher than a set value).
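A minimal Python sketch of this accuracy check follows. The description does not define how accuracy is measured, so the sketch assumes it is the fraction of sampled records whose prediction falls within a tolerance of the recorded value; the function names, the tolerance and the sample data are all hypothetical.

```python
import random

def sample_test_set(history, sample_size, seed=0):
    """Draw a test set from historical (state, recorded_value) records."""
    rng = random.Random(seed)
    return rng.sample(history, min(sample_size, len(history)))

def prediction_accuracy(predict, test_set, tolerance):
    """Fraction of samples predicted within `tolerance` of the record."""
    if not test_set:
        raise ValueError("test set is empty")
    hits = sum(1 for x, target in test_set
               if abs(predict(x) - target) <= tolerance)
    return hits / len(test_set)

def is_usable(accuracy, set_value):
    """The model is usable when its accuracy is higher than the set value."""
    return accuracy > set_value

history = [(float(k), 0.5 * k) for k in range(10)]   # toy historical records
test_set = sample_test_set(history, sample_size=4)
good = prediction_accuracy(lambda x: 0.5 * x, test_set, tolerance=0.1)
bad = prediction_accuracy(lambda x: 0.5 * x + 1.0, test_set, tolerance=0.1)
```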

It should be noted that, when a usable strategy prediction model is obtained as described above, convergence is determined first and then whether the accuracy meets the condition is determined. Alternatively, the accuracy judgment can be performed directly after the weight optimization of each network is completed, and whether training is complete is determined according to the accuracy. Taking convergence judgment by number of training iterations as an example, the number of iterations can be disregarded in this case, and the model is further judged according to accuracy alone once the loss meets the condition.

Further, the purpose of training the strategy prediction model is to obtain, in subsequent use, the power generation strategy corresponding to the current state. The power generation strategy mainly comprises a control strategy and an interference strategy for the power generation system, by which the control and adjustment of the power generation process are realized.

Specifically, when performing policy prediction, the method includes: receiving an input initial state in response to a power generation prediction instruction; obtaining a control strategy and an interference strategy in an initial state based on the trained strategy prediction model; and performing power generation control adjustment based on the control strategy and the interference strategy.

When the model is used, a new initial state is given, and the control strategy and the interference strategy corresponding to that initial state are obtained by processing it with the control network and the interference network in the strategy prediction model.
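The use of the trained model can be sketched as follows in Python. The class name, its fields and the toy networks are hypothetical; the sketch only shows that an initial state is passed to the control network and the interference network to obtain the two strategies.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class StrategyPredictionModel:
    """Hypothetical container for the two trained networks used at inference."""
    control_net: Callable[[float], float]
    interference_net: Callable[[float], float]

    def predict(self, initial_state: float) -> Tuple[float, float]:
        """Return (control strategy, interference strategy) for a state."""
        control = self.control_net(initial_state)
        interference = self.interference_net(initial_state)
        return control, interference

# Toy trained networks, standing in for the optimized weights.
model = StrategyPredictionModel(
    control_net=lambda x: -0.5 * x,
    interference_net=lambda x: 0.25 * x,
)
control_strategy, interference_strategy = model.predict(4.0)
```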

In the method for training the strategy prediction model of the power generation process, a corresponding power generation strategy prediction model is constructed based on the power generation system, and a plurality of different networks are provided. After the weights of each network are initialized, the model network, which outputs the next state, is first trained and weight-optimized on its own; training it separately reduces the influence of the other networks on the model network and improves its robustness and prediction accuracy. Once that optimization training is completed, the strategy prediction model is treated as a whole and the weights of each network are optimized; the mutual limitation among the control network, the interference network and the evaluation network improves the robustness and training efficiency of the model. Finally, the convergence judgment ensures that the model meets the use requirement. By introducing control and interference factors, a better overall power generation control effect is achieved, the stable operation of equipment is guaranteed, and the accuracy of power generation strategy prediction is improved.

The following describes the training device of the strategy prediction model of the power generation process provided by the present invention, and the training device of the strategy prediction model of the power generation process described below and the training method of the strategy prediction model of the power generation process described above can be referred to each other correspondingly.

Fig. 6 is a schematic structural diagram of a training apparatus for a strategy prediction model of a power generation process according to the present invention. As shown in Fig. 6, the training apparatus 600 for a strategy prediction model of a power generation process includes:

an initial adjustment module 601, configured to initialize an initial policy prediction model, where the policy prediction model includes a control network, an interference network, a model network, and an evaluation network;

the first optimization module 602 is configured to obtain historical power generation data, and perform weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate policy prediction model;

a second optimization module 603, configured to train the intermediate policy prediction model based on historical power generation data, so as to obtain an optimized evaluation network, a control network, and an interference network;

and a convergence judging module 604, configured to determine whether the trained policy prediction model converges, and obtain the policy prediction model when convergence is determined.

Based on the above embodiment, the first optimization module is further configured to:

based on time sequence, acquiring state information corresponding to each moment in historical power generation data, and performing grouping association on the state information to obtain a plurality of groups of state groups, wherein each state group comprises two pieces of state information, and the moments are adjacent;

taking a plurality of groups of state groups as a training set of the model network, and training the model network;

and when the training is determined to be finished, obtaining an intermediate strategy prediction model based on the model network after the weight optimization.

dividing state information in a plurality of groups of state groups into input states and output states based on time sequence;

inputting the input state into a model network to obtain a prediction state corresponding to the input state;
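The grouping performed by the first optimization module can be sketched in Python as follows. The record layout (time tag, state), the function names and the toy model network are assumptions made for illustration.

```python
def build_state_groups(records):
    """Pair each state with the state at the adjacent (next) moment.

    records: iterable of (time_tag, state); order does not matter.
    """
    ordered = sorted(records, key=lambda r: r[0])
    return [(ordered[i][1], ordered[i + 1][1])
            for i in range(len(ordered) - 1)]

def split_inputs_outputs(groups):
    """Earlier state of each group is the input, later state the output."""
    inputs = [g[0] for g in groups]
    outputs = [g[1] for g in groups]
    return inputs, outputs

records = [(2, 30.0), (0, 10.0), (1, 20.0)]   # toy time-tagged states
groups = build_state_groups(records)
inputs, targets = split_inputs_outputs(groups)

# A hypothetical model network maps each input state to a prediction state.
model_net = lambda x: x + 10.0
predictions = [model_net(x) for x in inputs]
```

The prediction states can then be compared with the output states of the same groups to drive the weight optimization of the model network.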

Based on the above embodiment, the second optimization module is further configured to:

based on the time labels, the historical power generation data are used as input and are respectively input into a first evaluation network of the control network, the interference network and the evaluation network, and control output, interference output and first evaluation output are respectively obtained;

differentiating with respect to the control network according to the optimal value function to obtain a first derivation result, and calculating a control loss value corresponding to the control network according to the first derivation result and the control output;

and differentiating with respect to the interference network according to the optimal value function to obtain a second derivation result, and calculating an interference loss value corresponding to the interference network according to the second derivation result and the interference output.

comparing the evaluation loss value with a loss threshold value, determining whether the optimization of the evaluation network, the control network and the interference network is completed, and obtaining the optimized evaluation network, the optimized control network and the optimized interference network when determining that the optimization is completed;

wherein determining whether optimization is complete comprises:

Based on the above embodiment, the training apparatus of the strategy prediction model in the power generation process further includes a strategy prediction module, configured to:

obtaining a control strategy and an interference strategy in an initial state based on the trained strategy prediction model;

and performing power generation control adjustment based on the control strategy and the interference strategy.

Fig. 7 illustrates the physical structure of an electronic device. As shown in Fig. 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communications interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform a method of training a strategy prediction model of a power generation process, the method comprising: initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network; obtaining historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model; training the intermediate strategy prediction model based on historical power generation data to obtain an optimized evaluation network, a control network and an interference network; and determining whether the trained strategy prediction model is converged, and obtaining the strategy prediction model when convergence is determined.

In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer readable storage medium and that, when executed by a processor, performs a method for training a strategy prediction model of a power generation process, the method comprising: initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network; acquiring historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model; training the intermediate strategy prediction model based on historical power generation data to obtain an optimized evaluation network, a control network and an interference network; and determining whether the trained strategy prediction model is converged, and obtaining the strategy prediction model when convergence is determined.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of training a strategy prediction model of a power generation process provided above, the method comprising: initializing an initial strategy prediction model, wherein the strategy prediction model comprises a control network, an interference network, a model network and an evaluation network; obtaining historical power generation data, and performing weight optimization on the model network according to the historical power generation data to obtain an optimized intermediate strategy prediction model; training the intermediate strategy prediction model based on historical power generation data to obtain an optimized evaluation network, a control network and an interference network; and determining whether the trained strategy prediction model is converged, and obtaining the strategy prediction model when convergence is determined.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A training method of a strategy prediction model of a power generation process is characterized by comprising the following steps:

2. The method for training the strategy prediction model in the power generation process according to claim 1, wherein the performing weight optimization on the model network according to the historical power generation data to obtain the optimized intermediate strategy prediction model comprises:

3. The method for training the strategy prediction model of the power generation process according to claim 2, wherein said taking said plurality of state groups as a training set of said model network and training said model network comprises:

4. The method for training the strategy prediction model of the power generation process according to claim 1, wherein the training the intermediate strategy prediction model based on the historical power generation data to obtain the optimized evaluation network, the optimized control network and the optimized interference network comprises:

the historical power generation data is used as input based on a time tag and is respectively input into a control network, an interference network and a first evaluation network of the evaluation network, and control output, interference output and first evaluation output are respectively obtained;

5. The method for training the strategy prediction model of the power generation process according to claim 4, wherein the obtaining the control loss value corresponding to the control network and the interference loss value corresponding to the interference network according to the optimal value function comprises:

and differentiating with respect to the interference network according to the optimal value function to obtain a second derivation result, and calculating an interference loss value corresponding to the interference network according to the second derivation result and the interference output.

6. The method for training the strategy prediction model of the power generation process according to claim 4, wherein the obtaining the optimized evaluation network, the optimized control network and the optimized interference network according to the evaluation loss value, the control loss value and the interference loss value comprises:

wherein the determining whether optimization is complete comprises:

7. The method for training the strategy prediction model of the power generation process according to claim 1, wherein the method further comprises:

8. A training device of a strategy prediction model of a power generation process is characterized in that,

and the convergence judgment module is used for determining whether the trained strategy prediction model converges and obtaining the strategy prediction model when convergence is determined.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for training the strategy prediction model of the power generation process as claimed in any one of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for training the strategy prediction model of the power generation process as claimed in any one of claims 1 to 7.