CN114611823B - Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Info

Publication number
CN114611823B
Authority
CN
China
Prior art keywords
value
scheduling
function
energy
determining
Prior art date
Legal status
Active
Application number
CN202210289380.XA
Other languages
Chinese (zh)
Other versions
CN114611823A (en
Inventor
张海滨
Current Assignee
Terminus Technology Group Co Ltd
Original Assignee
Terminus Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Terminus Technology Group Co Ltd filed Critical Terminus Technology Group Co Ltd
Priority to CN202210289380.XA priority Critical patent/CN114611823B/en
Publication of CN114611823A publication Critical patent/CN114611823A/en
Application granted granted Critical
Publication of CN114611823B publication Critical patent/CN114611823B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/04Constraint-based CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an optimized dispatching method and system for a typical park with electricity-cold-heat-gas multi-energy demands, and belongs to the technical field of intelligent dispatching. The method comprises the following steps: acquiring performance parameters and constraint conditions of each device in the electricity-cold-heat-gas multi-energy system of the park; determining an objective function for optimized scheduling, wherein the objective function comprises the cost of electricity and gas and the carbon emission; establishing an optimized-scheduling reinforcement learning model and determining a state space and a reward function, wherein the state space is determined according to the performance parameters of each device; and performing optimized scheduling on each device in the multi-energy system by using the optimized-scheduling reinforcement learning model and based on the constraint conditions. The invention applies reinforcement learning with an objective function that accounts for expense cost and carbon emission to optimize the scheduling of the multi-energy system in real time, so that the scheduling can follow real-time changes in energy demand, and the economy and environmental friendliness of the multi-energy system are improved.

Description

Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park
Technical Field
The invention relates to the field of intelligent scheduling, in particular to an optimized scheduling method and system for a typical park with electricity-cold-heat-gas multi-energy requirements.
Background
In a traditional energy system, cooling, heating, electricity and gas are usually designed, operated and controlled independently, and the different energy-supply and energy-consumption subsystems cannot be coordinated, matched and optimized as a whole, so the overall energy utilization rate is low. A multi-energy complementary integrated energy system is an integrated energy production, supply and marketing system formed by organically coordinating and optimizing the production, transmission, conversion, storage and consumption of cooling, heating, electricity and gas during planning, construction and operation. On one hand, it realizes the cascade utilization of energy and improves the comprehensive utilization level of energy; on the other hand, it exploits the temporal and spatial coupling mechanisms among the energy subsystems to achieve comprehensive management and coordinated complementation of multiple energy sources. At present, research on multi-energy complementary integrated energy systems at home and abroad mostly focuses on the macroscopic level, such as system planning, functional architecture and technical form. Some scholars have drawn on micro-grid control theory and large-grid dispatching theory to study the optimized operation of integrated energy systems, but they mainly study the coupling of only two kinds of energy with a single, uniform optimization period; the optimization methods remain consistent with traditional ones and do not fully reflect the characteristics of multi-energy flows. Meanwhile, research on real-time coordinated control of multi-energy flows is rarely seen, so the influence of load-prediction errors on day-ahead scheduling cannot be resolved, which reduces the economy of the multi-energy system and increases carbon emissions beyond what environmental-protection requirements allow.
Disclosure of Invention
Therefore, the technical problem to be solved by the embodiments of the present invention is to overcome the defect that a multi-energy system in the prior art cannot be accurately regulated and controlled in real time, which results in poor economy and environmental friendliness, and to provide an optimal scheduling method and system for a typical electricity-cold-heat-gas multi-energy demand park.
Therefore, the invention provides an optimal scheduling method for a typical park with electricity-cold-heat-gas multi-energy demand, which comprises the following steps:
acquiring performance parameters and constraint conditions of each device in an electricity, cold, heat and gas multi-energy system in a park;
determining an objective function for optimizing scheduling, wherein the objective function comprises cost of electricity and gas and carbon emission;
establishing an optimized scheduling reinforcement learning model, and determining a state space and a reward function, wherein the state space is determined according to the performance parameters of each device;
and performing optimized scheduling on each device in the multi-energy system by using the optimized scheduling reinforcement learning model and based on the constraint condition.
Optionally, the optimally scheduling each device in the multi-energy system by using the optimal scheduling reinforcement learning model and based on the constraint condition includes:
determining a plurality of selectable action values according to the current state, energy supply demand and environmental information of each device by using a first deep learning neural network model;
calculating the probability corresponding to each optional action value by utilizing a second deep learning neural network model;
and selecting the selectable action value corresponding to the maximum probability value as the current action value and executing.
Optionally, the first deep learning neural network model includes a radial basis function neural network, and the radial basis function neural network is established as follows:
establishing an input layer, wherein the input layer is used for inputting the current state, energy supply demand and environmental information of each device;
establishing a Gaussian radial basis function layer;
establishing a radial basis function weight connection layer;
and establishing a weight matrix of the output layer to perform matrix product operation with the output of the radial basis function weight connection layer.
Optionally, the first deep learning neural network model includes a radial basis function neural network, and a neuron excitation function of the radial basis function neural network is:
δ_l(x) = exp(−‖x − c_l‖² / (2d_l²))
wherein δ_l(x) is the excitation function of the l-th neuron node in the hidden layer, x is the input vector, c_l is the center of the excitation function of the l-th hidden-layer neuron node, and d_l is the center width of the excitation function of the l-th hidden-layer neuron node.
Optionally, the optimally scheduling each device in the multi-energy system by using the optimal scheduling reinforcement learning model and based on the constraint condition includes:
determining a current action value according to the following formula:
Figure BDA0003561066350000031
Figure BDA0003561066350000032
wherein a_ij is the action value of the j-th adjustable parameter of the i-th device, s_ijmax is the maximum state value in the state space, s_ij is the current state value, and s_ijmin is the minimum state value in the state space.
Optionally, the optimally scheduling each device in the multi-energy system by using the optimal scheduling reinforcement learning model and based on the constraint condition includes:
determining an initial action value;
calculating a reward function value and a Q value based on the initial action value;
judging whether the reward function value and the Q value meet preset conditions or not;
if so, determining the initial action value as the action value;
otherwise, adjusting the initial action value by using a preset algorithm to obtain a new action value, calculating a reward function value and a Q value based on the new action value, and judging whether the preset condition is met;
and if so, determining the new action value as the action value, otherwise, continuing to execute the previous step until the reward function value and the Q value corresponding to the latest action value meet the preset condition.
Optionally, the establishing an optimized scheduling reinforcement learning model includes:
initializing network parameters of the optimized dispatching reinforcement learning model;
training the optimized scheduling reinforcement learning model after network parameters are initialized by using a pre-obtained training sample, and determining a loss function value of the optimized scheduling reinforcement learning model according to an obtained Q value;
adjusting the network parameters according to the following formula:
Figure BDA0003561066350000033
wherein w_m(t + 1) is the adjusted network parameter, w_m(t) is the current network parameter, and σ(t) is the loss function value.
The invention also provides an optimized dispatching system for the electricity-cold-heat-gas multi-energy demand typical park, which comprises the following components:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods described above.
The technical scheme of the embodiment of the invention has the following advantages:
according to the optimal scheduling method and system for the electricity-cold-heat-gas multi-energy demand typical park, provided by the embodiment of the invention, the multi-energy system is scheduled and optimized in real time by considering the cost and the objective function of carbon emission through reinforcement learning, so that the scheduling can meet the real-time change of the energy demand, and the economy and the environmental protection of the multi-energy system are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a specific example of an optimal scheduling method for an electricity-cooling-heating-gas multi-energy demand typical park according to embodiment 1 of the present invention;
Fig. 2 is a flowchart of a specific example of selecting a current action value in embodiment 1 of the present invention;
Fig. 3 is a schematic block diagram of a specific example of an optimal scheduling system for an electricity-cooling-heating-gas multi-energy demand typical park according to embodiment 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In describing the present invention, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises" and/or "comprising," when used in this specification, are intended to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "and/or" includes any and all combinations of one or more of the associated listed items. The terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," "outer," and the like are used in the orientation or positional relationship indicated in the drawings for convenience in describing the invention and for simplicity in description, and do not indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be construed as limiting the invention. The terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, be fixedly coupled, detachably coupled, or integrally coupled; can be mechanically or electrically connected; the two elements can be directly connected, indirectly connected through an intermediate medium, or communicated with each other inside; either a wireless or a wired connection. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
The embodiment provides an optimized dispatching method for a typical park with electricity-cold-heat-gas multi-energy demand, as shown in fig. 1, comprising the following steps:
s1: acquiring performance parameters and constraint conditions of each device in an electricity, cold, heat and gas multi-energy system in a park;
s2: determining an objective function for optimizing scheduling, wherein the objective function comprises cost of electricity and gas and carbon emission;
s3: establishing an optimized dispatching reinforcement learning model, and determining a state space and a reward function, wherein the state space is determined according to the performance parameters of each device;
s4: and performing optimized scheduling on each device in the multi-energy system by using the optimized scheduling reinforcement learning model and based on the constraint conditions.
In the embodiment of the invention, the multi-energy system realizes coordinated supply of cooling, heating and power by integrating the energy supply resources in the park, and real-time scheduling and optimization of the multi-energy system are carried out through reinforcement learning with an objective function that takes expense cost and carbon emission into consideration, so that the scheduling can meet real-time changes in energy demand, and the economy and environmental friendliness of the multi-energy system are improved.
The scheduling problem of the multi-energy park is a multivariable, multi-constraint optimization problem with energy coupling relationships over time. The objective function may be a normalized weighted sum of the expense cost and the carbon emissions. The constraint conditions comprise energy constraints, including the electric balance constraint, the cooling and heating balance constraint and the gas balance constraint, as well as the energy conversion constraints of the devices.
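As a non-limiting illustration of such an objective and its balance constraints, a per-interval evaluation might be sketched as follows (the prices, emission factors, weights, tolerances and variable names are assumptions for illustration only, not values from the invention):

```python
# Hedged sketch: normalized cost/carbon objective and balance-constraint check for one interval.

def objective(grid_kwh, gas_m3, elec_price, gas_price, ef_grid, ef_gas,
              cost_ref, co2_ref, w_cost=0.5, w_co2=0.5):
    """Normalized weighted sum of energy-purchase cost and carbon emission."""
    cost = grid_kwh * elec_price + gas_m3 * gas_price      # electricity and gas expense
    co2 = grid_kwh * ef_grid + gas_m3 * ef_gas             # carbon emission from both carriers
    return w_cost * cost / cost_ref + w_co2 * co2 / co2_ref

def balances_hold(supply, demand, tol=1e-3):
    """Electric, cooling, heating and gas balance constraints.

    supply/demand: dicts keyed by 'elec', 'cool', 'heat', 'gas'."""
    return all(abs(supply[k] - demand[k]) <= tol for k in ('elec', 'cool', 'heat', 'gas'))
```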
The device in the multi-energy system comprises: the combined heat and power type micro-gas turbine, the electric boiler, the gas boiler, the storage battery, the heat storage equipment, the refrigeration equipment and the like.
Optionally, as shown in fig. 2, the performing, by using the optimal scheduling reinforcement learning model and based on the constraint condition, the optimal scheduling on each device in the multi-energy system, that is, step S4, includes:
s41: determining a plurality of selectable action values according to the current state, energy supply demand and environmental information of each device by using a first deep learning neural network model;
the energy supply demand may include heating power demand, cooling power demand, etc., and the environmental information may include ambient temperature, humidity, etc.;
s42: calculating the probability corresponding to each optional action value by utilizing a second deep learning neural network model;
s43: and selecting the selectable action value corresponding to the maximum probability value as the current action value and executing.
In the embodiment of the invention, the selectable actions under the current state, the energy supply requirement and the environment are determined through the first deep learning neural network model, the probability corresponding to each selectable action value is calculated by utilizing the second deep learning neural network model, and the selectable action value corresponding to the maximum probability value is selected as the current action value, so that the real-time optimization performance of scheduling can be improved.
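As a non-limiting illustration of this two-model selection step, the following sketch assumes the first model exposes a candidate-proposal interface and the second a probability-scoring interface; both interfaces and the input layout are placeholders for illustration:

```python
import numpy as np

def select_action(candidate_net, scoring_net, state, demand, env_info):
    """Pick the selectable action value with the highest probability score."""
    x = np.concatenate([state, demand, env_info])          # 1-D arrays: state, demand, environment
    candidates = candidate_net.propose(x)                  # selectable action values (first model)
    probs = [scoring_net.probability(x, a) for a in candidates]  # probability per candidate (second model)
    best = int(np.argmax(probs))                           # index of the maximum probability value
    return candidates[best]                                # current action value to execute
```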
Optionally, the first deep learning neural network model includes a radial basis function neural network, and a neuron excitation function of the radial basis function neural network is:
δ_l(x) = exp(−‖x − c_l‖² / (2d_l²))
wherein δ_l(x) is the excitation function of the l-th neuron node in the hidden layer, x is the input vector, c_l is the center of the excitation function of the l-th hidden-layer neuron node, and d_l is the center width of the excitation function of the l-th hidden-layer neuron node.
Optionally, the number of neuron nodes in the hidden layer may be calculated according to the following formula:
Figure BDA0003561066350000062
wherein L_1 and L_2 are the numbers of input-layer neurons and output-layer neurons, respectively; L_1 is determined according to the number of input parameters, and L_2 is determined according to the maximum number of selectable action values.
Further optionally, the radial basis function neural network may be established by:
establishing an input layer, wherein the input layer is used for inputting the current state, energy supply demand and environmental information of each device;
establishing a Gaussian radial basis function layer, which can be specifically established according to the neuron excitation function of the radial basis neural network;
establishing a radial basis function weight connection layer;
establishing an output layer, comprising: establishing a weight matrix of the output layer to perform a matrix product operation with the output of the radial basis function weight connection layer.
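As a non-limiting illustration of the layer structure just listed, a forward pass of such a radial basis function network might be sketched as follows (the Gaussian activation form, array shapes and variable names are assumptions for illustration):

```python
import numpy as np

def rbf_forward(x, centers, widths, hidden_weights, output_weights):
    """Input layer -> Gaussian RBF layer -> weight connection layer -> output-layer matrix product."""
    dists = np.linalg.norm(x[None, :] - centers, axis=1)       # ||x - c_l|| per hidden node
    phi = np.exp(-(dists ** 2) / (2.0 * widths ** 2))          # Gaussian radial basis activations
    hidden = hidden_weights * phi                              # radial basis function weight connection layer
    return output_weights @ hidden                             # matrix product with the output weight matrix
```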
Alternatively, the center and center width of the excitation function may be determined using a K-means clustering method. Specifically, a predetermined number of training samples are selected from a plurality of training samples to serve as initial clustering centers; determining Euclidean spatial distances of a plurality of the training samples to each initial clustering center; assigning a plurality of the training samples to a cluster set to which each of the initial cluster centers belongs based on the Euclidean distance; calculating the average value of training samples contained in each cluster set, and taking the average value as a new cluster center; and if the difference value between the new clustering center and the initial clustering center is less than or equal to a preset threshold value, determining the new clustering center as the center of the radial basis excitation function. Then, the distance between each cluster center and its nearest neighbor cluster center is calculated, and the center width is calculated from the average value of the distances.
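As a non-limiting illustration of this clustering procedure, the following sketch follows the steps above (the convergence threshold, iteration limit and the rule deriving widths from the average nearest-neighbour distance are assumptions for illustration):

```python
import numpy as np

def kmeans_centers_widths(samples, k, tol=1e-4, max_iter=100, rng=None):
    """Determine RBF centers by K-means and widths from nearest-neighbour center distances."""
    rng = np.random.default_rng() if rng is None else rng
    centers = samples[rng.choice(len(samples), size=k, replace=False)]   # initial cluster centers
    for _ in range(max_iter):
        # assign each training sample to its nearest center by Euclidean distance
        d = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([samples[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])       # mean of each cluster set
        if np.linalg.norm(new_centers - centers) <= tol:                  # change below preset threshold
            centers = new_centers
            break
        centers = new_centers
    # width: distance from each center to its nearest neighbouring center, averaged
    cd = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(cd, np.inf)
    widths = np.full(k, cd.min(axis=1).mean())
    return centers, widths
```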
Optionally, the second deep learning neural network model includes a plurality of BP neural networks, and the plurality of BP neural networks are combined according to the following formula:
f(x) = sign( ∑_{m=1}^{M} w_m · f_m(x) )
wherein f_m is the m-th BP neural network, w_m is the weight of the m-th BP neural network, m = 1, 2, …, M, M is the number of BP neural networks, and sign is the sign function.
Further optionally, the weight w of the mth BP neural network m Is calculated by the following method:
determining the weight w using the Adaboost algorithm m
Specifically, the weight w may be calculated according to the following formula m
w_m = ln((1 − δ_t)/δ_t) + ln(k − 1)
wherein δ_t is the error rate with which the m-th BP neural network determines the selectable action value of the highest probability, and k is the number of the selectable action values.
The weight w can also be determined according to the minimum loss function value at the end of the BP neural network training m
Specifically, each BP neural network includes an input layer, three hidden layers, and an output layer. During forward propagation, an input signal (comprising at least each selectable action value, the current state of each device, the energy supply demand, and the environmental information) acts on the output nodes through the hidden layers, and an output signal is generated through nonlinear transformation. If the actual output does not match the expected output, a back-propagation process of the error is carried out: the output error is propagated back layer by layer through the hidden layers to the input layer and distributed to all neurons of each layer, so that the error signal obtained by each layer serves as the basis for adjusting the weight of each neuron. The error is reduced along the gradient direction by adjusting the connection strengths between the input nodes and the hidden nodes, the connection strengths between the hidden nodes and the output nodes, and the thresholds; the weights and thresholds corresponding to the minimum error are determined through repeated learning and training, and training is then stopped.
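As a non-limiting illustration of combining the trained BP neural networks, the following sketch uses the sign-function combination described above together with an Adaboost-style (SAMME) weight consistent with the error rate δ_t and the number of selectable action values k; the predictor interface is a placeholder for illustration:

```python
import numpy as np

def adaboost_weight(error_rate, k):
    """Assumed multi-class Adaboost (SAMME) weight from error rate and class count k."""
    eps = 1e-12
    error_rate = min(max(error_rate, eps), 1.0 - eps)      # keep the logarithms finite
    return np.log((1.0 - error_rate) / error_rate) + np.log(k - 1.0)

def ensemble_output(bp_networks, weights, x):
    """Weighted sum of member outputs passed through the sign function."""
    s = sum(w * net.predict(x) for net, w in zip(bp_networks, weights))
    return np.sign(s)
```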
Optionally, the optimally scheduling each device in the multi-energy system by using the optimal scheduling reinforcement learning model and based on the constraint condition includes:
determining the action value according to the following formula:
Figure BDA0003561066350000081
Figure BDA0003561066350000082
wherein a_ij is the action value of the j-th adjustable parameter of the i-th device, s_ijmax is the maximum state value in the state space, s_ij is the current state value, s_ijmin is the minimum state value in the state space, and rand is a random function.
In other optional specific embodiments, the action value may also be determined by:
determining an initial action value;
calculating a reward function value and a Q value based on the initial action value;
judging whether the reward function value and the Q value meet a preset condition; the preset condition may be that the differences between the reward function value and the Q value and the currently optimal reward function value and Q value remain sufficiently small;
if so, determining the initial action value as the action value;
otherwise, adjusting the initial action value by using a preset algorithm to obtain a new action value, calculating a reward function value and a Q value based on the new action value, and judging whether the preset condition is met;
and if so, determining the new action value as the action value, otherwise, continuing to execute the previous step until the reward function value and the Q value corresponding to the latest action value meet the preset condition.
Wherein, the preset algorithm may be: a_ij′ = a_ij + rand[−0.5, 0.5] · Δa_ij, wherein a_ij and a_ij′ are the action values of the j-th adjustable parameter of the i-th device before and after adjustment, respectively, rand is a random function, and Δa_ij is the action-value adjustment step of the j-th adjustable parameter of the i-th device.
In the embodiment of the invention, the optimal action value at the current moment is selected in an iterative optimization mode, so that each scheduling optimization is the optimal scheduling under the current condition, the action frequency of each device can be further reduced, and the influence on the service life of the device caused by frequent actions is avoided.
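As a non-limiting illustration of this iterative refinement, the following sketch uses placeholder reward/Q evaluation and acceptance-test callables standing in for the preset condition:

```python
import random

def refine_action(a_init, step, evaluate, accept, max_iter=100):
    """Adjust the action until its reward function value and Q value satisfy the preset condition.

    a_init   : list of initial action values a_ij
    step     : list of adjustment steps Δa_ij
    evaluate : callable returning (reward_value, q_value) for an action
    accept   : callable implementing the preset condition on (reward_value, q_value)
    """
    action = list(a_init)
    reward, q = evaluate(action)                       # reward function value and Q value
    for _ in range(max_iter):
        if accept(reward, q):                          # preset condition satisfied
            return action
        # preset algorithm: a_ij' = a_ij + rand[-0.5, 0.5] * Δa_ij
        action = [a + random.uniform(-0.5, 0.5) * da for a, da in zip(action, step)]
        reward, q = evaluate(action)
    return action
```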
In other alternative embodiments, the current action value may also be obtained by a lave search method.
Optionally, the establishing an optimized scheduling reinforcement learning model includes:
initializing network parameters of the optimized dispatching reinforcement learning model;
training the optimized scheduling reinforcement learning model after network parameters are initialized by using a pre-obtained training sample, and determining a loss function value of the optimized scheduling reinforcement learning model according to an obtained Q value;
adjusting the network parameters according to the following formula:
Figure BDA0003561066350000091
wherein w_m(t + 1) is the adjusted network parameter, w_m(t) is the current network parameter, and σ(t) is the loss function value.
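As a non-limiting illustration of the training procedure, the following sketch initializes the parameters, derives the loss value σ(t) from the Q values, and adjusts the parameters; because the exact update formula appears only as an image, a plain gradient-descent update is used as a stand-in, and every interface name is an assumption for illustration:

```python
def train_scheduler(model, samples, compute_q, loss_and_grad, epochs=50, lr=0.01):
    """Initialize network parameters, compute Q values on training samples, derive the loss,
    and adjust the parameters (gradient-descent stand-in for the patent's update formula)."""
    model.initialize_parameters()                       # network-parameter initialization
    history = []
    for _ in range(epochs):
        for sample in samples:
            q = compute_q(model, sample)                # Q value under the current parameters
            sigma, grad = loss_and_grad(q, sample)      # loss value sigma(t) and its gradient
            for name, w in model.parameters().items():  # adjust each network parameter
                model.set_parameter(name, w - lr * grad[name])
            history.append(sigma)
    return model, history
```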
In other optional specific embodiments, the optimal scheduling reinforcement learning model may be established and the network parameters of the optimal scheduling reinforcement learning model may be determined by the following method:
clustering the training samples according to preset parameters, and determining the number of neurons of the hidden layer of the model according to a clustering result;
determining parameters to be trained according to the determined number of the neurons of the model hidden layer;
evaluating the adaptive value of each parameter to obtain an initial population;
selecting individuals from the initial population as potential field centers according to attraction potential field center data;
and calculating the selected probability in each potential field according to the adaptive value, and updating the individual positions in the population.
When the individual positions in the population are updated, different update calculation modes are used depending on the relationship between the average particle distance of the population and a preset threshold value.
The embodiment of the invention can thus avoid the network parameters of the optimal scheduling reinforcement learning model becoming trapped in a local optimum during optimization.
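As a non-limiting, loosely interpreted illustration of this population-based parameter search, the following sketch treats the best-scoring individuals as attraction potential-field centers and switches the position-update mode on the average particle distance; the attraction coefficient, perturbation and threshold are assumptions for illustration only:

```python
import numpy as np

def potential_field_search(fitness, dim, pop_size=30, iters=100,
                           n_centers=3, dist_threshold=0.1, rng=None):
    """Population search with attraction potential-field centers (lower fitness = better)."""
    rng = np.random.default_rng() if rng is None else rng
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))            # initial population
    for _ in range(iters):
        fit = np.array([fitness(ind) for ind in pop])             # adaptive (fitness) values
        idx = np.argsort(fit)[:n_centers]
        centers = pop[idx]                                        # best individuals as field centers
        probs = np.exp(-(fit[idx] - fit[idx].min()))              # selection probability per field
        probs /= probs.sum()
        avg_dist = np.mean(np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=2))
        for i in range(pop_size):
            c = centers[rng.choice(n_centers, p=probs)]           # attracting potential-field center
            if avg_dist > dist_threshold:                         # population still spread out
                pop[i] += 0.5 * (c - pop[i])
            else:                                                 # population too close: perturb
                pop[i] += 0.5 * (c - pop[i]) + rng.normal(0.0, 0.1, size=dim)
    best = pop[int(np.argmin([fitness(ind) for ind in pop]))]
    return best
```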
Example 2
The present embodiment provides an optimized dispatching system 30 for electricity-cooling-heating-gas multi-energy demand typical park, as shown in fig. 3, including:
one or more processors 301;
a storage device 302 for storing one or more programs;
the one or more programs, when executed by the one or more processors 301, cause the one or more processors 301 to implement any of the methods described above in embodiment 1.
In the embodiment of the invention, the multi-energy system realizes coordinated supply of cooling, heating and power by integrating the energy supply resources in the park, and real-time scheduling and optimization of the multi-energy system are carried out through reinforcement learning with an objective function that takes expense cost and carbon emission into consideration, so that the scheduling can meet real-time changes in energy demand, and the economy and environmental friendliness of the multi-energy system are improved.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (2)

1. An optimized dispatching method for a typical park with electricity-cold-heat-gas multi-energy demand comprises the following steps:
acquiring performance parameters and constraint conditions of each device in an electricity, cold, heat and gas multi-energy system in a park;
determining an objective function for optimizing scheduling, wherein the objective function comprises cost of electricity and gas and carbon emission;
establishing an optimized scheduling reinforcement learning model, and determining a state space and a reward function, wherein the state space is determined according to the performance parameters of each device;
optimally scheduling each of the devices in the multi-energy system using the optimal scheduling reinforcement learning model and based on the constraints,
the optimally scheduling each device in the multi-energy system by using the optimally scheduling reinforcement learning model and based on the constraint condition comprises: determining an initial action value; calculating a reward function value and a Q value based on the initial action value; judging whether the reward function value and the Q value meet preset conditions or not; if so, determining the initial action value as the action value; otherwise, adjusting the initial action value by using a preset algorithm to obtain a new action value, calculating a reward function value and a Q value based on the new action value, and judging whether the preset condition is met; if so, determining the new action value as the action value, otherwise, continuing to execute the previous step until the reward function value and the Q value corresponding to the latest action value meet the preset condition;
the establishing of the optimized dispatching reinforcement learning model comprises the following steps:
initializing network parameters of the optimized dispatching reinforcement learning model;
training the optimized scheduling reinforcement learning model after network parameters are initialized by using a pre-obtained training sample, and determining a loss function value of the optimized scheduling reinforcement learning model according to an obtained Q value;
adjusting the network parameters according to the following formula:
Figure 124706DEST_PATH_IMAGE002
wherein w_m(t + 1) is the adjusted network parameter, w_m(t) is the current network parameter, and σ(t) is the loss function value;
the optimizing and scheduling the equipment in the multi-energy system by utilizing the optimizing and scheduling reinforcement learning model and based on the constraint condition further comprises the following steps:
determining a plurality of selectable action values according to the current state, energy supply demand and environmental information of each device by using a first deep learning neural network model;
calculating the probability corresponding to each optional action value by utilizing a second deep learning neural network model;
selecting the selectable action value corresponding to the maximum probability value as a current action value and executing;
the first deep learning neural network model comprises a radial basis function neural network, and the radial basis function neural network is established by the following process:
establishing an input layer, wherein the input layer is used for inputting the current state, energy supply demand and environmental information of each device;
establishing a Gaussian radial basis function layer;
establishing a radial basis function weight connection layer;
establishing a weight matrix of an output layer to perform matrix product operation with the output of the radial basis function weight connection layer;
the neuron excitation function of the radial basis function neural network is as follows:
δ_l(x) = exp(−‖x − c_l‖² / (2d_l²))
wherein δ_l(x) is the excitation function of the l-th neuron node in the hidden layer, x is the input vector, c_l is the center of the excitation function of the l-th hidden-layer neuron node, and d_l is the center width of the excitation function of the l-th hidden-layer neuron node;
the number of the neuron nodes in the hidden layer can be calculated according to the following formula:
Figure 545586DEST_PATH_IMAGE012
wherein L_1 is the number of neurons in the input layer, L_2 is the number of neurons in the output layer, L_1 is determined according to the number of input parameters, and L_2 is determined according to the maximum number of selectable action values;
the second deep learning neural network model comprises a plurality of BP neural networks, which are combined according to the following formula:
f(x) = sign( ∑_{m=1}^{M} w_m · f_m(x) )
wherein f_m is the m-th BP neural network, w_m is the weight of the m-th BP neural network, m = 1, 2, …, M, M is the number of the BP neural networks, and sign is the sign function;
the weight w_m of the m-th BP neural network is calculated by the following method:
w_m = ln((1 − δ_t)/δ_t) + ln(k − 1)
wherein δ_t is the error rate with which the m-th BP neural network determines the selectable action value of the highest probability, and k is the number of the selectable action values.
2. An optimal dispatch system for a typical electric-cold-hot-gas demand park, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
CN202210289380.XA 2022-03-23 2022-03-23 Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park Active CN114611823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210289380.XA CN114611823B (en) 2022-03-23 2022-03-23 Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210289380.XA CN114611823B (en) 2022-03-23 2022-03-23 Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Publications (2)

Publication Number Publication Date
CN114611823A CN114611823A (en) 2022-06-10
CN114611823B true CN114611823B (en) 2022-11-08

Family

ID=81864579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210289380.XA Active CN114611823B (en) 2022-03-23 2022-03-23 Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Country Status (1)

Country Link
CN (1) CN114611823B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118034066B (en) * 2024-04-11 2024-08-16 State Grid Jiangsu Electric Power Co., Ltd. Changzhou Power Supply Branch Coordinated operation control method, equipment and storage medium for energy system of multi-energy coupling cabin

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242286A (en) * 2018-08-27 2019-01-18 华北电力大学 A kind of Demand Side Response Potential model method based on radial base neural net
CN110046751A (en) * 2019-03-27 2019-07-23 上海建坤信息技术有限责任公司 Multi-energy system dispatching method based on the prediction of radial base energy consumption and real-time energy efficiency
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN113723749A (en) * 2021-07-20 2021-11-30 中国电力科学研究院有限公司 Multi-park comprehensive energy system coordinated scheduling method and device
CN114091879A (en) * 2021-11-15 2022-02-25 浙江华云电力工程设计咨询有限公司 Multi-park energy scheduling method and system based on deep reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990785A (en) * 2019-11-27 2020-04-10 江苏方天电力技术有限公司 Multi-objective-based optimal scheduling method for intelligent park comprehensive energy system
CN112288592B (en) * 2020-10-20 2022-10-25 东南大学 Gas-thermal coupling system SCUC optimization scheduling method, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242286A (en) * 2018-08-27 2019-01-18 华北电力大学 A kind of Demand Side Response Potential model method based on radial base neural net
CN110046751A (en) * 2019-03-27 2019-07-23 上海建坤信息技术有限责任公司 Multi-energy system dispatching method based on the prediction of radial base energy consumption and real-time energy efficiency
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN113723749A (en) * 2021-07-20 2021-11-30 中国电力科学研究院有限公司 Multi-park comprehensive energy system coordinated scheduling method and device
CN114091879A (en) * 2021-11-15 2022-02-25 浙江华云电力工程设计咨询有限公司 Multi-park energy scheduling method and system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Guozhou Zhang, Weihao Hu, Di Cao, Zhenyuan Zhang, Qi Huang, et al., "A multi-agent deep reinforcement learning approach enabled distributed energy management schedule for the coordinate control of multi-energy hub with gas, electricity, and freshwater," Energy Conversion and Management, vol. 255, 2022-03-01, full text *

Also Published As

Publication number Publication date
CN114611823A (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN110705743B (en) New energy consumption electric quantity prediction method based on long-term and short-term memory neural network
CN111881505B (en) Multi-objective optimization transformation decision method for existing building based on GA-RBF algorithm
CN110263995B (en) Distribution transformer overload prediction method considering load increase rate and user power utilization characteristics
Li et al. Forecasting building energy consumption with hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system
CN113112077B (en) HVAC control system based on multi-step prediction deep reinforcement learning algorithm
CN111598225B (en) Air conditioner cold load prediction method based on self-adaptive deep confidence network
CN110837915B (en) Low-voltage load point prediction and probability prediction method for power system based on hybrid integrated deep learning
CN112415924A (en) Energy-saving optimization method and system for air conditioning system
CN110071502A (en) A kind of calculation method of short-term electric load prediction
Song et al. An indoor temperature prediction framework based on hierarchical attention gated recurrent unit model for energy efficient buildings
CN108303898B (en) Intelligent scheduling method of novel solar-air energy coupling cold and heat cogeneration system
Chitsazan et al. Wind speed forecasting using an echo state network with nonlinear output functions
CN115186803A (en) Data center computing power load demand combination prediction method and system considering PUE
Lee et al. Artificial intelligence enabled energy-efficient heating, ventilation and air conditioning system: Design, analysis and necessary hardware upgrades
CN114611823B (en) Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park
CN110781595A (en) Energy use efficiency PUE prediction method, device, terminal and medium
CN111222762A (en) Solar cell panel coating process state monitoring and quality control system
CN114239372B (en) Multi-objective unit maintenance double-layer optimization method and system considering unit combination
CN116937601A (en) Multi-element controllable load cooperative scheduling strategy checking method based on online security analysis
Fu et al. Predictive control of power demand peak regulation based on deep reinforcement learning
Fayaz et al. An efficient energy consumption and user comfort maximization methodology based on learning to optimization and learning to control algorithms
Xu et al. Short-term electricity consumption forecasting method for residential users based on cluster classification and backpropagation neural network
CN113708418A (en) Micro-grid optimization scheduling method
CN117592657A (en) Load distribution optimization method and system for multiple water chilling units
CN116880169A (en) Peak power demand prediction control method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant