CN117613983A - Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning - Google Patents


Publication number
CN117613983A
CN117613983A CN202410090946.5A CN202410090946A CN117613983A CN 117613983 A CN117613983 A CN 117613983A CN 202410090946 A CN202410090946 A CN 202410090946A CN 117613983 A CN117613983 A CN 117613983A
Authority
CN
China
Prior art keywords
charge
energy storage
power
storage battery
discharge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410090946.5A
Other languages
Chinese (zh)
Other versions
CN117613983B (en)
Inventor
那琼澜
李信
邢宁哲
杨艺西
马跃
彭柏
邢海瀛
苏丹
娄竞
邬小波
陈重韬
王艺霏
张海明
张实君
周子阔
李宇鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Jibei Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Jibei Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Jibei Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority to CN202410090946.5A
Publication of CN117613983A
Application granted
Publication of CN117613983B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/24 Arrangements for preventing or reducing oscillations of power in networks
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/28 Arrangements for balancing of the load in a network by storage of energy
    • H02J3/32 Arrangements for balancing of the load in a network by storage of energy using batteries with converting means
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38 Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381 Dispersed generators
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J7/00 Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
    • H02J7/0047 Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries with monitoring or indicating devices or circuits
    • H02J7/0048 Detection of remaining charge capacity or state of charge [SOC]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J7/00 Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
    • H02J7/007 Regulation of charging or discharging current or voltage
    • H02J7/00712 Regulation of charging or discharging current or voltage the cycle being controlled or terminated in response to electric parameters
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00 Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20 The dispersed energy generation being of renewable origin
    • H02J2300/22 The renewable source being solar energy
    • H02J2300/24 The renewable source being solar energy of photovoltaic origin

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The specification relates to the technical field of electric power, and in particular to an energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning, applied to a power grid and a user-side photovoltaic energy storage device. The method comprises the following steps: determining a state space from the user power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity purchase unit price, and the grid electricity selling unit price in any time period; and inputting the state space into a charge and discharge control decision model based on fusion rule reinforcement learning to obtain optimal charge and discharge decision variables, wherein the optimal charge and discharge decision variables comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model is obtained by training with a sample state space and a photovoltaic power generation uncertainty model. By fusing predefined rules, the method and device speed up the convergence of reinforcement learning training to the optimal charge and discharge control strategy, improve the economy of electricity use, and reduce the power fluctuation and burden of the power grid.

Description

Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning
Technical Field
The specification relates to the technical field of electric power, and in particular to an energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning.
Background
With the continuous increase in total power consumption and in the share of consumption attributable to the tertiary industry and to urban and rural residential users, peak load demand on the power grid keeps rising, and the peak-valley electricity price difference has widened further and remains high. To reduce grid load fluctuation, lower grid capacity expansion costs, and improve the economic efficiency of the power system, user-side energy storage technology has attracted wide attention and has great development potential.
In the field of user-side energy storage, user-side photovoltaic energy storage combines photovoltaic power generation, an energy storage battery, and grid power supply, so efficient coordination among the different power devices must be achieved. Existing research focuses mainly on the optimal configuration and planning of user-side energy storage facilities. In one such approach, the maximum of a first power and a second power is taken as the power to be configured for the energy storage system, and the minimum of a first capacity and a second capacity is taken as the capacity to be configured, which improves the benefit of peak clipping and valley filling.
The prior art also includes a user energy storage planning method in which a user-side energy storage planning model is built and the optimal planning model parameters are obtained by solving a comprehensive objective function covering initial, operation and maintenance, electricity, and basic costs. That model is simple, cannot adjust itself adaptively, and does not consider photovoltaic power generation. In addition, photovoltaic power generation is affected by uncertain factors such as dust, temperature, and shading, so the generated power is uncertain. User-side energy storage operation control that accounts for the uncertainty of photovoltaic power generation therefore needs further optimization.
Disclosure of Invention
To solve the problem that prior-art user-side energy storage does not consider the uncertainty of photovoltaic power generation, the embodiments of this specification provide a charge and discharge control decision model training method based on fusion rule reinforcement learning.
The embodiments of this specification provide an energy storage charge and discharge control decision method based on fusion rule reinforcement learning, applied to a power grid and a user-side photovoltaic energy storage device, wherein the user-side photovoltaic energy storage device comprises a photovoltaic power generation device, a user power utilization device, and an energy storage battery. The method comprises: determining a state space from the user power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity purchase unit price, and the grid electricity selling unit price in any time period; and inputting the state space into a charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for that time period, wherein the optimal charge and discharge decision variables comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.
According to one aspect of the embodiments of this specification, the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training through the following steps: determining a sample state space from the state variables and environment variables related to the user power utilization device, the energy storage battery, and the power grid in a sample period; constructing a charge and discharge control decision base model from a photovoltaic power generation uncertainty model, wherein the photovoltaic power generation uncertainty model is a Gaussian distribution model that considers environment variables; inputting the state variables in the sample state space into the charge and discharge control decision base model and outputting charge and discharge decision variables; and constraining the charge and discharge decision variables according to preset charge and discharge rules and iteratively training the charge and discharge control decision base model with a reward function to obtain the trained charge and discharge control decision model.
According to one aspect of the embodiments of this specification, the state variable of the energy storage battery for a sample period is determined by: acquiring the charge and discharge power of the energy storage battery in the historical sample periods before the sample period; and determining the state of charge of the energy storage battery in the sample period from the charge and discharge power of the energy storage battery in the historical sample periods by the following formula:
[formula not preserved in the source text]
In the formula, k denotes the current sample period. The remaining terms are: the state of charge of the energy storage battery in the 1st sample period (i.e., the initial time); the self-discharge rate of the energy storage battery; the open-circuit voltage of the energy storage battery; the rated capacity of the energy storage battery; the charge and discharge efficiency of the energy storage battery; the charge and discharge power of the energy storage battery in the i-th sample period, which takes one sign when the battery is discharged and the opposite sign when it is charged; and an efficiency index for the i-th period, whose defining condition is likewise not preserved.
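The state-of-charge update described above can be sketched in code. Because the formula itself is not available here, the following is only a minimal illustration under stated assumptions: an energy-balance recursion, positive power for charging and negative for discharging, an efficiency index of +1 while charging and -1 while discharging, and a hypothetical period length `dt_hours`. The function and parameter names are invented for this example and are not taken from the specification.

```python
def efficiency_index(power_kw):
    """Assumed convention: +1 while charging (power >= 0), -1 while discharging."""
    return 1 if power_kw >= 0 else -1


def update_soc(soc_initial, powers_kw, *, self_discharge, voltage_v,
               capacity_ah, efficiency, dt_hours=1.0):
    """Accumulate the state of charge over the historical sample periods.

    This is an assumed energy-balance form, not the patent's formula.
    powers_kw holds the battery power of each past period (kW).
    """
    # Rated energy in kWh from open-circuit voltage (V) and rated capacity (Ah).
    energy_capacity_kwh = voltage_v * capacity_ah / 1000.0
    soc = soc_initial
    for p in powers_kw:
        alpha = efficiency_index(p)
        # Charging stores p * eta; discharging drains p / eta from the battery.
        delta = (efficiency ** alpha) * p * dt_hours / energy_capacity_kwh
        soc = soc * (1.0 - self_discharge) + delta
    return soc
```

For example, with a 10 kWh battery (100 V, 100 Ah), 90 % efficiency, and no self-discharge, charging at 1 kW for one hour raises the state of charge by 0.09 under these assumptions.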
According to one aspect of the embodiments of this specification, constructing the charge and discharge control decision base model of the energy storage device from the photovoltaic power generation uncertainty model includes: determining the photovoltaic power generation power of a sample period based on a photovoltaic power generation uncertainty model that obeys a Gaussian distribution; and determining the power output by the photovoltaic power generation device to the power grid in the sample period from the difference between the photovoltaic power generation power and the user power consumption in the sample period.
According to one aspect of the embodiments of this specification, the photovoltaic power generation uncertainty model is determined by the following formula:
[formula not preserved in the source text]
The terms of this formula are: the photovoltaic power generation power of the sample period; the photoelectric conversion efficiency; the area of the photovoltaic material; the solar irradiance of the sample period; and the variance. The photovoltaic power generation power obeys a Gaussian distribution whose mean and variance are given by the formula.
The power output by the photovoltaic power generation device to the power grid is then determined from the photovoltaic power generation power given by the uncertainty model, by the following formula:
[formula not preserved in the source text]
The terms of this formula are: the power output to the power grid by the photovoltaic power generation device in the sample period; Pload, the user power consumption; the photovoltaic power generation power; and a proportional weight, whose admissible range is not preserved in the source text.
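The two quantities described above, a Gaussian-distributed photovoltaic power and the share of the surplus exported to the grid, can be sketched as follows. This is an illustration only: the mean is assumed to be conversion efficiency times panel area times irradiance (the three quantities the text lists), the export is assumed to be the proportional weight times the non-negative surplus over the load, and the weight is assumed to lie in [0, 1]. All function and parameter names are hypothetical.

```python
import random


def sample_pv_power(irradiance, *, conversion_efficiency, panel_area,
                    variance, rng=random):
    """Draw one PV power sample from a Gaussian distribution.

    Assumption: mean = conversion_efficiency * panel_area * irradiance.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    mean = conversion_efficiency * panel_area * irradiance
    power = rng.gauss(mean, variance ** 0.5)
    # Assumption: clip at zero, since generated power cannot be negative.
    return max(power, 0.0)


def grid_export_power(pv_power, load_power, weight):
    """Part of the PV surplus sent to the grid (assumed form: weight * surplus)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    surplus = max(pv_power - load_power, 0.0)
    return weight * surplus
```

Passing a seeded `random.Random` instance as `rng` makes the sampled training episodes reproducible.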
According to one aspect of the embodiments of this specification, determining the preset charge and discharge rules includes: if the current period is a peak electricity consumption period, setting the charge and discharge power of the energy storage battery to be negative; if the current period is a valley electricity consumption period, setting the charge and discharge power of the energy storage battery to be positive; if the photovoltaic power generation power is less than the load power, setting the proportional weight to zero; if the photovoltaic power generation power is greater than the load power, keeping the proportional weight at its original value; and constraining the decision variables in the charge and discharge control decision model according to the preset charge and discharge rules.
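The preset rules above amount to a small correction applied to whatever action the learned model proposes. The sketch below shows one possible way to encode them; the period labels ("peak", "valley", "flat"), the use of a sign flip to force the required sign, and the function name `apply_rules` are assumptions made for illustration.

```python
def apply_rules(battery_power, weight, *, period, pv_power, load_power):
    """Constrain a raw (battery_power, weight) action with the preset rules.

    period: "peak", "valley", or "flat" (hypothetical labels).
    Sign convention from the rules: negative power at peak, positive in the valley.
    """
    if period == "peak":
        battery_power = -abs(battery_power)   # force a negative value at peak
    elif period == "valley":
        battery_power = abs(battery_power)    # force a positive value in the valley
    if pv_power < load_power:
        weight = 0.0                          # PV below load: weight set to zero
    # Otherwise the original weight is kept unchanged.
    return battery_power, weight
```

Keeping the magnitude and only correcting the sign lets the learned policy still choose how much power to move; a stricter variant could overwrite the magnitude as well.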
According to one aspect of the embodiments of this specification, iteratively training the charge and discharge control decision base model with a reward function includes: determining an economy reward function from the electricity selling unit price, the electricity purchase unit price, and the input power of the power grid in each period; constructing a smoothness reward function from the charge and discharge power of the energy storage battery at the current moment and at the previous moment; determining a maximum demand power reward function from the input power of the power grid in each period; determining a constraint penalty reward function from the minimum state of charge, the maximum state of charge, and the current state of charge of the energy storage battery; and iteratively training the charge and discharge control decision model according to the economy reward function, the smoothness reward function, the maximum demand power reward function, and the constraint penalty reward function, with training deemed complete when the reward reaches its maximum and remains stable for a preset number of iterations.
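The four reward terms named above can be combined in many ways, and the text does not give their exact expressions. The sketch below therefore uses simple assumed forms (linear cost, absolute power change, peak grid input, and distance outside the state-of-charge band) with hypothetical equal weights, purely to show how such a composite reward might be assembled.

```python
def economy_reward(grid_power, buy_price, sell_price, dt_hours=1.0):
    """Money earned in one period; grid_power > 0 means buying from the grid."""
    if grid_power >= 0:
        return -buy_price * grid_power * dt_hours
    return sell_price * (-grid_power) * dt_hours


def smoothness_reward(power_now, power_prev):
    """Penalise abrupt changes in battery power between consecutive periods."""
    return -abs(power_now - power_prev)


def peak_demand_reward(grid_powers):
    """Penalise the largest grid input power seen so far (never positive)."""
    return -max(max(grid_powers), 0.0)


def soc_penalty(soc, soc_min, soc_max):
    """Zero inside [soc_min, soc_max]; negative distance to the band outside it."""
    if soc < soc_min:
        return -(soc_min - soc)
    if soc > soc_max:
        return -(soc - soc_max)
    return 0.0


def total_reward(*, grid_power, buy_price, sell_price, power_now, power_prev,
                 grid_power_history, soc, soc_min, soc_max,
                 weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; the weights are illustrative assumptions."""
    parts = (
        economy_reward(grid_power, buy_price, sell_price),
        smoothness_reward(power_now, power_prev),
        peak_demand_reward(grid_power_history),
        soc_penalty(soc, soc_min, soc_max),
    )
    return sum(w * p for w, p in zip(weights, parts))
```

In practice the weights would need tuning, since the terms are in different units (currency, kW, and a dimensionless state of charge).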
The embodiments of this specification also provide a photovoltaic energy storage system based on fusion rule reinforcement learning, comprising: a power grid, communicatively connected to the user-side photovoltaic energy storage device and used to supply power to it; and a user-side photovoltaic energy storage device comprising a photovoltaic power generation device, a user power utilization device, and an energy storage battery. The photovoltaic power generation device supplies photovoltaic electricity to the user power utilization device and, when there is a surplus, delivers the surplus to the energy storage battery or the power grid; the energy storage battery stores the surplus electricity of the photovoltaic power generation device and supplies power to the user power utilization device.
The embodiments of this specification also provide a charge and discharge control decision model training device based on fusion rule reinforcement learning, comprising: a sample state space construction unit, used to determine a sample state space from the state variables and environment variables related to the user power utilization device, the energy storage battery, and the power grid in a sample period; a base model construction unit, used to construct a charge and discharge control decision base model of the energy storage device from a photovoltaic power generation uncertainty model, wherein the photovoltaic power generation uncertainty model is a Gaussian distribution model that considers environment variables; an output unit, used to input the state variables in the sample state space into the charge and discharge control decision base model and output charge and discharge decision variables; and a model training unit, used to iteratively train the charge and discharge control decision base model according to preset charge and discharge rules until the base model meets a preset condition, yielding the trained charge and discharge control decision model.
The embodiments of this specification provide a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the charge and discharge control decision model training method based on fusion rule reinforcement learning when executing the computer program.
The embodiments of this specification also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the charge and discharge control decision model training method based on fusion rule reinforcement learning.
By fusing predefined rules, the method and device speed up the convergence of reinforcement learning training to the optimal charge and discharge control strategy and reduce the power fluctuation and load of the power grid.
Drawings
To illustrate the embodiments of this specification and the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of this specification; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a charge and discharge control decision method based on fusion rule reinforcement learning according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for training a charge-discharge control decision model based on fusion rule reinforcement learning according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for constructing a charge-discharge control decision base model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of determining a state variable of an energy storage battery for a sample period of time according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for determining a preset charge and discharge rule according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a method for iteratively training the charge-discharge control decision base model using a reward function according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a photovoltaic energy storage system based on fusion rule reinforcement learning according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a charge-discharge control decision model training device based on fusion rule reinforcement learning according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Description of reference numerals:
701. a power grid;
702. a user-side photovoltaic energy storage device;
703. a photovoltaic power generation device;
704. a user power utilization device;
705. an energy storage battery;
801. a state space construction unit;
802. an optimal charge-discharge decision variable determining unit;
902. a computer device;
904. a processor;
906. a memory;
908. a driving mechanism;
910. an input/output module;
912. an input device;
914. an output device;
916. a presentation device;
918. a graphical user interface;
920. a network interface;
922. a communication link;
924. a communication bus.
Detailed Description
To help those skilled in the art better understand the technical solutions of this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from this disclosure without inventive effort fall within the scope of protection of this specification.
It should be noted that the terms "first," "second," and the like in the description, the claims, and the drawings are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. Data so used may be interchanged where appropriate, so that the embodiments described herein can operate in sequences other than those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, and may include others that are not listed or that are inherent to it.
This specification provides the method operation steps described in the embodiments or flowcharts, but more or fewer operation steps may be included based on routine or non-inventive work. The order of steps recited in the embodiments is only one of many possible execution orders and is not the only one. When an actual system or apparatus product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings.
It should be noted that the energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning in this specification can be used in the technical field of electric power, and their field of application is not limited.
Fig. 1 is a flowchart of a charge and discharge control decision method based on fusion rule reinforcement learning according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 101: determine a state space from the user power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity purchase unit price, and the grid electricity selling unit price in any time period.
This step belongs to the model application stage. In the same way that the sample state space is constructed in the model training stage, the user power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity purchase unit price, and the grid electricity selling unit price of any previously unseen time period are taken to determine the state space.
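The six quantities that make up the state can be grouped into a single record before being passed to the decision model. The sketch below shows one hypothetical way to do so; the class name, field names, and units are assumptions for illustration, and the text does not prescribe any particular data layout.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    """One time period's state; field names and units are illustrative."""
    load_power: float      # user power consumption
    soc: float             # state of charge of the energy storage battery
    outdoor_temp: float    # outdoor temperature
    irradiance: float      # solar irradiance
    buy_price: float       # grid electricity purchase unit price
    sell_price: float      # grid electricity selling unit price


def to_vector(state: State) -> List[float]:
    """Flatten the state into the ordered list a policy network would consume."""
    return [float(x) for x in state]
```

A typical call would be `to_vector(State(3.2, 0.5, 25.0, 800.0, 0.6, 0.35))`, which yields a six-element list in the field order shown.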
Step 102: input the state space into the charge and discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge and discharge decision variables for that time period. The optimal charge and discharge decision variables comprise the optimal charge and discharge power of the energy storage battery and an optimal coefficient, and the charge and discharge control decision model is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.
In this step, the state space is used as the input of the optimal policy network, which outputs the optimal charge and discharge action of the user-side energy storage battery for the time period. The training process of the charge and discharge control decision model based on fusion rule reinforcement learning is described in detail with reference to FIG. 2.
Fig. 2 is a flowchart of a method for training a charge and discharge control decision model based on fusion rule reinforcement learning according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 201: determine a sample state space from the state variables and environment variables related to the user power utilization device, the energy storage battery, and the power grid in a sample period. In this step, the state variable related to the user power utilization device is the user load power of the sample period; the state variable related to the energy storage battery is its state of charge; and the state variables related to the power grid are the electricity purchase unit price and the electricity selling unit price of the sample period. The environment variables are the outdoor air temperature and the solar irradiance of the sample period. (The symbols that denote these quantities are not preserved in the source text.)
The state variables and environment variables of the same sample period together form the sample state space, which is subsequently input into the charge and discharge control decision base model.
Step 202: construct a charge and discharge control decision base model from a photovoltaic power generation uncertainty model, wherein the photovoltaic power generation uncertainty model is a Gaussian distribution model that considers environment variables. In this step, the charge and discharge control decision base model of the user-side photovoltaic energy storage device is built by constructing the sample state space and the action space and combining them with the photovoltaic power generation uncertainty model. The base model is a policy network implemented as a fully connected neural network, and it is obtained by randomly initializing the parameters of the policy network.
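A randomly initialized, fully connected policy network of the kind described above can be sketched with the standard library alone. The layer sizes, the `tanh` activation, the uniform initialization range, and all function names below are assumptions chosen for illustration; the text specifies only that the network is fully connected and randomly initialized.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_policy(sizes, seed=0):
    """Build a list of layers, e.g. sizes=[6, 16, 2] for 6 inputs and 2 outputs."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(policy, state):
    """Propagate a state vector through every layer with tanh activations."""
    x = list(state)
    for weights, biases in policy:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x  # tanh keeps each output within [-1, 1]
```

With six state inputs and two outputs, `forward(init_policy([6, 16, 2]), state_vector)` returns two bounded raw values that can later be scaled into a battery power and a proportional weight.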
The action space of the embodiments of this specification is expressed as follows:
[formula not preserved in the source text]
In this expression, the bounds are the maximum charge power and the maximum discharge power of the energy storage battery.
Step 203: input the state variables in the sample state space into the charge and discharge control decision base model and output the charge and discharge decision variables. The decision variables are the charge and discharge power of the energy storage battery and the proportional weight. The proportional weight determines the power that the photovoltaic power generation device outputs to the power grid, so that surplus photovoltaic electricity can be sold to the grid for economic benefit.
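If the network produces bounded raw outputs, they have to be mapped onto the physical action: a battery power limited by the maximum charge and discharge powers, and a proportional weight. The sketch below assumes raw outputs in [-1, 1], positive power for charging, and a weight range of [0, 1]; these conventions and the function name `decode_action` are illustrative assumptions.

```python
def decode_action(raw_power, raw_weight, *, max_charge, max_discharge):
    """Map raw outputs in [-1, 1] to (battery_power, proportional_weight)."""
    # Clip defensively in case the raw values fall slightly outside [-1, 1].
    raw_power = max(-1.0, min(1.0, raw_power))
    raw_weight = max(-1.0, min(1.0, raw_weight))
    # Positive values scale up to max_charge, negative values down to -max_discharge.
    limit = max_charge if raw_power >= 0 else max_discharge
    battery_power = raw_power * limit
    # Shift and scale [-1, 1] onto the assumed weight range [0, 1].
    weight = 0.5 * (raw_weight + 1.0)
    return battery_power, weight
```

Using separate limits for the two directions allows for batteries whose maximum charge and discharge powers differ.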
Step 204, constraining the charge-discharge decision variables according to preset charge-discharge rules, and iteratively training the charge-discharge control decision base model with a reward function to obtain a trained charge-discharge control decision model. In this step, a rule-fused reinforcement learning method is adopted: the base model is trained iteratively until the model with the maximum reward is obtained, and that model is taken as the charge-discharge control decision model.
In the embodiments of this specification, the state variables of the charge-discharge control decision model are taken as the input of the policy network, the policy network outputs the action variables of the charge-discharge control decision model, and the action variables are then corrected using the predefined rules.
Further, the charge-discharge control decision base model is trained with the reward function. When the value of the reward function reaches its maximum and remains stable for a certain training time or number of iterations, the policy network corresponding to the maximum reward value is taken as the charge-discharge control decision model, and model training is complete.
By fusing predefined rules, this specification speeds up the convergence of reinforcement learning training to the optimal charge-discharge control strategy, improves the economy of electricity use, and reduces the power fluctuation and burden of the power grid.
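The stopping criterion described above (reward at its maximum and stable over a number of iterations) could be checked as in the following sketch. The function name, the window length and the tolerance are hypothetical parameters chosen for illustration; the specification does not give concrete values.

```python
from typing import Sequence


def has_converged(rewards: Sequence[float], window: int = 5, tol: float = 1e-3) -> bool:
    """Return True if the last `window` rewards all lie within `tol` of the best reward seen."""
    if len(rewards) < window:
        return False
    best = max(rewards)
    return all(best - r <= tol for r in rewards[-window:])


history_stable = [1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0]
history_noisy = [1.0, 2.0, 3.0, 3.0, 3.0, 2.5, 3.0]
```

With these inputs, `has_converged(history_stable)` holds while `has_converged(history_noisy)` does not, because one of the last five rewards falls below the best value by more than the tolerance.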
Fig. 3 is a flowchart of a method for constructing a charge-discharge control decision base model according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 301, determining the photovoltaic power generation power of the sample period based on a photovoltaic power generation uncertainty model that obeys a Gaussian distribution. In the embodiments of this specification, the photovoltaic power generation uncertainty model determines the photovoltaic power generation power of the k-th sample period, taking into account environmental factors including the ambient temperature and the solar irradiance. The photovoltaic power generation power of the k-th sample period obeys a Gaussian distribution with mean $\eta S I_k$ and variance $\sigma^2$, that is, $P^{\mathrm{pv}}_k \sim \mathcal{N}(\eta S I_k, \sigma^2)$, where $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power of the sample period; $\eta$ is the photoelectric conversion efficiency; $S$ is the area of the photovoltaic material; $I_k$ is the solar irradiance of the sample period; and $\sigma^2$ is the variance. By accounting for the uncertainty of photovoltaic power generation caused by environmental factors, this specification coordinates the photovoltaic power generation device, the energy storage battery and the power grid, and reduces the economic cost of the user and the supply load of the power grid.
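A Gaussian photovoltaic model of this kind can be sampled as in the sketch below. It assumes that the mean is the product of conversion efficiency, panel area and irradiance; the function name, parameter names and the clipping of negative draws to zero are illustrative additions, not part of the specification.

```python
import random


def sample_pv_power(efficiency: float, area: float, irradiance: float,
                    std: float, rng: random.Random) -> float:
    """Draw one photovoltaic power sample from N(efficiency * area * irradiance, std**2)."""
    mean = efficiency * area * irradiance
    # Clipping at zero is an added guard: generated power cannot be negative.
    return max(0.0, rng.gauss(mean, std))


rng = random.Random(42)
draws = [sample_pv_power(0.2, 10.0, 500.0, 50.0, rng) for _ in range(10000)]
sample_mean = sum(draws) / len(draws)
```

Passing the random generator in explicitly keeps training runs reproducible when a seed is fixed.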
Step 302, determining the power output by the photovoltaic power generation device to the power grid in the sample period according to the decision variables output by the charge-discharge control decision base model and the difference between the photovoltaic power generation power and the user power in the sample period.
In this step, the decision variables output by the charge-discharge control decision base model include the proportional weight and the charge-discharge power of the energy storage battery. The photovoltaic power generation power of the sample period is determined as in the preceding step.
In this step, the power output to the power grid by the photovoltaic power generation device in the sample period is expressed by the following formula: $P^{\mathrm{out}}_k = \beta_k \left(P^{\mathrm{pv}}_k - P^{\mathrm{load}}_k\right)$, where $P^{\mathrm{out}}_k$ is the power output to the power grid by the photovoltaic power generation device in the k-th sample period; $P^{\mathrm{load}}_k$ is the user power in the k-th sample period; $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power in the k-th sample period; and $\beta_k$ is the proportional weight, $\beta_k \in [0, 1]$. The user power can be acquired directly. The formula reflects that the power generated by the photovoltaic power generation device is supplied to the user first and that any remaining power is distributed to the energy storage battery or the power grid. In the embodiments of this specification, the power output by the photovoltaic power generation device to the power grid and the proportional weight are decision variables for training the charge-discharge control decision base model.
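The allocation of photovoltaic surplus can be sketched as follows, assuming the exported power is the proportional weight multiplied by the surplus of photovoltaic power over user power. The function name, the range check on the weight and the treatment of a non-positive surplus as zero export are illustrative assumptions.

```python
def grid_export_power(weight: float, pv_power: float, load_power: float) -> float:
    """Power sent to the grid: a share `weight` of the photovoltaic surplus over the user load."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("proportional weight is assumed to lie in [0, 1]")
    surplus = pv_power - load_power
    # Added guard: with no surplus there is nothing to export.
    return weight * surplus if surplus > 0.0 else 0.0


export_with_surplus = grid_export_power(0.5, 5.0, 3.0)
export_without_surplus = grid_export_power(0.5, 2.0, 3.0)
```

Under this reading, the share of the surplus that is not exported is what remains available for charging the energy storage battery.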
In other embodiments of this specification, a power balance equation is used to describe the balance between power input and power output when the system formed by the power grid and the user-side photovoltaic energy storage device is operating.
The power balance equation relates the following quantities: the power input from the power grid in the k-th sample period; the power output to the power grid by the photovoltaic power generation device in the k-th sample period; the user power in the k-th sample period; and the photovoltaic power generation power in the k-th sample period.
Fig. 4 is a flowchart of a method for determining a state variable of an energy storage battery in a sample period according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 401, obtaining the charge-discharge power of the energy storage battery in the historical sample periods before the sample period. In this step, the historical sample periods are all sample periods preceding the current sample period. For example, if the current sample period is the k-th sample period, the historical sample periods are the (k-1)-th sample period, the (k-2)-th sample period, ..., the 2nd sample period and the 1st sample period.
In the embodiments of this specification, the charge-discharge power of the energy storage battery in each historical sample period can be acquired.
Step 402, determining the state of charge of the energy storage battery in the sample period according to the charge-discharge power of the energy storage battery in the historical sample periods. In this step, the state of charge of the energy storage battery in the sample period is computed from the following quantities:
the index k of the current sample period; the state of charge of the energy storage battery in the 1st sample period (i.e., at the initial time); the self-discharge rate of the energy storage battery; the open-circuit voltage of the energy storage battery; the rated capacity of the energy storage battery; the charge-discharge efficiency of the energy storage battery; the charge-discharge power of the energy storage battery in the i-th sample period, which is negative when the energy storage battery discharges and positive when it charges; and an efficiency index for the i-th period.
In the embodiments of this specification, the photovoltaic power generation uncertainty model is determined by the following formula: $P^{\mathrm{pv}}_k \sim \mathcal{N}(\eta S I_k, \sigma^2)$, where $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power of the sample period; $\eta$ is the photoelectric conversion efficiency; $S$ is the area of the photovoltaic material; $I_k$ is the solar irradiance of the sample period; and $\sigma^2$ is the variance; that is, the photovoltaic power generation power obeys a Gaussian distribution with mean $\eta S I_k$ and variance $\sigma^2$;
and the power output to the power grid by the photovoltaic power generation device is determined from the photovoltaic power generation power given by the photovoltaic power generation uncertainty model by the following formula: $P^{\mathrm{out}}_k = \beta_k \left(P^{\mathrm{pv}}_k - P^{\mathrm{load}}_k\right)$, where $P^{\mathrm{out}}_k$ is the power output to the power grid by the photovoltaic power generation device in the k-th sample period; $P^{\mathrm{load}}_k$ is the user power in the k-th sample period; $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power in the k-th sample period; and $\beta_k$ is the proportional weight, $\beta_k \in [0, 1]$.
Fig. 5 is a flowchart of a method for determining a preset charging and discharging rule according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 501, if the current period is a peak electricity consumption period, setting the charge-discharge power of the energy storage battery to be negative. This rule prohibits charging the energy storage battery during peak periods, so that during those periods the energy storage battery is used entirely to supply the user power utilization device, which improves the utilization efficiency of the energy storage battery during peak periods.
Step 502, if the current period is a valley (off-peak) electricity consumption period, setting the charge-discharge power of the energy storage battery to be positive. This rule allows the energy storage battery to be charged during valley periods. In the formal statement of the rules of steps 501 and 502, F denotes the set of peak electricity consumption periods, G denotes the set of valley electricity consumption periods, and the constrained quantity is the charge-discharge power of the energy storage battery in the k-th period.
Step 503, if the photovoltaic power generation power is smaller than the load power, setting the proportional weight to zero. This rule states that if the power generated by the photovoltaic power generation device is smaller than the load power of the user power utilization device, the photovoltaic power generation device cannot supply power to the energy storage battery or the power grid, and the proportional weight is set to 0.
Step 504, if the photovoltaic power generation power is greater than the load power, keeping the proportional weight at its original value. This rule states that when the power generated by the photovoltaic power generation device is greater than the power used by the user power utilization device, the photovoltaic power generation device can supply power to the energy storage battery or the power grid, and the proportional weight keeps its original value.
In the embodiments of this specification, the decision variables of the charge-discharge control decision model are constrained according to the preset charge-discharge rules above, which speeds up reinforcement learning and model iteration.
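The four rules of steps 501 to 504 can be sketched as a correction applied to the raw action of the policy network. The function and parameter names are hypothetical, and flipping the sign of the battery power (rather than, say, clipping it to zero) is one possible reading of "setting the power to be negative or positive", adopted here only for illustration.

```python
from typing import Set, Tuple


def apply_rules(k: int, battery_power: float, weight: float,
                pv_power: float, load_power: float,
                peak_periods: Set[int], valley_periods: Set[int]) -> Tuple[float, float]:
    """Correct a raw action (battery power, proportional weight) with the preset rules."""
    if k in peak_periods:
        battery_power = -abs(battery_power)   # step 501: no charging at peak
    elif k in valley_periods:
        battery_power = abs(battery_power)    # step 502: charge in the valley
    if pv_power < load_power:
        weight = 0.0                          # step 503: nothing left to export
    # step 504: otherwise the proportional weight keeps its original value
    return battery_power, weight


peak, valley = {18, 19, 20}, {1, 2, 3}
at_peak = apply_rules(18, 2.0, 0.7, 1.0, 3.0, peak, valley)
at_valley = apply_rules(2, -1.5, 0.7, 4.0, 3.0, peak, valley)
at_other = apply_rules(10, -1.5, 0.4, 4.0, 3.0, peak, valley)
```

Because the correction is applied before the reward is computed, the agent never has to learn these constraints from penalties alone, which is the intended source of the faster convergence.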
Fig. 6 is a flowchart of a method for iteratively training the charge-discharge control decision base model by using a reward function according to an embodiment of the present disclosure, which specifically includes the following steps:
Step 601, determining an economic benefit function according to the electricity selling unit price, the electricity unit price and the input power of the power grid in each period. In the embodiments of this specification, the economic reward of the k-th period is computed from the following quantities:
the power input from the power grid in the i-th sample period;
the power output to the power grid in the i-th sample period, with N denoting the total number of periods;
the electricity unit price of the i-th sample period; and the electricity selling unit price of the i-th sample period.
Step 602, constructing a smoothness reward function according to the charge-discharge power of the energy storage battery at the current time and the charge-discharge power of the energy storage battery at the previous time.
Specifically, the smoothness reward of the k-th sample period is computed from the charge-discharge power of the energy storage battery in the k-th period and in the (k-1)-th period.
Step 603, determining a maximum demand power reward function according to the input power of the power grid in each period.
The maximum demand power reward of the k-th period is used to keep the power input from the power grid as low as possible and to reduce the load on the power grid during peak electricity consumption periods.
Step 604, determining a constraint penalty reward function according to the minimum state of charge of the energy storage battery, the maximum state of charge of the energy storage battery and the state of charge of the energy storage battery at the current time.
The constraint penalty reward of the k-th sample period is computed from an exponential parameter and from the minimum and maximum states of charge of the energy storage battery. The constraint penalty reward function is intended to constrain the state of charge of the energy storage battery and avoid damaging the battery.
Step 605, iteratively training the charge-discharge control decision model according to the economic reward function, the smoothness reward function, the maximum demand power reward function and the constraint penalty reward function, and determining that training of the charge-discharge control decision model is complete when the reward reaches its maximum and remains stable for a preset number of iterations.
In the embodiments of this specification, the reward function of the charge-discharge control decision model is determined from the economic reward function, the smoothness reward function, the maximum demand power reward function and the constraint penalty reward function constructed in the preceding steps, and is expressed by the following formula:
$r_k = \omega_1 r^{\mathrm{eco}}_k + \omega_2 r^{\mathrm{smo}}_k + \omega_3 r^{\mathrm{dem}}_k + \omega_4 r^{\mathrm{con}}_k$, where $r_k$ is the reward of the k-th period; $r^{\mathrm{eco}}_k$, $r^{\mathrm{smo}}_k$, $r^{\mathrm{dem}}_k$ and $r^{\mathrm{con}}_k$ are, respectively, the economic reward, the smoothness reward, the maximum demand power reward and the constraint penalty reward of the k-th sample period; and $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are weight coefficients.
The charge-discharge control decision model is trained iteratively according to the reward function, and training is determined to be complete when the reward reaches its maximum and remains stable for a preset number of iterations or for a period of training time.
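Combining the four reward terms with weight coefficients can be sketched as a weighted sum. The class and field names and the numeric values below are illustrative assumptions; the specification does not state concrete weights, and the individual reward terms are taken here as already computed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Weight coefficients of the four reward terms (hypothetical names and values)."""
    economic: float
    smoothness: float
    max_demand: float
    constraint: float


def total_reward(economic: float, smoothness: float, max_demand: float,
                 constraint: float, w: RewardWeights) -> float:
    """Weighted sum of the economic, smoothness, maximum demand and constraint terms."""
    return (w.economic * economic
            + w.smoothness * smoothness
            + w.max_demand * max_demand
            + w.constraint * constraint)


weights = RewardWeights(economic=1.0, smoothness=0.5, max_demand=0.25, constraint=2.0)
reward_k = total_reward(-4.0, -2.0, -8.0, -1.0, weights)
```

The relative size of the weights sets the trade-off between cost savings, smooth battery operation, low peak demand and staying within the state-of-charge limits.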
Fig. 7 is a schematic diagram of a photovoltaic energy storage system based on fusion rule reinforcement learning according to an embodiment of the present disclosure, where the system includes:
a grid 701, a user-side photovoltaic energy storage device 702. Wherein the power grid 701 is communicatively connected to the user-side photovoltaic energy storage device 702 for powering the user-side photovoltaic energy storage device 702. The user-side photovoltaic energy storage device 702 includes: the photovoltaic power generation device 703, the user power utilization device 704 and the energy storage battery 705, wherein the photovoltaic power generation device 703 is used for providing the power generated by photovoltaic power to the user power utilization device 704 and is used for conveying the residual power to the energy storage battery 705 or the power grid 701 when the power remains. The energy storage battery 705 is used for storing the remaining power of the photovoltaic power generation device 703 and for supplying power to the user power utilization device 704.
Fig. 8 is a schematic structural diagram of a charge and discharge control decision device based on fusion rule reinforcement learning according to an embodiment of the present disclosure, in which a basic structure of the charge and discharge control decision device based on fusion rule reinforcement learning is described, and functional units and modules thereof may be implemented in a software manner, or may also be implemented in a general chip or a specific chip, where the charge and discharge control decision device based on fusion rule reinforcement learning specifically includes:
a state space construction unit 801, configured to determine a state space from user power consumption, a state of charge of an energy storage battery, an outdoor temperature, solar irradiance, a unit price of power grid power consumption, and a unit price of power grid selling in any time period;
an optimal charge-discharge decision variable determining unit 802, configured to input the state space into a charge-discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge-discharge decision variables for that time period, wherein the optimal charge-discharge decision variables include the optimal charge-discharge power of the energy storage battery and the optimal coefficient, and the charge-discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.
To address the coordination among photovoltaic power generation, the energy storage battery and the power grid in a user-side photovoltaic energy storage facility, this specification takes the uncertainty of photovoltaic power generation into account and establishes a photovoltaic power generation uncertainty model; fuses predefined rules to speed up the convergence of reinforcement learning training to the optimal charge-discharge control strategy; and designs an economic reward, a smoothness reward, a maximum demand power reward and a constraint penalty reward, thereby improving the economy of user-side electricity use and reducing the power fluctuation and burden of the power grid.
As shown in fig. 9, a computer device provided in an embodiment of the present disclosure, where the computer device is configured to perform the energy storage charging and discharging control decision method based on fusion rule reinforcement learning, the computer device 902 may include one or more processors 904, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 902 may also include any memory 906 for storing any kind of information, such as code, settings, data, etc. For example, and without limitation, the memory 906 may include any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may store information using any technique. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 902. In one case, when the processor 904 executes associated instructions stored in any memory or combination of memories, the computer device 902 can perform any of the operations of the associated instructions. The computer device 902 also includes one or more drive mechanisms 908 for interacting with any memory, such as a hard disk drive mechanism, optical disk drive mechanism, and the like.
The computer device 902 may also include an input/output module 910 (I/O) for receiving various inputs (via an input device 912) and for providing various outputs (via an output device 914). One particular output mechanism may include a presentation device 916 and an associated Graphical User Interface (GUI) 918. In other embodiments, input/output module 910 (I/O), input device 912, and output device 914 may not be included, but merely as a computer device in a network. The computer device 902 may also include one or more network interfaces 920 for exchanging data with other devices via one or more communication links 922. One or more communication buses 924 couple the above-described components together.
The communication link 922 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication link 922 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
Corresponding to the method in fig. 1 to 6, the present embodiment also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method.
The present description also provides computer-readable instructions, wherein the program therein causes the processor to perform the method as shown in fig. 1 to 6 when the processor executes the instructions.
It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
It should also be understood that, in the embodiments of the present specification, the term "and/or" is merely one association relationship describing the association object, meaning that three relationships may exist. For example, a and/or B may represent: a exists alone, A and B exist together, and B exists alone. In the present specification, the character "/" generally indicates that the front and rear related objects are an or relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the various example components and steps have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present specification.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present description.
In addition, each functional unit in each embodiment of the present specification may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present specification is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present specification. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Specific examples are used in this specification to explain its principles and embodiments; the description of the above embodiments is only intended to help in understanding the method of this specification and its core idea. Meanwhile, a person of ordinary skill in the art may, following the ideas of this specification, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting this specification.

Claims (11)

1. An energy storage charge-discharge control decision method based on fusion rule reinforcement learning, characterized in that the method is applied to a power grid and a user-side photovoltaic energy storage device, the user-side photovoltaic energy storage device comprising a photovoltaic power generation device, a user power utilization device and an energy storage battery, and the method comprising the following steps:
determining a state space by using user electric power, the charge state of an energy storage battery, outdoor temperature, solar irradiance, power grid electricity unit price and power grid selling unit price in any time period;
inputting the state space into a charge-discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge-discharge decision variables for that time period, wherein the optimal charge-discharge decision variables comprise the optimal charge-discharge power of the energy storage battery and the optimal coefficient, and the charge-discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and a photovoltaic power generation uncertainty model.
2. The energy storage charge and discharge control decision method based on fusion rule reinforcement learning according to claim 1, wherein the charge and discharge control decision model based on fusion rule reinforcement learning is obtained by training the following steps:
determining a sample state space according to state variables and environment variables related to a user power device, an energy storage battery and a power grid in a sample time period;
constructing a charge-discharge control decision base model according to a photovoltaic power generation uncertain model, wherein the photovoltaic power generation uncertain model is a Gaussian distribution model considering environmental variables;
inputting state variables in a sample state space into the charge-discharge control decision basic model, and outputting charge-discharge decision variables;
and constraining the charge-discharge decision variables according to a preset charge-discharge rule, and iteratively training the charge-discharge control decision basic model by using a reward function to obtain a trained charge-discharge control decision model.
3. The energy storage charge and discharge control decision method based on fusion rule reinforcement learning of claim 2, wherein the state variable of the energy storage battery in the sample period is determined by:
acquiring the charge-discharge power of the energy storage battery in the historical sample periods before the sample period;
and determining the state of charge of the energy storage battery in the sample period according to the charge-discharge power of the energy storage battery in the historical sample periods, the state of charge being computed from the following quantities: the index k of the current sample period; the state of charge of the energy storage battery in the 1st sample period (i.e., at the initial time); the self-discharge rate of the energy storage battery; the open-circuit voltage of the energy storage battery; the rated capacity of the energy storage battery; the charge-discharge efficiency of the energy storage battery; the charge-discharge power of the energy storage battery in the i-th sample period, which is negative when the energy storage battery discharges and positive when it charges; and an efficiency index for the i-th period.
4. The fusion rule reinforcement learning-based energy storage charge and discharge control decision method according to claim 2, wherein after outputting the charge and discharge decision variables, the method further comprises:
determining photovoltaic power generation power of a sample time period based on a photovoltaic power generation uncertainty model obeying Gaussian distribution;
and determining the power output by the photovoltaic power generation device to the power grid in the sample period according to the decision variables output by the charge-discharge control decision base model and the difference between the photovoltaic power generation power and the user power in the sample period.
5. The energy storage charge and discharge control decision method based on fusion rule reinforcement learning of claim 4, wherein the method comprises:
determining the photovoltaic power generation uncertainty model by the following formula: $P^{\mathrm{pv}}_k \sim \mathcal{N}(\eta S I_k, \sigma^2)$, where $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power of the sample period; $\eta$ is the photoelectric conversion efficiency; $S$ is the area of the photovoltaic material; $I_k$ is the solar irradiance of the sample period; and $\sigma^2$ is the variance; that is, the photovoltaic power generation power obeys a Gaussian distribution with mean $\eta S I_k$ and variance $\sigma^2$; and determining, from the photovoltaic power generation power given by the photovoltaic power generation uncertainty model, the power output to the power grid by the photovoltaic power generation device by the following formula: $P^{\mathrm{out}}_k = \beta_k \left(P^{\mathrm{pv}}_k - P^{\mathrm{load}}_k\right)$, where $P^{\mathrm{out}}_k$ is the power output to the power grid by the photovoltaic power generation device in the k-th sample period; $P^{\mathrm{load}}_k$ is the user power in the k-th sample period; $P^{\mathrm{pv}}_k$ is the photovoltaic power generation power in the k-th sample period; and $\beta_k$ is the proportional weight, $\beta_k \in [0, 1]$.
6. The energy storage charge and discharge control decision method based on fusion rule reinforcement learning of claim 5, wherein determining the preset charge and discharge rule comprises:
if the current period belongs to the electricity consumption peak period, setting the charge and discharge power of the energy storage battery to be negative;
if the current period belongs to the electricity consumption valley period, setting the charge and discharge power of the energy storage battery to be positive;
if the photovoltaic power generation power is smaller than the load power, setting the proportional weight to zero;
if the photovoltaic power generation power is larger than the load power, setting the proportional weight as an original proportional weight value;
and constraining decision variables in the charge-discharge control decision model according to the preset charge-discharge rule.
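The preset rules of claim 6 can be pictured as a filter applied to the decision variables proposed by the learning agent. The sketch below is a minimal illustration, not the patented implementation: the period labels "peak" and "valley", the function name, and the choice to enforce the sign by flipping the magnitude (rather than, say, clipping to zero) are all assumptions. The claim does not say what happens when generation equals load, so that case leaves the weight unchanged here.

```python
def apply_preset_rules(battery_power, weight, period, pv_power, load_power):
    """Return (battery_power, weight) after applying the four preset rules.

    battery_power: proposed charge/discharge power of the energy storage battery
    weight:        proposed proportional weight
    period:        "peak", "valley", or any other label (hypothetical encoding)
    """
    if period == "peak":
        # Peak period: the claim requires a negative charge/discharge power.
        battery_power = -abs(battery_power)
    elif period == "valley":
        # Valley period: the claim requires a positive charge/discharge power.
        battery_power = abs(battery_power)

    if pv_power < load_power:
        # Generation below load: the proportional weight is set to zero.
        weight = 0.0
    # Generation above load: the original proportional weight is kept.
    return battery_power, weight
```

For example, a proposed action of (3.0, 0.4) in a peak period with generation below load becomes (-3.0, 0.0).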
7. The fusion rule reinforcement learning-based energy storage charge and discharge control decision method of claim 6, wherein iteratively training the charge and discharge control decision base model with a reward function comprises:
determining an economic reward function according to the electricity selling unit price, the electricity consumption unit price, and the input power of the power grid in each period;
constructing a smoothness reward function according to the charge-discharge power of the energy storage battery at the current moment and at the previous moment;
determining a maximum demand power reward function according to the input power of the power grid in each period;
determining a constraint penalty reward function according to the minimum state of charge of the energy storage battery, the maximum state of charge of the energy storage battery, and the state of charge of the energy storage battery at the current moment;
and iteratively training the charge-discharge control decision model according to the economic reward function, the smoothness reward function, the maximum demand power reward function, and the constraint penalty reward function, and determining that training of the charge-discharge control decision model is complete when the reward reaches its maximum and remains stable for a preset number of iterations.
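Claim 7 names four reward terms and a stopping condition but does not give their functional forms in this text. The following sketch shows one plausible shape for each, purely as an assumption: the sign convention (positive grid power means power drawn from the grid), the absolute-difference smoothness term, the fixed penalty value, the weighted sum, and the tolerance-based stability test are all illustrative choices, and every name is hypothetical.

```python
from typing import Sequence


def economic_reward(grid_kw: float, buy_price: float, sell_price: float,
                    hours: float = 1.0) -> float:
    """Negative purchase cost when importing, sales revenue when exporting."""
    if grid_kw >= 0:  # assumed convention: positive = drawn from the grid
        return -buy_price * grid_kw * hours
    return sell_price * -grid_kw * hours


def smoothness_reward(battery_kw_now: float, battery_kw_prev: float) -> float:
    """Penalise large changes in battery power between consecutive moments."""
    return -abs(battery_kw_now - battery_kw_prev)


def max_demand_reward(grid_kw_per_period: Sequence[float]) -> float:
    """Penalise the highest grid input power over all periods."""
    return -max(grid_kw_per_period)


def soc_penalty(soc: float, soc_min: float, soc_max: float,
                penalty: float = 10.0) -> float:
    """Zero inside [soc_min, soc_max], a fixed (assumed) penalty outside."""
    return 0.0 if soc_min <= soc <= soc_max else -penalty


def total_reward(terms: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the four terms; the weights are hypothetical."""
    return sum(w * t for w, t in zip(weights, terms))


def has_converged(history: Sequence[float], window: int,
                  tol: float = 1e-3) -> bool:
    """True when the last `window` rewards all sit within `tol` of the best
    reward seen so far, i.e. the reward is at its maximum and stable."""
    if window < 1 or len(history) < window:
        return False
    recent = history[-window:]
    return max(history) - min(recent) <= tol
```

In a training loop, the episode reward would be appended to `history` after each iteration and training stopped once `has_converged` returns true for the preset window.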
8. A photovoltaic energy storage system based on fusion rule reinforcement learning, the system comprising:
a power grid, communicatively connected to the user-side photovoltaic energy storage device and configured to supply power to the user-side photovoltaic energy storage device;
a user-side photovoltaic energy storage device comprising a photovoltaic power generation device, a user power utilization device and an energy storage battery, wherein the photovoltaic power generation device is configured to supply photovoltaic electricity to the user power utilization device and, when there is surplus electricity, to deliver the surplus to the energy storage battery or the power grid;
and the energy storage battery is configured to store the surplus electricity of the photovoltaic power generation device and to supply power to the user power utilization device.
9. An energy storage charge-discharge control decision device based on fusion rule reinforcement learning, characterized by comprising:
a state space construction unit, configured to determine a state space from the user power consumption, the state of charge of the energy storage battery, the outdoor temperature, the solar irradiance, the grid electricity consumption unit price, and the grid electricity selling unit price in any time period;
an optimal charge-discharge decision variable determining unit, configured to input the state space into a charge-discharge control decision model based on fusion rule reinforcement learning to obtain the optimal charge-discharge decision variables for that time period, wherein the optimal charge-discharge decision variables comprise the optimal charge-discharge power of the energy storage battery and the optimal coefficient, and the charge-discharge control decision model based on fusion rule reinforcement learning is obtained by training with a sample state space and the photovoltaic power generation uncertainty model.
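The two units of claim 9 can be sketched as a state container and a thin wrapper around a trained policy. This is an illustrative outline only: the field names, the units, the ordering of the state vector, and the `Policy` callable standing in for the trained decision model are all assumptions, and the second decision variable is assumed to be the coefficient named in the claim.

```python
from dataclasses import astuple, dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """The six quantities the claim lists for one time period."""
    load_kw: float          # user power consumption
    soc: float              # state of charge of the energy storage battery
    outdoor_temp_c: float   # outdoor temperature
    irradiance_w_m2: float  # solar irradiance
    buy_price: float        # grid electricity consumption unit price
    sell_price: float       # grid electricity selling unit price

    def to_vector(self) -> List[float]:
        """Flatten the state into the input vector of the decision model."""
        return list(astuple(self))


# Hypothetical stand-in for the trained model: state vector in,
# (battery charge/discharge power, coefficient) out.
Policy = Callable[[Sequence[float]], Tuple[float, float]]


def decide(state: State, policy: Policy) -> Tuple[float, float]:
    """Run the decision model on one state and return its two outputs."""
    battery_kw, coefficient = policy(state.to_vector())
    return battery_kw, coefficient
```

Keeping the policy behind a plain callable means the same wrapper works with any trained model, or with a fixed rule during testing.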
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202410090946.5A 2024-01-23 2024-01-23 Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning Active CN117613983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410090946.5A CN117613983B (en) 2024-01-23 2024-01-23 Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning

Publications (2)

Publication Number Publication Date
CN117613983A true CN117613983A (en) 2024-02-27
CN117613983B CN117613983B (en) 2024-04-16

Family

ID=89952061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410090946.5A Active CN117613983B (en) 2024-01-23 2024-01-23 Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning

Country Status (1)

Country Link
CN (1) CN117613983B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013109958A1 (en) * 2013-09-11 2015-03-12 Sma Solar Technology Ag Photovoltaic system and method for providing control power through a photovoltaic system
CN105811468A (en) * 2016-05-24 2016-07-27 江苏中科国能光伏科技有限公司 Distributed photovoltaic power generation intelligent control and transporting system for data collection arithmetic
US20170117744A1 (en) * 2015-10-27 2017-04-27 Nec Laboratories America, Inc. PV Ramp Rate Control Using Reinforcement Learning Technique Through Integration of Battery Storage System
WO2019126806A1 (en) * 2017-12-22 2019-06-27 The Regents Of The University Of California Design and control of electric vehicle charging infrastructure
WO2019196375A1 (en) * 2018-04-13 2019-10-17 华南理工大学 Demand side response-based microgrid optimal unit and time-of-use electricity price optimization method
US20200106385A1 (en) * 2018-09-28 2020-04-02 Johnson Controls Technology Company Photovoltaic energy system with stationary energy storage control
CN111525624A (en) * 2020-03-26 2020-08-11 天津理工大学 Household distributed energy scheduling method based on storage battery energy storage system
WO2020165631A1 (en) * 2019-02-14 2020-08-20 Enegan S.P.A. System for exchanging renewable source energy
CN111682563A (en) * 2020-05-12 2020-09-18 天津大学 Micro-grid intelligent frequency control method based on electric energy storage system
CN112614009A (en) * 2020-12-07 2021-04-06 国网四川省电力公司电力科学研究院 Power grid energy management method and system based on deep expected Q-learning
KR20210043387A (en) * 2019-10-11 2021-04-21 한국전력정보(주) Electric power market bidding system for distributed photovoltaic devices and bidding method using the same
CN114123273A (en) * 2021-11-12 2022-03-01 青海综合能源服务有限公司 Control method and system of wind power-photovoltaic-energy storage combined system
CN115018231A (en) * 2021-12-15 2022-09-06 昆明理工大学 Autonomous task planning method and system for reinforcement learning deep space probe based on dynamic rewards
CN115840794A (en) * 2023-02-14 2023-03-24 国网山东省电力公司东营供电公司 Photovoltaic system planning method based on GIS (geographic information System) and RL (Link State) models
CN116780506A (en) * 2023-05-23 2023-09-19 深圳供电局有限公司 Household micro-grid energy management method, device, equipment and storage medium
CN117060386A (en) * 2023-07-14 2023-11-14 安徽工程大学 Micro-grid energy storage scheduling optimization method based on value distribution depth Q network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
卢美玲等: "面向能源互联网的家庭光伏发电系统经济效益优化调度模型", 《电力系统及其自动化学报》, vol. 28, no. 1, 15 December 2016 (2016-12-15), pages 141 - 146 *
张晋铭等: "《计及配电网可靠性和运行经济性的电网侧储能优化配置》", 《电力自动化设备》, 4 January 2024 (2024-01-04), pages 1 - 12 *
班银银等: "《公共建筑光伏发电系统的经济性优化与评估》", 《上海电力学院学报》, vol. 32, no. 03, 15 June 2016 (2016-06-15), pages 288 - 292 *
祁芙蓉: "《光储充放电站优化调度及储能配置的研究》", 中国硕博论文, 12 December 2023 (2023-12-12), pages 1 - 72 *
邓忻依等: "分布式光伏储能系统综合效益评估与激励机制", 《发电技术》, no. 01, 28 February 2018 (2018-02-28), pages 30 - 41 *
郭宁等: "家庭并网光伏发电系统优化调度及经济性分析", 《中国电力》, vol. 49, no. 1, 5 October 2016 (2016-10-05), pages 159 - 165 *
高雪莹;唐昊;苗刚中;平兆武;: "储能系统能量调度与需求响应联合优化控制", 《系统仿真学报》, vol. 28, no. 05, 8 May 2016 (2016-05-08), pages 1165 - 1172 *

Also Published As

Publication number Publication date
CN117613983B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
Foruzan et al. Reinforcement learning approach for optimal distributed energy management in a microgrid
Wu et al. Dynamic economic dispatch of a microgrid: Mathematical models and solution algorithm
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
Tao et al. Deep reinforcement learning based bidding strategy for EVAs in local energy market considering information asymmetry
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
CN110110895A (en) The method and terminal device of electric car Optimized Operation
CN103078152B (en) Intelligent charging method for centralized charging station
Chuang et al. Deep reinforcement learning based pricing strategy of aggregators considering renewable energy
CN116054270A (en) Power grid dispatching optimization method and device and nonvolatile storage medium
CN114169916B (en) Market member quotation strategy formulation method suitable for novel power system
CN104112168B (en) A kind of smart home optimization method based on multi-agent system
Chen et al. A Deep Reinforcement Learning-Based Charging Scheduling Approach with Augmented Lagrangian for Electric Vehicle
CN111898801B (en) Method and system for configuring multi-energy complementary power supply system
CN112670982B (en) Active power scheduling control method and system for micro-grid based on reward mechanism
CN112510690B (en) Optimal scheduling method and system considering wind-fire-storage combination and demand response reward and punishment
CN117613983B (en) Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning
Wang et al. Learning-based energy management policy with battery depth-of-discharge considerations
CN113554219B (en) Method and device for planning shared energy storage capacity of renewable energy power station
CN113555887B (en) Power grid energy control method and device, electronic equipment and storage medium
Abdelkader et al. A market oriented, reinforcement learning based approach for electric vehicles integration in smart micro grids
CN115759478A (en) Cooperative game-based micro-grid group optimal operation method, device, equipment and medium
CN117613848A (en) Load scheduling method and device for resident user aggregate
CN114723300A (en) Multi-energy scheduling method for comprehensive energy system
CN117610710A (en) Multi-target planning management method, device, storage medium and system for energy system
Hyder et al. Energy Management Optimization Through Conventional and AI Approaches for Efficient Electrical Energy Utilization.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant