CN116739158A

CN116739158A - Self-adaptive optimized energy storage method based on reinforcement learning

Info

Publication number: CN116739158A
Application number: CN202310640040.1A
Authority: CN
Inventors: 邢立宁; 蒋雪梅; 李豪; 郭泱泱; 吕旷达; 周宇; 万方高; 李济廷; 宋彦杰
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2023-05-31
Filing date: 2023-05-31
Publication date: 2023-09-12

Abstract

The invention discloses a self-adaptive optimized energy storage method based on reinforcement learning, which comprises the following steps: acquiring power data of a user, and constructing a demand charge model of the user; combining the demand charge model, and analyzing and evaluating economic benefits of the energy storage system under the current electricity consumption condition by considering constraint conditions to obtain a preliminary distribution scheme of the energy storage system; preprocessing energy storage action parameters according to the preliminary allocation scheme of the energy storage system, and roughly allocating action strategies; converting the roughly distributed energy storage system actions into action sequences and corresponding to each moment point to generate an energy storage system scheduling sequence according to time sequencing; and dynamic action adjustment is required to be carried out on the energy storage system scheduling sequence according to the current environmental characteristics so as to achieve the aim of optimizing enterprise income. The method overcomes the defect of insufficient generality of the traditional algorithm, reduces the dependence of the traditional algorithm on scenes, can be applicable to different scenes, and finally obtains the optimized scheduling strategy.

Description

Self-adaptive optimized energy storage method based on reinforcement learning

Technical Field

The invention belongs to the technical field of battery energy storage optimization scheduling, and particularly relates to a self-adaptive energy storage optimization method based on reinforcement learning, which can be used for various scenes such as power system optimization, micro-grid energy storage, industrial engineering, municipal construction and the like.

Background

In recent years, in view of the continuous development and perfection of energy storage systems, the applicable power system field develops from peak clipping and valley filling to various directions such as frequency adjustment, demand side response, power supply reliability analysis, stabilization of new energy power generation fluctuation and the like, and the energy storage systems are used as carriers with very good peak clipping and valley filling of power terminal loads, can save power cost for industrial enterprise users, and the energy storage systems are widely promoted to be applied by governments in various places, so that the energy storage systems have wide commercial prospect. Therefore, research on the energy storage optimization scheduling at the user side is attracting attention. Although the energy storage optimal scheduling is beneficial to fully exerting the economic benefit and the environmental benefit of the energy system, the randomness and the uncertainty of the power use environment bring great difficulty to the energy storage scheduling, and in order to better popularize and use the energy storage system, the energy storage needs to be optimally scheduled to realize efficient, economical and stable operation.

The conventional energy storage scheduling mode is based on comprehensive analysis of data and is adjusted manually, and the whole energy storage scheduling process is from data analysis to a closed loop link of manually adjusting an energy storage scheme, as shown in fig. 1. It can be seen that the energy storage flow involves a plurality of modules and departments such as a user, a functional department, an energy storage system, an energy exchange device and the like, and the business of each component is not uniform, so that the interaction between the components is inconvenient. The environment information and constraint information according to the energy storage planning are not real-time information, and certain hysteresis exists. In addition, because the energy storage system and the energy supply party are not necessarily connected, the work flow between the two is imperfect at present, and the limited idle resources of the energy storage system play a vital role in the whole flow, so that the issuing of instructions and the feedback of states are influenced. Overall, the conventional energy storage planning process can complete the response to part of the energy, but is still insufficient to meet the new requirements and challenges. The drawbacks of the conventional energy storage process can be summarized as follows:

(1) The whole energy storage flow is complicated, the period from the time of providing the demands to the time of obtaining the energy use feedback information is often longer, and the requirement of users on timeliness cannot be met; (2) The energy storage planning is generally based on offline operation, and the planning scheme cannot adapt to the environment which changes in real time; (3) The system lacks of a quick auxiliary means and an automatic processing flow, and in many cases, the operation data of the energy storage system needs to be manually modified, the emergency adjustment flow is complex, the man-machine interaction operation is numerous, and the error operation is very easy to occur.

The energy storage system can be seen to work only according to the planning result in the whole process, and cannot respond in real time according to the changes of the enterprise working environment and the business acceptance condition, so that the actual scheduling result deviates from the expected result, and the fact that the actual operation efficiency of the energy storage system is lower than the expected result is further reflected.

With the intensive research of intelligent optimization methods, reinforcement learning algorithms become a main technology for solving the problem of energy storage scheduling at the user side. The algorithm does not need priori knowledge, can realize the dynamic learning evolution of the system through self-adaptive parameter adjustment, can be suitable for different scenes, and finally obtains an optimized scheduling strategy. At present, aiming at the problem of real-time scheduling of the hybrid energy storage system, a learner provides a real-time scheduling method of the hybrid energy storage system based on a dynamic programming-genetic algorithm, so that the timeliness of energy storage scheduling is effectively improved. And a learner optimizes the energy storage charging and discharging strategy at the user side by combining a genetic algorithm with a simulated annealing algorithm, so that the optimizing speed and the convergence performance of the algorithm are effectively improved. In addition, aiming at the problem of poor peak clipping and valley filling instantaneity of the battery energy storage system, the system is optimized and adjusted in real time by utilizing dynamic planning. The technology has important research significance and more extensive application value in the fields of power system optimization, micro-grid energy storage and industrial engineering.

Aiming at the optimal scheduling problem of the battery energy storage system, the economic problem of the energy storage system and the optimal scheduling problem of the system charging and discharging strategy need to be considered. The existing proposed algorithm only starts from economy and does not comprehensively consider two aspects of problems in solving the energy storage scheduling problem, the economical efficiency and the real-time performance of the proposed algorithm in peak clipping and valley filling cannot be considered, and the self-adaptive adjustment strategy for the system stability and the scheduling strategy is insufficient, so that the purpose of improving the economic benefit of the energy storage system cannot be achieved. The method is characterized in that the search result is extremely sensitive to the parameter configuration of the algorithm in the iterative process of the currently proposed optimization algorithm, the parameter configuration needs to be adjusted according to the problems in the calculation process, and the calculation efficiency is low; secondly, the algorithm is mainly based on problem guiding type, the solving process depends on problems or scenes, and the particularity of the energy storage scheduling problem and the applicability of a solving model are not considered; and thirdly, the energy storage optimization scheduling objective function and constraint conditions are extremely complex, the calculation complexity is high, and the current solving model and solving method have low convergence speed on the optimization result and still need to be further improved.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a self-adaptive optimized energy storage method based on reinforcement learning. The technical problems to be solved by the invention are realized by the following technical scheme:

The invention provides a self-adaptive optimized energy storage method based on reinforcement learning, which comprises the following steps:

s1: acquiring power data of a user, analyzing the power consumption condition of a current power system, and constructing a user demand charge model;

s2: combining the demand charge model, and analyzing and evaluating economic benefits of the energy storage system under the current electricity consumption condition by considering constraint conditions to obtain a preliminary distribution scheme of the energy storage system;

s3: preprocessing energy storage action parameters according to the preliminary distribution scheme of the energy storage system, and roughly distributing action strategies of the energy storage system at each stage;

s4: converting the roughly distributed energy storage system actions into action sequences and corresponding to each moment point to generate an energy storage system scheduling sequence according to time sequencing;

s5: and according to the current environmental characteristics, carrying out dynamic action adjustment on the energy storage system scheduling sequence based on a Q learning algorithm so as to achieve the aim of optimizing enterprise income.

In one embodiment of the invention, the demand charge model is expressed in terms of a maximum investment return rate max (E/C), where C represents the investment cost of the user installing the energy storage system and E represents the energy storage system return.

In one embodiment of the present invention, the S2 includes:

S2.1: constructing constraint conditions, wherein the constraint conditions comprise energy storage load constraint, energy storage system capacity constraint and energy storage multiplying power constraint,

energy storage load constraint:

energy storage system capacity constraints: l (L) _min,t ≤P _t +δ _i,t -ρ _i,t ≤L _max,t

Energy storage multiplying power constraint: e (E) _max ＝β*P _max

wherein ,δ_max Represents maximum discharge power, p _max Represents maximum charging power, P _t Represents the energy storage load at time t, S _t Representing the state of charge of the battery at time t, L _min，t 、L _max，t Represents the minimum value and the maximum value of the load of the energy storage system at the moment t, E _max Representing energy storage system capacityBeta represents the energy storage charge-discharge multiplying power, P _max Representing the rated power of the energy storage system;

s2.2: and analyzing and evaluating the economic benefit of the energy storage system under the current electricity consumption condition by combining the demand charge model and the constraint condition to obtain a preliminary distribution scheme of the energy storage system.

In one embodiment of the present invention, the S2.2 includes:

s2.21: initializing electricity parameters including rated power, charge-discharge power and charge-discharge multiplying power parameter setting;

s2.22: initializing charge and discharge actions to be executed in each time period in a day of a user to form a charge and discharge action sequence set in each time period in the day;

s2.23: and judging whether the current action sequence meets the expected value of an objective function formed by the demand charge model and the constraint condition, if so, outputting a pre-planning sequence, otherwise, returning to the step S2.22.

In one embodiment of the present invention, the S3 includes:

s3.1: calculating charge and discharge action probabilities of three stages of a power consumption peak section, a power consumption valley section and a power consumption level section by taking energy constraint, charge and discharge power constraint, energy storage load constraint and energy storage multiplying power constraint into consideration;

s3.2: and adjusting the action sequence in the preliminary distribution scheme of the energy storage system according to the charging and discharging action probability, and roughly dividing the action strategy set of the energy storage system.

In one embodiment of the invention, in the power consumption peak section, the discharge probability > the idle probability > the charge probability; in the electricity utilization valley section, the charging probability > the idle probability > the discharging probability; in the active level segment, the idle probability > the charge probability > the discharge probability.

In one embodiment of the present invention, the S5 includes:

s5.1: constructing an objective function for energy storage system optimization, wherein the objective function is measured by two dimensions of the charge and discharge times of the energy storage system and the economical efficiency of the energy storage system;

s5.2: the energy storage system adaptively adjusts triggering conditions of the energy storage system for optimal scheduling according to the change of the action candidate set, the emergency or the maintenance of the energy storage system;

s5.3: and optimizing the action sequence of the energy storage system based on the Q learning algorithm until the optimal action sequence is obtained.

In one embodiment of the invention, the objective function is expressed as:

wherein ,α ₁ +α ₂ ＝1，/>indicating that if the energy storage system k is in a charged state during the t period +.>Otherwise-> Indicating that if the energy storage system k is in a discharge state within the t period of time,/is>Otherwise-> Representing the expenditure of the energy storage system k to change from an idle state to a charged state in a t period; />Representing the benefit of the energy storage system k from an idle state to a discharge state in a t time period; c (C) _k Representing the investment cost, T, of the energy storage system k _k Indicating the period of time that the energy storage system k needs to schedule a charging and discharging action.

In one embodiment of the present invention, the S5.3 includes:

s5.31: loading various initial information of the energy storage system, wherein the initial information mainly comprises a power time period set, a working state set, an action initialization set, system operation data, basic conditions of occurrence probability of the energy storage state, a Q table and basic parameters of a Q learning algorithm;

s5.32: recording the current working state of the energy storage system, and selecting the current working mode according to the action probability distribution condition at the current moment;

s5.33: evaluating according to the selected action condition, and calculating an energy storage income value of the working state at the current moment;

s5.34: taking the energy storage gain value of the current moment point as an element for calculating the Q value, and selecting the maximum Q value of the next state from the Q table according to a greedy strategy so as to calculate the Q value under the current state;

S5.35: and according to the calculated Q value in the current state, updating the Q value in the current state, arranging actions at each moment point according to the updated Q value, and finally maintaining an optimal action sequence until the whole process is finished, wherein the optimal Q value corresponds to the SN value in the objective function, the SN value is used as an influence parameter for solving the benefits of the energy storage system, the objective function value is finally calculated, a final action sequence is generated, and the energy storage system executes according to the action sequence.

Compared with the prior art, the invention has the beneficial effects that:

1. the self-adaptive optimization energy storage method based on reinforcement learning, based on two aspects of energy storage economy and optimization strategy, considers a plurality of actual factors such as power, floating electricity price, user demand, peak-valley constraint and the like, provides a double-layer planning model which aims at energy storage investment return rate and energy storage income, outputs an action sequence of a scheduling layer of an energy storage system as an input of the scheduling layer in view of environmental change, emergency and the like, and feeds back the action sequence to the scheduling layer through system economy evaluation of the scheduling layer, so that iteration is repeated, and the self-adaptive learning process of the energy storage system is realized.

2. The invention uses the reinforcement learning method to drive the algorithm to evolve, generates the energy storage scheduling sequence and calculates the profit value, the algorithm does not need priori knowledge, can realize the system dynamic learning evolution by self-adaptive parameter adjustment, is globally superior to the traditional scheduling algorithm, overcomes the defect of insufficient generality of the traditional algorithm, reduces the dependence of the traditional algorithm on scenes, can be applicable to different scenes, and finally obtains the optimized scheduling strategy.

3. The invention introduces the action sequence re-planning triggering condition, which is divided into three types, namely: action candidate set change, burst state, system maintenance. Therefore, the energy storage system can be adjusted in real time in response to a complex environment, and under the condition of optimal scheduling of the battery energy storage system at the user side in the dynamic environment, an efficient energy storage scheduling algorithm is researched, so that efficient solution of a dynamic and complex energy storage scheduling problem is ensured, and self-adaptive real-time dynamic energy storage scheduling is realized.

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Drawings

FIG. 1 is a flow chart of an implementation of a conventional energy storage planning method;

FIG. 2 is a flow chart of an adaptive optimized energy storage method based on reinforcement learning according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a scheduling process of an adaptive optimized energy storage method based on reinforcement learning according to an embodiment of the present invention;

fig. 4 is a diagram of a charging model of domestic electricity prices provided by the embodiment of the invention;

FIG. 5 is a flow chart of an energy storage planning phase provided by an embodiment of the present invention;

FIG. 6 is a pre-allocation relationship diagram of charge and discharge actions at various time points provided by the embodiment of the invention;

FIG. 7 is a diagram of a constraint relationship between actions and energy storage systems provided by an embodiment of the present invention;

FIG. 8 is a flow chart of a scheduling layer process provided by an embodiment of the present invention;

FIG. 9 is a flowchart of an energy storage system action sequence optimization provided by an embodiment of the present invention;

FIG. 10 is a schematic diagram of an energy storage plan after an action candidate set is changed according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of an emergency re-planning provided by an embodiment of the present invention;

FIG. 12 is a schematic diagram of a system overhaul re-planning provided by an embodiment of the present invention;

FIG. 13 is a diagram of a reinforcement learning model provided by an embodiment of the present invention;

FIG. 14 is a flowchart of an energy storage optimization scheduling algorithm based on Q reinforcement learning provided by an embodiment of the invention;

FIG. 15 is a 5 month to 8 month electrical load for a business provided by an embodiment of the present invention;

FIG. 16 is a graph comparing enterprise power loads for a method of an embodiment of the present invention and a conventional TES algorithm;

FIG. 17 is a scheduling policy performance analysis of the method of an embodiment of the present invention and a conventional TES algorithm;

FIG. 18 is a graph of convergence speed of a method of an embodiment of the invention and a conventional TES algorithm.

Detailed Description

In order to further explain the technical means and effects adopted by the invention to achieve the preset aim, the self-adaptive optimized energy storage method based on reinforcement learning according to the invention is described in detail below with reference to the attached drawings and the specific embodiments.

The foregoing and other features, aspects, and advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments when taken in conjunction with the accompanying drawings. The technical means and effects adopted by the present invention to achieve the intended purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only, and are not intended to limit the technical scheme of the present invention.

The implementation of the embodiment of the invention comprises a planning layer and a scheduling layer, wherein the planning layer initializes parameters, the parameters comprise enterprise electricity consumption parameters and energy storage system action parameters, and the economic evaluation of the energy storage scheduling system and the optimization of energy storage distribution are completed; the scheduling layer completes the action strategy preprocessing, the initial state setting, and under the premise of considering the charge-discharge constraint conditions, a charge-discharge sequence meeting the conditions is generated according to the charge-discharge scheduling strategy. Referring to fig. 2 and 3, the adaptive optimized energy storage method of the present embodiment includes:

s1: and acquiring the power data of the user, analyzing the power consumption condition of the current power system, and constructing a user demand charge model.

Specifically, first, electric power data of a user (such as an enterprise) needs to be introduced, and the electricity consumption condition and the electricity consumption characteristics of the user are analyzed. The power consumption condition and the power consumption characteristics of the user refer to dynamic changes of the load and the use scene of the user. Currently, the domestic electricity price charging mode is divided into a charging mode of a combination of "basic electricity price + electricity price" and a combination of "basic electricity price + electricity price of demand", as shown in fig. 4. The cost generated by different charging models is different, the electricity charging mode takes the actual use condition of electricity as a charging standard, the electricity charging mode dynamically charges according to the electricity consumption requirement reported by an enterprise, and the scale value exceeding the electricity consumption requirement reported by the enterprise floats upwards by one time according to the existing electricity cost. In view of the flexibility and dynamic adjustability of the demand metering mode, the enterprise selects this type of metering mode to effectively control the power input. Therefore, in the initialization stage of the planning layer, the enterprise electricity consumption characteristics need to be started, and an enterprise demand charge model is constructed first.

Specifically, C represents the investment cost of installing the energy storage system by the user, E represents the energy storage system benefit, and the enterprise's demand charge model may be represented by the maximum investment benefit rate max (E/C). The embodiment of the invention defines the influence parameters of the investment benefits of the energy storage system, and comprises the following steps: period of energy storage system availability (N), period of energy storage system investment recovery (N _T ) Monthly power rate (C) ₁ ) Monthly after installation of the energy storage systemBasic electric charge (C) ₂ ) Power supply income of energy storage system (S _T ) Basic monthly electricity fee (B) _T ) The energy storage system benefits may be described in particular as:

wherein ,C ₂ ＝(S _T -B _T ) T represents days, n represents time points, namely 1-24 hours, the value is 1-24, m represents electricity consumption of different time periods, and delta _i，t Represents the discharge rate, ρ, of the energy storage system at the ith time point of the ith day _i，t The energy storage system charge rate at time point t on day i is indicated.

S2: and combining the demand charge model, and analyzing and evaluating the economic benefit of the energy storage system under the current electricity consumption condition by considering constraint conditions to obtain a preliminary distribution scheme of the energy storage system.

In this embodiment, the step S2 specifically includes the following steps:

s2.1: and constructing constraint conditions, wherein the constraint conditions comprise energy storage load constraint, energy storage system capacity constraint and energy storage multiplying power constraint.

Aiming at the situation that peak-valley electricity prices are different in each place and the difference of the peak-valley electricity prices in certain places is small, an evaluation model taking return on investment as an objective function is built based on a demand charge model, and the energy storage system benefits can be represented by the energy storage system benefit years and the electricity charge saving situation of users; and considering constraint conditions such as energy storage load constraint, energy storage system capacity constraint, energy storage multiplying power constraint and the like, and simultaneously adding an energy storage strategy into a planning process to generate an energy storage system economic evaluation result and an energy storage system primary allocation scheme.

Specifically, after initializing electricity parameters and building a user demand energy consumption model, the feasibility and economic benefit of enterprise installation of the energy storage system need to be evaluated from the energy storage system economic consideration. Because the peak-valley electricity prices are different in each place, the situation that the peak-valley electricity price difference in some places is small exists, and the energy storage system benefit can be represented by the energy storage system benefit age and the user electricity saving condition based on the demand charge model in the last stage. In addition, the model needs to consider constraint conditions such as energy storage load constraint, energy storage system capacity constraint, energy storage multiplying power constraint and the like, and meanwhile, an energy storage strategy is added into a planning process, and finally an energy storage system economic evaluation result and an energy storage system preliminary allocation scheme are given, wherein in the process, the energy storage load constraint, the energy storage system capacity constraint and the energy storage multiplying power constraint can be represented by the following several formulas:

The energy storage load constraint is expressed as:

the energy storage system capacity constraint is expressed as:

L _min,t ≤P _t +δ _i,t -ρ _i,t ≤L _max,t

the energy storage rate constraint is expressed as:

E _max ＝β*P _max

wherein ,δ_max Represents the maximum discharge power ρ _max Represents maximum charging power, P _t Represents the energy storage load at the time t, S _t Representing the state of charge of the battery at time t, L _min，t 、L _max，t Represents the minimum value and the maximum value of the load of the energy storage system at the moment t, E _max Represents the capacity of the energy storage system, beta represents the charge-discharge multiplying power of energy storage, and P _max Indicating the power rating of the energy storage system.

S2.2: and analyzing and evaluating the economic benefit of the energy storage system under the current electricity consumption condition by combining the demand charge model and the constraint condition to obtain a preliminary distribution scheme of the energy storage system. Specifically, referring to fig. 5, the steps may be expressed as:

s2.21: initializing electricity parameters including rated power, charge-discharge power and charge-discharge multiplying power parameter setting; the method comprises the steps of carrying out a first treatment on the surface of the

S3: and (3) transmitting the preliminary distribution scheme of the energy storage system obtained in the step (S2) to a scheduling layer, preprocessing energy storage action parameters, and roughly distributing energy storage strategies in each stage.

In the step, the first problem to be solved is the occurrence probability of the charge-discharge action at each moment, and the constraints related to the action strategy of the energy storage system include energy constraint, charge-discharge power constraint, load constraint of energy storage and energy storage multiplying power constraint, wherein the energy constraint indicates that the process of taking the charge-discharge action in any time period cannot exceed the maximum capacity EN of the battery _i The charge-discharge power constraint indicates that the power value at any time cannot exceed the rated power. To accomplish energy storage scheduling, these four types of constraints must be satisfied, so in the first step, the probabilities of actions at each time point are pre-allocated according to three phases of peak, valley and flat, as shown in fig. 6, the embodiment of the present invention uses p (s _t ，s _t+1 ，a _t ) Representing the state of the energy storage system by s _t State passing action a _t Transfer to s _t+1 Probability of state, when the energy storage system works, the probability of state is represented by state s _t Is converted into s _t+1 The state probabilities of the energy storage system are different, such as the magnitude of the energy storage system action selection probability is the charging probability during the electricity consumption valley period>Idle probability >The discharge probability is opposite to the probability of the electricity consumption peak period, and p(s) is different from the peak-to-valley electricity price _t ，s _t+1 ，a _t ) The following constraints need to be met. And then, according to the distribution result, the action sequence input by the planning layer is adjusted according to the occurrence probability.

Representing energy storage system during peak electricity utilization periodThe probability of selecting a discharging action is greater than the probability of the system being idle, in which case the probability of the energy storage system selecting a charging action is minimal; />The probability that the energy storage system selects an idle action is larger than the probability that the system charges the action when the level period is used, and in this case, the probability that the energy storage system selects a discharging action is minimum; /> Representing that the probability of the energy storage system selecting a charging action is greater than the probability of the system being idle during the valley period of use, in which case the probability of the energy storage system selecting a discharging action is minimal, wherein +.>Indicating the selection of the state of charge at time t +.>Indicating the selection of the discharge state at time t +.>Indicating that an idle state is selected at time t. And then, according to the distribution result, the action sequence input by the planning layer is adjusted according to the occurrence probability. The specific implementation of the steps is as follows:

s3.1: and calculating action probabilities of three stages of the electricity consumption peak section, the electricity consumption valley section and the electricity consumption level section by taking energy constraint, charge and discharge power constraint, energy storage load constraint and energy storage multiplying power constraint into consideration.

When the energy storage system works, for example, during a power consumption valley period, the action selection probability of the energy storage system is a charging probability > idle probability > discharging probability, and during a power consumption peak period, the probability is a discharging probability > idle probability > charging probability. Therefore, before the energy storage scheduling, the energy storage system needs to analyze the action at a certain moment according to the current specific situation, and takes the action which is suitable for selection at each moment as the constraint condition for charge and discharge action allocation. During charging, parameters to be considered in the overhead of the system include: the electric charge of the time period, the electric charge required to be consumed by the charging system, the electric quantity of the charging and the like. When the system is idle, the electricity charge of the time period is considered. When the system discharges, the overhead of the system needs to be considered to include: the electricity charge of the time period, the electricity charge saved by discharging of the discharging system, the electric quantity of the discharging and the like.

The charging and discharging process meets the energy constraint, the maximum of one-time charging and discharging in each time period, the minimum and maximum of the load constraint of the energy storage system, the rated power constraint of the energy storage system and the like, and the pre-distribution of the charging and discharging actions is carried out based on the constraint conditions to obtain a coarse distribution result.

S4: converting the roughly distributed energy storage system actions into action sequences and corresponding to each moment to generate an energy storage system scheduling sequence according to time sequencing.

Through step S3, although the action strategy of the energy storage system is roughly divided, the benefit of each adjustment result for the energy storage system is still not clear. The step converts the regulated action into action sequence and corresponds to each moment, and several constraint conditions of the energy storage system are defined in the foregoing, and obviously under the constraint conditions, action strategy selection is different from the strategy generated in the planning process, and according to the constraint relations, an initial action sequence in the scheduling process can be generated. Referring to fig. 7, fig. 7 illustrates basic constraint relationships that need to be satisfied in generating an energy storage action sequence, and when energy storage scheduling is performed at each moment, the action sequence needs to be reasonably generated according to the constraint relationships. In general, the sequence of actions that satisfy the constraint is not unique in view of the probability of occurrence of the actions. After the action at each moment is generated, a sequence meeting the constraint condition is searched in the processed action and is used as a scheduling result of the time, and after the scheduling result is generated, an energy storage system scheduling sequence can be generated according to the current scheduling result as shown in fig. 8.

S5: energy storage system self-adaptive learning: and according to the current environmental characteristics, carrying out dynamic action adjustment on the energy storage system scheduling sequence based on a Q learning algorithm so as to achieve the aim of optimizing enterprise income.

In the energy storage system optimization scheduling process, in view of environmental change, emergency, economic benefit consideration and other reasons, the process takes the action sequence output of the energy storage system scheduling layer as the input of the planning layer, and the action sequence is fed back to the scheduling layer through the system economy evaluation of the planning layer, so that iteration is repeated, and the energy storage system self-adaptive learning process is realized. Referring to fig. 9, fig. 9 is a flowchart of optimizing an energy storage system action sequence according to an embodiment of the present invention, wherein a dashed line a represents an initialization action sequence, and a solid line B represents an updated action sequence after a scheduling layer of the energy storage system is combined with a planning layer for re-planning scheduling.

After passing through the energy storage system planning layer, the energy storage system then needs to perform a responsive action according to the scheduling sequence. However, due to environmental changes, emergency situations, economic benefit consideration and the like, the preset action sequence cannot meet the actual use requirement, and excessive economic investment of enterprises is caused. In order to avoid economic loss caused in the working process of the energy storage system, dynamic action adjustment is needed according to the current environmental characteristics, so that the aim of optimizing the enterprise income is fulfilled. The specific implementation of the steps is as follows:

S5.1: and constructing an objective function for energy storage system optimization, wherein the objective function is measured by two dimensions of the charge and discharge times of the energy storage system and the economical efficiency of the energy storage system.

The optimization objective of the user side energy storage planning problem is to reduce the expenditure of the user electric charge as much as possible on the basis of considering the electricity balance of each module in each user industry chain. The user energy storage running condition is measured by recording the charge and discharge times SN of the energy storage system, and the overall benefit of the user energy storage is determined according to the economical efficiency of the energy storage system. The objective function of the user side energy storage is composed of the above two dimensions, and thus, the objective function of the energy storage scheduling system can be expressed as:

wherein ,α ₁ +α ₂ ＝1，/>indicating that if the energy storage system k is in a charged state during the t period +.>Otherwise-> Indicating that if the energy storage system k is in a discharge state within the t period of time,/is>Otherwise-> Representing the expenditure of the energy storage system k to change from an idle state to a charged state in a t period; />Representing the benefit of the energy storage system k from an idle state to a discharge state in a t time period; k epsilon S represents taking into account the energy storage system k; c (C) _k Representing the investment cost, T, of the energy storage system k _k Indicating the period of time that the energy storage system k needs to schedule a charging and discharging action.

S5.2: the energy storage system adaptively adjusts the triggering condition of the energy storage system for optimal scheduling according to the change of the action candidate set, the emergency or the maintenance of the energy storage system.

When the energy storage system performs action sequence preparation, an initialized sequence input is needed first, the input is the triggering premise of the system for optimizing and scheduling, and the triggering condition is needed to be set in advance before the energy storage system performs planning and scheduling. In the energy storage system optimal scheduling process, once the energy storage system rescheduling condition is met, the system completes energy storage planning and action sequence rescheduling response according to system setting. In general, in the energy storage system scheduling process, the situation that needs to be re-planned can be divided into the following three situations:

1) The action candidate set is changed. And adjusting the action candidate set corresponding to a certain moment. This may be due to changes in the price of electricity per hour or enterprise policy adjustments. For motion candidate set changes, assume that the energy storage system is at a time T _k The original action candidate set is c= { C ₁ ，C ₂ ，C ₃ For some special reason, T _k Action C above ₁ Will not be a candidate set, at which point the energy storage system needs to rely on the new candidate set C' = { C ₂ ，C ₃ Referring to fig. 10, fig. 10 is a schematic diagram of energy storage planning after the motion candidate set is changed according to the embodiment of the present invention.

2) An emergency situation. In the working process of the energy storage system, emergency conditions exist in the strategy arrangement of the action sequences due to the fact that the situation of enterprise power utilization emergency change and the like is not expected, and under the conditions, the action sequences need to be planned again, so that the maximum benefit under the conditions is ensured. Referring to fig. 11, fig. 11 is a schematic diagram illustrating an emergency re-planning according to an embodiment of the invention. As shown in fig. 11, T _k The power consumption condition of the enterprise suddenly changes at moment, and the action strategy is C ₁ Adjusted to C ₃ The enterprise determines the action strategy at the moment when the enterprise needs to schedule resources to ensure the operation of the enterprise under the condition, and the energy storage system needs to ensure the whole energy storage strategy according to the known conditionAnd adjusting to ensure that the enterprise has the maximum benefit.

3) And (5) overhauling the energy storage system. During the operation process of the energy storage system, maintenance work can be arranged, and the capacity of energy storage scheduling pin peak filling is lost. At this time, the arrangement of the action strategy at the moment when the system is not scheduled is considered, and the system can be restored and converted into normal execution within the t time of system maintenance. In this process, it is necessary to put the energy storage state of the intermediate t period into a uniform idle state and reschedule again based on this premise. Referring to fig. 12, fig. 12 is a schematic diagram of a system maintenance re-planning provided by an embodiment of the present invention, in the working process of the energy storage system, due to T _k The time maintenance work needs to be maintained for t hours, and the energy storage system adjusts the scheduling strategy according to the known condition, so that the efficient utilization of resources is ensured to the greatest extent.

The Q Learning algorithm (Q-Learning) is taken as a typical offline control strategy in the reinforcement Learning algorithm, is based on Markov theory, adopts a state-action cost function to carry out iterative solution, and effectively solves the sequential decision problem through Learning. When the energy storage scheduling problem is solved, the traditional scheduling method iterates through a greedy strategy, has obvious defects in global optimal scheduling, and is easy to fall into the optimal performance.

The Q-learning algorithm has the following advantages over general mathematical optimization:

1) The Q learning algorithm based on reinforcement learning is a model-free algorithm, and a very accurate model is not needed in the process of solving and optimizing. For example, in the energy storage optimization process, the coupling relation between the modules can cause the solving model to be difficult to build. Therefore, the selection reinforcement learning has a certain versatility.

2) The reward function is crucial in the reinforcement learning strategy, complex operations such as differentiation, matrix inversion and the like caused by the objective function can be avoided, and the calculation time is shortened. In addition, based on the repeated error trial and error thought, the Q learning algorithm can enable the intelligent agent to have certain reasoning learning capacity by continuously exploring the unknown field and utilizing the existing experience, and has good generalization capacity and certain advantages compared with the traditional heuristic algorithm.

Referring to fig. 13, fig. 13 is a diagram of a reinforcement learning model provided in an embodiment of the present invention, in which an agent connects an environment through sensing and actions, takes a current state as an input, selects an action to generate an output, and the selection of the action affects the state of the environment and is fed back to the agent. The selection of the behavior of the agent generates a prize value that is accumulated with the selection of the behavior to achieve system convergence in repeated iterations. Specifically, referring to fig. 14, fig. 14 is a flowchart of an energy storage optimization scheduling algorithm based on Q reinforcement learning, and the adaptive energy storage optimization scheduling method based on Q learning according to the embodiment of the present invention includes the following steps, and the specific flow is described as follows:

s5.31: initializing. And loading various initial information of the energy storage system. For the energy storage planning and scheduling system, the initial information mainly comprises a power time period set, a working state set, an action initialization set, system operation data and basic conditions of the probability of occurrence of the energy storage state, a Q table, basic parameters of a Q learning algorithm and the like.

S5.32: and (5) action selection. Recording the current working state of the energy storage system, and selecting the current working mode (charging, discharging and idle) according to the action probability distribution condition at the current moment. At this stage, when the probabilities of the respective actions are equal, one action may be selected at random, and the next action is selected.

S5.33: and (5) an evaluation stage. In this process, evaluation is required according to the selected action situation, and the reward value of the working state at this time is calculated. For the energy storage system, an energy storage action is selected, the benefit obtained by the action is executed by the evaluation system, and finally the evaluation result is recorded and is used as information input in the subsequent stage.

S5.34: the Q value is calculated. After the evaluation result, the energy storage gain value at the current moment can be used as an element for calculating the Q value. According to greedyThe strategy selects the maximum Q value of the next state from the Q table (the Q values are all 0 in the case of initialization), thereby calculating the Q value in the current state. In the process, in order to ensure the exploratory property of the algorithm, a greedy strategy selection probability needs to be reasonably set, and the energy storage system takes action a at the time t _t The value function of (c) is expressed as follows,

wherein ,Q(s_t ，a _t ) Representing the energy storage system at t moment by s _t State selection action a _t The obtained profit value, Q ^old (s _t ，a _t ) Represents the Q value before update, Q ^new (s _t ，a _t ) The Q value generated by the update is represented; alpha (0 < alpha < 1) represents a learning factor; r(s) _t ，s _t+1 ，a _t ) Representing the system state by s _t Through action a _t Transfer to s _t+1 The state-obtained reward value represents the benefit brought by the state change of the energy storage system, and the attention is paid to the fact that when the SOC of the energy storage system does not meet the constraint requirement, a penalty factor in a charge-discharge state needs to be added Is->μ _c and μ_d The method is characterized in that the profitability of the charging or discharging action is selected in the current state, and the profitability of the charging or discharging is selected at different time; p(s) _t ，s _t+1 ，a _t ) Representing the system state by s _t Through action a _t Transfer to s _t+1 Probability of state; gamma represents the attenuation coefficient gamma e (0, 1); p (P) _k Representing the price of the electric quantity; s is(s) _t Indicating the state (charge, discharge, idle) in which the energy storage system is at the moment.

S5.35: the Q table is updated. And updating the Q value of the current state according to the result of the calculation of the Q value, arranging the actions at each moment point according to the updated Q value, and finally maintaining an optimal action sequence until the whole process is finished, wherein the optimal Q value corresponds to the SN value in the objective function, the SN value can be used as an influence parameter for solving the benefits of the energy storage system, the objective function value is finally calculated, a final action sequence is generated, and the energy storage system executes according to the action sequence.

The goal of Q learning is to adaptively learn and find the optimal strategy according to the iterative update of state-action values (Q (s, a)), solving the action sequence decision problem. As a model-free reinforcement learning algorithm, Q learning is trained through interaction with the environment, and the Q learning consists of four parts of states, actions, strategies and environment and can be expressed as <S，A，L，E>Is a four-tuple of (2). In the Q learning algorithm, the energy storage system is calculated from the current state s _t Select the next action (charge, discharge, idle) a _t To transition to the next state s _t+1 The reward value Q (s, a) of the energy storage system is obtained to obtain an optimal action sequence pi, and the purpose of self-adaptive modulation energy storage strategy is achieved. In addition, the action selection process of each step is based on greedy strategy pi (s _t ) Expansion (the expression is as follows), and selecting the action corresponding to the maximum Q value.

The effect of the self-adaptive optimized energy storage method based on reinforcement learning can be further illustrated by the following comparative simulation experiment.

Simulation conditions

All simulation experiments performed by the embodiment of the invention are based on Core I7-8550 1.8GHz CPU,8GB memory, notebook computers of Windows 7 operating systems and Matlab 2020a coded environments.

(II) simulation Contents

The embodiment of the invention sets three different scenes for carrying out the test, and enterprises do not consider the access of the energy storage system under the first scene; in the second case, enterprises consider the access of energy storage systems with different models, and solve the charging and discharging strategies of the energy storage systems by using a traditional energy storage scheduling algorithm; under the third condition, enterprises consider the access of energy storage systems with different models, and solve the charge and discharge strategies of the energy storage systems by using the energy storage scheduling algorithm based on Q learning according to the embodiment of the invention. The settings of scenario one and scenario two are used to verify the methods presented herein. The setting of the third scenario can be used for verifying the economy of the energy storage model and the effect of the self-adaptive optimized energy storage (RLES) algorithm and the Traditional Energy Storage (TES) algorithm on energy storage scheduling.

The simulation process selects the power data of a large mechanical manufacturing enterprise in the berg as a test example, and the power consumption condition of the enterprise is counted from each time period (0:00-24:00) and two angles per day, as shown in fig. 15. As can be seen from fig. 15, the power consumption load of the enterprise shows obvious peak-valley characteristics in 5 months to 8 months, and the power consumption is full load in a single day, so that the peak-valley difference is large. The parameters of the energy storage product and the peak-valley period price of the electricity consumption in the area are shown in tables 1 and 2.

TABLE 1 energy storage system parameter Table

Index (I)Model number	CN-1	CN-2	CN-3	CN-4	CN-5
						Rated power (kW)	30	30	50	100	200
Battery capacity (kW h)	40	50	100	200	500
						Service life of equipment	For 10 years	For 10 years	For 10 years	For 10 years	For 10 years
Charge and discharge efficiency	90％	90％	90％	90％	90％
						Multiplying power of energy storage system	2	2	2	2	2
Cost of electricity	4 yuan/kW.h	4 yuan/kW.h	4 yuan/kW.h	4 yuan/kW.h	4 yuan/kW.h
						State of charge	(0.1，1)	(0.1，1)	(0.1，1)	(0.1，1)	(0.1，1)
Selling price	20 ten thousand yuan	30 ten thousand yuan	60 ten thousand yuan	66 ten thousand yuan	131 ten thousand yuan

TABLE 2 Peak-to-valley electricity price Meter

Category(s)	Time division	Electricity price (Yuan/kW h)
			Peak	09:00-12:00、19:00-22:00	1.367
Flat section	08:00-09:00、12:00-19:00、22:00-24:00	0.81
			Trough of low grain	00:00-8:00	0.33

The historical data of the enterprises from 2022 month 5 to 2022 month 9 are used as a test data set to evaluate the economy of the enterprises adopting the energy storage system, and the energy storage capacity of the enterprises and the analysis conditions of the energy storage economy of the enterprises under different energy storage configurations are shown in tables 3 to 5.

Table 3 annual energy saving electric charge (unit: yuan) for enterprises after energy storage

	Scene 2	Scene 3
			CN-1	39119.87	58348.17
CN-2	40117.15	54335.8
			CN-3	48249.35	63141.35
CN-4	48249.35	63141.35
			CN-5	48249.35	63141.35

TABLE 4 energy storage economy analysis results based on conventional TES algorithm

TABLE 5 energy storage economy analysis results based on RLES algorithm of an embodiment of the invention

According to the simulation results, after the enterprise selects to install the energy storage system, the electricity cost of the enterprise is obviously reduced, and the cost saved by the enterprise tends to be stable after the capacity of the energy storage battery is increased due to the self capacity limit of the enterprise. The enterprises can recover the input cost when adding the energy storage products of three types of CN-1, CN-2 and CN-3. The return on investment of the three under the TES algorithm is above 20%, and the return on investment of the CN-1 and the CN-2 is above 70%. And the return on investment of the three under the RLES algorithm is over 50 percent, and the return on investment of the CN-1 and the CN-2 exceeds 100 percent. And by combining comparison analysis, after the energy storage product of the CN-1 model is additionally arranged, the income of enterprises can be maximized. In addition, as can be seen from further analysis of simulation results, when the energy storage capacity is changed under the condition of respectively considering different configurations (energy storage power and energy storage capacity), the economic benefit of the enterprise is not obviously changed. And when the energy storage power is changed, the economic benefit of the enterprise is obviously changed. Further analysis shows that the energy storage investment and benefit change curve is in a trend of descending and ascending along with the change of the energy storage capacity configuration, and the energy storage benefit trend is opposite. Therefore, when the energy storage system is additionally arranged, the enterprise needs to consider proper parameter configuration, and the maximization of the income is ensured.

Then, the performance of solving the energy storage optimization scheduling problem by the RLES algorithm provided by the embodiment of the invention is verified (the parameter setting is shown in table 6), and the embodiment verifies the algorithm from two aspects of index performance (index such as battery utilization rate and annual income) and scheduling strategy in the energy storage scheduling process.

Table 6Q learning algorithm initialization parameter values

Parameters (parameters)	Learning factor alpha	Discount factor gamma	Greedy policy pi
				Value taking	0.1	0.9	0.9

Table 7 algorithm energy storage scheduling performance analysis

Performance index	Q learning algorithm	Traditional algorithm
			Battery utilization	66.1％	57.1％
Annual income	4.1 ten thousand yuan	2.18 ten thousand yuan

The results in Table 7 show that: under the same condition, the energy storage optimization based on the RLES algorithm and the performance index analysis of the traditional energy storage optimization algorithm show that the RLES algorithm has more advantages in performance, the battery utilization rate of the system is remarkably improved, and higher annual comprehensive income can be obtained.

In terms of verification of algorithm scheduling policy, a certain working day of an enterprise is selected as an example, please refer to fig. 16, 17 and 18, fig. 16 is an enterprise power load comparison chart of the method and the conventional TES algorithm according to the embodiment of the present invention; FIG. 17 is a scheduling policy performance analysis of the method of an embodiment of the present invention and a conventional TES algorithm; FIG. 18 is a graph of convergence speed of a method of an embodiment of the invention and a conventional TES algorithm. As can be seen from FIG. 16, after the enterprises are additionally provided with energy storage, both algorithms can realize peak filling and valley filling, and the power load of the enterprises is reduced. As can be seen from FIG. 17, the optimized scheduling of TES algorithm and RLES algorithm can realize the peak filling of the enterprise energy storage pin. At 0:00-8: during the period 00, the electricity price is lower, the energy storage system starts to charge the battery, and when the battery reaches a charge constraint value, the SOC stops charging and keeps an idle state. At 9:00-12: during the period 00 and 19:00-21:00, electricity prices rise, and the energy storage system begins to discharge. During the period 22:00-24:00, the battery charge of the energy storage system reaches a constraint value, and the idle state is maintained. In the energy storage strategy optimization process, the battery charge-discharge action strategy of the RLES algorithm and the battery charge-discharge action strategy of the TES algorithm are respectively in 2 points, 3 points, 6-8 points, 11 points, 13-15 points, 17 points and 20-21 points with deviation in action sequences. Further analysis shows that the addition of the Q learning algorithm improves the actual benefits of the energy storage system, optimizes the scheduling strategy and verifies the scheduling performance of the RLES algorithm. Fig. 18 shows an algorithm convergence curve comparison of the RLES algorithm proposed in the embodiment of the present invention and the conventional TES algorithm when solving the energy storage scheduling problem. As shown in the figure, the TES algorithm starts to converge at 68 iterations, while the RLES algorithm starts to converge at 13 iterations, and the algorithm provided by the invention can quickly obtain the optimal solution and has better algorithm convergence performance. The experimental results show that: the method provided by the embodiment of the invention can effectively solve the energy storage optimization scheduling problem, and the method is obviously better than an algorithm for comparison in the aspects of energy storage system economy and energy storage action strategy. In the energy storage scheduling process, the improvement of energy storage scheduling efficiency is realized by the quote of the Q learning algorithm, and the total investment of electric quantity of enterprises is reduced.

In summary, the self-adaptive optimization energy storage method based on reinforcement learning starts from two aspects of energy storage economy and optimization strategy, considers a plurality of actual factors such as power, floating electricity price, user demand, peak-valley constraint and the like, provides a double-layer planning model which aims at energy storage return on investment and energy storage income, outputs an action sequence of a scheduling layer of an energy storage system as an input of the planning layer in view of environmental change, emergency and the like, and feeds back the action sequence to the scheduling layer through system economy evaluation of the planning layer, so that iteration is repeated, and the self-adaptive learning process of the energy storage system is realized. The invention uses the reinforcement learning method to drive the algorithm to evolve, generates the energy storage scheduling sequence and calculates the profit value, the algorithm does not need priori knowledge, can realize the system dynamic learning evolution by self-adaptive parameter adjustment, is globally superior to the traditional scheduling algorithm, overcomes the defect of insufficient generality of the traditional algorithm, reduces the dependence of the traditional algorithm on scenes, can be applicable to different scenes, and finally obtains the optimized scheduling strategy.

In addition, the invention introduces action sequence re-planning triggering conditions, which are divided into three types, namely: action candidate set change, burst state, system maintenance. Therefore, the energy storage system can be adjusted in real time in response to a complex environment, and under the condition of optimal scheduling of the battery energy storage system at the user side in the dynamic environment, an efficient energy storage scheduling algorithm is researched, so that efficient solution of a dynamic and complex energy storage scheduling problem is ensured, and self-adaptive real-time dynamic energy storage scheduling is realized.

The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims

1. An adaptive optimized energy storage method based on reinforcement learning is characterized by comprising the following steps:

2. The reinforcement learning based adaptive optimal energy storage method of claim 1, wherein the demand charge model is expressed in terms of a maximum investment return rate max (E/C), where C represents the investment cost of a user installing the energy storage system and E represents the energy storage system return.

3. The reinforcement learning-based adaptive optimal energy storage method of claim 1, wherein S2 comprises:

energy storage load constraint:

Energy storage multiplying power constraint: e (E) _max ＝β*P _max

wherein ,δ_max Represents the maximum discharge power ρ _max Represents maximum charging power, P _t Represents the energy storage load at time t, S _t Representing the state of charge of the battery at time t, L _min,t 、L _max,t Represents the minimum value and the maximum value of the load of the energy storage system at the moment t, E _max Represents the capacity of the energy storage system, beta represents the charge-discharge multiplying power of energy storage, and P _max Representing the rated power of the energy storage system;

4. The reinforcement learning-based adaptive optimal energy storage method of claim 3, wherein S2.2 comprises:

5. The reinforcement learning-based adaptive optimal energy storage method of claim 3, wherein S3 comprises:

6. The self-adaptive optimized energy storage method based on reinforcement learning according to claim 5, wherein in the power consumption peak section, the discharge probability > the idle probability > the charge probability; in the electricity utilization valley section, the charging probability > the idle probability > the discharging probability; in the active level segment, the idle probability > the charge probability > the discharge probability.

7. The reinforcement learning-based adaptive optimal energy storage method of claim 5, wherein S5 comprises:

8. The reinforcement learning-based adaptive optimal energy storage method of claim 7, wherein the objective function is expressed as:

wherein ,α ₁ + ₂ ＝1，/>indicating that if the energy storage system k is in a charged state during the t period +.>Otherwise-> Indicating that if the energy storage system k is in a discharge state within the t period of time,/is >Otherwise-> Representing the expenditure of the energy storage system k to change from an idle state to a charged state in a t period; />Representing the benefit of the energy storage system k from an idle state to a discharge state in a t time period; c (C) _k Representing the investment cost, T, of the energy storage system k _k Indicating the period of time that the energy storage system k needs to schedule a charging and discharging action.

9. The reinforcement learning-based adaptive optimal energy storage method of claim 7, wherein S5.3 comprises: