CN116780506A - Household micro-grid energy management method, device, equipment and storage medium - Google Patents
Household micro-grid energy management method, device, equipment and storage medium
- Publication number
- CN116780506A CN116780506A CN202310588303.9A CN202310588303A CN116780506A CN 116780506 A CN116780506 A CN 116780506A CN 202310588303 A CN202310588303 A CN 202310588303A CN 116780506 A CN116780506 A CN 116780506A
- Authority
- CN
- China
- Prior art keywords
- data
- grid
- time
- power
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000007726 management method Methods 0.000 title claims abstract description 59
- 230000009471 action Effects 0.000 claims abstract description 56
- 230000005611 electricity Effects 0.000 claims abstract description 53
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 50
- 238000013528 artificial neural network Methods 0.000 claims abstract description 46
- 238000000034 method Methods 0.000 claims description 51
- 230000006870 function Effects 0.000 claims description 27
- 238000012549 training Methods 0.000 claims description 27
- 239000003795 chemical substances by application Substances 0.000 claims description 26
- 238000010248 power generation Methods 0.000 claims description 21
- 230000000694 effects Effects 0.000 claims description 16
- 238000005070 sampling Methods 0.000 claims description 16
- 238000004590 computer program Methods 0.000 claims description 11
- 230000008569 process Effects 0.000 claims description 10
- 238000009826 distribution Methods 0.000 claims description 9
- 239000013598 vector Substances 0.000 claims description 9
- 238000012360 testing method Methods 0.000 claims description 7
- 238000006243 chemical reaction Methods 0.000 claims description 5
- 238000011156 evaluation Methods 0.000 claims description 5
- 230000007613 environmental effect Effects 0.000 claims description 4
- 238000012952 Resampling Methods 0.000 claims description 3
- 238000000137 annealing Methods 0.000 claims description 3
- 230000003993 interaction Effects 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 238000011426 transformation method Methods 0.000 claims description 3
- 230000005612 types of electricity Effects 0.000 claims description 3
- 230000002787 reinforcement Effects 0.000 description 19
- 238000010586 diagram Methods 0.000 description 10
- 238000005457 optimization Methods 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 5
- 230000008901 benefit Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 238000002790 cross-validation Methods 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000593 degrading effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000004836 empirical method Methods 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/003—Load forecast, e.g. methods or systems for forecasting future load demand
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/008—Circuit arrangements for ac mains or ac distribution networks involving trading of energy or energy transmission rights
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2310/00—The network for supplying or distributing electric power characterised by its spatial reach or by the load
- H02J2310/10—The network having a local or delimited stationary reach
- H02J2310/12—The local stationary network supplying a household or a building
- H02J2310/14—The load or loads being home appliances
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Finance (AREA)
- Development Economics (AREA)
- General Health & Medical Sciences (AREA)
- Power Engineering (AREA)
- Accounting & Taxation (AREA)
- Marketing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Entrepreneurship & Innovation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- General Business, Economics & Management (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Human Resources & Organizations (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- Primary Health Care (AREA)
- Game Theory and Decision Science (AREA)
- Tourism & Hospitality (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
The invention provides a household micro-grid energy management method, which comprises the following steps: acquiring real-time state data of the household micro-grid; obtaining, from the acquired real-time state data, prediction data for the next time period with a trained neural network prediction model, the prediction data comprising the power generation and electricity price for the next time period, where the trained prediction model is based on the Transformer algorithm; and forming a state input from the real-time state data and the prediction data, feeding it into an energy management strategy model based on an improved multi-agent soft actor-critic (MAISAC) algorithm, solving for the optimal action of each type of electric load in the household micro-grid, and controlling each electric load with its corresponding optimal action. The invention also discloses a corresponding system, computer equipment and a storage medium. Implementing the invention improves the safety and stability of household micro-grid energy management.
Description
Technical Field
The invention relates to the technical field of grid-connected household micro-grids with photovoltaic power generation, and in particular to a household micro-grid energy management method, device, equipment and storage medium.
Background
With the power grid's rapid transition toward intelligent and green operation, household micro-grids based mainly on renewable sources such as wind and solar energy are being widely deployed in the smart grid, effectively improving the grid's economy and flexibility; they represent the main direction of future household micro-grid development. However, the intermittency and uncertainty inherent to renewable energy, the large number of household appliances, and variable electricity demand pose great challenges for household micro-grid energy management. Moreover, with the development of power electronics, the number of household appliances and their power demand keep growing, further increasing the difficulty of energy management.
For the energy management problem of the home micro-grid, researchers have proposed a large number of methods, which fall roughly into three categories: optimization-based, heuristic-based, and reinforcement-learning-based. Optimization-based methods, such as linear programming and dynamic programming, incur a high computational cost for complex systems like a home micro-grid; heuristic-based methods cannot reach optimal control because of environmental uncertainty. Optimization-based and heuristic-based methods therefore struggle to achieve real-time performance and optimality simultaneously.
In recent years, with the rapid development of computing, energy management methods based on reinforcement learning (RL) and deep reinforcement learning (DRL), which can trade off optimality against real-time performance, have become a new research hotspot. Deep Q-network (DQN) and other deep reinforcement learning algorithms have already achieved some results in the field of home micro-grid energy management.
However, existing DRL-based approaches mainly rely on the conventional deep Q-network, whose overestimation of Q values degrades control performance. Furthermore, since the deep Q-network's actions are discrete, a continuous quantity such as power must be discretized for control, inevitably introducing discretization error. Meanwhile, in traditional deep reinforcement learning, samples are drawn uniformly from an experience buffer, so sample utilization is low, training takes long, and effectiveness is limited. Finally, many current algorithms do not effectively suppress the uncertainty introduced by renewable generation, which severely restricts the control performance of deep reinforcement learning. How to design an energy management method that meets practical application requirements and achieves excellent control is therefore an urgent problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a household micro-grid energy management method, device, equipment and storage medium, which can further reduce electricity consumption and improve the operational safety and stability of the household grid while meeting the household's daily power supply needs.
The technical scheme adopted by the invention is that the energy management method of the household micro-grid is applied to the grid-connected household micro-grid comprising photovoltaic power generation, and at least comprises the following steps:
step S10, acquiring real-time state data of the household micro-grid;
step S11, obtaining, from the acquired real-time state data, prediction data for the next time period with a trained neural network prediction model, the prediction data comprising the power generation and electricity price data for the next time period; the trained neural network prediction model is based on the Transformer (self-attention-based deep learning) algorithm;
and step S12, forming a state input from the real-time state data and the prediction data, feeding it into an energy management strategy model based on an improved multi-agent soft actor-critic (MAISAC) algorithm, solving for the optimal action of each type of electric load in the household micro-grid, and controlling each electric load with its corresponding optimal action.
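The state-assembly part of steps S10 to S12 can be sketched as follows; this is a minimal illustration, and the function name, argument names, and sample values are assumptions for the sketch, not taken from the patent:

```python
import numpy as np

def form_state_input(price_t, price_pred, pv_t, pv_pred, demand_t):
    """Assemble the state input for step S12: the current electricity
    price, the predicted next-period price, the current and predicted
    photovoltaic output, and the total household demand power."""
    return np.array([price_t, price_pred, pv_t, pv_pred, demand_t],
                    dtype=np.float64)

# Illustrative values (kW and price units are placeholders)
s_t = form_state_input(price_t=0.52, price_pred=0.48,
                       pv_t=3.1, pv_pred=2.7, demand_t=4.5)
print(s_t.shape)  # (5,)
```

The resulting vector would then be fed to the per-load agents of the MAISAC strategy model.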
Preferably, the method further comprises the step of pre-training and obtaining the trained neural network prediction model, comprising:
selecting photovoltaic power generation capacity and local electricity price data of one year in a local grid-connected household micro grid containing photovoltaic power generation;
performing equidistant resampling of the raw electricity price and generation sequences by resampling and interpolation, then denoising with a wavelet transform;
dividing the data into a training set and a test set at a fixed ratio, where the training set is used to train the Transformer-based neural network prediction model and the test set is used to verify its effectiveness; input factors are also determined on the training set;
and predicting the future electricity price and generation sequences with the neural network prediction model, computing errors against the ground-truth labels, evaluating the predictions with the root mean square error and mean absolute error, and obtaining the trained neural network prediction model once the evaluation values converge.
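The equidistant-resampling step in the claim above can be sketched with linear interpolation; the function name and the sample series are illustrative, and the subsequent wavelet denoising (e.g. via a wavelet library) is omitted here:

```python
import numpy as np

def resample_equidistant(timestamps, values, step):
    """Resample an irregularly sampled series (e.g. electricity price or
    PV generation) onto an equidistant time grid by linear interpolation,
    a stand-in for the resampling-and-interpolation step."""
    grid = np.arange(timestamps[0], timestamps[-1] + 1e-9, step)
    return grid, np.interp(grid, timestamps, values)

t = np.array([0.0, 1.0, 3.0, 4.0])     # irregular sampling times (hours)
v = np.array([10.0, 12.0, 9.0, 11.0])  # illustrative PV output values
grid, v_eq = resample_equidistant(t, v, step=1.0)
print(grid)  # [0. 1. 2. 3. 4.]
print(v_eq[2])  # t=2 interpolated between (1, 12) and (3, 9): 10.5
```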
Preferably, the step S12 further includes:
establishing a MAISAC-based energy management policy model, comprising:
establishing a state vector $s_t$:

$$s_t = \left(\lambda_t,\ \hat{\lambda}_{t+1},\ P_t^{PV},\ \hat{P}_{t+1}^{PV},\ P_t^{dem}\right)$$

where $\lambda_t$ is the electricity price at time $t$, $\hat{\lambda}_{t+1}$ the predicted electricity price at time $t+1$, $P_t^{PV}$ the solar generation at time $t$, $\hat{P}_{t+1}^{PV}$ the predicted solar generation at time $t+1$, and $P_t^{dem}$ the total demand power of the household micro-grid at time $t$;
the following Action vector actions are established:
wherein, for the uninterruptible power equipment, the action of the uninterruptible power equipment is only one opening action; for time-variable devices, action T thereof delay For a deferred time; for a power variable device, the action is a continuous action between the minimum value and the maximum value of the required power, wherein and />Representing minimum and maximum required power of the power variable device; />Maximum power which can be provided for the electric automobile;
the following reward function is established:
wherein the first term on the right side of the equation is a limit on electricity price, P dem Is the total household demand power at time t, P PV The total solar energy generating capacity at the moment t; the second term is the gap between the power adjusted by the power adjustable device and the desired power,for the desired power, +.>Is the allocated power; the third item is the difference between the time after the time adjustment of the time-adjustable device and the time of the original electricity,/>Is the adjusted time, < >>Is the original time; the fourth item is the difference between the adjusted power and the expected power of the electric car, +.>Desired power for electric vehicle, < >>Is the allocated power; alpha, beta, gamma and sigma are the pre-coefficients of the three terms respectively.
Preferably, the step S12 further includes:
the motion entropy value is introduced into the objective function as follows:
wherein ,π* Is an optimal policy function, s t Is a state at time t, a t Is an action at time t, τ π Is the trajectory distribution under policy pi, r is the reward, gamma is the discount factor,is the entropy under the current strategy, and alpha is the entropy regularization coefficient.
Preferably, the step S12 further includes:
in the solving process, at each time step the environmental state $s_t$ is taken as input, and the agent randomly selects an action $a_t$ from the defined action set;

the selected action is executed, the environment feeds a reward back to the agent, and the state immediately transitions from $s_t$ to $s_{t+1}$, generating the environment-agent interaction tuple $(s_t, a_t, s_{t+1}, r)$, which is stored in the experience replay buffer $D$;
and the agent continually adjusts its output actions to maximize the cumulative reward, repeating the process until the reward function converges, thereby obtaining the optimal action for each type of electricity load in the household micro-grid.
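The collect-and-store loop described above can be sketched as follows; the toy environment and the action set are placeholders, since the sketch only illustrates how $(s_t, a_t, s_{t+1}, r)$ tuples accumulate in the buffer $D$:

```python
import random
from collections import deque

class ToyEnv:
    """Stand-in environment: the state is a counter and the reward is
    random. Only the interaction pattern matters here."""
    def __init__(self):
        self.s = 0
    def step(self, a):
        self.s += 1
        return self.s, random.random()   # next state, reward

buffer = deque(maxlen=10_000)            # experience replay buffer D
env = ToyEnv()
s = env.s
for _ in range(100):                     # one iteration per time step
    a = random.choice([0, 1, 2])         # agent selects an action
    s_next, r = env.step(a)              # environment feeds back a reward
    buffer.append((s, a, s_next, r))     # store the transition in D
    s = s_next

batch = random.sample(list(buffer), 8)   # minibatch for a training update
print(len(buffer), len(batch))  # 100 8
```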
Preferably, the step S12 further includes:
a prioritized experience replay method is adopted to select minibatches of experience samples from the experience replay buffer to continually train the agent.
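Prioritized sampling can be sketched as drawing each transition with probability proportional to a priority; the patent text only names prioritized experience replay, so the priority definition (typically the absolute TD error) and the exponent `omega` are assumptions:

```python
import random

def sample_prioritized(buffer, priorities, batch_size, omega=0.6):
    """Draw minibatch indices with probability proportional to
    priorities[i] ** omega; higher-priority (larger-error) transitions
    are replayed more often."""
    weights = [p ** omega for p in priorities]
    return random.choices(range(len(buffer)), weights=weights, k=batch_size)

buffer = [("s%d" % i, 0, "s%d" % (i + 1), 0.0) for i in range(5)]
priorities = [0.01, 0.01, 0.01, 0.01, 5.0]   # last transition: large TD error
idx = sample_prioritized(buffer, priorities, batch_size=3)
print(len(idx))  # 3
```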
Preferably, the method further comprises determining the minibatch sampling range in a manner that emphasizes recent experience:
let $K$ be the total number of minibatch updates; for the $k$-th update, $1 \le k \le K$, the number of samples drawn from the most recent experience, $c_k$, is defined as:

$$c_k = \max\left\{N_{buffer} \cdot \eta^{k},\ c_{min}\right\} \tag{5}$$
where $\eta \in (0, 1]$ is a hyperparameter determining the importance of recent experience, and $c_{min}$ is the minimum allowed value of $c_k$;
an annealing method is then applied to automatically adjust the sampling frequency of recent experience:

$$\eta_e = \eta_0 + (\eta_T - \eta_0)\cdot \frac{e}{E}$$

where $\eta_0$ and $\eta_T$ are the initial and final values of $\eta$, $E$ is the total number of training episodes, and $e$ is the current episode.
Correspondingly, in another aspect of the present invention, there is also provided a home micro-grid energy management system, applied to a grid-connected home micro-grid including photovoltaic power generation, which at least includes:
the real-time state data acquisition unit is used for acquiring the real-time state data of the household micro-grid;
the data prediction unit is used for obtaining prediction data of the next time period by adopting a trained neural network prediction model according to the obtained real-time state data, wherein the prediction data comprises power generation capacity and electricity price data of the next time period; the trained neural network prediction model is a neural network prediction model based on a transducer algorithm;
and the energy management unit, for forming a state input from the real-time state data and the prediction data, feeding it into an energy management strategy model based on the MAISAC algorithm, solving for the optimal action of each type of electric load in the household micro-grid, and controlling each electric load with its corresponding optimal action.
Accordingly, in a further aspect of the present invention, there is also provided a computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor implements a method as described above when executing said computer program.
Accordingly, in yet another aspect of the present invention, there is also provided a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the method as described above.
The embodiment of the invention has the following beneficial effects:
the invention provides a household micro-grid energy management method, device, equipment and storage medium. Household micro grid energy is managed by a Transformer algorithm and a modified MAISAC algorithm. Performing time sequence prediction on the generated energy and electricity price of the renewable energy by using a transducer so as to overcome the uncertainty of renewable energy power generation; then, the predicted real-time electricity price and renewable energy generating capacity are transmitted to a MAISAC algorithm, deep reinforcement learning is assisted to find an optimal energy management strategy, and electricity requirements among different electricity loads are reasonably distributed, so that the optimality and economy of household electricity are realized;
according to the invention, electricity price and renewable energy generating capacity can be predicted more accurately, and uncertainty caused by accessing renewable energy is avoided;
in the invention, by introducing the MAISAC algorithm, actions can be continuous variables, avoiding discretization error, and the introduction of entropy strengthens the algorithm's exploration capability;
in the invention, introducing prioritized experience replay and the emphasizing-recent-experience method into the MAISAC algorithm achieves faster convergence and a better optimization effect.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a main flow chart of one embodiment of a home micro grid energy management method provided by the present invention;
fig. 2 is a schematic diagram of a home micro-grid architecture according to the present invention;
FIG. 3 is a schematic diagram of the principle framework of the Transformer algorithm according to the present invention;
FIG. 4 is a schematic diagram of an energy management strategy model based on MAISAC algorithm in accordance with the present invention;
fig. 5 is a main flow chart of one embodiment of a home micro grid energy management system provided by the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic diagram of a main flow of an embodiment of a home micro grid energy management method according to the present invention. As shown in fig. 2 to 4, in this embodiment, the method is applied to a grid-connected type home micro-grid including photovoltaic power generation, and at least includes the following steps:
step S10, acquiring real-time state data of the household micro-grid;
step S11, obtaining, from the acquired real-time state data, prediction data for the next time period with a trained neural network prediction model, the prediction data comprising the power generation and electricity price data for the next time period; the trained neural network prediction model is based on the Transformer algorithm;
and step S12, forming a state input from the real-time state data and the prediction data, feeding it into an energy management strategy model based on the MAISAC algorithm, solving for the optimal action of each type of electric load in the household micro-grid, and controlling each electric load with its corresponding optimal action.
For a better understanding of the method of the present invention, the following description of each of the above steps will be given in detail with reference to fig. 2 to 4.
In a specific embodiment, the method is applied to a grid-connected household micro-grid with photovoltaic power generation as shown in fig. 2. The household micro-grid's electric energy is mainly supplied by photovoltaics. When the photovoltaic supply cannot meet the household's demand, the electric vehicle can feed power back to the household, or grid power can be used; when the photovoltaic supply exceeds demand, the household can sell the surplus to the grid at market price.
Household electricity loads are divided into four types according to their usage characteristics, and the energy management strategy assigns each appliance type a corresponding agent for consumption control. Meanwhile, a neural network predicts the photovoltaic array's output for the next hour, which is fed as part of the state into the deep-reinforcement-learning-based energy management strategy to reduce the uncertainty introduced by renewables. In addition, since the scheme aims to draw grid power when the electricity price is low, the neural network must also predict the price trend for the next period in advance.
In the invention, a trained neural network prediction model is adopted to obtain the prediction data (electricity price and generation) for the next time period. The trained neural network prediction model is based on the Transformer algorithm.
As shown in fig. 3, the Transformer is a neural network architecture based on a multi-head attention mechanism, which learns long-term dependencies better than RNNs and is easier to parallelize. After passing through the embedding layer and adding positional encoding vectors, the input data carries the position information of each element; the encoding is defined as:

$$PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d}\right)$$

where $pos$ is the position of the current data point in the sample, $d$ is the dimension of the positional encoding vector, and $i$ is the index of each value: even positions use the sine encoding and odd positions the cosine encoding. The resulting data matrices are passed into the encoder, where the multi-head attention mechanism multiplies them by different weight matrices to obtain $Q$, $K$ and $V$ and computes the similarities between data points. Residual connections prevent the multi-layer network from degrading during training, the data is passed through a feed-forward network, and the result of this parallel computation is fed to the next encoder. After $N$ encoding operations, the encoded information matrix is passed to the decoder, which predicts the next data point from the data predicted so far. The Transformer's output sequence is compared with the label sequence of the training sample, and the Kullback-Leibler (KL) divergence loss is minimized to obtain the optimal network parameters. Let $P(X)$ be the label sequence and $Q(X)$ the Transformer's predicted output sequence. The KL divergence is given by formula (3).
D_KL(P || Q) = Σ_i p(x_i) · log( p(x_i) / q(x_i) )        (3)

where q(x_i) is the i-th load value in the Transformer's predicted output sequence and p(x_i) is the label load value at the corresponding time. D_KL(P||Q) is the KL divergence between P(X) and Q(X); the smaller it is, the closer the two data distributions are, so iteratively training the neural network drives Q(X) toward the distribution of P(X).
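The positional encoding and the KL divergence loss described above can be sketched in a few lines of plain Python (function names are illustrative, not from the patent):

```python
import math

def positional_encoding(pos, d):
    # Sinusoidal positional encoding for one position, standard Transformer form:
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
    pe = []
    for j in range(d):
        i = j // 2
        angle = pos / (10000 ** (2 * i / d))
        pe.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
    return pe

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_i p(x_i) * log(p(x_i) / q(x_i)); zero iff the distributions match
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

In practice the KL loss would be computed over softmax outputs of the network; this sketch only illustrates the two formulas.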
It will be appreciated that the method of the present invention requires a step of pre-training to obtain the trained neural network prediction model, which specifically includes:
and a data collection step: selecting photovoltaic power generation capacity and local electricity price data of one year in a local grid-connected household micro grid containing photovoltaic power generation;
a data preprocessing step: resampling and interpolating the raw electricity-price and generation sequences onto a uniform time grid, then denoising them with a wavelet transform;
factor selection and model training: dividing the data into a training set and a test set at a fixed ratio (e.g. 8:2); the training set is used to train the Transformer-based neural network prediction model and the test set to verify its performance. Input factors are also determined on the training set; in a specific example, grid search with ten-fold cross-validation can be used on the training set to select suitable input factors for the model;
modeling effect evaluation: predicting the future electricity-price and generation sequences with the neural network prediction model, computing the errors against the real labels, and evaluating the prediction result by the root mean square error (RMSE) and mean absolute error (MAE); once the evaluation values converge, the trained neural network prediction model is obtained.
The calculation formulas of RMSE and MAE are:

RMSE = sqrt( (1/N_test) · Σ_i ( y_i - ŷ_i )^2 ),   MAE = (1/N_test) · Σ_i | y_i - ŷ_i |

where N_test is the total number of data points, y_i is the predicted value and ŷ_i is the label value.
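The two evaluation metrics above can be computed as follows (a minimal sketch; function names are ours):

```python
import math

def rmse(y_pred, y_true):
    # RMSE = sqrt( (1/N) * sum_i (y_i - yhat_i)^2 )
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)

def mae(y_pred, y_true):
    # MAE = (1/N) * sum_i |y_i - yhat_i|
    n = len(y_true)
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n
```

RMSE penalizes large errors more heavily than MAE, which is why both are reported.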
After the prediction data is obtained, the invention performs energy management with an energy management strategy model based on the MAISAC algorithm. The MAISAC algorithm involves a deep reinforcement learning strategy, which generally consists of three elements: the environment, the agents, and the reward function. In the household micro-grid energy management problem, the environment is formed jointly by the household appliances. According to the scheme, the household electricity load is divided into the following four types by electricity demand: 1) uninterruptible devices, such as refrigerators; 2) power-adjustable devices, such as air conditioners and heating systems; 3) time-shiftable devices, i.e. devices whose operating time can be rescheduled, such as washing machines and dishwashers; 4) electric vehicles, which serve as household batteries for storing electric energy.
The agent is the entity to be trained to execute the relevant control. The reward function must be designed for the specific problem: during exploration, the agent continually improves its policy according to the rewards returned by the environment, and the individual terms and coefficients of the reward function must be tuned continually according to the actual training behavior. In addition, the agent's state observations and the type and dimension of its output actions must be set appropriately, as they strongly affect the learning and training performance.
FIG. 4 is a schematic diagram of the MAISAC-based energy management strategy model of the present invention. In this model, the environment and the agent interact as follows: at each time step, given the environment state s_t, the agent selects an action from the defined action set and outputs it to the environment; the environment's state then transitions from s_t to s_{t+1}, and the reward corresponding to the action is fed back to the agent according to the designed reward function. The agent continually adjusts its action selection according to the principle of maximizing cumulative reward, and this process repeats until the reward function converges.
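The interaction loop just described can be sketched generically; the toy environment and random agent below are purely illustrative stand-ins, not the patent's micro-grid model:

```python
import random

class ToyEnv:
    # Minimal stand-in environment: 3 steps, reward 1 per step
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, a):
        self.t += 1
        return float(self.t), 1.0, self.t >= 3   # s_{t+1}, reward, done

class RandomAgent:
    def select_action(self, s):
        return random.choice([0, 1])             # stochastic action choice

def run_episode(env, agent, buffer, max_steps=96):
    # Observe s_t, pick a_t, receive r and s_{t+1}, store the transition
    s = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = agent.select_action(s)
        s_next, r, done = env.step(a)            # state transitions s_t -> s_{t+1}
        buffer.append((s, a, r, s_next))         # experience replay storage
        total_reward += r
        s = s_next
        if done:
            break
    return total_reward
```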
It will be appreciated that the step S12 further comprises:
(I) First, the MAISAC-based energy management strategy model is built; determining its key elements includes:
A. establishing a state vector s t :
s_t = ( λ_t, λ'_{t+1}, P_t^PV, P'_{t+1}^PV, P_t^dem )

where the state s_t reflects the current situation of the household micro-grid system: λ_t is the electricity price at time t, λ'_{t+1} is the predicted electricity price at time t+1, P_t^PV is the solar generation at time t, P'_{t+1}^PV is the predicted solar generation at time t+1, and P_t^dem is the total demand power of the household micro-grid at time t;
it will be appreciated that, in order to facilitate the neural network fitting, reduce the calculation amount and increase the training speed, in this embodiment, the state is normalized, that is, each component in the state is reduced to be between [ -1,1], and the arithmetic mean (x) and standard deviation std (x) are calculated according to the standard normalization general formula, that is, as follows:
wherein mean (x) and std (x) represent an arithmetic mean and standard deviation of input state data, respectively. The arithmetic mean (x) and standard deviation std (x) are calculated as follows:
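A minimal sketch of this standardization step (names ours):

```python
import math

def standardize(xs):
    # Z-score normalization x' = (x - mean(x)) / std(x), used here to keep
    # the state components in a comparable range for the neural network
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]
```

The result has zero mean and unit standard deviation; components typically, though not strictly, fall near [-1, 1].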
B. the following Action vector actions are established:
the actions represent power dispatching conditions of each type of household appliances and electric automobiles, and different actions are set according to different household appliance power utilization types.
For uninterruptible devices, such as refrigerators, normal daily operation must be guaranteed, so their only action is to remain switched on;
For time-shiftable devices, the agent can reschedule operation from peak periods to off-peak or low-price periods to reduce operating cost and relieve grid pressure. Because the scheme re-allocates power every fifteen minutes, the action T_delay is the deferral time, taking values in the range [0, 96];
For power-adjustable devices, flexibly adjusting the charging power achieves stable operation while reducing cost, so the action is continuous between the minimum and maximum required power, where P_min^dem and P_max^dem denote the minimum and maximum required power of the device;
For the electric vehicle, which can feed power back to the household when household electricity is insufficient, P_max^EV is the maximum power the vehicle can supply;
C. the following reward function is established:
r_t = - α λ_t ( P_t^dem - P_t^PV ) - β | P^exp - P^alloc | - γ | t^adj - t^orig | - σ | P_EV^exp - P_EV^alloc |

where the first term on the right penalizes the electricity cost, with P_t^dem the total household demand power at time t and P_t^PV the total solar generation at time t; the second term is the gap between the power allocated to the power-adjustable devices and their desired power, with P^exp the desired power and P^alloc the allocated power; the third term is the difference between the adjusted operating time of the time-shiftable devices and the original time, with t^adj the adjusted time and t^orig the original time; the fourth term is the gap between the allocated and desired power of the electric vehicle, with P_EV^exp the desired power and P_EV^alloc the allocated power; α, β, γ and σ are the coefficients of the four terms.
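The four-term reward can be sketched as below. The exact functional form (absolute-value penalties) and the coefficient values are assumptions for illustration, not the patent's specification:

```python
def reward(price, p_dem, p_pv,
           p_exp, p_alloc,        # power-adjustable device: desired vs allocated power
           t_orig, t_adj,         # time-shiftable device: original vs shifted time
           p_ev_exp, p_ev_alloc,  # electric vehicle: desired vs allocated power
           alpha=1.0, beta=0.5, gamma=0.1, sigma=0.5):
    # Four penalty terms as described in the text, returned as a negative reward
    cost = alpha * price * (p_dem - p_pv)          # electricity-purchase cost term
    power_gap = beta * abs(p_exp - p_alloc)        # power-adjustable comfort gap
    time_gap = gamma * abs(t_adj - t_orig)         # time-shift inconvenience
    ev_gap = sigma * abs(p_ev_exp - p_ev_alloc)    # EV charging gap
    return -(cost + power_gap + time_gap + ev_gap)
```

When solar generation covers demand and all allocations match the desired values, the penalty vanishes and the reward is zero.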
(II) The step S12 further comprises formulating the update strategy of the MAISAC algorithm:
Unlike conventional deep reinforcement learning algorithms, SAC maximizes entropy alongside the cumulative reward. Entropy measures the uncertainty of a stochastic policy; in RL, the randomness of the policy grows with its entropy, which makes the SAC algorithm more exploratory. The action-entropy term is therefore introduced into the objective function as follows:
π* = argmax_π E_{(s_t, a_t) ~ τ_π} [ Σ_t γ^t ( r(s_t, a_t) + α H( π(·|s_t) ) ) ]

where π* is the optimal policy function, s_t is the state at time t, a_t is the action at time t, τ_π is the trajectory distribution under policy π, r is the reward, γ is the discount factor, H(π(·|s_t)) is the entropy under the current policy, and α is the entropy regularization coefficient.
For a fixed policy, the soft Q function can be computed iteratively by applying the modified Bellman backup operator:

T^π Q(s_t, a_t) = r(s_t, a_t) + γ E_{s_{t+1}} [ V(s_{t+1}) ]

where the soft state value function is

V(s_t) = E_{a_t ~ π} [ Q(s_t, a_t) - α log π(a_t | s_t) ]
In addition, the soft Q function can be fitted by a neural network Q_θ with parameters θ, trained by minimizing the soft Bellman residual:

J_Q(θ) = E_{(s_t, a_t) ~ D} [ (1/2) ( Q_θ(s_t, a_t) - ( r(s_t, a_t) + γ E_{s_{t+1}} [ V_θ̄(s_{t+1}) ] ) )^2 ]

where D is the experience replay buffer storing past experience, V_θ̄(s_{t+1}) is the state value function and s_{t+1} is the next state. V_θ̄ is implicitly parameterized through the soft Q function, and the update can be performed with stochastic gradients:

∇̂_θ J_Q(θ) = ∇_θ Q_θ(s_t, a_t) ( Q_θ(s_t, a_t) - r(s_t, a_t) - γ V_θ̄(s_{t+1}) )
where θ̄ denotes the parameters of the target soft Q network, and φ denotes the parameters of the policy function, which can be obtained by minimizing the KL divergence:

π_new = argmin_{π' ∈ Π} D_KL( π'(·|s_t) || exp( Q^{π_old}(s_t, ·) ) / Z^{π_old}(s_t) )
where Π is the set of policies, D_KL is the KL divergence, and Z^π(s_t) normalizes the distribution. Meanwhile, the action is obtained with the reparameterization trick, as follows:

a_t = f_φ(ε_t; s_t) = μ_φ(s_t) + ε_t ⊙ σ_φ(s_t)
where the function f outputs the mean μ and variance σ, and ε_t is an input noise vector that can be sampled from a fixed distribution. The update of φ then follows from equation (17):

J_π(φ) = E_{s_t ~ D, ε_t} [ α log π_φ( f_φ(ε_t; s_t) | s_t ) - Q_θ( s_t, f_φ(ε_t; s_t) ) ]
the gradient was found to give the following form:
wherein ,λπ Representing soft update coefficients.
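The reparameterization trick above, a_t = μ + σ·ε with ε drawn from a fixed Gaussian, can be sketched in plain Python; this illustrates only the sampling and log-probability step, not the full SAC update (names ours):

```python
import math
import random

def reparameterized_action(mu, sigma, rng=random):
    # a = f(eps; s) = mu + sigma * eps, with eps ~ N(0, 1): sampling is moved
    # into a fixed noise distribution so gradients can flow through mu and sigma
    eps = rng.gauss(0.0, 1.0)
    a = mu + sigma * eps
    # Gaussian log-probability log pi(a|s), needed for the entropy term
    log_prob = -0.5 * (((a - mu) / sigma) ** 2 + math.log(2 * math.pi * sigma ** 2))
    return a, log_prob
```

In a real SAC implementation the sampled action is additionally squashed by tanh, with a corresponding log-probability correction.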
(III) in the solving process, the step S12 further comprises:
in the solving process, at each time step the environment state s_t is taken as input and the agent selects an action a_t from the defined action set;
the selected action is executed, the environment feeds a reward back to the agent and immediately transitions from s_t to s_{t+1}, generating the environment-agent interaction tuple (s_t, a_t, s_{t+1}, r), which is stored in the experience replay buffer D;
and the intelligent agent continuously adjusts the selected output action according to the principle of pursuing the maximum accumulated rewards to obtain the maximum accumulated rewards, and repeats the process until the rewarding function converges, so as to obtain the optimal action corresponding to each type of electricity load in the household micro-grid.
(IV) The step S12 further comprises:
a prioritized experience replay method is used to select mini-batches of experience samples from the experience replay buffer to train the agent continuously.
Specifically, the SAC algorithm stores experience in an experience replay buffer and then randomly selects a fixed number of experiences from it to train the agent continuously.
However, standard experience replay is based on uniform sampling and ignores the differing importance of experiences. To address this, a prioritized experience replay method is used, improving the training efficiency of the SAC algorithm through stochastic prioritized sampling. The priority weight of a sample is tied to the absolute value of its temporal-difference (TD) error:

δ_i = r + γ Q_θ̄( s_{t+1}, a_{t+1} ) - Q_θ( s_t, a_t )

where the first two terms on the right of the equals sign come from the target Q network and the last term from the current Q network. For the i-th datum the priority value is defined as follows:
p_i = | δ_i | + ε   (23)
where ε is a small constant preventing the sampling probability from reaching 0. For the i-th datum the sampling probability is calculated as:

P(i) = p_i^{α_per} / Σ_k p_k^{α_per}
where α_per is a hyperparameter controlling how strongly the priority values influence the sampling probabilities. However, non-uniform sampling gives different data different sampling probabilities, biasing the algorithm's estimates; the sample weights should therefore be adjusted to cancel the error introduced by the unequal probabilities. For these data, the importance-sampling weights are defined as:

w_i = ( 1 / ( N_buffer · P(i) ) )^{β_per}

where N_buffer is the total number of data in the experience replay buffer and β_per ∈ (0, 1) is a hyperparameter to be tuned.
(V) The method further comprises determining the mini-batch sampling range in a manner that emphasizes recent experience:
Let K be the total number of mini-batch updates; for the k-th update, 1 ≤ k ≤ K, the number of most recent experiences sampled from, c_k, is defined as follows:
c_k = max{ N_buffer · η^k, c_min }   (27)
where η ∈ (0, 1] is a hyperparameter determining the importance of recent experience, and c_min is the minimum allowed value of c_k;
η is then recomputed by an annealing method to automatically adjust the sampling emphasis on recent experience:
η = η_0 + ( η_T - η_0 ) · e / E

where η_0 and η_T are the initial and final values of η, E is the total number of training episodes and e is the current episode.
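The recent-experience schedule can be sketched as below. The linear annealing form and the default values are assumptions for illustration, since the original annealing formula is not reproduced in the text:

```python
def eta_annealed(e, E, eta0=0.996, etaT=1.0):
    # Linear annealing of eta from eta0 to etaT over E training episodes (form assumed)
    return eta0 + (etaT - eta0) * e / E

def recent_sample_range(k, n_buffer, eta, c_min=256):
    # c_k = max(N_buffer * eta^k, c_min): the k-th mini-batch update samples
    # only from the most recent c_k transitions, emphasizing recent experience
    return max(int(n_buffer * eta ** k), c_min)
```

Early updates in a round draw from the whole buffer, later ones from progressively newer transitions, never fewer than c_min.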
It can be understood that in the method provided by the invention, Transformer-based predictions of future electricity prices and solar generation are input into a deep-reinforcement-learning-based energy management strategy, yielding a MAISAC-based household micro-grid energy management method.
The scheme provides a household micro-grid energy management strategy based on Transformer-MAISAC: the Transformer is more accurate than traditional prediction algorithms, while the MAISAC energy allocation, compared with other existing energy management methods, offers better optimization results, strong real-time performance and good adaptability.
Compared with the deep Q-learning algorithms commonly used for household micro-grids, the MAISAC algorithm has stronger exploration ability, better robustness and avoids Q-value estimation errors, while also being suited to continuous variables such as power control.
In this embodiment, an improved multi-agent SAC algorithm is proposed: prioritized experience replay and an emphasizing-recent-experience method are used to accelerate the convergence of the SAC algorithm and strengthen its control performance. For traditional reinforcement learning algorithms, experience replay is a key method that markedly improves convergence: it reuses experience from past iterations to compute gradient estimates for the current policy. However, as training proceeds, the current policy drifts away from past experience, which degrades the accuracy of the gradient estimates and undermines the effectiveness of experience replay. The invention therefore emphasizes recent experience, which deviates less from the current policy, and combines this with prioritized experience replay, preferentially drawing high-value, high-reward samples for learning, thereby improving the control performance of the deep reinforcement learning algorithm.
As shown in fig. 5, a schematic structural diagram of one embodiment of a home micro grid energy management system provided by the present invention is shown. In this embodiment, the home micro grid energy management system 1 is applied to a grid-connected home micro grid including photovoltaic power generation, and the home micro grid energy management system 1 at least includes:
a real-time status data acquisition unit 10 for acquiring real-time status data of the home micro grid;
the data prediction unit 11 is configured to obtain prediction data of the next time period using a trained neural network prediction model, according to the obtained real-time state data, where the prediction data includes the generation and electricity-price data of the next time period; the trained neural network prediction model is based on the Transformer algorithm;
the energy management unit 12 is configured to form a state input from the real-time state data and the prediction data, feed it into the MAISAC-based energy management strategy model, solve for the optimal action corresponding to each type of electricity load in the household micro-grid, and control each load with its corresponding optimal action.
For more details, reference is made to the foregoing descriptions of fig. 1 to 4, and details are not repeated here.
Accordingly, in a further aspect of the invention, there is also provided a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method of home microgrid energy management as described in the preceding figures 1 to 4 when executing the computer program. For more details, reference is made to the foregoing descriptions of fig. 1 to 4, and details are not repeated here.
Accordingly, in a further aspect of the present invention, there is also provided a computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the method as described above with respect to fig. 1 to 4. For more details, reference is made to the foregoing descriptions of fig. 1 to 4, and details are not repeated here.
The embodiment of the invention has the following beneficial effects:
the invention provides a household micro-grid energy management method, device, equipment and storage medium. Household micro-grid energy is managed by a Transformer algorithm and an improved MAISAC algorithm. The Transformer performs time-series prediction of renewable generation and electricity price to overcome the uncertainty of renewable power generation; the predicted real-time electricity price and renewable generation are then passed to the MAISAC algorithm, helping the deep reinforcement learning find an optimal energy management strategy and allocate demand sensibly among the different electricity loads, achieving optimal and economical household electricity use.
According to the invention, electricity price and renewable energy generating capacity can be predicted more accurately, and uncertainty caused by accessing renewable energy is avoided;
in the invention, by introducing the MAISAC algorithm, actions can be continuous variables, avoiding discretization error, and the introduction of entropy strengthens the algorithm's exploration ability;
in the invention, by introducing prioritized experience replay and the emphasizing-recent-experience method into the MAISAC algorithm, faster convergence and better optimization results can be achieved.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the modules specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.
Claims (10)
1. The household micro-grid energy management method is applied to a grid-connected household micro-grid containing photovoltaic power generation, and is characterized by at least comprising the following steps:
step S10, acquiring real-time state data of the household micro-grid;
step S11, according to the acquired real-time state data, obtaining prediction data of the next time period using a trained neural network prediction model, wherein the prediction data includes the generation and electricity-price data of the next time period; the trained neural network prediction model is based on the Transformer algorithm;
and step S12, forming a state input from the real-time state data and the prediction data, feeding it into an energy management strategy model based on an improved MAISAC algorithm, solving for the optimal action corresponding to each type of electricity load in the household micro-grid, and controlling each load with its corresponding optimal action.
2. The method of claim 1, further comprising the step of pre-training and obtaining the trained neural network predictive model, comprising:
selecting photovoltaic power generation capacity and local electricity price data of one year in a local grid-connected household micro grid containing photovoltaic power generation;
resampling and interpolating the raw electricity-price and generation sequences onto a uniform time grid, then denoising them with a wavelet transform;
dividing the data into a training set and a test set at a fixed ratio, wherein the training set is used to train the Transformer-based neural network prediction model and the test set is used to verify the model's performance; input factors are also determined on the training set;
and predicting a future electricity price and generating capacity sequence by using the neural network prediction model, calculating an absolute error with a real label, evaluating a prediction result according to a root mean square error and an average absolute error, and obtaining a trained neural network prediction model after convergence of the evaluation value.
3. The method of claim 2, wherein the step S12 further comprises:
establishing a MAISAC-based energy management policy model, comprising:
establishing a state vector s t :
wherein ,is the electricity price at time t, is->The electricity price is predicted at the time t+1, P t PV Is the solar energy generating capacity at the moment t->The solar energy predicted generating capacity at time t+1 is P dem The total required power of the household micro-grid at the moment t;
the following Action vector actions are established:
wherein, for uninterruptible devices, the only action is to remain switched on; for time-shiftable devices, the action T_delay is the deferral time; for power-adjustable devices, the action is continuous between the minimum and maximum required power, where P_min^dem and P_max^dem denote the minimum and maximum required power of the device; and P_max^EV is the maximum power the electric vehicle can supply;
the following reward function is established:
wherein the first term on the right side of the equation penalizes the electricity cost, with P_t^dem the total household demand power at time t and P_t^PV the total solar generation at time t; the second term is the gap between the power allocated to the power-adjustable devices and their desired power, with P^exp the desired power and P^alloc the allocated power; the third term is the difference between the adjusted operating time of the time-shiftable devices and the original time, with t^adj the adjusted time and t^orig the original time; the fourth term is the gap between the allocated and desired power of the electric vehicle, with P_EV^exp the desired power and P_EV^alloc the allocated power; α, β, γ and σ are the coefficients of the four terms.
4. The method of claim 3, wherein said step S12 further comprises:
the action entropy term is introduced into the objective function as follows:

π* = argmax_π E_{(s_t, a_t) ~ τ_π} [ Σ_t γ^t ( r(s_t, a_t) + α H( π(·|s_t) ) ) ]

where π* is the optimal policy function, s_t is the state at time t, a_t is the action at time t, τ_π is the trajectory distribution under policy π, r is the reward, γ is the discount factor, H(π(·|s_t)) is the entropy under the current policy, and α is the entropy regularization coefficient.
5. The method of claim 4, wherein the step S12 further comprises:
in the solving process, at each time step the environment state s_t is taken as input and the agent selects an action a_t from the defined action set;
the selected action is executed, the environment feeds a reward back to the agent and immediately transitions from s_t to s_{t+1}, generating the environment-agent interaction tuple (s_t, a_t, s_{t+1}, r), which is stored in the experience replay buffer D;
and the intelligent agent continuously adjusts the selected output action according to the principle of pursuing the maximum accumulated rewards to obtain the maximum accumulated rewards, and repeats the process until the rewarding function converges, so as to obtain the optimal action corresponding to each type of electricity load in the household micro-grid.
6. The method of claim 5, wherein the step S12 further comprises:
a prioritized experience replay method is used to select mini-batches of experience samples from the experience replay buffer to train the agent continuously.
7. The method of claim 6, further comprising determining a sampling frequency for a minimum lot in a manner that emphasizes recent experience:
let K be the total number of mini-batch updates; for the k-th update, 1 ≤ k ≤ K, the number of most recent experiences sampled from, c_k, is defined as follows:
c_k = max{ N_buffer · η^k, c_min }   (5)
where η ∈ (0, 1] is a hyperparameter determining the importance of recent experience, c_min is the minimum allowed value of c_k, and N_buffer is the total number of data in the experience replay buffer;
η is then recomputed by an annealing method to automatically adjust the sampling emphasis on recent experience:

η = η_0 + ( η_T - η_0 ) · e / E

where η_0 and η_T are the initial and final values of η, E is the total number of training episodes and e is the current episode.
8. A household micro-grid energy management system, applied to a grid-connected household micro-grid containing photovoltaic power generation, characterized by at least comprising:
the real-time state data acquisition unit is used for acquiring the real-time state data of the household micro-grid;
the data prediction unit, configured to obtain prediction data of the next time period using a trained neural network prediction model, according to the obtained real-time state data, wherein the prediction data includes the generation and electricity-price data of the next time period; the trained neural network prediction model is based on the Transformer algorithm;
and the energy management unit, configured to form a state input from the real-time state data and the prediction data, feed it into an energy management strategy model based on an improved MAISAC algorithm, solve for the optimal action corresponding to each type of electricity load in the household micro-grid, and control each load with its corresponding optimal action.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310588303.9A CN116780506A (en) | 2023-05-23 | 2023-05-23 | Household micro-grid energy management method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116780506A true CN116780506A (en) | 2023-09-19 |
Family
ID=87992095
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117613983A (en) * | 2024-01-23 | 2024-02-27 | 国网冀北电力有限公司 | Energy storage charge and discharge control decision method and device based on fusion rule reinforcement learning |
CN116187190A (en) | Electric load prediction method based on Informir-SCINT | |
Wu et al. | Explainable temporal dependence in multi-step wind power forecast via decomposition based chain echo state networks | |
Phan et al. | Application of a new Transformer-based model and XGBoost to improve one-day-ahead solar power forecasts | |
CN117132132A (en) | Photovoltaic power generation power prediction method based on meteorological data | |
CN115271198A (en) | Net load prediction method and device of photovoltaic equipment | |
Gong et al. | Integrated multi-horizon power and energy forecast for aggregated electric water heaters | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||