CN115396955A - Resource allocation method and device based on deep reinforcement learning algorithm

Info

Publication number: CN115396955A
Application number: CN202211019477.5A
Authority: CN
Inventors: 蒋雯倩; 周密; 张焜; 张帆; 陈俊; 罗奕; 林秀清; 唐建林; 赵誉洲; 林晓明
Original assignee: China South Power Grid International Co ltd; Guangxi Power Grid Co Ltd
Current assignee: China South Power Grid International Co ltd; Guangxi Power Grid Co Ltd
Priority date: 2022-08-24
Filing date: 2022-08-24
Publication date: 2022-11-25

Abstract

The invention discloses a resource allocation method and device based on a deep reinforcement learning algorithm. The method comprises: acquiring power load data of each low-voltage user in a low-voltage user group for each load type, wherein the load types comprise uncontrollable loads, transferable loads and interruptible loads; obtaining, based on a deep reinforcement learning algorithm, an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and resource configuration information, wherein the optimal resource allocation strategy comprises an optimal edge computing frequency value and an optimal channel bandwidth value; allocating the total computing frequency of the edge computing server to the low-voltage users according to the optimal edge computing frequency value to perform edge computing; and allocating the total channel bandwidth to the low-voltage users according to the optimal channel bandwidth value to realize real-time transmission. The invention can obtain the optimal time delay value, realize immediate processing and real-time transmission of the power load data, and provide communication support with high bandwidth, low power consumption, high reliability and high stability for power demand response.

Description

Resource allocation method and device based on deep reinforcement learning algorithm

Technical Field

The invention relates to the technical field of machine learning, in particular to a resource allocation method and device based on a deep reinforcement learning algorithm.

Background

Demand side management is one of the methods for improving the efficiency of electric energy utilization. In the power market environment, optimal decision management on the user side of the energy internet is an important aspect of demand side management. Power demand response is a key technology for realizing the energy internet. It means that when wholesale electricity market prices rise or system reliability is threatened, a power consumer, after receiving from the power supplier either a direct compensation notice inducing load reduction or a price increase signal, changes its habitual electricity consumption pattern and reduces or shifts its load for a certain period in response to the power supply, so that grid stability is safeguarded and short-term price increases are curbed.

Existing schemes that analyze power demand response through deep reinforcement learning are usually executed on a local server and cannot meet the timeliness requirement of power demand response in the energy internet. With the development of mobile edge networks in recent years, computation-intensive tasks such as power demand response can be handled effectively. Under limited computing and network resources, how to plan a reasonable resource allocation strategy for a large number of energy internet users is an important problem to be solved in the energy internet.

Disclosure of Invention

The invention aims to provide a resource allocation method and device based on a deep reinforcement learning algorithm, and aims to solve the technical problem that the prior art cannot meet the timeliness requirement on power demand response in an energy internet.

The purpose of the invention can be realized by the following technical scheme:

a resource allocation method based on a deep reinforcement learning algorithm comprises the following steps:

acquiring power load data of each low-voltage user in a low-voltage user group on various load types, wherein the load types comprise an uncontrollable load, a transferable load and an interruptible load;

based on a deep reinforcement learning algorithm, obtaining an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and resource configuration information; the resource configuration information comprises the total channel bandwidth and the total computing frequency of an edge computing server of a 5G base station, the optimal resource allocation strategy comprises an optimal edge computing frequency value and an optimal channel bandwidth value, and the optimal resource allocation strategy meets a power consumption index, a time delay index and a reliability index;

allocating the total computing frequency to each low-voltage user according to the optimal edge computing frequency value, so that each low-voltage user can offload the power load data as a computing task to the edge computing server for edge computing;

and allocating the total channel bandwidth to each low-voltage user according to the optimal channel bandwidth value so as to realize real-time transmission of the power load data.

Optionally, the power consumption indicator comprises: edge computation power consumption and transmission power consumption.

Optionally, the edge computing power consumption is:

in the formula, the result is the edge computing power consumption of U_nm; x_nm is the offloading decision variable of the computing task, 0 < x_nm ≤ 1; s_nm is the data size of the computing task of U_nm; y_nm is the computational complexity per bit of data of U_nm; κ is the capacitance switching parameter of the edge computing server; the allocated-resource term is the computing resources assigned by the edge computing server to U_nm; and U_nm is the power load data of the nth low-voltage user on the mth power load.

Optionally, the transmission power consumption is:

in the formula, the result is the transmission power consumption of U_nm; p_nm is the data transmission power of U_nm; r_nm is the data transmission rate of U_nm, in whose expression b_nm is the channel bandwidth of U_nm, d_nm is the distance from U_nm to the edge computing server, α denotes the path loss exponent, h_nm is a channel parameter of U_nm, and N is the noise power.

Optionally, the time delay index comprises: edge computing delay and transmission delay.

Optionally, the edge computing delay is:

in the formula, the result is the edge computing delay of U_nm.

Optionally, the transmission delay is:

in the formula, the result is the transmission delay of U_nm.

Optionally, the reliability index is expressed by the normal operation probability of the edge computing server:

in the formula, R_nm is the probability that the edge computing server to which U_nm is offloaded operates normally; A_n represents the decision accuracy of the communication transmission model of the nth user; the failure parameter characterizes failures of the edge computing server; the delay term is the edge computing delay of U_nm; and U_nm is the power load data of the nth low-voltage user on the mth power load.

Optionally, the constraint conditions are that R_nm is not lower than the lowest tolerable reliability of U_nm, and that E_nm does not exceed the maximum power consumption set for U_nm.

In the formulas, R_nm is the probability that the edge computing server to which U_nm is offloaded operates normally; E_nm is the total power consumption of U_nm, made up of the edge computing power consumption of U_nm and the transmission power consumption of U_nm; and U_nm is the power load data of the nth low-voltage user on the mth power load.

The invention also provides a resource allocation device based on the deep reinforcement learning algorithm, which comprises the following components:

the load data acquisition module is used for acquiring the power load data of each low-voltage user in the low-voltage user group on various load types, wherein the load types comprise an uncontrollable load, a transferable load and an interruptible load;

the optimal resource allocation strategy determining module is used for obtaining, based on a deep reinforcement learning algorithm, an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and resource configuration information; the resource configuration information comprises the total channel bandwidth and the total computing frequency of an edge computing server of a 5G base station, the optimal resource allocation strategy comprises an optimal edge computing frequency value and an optimal channel bandwidth value, and the optimal resource allocation strategy meets a power consumption index, a time delay index and a reliability index;

the computing resource allocation module is used for allocating the total computing frequency to each low-voltage user according to the optimal edge computing frequency value, so that each low-voltage user can offload the power load data as a computing task to the edge computing server for edge computing;

and the communication resource allocation module is used for allocating the total channel bandwidth to each low-voltage user according to the optimal channel bandwidth value so as to realize the real-time transmission of the power load data.

Therefore, the invention has the beneficial effects that:

Under limited computing and network resources, the invention can reasonably allocate edge computing resources and communication resources to low-voltage users, keep energy consumption and reliability at acceptable stable values, obtain the optimal time delay value, realize timely processing and real-time transmission of power load data, and provide data processing and communication support with high bandwidth, low power consumption, high reliability and high stability for power demand response. The method uses the edge computing server to process the mass power data of low-voltage users in real time, which effectively reduces the computing task delay of low-voltage users, improves resource utilization while improving user experience, reduces delay, optimizes traffic, enhances security and saves cost, and can meet the timeliness requirement of power demand response in the energy internet.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of the structure of the apparatus of the present invention;

FIG. 3 is a schematic diagram of a communication transmission structure according to the present invention;

FIG. 4 is a schematic diagram of a communication transmission model according to the present invention;

FIG. 5 is a DQN algorithm framework diagram of the present invention;

FIG. 6 is a flow chart of the DQN algorithm of the present invention.

Detailed Description

The embodiment of the invention provides a resource allocation method and device based on a deep reinforcement learning algorithm, and aims to solve the technical problem that the prior art cannot meet the timeliness requirement on power demand response in an energy internet.

To facilitate an understanding of the invention, the invention will now be described more fully hereinafter with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

With the rapid growth of human energy demand, energy shortages and environmental problems trouble many parts of the world. Demand for electric energy rises year by year, and short-term imbalances between power supply and demand are common. How to effectively relieve this contradiction and promote full and efficient use of electric energy remains a problem to be solved.

Low-voltage residential electricity consumption is an important component of grid energy consumption and comprises a large number of dispersed controllable load resources. With the development of smart grid technology, these resources can be aggregated and exploited in depth through the demand response measures of a power load aggregator. However, dispersed low-voltage users have limited ability to exploit their own demand response potential, their degree of response is not high enough, terminal networks carry excessive data, and service communication quality is difficult to guarantee. As a result, the adjustable resources of low-voltage users cannot play an effective role in balancing grid supply and demand, and comprehensive optimal configuration of power resources cannot be achieved well.

The fifth generation mobile communication technology (5G communication) features high bandwidth, low latency, high connection density and high reliability. Edge computing provides services close to the data source. Because applications are initiated at the edge side, network service responses are faster, which meets the basic requirements of interactive response in real-time power services, smart homes, security and privacy protection. The invention can integrate the adjustable load resources of low-voltage users in a grid region, realize classification, aggregation, management and optimal configuration of these resources, and ensure stable bidirectional transmission of user data and grid response strategies in real time.

With the popularization of intelligent terminals, growing terminal data greatly increases network consumption and transmission delay. The real-time performance, security and low cost of 5G edge computing can be applied effectively to the terminal platform on the power user side.

Referring to FIG. 1, an embodiment of a resource allocation method based on a deep reinforcement learning algorithm according to the present invention includes:

S100: acquiring power load data of each low-voltage user in a low-voltage user group on various load types, wherein the load types comprise an uncontrollable load, a transferable load and an interruptible load;

S200: based on a deep reinforcement learning algorithm, obtaining an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and the resource configuration information; the resource configuration information comprises the total channel bandwidth and the total computing frequency of an edge computing server of a 5G base station, the optimal resource allocation strategy comprises an optimal edge computing frequency value and an optimal channel bandwidth value, and the optimal resource allocation strategy meets a power consumption index, a time delay index and a reliability index;

S300: allocating the total computing frequency to each low-voltage user according to the optimal edge computing frequency value, so that each low-voltage user can offload the power load data as a computing task to the edge computing server for edge computing;

S400: allocating the total channel bandwidth to each low-voltage user according to the optimal channel bandwidth value so as to realize real-time transmission of the power load data.

In this embodiment, each low-voltage user is provided with a smart meter, a plurality of low-voltage users within a certain range form a low-voltage user group, and each low-voltage user group is provided with a local intelligent controller. Each 5G communication base station is provided with an edge computing server, and the 5G base stations are connected with the local intelligent controllers of the low-voltage user groups using the TCP/IP protocol. Each power load aggregator is provided with a central intelligent controller. All low-voltage user terminal equipment in a low-voltage user group served by the edge computing server of a 5G base station is connected to the power load aggregator according to management requirements, and the power load aggregator signs a power demand response agreement with each low-voltage user in the group. The power load aggregators are connected to the local grid distribution areas according to management requirements, and a three-party power demand response agreement is signed with the local grid company and the low-voltage users.

In this embodiment, the load types of the low-voltage users include uncontrollable loads, transferable loads and interruptible loads: uncontrollable loads are, for example, lighting and computers; transferable loads are, for example, electric vehicles and timed appliances; interruptible loads are, for example, water heaters and air conditioners. The power loads of the low-voltage users are monitored by category through an intelligent control terminal (such as a smart meter), yielding the power load data of each low-voltage user in the low-voltage user group on each load type.

In this embodiment, the network communication interface includes three modules:

(1) The low-voltage user side interface is connected to the smart meters of the low-voltage users according to management requirements, and a power demand response agreement is signed with each low-voltage user. The agreement signed by all low-voltage users served by the power load aggregator includes: low-voltage user information, the agreed peak-shaving and valley-filling strategy, monitoring of data input and output, load resource management for transferable and interruptible loads, feedback on load operation strategies, and the like. Unified monitoring, operation and management are carried out by the terminal platform of the load aggregator.

(2) The internal system interface of the load aggregator synchronizes uploaded data from the electricity sales system, such as real-time electricity prices, provides user load forecast data to the electricity sales platform, and issues strategic adjustment suggestions to low-voltage users to keep grid supply and demand balanced.

(3) The grid company side interface is connected to the local grid distribution area according to management requirements, and a power demand response agreement is signed with the local grid company.

The local intelligent controller of the low-voltage user group collects the grid energy consumption of the low-voltage users (that is, the power load data), the real-time electricity prices of the grid nodes and other information, and transmits this information to the central intelligent controller of the power load aggregator through 5G base station communication. Using 5G high-speed transmission, the power load aggregator delivers the data acquired by the local intelligent controller completely and quickly to its central intelligent controller and forwards the data to the grid company. Finally, in cooperation with the grid company, it sends the formulated demand response instructions to the smartphones of the low-voltage users, realizing bidirectional communication.

The power load aggregator completes real-time transmission of power data according to the peak-shaving and valley-filling demand response of the grid and the real-time electricity prices from all grid points. In this embodiment, the smart meter records the power load data of the low-voltage user on the various power loads and can offload the computing tasks it carries, that is, the power load data, to the edge computing server deployed at the 5G base station, where the edge computing server processes the mass data of the low-voltage users using edge computing technology. The local intelligent controller collects information such as the grid energy consumption of low-voltage users and the electricity prices of grid nodes and transmits it to the central intelligent controller of the power load aggregator through 5G communication. The power load aggregator transmits the user data to the grid enterprise, the grid enterprise issues a real-time peak-shaving and valley-filling strategy, which is passed to the low-voltage users through the aggregator, and the low-voltage users respond to the decision of the power load aggregator by adjusting their energy consumption, achieving intelligent demand response based on low-voltage user access control.

In the future active electricity retail market, power load aggregators connected to the grid nodes of low-voltage users will issue different real-time electricity prices through the grid system. 5G communication provides communication services for users, and edge computing has been shown to effectively reduce the load on the core network and the energy consumption caused by network transmission, thereby prolonging the life cycle of the Internet of Things devices at the interactive response terminals.

For a low-voltage user group which has signed a demand response protocol with a load aggregator, the load aggregator obtains an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and the resource configuration information based on a deep reinforcement learning algorithm; the optimal resource allocation strategy comprises an optimal edge calculation frequency value and an optimal channel bandwidth value.

Then, the load aggregator performs resource allocation for each low-voltage user in the low-voltage user group according to the optimal resource allocation strategy, which includes allocating computing resources according to the optimal edge computing frequency value and allocating communication resources according to the optimal channel bandwidth value. Specifically: (1) the load aggregator allocates the total computing frequency of the edge computing server to each low-voltage user in the low-voltage user group according to the optimal edge computing frequency value, so that each low-voltage user can offload the power load data as a computing task to the edge computing server for edge computing; (2) the load aggregator allocates the total channel bandwidth to each low-voltage user according to the optimal channel bandwidth value so as to realize real-time transmission of the power load data.
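The two allocation steps above can be illustrated with a minimal sketch. The proportional split, the function name `allocate` and all numeric values are assumptions made for illustration only; in the described method the per-user values come from the optimal resource allocation strategy rather than from fixed shares.

```python
def allocate(total, shares):
    """Split a total resource among users in proportion to non-negative shares.

    The proportional rule is an illustrative assumption; it guarantees that
    the allocated amounts never exceed the total.
    """
    if any(s < 0 for s in shares):
        raise ValueError("shares must be non-negative")
    norm = sum(shares)
    if norm <= 0:
        raise ValueError("at least one share must be positive")
    return [total * s / norm for s in shares]


# Hypothetical totals: 10 GHz of edge computing frequency, 20 MHz of bandwidth.
f_alloc = allocate(10e9, [2, 1, 1])   # computing frequency per user (Hz)
b_alloc = allocate(20e6, [1, 1, 2])   # channel bandwidth per user (Hz)
```

Normalising by the sum of the shares is one simple way to respect the total-frequency and total-bandwidth limits by construction.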

Based on information such as the real-time electricity prices from the grid points of each grid region, the load aggregator can reasonably plan the communication resources in the communication network and the power allocation of the low-voltage users, and the low-voltage users respond to the decision of the power load aggregator by adjusting their energy consumption, realizing intelligent demand response based on low-voltage user access control.

It should be noted that the optimal resource allocation strategy ensures that each low-voltage user has an optimal time delay value for communication transmission, so that the power load data of the low-voltage users is transmitted in real time. At the same time, it ensures that the overall transmission rate of all low-voltage users in the group is improved and their overall transmission delay is reduced, giving higher stability and reliability.

In this embodiment, resources are allocated reasonably using the deep reinforcement learning algorithm DQN (deep Q-learning) to establish an energy-saving, high-speed transmission model. The smart meter records the power load data of the low-voltage users and transmits it to the 5G base station, which processes the mass data of the low-voltage users using the edge computing server. The local intelligent controller collects information such as the grid energy consumption of the low-voltage users recorded by the smart meters and the electricity prices of grid nodes, and transmits it to the central intelligent controller of the power load aggregator through the 5G communication base station. The power load aggregator reasonably plans the resources in the communication network and the power allocation of the low-voltage users according to information such as the electricity prices from the grid distribution area points, and the low-voltage users respond to the decision of the power load aggregator by adjusting their energy consumption, realizing intelligent demand response based on low-voltage user access control.

In this embodiment, the power load aggregator completes real-time transmission of power data according to the peak-shaving and valley-filling demand response of the grid and the real-time electricity prices from each grid point. The local intelligent controller collects information such as the grid energy consumption of low-voltage users and the electricity prices of grid nodes and transmits it to the central intelligent controller of the power load aggregator through 5G communication. The power load aggregator transmits the user data to the grid enterprise, the grid enterprise issues a real-time peak-shaving and valley-filling strategy, which is passed to the low-voltage users through the aggregator, and the low-voltage users respond to the decision of the power load aggregator by adjusting their energy consumption, achieving intelligent demand response based on low-voltage user access control.

In this embodiment, the optimal resource allocation strategy needs to satisfy three indexes: a power consumption index, a time delay index and a reliability index. Consider N low-voltage users in a low-voltage user group, each indexed by n, where 1 ≤ n ≤ N. The load types of each low-voltage user include uncontrollable loads (such as lighting and computers), transferable loads (such as electric vehicles and timed household appliances) and interruptible loads (such as water heaters and air conditioners). The power loads of the low-voltage users are divided into M load types in total, each indexed by m, where 1 ≤ m ≤ M. The power load data of the nth low-voltage user on the mth power load is denoted U_nm.

It should be noted that the power load data of the low-voltage users on the power loads is a measurable and controllable intelligent unit, and the power load aggregators perform unified monitoring management.

According to the edge computing technique, each U_nm is treated as one computing task, and all computing tasks of the low-voltage users are offloaded to the edge computing server of the 5G base station for execution. Over the whole process, the power consumption index comprises: edge computing power consumption and transmission power consumption.

Specifically, the edge computing power consumption is:

in the formula, the allocated-resource term is the computing resources assigned by the edge computing server to U_nm, and U_nm is the power load data of the nth low-voltage user on the mth power load.
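As a hedged illustration only, a form commonly used in mobile edge computing models that is consistent with the symbols listed for this formula (x_nm, s_nm, y_nm, κ) is shown below. The symbol names E^c_nm (edge computing power consumption) and f_nm (computing frequency allocated to U_nm), and the quadratic dependence on f_nm, are assumptions rather than a restatement of the original formula.

```latex
E^{c}_{nm} = \kappa \, x_{nm} \, s_{nm} \, y_{nm} \, f_{nm}^{2}
```

Under this assumed form, x_nm s_nm y_nm is the number of CPU cycles the task requires, and κ f_nm^2 is the energy spent per cycle.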

Specifically, the transmission power consumption is:

in the formula, b_nm is the channel bandwidth of U_nm, d_nm is the distance from U_nm to the edge computing server, α denotes the path loss exponent, h_nm is a channel parameter of U_nm, and N is the noise power.
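A minimal sketch of how the transmission rate and transmission power consumption could be evaluated from these symbols. It assumes a Shannon-type rate r = b · log2(1 + p · h · d^(-α) / N) and an energy equal to transmit power multiplied by transmission time; both forms are assumptions taken from common mobile edge computing models, h_nm is treated here as a channel gain, and all function names and sample values are hypothetical.

```python
import math


def transmission_rate(b, p, h, d, alpha, noise):
    """Assumed Shannon-type rate in bit/s: b * log2(1 + p*h*d^-alpha / N)."""
    snr = p * h * d ** (-alpha) / noise
    return b * math.log2(1.0 + snr)


def transmission_energy(p, x, s, rate):
    """Assumed transmission energy in joules: transmit power times transmission time."""
    return p * (x * s) / rate


# Hypothetical values: 1 MHz bandwidth, 0.5 W transmit power, 100 m distance,
# path loss exponent 3, noise power 1e-9 W, task size 8e5 bits, full offloading.
rate = transmission_rate(b=1e6, p=0.5, h=1.0, d=100.0, alpha=3.0, noise=1e-9)
energy = transmission_energy(p=0.5, x=1.0, s=8e5, rate=rate)
```

In this assumed model the rate grows linearly with the allocated bandwidth, which is why the bandwidth split directly affects each user's transmission delay and energy.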

After the edge computing power consumption and the transmission power consumption are obtained, the total power consumption E_nm is obtained as their sum.

In the invention, when the power load data U_nm of the nth low-voltage user on the mth power load is offloaded to the edge computing server of the 5G base station, the required time delay comprises: the edge computing delay and the transmission delay.

Specifically, the edge computing delay is:

in the formula, the result is the edge computing delay of U_nm.

Specifically, the transmission delay is:

in the formula, the result is the transmission delay of U_nm.
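The two delay components can be sketched as follows. The forms used (required CPU cycles divided by allocated frequency, and task size divided by transmission rate) are assumptions based on common mobile edge computing models; the dictionary field names, the summation over tasks and the sample values are hypothetical.

```python
def edge_computing_delay(x, s, y, f):
    """Assumed computing delay in seconds: required CPU cycles / allocated frequency."""
    return x * s * y / f


def transmission_delay(x, s, rate):
    """Assumed transmission delay in seconds: offloaded bits / transmission rate."""
    return x * s / rate


def total_delay(tasks):
    """Sum both delay components over all tasks (summation is an assumption)."""
    return sum(
        edge_computing_delay(t["x"], t["s"], t["y"], t["f"])
        + transmission_delay(t["x"], t["s"], t["r"])
        for t in tasks
    )


# One hypothetical task: 8e5 bits, 100 cycles per bit, 1 GHz allocated, 8 Mbit/s link.
tasks = [{"x": 1.0, "s": 8e5, "y": 100.0, "f": 1e9, "r": 8e6}]
delay = total_delay(tasks)
```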

Because software or hardware faults in the edge computing server may cause the offloading of computing tasks to fail, the overall reliability of the communication system is taken into account. Denoting the normal operation probability of the edge computing server as R_nm, the reliability index can be expressed by the normal operation probability of the server:

in the formula, R_nm is the probability that the edge computing server to which U_nm is offloaded operates normally; A_n represents the decision accuracy of the communication transmission model of the nth user; the failure parameter characterizes failures of the edge computing server; the delay term is the edge computing delay of U_nm; and U_nm is the power load data of the nth low-voltage user on the mth power load.
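One commonly used reliability model treats server failures as a Poisson process, so that the probability of no failure during the computing delay decays exponentially. The following is a sketch under that assumption only; λ (the failure parameter) and T^c_nm (the edge computing delay) are assumed symbol names, and scaling by A_n is an assumption based on the symbols listed above, not a restatement of the original formula.

```latex
R_{nm} = A_{n} \, \exp\!\left(-\lambda \, T^{c}_{nm}\right)
```

Under this assumed form, a larger allocated computing frequency shortens T^c_nm and therefore raises R_nm, which couples the reliability index to the resource allocation.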

To meet the low power consumption and reliability requirements, the load aggregator must ensure that R_nm is not lower than the lowest tolerable reliability of U_nm, and that the total power consumption E_nm, made up of the edge computing power consumption of U_nm and the transmission power consumption of U_nm, does not exceed the maximum power consumption set for U_nm.

To meet the needs of low-voltage users, the computing frequency and transmission bandwidth are allocated reasonably so that the total time delay is the lowest; that is, the delay function is optimized by minimizing the total delay.

Here, since the computing tasks are entirely offloaded to the edge, x_nm = 1; f_max denotes the total computing frequency of the edge computing (MEC) server, i.e. the sum of the computing frequencies allocated by the system cannot exceed the total computing frequency; b_max denotes the total channel bandwidth, i.e. the sum of the channel bandwidths allocated by the system cannot exceed the total channel bandwidth.
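Collecting the stated objective and constraints, the optimization problem can be sketched as below. The symbol names T^c_nm, T^t_nm, R^min_nm and E^max_nm, and the choice of the total delay as a sum over all tasks, are assumptions made for illustration; only f_max, b_max, x_nm, R_nm and E_nm are named in the text.

```latex
\begin{aligned}
\min_{\{f_{nm}\},\,\{b_{nm}\}} \quad & \sum_{n=1}^{N}\sum_{m=1}^{M}\left(T^{c}_{nm} + T^{t}_{nm}\right) \\
\text{s.t.} \quad & \sum_{n=1}^{N}\sum_{m=1}^{M} f_{nm} \le f_{max}, \\
& \sum_{n=1}^{N}\sum_{m=1}^{M} b_{nm} \le b_{max}, \\
& R_{nm} \ge R^{min}_{nm}, \qquad E_{nm} \le E^{max}_{nm}, \\
& x_{nm} = 1 .
\end{aligned}
```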

For a low-voltage user group with a fixed number of users, the distance between each low-voltage user and the base station differs. According to Shannon's theorem, optimizing the channel bandwidth allocation in the target model increases the overall transmission rate and reduces the overall time delay.

Referring to FIG. 5, the DQN (deep Q-learning) framework diagram includes: the environment (Environment), the state (State), the executed action, the reward, the next state S_n+1, the next action A_n+1 and the next reward R_n+1.

The neural network model Agent is the core of the code. It observes the current Environment and obtains a state from the state space, subject to the constraint conditions, which in the invention mean that neither the edge computing frequency nor the channel bandwidth can exceed its maximum value. The Agent then takes an action from the action space in that state, receives a reward value, and the Environment changes, so that the Agent obtains a new state and continues executing until the solution with the optimal reward is obtained.
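The interaction loop described above can be sketched with a toy environment for a single task. Everything here is illustrative: the class name `AllocationEnv`, the fixed spectral efficiency used in place of a full channel model, and the choice of the negative delay as the reward are assumptions, not details taken from the original.

```python
class AllocationEnv:
    """Toy environment: the state is the (frequency, bandwidth) given to one task."""

    def __init__(self, f_max, b_max, cycles, bits, bits_per_hz=8.0):
        self.f_max, self.b_max = f_max, b_max
        self.cycles, self.bits = cycles, bits
        self.bits_per_hz = bits_per_hz          # assumed fixed spectral efficiency
        self.state = (0.5 * f_max, 0.5 * b_max)

    def delay(self, f, b):
        """Computing delay plus transmission delay for the given allocation."""
        return self.cycles / f + self.bits / (b * self.bits_per_hz)

    def step(self, df, db):
        """Apply an action (changes to f and b), clipped so neither exceeds its maximum."""
        f = min(max(self.state[0] + df, 1e-9 * self.f_max), self.f_max)
        b = min(max(self.state[1] + db, 1e-9 * self.b_max), self.b_max)
        self.state = (f, b)
        return self.state, -self.delay(f, b)    # reward: lower delay is better


env = AllocationEnv(f_max=1e9, b_max=1e6, cycles=8e7, bits=8e5)
state, reward = env.step(df=0.5e9, db=0.5e6)
```

Clipping inside `step` is one simple way to enforce the constraint that the edge computing frequency and the channel bandwidth stay within their maximum values.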

The DQN algorithm is a combination of a convolutional neural network and a Q-learning algorithm. Q-learning is a reinforcement learning method, or more precisely, a method of selecting strategies. The core and training goal of reinforcement learning is to select an appropriate strategy that optimizes the reward value obtained at the end of each cycle. The core function is Q(S, A), which represents the reward value that will be obtained in the future after action A is taken in state S.
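For reference, the standard tabular Q-learning update that DQN builds on can be sketched as follows. This is the textbook rule, not code from the embodiment; the learning rate `alpha`, the discount factor `gamma`, and the dictionary-based Q table are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning step and return the new value of Q(state, action)."""
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table: QTable = defaultdict(float)  # unseen (state, action) pairs start at 0
```

DQN replaces the table with a neural network that approximates Q(S, A), but the target it is trained towards has the same form.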

For the resource allocation of load aggregators, the deep reinforcement learning algorithm DQN (deep Q-learning) is adopted to solve the optimization problem. Based on DQN, a resource allocation strategy with optimal computation frequency and optimal channel bandwidth is studied. In an environment where the computation frequency and the channel bandwidth vary over time, the strategy can update itself according to past experience, which effectively reduces the execution delay and transmission delay of computing tasks and improves the experience of low-voltage user terminals.

Referring to fig. 5 and fig. 6: as shown in fig. 5, the process starts by acquiring the computing resources related to the low-voltage user group. Simulation learning is then performed using the deep reinforcement learning algorithm DQN (deep Q-learning).

DQN models the function Q(s, a) with a neural network and has three basic elements: a state space (state), an action space (action), and a reward function (reward). The input is the state of the problem, and the output is the Q value corresponding to each action a. The action to execute in the corresponding state is then selected according to the Q value, which completes the optimization.
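The input/output relationship described above can be sketched with a very small network. Everything here is an illustrative assumption: the class `TinyQNetwork`, the single hidden layer, the random untrained weights, and the epsilon-greedy rule, which is a common exploration scheme in DQN and is not stated in this text.

```python
import random
from typing import List, Sequence


class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q value per action."""

    def __init__(self, state_dim: int, hidden_dim: int, n_actions: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        # Randomly initialised weights; a real agent would train these.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                   for _ in range(n_actions)]

    def q_values(self, state: Sequence[float]) -> List[float]:
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * s for w, s in zip(row, state)))
                  for row in self.w1]
        # Linear output layer: one Q value per action.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Setting `epsilon` to 0 gives purely greedy selection according to the Q value; a positive value lets the agent occasionally try other allocations.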

In this embodiment, the state space of the DQN algorithm consists of the edge computing frequency, the channel bandwidth b_nm, and the offloading decision variable of the computing task, x_nm = 1. The action space consists of the actions that set the values of the state space; that is, the action space is the resource allocation policy.

In this embodiment, the reward function of the DQN algorithm works as follows. For the state-space variable values set for the low-voltage users in each round, the delay value of each scheme is calculated using the delay model, a reward or penalty is given to reflect whether the selected state is a good choice, and the selected state is added to the memory pool. Whether the delay value is optimal is then judged. If it is not, the process returns to the selection of state-space values and traverses again; if it is, the optimal delay value and the corresponding state-space value are obtained. It can be understood that the corresponding state-space value is the optimal resource allocation strategy.
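A reward and a memory pool of this kind might look as follows. This is a hedged sketch: using the negative delay as the reward, the fixed `penalty` value for allocations that violate the constraints, the `MemoryPool` class, and the transition layout are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state); the layout is hypothetical.
Transition = Tuple[tuple, int, float, tuple]


def delay_reward(delay_s: float, feasible: bool,
                 penalty: float = -100.0) -> float:
    """Lower delay gives a higher reward; infeasible allocations are penalised."""
    return -delay_s if feasible else penalty


class MemoryPool:
    """Fixed-capacity store of past transitions; the oldest entries are dropped."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Draw without replacement, never more than what is stored.
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)
```

Sampling past transitions at random, rather than replaying them in order, is the usual way a DQN agent learns from past experience.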

All state-space values are traversed and reinforcement learning is performed in a loop. The result is finally applied to the low-voltage power user group: the optimal state-space value obtained by the algorithm gives the corresponding optimal edge computing frequency value and optimal channel bandwidth allocation value. Computing resources are allocated to the low-voltage user group according to the optimal edge computing frequency value, and communication resources are allocated according to the optimal channel bandwidth allocation value. This completes the instant processing and high-speed transmission of data and synchronizes the real-time transmission of power communication data with the uploading of real-time electricity prices, so as to meet the timeliness requirement on power demand response in the energy Internet.
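To make the idea of traversing candidate allocations and keeping the one with the lowest delay concrete, the sketch below does an exhaustive grid search for two users. It is deliberately not DQN: it swaps in brute force so that the objective is easy to see. The delay model (computation delay = CPU cycles / frequency, transmission delay = data bits / Shannon rate), the function names, and the grid resolution `steps` are assumptions, not taken from the embodiment.

```python
import math
from itertools import product
from typing import Optional, Sequence, Tuple

Allocation = Tuple[float, Tuple[float, float], Tuple[float, float]]


def total_delay(freqs: Sequence[float], bands: Sequence[float],
                cycles: Sequence[float], data_bits: Sequence[float],
                snrs: Sequence[float]) -> float:
    """Assumed model: sum over users of computation plus transmission delay."""
    return sum(c / f + d / (b * math.log2(1.0 + s))
               for f, b, c, d, s in zip(freqs, bands, cycles, data_bits, snrs))


def best_split(f_max: float, b_max: float, cycles: Sequence[float],
               data_bits: Sequence[float], snrs: Sequence[float],
               steps: int = 20) -> Allocation:
    """Try every grid split of f_max and b_max between two users."""
    best: Optional[Allocation] = None
    for i, j in product(range(1, steps), repeat=2):
        # Each split uses the whole budget, so the caps are met by construction.
        freqs = (f_max * i / steps, f_max * (steps - i) / steps)
        bands = (b_max * j / steps, b_max * (steps - j) / steps)
        delay = total_delay(freqs, bands, cycles, data_bits, snrs)
        if best is None or delay < best[0]:
            best = (delay, freqs, bands)
    assert best is not None  # steps >= 2 guarantees at least one candidate
    return best
```

The returned tuple holds the lowest delay found together with the frequency and bandwidth shares that produce it. Exhaustive search grows quickly with the number of users and the grid resolution, which is the practical motivation for a learned policy.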

The invention can reasonably distribute edge computing resources and communication resources for low-voltage users under the condition of limited computing resources and network resources, keep the energy consumption and reliability at acceptable stable values, acquire the optimal time delay value, realize the timely processing and real-time transmission of power load data, and provide data processing and communication support with high bandwidth, low power consumption, high reliability and high stability for power demand response. The method utilizes the edge computing server to process mass power data of the low-voltage users in real time, can effectively reduce the computing task time delay of the low-voltage users, improves the resource utilization rate while improving the user experience, achieves the aims of reducing the time delay, optimizing the flow, enhancing the safety and saving the cost, and can meet the timeliness requirement on power demand response in the energy internet.

The optimal resource allocation strategy obtained by the invention meets the requirements of energy consumption, time delay and reliability, keeps the energy consumption and the reliability at acceptable stable values, can obtain the optimal time delay value, ensures the real-time transmission of power communication data, can realize the data processing and communication support with high bandwidth, low power consumption and high reliability, and provides high-quality computing service and communication service for low-voltage users and load aggregators to carry out power demand response.

The method helps standardize the demand response process of the power load aggregator. Low-voltage users can perform intelligent demand response by adjusting their energy consumption, and their adjustable resources, including transferable loads (such as electric vehicles and timed household appliances) and interruptible loads (such as water heaters and air conditioners), can be effectively mobilized to participate in balancing the supply and demand of the power grid. The power grid company, the low-voltage users and the power load aggregator can all benefit as a result, which ultimately helps low-voltage users reduce their electricity costs and helps the power grid balance energy supply and demand.

Referring to fig. 2, an embodiment of a resource allocation apparatus based on a deep reinforcement learning algorithm according to the present invention includes:

the load data acquisition module 11 is configured to acquire power load data of each low-voltage user in the low-voltage user group on various load types, where the load types include an uncontrollable load, a transferable load, and an interruptible load;

the optimal resource allocation strategy determining module 22 is configured to obtain an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and the resource allocation information based on a deep reinforcement learning algorithm; the resource allocation information comprises the total channel bandwidth and the total computing frequency of an edge computing server of a 5G base station, the optimal resource allocation strategy comprises an optimal edge computing frequency value and an optimal channel bandwidth value, and the optimal resource allocation strategy meets a power consumption index, a time delay index and a reliability index;

a computing resource allocation module 33, configured to allocate the total computing frequency to each low-voltage user according to the optimal edge computing frequency value, so that each low-voltage user offloads the power load data as a computing task to the edge computing server for edge computing;

and the communication resource allocation module 44 is configured to allocate the total channel bandwidth to each low-voltage user according to the optimal channel bandwidth value, so as to implement real-time transmission of the power load data.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A resource allocation method based on a deep reinforcement learning algorithm is characterized by comprising the following steps:

based on a deep reinforcement learning algorithm, obtaining an optimal total time delay and a corresponding optimal resource allocation strategy according to the power load data and the resource configuration information; the resource allocation information comprises the total channel bandwidth and the total calculation frequency of an edge calculation server of a 5G base station, the optimal resource allocation strategy comprises an optimal edge calculation frequency value and an optimal channel bandwidth value, and the optimal resource allocation strategy meets a power consumption index, a time delay index and a reliability index;

2. The deep reinforcement learning algorithm-based resource allocation method according to claim 1, wherein the power consumption index comprises: edge computation power consumption and transmission power consumption.

3. The method according to claim 2, wherein the edge computing power consumption is: