CN110381541B - Smart grid slice distribution method and device based on reinforcement learning - Google Patents
- Publication number: CN110381541B
- Application number: CN201910452242.7A
- Authority: CN (China)
- Prior art keywords: slice, reinforcement learning, state, smart grid
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/24—Negotiating SLA [Service Level Agreement]; Negotiating QoS [Quality of Service]
Abstract
The invention discloses a smart grid slice distribution method based on reinforcement learning, comprising the following steps: classifying the power services of the smart grid according to service type; mapping the classes to different slices; and constructing a reinforcement learning model of the smart grid slices according to the service indexes of the smart grid, then completing the distribution of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid. By classifying the service types of the smart grid, mapping the classes to different slices, and completing the distribution of the smart grid slices through the constructed reinforcement learning model of the smart grid slices, the method solves the problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning.
Description
Technical Field
The application relates to the field of network resource distribution for power wireless communication, and in particular to a smart grid slice distribution method based on reinforcement learning, as well as a smart grid slice distribution device based on reinforcement learning.
Background
With the advent of the 5G era of high speed, ubiquity, low power consumption and low latency, communication across human society is becoming ever smoother. Network slicing is considered one of the key technologies of 5G networks: it divides a single physical network into multiple independent logical networks to support various vertical multi-service networks, which are allocated to different service scenarios according to their characteristics so as to adapt to different service requirements. Network slicing can greatly reduce deployment cost and network occupancy.
Driven by growing energy and power demands, the world's grids have stepped from traditional networks into the smart grid era. In combination with the new energy revolution, developments in the communication field and the global Internet strategic conception, 5G network slicing can for the first time be applied to smart grid services. For carrying wireless power grid services, 5G network slices are customizable, provide safe and reliable isolation between slices, and support unified slice management; they offer quick networking with high efficiency and economy, and have wide application prospects in power systems. Therefore, the integration of reinforcement-learning-based 5G network slicing technology with the smart grid is a problem to be solved.
Disclosure of Invention
The application provides a smart grid slice distribution method based on reinforcement learning, which solves the problem of integration of a 5G network slice technology and a smart grid based on reinforcement learning.
The application provides a smart grid slice distribution method based on reinforcement learning, which is characterized by comprising the following steps:
classifying the power services of the smart grid according to service type;
mapping the classes to different slices;
and constructing a reinforcement learning model of the smart grid slices according to the service indexes of the smart grid, then completing the distribution of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid.
Preferably, classifying the power services of the smart grid according to service type includes:
classifying the power services of the smart grid into a control class, an information acquisition class and a mobile application class according to service type.
Preferably, mapping the classes to different slices includes:
the control class corresponds to the URLLC slice, the information acquisition class corresponds to the mMTC slice, and the mobile application class corresponds to the eMBB slice.
Preferably, constructing the reinforcement learning model of the smart grid specifically uses the Q-learning algorithm to build the reinforcement learning model of the smart grid.
Preferably, constructing the reinforcement learning model of the smart grid slices includes: constructing reinforcement learning models of the wireless access side and the core network side respectively.
Preferably, the constructing the reinforcement learning model of the smart grid slice includes:
defining the state space as S = {s_1, s_2, ..., s_n};
defining the action space as A = {a_1, a_2, ..., a_n};
defining the reward function as R(s, a), with P(s, s') representing the transition probability of transitioning from state s to s';
at any time, the slice controller in state s can select an action a to obtain an instant reward R_t while also transitioning to the next state s'; the process of the Q-learning algorithm can be expressed by the update equation Q(s, a) ← (1 - α)Q(s, a) + α[R_t + γ max_{a'} Q(s', a')],
where α is the learning rate and γ is the discount factor weighting the accumulation of all instant rewards R_t;
by updating the Q value for a sufficiently long duration and by adjusting the values of α and γ, it is ensured that Q(s, a) eventually converges to its value under the optimal strategy, Q*(s, a).
The application also provides a smart grid slice distribution device based on reinforcement learning, comprising:
a classifying unit, which classifies the power services of the smart grid according to service type;
a class-to-slice correspondence unit, which maps the classes to different slices;
a model construction unit, which constructs a reinforcement learning model of the smart grid slices according to the service indexes of the smart grid, and completes the distribution of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid.
The application provides a smart grid slice distribution method based on reinforcement learning that classifies the service types of the smart grid, maps the classes to different slices, and completes the distribution of the smart grid slices through the constructed reinforcement learning model of the smart grid slices. This solves the problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning.
Drawings
Fig. 1 is a schematic flow chart of a smart grid slice allocation method based on reinforcement learning according to an embodiment of the present application;
fig. 2 is a schematic view of a slice architecture in a smart grid scenario according to an embodiment of the present application;
fig. 3 is a schematic diagram of a relationship between three types of services of a slice and a smart grid according to an embodiment of the present application;
fig. 4 shows the QoS indexes of typical service slices of a smart grid according to an embodiment of the present application;
FIG. 5 shows the mapping of the smart grid slice resource management mechanism to RL according to an embodiment of the present application;
fig. 6 is a schematic diagram of a smart grid slice allocation device based on reinforcement learning according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be embodied in many other ways than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific embodiments disclosed below.
Referring to fig. 1, fig. 1 is a schematic diagram of a smart grid slice allocation method based on reinforcement learning according to an embodiment of the present application, and the method provided in the present application is described in detail below with reference to fig. 1.
Step S101, classifying the power services of the smart grid according to service type.
First, a slice architecture under a smart grid scenario on which the present application is based is described, as shown in fig. 2.
Network slicing achieves control/data-plane decoupling of the network by means of SDN technology and defines open interfaces between the planes, enabling flexible definition of network functions within a network slice. To meet the needs of a given service, a network slice contains only the network functions that support that particular service. Power services can be divided into three major classes: control classes (such as distribution automation and precise load control), information acquisition classes (such as power consumption information acquisition and power transmission line monitoring) and mobile application classes (such as intelligent inspection and mobile operation).
Step S102, mapping the classes to different slices.
Fig. 3 shows the relationship between the three general classes of slices and the three classes of smart grid services: the control class corresponds to the URLLC slice, the information acquisition class corresponds to the mMTC slice, and the mobile application class corresponds to the eMBB slice.
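The class-to-slice correspondence above can be sketched as a simple lookup; the dictionary keys are illustrative identifiers for the three service classes named in the text, not names used in the patent:

```python
# Mapping of smart grid service classes to 5G slice types, as described above.
SERVICE_CLASS_TO_SLICE = {
    "control": "URLLC",                 # e.g. distribution automation, precise load control
    "information_acquisition": "mMTC",  # e.g. power consumption info acquisition, line monitoring
    "mobile_application": "eMBB",       # e.g. intelligent inspection, mobile operation
}

def slice_for_service(service_class: str) -> str:
    """Return the 5G slice type that carries the given power service class."""
    try:
        return SERVICE_CLASS_TO_SLICE[service_class]
    except KeyError:
        raise ValueError(f"unknown service class: {service_class}")
```

A classifier front-end (Step S101) would emit one of these class labels per traffic flow, and the lookup then fixes which slice carries it.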
And step S103, constructing a reinforcement learning model of the intelligent power grid slice according to the service index of the intelligent power grid, and completing distribution of the intelligent power grid slice through the reinforcement learning model to realize resource scheduling management of the intelligent power grid.
Fig. 4 shows the QoS (Quality of Service) indexes of typical service slices of a smart grid. The present application considers a service plane, an orchestration control plane, and a data plane. The service plane divides services into elastic applications and real-time applications. Elastic applications can tolerate relatively large delays and have no minimum bandwidth requirement; specific examples are distributed power, video surveillance, user metering, and so on. Real-time applications require the network to provide a minimum level of performance guarantee; the main representative type is the URLLC slice service, with typical examples being distribution automation, emergency communication, and so on. The data plane stores the data generated by the interaction of the power devices with the physical layer.
The control plane is mainly considered in this application. An access-network SDN (software-defined network) controller and a core-network SDN controller are introduced, responsible respectively for Network Function (NF) management and coordination (such as service migration and deployment) of the access network and the core network. They are equivalent to two different agents and can communicate with each other to complete coordination work together. The slice orchestration controller of the orchestration control plane completes the division of the slice network into Radio Access Network (RAN) side slices and Core Network (CN) side slices, based on prior knowledge of the service types, channel conditions and user requirements of the service plane. The network slices on the RAN side and the CN side are managed by their respective SDN controllers, which are responsible for executing the algorithm on their network side, namely the reinforcement-learning-based smart grid slice distribution method.
The reinforcement learning model on the RAN side and the CN side proposed in the present application is described below.
(1) RAN side radio resource slice
Given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices; the slices share an aggregate bandwidth B. There is also a series of traffic flows, represented by the vector D = {d_1, d_2, ..., d_m}; D is the set of smart grid traffic flows. Given the multi-service character of the smart grid, each slice service must meet different QoS requirements. However, which kind of smart grid traffic a given flow carries is not known in advance, and the real-time demand of the traffic is non-stationary in the smart grid context. Each d_i (i ∈ M = {1, 2, ..., m}) obeys a specific flow model.
First, the system state space, action space, and reward function of the RAN-side network need to be defined. The interaction of the slice controller with the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S represents the set of possible states, A represents the set of possible actions, P(s, s') represents the transition probability of transitioning from state s to s', and R(s, a) is the reward associated with triggering action a in state s, which is fed back to the slice controller. The mapping of wireless-access-side slice resource management to RL is as follows.
A. State space:
The state space is defined as S = {s_slice}, where s_slice is a vector representing the status of all currently available slices carrying the related power services; its n-th element describes the state of the n-th slice.
B. Action space:
Facing a time-varying, unknown traffic flow model, the reinforcement learning agent must allocate appropriate slice resources for the corresponding power services. The agent decides how to act at the next moment based on the current slice state and the reward function. The action space is defined as A = {a_bandwidth}, where a_bandwidth denotes the agent allocating an appropriate bandwidth for each logically independent slice to carry the corresponding traffic.
Since network slices share network resources between virtual networks, the virtual network slices must be isolated from each other, so that if the resources on one slice are insufficient to carry its current traffic and congestion or failure occurs, the other slices are not affected. Therefore, to maximize slice isolation while retaining the utility of resource allocation, it is stipulated that each slice can carry at most one kind of traffic, and binary indicator variables are defined accordingly.
C. Reward function
After the agent assigns a specific slice to a certain smart grid service, a composite benefit is obtained, which is taken as the reward of the system. Control-class power services have very strict requirements on communication delay and bit error rate: communication failures or errors can affect the control execution of the power grid and cause faults in grid operation. Some mobile application services (such as inspection video transmission and high-definition video playback) must guarantee a certain transmission rate and place high demands on communication bandwidth. Power supply reliability means continuous, sufficient, high-quality power supply: for example, a reliability of 99.999% ("five nines") means that the annual average outage time of power consumers in the region does not exceed 5 minutes, while at 99.9999% ("six nines") the annual average outage time drops to about 30 seconds. Because spectrum resources on the RAN side are limited, an optimal policy should be chosen when assigning slices so as to best satisfy the QoS requirements of the users.
Mainly the downlink is considered, with Spectral Efficiency (SE) and Delay as the evaluation indexes; the spectral efficiency of the system is the achievable rate per unit of bandwidth. According to the Shannon formula R = B log_2(1 + g_{BS→UE} P / σ^2), the actual base station (BS)-to-user rate can be derived, where g_{BS→UE} is the channel state (CSI) between the base station and the device, subject to Rayleigh fading.
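The Shannon rate can be evaluated directly; a minimal sketch, where the channel gain, transmit power, and noise variance values are illustrative assumptions rather than numbers from the patent:

```python
import math

def achievable_rate(bandwidth_hz: float, channel_gain: float,
                    tx_power: float, noise_var: float) -> float:
    """Shannon rate R = B * log2(1 + g_BS_UE * P / sigma^2) from base station to user."""
    snr = channel_gain * tx_power / noise_var
    return bandwidth_hz * math.log2(1.0 + snr)

# 1 MHz of slice bandwidth at an SNR of 15 (about 11.8 dB) -> 4 Mbit/s
rate = achievable_rate(1e6, channel_gain=0.15, tx_power=1.0, noise_var=0.01)
```

In a simulation, `channel_gain` would be redrawn per step from a Rayleigh-fading distribution, as the text notes for g_{BS→UE}.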
In describing the QoS requirements of a user, we introduce a utility function (utility function), i.e. a curve mapping between the bandwidth to which the slice traffic is allocated and the performance perceived by the user. In this context we assume that the traffic carried by a slice can be divided into elastic applications and real-time applications.
(a) Elastic application
For this type of application there is no minimum bandwidth requirement, as it can tolerate relatively large delays. The elastic flow utility model employs the following functions:
where k is an adjustable parameter that determines the shape of the utility function and ensures that the utility approaches 1 when the maximum requested bandwidth is received. Even when very high bandwidth is provided, it is very difficult for user satisfaction with this application type to reach 1; therefore, even when network bandwidth is plentiful, the bandwidth allocated to this application type should not exceed the maximum bandwidth b_max.
(b) Real-time application
This type of application traffic requires its network to provide a minimum level of performance guarantee. If the allocated bandwidth falls below a certain threshold, qoS will become unacceptable. Real-time applications are modeled using the following utility functions:
where k_1 and k_2 are adjustable parameters that determine the shape of the utility function.
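The patent gives only the qualitative shapes of the two utility curves (the formula images are not reproduced in this text). A common choice consistent with the description — a saturating curve for elastic traffic and a sigmoid for real-time traffic — can be sketched as follows; the exact functional forms and the roles assigned to k, k_1 and k_2 here are assumptions:

```python
import math

def elastic_utility(b: float, k: float, b_max: float) -> float:
    """Saturating utility for elastic traffic: no minimum bandwidth requirement,
    rises steeply at first and reaches 1 at the maximum requested bandwidth b_max."""
    return (1.0 - math.exp(-k * b / b_max)) / (1.0 - math.exp(-k))

def realtime_utility(b: float, k1: float, k2: float) -> float:
    """Sigmoid utility for real-time traffic: QoS collapses once the allocated
    bandwidth b falls below the threshold region around k2; k1 sets the steepness."""
    return 1.0 / (1.0 + math.exp(-k1 * (b - k2)))
```

Both functions map an allocated bandwidth to a perceived performance in [0, 1], which is the curve-mapping role the text assigns to the utility function.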
The reward of the learning agent is defined as:
R = λ·SE + μ·U_e + ξ·U_rt,
where λ, μ and ξ are the weights of SE, U_e and U_rt respectively.
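The weighted RAN-side reward R = λ·SE + μ·U_e + ξ·U_rt is a direct linear combination; a minimal sketch, with the default weight values being illustrative assumptions:

```python
def ran_reward(se: float, u_elastic: float, u_realtime: float,
               lam: float = 0.4, mu: float = 0.3, xi: float = 0.3) -> float:
    """RAN-side composite reward R = lambda*SE + mu*U_e + xi*U_rt."""
    return lam * se + mu * u_elastic + xi * u_realtime

# e.g. SE = 2.0 bit/s/Hz, U_e = 0.8, U_rt = 0.9 -> R = 0.8 + 0.24 + 0.27 = 1.31
```

Tuning the weights trades spectral efficiency against the satisfaction of the two application types.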
Thus, from a mathematical perspective, our problem can be formulated as maximizing the expected reward subject to the constraints above, where each d_i (i ∈ M = {1, 2, ..., m}) obeys a specific flow model.
The key difficulty in solving this problem is that, because of the flow model, the change in service demand is non-stationary and not known in advance; that is, the real-time service demand in the smart grid scenario is unknown.
(2) Core network slice based on priority scheduling
Similarly, if we virtualize the computational resources as per-slice VNFs, then the problem of allocating computational resources to each VNF can be solved like the radio resource slicing problem. In this section we therefore discuss another important issue: priority-based core network slicing over generic VNFs. The mapping we use differs slightly from that of the radio resource slices, illustrating the flexibility of RL. As before, the interaction of the slice controller with the core network side is represented by the four-tuple [S, A, P(s, s'), R(s, a)]; the mapping of the RL elements onto this slicing problem is defined below.
A. State space
On the core network side there are related Service Function Chains (SFCs) which have the same basic functions but consume different computational processing units (CPUs) and produce different results, such as different queuing times for the traffic. For example, based on business value or other smart-grid-related features, the traffic flows may be classified into three classes (class A, class B, class C), with priority decreasing from class A to class C, and priority-based scheduling rules defined as follows: SFC I processes class A traffic flows preferentially; SFC II treats class A and class B traffic flows equally, but serves class C traffic flows with the lowest priority; SFC III treats all traffic flows equally. Queuing time arises when traffic is scheduled according to priority.
The state space may be defined as T = {T_q}, where T_q is a vector characterizing the queuing state of each element in the service set D. When N CPUs are used to compute service d_i, the i-th element T_{q,i} represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m}.
B. Action space
The CPUs that each SFC ultimately uses depend on the number of traffic flows it has processed. With a limited number of CPUs, each type of traffic flow needs to be scheduled to an appropriate SFC so that the queuing time remains acceptable. Thus, when processing traffic d_i, the number of CPUs N_CPU must be selected on the core network side. The action space is therefore defined as A_CPU = {a_CPU}, where a_CPU denotes selecting the number of CPUs required to compute an incoming service d_i (i ∈ M = {1, 2, ..., m}).
C. Reward function
When defining the reward function, we first need a utility function U to characterize the sensitivity of the current traffic to latency, and then define a new metric, the "network request value" function W, to characterize traffic priority.
As mentioned above, in describing elastic and real-time applications we use utility functions to characterize the QoS requirements of each service d_i. In contrast to the RAN side, the argument here is changed to the number of CPUs n required to compute service d_i on the core network side. However, this only reflects the QoS requirements of the different services. Because computational resources are limited, a reasonable scheduling rule is needed after resources are allocated to decide which service is prioritized; therefore the "network request value" function W is introduced to characterize service priority. For any application service d_i, the network request value to be satisfied is defined as:
W_i = 2^p · U_i,
where p is the priority level of traffic d_i, and U_i is an element of the set formed by the elastic and real-time application utilities, i.e. U_i ∈ {U_e, U_rt}. The weight 2^p of a service request indicates the importance of the request relative to other requests. The reward function is defined as:
R = W_i.
The above reward covers only a single service d_i; for priority queuing over a series of services we need the long-term reward, so the discounted accumulation of rewards must be maximized.
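The network request value W_i = 2^p · U_i can be used to order pending services before CPU scheduling; a minimal sketch, where the numeric priority encoding (class A = 2, B = 1, C = 0) and the sample utilities are assumptions:

```python
def network_request_value(priority: int, utility: float) -> float:
    """W_i = 2**p * U_i: priority-weighted value of satisfying request i."""
    return (2 ** priority) * utility

# Pending requests as (class, priority p, utility U_i); priorities A=2, B=1, C=0.
requests = [("A", 2, 0.6), ("B", 1, 0.9), ("C", 0, 0.95)]
order = sorted(requests, key=lambda r: network_request_value(r[1], r[2]),
               reverse=True)
# W values: A -> 2.4, B -> 1.8, C -> 0.95, so class A is scheduled first.
```

Note that the exponential weight 2^p lets a high-priority request outrank a lower-priority one even when its raw utility is smaller, which is the intent of the scheduling rule.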
FIG. 5 shows the mapping of the smart grid slice resource management mechanism to RL.
Next, the reinforcement-learning-based slice allocation method proposed in the present application under the above model is described.
Q-learning-based reinforcement learning algorithms are used on both the RAN and CN sides. Since the expressions of the RAN-side and CN-side state sets, action sets and reward functions differ only slightly above, and the Q-learning algorithm based on our proposed mapping of RL onto the RAN and CN is general, for convenience of representation this section unifies the state space as S = {s_1, s_2, ..., s_n} and the action space as A = {a_1, a_2, ..., a_n}; the reward function is R(s, a), and P(s, s') represents the transition probability of transitioning from state s to s'.
The final goal of the slice controller is to find the optimal slicing strategy π*; a strategy is a mapping from the state set to the action set and must maximize the expected long-term discounted reward of each state:
the long-term discount rewards for state s is the sum of the discounts for rewards obtained on the state trajectory and is given by:
R(s,π(s))+γR(s 1 ,π(s 1 ))+γ 2 R(s 2 ,π(s 2 ))+...
where γ is the discount factor (0 < γ < 1), determining the present value of future rewards. The optimization objective above is the state value function of an arbitrary policy, which can be expressed as follows:
there is at least one optimization strategy in a single environment setting according to the optimality criteria of Bellman. Thus, the state value function of the optimal strategy is given by:
the state transition probabilities depend on many factors, such as traffic load, traffic arrival and departure rates, decision algorithms, etc., and thus may not be readily available either on the radio side or on the core network side. Model-free reinforcement learning is therefore well suited to deriving an optimal strategy because it does not require the expectation of rewards and the state transition probabilities can be known as a priori knowledge. Among the various existing RL algorithms, we choose Q -learning 。
Taking the RAN side as an example, the slice controller interacts with the wireless environment over very short discrete time periods. The action-value function (also referred to as the Q value) of a state–action pair (s, π(s)) is written Q(s, π(s)) and is defined as the expected long-term discounted reward of state s when using policy π. Our goal is to find an optimal strategy that maximizes the Q value of each state s:
According to the Q-learning algorithm, the slice controller can iteratively learn the optimal Q value from the existing information. At any time, the slice controller in state s may select an action a. This yields an instant reward R_t and, at the same time, a transition to the next state s'. The process of the Q-learning algorithm can be expressed by the update equation
Q(s, a) ← (1 - α)Q(s, a) + α[R_t + γ max_{a'} Q(s', a')],
where α is the learning rate and the bracketed target is the discounted accumulation of the instant rewards R_t.
By updating the Q value for a sufficiently long duration and by adjusting the values of α and γ, it is ensured that Q(s, a) eventually converges to its value under the optimal strategy, Q*(s, a).
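The tabular Q-learning update described above can be sketched as follows. The toy single-state environment, the two bandwidth actions, and the hyperparameter values are illustrative assumptions; the patent's actual states and actions are the slice, bandwidth and CPU definitions given earlier:

```python
import random
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*[R + gamma*max_a' Q(s',a')]."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (reward + gamma * best_next)

random.seed(0)                    # reproducible toy run
Q = defaultdict(float)            # Q values start at 0, as in the algorithm
actions = ["narrow", "wide"]      # toy bandwidth actions for a single slice

# Toy environment: allocating "wide" bandwidth always earns reward 1, "narrow"
# earns 0, and the controller stays in the single state "s0".
for _ in range(500):
    a = random.choice(actions)
    q_update(Q, "s0", a, reward=1.0 if a == "wide" else 0.0,
             s_next="s0", actions=actions)
```

After enough iterations the stored Q value for "wide" dominates, so a greedy controller would allocate the wider bandwidth, matching the convergence claim above.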
The whole slicing strategy is given by the following algorithm. Initially, the Q value is set to 0. Before the Q-learning algorithm is applied, the slice controller performs an initial slice allocation over the different slices based on the power traffic flow demand estimate for each slice; this initializes the states of the different slices. Existing radio resource slicing solutions use bandwidth-based or resource-based provisioning to allocate radio resources to different slices.
Since Q-learning is an online iterative learning algorithm, it performs two different types of operation. In explore mode, the slice controller randomly selects one possible action so as to improve its future decisions. In exploit mode, by contrast, the slice controller prefers actions that it has tried in the past and found effective. We assume that the slice controller in state s explores with probability ε and uses the previously stored Q values with probability 1 − ε. Not all actions are possible in every state: to maintain slice-to-slice isolation, the slice controller must ensure that the same physical resource blocks (PRBs) are not assigned to two different slices (on the RAN side).
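The explore/exploit behaviour just described — explore with probability ε, otherwise reuse the stored Q values — is the standard ε-greedy rule; a minimal sketch, where the `feasible` predicate stands in for constraints such as PRB isolation and is an assumption:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, feasible=lambda s, a: True):
    """Explore with probability epsilon, otherwise exploit the stored Q values.
    `feasible` filters out actions barred by constraints such as PRB isolation."""
    candidates = [a for a in actions if feasible(state, a)]
    if random.random() < epsilon:
        return random.choice(candidates)                          # explore mode
    return max(candidates, key=lambda a: Q.get((state, a), 0.0))  # exploit mode
```

In practice ε is often decayed over time so that the controller explores widely early on and settles into exploitation as the Q table converges.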
Corresponding to the method provided above, the application also provides a reinforcement-learning-based smart grid slice distribution device 600, comprising:
the classifying unit 610 classifies the power service of the smart grid according to the service type;
a classification and slice correspondence unit 620 that corresponds the classification to different slices;
the model construction unit 630 constructs a reinforcement learning model of the smart grid slice according to the service index of the smart grid; and the distribution of the intelligent power grid slices is completed through the reinforcement learning model, so that the resource scheduling management of the intelligent power grid is realized.
The present application provides a reinforcement-learning-based smart grid slice distribution method, in which the service types of the smart grid are classified, the classifications are corresponded to different slices, and the distribution of smart grid slices is completed through a constructed reinforcement learning model of the smart grid slices. This solves the problem of integrating 5G network slicing technology with the smart grid on the basis of reinforcement learning.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art may modify the specific embodiments or substitute equivalents, and any such modification or equivalent that does not depart from the spirit and scope of the present invention falls within the scope of the claims.
Claims (6)
1. A smart grid slice distribution method based on reinforcement learning, characterized by comprising the following steps:
classifying the power services of the smart grid according to service type;
corresponding the classifications to different slices; and
constructing a reinforcement learning model of the smart grid slices according to the service indexes of the smart grid, and completing the distribution of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid, wherein the reinforcement learning model of the smart grid slices comprises reinforcement learning models of a radio access side and a core network side;
the reinforcement learning model on the radio access network (RAN) side comprises: given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices, and the slices share an aggregate bandwidth B; there is a series of traffic flows, represented by the vector D = {d_1, d_2, ..., d_m}; the vector D is in fact the set of smart grid services; given the multi-service character of the smart grid, the QoS requirements to be met by each slice service differ; each traffic flow d_i in the vector D, where i ∈ M = {1, 2, ..., m}, obeys a specific flow model;
first, the system state space, action space and reward function of the RAN-side network are defined; the interaction of the slice controller with the wireless environment is represented by the four-tuple (S, A, P, R), where S represents the state set, A represents the action set, P(s, s') represents the transition probability from state s to s', and R(s, a) is the reward associated with triggering action a in state s, which is fed back to the slice controller;
the reinforcement learning model on the core network (CN) side comprises: the interaction between the slice controller and the core network side is represented by the four-tuple (S, A, P, R); the state space is defined as S = {T_q}, where T_q is a vector characterizing the queuing state of each element of the vector D; when N CPUs are used to process service d_i, the i-th element T_qi represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m};
when processing service d_i, the number of CPUs N_CPU must be selected on the core network side; the action space is therefore defined as A = {a_CPU}, where a_CPU indicates the number of CPUs selected to perform the computation for an incoming service d_i, where i ∈ M = {1, 2, ..., m};
in defining the reward function, the utility functions U_e(x) and U_rt(x) are used to characterize the traffic d_i, where U_e(x) represents the elastic-application utility model, U_rt(x) represents the real-time-application utility model, and k_1 and k_2 are adjustable parameters.
2. The method of claim 1, wherein classifying the power services of the smart grid according to service type comprises:
classifying the power services of the smart grid into a control class, an information acquisition class and a mobile application class according to service type.
3. The method of claim 1, wherein corresponding the classifications to different slices comprises:
corresponding the control class to uRLLC slices, the information acquisition class to mMTC slices, and the mobile application class to eMBB slices.
4. The method of claim 1, wherein the reinforcement learning model of the smart grid is constructed using the Q-learning algorithm.
5. The method as recited in claim 4, further comprising: constructing the reinforcement learning model of the smart grid based on the Q-learning algorithm and distributing the smart grid slices, comprising:
the state set is S;
the action set is A;
the reward function R(s, a) is the reward associated with triggering action a in state s, and P(s, s') represents the transition probability from state s to s';
at any time, the slice controller in state s can select an action a to obtain an immediate reward R_t and, at the same time, transition to the next state s'; the Q-learning process can be expressed by the following update equation:

Q(s, a) ← Q(s, a) + α[R + γ^t · max_{a'∈A(s')} Q(s', a') − Q(s, a)]

where γ represents the discount factor, t represents the time experienced from state s to s', a' represents an action in state s', A(s') represents the action space in state s', α is the learning rate, and R is the discounted accumulation of all immediate rewards R_t:

R = Σ_{t=0}^{T} γ^t · R_t

where T represents the time elapsed from t = 0 to the T-th instant; by updating the Q value over a sufficient duration and by adjusting the values of α and γ, Q(s, a) is ensured to eventually converge under the optimal strategy, the converged value being Q*(s, a).
6. A smart grid slice distribution device based on reinforcement learning, characterized by comprising:
a classifying unit, which classifies the power services of the smart grid according to service type;
a classification-to-slice correspondence unit, which corresponds the classifications to different slices; and
a model construction unit, which constructs a reinforcement learning model of the smart grid slices according to the service indexes of the smart grid and completes the distribution of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid, wherein the reinforcement learning model of the smart grid slices comprises reinforcement learning models of a radio access side and a core network side;
the reinforcement learning model on the radio access network (RAN) side comprises: given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices, and the slices share an aggregate bandwidth B; there is a series of traffic flows, represented by the vector D = {d_1, d_2, ..., d_m}; the vector D is in fact the set of smart grid services; given the multi-service character of the smart grid, the QoS requirements to be met by each slice service differ; each traffic flow d_i in the vector D, where i ∈ M = {1, 2, ..., m}, obeys a specific flow model;
first, the system state space, action space and reward function of the RAN-side network are defined; the interaction of the slice controller with the wireless environment is represented by the four-tuple (S, A, P, R), where S represents the state set, A represents the action set, P(s, s') represents the transition probability from state s to s', and R(s, a) is the reward associated with triggering action a in state s, which is fed back to the slice controller;
the reinforcement learning model on the core network (CN) side comprises: the interaction between the slice controller and the core network side is represented by the four-tuple (S, A, P, R); the state space is defined as S = {T_q}, where T_q is a vector characterizing the queuing state of each element of the vector D; when N CPUs are used to process service d_i, the i-th element T_qi represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m};
when processing service d_i, the number of CPUs N_CPU must be selected on the core network side; the action space is therefore defined as A = {a_CPU}, where a_CPU indicates the number of CPUs selected to perform the computation for an incoming service d_i, where i ∈ M = {1, 2, ..., m};
in defining the reward function, the utility functions U_e(x) and U_rt(x) are used to characterize the traffic d_i, where U_e(x) represents the elastic-application utility model, U_rt(x) represents the real-time-application utility model, and k_1 and k_2 are adjustable parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910452242.7A CN110381541B (en) | 2019-05-28 | 2019-05-28 | Smart grid slice distribution method and device based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110381541A CN110381541A (en) | 2019-10-25 |
CN110381541B true CN110381541B (en) | 2023-12-26 |
Family
ID=68248856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910452242.7A Active CN110381541B (en) | 2019-05-28 | 2019-05-28 | Smart grid slice distribution method and device based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110381541B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255347B (en) * | 2020-02-10 | 2022-11-15 | 阿里巴巴集团控股有限公司 | Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment |
CN111292570B (en) * | 2020-04-01 | 2021-09-17 | 广州爱浦路网络技术有限公司 | Cloud 5GC communication experiment teaching system and teaching method based on project type teaching |
CN111953510B (en) * | 2020-05-15 | 2024-02-02 | 中国电力科学研究院有限公司 | Smart grid slice wireless resource allocation method and system based on reinforcement learning |
CN111726811B (en) * | 2020-05-26 | 2023-11-14 | 国网浙江省电力有限公司嘉兴供电公司 | Slice resource allocation method and system for cognitive wireless network |
CN111711538B (en) * | 2020-06-08 | 2021-11-23 | 中国电力科学研究院有限公司 | Power network planning method and system based on machine learning classification algorithm |
CN112365366B (en) * | 2020-11-12 | 2023-05-16 | 广东电网有限责任公司 | Micro-grid management method and system based on intelligent 5G slice |
CN112383427B (en) * | 2020-11-12 | 2023-01-20 | 广东电网有限责任公司 | 5G network slice deployment method and system based on IOTIPS fault early warning |
CN112737813A (en) * | 2020-12-11 | 2021-04-30 | 广东电力通信科技有限公司 | Power business management method and system based on 5G network slice |
CN113316188B (en) * | 2021-05-08 | 2022-05-17 | 北京科技大学 | AI engine supporting access network intelligent slice control method and device |
CN113225759B (en) * | 2021-05-28 | 2022-04-15 | 广东电网有限责任公司广州供电局 | Network slice safety and decision management method for 5G smart power grid |
CN113329414B (en) * | 2021-06-07 | 2023-01-10 | 深圳聚创致远科技有限公司 | Smart power grid slice distribution method based on reinforcement learning |
CN113630733A (en) * | 2021-06-29 | 2021-11-09 | 广东电网有限责任公司广州供电局 | Network slice distribution method and device, computer equipment and storage medium |
CN113840333B (en) * | 2021-08-16 | 2023-11-10 | 国网河南省电力公司信息通信公司 | Power grid resource allocation method and device, electronic equipment and storage medium |
CN114531403A (en) * | 2021-11-15 | 2022-05-24 | 海盐南原电力工程有限责任公司 | Power service network distinguishing method and system |
CN115460613A (en) * | 2022-04-14 | 2022-12-09 | 国网福建省电力有限公司 | Safe application and management method for power 5G slice |
CN115913966A (en) * | 2022-12-06 | 2023-04-04 | 中国联合网络通信集团有限公司 | Virtual network function deployment method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102238631A (en) * | 2011-08-17 | 2011-11-09 | 南京邮电大学 | Method for managing heterogeneous network resources based on reinforcement learning |
CN108965024A (en) * | 2018-08-01 | 2018-12-07 | 重庆邮电大学 | A kind of virtual network function dispatching method of the 5G network slice based on prediction |
CN109495907A (en) * | 2018-11-29 | 2019-03-19 | 北京邮电大学 | A kind of the wireless access network-building method and system of intention driving |
CN109600262A (en) * | 2018-12-17 | 2019-04-09 | 东南大学 | Resource self-configuring and self-organization method and device in URLLC transmission network slice |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110381541B (en) | Smart grid slice distribution method and device based on reinforcement learning | |
Abiko et al. | Flexible resource block allocation to multiple slices for radio access network slicing using deep reinforcement learning | |
CN113254197B (en) | Network resource scheduling method and system based on deep reinforcement learning | |
Sun et al. | Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning | |
Qian et al. | Survey on reinforcement learning applications in communication networks | |
CN111953510B (en) | Smart grid slice wireless resource allocation method and system based on reinforcement learning | |
CN104572307B (en) | The method that a kind of pair of virtual resource carries out flexible scheduling | |
Kim et al. | Multi-agent reinforcement learning-based resource management for end-to-end network slicing | |
Dai et al. | Psaccf: Prioritized online slice admission control considering fairness in 5g/b5g networks | |
Fan et al. | Multi-objective optimization of container-based microservice scheduling in edge computing | |
Rezazadeh et al. | On the specialization of fdrl agents for scalable and distributed 6g ran slicing orchestration | |
Zhou et al. | Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing | |
Othman et al. | Efficient admission control and resource allocation mechanisms for public safety communications over 5G network slice | |
Hlophe et al. | QoS provisioning and energy saving scheme for distributed cognitive radio networks using deep learning | |
Grasso et al. | Smart zero-touch management of uav-based edge network | |
Shen et al. | Goodbye to fixed bandwidth reservation: Job scheduling with elastic bandwidth reservation in clouds | |
CN114938372B (en) | Federal learning-based micro-grid group request dynamic migration scheduling method and device | |
Zhou et al. | Digital twin-empowered network planning for multi-tier computing | |
Balasubramanian et al. | Reinforcing cloud environments via index policy for bursty workloads | |
Shokrnezhad et al. | Double deep q-learning-based path selection and service placement for latency-sensitive beyond 5g applications | |
Ren et al. | A memetic algorithm for cooperative complex task offloading in heterogeneous vehicular networks | |
Lotfi et al. | Attention-based open RAN slice management using deep reinforcement learning | |
Zhang et al. | Vehicular multi-slice optimization in 5G: Dynamic preference policy using reinforcement learning | |
Guo et al. | Delay-based packet-granular QoS provisioning for mixed traffic in industrial internet of things | |
Rashtian et al. | Balancing message criticality and timeliness in IoT networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||