CN110381541B - Smart grid slice distribution method and device based on reinforcement learning - Google Patents

Smart grid slice distribution method and device based on reinforcement learning

Info

Publication number
CN110381541B
CN110381541B (application CN201910452242.7A)
Authority
CN
China
Prior art keywords
slice
reinforcement learning
state
intelligent power
power grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910452242.7A
Other languages
Chinese (zh)
Other versions
CN110381541A (en)
Inventor
孟萨出拉
王智慧
丁慧霞
吴赛
杨德龙
孙丽丽
曹新智
滕玲
段钧宝
李许安
王莹
王雪
陈源彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201910452242.7A
Publication of CN110381541A
Application granted
Publication of CN110381541B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W28/24 Negotiating SLA [Service Level Agreement]; Negotiating QoS [Quality of Service]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a smart grid slice distribution method based on reinforcement learning, comprising the following steps: classifying the power services of the smart grid by service type; mapping the classes to different slices; and constructing a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, then completing the allocation of the smart grid slices through the reinforcement learning model so as to realize resource scheduling management of the smart grid. The service types of the smart grid are classified, the classes are mapped to different slices, and the allocation of the smart grid slices is completed through the constructed reinforcement learning model of the smart grid slices. The method thereby addresses the problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning.

Description

Smart grid slice distribution method and device based on reinforcement learning
Technical Field
The application relates to the field of network resource allocation for power wireless communication, and in particular to a smart grid slice distribution method based on reinforcement learning, as well as a corresponding smart grid slice distribution device based on reinforcement learning.
Background
With the advent of the 5G era of high speed, ubiquitous connectivity, low power consumption and low latency, communication in human society is becoming ever smoother. Network slicing is regarded as one of the key technologies of 5G networks: it divides a single physical network into multiple independent logical networks to support various vertical multi-service networks, each allocated to a different service scenario according to its characteristics so as to meet different service requirements. Network slicing technology can greatly reduce deployment cost and lower network occupancy.
Driven by growing energy and power demands, power grids worldwide have stepped from traditional networks into the smart grid era. In combination with the new energy revolution, developments in the communication field and the global energy Internet strategic concept, 5G network slicing technology can for the first time be applied to smart grid services. For carrying wireless power grid service applications, 5G network slicing offers customizable slices, safe and reliable isolation between slices and unified slice management; it has the advantages of rapid networking, high efficiency and economy, and broad application prospects in power systems. Therefore, the integration of reinforcement-learning-based 5G network slicing technology with the smart grid is a problem to be solved.
Disclosure of Invention
The application provides a smart grid slice distribution method based on reinforcement learning, which solves the problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning.
The application provides a smart grid slice distribution method based on reinforcement learning, comprising the following steps:
classifying the power services of the smart grid by service type;
mapping the classes to different slices;
constructing a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, and completing the allocation of the smart grid slices through the reinforcement learning model so as to realize resource scheduling management of the smart grid.
Preferably, classifying the power services of the smart grid by service type includes:
classifying the power services of the smart grid into a control class, an information acquisition class and a mobile application class according to service type.
Preferably, mapping the classes to different slices includes:
mapping the control class to the uRLLC slice, the information acquisition class to the mMTC slice, and the mobile application class to the eMBB slice.
Preferably, the reinforcement learning model of the smart grid is constructed using the Q-learning algorithm.
Preferably, constructing the reinforcement learning model of the smart grid slices includes: constructing reinforcement learning models for the radio access side and the core network side, respectively.
Preferably, constructing the reinforcement learning model of the smart grid slices includes:
defining the state space as S = {s_1, s_2, ..., s_n};
defining the action space as A = {a_1, a_2, ..., a_n};
defining the reward function as R(s, a), with P(s, s') representing the transition probability of transitioning from state s to s'.
At any time, the slice controller in state s can select an action a to obtain an instant reward R_t while transitioning to the next state s'. The process of the Q-learning algorithm can be expressed by the following update equation:
Q(s, a) ← (1 − α)·Q(s, a) + α·[R_t + γ·max_{a'∈A(s')} Q(s', a')],
where α is the learning rate, γ is the discount factor, and the discounted accumulation of all instant rewards R_t is R = Σ_{t=0}^{T} γ^t·R_t.
By updating the Q value for a sufficiently long duration and by adjusting the values of α and γ, Q(s, a) can eventually be made to converge to its value under the optimal strategy, i.e., Q*(s, a).
The application also provides a smart grid slice distribution device based on reinforcement learning, comprising:
a classification unit, which classifies the power services of the smart grid by service type;
a classification-to-slice correspondence unit, which maps the classes to different slices; and
a model construction unit, which constructs a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, and completes the allocation of the smart grid slices through the reinforcement learning model, thereby realizing resource scheduling management of the smart grid.
The application provides a smart grid slice distribution method based on reinforcement learning: the service types of the smart grid are classified, the classes are mapped to different slices, and the allocation of the smart grid slices is completed through the constructed reinforcement learning model of the smart grid slices. The problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning is thereby solved.
Drawings
Fig. 1 is a schematic flow chart of a smart grid slice allocation method based on reinforcement learning according to an embodiment of the present application;
fig. 2 is a schematic view of a slice architecture in a smart grid scenario according to an embodiment of the present application;
fig. 3 is a schematic diagram of a relationship between three types of services of a slice and a smart grid according to an embodiment of the present application;
fig. 4 shows the QoS indicators of typical service slices of a smart grid according to an embodiment of the present application;
fig. 5 is a mapping of the smart grid slice resource management mechanism to RL according to an embodiment of the present application;
fig. 6 is a schematic diagram of a smart grid slice allocation device based on reinforcement learning according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.
Referring to fig. 1, fig. 1 is a schematic diagram of a smart grid slice allocation method based on reinforcement learning according to an embodiment of the present application, and the method provided in the present application is described in detail below with reference to fig. 1.
Step S101: classify the power services of the smart grid by service type.
First, a slice architecture under a smart grid scenario on which the present application is based is described, as shown in fig. 2.
Network slicing helps decouple the control plane and the data plane of the network by means of SDN technology and defines open interfaces between them, thereby enabling flexible definition of network functions within a network slice. To meet the needs of a given service, a network slice contains only the network functions that support that particular service. Power services can be divided into three major categories: control services (such as distribution automation and precise load control), information acquisition services (such as power consumption information acquisition and transmission line monitoring) and mobile application services (such as intelligent inspection and mobile operations).
Step S102: map the classes to different slices.
Fig. 3 shows the relationship between the three general slice types and the three classes of smart grid services. The control class corresponds to the uRLLC slice, the information acquisition class corresponds to the mMTC slice, and the mobile application class corresponds to the eMBB slice.
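For illustration, this class-to-slice correspondence can be captured in a few lines of Python; the dictionary keys and example services below are descriptive placeholders introduced here, not identifiers from the patent.

```python
# Hypothetical sketch of the service-class-to-slice mapping described above.
# Keys and example services are illustrative placeholders.
SERVICE_CLASS_TO_SLICE = {
    "control": "uRLLC",                 # e.g. distribution automation, precise load control
    "information_acquisition": "mMTC",  # e.g. power consumption metering, line monitoring
    "mobile_application": "eMBB",       # e.g. intelligent inspection video, mobile operations
}

def slice_for_service(service_class: str) -> str:
    """Return the 5G slice type that carries the given power service class."""
    return SERVICE_CLASS_TO_SLICE[service_class]
```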
Step S103: construct a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, and complete the allocation of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid.
Fig. 4 shows the QoS (Quality of Service) indicators of typical service slices of the smart grid. The present application considers a service plane, an orchestration control plane and a data plane. The service plane divides services into elastic applications and real-time applications. Elastic applications can tolerate relatively large delays and have no minimum bandwidth requirement; typical examples are distributed power access, video surveillance and user metering. Real-time applications require the network to provide a minimum level of performance guarantee; the main representative type is uRLLC slice service, with typical examples being distribution automation and emergency communication. The data plane stores the data generated by the interaction of the power devices with the physical layer.
This application mainly considers the control plane. An access network SDN (software-defined networking) controller and a core network SDN controller are introduced; they are respectively responsible for network function (NF) management and coordination (such as service migration and deployment) of the access network and the core network. They are equivalent to two different agents and can communicate with each other to complete the coordination work together. The slice orchestration controller of the orchestration control plane completes the division of the slice network, which is split into radio access network (RAN) side slices and core network (CN) side slices according to the prior knowledge available from the service plane, namely service types, channel conditions and user requirements. The network slices on the RAN side and the CN side are managed by their respective SDN controllers, which are responsible for executing the algorithm on their own network side, namely the reinforcement-learning-based smart grid slice allocation method.
The reinforcement learning model on the RAN side and the CN side proposed in the present application is described below.
(1) RAN side radio resource slice
Given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices; the slices share an aggregate bandwidth B. There is also a series of traffic flows, represented by the vector D = {d_1, d_2, ..., d_m}; the vector D is in fact the set of smart grid traffic flows. Faced with the multi-service nature of the smart grid, the QoS requirements to be met by each slice service differ. However, which kind of smart grid service a given traffic flow carries is not known in advance, and in the smart grid context the real-time variation of service demand is non-stationary. Each flow d_i (i ∈ M = {1, 2, ..., m}) is assumed to obey a specific traffic model.
First, the system state space, action space and reward function of the RAN-side network must be defined. The interaction of the slice controller with the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S represents the set of possible states, A represents the set of possible actions, P(s, s') represents the transition probability of transitioning from state s to s', and R(s, a) is the reward associated with the action triggered in state s, which is fed back to the slice controller. The mapping of radio-access-side slice resource management to RL is as follows.
A. State space:
The state space is defined as the set S = {s_slice}, where s_slice is a vector representing the status of all slices currently available to carry the related power services; its n-th element describes the state of the n-th slice.
B. Action space:
Facing a time-varying, unknown traffic model, the reinforcement learning agent must allocate appropriate slice resources for the corresponding power services. The agent decides how to act at the next moment based on the current slice state and the reward function. The action space is defined as A = {a_bandwidth}, where a_bandwidth denotes that the agent allocates an appropriate bandwidth to each logically independent slice to carry the corresponding service.
Since network slices share network resources between virtual networks, the virtual network slices must be isolated from one another, so that if the resources on one slice are insufficient to carry the current traffic and congestion or a failure occurs, the other slices are not affected. Therefore, to maximize slice isolation together with the utility of resource allocation, it is stipulated that each slice can carry at most one kind of traffic; a binary variable is defined at the same time to indicate whether a given slice carries a given traffic flow, as sketched below.
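As a rough illustration of this isolation constraint, the following Python sketch checks a hypothetical binary assignment matrix x, where x[i][n] = 1 means traffic flow d_i is carried by slice n; the variable names and matrix layout are assumptions, not notation from the patent.

```python
# Illustrative check of the isolation constraint: each slice carries at most one
# kind of traffic. x[i][n] is a hypothetical binary variable equal to 1 when
# traffic flow d_i is carried by slice n.
def isolation_respected(x: list[list[int]]) -> bool:
    num_flows = len(x)
    num_slices = len(x[0]) if num_flows else 0
    for n in range(num_slices):
        carried = sum(x[i][n] for i in range(num_flows))
        if carried > 1:  # slice n would carry two different traffic types
            return False
    return True
```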
C. Reward function
After the agent allocates a specific slice to a certain smart grid service, a comprehensive benefit is obtained, and this comprehensive benefit is taken as the reward of the system. Control-type power services have very strict requirements on communication delay and bit error rate; a communication failure or error can affect the control execution of the power grid and cause faults in grid operation. Some mobile application services (such as inspection video transmission and high-definition video playback) need a guaranteed transmission rate and therefore place high demands on communication bandwidth. Power supply reliability means continuous, sufficient, high-quality power supply. For example, a power supply reliability of 99.999% ("five nines") means that the average annual outage time of power consumers in the region does not exceed 5 minutes; when this figure reaches 99.9999% ("six nines"), the average annual outage time drops to about 30 seconds. Because spectrum resources on the RAN side are limited, an optimal policy should be chosen when assigning slices so as to satisfy the QoS requirements of users as far as possible.
The downlink case is mainly considered, with spectral efficiency (SE) and delay as evaluation indicators. The spectral efficiency of the system can be defined as the total achievable rate divided by the aggregate bandwidth B.
According to the Shannon formula R = B·log2(1 + g_{BS→UE}·P/σ²), the actual base station (BS) to user rate can be derived, where g_{BS→UE} is the channel state information (CSI) between the base station and the device, subject to Rayleigh fading.
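The rate and spectral-efficiency computation just described can be sketched as follows; the Rayleigh-fading draw and all parameter choices are illustrative assumptions rather than values from the patent.

```python
import math
import random

def bs_to_ue_rate(bandwidth_hz: float, tx_power_w: float, noise_power_w: float) -> float:
    """Achievable rate R = B * log2(1 + g*P/sigma^2) with a Rayleigh-fading gain g.

    The exponential draw for g (squared magnitude of a unit-variance Rayleigh
    channel) is an illustrative assumption.
    """
    g = random.expovariate(1.0)  # |h|^2 for a unit-variance Rayleigh channel
    snr = g * tx_power_w / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)

def spectral_efficiency(rates_bps: list[float], aggregate_bandwidth_hz: float) -> float:
    """Spectral efficiency of the system: total rate over the aggregate bandwidth B."""
    return sum(rates_bps) / aggregate_bandwidth_hz
```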
In describing the QoS requirements of a user, we introduce a utility function, i.e. a curve mapping the bandwidth allocated to the slice traffic to the performance perceived by the user. In this context we assume that the traffic carried by a slice can be divided into elastic applications and real-time applications.
(a) Elastic application
For this type of application there is no minimum bandwidth requirement, since it can tolerate relatively large delays. The elastic traffic utility model U_e(x) is an increasing function of the allocated bandwidth x, where k is an adjustable parameter that determines the shape of the utility function and ensures that the utility approaches 1 when the maximum requested bandwidth is received. Even if a very high bandwidth is provided, however, user satisfaction with this type of application can hardly reach 1. Therefore, even when network bandwidth is abundant, the bandwidth allocated to this application type should not exceed the maximum requested bandwidth b_max.
(b) Real-time application
This type of application traffic requires the network to provide a minimum level of performance guarantee; if the allocated bandwidth falls below a certain threshold, the QoS becomes unacceptable. Real-time applications are modeled with a utility function U_rt(x), where k_1 and k_2 are adjustable parameters that determine the shape of the utility function.
The reward of the learning agent is defined as:
R = λ·SE + μ·U_e + ξ·U_rt,
where λ, μ and ξ are the weights of SE, U_e and U_rt, respectively.
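Because the exact utility formulas are not reproduced in the text above, the sketch below substitutes common textbook choices (an exponential saturation curve for elastic traffic and a sigmoid for real-time traffic, both clearly assumptions) and combines them with the spectral efficiency into the weighted reward R = λ·SE + μ·U_e + ξ·U_rt.

```python
import math

def elastic_utility(bandwidth: float, b_max: float, k: float = 5.0) -> float:
    """Elastic-application utility, saturating near 1 at the maximum requested
    bandwidth b_max. The exponential form and k=5.0 are assumed stand-ins for
    the patent's (unrecovered) formula."""
    b = min(bandwidth, b_max)  # allocations beyond b_max bring no extra utility
    return 1.0 - math.exp(-k * b / b_max)

def realtime_utility(bandwidth: float, k1: float, k2: float) -> float:
    """Real-time-application utility: a sigmoid with threshold k2 and steepness k1,
    again an assumed stand-in for the patent's formula."""
    return 1.0 / (1.0 + math.exp(-k1 * (bandwidth - k2)))

def ran_reward(se: float, u_e: float, u_rt: float,
               lam: float = 1.0, mu: float = 1.0, xi: float = 1.0) -> float:
    """Weighted RAN-side reward R = lambda*SE + mu*U_e + xi*U_rt (weights are tunable)."""
    return lam * se + mu * u_e + xi * u_rt
```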
Thus, from a mathematical perspective, the problem can be formulated as maximizing the expected long-term reward under the shared bandwidth constraint, where each d_i (i ∈ M = {1, 2, ..., m}) obeys a specific traffic model.
The key difficulty in solving this problem is that, because of this traffic model, the variation of service demand is non-stationary and not known in advance; that is, the real-time variation of service demand in the smart grid scenario is unknown.
(2) Core network slice based on priority scheduling
Similarly, if the computational resources are virtualized as VNFs for each slice, the problem of allocating computational resources to each VNF can be solved in the same way as the radio resource slicing problem. Therefore, this section discusses another important issue, namely priority-based core network slicing over generic VNFs. The mapping used here differs slightly from that of the radio resource slices, which illustrates the flexibility of RL. Likewise, the interaction of the slice controller with the core network side is represented by the tuple [S, A, P(s, s'), R(s, a)]; the corresponding mapping of the RL elements to this slicing problem is defined below.
A. State space
On the core network side there are related service function chains (SFCs) that have the same basic functions but consume different numbers of computational processing units (CPUs) and produce different results, such as different queuing times for the traffic. For example, based on business value or other smart-grid-related features, the traffic flows may be divided into three classes (class A, class B and class C), with priority decreasing from class A to class C, and the priority-based scheduling rules are defined as follows: SFC I processes class A traffic flows preferentially; SFC II treats class A and class B traffic flows equally, while class C traffic flows have the lowest priority; SFC III treats all traffic flows equally. Queuing time is incurred when traffic is scheduled according to priority; an illustrative encoding of these rules is sketched below.
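A minimal Python encoding of these priority rules might look as follows; the numeric priority weights and the tie-breaking rule are assumptions used only to make the sketch concrete.

```python
from typing import Optional

# Illustrative priority weights per SFC (higher = served first); the numbers are
# assumptions, not values from the patent.
SFC_PRIORITIES = {
    "SFC_I":   {"A": 2, "B": 1, "C": 1},  # class A strictly first
    "SFC_II":  {"A": 2, "B": 2, "C": 1},  # A and B equal, C last
    "SFC_III": {"A": 1, "B": 1, "C": 1},  # all classes treated equally
}

def next_class(sfc: str, queue_lengths: dict) -> Optional[str]:
    """Pick the traffic class this SFC serves next: highest priority first,
    with the longest queue used as the tie-breaker within a priority level."""
    waiting = {c: n for c, n in queue_lengths.items() if n > 0}
    if not waiting:
        return None
    prio = SFC_PRIORITIES[sfc]
    return max(waiting, key=lambda c: (prio[c], waiting[c]))
```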
The state space may be defined as T = {T_q}, where T_q is a vector that characterizes the queuing state of each element in the service set D. When N CPUs are used to process service d_i, the i-th element T_qi represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m}.
B. Action space
The CPUs that each SFC ultimately uses depend on the number of traffic flows it has processed. With a limited number of CPUs, each type of traffic flow must be scheduled to an appropriate SFC so that the resulting queuing time is acceptable. Thus, when processing traffic d_i, the number of CPUs N_CPU must be selected on the core network side. The action space is therefore defined as A_CPU = {a_CPU}, where a_CPU denotes selecting the number of CPUs required to compute the incoming traffic d_i (i ∈ M = {1, 2, ..., m}).
C. Reward function
In defining the reward function, we first need the utility function U to characterize the sensitivity of the current traffic to latency, and then define a new metric "network request value" function W to characterize the traffic priority.
As mentioned above, elastic applications and real-time applications are described with the utility functions U_e and U_rt, which characterize the QoS requirement of traffic d_i. In contrast to the RAN side, the argument here is the number of CPUs n required on the core network side to compute service d_i. However, this only reflects the QoS requirements of the different services. Because computational resources are limited, a reasonable scheduling rule is needed after the computational resources have been allocated to decide which service is served first; therefore the "network request value" function W is introduced to characterize the priority of a service. For any application service d_i, the network request value to be satisfied is defined as:
W_i = 2^p · U_i,
where p is the priority level of traffic d_i and U_i is an element of the set formed by the elastic-application and real-time-application utilities, i.e., U_i ∈ {U_e, U_rt}. The weight 2^p of a service request indicates the importance of the request relative to other requests. The reward function is defined as:
R = W_i.
However, this only captures the instantaneous reward of service d_i; what matters is the long-term reward obtained over a series of priority-queued services, so the accumulated discounted reward must be maximized, i.e., max Σ_t γ^t·R_t.
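The CN-side reward W_i = 2^p·U_i and the discounted accumulation it feeds into can be sketched as follows; the assumption that a larger p means a more important request follows the weighting 2^p described above.

```python
def network_request_value(priority_p: int, utility_u: float) -> float:
    """Network request value W_i = 2**p * U_i for traffic d_i, where U_i is the
    elastic or real-time utility and p is the priority level (larger p assumed
    to mean higher importance)."""
    return (2 ** priority_p) * utility_u

def discounted_return(instant_rewards: list, gamma: float = 0.9) -> float:
    """Discounted accumulation of the instant rewards R_t that the CN-side agent
    should maximise over a series of priority-queued services."""
    return sum((gamma ** t) * r for t, r in enumerate(instant_rewards))
```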
Fig. 5 shows the mapping of the smart grid slice resource management mechanism to RL.
Next, the reinforcement-learning-based slice allocation method proposed in this application under the above model is described.
The Q-learning-based reinforcement learning algorithm on the RAN and CN sides is described below. Although the expressions for the state set, action set and reward function differ slightly between the RAN and CN sides above, the Q-learning algorithm is generic with respect to the proposed mapping of RL to both sides. For convenience, this section uses a unified notation: the state space is S = {s_1, s_2, ..., s_n}, the action space is A = {a_1, a_2, ..., a_n}, the reward function is R(s, a), and P(s, s') represents the transition probability of transitioning from state s to s'.
The ultimate goal of the slice controller is to find the optimal slicing strategy π*. A policy is a mapping from the state set to the action set, and the optimal policy must maximize the expected long-term discounted reward of every state, i.e., π* = argmax_π V^π(s) for every s ∈ S.
The long-term discounted reward of state s is the discounted sum of the rewards obtained along the state trajectory and is given by:
V^π(s) = R(s, π(s)) + γ·R(s_1, π(s_1)) + γ²·R(s_2, π(s_2)) + ...,
where γ is a discount factor (0 < γ < 1) that determines the present value of future rewards. The optimization objective above is the state value function of an arbitrary policy.
there is at least one optimization strategy in a single environment setting according to the optimality criteria of Bellman. Thus, the state value function of the optimal strategy is given by:
The state transition probabilities depend on many factors, such as traffic load, traffic arrival and departure rates and the decision algorithm, and therefore may not be readily available on either the radio side or the core network side. Model-free reinforcement learning is thus well suited to deriving an optimal strategy, because it does not require the reward expectations and the state transition probabilities to be known as prior knowledge. Among the various existing RL algorithms, Q-learning is chosen here.
Taking the RAN side as an example, the slice controller interacts with the wireless environment over very short discrete time periods. The action-value function (also referred to as the Q value) of a state-action pair (s, π(s)) is denoted Q(s, π(s)) and is defined as the expected long-term discounted reward of state s when policy π is used. The goal is to find an optimal policy π* that maximizes the Q value of every state s, i.e., π*(s) = argmax_a Q(s, a).
According to the Q-learning algorithm, the slice controller can learn the optimal Q value iteratively from the available information. At any time, the slice controller in state s may select an action a. This yields an instant reward R_t and, at the same time, a transition to the next state s'. The process of the Q-learning algorithm can be expressed by the following update equation:
Q(s, a) ← (1 − α)·Q(s, a) + α·[R_t + γ·max_{a'∈A(s')} Q(s', a')],
where α is the learning rate and the discounted accumulation of all instant rewards R_t is R = Σ_{t=0}^{T} γ^t·R_t.
By updating the Q value for a sufficiently long duration and by adjusting the values of α and γ, Q(s, a) can eventually be made to converge to its value under the optimal strategy, i.e., Q*(s, a).
The overall slicing strategy is given by the following algorithm. Initially, the Q value is set to 0. Before the Q-learning algorithm is applied, the slice controller performs an initial slice allocation for the different slices based on the estimated power traffic demand of each slice; this is done to initialize the state of the different slices. Existing radio resource slicing solutions use bandwidth-based or resource-based provisioning to allocate radio resources to the different slices.
Q-learning is an online, iterative learning algorithm that performs two different types of operation. In the exploration mode, the slice controller randomly selects one possible action in order to improve its future decisions. In contrast, in the exploitation mode, the slice controller prefers actions that it has tried in the past and found to be effective. It is assumed that the slice controller in state s explores with probability ε and uses the previously stored Q values with probability 1 − ε. In any state, not all actions are feasible: in order to maintain slice-to-slice isolation, the slice controller must ensure that the same physical resource blocks (PRBs) are not assigned to two different slices (on the RAN side).
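A compact tabular Q-learning sketch of the controller loop described above is given below; the environment interface (reset, actions, step) and all hyperparameter values are assumptions introduced for illustration, and feasibility filtering of actions (e.g., PRB isolation) is assumed to happen inside env.actions().

```python
import random
from collections import defaultdict

def train_slice_controller(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning for the slice controller (RAN or CN side).

    `env` is a hypothetical environment exposing reset() -> state,
    actions(state) -> list of feasible actions (already excluding actions that
    would assign one PRB to two slices), and step(state, action) ->
    (reward, next_state, done). Q values start at 0, as in the text.
    """
    q = defaultdict(float)                      # Q[(state, action)] = 0 initially
    for _ in range(episodes):
        s = env.reset()                         # initial slice allocation / state
        done = False
        while not done:
            feasible = env.actions(s)
            if random.random() < epsilon:       # exploration mode
                a = random.choice(feasible)
            else:                               # exploitation mode: best stored Q
                a = max(feasible, key=lambda x: q[(s, x)])
            r, s_next, done = env.step(s, a)
            best_next = max((q[(s_next, a2)] for a2 in env.actions(s_next)), default=0.0)
            # Q-learning update: Q(s,a) <- (1-alpha)Q(s,a) + alpha(R_t + gamma*max Q(s',a'))
            q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next
    return q
```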
Corresponding to the method provided above, the application also provides a reinforcement-learning-based smart grid slice distribution device 600, comprising:
a classification unit 610, which classifies the power services of the smart grid by service type;
a classification-to-slice correspondence unit 620, which maps the classes to different slices; and
a model construction unit 630, which constructs a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, and completes the allocation of the smart grid slices through the reinforcement learning model, thereby realizing resource scheduling management of the smart grid.
The application provides a smart grid slice distribution method based on reinforcement learning: the service types of the smart grid are classified, the classes are mapped to different slices, and the allocation of the smart grid slices is completed through the constructed reinforcement learning model of the smart grid slices. The problem of integrating the 5G network slicing technology with the smart grid on the basis of reinforcement learning is thereby solved.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art may modify or substitute equivalents for the specific embodiments of the present invention, and any modifications and equivalent substitutions that do not depart from the spirit and scope of the present invention fall within the scope of the claims of the present invention.

Claims (6)

1. A smart grid slice distribution method based on reinforcement learning, characterized by comprising the following steps:
classifying the power services of the smart grid by service type;
mapping the classes to different slices;
constructing a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid, and completing the allocation of the smart grid slices through the reinforcement learning model to realize resource scheduling management of the smart grid, wherein the reinforcement learning model of the smart grid slices comprises reinforcement learning models of the radio access side and the core network side;
the reinforcement learning model of the radio access RAN side comprises: given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices, and the slices share an aggregate bandwidth B; there is a series of traffic flows represented by the vector D = {d_1, d_2, ..., d_m}, where the vector D is the set of smart grid services; faced with the multi-service nature of the smart grid, the QoS requirements to be met by each slice service differ; each traffic flow d_i in the vector D, where i ∈ M = {1, 2, ..., m}, obeys a specific traffic model;
first, the system state space, action space and reward function of the RAN-side network are defined; the interaction of the slice controller with the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S represents the state set, A represents the action set, P(s, s') represents the transition probability of transitioning from state s to s', and R(s, a) is the reward associated with the action triggered in state s, which is fed back to the slice controller;
the reinforcement learning model of the core network CN side comprises: the interaction of the slice controller with the core network side is represented by the tuple [S, A, P(s, s'), R(s, a)]; the state space is defined as T = {T_q}, where T_q is a vector characterizing the queuing state of each element in the vector D; when N CPUs are used to process service d_i, the i-th element T_qi represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m};
when processing service d_i, the number of CPUs N_CPU must be selected on the core network side; the action space is therefore defined as A_CPU = {a_CPU}, where a_CPU denotes selecting the number of CPUs required to compute the incoming traffic d_i, where i ∈ M = {1, 2, ..., m};
when defining the reward function, utility functions U_e(x) and U_rt(x) are used to characterize the QoS requirement of traffic d_i, where U_e(x) represents the elastic-application utility model, U_rt(x) represents the real-time-application utility model, and k_1 and k_2 are adjustable parameters.
2. The method of claim 1, wherein classifying the power services of the smart grid by service type comprises:
classifying the power services of the smart grid into a control class, an information acquisition class and a mobile application class according to service type.
3. The method of claim 1, wherein mapping the classes to different slices comprises:
mapping the control class to the uRLLC slice, the information acquisition class to the mMTC slice, and the mobile application class to the eMBB slice.
4. The method of claim 1, wherein the reinforcement learning model of the smart grid is constructed specifically using the Q-learning algorithm.
5. The method of claim 4, further comprising: constructing the reinforcement learning model of the smart grid based on the Q-learning algorithm and allocating the smart grid slices, comprising: the state set is S = {s_1, s_2, ..., s_n};
the action set is A = {a_1, a_2, ..., a_n};
the reward function R(s, a) is the reward associated with triggering action a in state s, and P(s, s') represents the transition probability of transitioning from state s to s';
at any time, the slice controller in state s can select an action a to obtain an instant reward R_t while transitioning to the next state s'; the process of the Q-learning algorithm can be expressed by the following update equation:
Q(s, a) ← (1 − α)·Q(s, a) + α·[R_t + γ·max_{a'∈A(s')} Q(s', a')],
wherein γ represents the discount factor; t represents the time elapsed from state s to s'; a' represents an action in state s'; A(s') represents the set of actions available in state s'; α is the learning rate; and the discounted accumulation of all instant rewards R_t is
R = Σ_{t=0}^{T} γ^t·R_t,
wherein T represents the time elapsed from t = 0 to the T-th step; by updating the Q value over a sufficient duration and by adjusting the values of α and γ, Q(s, a) is ensured to eventually converge to its value under the optimal strategy, the converged value being Q*(s, a).
6. A smart grid slice distribution device based on reinforcement learning, characterized by comprising:
a classification unit, which classifies the power services of the smart grid by service type;
a classification-to-slice correspondence unit, which maps the classes to different slices; and
a model construction unit, which constructs a reinforcement learning model of the smart grid slices according to the service indicators of the smart grid and completes the allocation of the smart grid slices through the reinforcement learning model, realizing resource scheduling management of the smart grid, wherein the reinforcement learning model of the smart grid slices comprises reinforcement learning models of the radio access side and the core network side;
the reinforcement learning model of the radio access RAN side comprises: given a series of existing slices χ_1, χ_2, ..., χ_n, the vector χ = {χ_1, χ_2, ..., χ_n} represents the set of existing slices, and the slices share an aggregate bandwidth B; there is a series of traffic flows represented by the vector D = {d_1, d_2, ..., d_m}, where the vector D is the set of smart grid services; faced with the multi-service nature of the smart grid, the QoS requirements to be met by each slice service differ; each traffic flow d_i in the vector D, where i ∈ M = {1, 2, ..., m}, obeys a specific traffic model;
first, the system state space, action space and reward function of the RAN-side network are defined; the interaction of the slice controller with the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S represents the state set, A represents the action set, P(s, s') represents the transition probability of transitioning from state s to s', and R(s, a) is the reward associated with the action triggered in state s, which is fed back to the slice controller;
the reinforcement learning model of the core network CN side comprises: the interaction of the slice controller with the core network side is represented by the tuple [S, A, P(s, s'), R(s, a)]; the state space is defined as T = {T_q}, where T_q is a vector characterizing the queuing state of each element in the vector D; when N CPUs are used to process service d_i, the i-th element T_qi represents the queuing state of service d_i, where i ∈ M = {1, 2, ..., m};
when processing service d_i, the number of CPUs N_CPU must be selected on the core network side; the action space is therefore defined as A_CPU = {a_CPU}, where a_CPU denotes selecting the number of CPUs required to compute the incoming traffic d_i, where i ∈ M = {1, 2, ..., m};
when defining the reward function, utility functions U_e(x) and U_rt(x) are used to characterize the QoS requirement of traffic d_i, where U_e(x) represents the elastic-application utility model, U_rt(x) represents the real-time-application utility model, and k_1 and k_2 are adjustable parameters.
CN201910452242.7A 2019-05-28 2019-05-28 Smart grid slice distribution method and device based on reinforcement learning Active CN110381541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910452242.7A CN110381541B (en) 2019-05-28 2019-05-28 Smart grid slice distribution method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910452242.7A CN110381541B (en) 2019-05-28 2019-05-28 Smart grid slice distribution method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110381541A CN110381541A (en) 2019-10-25
CN110381541B true CN110381541B (en) 2023-12-26

Family

ID=68248856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910452242.7A Active CN110381541B (en) 2019-05-28 2019-05-28 Smart grid slice distribution method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110381541B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255347B (en) * 2020-02-10 2022-11-15 阿里巴巴集团控股有限公司 Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment
CN111292570B (en) * 2020-04-01 2021-09-17 广州爱浦路网络技术有限公司 Cloud 5GC communication experiment teaching system and teaching method based on project type teaching
CN111953510B (en) * 2020-05-15 2024-02-02 中国电力科学研究院有限公司 Smart grid slice wireless resource allocation method and system based on reinforcement learning
CN111726811B (en) * 2020-05-26 2023-11-14 国网浙江省电力有限公司嘉兴供电公司 Slice resource allocation method and system for cognitive wireless network
CN111711538B (en) * 2020-06-08 2021-11-23 中国电力科学研究院有限公司 Power network planning method and system based on machine learning classification algorithm
CN112365366B (en) * 2020-11-12 2023-05-16 广东电网有限责任公司 Micro-grid management method and system based on intelligent 5G slice
CN112383427B (en) * 2020-11-12 2023-01-20 广东电网有限责任公司 5G network slice deployment method and system based on IOTIPS fault early warning
CN112737813A (en) * 2020-12-11 2021-04-30 广东电力通信科技有限公司 Power business management method and system based on 5G network slice
CN113316188B (en) * 2021-05-08 2022-05-17 北京科技大学 AI engine supporting access network intelligent slice control method and device
CN113225759B (en) * 2021-05-28 2022-04-15 广东电网有限责任公司广州供电局 Network slice safety and decision management method for 5G smart power grid
CN113329414B (en) * 2021-06-07 2023-01-10 深圳聚创致远科技有限公司 Smart power grid slice distribution method based on reinforcement learning
CN113630733A (en) * 2021-06-29 2021-11-09 广东电网有限责任公司广州供电局 Network slice distribution method and device, computer equipment and storage medium
CN113840333B (en) * 2021-08-16 2023-11-10 国网河南省电力公司信息通信公司 Power grid resource allocation method and device, electronic equipment and storage medium
CN114531403A (en) * 2021-11-15 2022-05-24 海盐南原电力工程有限责任公司 Power service network distinguishing method and system
CN115460613A (en) * 2022-04-14 2022-12-09 国网福建省电力有限公司 Safe application and management method for power 5G slice
CN115913966A (en) * 2022-12-06 2023-04-04 中国联合网络通信集团有限公司 Virtual network function deployment method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102238631A (en) * 2011-08-17 2011-11-09 南京邮电大学 Method for managing heterogeneous network resources based on reinforcement learning
CN108965024A (en) * 2018-08-01 2018-12-07 重庆邮电大学 A kind of virtual network function dispatching method of the 5G network slice based on prediction
CN109495907A (en) * 2018-11-29 2019-03-19 北京邮电大学 A kind of the wireless access network-building method and system of intention driving
CN109600262A (en) * 2018-12-17 2019-04-09 东南大学 Resource self-configuring and self-organization method and device in URLLC transmission network slice

Also Published As

Publication number Publication date
CN110381541A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN110381541B (en) Smart grid slice distribution method and device based on reinforcement learning
Abiko et al. Flexible resource block allocation to multiple slices for radio access network slicing using deep reinforcement learning
CN113254197B (en) Network resource scheduling method and system based on deep reinforcement learning
Sun et al. Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning
Qian et al. Survey on reinforcement learning applications in communication networks
CN111953510B (en) Smart grid slice wireless resource allocation method and system based on reinforcement learning
CN104572307B (en) The method that a kind of pair of virtual resource carries out flexible scheduling
Kim et al. Multi-agent reinforcement learning-based resource management for end-to-end network slicing
Dai et al. Psaccf: Prioritized online slice admission control considering fairness in 5g/b5g networks
Fan et al. Multi-objective optimization of container-based microservice scheduling in edge computing
Rezazadeh et al. On the specialization of fdrl agents for scalable and distributed 6g ran slicing orchestration
Zhou et al. Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing
Othman et al. Efficient admission control and resource allocation mechanisms for public safety communications over 5G network slice
Hlophe et al. QoS provisioning and energy saving scheme for distributed cognitive radio networks using deep learning
Grasso et al. Smart zero-touch management of uav-based edge network
Shen et al. Goodbye to fixed bandwidth reservation: Job scheduling with elastic bandwidth reservation in clouds
CN114938372B (en) Federal learning-based micro-grid group request dynamic migration scheduling method and device
Zhou et al. Digital twin-empowered network planning for multi-tier computing
Balasubramanian et al. Reinforcing cloud environments via index policy for bursty workloads
Shokrnezhad et al. Double deep q-learning-based path selection and service placement for latency-sensitive beyond 5g applications
Ren et al. A memetic algorithm for cooperative complex task offloading in heterogeneous vehicular networks
Lotfi et al. Attention-based open RAN slice management using deep reinforcement learning
Zhang et al. Vehicular multi-slice optimization in 5G: Dynamic preference policy using reinforcement learning
Guo et al. Delay-based packet-granular QoS provisioning for mixed traffic in industrial internet of things
Rashtian et al. Balancing message criticality and timeliness in IoT networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant