CN110381541A

CN110381541A - Smart grid slice allocation method and device based on reinforcement learning

Info

Publication number: CN110381541A
Application number: CN201910452242.7A
Authority: CN
Inventors: 孟萨出拉; 王智慧; 丁慧霞; 吴赛; 杨德龙; 孙丽丽; 曹新智; 滕玲; 段钧宝; 李许安; 王莹; 王雪; 陈源彬
Original assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date: 2019-05-28
Filing date: 2019-05-28
Publication date: 2019-10-25
Anticipated expiration: 2039-05-28
Also published as: CN110381541B

Abstract

The invention discloses a smart grid slice allocation method based on reinforcement learning, comprising: classifying the power services of the smart grid by service type; mapping each class to a different network slice; and constructing a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, then using the model to allocate slices to the smart grid and thereby schedule and manage its resources. Classifying the service types, mapping the classes to slices, and allocating slices with the constructed reinforcement learning model addresses the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.

Description

Smart grid slice allocation method and device based on reinforcement learning

Technical field

This application relates to the field of network resource allocation for electric power wireless communication, and in particular to a smart grid slice allocation method based on reinforcement learning, as well as a smart grid slice allocation device based on reinforcement learning.

Background art

With the arrival of the 5G era of ubiquitous high-speed, low-power, low-latency connectivity, communication in human society is becoming increasingly unobstructed. Network slicing is regarded as one of the key technologies of 5G networks: a single physical network is divided into multiple independent logical networks that are allocated to different service scenarios, supporting a variety of vertical multi-service networks and adapting to different service demands according to their characteristics. Network slicing can greatly reduce deployment cost and network occupancy.

Driven by growing energy and electricity demand, the world's power grids have moved from traditional networks into the smart grid era. With the new round of energy revolution, the global Internet strategic concept, and developments in the communications field, 5G network slicing technology can for the first time be applied to smart grid services. When carrying wireless service applications for the power grid, 5G network slicing offers customizable slices, secure and reliable isolation between slices, and unified slice management, together with fast networking and high cost-efficiency, so it has broad application prospects in power systems. Integrating reinforcement-learning-based 5G network slicing technology with the smart grid is therefore a problem that urgently needs to be solved.

Summary of the invention

The application provides a smart grid slice allocation method based on reinforcement learning, which addresses the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.

The application provides a smart grid slice allocation method based on reinforcement learning, comprising:

classifying the power services of the smart grid by service type;

mapping each class to a different slice;

constructing a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and using the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.

Preferably, classifying the power services of the smart grid by service type comprises:

dividing the power services of the smart grid by service type into a control class, an information collection class, and a mobile application class.

Preferably, mapping each class to a different slice comprises:

mapping the control class to a uRLLC slice, the information collection class to an mMTC slice, and the mobile application class to an eMBB slice.

Preferably, constructing the reinforcement learning model of the smart grid specifically comprises using the Q-learning algorithm to construct the reinforcement learning model of the smart grid.

Preferably, constructing the reinforcement learning model of smart grid slicing comprises: constructing reinforcement learning models for the radio access side and the core network side respectively.

Preferably, constructing the reinforcement learning model of smart grid slicing comprises:

defining the state space as S = {s₁, s₂, ..., s_n};

defining the action space as A = {a₁, a₂, ..., a_n};

the reward function is R(s, a), and P(s, s') denotes the probability of transitioning from state s to state s';

at any time, the slice controller in state s can choose an action a, receive an immediate reward R_t, and transition to the next state s'. The Q-learning process can be expressed by the following update formula:

Q(s, a) ← Q(s, a) + α[R_t + γ·max_{a'} Q(s', a') - Q(s, a)]

where α is the learning rate, γ is the discount factor, and the long-term return is the discounted accumulation of all immediate rewards R_t:

Σ_{t=0}^{∞} γ^t R_t

By updating the Q value over a sufficiently long period and adjusting the values of α and γ, Q(s, a) is guaranteed to converge eventually to its value under the optimal policy, i.e. Q(s, a) → Q*(s, a).

The application also provides a smart grid slice allocation device based on reinforcement learning, comprising:

a classification unit, which classifies the power services of the smart grid by service type;

a class-to-slice mapping unit, which maps each class to a different slice;

a model construction unit, which constructs a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and uses the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.

The application provides a smart grid slice allocation method based on reinforcement learning: the service types of the smart grid are classified, each class is mapped to a different slice, and slices are allocated to the smart grid by the constructed reinforcement learning model. This addresses the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.

Brief description of the drawings

Fig. 1 is a flow diagram of a smart grid slice allocation method based on reinforcement learning provided by an embodiment of the present application;

Fig. 2 is a schematic diagram of the slicing architecture in the smart grid scenario according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the relationship between slices and the three classes of smart grid services according to an embodiment of the present application;

Fig. 4 shows the QoS indicators of typical smart grid service slices according to an embodiment of the present application;

Fig. 5 shows the mapping of the smart grid slice resource management mechanism to RL according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a smart grid slice allocation device based on reinforcement learning provided by an embodiment of the present application.

Detailed description of the embodiments

Many specific details are set forth in the following description to provide a full understanding of the application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from the substance of the application; the application is therefore not limited to the specific implementations disclosed below.

Referring to Fig. 1, which shows a smart grid slice allocation method based on reinforcement learning provided by an embodiment of the present application, the method is described in detail below.

Step S101: classify the power services of the smart grid by service type.

First, the slicing architecture in the smart grid scenario on which the application is based is introduced, as shown in Fig. 2.

With the help of SDN technology, network slicing decouples the control plane from the data plane and defines an open interface between them, allowing the network functions within a network slice to be defined flexibly. To meet the needs of a given class of service, a network slice contains only the network functions that support that specific service. Power services can be divided into three classes: control (such as distribution automation and precise load control), information collection (such as electricity consumption information collection and transmission line monitoring), and mobile application (such as intelligent inspection and mobile operations).

Step S102: map each class to a different slice.

Fig. 3 shows the relationship between the three slice types and the three classes of smart grid services. The control class is mapped to uRLLC slices, the information collection class to mMTC slices, and the mobile application class to eMBB slices.
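The classification of step S101 and the mapping of step S102 amount to two lookups. The following is a minimal Python sketch; the enum, dictionary, and function names are illustrative assumptions, and the example services are the ones named under step S101.

```python
from enum import Enum


class ServiceClass(Enum):
    CONTROL = "control"
    INFORMATION_COLLECTION = "information_collection"
    MOBILE_APPLICATION = "mobile_application"


# Step S102: each service class maps to one 5G slice type.
SLICE_FOR_CLASS = {
    ServiceClass.CONTROL: "uRLLC",
    ServiceClass.INFORMATION_COLLECTION: "mMTC",
    ServiceClass.MOBILE_APPLICATION: "eMBB",
}

# Step S101: example power services tagged with their class.
SERVICE_CLASS = {
    "distribution automation": ServiceClass.CONTROL,
    "precise load control": ServiceClass.CONTROL,
    "electricity consumption information collection": ServiceClass.INFORMATION_COLLECTION,
    "transmission line monitoring": ServiceClass.INFORMATION_COLLECTION,
    "intelligent inspection": ServiceClass.MOBILE_APPLICATION,
    "mobile operations": ServiceClass.MOBILE_APPLICATION,
}


def slice_for_service(name: str) -> str:
    """Return the slice type that carries the named power service."""
    return SLICE_FOR_CLASS[SERVICE_CLASS[name]]
```

For example, `slice_for_service("precise load control")` returns `"uRLLC"`.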

Step S103: construct a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and use the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.

Fig. 4 gives the QoS (quality of service) indicators of typical smart grid service slices. The application considers a service plane, an orchestration control plane, and a data plane. The service plane divides services into elastic applications and real-time applications. Elastic applications can tolerate relatively large delay and have no minimum bandwidth requirement; examples include vehicle grid access, distributed generation, video surveillance, and user metering. Real-time applications require the network to provide a minimum level of performance guarantee; they are mainly represented by uRLLC slice services, with typical examples such as distribution automation and emergency communication. The data plane stores the data generated by the interaction between power equipment and the physical layer.

This application focuses on the orchestration control plane. An access network SDN (software-defined networking) controller and a core network SDN controller are introduced; they are responsible for the management and coordination of network functions (NFs) in the access network and the core network respectively (such as service migration and deployment). They act as two distinct agents that can communicate with each other and jointly complete coordination work. Given prior knowledge from the service plane such as service types, channel conditions, and user demand, the slice orchestration controller of the orchestration control plane divides the sliced network into radio access network (RAN) side slices and core network (CN) side slices. The RAN-side and CN-side network slices are managed by their respective SDN controllers, each of which executes the algorithm for its own network side, namely the smart grid slice allocation method based on reinforcement learning proposed in this application.

The reinforcement learning models for the RAN side and the CN side proposed in this application are described below.

(1) RAN-side radio resource slicing

Consider a set of existing slices χ₁, χ₂, ..., χ_n, denoted by the vector χ = {χ₁, χ₂, ..., χ_n}, which share an aggregate bandwidth B, and a set of service flows denoted by the vector D = {d₁, d₂, ..., d_m}. D is in effect the set of smart grid service flows. Because the smart grid carries many kinds of services, each sliced service has different QoS requirements to satisfy. Which smart grid service a given flow belongs to is not known in advance, and real-time service demand fluctuates in the smart grid scenario. Each d_i (i ∈ M = {1, 2, ..., m}) is therefore taken to follow a specific traffic model.

First, the system state space, action space, and reward function of the RAN-side network must be defined. The interaction between the slice controller and the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S is the set of possible states, A is the set of possible actions, P(s, s') is the probability of transitioning from state s to state s', and R(s, a) is the reward associated with triggering action a in state s, which is fed back to the slice controller. The mapping of radio-access-side slice resource management to RL is as follows.

A. State space:

The state space is defined as S = {s^slice}, where s^slice is a vector indicating the state of all slices currently available to carry the associated power services, and its n-th element is

B. Action space:

Facing time-varying and unknown service traffic models, the reinforcement learning agent must allocate suitable slice resources to each power service. Based on the current slice state and the reward function, the agent decides which action to execute at the next moment. The action space is defined as A = {a^bandwidth}, where a^bandwidth denotes the agent allocating suitable bandwidth to each logically independent slice so that it can carry its corresponding service.

Because network slicing shares network resources among virtual networks, the virtual network slices must be isolated from one another, so that congestion or failure caused by insufficient resources on one slice does not affect the other slices. Therefore, to guarantee slice isolation and maximize the utility of resource allocation, each slice is restricted to carrying at most one service:

with the associated binary variable constrained at the same time
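The restriction that each slice carries at most one service can be checked programmatically. The sketch below is illustrative only: it assumes a hypothetical binary assignment matrix `x` (a name and layout chosen here, not taken from the source) in which `x[i][n]` is 1 when service flow d_i is carried on slice χ_n.

```python
def one_service_per_slice(x):
    """Return True if x is a binary matrix and every slice (column) carries at most one service.

    x[i][n] == 1 means service flow i is carried on slice n (hypothetical layout).
    """
    if not x:
        return True
    n_slices = len(x[0])
    for row in x:
        # Every row must have one entry per slice, and entries must be 0 or 1.
        if len(row) != n_slices or any(v not in (0, 1) for v in row):
            return False
    # Column sums: at most one service per slice.
    return all(sum(row[n] for row in x) <= 1 for n in range(n_slices))
```

With two flows and three slices, `[[1, 0, 0], [0, 0, 1]]` satisfies the restriction, while `[[1, 0], [1, 0]]` does not because both flows share the first slice.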

C. Reward function

After the agent allocates a specific slice to a smart grid service, it obtains a comprehensive benefit, which is taken as the reward of the system. Control-class power services place very strict requirements on communication delay and bit error rate: a communication failure or error may affect the execution of grid control and cause the grid to malfunction. Some mobile-application-class services (such as transmitting inspection video or playing back HD video) need a guaranteed transmission rate and place higher demands on communication bandwidth. Power supply reliability means a continuous, sufficient, and high-quality supply of electricity. For example, a supply reliability of 99.999% ("five nines") means that the average annual outage time per customer in the region is about 5 minutes, and at 99.9999% ("six nines") it falls to about 30 seconds. Because spectrum resources on the RAN side are limited, the optimal policy should be chosen when allocating slices so as to satisfy users' QoS demands as far as possible.
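The outage figures quoted for "five nines" and "six nines" follow from multiplying the unavailability by the length of a year. A short worked check, assuming a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def annual_outage_seconds(reliability: float) -> float:
    """Average outage time per year implied by a supply reliability in [0, 1]."""
    return (1.0 - reliability) * SECONDS_PER_YEAR


five_nines = annual_outage_seconds(0.99999)
six_nines = annual_outage_seconds(0.999999)
print(f"99.999%  -> {five_nines / 60:.2f} minutes per year")
print(f"99.9999% -> {six_nines:.1f} seconds per year")
```

The results are roughly 5.26 minutes and 31.5 seconds, which matches the rounded figures of about 5 minutes and about 30 seconds.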

The downlink is mainly considered, with spectral efficiency (SE) and delay as the evaluation metrics. The spectral efficiency of the system can be defined as:

According to the Shannon formula R = b·log₂(1 + (g^{BS→UE}·P)/σ²), the actual rate from the base station (BS) to the user can be obtained, where g^{BS→UE} is the channel state information (CSI) between the base station and the device and follows Rayleigh fading.
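The Shannon rate can be evaluated numerically. In this minimal sketch, b is read as the allocated bandwidth in Hz, P as the transmit power, and σ² as the noise power; those readings, the function names, and the sample values are assumptions made for illustration. Under Rayleigh fading the channel power gain is exponentially distributed, which is what the sampling helper uses.

```python
import math
import random


def shannon_rate(bandwidth_hz, channel_gain, tx_power, noise_power):
    """Achievable rate in bit/s: b * log2(1 + g * P / sigma^2)."""
    snr = channel_gain * tx_power / noise_power
    return bandwidth_hz * math.log2(1.0 + snr)


def sample_rayleigh_power_gain(mean_gain, rng):
    """Draw a power gain for a Rayleigh-fading channel (exponential with the given mean)."""
    return rng.expovariate(1.0 / mean_gain)


# Example: 1 MHz of bandwidth at an SNR of 15 gives 4 bit/s/Hz, i.e. 4 Mbit/s.
rate = shannon_rate(bandwidth_hz=1e6, channel_gain=15.0, tx_power=1.0, noise_power=1.0)
```

Dividing the rate by the bandwidth gives the per-link spectral efficiency in bit/s/Hz.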

To describe users' QoS demands, a utility function is introduced, i.e. a curve mapping the bandwidth allocated to a sliced service to the performance perceived by the user. Here it is assumed that the services carried by slices can be divided into elastic applications and real-time applications.

(a) Elastic applications

Such applications have no minimum bandwidth requirement, because they can tolerate relatively large delay. Elastic traffic is modeled with the following utility function:

where k is an adjustable parameter that determines the shape of the utility function and ensures that the utility is close to 1 when the maximum requested bandwidth is received. However, even with very high bandwidth, user satisfaction for this kind of application hardly reaches 1. Therefore, even when network bandwidth is in surplus, the bandwidth allocated to this application type should not exceed the maximum bandwidth b_max.

(b) Real-time applications

Traffic of this application type requires the network to provide a minimum level of performance guarantee. If the allocated bandwidth falls below a certain threshold, QoS becomes unacceptable. Real-time applications are modeled with the following utility function:

where k₁ and k₂ are adjustable parameters that determine the shape of the utility function.
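To make the two utility curves concrete, the sketch below assumes forms that are common in the bandwidth-utility literature and that fit the description: U_e(b) = 1 - exp(-k·b/b_max) for elastic traffic and U_rt(b) = 1 - exp(-k₁·b²/(k₂ + b)) for real-time traffic. These expressions, the function names, and the default parameter values are assumptions for illustration and may differ from the definitions intended by the source.

```python
import math


def elastic_utility(b, b_max, k=4.6):
    """Assumed elastic utility: concave, close to 1 at b_max, never reaching 1."""
    b = max(0.0, min(b, b_max))  # allocation is capped at the maximum bandwidth
    return 1.0 - math.exp(-k * b / b_max)


def realtime_utility(b, k1=1.0, k2=10.0):
    """Assumed real-time utility: very low below a bandwidth threshold, then rising quickly."""
    b = max(0.0, b)
    return 1.0 - math.exp(-k1 * b * b / (k2 + b))
```

With k = 4.6 the elastic utility at b = b_max is about 0.99, which illustrates a utility that approaches but does not reach 1. The real-time curve stays near 0 for small b because of the b² term, reflecting the minimum-bandwidth behavior described above.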

The reward of the learning agent is defined as follows:

R = λ·SE + μ·U_e + ξ·U_rt

where λ, μ, and ξ are the weights of SE, U_e, and U_rt.
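The weighted reward can be written directly as a function. This is a minimal sketch; the function name and the default weights of 1.0 are illustrative assumptions, since the source does not give weight values.

```python
def ran_reward(se, u_elastic, u_realtime, lam=1.0, mu=1.0, xi=1.0):
    """RAN-side reward R = lam * SE + mu * U_e + xi * U_rt."""
    return lam * se + mu * u_elastic + xi * u_realtime
```

Raising ξ relative to λ and μ makes the agent favor allocations that satisfy real-time services, at the cost of spectral efficiency and elastic-service utility.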

From a mathematical point of view, the problem can therefore be formulated as:

d_i (i ∈ M = {1, 2, ..., m}) follows a specific traffic model (*)

The key difficulty in solving problem (*) is that, because of the traffic model and in the absence of prior knowledge, the variation of service demand is non-stationary; that is, the variation of real-time service demand in the smart grid scenario is unknown.

(2) Core network slicing based on priority scheduling

Similarly, if computing resources are virtualized into individual VNFs, the problem of allocating computing resources to each VNF can be solved in the same way as radio resource slicing. This part therefore discusses another important problem, namely priority-based core network slicing for common VNFs. The mapping used here differs slightly from that for radio resource slicing, which illustrates the flexibility of RL. The interaction between the slice controller and the core network side is likewise represented by the four-tuple [S, A, P(s, s'), R(s, a)]; the mapping of RL elements to this slicing problem is defined below.

A. State space

On the core network side there are related service function chains (SFCs) that have the same basic function but consume different numbers of computing processing units (CPUs) and produce different results (such as service queuing time). For example, based on service value or other smart-grid-related service characteristics, service flows can be divided into three classes (A, B, and C) with priority decreasing from class A to class C. The priority-based scheduling rule is defined as follows: SFC I processes class-A flows with priority; SFC II treats class-A and class-B flows equally but gives class-C flows the lowest priority; SFC III treats all flows equally. Priority-based scheduling gives rise to service queuing time.
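The three scheduling rules can be sketched as a rank table plus a stable sort. Two points are assumptions made here rather than statements from the source: within SFC I, class-B and class-C flows are treated equally after class A, and flows of equal rank are served in arrival order. All names are illustrative.

```python
# Rank each SFC assigns to a flow class; a lower rank is served first.
SFC_RANK = {
    "SFC I": {"A": 0, "B": 1, "C": 1},    # class A first (B and C equal: assumption)
    "SFC II": {"A": 0, "B": 0, "C": 1},   # A and B equal, C last
    "SFC III": {"A": 0, "B": 0, "C": 0},  # no distinction
}


def service_order(sfc, flows):
    """Return flow ids in the order the given SFC would serve them.

    flows is a list of (flow_id, flow_class) tuples in arrival order.
    sorted() is stable, so arrival order is kept within a rank.
    """
    rank = SFC_RANK[sfc]
    return [flow_id for flow_id, cls in sorted(flows, key=lambda f: rank[f[1]])]


arrivals = [("d1", "C"), ("d2", "B"), ("d3", "A"), ("d4", "B")]
```

For these arrivals, SFC I serves d3 first, SFC II serves d1 last, and SFC III keeps the arrival order; the position of a flow in the returned list is a simple proxy for its queuing time.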

The state space can be defined as T = {T_q}, where T_q is a vector characterizing the queuing state of each element of the service set D. When N CPUs are used to compute service d_i, the i-th element T_qi denotes the queuing time of service d_i, where i ∈ M = {1, 2, ..., m}.

B. Action space

The number of CPUs each SFC ultimately uses depends on the number of service flows it processes. With a limited number of CPUs, each type of service flow needs to be scheduled to an appropriate SFC so that the resulting queuing time is acceptable. When processing service d_i, a suitable number of CPUs N_CPU must therefore be selected on the core network side. The action space is accordingly defined as A_CPU = {a^CPU}, where a^CPU denotes the number of CPUs selected to perform the computation for an arriving service d_i (i ∈ M = {1, 2, ..., m}).

C. Reward function

To define the reward function, a utility function U is first needed to characterize the delay sensitivity of the current service; a new metric, the "network request value" function W, is then defined to characterize the priority of the service.

As mentioned above, elastic and real-time applications are described with the utility functions:

which characterize the QoS demand of service d_i. Compared with the RAN side, the difference is that the independent variable becomes the number n of CPUs required on the core network side to compute service d_i. This, however, only reflects the QoS demands of different services. Because computing resources are limited, once they have been allocated a reasonable scheduling rule is needed to decide which services are processed first, so the "network request value" function W is introduced to characterize service priority. For any application service d_i, the network request value is defined as:

W_i = 2^p · U_i

where p is the priority level of service d_i, and U_i is the utility of either an elastic or a real-time application, i.e. U_i ∈ {U_e, U_rt}. The weight 2^p of a service request indicates the importance of that request relative to other requests. The reward function is defined as:

R = W_i

The above formula gives only the current priority of a single service d_i. To capture the priority queuing of a whole series of services, the long-term reward must be accumulated and maximized, i.e.
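The network request value W_i = 2^p · U_i can be computed and used to rank pending requests. A minimal sketch; the function names, the request tuples, and the utility values are made-up illustrative inputs.

```python
def network_request_value(priority_level, utility):
    """W_i = 2**p * U_i for a request with priority level p and utility U_i."""
    return (2 ** priority_level) * utility


def rank_requests(requests):
    """Sort (service_id, priority_level, utility) tuples by decreasing W_i."""
    return sorted(
        requests,
        key=lambda r: network_request_value(r[1], r[2]),
        reverse=True,
    )


pending = [("d1", 1, 0.9), ("d2", 3, 0.5), ("d3", 2, 0.6)]
ranked = rank_requests(pending)
```

Here the values of W are 1.8, 4.0, and 2.4, so the ranking is d2, d3, d1: the exponential weight 2^p lets a high-priority request outrank a lower-priority one even when its utility is smaller.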

Fig. 5 shows the mapping of the smart grid slice resource management mechanism to RL.

Next, the slice allocation method based on reinforcement learning proposed in this application under the above model is introduced.

The algorithm is a Q-learning-based reinforcement learning algorithm for the RAN side and the CN side. The state sets, action sets, and reward functions of the RAN side and the CN side were expressed slightly differently above. Because the Q-learning algorithm applies generally to the proposed RL mapping models for both RAN and CN, for ease of presentation this section uses a unified notation: the state space is S = {s₁, s₂, ..., s_n}, the action space is A = {a₁, a₂, ..., a_n}, the reward function is R(s, a), and P(s, s') denotes the probability of transitioning from state s to state s'.

The ultimate goal of the slice controller is to find the optimal slicing policy π*, which is a mapping from the state set to the action set and must maximize the expected long-term discounted reward of each state:

The long-term discounted reward of state s is the discounted sum of the rewards obtained along the state trajectory, and is given by:

R(s, π(s)) + γR(s₁, π(s₁)) + γ²R(s₂, π(s₂)) + ...

where γ is the discount factor (0 < γ < 1), which determines the present value of future rewards. The optimization objective in formula (*) is the state value function of a given policy, which can be expressed as follows:

According to Bellman's optimality criterion, at least one optimal policy exists in a single environment setting. The state value function of the optimal policy is therefore given by:

The state transition probabilities depend on many factors, such as traffic load, service arrival and departure rates, and the decision algorithm, so they are not easy to obtain on either the radio side or the core network side. Model-free reinforcement learning is therefore well suited to deriving the optimal policy, because it does not require the expected rewards and state transition probabilities to be known as prior knowledge. Among the various existing RL algorithms, Q-learning is chosen.
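The long-term discounted reward R(s, π(s)) + γR(s₁, π(s₁)) + γ²R(s₂, π(s₂)) + ... can be evaluated for a finite reward trajectory as follows. This is a minimal sketch; the function name and the sample values are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
example = discounted_return([1.0, 1.0, 1.0], 0.5)
```

A smaller γ shrinks the contribution of later rewards, which is the sense in which the discount factor sets the present value of future rewards.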

Taking the RAN side as an example, the slice controller interacts with the wireless environment over very short discrete time intervals. The action-value function (also called the Q value) of a state-action pair (s, π(s)) is denoted Q(s, π(s)), defined as the expected long-term discounted reward of state s when policy π is used. The goal is to find an optimal policy that maximizes the Q value of each state s:

π*(s) = argmax_{a ∈ A} Q(s, a)

According to the Q-learning algorithm, the slice controller can learn the optimal Q values iteratively from the information available. At any time, the slice controller in state s can choose an action a; it then receives an immediate reward R_t and transitions to the next state s'. The Q-learning process can be expressed by the following update formula:

Q(s, a) ← Q(s, a) + α[R_t + γ·max_{a'} Q(s', a') - Q(s, a)]

where α is the learning rate, and the long-term return is the discounted accumulation of all immediate rewards R_t:

Σ_{t=0}^{∞} γ^t R_t
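A single tabular Q-learning step can be sketched as follows. The code uses the standard textbook update Q(s, a) ← Q(s, a) + α[R_t + γ·max_{a'} Q(s', a') - Q(s, a)]; the dictionary-based Q table, the function name, and the state and action labels are illustrative assumptions rather than details from the source.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Value of the best action available in the next state (0.0 if there is none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)  # all Q values start at 0
first = q_update(q_table, "s0", "a0", 1.0, "s1", ["a0", "a1"], alpha=0.5, gamma=0.9)
```

With all Q values at 0, a reward of 1.0, and α = 0.5, the first update moves Q(s0, a0) halfway toward the target, giving 0.5.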

The complete slicing policy is given by the following algorithm. Initially, the Q values are set to 0. Before the Q-learning algorithm is applied, the slice controller performs an initial slice allocation to the different slices based on the estimated power service traffic demand of each slice; this initializes the state of the different slices. Existing radio resource slicing solutions allocate radio resources to different slices using bandwidth-based or resource-based provisioning.

Because Q-learning is an online iterative learning algorithm, it performs two different types of operation. In exploration mode, the slice controller randomly selects a possible action in order to improve its future decisions. In exploitation mode, by contrast, the slice controller prefers actions that it has tried in the past and found effective. It is assumed that the slice controller in state s explores with probability ε and exploits the previously stored Q values with probability 1 - ε. Not every action is feasible in every state: to maintain isolation between slices, the slice controller must ensure that the same physical resource block (PRB) is not allocated to two different slices (on the RAN side).
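The exploration and exploitation rule and the PRB isolation check can be sketched as below. This is an illustration under assumptions: the Q table is a plain dictionary keyed by (state, action), an allocation is a mapping from slice name to a collection of PRB indices, and all function and variable names are hypothetical.

```python
import random


def prbs_are_disjoint(allocation):
    """Return True if no PRB index is assigned to more than one slice."""
    seen = set()
    for prbs in allocation.values():
        prbs = set(prbs)
        if seen & prbs:
            return False
        seen |= prbs
    return True


def select_action(q, state, feasible_actions, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the feasible actions of the current state."""
    if not feasible_actions:
        raise ValueError("no feasible action in this state")
    if rng.random() < epsilon:
        return rng.choice(feasible_actions)  # exploration
    # Exploitation: pick the feasible action with the highest stored Q value.
    return max(feasible_actions, key=lambda a: q.get((state, a), 0.0))
```

A caller would first filter candidate allocations with `prbs_are_disjoint` and pass only the surviving actions as `feasible_actions`, so that exploration never selects an allocation that breaks slice isolation.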

Corresponding to the method provided by the application, the application also provides a smart grid slice allocation device 600 based on reinforcement learning, comprising:

a classification unit 610, which classifies the power services of the smart grid by service type;

a class-to-slice mapping unit 620, which maps each class to a different slice;

a model construction unit 630, which constructs a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and uses the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.

The above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify or make equivalent substitutions to the specific embodiments of the invention, and any modification or equivalent substitution that does not depart from the spirit and scope of the invention falls within the scope of the pending claims of the invention.

Claims

1. A smart grid slice allocation method based on reinforcement learning, comprising:

classifying the power services of the smart grid by service type;

mapping each class to a different slice;

constructing a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and using the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.

2. The method according to claim 1, wherein classifying the power services of the smart grid by service type comprises:

dividing the power services of the smart grid by service type into a control class, an information collection class, and a mobile application class.

3. The method according to claim 1, wherein mapping each class to a different slice comprises:

mapping the control class to a uRLLC slice, the information collection class to an mMTC slice, and the mobile application class to an eMBB slice.

4. The method according to claim 1, wherein constructing the reinforcement learning model of the smart grid specifically comprises using the Q-learning algorithm to construct the reinforcement learning model of the smart grid.

5. The method according to claim 1, wherein constructing the reinforcement learning model of smart grid slicing comprises: constructing reinforcement learning models for the radio access side and the core network side respectively.

6. The method according to claim 1 or 4, wherein constructing the reinforcement learning model of smart grid slicing comprises:

defining the state space as S = {s₁, s₂, ..., s_n};

defining the action space as A = {a₁, a₂, ..., a_n};

the reward function is R(s, a), and P(s, s') denotes the probability of transitioning from state s to state s';

at any time, the slice controller in state s can choose an action a, receive an immediate reward R_t, and transition to the next state s'. The Q-learning process can be expressed by the following update formula:

Q(s, a) ← Q(s, a) + α[R_t + γ·max_{a'} Q(s', a') - Q(s, a)]

where α is the learning rate, γ is the discount factor, and the long-term return is the discounted accumulation of all immediate rewards R_t:

Σ_{t=0}^{∞} γ^t R_t

By updating the Q value over a sufficiently long period and adjusting the values of α and γ, Q(s, a) is guaranteed to converge eventually to its value under the optimal policy, i.e. Q(s, a) → Q*(s, a).

7. A smart grid slice allocation device based on reinforcement learning, comprising:

a model construction unit, which constructs a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and uses the reinforcement learning model to allocate slices to the smart grid, thereby achieving resource scheduling management of the smart grid.