CN110381541B - Smart grid slice distribution method and device based on reinforcement learning - Google Patents
- Publication number: CN110381541B (application CN201910452242.7A)
- Authority
- CN
- China
- Prior art keywords
- smart grid
- business
- reinforcement learning
- slices
- slice
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/24—Negotiating SLA [Service Level Agreement]; Negotiating QoS [Quality of Service]
Description
Technical field
This application relates to the field of network resource allocation for power wireless communication, and specifically to a smart grid slice allocation method based on reinforcement learning, as well as a corresponding smart grid slice allocation device.
Background
With the advent of the high-speed, ubiquitous, low-power, low-latency 5G era, communication across human society is becoming ever smoother. Network slicing is regarded as one of the key technologies of 5G networks: it divides a single physical network into multiple independent logical networks to support various vertical multi-service networks, and assigns these slices to different business scenarios according to their characteristics so as to meet differing service requirements. Network slicing can greatly reduce deployment cost and network resource occupancy.
Driven by growing energy and power demand, the world's power grids have moved from traditional networks into the smart grid era. Combined with the new round of energy transformation, developments in the communications field, and the global Internet strategic vision, 5G network slicing technology can for the first time be applied to smart grid services. For carrying grid-oriented wireless service applications, 5G network slicing offers customizable slices, safe and reliable inter-slice isolation, and unified slice management, along with the advantages of fast networking and cost efficiency, giving it broad application prospects in power systems. The integration of reinforcement-learning-based 5G network slicing with the smart grid is therefore an urgent problem to be solved.
Summary of the invention
This application provides a smart grid slice allocation method based on reinforcement learning, which solves the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.
This application provides a smart grid slice allocation method based on reinforcement learning, characterized in that it includes:
classifying the power services of the smart grid according to service type;
mapping the classes to different slices; and
constructing a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and through this model completing the allocation of smart grid slices and realizing resource scheduling management of the smart grid.
Preferably, classifying the power services of the smart grid according to service type includes:
dividing the power services of the smart grid by service type into a control class, an information collection class, and a mobile application class.
Preferably, mapping the classes to different slices includes:
mapping the control class to uRLLC slices, the information collection class to mMTC slices, and the mobile application class to eMBB slices.
Preferably, constructing the reinforcement learning model of the smart grid specifically uses the Q-learning algorithm.
Preferably, constructing the reinforcement learning model of smart grid slicing includes constructing reinforcement learning models for the radio access side and the core network side respectively.
Preferably, constructing the reinforcement learning model of smart grid slicing includes:
defining the state space as S = {s1, s2, ..., sn};
defining the action space as A = {a1, a2, ..., an}; and
defining the reward function as R(s, a), with P(s, s') denoting the probability of transitioning from state s to state s'.
At any moment, the slice controller in state s can choose an action a, obtain an immediate reward Rt, and transition to the next state s'. The Q-learning process can be expressed by the following update:

Q(s, a) ← Q(s, a) + α[Rt + γ·max_{a'} Q(s', a') − Q(s, a)],

where α is the learning rate and γ is the discount factor applied to the accumulation of all immediate rewards Rt. By updating the Q value over a sufficiently long duration and adjusting the values of α and γ, Q(s, a) can be guaranteed to converge to its value under the optimal policy, i.e. Q(s, a) → Q*(s, a).
This application also provides a smart grid slice allocation device based on reinforcement learning, characterized in that it includes:
a classification unit, which classifies the power services of the smart grid according to service type;
a class-to-slice mapping unit, which maps the classes to different slices; and
a model building unit, which constructs a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and through this model completes the allocation of smart grid slices and realizes resource scheduling management of the smart grid.
This application provides a smart grid slice allocation method based on reinforcement learning: the service types of the smart grid are classified, the classes are mapped to different slices, and the allocation of smart grid slices is completed through a constructed reinforcement learning model of smart grid slicing. This solves the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.
Brief description of the drawings
Figure 1 is a schematic flow chart of a smart grid slice allocation method based on reinforcement learning provided by an embodiment of this application;
Figure 2 is a schematic diagram of the slicing architecture in the smart grid scenario involved in an embodiment of this application;
Figure 3 is a schematic diagram of the relationship between slices and the three classes of smart grid services involved in an embodiment of this application;
Figure 4 shows the QoS indicators of typical smart grid service slices involved in an embodiment of this application;
Figure 5 shows the mapping of the smart grid slice resource management mechanism to RL involved in an embodiment of this application;
Figure 6 is a schematic diagram of a smart grid slice allocation device based on reinforcement learning provided by an embodiment of this application.
Detailed description
In the following description, numerous specific details are set forth to provide a thorough understanding of this application. However, this application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its substance; this application is therefore not limited by the specific implementations disclosed below.
Please refer to Figure 1, which shows a smart grid slice allocation method based on reinforcement learning provided by an embodiment of this application. The method is described in detail below in conjunction with Figure 1.
Step S101: classify the power services of the smart grid according to service type.
First, the slicing architecture in the smart grid scenario on which this application is based is introduced, as shown in Figure 2.
Network slicing uses SDN technology to decouple the control and data planes of the network and defines open interfaces between the two, enabling flexible definition of the network functions within a slice. To meet the needs of a given service, a network slice contains only the network functions supporting that service. Power services can be divided into three classes: control (e.g. distribution automation, precise load control), information collection (e.g. power consumption metering, transmission line monitoring), and mobile application (e.g. intelligent inspection, mobile operations).
Step S102: map the classes to different slices.
Figure 3 shows the relationship between the three types of slices and the three classes of smart grid services: the control class maps to uRLLC slices, the information collection class to mMTC slices, and the mobile application class to eMBB slices.
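For illustration only (the identifiers below are hypothetical, not part of the claims), the class-to-slice correspondence described here can be sketched as a lookup table:

```python
# Hypothetical encoding of the service-class-to-slice mapping described above.
SERVICE_TO_SLICE = {
    "control": "uRLLC",                # e.g. distribution automation, precise load control
    "information_collection": "mMTC",  # e.g. power consumption metering, line monitoring
    "mobile_application": "eMBB",      # e.g. intelligent inspection, mobile operations
}

def slice_for(service_class: str) -> str:
    """Return the slice type that carries the given smart grid service class."""
    return SERVICE_TO_SLICE[service_class]
```

For example, `slice_for("control")` returns `"uRLLC"`.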
Step S103: construct a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid, and through this model complete the allocation of smart grid slices and realize resource scheduling management of the smart grid.
Figure 4 gives the QoS indicators of typical smart grid service slices. This application considers a service plane, an orchestration control plane, and a data plane. The service plane divides services into elastic applications and real-time applications. Elastic applications can tolerate relatively large delays and have no minimum bandwidth requirement; examples include electric vehicle charging piles and distributed generation, video surveillance, and user metering. Real-time applications require the network to provide a minimum level of performance guarantee; the main representative type is uRLLC slice services, with distribution automation and emergency communication as typical examples. The data plane stores the data generated by the interaction between power equipment and the physical layer.
This application focuses on the orchestration control plane, introducing an access network SDN (software-defined networking) controller and a core network SDN controller, responsible respectively for network function (NF) management and coordination (such as service migration and deployment) in the access network and the core network. They act as two different agents that can communicate with each other to complete coordination jointly. Given prior knowledge of the service types, channel conditions, and user requirements on the service plane, the slice orchestration controller of the orchestration control plane divides the slice network into radio access network (RAN) side slices and core network (CN) side slices. The network slices on the RAN side and the CN side are managed by their respective SDN controllers, which execute the algorithms of their network sides, namely the reinforcement-learning-based smart grid slice allocation method proposed in this application.
The reinforcement learning models on the RAN side and the CN side proposed in this application are described below.
(1) RAN-side radio resource slicing
Given a series of existing slices χ1, χ2, ..., χn, let the vector χ = {χ1, χ2, ..., χn} denote the set of existing slices; these slices share an aggregate bandwidth B. There is also a series of service flows, denoted by the vector D = {d1, d2, ..., dm}; D is in fact the set of smart grid service flows. Given the multi-service nature of the smart grid, each slice service must satisfy different QoS requirements, but which smart grid service a given flow belongs to is unknown in advance, and the real-time demand of services in the smart grid scenario varies unstably. Each di (i ∈ M = {1, 2, ..., m}) obeys a specific traffic model.
First, the system state space, action space, and reward function of the RAN-side network must be defined. The interaction between the slice controller and the wireless environment is represented by the tuple [S, A, P(s, s'), R(s, a)], where S is the set of possible states, A is the set of possible actions, P(s, s') is the probability of transitioning from state s to state s', and R(s, a) is the reward associated with taking an action in state s, which is fed back to the slice controller. The mapping from radio-access-side slice resource management to RL is as follows.
A. State space:
The state space is defined as a set of tuples S = {s_slice}, where s_slice is a vector representing the state of all slices currently available to carry the relevant power services, whose n-th element describes the state of the n-th slice.
B. Action space:
Facing a time-varying, unknown service traffic model, the reinforcement learning agent must allocate suitable slice resources to the corresponding power services. The agent decides its next action based on the current slice state and the reward function. The action space is defined as A = {a_bandwidth}, where a_bandwidth denotes the agent allocating an appropriate bandwidth to each logically independent slice to carry the corresponding service.
Since network slicing shares network resources among virtual networks, the virtual network slices must be isolated from one another, so that congestion or failure on one slice whose resources cannot carry its current service does not affect the other slices. To guarantee slice isolation and maximize the utility of resource allocation, each slice is therefore restricted to carry at most one service. With a binary variable x_{i,n} ∈ {0, 1} indicating whether service di is carried on slice χn, this constraint can be written as Σ_{i∈M} x_{i,n} ≤ 1 for every slice n.
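A minimal sketch of this isolation check, assuming a hypothetical binary assignment matrix x with x[i][n] = 1 when service i is mapped to slice n (the encoding is illustrative, not from the patent):

```python
def satisfies_isolation(x):
    """Check slice isolation: each slice (column) carries at most one service (row).

    x: list of rows; x[i][n] in {0, 1}, 1 meaning service i is mapped to slice n.
    """
    num_slices = len(x[0]) if x else 0
    return all(sum(row[n] for row in x) <= 1 for n in range(num_slices))
```

For example, an assignment placing two services on the same slice fails the check.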
C. Reward function
After the agent assigns a specific slice to a smart grid service, it obtains a composite benefit, which we take as the system's reward. Control-class power services place very strict requirements on communication delay and bit error rate: communication failures or errors may affect the control execution of the grid and cause grid operation faults. Some mobile application services (such as inspection video transmission and high-definition video playback) need a guaranteed transmission rate and place higher demands on communication bandwidth. Power supply reliability means continuous, sufficient, high-quality power supply: for example, a reliability of 99.999% ("five nines") means that the average annual outage time per customer in a region does not exceed 5 minutes, while at 99.9999% ("six nines") it is reduced to about 30 seconds. Since spectrum resources on the RAN side are limited, an optimal strategy should be chosen when allocating slices so as to best satisfy users' QoS requirements.
Mainly considering the downlink, spectrum efficiency (SE) and delay are used as evaluation indicators. The spectrum efficiency of the system can be defined as the total delivered rate over the aggregate bandwidth, SE = (Σi Ri) / B.
According to the Shannon formula R = b·log2(1 + g_{BS→UE}·P / σ²), the actual rate from the base station (BS) to a user can be obtained, where g_{BS→UE} is the channel state information (CSI) between the base station and the device and follows Rayleigh fading.
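The per-user rate from the Shannon formula above can be computed directly; the numeric values in the example are illustrative only:

```python
import math

def shannon_rate(bandwidth_hz, channel_gain, tx_power, noise_power):
    """Achievable rate R = b * log2(1 + g*P / sigma^2), in bits per second."""
    snr = channel_gain * tx_power / noise_power
    return bandwidth_hz * math.log2(1 + snr)
```

With b = 1 MHz and an SNR of 3 (linear), the rate is 1e6 * log2(4) = 2 Mbit/s.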
To describe users' QoS requirements, we introduce a utility function: a curve mapping the bandwidth allocated to a slice service to the performance perceived by users. Here we assume that the services carried by slices can be divided into elastic applications and real-time applications.
(a) Elastic applications
For this type of application there is no minimum bandwidth requirement, since it can tolerate relatively large delays. The elastic traffic utility is modeled with a function of the form

Ue(b) = 1 − e^{−k·b/bmax},

where k is a tunable parameter that determines the shape of the utility function and ensures that the utility is close to 1 when the maximum requested bandwidth is received; even at very high bandwidth, user satisfaction for this application hardly reaches 1. We therefore consider that the bandwidth allocated to this application type should not exceed the maximum bandwidth bmax, even when network bandwidth is abundant.
(b) Real-time applications
Traffic of this application type requires the network to provide a minimum level of performance guarantee; if the allocated bandwidth falls below a certain threshold, the QoS becomes unacceptable. Real-time applications are modeled with a sigmoid utility function of the form

Urt(b) = 1 / (1 + e^{−k1·(b − k2)}),

where k1 and k2 are tunable parameters that determine the shape of the utility function (k2 acting as the bandwidth threshold).
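The exact functional forms in the original are not fully recoverable from this text, so the sketch below assumes the exponential and sigmoid shapes commonly used for elastic and real-time traffic in the slicing literature, matching the behavior described here (elastic utility saturating near, but never reaching, 1; real-time utility collapsing below a threshold):

```python
import math

def elastic_utility(b, b_max, k=5.0):
    """Elastic traffic: rises with bandwidth b, approaches (never reaches) 1.

    Assumed form: U_e(b) = 1 - exp(-k * b / b_max).
    """
    return 1.0 - math.exp(-k * b / b_max)

def realtime_utility(b, k1=1.0, k2=10.0):
    """Real-time traffic: sigmoid with threshold k2; utility collapses below it.

    Assumed form: U_rt(b) = 1 / (1 + exp(-k1 * (b - k2))).
    """
    return 1.0 / (1.0 + math.exp(-k1 * (b - k2)))
```

Both functions map an allocated bandwidth to a satisfaction level in (0, 1), which is what the reward below consumes.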
The reward of the learning agent is defined as follows:
R = λ·SE + μ·Ue + ξ·Urt
where λ, μ, ξ are the weights of SE, Ue, and Urt.
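The RAN-side reward is then a simple weighted sum; a one-line sketch (the default weights are arbitrary placeholders):

```python
def ran_reward(se, u_e, u_rt, lam=1.0, mu=1.0, xi=1.0):
    """RAN-side learning reward R = lambda*SE + mu*U_e + xi*U_rt."""
    return lam * se + mu * u_e + xi * u_rt
```

Tuning lam, mu, xi trades spectrum efficiency against the two utility terms.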
Mathematically, therefore, our problem can be formulated as

max E[Σt γ^t·Rt]  subject to  Σn bn ≤ B and the slice isolation constraint,

where each di (i ∈ M = {1, 2, ..., m}) obeys a specific traffic model. (*)
The key difficulty in solving problem (*) is that, because of the traffic model, service demand varies unstably and is not known in advance: the real-time demand changes of services in the smart grid scenario are unknown.
(2) Priority-scheduling-based core network slicing
Similarly, if computing resources are virtualized into the VNFs of each slice, the problem of allocating computing resources to each VNF can be solved in the same way as radio resource slicing. In this part we therefore discuss another important issue: priority-based core network slicing over common VNFs. The mapping used here differs slightly from radio resource slicing, reflecting the flexibility of RL. As before, the interaction between the slice controller and the core network side is represented by the tuple [S, A, P(s, s'), R(s, a)]; the appropriate mapping of the RL elements onto this slicing problem is defined below.
A. State space
On the core network side there are related service function chains (SFCs) that have the same basic functions but consume different computing processing units (CPUs) and produce different results (such as service queuing times). For example, based on business value or other smart-grid-related characteristics, service flows can be divided into three classes (say A, B, and C), with priority decreasing from A to C. The priority-based scheduling rule is defined as follows: SFC I gives priority to class A flows; SFC II treats class A and class B flows equally but serves class C flows with the lowest priority; SFC III treats all flows equally. Priority-based scheduling gives rise to service queuing times.
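The scheduling rules above can be sketched as a per-SFC priority table (the encoding is hypothetical; a lower number means served earlier, and SFC I's relative ordering of B versus C is an assumption, since the text only states that it prioritizes A):

```python
# Hypothetical encoding of the SFC scheduling rules described above.
SFC_PRIORITY = {
    "SFC-I":   {"A": 0, "B": 1, "C": 1},  # A first; B and C assumed equal
    "SFC-II":  {"A": 0, "B": 0, "C": 1},  # A and B equal, C lowest
    "SFC-III": {"A": 0, "B": 0, "C": 0},  # all classes equal
}

def dequeue_order(sfc, flows):
    """Order flows (list of (flow_id, cls)) for an SFC; stable within a class."""
    return sorted(flows, key=lambda f: SFC_PRIORITY[sfc][f[1]])
```

Flows of the same priority keep their arrival order, since Python's sort is stable.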
The state space can be defined as T = {Tq}, where Tq is a vector characterizing the queuing state of every element in the service set D. When N CPUs are used to process service di, the i-th element Tqi denotes the queuing time of service di, where i ∈ M = {1, 2, ..., m}.
B. Action space
The number of CPUs ultimately used by each SFC depends on the number of service flows it has processed. With a limited number of CPUs, each type of service flow must be scheduled to an appropriate SFC so that the queuing time remains acceptable. When processing service di, a suitable number of CPUs N_CPU must therefore be selected on the core network side. The action space is defined as A_CPU = {a_CPU}, where a_CPU denotes selecting the number of CPUs needed to perform the computation for an arriving service di (i ∈ M = {1, 2, ..., m}).
C. Reward function
To define the reward function, we first need a utility function U characterizing the sensitivity of the current service to delay, and then define a new metric, the "network request value" function W, to characterize the service's priority.
As mentioned above, elastic and real-time applications are described with the utility functions Ue and Urt, which respectively characterize the QoS requirements of service di. Compared with the RAN side, the difference is that the independent variable becomes the number n of CPUs required on the core network side to process service di. This, however, only reflects the QoS requirements of the different services. Because computing resources are limited, after they are allocated a reasonable scheduling rule is needed to decide which service is processed first; the "network request value" function W is therefore introduced to characterize service priority. For any application service di, the network request value to be satisfied is defined as:
Wi = 2^p · Ui
where p is the priority level of service di and Ui is an element of the set formed by the elastic and real-time utilities, i.e. Ui ∈ {Ue, Urt}. The weight 2^p of a service request expresses the importance of that request relative to other requests. The reward function is defined as:
R = Wi
The formula above only yields the current priority of a single service di; we need the priority queuing of a whole series of services, and must therefore maximize the accumulated long-term reward, i.e. max Σt γ^t·Rt.
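Reading the priority weight as 2 raised to the priority level p (as the "weight 2^(p)" wording suggests), the network request value can be sketched directly:

```python
def request_value(priority, utility):
    """Network request value W = 2**p * U; higher p means a more important request."""
    return (2 ** priority) * utility
```

Two services with the same utility but priorities 0 and 3 differ in value by a factor of 8.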
Figure 5 shows the mapping of the smart grid slice resource management mechanism to RL.
The reinforcement-learning-based slice allocation method proposed in this application, in the context of the model above, is introduced next.
It is a Q-learning-based reinforcement learning algorithm for the RAN and CN sides. The formulations of the state sets, action sets, and reward functions on the RAN and CN sides above differ slightly, but under the proposed mapping of RL to the RAN and CN the Q-learning algorithm applies universally. For convenience, in this section we unify the notation: the state space is S = {s1, s2, ..., sn}, the action space is A = {a1, a2, ..., an}, the reward function is R(s, a), and P(s, s') denotes the probability of transitioning from state s to state s'.
The ultimate goal of the slice controller is to find the optimal slicing policy π*, a mapping from the state set to the action set that maximizes the expected long-term discounted reward of every state, i.e. π* = argmax_π V^π(s).
The long-term discounted reward of state s is the discounted sum of the rewards obtained along the state trajectory, given by
R(s, π(s)) + γ·R(s1, π(s1)) + γ²·R(s2, π(s2)) + ...
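The long-term discounted reward written out above is a geometric sum over the trajectory; a small sketch:

```python
def discounted_return(rewards, gamma):
    """Sum_t gamma^t * R_t over a finite reward trajectory, with 0 < gamma < 1."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For rewards [1, 1, 1] and gamma = 0.5 this gives 1 + 0.5 + 0.25 = 1.75.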
where γ is the discount factor (0 < γ < 1), determining the present value of future rewards. The optimization objective in (*) is the state value function of an arbitrary policy π, which can be expressed as V^π(s) = E[Σt γ^t·R(st, π(st)) | s0 = s].
According to Bellman's optimality criterion, at least one optimal policy exists in a single-environment setting. The state value function of the optimal policy is therefore given by V*(s) = max_a [R(s, a) + γ·Σ_{s'} P(s, s')·V*(s')].
The state transition probabilities depend on many factors, such as traffic load, service arrival and departure rates, and the decision algorithm, and may therefore not be easy to obtain on either the wireless side or the core network side. Model-free reinforcement learning is well suited to deriving the optimal policy here, since it requires neither the expectation of the reward nor the state transition probabilities as prior knowledge. Among the various existing RL algorithms, we choose Q-learning.
Taking the RAN side as an example, the slice controller interacts with the wireless environment over short discrete time periods. The action-value function (also called the Q value) of the state-action pair (s, π(s)) is written Q(s, π(s)), defined as the expected long-term discounted reward of state s when policy π is used. Our goal is to find an optimal policy that maximizes the Q value of every state s, i.e. Q*(s, a) = max_π Q^π(s, a).
According to the Q-learning algorithm, the slice controller can iteratively learn the optimal Q values from the information it has already gathered. At any time, a slice controller in state s can choose an action a; this yields an immediate reward R_t and moves the controller to the next state s'. The Q-learning process can be expressed by the following update formula:
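A common form of the standard tabular Q-learning update, using the learning rate α and immediate reward R_t defined in the surrounding text, is:

```latex
Q(s,a) \;\leftarrow\; (1-\alpha)\, Q(s,a) \;+\; \alpha\Big[\, R_t \;+\; \gamma \max_{a'} Q(s', a') \,\Big]
```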
where α is the learning rate and the update target is the discounted accumulation of all the immediate rewards R_t:
By updating the Q values over a sufficiently long duration and by tuning the values of α and γ, Q(s,a) is guaranteed to eventually converge to its value under the optimal policy, i.e. the optimal Q value.
The entire slicing strategy is given by the following algorithm. Initially, all Q values are set to 0. Before the Q-learning algorithm is applied, the slice controller performs an initial allocation to the different slices based on an estimate of each slice's power service traffic demand; this initializes the state of each slice. Existing wireless resource slicing solutions use bandwidth-based or resource-based provisioning to allocate radio resources to the different slices.
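A minimal sketch of the demand-proportional initial provisioning described above; the slice names and demand figures are illustrative placeholders, not values from the patent:

```python
def initial_allocation(demand_estimates, total_prbs):
    """Split the total PRBs across slices in proportion to estimated traffic demand."""
    total_demand = sum(demand_estimates.values())
    alloc = {s: int(total_prbs * d / total_demand)
             for s, d in demand_estimates.items()}
    # Hand any rounding remainder to the most demanding slice,
    # so every PRB is assigned to exactly one slice.
    remainder = total_prbs - sum(alloc.values())
    busiest = max(demand_estimates, key=demand_estimates.get)
    alloc[busiest] += remainder
    return alloc

demands = {"control": 10.0, "metering": 30.0, "video": 60.0}
print(initial_allocation(demands, 100))  # → {'control': 10, 'metering': 30, 'video': 60}
```

The allocation then serves only as the starting state; the Q-learning loop adjusts it over time.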
Because Q-learning is an online iterative learning algorithm, it performs two different types of operations. In exploration mode, the slice controller randomly selects a possible action in order to improve its future decisions. In exploitation mode, by contrast, the slice controller prefers actions it has tried in the past and found to work well. We assume that a slice controller in state s explores with probability ε and exploits previously stored Q values with probability 1−ε. Not all actions are feasible in every state: to maintain isolation between slices, the slice controller must ensure that the same physical resource block (PRB) is never allocated to two different slices (on the RAN side).
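The ε-greedy choice between exploration and exploitation can be sketched as follows; the action names are placeholders, and the caller is assumed to pass only the feasible actions (those that do not assign an already-used PRB to a second slice):

```python
import random

def select_action(q_row, feasible, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to feasible actions.

    q_row:    dict mapping action -> Q(s, a) for the current state s
    feasible: actions that preserve slice isolation (no shared PRBs)
    """
    if rng.random() < epsilon:
        # Exploration: try a random feasible action.
        return rng.choice(list(feasible))
    # Exploitation: best previously stored Q value among feasible actions.
    return max(feasible, key=lambda a: q_row[a])

q = {"a0": 0.2, "a1": 0.9, "a2": 0.5}
print(select_action(q, ["a0", "a2"], epsilon=0.0))  # → a2
```

Note that the infeasible action a1, despite having the largest Q value, is never chosen, which enforces the isolation constraint.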
Corresponding to the method provided by this application, this application also provides a reinforcement-learning-based smart grid slice allocation device 600, characterized by comprising:
a classification unit 610, which classifies the power services of the smart grid according to service type;
a classification-to-slice mapping unit 620, which maps the classifications to different slices;
a model building unit 630, which constructs a reinforcement learning model of smart grid slicing according to the service indicators of the smart grid; through this reinforcement learning model, the allocation of smart grid slices is completed and resource scheduling management of the smart grid is realized.
This application provides a reinforcement-learning-based smart grid slice allocation method: the service types of the smart grid are classified, the classifications are mapped to different slices, and the allocation of smart grid slices is completed through the constructed reinforcement learning model of smart grid slicing. This solves the problem of integrating reinforcement-learning-based 5G network slicing technology with the smart grid.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify or make equivalent substitutions for the specific embodiments of the present invention, and any such modifications or equivalent substitutions that do not depart from the spirit and scope of the invention fall within the protection scope of the pending claims of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910452242.7A CN110381541B (en) | 2019-05-28 | 2019-05-28 | Smart grid slice distribution method and device based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110381541A CN110381541A (en) | 2019-10-25 |
CN110381541B true CN110381541B (en) | 2023-12-26 |
Family
ID=68248856
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910452242.7A Active CN110381541B (en) | 2019-05-28 | 2019-05-28 | Smart grid slice distribution method and device based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110381541B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255347B (en) * | 2020-02-10 | 2022-11-15 | 阿里巴巴集团控股有限公司 | Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment |
CN111292570B (en) * | 2020-04-01 | 2021-09-17 | 广州爱浦路网络技术有限公司 | Cloud 5GC communication experiment teaching system and teaching method based on project type teaching |
CN111953510B (en) * | 2020-05-15 | 2024-02-02 | 中国电力科学研究院有限公司 | Smart grid slice wireless resource allocation method and system based on reinforcement learning |
CN111726811B (en) * | 2020-05-26 | 2023-11-14 | 国网浙江省电力有限公司嘉兴供电公司 | A slice resource allocation method and system for cognitive wireless networks |
CN111711538B (en) * | 2020-06-08 | 2021-11-23 | 中国电力科学研究院有限公司 | Power network planning method and system based on machine learning classification algorithm |
CN112383427B (en) * | 2020-11-12 | 2023-01-20 | 广东电网有限责任公司 | 5G network slice deployment method and system based on IOTIPS fault early warning |
CN112365366B (en) * | 2020-11-12 | 2023-05-16 | 广东电网有限责任公司 | Micro-grid management method and system based on intelligent 5G slice |
CN112737813A (en) * | 2020-12-11 | 2021-04-30 | 广东电力通信科技有限公司 | Power business management method and system based on 5G network slice |
CN113316188B (en) * | 2021-05-08 | 2022-05-17 | 北京科技大学 | A method and device for intelligent slice management and control of an access network supporting an AI engine |
CN113225759B (en) * | 2021-05-28 | 2022-04-15 | 广东电网有限责任公司广州供电局 | Network slice safety and decision management method for 5G smart power grid |
CN113329414B (en) * | 2021-06-07 | 2023-01-10 | 深圳聚创致远科技有限公司 | Smart power grid slice distribution method based on reinforcement learning |
CN113630733A (en) * | 2021-06-29 | 2021-11-09 | 广东电网有限责任公司广州供电局 | Network slice distribution method and device, computer equipment and storage medium |
CN113840333B (en) * | 2021-08-16 | 2023-11-10 | 国网河南省电力公司信息通信公司 | Methods, devices, electronic equipment and storage media for power grid resource allocation |
CN114531403A (en) * | 2021-11-15 | 2022-05-24 | 海盐南原电力工程有限责任公司 | Power service network distinguishing method and system |
CN115460613B (en) * | 2022-04-14 | 2024-07-26 | 国网福建省电力有限公司 | A secure application and management method for power 5G slices |
CN115913966A (en) * | 2022-12-06 | 2023-04-04 | 中国联合网络通信集团有限公司 | Virtual network function deployment method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102238631A (en) * | 2011-08-17 | 2011-11-09 | 南京邮电大学 | Method for managing heterogeneous network resources based on reinforcement learning |
CN108965024A (en) * | 2018-08-01 | 2018-12-07 | 重庆邮电大学 | A kind of virtual network function dispatching method of the 5G network slice based on prediction |
CN109495907A (en) * | 2018-11-29 | 2019-03-19 | 北京邮电大学 | A kind of the wireless access network-building method and system of intention driving |
CN109600262A (en) * | 2018-12-17 | 2019-04-09 | 东南大学 | Resource self-configuring and self-organization method and device in URLLC transmission network slice |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||