CN114666220A - Resource allocation method and device for network slice, storage medium and electronic equipment - Google Patents

Resource allocation method and device for network slice, storage medium and electronic equipment

Info

Publication number
CN114666220A
Authority
CN
China
Prior art keywords
network
slice
service
resource
performance
Prior art date
Legal status
Pending
Application number
CN202210291243.XA
Other languages
Chinese (zh)
Inventor
郭益民
张园
史敏锐
杨明川
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202210291243.XA
Publication of CN114666220A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • H04L 41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L 41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L 41/5019 Ensuring fulfilment of SLA
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure belongs to the technical field of mobile communication and relates to a resource configuration method and device for a network slice, a storage medium, and an electronic device. The method comprises the following steps: acquiring initial network resources and slice types, and initializing the initial network resources according to the slice types to obtain service network resources; performing resource modeling on the service network resources according to the slice types to obtain slice performance, and establishing an optimization target according to the slice performance; and performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice. The method and device jointly consider time delay and energy consumption and model the slice performance from the perspective of overall performance, so that service energy consumption is reduced on the basis of meeting service delay and efficiency requirements and service-type-oriented network slice deployment is realized. An artificial intelligence algorithm is used to design appropriate network slicing strategies for different service types, which realizes effective allocation of network resources, optimizes the service experience of users, and improves user retention.

Description

Resource allocation method and device for network slice, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of mobile communications technologies, and in particular, to a resource allocation method for a network slice, a resource allocation device for a network slice, a computer-readable storage medium, and an electronic device.
Background
With the rapid development of 5G (fifth-generation mobile communication technology), wireless networks have greatly improved in throughput, reliability, number of connections, transmission delay, and the like, and diversified, refined service scenarios have emerged. Based on this, the "logical separation, service matching" network slicing technology can plan specific service function chains and enhance the self-organization and self-management capabilities of the network, and has attracted wide attention from both academia and industry.
However, research on network slices is still relatively limited at present; most theoretical research remains focused on core-network slices and conventional cellular-network slices, considers only the allocation of communication resources, and does not involve other network resources.
In view of the above, there is a need in the art to develop a new resource allocation method and apparatus for network slice.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a resource allocation method for a network slice, a resource allocation apparatus for a network slice, a computer-readable storage medium, and an electronic device, which overcome, at least to some extent, the problems of incomplete resource allocation and poor allocation effect for network slices caused by the limitations of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present invention, there is provided a resource allocation method for a network slice, the method including:
acquiring initial network resources and slice types, and initializing the initial network resources according to the slice types to obtain service network resources;
performing resource modeling on the service network resource according to the slice type to obtain slice performance, and establishing an optimization target according to the slice performance;
and performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice.
In an exemplary embodiment of the invention, the slice type includes: information services, control services, and entertainment services.
In an exemplary embodiment of the present invention, the performing resource modeling on the service network resource according to the slice type to obtain slice performance includes:
establishing a cache indicator variable corresponding to the slice type, wherein the cache indicator variable is used for characterizing whether slice performance is consumed;
and establishing allocated service resources corresponding to the slice type, and performing resource modeling on the service network resources, the allocated network resources, and the cache indicator variable to obtain the slice performance.
In an exemplary embodiment of the present invention, the performing resource modeling on the service network resource, the allocated network resource, and the cache indicator variable to obtain the slice performance includes:
modeling the cached part of the service network resource, the allocated network resource, and the cache indicator variable to obtain a first performance;
modeling the non-cached part of the service network resource, the allocated network resource, and the cache indicator variable to obtain a second performance;
and obtaining the slice performance according to the first performance and the second performance.
In an exemplary embodiment of the present invention, the performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice includes:
carrying out decision modeling according to the optimization target to obtain a Markov decision process, wherein the Markov decision process comprises a state parameter and the optimization target;
and, based on the Markov decision process, performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine the target network slice.
In an exemplary embodiment of the present invention, the performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice includes:
training a deep network algorithm model to be trained in deep reinforcement learning to obtain a trained deep network algorithm model;
and performing optimal strategy solution on the optimization target by using the trained deep network algorithm model to determine the target network slice.
In an exemplary embodiment of the present invention, the training of the deep network algorithm model to be trained in the deep reinforcement learning to obtain the trained deep network algorithm model includes:
initializing model parameters of a deep network algorithm model to be trained, and inputting the state parameters into the initialized deep network algorithm model to be trained to obtain a target state;
storing the state parameters and the target state according to the model parameters to obtain an experience pool, and training a neural network model in the deep network algorithm model to be trained by utilizing training samples in the experience pool to obtain a target network model;
and updating the model parameters according to a loss function based on the target network model to obtain a trained deep network algorithm model.
According to a second aspect of the embodiments of the present invention, there is provided a resource allocation apparatus for network slice, including:
the type dividing module is configured to acquire initial network resources and slice types, and initialize the initial network resources according to the slice types to obtain service network resources;
the performance modeling module is configured to perform resource modeling on the service network resources according to the slice types to obtain slice performance, and establish an optimization target according to the slice performance;
and the strategy solving module is configured to perform optimal strategy solution on the optimization target by utilizing deep reinforcement learning to determine a target network slice.
In an exemplary embodiment of the invention, the slice type includes: information services, control services, and entertainment services.
In an exemplary embodiment of the present invention, the performing resource modeling on the service network resource according to the slice type to obtain slice performance includes:
establishing a cache indicator variable corresponding to the slice type, wherein the cache indicator variable is used for characterizing whether slice performance is consumed;
and establishing allocated service resources corresponding to the slice type, and performing resource modeling on the service network resources, the allocated network resources, and the cache indicator variable to obtain the slice performance.
In an exemplary embodiment of the present invention, the performing resource modeling on the service network resource, the allocated network resource, and the cache indicator variable to obtain the slice performance includes:
modeling the cached part of the service network resource, the allocated network resource, and the cache indicator variable to obtain a first performance;
modeling the non-cached part of the service network resource, the allocated network resource, and the cache indicator variable to obtain a second performance;
and obtaining the slice performance according to the first performance and the second performance.
In an exemplary embodiment of the present invention, the performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice includes:
carrying out decision modeling according to the optimization target to obtain a Markov decision process, wherein the Markov decision process comprises a state parameter and the optimization target;
and, based on the Markov decision process, performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine the target network slice.
In an exemplary embodiment of the present invention, the performing optimal strategy solution on the optimization target by using deep reinforcement learning to determine a target network slice includes:
training a deep network algorithm model to be trained in deep reinforcement learning to obtain a trained deep network algorithm model;
and performing optimal strategy solution on the optimization target by using the trained deep network algorithm model to determine the target network slice.
In an exemplary embodiment of the present invention, the training of the deep network algorithm model to be trained in the deep reinforcement learning to obtain the trained deep network algorithm model includes:
initializing model parameters of a deep network algorithm model to be trained, and inputting the state parameters into the initialized deep network algorithm model to be trained to obtain a target state;
storing the state parameters and the target state according to the model parameters to obtain an experience pool, and training a neural network model in the deep network algorithm model to be trained by utilizing training samples in the experience pool to obtain a target network model;
and updating the model parameters according to a loss function based on the target network model to obtain a trained deep network algorithm model.
According to a third aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a processor and a memory; wherein the memory has stored thereon computer readable instructions which, when executed by the processor, implement the resource configuration method of the network slice in any of the above exemplary embodiments.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the resource configuration method of the network slice in any of the above-described exemplary embodiments.
As can be seen from the foregoing technical solutions, the resource allocation method for a network slice, the resource allocation apparatus for a network slice, the computer storage medium, and the electronic device in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
in the method and the device provided by the exemplary embodiment of the disclosure, resource modeling is performed on service network resources according to slice types to obtain slice performance, time delay and energy consumption are jointly considered, and the slice performance is modeled and designed from the global performance, so that a resource allocation effect of reducing service energy consumption as much as possible on the basis of meeting service time delay and efficiency requirements is realized, and service type-oriented network slice deployment is realized. Furthermore, in the process of designing the target network slice, the deep reinforcement learning is applied, and the artificial intelligence algorithm is used for designing the appropriate network slice strategy according to different service types, so that the effective allocation of network resources is realized, the service experience of a user is optimized, and the user reflux degree is improved to a certain extent.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically illustrates a flow chart of a resource configuration method of a network slice in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of deriving slice performance through resource modeling in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method for further resource modeling resulting in slice performance in an exemplary embodiment of the disclosure;
FIG. 4 schematically illustrates a flow diagram of a method for optimal strategy solution using deep reinforcement learning in an exemplary embodiment of the present disclosure;
FIG. 5 is a flow diagram schematically illustrating a method for optimal policy resolution using a deep network algorithm model in an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of training a deep web algorithm model in an exemplary embodiment of the disclosure;
fig. 7 is a flowchart schematically illustrating a resource configuration method of a network slice in an application scenario in an exemplary embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of an MEC assisted in-vehicle networking slicing scenario in an application scenario in an exemplary embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a resource configuration apparatus for network slicing in an exemplary embodiment of the present disclosure;
fig. 10 schematically illustrates an electronic device for implementing a resource configuration method for network slicing in exemplary embodiments of the present disclosure;
fig. 11 schematically illustrates a computer-readable storage medium for implementing a resource configuration method for network slicing in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/parts/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first" and "second", etc. are used merely as labels, and are not limiting on the number of their objects.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities.
With the rapid development of 5G communication technology, wireless networks have greatly improved in throughput, reliability, number of connections, transmission delay, and the like, and diversified, refined service scenarios have emerged.
The traditional one-size-fits-all network architecture uses a group of vertically integrated network elements to provide all functions of the network; it does not support flexible and dynamic expansion of the network and can hardly meet differentiated service requirements.
With the aid of Software-Defined Networking (SDN) and Network Function Virtualization (NFV) technologies, the "logical separation, service matching" network slicing technology can plan specific service function chains and enhance the self-organization and self-management capabilities of the network, and has attracted wide attention from both academia and industry.
Standards organizations such as 3GPP (3rd Generation Partnership Project) have also proposed that the radio access network should support slice design, and have carried out research on related resource management technologies through working groups such as SA2, SA5, and RAN3.
In recent years, the evolution of vehicle intelligence and the development of multi-access edge computing have made the types of business of the internet of vehicles more complicated and diversified, and the resources required by the business are not limited to communication resources, but rather require the assistance of computing resources and storage resources of edge facilities.
Different Internet of Vehicles services differ greatly in their requirements for information transmission quality; for example, autonomous-driving services require low latency, while entertainment services require high data rates. The Internet of Vehicles therefore needs to design appropriate network slicing strategies for different service types so as to realize effective allocation of network resources and guarantee users' service experience and traffic safety.
In addition, with the development of Artificial Intelligence (AI) technology, how to implement reasonable scheduling of network resources by using an Artificial Intelligence algorithm has become a research hotspot. Therefore, real-time deployment of network slices also requires the assistance of AI algorithms.
Currently, there is little research on network slices, and most theoretical research still remains on core network slices and conventional cellular network slices, and only the allocation of communication resources is considered, and no other network resources are involved.
In the prior art, a method for allocating resources in a network slice based on resource allocation priority design is provided under the condition that multiple user terminals and multiple data packet streams coexist, and resource allocation is performed by taking the sum of weighted transmission rates in a maximized slice as an optimization target.
The method has the characteristic of low complexity, can realize flexible, quick and dynamic adjustment and distribution of resources in the slice, and can obviously improve the resource utilization rate and the service experience of the user terminal.
However, this method only considers the allocation of communication resources in the 5G mobile network and does not involve other network resources.
In addition, a joint optimization method for content caching decisions and resource allocation based on mobile edge computing in the Internet of Vehicles has also been proposed. On the premise of guaranteeing the delay requirements of vehicle users, the method aims to maximize the system benefit by means of content caching decisions, channel allocation, MEC (Mobile Edge Computing) server computing resource allocation, and the like.
This method realizes resource allocation through content caching decisions and channel allocation followed by reinforcement learning, but it does not allocate the various resources from the perspective of global performance and does not sufficiently consider the problem of energy consumption.
In order to solve the problems in the related art, the present disclosure provides a resource allocation method for network slices. Fig. 1 shows a flowchart of a resource allocation method of a network slice, and as shown in fig. 1, the resource allocation method of the network slice at least includes the following steps:
and S110, acquiring initial network resources and slice types, and initializing the initial network resources according to the slice types to obtain service network resources.
And S120, performing resource modeling on the service network resources according to the slice types to obtain slice performance, and establishing an optimization target according to the slice performance.
And S130, carrying out optimal strategy solution on the optimized target by utilizing deep reinforcement learning to determine a target network slice.
In the exemplary embodiments of the disclosure, resource modeling is performed on service network resources according to slice types to obtain slice performance; time delay and energy consumption are jointly considered, and the slice performance is modeled and designed from the perspective of overall performance, so that service energy consumption is reduced as much as possible on the basis of meeting service delay and efficiency requirements, and service-type-oriented network slice deployment is achieved. Furthermore, by applying deep reinforcement learning to the design of the target network slice, an artificial intelligence algorithm is used to design appropriate network slicing strategies for different service types, so that effective allocation of network resources is realized, the service experience of users is optimized, and user retention is improved to a certain extent.
The following describes each step of the resource allocation method for network slices in detail.
In step S110, the initial network resource and the slice type are obtained, and the initial network resource is initialized according to the slice type to obtain the service network resource.
In an exemplary embodiment of the present disclosure, in an MEC-assisted communication scenario, an MEC server with computing capability F is deployed at the RSU (Road Side Unit) side, the storage space of the RSU is M, and the total V2I (Vehicle-to-Infrastructure) communication bandwidth is B.
Edge computing is a product of the convergence of Information and Communication Technology (ICT) and combines increasingly mature technologies such as SDN/NFV, big data, and artificial intelligence. As 5G networks become key infrastructure for digital transformation in various industries, MEC has also become a key technology supporting operators' 5G network transformation, meeting the development needs of services such as high-definition video, Virtual Reality/Augmented Reality (VR/AR), the industrial internet, and the Internet of Vehicles.
The MEC provides a building block combination of connection, calculation, capability and application on the basis of edge network and edge calculation resources, and provides service for users nearby.
The MEC provides services and cloud computing functions needed by users nearby, creates a service environment with high performance, low delay and high bandwidth, accelerates the rapid downloading of various contents, services and applications in the network, and enables consumers to enjoy uninterrupted high-quality network experience.
The MEC has various advantages, such as network and service cooperation, realization of differentiated customization and flexible routing, creation of intelligent connection with low time delay and high bandwidth; the cloud edge capability is coordinated, the cloud service boundary is extended, the cloud service quality is improved, and a convenient and ubiquitous cloud is created; the method provides a brand-new service which is based on 'connection + calculation' and takes connection as an access point and flexibly combines calculation, capacity and application, and breaks through a business boundary.
Based on this, the initial network resources may include computing resources, storage space, and communication bandwidth, where the computing resource is F, the storage space is M, and the communication bandwidth is B.
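For illustration only, the total edge resources of such a scenario can be collected into a small configuration object; the Python identifiers and example values below are assumptions made for this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RsuResources:
    """Total resources of the MEC-assisted RSU (illustrative placeholder values)."""
    computing_capacity_F: float   # total computing capability F (e.g. CPU cycles/s)
    storage_space_M: float        # total storage space M (e.g. MB)
    bandwidth_B: float            # total V2I communication bandwidth B (e.g. MHz)

# Example values chosen only for illustration.
TOTAL = RsuResources(computing_capacity_F=10e9, storage_space_M=500.0, bandwidth_B=20.0)
```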
According to the QoS (Quality of Service) of services in the MEC-assisted communication scenario and their differing requirements for communication bandwidth, computing resources, and storage space, the services may be divided into three slice types.
Network resources are always limited; whenever network resources are contended for, quality-of-service requirements arise. Quality of service is relative among network services: guaranteeing the service quality of certain types of traffic may come at the expense of the service quality of other traffic.
For example, in the case of a fixed network total bandwidth, if a certain type of service occupies more bandwidth, the less bandwidth can be used by other services, which may affect the use of other services. Therefore, a network manager needs to reasonably plan and allocate network resources according to the characteristics of various services, so that the network resources are efficiently utilized.
For network traffic, QoS includes transmission bandwidth, transmission delay, packet loss rate of data, and the like. In the network, the service quality can be improved by ensuring the transmission bandwidth, reducing the transmission time delay, reducing the packet loss rate of data, reducing the time delay jitter and other measures.
In general, QoS provides three service models, which are Best-Effort service (Best-Effort service model), Integrated service (Int-Serv, Integrated service model), and Differentiated service (Diff-Serv, Differentiated service model).
Among them, Best-Effort is a single and also the simplest service model. In the Best-Effort service model, the network delivers packets as best it can, but provides no guarantee of performance such as delay or reliability.
The Best-Effort service model is the default service model of a network and is implemented with FIFO (First In First Out) queues. It is suitable for most network applications, such as FTP (File Transfer Protocol) and E-Mail.
Int-Serv is a comprehensive service model that can meet a variety of QoS requirements. The model uses the Resource Reservation Protocol (RSVP), RSVP running on each device from source to destination, which can monitor each flow to prevent it from consuming too many resources. The system can clearly distinguish and ensure the service quality of each service flow, and provides the finest service quality distinction for the network.
However, the Int-Serv model places high requirements on devices; when the number of data flows in the network is large, the storage and processing capabilities of the devices are under great pressure. The Int-Serv model has poor scalability and is difficult to deploy in an Internet core network.
Diff-Serv is a multi-service model that can meet different QoS requirements. Unlike Int-Serv, it does not need to inform the network to reserve resources for each service. The differentiated service is simple to realize and good in expansibility.
In an alternative embodiment, the slice type, comprises: information services, control services, and entertainment services.
Further, the initial network resource may be initialized according to different slice types to obtain the serving network resource.
Specifically, at any given time there may be N1, N2, and N3 users requesting the three classes of service, respectively.
The information service includes information acquisition and processing, and requires a small computing task f_i^1 (i = 1, …, N1) to be executed at the RSU; the requested service content has a given transmission size, and its minimum delay requirement is T1.
The control service includes the fusion processing of sensor-perceived information with other infrastructure information, and requires a large computing task f_j^2 (j = 1, …, N2) to be executed at the RSU; the requested service content has a given transmission size, and its minimum delay requirement is T2.
The entertainment service includes web browsing, high-definition video, and other entertainment applications; the requested service content has a given transmission size, and its minimum delay requirement is T3.
Considering the actual situation, it is assumed that the service contents requested by different users at the same time are different from one another. The service content of the information and control services is derived from the pending environment information collected by the RSU, whereas the service content of the entertainment service, when not cached, has to be obtained by the RSU from the core network over the backhaul link; the download rate from the core network is therefore assumed to be fixed at r0.
Where a backhaul link refers to a connection from an access network or cell site to a switching center. The switching center is connected to a backbone network, which is connected to a core network. Thus, a backhaul link network is an intermediate layer of any telecommunications network architecture that is located between an access network and a backbone network, providing an important connection for the two networks.
For example, when a user accesses the internet in an internet cafe using Wi-Fi, the Wi-Fi device must connect back to an ISP (Internet Service Provider), and this link task can be taken on by WiMAX (Worldwide Interoperability for Microwave Access). This functionality helps the service provider reduce the cost of backhaul transport.
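The three service classes described above can likewise be captured in a small configuration structure. The following is a minimal Python sketch with hypothetical field names and example numbers; only the quantities N1, N2, N3, the computing tasks f, the delay requirements T1, T2, T3, and the backhaul rate r0 correspond to the notation used in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SliceProfile:
    """Per-slice-type service requirements (illustrative values only)."""
    name: str
    num_users: int           # N1, N2 or N3
    compute_per_task: float  # computing task f executed at the RSU (0 if none)
    content_size: float      # transmission size of the requested content (MB)
    delay_bound: float       # delay requirement T1, T2 or T3 (seconds)
    cacheable_at_rsu: bool   # whether the content may already be stored in the RSU

CORE_NETWORK_RATE_R0 = 50.0  # fixed backhaul download rate r0 (Mbit/s), example value

SLICES = [
    SliceProfile("information",   num_users=8, compute_per_task=1e8, content_size=2.0,
                 delay_bound=0.05, cacheable_at_rsu=True),
    SliceProfile("control",       num_users=4, compute_per_task=1e9, content_size=8.0,
                 delay_bound=0.02, cacheable_at_rsu=True),
    SliceProfile("entertainment", num_users=6, compute_per_task=0.0, content_size=50.0,
                 delay_bound=0.50, cacheable_at_rsu=True),
]
```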
In step S120, resource modeling is performed on the service network resource according to the slice type to obtain slice performance, and an optimization target is established according to the slice performance.
In an exemplary embodiment of the present disclosure, after initializing the initial network resource according to the slice type to obtain the service network resource, the service network resource may be resource modeled according to the slice type to obtain the slice performance.
In an alternative embodiment, fig. 2 shows a flow diagram of a method for deriving slice performance by resource modeling, which, as shown in fig. 2, comprises at least the following steps: in step S210, a cache indication variable corresponding to the slice type is established, and the cache indication variable is used for characterizing whether the slice performance is consumed.
Considering the goals of green and low-carbon construction, the energy consumption of services needs to be reduced as much as possible on the basis of meeting service delay and rate requirements. Therefore, the communication bandwidth, computing resources, and storage space of the RSU are allocated by jointly considering system delay and energy consumption, so as to realize service-type-oriented network slice deployment.
Thus, cache indicator variables taking values in {0, 1} are defined for the contents corresponding to the information service, the control service, and the entertainment service.
When the indicator equals 1, the requested content is already stored in the RSU and can be transmitted directly without processing; when the indicator equals 0, the content corresponding to the user request is not stored in the RSU, and the service delay and energy consumption need to be calculated.
In step S220, allocated service resources corresponding to the slice type are established, and resource modeling is performed on the service network resources, the allocated network resources, and the cache indicator variable to obtain the slice performance.
Suppose the communication bandwidth allocated by the communication network to the information, control, and entertainment services is B1, B2, and B3, respectively, the computing resources allocated to the information and control services are F1 and F2, the storage space allocated to the three services is M1, M2, and M3, and resources are evenly distributed among the users within each slice.
Thus, the allocated service resources corresponding to each slice type include communication bandwidth, computing resources, and storage space: the information service is allocated communication bandwidth B1, computing resource F1, and storage space M1; the control service is allocated communication bandwidth B2, computing resource F2, and storage space M2; and the entertainment service is allocated communication bandwidth B3 and storage space M3.
Furthermore, resource modeling can be performed on the service network resources, the allocated network resources, and the cache indicator variables to obtain the slice performance.
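A concrete allocation decision for the three slices can then be represented as a vector and checked against the resource totals; the sketch below uses illustrative identifiers (not from the disclosure) and verifies the bandwidth, computing, and storage budgets that reappear later as constraints C1, C2, and C3.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SliceAllocation:
    """Resources allocated to the information, control and entertainment slices."""
    bandwidth: Tuple[float, float, float]  # (B1, B2, B3)
    computing: Tuple[float, float]         # (F1, F2); the entertainment slice needs no computing
    storage: Tuple[float, float, float]    # (M1, M2, M3)

def satisfies_resource_constraints(alloc: SliceAllocation,
                                   total_B: float, total_F: float, total_M: float,
                                   tol: float = 1e-6) -> bool:
    """Check C1: B1+B2+B3=B, C2: F1+F2=F, C3: M1+M2+M3=M."""
    return (abs(sum(alloc.bandwidth) - total_B) <= tol
            and abs(sum(alloc.computing) - total_F) <= tol
            and abs(sum(alloc.storage) - total_M) <= tol)
```

In a reinforcement-learning agent, such a check can be used either to mask infeasible actions or to renormalize an allocation so that the equality constraints hold by construction.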
In an alternative embodiment, fig. 3 shows a flow chart of a method for further performing resource modeling to obtain slice performance, and as shown in fig. 3, the method at least includes the following steps: in step S310, a first performance is obtained by performing cache portion modeling on the serving network resource, the allocated network resource, and the cache indicator variable.
For any user of the information service whose requested content is cached at the RSU, the content can be transmitted directly without processing, so the service delay caused by the cached part is the downlink transmission delay of the requested content from the RSU to the user, and the RSU energy consumption caused by the cached part is the corresponding transmission energy.
For any user of the control service whose requested content is cached, the service delay caused by the cached part is likewise the downlink transmission delay of the requested content, and the RSU energy consumption caused by the cached part is the corresponding transmission energy.
For any user of the entertainment service, since uncached entertainment service content is obtained from the core network through the backhaul link, the service delay caused by the cached part is the downlink transmission delay of the requested content, and no RSU energy consumption due to caching is incurred.
Where a backhaul link refers to a connection from an access network or cell site to a switching center. The switching center is connected to a backbone network, which is connected to a core network. Thus, a backhaul link network is an intermediate layer of any telecommunications network architecture that is located between an access network and a backbone network, providing an important connection for the two networks.
For example, when a user surfs the Internet in a bar with Wi-Fi, the Wi-Fi device must connect back to the ISP end, and the link task can be assumed by WiMAX. This functionality helps the service provider reduce the cost of backhaul transport.
In step S320, modeling the non-cache portion of the serving network resource, the allocated network resource, and the cache indicator to obtain a second performance.
For any user of the information service whose requested content is not cached, the service delay caused by the non-cached part consists of the delay of computing the task f_i^1 on the computing resource allocated to the information slice plus the downlink transmission delay of the content, and the RSU energy consumption caused by the non-cached part consists of the corresponding computation energy and transmission energy.
For any user of the control service whose requested content is not cached, the service delay caused by the non-cached part consists of the delay of computing the task f_j^2 on the computing resource allocated to the control slice plus the downlink transmission delay of the content, and the RSU energy consumption caused by the non-cached part is modeled in the same way.
For any user of the entertainment service whose requested content is not cached, the service delay caused by the non-cached part consists of the time to download the content from the core network over the backhaul link at rate r0 plus the downlink transmission delay to the user, and the RSU energy consumption caused by the non-cached part is the corresponding transmission energy.
in step S330, a slicing performance is obtained from the first performance and the second performance.
After the cache part modeling and the non-cache part modeling are carried out on the service network resources, the allocation network resources and the cache indicating variables, the first performance and the second performance of different slice types can be obtained respectively, and therefore the slice performance of different slice types can be obtained according to the first performance and the second performance.
Specifically, the first performance and the second performance may be summed to obtain the corresponding slice performance.
For any user of the information service, the total service delay is obtained by combining the delays of the cached part and the non-cached part according to the cache indicator variable, and the total RSU energy consumption is obtained from the cached-part and non-cached-part energy consumption in the same way. Here, P denotes the RSU transmit power, g_i denotes the path loss, n0 denotes the noise power, and ω denotes the energy conversion coefficient of the MEC processor.
For any user of the control service, the total service delay and RSU energy consumption are obtained analogously, where g_j denotes the path loss of the control-service user.
For any user of the entertainment service, the total service delay and RSU energy consumption are obtained analogously, where g_k denotes the path loss of the entertainment-service user.
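To make the delay and energy terms concrete, the sketch below assumes a commonly used MEC model: a Shannon-type downlink rate built from P, g and n0, per-user shares of the slice bandwidth and computing resource, and a computation energy scaled by ω. The expressions in this sketch are illustrative assumptions, not the disclosure's exact formulas.

```python
import math

def downlink_rate(slice_bandwidth: float, num_users: int, tx_power: float,
                  path_loss: float, noise_power: float) -> float:
    """Assumed per-user rate: (B_m / N_m) * log2(1 + P * g / n0)."""
    return (slice_bandwidth / num_users) * math.log2(1.0 + tx_power * path_loss / noise_power)

def user_delay_and_energy(cached: bool, content_size: float, compute_task: float,
                          slice_compute: float, num_users: int, rate: float,
                          tx_power: float, omega: float,
                          backhaul_rate: float) -> tuple:
    """Assumed delay and RSU energy for serving a single user request.

    cached        -- cache indicator (True when the content is already stored at the RSU)
    compute_task  -- CPU cycles to execute at the RSU (0 for the entertainment service)
    backhaul_rate -- r0, used when uncached content must be fetched from the core network
    """
    tx_delay = content_size / rate                      # downlink transmission delay
    if cached:
        # cached part: transmit directly, no processing
        return tx_delay, tx_power * tx_delay
    if compute_task > 0.0:
        # non-cached information / control request: compute at the RSU, then transmit
        per_user_cpu = slice_compute / num_users
        comp_delay = compute_task / per_user_cpu
        comp_energy = omega * compute_task * per_user_cpu ** 2
        return comp_delay + tx_delay, comp_energy + tx_power * tx_delay
    # non-cached entertainment request: fetch over the backhaul at r0, then transmit
    fetch_delay = content_size / backhaul_rate
    return fetch_delay + tx_delay, tx_power * tx_delay
```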
In this exemplary embodiment, the performance of the different types of network slices is modeled and analyzed by setting an optimization function for each slice type. The slice performance includes both time delay and energy consumption, which together cover the global performance and provide a data basis and theoretical support for allocating the multiple resources of communication bandwidth, computing resources, and storage space.
After the slice performance is obtained through resource modeling, an optimization objective may be established based on the slice performance.
The optimization objective r is established from the service delay and the RSU energy consumption of all users of the three slices, weighted by the influence factors α and β, and is optimized subject to the following constraints:
s.t. C1: B1 + B2 + B3 = B
C2: F1 + F2 = F
C3: M1 + M2 + M3 = M
C4, C5, C6: constraints on the cache decisions of the information, control, and entertainment services
C7, C8, C9: the service delay of each user of the information, control, and entertainment services does not exceed T1, T2, and T3, respectively
Here, α and β are the influence factors of the delay and the energy consumption of the different service types on the system performance; C1, C2, and C3 are the constraints on the allocation of communication bandwidth, computing resources, and storage space; C4, C5, and C6 are the cache-decision constraints; and C7, C8, and C9 are the delay constraints of the different service types.
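A reward signal matching this kind of objective can be sketched as the negative weighted sum of delay and energy with a penalty for violated delay bounds; the weights and penalty form below are illustrative assumptions rather than the disclosure's exact objective.

```python
def slice_reward(delays_per_slice, energies_per_slice, delay_bounds,
                 alpha: float = 1.0, beta: float = 0.1,
                 violation_penalty: float = 10.0) -> float:
    """Assumed reward: negative weighted delay + energy, minus a penalty per violated bound.

    delays_per_slice / energies_per_slice -- one list of per-user values per slice type
    delay_bounds                          -- (T1, T2, T3)
    """
    total_delay = sum(sum(d) for d in delays_per_slice)
    total_energy = sum(sum(e) for e in energies_per_slice)
    violations = sum(1
                     for user_delays, bound in zip(delays_per_slice, delay_bounds)
                     for d in user_delays if d > bound)
    return -(alpha * total_delay + beta * total_energy) - violation_penalty * violations
```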
In step S130, the optimization target is optimally solved by deep reinforcement learning to determine a target network slice.
In an exemplary embodiment of the present disclosure, after the optimization objective is established, the optimization objective may be optimally policy solved using deep reinforcement learning to determine the target network slice.
Reinforcement Learning is an important branch of Machine Learning (ML) methods in the field of artificial intelligence, and is also an effective means for handling multi-stage decision problems.
Deep Learning (DL), as a branch of ML, involves perceptrons with multiple hidden layers and mainly adopts various methods based on artificial neural networks to implement ML, learning features autonomously; it has been successfully applied to computer vision, translation, semantic mining, image processing, and other areas.
Although deep learning has strong perception capability, it lacks decision-making capability to a certain extent; reinforcement learning, in turn, has decision-making capability but is poorly suited to perception problems.
Therefore, the two are combined, the advantages are complemented, and a solution is provided for the perception decision problem of a complex system. Deep Reinforcement Learning (Deep RL), which is developed from Reinforcement Learning and Deep Learning, has become one of popular research targets in the field of artificial intelligence.
The deep reinforcement learning combines the perception capability of the deep learning and the decision capability of the reinforcement learning, can be directly controlled according to the input image, and is an artificial intelligence method closer to the human thinking mode.
Deep reinforcement learning is a technology which is emerging in recent years and combines the deep learning technology and the reinforcement learning technology. The deep reinforcement learning has the capability of performing pattern recognition on a high-dimensional state in a complex system and performing action output on the basis of the pattern recognition.
In the machine learning terminology, deep reinforcement learning is expressed as a trial and error process driven by rewards, that is, the Agent continuously corrects the action strategy in the trial and error by repeatedly interacting with a complex environment with the lapse of time, and finally obtains the maximum expected accumulated benefit to obtain a series of strategy sequences.
Based on deep reinforcement learning, learning can be carried out by continuously interacting with the environment, trying, making errors, and summarizing. Deep RL is suitable for control, decision-making, and complex system optimization tasks, and has huge potential application space in fields such as games, autonomous-driving control and decision-making, robot control, finance, and industrial system control and optimization.
In an alternative embodiment, fig. 4 shows a flow chart of a method for performing optimal strategy solution using deep reinforcement learning, as shown in fig. 4, the method at least comprises the following steps: in step S410, a markov decision process is obtained by performing decision modeling according to the optimization target, where the markov decision process includes a state parameter and the optimization target.
The resource allocation optimization problem of the network slices is modeled as a Markov Decision Process (MDP): the system state, i.e. the state parameter s, includes the resource allocation of the different network slices, the cache decision of the RSU, the business service quality of each user, and the energy consumption of the RSU; the system action, i.e. the action parameter a, is the decision to adjust the allocation of the RSU and MEC communication bandwidth resources, computing resources, and storage space resources to the different slices and the caching decision; and the system reward function may be the optimization objective r.
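One possible way to realize this state and action definition in code is to flatten the listed quantities into a state vector and enumerate a discrete set of adjustment actions; the encoding below is only an illustrative sketch.

```python
import numpy as np

def build_state(bandwidth, computing, storage, cache_decisions,
                user_qos, rsu_energy) -> np.ndarray:
    """State s: per-slice allocation, RSU cache decisions, per-user QoS, and RSU energy."""
    return np.concatenate([
        np.asarray(bandwidth, dtype=np.float32),        # (B1, B2, B3)
        np.asarray(computing, dtype=np.float32),        # (F1, F2)
        np.asarray(storage, dtype=np.float32),          # (M1, M2, M3)
        np.asarray(cache_decisions, dtype=np.float32),  # cache indicator variables
        np.asarray(user_qos, dtype=np.float32),         # e.g. per-user service delay
        np.asarray([rsu_energy], dtype=np.float32),
    ])

# Action a: a discrete adjustment of one resource or of a cache decision, for example:
EXAMPLE_ACTIONS = [
    ("bandwidth", 0, 2, 1.0),   # move 1 MHz from the information slice to the entertainment slice
    ("compute",   1, 0, 1e8),   # move 1e8 cycles/s from the control slice to the information slice
    ("cache",     5, 1),        # cache content item 5 at the RSU
]
```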
The markov decision process is a mathematical model of sequential decision (sequential decision) for simulating stochastic strategies and returns that can be realized by an agent in an environment where the system state has markov properties.
The markov decision process is built on the basis of a set of interactive objects, namely agents and environments, with elements including states, actions, policies and rewards.
In the simulation of a Markov decision process, the agent perceives the current system state and acts on the environment according to a policy, thereby changing the state of the environment and receiving a reward; the accumulation of rewards over time is referred to as the return.
The theoretical basis of the Markov decision process is the Markov chain, and it is therefore also regarded as a Markov model that takes actions into account. A Markov decision process built on discrete time is called a discrete-time Markov decision process (discrete-time MDP); otherwise it is called a continuous-time Markov decision process (continuous-time MDP). Furthermore, there are several variants of the Markov decision process, including partially observable Markov decision processes, constrained Markov decision processes, and fuzzy Markov decision processes.
In terms of applications, the Markov decision process is used to model reinforcement learning problems in machine learning. By using methods such as dynamic programming and stochastic sampling, a Markov decision process can solve for an agent policy that maximizes the return, and it finds application in topics such as automatic control and recommendation systems.
During reinforcement learning, the agent and the environment keep interacting. At each time t, the agent receives a state s from the environment and, based on it, takes an action a, which then acts on the environment, so that the agent receives a reward R_{t+1} and reaches a new state. The interaction between the agent and the environment thus generates a sequence.
The Markov decision process is a formulation of a typical sequential decision process; with the Markov assumption, solving the sequential decision problem becomes more convenient and practical.
In step S420, based on the markov decision process, performing optimal strategy solution on the optimized target by using deep reinforcement learning to determine a target network slice.
In an alternative embodiment, fig. 5 shows a flowchart of a method for performing optimal policy solution by using a deep network algorithm model, as shown in fig. 5, the method at least includes the following steps: in step S510, a deep network algorithm model to be trained in deep reinforcement learning is trained to obtain a trained deep network algorithm model.
Reinforcement learning algorithms can be divided into three broad categories: value-based, policy-based, and actor-critic.
A value-based algorithm represented by DQN (Deep Q Network) is common, and the algorithm has only one value function Network and no policy Network.
The algorithm is characterized by calculating a state value V(s) or a state action value Q (s, a), and optimizing the strategy by improving the value.
Typical algorithms are Q-learning, Sarsa, DQN, DDQN, among others.
Starting from the Q-learning algorithm, which is the root of this series of algorithms, the core idea is to establish a value table indexed by states and actions, called the Q table.
Actions are selected based on this table to interact with the environment, the table is then updated according to the reward fed back, and the process repeats until the Q table converges.
The Q-learning algorithm has the advantages of simplicity and fast convergence, because it adopts single-step updates and does not need to wait until the end of an episode. Meanwhile, Q-learning is an off-policy algorithm: the behaviour policy used to select actions differs from the target policy used to compute the Q value; actions are selected with an ε-greedy policy, while the greedy policy is used when computing the Q value, which ensures exploration while avoiding convergence to a local optimum.
Of course, the Q-learning algorithm also has obvious drawbacks: it only applies to scenarios in which the state and action spaces are discrete and small. When the state and action spaces are too large, the Q table becomes huge and is updated slowly.
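For reference, the tabular Q-learning update and ε-greedy behaviour policy just described can be written in a few lines; this is the generic textbook form, not something specific to this disclosure.

```python
import numpy as np

def q_learning_update(Q: np.ndarray, s: int, a: int, r: float, s_next: int,
                      lr: float = 0.1, gamma: float = 0.9) -> None:
    """Single off-policy update: Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])

def epsilon_greedy(Q: np.ndarray, s: int, epsilon: float = 0.1) -> int:
    """Behaviour policy: explore with probability epsilon, otherwise act greedily."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(Q.shape[1]))
    return int(np.argmax(Q[s]))
```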
The Q-learning algorithm cannot be used in scenarios with a large state space, because computing the Q value then becomes intractable; deep neural networks, however, are precisely good at handling such high-dimensional inputs and large amounts of computation. Thus, combining a deep neural network with Q-learning yields DQN, which also opened up research in the field of deep reinforcement learning. Its main advantage is that it is suitable for scenarios with large state spaces.
An optimal slice resource allocation scheme is then solved by using the DQN algorithm model in deep reinforcement learning according to the system state, the system action, and the system reward function of the Markov decision process, so as to determine the target network slice.
Thus, a function approximator such as a neural network model may be used to estimate the action-value function of the Q network, Q(s, a; θ), where the parameter θ represents the weights of the neural network model and the value of θ is updated iteratively.
The target of each iteration is y_i = r_i + γ · max Q*(s_{i+1}, a_{i+1}), where γ denotes the discount factor and the maximum Q*(s_{i+1}, a_{i+1}) is taken over all actions. The Q network is trained by minimizing the loss function L_i(θ) = E[(y_i − Q(s_i, a_i; θ))²] at each iteration i.
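A neural-network function approximator and the loss above can be sketched with PyTorch as follows; the layer sizes are arbitrary, and the target y_i is computed with a separate target network holding the weights θ⁻, as in standard DQN.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q value per discrete action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def dqn_loss(online: QNetwork, target: QNetwork, batch, gamma: float = 0.99) -> torch.Tensor:
    """L_i(theta) = E[(y_i - Q(s_i, a_i; theta))^2], y_i = r_i + gamma * max_a Q(s_{i+1}, a; theta-)."""
    states, actions, rewards, next_states = batch
    q_sa = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = rewards + gamma * target(next_states).max(dim=1).values
    return nn.functional.mse_loss(q_sa, y)
```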
In an alternative embodiment, fig. 6 is a flow chart illustrating a method for training a deep network algorithm model, as shown in fig. 6, the method at least includes the following steps: in step S610, model parameters of the deep network algorithm model to be trained are initialized, and state parameters are input into the initialized deep network algorithm model to be trained to obtain a target state.
Specifically, the relevant model parameters of the DQN model are initialized, including the capacity N of the experience pool D, the online network weights θ, and the target network weights θ⁻ = θ.
The DQN algorithm model is trained over X episodes; in each episode, the steps starting from obtaining the target state through step S630 are repeated.
Furthermore, the state parameter, i.e. the initial state s_1, is input into the initialized DQN algorithm model to be trained, and steps S620 and S630 are performed for t = 1 to t = T.
In step S620, the state parameters and the target state are stored according to the model parameters to obtain an experience pool, and a neural network model in the deep network algorithm model to be trained is trained using training samples from the experience pool to obtain a target network model.
With probability ε a random action a_t is selected; otherwise the action is chosen greedily as a_t = argmax_a Q(s_t, a; θ). Executing a_t yields the reward r_t and the next state s_{t+1}.
Further, the state parameter s_t at time t, the action a_t, the reward r_t and the next state s_{t+1} can be stored as an experience in the experience pool D.
Specifically, the sample (s_t, a_t, r_t, s_{t+1}) at time t is stored in the experience pool D, and a small mini-batch of samples (s_i, a_i, r_i, s_{i+1}) is drawn at random from the experience pool to train the neural network model in the deep network algorithm model to be trained; the target value y_i is calculated so as to obtain the trained target network model.
In step S630, based on the target network model, the model parameters are updated according to the loss function, so as to obtain a trained deep network algorithm model.
After the target value y_i has been calculated, the loss function L_i(θ) can be used to update the model parameter θ by gradient descent.
In addition, the target network weight may be synchronized every Y steps so that θ⁻ = θ.
When θ⁻ is sufficiently close to θ, the condition for finishing the training of the deep network algorithm model to be trained is satisfied, and the trained deep network algorithm model is obtained.
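The training procedure of steps S610–S630 can be summarized by the sketch below. The environment interface (reset, step, sample_action), the episode count X, the horizon T and the synchronization interval Y are placeholders assumed for illustration; any Q network, such as the one sketched earlier, can be passed in as online_net and target_net:

```python
import random
from collections import deque
import numpy as np
import torch

def train_dqn(env, online_net, target_net, episodes=100, steps_per_episode=200,
              pool_capacity=10_000, batch_size=32, sync_every=50,
              epsilon=0.1, gamma=0.9, lr=1e-3):
    # Experience pool D with capacity N; online weights theta; target weights theta^- = theta.
    pool = deque(maxlen=pool_capacity)
    target_net.load_state_dict(online_net.state_dict())
    optimizer = torch.optim.Adam(online_net.parameters(), lr=lr)
    step = 0

    for _ in range(episodes):                        # X episodes
        s = env.reset()                              # starting state s_1
        for _ in range(steps_per_episode):           # t = 1 .. T
            # epsilon-greedy action selection on the online network
            if random.random() < epsilon:
                a = env.sample_action()
            else:
                with torch.no_grad():
                    q = online_net(torch.tensor(np.array(s), dtype=torch.float32))
                    a = int(q.argmax())
            r, s_next = env.step(a)                  # observe reward r_t and next state s_{t+1}
            pool.append((s, a, r, s_next))           # store the sample in experience pool D
            s = s_next

            if len(pool) >= batch_size:
                s_b, a_b, r_b, sn_b = zip(*random.sample(list(pool), batch_size))
                s_b = torch.tensor(np.array(s_b), dtype=torch.float32)
                sn_b = torch.tensor(np.array(sn_b), dtype=torch.float32)
                a_b = torch.tensor(a_b, dtype=torch.int64)
                r_b = torch.tensor(r_b, dtype=torch.float32)
                with torch.no_grad():                # y_i = r_i + gamma * max_a' Q(s_{i+1}, a'; theta^-)
                    y = r_b + gamma * target_net(sn_b).max(dim=1).values
                q_sa = online_net(s_b).gather(1, a_b.unsqueeze(1)).squeeze(1)
                loss = torch.nn.functional.mse_loss(q_sa, y)
                optimizer.zero_grad()
                loss.backward()                      # gradient descent on L_i(theta)
                optimizer.step()

            step += 1
            if step % sync_every == 0:               # every Y steps set theta^- = theta
                target_net.load_state_dict(online_net.state_dict())
```

Keeping θ⁻ frozen between synchronizations stabilizes the regression target, which is the usual motivation for maintaining a separate target network.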
In the exemplary embodiment, by training the deep network algorithm model, a solution model can be provided for optimal strategy solution, accuracy of target network slice output is guaranteed, and effective allocation of resources and user service experience are further guaranteed.
In step S520, the trained deep network algorithm model is used to perform optimal strategy solution on the optimized target to determine a target network slice.
The deep network algorithm model after a large amount of training can output an optimal target network slice scheme, and the target network slice can include contents such as communication bandwidth, computing resources, storage space and cache decision.
In the exemplary embodiment, the target network slice can be determined through the trained deep network algorithm model, and the network resources are flexibly and dynamically allocated in a customized manner according to the service requirements of the user, so that the service experience, efficiency and safety of the user are guaranteed.
The following describes a resource allocation method for a network slice in the embodiment of the present disclosure in detail with reference to an application scenario.
Fig. 7 is a flowchart illustrating a resource allocation method for a network slice in an application scenario, and as shown in fig. 7, in step S710, an MEC-assisted internet of vehicles slice scenario is constructed.
Fig. 8 shows a schematic diagram of the MEC-assisted Internet of Vehicles slicing scenario in this application scenario. As shown in fig. 8, with the development of intelligent connected vehicles, multiple different services will coexist in the Internet of Vehicles, and the performance requirements of each service differ.
For example, traffic safety and road information services transmit small amounts of data but have strict delay requirements; automatic driving services rely on the perception data of cameras, laser radars (lidar), millimeter-wave radars and other devices, so the data volume is large and the delay requirement is extremely strict; entertainment services such as video and music demand high data transmission efficiency and need to maintain a stable connection.
Edge computing is a product of ICT convergence. Combined with increasingly mature technologies such as SDN/NFV, big data and artificial intelligence, and as 5G networks become key infrastructure for the digital transformation of various industries, MEC has become a key technology supporting operators' 5G network transformation, meeting the service development needs of high-definition video, VR/AR, the industrial internet, the Internet of Vehicles and the like.
The MEC provides a building block combination of connection, calculation, capability and application on the basis of edge network and edge calculation resources, and provides service for users nearby.
The MEC provides services and cloud computing functions needed by users nearby, creates a service environment with high performance, low delay and high bandwidth, accelerates the rapid downloading of various contents, services and applications in the network, and enables consumers to enjoy uninterrupted high-quality network experience.
MEC has various advantages: it enables network and service coordination, differentiated customization and flexible routing, creating intelligent connections with low delay and high bandwidth; it coordinates cloud and edge capabilities, extends the cloud service boundary and improves cloud service quality, creating a convenient and ubiquitous cloud; and it offers a brand-new "connection + computing" service that takes connectivity as the entry point and flexibly combines computing, capabilities and applications, expanding the business boundary.
In order to allocate Internet of Vehicles network resources reasonably for different service types, network slice types are divided according to the QoS of Internet of Vehicles services and their requirements on communication bandwidth, computing resources and storage space.
Network resources are always limited, and wherever resources are contended for there are quality-of-service requirements. Quality of service is relative among network services: guaranteeing the quality of service of certain types of traffic may come at the expense of the quality of service of other traffic.
For example, in the case of a fixed network total bandwidth, if a certain type of service occupies more bandwidth, the less bandwidth can be used by other services, which may affect the use of other services. Therefore, a network manager needs to reasonably plan and allocate network resources according to the characteristics of various services, so that the network resources are efficiently utilized.
For network traffic, QoS includes transmission bandwidth, transmission delay, packet loss rate of data, and the like. In the network, the service quality can be improved by ensuring the transmission bandwidth, reducing the transmission time delay, reducing the packet loss rate of data, reducing the time delay jitter and other measures.
In general, QoS provides three service models: the Best-Effort service, Integrated Services (Int-Serv) and Differentiated Services (Diff-Serv).
Among them, Best-Effort is a single, and also the simplest, service model. Under the Best-Effort model the network delivers packets as well as it can, but provides no guarantee of delay, reliability or other performance.
The Best-Effort service model is the default service model of the network, implemented through FIFO queues. It is suitable for most network applications, such as FTP, E-Mail, etc.
Int-Serv is an integrated service model that can meet a variety of QoS requirements. The model uses the Resource Reservation Protocol (RSVP), which runs on every device from source to destination and can monitor each flow to prevent it from consuming too many resources. It can clearly distinguish and guarantee the quality of service of each service flow, providing the network with the finest-grained differentiation of service quality.
However, the Int-Serv model places high demands on devices; when the number of flows in the network is large, the storage and processing burden on devices becomes heavy. The Int-Serv model also scales poorly and is difficult to deploy in an Internet core network.
Diff-Serv is a multi-service model that can meet different QoS requirements. Unlike Int-Serv, it does not need to signal the network to reserve resources for each flow. Differentiated services are simple to implement and scale well.
In the MEC-assisted communication scenario, an MEC server with computing capability F is deployed on the RSU side, the storage space of the RSU is M, and the total V2I communication bandwidth is B.
Therefore, the initial network resources may include a computation resource, a storage space, and a communication bandwidth, and the computation resource is F, the storage space is M, and the communication bandwidth is B.
In step S711, the slice type is divided.
According to the service quality under the MEC-assisted communication scene and the difference of the requirements on communication bandwidth, computing resources and storage space, the service can be divided into three slice types.
Wherein the slice type may include: information services, control services, and entertainment services.
Specifically, the information service may be a traffic information service, the control service may be an autopilot service, and the entertainment service may be an in-vehicle entertainment service.
In step S712, network resources and user traffic requirements are initialized.
The traffic information service comprises information acquisition, processing and the like; a small amount of computation tasks f_i^1 (i = 1, …, N_1) needs to be executed at the RSU, the requested content has a given transmission size, and the minimum delay requirement of the service content is T_1.
The automatic driving service includes fusion processing of sensor perception information with other infrastructure and the like; a large amount of computation tasks needs to be executed at the RSU, the requested content has a given transmission size, and the minimum delay requirement of the service content is T_2.
The in-vehicle entertainment service comprises web browsing, high-definition video and other entertainment services; the requested content has a given transmission size, and the minimum delay requirement of the service content is T_3.
Considering the actual situation, it is assumed that the service contents requested by different vehicles are different at the same time.
The service content of the traffic information service and the automatic driving service is derived from the to-be-processed environment information collected by the RSU, whereas the content of the in-vehicle entertainment service, when not cached, has to be fetched by the RSU from the core network over the backhaul link; the download rate from the core network is therefore assumed to be fixed at r_0.
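For orientation, the initialized quantities of step S712 can be gathered into a small data structure as sketched below; all numeric values are illustrative placeholders rather than parameters disclosed in this application:

```python
from dataclasses import dataclass

@dataclass
class SliceDemand:
    name: str
    compute_task: float      # computation to execute at the RSU (cycles), 0 if none
    transmit_size: float     # transmission size of the requested content (bits)
    max_delay: float         # minimum delay requirement T_k (seconds)

# Total RSU resources: computing capability F, storage space M, V2I bandwidth B.
F_TOTAL, M_TOTAL, B_TOTAL = 10e9, 8e9, 100e6

# Illustrative per-user demands for the three slice types.
slice_demands = {
    "traffic_info":  SliceDemand("traffic information service", 1e6, 2e5, 0.05),
    "auto_drive":    SliceDemand("automatic driving service",   5e7, 5e6, 0.01),
    "entertainment": SliceDemand("in-vehicle entertainment",    0.0, 2e7, 0.5),
}

R0 = 50e6  # fixed download rate from the core network over the backhaul link
```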
In step S720, a network slicing policy is designed.
Considering the construction goals of green and low-carbon operation, the energy consumption of each service should be reduced as much as possible while the service delay and rate requirements are met. Therefore, the communication bandwidth, computing resources and storage space of the RSU are allocated by jointly considering system delay and energy consumption, so as to realize service-type-oriented network slice deployment.
In step S721, the different slice performances, including delay and energy consumption, are analyzed by modeling.
Cache indicator variables are defined for the content corresponding to the traffic information service, the automatic driving service and the in-vehicle entertainment service, each taking the value 0 or 1. When an indicator equals 1, the corresponding content is already stored at the RSU and can be transmitted directly to the requesting user without further processing; when an indicator equals 0, the content requested by the corresponding user is not stored at the RSU, and the service delay and energy consumption need to be calculated.
The communication bandwidth resources allocated by the main communication network to the traffic information service, the automatic driving service and the in-vehicle entertainment service are B_1, B_2 and B_3 respectively; the computing resources are F_1 and F_2 respectively; the storage space resources are M_1, M_2 and M_3 respectively; and the resources within a slice are evenly distributed among its users.
Therefore, the allocated service resources established for each slice type include communication bandwidth, computing resources and storage space: for the traffic information service the communication bandwidth is B_1, the computing resources are F_1 and the storage space is M_1; for the automatic driving service the communication bandwidth is B_2, the computing resources are F_2 and the storage space is M_2; for the in-vehicle entertainment service the communication bandwidth is B_3 and the storage space is M_3.
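Following the description above, the per-slice allocation (B_k, F_k, M_k) and the 0/1 cache indicators can be represented as in this sketch; the concrete numbers and field names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SliceAllocation:
    bandwidth: float          # B_k: share of the V2I communication bandwidth
    compute: float = 0.0      # F_k: share of RSU computing capability (0 for entertainment)
    storage: float = 0.0      # M_k: share of RSU storage space
    # One 0/1 cache indicator per requested content item of this slice:
    # 1 -> content already cached at the RSU, 0 -> must be fetched or computed.
    cache_flags: list = field(default_factory=list)

allocation = {
    "traffic_info":  SliceAllocation(bandwidth=20e6, compute=3e9, storage=2e9, cache_flags=[1, 0]),
    "auto_drive":    SliceAllocation(bandwidth=60e6, compute=7e9, storage=3e9, cache_flags=[0]),
    "entertainment": SliceAllocation(bandwidth=20e6, storage=3e9, cache_flags=[1, 1, 0]),
}
```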
Cache-part modeling is then performed on the service network resources, the allocated network resources and the cache indicator variables to obtain the first performance; the per-user delay and energy expressions appear in the source only as formula images, so only their structure is summarized here.
For any user of the traffic information service, the cached part contributes a service delay and a corresponding RSU energy consumption.
For any user of the automatic driving service, the cached part likewise contributes a service delay and a corresponding RSU energy consumption.
For any user of the in-vehicle entertainment service, whose content is otherwise obtained from the core network over the backhaul link when not cached, the cached part contributes only a service delay; caching causes no RSU energy consumption.
Non-cache-part modeling is then performed on the service network resources, the allocated network resources and the cache indicator variables to obtain the second performance; again, the per-user expressions appear in the source only as formula images.
For any user of the traffic information service, the non-cached part contributes a service delay and a corresponding RSU energy consumption.
For any user of the automatic driving service, the non-cached part contributes a service delay and a corresponding RSU energy consumption.
For any user of the in-vehicle entertainment service, the non-cached part contributes a service delay and a corresponding RSU energy consumption.
and obtaining the slicing performance according to the first performance and the second performance.
After the cache part modeling and the non-cache part modeling are carried out on the service network resources, the allocation network resources and the cache indicating variables, the first performance and the second performance of different slice types can be obtained respectively, and therefore the slice performance of different slice types can be obtained according to the first performance and the second performance.
Specifically, the first performance and the second performance may be summed to obtain the corresponding slice performance.
For any user of the traffic information service, the traffic delay can be as shown in formula (1), and the RSU energy consumption can be as shown in formula (2).
For any user of the autonomous driving service, the traffic delay may be as shown in equation (3) and the RSU energy consumption may be as shown in equation (4).
For any user of the in-vehicle entertainment service, the traffic delay can be as shown in equation (5) and the RSU energy consumption can be as shown in equation (6).
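Because formulas (1)–(6) are not reproduced here, the sketch below shows only the combination rule described above: for each user request, either the cached-part or the non-cached-part delay and energy contribution applies, selected by the cache indicator, and the per-slice performance is the sum over users. The callables standing in for the omitted formulas are placeholders:

```python
def slice_performance(cache_flags, cached_delay, uncached_delay,
                      cached_energy, uncached_energy):
    """Combine first (cached) and second (non-cached) performance for one slice.

    cache_flags: iterable of 0/1 indicators, one per user request.
    cached_* / uncached_*: callables returning the per-request delay (s) and
    RSU energy (J); they stand in for the formulas omitted from this text.
    """
    total_delay, total_energy = 0.0, 0.0
    for c in cache_flags:
        if c == 1:   # content stored at the RSU: only the cached-part cost applies
            total_delay += cached_delay()
            total_energy += cached_energy()
        else:        # content not stored: fetch or compute, non-cached cost applies
            total_delay += uncached_delay()
            total_energy += uncached_energy()
    return total_delay, total_energy

# Example: two of three requests cached, using constant stand-in costs.
d, e = slice_performance([1, 1, 0],
                         cached_delay=lambda: 0.005, uncached_delay=lambda: 0.04,
                         cached_energy=lambda: 0.0,  uncached_energy=lambda: 0.3)
```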
In step S722, an optimization objective is established.
After the slice performance is obtained through resource modeling, an optimization objective may be established based on the slice performance. The established optimization objective r can be shown in equation (7).
In step S730, the optimal slice scheme is solved by using deep reinforcement learning.
After the optimization objective is established, the optimization objective can be optimally solved by deep reinforcement learning to determine the target network slice.
Reinforcement learning is an important branch of machine learning in the field of artificial intelligence and an effective means of handling multi-stage decision problems.
Deep learning, as a branch of ML, consists of perceptrons with multiple hidden layers; it mainly uses methods based on artificial neural networks to realize machine learning and learns features automatically, and it has already been applied successfully to computer vision, translation, semantic mining, image processing and other areas.
Although deep learning has strong perception ability, it lacks a certain decision-making ability; reinforcement learning, conversely, has decision-making ability but is weak at perception problems.
Therefore, the two are combined, the advantages are complementary, and a solution is provided for the perception decision problem of a complex system. The deep reinforcement learning developed from reinforcement learning and deep learning has become one of the popular research objects in the field of artificial intelligence.
Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning, can learn control directly from raw inputs such as images, and is an artificial intelligence method closer to the human way of thinking.
Deep reinforcement learning is a technology which is emerging in recent years and combines the deep learning technology and the reinforcement learning technology. The deep reinforcement learning has the capability of performing pattern recognition on a high-dimensional state in a complex system and performing action output on the basis of the pattern recognition.
In machine learning terms, deep reinforcement learning is a reward-driven trial-and-error process: over time, the agent repeatedly interacts with a complex environment, continually correcting its action policy through trial and error, and finally obtains the maximum expected cumulative return together with the corresponding sequence of policies.
Based on deep reinforcement learning, learning proceeds through continual trial, error and summarization in interaction with the environment. Deep RL is suitable for control, decision-making and complex system optimization tasks, and has huge potential application space in games, autonomous driving control and decision-making, robot control, finance, industrial system control optimization and other fields.
In step S731, the optimization problem is modeled as a markov decision process.
And carrying out decision modeling according to the optimization target to obtain a Markov decision process, wherein the Markov decision process comprises a state parameter and the optimization target.
The resource allocation optimization problem of the network slices is modeled as a Markov decision process. The system state, i.e. the state parameter s, comprises the resource allocation of the different network slices, the caching decision of the RSU, the quality of service of each user and the energy consumption of the RSU; the system action, i.e. the action parameter a, adjusts the allocation of communication bandwidth, computing and storage space resources of the RSU and MEC to the different slices as well as the caching decision; the system reward function may be the optimization objective r.
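One possible encoding of this Markov decision process is sketched below; the exact composition of the state vector and the delay/energy weighting inside the reward are assumptions for illustration, since the concrete optimization objective r of equation (7) is not reproduced in this text:

```python
import numpy as np

def build_state(allocation, cache_flags, per_user_delay, per_user_energy):
    """System state s: per-slice resource allocation, RSU cache decision,
    each user's service quality (delay) and the RSU energy consumption."""
    return np.concatenate([
        np.asarray(allocation, dtype=float),    # e.g. [B_1, B_2, B_3, F_1, F_2, M_1, M_2, M_3]
        np.asarray(cache_flags, dtype=float),   # 0/1 caching decisions
        np.asarray(per_user_delay, dtype=float),
        np.asarray([sum(per_user_energy)], dtype=float),
    ])

def reward(total_delay, total_energy, w_delay=1.0, w_energy=0.5):
    """System reward r: lower delay and energy give a larger reward.
    The weights are illustrative; the actual objective r is defined by equation (7)."""
    return -(w_delay * total_delay + w_energy * total_energy)
```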
In step S732, a DQN model solution is constructed.
And based on a Markov decision process, carrying out optimal strategy solution on the optimized target by utilizing deep reinforcement learning to determine a target network slice.
And training the deep network algorithm model to be trained in the deep reinforcement learning to obtain the trained deep network algorithm model.
Reinforcement learning algorithms can be divided into three major categories: value-based, policy-based and actor-critic.
Value-based algorithms, represented by DQN, are common; such an algorithm has only a value function network and no policy network.
Such algorithms are characterized by computing a state value V(s) or a state-action value Q(s, a), and the policy is optimized by improving these values.
Typical algorithms are Q-learning, Sarsa, DQN, DDQN, among others.
Q-learning is the root from which the DQN family of algorithms developed; its core idea is to maintain a table of values indexed by state and action, called the Q table.
Actions are selected by consulting this table while interacting with the environment, the table is then updated according to the reward fed back, and this cycle repeats until the Q table converges.
The Q-learning algorithm has the advantages of simplicity and fast convergence, because it updates after every single step rather than waiting until the end of an episode. Meanwhile, Q-learning is an off-policy algorithm: the behavior policy used to select actions differs from the target policy used to compute the Q value. Actions are chosen with an ε-greedy policy while the Q value is computed with the greedy policy, which preserves exploration without getting stuck in a local optimum.
Of course, the Q-learning algorithm has a significant drawback: it only applies to scenarios where the state and action spaces are discrete and small. When the state and action spaces become too large, the Q table grows huge and is updated slowly.
Since computing and storing a Q value for every state becomes intractable when the state space is large, the Q-learning algorithm cannot be used in such scenarios; deep neural networks, however, excel at exactly this kind of high-dimensional input. Combining a deep neural network with Q-learning therefore yields DQN, which also opened up research in the field of deep reinforcement learning. Its main advantage is suitability for scenarios with large state spaces.
An optimal slice resource allocation scheme is then solved by using the DQN algorithm model in deep reinforcement learning, according to the system state, system action and system reward function of the Markov decision process, so as to determine the target network slice.
Thus, a function approximator such as a neural network model may be used to estimate the action value function Q(s, a; θ) of the Q network, where the parameter θ denotes the weights of the neural network model and is updated iteratively.
The target of each iteration i is y_i = r_i + γ·max_{a_{i+1}} Q*(s_{i+1}, a_{i+1}), where γ denotes the discount factor and the maximum is taken over all actions. The Q network is trained by minimizing, at each iteration i, the loss function L_i(θ) = E[(y_i − Q(s_i, a_i; θ))²].
Initializing the model parameters of the deep network algorithm model to be trained, and inputting the state parameters into the initialized deep network algorithm model to be trained to obtain the target state.
Specifically, the relevant model parameters of the DQN model are initialized, including the capacity N of the experience pool D, the online network weight θ, and the target network weight θ⁻ = θ.
Training of the DQN algorithm model is performed over X episodes, and each episode repeats the subsequent steps from the step of obtaining the target state.
Furthermore, a state parameter, i.e. the starting state s_1, is input into the initialized DQN algorithm model to be trained, and the subsequent steps are performed for t = 1 to t = T.
The state parameters and the target state are stored according to the model parameters to obtain an experience pool, and a neural network model in the deep network algorithm model to be trained is trained using training samples from the experience pool to obtain the target network model.
With probability ε a random action a_t is selected; otherwise the action is chosen greedily as a_t = argmax_a Q(s_t, a; θ). Executing a_t yields the reward r_t and the next state s_{t+1}.
Further, the state parameter s_t at time t, the action a_t, the reward r_t and the next state s_{t+1} can be stored as an experience in the experience pool D.
Specifically, the sample (s_t, a_t, r_t, s_{t+1}) at time t is stored in the experience pool D, and a small mini-batch of samples (s_i, a_i, r_i, s_{i+1}) is drawn at random from the experience pool to train the neural network model in the deep network algorithm model to be trained; the target value y_i is calculated so as to obtain the trained target network model.
And updating the model parameters according to the loss function based on the target network model to obtain a trained deep network algorithm model.
After the target value y_i has been calculated, the loss function L_i(θ) can be used to update the model parameter θ by gradient descent.
In addition, the target network weight may be synchronized every Y steps so that θ⁻ = θ.
When θ⁻ is sufficiently close to θ, the condition for finishing the training of the deep network algorithm model to be trained is satisfied, and the trained deep network algorithm model is obtained.
And performing optimal strategy solution on the optimized target by using the trained deep network algorithm model to determine a target network slice.
The deep network algorithm model after a large amount of training can output an optimal target network slice scheme, and the target network slice can include contents such as communication bandwidth, computing resources, storage space and cache decision.
According to the resource allocation method for network slices in this application scenario, resource modeling is performed on the service network resources according to the slice type to obtain the slice performance. Delay and energy consumption are considered jointly and the slice performance is modeled and designed from the viewpoint of overall performance, so that service energy consumption is reduced as far as possible while the service delay and rate requirements are satisfied, and service-type-oriented network slice deployment is achieved.
Furthermore, in designing the target network slice, an artificial intelligence algorithm is used to design an appropriate network slicing policy for each service type. Applied to the Internet of Vehicles scenario, this achieves effective allocation of network resources, safeguards traffic safety, optimizes the user's service experience, and improves user retention to a certain extent.
Fig. 9 shows a schematic structural diagram of a resource configuration apparatus of a network slice, and as shown in fig. 9, the resource configuration apparatus 900 of the network slice may include: a type partitioning module 910, a performance modeling module 920, and a policy solving module 930. Wherein:
a type dividing module 910 configured to obtain an initial network resource and a slice type, and initialize the initial network resource according to the slice type to obtain a service network resource;
a performance modeling module 920, configured to perform resource modeling on the service network resource according to the slice type to obtain slice performance, and establish an optimization target according to the slice performance;
and a strategy solving module 930 configured to determine a target network slice by performing optimal strategy solution on the optimization target by using deep reinforcement learning.
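Mirroring the module split of fig. 9, a minimal interface sketch of the apparatus might look as follows; the method bodies are placeholders, and only the division of responsibilities follows the description above:

```python
class NetworkSliceResourceConfigurator:
    """Sketch of the apparatus in fig. 9: three cooperating modules."""

    def partition_types(self, initial_resources, slice_types):
        # Type partitioning module: initialize the initial network resources
        # per slice type to obtain the service network resources.
        return {t: dict(initial_resources) for t in slice_types}

    def model_performance(self, service_resources):
        # Performance modeling module: resource modeling -> slice performance
        # and optimization target (details follow formulas (1)-(7), omitted here).
        raise NotImplementedError

    def solve_policy(self, optimization_target):
        # Policy solving module: deep reinforcement learning (e.g. DQN) performs
        # the optimal strategy solution to determine the target network slice.
        raise NotImplementedError
```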
In an exemplary embodiment of the invention, the slice type includes: information services, control services, and entertainment services.
In an exemplary embodiment of the present invention, the resource modeling the service network resource according to the slice type to obtain a slice performance includes:
establishing a cache indicating variable corresponding to the slice type, wherein the cache indicating variable is used for representing whether the slice performance is consumed or not;
and establishing distributed service resources corresponding to the slice type, and performing resource modeling on the service network resources, the distributed network resources and the cache indicating variable to obtain slice performance.
In an exemplary embodiment of the present invention, the resource modeling the serving network resource, the allocated network resource and the cache indicator variable to obtain slice performance includes:
modeling a cache part of the service network resource, the distribution network resource and the cache indicating variable to obtain a first performance;
modeling the non-cache part of the service network resource, the distribution network resource and the cache indicating variable to obtain a second performance;
and obtaining the slicing performance according to the first performance and the second performance.
In an exemplary embodiment of the present invention, the determining an object network slice by performing optimal strategy solution on the optimization object by using deep reinforcement learning includes:
carrying out decision modeling according to the optimization target to obtain a Markov decision process, wherein the Markov decision process comprises a state parameter and the optimization target;
and performing optimal strategy solution on the optimization target by utilizing deep reinforcement learning based on the Markov decision process to determine a target network slice.
In an exemplary embodiment of the present invention, the determining an object network slice by performing optimal strategy solution on the optimization object by using deep reinforcement learning includes:
training a deep network algorithm model to be trained in deep reinforcement learning to obtain a trained deep network algorithm model;
and performing optimal strategy solution on the optimization target by using the trained deep network algorithm model to determine a target network slice.
In an exemplary embodiment of the present invention, the training of the deep network algorithm model to be trained in the deep reinforcement learning to obtain the trained deep network algorithm model includes:
initializing model parameters of a deep network algorithm model to be trained, and inputting the state parameters into the initialized deep network algorithm model to be trained to obtain a target state;
storing the state parameters and the target state according to the model parameters to obtain an experience pool, and training a neural network model in the deep network algorithm model to be trained by utilizing training samples in the experience pool to obtain a target network model;
and updating the model parameters according to a loss function based on the target network model to obtain a trained deep network algorithm model.
The details of the resource allocation apparatus 900 for a network slice are already described in detail in the resource allocation method for a corresponding network slice, and therefore are not described herein again.
It should be noted that although several modules or units of the resource configuration apparatus 900 of the network slice are mentioned in the above detailed description, such division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
An electronic device 1000 according to such an embodiment of the invention is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting different system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
Wherein the storage unit stores program code that is executable by the processing unit 1010 to cause the processing unit 1010 to perform steps according to various exemplary embodiments of the present invention as described in the "exemplary methods" section above in this specification.
The memory unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)1021 and/or a cache memory unit 1022, and may further include a read-only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1200 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above-mentioned "exemplary methods" section of the present description, when said program product is run on the terminal device.
Referring to fig. 11, a program product 1100 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A resource allocation method for network slices, the method comprising:
acquiring initial network resources and slice types, and initializing the initial network resources according to the slice types to obtain service network resources;
performing resource modeling on the service network resource according to the slice type to obtain slice performance, and establishing an optimization target according to the slice performance;
and performing optimal strategy solution on the optimization target by utilizing deep reinforcement learning to determine a target network slice.
2. The method of claim 1, wherein the slice type comprises: information services, control services, and entertainment services.
3. The method of claim 1, wherein the resource modeling of the service network resource according to the slice type to obtain slice performance comprises:
establishing a cache indicating variable corresponding to the slice type, wherein the cache indicating variable is used for representing whether the slice performance is consumed or not;
and establishing distributed service resources corresponding to the slice type, and performing resource modeling on the service network resources, the distributed network resources and the cache indicating variable to obtain slice performance.
4. The method of claim 3, wherein the resource modeling the serving network resource, the allocated network resource, and the cache indicator to obtain slice performance comprises:
modeling a cache part of the service network resource, the distribution network resource and the cache indicating variable to obtain a first performance;
modeling the non-cache part of the service network resource, the distribution network resource and the cache indicating variable to obtain a second performance;
and obtaining the slicing performance according to the first performance and the second performance.
5. The method for resource allocation of network slices according to claim 1, wherein the determining a target network slice by performing optimal strategy solution on the optimization target by using deep reinforcement learning comprises:
carrying out decision modeling according to the optimization target to obtain a Markov decision process, wherein the Markov decision process comprises a state parameter and the optimization target;
and carrying out optimal strategy solution on the optimization target by utilizing deep reinforcement learning based on the Markov decision process to determine a target network slice.
6. The method for resource allocation of network slices according to claim 5, wherein the determining the target network slice by performing optimal strategy solution on the optimization target by using deep reinforcement learning comprises:
training a deep network algorithm model to be trained in deep reinforcement learning to obtain a trained deep network algorithm model;
and carrying out optimal strategy solving on the optimized target by utilizing the trained deep network algorithm model to determine a target network slice.
7. The method for resource allocation of network slices according to claim 6, wherein the training of the deep network algorithm model to be trained in deep reinforcement learning to obtain the trained deep network algorithm model comprises:
initializing model parameters of a deep network algorithm model to be trained, and inputting the state parameters into the initialized deep network algorithm model to be trained to obtain a target state;
storing the state parameters and the target state according to the model parameters to obtain an experience pool, and training a neural network model in the deep network algorithm model to be trained by utilizing training samples in the experience pool to obtain a target network model;
and updating the model parameters according to a loss function based on the target network model to obtain a trained deep network algorithm model.
8. A resource allocation apparatus for network slicing, comprising:
the type dividing module is configured to acquire initial network resources and slice types, and initialize the initial network resources according to the slice types to obtain service network resources;
the performance modeling module is configured to perform resource modeling on the service network resources according to the slice types to obtain slice performance, and establish an optimization target according to the slice performance;
and the strategy solving module is configured to perform optimal strategy solving on the optimization target by utilizing deep reinforcement learning to determine a target network slice.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the resource configuration method of a network slice according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the resource configuration method of the network slice of any one of claims 1-7 via execution of the executable instructions.
CN202210291243.XA 2022-03-23 2022-03-23 Resource allocation method and device for network slice, storage medium and electronic equipment Pending CN114666220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210291243.XA CN114666220A (en) 2022-03-23 2022-03-23 Resource allocation method and device for network slice, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210291243.XA CN114666220A (en) 2022-03-23 2022-03-23 Resource allocation method and device for network slice, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114666220A true CN114666220A (en) 2022-06-24

Family

ID=82032319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210291243.XA Pending CN114666220A (en) 2022-03-23 2022-03-23 Resource allocation method and device for network slice, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114666220A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113163451A (en) * 2021-04-23 2021-07-23 中山大学 D2D communication network slice distribution method based on deep reinforcement learning
US20210337555A1 (en) * 2018-09-07 2021-10-28 NEC Laboratories Europe GmbH System and method for network automation in slice-based network using reinforcement learning
CN113992524A (en) * 2021-09-28 2022-01-28 北京工业大学 Network slice optimization processing method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210337555A1 (en) * 2018-09-07 2021-10-28 NEC Laboratories Europe GmbH System and method for network automation in slice-based network using reinforcement learning
CN113163451A (en) * 2021-04-23 2021-07-23 中山大学 D2D communication network slice distribution method based on deep reinforcement learning
CN113992524A (en) * 2021-09-28 2022-01-28 北京工业大学 Network slice optimization processing method and system

Similar Documents

Publication Publication Date Title
Qi et al. Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach
Saeik et al. Task offloading in Edge and Cloud Computing: A survey on mathematical, artificial intelligence and control theory solutions
CN110312231B (en) Content caching decision and resource allocation optimization method based on MEC in Internet of vehicles
Chen et al. Deep reinforcement learning for computation offloading in mobile edge computing environment
Liu et al. RL/DRL meets vehicular task offloading using edge and vehicular cloudlet: A survey
Qian et al. Survey on reinforcement learning applications in communication networks
Tran-Dang et al. Reinforcement learning based resource management for fog computing environment: Literature review, challenges, and open issues
CN115314355B (en) Deterministic network-based power communication network architecture system and method
Qi et al. Vehicular edge computing via deep reinforcement learning
Jamil et al. IRATS: A DRL-based intelligent priority and deadline-aware online resource allocation and task scheduling algorithm in a vehicular fog network
CN114641041B (en) Internet of vehicles slicing method and device oriented to edge intelligence
CN113490279A (en) Network slice configuration method and device
CN115766884A (en) Computing task processing method, device, equipment and medium
Zheng et al. Learning based task offloading in digital twin empowered internet of vehicles
Shen et al. Slicing-Based Task Offloading in Space-Air-Ground Integrated Vehicular Networks
Ullah et al. Optimizing task offloading and resource allocation in edge-cloud networks: a DRL approach
CN113747450B (en) Service deployment method and device in mobile network and electronic equipment
CN113315806B (en) Multi-access edge computing architecture for cloud network fusion
Li et al. Optimal service selection and placement based on popularity and server load in multi-access edge computing
Zhang et al. Vehicular multi-slice optimization in 5G: Dynamic preference policy using reinforcement learning
Liu et al. Real-time task offloading for data and computation intensive services in vehicular fog computing environments
CN114666220A (en) Resource allocation method and device for network slice, storage medium and electronic equipment
CN116489708A (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
Poltronieri et al. Value is king: the mecforge deep reinforcement learning solution for resource management in 5g and beyond
Qi et al. Cluster-PSO based resource orchestration for multi-task applications in vehicular cloud

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220624