CN114980324A

CN114980324A - Slice-oriented low-delay wireless resource scheduling method and system

Info

Publication number: CN114980324A
Application number: CN202210379968.4A
Authority: CN
Inventors: 刘铭; 桂振文; 谢伟坤
Original assignee: CETC 7 Research Institute
Current assignee: CETC 7 Research Institute
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2022-08-30
Anticipated expiration: 2042-04-12
Also published as: CN114980324B

Abstract

The invention discloses a slice-oriented low-delay wireless resource scheduling method and system. The method comprises the following steps: receiving resource scheduling request information sent by a user in the physical world; acquiring the instantaneous transmission rate of the user based on the currently received resource scheduling request information; constructing a digital twin simulation environment for user resource allocation using the available computing resources; in the digital twin simulation environment, calculating the priority of each user on each resource block by combining the user's instantaneous transmission rate, the available computing resources and the user's scheduling request information, and preliminarily evaluating the allocation decision for the resource blocks; based on the user's historical allocation data, optimizing the preliminarily evaluated allocation decision through a deep deterministic policy iteration model; and completing the resource block allocation to the user according to the optimized allocation decision, and mapping the allocation decision to the physical world.

Description

Slice-oriented low-delay wireless resource scheduling method and system

Technical Field

The invention relates to the technical field of 5G wireless resource allocation, in particular to a slice-oriented low-delay wireless resource scheduling method and system.

Background

In a traditional wireless resource allocation scenario, a user sends resource request information to the cellular network, and the base station divides the available resource blocks evenly according to the number of users and sends each user its share. The characteristics of the individual users, however, are often ignored, so high-priority users are frequently given resource blocks that are insufficient for their needs. In addition, the time that a user's pending data stays in the queue on the base station side directly affects that user's delay; when the stay is long, the user's needs are very likely to go unmet. Existing research that takes user delay into account establishes multi-objective optimization models, but such models cannot accurately determine the user's actual resource demand.

In view of the above, existing solutions cannot accurately allocate the required resources in response to a user's resource request information.

Disclosure of Invention

To overcome the above defects and shortcomings of the prior art, the invention provides a slice-oriented low-delay wireless resource scheduling method and system, which complete accurate resource allocation based on a deep deterministic policy gradient algorithm so as to meet the low-delay requirement of users.

To achieve the purpose of the invention, the technical scheme is as follows:

A slice-oriented low-delay wireless resource scheduling method comprises the following steps:

receiving resource scheduling request information sent by a user in the physical world;

acquiring the instantaneous transmission rate of the user based on the currently received resource scheduling request information;

constructing a digital twin simulation environment for user resource allocation using the currently available computing resources;

in the digital twin simulation environment, calculating the priority of each user on each resource block by combining the user's instantaneous transmission rate, the currently available computing resources and the user's scheduling request information, and preliminarily evaluating the allocation decision for the resource blocks;

based on the user's historical allocation data, optimizing the preliminarily evaluated allocation decision through a deep deterministic policy iteration model;

and completing the resource block allocation to the user according to the optimized allocation decision, and mapping the allocation decision to the physical world.

Further, the priority R_i of each user i on each resource block is calculated, expressed as:

where ω_1, ω_2, ω_3, ω_4 are weight coefficients satisfying ω_1 + ω_2 + ω_3 + ω_4 = 1; γ_i(t) represents the signal-to-noise ratio of user i at time t; r_i(t) represents the instantaneous transmission rate of user i at time t; RA_i(t) represents the average transmission rate of user i over a period of time before time t; c_i(t) represents the queue buffer time of user i at time t; d_i(t) represents the amount of data that user i needs to transmit at time t.

Still further, the deep deterministic policy iteration model comprises an Actor neural network and a Critic neural network;

the current resource scheduling request information is taken as the observation and denoted S_i; the historical allocation data is placed in the constructed replay memory; the current data S_i is input to the Actor neural network to obtain the resource allocation decision a_i, and the corresponding reward value is calculated with the given priority formula.

Further, the current resource scheduling request information S_i is input to the Actor neural network for iterative training; after multiple iterations, the reward taking the memory discount into account can be rewritten as:

where R_i(s, a) represents the reward obtained by user i; γ^(i-t) represents the discount factor, which is a fixed value (e.g., set to 0.999); t represents a time scale.

Still further, based on the obtained resource block allocation strategy a_i, an action-value function is established to express the expected return after the resource block allocation strategy a_i is adopted; the action-value function is expressed as:

where E is the expectation function.

Further, for the action-value function established to express the expected return after the resource block allocation strategy a_i is adopted, the maximum expected return is obtained by constructing a loss function, expressed as:

where θ^Q represents the parameters of the function Q^π, Y represents the true demand return of user i, and E is the expectation function.

Preferably, after the resource block allocation to the user is completed, the allocated resource block is deleted from the resource block list.

Further, after the allocated resource block is deleted from the resource block list,

it is judged whether the resource block list is empty, and if the list is empty, the allocation process ends;

if the resource block list is not empty, the allocation strategy continues to be executed and is sent to the user in the physical world, so as to meet the user's low-delay requirement.
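The delete-then-check loop described above can be sketched as follows. This is only a minimal illustration: the function name `allocate_until_empty`, the `choose_user` callback and the user labels are hypothetical and stand in for the optimized allocation decision, which the text does not spell out as code.

```python
from typing import Callable, Dict, List


def allocate_until_empty(
    resource_blocks: List[int],
    choose_user: Callable[[int], str],
) -> Dict[int, str]:
    """Assign resource blocks one at a time until the list is empty."""
    remaining = list(resource_blocks)  # work on a copy of the caller's list
    allocation: Dict[int, str] = {}
    while remaining:                   # stop once the resource block list is empty
        rb = remaining[0]
        allocation[rb] = choose_user(rb)  # hypothetical stand-in for the decision
        remaining.remove(rb)           # delete the allocated block from the list
    return allocation


# Hypothetical decision rule: even-numbered blocks go to "ue-a", odd ones to "ue-b".
result = allocate_until_empty([0, 1, 2], lambda rb: "ue-a" if rb % 2 == 0 else "ue-b")
```

Working on a copy keeps the caller's list intact; a real scheduler might instead mutate a shared list owned by the base station.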

Preferably, the received resource scheduling request information is parsed, and it is judged whether past resource demand information of the user exists in the cache on the base station side; if it exists, the past resource demand information is added to the user's resource scheduling request information;

the resource scheduling request information comprises the position of the user, the channel quality information transmitted by the user, and the waiting time of the user's data in the queue on the base station side.
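As a rough sketch of this cache lookup, the request can be modelled as a small record that is extended with past demand data when the base-station cache holds any. The field names, the `SchedulingRequest` type and the dictionary-based cache are assumptions made for illustration; the text only lists which pieces of information the request contains.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class SchedulingRequest:
    user_id: str                    # hypothetical identifier
    position: Tuple[float, float]   # position of the user
    channel_quality: float          # channel quality information sent by the user
    queue_wait_ms: float            # waiting time of the user's data in the queue
    past_demands: List[int] = field(default_factory=list)


def enrich_request(
    request: SchedulingRequest, cache: Dict[str, List[int]]
) -> SchedulingRequest:
    """Add cached past resource demands to the request, if any exist."""
    history = cache.get(request.user_id)
    if not history:
        return request              # nothing cached: request is used unchanged
    return replace(request, past_demands=list(history))


cache = {"ue-1": [4, 6]}            # hypothetical base-station cache
known = enrich_request(SchedulingRequest("ue-1", (0.0, 0.0), 12.5, 3.0), cache)
unknown = enrich_request(SchedulingRequest("ue-2", (1.0, 2.0), 9.0, 1.0), cache)
```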

A computer system comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the slice-oriented low-latency radio resource scheduling method.

The invention has the following beneficial effects:

1. The invention provides a slice-oriented low-delay wireless resource scheduling method that effectively guarantees the low-delay requirements of users and can accurately meet the delay requirements of different users.

2. Unreasonable resource allocation is overcome: a digital twin system representing the correspondence between the physical entity and the virtual entity is constructed on the base station side, and the user's current resource request is accurately simulated by learning from historical allocation data.

3. Once the user's current resource request has been obtained, the deep deterministic policy iteration model uses the priority as the reward to guide the base station side toward an accurate resource allocation scheme.

4. The method is generally applicable to wireless resource allocation in 5G network applications.

Drawings

Fig. 1 is a block diagram of the steps of a slice-oriented low-latency radio resource scheduling method according to embodiment 1.

Fig. 2 is a flowchart of steps of a slice-oriented low-latency radio resource scheduling method according to embodiment 1.

Detailed Description

The invention is described in detail below with reference to the drawings and the detailed description.

The terms used in this embodiment are to be understood as follows:

1. User: an end user connected to the same cellular network.

2. Slicing: a cellular network is divided into a plurality of virtual end-to-end networks; each network is logically independent, and the failure of any one network does not affect the other virtual networks.

3. Mapping: the correspondence between the physical entity and the virtual entity of the digital twin.

4. Digital twin: one or more interdependent digital mapping systems.

Example 1

The slice-oriented low-latency radio resource scheduling method provided in this embodiment is mainly applied to fifth generation mobile communication technology (5G) scenarios. Existing slicing technology can allocate corresponding resources to users in a cellular network according to the number of users and their different priorities. The most popular PF algorithm establishes a priority formula and preferentially provides resource blocks to users with high requirements, so as to meet the resource requirements of high-priority users. However, besides the priority index of each user, the delay requirement of the user cannot be ignored; the user's delay mainly consists of the waiting delay in the buffer and the delay of sending the user's request. The conventional PF algorithm does not allocate resources according to users' delay requirements: some users need more resources and have stricter delay requirements, yet the fairness of the conventional PF algorithm causes resources to be distributed equally among the users. To address these problems, the present embodiment designs an intelligent slicing technology based on the Digital Twin (DT) to implement low-latency radio resource scheduling. With the method of this embodiment, the cellular network can train the deep deterministic policy iteration model using historical allocation data combined with the delay requirement characteristics of the users. Unlike traditional resource allocation algorithms, the deep deterministic policy iteration model can provide different numbers of resource blocks to users with different delay requirements. The method of this embodiment can significantly reduce the delay of wireless resource allocation and improves the low-latency performance of fifth generation mobile communication technology.

As shown in fig. 1 and fig. 2, the present embodiment provides a slice-oriented low-latency radio resource scheduling method, which includes the following steps:

S1: receiving resource scheduling request information sent by a user in the physical world;

S2: acquiring the instantaneous transmission rate of the user based on the currently received resource scheduling request information;

S3: constructing a digital twin simulation environment for user resource allocation using the available computing resources, wherein the digital twin is located on the base station side, that is, the digital twin simulation environment is constructed on the base station side;

S4: in the digital twin simulation environment, calculating the priority of each user on each resource block by combining the user's instantaneous transmission rate, the available computing resources and the user's scheduling request information, and preliminarily evaluating the allocation decision for the resource blocks according to the priority; following the principle that the resource requests of high-priority users are satisfied first, the allocation decision is formed by ranking, from high to low, the calculated priorities on each resource block;

S5: based on the user's historical allocation data, optimizing the preliminarily evaluated allocation decision through a deep deterministic policy iteration model;

S6: completing the resource block allocation to the user according to the optimized allocation decision, and mapping the allocation decision to the physical world.
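The preliminary evaluation in S4 ranks users by priority on each resource block. A minimal sketch of that ranking is given below, assuming the priorities have already been computed and are supplied as a table `priority[user][rb]`; the function name, the table layout and the tie-breaking rule (alphabetical by user label) are illustrative assumptions, not details taken from the text.

```python
from typing import Dict, List


def preliminary_allocation(priority: Dict[str, List[float]]) -> Dict[int, str]:
    """For each resource block, pick the user with the highest priority on it."""
    users = sorted(priority)  # fixed order, so ties go to the first user label
    num_rbs = len(next(iter(priority.values()))) if priority else 0
    decision: Dict[int, str] = {}
    for rb in range(num_rbs):
        # max() returns the first maximal element, so ties are deterministic
        decision[rb] = max(users, key=lambda user: priority[user][rb])
    return decision


# Hypothetical priorities of two users on three resource blocks.
example_priority = {
    "ue-a": [0.9, 0.2, 0.5],
    "ue-b": [0.4, 0.7, 0.5],
}
preliminary = preliminary_allocation(example_priority)
```

In the method described here this decision is only a starting point; S5 then refines it with the deep deterministic policy iteration model.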

In this embodiment, the instantaneous transmission rate of the user is obtained from the current resource scheduling request information. It is the rate at which the current user's data flows into the base station side; if this rate is too high, the base station side may be unable to obtain the complete user information, and the cache on the base station side may overflow.

In a specific embodiment, the priority R_i of each user i on each resource block is calculated, expressed as:

In a specific embodiment, the deep deterministic policy iteration model (DDPG) includes an Actor neural network and a Critic neural network; the Critic network is used to evaluate the user resource allocation decision at the current moment. The current resource scheduling request information is taken as the observation and denoted S_i; the historical allocation data is placed in the constructed replay memory; the current resource scheduling request information S_i is input to the Actor neural network to obtain the resource allocation decision a_i, and the corresponding reward value is calculated with the given priority formula.

In this embodiment, replay refers to experience replay, which is part of DDPG reinforcement learning. Placing the historical data in the replay memory reduces the strong correlation between samples drawn during training, since strongly correlated samples cannot guarantee a good training result.
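A replay memory of this kind is commonly implemented as a bounded buffer sampled uniformly at random. The sketch below assumes that form; the class name, the `(state, action, reward, next_state)` tuple layout, the capacity and the fixed seed are illustrative choices rather than details given in the text.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state); the layout is an assumption
Transition = Tuple[tuple, int, float, tuple]


class ReplayMemory:
    """Bounded buffer of past transitions with uniform random sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)  # seeded only to make runs repeatable

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)  # the oldest entry is dropped when full

    def sample(self, batch_size: int) -> List[Transition]:
        # Random draws break up the temporal correlation of consecutive records.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


memory = ReplayMemory(capacity=3)
for i in range(5):  # five hypothetical historical allocation records
    memory.push(((i,), 0, float(i), (i + 1,)))
batch = memory.sample(2)
```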

Specifically, the preliminarily evaluated allocation decision for the resource blocks is optimized through the deep deterministic policy iteration model: historical allocation data is used as training information and an actor-critic network framework is used for evaluation, where the actor corresponds to the preliminary resource allocation decision obtained in S4. The evaluation process is represented as follows:

where θ^Q denotes the network training parameters; Q denotes the evaluation function of DDPG; μ denotes the parameter of Q; E denotes the expectation function; X = r_i + γ Q^{μ′}(s_i, a_i) | a′ = μ′_i(s_i); a_i represents the action space of user i, and s_i represents the state space of user i. γ is the discount factor in reinforcement learning; a denotes the set of a_i, s denotes the set of s_i, and μ denotes the set of μ_i. In this embodiment, a prime (′) marks the next output to distinguish it from the current output; for example, a′ denotes the next output of a.

In a specific embodiment, the current resource scheduling request information S_i is input to the Actor neural network for iterative training; after multiple iterations, the reward taking the memory discount into account can be rewritten as:

where R_i(S_i, a_i) represents the reward obtained by user i; γ^(i-t) represents the discount factor, which is a fixed value (e.g., set to 0.999); t represents a time scale.
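The formula itself is not legible in this text. Assuming it takes the usual reinforcement-learning form, in which the reward at step i is weighted by γ^(i-t) and the weighted rewards are summed from step t onward, it can be computed as in the sketch below. The function name and the list-of-rewards input are hypothetical; the default of 0.999 is the example value mentioned above.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.999) -> float:
    """Sum of gamma**(i - t) * rewards[i], with t taken as index 0 of the list."""
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three hypothetical per-step rewards with gamma = 0.5: 1 + 0.5 + 0.25
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```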

In a specific embodiment, based on the obtained resource block allocation strategy a_i, an action-value function is established to express the expected return after the resource block allocation strategy a_i is adopted, expressed as:

where E is the expectation function.

In a specific embodiment, for the action-value function established to express the expected return after the resource block allocation strategy a_i is adopted, the maximum expected return is obtained by constructing a loss function, expressed as:

where E is the expectation function.
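The loss formula is likewise not legible here. A common choice for the critic in DDPG, assumed in the sketch below, is the mean squared difference between a target return and the critic's estimate Q(s, a | θ^Q), with the target formed as r + γ·Q′ in the manner of the quantity X defined earlier. Function names and the list-based inputs are hypothetical, and no neural network is involved: the Q values are passed in as plain numbers.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    gamma: float = 0.999,
) -> List[float]:
    """Target for each sample: reward plus discounted next-step Q estimate."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(targets: Sequence[float], q_values: Sequence[float]) -> float:
    """Mean squared error between targets and current Q estimates (assumed form)."""
    if len(targets) != len(q_values) or not targets:
        raise ValueError("targets and q_values must be non-empty and equal length")
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


# Hypothetical mini-batch of two samples.
targets = td_targets([1.0, 0.0], [2.0, 4.0], gamma=0.5)
loss = critic_loss(targets, [1.0, 3.0])
```

Minimizing this quantity with respect to θ^Q pulls the critic's estimates toward the targets; in practice this would be done with an automatic-differentiation library rather than plain lists.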

In a specific embodiment, after the resource block allocation to the user is completed, the allocated resource block is deleted from the resource block list.

In a specific embodiment, after the allocated resource blocks are removed from the resource block list,

In a specific embodiment, the received resource scheduling request information is parsed, and it is judged whether past resource demand information of the user exists in the cache on the base station side; if it exists, the past resource demand information is added to the user's resource scheduling request information.

The method of this embodiment is mainly used on the base station side of a 5G cellular network, where it obtains the users' requests and schedules and divides the resource blocks reasonably, so as to meet the users' low-delay requirements.

Example 2

The embodiment also provides a computer system, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method steps implemented by the processor are as follows:

S3: constructing a digital twin simulation environment for user resource allocation using the currently available computing resources;

Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the bus connecting together various circuits of the memory and the processor or processors. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.

Example 3

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method steps of:

S3: constructing a digital twin simulation environment for user resource allocation using the currently available computing resources, where the currently available computing resources refer to the available resources of the Central Processing Unit (CPU) of the base station;

Those skilled in the art will understand that all or part of the steps of the method in the above embodiments may be implemented by a program instructing the relevant hardware, where the program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the method in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims

1. A slice-oriented low-delay wireless resource scheduling method is characterized in that: the method comprises the following steps:

in a digital twin simulation environment, calculating the priority of each user on each resource block by combining the instantaneous transmission rate of the user, the available calculation resources and the scheduling request information of the user, and preliminarily evaluating the allocation decision of the resource blocks;

2. The slice-oriented low-latency radio resource scheduling method of claim 1, wherein: the priority R_i of each user i on each resource block is calculated, expressed as:

where ω_1, ω_2, ω_3, ω_4 are weight coefficients satisfying ω_1 + ω_2 + ω_3 + ω_4 = 1; γ_i(t) represents the signal-to-noise ratio of user i at time t; r_i(t) represents the instantaneous transmission rate of user i at time t; RA_i(t) represents the average transmission rate of user i over a period of time before time t; c_i(t) represents the queue buffer time of user i at time t; d_i(t) represents the amount of data that user i needs to transmit at time t.

3. The slice-oriented low-latency radio resource scheduling method of claim 2, wherein: the deep deterministic policy iteration model comprises an Actor neural network and a Critic neural network;

the current resource scheduling request information is taken as the observation and denoted S_i; the historical allocation data is placed in the constructed replay memory; the current resource scheduling request information S_i is input to the Actor neural network to obtain the resource allocation decision a_i, and the corresponding reward value is calculated with the given priority formula.

4. The slice-oriented low-latency radio resource scheduling method of claim 3, wherein: the current resource scheduling request information S_i is input to the Actor neural network for iterative training; after multiple iterations, the reward taking the memory discount into account can be rewritten as:

where R_i(s, a) represents the reward obtained by user i; γ^(i-t) represents the discount factor; t represents a time scale.

5. The slice-oriented low-latency radio resource scheduling method of claim 4, wherein: based on the obtained resource block allocation strategy a_i, an action-value function is established to express the expected return after the resource block allocation strategy a_i is adopted, the action-value function being expressed as:

where E is the expectation function.

6. The slice-oriented low-latency radio resource scheduling method of claim 5, wherein: for the action-value function established to express the expected return after the resource block allocation strategy a_i is adopted, the maximum expected return is obtained by constructing a loss function, expressed as:

where E is the expectation function.

7. The slice-oriented low-latency radio resource scheduling method of claim 1, wherein: after the resource block allocation to the user is completed, the allocated resource block is deleted from the resource block list.

8. The slice-oriented low-latency radio resource scheduling method of claim 7, wherein: after removing the allocated resource blocks from the resource block list,

9. The slice-oriented low-latency radio resource scheduling method of claim 1, wherein: the received resource scheduling request information is parsed, and it is judged whether past resource demand information of the user exists in the cache on the base station side; if it exists, the past resource demand information is added to the user's resource scheduling request information;

10. A computer system, characterized in that: the system comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the slice-oriented low-latency radio resource scheduling method according to any one of claims 1 to 9.