CN114867030A - Double-time-scale intelligent wireless access network slicing method

Info

Publication number: CN114867030A
Application number: CN202210649530.3A
Authority: CN
Inventors: 李佳珉; 王洁; 叶枫; 朱鹏程; 盛彬; 尤肖虎
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2022-06-09
Filing date: 2022-06-09
Publication date: 2022-08-05

Abstract

The invention discloses a dual-time-scale intelligent radio access network slicing method. The method is based on a cell-free distributed massive MIMO system architecture and combines non-orthogonal multiple access with dynamic multi-connectivity of massive terminals. Exploiting the fact that the network state varies over long time scales, it uses reinforcement learning algorithms to perform physical resource block (PRB) allocation and power allocation on two time scales, achieving adaptive resource allocation at different time and resource granularities. Compared with the prior art, the invention configures resources jointly across an upper and a lower layer: the upper layer sets the number of PRBs for each slice, and the lower layer, on the small time scale, performs PRB allocation and power allocation and dynamically selects links for each user according to changes in the physical-layer environment. This improves the spectral efficiency of the system, meets the ultra-high-reliability and ultra-low-delay service requirements of massive future 6G traffic, and is of significance for research on real-time resource allocation in mobile scenarios.

Description

Double-time-scale intelligent wireless access network slicing method

Technical Field

The invention relates to a dual-time-scale intelligent radio access network slicing method based on a cell-free distributed massive MIMO system architecture, and belongs to the technical field of mobile communication.

Background

With the rapid development of the mobile internet, the scale of communication services keeps growing, user requirements are increasingly diverse, and the limited spectrum is increasingly scarce. The requirements for high system throughput, ultra-low delay, ultra-high reliability and real-time connectivity continue to rise, and traditional wireless communication systems need further improvement. Cell-free distributed massive MIMO is an innovative, scalable form of network MIMO in which a large number of access points (APs) distributed over an area serve all users on the same time-frequency resources. It offers very high spectral efficiency, energy efficiency and coverage. In addition, the cell-free structure resolves the user mobility problem of cellular networks, and, compared with a centralized system, a cell-free distributed massive MIMO system offers channel diversity, handover-free operation, higher coverage, and no need to deploy cells in specific areas. Furthermore, multi-connectivity effectively reduces the delay caused by retransmissions and transmission errors in the single-link case, which meets the high reliability that a 6G system requires for massive services and matches the characteristics of a cell-free distributed massive MIMO system; non-orthogonal multiple access supports massive user access with limited spectrum resources by further exploiting the power domain, improving system throughput.

To provide customized services for large-scale future 6G services, the 6G system places more emphasis on the utilization of limited resources, which has led to network slicing, a technique that shares resources through network virtualization. Network slicing uses independent and flexible virtual resource slices to abstract physical resources into virtual logical networks suited to different scenarios, and provides a strong guarantee for QoS. Research on core network slicing is relatively comprehensive and focuses mainly on the configuration and management of network slices. Existing radio access network slicing techniques combine self-optimization of multi-granularity network resources and propose a hierarchical slicing framework, but they are designed for cellular networks. Cell-free networks and network slicing are highly complementary: on the one hand, a cell-free distributed massive MIMO system can reduce the randomness of the wireless channels within a network slice; on the other hand, network slicing makes applications in a cell-free distributed massive MIMO system more flexible. Joint research on cell-free distributed massive MIMO systems and network slicing is therefore increasingly important, and is significant for meeting the diverse requirements of future 6G and for dynamically allocating limited resources.

Disclosure of Invention

The technical problem: in view of the above, an object of the present invention is to provide a dual-time-scale intelligent radio access network slicing method based on a cell-free distributed massive MIMO system architecture, which combines network slicing to achieve efficient use and dynamic allocation of limited resources in that architecture.

The technical scheme: the invention provides a dual-time-scale intelligent radio access network slicing method, i.e. a dual-time-scale algorithm that jointly optimizes user quality of service (QoS) by performing physical resource block (PRB) allocation and power allocation for uplink users in a cell-free distributed massive MIMO (multiple-input multiple-output) system architecture, subject to constraints such as guaranteeing user queue delay and meeting each slice's minimum average rate requirement and the user rate outage probability. The specific steps are as follows:

the method is based on a cell-free distributed massive MIMO system architecture. In this system, J access points (APs), forming the AP set {1, 2, ..., J}, are connected to a central processing unit, and each AP has M antennas. Users in the coverage area are divided into different slices according to their service requirements; the slice set is {1, 2, ..., I}, and the users in slice i are U_i = {1, 2, ..., u_i}. In the dual-time-scale network slice structure, the small-scale time dimension t is a transmission time interval (TTI) of 1 ms, and the large-scale time dimension k comprises ΔT TTIs. In each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, forming the PRB set {1, 2, ..., F}, and the bandwidth of each PRB is B = W/F. The method specifically comprises the following steps:
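The time and frequency grid described above can be illustrated with a minimal sketch. The class name `SliceTiming`, its field names and the 1.08 MHz total bandwidth are hypothetical choices for illustration only; the text fixes only the 1 ms TTI, the relation B = W/F, and the grouping of ΔT TTIs into one large-scale period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceTiming:
    """Hypothetical container for the dual-time-scale grid."""
    total_bandwidth_hz: float   # W (the value used below is an assumption)
    num_prbs: int               # F
    ttis_per_period: int        # ΔT, number of TTIs per large-scale period
    tti_seconds: float = 1e-3   # 1 ms TTI, as stated in the text

    @property
    def prb_bandwidth_hz(self) -> float:
        # B = W / F
        return self.total_bandwidth_hz / self.num_prbs

    def large_scale_index(self, t: int) -> int:
        # A 0-based TTI index t falls in large-scale period k = floor(t / ΔT).
        return t // self.ttis_per_period

    def is_period_start(self, t: int) -> bool:
        # The upper-layer controller would act only on these TTIs.
        return t % self.ttis_per_period == 0


timing = SliceTiming(total_bandwidth_hz=1.08e6, num_prbs=6, ttis_per_period=10)
```

With this indexing, the lower-layer controller runs in every TTI, while the upper-layer controller would be invoked only when `is_period_start(t)` is true.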

step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system, to obtain the uplink channel expression and the transmission rate expressions for enhanced mobile broadband (eMBB) users and ultra-reliable low-latency communication (URLLC) users;

step S2, establishing a slice model, in which each user of each slice has a buffered data queue on each AP that is served according to a first-come, first-served policy, so that the user's packet delay can be divided into processing delay, transmission delay and queuing delay, and obtaining expressions for two quality-of-service (QoS) indicators, namely communication reliability and packet delay;

step S3, establishing a hierarchical optimization model subject to constraints such as guaranteeing user queue delay and meeting each slice's minimum average rate requirement and the user rate outage probability;

step S4, providing the dual-time-scale access network slicing method: first, an upper-layer controller uses a deep Q-network (DQN) algorithm to observe user traffic on the large time scale and allocates a different number of PRBs to each slice; then, based on the slice configuration obtained by the upper-layer controller, a lower-layer controller uses a multi-agent deep deterministic policy gradient (MADDPG) algorithm to perform specific PRB allocation and power allocation for each user in a slice on the small time scale according to channel information.

Wherein the step S1 specifically includes:

step S101, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as

In formula (1),

represents the large-scale fading between the j-th AP and user u_i,

represents the distance from user u_i to the j-th AP, ζ is the path loss exponent,

is the log-normal shadow fading variable, and

represents the small-scale fading, whose elements follow a standard Rayleigh distribution.
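As an illustration of the channel components named in step S101, the following is a minimal sketch under stated assumptions. It assumes the commonly used form β = d^(-ζ) · 10^(ψ/10) for the large-scale fading and h = sqrt(β) · g with unit-variance complex Gaussian g (so each |g_m| is Rayleigh distributed); formula (1) itself is not reproduced here, and the function names, the 8 dB shadowing standard deviation and the 100-unit distance are hypothetical.

```python
import math
import random


def large_scale_fading(distance: float, zeta: float, shadow_db: float) -> float:
    """beta = d^(-zeta) * 10^(shadow_db / 10); this exact form is an assumption."""
    return distance ** (-zeta) * 10.0 ** (shadow_db / 10.0)


def sample_channel(distance, zeta, num_antennas, shadow_std_db=8.0, rng=None):
    """Draw one channel vector h = sqrt(beta) * g for a user/AP/PRB triple."""
    rng = rng or random.Random()
    shadow_db = rng.gauss(0.0, shadow_std_db)        # log-normal shadowing, in dB
    beta = large_scale_fading(distance, zeta, shadow_db)
    scale = math.sqrt(0.5)                           # unit-variance complex Gaussian
    g = [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
         for _ in range(num_antennas)]               # |g_m| is Rayleigh distributed
    return [math.sqrt(beta) * g_m for g_m in g]


h = sample_channel(distance=100.0, zeta=3.6, num_antennas=50, rng=random.Random(0))
gain = sum(abs(h_m) ** 2 for h_m in h)               # squared norm of the channel vector
```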

step S102, two slice types are considered in the distributed architecture. One is the eMBB slice, whose data transmission rate follows Shannon capacity theory; the data transmission rate of an eMBB user in the t-th TTI can be modeled as

The other is the URLLC slice, whose data rate is approximated by finite blocklength theory; the data transmission rate of a URLLC user in the t-th TTI can be modeled as

In formula (2) and formula (3),

represents the signal-to-noise ratio, Δt is the duration of one TTI, and B is the bandwidth; in formula (3),

represents the channel dispersion, ρ_i is the average packet length of slice i, Q^-1(·) is the inverse Gaussian Q-function, and ε is the effective decoding error probability.
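The two rate models of step S102 can be sketched as follows. This is not a reproduction of formulas (2) and (3): the URLLC function uses the standard normal approximation of the finite-blocklength rate, and it assumes a blocklength of n = Δt · B channel uses and a channel dispersion V = 1 - (1 + SNR)^-2. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def embb_rate(bandwidth_hz: float, snr: float) -> float:
    """Shannon rate in bit/s: B * log2(1 + snr)."""
    return bandwidth_hz * math.log2(1.0 + snr)


def urllc_rate(bandwidth_hz: float, snr: float, tti_s: float, eps: float) -> float:
    """Normal approximation of the finite-blocklength rate in bit/s.

    Assumes blocklength n = tti_s * bandwidth_hz channel uses and
    channel dispersion V = 1 - (1 + snr)^-2.
    """
    n = tti_s * bandwidth_hz
    dispersion = 1.0 - (1.0 + snr) ** -2
    q_inv = NormalDist().inv_cdf(1.0 - eps)          # Q^-1(eps)
    penalty = math.sqrt(dispersion / n) * q_inv * math.log2(math.e)
    return max(0.0, bandwidth_hz * (math.log2(1.0 + snr) - penalty))
```

For the same bandwidth and SNR, the URLLC rate is lower than the Shannon rate, and the gap widens as the TTI shortens or the target error probability ε decreases.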

The step S2 specifically includes:

step S201, dividing the user's packet delay into processing delay, transmission delay and queuing delay, the total delay D_i,t of slice i in the t-th TTI is

In formula (4),

represent the processing delay, the transmission delay and the queuing delay of slice i;

step S202, defining the packet loss rate of the i-th slice as the probability that the total delay of packets in slice i exceeds a predefined maximum slice delay threshold; the packet drop rate, i.e. the packet loss rate δ_i,t, of the i-th slice in the t-th TTI can then be expressed as

In formula (5), D_i,t is the total delay of slice i,

represents the maximum packet delay acceptable for slice i, and Pr denotes probability; packet delay and reliability serve as the two key indicators for evaluating QoS performance.
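The two QoS indicators of steps S201 and S202 can be illustrated with a short sketch. The total delay is the sum of the three components, and the packet loss rate, defined as Pr(D_i,t exceeds the threshold), is estimated here as an empirical fraction over sampled delays. The function names and all numerical values are hypothetical.

```python
def total_delay(processing_s: float, transmission_s: float, queuing_s: float) -> float:
    """Total packet delay as the sum of its three components."""
    return processing_s + transmission_s + queuing_s


def empirical_loss_rate(delays_s, max_delay_s: float) -> float:
    """Fraction of packets whose total delay exceeds the slice threshold."""
    delays_s = list(delays_s)
    if not delays_s:
        return 0.0
    late = sum(1 for d in delays_s if d > max_delay_s)
    return late / len(delays_s)


# Four packets with the same processing and transmission delay but different queuing delay.
samples = [total_delay(0.2e-3, 0.3e-3, q) for q in (0.1e-3, 0.4e-3, 0.9e-3, 2.0e-3)]
loss = empirical_loss_rate(samples, max_delay_s=1e-3)
```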

The step S3 specifically includes:

step S301, the upper-layer control policy π_C converts observations of the dynamic changes in service traffic and QoS performance into a PRB number allocation for each slice; π_C can therefore be expressed as a mapping, in the k-th large-scale time, from the global state S_k of the whole network to a suitable per-slice PRB number configuration C_k, and can be modeled as

In formula (6), A_i denotes the packet arrival rate of users in slice i,

is the average packet delay of all active users in slice i,

is the average packet loss rate of all active users in slice i, and C_i,k is the PRB number configuration of slice i;

step S302, in the t-th TTI of the k-th large-scale time, the lower-layer controller maps the observed user information X_t and the PRB number configuration C_k to an overall radio resource allocation in the physical layer; the lower-layer control policy π_E can be modeled as

In formula (7), C_k is the PRB number configuration of each slice, ΔT is the large-scale time length,

is the user queue length in slice i,

is the channel state information of the user,

is a binary user association factor representing AP association and PRB allocation, and

indicates the power allocated to user u_i, which may be one of Z different power levels;

step S303, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to comprise two parts, upper-layer control and lower-layer control, so that the utility function U_i,k of the i-th slice in the k-th large-scale time can be modeled as

In formula (8),

is the QoS utility function of slice i, determined by the average delay and the average packet drop rate of all active users in slice i;

is the spectral efficiency utility function of slice i, determined by the data rate r_i,t of all active users in slice i; ΔT is the large-scale time duration, and α_i,1, α_i,2, α_i,3 are positive weighting factors;

the goal of the hierarchical network slice architecture is to achieve optimal system performance while satisfying the radio resource constraints; the optimization problem in hierarchical network slicing can therefore be formulated as follows:

In formula (9), max denotes maximization, π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_i,n is the utility function of slice i with respect to index n, and X is a discount factor; when n is sufficiently large, X^n tends to zero. The optimization problem has the following constraints:

1) the total power allocated by each AP is limited by the AP's total power:

In formula (10),

is the total power of AP j;

2) minimum data rate constraint for each slice:

In formula (11),

is the transmission rate of user u_i associated with the j-th AP on the f-th PRB, and

is the minimum data rate of the slice;

3) the total data processing rate of an AP on each slice is less than the maximum data processing rate that the AP can achieve:

In formula (12), r_j,i represents the total data processing rate of the j-th AP on slice i, and

denotes the maximum data processing rate of the j-th AP;

4) packet delay constraint for each slice:

In formula (13), D_i,t is the total delay of slice i, and

denotes the maximum packet delay;

5) packet loss rate constraint for each slice:

In formula (14), δ_i,t is the packet loss rate of slice i, and

represents the minimum packet loss rate;

6)

formula (15) ensures that each AP allocates only one PRB to a given user, which lets each AP serve as many users as possible and reduces resource reuse on the same AP, thereby reducing interference;

7)

formula (16) ensures that different APs cannot allocate the same PRB to the same user,

respectively denote the association factors of two different APs for user u_i on the same PRB in the t-th TTI;

8)

formula (17) ensures that the same AP allocates different PRBs to different users,

respectively denote the allocation factors of two different PRBs at the same AP for user u_i in the t-th TTI;

9)

formula (18) ensures that every active user in the system is connected to at least one AP and is allocated resources,

denotes the association factor of AP j for user u_i in the t-th TTI.
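The association constraints 6) to 9) can be checked on a binary association tensor, as in the following minimal sketch. The formulas themselves are not reproduced here, so the checks encode one reading of the prose, which is an assumption: at most one PRB per user at each AP, at most one AP per (user, PRB) pair, at most one user per (AP, PRB) pair, and at least one association per user. The function name and the tensor layout `x[j][u][f]` are hypothetical.

```python
def check_association(x):
    """x[j][u][f] in {0, 1}: AP j serves user u on PRB f. Returns the violated rules."""
    num_aps, num_users, num_prbs = len(x), len(x[0]), len(x[0][0])
    violated = set()
    for j in range(num_aps):
        for u in range(num_users):
            if sum(x[j][u]) > 1:                       # reading of constraint 6)
                violated.add("one PRB per user at each AP")
        for f in range(num_prbs):
            if sum(x[j][u][f] for u in range(num_users)) > 1:   # reading of 8)
                violated.add("distinct PRBs for distinct users at each AP")
    for u in range(num_users):
        for f in range(num_prbs):
            if sum(x[j][u][f] for j in range(num_aps)) > 1:     # reading of 7)
                violated.add("no shared PRB across APs for one user")
        if sum(x[j][u][f] for j in range(num_aps) for f in range(num_prbs)) < 1:
            violated.add("every user connected")       # reading of constraint 9)
    return violated


# Two APs, two users, three PRBs: each user is served by both APs on different PRBs.
feasible = [[[1, 0, 0], [0, 1, 0]],
            [[0, 1, 0], [0, 0, 1]]]
# AP 0 gives user 0 two PRBs and user 1 is not served at all.
infeasible = [[[1, 1, 0], [0, 0, 0]],
              [[0, 0, 0], [0, 0, 0]]]
```

A check of this kind is what decides, in the reward definitions of step S4, whether an action receives the utility-based reward or the fixed negative feedback.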

The step S4 specifically includes:

step S401, under each slice PRB number configuration C_k ∈ C, the goal of lower-layer control policy learning is to find an optimal policy that attains the maximum expected reward over all states

The optimization problem of the lower-layer control policy is therefore formulated as follows to obtain the maximum expected cumulative reward:

In formula (19), π_E is the lower-layer control policy and C_k is the PRB number configuration of the slices;

step S402, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with the APs and the communication network acting as the agents and the environment respectively; the lower-layer controller should dynamically perform radio resource allocation actions according to the observed physical layer, so as to obtain the maximum expected cumulative reward of the system;

thus, for one agent:

1) State s_j: the channel state information H_j(t) and the queue information Q_j(t) of the users connected to the agent:

s_j = {Q_j(t), H_j(t)} (20)

2) Action a_j: for AP j, the action corresponds to a radio resource allocation, comprising power allocation and PRB allocation; the action of the agent at the current time t is therefore denoted as

3) Reward r_j: the reward of an agent is defined as the sum of the spectral efficiency at the AP when the PRBs and power allocated by each AP satisfy the constraints, and as negative feedback otherwise; the reward function of each agent can therefore be expressed as

In formula (22), R_reg represents a fixed value;

step S403, the optimization problem of the upper-layer control policy can be solved with the DQN algorithm; the upper-layer controller should dynamically configure the number of PRBs in each slice according to the service traffic, so as to maximize the overall utility of the system;

thus, for the upper-layer controller:

1) State s_k: the global state information comprises the average user arrival rate A_i, the average delay

and the average packet loss rate

2) Action a_k: the action space of the upper-layer controller corresponds to the per-slice PRB number allocation C_k, where C_i,k is the PRB number configuration of slice i; since there are I slices in total in the system, the action space can be represented by an I-dimensional vector;

3) Reward r_k: given the optimal lower-layer control policy

the convergence goal of the upper-layer control policy is to maximize the overall utility of the system; the reward function is therefore defined as the system utility when the constraints are satisfied and as negative feedback otherwise, specifically expressed as

In formula (25), R_reg represents a fixed value, and U_i,k is the utility function of the i-th slice in the k-th large-scale time.
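The upper-layer action space and reward described above can be illustrated with a minimal sketch. It enumerates every I-dimensional vector of per-slice PRB counts that uses all F PRBs, and returns the summed slice utility when the constraints hold and a fixed negative value otherwise. The function names, the minimum of one PRB per slice, and the use of `-penalty` as the negative feedback are assumptions; formula (25) is not reproduced here.

```python
from itertools import product


def prb_configurations(num_slices: int, num_prbs: int, min_per_slice: int = 1):
    """All vectors (C_1, ..., C_I) of per-slice PRB counts that use every PRB."""
    counts = range(min_per_slice, num_prbs + 1)
    return [c for c in product(counts, repeat=num_slices) if sum(c) == num_prbs]


def upper_reward(slice_utilities, constraints_ok: bool, penalty: float) -> float:
    """Sum of slice utilities if feasible, otherwise a fixed negative value.

    Using -penalty for the infeasible case is an assumption about how
    the fixed value R_reg enters the reward.
    """
    return sum(slice_utilities) if constraints_ok else -penalty


# Two slices sharing six PRBs gives a small discrete action set.
actions = prb_configurations(num_slices=2, num_prbs=6)
```

Because the resulting action set is small and discrete, a value-based method such as DQN, which outputs one Q-value per action, is a natural fit for the upper layer.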

Beneficial effects: the invention provides a dual-time-scale radio access network slicing method for the cell-free distributed massive MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a hierarchical time model, which effectively improves the utilization of limited resources, enhances the real-time performance of resource allocation, and can meet the diverse requirements of future 6G.

Drawings

Fig. 1 is a graph of the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 2:4, where the red curve represents the spectral efficiency of the static resource allocation method;

Fig. 2 is a graph of the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 3:3, where the red curve represents the spectral efficiency of the static resource allocation method;

Fig. 3 is a graph of the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 4:2, where the red curve represents the spectral efficiency of the static resource allocation method;

Fig. 4 is a simulation result of the upper-layer controller's configuration of the number of PRBs per slice.

Detailed Description

The present invention will be described in detail below with reference to examples:

Assume that a cell-free distributed massive MIMO system covering an area of 0.5 × 0.5 m² has 2 APs with 50 antennas each. In the coverage area there are two types of users with different service types: users requiring highly reliable, ultra-low-delay transmission are assigned to the URLLC slice, i.e. slice 0; users requiring high-data-rate services are assigned to the eMBB slice, i.e. slice 1.

The channel model consists of three parts: path loss, shadow fading, and small-scale fading, which can be expressed as

Wherein

the path loss exponent α is set to 3.6, the reference distance is 1,

is a shadow fading variable that follows a log-normal distribution,

In the dual-time-scale network slice structure, the small-scale time dimension t is a transmission time interval of Δt = 1 ms, and the large-scale time dimension k comprises ΔT TTIs, with ΔT = 10 ms. In each TTI, the total bandwidth W is divided into 6 PRBs shared by all APs, forming the PRB set {1, 2, ..., 6}, and the bandwidth is divided equally among the PRBs. The method comprises the following steps:

step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system, to obtain the uplink channel expression and the transmission rate expressions for the two types of users (URLLC and eMBB).

In this embodiment, step S1 specifically includes:

step S11, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as

step S12, for the eMBB slice, whose data transmission rate follows Shannon capacity theory, the data transmission rate of an eMBB user in the t-th TTI can be modeled as

In formula (3),

represents the channel dispersion, Q^-1(·) is the inverse Gaussian Q-function, ρ_i is the average packet length of slice i, and ε is the effective decoding error probability, set to 0.05; in formula (2) and formula (3),

represents the signal-to-noise ratio, which can be modeled as

In formula (4), the additive white Gaussian noise power is σ² = -174 dBm/Hz;

denotes the power allocated by AP j to user u_i in slice i on the f-th PRB in the t-th TTI, which can be selected from 0, 9, 19 and 29.
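The noise density and the discrete power levels of this embodiment can be turned into a signal-to-noise ratio as in the following minimal sketch. Several points are assumptions rather than statements of the text: the power levels 0, 9, 19 and 29 are interpreted as dBm (no unit is given), the 180 kHz PRB bandwidth is hypothetical, and interference between users is ignored, so the sketch does not reproduce formula (4).

```python
import math


def dbm_to_watt(p_dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def noise_power_watt(psd_dbm_per_hz: float, bandwidth_hz: float) -> float:
    """Noise power over one PRB: density in dBm/Hz plus 10*log10(bandwidth)."""
    return dbm_to_watt(psd_dbm_per_hz + 10.0 * math.log10(bandwidth_hz))


def snr(tx_power_dbm: float, channel_gain: float, noise_watt: float) -> float:
    # Interference from other users is deliberately ignored in this sketch.
    return dbm_to_watt(tx_power_dbm) * channel_gain / noise_watt


POWER_LEVELS_DBM = (0, 9, 19, 29)          # values from the text; the dBm unit is assumed
noise = noise_power_watt(-174.0, 180e3)    # the 180 kHz PRB width is hypothetical
```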

Step S2, establishing a slice model, in which each user of each slice has a buffered data queue on each AP that is served according to a first-come, first-served policy, and obtaining expressions for two quality-of-service indicators, namely communication reliability and packet delay.

In this embodiment, step S2 specifically includes:

step S21, assuming that each user has a data queue on the AP to buffer incoming packets, the total packet length in slice i is denoted Ω_i, with Ω_0 = 1000 bytes and Ω_1 = 5000 bytes, and the data queue is served according to a first-come, first-served policy. In the t-th TTI, the length of the queue waiting to be sent in the buffer of user u_i in slice i is Q_ui(t), and the queue update process of user u_i is

In formula (5), A_i denotes the packet arrival rate of users in slice i, set to A_0 = 0.2 packets/s and A_1 = 1 packets/s, and

is the transmission rate of user u_i.
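The queue update of step S21 can be sketched as one TTI of a first-come, first-served buffer. The recursion used here, serving at most rate × TTI bits and then adding the new arrivals, is a common form and is an assumption, since formula (5) is not reproduced; the function names, the initial backlog and the 1 Mbit/s rate are hypothetical, while the arrival rate and packet length are the slice 1 values from the text.

```python
def update_queue(queue_bits: float, arrived_bits: float,
                 rate_bps: float, tti_s: float = 1e-3) -> float:
    """One TTI of an FCFS buffer: serve up to rate * TTI bits, then add arrivals."""
    served = min(queue_bits, rate_bps * tti_s)
    return queue_bits - served + arrived_bits


def mean_arrival_bits(packets_per_s: float, packet_bytes: int, tti_s: float = 1e-3) -> float:
    """Average number of bits arriving in one TTI."""
    return packets_per_s * packet_bytes * 8 * tti_s


q = 4000.0                                  # hypothetical initial backlog in bits
for _ in range(3):
    q = update_queue(q, mean_arrival_bits(1.0, 5000), rate_bps=1e6)
```

With these numbers, 1000 bits are served and 40 bits arrive per TTI, so the backlog drains by 960 bits in each of the three TTIs.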

Step S22, dividing the user's packet delay into processing delay, transmission delay and queuing delay, the total delay of slice i in the t-th TTI is

1) Transmission delay is the time required to transmit a packet over the link between the AP and the slice. The transmission delay of slice i in the t-th TTI

can be expressed as

In formula (7), R_i,t is the total transmission rate of slice i;

2) Processing delay is the time required to process a data packet after the AP receives a data request from the corresponding user. The processing delay of slice i in the t-th TTI

can be expressed as

In formula (8), R_j,i represents the total data processing rate of the j-th AP on slice i, set to R_j,0 = 1 Mbit/s and R_j,1 = 0.5 Mbit/s;

3) According to queuing theory, the average sojourn time (comprising waiting time and service time) of a packet arriving in slice i, i.e. the queuing delay of slice i in the t-th TTI,

is

In formula (9), μ_i represents the service rate of a user in slice i; θ_i is the average service rate per PRB in slice i, set to θ_0 = 50 bit/s and θ_1 = 30 bit/s; C_i is the PRB configuration of slice i; and U_i is the number of users in slice i, set to 3.
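The queuing delay in item 3) can be illustrated with the M/M/1 mean sojourn time 1/(μ - λ), which covers both waiting and service. Formula (9) is not reproduced here, so both the M/M/1 model and the per-user service rate μ_i = θ_i · C_i / U_i are assumptions, as are the function names. The arrival and service rates must be expressed in the same unit; the numbers below are placeholders.

```python
import math


def service_rate(theta_per_prb: float, num_prbs: int, num_users: int) -> float:
    """Per-user service rate mu_i = theta_i * C_i / U_i (assumed form)."""
    return theta_per_prb * num_prbs / num_users


def mm1_sojourn_time(arrival_rate: float, mu: float) -> float:
    """Mean time in an M/M/1 system (waiting plus service): 1 / (mu - lambda)."""
    if mu <= arrival_rate:
        return math.inf        # unstable queue: the delay grows without bound
    return 1.0 / (mu - arrival_rate)


# Delay as a function of the number of PRBs given to the slice (3 users per slice).
delays = {c: mm1_sojourn_time(0.2, service_rate(50.0, c, 3)) for c in (2, 3, 4)}
```

Under this model, giving a slice more PRBs raises its service rate and shortens its queuing delay, which is the lever the upper-layer controller adjusts.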

Step S23, the packet loss rate of the i-th slice is defined as the probability that the total delay of packets in slice i exceeds a predefined maximum slice delay threshold. The packet drop rate, i.e. the packet loss rate, of the i-th slice in the t-th TTI can then be expressed as

In formula (10), D_i,t is the total delay of slice i.

Step S3, establishing a hierarchical optimization model subject to constraints such as guaranteeing user queue delay and meeting each slice's minimum average rate requirement and the user rate outage probability.

In this embodiment, step S3 specifically includes:

step S31, the upper-layer control policy π_C converts observations of the dynamic changes in service traffic and QoS performance into a PRB number allocation for each slice; π_C can therefore be expressed as a mapping from the global state S_k of the whole network to a suitable per-slice PRB number configuration C_k, and can be modeled as

In formula (11), A_i denotes the packet arrival rate of users in slice i,

is the average packet delay of the users in slice i,

is the average packet loss rate of the users in slice i, and C_i,k is the PRB number configuration of slice i.

Step S32, in every TTI of the k-th large-scale time, the lower-layer controller maps the observed user information X_t and the PRB number configuration C_k to an overall radio resource allocation in the physical layer; the lower-layer control policy π_E can be modeled as

In formula (12), C_k is the PRB number configuration of each slice, ΔT is the large-scale time length,

is the user queue length in slice i,

is the channel state information of the user, and

indicates the power allocated to user u_i, which may be one of Z different power levels.

Step S33, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to comprise two parts, upper-layer control and lower-layer control, so that the utility function of the i-th slice in the k-th large-scale time can be modeled as

In formula (13),

is the QoS utility function of slice i, determined by the average delay and the average packet drop rate of all active users in slice i;

is the spectral efficiency utility function of slice i, determined by the data rate r_i,t of all active users in slice i; ΔT is the large-scale time duration, and α_i,1, α_i,2, α_i,3 are positive weighting factors, set to 1, 10^6 and 10^5 respectively.

The goal of the hierarchical network slice architecture is to achieve optimal system performance while satisfying the radio resource constraints. The optimization problem in hierarchical network slicing can therefore be formulated as follows:

In formula (14), π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_i,n is the utility function of slice i with respect to index n, and X is a discount factor; when n is sufficiently large, X^n tends to zero. The optimization problem has the following constraints:

In formula (15),

is the total power of all APs;

2) minimum data rate constraint for each slice:

In formula (16),

is the transmission rate of user u_i associated with the j-th AP on the f-th PRB, and

is the minimum data rate of a slice, wherein

In formula (17), r_j,i represents the total data processing rate of the j-th AP on slice i, and

represents the maximum data processing rate of the j-th AP, wherein

4) packet delay constraint for each slice:

In formula (18), D_i,t is the total delay of slice i, and

denotes the maximum packet delay, wherein

5) packet loss rate constraint for each slice:

In formula (19), δ_i,t is the packet loss rate of slice i, and

represents the minimum packet loss rate, wherein

6)

formula (20) ensures that each AP allocates only one PRB to a given user, which lets each AP serve as many users as possible and reduces resource reuse on the same AP, thereby reducing interference;

7)

formula (21) ensures that different APs cannot allocate the same PRB to the same user;

8)

formula (22) ensures that the same AP allocates different PRBs to different users,

respectively denote the allocation factors of two different PRBs at the same AP for user u_i in the t-th TTI;

9)

formula (23) ensures that every active user in the system is connected to at least one AP and is allocated resources,

denotes the association factor of AP j for user u_i in the t-th TTI.

Step S4, providing the dual-time-scale network slicing method: first, the upper-layer controller uses the DQN algorithm to observe user traffic on the large time scale and allocates a different number of PRBs to each slice, so that PRB resources can be shared among the slices; then, based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses the MADDPG algorithm to perform specific PRB allocation and power allocation for each user in a slice on the small time scale according to the channel state and the user queue information.

In this embodiment, step S4 specifically includes:

step S41, under each slice PRB number configuration C_k ∈ C, the goal of lower-layer control policy learning is to find an optimal policy that attains the maximum expected reward over all states

step S42, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with the APs and the communication network acting as the agents and the environment respectively. The lower-layer controller should dynamically perform radio resource allocation actions according to the observed physical layer, so as to obtain the maximum expected cumulative reward of the system.

Thus, for one agent:

1) State s_j: considering that the packet arrival rate set for each slice is always the same and the user queue remains in the same state, the state of the agent at the current time t can be simplified to

2) Action a_j: for AP j, the action corresponds to a radio resource allocation, comprising power allocation and PRB allocation. The action of the agent at the current time t is therefore denoted as

3) Reward r_j: the reward of an agent is defined as the sum of the spectral efficiency at the AP when the PRBs and power allocated by each AP satisfy the constraints, and as negative feedback otherwise. The reward function of each agent can therefore be expressed as

In formula (27), R_reg represents a fixed value.

Step S43, the lower-layer controller allocates PRBs and power using the MADDPG algorithm, comprising the following steps:

1) initializing the neural networks with random parameters, and setting training_episode = 1;

2) at the start of each training episode, initializing the environment state; all APs observe the initial state s, and time_slot is set to 1;

3) in each TTI, every AP selects an action a according to its observed state, i.e. performs PRB allocation and power allocation for its users; the environment then gives the agent a reward r according to whether the action satisfies the constraints, and enters the next state s';

4) storing the state transition tuples (s, a, r, s') of all APs in an experience buffer;

5) the lower-layer controller uses

to update the critic network and compute the action gradients of all agents, where

is the action-value function of agent j, and

is the loss function of the action-value function;

6) all APs, according to

receive the action gradients and update their actor networks;

7) iterating time_slot from 1 to T_L: setting time_slot = time_slot + 1, updating the user positions, and returning to step 3);

8) iterating training_episode from 1 to K_L: setting training_episode = training_episode + 1 and returning to step 2), until the algorithm converges.
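The control flow of steps 1) to 8) can be sketched as the following training-loop skeleton. It is not an MADDPG implementation: the actor and critic networks and their gradient updates are replaced by a stub that only counts update calls, the policy is random, the environment state is a random number, and the feasibility check is a toy rule. All class, function and parameter names are hypothetical, and treating the fixed value R_reg as a negative reward is an assumption.

```python
import random
from collections import deque


class StubAgent:
    """Placeholder for one AP; a real agent would hold actor and critic networks."""

    def __init__(self, num_prbs, power_levels, rng):
        self.num_prbs, self.power_levels, self.rng = num_prbs, power_levels, rng
        self.updates = 0

    def act(self, state):
        # Random PRB and power level instead of a learned policy.
        return (self.rng.randrange(self.num_prbs), self.rng.choice(self.power_levels))

    def update(self, batch):
        self.updates += 1      # stand-in for the critic and actor gradient steps


def reward(feasible, spectral_efficiency, r_reg=1.0):
    # Spectral efficiency if the constraints hold, else fixed negative feedback.
    return spectral_efficiency if feasible else -r_reg


def train(num_aps=2, episodes=3, slots=5, batch_size=4, seed=0):
    rng = random.Random(seed)
    agents = [StubAgent(6, (0, 9, 19, 29), rng) for _ in range(num_aps)]   # step 1
    buffer = deque(maxlen=1000)
    for _ in range(episodes):                                              # step 8
        states = [rng.random() for _ in agents]                            # step 2
        for _ in range(slots):                                             # step 7
            actions = [a.act(s) for a, s in zip(agents, states)]           # step 3
            prbs = [prb for prb, _ in actions]
            feasible = len(set(prbs)) == len(prbs)     # toy constraint check
            rewards = [reward(feasible, rng.random()) for _ in agents]
            next_states = [rng.random() for _ in agents]
            buffer.append((states, actions, rewards, next_states))         # step 4
            if len(buffer) >= batch_size:
                batch = rng.sample(list(buffer), batch_size)
                for agent in agents:                                       # steps 5-6
                    agent.update(batch)
            states = next_states
    return agents, buffer
```

In a full implementation, `update` would train a centralized critic on the joint states and actions in the batch and then push the resulting action gradients to each AP's actor, which is where steps 5) and 6) differ from independent per-agent learning.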

Step S44, under the converged lower-layer control strategy, the optimization problem of the upper-layer control strategy is designed as follows, so as to learn the optimal upper-layer control strategy;

Step S45, the optimization problem of the upper-layer control strategy can be solved by using the DQN algorithm. The upper layer controller should dynamically configure the number of PRBs of each slice according to the service traffic, so as to maximize the overall utility of the system.

Thus, for the upper layer controller:

1) State s_k: since the user packet arrival rate of each slice is a fixed value and the average packet loss rate is determined by the average delay, the state can be simplified to

2) Action a_k: the action space of the upper layer controller corresponds to the PRB number configuration C_k of the slices, where C_{i,k} is the number of PRBs configured for slice i. Since there are I slices in total in the system, the action space can be represented by an I-dimensional vector;

3) Reward r_k: given the optimal lower-layer control strategy, the convergence goal of the upper-layer control strategy is to maximize the overall utility of the system. Thus, the reward is defined as the system utility when the constraints are satisfied, and as negative feedback when they are not, specifically expressed as

In formula (31), R_reg represents a fixed value.
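The I-dimensional action space of item 2) can be enumerated as in the sketch below. It assumes that the PRB counts of all slices must sum to the total number of PRBs and that every slice receives at least one PRB; neither condition is stated explicitly in this passage. The function name `prb_configurations` and its parameters are hypothetical.

```python
from itertools import product


def prb_configurations(num_slices, total_prbs, min_per_slice=1):
    """All I-dimensional PRB count vectors that sum to total_prbs."""
    counts = range(min_per_slice, total_prbs + 1)
    return [c for c in product(counts, repeat=num_slices)
            if sum(c) == total_prbs]


# Two slices (URLLC, eMBB) sharing 6 PRBs, as in the 2:4, 3:3 and 4:2
# splits used for Figs. 1 to 3.
actions = prb_configurations(num_slices=2, total_prbs=6)
```

Under these assumptions the two-slice, six-PRB case yields five candidate actions, which is small enough for a discrete-action method such as DQN to index directly.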

Step S46, the upper layer controller controls the PRB number configuration of each slice by using the DQN algorithm, including the following steps:

2) initializing the environment state at the start of each training episode, with the upper layer controller observing the initial state s, and setting time_slot to 1;

3) the upper layer controller takes an action a according to the observed state using an ε-greedy policy, obtains the corresponding reward r, and the environment enters the next state s';

4) storing the state transition tuples (s, a, r, s') in an experience buffer;

5) updating the weights of the Q function in the DQN by performing stochastic gradient descent

to minimize the loss function

6) for time_slot from 1 to T_U, setting time_slot = time_slot + 1 and returning to step 3);

7) for training_episode from 1 to K_U, setting training_episode = training_episode + 1 and returning to step 2), until the algorithm converges.
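The ε-greedy selection in step 3) and the quantity minimized in step 5) can be illustrated as follows. The loss formula of the patent is not reproduced in this text, so the sketch uses the standard one-step Q-learning target as an assumption; it omits the neural network and any target network, and the discount value `gamma` and all function names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=0.9):
    """Standard one-step target: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * max(next_q_values)


def squared_td_loss(q_sa, target):
    """Squared error between the current estimate Q(s, a) and the target."""
    return (target - q_sa) ** 2
```

In a full DQN the squared error would be averaged over a mini-batch sampled from the experience buffer and minimized with respect to the network weights by stochastic gradient descent.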

The above presents the whole process of dynamic resource allocation in a cell-free large-scale MIMO wireless access network using the method of the present invention.

Fig. 1 is a graph of the spectral efficiency achieved by the lower layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 2:4, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA);

Fig. 2 is a graph of the spectral efficiency achieved by the lower layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 3:3, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA);

Fig. 3 is a graph of the spectral efficiency achieved by the lower layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 4:2, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA);

As can be seen from the above figures, when the MADDPG algorithm is used to learn the lower-layer control strategy, the best performance can be learned under all PRB number configurations. The learning of the lower-layer control strategy converges at about 10000 episodes, and its spectral efficiency is almost twice that of the SRA strategy.

Fig. 4 shows the simulation result for the upper layer controller's configuration of the number of PRBs per slice, namely the system utility. As can be seen from the figure, as the learning steps iterate, the DQN algorithm converges to the action with the highest reward and, according to the set weights, selects the PRB resource configuration that maximizes the total utility of the system, allocating 6 PRBs to the URLLC slice and the eMBB slice. Therefore, by using the DQN algorithm to solve the upper-layer configuration of the number of PRBs per slice, the upper-layer control strategy can obtain an optimal configuration.

The invention provides a double-time-scale wireless access network slicing method for a cell-free distributed large-scale MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a layered time model, which effectively improves the utilization of limited resources and enhances the real-time performance of resource allocation. The method can meet the diverse requirements of future 6G, can serve various communication scenarios, and has practical and research value.

Claims

1. A double-time-scale intelligent wireless access network slicing method, characterized in that the method is based on a cell-free distributed large-scale MIMO system architecture, in which the distributed large-scale MIMO system has a total of J access points (APs) connected to a central processing unit, the AP set being {1, 2, ..., J}; each AP has M antennas; users in the coverage area are divided into different slices according to service requirements, the slice set in the coverage area being {1, 2, ..., I}, and the users in slice i being U_i = {1, 2, ..., u_i}; in the dual-time-scale network slice structure, the small-scale time dimension is a transmission time interval (TTI) of 1 ms, and a large-scale time dimension k comprises ΔT TTIs; in each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, the PRB set being {1, 2, ..., F}, and the bandwidth allocated to each PRB is B = W/F; the method specifically comprises the following steps:

Step S2, a slice model is established: for the users of each slice, a buffered data queue served according to a first-come-first-served policy is introduced at each AP, so that the packet delay of a user can be divided into processing delay, transmission delay and queuing delay, and expressions for two quality-of-service (QoS) indexes, namely communication reliability and packet delay, are obtained;

2. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S1 specifically comprises:

In the formula (1)

is a function of the logarithmic fading variation,

In the formula (2) and the formula (3)

3. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S2 specifically comprises:

In the formula (4)

In formula (5), D_{i,t} is the total delay of slice i,

4. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S3 specifically comprises:

Step S301, the upper-layer control strategy π_C converts the observed dynamic changes in service traffic and QoS performance into the PRB number configuration of each slice; thus, the upper-layer control strategy π_C can be expressed as a mapping, at the k-th large-scale time, from the global state S_k of the whole network to a suitable PRB number configuration C_k of the slices, which can be modeled as

In formula (6), A_i indicates the packet arrival rate of the users in slice i,

is the average packet delay of all active users for slice i,

is the user queue length in slice i,

is the channel state information of the user and,

indicates the power allocated to user u_i, which may be one of Z different power levels;

In formula (8)

And average packet drop rate

Determining;

In formula (9), max is the maximization function, π_E is the lower-layer control strategy, π_C is the upper-layer control strategy, π is the joint strategy, U_{i,n} is the utility function of slice i with respect to index n, and X is a discount factor; when n is sufficiently large, X^n tends to zero. The optimization problem has the following constraints:

In formula (10),

is the total power of AP j;

2) minimum constraint on data rate per slice:

in formula (11)

is the transmission rate of user u_i associated with the j-th AP on the f-th PRB,

minimum data rate for a slice;

represents the maximum data processing rate of the jth AP;

4) packet delay constraint for each slice:

In formula (13), D_{i,t} is the total delay of slice i,

indicating the maximum packet delay;

5) packet loss rate constraint for each slice:

In formula (14), δ_{i,t} is the packet loss rate of slice i,

representing the minimum packet loss rate;

6)

7)

8)

9)

indicates the association factor between AP j and user u_i in the t-th TTI.

5. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S4 specifically comprises:

Step S401, under the PRB number configuration C_k of the slices, the goal of lower-layer control strategy learning is to find an optimal strategy that obtains the maximum expected reward over all states

thus, for one agent:

1) State s_j: the channel state information H_j(t) of the users connected to the agent and the user queue information Q_j(t);

s_j = {Q_j(t), H_j(t)}  (20)

In formula (22), R_reg represents a fixed value;

thus, for the upper layer controller

And average packet loss rate

3) Reward r_k: given the optimal lower-layer control strategy, the convergence goal of the upper-layer control strategy is to maximize the overall utility of the system; thus, the reward is defined as the system utility when the constraints are satisfied, and as negative feedback when they are not, specifically expressed as

In formula (25), R_reg represents a fixed value, and U_{i,k} is the utility function of the i-th slice in the k-th large-scale time.