CN114867030A - Double-time-scale intelligent wireless access network slicing method

Info

Publication number
CN114867030A
Authority
CN
China
Prior art keywords
slice
user
delay
time
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210649530.3A
Other languages
Chinese (zh)
Inventor
李佳珉
王洁
叶枫
朱鹏程
盛彬
尤肖虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210649530.3A
Publication of CN114867030A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0426 Power distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/16 Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Power Engineering (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a double-time-scale intelligent wireless access network slicing method. The method is based on a cell-free distributed massive MIMO system architecture and combines non-orthogonal multiple access with dynamic multi-connectivity for massive numbers of terminals. Because the network state varies over long periods, a reinforcement learning algorithm performs physical resource block (PRB) allocation and power allocation on two time scales, achieving adaptive resource allocation at different time and resource granularities. Compared with the prior art, the invention configures resources jointly across an upper and a lower layer: the upper layer sets the number of PRBs for each slice, and the lower layer, on the small time scale, performs PRB allocation and power allocation and dynamically selects links for each user according to changes in the physical-layer environment. This improves the spectral efficiency of the system, meets the ultra-high-reliability and ultra-low-delay service requirements of massive future 6G traffic, and is of significant value for research on real-time resource allocation in mobile scenarios.

Description

Double-time-scale intelligent wireless access network slicing method
Technical Field
The invention relates to a double-time-scale intelligent wireless access network slicing method based on a cell-free distributed massive MIMO system architecture, and belongs to the technical field of mobile communication.
Background
With the rapid development of the mobile internet, the scale of communication services keeps growing, user requirements are increasingly diverse, and the limited spectrum is increasingly scarce. Requirements for high system throughput, ultra-low delay, ultra-high reliability and real-time connectivity continue to rise, so traditional wireless communication systems need further improvement. Cell-free distributed massive MIMO is an innovative, scalable form of network MIMO in which a large number of APs distributed over an area serve all users on the same time-frequency resources. It offers very high spectral efficiency, energy efficiency and coverage. In addition, the cell-free structure solves the user mobility problem of cellular networks; compared with a centralized system, a cell-free distributed massive MIMO system offers channel diversity, no handover, higher coverage and no need to deploy cells in a specific area. Furthermore, multi-connectivity effectively reduces the delay caused by retransmissions and transmission errors on a single link, which matches both the high-reliability requirement of 6G systems for massive services and the characteristics of cell-free distributed massive MIMO systems. Non-orthogonal multiple access supports massive user access with limited spectrum resources by further exploiting the power domain, which improves system throughput.
To provide customized services for future large-scale 6G traffic, 6G systems place greater emphasis on the utilization of limited resources, which has led to network slicing, a technology that shares resources by means of network virtualization. Network slicing uses independent and flexible virtual resource slices to abstract physical resources into virtual logical networks suited to different scenarios, providing a strong guarantee of QoS. Research on core network slicing is relatively comprehensive and focuses mainly on the configuration and management of network slices. Existing radio access network slicing techniques combine self-optimization of multi-granularity network resources and propose hierarchical slicing frameworks, but they were proposed for cellular networks. Cell-free networks and network slicing are highly complementary: on the one hand, a cell-free distributed massive MIMO system can reduce the randomness of the wireless channels within network slices; on the other hand, network slicing makes applications in a cell-free distributed massive MIMO system more flexible. Joint research on cell-free distributed massive MIMO systems and network slicing is therefore increasingly important, and is significant for meeting the diverse requirements of future 6G and for dynamically allocating limited resources.
Disclosure of Invention
Technical problem: in view of the above, the object of the present invention is to provide a double-time-scale intelligent wireless access network slicing method based on a cell-free distributed massive MIMO system architecture, so that limited resources in that architecture are used efficiently and allocated dynamically by combining it with network slicing.
Technical solution: the invention performs PRB (physical resource block) allocation and power allocation for uplink users in a cell-free distributed massive MIMO (multiple-input multiple-output) system architecture, and jointly optimizes user QoS (quality of service) with algorithms on two time scales, subject to constraints such as guaranteed user queue delay, the minimum average rate of each slice and the user rate outage probability. The resulting double-time-scale intelligent wireless access network slicing method is as follows:
The method is based on a cell-free distributed massive MIMO system architecture. In this system, J access points (APs), indexed by the set {1, 2, ..., J}, are connected to a central processing unit, and each AP has M antennas. Users in the coverage area are divided into slices according to their service requirements; the slice set is {1, 2, ..., I}, and the users in slice i form the set U_i = {1, 2, ..., u_i}. In the double-time-scale network slice structure, the small time scale is the transmission time interval (TTI) of 1 ms, and each large-scale time period k comprises ΔT TTIs. In each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, indexed by {1, 2, ..., F}, and the bandwidth of each PRB is B = W/F. The method comprises the following steps:
step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system to obtain the uplink channel transmission expression and the transmission rate expressions of enhanced mobile broadband (eMBB) users and ultra-reliable low-latency communication (URLLC) users;
step S2, establishing a slice model, in which each user of each slice has a buffered data queue on each AP that is served on a first-come, first-served basis, so that the packet delay of a user can be divided into processing delay, transmission delay and queuing delay, and obtaining expressions for two quality of service (QoS) indicators, namely communication reliability and packet delay;
step S3, establishing a hierarchical optimization model subject to constraints such as guaranteed user queue delay, the minimum average rate of each slice and the user rate outage probability;
step S4, providing the double-time-scale access network slicing method: first, the upper-layer controller observes user traffic on the large time scale and uses a deep Q-network (DQN) algorithm to allocate a number of PRBs to each slice; then, based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses a multi-agent deep deterministic policy gradient (MADDPG) algorithm to perform the specific PRB allocation and power allocation for each user in the slice on the small time scale according to channel information.
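The interplay of the two time scales in step S4 can be illustrated with a minimal Python sketch. It is not the patented DQN/MADDPG implementation: `upper_policy` and `lower_policy` are hypothetical stand-ins (a traffic-proportional split and a random PRB assignment), and the values `DELTA_T = 10` and `NUM_PRBS = 6` are borrowed from the embodiment described later. The sketch only shows that the per-slice PRB count is fixed once per large-scale period while the concrete PRB assignment is redone every TTI.

```python
import random

DELTA_T = 10   # TTIs per large-scale period (embodiment value)
NUM_PRBS = 6   # total number of PRBs F (embodiment value)


def upper_policy(arrival_rates):
    """Hypothetical stand-in for the DQN upper-layer controller.

    Gives every slice one PRB and splits the rest in proportion to traffic.
    """
    spare = NUM_PRBS - len(arrival_rates)
    total = sum(arrival_rates)
    config = [1 + int(spare * a / total) for a in arrival_rates]
    config[0] += NUM_PRBS - sum(config)  # rounding leftovers go to slice 0
    return config


def lower_policy(config, rng):
    """Hypothetical stand-in for the MADDPG lower-layer controller.

    Picks which PRB indices each slice uses in the current TTI.
    """
    prbs = list(range(NUM_PRBS))
    rng.shuffle(prbs)
    allocation, start = [], 0
    for count in config:
        allocation.append(sorted(prbs[start:start + count]))
        start += count
    return allocation


def run(num_periods, arrival_rates, seed=0):
    """Run the nested loop and log (k, t, C_k, per-slice PRB indices)."""
    rng = random.Random(seed)
    log = []
    for k in range(num_periods):              # large time scale
        config = upper_policy(arrival_rates)  # C_k, held for DELTA_T TTIs
        for t in range(DELTA_T):              # small time scale (1 ms TTIs)
            log.append((k, t, config, lower_policy(config, rng)))
    return log
```

Calling `run(2, [0.2, 1.0])` produces 20 log entries, one per TTI, each with the same PRB counts within a period and a fresh PRB assignment per TTI.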
Wherein the step S1 specifically includes:
step S101, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as
Figure BDA0003685515150000031
In formula (1),
Figure BDA0003685515150000032
represents the large-scale fading between the j-th AP and user u_i,
Figure BDA0003685515150000033
represents the distance from user u_i to the j-th AP, ζ is the path loss exponent,
Figure BDA0003685515150000034
is the log-normal shadow fading variable, and
Figure BDA0003685515150000035
represents the small-scale fading, whose elements follow a standard Rayleigh distribution
Figure BDA0003685515150000036
Step S102, two slice types are considered in the distributed architecture. The first is the eMBB slice, whose data transmission rate follows the Shannon capacity formula; the data transmission rate of an eMBB user in the t-th TTI can be modeled as
Figure BDA0003685515150000037
The second is the URLLC slice, whose data rate is approximated by finite-blocklength theory; the data transmission rate of a URLLC user in the t-th TTI can be modeled as
Figure BDA0003685515150000038
In formula (2) and formula (3),
Figure BDA0003685515150000039
represents the signal-to-noise ratio, Δt is the duration of one TTI, and B is the bandwidth; in formula (3),
Figure BDA00036855151500000310
represents the channel dispersion, ρ_i is the average packet length of slice i, Q^{-1}(·) is the inverse Gaussian Q-function, and ε is the effective decoding error probability.
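As an illustration of the two rate models, the following Python sketch uses the textbook Shannon formula and the standard normal approximation from finite-blocklength theory. The patent's own formulas (2) and (3) are not recoverable from this text and may differ in detail (for example in how the packet length ρ_i enters), so the expressions below are assumptions; the function names and the 180 kHz bandwidth in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def embb_rate(bandwidth_hz, sinr):
    """Shannon rate in bit/s: B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def urllc_rate(bandwidth_hz, sinr, tti_s, error_prob):
    """Normal-approximation rate in bit/s for a short block.

    Assumes the blocklength is B * tti_s channel uses and the AWGN
    channel dispersion V = 1 - 1 / (1 + SINR)^2.
    """
    blocklength = bandwidth_hz * tti_s
    dispersion = 1.0 - 1.0 / (1.0 + sinr) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)  # Q^{-1}(eps)
    penalty = math.sqrt(dispersion / blocklength) * q_inv * math.log2(math.e)
    return max(bandwidth_hz * (math.log2(1.0 + sinr) - penalty), 0.0)
```

For instance, `urllc_rate(180e3, 10.0, 1e-3, 1e-5)` is strictly smaller than `embb_rate(180e3, 10.0)`, reflecting the rate penalty paid for short blocks and a low decoding error probability.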
The step S2 specifically includes:
step S201, the packet delay of a user is divided into processing delay, transmission delay and queuing delay, and the total delay D_{i,t} of slice i in the t-th TTI is
Figure BDA00036855151500000311
In formula (4),
Figure BDA00036855151500000312
denote the transmission, processing and queuing delays of slice i;
step S202, the packet loss rate of the i-th slice is defined as the probability that the total delay of packets in slice i exceeds a predefined maximum slice delay threshold; the packet drop rate, i.e. the packet loss rate δ_{i,t}, of the i-th slice in the t-th TTI can then be expressed as
Figure BDA00036855151500000313
In formula (5), D_{i,t} is the total delay of slice i,
Figure BDA00036855151500000314
represents the maximum packet delay acceptable for slice i, and Pr(·) denotes probability; packet delay and reliability serve as the two key indicators for evaluating QoS performance.
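The packet loss rate defined in step S202 is a probability, so in a simulation it would typically be estimated from observed delays. The Python sketch below is one such empirical estimator; the function name and the use of a sample frequency in place of the probability are assumptions for illustration.

```python
def packet_loss_rate(total_delays, max_delay):
    """Empirical estimate of Pr(D > D_max) from observed packet delays."""
    if not total_delays:
        return 0.0  # no packets observed, so nothing was lost
    late = sum(1 for delay in total_delays if delay > max_delay)
    return late / len(total_delays)
```

With delays of 1, 2, 3 and 4 ms and a 2.5 ms threshold, two of the four packets are late, so the estimate is 0.5.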
The step S3 specifically includes:
step S301, the upper-layer control policy π_C converts the observed dynamics of traffic and QoS performance into a PRB quantity allocation for each slice. π_C can therefore be expressed as a mapping, in the k-th large-scale time period, from the global network state S_k to a suitable per-slice PRB number configuration C_k, and can be modeled as
Figure BDA0003685515150000041
In formula (6), A_i denotes the packet arrival rate of users in slice i,
Figure BDA0003685515150000042
is the average packet delay of all active users in slice i,
Figure BDA0003685515150000043
is the average packet loss rate of all active users in slice i, and C_{i,k} is the PRB number configuration of slice i;
step S302, in the t-th TTI of the k-th large-scale time period, the lower-layer controller maps the observed user information X_t and the PRB number configuration C_k to the overall radio resource allocation in the physical layer for that TTI; the lower-layer control policy π_E can be modeled as
Figure BDA0003685515150000044
In formula (7), C_k is the per-slice PRB number configuration, ΔT is the large-scale time length,
Figure BDA0003685515150000045
is the user queue length in slice i,
Figure BDA0003685515150000046
is the channel state information of the user,
Figure BDA0003685515150000047
is a binary user association factor representing AP association and PRB allocation, and
Figure BDA0003685515150000048
indicates that the power allocated to user u_i is one of Z different power levels;
step S303, to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system comprises two parts, upper-layer control and lower-layer control, so the utility function U_{i,k} of the i-th slice in the k-th large-scale time period can be modeled as
Figure BDA0003685515150000049
In formula (8),
Figure BDA00036855151500000410
is the QoS utility function of slice i, determined by the average delay
Figure BDA00036855151500000411
and the average packet drop rate
Figure BDA00036855151500000412
of all active users in slice i;
Figure BDA00036855151500000413
is the spectral efficiency utility function of slice i, determined by the sum data rate r_{i,t} of all active users in slice i; ΔT is the large-scale time length; and α_{i,1}, α_{i,2}, α_{i,3} are positive weighting factors;
the goal of the hierarchical network slice architecture is to achieve optimal system performance subject to the radio resource constraints; the optimization problem in the hierarchical network slice can therefore be formulated as follows:
Figure BDA0003685515150000051
In formula (9), max denotes maximization, π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_{i,n} is the utility function of slice i at index n, and X is a discount factor such that X^n tends to zero when n is sufficiently large. The optimization problem is subject to the following constraints:
1) the total power allocated by each AP must not exceed the total power of that AP:
Figure BDA0003685515150000052
In formula (10),
Figure BDA0003685515150000053
is the total power of AP j;
2) minimum data rate constraint for each slice:
Figure BDA0003685515150000054
In formula (11),
Figure BDA0003685515150000055
is the transmission rate of user u_i when associated with the j-th AP on the f-th PRB, and
Figure BDA0003685515150000056
is the minimum data rate of the slice;
3) the total data processing rate at each slice of an AP is less than the maximum data processing rate that the AP can achieve:
Figure BDA0003685515150000057
In formula (12), R_{j,i} represents the total data processing rate of the j-th AP on slice i, and
Figure BDA0003685515150000058
denotes the maximum data processing rate of the j-th AP;
4) packet delay constraint for each slice:
Figure BDA0003685515150000059
In formula (13), D_{i,t} is the total delay of slice i, and
Figure BDA00036855151500000510
denotes the maximum packet delay;
5) packet loss rate constraint for each slice:
Figure BDA00036855151500000511
In formula (14), δ_{i,t} is the packet loss rate of slice i, and
Figure BDA00036855151500000512
represents the minimum packet loss rate;
6)
Figure BDA00036855151500000513
Equation (15) ensures that each AP allocates only one PRB to a given user, which lets each AP serve as many users as possible and reduces resource reuse on the same AP so as to reduce interference;
7)
Figure BDA0003685515150000061
Equation (16) ensures that different APs cannot allocate the same PRB to the same user, where
Figure BDA0003685515150000062
denote the association factors of two different APs with user u_i on the same PRB in the t-th TTI;
8)
Figure BDA0003685515150000063
Equation (17) ensures that the same AP can allocate different PRBs to different users, where
Figure BDA0003685515150000064
denote the allocation factors of two different PRBs at the same AP for user u_i in the t-th TTI;
9)
Figure BDA0003685515150000065
Equation (18) ensures that every active user in the system is connected to at least one AP and is allocated resources, where
Figure BDA0003685515150000066
denotes the association factor between AP j and user u_i in the t-th TTI.
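The association constraints can be checked mechanically once the binary association factors are stored in an array. The Python sketch below follows one reading of the prose for equations (15), (16) and (18) only; the original formulas are not available here, so the exact inequalities, the indexing `x[j][f][u]` (AP, PRB, user) and the function name are assumptions.

```python
def check_association(x, active_users):
    """Check a binary association tensor x[j][f][u] for feasibility.

    (15) each AP gives a user at most one PRB,
    (16) a given PRB reaches a user from at most one AP,
    (18) every active user holds at least one (AP, PRB) pair.
    """
    num_aps, num_prbs = len(x), len(x[0])
    for u in active_users:
        for j in range(num_aps):
            if sum(x[j][f][u] for f in range(num_prbs)) > 1:
                return False  # violates the reading of (15)
        for f in range(num_prbs):
            if sum(x[j][f][u] for j in range(num_aps)) > 1:
                return False  # violates the reading of (16)
        links = sum(x[j][f][u] for j in range(num_aps) for f in range(num_prbs))
        if links < 1:
            return False      # violates the reading of (18)
    return True
```

In a learning loop, a check of this kind would decide whether an agent receives the utility-based reward or the fixed negative feedback.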
The step S4 specifically includes:
step S401, for each slice PRB number configuration C_k ∈ C, the goal of lower-layer control policy learning is to find an optimal policy that attains the maximum expected reward over all states
Figure BDA0003685515150000067
The optimization problem of the lower-layer control policy is therefore formulated as follows to obtain the maximum expected cumulative reward:
Figure BDA0003685515150000068
In formula (19), π_E is the lower-layer control policy and C_k is the per-slice PRB number configuration;
step S402, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with each AP acting as an agent and the communication network as the environment; based on its observations of the physical layer, the lower-layer controller dynamically performs radio resource allocation actions to obtain the maximum expected cumulative reward of the system;
thus, for one agent:
1) State s_j: the channel state information H_j(t) and the queue information Q_j(t) of the users connected to agent j;
s_j = {Q_j(t), H_j(t)}    (20)
2) Action a_j: for AP j, the action corresponds to a radio resource allocation, comprising power allocation and PRB allocation, so the action of the agent at the current time t is denoted as
Figure BDA0003685515150000069
3) Reward r_j: the reward of an agent is defined as the sum spectral efficiency at the AP after the AP has allocated PRBs and power within the constraints; otherwise it is defined as negative feedback. The reward function of each agent can thus be expressed as
Figure BDA0003685515150000071
In formula (22), R_reg is a fixed value;
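The per-agent state and reward just described can be sketched in a few lines of Python. This is a reading of the prose, not formula (22) itself: the sign convention (returning `-r_reg` with `r_reg > 0`), the normalization of rates by bandwidth to obtain spectral efficiency, and both function names are assumptions.

```python
def agent_state(queue_lengths, channel_gains):
    """State s_j = {Q_j(t), H_j(t)} for the users attached to one AP agent."""
    return {"Q": list(queue_lengths), "H": list(channel_gains)}


def agent_reward(user_rates_bps, bandwidth_hz, constraints_ok, r_reg=1.0):
    """Sum spectral efficiency (bit/s/Hz) if feasible, else a fixed penalty."""
    if not constraints_ok:
        return -r_reg  # fixed negative feedback for infeasible allocations
    return sum(user_rates_bps) / bandwidth_hz
```

For example, two users at 1 Mbit/s and 2 Mbit/s over 1 MHz give a reward of 3.0 when the constraints hold, and `-r_reg` otherwise.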
step S403, the optimization problem of the upper-layer control policy can be solved with the DQN algorithm; the upper-layer controller dynamically configures the number of PRBs in each slice according to the traffic so as to maximize the overall utility of the system;
thus, for the upper-layer controller:
1) State s_k: the global state information comprises the average arrival rate A_i of the users, the average delay
Figure BDA0003685515150000072
and the average packet loss rate
Figure BDA0003685515150000073
Figure BDA0003685515150000074
2) Action a_k: the action space of the upper-layer controller corresponds to the per-slice PRB number allocation C_k, where C_{i,k} is the number of PRBs configured for slice i; since there are I slices in total in the system, the action space can be represented by an I-dimensional vector;
Figure BDA0003685515150000075
3) Reward r_k: given the optimal lower-layer control policy
Figure BDA0003685515150000076
the convergence goal of the upper-layer control policy is to maximize the overall utility of the system. The reward function is therefore defined as the system utility when the constraints are satisfied and as negative feedback otherwise, specifically
Figure BDA0003685515150000077
In formula (25), the negative feedback is a fixed value and U_{i,k} is the utility function of the i-th slice in the k-th large-scale time period.
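Because the upper-layer action is an I-dimensional vector of PRB counts, a DQN needs a finite list of such vectors to index its outputs. The Python sketch below enumerates that list and mirrors the reward description; requiring at least one PRB per slice, summing slice utilities to get the system utility, and the fixed penalty value are assumptions, as are the function names.

```python
from itertools import product


def prb_configurations(num_slices, num_prbs, min_per_slice=1):
    """All per-slice PRB count vectors that use exactly num_prbs PRBs."""
    counts = range(min_per_slice, num_prbs + 1)
    return [c for c in product(counts, repeat=num_slices) if sum(c) == num_prbs]


def upper_reward(slice_utilities, constraints_ok, penalty=1.0):
    """System utility if the constraints hold, else fixed negative feedback."""
    return sum(slice_utilities) if constraints_ok else -penalty
```

With two slices and six PRBs, `prb_configurations(2, 6)` yields five candidate actions, among them the 2:4, 3:3 and 4:2 splits that appear in the drawings.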
Advantageous effects: the invention provides a double-time-scale wireless access network slicing method for the cell-free distributed massive MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a hierarchical time model, which effectively improves the utilization of limited resources, makes resource allocation more responsive in real time, and can meet the diverse requirements of future 6G.
Drawings
Fig. 1 is a graph of the spectral efficiency achieved by the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 2:4 ratio, where the red curve represents the spectral efficiency of the static resource allocation method;
Fig. 2 is a graph of the spectral efficiency achieved by the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 3:3 ratio, where the red curve represents the spectral efficiency of the static resource allocation method;
Fig. 3 is a graph of the spectral efficiency achieved by the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 4:2 ratio, where the red curve represents the spectral efficiency of the static resource allocation method;
Fig. 4 shows the simulation result of the upper-layer controller configuring the number of PRBs for each slice.
Detailed Description
The present invention is described in detail below with reference to an embodiment:
Consider a cell-free distributed massive MIMO system covering an area of 0.5 × 0.5 m² with 2 APs, each having 50 antennas. The coverage area contains users of two service types: users requiring highly reliable, ultra-low-delay transmission are assigned to the URLLC slice (slice 0), and users requiring high-data-rate services are assigned to the eMBB slice (slice 1).
The channel model consists of three parts: path loss, shadow fading, and small-scale fading, which can be expressed as
Figure BDA0003685515150000081
Wherein
Figure BDA0003685515150000082
The path loss exponent α is set to 3.6 and the reference distance to 1,
Figure BDA0003685515150000083
is the log-normally distributed shadow fading variable, and
Figure BDA0003685515150000084
represents the small-scale fading, whose elements follow a standard Rayleigh distribution
Figure BDA0003685515150000085
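A single-antenna sample of the three-part channel model can be drawn as in the Python sketch below. The product form (path loss times shadowing times squared small-scale magnitude), the 8 dB shadowing standard deviation and the function name are assumptions, since the patent's formula is not recoverable from this text; only the path loss exponent 3.6 and the reference distance 1 come from the embodiment.

```python
import math
import random


def sample_channel_gain(distance, rng, path_loss_exp=3.6,
                        ref_distance=1.0, shadow_std_db=8.0):
    """Draw one power gain: path loss x log-normal shadowing x Rayleigh."""
    path_loss = (distance / ref_distance) ** (-path_loss_exp)
    shadowing = 10.0 ** (rng.gauss(0.0, shadow_std_db) / 10.0)
    # h ~ CN(0, 1): real and imaginary parts each have variance 1/2,
    # so |h| is Rayleigh distributed and E[|h|^2] = 1.
    h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    return path_loss * shadowing * abs(h) ** 2
```

A vector-valued gain for an AP with M antennas could be built by calling the function once per antenna with a shared shadowing draw, which this sketch does not attempt.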
Within the double-time-scale network slice structure, the small time scale t is the transmission time interval (TTI) of Δt = 1 ms, and each large-scale time period k comprises ΔT TTIs, with ΔT = 10 ms. In each TTI, the total bandwidth W is divided into 6 PRBs shared by all APs, i.e. F = {1, 2, ..., 6}, and the bandwidth is divided equally among the PRBs. The method comprises the following steps:
step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system, and obtaining the uplink channel transmission expression and the transmission rate expressions of the two types of users (URLLC and eMBB).
In this embodiment, step S1 specifically includes:
step S11, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as
Figure BDA0003685515150000086
Step S12, for the eMBB slice, the data transmission rate follows the Shannon capacity formula, and the data transmission rate of an eMBB user in the t-th TTI can be modeled as
Figure BDA0003685515150000087
For the URLLC slice, the data rate is approximated by finite-blocklength theory, and the data transmission rate of a URLLC user in the t-th TTI can be modeled as
Figure BDA0003685515150000091
In formula (3),
Figure BDA0003685515150000092
represents the channel dispersion, Q^{-1}(·) is the inverse Gaussian Q-function, ρ_i is the average packet length of slice i, and ε is the effective decoding error probability, set to 0.05; in formula (2) and formula (3),
Figure BDA0003685515150000093
represents the signal-to-noise ratio, which can be modeled as
Figure BDA0003685515150000094
In formula (4), the additive white Gaussian noise power is σ² = -174 dBm/Hz;
Figure BDA0003685515150000095
denotes the power allocated by AP j to user u_i of slice i on the f-th PRB in the t-th TTI, which can be selected from 0, 9, 19 and 29.
Step S2, establishing a slice model, in which each user of each slice has a buffered data queue on each AP that is served on a first-come, first-served basis, and obtaining expressions for two quality of service indicators, namely communication reliability and packet delay.
In this embodiment, step S2 specifically includes:
step S21, each user is assumed to have a data queue on the AP that buffers incoming packets. The total packet length in slice i is denoted Ω_i, with Ω_0 = 1000 bytes and Ω_1 = 5000 bytes, and the data queue is served on a first-come, first-served basis. In the t-th TTI, the length of the queue waiting to be sent in the buffer of user u_i in slice i is Q_{u_i}(t), and the queue of user u_i is updated as
Figure BDA0003685515150000096
In formula (5), A_i denotes the packet arrival rate of users in slice i, set to A_0 = 0.2 packets/s and A_1 = 1 packet/s, and
Figure BDA0003685515150000097
is the transmission rate of user u_i.
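A per-TTI queue update of the kind described in step S21 can be sketched in Python as follows. The recursion used here (serve up to rate times TTI, never below zero, then add new arrivals) is the standard buffer update and is an assumption about the patent's formula (5), which is not recoverable from this text; the function and parameter names are hypothetical.

```python
def update_queue(queue_bits, arrivals_bits, rate_bps, tti_s=1e-3):
    """Advance one user's buffer by one TTI.

    Up to rate_bps * tti_s bits are served (the queue cannot go negative),
    then the bits that arrived during the TTI are appended.
    """
    served = min(queue_bits, rate_bps * tti_s)
    return queue_bits - served + arrivals_bits
```

For example, a backlog of 8000 bits (one 1000-byte URLLC packet) served at 1 Mbit/s with no new arrivals drops to 7000 bits after one 1 ms TTI.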
Step S22, the packet delay of a user is divided into processing delay, transmission delay and queuing delay, and the total delay of slice i in the t-th TTI is
Figure BDA0003685515150000098
1) Transmission delay refers to the time required to transmit a packet over the link between the AP and the slice. Therefore, the transmission delay of slice i in the t-th TTI
Figure BDA0003685515150000099
Can be expressed as
Figure BDA0003685515150000101
In formula (7), R_{i,t} is the total transmission rate of slice i;
2) the processing delay refers to the time required for processing a data packet after the AP receives a data request of a corresponding user. Processing delay of slice i in t-th TTI
Figure BDA0003685515150000102
Can be expressed as
Figure BDA0003685515150000103
In formula (8), R_{j,i} represents the total data processing rate of the j-th AP on slice i, set to R_{j,0} = 1 Mbit/s and R_{j,1} = 0.5 Mbit/s;
3) According to queuing theory, the average sojourn time (waiting time plus service time) of a packet arriving in slice i, i.e. the queuing delay of slice i in the t-th TTI
Figure BDA0003685515150000104
is
Figure BDA0003685515150000105
In formula (9), μ_i is the service rate of users in slice i; θ_i is the average service rate per PRB in slice i, set to θ_0 = 50 bit/s and θ_1 = 30 bit/s; C_i is the PRB configuration of slice i; and U_i is the number of users in slice i, set to 3.
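The phrase "waiting time plus service time" matches the mean sojourn time of a single-server queue. The Python sketch below uses the textbook M/M/1 result 1/(μ - λ); treating the patent's queue as M/M/1 is an assumption, and the sketch deliberately takes the service rate as an input rather than guessing how μ_i is derived from θ_i, C_i and U_i.

```python
def mm1_sojourn_time(arrival_rate, service_rate):
    """Mean time in an M/M/1 system (waiting plus service), in seconds.

    Returns infinity when the queue is unstable (service_rate <= arrival_rate).
    """
    if service_rate <= arrival_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)
```

The delay grows without bound as the arrival rate approaches the service rate, which is why giving a slice too few PRBs shows up first as queuing delay.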
Step S23, the packet loss rate of the i-th slice is defined as the probability that the total delay of packets in slice i exceeds a predefined maximum slice delay threshold. The packet drop rate, i.e. the packet loss rate, of the i-th slice in the t-th TTI can then be expressed as
Figure BDA0003685515150000106
In formula (10), D_{i,t} is the total delay of slice i,
Figure BDA0003685515150000107
represents the maximum packet delay acceptable for slice i, and Pr(·) denotes probability; packet delay and reliability serve as the two key indicators for evaluating QoS performance.
Step S3, establishing a hierarchical optimization model subject to constraints such as guaranteed user queue delay, the minimum average rate of each slice and the user rate outage probability.
In this embodiment, step S3 specifically includes:
step S31, upper control strategy pi C The dynamic change of the service flow and the dynamic change of QoS performance observation are converted into the PRB quantity distribution of each slice, so that the upper layer control strategy pi C Can be expressed as from the entire network S k Global state to appropriate PRB number configuration C in slice k Can be modeled as
Figure BDA0003685515150000111
A in formula (11) i Indicating the packet arrival rate of the user in slice i,
Figure BDA0003685515150000112
is the average packet delay for slice i users,
Figure BDA0003685515150000113
is the average packet loss rate of the users in slice i, and C_{i,k} is the PRB number configuration of slice i.
Step S32, in every TTI of the kth large-scale time period, the lower-layer controller maps the observed user information X_t and the PRB number configuration information C_k to an overall radio resource allocation in the physical layer for TTI t. The lower-layer control policy π_E can be modeled as
Figure BDA0003685515150000114
In formula (12), C_k is the PRB number configuration of each slice, ΔT is the large-scale time length,
Figure BDA0003685515150000115
is the user queue length in slice i,
Figure BDA0003685515150000116
is the channel state information of the user.
Figure BDA0003685515150000117
is a binary user association factor representing AP association and PRB allocation, and
Figure BDA0003685515150000118
denotes the power allocated to user u_i, which may be one of Z different power levels.
Step S33, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to include two parts, upper control and lower control, so that the utility function of the ith slice in the kth large-scale time can be modeled as
Figure BDA0003685515150000119
In formula (13),
Figure BDA00036855151500001110
is the QoS utility function of slice i, determined by the average delay of all active users in slice i
Figure BDA00036855151500001111
and the average packet drop rate
Figure BDA00036855151500001112
in that slice;
Figure BDA00036855151500001113
is the spectral efficiency utility function of slice i, determined by the data rate r_{i,t} of all active users in slice i; ΔT is the large-scale time duration; and α_{i,1}, α_{i,2}, α_{i,3} are positive weighting factors, set to 1, 10^6, and 10^5, respectively.
The goal of the hierarchical network slice architecture is to achieve optimal system performance based on satisfying radio resource constraints. Thus, the optimization problem in hierarchical network slices can be designed as follows:
Figure BDA0003685515150000121
In formula (14), π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_{i,n} is the utility function of slice i at index n, and X is a discount factor such that X^n tends to zero when n is sufficiently large. The optimization problem is subject to the following constraints:
1) The total power allocated to each AP is limited to less than the total power of all APs:
Figure BDA0003685515150000122
In formula (15),
Figure BDA0003685515150000123
is the total power of all APs;
2) minimum constraint on data rate per slice:
Figure BDA0003685515150000124
In formula (16),
Figure BDA0003685515150000125
is the transmission rate of user u_i when associated with the jth AP on the fth PRB, and
Figure BDA0003685515150000126
is the minimum data rate of a slice, where
Figure BDA0003685515150000127
3) The total data processing rate at each slice of an AP is less than the maximum data processing rate that the AP can achieve:
Figure BDA0003685515150000128
In formula (17), R_{j,i} represents the total data processing rate of the jth AP on slice i,
Figure BDA0003685515150000129
represents the maximum data processing rate of the jth AP, wherein
Figure BDA00036855151500001210
4) Packet delay constraint for each slice:
Figure BDA00036855151500001211
In formula (18), D_{i,t} is the total delay of slice i,
Figure BDA00036855151500001212
indicating the maximum packet delay, wherein
Figure BDA00036855151500001213
Figure BDA00036855151500001214
5) Packet loss rate constraint for each slice:
Figure BDA00036855151500001215
In formula (19), δ_{i,t} is the packet loss rate of slice i,
Figure BDA0003685515150000131
represents a minimum packet loss rate, wherein
Figure BDA0003685515150000132
6)
Figure BDA0003685515150000133
Equation (20) ensures that each AP can allocate only one PRB to any one user, which lets each AP serve as many users as possible and limits resource reuse on the same AP, thereby reducing interference;
7)
Figure BDA0003685515150000134
Equation (21) ensures that different APs cannot allocate the same PRB to the same user;
Figure BDA0003685515150000135
denote the association factors of two different APs for user u_i on the same PRB in the tth TTI;
8)
Figure BDA0003685515150000136
Equation (22) ensures that the same AP can allocate different PRBs to different users;
Figure BDA0003685515150000137
denote the allocation factors of two different PRBs at the same AP for user u_i in the tth TTI;
9)
Figure BDA0003685515150000138
Equation (23) ensures that every active user in the system is connected to at least one AP and is allocated resources;
Figure BDA0003685515150000139
denotes the association factor of AP j for user u_i in the tth TTI.
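The association constraints 6) to 9) can be checked mechanically on a binary association array. The sketch below is one possible reading of equations (20) to (23), not their exact form: it assumes a nested list `a[j][u][f]` that equals 1 when AP j serves user u on PRB f, and it reads equation (22) as "a PRB at one AP serves at most one user". All names are hypothetical.

```python
from itertools import product

def check_association(a):
    """Check a binary association array a[j][u][f] (AP j, user u, PRB f).

    The four checks are an assumed reading of constraints 6) to 9).
    """
    J, U, F = len(a), len(a[0]), len(a[0][0])
    return {
        # 6) an AP gives any one user at most one PRB
        "one_prb_per_user_per_ap": all(
            sum(a[j][u]) <= 1 for j, u in product(range(J), range(U))),
        # 7) different APs do not give the same PRB to the same user
        "one_ap_per_user_per_prb": all(
            sum(a[j][u][f] for j in range(J)) <= 1
            for u, f in product(range(U), range(F))),
        # 8) at one AP, a PRB is given to at most one user
        "one_user_per_prb_per_ap": all(
            sum(a[j][u][f] for u in range(U)) <= 1
            for j, f in product(range(J), range(F))),
        # 9) every user is served by at least one AP on some PRB
        "every_user_served": all(
            sum(a[j][u][f] for j in range(J) for f in range(F)) >= 1
            for u in range(U)),
    }

# Two APs, two users, two PRBs: AP 0 serves user 0 on PRB 0, AP 1 serves user 1 on PRB 1.
valid = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
# Same as above, but AP 0 gives user 0 both PRBs, which breaks check 6).
invalid = [[[1, 1], [0, 0]], [[0, 0], [0, 1]]]
print(check_association(valid))
print(check_association(invalid))
```

A check of this kind is what decides, in the reward functions defined later, whether an action earns its utility or the fixed negative feedback.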
Step S4, a double-time-scale network slicing method is provided. First, the upper-layer controller uses the DQN algorithm to observe the user service traffic on the large time scale and allocates different numbers of PRBs to the slices, so that PRB resources can be shared among the slices. Then, based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses the MADDPG algorithm to perform specific PRB allocation and power allocation for each user in a slice on the small time scale, according to the channel state and the user queue information.
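The interplay of the two time scales in step S4 can be sketched as a nested loop: the upper-layer policy is invoked once per large-scale period, and the lower-layer policy once per TTI inside that period. The policies and observation functions below are hypothetical stand-ins, not the trained DQN and MADDPG policies.

```python
def run_two_time_scale(num_periods, ttis_per_period, upper_policy, lower_policy,
                       observe_traffic, observe_users):
    """Run the upper policy once per large-scale period and the lower policy every TTI."""
    log = []
    for k in range(num_periods):
        # Large time scale: choose the per-slice PRB counts for the whole period.
        prb_config = upper_policy(observe_traffic(k))
        for t in range(ttis_per_period):
            # Small time scale: allocate PRBs and power under the fixed configuration.
            allocation = lower_policy(observe_users(k, t), prb_config)
            log.append((k, t, prb_config, allocation))
    return log

def demo_upper_policy(traffic):
    # Placeholder rule: give more PRBs to whichever slice carries more load.
    return (4, 2) if traffic["urllc"] > traffic["embb"] else (2, 4)

def demo_lower_policy(users, prb_config):
    # Placeholder rule: echo the configuration and use the lowest power level.
    return {"prbs": prb_config, "power_level": 0}

log = run_two_time_scale(
    num_periods=2, ttis_per_period=3,
    upper_policy=demo_upper_policy, lower_policy=demo_lower_policy,
    observe_traffic=lambda k: {"urllc": float(k), "embb": 0.5},
    observe_users=lambda k, t: {"queue_length": 0},
)
print(len(log), log[0][2], log[-1][2])
```

Keeping the PRB configuration fixed for ΔT TTIs is what allows the lower-layer policy to be learned separately for each configuration.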
In this embodiment, step S4 specifically includes:
Step S41, for each slice PRB number configuration C_k ∈ C, the goal of lower-layer control policy learning is to find an optimal policy that achieves the maximum expected reward over all states
Figure BDA00036855151500001310
The optimization problem of the lower-layer control policy is therefore formulated as follows to obtain the maximum expected cumulative reward:
Figure BDA00036855151500001311
Step S42, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with each AP acting as an agent and the communication network as the environment. Based on its observations of the physical layer, the lower-layer controller should dynamically perform radio resource allocation actions to achieve the maximum expected cumulative reward of the system.
Thus, for one agent:
1) State s_j: considering that the packet arrival rate of each slice set is always the same and the user queue remains in the same state, the state of the agent at the current time t can be simplified to
Figure BDA0003685515150000141
2) Action a_j: for AP j, an action corresponds to one radio resource allocation, comprising power allocation and PRB allocation. Thus, the action of the agent at the current time t is denoted as
Figure BDA0003685515150000142
3) Reward r_j: the reward of an agent is defined as the sum of the spectral efficiency at the AP after the AP has allocated PRBs and power, provided the constraints are satisfied; otherwise it is defined as negative feedback. Thus, the reward function of each agent may be expressed as
Figure BDA0003685515150000143
In formula (27), R_reg is a fixed value.
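The reward described in item 3) can be sketched as follows. The exact form of formula (27) is not reproduced here; the sketch assumes that the fixed value acts as the negative feedback returned when a constraint is violated, and the function name, the `penalty` parameter, and its default are hypothetical.

```python
def lower_layer_reward(spectral_efficiencies, constraints_satisfied, penalty=-1.0):
    """Per-agent reward: summed spectral efficiency if feasible, else a fixed penalty.

    `penalty` is an assumed stand-in for the fixed negative feedback value.
    """
    if not constraints_satisfied:
        return penalty
    return float(sum(spectral_efficiencies))

# Hypothetical spectral efficiencies (bit/s/Hz) of the users served by one AP.
feasible = lower_layer_reward([1.5, 2.0, 0.5], constraints_satisfied=True)
infeasible = lower_layer_reward([1.5, 2.0, 0.5], constraints_satisfied=False)
print(feasible, infeasible)
```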
Step S43, the lower-layer controller allocates PRBs and power using the MADDPG algorithm, comprising the following steps:
1) initialize the neural networks with random parameters and set training_episode = 1;
2) at the start of each training episode, initialize the environment state, let all APs observe the initial state s, and set time_slot = 1;
3) in each TTI, every AP selects an action a according to its observed state, i.e., performs PRB allocation and power allocation for its users; the environment then returns a reward r to the agent according to whether the action satisfies the constraints, and enters the next state s';
4) store the state transition tuples (s, a, r, s') of all APs in an experience buffer;
5) the lower-layer controller uses
Figure BDA0003685515150000144
to update the critic network and compute the action gradients of all agents, where
Figure BDA0003685515150000145
is the action-value function of agent j and
Figure BDA0003685515150000146
is the loss function of the action-value function;
6) according to
Figure BDA0003685515150000147
each AP receives its action gradient and updates its actor network;
7) for time_slot = 1 to T_L: set time_slot = time_slot + 1, update the user positions, and return to step 3);
8) for training_episode = 1 to K_L: set training_episode = training_episode + 1 and return to step 2), until the algorithm converges.
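Step 4) of the procedure above relies on an experience buffer. A minimal sketch of such a buffer is given below, assuming a fixed capacity with oldest-first eviction and uniform random sampling; the class name, the capacity, and the sampling scheme are assumptions and are not specified in the text.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s') transitions with uniform sampling."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Never ask for more transitions than are currently stored.
        batch_size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)

buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    # Hypothetical scalar states and actions, only to exercise the buffer.
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1)
batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

Sampling past transitions at random, rather than training on them in order, reduces the correlation between consecutive updates of the critic network.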
Step S44, under the converged lower-layer control policy
Figure BDA0003685515150000148
the optimization problem of the upper-layer control policy
Figure BDA0003685515150000149
is formulated as follows to learn the optimal upper-layer control policy:
Figure BDA00036855151500001410
Step S45, the optimization problem of the upper-layer control policy can be solved with the DQN algorithm. The upper-layer controller should dynamically configure the number of PRBs of each slice according to the service traffic, so as to maximize the overall utility of the system.
Thus, for the upper-layer controller:
1) State s_k: since the user packet arrival rate of each slice is a fixed value and the average packet loss rate is determined by the average delay, the state can be simplified to
Figure BDA0003685515150000151
2) Action a_k: the action space of the upper-layer controller corresponds to the PRB number allocation C_k over the slices, where C_{i,k} is the PRB number configuration of slice i. Since there are I slices in total in the system, the action space can be represented by an I-dimensional vector:
Figure BDA0003685515150000152
3) Reward r_k: given the optimal lower-layer control policy
Figure BDA0003685515150000153
the convergence goal of the upper-layer control policy is to maximize the overall utility of the system. The reward function is therefore defined as the system utility when the constraints are satisfied, and as negative feedback otherwise, specifically expressed as
Figure BDA0003685515150000154
In formula (31), R_reg is a fixed value.
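To make the I-dimensional action space concrete, the sketch below enumerates every way of splitting a fixed number of PRBs among the slices. The total of 6 PRBs and the two slices match the simulation configurations reported later (2:4, 3:3, 4:2); the lower bound of one PRB per slice and the function name are assumptions.

```python
from itertools import product

def prb_configurations(total_prbs, num_slices, min_per_slice=1):
    """All per-slice PRB count vectors that sum to total_prbs.

    min_per_slice=1 is an assumption: every slice keeps at least one PRB.
    """
    counts = range(min_per_slice, total_prbs + 1)
    return [c for c in product(counts, repeat=num_slices) if sum(c) == total_prbs]

# Slice 0 is the URLLC slice and slice 1 the eMBB slice.
actions = prb_configurations(total_prbs=6, num_slices=2)
print(actions)
```

With two slices the action space is small enough for a discrete-action method such as DQN to assign one Q-value to each configuration.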
Step S46, the upper-layer controller configures the number of PRBs of each slice using the DQN algorithm, comprising the following steps:
1) initialize the neural network with random parameters and set training_episode = 1;
2) at the start of each training episode, initialize the environment state, let the upper-layer controller observe the initial state s, and set time_slot = 1;
3) according to the observed state, the upper-layer controller selects an action a with the ε-greedy algorithm and obtains the corresponding reward r, and the environment enters the next state s';
4) store the state transition tuple (s, a, r, s') in an experience buffer;
5) update the weights of the Q-function in the DQN by stochastic gradient descent
Figure BDA0003685515150000155
to minimize the loss function
Figure BDA0003685515150000156
6) for time_slot = 1 to T_U: set time_slot = time_slot + 1 and return to step 3);
7) for training_episode = 1 to K_U: set training_episode = training_episode + 1 and return to step 2), until the algorithm converges.
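Steps 3) and 5) can be illustrated with the textbook form of ε-greedy selection and the one-step temporal-difference target used in DQN. This is a generic sketch, not the exact update and loss formulas referenced above; the discount factor name `gamma` and all numeric values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def td_target(reward, next_q_values, gamma):
    """One-step target r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)

def squared_td_error(q_sa, target):
    """Squared difference between the target and the current estimate Q(s, a)."""
    return (target - q_sa) ** 2

rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]            # hypothetical Q-values, one per PRB configuration
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
loss = squared_td_error(q_sa=2.0, target=target)
print(greedy_action, target, loss)
```

In a full DQN the Q-values come from a neural network and the squared error is averaged over a batch sampled from the experience buffer before the gradient step.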
The above presents the whole process of dynamic resource allocation for a cell-free massive MIMO radio access network using the method of the present invention.
Fig. 1 shows the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 2:4, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA);
Fig. 2 shows the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 3:3, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA);
Fig. 3 shows the spectral efficiency of the lower-layer controller when the numbers of PRBs allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) are in the ratio 4:2, where the red curve represents the spectral efficiency of Static Resource Allocation (SRA).
As can be seen from the above figures, when the MADDPG algorithm is used to learn the lower-layer control policy, the optimal performance can be learned under all PRB number configurations. Learning of the lower-layer control policy converges at about 10000 episodes, and its performance is almost twice that of the SRA policy.
Fig. 4 shows the simulation result for the upper-layer controller's configuration of the number of PRBs per slice, i.e., the system utility. As can be seen from the figure, as the learning steps iterate, the DQN algorithm converges to the action with the highest reward and, according to the set weights, selects the PRB resource configuration that maximizes the total utility of the system, allocating the 6 PRBs to the URLLC slice and the eMBB slice. Therefore, by using the DQN algorithm to solve for the number of PRBs configured for each slice, the upper-layer control policy can obtain an optimal configuration.
The invention provides a double-time-scale radio access network slicing method for the cell-free distributed massive MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a hierarchical time model, which effectively improves the utilization of limited resources and enhances the real-time performance of resource allocation. The method can meet the diverse requirements of future 6G, serve a variety of communication scenarios, and has practical and research value.

Claims (5)

1. A double-time-scale intelligent radio access network slicing method, characterized in that the method is based on a cell-free distributed massive MIMO system architecture in which a total of J access points (APs), indexed by j ∈ {1, 2, ..., J}, are connected to a central processing unit and each AP has M antennas; users in the coverage area are divided into different slices according to their service requirements, the slice set in the coverage area is {1, 2, ..., I}, and the users in slice i are U_i = {1, 2, ..., u_i}; in the dual-time-scale network slicing structure, the small-scale time dimension is a transmission time interval (TTI) of 1 ms, and the large-scale time dimension k comprises ΔT TTIs; in each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, indexed by f ∈ {1, 2, ..., F}, and the bandwidth allocated to each PRB is B = W/F; the method specifically comprises the following steps:
step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system to obtain the uplink channel transmission expression and the transmission rate expressions of enhanced mobile broadband (eMBB) users and ultra-reliable low-latency communication (URLLC) users;
step S2, establishing a slice model in which the users of each slice have, at each AP, a buffered data queue served according to a first-come, first-served policy, so that the packet delay of a user can be divided into processing delay, transmission delay, and queuing delay, and obtaining expressions for the two quality of service (QoS) indicators, namely communication reliability and packet delay;
step S3, establishing a hierarchical optimization model that guarantees the user queue delay while satisfying constraints such as the minimum average slice rate requirement and the user rate outage probability;
step S4, providing a double-time-scale access network slicing method in which, first, an upper-layer controller observes the user service traffic on the large time scale using the deep Q-network (DQN) algorithm and allocates different numbers of PRBs to the slices; then, based on the slice configuration obtained by the upper-layer controller, a lower-layer controller performs specific PRB allocation and power allocation for each user in a slice on the small time scale, according to the channel information, using the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
2. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S1 specifically comprises:
step S101, considering a fading channel in a multi-connection scenario, the uplink channel gain between user u_i and the jth AP on the fth PRB in the tth TTI is modeled as
Figure FDA0003685515140000011
In formula (1),
Figure FDA0003685515140000012
represents the large-scale fading between the jth AP and user u_i,
Figure FDA0003685515140000013
represents the distance from user u_i to the jth AP, ζ is the path loss exponent,
Figure FDA0003685515140000014
is the logarithmic fading variable, and
Figure FDA0003685515140000015
represents the small-scale fading, whose elements obey a standard Rayleigh distribution
Figure FDA0003685515140000021
step S102, two slice types are considered in the distributed architecture; one is the eMBB slice, whose data transmission rate follows the Shannon capacity theorem, so that the data transmission rate of eMBB users in the tth TTI can be modeled as
Figure FDA0003685515140000022
the other is the URLLC slice, whose data rate is approximated using finite blocklength theory, so that the data transmission rate of URLLC users in the tth TTI can be modeled as
Figure FDA0003685515140000023
In formula (2) and formula (3),
Figure FDA0003685515140000024
represents the signal-to-noise ratio, Δt is one TTI, and B is the bandwidth; in formula (3),
Figure FDA0003685515140000025
represents the channel dispersion, p_i is the average packet length of slice i, Q^{-1}(·) is the inverse Gaussian Q-function, and ε is the effective decoding error probability.
3. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S2 specifically comprises:
step S201, the packet delay of a user is divided into processing delay, transmission delay, and queuing delay, and the total delay D_{i,t} of slice i in the tth TTI is
Figure FDA0003685515140000026
In formula (4),
Figure FDA0003685515140000027
respectively represent the processing delay, the transmission delay, and the queuing delay of slice i;
step S202, the packet loss rate of the ith slice is defined as the probability that the total delay of packets in slice i exceeds a predefined maximum slice delay threshold; the packet drop rate, i.e., the packet loss rate δ_{i,t}, of the ith slice in the tth TTI can then be expressed as
Figure FDA0003685515140000028
In formula (5), D_{i,t} is the total delay of slice i,
Figure FDA0003685515140000029
represents the maximum packet delay acceptable for slice i, and Pr denotes probability; packet delay and reliability serve as the two key indicators for evaluating QoS performance.
4. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S3 specifically comprises:
step S301, the upper-layer control policy π_C converts the observed dynamics of the service traffic and of the QoS performance into the PRB quantity allocated to each slice, so that, in the kth large-scale time period, π_C can be expressed as a mapping from the global state S_k of the entire network to a suitable PRB number configuration C_k for the slices, and can be modeled as
Figure FDA0003685515140000031
In formula (6), A_i denotes the packet arrival rate of the users in slice i,
Figure FDA0003685515140000032
is the average packet delay of all active users in slice i,
Figure FDA0003685515140000033
is the average packet loss rate of all active users in slice i, and C_{i,k} is the PRB number configuration of slice i;
step S302, in the tth TTI of the kth large-scale time period, the lower-layer controller maps the observed user information X_t and the PRB number configuration information C_k to an overall radio resource allocation in the physical layer for TTI t, and the lower-layer control policy π_E can be modeled as
Figure FDA0003685515140000034
In formula (7), C_k is the PRB number configuration of each slice, ΔT is the large-scale time length,
Figure FDA0003685515140000035
is the user queue length in slice i,
Figure FDA0003685515140000036
is the channel state information of the user,
Figure FDA0003685515140000037
is a binary user association factor representing AP association and PRB allocation, and
Figure FDA0003685515140000038
denotes the power allocated to user u_i, which may be one of Z different power levels;
step S303, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to include two parts, upper-layer control and lower-layer control, so that the utility function U_{i,k} of the ith slice in the kth large-scale time period can be modeled as
Figure FDA0003685515140000039
In formula (8),
Figure FDA00036855151400000310
is the QoS utility function of slice i, determined by the average delay of all active users in slice i
Figure FDA00036855151400000311
and the average packet drop rate
Figure FDA00036855151400000312
in that slice;
Figure FDA00036855151400000313
is the spectral efficiency utility function of slice i, determined by the data rate r_{i,t} of all active users in slice i; ΔT is the large-scale time duration; and α_{i,1}, α_{i,2}, α_{i,3} are positive weighting factors;
the goal of the hierarchical network slice architecture is to achieve optimal system performance based on satisfying the radio resource constraints, and therefore, the optimization problem in the hierarchical network slice can be designed as follows:
Figure FDA0003685515140000041
In formula (9), max is the maximization function, π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_{i,n} is the utility function of slice i at index n, and X is a discount factor such that X^n tends to zero when n is sufficiently large; the optimization problem is subject to the following constraints:
1) the total power allocated to each AP is limited to less than the total power of all APs:
Figure FDA0003685515140000042
In formula (10),
Figure FDA0003685515140000043
is the total power of AP j;
2) minimum constraint on data rate per slice:
Figure FDA0003685515140000044
In formula (11),
Figure FDA0003685515140000045
is the transmission rate of user u_i when associated with the jth AP on the fth PRB, and
Figure FDA0003685515140000046
is the minimum data rate of a slice;
3) the total data processing rate at each slice of an AP is less than the maximum data processing rate that the AP can achieve:
Figure FDA0003685515140000047
In formula (12), R_{j,i} represents the total data processing rate of the jth AP on slice i,
Figure FDA0003685515140000048
represents the maximum data processing rate of the jth AP;
4) packet delay constraint for each slice:
Figure FDA0003685515140000049
In formula (13), D_{i,t} is the total delay of slice i,
Figure FDA00036855151400000410
indicating the maximum packet delay;
5) packet loss rate constraint for each slice:
Figure FDA00036855151400000411
In formula (14), δ_{i,t} is the packet loss rate of slice i,
Figure FDA00036855151400000412
represents the minimum packet loss rate;
6)
Figure FDA00036855151400000413
Equation (15) ensures that each AP can allocate only one PRB to any one user, which lets each AP serve as many users as possible and limits resource reuse on the same AP, thereby reducing interference;
7)
Figure FDA0003685515140000051
Equation (16) ensures that different APs cannot allocate the same PRB to the same user;
Figure FDA0003685515140000052
denote the association factors of two different APs for user u_i on the same PRB in the tth TTI;
8)
Figure FDA0003685515140000053
Equation (17) ensures that the same AP can allocate different PRBs to different users;
Figure FDA0003685515140000054
denote the allocation factors of two different PRBs at the same AP for user u_i in the tth TTI;
9)
Figure FDA0003685515140000055
Equation (18) ensures that every active user in the system is connected to at least one AP and is allocated resources;
Figure FDA0003685515140000056
denotes the association factor of AP j for user u_i in the tth TTI.
5. The method for slicing a dual time scale intelligent radio access network as claimed in claim 1, wherein said step S4 specifically comprises:
step S401, for each slice PRB number configuration C_k ∈ C, the goal of lower-layer control policy learning is to find an optimal policy that achieves the maximum expected reward over all states
Figure FDA0003685515140000057
The optimization problem of the lower-layer control policy is therefore formulated as follows to obtain the maximum expected cumulative reward:
Figure FDA0003685515140000058
In formula (19), π_E is the lower-layer control policy and C_k is the PRB number configuration of the slices;
step S402, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with each AP acting as an agent and the communication network as the environment; based on its observations of the physical layer, the lower-layer controller should dynamically perform radio resource allocation actions to obtain the maximum expected cumulative reward of the system;
thus, for one agent:
1) State s_j: the channel state information H_j(t) and the queue information Q_j(t) of the users connected to agent j;
s_j = {Q_j(t), H_j(t)} (20)
2) Action a_j: for AP j, an action corresponds to one radio resource allocation, comprising power allocation and PRB allocation; the action of the agent at the current time t is therefore denoted as
Figure FDA0003685515140000059
3) Reward r_j: the reward of an agent is defined as the sum of the spectral efficiency at the AP after the AP has allocated PRBs and power, provided the constraints are satisfied; otherwise it is defined as negative feedback; the reward function of each agent can therefore be expressed as
Figure FDA0003685515140000061
In formula (22), R_reg is a fixed value;
step S403, the optimization problem of the upper-layer control policy can be solved with the DQN algorithm; the upper-layer controller should dynamically configure the number of PRBs of each slice according to the service traffic so as to maximize the overall utility of the system;
thus, for the upper-layer controller:
1) State s_k: the global state information comprises the average arrival rate A_i of the users, the average delay
Figure FDA0003685515140000062
and the average packet loss rate
Figure FDA0003685515140000063
Figure FDA0003685515140000064
2) Action a_k: the action space of the upper-layer controller corresponds to the PRB number allocation C_k over the slices, where C_{i,k} is the PRB number configuration of slice i; since there are I slices in total in the system, the action space can be represented by an I-dimensional vector:
Figure FDA0003685515140000065
3) Reward r_k: given the optimal lower-layer control policy
Figure FDA0003685515140000066
the convergence goal of the upper-layer control policy is to maximize the overall utility of the system; the reward function is therefore defined as the system utility when the constraints are satisfied, and as negative feedback otherwise, specifically expressed as
Figure FDA0003685515140000067
In formula (25), R_reg is a fixed value and U_{i,k} is the utility function of the ith slice in the kth large-scale time period.
CN202210649530.3A 2022-06-09 2022-06-09 Double-time-scale intelligent wireless access network slicing method Pending CN114867030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210649530.3A CN114867030A (en) 2022-06-09 2022-06-09 Double-time-scale intelligent wireless access network slicing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210649530.3A CN114867030A (en) 2022-06-09 2022-06-09 Double-time-scale intelligent wireless access network slicing method

Publications (1)

Publication Number Publication Date
CN114867030A true CN114867030A (en) 2022-08-05

Family

ID=82623873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210649530.3A Pending CN114867030A (en) 2022-06-09 2022-06-09 Double-time-scale intelligent wireless access network slicing method

Country Status (1)

Country Link
CN (1) CN114867030A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115514685A (en) * 2022-09-14 2022-12-23 上海兰鹤航空科技有限公司 Delay analysis method of ARINC664 terminal based on transmission table mode
CN115514685B (en) * 2022-09-14 2024-02-09 上海兰鹤航空科技有限公司 Delay analysis method of ARINC664 terminal based on transmission table mode
CN116016987A (en) * 2022-12-08 2023-04-25 上海大学 Video code rate self-adaption method based on reinforcement learning and oriented to edge cellular network
CN117098239A (en) * 2023-10-17 2023-11-21 国网信息通信产业集团有限公司 Power scene-oriented wireless resource double-layer isolation distribution method and system
CN117098239B (en) * 2023-10-17 2024-03-19 国网信息通信产业集团有限公司 Power scene-oriented wireless resource double-layer isolation distribution method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination