CN115412134A - Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method - Google Patents

Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method

Info

Publication number
CN115412134A
Authority
CN
China
Prior art keywords
user
training
network
state
taking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211051651.4A
Other languages
Chinese (zh)
Inventor
李春国
孙希茜
徐澍
王东明
杨绿溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202211051651.4A priority Critical patent/CN115412134A/en
Publication of CN115412134A publication Critical patent/CN115412134A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413MIMO systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/309Measuring or estimating channel quality parameters
    • H04B17/336Signal-to-interference ratio [SIR] or carrier-to-interference ratio [CIR]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/391Modelling the propagation channel
    • H04B17/3911Fading models or fading generators
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413MIMO systems
    • H04B7/0426Power distribution
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Power Engineering (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a user-centric cell-free massive MIMO power allocation method based on offline reinforcement learning, which comprises the following steps: constructing a user-centric MIMO system and establishing service relationships between each wireless access point and a subset of the users; formulating an optimization problem with the downlink power control coefficients as the optimization variables and constructing a Markov decision process model for it; building a Dueling DDQN network, training it online, and storing the state-transition data generated by the interaction between the environment and the network during online training; and taking out 20% of the online data set and introducing a regularization term into the loss function to train the network offline. The user-centric power allocation strategy lets each wireless access point serve only a subset of the users; the offline algorithm provided by the invention reduces the training cost, and the power control coefficients can be adjusted offline and in real time in a real scenario by deploying only 20% of the data volume of the online training data set for training.

Description

Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method
Technical Field
The invention belongs to the field of cell-free massive MIMO power allocation, and particularly relates to a user-centric cell-free massive MIMO power allocation method based on offline reinforcement learning.
Background
Wireless communication services permeate every industry: from ordinary phone calls and text messages to emerging fields such as autonomous driving and smart healthcare, services large and small depend on the deployment of wireless networks. To guarantee quality of service, wireless communication services must cover a large geographical area, and conventional systems deploy base stations with a cellular network topology in which each base station serves a group of user equipments. This cellular topology has been in use for decades, and user interference in this setting is reduced by shrinking cell sizes and applying advanced signal processing schemes. In recent years, a new network topology called cell-free massive MIMO has emerged in the field of wireless networking. In a cell-free massive MIMO system, the division into cells is abandoned and the number of access points far exceeds the number of users. The idea of cell-free massive MIMO is to deploy a large number of distributed single-antenna access points (APs) connected to a central processing unit (CPU). The CPU operates the system as a MIMO network without cell boundaries and serves the users jointly through cooperative transmission and reception. Compared with traditional cellular massive MIMO networks, the cell-free scheme offers strong macro-diversity, strong multi-user interference suppression, and the ability to provide users with uniformly good service quality, and it has attracted wide attention and deployment in recent years.
However, cell-free MIMO systems also have problems. Because all APs in the system are fully connected to the UEs, the large power consumption of the fronthaul links has a significant impact on the energy efficiency of a cell-free MIMO system; in multi-antenna scenarios in particular, the fronthaul power consumption grows further as the number of antennas increases, reducing the energy efficiency of the system. In addition, to further improve user transmission rates and thus the user experience, cell-free MIMO systems design power control coefficients through a power allocation strategy. Traditional power control methods need to build an accurate model of the problem and then iterate to the optimal solution; the time complexity of such algorithms is often very high and consumes a large amount of computing resources. With the development of modern computing resources, many algorithms based on deep neural networks have emerged. Existing power allocation strategies based on deep reinforcement learning all use an online training strategy: the algorithm must interact with the environment in real time while training the network in order to obtain more data. However, in practical application scenarios the agent can interact with the environment only within certain time windows, and real-time interaction is unrealistic, so these algorithms usually cannot be put into practical use.
Disclosure of Invention
The invention aims to provide a user-centric cell-free massive MIMO power allocation method based on offline reinforcement learning. For the downlink data transmission phase in a user-centric cell-free massive MIMO scenario, a power allocation method based on a Dueling DDQN network is proposed. After modeling the user-centric cell-free massive MIMO environment, establishing the MDP model, training the Dueling DDQN network online and then training the Dueling DDQN network offline, the power control coefficients of the user-centric cell-free massive MIMO system are finally obtained, which solves the technical problems mentioned in the background art.
In order to solve the technical problems, the specific technical scheme of the invention is as follows:
a user-centered cellular-free large-scale MIMO power distribution method based on offline reinforcement learning comprises the following steps:
s1, modeling a non-cellular large-scale MIMO system with a user as a center, determining a service relation between a wireless Access Point (AP) and User Equipment (UE) according to channel estimation of an uplink, taking a power control coefficient in a downlink data transmission stage as an optimization object, and taking the sum of maximized downlink rates as a target to put forward an optimization problem;
s2, according to the optimization problem in the step S1, modeling the optimization process of the power control coefficient in the downlink data transmission stage into a Markov decision process, and determining the state transition, the action space, the strategy and the reward of the Markov decision process;
s3, providing a power distribution algorithm model based on deep reinforcement learning, wherein the model comprises a large-scale MIMO system environment module and an intelligent agent module; the massive MIMO system environment module is used for simulating a channel model and a downlink data transmission model of a cellular-free massive MIMO system with a user as a center, and the agent module is used for sensing the current system state, estimating a Q value of a power distribution strategy and selecting an optimal power distribution coefficient; the core of the intelligent agent module is a deep neural network, and the training mode of the deep neural network comprises early-stage online training and offline training in an application period;
s4, training a deep neural network on line; in the online training stage, before training the deep neural network based on parameters in a data set, a state transition parameter needs to be collected first to update the data set; after a large-scale MIMO system is initialized, firstly, inputting the state of the system into the deep neural network, then selecting a power control coefficient for the current AP based on the Q value output by the deep neural network, implementing a power control strategy in an environment, thereby changing the environment state and obtaining reward, and storing the state transition information of the time; randomly extracting a batch of data from the data set, respectively calculating an accumulated reward value and an expected value by using a deep neural network, and updating parameters of the deep neural network by taking the mean square error of the minimized reward value and the expected value as a target;
s5, training a DuelingDDQN network off line based on the state transition data set collected in the step S4; and (4) taking the first 20% of the state transition data set in the step S4 as an offline training data set, taking a batch of data from the offline data set each time and inputting the data into the deep neural network, respectively calculating the cumulative reward value and the expected value by using the deep neural network, and updating the parameters of the deep neural network by taking the mean square error of the minimal reward value and the expected value as a target, so that the intelligent module can select the optimal power control coefficient.
Further, in step S1, constructing the user-centric cell-free massive MIMO system specifically includes:
step S101, firstly setting a distribution area of a scene, setting N UEs to be served by each AP, wherein M APs and K UEs are randomly distributed, and then establishing large-scale fading and small-scale fading models of channels between the APs and the UEs;
step S102, modeling is carried out on an uplink training stage, and the method specifically comprises the following steps:
firstly, distributing an orthogonal pilot frequency sequence for UE, then enabling the UE to forward the pilot frequency sequence to each AP, and after receiving data, estimating a channel coefficient between the AP and the UE based on a minimum mean square error criterion;
step S103, associating the UE needing service for each AP, which specifically includes:
for each AP, arranging the channel estimation coefficients between the AP and all UEs in descending order, selecting the N UEs with the highest channel coefficients for the AP to establish a service relationship, and forwarding the established service relationship information to the CPU;
step S104, modeling the downlink data transmission phase, specifically comprising:
and the AP regards the channel estimate obtained in step S102 as the true channel coefficient, performs conjugate beamforming on the data to be transmitted, and then sends the precoded data, at a specific power, to the UEs that have established a connection relationship with the current AP.
Further, in step S1, the optimization problem in step S1 is constructed based on the user signal-to-noise ratio, the transmission rate and the power limitation condition in the downlink data transmission phase.
Further, the user signal-to-noise ratio in the downlink data transmission phase is expressed as:
SINR_k = ρ_d ( Σ_{m∈P(k)} √(η_mk) γ_mk )² / [ ρ_d Σ_{k'≠k} ( Σ_{m∈P(k')} √(η_mk') γ_mk' β_mk / β_mk' )² |φ_k'^H φ_k|² + ρ_d Σ_{k'=1}^{K} Σ_{m∈P(k')} η_mk' γ_mk' β_mk + 1 ]
where SINR_k, k = 1, ..., K denotes the signal-to-noise ratio of the kth user, β_mk denotes the large-scale fading of the channel between the mth AP and the kth UE, ρ_d denotes the normalized signal-to-noise ratio of the downlink symbols, ρ_p denotes the normalized signal-to-noise ratio of the pilot symbols, φ_k denotes the pilot sequence of the kth UE, η_mk, m = 1, ..., M, k = 1, ..., K denotes the power control coefficient between the mth AP and the kth UE, and P(k), k = 1, ..., K denotes the set of APs serving the kth user. In the formula, γ_mk = E{|ĝ_mk|²} = √(τ_cf ρ_p) β_mk c_mk, where ĝ_mk denotes the minimum mean square error estimate of the channel between the mth AP and the kth UE, τ_cf denotes the number of uplink training samples in a coherence interval, and the expression for c_mk is:
c_mk = √(τ_cf ρ_p) β_mk / ( τ_cf ρ_p Σ_{k'=1}^{K} β_mk' |φ_k'^H φ_k|² + 1 )
further, the expression of the transmission rate in the downlink data transmission phase is:
Figure BDA0003823649220000044
in the formula, the content of the active carbon is shown in the specification,
Figure BDA0003823649220000045
indicates the transmission rate, SINR, of the k-th UE k K = 1.. K denotes the downlink signal-to-noise ratio of K UEs.
Further, the power allocation optimization problem is expressed as:
max_{η_mk} Σ_{k=1}^{K} R_k^d
s.t. Σ_{k∈T(m)} η_mk γ_mk ≤ 1, m = 1, ..., M
η_mk ≥ 0, k = 1, ..., K, m = 1, ..., M;
where T(m), m = 1, ..., M denotes the index set of UEs that have established a connection relationship with the mth AP; the set contains N indices, meaning that each AP serves N UEs.
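To make the objective concrete, the following Python sketch evaluates the downlink SINR and sum rate for a given coefficient matrix. It assumes the conjugate-beamforming SINR form reconstructed above with the pilot-contamination term dropped (orthogonal pilots), and all function and array names are illustrative rather than taken from the patent.

```python
import numpy as np

def sum_rate(eta, gamma, beta, rho_d=1.0):
    """Downlink sum rate for user-centric cell-free massive MIMO (conjugate beamforming).

    eta   : (M, K) power control coefficients; eta[m, k] = 0 whenever AP m does not serve UE k
    gamma : (M, K) mean-square channel-estimate quality gamma_mk
    beta  : (M, K) large-scale fading coefficients beta_mk
    rho_d : normalized downlink signal-to-noise ratio
    """
    coherent = (np.sqrt(eta) * gamma).sum(axis=0)     # sum over P(k): coherent beamforming gain per UE
    ap_load = (eta * gamma).sum(axis=1)               # per-AP load; the constraint requires ap_load <= 1
    interference = beta.T @ ap_load                   # beamforming-uncertainty term seen by each UE
    sinr = rho_d * coherent**2 / (rho_d * interference + 1.0)
    return np.log2(1.0 + sinr).sum(), sinr

# Toy usage with random coefficients (M = 8 APs, K = 6 UEs)
rng = np.random.default_rng(0)
beta = rng.uniform(0.1, 1.0, (8, 6))
gamma = 0.5 * beta                                    # placeholder estimate quality
eta = rng.uniform(0.0, 0.2, (8, 6))
total_rate, sinr = sum_rate(eta, gamma, beta)
```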
Further, the step S2 of modeling the optimization process of the power control coefficients in the downlink data transmission phase as a Markov decision process specifically includes:
step S201, modeling the optimization of the power allocation coefficients in the system as a sequential decision process whose elements include states, actions, a transition probability and rewards; in this process, each step selects power allocation coefficients for one AP in the user-centric massive MIMO system;
step S202, setting the system state, which describes the signal-to-noise ratio of the users under the current power allocation strategy and specifies the AP whose power control coefficients are optimized at the current moment; when the current system state indicates that the mth AP updates its power control coefficients, the parameters η_mk, k ∈ T(m) will be updated;
step S203, setting the action space, which is a finite set whose elements describe all selectable values of the power control coefficients;
step S204, setting the state transition probability, which describes the probability that the environment moves to a new state after a power allocation strategy is applied to the user-centric massive MIMO system, and takes values in [0, 1];
and S205, setting the reward, which describes the gain in the sum of the transmission rates of the K users after a power allocation strategy is applied to the user-centric massive MIMO system.
Further, the system state in step S202 is expressed as s_t = [SINR, c] ∈ S, where SINR is the user signal-to-noise ratio vector, a K-dimensional vector whose specific expression is:
SINR = [SINR_1, ..., SINR_k, ..., SINR_K],
and c is a one-hot code indicating the AP index, with c ∈ {e_1, ..., e_m, ..., e_M}, where e_m is an M-dimensional vector whose mth entry is 1 and all other entries are 0, indicating that the power control coefficients currently need to be updated for the mth AP; the agent therefore updates, at the current moment, the parameters η_mk, k ∈ T(m) of the user-centric massive MIMO environment, i.e. the power control coefficients between the current AP and the UEs that have established a service relationship with the mth AP are updated, while the power control coefficients between the mth AP and the UEs that have not established a service relationship with it are set to 0, i.e. η_mk = 0 for k ∉ T(m);
in step S203, the action space is a_t = (η_m1, η_m2, ..., η_mK), where η_mk = 0, k ∉ T(m) describes that the power coefficient of a UE that has not established a service relationship with the AP can only take the value 0, and η_mk ∈ {0.1, 0.4, 0.7, 1.0}, m = 1, ..., M, k ∈ T(m) describes the candidate values of the power control coefficient for the UEs that have established a service relationship with the AP.
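As a concrete illustration of this state and action encoding, the short Python sketch below builds the state vector s_t and enumerates the 4^N discrete coefficient combinations available when the mth AP is updated; the function names and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import itertools
import numpy as np

def build_state(sinr, m, num_aps):
    """State s_t = [SINR, c]: the K-dimensional SINR vector followed by the one-hot AP index e_m."""
    c = np.zeros(num_aps)
    c[m] = 1.0                                  # e_m: the mth AP updates its coefficients this step
    return np.concatenate([sinr, c])

def action_set(serving_ues, num_ues, levels=(0.1, 0.4, 0.7, 1.0)):
    """Actions a_t = (eta_m1, ..., eta_mK): served UEs take a level from the set, all others stay 0."""
    actions = []
    for combo in itertools.product(levels, repeat=len(serving_ues)):
        eta_row = np.zeros(num_ues)
        eta_row[list(serving_ues)] = combo
        actions.append(eta_row)
    return actions

# Example: M = 8, K = 6, the 3rd AP (index 2) serves T(m) = {1, 4}, so there are 4**2 = 16 actions
state = build_state(np.random.rand(6), m=2, num_aps=8)
actions = action_set(serving_ues=[1, 4], num_ues=6)
```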
Further, the online training process of the Dueling DDQN network in step S4 specifically includes:
step S401, initializing a large-scale MIMO system environment module taking a user as a center, namely determining distribution and a channel model of an AP and UE; initializing an agent module, namely initializing parameters and a buffer area of a DuelingDDQN network;
s402, collecting state transition data; firstly, inputting a system state into the intelligent agent module, estimating a Q value of the current state by the intelligent agent module, then selecting a power distribution coefficient based on the Q value, transmitting the selected power control coefficient to the large-scale MIMO system environment module for implementation, thereby changing the environment state and obtaining a user signal-to-noise ratio gain as a reward, and finally saving the parameter of the state transition to the cache region;
step S403, training a network; randomly extracting a batch of state transition parameters from the cache region, and taking the system state before transition as the input of the intelligent agent module to enable the intelligent agent to sense the state and estimate the accumulated reward value; then the state after the state transition is used as the input of the intelligent agent module, so that the intelligent agent senses the state and obtains an expected accumulated reward value by combining the reward value information in the state transition;
step S404, updating the network parameters of the DuelingDDQN network by using a back propagation algorithm with the aim of minimizing the mean square error between the accumulated income and the expected value; and continuously and repeatedly carrying out the agent-environment interaction operation from the step S402 to the step S403, thereby continuously updating the network parameters and the data set.
Further, the offline training process of step S5 specifically includes:
further, the step S5 of the offline training process specifically includes:
step S501, initializing a large-scale MIMO system environment module with a user as a center, namely determining distribution and a channel model of an AP and UE; initializing an agent module, namely randomly initializing parameters of the DuelingDDQN network, and taking out the first 20% of data of the state transition parameter data set collected in the step S4 as a data set for offline training;
step S502, randomly extracting a batch of state transition parameters from an off-line training data set, and taking the system state before transition as the input of an agent module to enable the agent to sense the state and estimate an accumulated reward value; then the state after the state transition is used as the input of the intelligent agent module, so that the intelligent agent senses the state and obtains an expected accumulated reward value by combining the reward value information in the state transition; updating the network parameters of the DuelingDDQN network by utilizing a back propagation algorithm with the aim of minimizing the mean square error between the calculated accumulated income and the expected value;
and S503, continuously repeating the step S502, updating parameters of the Dueling DDQN network by using the off-line data set until the signal-to-noise ratio gain of the user converges to a certain value, and stopping network training.
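To make the offline stage concrete, the sketch below performs one Dueling DDQN update on a batch drawn from the fixed 20% data set, with a regularization term added to the loss as the abstract describes; the specific conservative (CQL-style) form of that term, the PyTorch framing and all names are assumptions made for illustration only.

```python
import torch
import torch.nn.functional as F

def offline_update(q_net, target_net, optimizer, batch, discount=0.99, alpha=1.0):
    """One offline update: TD error on logged transitions plus a conservative regularizer."""
    s, a, r, s_next = batch                                 # tensors from the offline data set
    q_all = q_net(s)                                        # (B, |A|) Q-values of all actions
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():                                   # double-DQN target: online net selects, target net evaluates
        a_star = q_net(s_next).argmax(dim=1, keepdim=True)
        target = r + discount * target_net(s_next).gather(1, a_star).squeeze(1)

    td_loss = F.mse_loss(q_sa, target)                      # mean square error between estimate and target
    reg = (torch.logsumexp(q_all, dim=1) - q_sa).mean()     # assumed regularizer: penalize Q of actions absent from the data
    loss = td_loss + alpha * reg

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```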
The off-line reinforcement learning-based user-centric cell-free massive MIMO power allocation method of the invention has the following advantages:
1. compared with a general cell-free massive MIMO system model, the invention uses step S1: the user-centric massive MIMO model reduces the power consumption of the system and improves its energy efficiency;
2. compared with traditional optimization-based power allocation algorithms, the invention uses steps S3 to S5: the reinforcement-learning-based algorithm reduces the time complexity and time cost of the computation;
3. compared with online reinforcement learning algorithms, the invention uses steps S3 to S5: based on the offline reinforcement learning algorithm, the size of the data set is reduced to 20% of that used for online training, and power allocation can be performed in real time for a user-centric cell-free massive MIMO system in practical application scenarios where the environment and the agent cannot interact in real time.
Drawings
Fig. 1 is a schematic flowchart of a user-centric large-scale MIMO power distribution method based on offline reinforcement learning according to embodiment 1 of the present invention;
FIG. 2 is a block diagram of a power distribution model of an offline reinforcement learning algorithm provided in embodiment 1 of the present invention;
fig. 3 is a flowchart of user-centric large-scale MIMO model establishment provided in embodiment 1 of the present invention;
fig. 4 is a schematic diagram of a user-centric large-scale MIMO system provided in embodiment 1 of the present invention;
fig. 5 is a schematic flowchart of online training of the Dueling DDQN network provided in embodiment 1 of the present invention;
fig. 6 is a schematic flowchart of offline training of the Dueling DDQN network provided in embodiment 1 of the present invention;
fig. 7 is an offline training curve of the Dueling DDQN network provided in embodiment 1 of the present invention.
Detailed Description
In order to better understand the purpose, structure and function of the present invention, the following describes a user-centric large-scale MIMO power allocation method based on offline reinforcement learning in further detail with reference to the accompanying drawings.
Example 1
Referring to fig. 1 to fig. 7, the present embodiment provides a user-centric large-scale MIMO power allocation method based on offline reinforcement learning, specifically as shown in fig. 1, the method includes the following steps:
step S1, constructing a large-scale MIMO system without cells with users as the center, which specifically comprises the following steps:
firstly, setting a distribution area of a scene, setting the number of randomly distributed wireless Access Points (APs), user Equipment (UE) and the number of UEs to be served by each AP, and then establishing a large-scale fading model and a small-scale fading model of a channel between the APs and the UEs.
Then, orthogonal pilot sequences are allocated to the UEs, the UEs forward their pilot sequences to each AP, and after receiving the data each AP estimates the channel coefficient between itself and each UE based on the minimum mean square error criterion. For each AP, the channel estimation coefficients between the AP and all UEs are sorted in descending order, the N UEs with the largest channel coefficients are selected to establish a service relationship with the AP, and the established service relationship information is forwarded to the CPU.
And the AP terminal carries out conjugate beam forming on the data to be transmitted based on channel estimation and then sends the precoded data to the UE establishing a connection relation with the current AP at a specific power.
The power control coefficients between the APs and the UEs in the downlink data transmission phase of the user-centric cell-free massive MIMO system are then taken as the optimization objects, the objective is to maximize the sum of the UE rates in the downlink phase, and the power allocation optimization problem is constructed based on the user signal-to-noise ratio, the transmission rate and the power constraint of the downlink data transmission phase.
Modeling the optimization process of the power distribution coefficient into a Markov decision process, and determining the state transition, the action space, the strategy and the reward of the Markov decision process.
And S2, modeling the optimization process of the power allocation coefficients as a Markov decision process. The MDP model can be described by a quadruple (S, A, P, R), namely the state space S, the action space A, the state transition probability P and the reward R. The method comprises the following specific steps:
1. The state space S describes the state of the user-centric cell-free massive MIMO system. s_t = [SINR, c] ∈ S, where SINR is the user signal-to-noise ratio vector, a K-dimensional vector, whose specific expression is:
SINR = [SINR_1, ..., SINR_k, ..., SINR_K],
and c is a one-hot code indicating the AP index, with c ∈ {e_1, ..., e_m, ..., e_M}, where e_m is an M-dimensional vector whose mth entry is 1 and all other entries are 0, indicating that the power control coefficients currently need to be updated for the mth AP. The agent therefore updates, at the current moment, the parameters η_mk, k ∈ T(m) of the user-centric massive MIMO environment, i.e. the power control coefficients between the current AP and the UEs that have established a service relationship with the mth AP are updated, while the power control coefficients between the mth AP and the UEs that have not established a service relationship with it are set to 0, i.e. η_mk = 0 for k ∉ T(m).
2. The action space A describes the power control coefficients that the agent can apply to the user-centric cell-free massive MIMO system. In this embodiment, a_t = (η_m1, η_m2, ..., η_mK) ∈ A, where η_mk = 0, k ∉ T(m) describes that the power coefficient of a UE that has not established a service relationship with the AP can only take the value 0, and η_mk ∈ {0.1, 0.4, 0.7, 1.0}, m = 1, ..., M, k ∈ T(m) describes the candidate values of the power control coefficient for the UEs that have established a service relationship with the AP.
3. The state transition probability P takes values in [0, 1]. In this embodiment, it is assumed that the state s_t = [SINR, c_t] transitions to the state s_{t+1} = [SINR', c_{t+1}] after the power control coefficients (η_m1, η_m2, ..., η_mK) are updated in the user-centric cell-free massive MIMO environment.
4. The reward R is expressed in this embodiment as r_t = Σ_{k=1}^{K} R_k^d(t+1) - Σ_{k=1}^{K} R_k^d(t), namely the gain in the sum of all user rates in the user-centric massive MIMO system before and after one state transition.
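A minimal sketch of this reward follows (names are illustrative; rate_fn stands for any routine, such as the sum-rate helper sketched earlier, that maps a coefficient matrix to the downlink sum rate):

```python
def step_reward(rate_fn, eta_old, eta_new):
    """Reward r_t: gain in the downlink sum rate caused by one power-coefficient update."""
    return rate_fn(eta_new) - rate_fn(eta_old)
```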
And S3, constructing a power distribution algorithm model based on deep reinforcement learning, wherein the model comprises a large-scale MIMO system environment module and an intelligent agent module which take a user as a center. The massive MIMO system environment module is used for simulating a channel model and a downlink data transmission model of a cellular-free massive MIMO system with a user as a center, and the intelligent agent module is used for sensing the current system state, estimating the Q value of a power distribution strategy and selecting the optimal power distribution coefficient; the core of the intelligent agent module is a deep neural network, and the training mode of the deep neural network comprises early-stage online training and off-line training in the application period.
And S4, training the Dueling DDQN network online. In the online training phase, state transition parameters are collected to update the data set before the network is trained on the parameters in the data set. After the massive MIMO system is initialized, the system state is first input into the deep neural network, a power control coefficient is then selected for the current AP based on the Q values output by the network, and the power control strategy is applied to the environment, thereby changing the environment state and obtaining a reward, and the state transition information of this step is stored. A batch of data is then randomly drawn from the data set, the cumulative reward value and the target value are computed with the network, and the network parameters are updated with the objective of minimizing the mean square error between them.
And S5, training the Dueling DDQN network offline based on the state transition data set collected in step S4. The first 20% of the state transition data set of step S4 is taken as the offline training data set; a batch of data is drawn from the offline data set each time and input into the network, the cumulative reward value and the target value are computed with the network, and the network parameters are updated with the objective of minimizing the mean square error between them, so that the agent module can finally select the optimal power control coefficients.
Specifically, in this embodiment, a specific structure of the power allocation algorithm model is shown in fig. 2, and more specifically, the power allocation model includes:
user-centric cellular-free massive MIMO environment module: the state transition of a large-scale fading model, a small-scale fading model, an uplink training model, a downlink data transmission model and an MDP model of a channel is simulated, wherein the state transition mode of the MDP model comprises different system states, rewards under power control coefficients and the like.
An online training module: it contains the replay buffer, the Dueling DDQN network and the action selection policy. In the online training phase, state transition parameters are collected to update the data set before the network is trained on the parameters in the data set. After the massive MIMO system is initialized, the system state is first input into the deep neural network, a power control coefficient is then selected for the current AP based on the Q values output by the network, and the power control strategy is applied to the environment, thereby changing the environment state and obtaining a reward, and the state transition information of this step is stored. A batch of data is then randomly drawn from the data set, the cumulative reward value and the target value are computed with the network, and the network parameters are updated with the objective of minimizing the mean square error between them.
An offline training module: it contains the offline training data set and the Dueling DDQN network. The first 20% of the online training buffer data set is taken as the offline training data set; a batch of data is drawn from the offline data set each time and input into the network, the cumulative reward value and the target value are computed with the network, and the network parameters are updated with the objective of minimizing the mean square error between them, so that the agent module can finally select the optimal power control coefficients. The update of the offline training module's network relies entirely on the training data sampled from the buffer and requires no additional interaction with the environment.
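For reference, a minimal PyTorch sketch of the Dueling network at the core of the agent module is given below; the hidden sizes and class name are illustrative assumptions, since the patent does not specify the architecture details.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling head: shared trunk, then separate state-value and advantage streams (batched input)."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # V(s)
        self.advantage = nn.Linear(hidden, num_actions)   # A(s, a)

    def forward(self, s):                                 # s: (B, state_dim)
        h = self.trunk(s)
        v, adv = self.value(h), self.advantage(h)
        return v + adv - adv.mean(dim=1, keepdim=True)    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)

# Example: state = [SINR (K = 6), one-hot AP index (M = 8)], 16 discrete actions when N = 2
q_net = DuelingDQN(state_dim=6 + 8, num_actions=16)
```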
Specifically, in this embodiment a concrete cell-free massive MIMO system is provided, whose model establishment process is shown in fig. 3; more specifically, the cell-free massive MIMO system is established through the following steps:
Step S101, consider a square area of 1 km², in which M APs and K UEs are randomly distributed and each AP serves N specified UEs. Fig. 4 shows the case M = 8, K = 6, N = 2, where each AP and UE has only a single antenna and the APs are connected to the CPU through an ideal backhaul network. The channel coefficient between the mth AP and the kth UE is denoted g_mk and is defined by the following equation:
g_mk = √(β_mk) h_mk
where h_mk, m = 1, ..., M, k = 1, ..., K represents small-scale fading and follows independent and identically distributed complex Gaussian distributions, and β_mk, m = 1, ..., M, k = 1, ..., K denotes large-scale fading.
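A small NumPy sketch of this channel model follows; the specific path-loss law used for β_mk is an illustrative assumption, since the embodiment only states that a large-scale fading model is established.

```python
import numpy as np

def generate_channels(M=8, K=6, side=1000.0, rng=np.random.default_rng(0)):
    """Random AP/UE drop in a 1 km x 1 km area and channels g_mk = sqrt(beta_mk) * h_mk."""
    ap_pos = rng.uniform(0.0, side, size=(M, 2))
    ue_pos = rng.uniform(0.0, side, size=(K, 2))
    dist = np.linalg.norm(ap_pos[:, None, :] - ue_pos[None, :, :], axis=2)                # (M, K) distances in metres
    beta = 10 ** (-(128.1 + 37.6 * np.log10(np.maximum(dist, 10.0) / 1000.0)) / 10)       # assumed path-loss model
    h = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)     # i.i.d. CN(0, 1) small-scale fading
    return np.sqrt(beta) * h, beta
```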
And step S102, modeling the uplink training phase. Firstly, orthogonal pilot sequences are allocated to the UEs, the UEs then forward their pilot sequences to every AP, and after receiving the data each AP estimates the channel coefficient to each UE based on the minimum mean square error criterion. The channel estimate can be expressed as:
ĝ_mk = c_mk ŷ_p,mk
where ĝ_mk represents the minimum mean square error estimate of the channel between the mth AP and the kth UE, and ŷ_p,mk = φ_k^H y_p,m is the projection of the received signal y_p,m of the mth AP onto the pilot φ_k of the kth UE. The expression for c_mk is:
c_mk = √(τ_cf ρ_p) β_mk / ( τ_cf ρ_p Σ_{k'=1}^{K} β_mk' |φ_k'^H φ_k|² + 1 )
Step S103, associating the UEs to be served with each AP. For each AP, the channel estimation coefficients between the AP and all UEs are sorted in descending order, the N UEs with the largest channel coefficients are selected to establish a service relationship with the AP, and the established service relationship information is forwarded to the CPU. For the mth AP, the users s_m1, s_m2, ..., s_mK are ordered so that
|ĝ_{m,s_m1}| ≥ |ĝ_{m,s_m2}| ≥ ... ≥ |ĝ_{m,s_mK}|;
the mth AP then serves the users s_m1, s_m2, ..., s_mN, i.e. T(m) = {s_m1, s_m2, ..., s_mN}, and there is no data transmission between the mth AP and the other users s_{m,N+1}, ..., s_mK, i.e. η_{m,s_mj} = 0 for j = N + 1, ..., K.
Step S104, in the downlink data transmission phase, the AP regards the channel estimate obtained in step S102 as the true channel coefficient, performs conjugate beamforming on the data to be transmitted, and then sends the precoded data, at a specific power, to the UEs that have established a connection relationship with the current AP. The data received by the kth UE can be expressed as:
r_d,k = √(ρ_d) Σ_{k'=1}^{K} Σ_{m∈P(k')} √(η_mk') g_mk ĝ_mk'* q_k' + w_d,k
where r_d,k denotes the data received by the kth UE in the downlink data transmission phase, ρ_d denotes the normalized signal-to-noise ratio of the downlink symbols, P(k) denotes the set of APs serving the kth user, q_k, k = 1, ..., K denotes the symbol to be transmitted to the kth UE and satisfies E{|q_k|²} = 1, and w_d,k, k = 1, ..., K is additive complex Gaussian noise with mean 0 and variance 1, i.e. w_d,k ~ CN(0, 1). The power control coefficients η_mk satisfy the constraint:
Σ_{k∈T(m)} η_mk γ_mk ≤ 1, m = 1, ..., M
where, as mentioned above, γ_mk = E{|ĝ_mk|²} = √(τ_cf ρ_p) β_mk c_mk.
Step S105, the power allocation problem in the downlink data transmission phase of the user-centric cell-free massive MIMO system is written as:
max_{η_mk} Σ_{k=1}^{K} R_k^d
s.t. Σ_{k∈T(m)} η_mk γ_mk ≤ 1, m = 1, ..., M
η_mk ≥ 0, k = 1, ..., K, m = 1, ..., M
where R_k^d = log2(1 + SINR_k) denotes the transmission rate of the kth UE and SINR_k, k = 1, ..., K denotes the downlink signal-to-noise ratio of the kth UE, which can be expressed as:
SINR_k = ρ_d ( Σ_{m∈P(k)} √(η_mk) γ_mk )² / [ ρ_d Σ_{k'≠k} ( Σ_{m∈P(k')} √(η_mk') γ_mk' β_mk / β_mk' )² |φ_k'^H φ_k|² + ρ_d Σ_{k'=1}^{K} Σ_{m∈P(k')} η_mk' γ_mk' β_mk + 1 ]
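A small sketch of the UE selection rule of step S103 above (each AP serves the N UEs with the strongest channel estimates); the function name and the use of NumPy are illustrative assumptions:

```python
import numpy as np

def associate(g_hat, N):
    """Return, for every AP m, the index set T(m): the N UEs with the largest estimate magnitudes."""
    order = np.argsort(-np.abs(g_hat), axis=1)       # per-AP descending sort of |g_hat[m, k]|
    serve_mask = np.zeros(g_hat.shape, dtype=bool)   # serve_mask[m, k] = True iff k is in T(m)
    for m in range(g_hat.shape[0]):
        serve_mask[m, order[m, :N]] = True
    return [set(order[m, :N]) for m in range(g_hat.shape[0])], serve_mask
```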
specifically, in the present embodiment, fig. 5 shows a specific flow of online training of the duelingdqn network. The method comprises the following steps:
step S401, initializing a large-scale MIMO system environment module taking a user as a center, namely determining distribution and a channel model of an AP and UE; and initializing the intelligent agent module, namely initializing parameters of the DuelingdQN network and a buffer area.
And S402, collecting state transition data. Firstly, inputting the system state into the intelligent agent module, estimating the Q value of the current state by the intelligent agent module, then selecting a power distribution coefficient based on the Q value, transmitting the selected power control coefficient to the large-scale MIMO system environment module for implementation, thereby changing the environment state and obtaining the signal-to-noise ratio gain of a user as reward, and finally saving the parameter of the state transition into the cache region.
And S403, training a network. Randomly extracting a batch of state transition parameters from the cache region, and taking the system state before transition as the input of the intelligent agent module to enable the intelligent agent to sense the state and estimate the accumulated reward value; and then the state after the state transition is used as the input of the intelligent agent module, so that the intelligent agent senses the state and obtains the expected accumulated reward value by combining the reward value information in the state transition.
And S404, updating the network parameters of the DuelingDDQN network by using a back propagation algorithm with the aim of minimizing the mean square error between the accumulated benefit and the expected value. And continuously and repeatedly carrying out the agent-environment interaction operations of S402-S403, thereby continuously updating the network parameters and the data set.
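A condensed PyTorch-style sketch of one S402-S404 iteration (collect a transition, then a double-DQN update) is given below; the epsilon-greedy exploration, the buffer handling, and env.state / env.step as stand-ins for the massive MIMO environment module are all assumptions made for illustration.

```python
import random
import numpy as np
import torch
import torch.nn.functional as F

def online_iteration(env, q_net, target_net, optimizer, buffer,
                     eps=0.1, discount=0.99, batch_size=64):
    # S402: interact once with the environment and store the transition in the buffer
    s = env.state()
    if random.random() < eps:                                   # assumed epsilon-greedy action selection
        a = random.randrange(env.num_actions)
    else:
        a = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0)).argmax().item()
    s_next, r = env.step(a)                                     # apply the coefficients; reward = sum-rate gain
    buffer.append((s, a, r, s_next))

    # S403-S404: sample a batch and minimize the MSE between Q(s, a) and the double-DQN target
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s_b = torch.as_tensor(np.stack([b[0] for b in batch]), dtype=torch.float32)
    a_b = torch.as_tensor([b[1] for b in batch], dtype=torch.int64)
    r_b = torch.as_tensor([b[2] for b in batch], dtype=torch.float32)
    sn_b = torch.as_tensor(np.stack([b[3] for b in batch]), dtype=torch.float32)

    q_sa = q_net(s_b).gather(1, a_b.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_star = q_net(sn_b).argmax(dim=1, keepdim=True)        # online net selects the action
        target = r_b + discount * target_net(sn_b).gather(1, a_star).squeeze(1)  # target net evaluates it
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```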
Specifically, in this embodiment, fig. 6 shows the specific flow of offline training of the Dueling DDQN network, which comprises the following steps:
step S501, initializing a large-scale MIMO system environment module with a user as a center, namely determining distribution and a channel model of an AP and UE; and initializing an agent module, namely randomly initializing parameters of the DuelingDDQN network, and taking out the first 20% of the data set of the state transition parameters collected in the step S4 as a data set for off-line training.
Step S502, randomly extracting a batch of state transition parameters from an off-line training data set, and taking the system state before transition as the input of an intelligent agent module to enable the intelligent agent to sense the state and estimate an accumulated reward value; and then the state after the state transition is used as the input of the intelligent agent module, so that the intelligent agent senses the state and obtains the expected accumulated reward value by combining the reward value information in the state transition. Updating network parameters of the Dueling DDQN network by using a back propagation algorithm with the aim of minimizing the mean square error between the calculated accumulated benefit and the expected value.
And S503, continuously repeating the step S502, and updating parameters of the DuelingDDQN network by using the offline data set until the number of training steps reaches 10000 rounds.
Specifically, in this embodiment, the cumulative reward curve of the Dueling DDQN network trained with the offline reinforcement learning algorithm is shown in fig. 7, which gives the offline training curve of the network when M = 10, K = 6, N = 4. The abscissa of fig. 7 represents the number of training rounds and the ordinate represents the normalized reward value. The reward begins to level off after about 200 training rounds and converges to approximately 0.71 by round 400. This shows that the offline-reinforcement-learning-based user-centric massive MIMO power allocation algorithm provided in this embodiment obtains good convergence even when trained on only 20% of the online training buffer, and can design suitable power allocation coefficients, which helps improve the energy efficiency of a user-centric cell-free massive MIMO system.
In conclusion, the invention realizes a power allocation method for a user-centric cell-free massive MIMO system based on offline reinforcement learning. By determining the system state, action space, transition probability and reward of the optimization problem, the power allocation optimization process is modeled as an MDP; an offline learning algorithm consisting of a user-centric cell-free massive MIMO environment module, an online learning module and an offline learning module is then constructed, and the parameters of the deep neural network are continuously optimized with the back-propagation algorithm to obtain the power control coefficients that maximize the sum of the user rates in the system. The invention adopts a user-centric cell-free massive MIMO system model, which improves the energy efficiency of the system while guaranteeing quality of service; the invention further provides an algorithm based on offline reinforcement learning, in which an offline data set is used to train the agent to obtain the power allocation coefficients. With the offline training algorithm, the method needs only one round of online training before it can be deployed in a real scenario and dynamically adjust the power allocation coefficients.
Details not described in the present invention are well known to those skilled in the art.
It is to be understood that the present invention has been described with reference to certain embodiments and that various changes in form and details may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A user-centered cellular-free massive MIMO power distribution method based on offline reinforcement learning is characterized by comprising the following steps:
s1, modeling a non-cellular large-scale MIMO system with a user as a center, determining a service relation between a wireless Access Point (AP) and User Equipment (UE) according to channel estimation of an uplink, taking a power control coefficient in a downlink data transmission stage as an optimization object, and taking the sum of maximized downlink rates as a target to put forward an optimization problem;
s2, according to the optimization problem in the step S1, modeling the optimization process of the power control coefficient in the downlink data transmission stage into a Markov decision process, and determining the state transition, the action space, the strategy and the reward of the Markov decision process;
s3, providing a power distribution algorithm model based on deep reinforcement learning, wherein the model comprises a large-scale MIMO system environment module and an intelligent agent module; the massive MIMO system environment module is used for simulating a channel model and a downlink data transmission model of a cellular-free massive MIMO system with a user as a center, and the agent module is used for sensing the current system state, estimating a Q value of a power distribution strategy and selecting an optimal power distribution coefficient; the core of the intelligent agent module is a deep neural network, and the training mode of the deep neural network comprises early-stage online training and offline training in an application period;
s4, training a deep neural network on line; in the online training stage, before training the deep neural network based on parameters in a data set, a state transition parameter needs to be collected first to update the data set; after a large-scale MIMO system is initialized, firstly, inputting the state of the system into the deep neural network, then selecting a power control coefficient for the current AP based on the Q value output by the deep neural network, implementing a power control strategy in an environment, thereby changing the environment state and obtaining reward, and storing the state transition information of the time; then randomly extracting a batch of data from the data set, respectively calculating an accumulated reward value and an expected value by using a deep neural network, and updating parameters of the deep neural network by taking the mean square error of the minimized reward value and the expected value as a target;
s5, training a Dueling DDQN network in an off-line manner based on the state transition data set collected in the S4; and (4) taking the first 20% of the state transition data set in the step S4 as an offline training data set, taking a batch of data from the offline data set each time and inputting the data into the deep neural network, respectively calculating the cumulative reward value and the expected value by using the deep neural network, and updating the parameters of the deep neural network by taking the mean square error of the minimal reward value and the expected value as a target, so that the intelligent module can select the optimal power control coefficient.
2. The off-line reinforcement learning-based user-centric cellular-free massive MIMO power distribution method according to claim 1, wherein in the step S1, the constructing a user-centric massive MIMO system specifically comprises:
step S101, firstly setting a distribution area of a scene, setting N UEs to be served by each AP, wherein M APs and K UEs are randomly distributed, and then establishing large-scale fading and small-scale fading models of channels between the APs and the UEs;
step S102, modeling the uplink training phase, specifically comprising:
firstly, distributing an orthogonal pilot frequency sequence for UE, then enabling the UE to forward the pilot frequency sequence to each AP, and after receiving data, estimating a channel coefficient between the AP and the UE based on a minimum mean square error criterion;
step S103, associating the UE needing service for each AP, which specifically includes:
for each AP, arranging channel estimation coefficients between the AP and all the UEs in a descending order, selecting N UEs with the highest channel coefficients for each AP to establish a service relationship, and forwarding the established service relationship information to a CPU;
step S104, modeling the downlink data transmission phase, specifically comprising:
and the AP regards the channel estimate obtained in step S102 as the true channel coefficient, performs conjugate beamforming on the data to be transmitted, and then sends the precoded data, at a specific power, to the UEs that have established a connection relationship with the current AP.
3. The off-line reinforcement learning-based user-centric large-scale MIMO power allocation method according to claim 2, wherein in step S1, the optimization problem in step S1 is constructed based on user snr, transmission rate and power limitation condition in downlink data transmission phase.
4. The off-line reinforcement learning-based user-centric large-scale MIMO power allocation method according to claim 3, wherein the user signal-to-noise ratio in the downlink data transmission phase is expressed as:
SINR_k = ρ_d ( Σ_{m∈P(k)} √(η_mk) γ_mk )² / [ ρ_d Σ_{k'≠k} ( Σ_{m∈P(k')} √(η_mk') γ_mk' β_mk / β_mk' )² |φ_k'^H φ_k|² + ρ_d Σ_{k'=1}^{K} Σ_{m∈P(k')} η_mk' γ_mk' β_mk + 1 ]
where SINR_k, k = 1, ..., K denotes the signal-to-noise ratio of the kth user, β_mk denotes the large-scale fading of the channel between the mth AP and the kth UE, ρ_d denotes the normalized signal-to-noise ratio of the downlink symbols, ρ_p denotes the normalized signal-to-noise ratio of the pilot symbols, φ_k denotes the pilot sequence of the kth UE, η_mk, m = 1, ..., M, k = 1, ..., K denotes the power control coefficient between the mth AP and the kth UE, and P(k), k = 1, ..., K denotes the set of APs serving the kth user; in the formula, γ_mk = E{|ĝ_mk|²} = √(τ_cf ρ_p) β_mk c_mk, where ĝ_mk denotes the minimum mean square error estimate of the channel between the mth AP and the kth UE, τ_cf denotes the number of uplink training samples in a coherence interval, and the expression for c_mk is:
c_mk = √(τ_cf ρ_p) β_mk / ( τ_cf ρ_p Σ_{k'=1}^{K} β_mk' |φ_k'^H φ_k|² + 1 ).
5. The off-line reinforcement learning-based user-centric cell-free massive MIMO power allocation method according to claim 4, wherein the transmission rate in the downlink data transmission phase is expressed as:
R_k^d = log2(1 + SINR_k)
where R_k^d denotes the transmission rate of the kth UE and SINR_k, k = 1, ..., K denotes the downlink signal-to-noise ratio of the kth UE.
6. The off-line reinforcement learning-based user-centric cell-free massive MIMO power distribution method according to claim 5, wherein the power allocation optimization problem is expressed as:
max_{η_mk} Σ_{k=1}^{K} R_k^d
s.t. Σ_{k∈T(m)} η_mk γ_mk ≤ 1, m = 1, ..., M
η_mk ≥ 0, k = 1, ..., K, m = 1, ..., M;
where T(m), m = 1, ..., M denotes the index set of UEs that have established a connection relationship with the mth AP; the set contains N indices, meaning that each AP serves N UEs.
7. The off-line reinforcement learning-based user-centric large-scale MIMO power allocation method according to claim 6, wherein the step S2 of modeling the optimization process of the power control coefficients in the downlink data transmission phase as a Markov decision process specifically comprises:
step S201, modeling the optimization of the power allocation coefficients in the system as a sequential decision process whose elements include states, actions, a transition probability and rewards; in this process, each step selects power allocation coefficients for one AP in the user-centric massive MIMO system;
step S202, setting the system state, which describes the signal-to-noise ratio of the users under the current power allocation strategy and specifies the AP whose power control coefficients are optimized at the current moment; when the current system state indicates that the mth AP updates its power control coefficients, the parameters η_mk, k ∈ T(m) will be updated;
step S203, setting the action space, which is a finite set whose elements describe all selectable values of the power control coefficients;
step S204, setting the state transition probability, which describes the probability that the environment moves to a new state after a power allocation strategy is applied to the user-centric massive MIMO system, and takes values in [0, 1];
and S205, setting the reward, which describes the gain in the sum of the transmission rates of the K users after a power allocation strategy is applied to the user-centric massive MIMO system.
8. The off-line reinforcement learning-based user-centric cell-free massive MIMO power distribution method according to claim 7, wherein the system state in step S202 is expressed as s_t = [SINR, c] ∈ S, where SINR is the user signal-to-noise ratio vector, a K-dimensional vector whose specific expression is:
SINR = [SINR_1, ..., SINR_k, ..., SINR_K],
and c is a one-hot code indicating the AP index, with c ∈ {e_1, ..., e_m, ..., e_M}, where e_m is an M-dimensional vector whose mth entry is 1 and all other entries are 0, indicating that the power control coefficients currently need to be updated for the mth AP, so that the agent updates, at the current moment, the parameters η_mk, k ∈ T(m) of the user-centric massive MIMO environment, i.e. the power control coefficients between the current AP and the UEs that have established a service relationship with the mth AP are updated, while the power control coefficients between the mth AP and the UEs that have not established a service relationship with it are set to 0, i.e. η_mk = 0 for k ∉ T(m);
in step S203, the action space is a_t = (η_m1, η_m2, ..., η_mK), where η_mk = 0, k ∉ T(m) describes that the power coefficient of a UE that has not established a service relationship with the AP can only take the value 0, and η_mk ∈ {0.1, 0.4, 0.7, 1.0}, m = 1, ..., M, k ∈ T(m) describes the candidate values of the power control coefficient for the UEs that have established a service relationship with the AP.
9. The off-line reinforcement learning-based user-centric large-scale MIMO power allocation method according to claim 8, wherein the online training process of the Dueling DDQN network in step S4 specifically comprises:
step S401, initializing a large-scale MIMO system environment module taking a user as a center, namely determining distribution and a channel model of an AP and UE; initializing an agent module, namely initializing parameters and a buffer area of a Dueling DDQN network;
s402, collecting state transition data; firstly, inputting a system state into the intelligent agent module, estimating a Q value of the current state by the intelligent agent module, then selecting a power distribution coefficient based on the Q value, transmitting the selected power control coefficient to the large-scale MIMO system environment module for implementation, thereby changing the environment state and obtaining a user signal-to-noise ratio gain as a reward, and finally storing a parameter of the state transition into the cache region;
step S403, training the network: a batch of state transition parameters is randomly sampled from the replay buffer, and the pre-transition system state is fed to the agent module so that the agent perceives the state and estimates the accumulated reward value; the post-transition state is then fed to the agent module so that the agent perceives that state and, combined with the reward value recorded in the transition, obtains the expected accumulated reward value;
step S404, updating the parameters of the Dueling DDQN network by a back-propagation algorithm with the objective of minimizing the mean square error between the estimated accumulated reward and its expected value; the agent-environment interaction of steps S402 to S403 is repeated continuously, so that the network parameters and the data set are updated continuously.
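For illustration only, a minimal PyTorch sketch of a Dueling network and one double-DQN update consistent with steps S403-S404; the layer sizes, the discount factor gamma, the use of a separate target network, and the absence of a terminal-state mask are assumptions not specified in the claim:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk, then separate state-value V(s) and
    advantage A(s, a) streams combined as Q = V + A - mean(A)."""
    def __init__(self, state_dim, n_actions, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.adv = nn.Linear(hidden, n_actions)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.adv(h)
        return v + a - a.mean(dim=1, keepdim=True)

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One update of steps S403-S404: estimate Q for the pre-transition state,
    build the double-DQN target from the post-transition state and the reward,
    and minimize the mean-squared error by back-propagation."""
    s, a, r, s_next = batch                                   # tensors sampled from the buffer
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a) of the taken actions
    with torch.no_grad():
        best_a = q_net(s_next).argmax(dim=1, keepdim=True)    # online net selects the action
        q_next = target_net(s_next).gather(1, best_a).squeeze(1)  # target net evaluates it
        target = r + gamma * q_next
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```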
10. The off-line reinforcement learning-based user-centric cell-free massive MIMO power distribution method according to claim 9, wherein the offline training process of step S5 specifically comprises:
step S501, initializing the user-centric cell-free massive MIMO system environment module, namely determining the distribution of the APs and UEs and the channel model; initializing the agent module, namely randomly initializing the parameters of the Dueling DDQN network; and taking the first 20% of the state transition parameter data set collected in step S4 as the data set for offline training;
step S502, randomly sampling a batch of state transition parameters from the offline training data set, feeding the pre-transition system state to the agent module so that the agent perceives the state and estimates the accumulated reward value, then feeding the post-transition state to the agent module so that the agent perceives that state and, combined with the reward value recorded in the transition, obtains the expected accumulated reward value; and updating the parameters of the Dueling DDQN network by a back-propagation algorithm with the objective of minimizing the mean square error between the estimated accumulated reward and its expected value;
and step S503, repeating step S502 continuously, updating the parameters of the Dueling DDQN network with the offline data set until the users' SINR gain converges to a certain value, at which point network training is stopped.
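For illustration only, a minimal Python sketch of the offline phase of claim 10, reusing ddqn_update from the sketch above on a fixed data set; the batch size, target-network synchronization period, the hypothetical to_tensors helper, and the loss-plateau stopping rule (the claim itself stops when the users' SINR gain converges) are all assumptions:

```python
import random

def offline_train(q_net, target_net, optimizer, transitions, frac=0.2,
                  batch_size=64, target_sync=100, max_steps=20000, tol=1e-3):
    """Offline phase (claim 10): q_net is assumed freshly (randomly) initialized
    per step S501, and only the first `frac` of the transitions logged during
    online interaction is reused; no new environment interaction takes place."""
    dataset = transitions[: int(frac * len(transitions))]   # first 20% of the online log
    losses = []
    for step in range(max_steps):
        batch = to_tensors(random.sample(dataset, batch_size))  # hypothetical tensor-conversion helper
        losses.append(ddqn_update(q_net, target_net, optimizer, batch))
        if step % target_sync == 0:
            target_net.load_state_dict(q_net.state_dict())   # periodic target-network sync
        # crude stand-in for "until the SINR gain converges": stop once the loss plateaus
        if len(losses) > 200 and abs(losses[-1] - losses[-200]) < tol:
            break
```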
CN202211051651.4A 2022-08-31 2022-08-31 Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method Pending CN115412134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211051651.4A CN115412134A (en) 2022-08-31 2022-08-31 Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method

Publications (1)

Publication Number Publication Date
CN115412134A true CN115412134A (en) 2022-11-29

Family

ID=84162736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211051651.4A Pending CN115412134A (en) 2022-08-31 2022-08-31 Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method

Country Status (1)

Country Link
CN (1) CN115412134A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110798842A (en) * 2019-01-31 2020-02-14 湖北工业大学 Heterogeneous cellular network flow unloading method based on multi-user deep reinforcement learning
US20210356923A1 (en) * 2020-05-15 2021-11-18 Tsinghua University Power grid reactive voltage control method based on two-stage deep reinforcement learning
CN114268348A (en) * 2021-12-21 2022-04-01 东南大学 Honeycomb-free large-scale MIMO power distribution method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李孜恒; 孟超: "基于深度强化学习的无线网络资源分配算法" [Wireless network resource allocation algorithm based on deep reinforcement learning], 通信技术 (Communications Technology), no. 08, 10 August 2020 (2020-08-10) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116032332A (en) * 2022-12-30 2023-04-28 哈尔滨工程大学 ICGAN network-based large-scale MIMO system detection model construction method
CN116032332B (en) * 2022-12-30 2024-04-12 哈尔滨工程大学 Large-scale MIMO system detection model construction method suitable for changeable channel state information
WO2024140512A1 (en) * 2022-12-30 2024-07-04 维沃移动通信有限公司 Data set determination method, information transmission method, apparatus, and communication device
CN116761150A (en) * 2023-08-18 2023-09-15 华东交通大学 High-speed rail wireless communication method based on AP and STAR-RIS unit selection
CN116761150B (en) * 2023-08-18 2023-10-24 华东交通大学 High-speed rail wireless communication method based on AP and STAR-RIS unit selection

Similar Documents

Publication Publication Date Title
Cao et al. Deep reinforcement learning for multi-user access control in non-terrestrial networks
CN115412134A (en) Off-line reinforcement learning-based user-centered non-cellular large-scale MIMO power distribution method
Huang et al. Deep learning-based sum data rate and energy efficiency optimization for MIMO-NOMA systems
CN109617584B (en) MIMO system beam forming matrix design method based on deep learning
CN108924935A (en) A kind of power distribution method in NOMA based on nitrification enhancement power domain
Chu et al. Power control in energy harvesting multiple access system with reinforcement learning
CN103763782B (en) Dispatching method for MU-MIMO down link based on fairness related to weighting users
CN111431646B (en) Dynamic resource allocation method in millimeter wave system
CN114337976A (en) Transmission method combining AP selection and pilot frequency allocation
CN114268348A (en) Honeycomb-free large-scale MIMO power distribution method based on deep reinforcement learning
Li et al. Deep reinforcement learning for energy-efficient beamforming design in cell-free networks
CN104009824B (en) Pilot aided data fusion method based on differential evolution in a kind of base station collaboration up-line system
CN114302487A (en) Energy efficiency optimization method, device and equipment based on adaptive particle swarm power distribution
Sun et al. Hierarchical reinforcement learning for AP duplex mode optimization in network-assisted full-duplex cell-free networks
Cui et al. Hierarchical learning approach for age-of-information minimization in wireless sensor networks
CN114745032B (en) Honeycomb-free large-scale MIMO intelligent distributed beam selection method
Ying et al. Heterogeneous massive MIMO with small cells
CN113595609B (en) Collaborative signal transmission method of cellular mobile communication system based on reinforcement learning
CN114710187A (en) Power distribution method for multi-cell large-scale MIMO intelligent communication under dynamic user number change scene
CN114844537A (en) Deep learning auxiliary robust large-scale MIMO transceiving combined method
CN110086591B (en) Pilot pollution suppression method in large-scale antenna system
Liu et al. A reinforcement learning approach for energy efficient beamforming in noma systems
Li et al. Distributed RIS-enhanced cell-free NOMA networks
CN113038583B (en) Inter-cell downlink interference control method, device and system
CN114884545B (en) Real-time power distribution method for multi-cell large-scale MIMO system based on intelligent optimization algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination