CN113472472B - Multi-cell collaborative beam forming method based on distributed reinforcement learning - Google Patents

Multi-cell collaborative beam forming method based on distributed reinforcement learning

Info

Publication number
CN113472472B
CN113472472B
Authority
CN
China
Prior art keywords
base station
channel
reinforcement learning
dqn
training
Prior art date
Legal status
Active
Application number
CN202110768826.2A
Other languages
Chinese (zh)
Other versions
CN113472472A (en)
Inventor
高贞贞
廖学文
吴丹青
张金
罗伟
Current Assignee
Hunan Guotian Electronic Technology Co ltd
Original Assignee
Hunan Guotian Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Guotian Electronic Technology Co ltd filed Critical Hunan Guotian Electronic Technology Co ltd
Priority to CN202110768826.2A priority Critical patent/CN113472472B/en
Publication of CN113472472A publication Critical patent/CN113472472A/en
Application granted granted Critical
Publication of CN113472472B publication Critical patent/CN113472472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04J MULTIPLEX COMMUNICATION
    • H04J 11/00 Orthogonal multiplex systems, e.g. using WALSH codes
    • H04J 11/0023 Interference mitigation or co-ordination
    • H04J 11/005 Interference mitigation or co-ordination of intercell interference
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B 7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B 7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B 7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B 7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B 7/0617 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B 7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B 7/08 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
    • H04B 7/0837 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
    • H04B 7/0842 Weighted combining
    • H04B 7/086 Weighted combining using weights depending on external parameters, e.g. direction of arrival [DOA], predetermined weights or beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/24 Cell structures
    • H04W 16/28 Cell structures using beam steering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a multi-cell collaborative beamforming method based on distributed reinforcement learning, which comprises the following steps: for each base station j, establish a training DQN with weights θ_j, a target DQN with weights θ'_j, and an empty experience pool M_j; initialize the training DQN with random weights; repeat the following steps every M time slots: the base stations exchange the channel state information of all users; each base station generates multiple groups of global channel samples for the future M time slots; each base station takes actions randomly and stores the corresponding experience in its experience pool M_j; each base station performs network training. Under extremely low overhead, the invention outperforms the compared greedy and random schemes, and its performance approaches that of the optimal scheme requiring global information.

Description

Multi-cell collaborative beam forming method based on distributed reinforcement learning
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a multi-cell collaborative beam forming method based on distributed reinforcement learning.
Background
Conventional mobile communication systems typically adopt a cellular architecture, which improves throughput and reduces power consumption. However, because neighboring cells share portions of the frequency spectrum, severe inter-cell interference can arise and degrade system capacity. Multi-cell collaborative beamforming is regarded as one of the key technologies for interference management, because it can mitigate inter-cell interference and maximize system capacity by jointly controlling the transmit power and beamforming of neighboring base stations.
In general, the capacity of a cellular communication system is characterized by the sum of the achievable rates of all users, i.e., the sum rate. Maximizing the sum rate is an NP-hard, non-convex problem, so an optimal solution is difficult to obtain. Many suboptimal methods based on optimization techniques have been developed to address this problem, such as fractional programming algorithms, weighted minimum mean square error algorithms, and branch-and-bound algorithms. These algorithms can approach optimal performance, but they all require global channel state information and multiple iterations to compute a solution, so their overhead and computational complexity in practice are intolerable. Distributed reinforcement learning has proven to be an emerging and effective technique for various problems in the communication and networking fields, such as the Internet of Things, heterogeneous networks, and unmanned aerial vehicle networks. In these networks, agents (e.g., base stations) make their own decisions based on local information to optimize network performance. Research on collaborative beamforming based on reinforcement learning is still at an early stage. For a multi-cell multiple-input single-output system, researchers at the University of Electronic Science and Technology of China used distributed deep reinforcement learning to let each base station train its own deep Q network and select a suitable beamforming vector and transmit power from local information and the limited information exchanged between neighboring base stations. Researchers at the Tokyo Institute of Technology likewise targeted a multi-cell multiple-input single-output system and obtained the transmit power and beamforming vector with distributed reinforcement learning that takes global channel information as input. However, the latter approach requires each base station to know the global channel state information, which greatly increases the overhead during execution.
Disclosure of Invention
In view of this, the present invention proposes a multi-cell multiple-input single-output collaborative beamforming method based on distributed reinforcement learning, in which each base station trains its own deep Q network (DQN) that takes as input the channel information from the base station to its served user and outputs the optimal transmit power and codeword index, using the sum rate as the training reward. Meanwhile, the base stations exchange channel state information once every fixed number of time slots; each base station assembles global channel state information from the channel information received from the other base stations and generates future channel samples with which the network is retrained, so as to improve the generalization of the network over the following fixed number of time slots and thereby further improve network performance.
In order to achieve the above purpose, the present invention provides a low-overhead, high-performance collaborative beamforming method based on training with global information, execution with local information, and training on predicted channels. Consider a multi-cell multiple-input single-output scenario with K cells, where each base station is equipped with N_t antennas and each user is equipped with a single antenna. Each base station serves only one user on the same time-frequency resource, and each user receives a useful signal from its serving base station and interference signals from the other base stations.
Specifically, the multi-cell collaborative beamforming method based on distributed reinforcement learning disclosed by the invention comprises the following steps of:
S1: for each base station j, j ∈ [1, K], establish a training DQN with weights θ_j, a target DQN with weights θ'_j, and an empty experience pool M_j;
S2: initialize the training DQN with random weights θ, letting θ_j = θ, j ∈ [1, K];
S3: repeat steps S4 to S7 every M time slots;
S4: the base stations exchange the channel state information of all users;
S5: each base station generates multiple groups of global channel samples for the future M time slots;
S6: each base station takes actions randomly and stores the corresponding experience <s_j, a_j, r_j, s'_j> in its experience pool M_j;
S7: each base station performs network training.
Further, the network training step comprises:
S70: base station j observes its state s_j(t) at time slot t, j ∈ [1, K];
S71: at time slot t, base station j selects an action a_j according to the ε-greedy policy based on the state s_j(t), j ∈ [1, K];
S72: the global reward r_j(t) is computed according to s_j and a_j, j ∈ [1, K];
S73: base station j observes its new state s'_j(t) at time slot t+1, j ∈ [1, K];
S74: base station j stores the experience <s_j(t), a_j(t), r_j(t), s'_j(t)> in its experience pool M_j, j ∈ [1, K];
S75: base station j samples a mini-batch from its experience pool M_j;
S76: base station j updates its training DQN weights θ_j using backpropagation, j ∈ [1, K];
S77: base station j updates the target DQN weights θ'_j once every T_step time slots, j ∈ [1, K];
S78: repeat until convergence or the maximum number of training iterations is reached.
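As an illustration of steps S70 to S78, the following is a minimal sketch of one DQN update for a single base station, written in Python with PyTorch; the network architecture, hyper-parameters (batch size, discount factor), and helper names such as select_action and train_step are illustrative assumptions and are not fixed by the invention.

```python
# Minimal sketch of one DQN update (steps S70-S77) for one base station.
# The architecture, hyper-parameters, and helper names are assumptions;
# the invention only fixes the training/target-network structure.
import random
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, s):
        return self.net(s)

def select_action(train_dqn, state, num_actions, epsilon):
    """S71: epsilon-greedy action selection from the training DQN."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(train_dqn(state.unsqueeze(0)).argmax(dim=1))

def train_step(train_dqn, target_dqn, optimizer, pool, batch_size=32, gamma=0.9):
    """S75-S76: sample a mini-batch from the experience pool M_j and backpropagate."""
    if len(pool) < batch_size:
        return
    batch = random.sample(pool, batch_size)          # experiences <s, a, r, s'>
    states, actions, rewards, next_states = zip(*batch)
    s = torch.stack(states)
    a = torch.tensor(actions, dtype=torch.int64)
    r = torch.tensor(rewards, dtype=torch.float32)
    s_next = torch.stack(next_states)
    q = train_dqn(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                            # target DQN gives the bootstrap value
        target = r + gamma * target_dqn(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def update_target(train_dqn, target_dqn):
    """S77: every T_step slots, copy the training weights into the target DQN."""
    target_dqn.load_state_dict(train_dqn.state_dict())
```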
Further, the base-station-to-user channel in each time slot is modeled as a Rayleigh channel, and the channels between adjacent time slots are considered correlated and can be expressed as:
h_{j,k}(t) = ρ·h_{j,k}(t−1) + √(1−ρ²)·e_{j,k}(t)
where h_{j,k}(t) denotes the channel from the j-th base station to the k-th user in time slot t, each element of h_{j,k}(0) obeys an independent complex Gaussian distribution with mean 0 and variance 1, e_{j,k}(t) denotes white Gaussian noise independent of the channel, each element of which also obeys an independent complex Gaussian distribution with mean 0 and variance 1, and ρ denotes the correlation coefficient of the Rayleigh fading vectors between adjacent time slots.
Further, after the base stations exchange with each other, at time slot t, their own channel information to all users [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t)], each base station forms a global channel state information matrix H(t) = [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t), …, h_{K,K}(t)] and, using the correlation between adjacent channels, generates multiple groups of global channels {Ĥ_n(t+1), …, Ĥ_n(t+M)} for the future M time slots, n ∈ [1, N], where N is the number of generated global channel groups.
Further, global information is used to guide the training, while only local information is needed during execution.
Further, the state of each DQN network is based on the channel state information h_{j,j}(t) from base station j to its own served user j; an I/Q transformation is used to split h_{j,j}(t) into its in-phase and quadrature components, and the in-phase and quadrature vectors are stacked into one real-valued vector, so that the state of the DQN network of base station j is expressed as:
s_j(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T
further, the beamforming vector w transmitted by the base station j And (t) is a continuous complex vector:
Figure BDA0003151759030000043
wherein,,
Figure BDA0003151759030000044
normalized beamforming vector, p, taken for base station j j (t) is the transmit power of base station j;
discretizing the normalized beamforming vector using a selection of codewords from a codebook, defining an available transmit power set for each base station for transmit power
Figure BDA0003151759030000045
Figure BDA0003151759030000046
Wherein p is max For maximum transmit power of base station, Q pow Is an available discrete power level;
the actions of the DQN network of base station j are:
a j ={(P j ,c j ),p j ∈P,c j ∈C}
where p and c are the transmit power taken and the codeword index respectively,
Figure BDA0003151759030000047
for the codeword index set, Q code Is the number of codewords in the codebook.
Further, the received signal of user j at time slot t is expressed as:
y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k=1,k≠j}^{K} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)
where x_k(t) denotes the data symbol transmitted by base station k and n_j(t) is the additive white Gaussian noise at user j;
the rate of user j is expressed as:
C_j(w_j(t)) = log(1 + SINR_j(t))
where SINR_j(t), the signal-to-interference-plus-noise ratio of the user served by base station j at time slot t, is expressed as:
SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k=1,k≠j}^{K} |h_{k,j}^H(t) w_k(t)|² + σ² )
the sum rate is expressed as:
C(t) = Σ_{j=1}^{K} C_j(w_j(t))
and the reward of the DQN network of base station j is:
r_j(t) = C(t).
further, the M is an integer from 0 to 300.
The beneficial effects of the invention are as follows:
Every M time slots, the base stations exchange the channel state information from each base station to all users, multiple groups of channels for the future M time slots are generated from the obtained global channel state information, and the generated future channel samples are used to train the distributed reinforcement learning network, thereby improving network performance. During execution, the distributed reinforcement learning takes as input only the channel state information from each base station to its served user, so no information exchange between base stations is needed and the execution overhead is greatly reduced.
Simulations show that, with extremely low overhead, the performance of the method is superior to that of the compared greedy and random schemes and is close to that of the optimal scheme requiring global information.
Drawings
Fig. 1 is a schematic diagram of a three-cell MISO cooperative beamforming system model;
Fig. 2 is a framework diagram of the present invention;
Fig. 3 is a flow chart of the present invention;
Fig. 4 is a plot of the sum rate versus the number of time slots M for the present invention and other schemes.
Detailed Description
The invention is further described below with reference to the accompanying drawings, without limiting the invention in any way, and any alterations or substitutions based on the teachings of the invention are intended to fall within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, consider a multi-cell multiple-input single-output scenario with K cells, each base station equipped with N t A root antenna, each user is equipped with a single antenna. Each base station only serves one user on the same time-frequency resource, eachThe user receives both the useful signal from the serving base station and the interfering signal from the other base stations. The whole cooperative beamforming process is described as shown in fig. 2, and is described as follows:
first, every M time slots, the base stations exchange channel state information from the base station to all users, so that each base station generates global channel state information of the future M time slots according to the received global channel state information to train. Each slot base-to-user channel is modeled as a rayleigh channel, and the channels between adjacent slots are considered correlated and can be expressed as:
Figure BDA0003151759030000061
wherein,,
Figure BDA0003151759030000062
representing the channel from the jth base station to the kth user in the t time slot, h j,k (0) Each element in (a) obeys independent complex Gaussian distribution with mean value of 0 and variance of 1, e j,k (t) represents channel independent white gaussian noise, wherein each element is also subject to an independent complex gaussian distribution with a mean of 0 and a variance of 1, and ρ represents the correlation coefficient representing the rayleigh fading vector between adjacent time slots. When the interaction time slot t between base stations reaches the channel information h of all users 1,1 (t),h 1,2 (t),…,h 1,K (t)]Thereafter, each base station may form a global channel state information matrix, H (t) = [ H ] 1,1 (t),h 1,2 (t),…,h 1,K (t),…,h K,K (t)]. It can be seen from equation 1 that knowing H (t) and ρ, the correlation of adjacent channels can be used to generate global channels +.>
Figure BDA0003151759030000063
n∈[1,N]N is the number of global channel groups generated.
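To make the sample generation concrete, the following is a minimal Python (NumPy) sketch of how the N groups of future global channels could be drawn from H(t) and ρ using the first-order model of equation (1); the function name generate_future_channels and the array layout are illustrative assumptions.

```python
# Sketch of generating N groups of global channel samples for the next M
# slots from equation (1). NumPy, the function name, and the array layout
# (all K*K channel vectors of length N_t stacked row-wise) are assumptions.
import numpy as np

def generate_future_channels(H_t, rho, M, N, rng=None):
    """H_t: complex array (num_links, N_t) holding the exchanged global CSI H(t).
    Returns an array of shape (N, M, num_links, N_t)."""
    rng = np.random.default_rng() if rng is None else rng
    num_links, n_t = H_t.shape
    samples = np.empty((N, M, num_links, n_t), dtype=complex)
    for n in range(N):
        h_prev = H_t
        for m in range(M):
            # e(t): i.i.d. CN(0, 1) entries, independent of the channel
            e = (rng.standard_normal((num_links, n_t))
                 + 1j * rng.standard_normal((num_links, n_t))) / np.sqrt(2)
            h_prev = rho * h_prev + np.sqrt(1.0 - rho ** 2) * e
            samples[n, m] = h_prev
    return samples

# Example: K = 4 cells (16 base-station/user links), N_t = 4, rho = 0.95.
rng = np.random.default_rng(0)
H_t = (rng.standard_normal((16, 4)) + 1j * rng.standard_normal((16, 4))) / np.sqrt(2)
future = generate_future_channels(H_t, rho=0.95, M=10, N=5, rng=rng)
```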
Then, to improve network performance, the invention uses the generated global channel state information to train the DQN network of each base station. The invention defines the three elements of the distributed DQN network, namely the state, the action, and the reward. The state of each DQN network is the channel state information h_{j,j}(t) from base station j to its own served user j. Because the neural network cannot process complex numbers, the invention applies an I/Q transformation to split the complex vector into its in-phase (real part) and quadrature (imaginary part) components and stacks the two vectors into one real-valued vector. Thus, the state of the DQN network of base station j is:
s_j(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T    (2)
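A short Python sketch of this I/Q transformation is given below; the function name build_state is an illustrative assumption.

```python
# Sketch of the I/Q transformation producing the real-valued DQN state of
# equation (2): s_j(t) = [Re(h_jj(t)), Im(h_jj(t))]. The name is an assumption.
import numpy as np

def build_state(h_jj):
    """Stack the in-phase (real) and quadrature (imaginary) components of the
    complex channel vector h_{j,j}(t) into one real-valued state vector."""
    h_jj = np.asarray(h_jj)
    return np.concatenate([h_jj.real, h_jj.imag])

h = np.array([0.3 + 0.5j, -1.2 + 0.1j, 0.7 - 0.9j, 0.0 + 1.1j])  # N_t = 4
state = build_state(h)                                           # length 2 * N_t = 8
```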
the action of the DQN network is typically a set of discrete real values, the transmitted beamforming vector w j And (t) is a continuous complex vector. Thus, the present invention discretizes the continuous complex vector. The beamforming vector is composed of two parts, as shown in the following equation:
Figure BDA0003151759030000072
Figure BDA0003151759030000073
normalized beamforming vector, p, taken for base station j j And (t) is the transmission power of the base station j. The present invention discretizes the normalized beamforming vector by selecting codewords from the codebook, and defines the available transmit power set for each base station for transmit power +.>
Figure BDA0003151759030000074
Figure BDA0003151759030000075
Wherein p is max For maximum transmit power of base station, Q pow Is a discrete power level available. The actions of the DQN network of base station j are therefore:
a j ={(p j ,c j ),p j ∈P,c j ∈C} (4)
where p and c are the transmit power taken and the codeword index respectively,
Figure BDA0003151759030000076
for the codeword index set, Q code Is the number of codewords in the codebook.
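As an illustration of how the discrete action set of equation (4) can be enumerated, a Python sketch follows; the uniform power grid and the DFT codebook are assumptions made for the example, since the invention only fixes the sizes Q_pow and Q_code and the maximum power p_max.

```python
# Sketch of the discrete action set {(p_j, c_j)} of equation (4). The uniform
# power grid and the DFT codebook are illustrative assumptions; the patent
# only fixes Q_pow, Q_code, and the maximum transmit power p_max.
import itertools
import numpy as np

def build_action_set(p_max, Q_pow, Q_code):
    # Q_pow discrete power levels no larger than p_max (uniform grid assumed)
    powers = [p_max * (q + 1) / Q_pow for q in range(Q_pow)]
    codeword_indices = list(range(Q_code))
    # every (power level, codeword index) pair is one DQN action
    return list(itertools.product(powers, codeword_indices))

def dft_codebook(n_t, Q_code):
    """Illustrative codebook: Q_code unit-norm DFT beams for N_t antennas."""
    grid = np.outer(np.arange(n_t), np.arange(Q_code))
    return np.exp(2j * np.pi * grid / Q_code) / np.sqrt(n_t)   # shape (N_t, Q_code)

actions = build_action_set(p_max=1.0, Q_pow=4, Q_code=4)        # 16 actions
codebook = dft_codebook(n_t=4, Q_code=4)
p, c = actions[7]
w = np.sqrt(p) * codebook[:, c]           # beamforming vector w_j(t) of equation (3)
```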
The reward of the DQN network is the sum rate. Since the sum rate is a metric computed from global information, using it as the reward achieves the goal of training with global information while executing with local information, which both improves performance and speeds up convergence of the network. In a multi-cell multiple-input single-output scenario, each user shares the same frequency band with the users in the other cells, so inter-cell interference exists. The received signal of user j at time slot t can be expressed as:
y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k=1,k≠j}^{K} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)    (5)
where x_k(t) denotes the data symbol transmitted by base station k and n_j(t) is the additive white Gaussian noise at user j. The rate of user j can be expressed as:
C_j(w_j(t)) = log(1 + SINR_j(t))    (6)
where SINR_j(t), the signal-to-interference-plus-noise ratio of the user served by base station j at time slot t, is expressed as:
SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k=1,k≠j}^{K} |h_{k,j}^H(t) w_k(t)|² + σ² )    (7)
The sum rate can be expressed as:
C(t) = Σ_{j=1}^{K} C_j(w_j(t))    (8)
The reward of the DQN network of base station j is therefore:
r_j(t) = C(t)    (9)
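The reward computation of equations (5) to (9) can be illustrated with the short Python sketch below, in which log base 2 is assumed for the rate; the data layout and the function name sum_rate_reward are illustrative assumptions.

```python
# Sketch of the sum-rate reward of equations (6)-(9). h[k][j] is the channel
# from base station k to user j (length N_t), w[k] is the beamforming vector
# of base station k, sigma2 is the noise variance. Log base 2 is assumed.
import numpy as np

def sum_rate_reward(h, w, sigma2):
    K = len(w)
    rate_sum = 0.0
    for j in range(K):
        signal = abs(np.vdot(h[j][j], w[j])) ** 2               # |h_jj^H w_j|^2
        interference = sum(abs(np.vdot(h[k][j], w[k])) ** 2     # |h_kj^H w_k|^2
                           for k in range(K) if k != j)
        sinr = signal / (interference + sigma2)                  # equation (7)
        rate_sum += np.log2(1.0 + sinr)                          # equation (6)
    return rate_sum                                              # C(t), used as r_j(t)

# Example with K = 2 cells and N_t = 2 antennas.
rng = np.random.default_rng(1)
h = [[(rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
      for _ in range(2)] for _ in range(2)]
w = [np.array([1.0, 0.0]), np.array([0.70710678, 0.70710678])]
reward = sum_rate_reward(h, w, sigma2=0.1)
```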
as shown in fig. 3 and algorithm 1, the pseudo code of the distributed reinforcement learning method of the present invention is as follows:
algorithm 1. Distributed reinforcement learning method pseudo code
Figure BDA0003151759030000083
Figure BDA0003151759030000091
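Because the pseudo code of Algorithm 1 is only available as an image, the following runnable Python skeleton illustrates the outer loop of steps S3 to S7; the BaseStation class and its methods are placeholder stand-ins for the operations described above and are not part of the original pseudo code.

```python
# Skeleton of the outer loop of the described method (steps S3-S7). The
# BaseStation class and its methods are placeholder stand-ins; only the
# control flow mirrors the procedure described in the text.
import random

class BaseStation:
    def __init__(self, idx, num_actions=16, state_dim=8):
        self.idx = idx
        self.num_actions = num_actions
        self.state_dim = state_dim
        self.pool = []                               # experience pool M_j

    def local_csi(self, t):                          # h_{j,j}(t), faked here
        return [random.gauss(0.0, 1.0) for _ in range(self.state_dim)]

    def fill_experience_pool(self, future_channels, n_random=10):
        # S6: random exploration on the generated channel samples (placeholder)
        for _ in range(n_random):
            self.pool.append(("s", random.randrange(self.num_actions), 0.0, "s_next"))

    def train(self):
        # S7 / S70-S78: DQN update on the pool (placeholder for the real update)
        pass

    def act(self, state):
        # Execution: the trained DQN maps the local state to a (power, codeword) index
        return random.randrange(self.num_actions)

def exchange_csi(stations, t):
    # S4: the base stations share their channels to all users; here each one
    # simply contributes its (faked) local CSI to a common list.
    return [bs.local_csi(t) for bs in stations]

def run(stations, total_slots, M, N):
    for t in range(total_slots):
        if t % M == 0:
            H_t = exchange_csi(stations, t)          # S4
            for bs in stations:
                future = [H_t] * M                   # S5: stands in for N generated groups
                bs.fill_experience_pool(future)      # S6
                bs.train()                           # S7
        for bs in stations:                          # execution uses only local CSI
            bs.act(bs.local_csi(t))

run([BaseStation(j) for j in range(4)], total_slots=50, M=10, N=5)
```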
In order to verify the performance of the collaborative beamforming scheme based on distributed reinforcement learning, the invention carries out the following simulation:
the channel parameters are assumed to follow a standard unit complex gaussian random distribution. The source node transmitting power is P, the noise variance at the destination node is
Figure BDA0003151759030000092
Assuming that the base station and the user perform accurate channel estimation, fig. 4 shows a curve of the sum rate of the greedy scheme and the random scheme according to the time slot number M, in the experiment, there are 4 base stations, 1 antenna for the user, 4 transmitting antennas are provided for each base station, the available discrete power level is 4, and the code number is 4. The greedy scheme is that each base station maximizes the throughput of its serving user, and the random scheme is that each base station randomly selects codeword and transmit power, and the optimal performance obtained by traversing is used as an upper bound. As can be seen from fig. 3, when m=1, the performance of the present invention can reach 95% of the optimal performance, and as the number of slots M increases, the performance of the present invention decreases. The number M of time slots is [0,300 ]]Throughout this range, the present invention is superior to greedy and random schemes.
The beneficial effects of the invention are as follows:
Every M time slots, the base stations exchange the channel state information from each base station to all users, multiple groups of channels for the future M time slots are generated from the obtained global channel state information, and the generated future channel samples are used to train the distributed reinforcement learning network, thereby improving network performance. During execution, the distributed reinforcement learning takes as input only the channel state information from each base station to its served user, so no information exchange between base stations is needed and the execution overhead is greatly reduced.
Simulations show that, with extremely low overhead, the performance of the method is superior to that of the compared greedy and random schemes and is close to that of the optimal scheme requiring global information.
The above embodiment is one implementation of the present invention, but the implementation of the present invention is not limited to this embodiment; any other changes, modifications, substitutions, combinations, and simplifications made within the spirit and principle of the present invention shall be regarded as equivalent substitutions and are included in the protection scope of the present invention.

Claims (7)

1. A multi-cell collaborative beamforming method based on distributed reinforcement learning, characterized by comprising the following steps:
S1: for each base station j, j ∈ [1, K], establishing a training DQN with weights θ_j, a target DQN with weights θ'_j, and an empty experience pool M_j;
S2: initializing the training DQN with random weights θ, letting θ_j = θ, j ∈ [1, K];
S3: repeating steps S4 to S7 every M time slots;
S4: the base stations exchanging the channel state information of all users;
S5: each base station generating multiple groups of global channel samples for the future M time slots;
S6: each base station taking actions randomly and storing the corresponding experience <s_j, a_j, r_j, s'_j> in its experience pool M_j;
S7: each base station performing network training;
wherein the network training comprises the following steps:
S70: base station j observing its state s_j(t) at time slot t, j ∈ [1, K];
S71: at time slot t, base station j selecting an action a_j according to the ε-greedy policy based on the state s_j(t), j ∈ [1, K];
S72: calculating the global reward r_j(t) according to s_j and a_j, j ∈ [1, K];
S73: base station j observing its new state s'_j(t) at time slot t+1, j ∈ [1, K];
S74: base station j storing the experience <s_j(t), a_j(t), r_j(t), s'_j(t)> in its experience pool M_j, j ∈ [1, K];
S75: base station j sampling a mini-batch from its experience pool M_j;
S76: base station j updating its training DQN weights θ_j using backpropagation, j ∈ [1, K];
S77: base station j updating the target DQN weights θ'_j once every T_step time slots, j ∈ [1, K];
S78: repeating until convergence or the maximum number of training iterations is reached;
wherein the received signal of user j at time slot t is expressed as:
y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k=1,k≠j}^{K} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)
where x_k(t) denotes the data symbol transmitted by base station k, n_j(t) is the additive white Gaussian noise at user j, h_{j,j}(t) is the channel state information from base station j to its own served user j, and w_j(t) is the beamforming vector transmitted by base station j;
the rate of user j is expressed as:
C_j(w_j(t)) = log(1 + SINR_j(t))
wherein SINR_j(t), the signal-to-interference-plus-noise ratio of the user served by base station j at time slot t, is expressed as:
SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k=1,k≠j}^{K} |h_{k,j}^H(t) w_k(t)|² + σ² )
wherein σ² is the noise variance;
the sum rate is expressed as:
C(t) = Σ_{j=1}^{K} C_j(w_j(t))
and the reward of the DQN network of base station j is:
r_j(t) = C(t).
2. the distributed reinforcement learning based multi-cell collaborative beamforming method according to claim 1 wherein each time slot base station to user channel is modeled as a rayleigh channel and the channels between adjacent time slots are considered correlated expressed as:
Figure FDA0004164431690000025
wherein,,
Figure FDA0004164431690000026
representing the channel from the jth base station to the kth user in the t time slot, h j,k (0) Each element in (a) obeys independent complex Gaussian distribution with mean value of 0 and variance of 1, e j,k (t) represents channel independent white gaussian noise, wherein each element is also subject to an independent complex gaussian distribution with a mean of 0 and a variance of 1, and ρ represents the correlation coefficient representing the rayleigh fading vector between adjacent time slots.
3. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 2, wherein, after the base stations exchange with each other, at time slot t, their own channel information to all users [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t)], each base station forms a global channel state information matrix H(t) = [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t), …, h_{K,K}(t)] and, using the correlation between adjacent channels, generates multiple groups of global channels {Ĥ_n(t+1), …, Ĥ_n(t+M)} for the future M time slots, n ∈ [1, N], where N is the number of generated global channel groups.
4. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 1, wherein global information is used to guide the training, while only local information is needed during execution.
5. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 2, wherein the state of each DQN network is based on the channel state information h_{j,j}(t) from base station j to its own served user j; an I/Q transformation is used to split h_{j,j}(t) into its in-phase and quadrature components, and the in-phase and quadrature vectors are stacked into one real-valued vector, so that the state of the DQN network of base station j is expressed as:
s_j(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T
6. the method for multi-cell collaborative beamforming based on distributed reinforcement learning according to claim 2 wherein a beamforming vector w transmitted by a base station j And (t) is a continuous complex vector:
Figure FDA0004164431690000034
wherein,,
Figure FDA0004164431690000041
normalized beamforming vector, p, taken for base station j j (t) is the transmit power of base station j;
discretizing the normalized beamforming vector using a selection of codewords from a codebook, defining an available transmit power set for each base station for transmit power
Figure FDA0004164431690000044
Figure FDA0004164431690000042
Wherein p is max For maximum transmit power of base station, Q pow Is an available discrete power level;
the actions of the DQN network of base station j are:
a j ={(p j ,c j ),p j ∈P,c j ∈C}
wherein p is j And c j The transmit power and codeword index taken are respectively,
Figure FDA0004164431690000043
for the codeword index set, Q code Is the number of codewords in the codebook.
7. The distributed reinforcement learning based multi-cell collaborative beamforming method according to claim 1 wherein M is an integer from 0 to 300.
CN202110768826.2A 2021-07-07 2021-07-07 Multi-cell collaborative beam forming method based on distributed reinforcement learning Active CN113472472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110768826.2A CN113472472B (en) 2021-07-07 2021-07-07 Multi-cell collaborative beam forming method based on distributed reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110768826.2A CN113472472B (en) 2021-07-07 2021-07-07 Multi-cell collaborative beam forming method based on distributed reinforcement learning

Publications (2)

Publication Number Publication Date
CN113472472A CN113472472A (en) 2021-10-01
CN113472472B true CN113472472B (en) 2023-06-27

Family

ID=77879037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110768826.2A Active CN113472472B (en) 2021-07-07 2021-07-07 Multi-cell collaborative beam forming method based on distributed reinforcement learning

Country Status (1)

Country Link
CN (1) CN113472472B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109039412A (en) * 2018-07-23 2018-12-18 西安交通大学 A kind of safe transmission method of physical layer based on random wave bundle figuration
CN110365387A (en) * 2019-07-16 2019-10-22 电子科技大学 A kind of beam selection method of cellular communication system
CN111181619A (en) * 2020-01-03 2020-05-19 东南大学 Millimeter wave hybrid beam forming design method based on deep reinforcement learning
CN111246497A (en) * 2020-04-10 2020-06-05 卓望信息技术(北京)有限公司 Antenna adjustment method based on reinforcement learning
CN111901862A (en) * 2020-07-07 2020-11-06 西安交通大学 User clustering and power distribution method, device and medium based on deep Q network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10694526B2 (en) * 2016-09-30 2020-06-23 Drexel University Adaptive pursuit learning method to mitigate small-cell interference through directionality

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109039412A (en) * 2018-07-23 2018-12-18 西安交通大学 A kind of safe transmission method of physical layer based on random wave bundle figuration
CN110365387A (en) * 2019-07-16 2019-10-22 电子科技大学 A kind of beam selection method of cellular communication system
CN111181619A (en) * 2020-01-03 2020-05-19 东南大学 Millimeter wave hybrid beam forming design method based on deep reinforcement learning
CN111246497A (en) * 2020-04-10 2020-06-05 卓望信息技术(北京)有限公司 Antenna adjustment method based on reinforcement learning
CN111901862A (en) * 2020-07-07 2020-11-06 西安交通大学 User clustering and power distribution method, device and medium based on deep Q network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Anti-interference resource scheduling algorithm for directional wireless communication networks based on reinforcement learning; Xie Tian; Gao Shishun; Zhao Haitao; Lin Yi; Xiong Jun; Chinese Journal of Radio Science, No. 4; full text *

Also Published As

Publication number Publication date
CN113472472A (en) 2021-10-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant