CN113472472B - Multi-cell collaborative beam forming method based on distributed reinforcement learning - Google Patents
Multi-cell collaborative beam forming method based on distributed reinforcement learning
- Publication number
- CN113472472B (application CN202110768826.2A)
- Authority
- CN
- China
- Prior art keywords
- base station
- channel
- reinforcement learning
- dqn
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04J—MULTIPLEX COMMUNICATION
- H04J11/00—Orthogonal multiplex systems, e.g. using WALSH codes
- H04J11/0023—Interference mitigation or co-ordination
- H04J11/005—Interference mitigation or co-ordination of intercell interference
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/06—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
- H04B7/0613—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
- H04B7/0615—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
- H04B7/0617—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/02—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
- H04B7/04—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
- H04B7/08—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station
- H04B7/0837—Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the receiving station using pre-detection combining
- H04B7/0842—Weighted combining
- H04B7/086—Weighted combining using weights depending on external parameters, e.g. direction of arrival [DOA], predetermined weights or beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/24—Cell structures
- H04W16/28—Cell structures using beam steering
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a multi-cell collaborative beamforming method based on distributed reinforcement learning, which comprises the following steps: for each base station j, establishing a training DQN with weights θ_j, a target DQN with weights θ'_j, and an empty experience pool M_j; initializing the training DQN with random weights; and repeating the following every M time slots: the base stations exchange the channel state information of all users; each base station generates several groups of global channel samples for the future M time slots; each base station takes random actions and stores the corresponding experiences in its experience pool M_j; and each base station performs network training. With extremely low overhead, the invention outperforms the compared greedy and random schemes, and its performance is close to the optimal scheme that requires global information.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a multi-cell collaborative beam forming method based on distributed reinforcement learning.
Background
Conventional mobile communication systems typically adopt a cellular architecture, which improves throughput and saves power. However, because neighboring cells share part of the spectrum, severe inter-cell interference may arise and degrade system capacity. Multi-cell collaborative beamforming is regarded as one of the key technologies for interference management, since it can mitigate inter-cell interference and maximize system capacity by jointly controlling the transmit power and beamforming of neighboring base stations.
In general, the system capacity of a cellular communication system is measured by the sum of the achievable rates of all users, i.e., the sum rate. Maximizing the sum rate is an NP-hard, non-convex problem, so an optimal solution is difficult to obtain. Many suboptimal methods based on optimization techniques have been developed for this problem, such as fractional programming, weighted minimum mean square error, and branch-and-bound algorithms. These algorithms can approach optimal performance, but they all require global channel state information and multiple iterations to compute a solution, so their overhead and computational complexity in practice are intolerable. Distributed reinforcement learning has proven to be an effective emerging technique for various problems in communications and networking, such as the Internet of Things, heterogeneous networks, and unmanned aerial vehicle networks. In these networks, agents (e.g., base stations) make their own decisions based on local information to optimize network performance. Research on reinforcement-learning-based collaborative beamforming is still at an early stage. For multi-cell multiple-input single-output systems, researchers at the University of Electronic Science and Technology of China trained each base station's own deep Q-network with distributed deep reinforcement learning, selecting appropriate beam vectors and transmit power from local information and limited information exchanged between neighboring base stations. Researchers at Tokyo Institute of Technology obtained the transmit power and beam vectors for a multi-cell multiple-input single-output system with distributed reinforcement learning that takes global channel information as input. However, the latter approach requires each base station to know the global channel state information, which greatly increases the overhead during execution.
Disclosure of Invention
In view of this, the present invention proposes a multi-cell multiple-input single-output collaborative beamforming method based on distributed reinforcement learning, in which each base station trains its own deep Q-network (DQN) that takes as input the channel information from the base station to its served user, uses the sum rate as the reward during training, and outputs the optimal transmit power and codeword index. In addition, the base stations exchange channel state information once every fixed number of time slots; from the received channel information of the other base stations, each base station forms the global channel state information and generates future channel samples to retrain its network, thereby improving the generalization of the network over the following fixed time slots and further improving network performance.
In order to achieve the above purpose, the present invention provides a low-overhead, high-performance collaborative beamforming method based on training with global information, execution with local information, and training on predicted channels. Consider a multi-cell multiple-input single-output scenario with K cells, where each base station is equipped with N_t antennas and each user is equipped with a single antenna. Each base station serves only one user on the same time-frequency resource, and each user receives a useful signal from its serving base station and interference signals from the other base stations.
Specifically, the multi-cell collaborative beamforming method based on distributed reinforcement learning disclosed by the invention comprises the following steps of:
s1: for base station j, j E [1, K]Establishing a weight of theta j And a weight θ' j Is a target DQN of (a) and an empty experience pool M j ;
S2: initializing training DQN with random weights to let θ j =θ,j∈[1,K];
S3: repeating steps S4 to S7 every M slots;
s4: the base stations exchange channel state information of all users;
s5: each base station generates global channel samples of a plurality of groups of M time slots in the future;
s6: each base station takes action randomly and takes corresponding experience<s j ,a j ,r j ,s′ j >Stored in its experience pool M j In (a) and (b);
s7: each base station performs network training.
Further, the step of network training includes:
S70: base station j observes its state s_j(t) at time slot t, j ∈ [1, K];
S71: at time slot t, base station j selects an action a_j according to the ε-greedy policy based on the state s_j(t), j ∈ [1, K];
S72: the global reward r(t) is calculated according to s_j and a_j, j ∈ [1, K];
S73: base station j observes its new state s'_j(t) at time slot t+1, j ∈ [1, K];
S74: base station j stores the experience <s_j(t), a_j(t), r_j(t), s'_j(t)> in its experience pool M_j, j ∈ [1, K];
S75: base station j samples a mini-batch from its experience pool M_j;
S76: base station j updates its training DQN weights θ_j using back-propagation, j ∈ [1, K];
S77: base station j updates the target DQN weights θ'_j every T_step time slots, j ∈ [1, K];
S78: repeat until convergence or the maximum number of training iterations is reached.
Further, the channel from each base station to each user in each time slot is modeled as a Rayleigh channel, and the channels in adjacent time slots are considered correlated, which can be expressed as:

h_{j,k}(t) = ρ·h_{j,k}(t−1) + √(1−ρ²)·e_{j,k}(t)

where h_{j,k}(t) denotes the channel from the j-th base station to the k-th user in time slot t, each element of h_{j,k}(0) follows an independent complex Gaussian distribution with mean 0 and variance 1, e_{j,k}(t) denotes the channel-independent white Gaussian noise whose elements also follow independent complex Gaussian distributions with mean 0 and variance 1, and ρ is the correlation coefficient of the Rayleigh fading vectors between adjacent time slots.
Further, after the base stations exchange their own channel information to all users at time slot t (for example, base station 1 shares [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t)]), each base station forms a global channel state information matrix H(t) = [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t), …, h_{K,K}(t)] and, using the correlation of adjacent channels, generates global channel samples of the future M time slots for each group n ∈ [1, N], where N is the number of generated global channel groups.
Further, global information is needed to guide the training, while only local information is needed during execution.
Further, the state of each DQN network is based on the channel state information h_{j,j}(t) from base station j to its own served user j. Using an I/Q decomposition, h_{j,j}(t) is split into in-phase and quadrature components, which are stacked into the vector h̃_{j,j}(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T, so that the state of the DQN network of base station j is expressed as:

s_j(t) = h̃_{j,j}(t).
further, the beamforming vector w transmitted by the base station j And (t) is a continuous complex vector:
wherein,,normalized beamforming vector, p, taken for base station j j (t) is the transmit power of base station j;
discretizing the normalized beamforming vector using a selection of codewords from a codebook, defining an available transmit power set for each base station for transmit power
Wherein p is max For maximum transmit power of base station, Q pow Is an available discrete power level;
the actions of the DQN network of base station j are:
a j ={(P j ,c j ),p j ∈P,c j ∈C}
where p and c are the transmit power taken and the codeword index respectively,for the codeword index set, Q code Is the number of codewords in the codebook.
Further, the received signal of user j at time slot t is expressed as:

y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k≠j} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)

where x_k(t) is the symbol transmitted by base station k and n_j(t) is the additive white Gaussian noise at user j.

The rate of user j is expressed as:

C_j(w_j(t)) = log(1 + SINR_j(t))

where SINR_j(t) is the signal-to-interference-plus-noise ratio of base station j at time slot t, expressed as:

SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k≠j} |h_{k,j}^H(t) w_k(t)|² + σ² )

where σ² is the noise variance. The sum rate is expressed as:

C(t) = Σ_{j=1}^{K} C_j(w_j(t))

The reward of the DQN network of base station j is:

r_j(t) = C(t).
further, the M is an integer from 0 to 300.
The beneficial effects of the invention are as follows:
Every M time slots, the base stations exchange the channel state information from each base station to all users, generate several groups of channels for the future M time slots from the obtained global channel state information, and use the generated future channel samples to train the distributed reinforcement learning network, thereby improving network performance. The distributed reinforcement learning takes as input only the channel state information from each base station to its served user, so no information exchange between base stations is needed during execution, which greatly reduces the execution overhead.
Simulations show that, with extremely low overhead, the performance of the method is superior to the compared greedy and random schemes and is close to the optimal scheme that requires global information.
Drawings
Fig. 1 is a schematic diagram of a three-cell MISO cooperative beamforming system model;
FIG. 2 is a framework diagram of the present invention;
FIG. 3 is a flow chart of the present invention;
fig. 4 is a plot of sum rate versus time slot number M for the present invention and other schemes.
Detailed Description
The invention is further described below with reference to the accompanying drawings, without limiting the invention in any way, and any alterations or substitutions based on the teachings of the invention are intended to fall within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, consider a multi-cell multiple-input single-output scenario with K cells, each base station equipped with N t A root antenna, each user is equipped with a single antenna. Each base station only serves one user on the same time-frequency resource, eachThe user receives both the useful signal from the serving base station and the interfering signal from the other base stations. The whole cooperative beamforming process is described as shown in fig. 2, and is described as follows:
first, every M time slots, the base stations exchange channel state information from the base station to all users, so that each base station generates global channel state information of the future M time slots according to the received global channel state information to train. Each slot base-to-user channel is modeled as a rayleigh channel, and the channels between adjacent slots are considered correlated and can be expressed as:
wherein,,representing the channel from the jth base station to the kth user in the t time slot, h j,k (0) Each element in (a) obeys independent complex Gaussian distribution with mean value of 0 and variance of 1, e j,k (t) represents channel independent white gaussian noise, wherein each element is also subject to an independent complex gaussian distribution with a mean of 0 and a variance of 1, and ρ represents the correlation coefficient representing the rayleigh fading vector between adjacent time slots. When the interaction time slot t between base stations reaches the channel information h of all users 1,1 (t),h 1,2 (t),…,h 1,K (t)]Thereafter, each base station may form a global channel state information matrix, H (t) = [ H ] 1,1 (t),h 1,2 (t),…,h 1,K (t),…,h K,K (t)]. It can be seen from equation 1 that knowing H (t) and ρ, the correlation of adjacent channels can be used to generate global channels +.>n∈[1,N]N is the number of global channel groups generated.
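As an illustration of this channel model and of the prediction of future global channel samples, the following Python sketch generates correlated Rayleigh channels according to equation (1) and, given H(t) and ρ, draws N candidate groups of global channels for the next M time slots. The array layout (K × K × N_t), the helper names, and the example parameter values are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np

def complex_gaussian(shape, rng):
    # Elements ~ CN(0, 1): real and imaginary parts each N(0, 1/2).
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def next_channel(h_prev, rho, rng):
    # Gauss-Markov update of equation (1): h(t) = rho*h(t-1) + sqrt(1-rho^2)*e(t).
    e = complex_gaussian(h_prev.shape, rng)
    return rho * h_prev + np.sqrt(1.0 - rho**2) * e

def predict_future_channels(H_t, rho, M, N, rng):
    """Generate N groups of predicted global channels for the next M slots.

    H_t: current global CSI, shape (K, K, Nt); H_t[j, k] is the channel
         from base station j to user k (an assumed layout).
    Returns an array of shape (N, M, K, K, Nt).
    """
    samples = np.empty((N, M) + H_t.shape, dtype=complex)
    for n in range(N):
        h = H_t
        for m in range(M):
            h = next_channel(h, rho, rng)   # same recursion as the true channel
            samples[n, m] = h
    return samples

# Example: K = 4 cells, Nt = 4 antennas, rho = 0.95, N = 5 groups of M = 10 slots.
rng = np.random.default_rng(0)
K, Nt = 4, 4
H_t = complex_gaussian((K, K, Nt), rng)      # h_{j,k}(0) elements ~ CN(0, 1)
future = predict_future_channels(H_t, rho=0.95, M=10, N=5, rng=rng)
print(future.shape)                           # (5, 10, 4, 4, 4)
```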
Then, to improve network performance, the invention uses the generated global channel state information to train the DQN of each base station. The invention defines the three elements of each distributed DQN, namely the state, the action, and the reward. The state of each DQN is based on the channel state information h_{j,j}(t) from base station j to its own served user j. Because a neural network cannot process complex numbers directly, an I/Q decomposition is used to split the complex vector into its in-phase (real) and quadrature (imaginary) components, which are stacked into a single real vector h̃_{j,j}(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T. Thus, the state of the DQN of base station j is:

s_j(t) = h̃_{j,j}(t)    (2)
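The I/Q decomposition of the state can be sketched as follows; the function name and tensor shapes are illustrative assumptions, and the snippet simply stacks the real and imaginary parts of the local channel h_{j,j}(t) into one real-valued DQN input.

```python
import numpy as np

def dqn_state(h_jj: np.ndarray) -> np.ndarray:
    """I/Q decomposition of the local channel h_{j,j}(t) into a real state vector.

    h_jj: complex vector of length Nt (channel from base station j to its own user).
    Returns a real vector of length 2*Nt: [Re(h), Im(h)].
    """
    return np.concatenate([h_jj.real, h_jj.imag])

# Example with Nt = 4 antennas.
h_jj = (np.random.randn(4) + 1j * np.random.randn(4)) / np.sqrt(2)
s_j = dqn_state(h_jj)
print(s_j.shape)  # (8,)
```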
the action of the DQN network is typically a set of discrete real values, the transmitted beamforming vector w j And (t) is a continuous complex vector. Thus, the present invention discretizes the continuous complex vector. The beamforming vector is composed of two parts, as shown in the following equation:
normalized beamforming vector, p, taken for base station j j And (t) is the transmission power of the base station j. The present invention discretizes the normalized beamforming vector by selecting codewords from the codebook, and defines the available transmit power set for each base station for transmit power +.>
Wherein p is max For maximum transmit power of base station, Q pow Is a discrete power level available. The actions of the DQN network of base station j are therefore:
a j ={(p j ,c j ),p j ∈P,c j ∈C} (4)
where p and c are the transmit power taken and the codeword index respectively,for the codeword index set, Q code Is the number of codewords in the codebook.
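A minimal sketch of this discrete action space is given below. The DFT codebook and the uniformly spaced power levels are assumptions chosen for illustration; the text above only requires some fixed codebook of Q_code codewords and some set of Q_pow discrete power levels up to p_max.

```python
import numpy as np

def dft_codebook(Nt: int, Q_code: int) -> np.ndarray:
    """Example codebook of Q_code unit-norm beams (DFT beams are an assumption)."""
    angles = np.arange(Q_code) / Q_code
    n = np.arange(Nt)[:, None]
    return np.exp(2j * np.pi * n * angles) / np.sqrt(Nt)    # shape (Nt, Q_code)

def action_space(p_max: float, Q_pow: int, Q_code: int):
    """Enumerate the discrete actions (p, c): power level x codeword index.
    Uniformly spaced power levels in (0, p_max] are an assumed quantization."""
    powers = p_max * np.arange(1, Q_pow + 1) / Q_pow
    return [(p, c) for p in powers for c in range(Q_code)]

def action_to_beam(action, codebook):
    # Map a discrete action (p_j, c_j) to the transmitted beam w_j = sqrt(p_j) * wbar_j.
    p_j, c_j = action
    return np.sqrt(p_j) * codebook[:, c_j]

# Example: Nt = 4 antennas, 4 power levels, 4 codewords -> 16 actions per base station.
codebook = dft_codebook(Nt=4, Q_code=4)
actions = action_space(p_max=1.0, Q_pow=4, Q_code=4)
w_j = action_to_beam(actions[5], codebook)
print(len(actions), w_j.shape)   # 16 (4,)
```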
The reward of each DQN is the sum rate. Since the sum rate is an index computed from global information, using it as the reward achieves the goal of training with global information while executing with local information, which improves performance and makes the network converge faster. In the multi-cell multiple-input single-output scenario, each user shares the same frequency band with the users in the other cells, so inter-cell interference exists. The received signal of user j at time slot t can be expressed as:

y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k≠j} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)    (5)

where x_k(t) is the symbol transmitted by base station k and n_j(t) is the additive white Gaussian noise at user j. The rate of user j can be expressed as:

C_j(w_j(t)) = log(1 + SINR_j(t))    (6)

where SINR_j(t) is the signal-to-interference-plus-noise ratio of base station j at time slot t, expressed as:

SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k≠j} |h_{k,j}^H(t) w_k(t)|² + σ² )    (7)

where σ² is the noise variance. The sum rate can be expressed as:

C(t) = Σ_{j=1}^{K} C_j(w_j(t))    (8)

The reward of the DQN of base station j is therefore:

r_j(t) = C(t)    (9)
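The global reward can be computed as in the following sketch; the index layout of the channel tensor and the use of log base 2 are assumptions made for illustration.

```python
import numpy as np

def sum_rate_reward(H: np.ndarray, W: np.ndarray, sigma2: float) -> float:
    """Global sum-rate reward of equations (5)-(9).

    H: complex array of shape (K, K, Nt); H[k, j] is h_{k,j}(t), the channel
       from base station k to user j (an assumed index layout).
    W: complex array of shape (K, Nt); W[j] is the beam w_j(t) of base station j.
    """
    K = H.shape[0]
    rate = 0.0
    for j in range(K):
        signal = np.abs(np.vdot(H[j, j], W[j])) ** 2             # |h_{j,j}^H w_j|^2
        interference = sum(np.abs(np.vdot(H[k, j], W[k])) ** 2   # |h_{k,j}^H w_k|^2
                           for k in range(K) if k != j)
        sinr = signal / (interference + sigma2)
        rate += np.log2(1.0 + sinr)                              # C_j = log(1 + SINR_j)
    return rate   # every base station receives this same global reward r_j(t) = C(t)

# Example with K = 4 cells and Nt = 4 antennas.
rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4, 4)) + 1j * rng.standard_normal((4, 4, 4))) / np.sqrt(2)
W = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(4)
print(sum_rate_reward(H, W, sigma2=1.0))
```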
as shown in fig. 3 and algorithm 1, the pseudo code of the distributed reinforcement learning method of the present invention is as follows:
algorithm 1. Distributed reinforcement learning method pseudo code
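Because the pseudo code of Algorithm 1 is reproduced here only as a figure, the following Python sketch reconstructs the training loop from steps S70 to S78. The network architecture, optimizer, discount factor γ, batch size, and the helper functions from the earlier sketches are assumptions, not the patent's exact implementation.

```python
import random
import numpy as np
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Per-base-station Q-network mapping the local state s_j(t) to action values."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, num_actions))
    def forward(self, s):
        return self.net(s)

def epsilon_greedy(train_dqn, state, num_actions, eps=0.1):
    """S71: epsilon-greedy action selection from the local state."""
    if random.random() < eps:
        return random.randrange(num_actions)
    with torch.no_grad():
        q = train_dqn(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax())

def train_base_station(replay, train_dqn, target_dqn, optimizer,
                       batch_size=64, gamma=0.9, t_step=100, step=0):
    """One network-training update for base station j (steps S75-S77)."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)                  # S75: sample mini-batch
    s, a, r, s_next = map(np.array, zip(*batch))
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64)
    r = torch.as_tensor(r, dtype=torch.float32)
    with torch.no_grad():                                      # bootstrap from target DQN
        target = r + gamma * target_dqn(s_next).max(dim=1).values
    q = train_dqn(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()                                            # S76: back-propagation
    optimizer.step()
    if step % t_step == 0:                                     # S77: update target DQN
        target_dqn.load_state_dict(train_dqn.state_dict())
```

In line with steps S4 to S6, the experiences stored in each replay pool M_j would be produced by rolling the predicted global channels forward over M time slots and computing the shared global reward r_j(t) = C(t) for every base station, while at execution time each base station only needs its local state s_j(t).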
In order to verify the performance of the collaborative beamforming scheme based on distributed reinforcement learning, the following simulation is carried out:
The channel parameters are assumed to follow a standard unit complex Gaussian random distribution. The transmit power of the source node (base station) is P, and the noise variance at the destination node (user) is σ². Assuming that the base stations and users perform accurate channel estimation, fig. 4 shows the sum rate of the invention, the greedy scheme, and the random scheme versus the number of time slots M. In the experiment there are 4 base stations, each user has 1 antenna, each base station is equipped with 4 transmit antennas, the number of available discrete power levels is 4, and the number of codewords is 4. In the greedy scheme, each base station maximizes the throughput of its own served user; in the random scheme, each base station randomly selects the codeword and transmit power; and the optimal performance obtained by exhaustive search is used as an upper bound. As can be seen from fig. 4, when M = 1 the invention reaches 95% of the optimal performance, and as the number of time slots M increases the performance of the invention decreases. Over the whole range of M in [0, 300], the invention outperforms the greedy and random schemes.
The above embodiment is one implementation of the present invention, but the implementation of the present invention is not limited to this embodiment; any other changes, modifications, substitutions, combinations, and simplifications made without departing from the spirit and principle of the present invention shall be regarded as equivalent substitutions and are included within the protection scope of the present invention.
Claims (7)
1. A multi-cell collaborative beamforming method based on distributed reinforcement learning, characterized by comprising the following steps:
s1: for base station j, j E [1, K]Establishing a weight of theta j And a weight θ j ' target DQN and an empty experience pool M j ;
S2: initializing training DQN with random weight θ, let θ be j =θ,j∈[1,K];
S3: repeating steps S4 to S7 every M slots;
s4: the base stations exchange channel state information of all users;
s5: each base station generates global channel samples of a plurality of groups of M time slots in the future;
s6: each base station takes action randomly and takes corresponding experience<s j ,a j ,r j ,s j ′>Stored in its experience pool M j In (a) and (b);
s7: each base station performs network training;
the step of network training comprises the following steps:
S70: base station j observes its state s_j(t) at time slot t, j ∈ [1, K];
S71: at time slot t, base station j selects an action a_j according to the ε-greedy policy based on the state s_j(t), j ∈ [1, K];
S72: the global reward r_j(t) is calculated according to s_j and a_j, j ∈ [1, K];
S73: base station j observes its new state s'_j(t) at time slot t+1, j ∈ [1, K];
S74: base station j stores the experience <s_j(t), a_j(t), r_j(t), s'_j(t)> in its experience pool M_j, j ∈ [1, K];
S75: base station j samples a mini-batch from its experience pool M_j;
S76: base station j updates its training DQN weights θ_j using back-propagation, j ∈ [1, K];
S77: base station j updates the target DQN weights θ'_j every T_step time slots, j ∈ [1, K];
S78: repeat until convergence or the maximum number of training iterations is reached;
the received signal of user j at time slot t is expressed as:

y_j(t) = h_{j,j}^H(t) w_j(t) x_j(t) + Σ_{k≠j} h_{k,j}^H(t) w_k(t) x_k(t) + n_j(t)

where n_j(t) is the additive white Gaussian noise at user j, x_k(t) is the symbol transmitted by base station k, h_{j,j}(t) is the channel state information from base station j to its own served user j, and w_j(t) is the beamforming vector transmitted by base station j;

the rate of user j is expressed as:

C_j(w_j(t)) = log(1 + SINR_j(t))

where SINR_j(t) is the signal-to-interference-plus-noise ratio of base station j at time slot t, expressed as:

SINR_j(t) = |h_{j,j}^H(t) w_j(t)|² / ( Σ_{k≠j} |h_{k,j}^H(t) w_k(t)|² + σ² )

where σ² is the noise variance;

the sum rate is expressed as:

C(t) = Σ_{j=1}^{K} C_j(w_j(t))

the reward of the DQN network of base station j is:

r_j(t) = C(t).
2. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 1, wherein the channel from each base station to each user in each time slot is modeled as a Rayleigh channel, and the channels in adjacent time slots are considered correlated, expressed as:

h_{j,k}(t) = ρ·h_{j,k}(t−1) + √(1−ρ²)·e_{j,k}(t)

where h_{j,k}(t) denotes the channel from the j-th base station to the k-th user in time slot t, each element of h_{j,k}(0) follows an independent complex Gaussian distribution with mean 0 and variance 1, e_{j,k}(t) denotes the channel-independent white Gaussian noise whose elements also follow independent complex Gaussian distributions with mean 0 and variance 1, and ρ is the correlation coefficient of the Rayleigh fading vectors between adjacent time slots.
3. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 2, wherein after the base stations exchange their own channel information to all users at time slot t (for example, base station 1 shares [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t)]), each base station forms a global channel state information matrix H(t) = [h_{1,1}(t), h_{1,2}(t), …, h_{1,K}(t), …, h_{K,K}(t)], and uses the correlation of adjacent channels to generate global channel samples of the future M time slots for each group n ∈ [1, N], where N is the number of generated global channel groups.
4. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 1, wherein global information is needed to guide the training, while only local information is needed during execution.
5. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 2, wherein the state of each DQN network is based on the channel state information h_{j,j}(t) from base station j to its own served user j; an I/Q decomposition is used to split h_{j,j}(t) into in-phase and quadrature components, which are stacked into the vector h̃_{j,j}(t) = [Re(h_{j,j}(t))^T, Im(h_{j,j}(t))^T]^T, and the state of the DQN network of base station j is expressed as:

s_j(t) = h̃_{j,j}(t).
6. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 2, wherein the beamforming vector w_j(t) transmitted by base station j is a continuous complex vector:

w_j(t) = √(p_j(t))·w̄_j(t)

where w̄_j(t) is the normalized beamforming vector taken by base station j and p_j(t) is the transmit power of base station j;

the normalized beamforming vector is discretized by selecting a codeword from a codebook, and for the transmit power an available transmit power set P consisting of Q_pow discrete power levels is defined for each base station, where p_max is the maximum transmit power of the base station and Q_pow is the number of available discrete power levels;

the action of the DQN network of base station j is:

a_j = {(p_j, c_j), p_j ∈ P, c_j ∈ C}.
7. The multi-cell collaborative beamforming method based on distributed reinforcement learning according to claim 1, wherein M is an integer from 0 to 300.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110768826.2A CN113472472B (en) | 2021-07-07 | 2021-07-07 | Multi-cell collaborative beam forming method based on distributed reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110768826.2A CN113472472B (en) | 2021-07-07 | 2021-07-07 | Multi-cell collaborative beam forming method based on distributed reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113472472A CN113472472A (en) | 2021-10-01 |
CN113472472B true CN113472472B (en) | 2023-06-27 |
Family
ID=77879037
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110768826.2A Active CN113472472B (en) | 2021-07-07 | 2021-07-07 | Multi-cell collaborative beam forming method based on distributed reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113472472B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10694526B2 (en) * | 2016-09-30 | 2020-06-23 | Drexel University | Adaptive pursuit learning method to mitigate small-cell interference through directionality |
- 2021-07-07: CN application CN202110768826.2A filed (CN113472472B, legal status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109039412A (en) * | 2018-07-23 | 2018-12-18 | 西安交通大学 | A kind of safe transmission method of physical layer based on random wave bundle figuration |
CN110365387A (en) * | 2019-07-16 | 2019-10-22 | 电子科技大学 | A kind of beam selection method of cellular communication system |
CN111181619A (en) * | 2020-01-03 | 2020-05-19 | 东南大学 | Millimeter wave hybrid beam forming design method based on deep reinforcement learning |
CN111246497A (en) * | 2020-04-10 | 2020-06-05 | 卓望信息技术(北京)有限公司 | Antenna adjustment method based on reinforcement learning |
CN111901862A (en) * | 2020-07-07 | 2020-11-06 | 西安交通大学 | User clustering and power distribution method, device and medium based on deep Q network |
Non-Patent Citations (1)
Title |
---|
基于强化学习的定向无线通信网络抗干扰资源调度算法 (Reinforcement-learning-based anti-jamming resource scheduling algorithm for directional wireless communication networks); 谢添, 高士顺, 赵海涛, 林沂, 熊俊; 电波科学学报 (Chinese Journal of Radio Science), no. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN113472472A (en) | 2021-10-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |