CN113411110B - Millimeter wave communication beam training method based on deep reinforcement learning - Google Patents

Millimeter wave communication beam training method based on deep reinforcement learning

Info

Publication number
CN113411110B
CN113411110B (application CN202110623890.1A)
Authority
CN
China
Prior art keywords
channel
time
matrix
training
millimeter wave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110623890.1A
Other languages
Chinese (zh)
Other versions
CN113411110A (en)
Inventor
戚晨皓 (Qi Chenhao)
姜国力 (Jiang Guoli)
王宇杰 (Wang Yujie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202110623890.1A
Publication of CN113411110A
Application granted
Publication of CN113411110B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna systems, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems using two or more spaced independent antennas
    • H04B7/06 Diversity systems using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems using simultaneous transmission of weighted versions of same signal
    • H04B7/0617 Diversity systems using simultaneous transmission of weighted versions of same signal for beam forming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna systems, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems using two or more spaced independent antennas
    • H04B7/08 Diversity systems using two or more spaced independent antennas at the receiving station
    • H04B7/0837 Diversity systems at the receiving station using pre-detection combining
    • H04B7/0842 Weighted combining
    • H04B7/086 Weighted combining using weights depending on external parameters, e.g. direction of arrival [DOA], predetermined weights or beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Radio Transmission System (AREA)
  • Variable-Direction Aerials And Aerial Arrays (AREA)

Abstract

The invention discloses a millimeter wave communication beam training method based on deep reinforcement learning, which tracks the millimeter wave channel by defining concrete representations, in the practical beam training problem, of the elements of a reinforcement learning model such as states, actions and rewards. The state is defined in image form, and a convolutional neural network is used to approximate the value function in reinforcement learning; the action is defined as a triple consisting of the moving direction, the moving distance and the beam coverage relative to the optimal beam combination of the channel at the previous moment. In the design of the reward function, the effective data achievable rate within a time slice is taken as the target value. During training of the neural network, Q-learning is used to update the network parameters; the trained deep Q network is then used for prediction, and the action with the maximum Q value is selected, which corresponds to the beam combination to be tested at the next moment.

Description

Millimeter wave communication beam training method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of millimeter wave wireless communication, in particular to a millimeter wave communication beam training method based on deep reinforcement learning.
Background
With the continuous development of wireless communication technology, the spectrum resources of the lower frequency bands are almost completely occupied. To meet communication performance requirements and obtain more spectrum resources, attention has turned to higher frequency bands, namely the millimeter wave band. This band covers frequencies in the range of 30-300 GHz; its spectrum resources are abundant and its transmission rates are high, which can satisfy applications with high bandwidth requirements. However, due to the propagation characteristics of millimeter wave signals, the path loss of a millimeter wave channel is high compared with a microwave channel. Considering that the wavelength of a millimeter wave signal is short compared with a microwave signal, and that antenna spacing is generally positively correlated with signal wavelength, a large number of antennas can be concentrated in a small space to form a large-scale antenna array and achieve high gain. Massive MIMO technology and millimeter wave communication complement each other: millimeter wave communication solves the spectrum shortage faced by massive MIMO, while massive MIMO compensates for the path loss of millimeter wave communication, so millimeter wave massive MIMO communication has very broad application prospects.
In existing research work, a codebook is usually preset at both the transmitting end and the receiving end. The codebook contains a number of beamforming vectors (also called codewords); the transmitting end and the receiving end traverse the codewords in their codebooks to transmit and receive pilot signals, and the codeword combination with the maximum received power is used as the beamforming vector combination for formal signal transmission and reception, which is called beam training. However, with large-scale antenna arrays and directional narrow beams, such a codebook-traversing training algorithm is very time consuming. Especially in dynamic scenarios, the millimeter wave channel changes constantly, and achieving frequent and precise beam alignment is very difficult, remaining a challenging problem to date. Therefore, if the beam training process can sense changes in the channel environment and adjust the trained beams in time accordingly, the training overhead can be greatly reduced and the resources of the communication system saved.
To reduce the beam training overhead, document [1] "Simultaneous multi-user beam training using adaptive hierarchical codebook for mmWave massive MIMO" (Chen K, Qi C, Dobre O A, et al. [C]//2019 IEEE Global Communications Conference (GLOBECOM). IEEE, 2019.) proposes an adaptive hierarchical codebook. Except for the bottom layer, each layer of the designed adaptive hierarchical codebook has only two codewords, and no matter how many users the BS serves, all users need to undergo beam training only twice. The difficulty of this work lies in codeword design: because the hierarchical codebook for beam training is not fixed in advance but is constructed continuously during the beam training process, constructing the codebook is more complex and the training difficulty is increased.
Document [2] "Intelligent beam training for millimeter-wave communications via deep reinforcement learning" (Zhang J, Huang Y, Wang J, et al. [C]//2019 IEEE Global Communications Conference (GLOBECOM). IEEE, 2019: 1-7.) proposes a deep reinforcement learning beam training algorithm based on environment perception. The algorithm can sense changes in the environment, learn the required latent probability information from it, and realize intelligent beam training with low overhead. In addition, the algorithm requires no prior knowledge of dynamic channel modeling and is therefore suitable for various complex scenarios. However, the method is only applicable when the receiving end has a single antenna, which limits its application range, and it does not support millimeter wave communication between base-station-like nodes.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a millimeter wave communication beam training method based on deep reinforcement learning. It introduces a reinforcement learning framework into beam training so that the trained beams are adjusted in time as the channel changes and the state of the channel is tracked, thereby effectively reducing the beam training overhead while ensuring beam training performance. This solves the problems of large training overhead, hardware complexity and power consumption in existing beam training methods, while supporting communication scenarios in which both the transmitting end and the receiving end have multiple antennas.
To achieve this object, the invention adopts the following scheme:
a millimeter wave communication beam training method based on deep reinforcement learning comprises the following steps:
step S1, constructing a millimeter wave communication channel model between the user side and the base station side;
step S2, designing codebooks of a user terminal and a base station terminal, constructing a model of a final received signal according to the designed codebooks, and carrying out mathematical modeling on a beam training process according to the model;
step S3, defining the representation of state, action and reward in the beam training;
and step S4, regarding the state defined in the step S3 as a multi-channel image, inputting the multi-channel image into the constructed convolutional neural network, and obtaining values of all actions corresponding to the state.
Further, the step S1 specifically includes:
setting a millimeter wave massive MIMO system aiming at single user, wherein, a user end has NrRoot antenna, base station end has NtRoot antenna, arrangement of antennasUniform linear arrays are adopted, and the millimeter wave communication channel model is modeled as follows:
Figure GDA0003218930120000021
in the formula (1), L and alphal
Figure GDA0003218930120000022
θlRespectively representing the number of paths, the channel gain of the ith path, the arrival angle of a channel and the departure angle of the channel; definition of
Figure GDA0003218930120000023
Figure GDA0003218930120000024
ΘlAnd ΨlObey [0, π ] for both the arrival angle and departure angle of the spatial domain]Inner uniform distribution, dtAnd drRespectively representing the interval between the array antennas of the base station side and the user side, wherein lambda is the wavelength of the millimeter wave signal, and u (-) represents a channel steering vector; the variation of the steering angle of the channel in the adjacent time interval follows Gaussian distribution, and the expression is as follows:
Figure GDA0003218930120000031
in the formula (2), θ0U (0, pi) represents the initial channel steering angle, θ, at which time t is 0 and randomtRepresenting the channel steering angle at time t,
Figure GDA0003218930120000032
indicating the amount of change in the channel steering angle.
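The channel model of step S1 can be sketched numerically as follows. This is an illustration only, not part of the patent's claims: the array sizes, the path count, the half-wavelength spacing and the walk standard deviation δ are all assumed values.

```python
import numpy as np

def steering(N, theta, d_over_lambda=0.5):
    # ULA steering vector u(N, theta) for physical angle theta, spacing d = lambda/2
    k = np.arange(N)
    return np.exp(1j * 2 * np.pi * d_over_lambda * k * np.cos(theta)) / np.sqrt(N)

def sv_channel(Nt, Nr, aoa, aod, gains):
    # Saleh-Valenzuela model of formula (1):
    # H = sqrt(Nt*Nr/L) * sum_l alpha_l u(Nr, Psi_l) u^H(Nt, Theta_l)
    L = len(gains)
    H = sum(a * np.outer(steering(Nr, p), steering(Nt, t).conj())
            for a, p, t in zip(gains, aoa, aod))
    return np.sqrt(Nt * Nr / L) * H

rng = np.random.default_rng(0)
L, Nt, Nr, delta = 3, 32, 16, 0.02            # delta: std of the angle walk (assumed)
aoa = rng.uniform(0, np.pi, L)                # arrival angles ~ U(0, pi)
aod = rng.uniform(0, np.pi, L)                # departure angles ~ U(0, pi)
gains = (rng.normal(size=L) + 1j * rng.normal(size=L)) / np.sqrt(2)

H0 = sv_channel(Nt, Nr, aoa, aod, gains)
# formula (2): theta_t = theta_{t-1} + N(0, delta^2), applied to every path angle
H1 = sv_channel(Nt, Nr, aoa + rng.normal(0, delta, L),
                aod + rng.normal(0, delta, L), gains)
```

With δ small, consecutive channel matrices H0 and H1 stay strongly correlated, which is the property the beam tracking in later steps exploits.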
Further, in step S2, the codebooks of the user end and the base station end are expressed as:

$$F=[f_1,f_2,\dots,f_{N_t}]\in\mathbb{C}^{N_t\times N_t}\qquad(3)$$

$$W=[w_1,w_2,\dots,w_{N_r}]\in\mathbb{C}^{N_r\times N_r}\qquad(4)$$

in formulas (3) and (4), each codeword f_n and w_m is a channel steering vector pointing in a different spatial direction. The expression of the final received signal is:

$$y=\sqrt{P}\,w^H H f x+w^H\eta\qquad(5)$$

in formula (5), P, w, f and η respectively denote the transmit power of the base station side, the receive codeword of the user end, the transmit codeword of the base station side, and the channel noise vector, with ‖w‖₂ = ‖f‖₂ = 1 and |x|² = 1.

Thus, the expression of the received signal matrix is:

$$Y_t=\sqrt{P}\,W^H H_t F x+W^H V_t\qquad(6)$$

in formula (6), W ∈ C^{N_r×N_r} and F ∈ C^{N_t×N_t} respectively represent the DFT codebooks at the receiving end and the transmitting end, H_t ∈ C^{N_r×N_t} represents the channel matrix, x and P respectively represent the transmitted signal and its power, V_t represents the channel noise matrix, and Y_t ∈ C^{N_r×N_t}; the element Y_t(m, n) in the m-th row and n-th column of the matrix is the signal obtained when the transmitting end transmits with the n-th codeword (n = 1, 2, …, N_t) in codebook F and the receiving end receives with the m-th codeword (m = 1, 2, …, N_r) in codebook W. The beam training process is then expressed as the following optimization problem:

$$(f^\star,w^\star)=\arg\max_{f\in F,\,w\in W}|y|$$
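For illustration, the DFT codebooks and the received-signal matrix of step S2 can be sketched as follows. The dimensions, transmit power and noise level are assumed values, x = 1, and half-wavelength antenna spacing is assumed for the DFT construction.

```python
import numpy as np

def dft_codebook(N):
    # column n is a steering vector toward spatial angle psi_n = -1 + (2n-1)/N
    n = np.arange(1, N + 1)
    psi = -1 + (2 * n - 1) / N
    k = np.arange(N)[:, None]
    return np.exp(1j * np.pi * k * psi) / np.sqrt(N)

rng = np.random.default_rng(1)
Nt, Nr, P = 16, 8, 1.0
F, W = dft_codebook(Nt), dft_codebook(Nr)
H = (rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt))) / np.sqrt(2)
V = 0.01 * (rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt)))

# Y(m, n): receive with codeword w_m while the transmitter sends x = 1 on f_n
Y = np.sqrt(P) * W.conj().T @ H @ F + W.conj().T @ V
Z = np.abs(Y)                          # received signal strength matrix
m_star, n_star = np.unravel_index(Z.argmax(), Z.shape)  # best (receive, transmit) pair
```

Every codeword has unit norm, so neither end provides power gain, matching the constraint ‖w‖₂ = ‖f‖₂ = 1 above.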
further, in step S3, defining the representation of the state in the beam training specifically includes:
let the channel matrix at time t be HtThe matrix of the received signals corresponding thereto is YtDefining a matrix ZtIs YtModulo of the received signal strength matrix Z for successive time instantstIs defined as a state StSpecifically, the following are shown:
St(i)=Zt+i-C,i=1,2,…,C (7)
in the formula (7), StIs a three-dimensional matrix with a third dimension of size C, C representing the number of successive time instants, Zt+i-CRepresenting the received signal strength matrix at time t + i-C.
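A minimal sketch of the state construction in formula (7), with assumed sizes C = 4, N_r = 8, N_t = 16:

```python
import numpy as np

C, Nr, Nt = 4, 8, 16                    # assumed sizes
rng = np.random.default_rng(2)
# the C most recent strength matrices, Z_{t-C+1} ... Z_t
history = [rng.random((Nr, Nt)) for _ in range(C)]

# S_t(i) = Z_{t+i-C}, i = 1..C  ->  a (C, Nr, Nt) tensor, i.e. a C-channel image
S_t = np.stack(history, axis=0)

# when Z_{t+1} arrives, drop the oldest slice and append the new one
Z_next = rng.random((Nr, Nt))
S_next = np.concatenate([S_t[1:], Z_next[None]], axis=0)
```

Stacking along a leading channel axis is exactly the multi-channel-image view that lets a convolutional network consume the state.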
Further, in step S3, defining the representation of the action in the beam training specifically includes:
the position of the largest element in the matrix Z_t is defined as

$$i_t=(m_t,n_t)=\arg\max_{(m,n)}Z_t(m,n)$$

where n_t and m_t respectively represent the indexes in codebooks F and W of the optimal transmit and receive beams at time t; the optimal beam combination at the current time is (f_{n_t}, w_{m_t}), where f_{n_t} ∈ F and w_{m_t} ∈ W. The action at time t is defined as:

$$A_t=(d,o,r)\qquad(8)$$

in formula (8), d, o and r respectively represent the direction, offset and coverage of the beam search at time t+1 relative to the optimal beam position at time t. Here d ∈ D = {0, 1, 2, 3, 4}, giving 5 possible directions: 0 represents no movement, and 1, 2, 3, 4 respectively represent moving up, down, left and right with position i_t as the base point; o ∈ O = {0, 1, 2, …, M−1}, with M optional offsets, the offset being defined as the distance between the center position of the beam search at time t+1 and position i_t; and r ∈ R = {1, 2, …, N}, with N optional radii, the radius being defined as the coverage radius with the center position of the beam search at time t+1 as the base point.
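The action triple of formula (8) can be mapped to a concrete set of beam index pairs. The sketch below is an assumption-laden illustration: it takes "up/down" to move the row (receive) index and "left/right" the column (transmit) index, and clips the square search region to the codebook grid.

```python
def beam_search_region(i_t, action, Nr, Nt):
    # (d, o, r): direction, offset and coverage radius relative to i_t = (m, n)
    (m, n), (d, o, r) = i_t, action
    # direction d shifts the search center by offset o (axis convention assumed)
    moves = {0: (0, 0), 1: (-o, 0), 2: (o, 0), 3: (0, -o), 4: (0, o)}
    dm, dn = moves[d]
    cm, cn = m + dm, n + dn
    return [(i, j)
            for i in range(max(0, cm - r), min(Nr, cm + r + 1))
            for j in range(max(0, cn - r), min(Nt, cn + r + 1))]

# r = 1 around an interior center -> a 3x3 square of beam combinations
B = beam_search_region((4, 5), (0, 0, 1), Nr=8, Nt=16)
```

Note how the number of tested combinations shrinks automatically at the grid edges, so the per-step training overhead is not fixed.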
Further, in step S3, defining the representation of the reward in the beam training specifically includes:

$$R_t=\frac{T_s-|B_{t+1}|\,t_s-t_p}{T_s}\,R^\star_{t+1}\qquad(9)$$

in formula (9), B_{t+1} represents the set of beam combinations to be tested at the next time, obtained by the agent performing action A_t at time t; t_s represents the time to test one beam combination; t_p represents the precoding phase within one time step of the beam training; T_s represents the time step of the beam training; and R⋆_{t+1} represents the data achievable rate corresponding to the optimal beam combination at time t+1.
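The reward of formula (9), interpreted as the effective achievable rate left in a time slice, can be sketched as follows; the timing constants are illustrative placeholders, not values from the patent.

```python
def reward(num_tested, rate_opt, t_s=0.01, t_p=0.005, T_s=1.0):
    # (T_s - |B_{t+1}| * t_s - t_p) / T_s * R*_{t+1}: the smaller the testing
    # overhead inside the time step, the larger the effective rate
    return (T_s - num_tested * t_s - t_p) / T_s * rate_opt

r_small = reward(9, 4.0)    # test a 3x3 search region
r_large = reward(25, 4.0)   # test a 5x5 search region, same resulting rate
```

For an equal resulting rate, testing fewer beam combinations yields a larger reward, which is what pushes the agent toward small, well-placed search regions.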
Further, the convolutional neural network specifically comprises two convolutional layers, two pooling layers, a flatten layer, a fully connected layer and an output layer; before a state is input into the neural network it is normalized, specifically, the two-dimensional image represented by each channel is normalized.
Furthermore, the values of the input state are predicted with the convolutional neural network, the network parameters are updated toward the Q-learning target value, the trained network is used for prediction, the action with the maximum Q value is selected, and the beam combination cluster corresponding to that action is used for testing, so as to reduce the training overhead.
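The Q-learning update named above can be sketched with a linear stand-in for the convolutional Q-network; the action-space sizes, learning rate and discount factor below are assumed values, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions, gamma, lr = 5 * 4 * 3, 0.9, 1e-3   # |D| x |O| x |R| actions (assumed)

def q_values(state, w):
    # stand-in for the convolutional Q-network: maps a state to one Q per action
    return state.ravel() @ w

w = 0.01 * rng.normal(size=(8 * 16, n_actions))
s, s_next = rng.random((8, 16)), rng.random((8, 16))
a, r = 7, 3.62                                 # action taken, reward received

# Q-learning target: y = r + gamma * max_a' Q(s', a')
target = r + gamma * q_values(s_next, w).max()
td_error = target - q_values(s, w)[a]
w[:, a] += lr * td_error * s.ravel()           # one gradient step toward the target
```

One step moves Q(s, a) a fraction of the way toward the target; in the actual method this squared-error step is applied to the convolutional network over minibatches drawn from the experience memory.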
A millimeter wave communication beam training device based on deep reinforcement learning, the device comprising:
the beam selection module is used for acquiring a receiving beam set and a transmitting beam set according to the executed action;
the channel sample generation module is used for generating a plurality of randomly changed channel matrixes and calculating the optimal receiving and transmitting beam combination of each channel matrix;
the receiving signal matrix module is used for calculating the receiving signal intensity corresponding to the receiving and transmitting beam pair set in the beam selection module;
the state updating module updates the current state by using the received signal strength matrix;
the optimal transceiving beam combination determining module is used for obtaining the optimal transceiving beam combination corresponding to the channel: after an action is executed, the beam search range at the next moment is obtained according to the action, all beam combinations in the range are tested, and the beam combination with the maximum received signal strength is selected as the optimal beam combination;
the reward calculation module is used for calculating a reward value for executing the action by using the obtained optimal beam combination and other parameters;
the parameter setting module is used for setting parameters of the neural network and other parameters in the beam training process;
the experience storage module is used for storing experiences in the beam training process into a set;
the neural network training module is used for inputting the state matrix into the neural network, outputting all action values corresponding to the state, and selecting a plurality of experiences from the memory library to update the network parameters;
a target value setting module which calculates a target value corresponding to each experience by using the updating strategy of the Q learning;
and the neural network prediction module predicts all action values corresponding to the input state by using the trained network and selects the action with the maximum Q value as the optimal action.
The beneficial effects of the invention are:
1. The invention introduces a reinforcement learning framework into beam training, so that the trained beams are adjusted in time as the channel changes and the state of the channel is tracked, thereby predicting the optimal transceiving beam combination of an unknown channel more accurately, effectively reducing the beam training overhead and ensuring beam training performance.
2. Unlike traditional training modes such as beam scanning, the number of beam combinations tested each time is not fixed but changes dynamically with the channel state, so the beam training overhead is effectively reduced.
3. In the design of receiving and transmitting beams, the invention only adopts narrow beams, thereby greatly reducing the complexity of hardware.
Drawings
Fig. 1 is a schematic input/output diagram of a neural network in embodiment 1;
FIG. 2 is a graphical representation of the received signal strength matrix in example 1;
FIG. 3 is a diagram illustrating states (matrices) of reinforcement learning in example 1;
FIG. 4 is a schematic diagram of the execution of actions in embodiment 1, wherein FIG. 4a shows the received signal strength matrix Z_τ at time τ, and FIGS. 4b-4f respectively show the received signal strength matrices at time τ+1 obtained by taking different actions A_τ^{(j)}, j = 1, …, 5;
FIG. 5 is a schematic time slice diagram of beam training in example 1;
fig. 6 is a diagram illustrating a comparison of beam search success rates when the number of channel paths is different;
FIG. 7 is a diagram illustrating a comparison of the achievable rates of users when the number of channel paths is different;
fig. 8 is a schematic diagram comparing the beam training method proposed in embodiment 1 with beam scanning and hierarchical-codebook-based beam training in terms of beam search success rate;
fig. 9 is a schematic diagram comparing the beam training method proposed in embodiment 1 with beam scanning and hierarchical-codebook-based beam training in terms of user achievable rate.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1 to 5, the present embodiment provides a millimeter wave communication beam training method based on deep reinforcement learning, which specifically includes:
consider a millimeter-wave massive MIMO system for a single user, with N at the userrRoot antenna, baseAt station has NtThe antennas are arranged in a Uniform Linear Array (ULA) manner. According to the widely used Saleh-Valenzuela model, the millimeter wave channel of the downlink can be modeled as:
Figure GDA0003218930120000061
wherein, L, alphal
Figure GDA0003218930120000062
θlRespectively representing the number of paths, the channel gain of the ith path, the arrival angle of the channel and the departure angle of the channel. Usually, the path of 1 is the LOS path, and the other paths are the NLOS paths. Definition of
Figure GDA0003218930120000063
Figure GDA0003218930120000064
ΘlAnd ΨlObey [0, π ] for both arrival and departure angles in the spatial domain]Even distribution within. dtAnd drRespectively representing the spacing of the array antennas at the base station side and the user side, lambda is the wavelength of the millimeter wave signal, and in general,
Figure GDA0003218930120000065
u (-) denotes a channel steering vector, defined as follows:
Figure GDA0003218930120000066
since the channel considered in the beam training of the present invention is time-varying, the channel needs to be dynamically modeled. In an actual communication environment, the channel variation is generally random, and in the present invention, a gaussian random walk is adopted as a variation form of the channel, that is, the variation amount of the steering angle (departure angle and arrival angle) of the channel in adjacent time intervals obeys gaussian distribution, which is specifically expressed as follows:
Figure GDA0003218930120000067
wherein, theta0U (0, pi) represents the initial channel steering angle, θ, at which time t is 0 and randomtIndicating the channel steering angle at time t,
Figure GDA0003218930120000068
indicating the amount of change in the channel steering angle.
Before beam training, a codebook is defined at both the transmitting end and the receiving end, each codebook comprises a series of codewords, and each codeword represents a beam forming vector. In the present invention, a Discrete Fourier Transform (DFT) codebook is used as a codebook at a transmitting end, and the nature of the DFT codebook is a two-dimensional complex matrix determined according to the number of antennas, and a modulus value of each element in the matrix is constant. The DFT codebook is well suited for training of analog beams because the phase shifter network constituting the analog beamforming part only changes the phase of the transmitted signal and does not provide a gain in power.
The DFT codebooks at the transmitting end and the receiving end are defined as F and W, respectively, where F ∈ C^{N_t×N_t} contains N_t codewords and W ∈ C^{N_r×N_r} contains N_r codewords. The codewords contained in the two codebooks are channel steering vectors pointing in different spatial directions, represented as follows:

$$f_n=\frac{1}{\sqrt{N_t}}\left[1,\,e^{j\pi\psi_n},\,\dots,\,e^{j\pi(N_t-1)\psi_n}\right]^T,\quad \psi_n=-1+\frac{2n-1}{N_t},\quad n=1,\dots,N_t$$

$$w_m=\frac{1}{\sqrt{N_r}}\left[1,\,e^{j\pi\psi_m},\,\dots,\,e^{j\pi(N_r-1)\psi_m}\right]^T,\quad \psi_m=-1+\frac{2m-1}{N_r},\quad m=1,\dots,N_r$$
assuming that the sending end uses the codeword f to send the signal x, the receiving end uses the codeword w to receive the signal, and the transmission of the channel matrix H is performed in the middle, and the final received signal may be represented as:
Figure GDA0003218930120000075
wherein, P,
Figure GDA0003218930120000076
Figure GDA0003218930120000077
Figure GDA0003218930120000078
Respectively representing the transmission power of the base station, the received code word of the user terminal, the transmission code word of the base station terminal and the channel noise vector. Neither the transmitted nor the received codeword provides a power gain, i.e. | w |)2=‖f‖21, the transmission signal x has normalized power, | x ∞ n2=1。
The user achievable rate can be expressed as:

$$R=\log_2\left(1+\frac{P\,|w^H H f|^2}{\sigma^2}\right)$$
in the beam training process, the transmitting end and the receiving end respectively test each code word in the codebooks F and W to find a transmitting end beam forming vector F and a receiving end beam forming vector W which can be optimally matched with the channel H. Therefore, the beam training problem can be equivalent to the following optimization problem:
Figure GDA00032189301200000710
in the beam training, the transmitting power P of the signal and the variance sigma of the channel noise2Given, the above optimization problem can be simplified to:
Figure GDA00032189301200000711
however, in practical situations, the channel H is usually unknown and cannot be directly solved for optimal f and w. It is common practice to find the best combination of f and w by measuring the strength value of the received signal y, so the beam training process can be expressed as the following optimization problem:
Figure GDA00032189301200000712
the optimal solutions of the above two optimization problems may be different due to the presence of the channel noise η. If the two are the same, the beam training is successful, otherwise, the beam training is failed. Suppose that a total of N is performedtotalThe secondary beam training succeeds NsucNext, the success rate of the beam search can be expressed as:
Figure GDA0003218930120000081
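The achievable rate and beam search success rate above can be computed as in this sketch; P and σ² are assumed values, and the success rate is shown on a toy list of (receive, transmit) index pairs.

```python
import numpy as np

def achievable_rate(H, f, w, P=1.0, sigma2=0.1):
    # R = log2(1 + P * |w^H H f|^2 / sigma^2)
    return np.log2(1 + P * np.abs(w.conj() @ H @ f) ** 2 / sigma2)

def search_success_rate(true_best, found_best):
    # N_suc / N_total over repeated beam trainings
    hits = sum(t == f for t, f in zip(true_best, found_best))
    return hits / len(true_best)

H = np.eye(2, dtype=complex)
f = w = np.array([1.0, 0.0], dtype=complex)
rate = achievable_rate(H, f, w)          # log2(1 + 1/0.1)
p_suc = search_success_rate([(1, 2), (3, 4)], [(1, 2), (0, 0)])
```

A training counts as a success only when the noisy argmax coincides with the noiseless optimum, exactly as in the definition of P_suc.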
the received signal matrix at time t is:
Figure GDA0003218930120000082
wherein the content of the first and second substances,
Figure GDA0003218930120000083
Figure GDA0003218930120000084
DFT codebooks at the receiving end and the transmitting end are respectively indicated,
Figure GDA0003218930120000085
a channel matrix representing time t, x, P representing the transmitted signal and the power of the signal respectively,
Figure GDA0003218930120000086
representing the channel noise matrix at time t, defining a received signal strength matrix ZtIs YtThe following steps of (1):
Zt(m,n)=|Yt(m,n)|,
as shown in FIG. 2, in the present example, Z istThe two dimensions of the image respectively represent the indexes of the transmitting and receiving end code words, and each grid in the image corresponds to one transmitting and receiving beam combination. The image describes the distribution of the corresponding received signal strength when the transceiving end uses different beams to test, and the pixel position with larger gray scale value in the image corresponds to the beam combination with high received signal strength. Image Z due to the sparsity of the millimeter wave channeltIs close to 0, the positions of those non-zero elements correspond to the distribution of the steering angles under the current channel, ZtThe position of the medium maximum element corresponds to the searched optimal beam combination. If the positions of the non-zero elements can be dynamically tracked, the change condition of the channel can be sensed in time, and the training overhead is greatly reduced. To capture the dynamically changing channel, we define several consecutive images as one state St, namely:
St(i)=Zt+i-C,i=1,2,…,C,
wherein S istIs a three-dimensional matrix with a third dimension of size C, indicating a state matrix StWhich comprises C two-dimensional matrices Z. S. thetThe ith two-dimensional matrix corresponds to a received signal strength matrix Z at the time of t + i-Ct+i-CAnd S istThe received signal strength matrix Z of which the last two-dimensional matrix is at the moment tt. As shown in FIG. 3, the state matrix S may also betThe image is viewed as a multi-channel image, so that the convolutional neural network can be used for training.
From the above, the last two-dimensional matrix contained in the state matrix S_t at time t is Z_t. The position corresponding to the largest element of Z_t is defined as:
Figure GDA0003218930120000087
wherein
Figure GDA0003218930120000088
respectively represent the indexes of the optimal transmitting and receiving beams at time t in the codebooks F and W, so that the optimal beam combination at the current time is obtained as
Figure GDA0003218930120000089
wherein
Figure GDA00032189301200000810
Figure GDA00032189301200000811
To obtain the state matrix S_{t+1} at time t+1, the action at time t is defined as a triple:
A_t = (d, o, r),
where d, o and r respectively represent the direction, offset and coverage of the beam search at time t+1 relative to the optimal beam position at time t. d ∈ D = {0,1,2,3,4}: there are 5 possible directions, where 0 represents no movement and 1, 2, 3, 4 respectively represent moving up, down, left and right with the position i_t as the base point. o ∈ O = {0,1,2,…,M−1}: there are M optional offsets, an offset being defined as the distance between the center position of the beam search at time t+1 and the position i_t. r ∈ R = {1,2,…,N}: there are N optional radii, a radius being defined as the coverage radius with the center position of the beam search at time t+1 as the base point; for example, r = 1 indicates that the coverage area of the beam search is a square area with side length 3.
The specific execution of an action is shown in FIG. 4. Each image represents the received signal strength matrix Z at a certain time, and the gray value of each pixel represents the modulus of the received signal obtained by testing the corresponding beam combination. The positions of the colored grids correspond to the beam combinations to be trained and have gray values greater than 0; the beam combinations at the other positions need not be trained, and their gray values are set to 0. Suppose panel (a) represents the received signal strength matrix Z_τ at time τ, where the dark grid position represents the index i_τ = (4, 5) of the optimal beam combination determined by beam training, and panels (b)-(f) respectively show the result of taking different actions
Figure GDA0003218930120000091
to obtain the received signal strength matrix at time τ+1
Figure GDA0003218930120000092
For example, panel (b) shows taking the action
Figure GDA0003218930120000093
and the resulting received signal strength matrix at time τ+1
Figure GDA0003218930120000094
According to the definition of actions above, the coverage area of the beam training is a square area with r = 1 (side length 3), and the center position (dark grid) of the area is the optimal beam combination index i_τ of the previous time; panels (c)-(f) are obtained similarly.
Assume that at time t the agent performs action A_t and obtains the set of beam combinations for testing at the next time as
Figure GDA0003218930120000095
Figure GDA0003218930120000096
where I denotes the total number of beam combinations used for training. The elements contained in the set B_{t+1} are used one by one as the transmit and receive beams to be tested, thereby obtaining the received signal strength matrix Z_{t+1} at time t+1; according to Z_{t+1}, the state matrix S_{t+1} at time t+1 can be constructed:
Figure GDA0003218930120000097
The beam combination corresponding to the maximum value
Figure GDA0003218930120000098
of the matrix Z_{t+1} is selected as the optimal beam combination at time t+1
Figure GDA0003218930120000099
and action A_{t+1} is then performed, and so on.
Considering the channel achievable rate and the beam training overhead together, the reward function is defined in the present invention in the following form:
Figure GDA00032189301200000910
where Ts represents the size of a time slice, t_d represents the effective data transmission time within a time slice, and
Figure GDA00032189301200000911
is defined as the data achievable rate corresponding to the optimal beam combination at time t+1:
Figure GDA00032189301200000912
The reward function may be understood as the effective data achievable rate over a time slice (since, in addition to transmitting data, a time slice is also occupied by beam training and precoding).
FIG. 5 gives the definition of time slices, from which t_d = Ts − t_b − t_p = Ts − I·t_s − t_p, where I = |B_{t+1}| denotes the size of the beam set for testing at time t+1 obtained by performing action A_t, and t_s denotes the time for testing one beam combination.
Thus, the reward function R_t can finally be expressed in the following form:
Figure GDA0003218930120000101
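Using t_d = Ts − I·t_s − t_p, the reward reduces to the achievable rate scaled by the fraction of the slot left for data. A minimal sketch, assuming a log2(1 + SNR) form for the rate and an illustrative t_p = 0.1 ms (the text does not fix t_p here):

```python
import numpy as np

def reward(num_beams, rate, Ts=20e-3, ts=0.1e-3, tp=0.1e-3):
    """R_t = (t_d / Ts) * rate, with t_d = Ts - I*ts - tp.

    Ts = 20 ms and ts = 0.1 ms follow the simulation settings;
    tp = 0.1 ms is an assumed placeholder for the precoding time.
    """
    t_d = Ts - num_beams * ts - tp      # effective data-transmission time
    return (t_d / Ts) * rate

# e.g. I = 9 tested beams, SNR = 10 for the selected beam pair (assumed)
R_t = reward(num_beams=9, rate=np.log2(1 + 10.0))
```

Training fewer beam combinations (smaller I) leaves more of the slot for data and so raises the reward, which is exactly the overhead pressure the action design exploits.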
Because the state S can be represented as a multi-channel image, the present invention processes it with a convolutional neural network. The network structure is shown in FIG. 1 and comprises two convolutional layers, two pooling layers, a flatten layer, a fully-connected layer and an output layer. The convolutional layers are the results of convolution operations, the pooling layers are the results of sampling operations, and the flatten layer converts a multidimensional matrix into a one-dimensional vector. The input of the network is the state matrix S_t, and the output is the action values Q(S_t, A_t) of all actions corresponding to that state; the dimension of the output layer is the size of the action space A.
To accelerate the convergence of the model, S_t needs to be normalized before being input into the neural network; that is, the two-dimensional image represented by each channel is normalized:
Figure GDA0003218930120000102
where max(S_t(i)) represents the maximum gray value of the ith two-dimensional image of S_t.
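The per-channel normalization S_t(i) ← S_t(i) / max(S_t(i)) can be sketched as below; the small epsilon guarding against an all-zero channel is an implementation detail, not from the patent.

```python
import numpy as np

def normalize_state(S, eps=1e-12):
    """Divide each two-dimensional channel image by its own maximum."""
    m = S.max(axis=(0, 1), keepdims=True)   # per-channel maxima, shape (1, 1, C)
    return S / np.maximum(m, eps)           # eps avoids division by zero

S = np.random.rand(16, 16, 6) * 5.0         # dummy state, 6-channel image
S_norm = normalize_state(S)
```

After normalization every channel has maximum gray value 1, so channels with very different absolute signal strengths contribute on the same scale to the CNN input.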
The beam training based on deep reinforcement learning mainly comprises the following steps:
step 1, inputting an action space A, a discount factor gamma and a learning rate alpha.
Step 2, initializing the DQN parameters, which specifically comprises: randomly initializing the prediction neural network parameter θ, setting the target neural network parameter θ′ = θ, and setting the size of the memory bank.
Step 3, performing beam training. First, set the total number of training episodes and the number T of time steps contained in each episode. At the beginning of each episode, C time-varying channels H_t are randomly generated, and the initial state is initialized by beam scanning:
Figure GDA0003218930120000103
The following steps are performed in each time step in turn:
Step 3.1, suppose the state at time t is S_t; select an action A_t from the action space A according to an ε-greedy strategy.
Step 3.2, perform action A_t and determine the set of beam combinations for testing at time t+1
Figure GDA0003218930120000104
calculate the received signal strengths corresponding to all elements in the set
Figure GDA0003218930120000105
and set the received signals corresponding to the untested beam combinations to 0, thereby obtaining the received signal strength matrix Z_{t+1}.
Step 3.3, update the state S_{t+1} at time t+1.
Step 3.4, select the maximum value of the matrix Z_{t+1}
Figure GDA0003218930120000106
and use the corresponding beam combination as the optimal beam combination at time t+1
Figure GDA0003218930120000107
Step 3.5, for the optimal beam combination
Figure GDA0003218930120000108
calculate the corresponding user achievable rate
Figure GDA0003218930120000109
Step 3.6, calculate the reward function R_t.
Step 4, updating the DQN parameters, which mainly comprises the following steps:
Step 4.1, store the experience E_t = (S_t, A_t, R_t, S_{t+1}) of this time step into the memory bank.
Step 4.2, randomly select N experiences E_j = (s_j, a_j, r_j, s′_j), j = 1, 2, …, N, from the memory bank, and set the target value corresponding to each experience:
Figure GDA0003218930120000111
Step 4.3, perform stochastic gradient descent on the parameter θ to train the neural network.
Step 4.4, update the parameter θ′ of the target neural network after every T time steps.
Step 5, output the Q network Q(s, a; θ).
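Two pieces of the loop above, the ε-greedy selection of step 3.1 and the Q-learning target of step 4.2, can be sketched as follows. The dummy q_target callable and the quadruple batch layout are illustrative assumptions; in the patent the Q function is the target convolutional network with parameters θ′.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """Step 3.1: explore with probability epsilon, else take argmax_a Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_targets(batch, q_target, gamma=0.95):
    """Step 4.2: y_j = r_j + gamma * max_a Q(s'_j, a; theta')."""
    return np.array([r + gamma * np.max(q_target(s_next))
                     for (_, _, r, s_next) in batch])

q_target = lambda s: np.array([1.0, 3.0, 2.0])   # stand-in for the target net
batch = [(None, 0, 1.0, None), (None, 1, 0.5, None)]
y = td_targets(batch, q_target)                  # [3.85, 3.35]
```

The prediction network is then regressed toward y by the stochastic gradient descent of step 4.3, and θ′ is refreshed from θ periodically as in step 4.4.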
Specifically, in the training step:
The actually optimal beam combination corresponding to the channel H_t at time t is the optimal solution of the following optimization problem:
Figure GDA0003218930120000112
The above problem is actually to find the beam combination with the largest objective-function value in the codebooks F and W; assume that the optimal solution of the problem (the actual optimal beam combination) is
Figure GDA0003218930120000113
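This benchmark can be evaluated exhaustively over the two codebooks. A sketch assuming standard DFT codebooks for the ULAs and taking |w^H H f| as the objective (consistent with the received-signal model; the patent's exact formula is not reproduced here):

```python
import numpy as np

def dft_codebook(N):
    """Columns are unit-norm DFT beams for an N-element ULA (assumption)."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

Nt = Nr = 16                                    # simulation antenna counts
F, W = dft_codebook(Nt), dft_codebook(Nr)
H = (np.random.randn(Nr, Nt) + 1j * np.random.randn(Nr, Nt)) / np.sqrt(2)

Y = W.conj().T @ H @ F                          # noiseless received-signal matrix
i_star = np.unravel_index(np.argmax(np.abs(Y)), Y.shape)  # true optimum (m, n)
```

Comparing this exhaustively found index with the index produced by the trained policy gives the success/failure criterion used in the following paragraph.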
According to the previously defined optimal beam combination obtained through beam training
Figure GDA0003218930120000114
if
Figure GDA0003218930120000115
the beam training succeeds; otherwise it fails. Because the channel variation is random, the state of the channel may not be accurately captured at some time, causing the beam training to fail. In this case the position of the channel can no longer be tracked, and if such samples were still used to update the DQN, the error would propagate continuously and the algorithm would fail; therefore S_t needs to be redefined as
S_t(C) = W^H H_t F,
where W and F denote the DFT codebooks at the receiving and transmitting ends, respectively. Under this definition, only the last two-dimensional matrix of the three-dimensional state matrix S_t is changed, and the other positions remain unchanged. Since the last two-dimensional matrix of S_t was previously the received signal strength matrix Z_t at time t, and
Figure GDA0003218930120000116
is obtained according to Z_t, when
Figure GDA0003218930120000117
the Z_t obtained from A_t is removed, so that the algorithm re-locates the state of the current channel.
Example 2
In this embodiment, on the basis of embodiment 1, a millimeter wave communication beam training device based on deep reinforcement learning is provided, where the device includes:
A beam selection module, configured to obtain, according to the action A_t executed at time t, the set of beam combinations for testing at the next time as
Figure GDA0003218930120000118
where I represents the total number of beam combinations used for training and
Figure GDA0003218930120000119
represents the ith transmit-receive beam combination.
A channel sample generation module, configured to generate a plurality of time-varying channel matrices H_t according to the random variation of the channel steering angle, and to determine by beam scanning the optimal transmit-receive beam combination corresponding to each channel matrix H_t
Figure GDA00032189301200001110
A received signal matrix module, configured to test in sequence the beams in the set B_{t+1}, obtaining for each beam combination
Figure GDA00032189301200001111
the corresponding received signal strength z_{t+1}, and to set the received signals corresponding to the other, untested beam combinations to 0, thereby obtaining the received signal strength matrix Z_{t+1}.
A state updating module, configured to construct the state matrix S_{t+1} at time t+1 according to the received signal strength matrix Z_{t+1} at time t+1, and to update the current state.
An optimal transmit-receive beam combination determining module, configured to select the maximum value of the matrix Z_{t+1}
Figure GDA0003218930120000121
and use the corresponding beam combination as the optimal beam combination at time t+1
Figure GDA0003218930120000122
wherein
Figure GDA0003218930120000123
is the optimal transmit beam and
Figure GDA0003218930120000124
is the optimal receive beam (obtained through beam training).
A reward calculation module, configured to calculate the reward value of executing action A_t using the obtained optimal beam combination
Figure GDA0003218930120000125
the transmitted signal power P and the channel noise variance σ².
A parameter setting module, configured to set the parameters of the neural network, other parameters of the beam training process, and the like.
An experience storage module, configured to store the experience E_t = (S_t, A_t, R_t, S_{t+1}) generated in the beam training process into the memory bank.
A neural network training module: the input of the neural network is the state matrix S_t, and the output is the action values Q(S_t, A_t) of all actions corresponding to that state; several experiences E_j = (s_j, a_j, r_j, s′_j), j = 1, 2, …, N, are selected from the memory bank to update the prediction neural network parameter θ.
A target value setting module, configured to calculate the target value of each experience using the update strategy of Q learning:
Figure GDA0003218930120000126
A neural network prediction module, configured to predict, using the trained network, the action values Q(S_t, a), a ∈ A, of all actions corresponding to the input state S_t, and to select the action with the largest Q value, A_t = argmax_{a∈A} Q(S_t, a), as the optimal action.
The invention is further described below with reference to simulation conditions and results:
The state image S_t has depth C = 6; the action triple sets are D = {0,1,2,3,4}, O = {2, 4} and R = {1, 3}; the time-slot size is Ts = 20 ms; the time for training one beam combination is t_s = 0.1 ms; the precoding time is t_p; the learning rate is α = 0.001, the discount factor γ = 0.95, the memory bank size D = 2000, the parameters of the prediction neural network are assigned to the target neural network every update_freq = 100 time steps, and the training batch size is batch_size = 64. The adopted DQN is a 7-layer convolutional neural network: the first convolution operation comprises 32 convolution kernels of size 5 × 5 and the second comprises 16 convolution kernels of size 5 × 5; both pooling operations use max pooling with stride 2; the flatten operation converts the three-dimensional matrix into a one-dimensional vector; the fully-connected layer comprises 128 neurons; and the dimension of the output layer corresponds to the size of the action space.
Consider the downlink of a single-user millimeter wave massive MIMO communication system with N_t = 16 base-station antennas and N_r = 16 user-side antennas; both antenna arrays are arranged as ULAs. Assume the number of propagation paths of the millimeter wave signal is L = 3, with LOS path channel gain
Figure GDA0003218930120000127
i.e., a complex Gaussian distribution with variance 1 and mean 0, and two NLOS path channel gains
Figure GDA0003218930120000128
i.e., complex Gaussian distributions with variance 0.01 and mean 0. For convenience of processing, the channel noise variance is assumed to be σ² = 1, the variation of the channel steering angle is
Figure GDA0003218930120000131
the transmitted signal power is P = 1, and the transmitted signal is x = 1. FIGS. 6-7 show the simulation results of DQN-based beam training considering multiple propagation paths of the millimeter wave channel. As the figures show, as the signal-to-noise ratio increases, the success rate and the achievable rate of the beam search both increase. When NLOS paths are added, the state of the channel changes in a more complicated way, and the corresponding beam-search success rate and achievable rate both decrease somewhat, but only slightly. This shows that the multipath effect of the millimeter wave channel has little influence on the algorithm, and the beam training algorithm based on deep reinforcement learning still maintains high performance in the multipath scenario.
Consider the downlink of a single-user millimeter wave massive MIMO communication system with N_t = N_r = 16 antennas at the base station and user sides; both antenna arrays are arranged as ULAs. Only the LOS path of the millimeter wave channel is considered, with channel gain
Figure GDA0003218930120000132
channel noise variance σ² = 1, channel steering angle variation
Figure GDA0003218930120000133
transmitted signal power P = 1, and transmitted signal x = 1. The hierarchical codebook adopts the codebook construction method of document [1], in which the codewords of the current layer are constructed according to the beam training results of the previous layer; the DFT codebook is used for beam scanning. FIGS. 8-9 compare the performance of the proposed DQN-based beam training algorithm (Beam Training based on DQN, BT-DQN), beam scanning (BS), and the beam training algorithm based on hierarchical codebook (BT-HC). As FIG. 8 shows, among the three beam training schemes, the success rate of BS is the highest at all signal-to-noise ratios. The search success rate of BT-DQN is close to, and slightly higher than, that of BT-HC in the low-SNR and high-SNR regions, and higher than BT-HC in the middle region. As FIG. 9 shows, the achievable rate of BS is still the highest at all signal-to-noise ratios, with BT-DQN second and BT-HC lowest.
Although BS has the highest search success rate and achievable rate of the three, it requires more beams to be trained each time and takes more time, and is therefore more costly. Table 1 compares the overhead of the three beam training schemes, where t_s is the time for training one beam combination and t is the time of one beam training. As the table shows, the overhead of BS is more than 10 times that of BT-HC; its higher search success rate and achievable rate come at the cost of huge overhead. The average overhead of BT-DQN is lower than that of BT-HC, a reduction of about 21 percent.
TABLE 1

Algorithm                                              Average overhead (t/t_s)
Beam Scanning (BS)                                     256
Beam training based on hierarchical codebook (BT-HC)   24
Beam training based on DQN (BT-DQN)                    19
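The figures quoted around Table 1 can be checked directly from the tabulated average overheads (in units of t_s):

```python
# average overhead per beam training, in units of t_s, from Table 1
overhead = {"BS": 256, "BT-HC": 24, "BT-DQN": 19}

reduction = 1 - overhead["BT-DQN"] / overhead["BT-HC"]   # BT-DQN vs. BT-HC
ratio_bs = overhead["BS"] / overhead["BT-HC"]            # BS vs. BT-HC
# reduction is about 0.208 (the "about 21 percent" in the text),
# ratio_bs is about 10.7 (the "more than 10 times" comparison)
```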
According to the simulation results, under a dynamic channel environment the beam training scheme and device provided by the invention achieve a higher beam-search success rate and achievable rate than the codebook-based beam training scheme, with lower training overhead. Their performance does not match that of beam scanning, but beam scanning trades an enormous training overhead for its high success rate and achievable rate, which in most cases is not worthwhile. Therefore, in a time-varying channel scenario, the deep-Q-network-based beam training scheme provided by the invention can greatly reduce the beam training overhead while maintaining high performance.
Details of the present invention not described here are well known to those skilled in the art.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations can be devised by those skilled in the art in light of the above teachings. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (4)

1. A millimeter wave communication beam training method based on deep reinforcement learning is characterized by comprising the following steps:
step S1, constructing a millimeter wave communication channel model between the user side and the base station side;
step S2, designing codebooks of a user terminal and a base station terminal, constructing a model of a final received signal according to the designed codebooks, and carrying out mathematical modeling on a beam training process according to the model;
step S3, defining the representation of state, action and reward in the beam training;
in step S3, defining the representation of the state in the beam training specifically includes:
let the channel matrix at time t be H_t and the corresponding received signal matrix be Y_t; define the matrix Z_t as the element-wise modulus of Y_t, and define the received signal strength matrices Z_t at successive times as a state S_t, specifically expressed as:
S_t(i) = Z_{t+i-C}, i = 1, 2, …, C (1)
in formula (1), S_t is a three-dimensional matrix whose third dimension has size C, C represents the number of successive time instants, and Z_{t+i-C} represents the received signal strength matrix at time t+i-C;
defining the representation of the action in the beam training, specifically comprising:
defining the position corresponding to the largest element of said matrix Z_t as
Figure FDA0003653426830000011
wherein
Figure FDA0003653426830000012
respectively represent the indexes of the optimal transmitting and receiving beams at time t in the codebooks F and W; the optimal beam combination at the current time is
Figure FDA0003653426830000013
wherein
Figure FDA0003653426830000014
Figure FDA0003653426830000015
the action at time t is defined as:
A_t = (d, o, r) (2)
in formula (2), d, o, r respectively represent the direction, offset and coverage of the beam search at time t+1 relative to the optimal beam position at time t; wherein d ∈ D = {0,1,2,3,4}, there being 5 optional directions: 0 represents no movement, and 1, 2, 3, 4 respectively represent moving up, down, left and right with the position i_t as the base point; o ∈ O = {0,1,2,…,M−1}, there being M optional offsets, an offset being defined as the distance between the center position of the beam search at time t+1 and the position i_t; r ∈ R = {1,2,…,N}, there being N optional radii, a radius being defined as the coverage radius with the center position of the beam search at time t+1 as the base point;
defining the representation of the reward in the beam training, specifically comprising:
Figure FDA0003653426830000016
in formula (3), B_{t+1} represents the set of beam combinations for testing at the next time obtained by the agent performing action A_t at time t, t_s represents the time for testing one beam combination, t_p represents the precoding phase within one time step of the beam training, Ts represents one time step of the beam training, and
Figure FDA0003653426830000017
represents the data achievable rate corresponding to the optimal beam combination at time t+1;
step S4, regarding the state defined in the step S3 as a multi-channel image, inputting the multi-channel image into the constructed convolutional neural network, and obtaining values of all actions corresponding to the state;
the method comprises the steps of updating an input state by using a convolutional neural network, updating parameters of the convolutional neural network by taking a predicted value of Q learning as a target, predicting by using a trained network, selecting an action with the maximum Q value, and testing by using a beam combination cluster corresponding to the action to reduce training overhead.
2. The method for training the millimeter wave communication beam based on deep reinforcement learning according to claim 1, wherein the step S1 specifically includes:
arranging a needleFor a millimeter wave MIMO communication system of a single user, the system has N user terminalsrRoot antenna, base station end has NtThe arrangement modes of the antennas are uniform linear arrays, and the millimeter wave communication channel model is modeled as follows:
Figure FDA0003653426830000021
in the formula (4), L and alphal
Figure FDA0003653426830000022
θlRespectively representing the number of paths, the channel gain of the ith path, the arrival angle of a channel and the departure angle of the channel; definition of
Figure FDA0003653426830000023
ΘlAnd ΨlObey [0, π ] for both the arrival angle and departure angle of the spatial domain]Inner uniform distribution, dtAnd drRespectively representing the interval between the array antennas of the base station end and the user end, wherein lambda is the wavelength of a millimeter wave signal, and u (-) represents a channel steering vector; the variation of the steering angle of the channel in the adjacent time interval follows Gaussian distribution, and the expression is as follows:
Figure FDA0003653426830000024
in the formula (5), θ0U (0, pi) represents the initial channel steering angle, θ, at which time t is 0 and randomtRepresenting the channel steering angle at time t,
Figure FDA0003653426830000025
indicating the amount of change in the channel steering angle.
3. The method for training millimeter wave communication beams based on deep reinforcement learning according to claim 1, wherein in step S2 the expressions of the codebooks at the user end and the base station end are:
Figure FDA0003653426830000026
Figure FDA0003653426830000027
in formula (6) and formula (7),
Figure FDA0003653426830000028
the expression of the final received signal is:
Figure FDA0003653426830000029
in formula (8), P,
Figure FDA00036534268300000210
respectively represent the transmitting power of the base station end, the receiving codeword of the user end, the transmitting codeword of the base station end and the channel noise vector, with ||w||₂ = ||f||₂ = 1 and |x|² = 1;
thus, the expression of the received signal matrix is:
Figure FDA0003653426830000031
in formula (9),
Figure FDA0003653426830000032
respectively denote the DFT codebooks at the receiving end and the transmitting end,
Figure FDA0003653426830000033
denotes the channel matrix, x and P respectively denote the transmitted signal and the signal power,
Figure FDA0003653426830000034
Figure FDA0003653426830000035
denotes the channel noise matrix, and
Figure FDA0003653426830000036
the element Y(m, n) in the mth row and nth column of the matrix denotes the signal obtained when the transmitting end uses the nth codeword (n = 1, 2, …, N_t) in the codebook F and the receiving end uses the mth codeword (m = 1, 2, …, N_r) in the codebook W; the beam training process is represented as the following optimization problem:
Figure FDA0003653426830000037
4. the deep reinforcement learning-based millimeter wave communication beam training method according to claim 3, wherein the convolutional neural network specifically comprises two convolutional layers, two pooling layers, a flat layer, a full-link layer and an output layer; the states are normalized before they are input to the neural network, in particular, the two-dimensional image represented by each channel is normalized.
CN202110623890.1A 2021-06-04 2021-06-04 Millimeter wave communication beam training method based on deep reinforcement learning Active CN113411110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623890.1A CN113411110B (en) 2021-06-04 2021-06-04 Millimeter wave communication beam training method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113411110A CN113411110A (en) 2021-09-17
CN113411110B true CN113411110B (en) 2022-07-22

Family

ID=77676276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623890.1A Active CN113411110B (en) 2021-06-04 2021-06-04 Millimeter wave communication beam training method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113411110B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113904704B (en) * 2021-09-27 2023-04-07 西安邮电大学 Beam prediction method based on multi-agent deep reinforcement learning
WO2023071760A1 (en) * 2021-10-29 2023-05-04 中兴通讯股份有限公司 Beam domain division method and apparatus, storage medium, and electronic device
CN114021987A (en) * 2021-11-08 2022-02-08 深圳供电局有限公司 Microgrid energy scheduling strategy determination method, device, equipment and storage medium
CN114567525B (en) * 2022-01-14 2023-07-28 北京邮电大学 Channel estimation method and device
CN114499605B (en) * 2022-02-25 2023-07-04 北京京东方传感技术有限公司 Signal transmission method, signal transmission device, electronic equipment and storage medium
CN117035018A (en) * 2022-04-29 2023-11-10 中兴通讯股份有限公司 Beam measurement parameter feedback method and receiving method and device
CN114844538B (en) * 2022-04-29 2023-05-05 东南大学 Millimeter wave MIMO user increment cooperative beam selection method based on wide learning
CN115065981B (en) * 2022-08-16 2022-11-01 新华三技术有限公司 Beam tracking method and device
CN115426007B (en) * 2022-08-22 2023-09-01 电子科技大学 Intelligent wave beam alignment method based on deep convolutional neural network
CN115580879A (en) * 2022-09-07 2023-01-06 重庆邮电大学 Millimeter wave network beam management method based on federal reinforcement learning
CN117692014B (en) * 2024-02-01 2024-04-23 北京雷格讯电子股份有限公司 Microwave millimeter wave communication method and communication system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110417444B (en) * 2019-07-08 2020-08-04 东南大学 Millimeter wave channel beam training method based on deep learning
CN110401476B (en) * 2019-08-05 2022-07-08 东南大学 Codebook-based millimeter wave communication multi-user parallel beam training method
CN110971279B (en) * 2019-12-30 2021-09-21 东南大学 Intelligent beam training method and precoding system in millimeter wave communication system
CN112073106B (en) * 2020-08-14 2022-04-22 清华大学 Millimeter wave beam prediction method and device, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN113411110A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN113411110B (en) Millimeter wave communication beam training method based on deep reinforcement learning
US11626909B2 (en) Method and device for enhancing power of signal in wireless communication system using IRS
KR102154481B1 (en) Apparatus for beamforming massive mimo system using deep learning
CN110113088B (en) Intelligent estimation method for wave arrival angle of separated digital-analog hybrid antenna system
CN113438002B (en) LSTM-based analog beam switching method, device, equipment and medium
Shen et al. Design and implementation for deep learning based adjustable beamforming training for millimeter wave communication systems
CN113162666B (en) Intelligent steel-oriented large-scale MIMO hybrid precoding method and device
CN113193893B (en) Millimeter wave large-scale MIMO intelligent hybrid beam forming design method
CN112448742A (en) Hybrid precoding method based on convolutional neural network under non-uniform quantization
Zhang et al. Intelligent beam training for millimeter-wave communications via deep reinforcement learning
Nguyen et al. Deep unfolding hybrid beamforming designs for THz massive MIMO systems
Chafaa et al. Federated channel-beam mapping: from sub-6ghz to mmwave
Elbir et al. Cognitive learning-aided multi-antenna communications
Abdallah et al. Multi-agent deep reinforcement learning for beam codebook design in RIS-aided systems
CN113872655A (en) Multicast beam forming rapid calculation method
CN113169777A (en) Beam alignment
CN114844538B (en) Millimeter wave MIMO user increment cooperative beam selection method based on wide learning
CN114866126B (en) Low-overhead channel estimation method for intelligent reflection surface auxiliary millimeter wave system
CN112242860B (en) Beam forming method and device for self-adaptive antenna grouping and large-scale MIMO system
CN115133969A (en) Performance improving method of millimeter wave large-scale MIMO-NOMA system
CN114598574A (en) Millimeter wave channel estimation method based on deep learning
CN115604824A (en) User scheduling method and system
CN115102590B (en) Millimeter wave beam space hybrid beam forming method and device
Wang et al. New Environment Adaptation with Few Shots for OFDM Receiver and mmWave Beamforming
CN113904704B (en) Beam prediction method based on multi-agent deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant