CN115580879A - Millimeter wave network beam management method based on federal reinforcement learning - Google Patents

Millimeter wave network beam management method based on federal reinforcement learning Download PDF

Info

Publication number
CN115580879A
CN115580879A CN202211088629.7A CN202211088629A
Authority
CN
China
Prior art keywords
millimeter wave
base station
model
wave base
beam management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211088629.7A
Other languages
Chinese (zh)
Inventor
薛青
来东
徐勇军
梁志芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211088629.7A priority Critical patent/CN115580879A/en
Publication of CN115580879A publication Critical patent/CN115580879A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/24Cell structures
    • H04W16/28Cell structures using beam steering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a millimeter wave network beam management method based on federated reinforcement learning, and belongs to the field of wireless communication. The method performs beam configuration on the millimeter wave base station side by collecting user position information; that is, the beam directions on the millimeter wave base station side are dynamically controlled based on periodically sensed user distribution, so that a limited number of beams achieves the maximum user coverage and the beam utilization efficiency is improved. The invention implements the corresponding beam management strategy through a reinforcement learning algorithm, aiming to maximize the long-term network throughput while making beam management intelligent. In addition, the invention introduces a federated learning framework, which protects the privacy and security of user data while finding the optimal beam configuration strategy of the system.

Description

Millimeter wave network beam management method based on federal reinforcement learning
Technical Field
The invention belongs to the field of wireless communication, and relates to a millimeter wave network beam management method based on federated reinforcement learning.
Background
Ultra-dense networking can alleviate the limited coverage of millimeter wave base stations and the sharp increase of data traffic by increasing the number of millimeter wave base stations per unit area. Multi-association is one of the key technologies in ultra-dense networking; it allows a user to connect to several millimeter wave base stations simultaneously, guaranteeing the user's communication quality and improving the user's service rate. Research on ultra-dense networking is therefore of great significance. Millimeter wave transmission suffers from large path loss, but the short wavelength of millimeter waves allows millimeter wave devices to integrate large-scale antenna arrays in a relatively small size. With beamforming, the energy of the transmitted signal can be concentrated in a specific direction, providing additional antenna gain that compensates for the path loss and improves the received signal power. Because the number of narrow beams a millimeter wave base station can form is limited, the beams cannot cover the whole area and only cover part of the user region; how to configure the beams on the millimeter wave base station side so that the limited beams cover as many users as possible is therefore a key problem for improving the performance of a millimeter wave system. The design of beam management strategies under ultra-dense networking typically faces the following problems: 1) ultra-dense networking greatly increases the number of beams in the network, making the beam management problem more complex than in ordinary scenarios; 2) user data are private, and how to protect user privacy remains to be solved.
The prior art related to the present invention can be mainly summarized as follows.
(1) Beam management related art: in millimeter wave communication, beam management is a relatively broad concept that covers beam training, beam tracking, and so on, and various schemes have already been proposed. For example, the beam training process usually includes the following steps: beam sweeping, beam measurement, beam reporting, beam determination, beam maintenance, beam failure recovery, and the like. Different from this concept, beam management in the present invention mainly refers to the beam configuration problem on the millimeter wave base station side.
(2) Prior patent technology: for example, patent CN113055059A discloses a beam management method for massive MIMO communication. That method focuses on beam selection and beam maintenance between the millimeter wave base station and the user: beam selection over historical beam management schemes is realized with a collaborative filtering (CF) algorithm, the user's moving path is predicted, and a continuous communication link is maintained through beam refinement. Patent CN113785503A discloses a beam management method using adaptive learning, which establishes a beam management model on the user side and processes it intelligently with a reinforcement learning algorithm; that method focuses on beam selection (alignment) and the association process between the base station and the user. Compared with the schemes disclosed in these prior patents, the present invention addresses a different scenario and a different problem, and the proposed solution focuses on the optimal configuration of beams on the millimeter wave base station side in ultra-dense networking.
(3) Federated reinforcement learning related art: deep reinforcement learning (DRL) combines deep learning with reinforcement learning to handle the perception and decision problems of complex systems, and can be used to solve the beam management problem in millimeter wave ultra-dense networking scenarios. Federated learning is a promising distributed machine learning architecture: devices collect data and train models locally and then upload the trained models to a central node for model aggregation, so that raw data never leaves the device and data privacy is well protected. For beam management in ultra-dense millimeter wave networks, purely centralized decision-making incurs huge resource and time overhead, which federated learning can effectively avoid. The present invention therefore discloses a large-scale beam management method that applies DRL under a federated learning architecture.
Disclosure of Invention
In view of this, the present invention provides a millimeter wave network beam management method based on federated reinforcement learning.
To achieve the above object, the present invention provides the following technical solution:
A millimeter wave network beam management method based on federated reinforcement learning comprises the following steps:
s1: constructing a millimeter wave base station side beam management model;
s2: initializing parameters of a millimeter wave base station side beam management model;
s3: each millimeter wave base station respectively collects the position information of a local user and trains a local beam management model by using a reinforcement learning algorithm;
s4: updating local model parameters by using stochastic gradient descent (SGD);
s5: repeating S3 and S4, and entering S6 after the local model converges or iterates for N times;
s6: each millimeter wave base station uploads the local model parameters to a central control node or a central server to carry out model parameter aggregation, and a global model of beam management is obtained;
s7: each millimeter wave base station downloads global model parameters from the central control node to update the local model, and carries out beam configuration decision according to the current user information;
s8: and returning to S2, and waiting for next round of beam optimization.
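For illustration only, the following minimal Python sketch shows how steps S1 to S8 could be orchestrated end to end. The model here is reduced to a plain parameter vector and the local training to a dummy gradient step; all names (LocalBeamAgent, fed_avg) and numeric values are hypothetical stand-ins for the DDQN-based models described below, not part of the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

class LocalBeamAgent:
    """One millimeter wave base station acting as a federated-learning client."""
    def __init__(self, global_params, num_users):
        self.params = global_params.copy()      # S2: initialize from the global model
        self.num_users = num_users              # proxy for the local training-data amount

    def local_training(self, iters, lr=0.01):
        for _ in range(iters):                  # S3/S5: collect local data and train
            grad = rng.normal(size=self.params.shape)   # stand-in for the RL gradient
            self.params -= lr * grad            # S4: SGD update of the local parameters

def fed_avg(param_list, data_sizes):
    """S6: training-data-weighted aggregation at the central control node."""
    weights = np.asarray(data_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, param_list))

global_params = rng.normal(size=8)              # S1: initial beam-management model
for beam_period in range(3):                    # repeated beam-management rounds (S8)
    agents = [LocalBeamAgent(global_params, num_users=u) for u in (12, 7, 20)]
    for agent in agents:
        agent.local_training(iters=5)
    global_params = fed_avg([a.params for a in agents],
                            [a.num_users for a in agents])
    # S7: each base station would now download global_params and decide its beams
```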
Optionally, in S1, the millimeter wave base station side beam management problem is modeled as a Markov decision process and solved by a reinforcement learning algorithm; a Markov decision process typically contains four elements S, A, P, R, where S represents the state space of the Markov decision process, A represents the action space, P represents the state transition probability, and R represents the reward value.
Optionally, in S1, the input of the Markov decision process may be determined by the set of sectors covered by the millimeter wave base station beams; the beam management strategy at time t is represented as the set of sectors covered by all millimeter wave base stations at time t, i.e., the system beam management strategy C(t) = {C_1(t), C_2(t), ..., C_M(t)}, where C_m(t) represents the set of sectors covered by millimeter wave base station m at time t; a proper strategy enables each millimeter wave base station to cover more users with its limited beams, thereby improving beam utilization and system throughput.
Optionally, in S1, the Markov decision process is solved with the reinforcement learning algorithm DDQN (double deep Q-network), and an initial model is established; the initial model consists of two four-layer fully connected neural networks, namely a training neural network and a target neural network; the training neural network is used to evaluate the value of the current action-state pair, i.e., the Q value, while the target neural network is used to determine the maximum Q value, denoted Q_max, and the difference E[(Q_max - Q)^2] between the two networks is defined as the loss function; the neural networks use the ReLU function as the activation function, and the system sum rate is fed back as the reward; compared with traditional reinforcement learning algorithms, DDQN adds an experience replay pool and model evaluation, which alleviates the model bias caused by overestimation in DQN and the excessive state-space and action-space overhead of the Q-learning algorithm.
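As an illustrative sketch only, the two networks described above could be realized as follows in PyTorch; the state dimension, number of candidate beam configurations and layer widths are assumptions, with only the four-layer fully connected structure and the ReLU activation taken from the text.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Four-layer fully connected Q network (used for both training and target nets)."""
    def __init__(self, state_dim=16, num_actions=8):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_actions),         # one Q value per candidate beam configuration
        )

    def forward(self, state):
        return self.layers(state)

train_net = QNetwork()                                # evaluates Q(s, a)
target_net = QNetwork()                               # supplies Q_max for the target value
target_net.load_state_dict(train_net.state_dict())   # periodic copy of the training weights
```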
Optionally, in S1, the beam management strategy is executed periodically, and the beam configuration of each millimeter wave base station remains unchanged within the same period; each period contains three parts: 1) each millimeter wave base station configures its beams with the federated reinforcement learning algorithm according to the network performance accumulated in the previous period; 2) users select appropriate millimeter wave base stations for association; 3) millimeter wave communication links are established for data transmission.
Optionally, in S2, the local model for millimeter wave base station side beam management is initialized; at the beginning of each beam management period, every millimeter wave base station updates its local model with the global model parameters downloaded from the central control node, so that local model convergence is reached faster while local characteristics are retained; after the model is initialized, the millimeter wave base station configures its base-station-side beams for the current round according to the optimal beam management strategy obtained in the previous period; the corresponding local model is then trained with the latest user position information.
Optionally, in S3, a user data screening mechanism is applied before beam management model training to ensure the validity and diversity of the data participating in model training; the millimeter wave base station judges relevance by computing the distance between itself and each user, and selects the position information of users within its coverage area as valid data for model training; in addition, whether a user participates in training is decided according to the user's historical participation count, and users with fewer historical participations are preferentially included, so as to ensure the diversity of the training data.
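A minimal sketch of such a screening step is given below, assuming a simple distance threshold and a participation-count threshold; the radius of 200 m and the cap of 3 participations are illustrative values, not parameters disclosed by the method.

```python
import numpy as np

def screen_users(bs_position, user_positions, participation_counts,
                 coverage_radius=200.0, max_participations=3):
    """Return indices of users that are both in coverage and rarely used in training."""
    bs_position = np.asarray(bs_position, dtype=float)
    selected = []
    for idx, pos in enumerate(user_positions):
        distance = np.linalg.norm(np.asarray(pos, dtype=float) - bs_position)
        in_coverage = distance <= coverage_radius                        # validity check
        rarely_used = participation_counts[idx] <= max_participations   # diversity check
        if in_coverage and rarely_used:
            selected.append(idx)
    return selected

# Example: base station at the origin, three users
users = [(50.0, 30.0), (400.0, 10.0), (120.0, -80.0)]
counts = [1, 0, 5]
print(screen_users((0.0, 0.0), users, counts))   # -> [0]; user 1 out of range, user 2 over-used
```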
Optionally, in S6, the global model is updated with the model parameters obtained in S5; each millimeter wave base station uploads its trained local model parameters to the central control node, where they are aggregated to update the global model.
The invention has the following beneficial effects:
1. The invention provides a millimeter wave base station side beam management method based on federated reinforcement learning: when the beams of a millimeter wave base station are limited, a reinforcement learning algorithm periodically senses changes in the user positions and configures the beams on the millimeter wave base station side, realizing adaptive management of the base-station-side beams, further improving the beam utilization efficiency of the millimeter wave base station, and optimizing the long-term throughput of the system.
2. The invention provides a millimeter wave base station side beam management method based on federated reinforcement learning that shares models following the federated idea, so that user data are trained locally and need not be uploaded to a central processor, which guarantees user privacy and security and improves the convergence rate of the global model.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates the millimeter wave network beam management method based on federated reinforcement learning;
FIG. 2 is a system model of an ultra-dense millimeter wave heterogeneous network;
FIG. 3 is a system model of an ultra-dense millimeter wave homogeneous network.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and they do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by the terms "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not intended to indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limiting the present invention, and the specific meaning of the terms described above will be understood by those skilled in the art according to the specific circumstances.
A method for managing beams of a millimeter wave network based on federated reinforcement learning is shown in FIG. 1.
In this embodiment, the beam management method based on federated reinforcement learning is applied to a two-tier heterogeneous millimeter wave ultra-dense network, as shown in FIG. 2. A macro base station operating in the microwave band serves as the central control node and carries out global model training and the aggregation of local model parameters, while each millimeter wave base station acts as an agent that trains its local beam management model. That is, both the millimeter wave base stations and the macro base station collect user position information and perform DDQN-based model training, and the macro base station additionally performs model parameter aggregation. Communication between a millimeter wave base station and a user is realized by establishing the corresponding millimeter wave link, and the millimeter wave base stations and the macro base station can be connected through the X2 interface.
The specific implementation steps of the scheme are as follows:
Step 1: construct the millimeter wave base station side beam management (beam configuration) model. Specifically, the beam management problem on the millimeter wave base station side can be described by a Markov decision process and solved by DDQN. In the heterogeneous network shown in FIG. 2, the set of millimeter wave base stations participating in training is denoted by M = {1, 2, ..., M}, and the set of users is denoted by U = {1, 2, ..., U}. Each element of the Markov decision process is defined as follows.
(1) The state space at time t is defined as S_t = {U_m(t), C_m(t), D_k(t)}, where U_m(t) represents the set of users served by millimeter wave base station m at time t, C_m(t) represents the set of beam sectors covered by millimeter wave base station m at time t, and D_k(t) = {C_k(t)}, k = 1, 2, ..., M, k ≠ m, represents the sets of sectors covered by the millimeter wave base stations other than base station m at time t.
(2) The action space at time t is defined as A_t = {C_m(t)}.
(3) The probability of the millimeter wave base station's network state transferring from S_t to S_{t+1} at time t is defined as P = Pr{S_{t+1} | S_t}.
(4) The reward function at time t is defined as R_t = R(t), where R(t) represents the system throughput at time t and is determined by the received data rates r_u(t) of the users u at time t (the closed-form expressions for R(t) and r_u(t) are reproduced as formula images in the original document). Here W_m denotes the bandwidth allocated to the user by millimeter wave base station m, W_M denotes the bandwidth of the macro base station, N_M(t) denotes the number of users served by the macro base station, and M_c(t) denotes the set of millimeter wave base stations associated with the user at time t. A user that is associated with no millimeter wave base station communicates with the macro base station only; otherwise the user communicates in dual-association mode at time t (i.e., it is associated with the macro base station and millimeter wave base stations simultaneously). SINR_{u,m} denotes the signal-to-interference-plus-noise ratio when user u is associated with millimeter wave base station m, and SINR_{u,M} denotes the signal-to-interference-plus-noise ratio when user u is associated with the macro base station.
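Since the closed-form rate and reward expressions are only available as formula images, the sketch below illustrates one plausible Shannon-rate reading of the quantities defined above (dual association summing the millimeter wave contributions and adding the macro-cell share); it is an assumption for illustration, not the formula of the filing.

```python
import numpy as np

# ASSUMED form: r_u = (W_M / N_M) * log2(1 + SINR_{u,M})
#                   + sum over m in M_c(t) of W_m * log2(1 + SINR_{u,m})
def user_rate(W_m, sinr_mmwave, W_M, N_M, sinr_macro, dual_association=True):
    rate = (W_M / N_M) * np.log2(1.0 + sinr_macro)       # macro-base-station share
    if dual_association:
        for sinr in sinr_mmwave:                          # sum over the set M_c(t)
            rate += W_m * np.log2(1.0 + sinr)
    return rate

def system_throughput(per_user_rates):
    """Reward R(t): total throughput, taken here as the sum of the users' rates."""
    return float(np.sum(per_user_rates))

# Example: 100 MHz mmWave bandwidth per user, 20 MHz macro bandwidth shared by 10 users
r_u = user_rate(W_m=100e6, sinr_mmwave=[31.6, 10.0], W_M=20e6, N_M=10, sinr_macro=5.0)
print(r_u / 1e6, "Mbit/s")
```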
Step 2: initialize the parameters of the millimeter wave base station side beam management model. Each millimeter wave base station initializes its local model by downloading the global model parameters delivered by the macro base station. The local model parameter θ_{t+1} is updated according to a rule (reproduced as a formula image in the original document) in which G represents the global model parameter output by the system after the previous communication round, ρ is the model learning rate, L(θ_t) represents the loss function of the i-th millimeter wave base station, and n_i represents the amount of training data of the i-th millimeter wave base station. After initialization is complete, the base station begins training the local model.
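The exact local update rule is only available as a formula image; as a hedged sketch, one common reading is that the local parameters start from the downloaded global parameters G and are then refined by gradient steps on the local loss with learning rate ρ. The function names and values below are illustrative assumptions.

```python
import numpy as np

def init_local_model(global_params_G):
    """Assumed initialization: copy the downloaded global parameters (theta <- G)."""
    return np.array(global_params_G, dtype=float).copy()

def local_update(theta, grad_local_loss, rho=0.01):
    """Assumed refinement step: theta_{t+1} = theta_t - rho * grad L(theta_t)."""
    return theta - rho * np.asarray(grad_local_loss, dtype=float)

theta = init_local_model([0.2, -0.1, 0.5])
theta = local_update(theta, grad_local_loss=[0.05, 0.00, -0.02])
```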
Step 3: each millimeter wave base station collects the position information of its local users and trains its local beam management model with the DDQN algorithm. Each millimeter wave base station completes the beam configuration for the current round of communication according to the initial model parameters obtained in Step 2, trains the corresponding beam management model with the user position information, and thereby obtains the optimal beam management strategy. Before model training, each millimeter wave base station may perform one round of user data screening to ensure the validity and diversity of the training data. The DDQN evaluates the Q value with the training neural network, whose weights are θ_t, and estimates Q_max with the target neural network, whose weights are denoted θ_t^-. The Q value of taking action A_t in state S_t is represented by the Q function Q(S_t, A_t; θ_t). The target value is updated as Y_t^DDQN = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ_t); θ_t^-), the standard double-DQN form, where R_{t+1} is the reward at time t+1 and γ ∈ [0, 1] is the discount factor. The purpose of DDQN is to minimize the difference between the target network and the training network, evaluated by the loss function L(θ) = E[(Y_t^DDQN - Q(S_t, A_t; θ_t))^2]. The core of the method is to determine the optimal beam management scheme by minimizing this loss function.
Step 4: update the local model parameters with stochastic gradient descent. The local model is updated with SGD as θ_{t+1} = θ_t - λ∇L(θ_t), where λ is the step size.
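The following sketch combines Steps 3 and 4 into one DDQN update, reusing the hypothetical QNetwork class from the earlier sketch; it implements the standard double-DQN target (training network selects the next action, target network evaluates it) and uses an SGD optimizer for the parameter step, with γ = 0.9 and λ = 0.01 as illustrative values.

```python
import torch
import torch.nn.functional as F

def ddqn_update(train_net, target_net, optimizer, batch, gamma=0.9):
    states, actions, rewards, next_states = batch         # tensors sampled from the replay pool
    # Q(S_t, A_t; theta_t) from the training network
    q_values = train_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # the training network chooses the next action ...
        next_actions = train_net(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network scores it (Q_max in the text)
        q_max = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * q_max                  # Y_t^DDQN
    loss = F.mse_loss(q_values, targets)                   # L(theta) = E[(Y_t^DDQN - Q)^2]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                       # Step 4: SGD update, step size lambda
    return loss.item()

# optimizer = torch.optim.SGD(train_net.parameters(), lr=0.01)   # lambda = 0.01
```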
Step 5: repeat Steps 3 and 4, and go to Step 6 after the local model converges or a certain number of iterations is reached.
Step 6: after training is finished, each millimeter wave base station uploads its local model parameters to the macro base station for model parameter aggregation, yielding the global beam management model (the aggregation rule is reproduced as a formula image in the original document), where n represents the total amount of training data of all millimeter wave base stations participating in training.
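The aggregation rule itself is an image in the filing; the sketch below assumes the usual federated-averaging form, in which each base station's parameters are weighted by its share n_i / n of the total training data.

```python
import numpy as np

def aggregate_global_model(local_params, local_data_sizes):
    """Assumed rule: G = sum over i of (n_i / n) * theta_i."""
    n = float(sum(local_data_sizes))                       # total training data volume
    global_params = np.zeros_like(local_params[0], dtype=float)
    for theta_i, n_i in zip(local_params, local_data_sizes):
        global_params += (n_i / n) * np.asarray(theta_i, dtype=float)
    return global_params

# Example: three base stations with different amounts of local data
thetas = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
print(aggregate_global_model(thetas, [10, 30, 60]))        # -> [1.  0.8]
```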
Step 7: each millimeter wave base station downloads the global model parameters from the macro base station to update its local model, and makes its beam configuration decision according to the current user information.
Step 8: return to Step 2 and wait for the next round of beam optimization.
In this embodiment, a beam management model is established based on the DDQN algorithm under a federated learning framework, so as to maximize the long-term throughput while realizing intelligent beam management. The local model parameters trained by the millimeter wave base stations in the system are uploaded to the macro base station for aggregation to obtain the global model, and the local models are then updated by downloading the global model parameters, so that the influence of surrounding cells is taken into account while the characteristics of the local data are preserved. The scheme reduces the transfer of raw user data while finding the optimal beam allocation strategy of the system, thereby greatly protecting user privacy.
The second embodiment of the invention:
In this embodiment, the beam management method based on federated reinforcement learning is applied to a millimeter wave homogeneous network, as shown in FIG. 3. Each millimeter wave base station acts as an agent for distributed learning and cooperation, and the role of the central control node can be temporarily taken by one of the millimeter wave base stations. Information sharing is realized by establishing the corresponding communication links between the nodes and the users, and the millimeter wave base stations are connected through the X2 interface. Adjacent millimeter wave base stations can form a millimeter wave base station cluster, and base stations belonging to the same cluster share model parameters in a federated manner. FIG. 3 shows three millimeter wave base station clusters; the millimeter wave base stations in each cluster train local beam management models with DDQN, and the beam management model of the cluster is then obtained through model aggregation. Model parameters are shared among the clusters in the same way, so that the globally optimal beam management strategy is obtained. Similar to the first embodiment, the specific implementation steps are as follows:
Step 1: construct the millimeter wave base station side beam management model. Specifically, the beam management problem on the millimeter wave base station side can be described by a Markov decision process and solved by DDQN. In the homogeneous network shown in FIG. 3, the set of millimeter wave base stations participating in training within the same cluster is denoted by M = {1, 2, ..., M}, and the set of users is denoted by U = {1, 2, ..., U}. Each element of the Markov decision process is defined as follows.
(1) The state space at time t is defined as S_t = {U_m(t), C_m(t), D_k(t)}, where U_m(t) represents the set of users served by millimeter wave base station m at time t, C_m(t) represents the set of beam sectors covered by millimeter wave base station m at time t, and D_k(t) = {C_k(t)}, k = 1, 2, ..., M, k ≠ m, represents the sets of sectors covered by the millimeter wave base stations other than base station m at time t.
(2) The action space at time t is defined as A_t = {C_m(t)}.
(3) The probability of the millimeter wave base station's network state transferring from S_t to S_{t+1} at time t is defined as P = Pr{S_{t+1} | S_t}.
(4) The reward function at time t is defined as R_t = R(t), where R(t) represents the system throughput at time t and is determined by the received data rates r_u(t) of the users u at time t (the closed-form expressions are reproduced as formula images in the original document). Here W_m denotes the bandwidth allocated to the user by millimeter wave base station m, M_c(t) denotes the set of millimeter wave base stations associated with the user at time t, and SINR_{u,m} denotes the signal-to-interference-plus-noise ratio when user u is associated with millimeter wave base station m.
Step 2: initialize the parameters of the millimeter wave base station side beam management model. Each millimeter wave base station initializes its local model by downloading the global model parameters delivered by the central control node. The local model parameter θ_{t+1} is updated according to a rule (reproduced as a formula image in the original document) in which G represents the global model parameter output by the system after the previous communication round, ρ is the model learning rate, L(θ_t) represents the loss function of the i-th millimeter wave base station, and n_i represents the amount of training data of the i-th millimeter wave base station. After initialization is complete, the base station begins training the local model.
Step 3: each millimeter wave base station collects the position information of its local users and trains its local beam management model with DDQN. Each millimeter wave base station completes the beam configuration for the current round of communication according to the initial model parameters obtained in Step 2, trains the corresponding beam management model with the user position information, and thereby obtains the optimal beam management strategy. Before model training, each millimeter wave base station may perform one round of user data screening to ensure the validity and diversity of the training data. The DDQN evaluates the Q value with the training neural network, whose weights are θ_t, and estimates Q_max with the target neural network, whose weights are denoted θ_t^-. The Q value of taking action A_t in state S_t is represented by the Q function Q(S_t, A_t; θ_t). The target value is updated as Y_t^DDQN = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ_t); θ_t^-), the standard double-DQN form, where R_{t+1} is the reward at time t+1 and γ ∈ [0, 1] is the discount factor. The purpose of DDQN is to minimize the difference between the target network and the training network, i.e., the loss function L(θ) = E[(Y_t^DDQN - Q(S_t, A_t; θ_t))^2]. The core of the method is to determine the optimal beam management scheme by minimizing this loss function.
Step 4: update the local model parameters with stochastic gradient descent. The local model is updated with SGD as θ_{t+1} = θ_t - λ∇L(θ_t), where λ is the step size.
Step 5: repeat Steps 3 and 4, and go to Step 6 after the local model converges or a certain number of iterations is reached.
Step 6: after training, each millimeter wave base station uploads its local model parameters to the central control node for model parameter aggregation, yielding the global beam management model (the aggregation rule is reproduced as a formula image in the original document), where n represents the total amount of training data of all millimeter wave base stations participating in training.
Step 7: each millimeter wave base station downloads the global model parameters from the central control node to update its local model, and makes its beam configuration decision according to the current user information.
Step 8: return to Step 2 and wait for the next round of beam optimization.
In this embodiment, a fully distributed federated learning framework is used and the beam management model is trained with DDQN, so as to maximize the long-term throughput of the system while realizing intelligent beam management. The training method of the beam management model within a cluster is the same as in the first embodiment. The difference from the first embodiment is that the central control node in charge of aggregation is no longer a fixed macro base station; instead, the role is taken by the millimeter wave base stations within the cluster in turn, or is determined by selection conditions (for example, the execution time of local model training, the number of served users, and the like). If the millimeter wave base station carrying the aggregation task is selected on the basis of such conditions, the base station with the shortest local model training time, or the base station serving the fewest users in the current period, may be chosen; this mainly accounts for factors such as model training efficiency and local computing resources. If the system contains multiple such clusters and model interaction is performed in each cluster under the federated learning framework, the globally optimal beam management strategy in the millimeter wave homogeneous system can be obtained by sharing models among the clusters.
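As an illustration of the rotating central-control-node role, the sketch below picks the cluster head by the selection conditions mentioned above (shortest local training time, then fewest served users); the data structure and the tie-breaking order are assumptions.

```python
from dataclasses import dataclass

@dataclass
class BaseStationStatus:
    bs_id: int
    training_time_s: float    # execution time of local model training
    served_users: int         # number of users served in the current period

def select_cluster_head(statuses):
    """Pick the base station with the shortest training time, then the fewest users."""
    return min(statuses, key=lambda s: (s.training_time_s, s.served_users)).bs_id

cluster = [BaseStationStatus(1, 2.4, 15),
           BaseStationStatus(2, 1.8, 22),
           BaseStationStatus(3, 1.8, 9)]
print(select_cluster_head(cluster))   # -> 3
```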
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A millimeter wave network beam management method based on federated reinforcement learning, characterized in that the method comprises the following steps:
s1: constructing a millimeter wave base station side beam management model;
s2: initializing parameters of a millimeter wave base station side beam management model;
s3: each millimeter wave base station respectively collects the position information of a local user and trains a local beam management model by using a reinforcement learning algorithm;
s4: updating local model parameters by using stochastic gradient descent (SGD);
s5: repeating S3 and S4, and entering S6 after the local model converges or iterates for N times;
s6: each millimeter wave base station uploads the local model parameters to a central control node or a central server for model parameter aggregation, and a global model of beam management is obtained;
s7: each millimeter wave base station downloads global model parameters from a central control node to update a local model, and carries out beam configuration decision according to current user information;
s8: and returning to S2, and waiting for next round of beam optimization.
2. The method of claim 1, wherein: in S1, the millimeter wave base station side beam management problem is modeled as a Markov decision process and solved by a reinforcement learning algorithm; a Markov decision process typically contains four elements S, A, P, R, where S represents the state space of the Markov decision process, A represents the action space, P represents the state transition probability, and R represents the reward value.
3. The method of claim 2, wherein: in S1, the input of the Markov decision process may be determined by the set of sectors covered by the millimeter wave base station beams; the beam management strategy at time t is represented as the set of sectors covered by all millimeter wave base stations at time t, i.e., the system beam management strategy C(t) = {C_1(t), C_2(t), ..., C_M(t)}, where C_m(t) represents the set of sectors covered by millimeter wave base station m at time t; a proper strategy enables each millimeter wave base station to cover more users with its limited beams, thereby improving beam utilization and system throughput.
4. The method of claim 3, wherein: in S1, the Markov decision process is solved with the reinforcement learning algorithm DDQN, and an initial model is established; the initial model consists of two four-layer fully connected neural networks, namely a training neural network and a target neural network; the training neural network is used to evaluate the value of the current action-state pair, i.e., the Q value, while the target neural network is used to determine the maximum Q value, denoted Q_max, and the difference E[(Q_max - Q)^2] between the two networks is defined as the loss function; the neural networks use the ReLU function as the activation function, and the system sum rate is fed back as the reward; compared with traditional reinforcement learning algorithms, DDQN adds an experience replay pool and model evaluation, which alleviates the model bias caused by overestimation in DQN and the excessive state-space and action-space overhead of the Q-learning algorithm.
5. The method of claim 4, wherein: in S1, the beam management strategy is executed periodically, and the beam configuration of each millimeter wave base station remains unchanged within the same period; each period contains three parts: 1) each millimeter wave base station configures its beams with the federated reinforcement learning algorithm according to the network performance accumulated in the previous period; 2) users select appropriate millimeter wave base stations for association; 3) millimeter wave communication links are established for data transmission.
6. The method of claim 5, wherein: in S2, the local model for millimeter wave base station side beam management is initialized; at the beginning of each beam management period, every millimeter wave base station updates its local model with the global model parameters downloaded from the central control node, so that local model convergence is reached faster while local characteristics are retained; after the model is initialized, the millimeter wave base station configures its base-station-side beams for the current round according to the optimal beam management strategy obtained in the previous period; the corresponding local model is then trained with the latest user position information.
7. The method of claim 6, wherein: in S3, a user data screening mechanism is applied before beam management model training to ensure the validity and diversity of the data participating in model training; the millimeter wave base station judges relevance by computing the distance between itself and each user, and selects the position information of users within its coverage area as valid data for model training; in addition, whether a user participates in training is decided according to the user's historical participation count, and users with fewer historical participations are preferentially included, so as to ensure the diversity of the training data.
8. The method of claim 7, wherein: in S6, the global model is updated with the model parameters obtained in S5; each millimeter wave base station uploads its trained local model parameters to the central control node, where they are aggregated to update the global model.
CN202211088629.7A 2022-09-07 2022-09-07 Millimeter wave network beam management method based on federal reinforcement learning Pending CN115580879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211088629.7A CN115580879A (en) 2022-09-07 2022-09-07 Millimeter wave network beam management method based on federal reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211088629.7A CN115580879A (en) 2022-09-07 2022-09-07 Millimeter wave network beam management method based on federal reinforcement learning

Publications (1)

Publication Number Publication Date
CN115580879A true CN115580879A (en) 2023-01-06

Family

ID=84580561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211088629.7A Pending CN115580879A (en) 2022-09-07 2022-09-07 Millimeter wave network beam management method based on federal reinforcement learning

Country Status (1)

Country Link
CN (1) CN115580879A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111447618A (en) * 2020-03-13 2020-07-24 重庆邮电大学 Intelligent reflector energy efficiency maximum resource allocation method based on secure communication
CN113411110A (en) * 2021-06-04 2021-09-17 东南大学 Millimeter wave communication beam training method based on deep reinforcement learning
CN113709701A (en) * 2021-08-27 2021-11-26 西安电子科技大学 Millimeter wave vehicle networking combined beam distribution and relay selection method
WO2022121985A1 (en) * 2020-12-10 2022-06-16 北京邮电大学 Static and dynamic combined millimeter wave beam resource allocation and optimization method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111447618A (en) * 2020-03-13 2020-07-24 重庆邮电大学 Intelligent reflector energy efficiency maximum resource allocation method based on secure communication
WO2022121985A1 (en) * 2020-12-10 2022-06-16 北京邮电大学 Static and dynamic combined millimeter wave beam resource allocation and optimization method
CN113411110A (en) * 2021-06-04 2021-09-17 东南大学 Millimeter wave communication beam training method based on deep reinforcement learning
CN113709701A (en) * 2021-08-27 2021-11-26 西安电子科技大学 Millimeter wave vehicle networking combined beam distribution and relay selection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIAN WANG; QING XUE: "Beam Management in Ultra-dense Millimeter Wave Network via Federated Learning", 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 11 December 2021 (2021-12-11) *
LI, ZHONGJIE; WU, WANMIN; GAO, WEI: "Deep learning based relay selection for D2D millimeter wave communication", Journal of South-Central University for Nationalities (Natural Science Edition), no. 03, 15 June 2020 (2020-06-15), pages 2 - 4 *
MA, WENYAN; QI, CHENHAO: "Deep learning based millimeter wave beam selection method for uplink transmission", Journal of Hefei University of Technology (Natural Science Edition), no. 12, 28 December 2019 (2019-12-28) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination