CN112187387A - Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment - Google Patents

Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment

Info

Publication number
CN112187387A
CN112187387A
Authority
CN
China
Prior art keywords
user
macro
base station
reinforcement learning
sinr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011002214.4A
Other languages
Chinese (zh)
Inventor
高晖
林元杰
许文俊
曹若菡
陆月明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202011002214.4A priority Critical patent/CN112187387A/en
Publication of CN112187387A publication Critical patent/CN112187387A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B17/00Monitoring; Testing
    • H04B17/30Monitoring; Testing of propagation channels
    • H04B17/391Modelling the propagation channel
    • H04B17/3912Simulation models, e.g. distribution of spectral power density or received signal strength indicator [RSSI] for a given geographic region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413MIMO systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electromagnetism (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A novel reinforcement learning method for automatic antenna parameter adjustment based on rasterized user positions, used in the following scenario: a dual-layer heterogeneous cellular network comprising a plurality of macro base stations, micro base stations and macro users. To keep a plurality of users moving at high speed in a complex network environment at a consistently high weighted sum rate, a novel reinforcement learning method for automatic antenna parameter adjustment based on rasterized user positions is provided. The method comprises two stages: (1) an offline modeling stage, whose greatest advantage is that it reduces the time overhead and computational complexity of the online learning process; (2) an online learning stage: based on the real-time SINR values fed back by users, the proposed novel reinforcement learning method gives the antenna parameter configuration that maximizes the users' weighted sum rate R. Compared with traditional methods, the applicable scenario is closer to real live-network conditions, reinforcement learning performs well on time-series prediction, and the grid-based user position method is scalable.

Description

Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment
Technical Field
The invention relates to a method for automatically and jointly adjusting multiple antenna parameters, and in particular to a novel deep reinforcement learning method for automatic antenna parameter adjustment based on rasterized user position information in a dual-layer heterogeneous cellular network with high user mobility, belonging to the technical field of wireless communication.
Background
With the popularization of 5G, network structures are gradually developing toward dense and complex topologies. As users' expectations of network quality keep rising, traditional network coverage optimization methods struggle to keep pace with the development of high-speed, low-latency networks, and antenna parameter adjustment has a great influence on network coverage. Antenna tuning today still relies on traditional manual adjustment, whose drawbacks are obvious: with the deployment of new base stations or changes in the surrounding communication environment, the labor and time costs of manual tuning in a dense network rise sharply, and the trial-and-error cost of the manual tuning process is high. Automating antenna parameter adjustment is therefore urgent; only then can the ever-denser and more complex network conditions of the new era be handled.
Existing self-optimization schemes for antenna parameters basically adjust only one parameter, the antenna downtilt, whereas the present invention jointly adjusts three parameters: the antenna downtilt, the horizontal half-power beamwidth and the vertical half-power beamwidth. Even the existing schemes that do jointly adjust multiple parameters target application scenarios that are basically static, single-layer cellular networks, or at most slightly more complex multi-cell heterogeneous networks with simple samples, which are far from the live-network situation under 5G. The scenario studied here is a dual-layer heterogeneous cellular network with highly dynamic users, whose topology is more complex and closer to a real network; the scheme designed in this invention therefore has greater practical application value.
In practical applications, many situations arise that traditional methods find difficult to handle, such as interference from macro base stations of other cells to macro users, interference from micro base stations of the local cell to macro users, and high user density and mobility within the macro base station's coverage area. A deep reinforcement learning method is therefore needed to simplify the optimization objective, predict in real time how the user state will change at each moment after the current one, and adjust the antenna parameters at the current moment so that the weighted sum rate of all users stays at a high level. Meanwhile, by combining offline modeling with online learning, a highly scalable model is abstracted, trained offline and stored, greatly reducing the burden of online training.
To address the poor scalability of conventional methods and their insufficient consideration of complex live-network conditions, the invention provides a novel reinforcement learning method for automatic antenna parameter adjustment based on rasterized user positions.
Disclosure of Invention
In view of the above, the present invention provides a novel reinforcement learning method for automatic antenna tuning based on rasterized user positions, so that a plurality of users moving at high speed in a complex network environment can always maintain a high weighted sum rate. The invention is a scheme design combining antenna parameter self-optimization with deep reinforcement learning in a network coverage area with high user mobility.
To achieve the above object, the present invention provides a novel reinforcement learning method for automatic antenna parameter adjustment based on rasterized user positions, used in the following scenario: a dual-layer heterogeneous cellular network comprising a plurality of macro base stations, micro base stations and macro users, with high user mobility. Because the scenario is complex and user dynamics change very frequently, traditional methods are unsuitable for antenna parameter adjustment. We therefore propose the following method, characterized by the following two operating steps:
(1) An offline modeling stage:
the method comprises the steps of performing rasterization processing on a cellular network by analyzing user position information of a network coverage area with strong user mobility and a signal-to-interference-plus-noise ratio (SINR) fed back by a user, performing cluster analysis on user data information in each grid by repeatedly performing snapshot, and finally abstracting a user position information model of a typical macro base station sector.
The specific steps of rasterization are as follows:
(1) Repeatedly snapshot a macro cell and its surrounding environment to acquire data;
(2) set the grid size and rasterize the macro cell, with all grid cells lying in the same horizontal plane with no slope; when a user moves into the area covered by a grid cell, the user is considered to be located at the center of that cell, and there is no upper limit on the number of users a cell may contain at the same time;
(3) for a user located at the center point of each grid cell, calculate the distance to the macro base station and the pitch and azimuth angles of that user relative to the macro base station;
(4) average, within each grid cell, all the user SINR values, path loss values and user-to-base-station interference obtained there;
(5) combine these with the rasterized macro cell to obtain a base-station-user interference model, a path loss variation model, and a model of user position with corresponding mean SINR.
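As an illustration of steps (1)-(4), the following minimal Python sketch shows grid snapping, grid-center geometry and per-grid averaging; the grid size, coordinate layout and all helper names are assumptions for illustration and are not specified in the patent:

```python
import numpy as np

GRID_SIZE = 10.0  # meters per grid cell (an assumed value)

def snap_to_grid(x, y, grid_size=GRID_SIZE):
    """Map a user position to the center of its grid cell; all cells lie
    in one horizontal plane with no slope, as the offline stage assumes."""
    gx = np.floor(x / grid_size) * grid_size + grid_size / 2.0
    gy = np.floor(y / grid_size) * grid_size + grid_size / 2.0
    return gx, gy

def grid_geometry(gx, gy, bs_xy, bs_height):
    """Distance, pitch and azimuth of a grid center relative to the macro BS."""
    dx, dy = gx - bs_xy[0], gy - bs_xy[1]
    horiz = np.hypot(dx, dy)
    dist = np.hypot(horiz, bs_height)
    pitch = np.degrees(np.arctan2(bs_height, horiz))  # downward angle to user
    azimuth = np.degrees(np.arctan2(dy, dx))
    return dist, pitch, azimuth

def per_grid_means(samples):
    """Average SINR, path loss and interference over repeated snapshots.

    `samples` maps (gx, gy) -> list of (sinr_dB, pathloss_dB, interf_dBm)
    tuples collected across snapshots of the macro cell."""
    return {cell: tuple(np.mean(np.array(vals), axis=0))
            for cell, vals in samples.items()}
```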
(2) An online learning stage:
the novel deep reinforcement learning method predicts the state change of a user in the moving process by using a Markov model, matches the optimal antenna parameter configuration corresponding to the user state at different moments through repeated training until the neural network hyper-parameter convergence involved in the deep reinforcement learning, and finally provides the suggestion of the optimal configuration of the antenna parameters of the base station through statistical analysis.
The novel deep reinforcement learning method comprises the following specific steps:
the method comprises the steps of designing action, state, reward and hyper-parameter updating rules required by a reinforcement learning model, and also comprising an optimization target and constraint conditions, wherein the state is used as input, and the action is used as output; initializing each parameter, and inputting a new state (namely a set of SINR values of each macro user); updating the hyper-parameters according to a gradient descent principle from the current moment, continuously giving the reward and action at each moment, and continuously iterating the weighting and the rate R until the hyper-parameters are converged; after the hyperparameter converges, outputting the action with the largest occurrence times (namely a set of corresponding downward inclination angle, vertical half-power wave width and horizontal half-power wave width) in the n iteration processes of maximizing R under the state; repeating the method until all the states are used as input to obtain corresponding action as output; and outputting all the state-action pairs and outputting and selecting the number of the states of each action, thereby finishing all the processes of the automatic antenna parameter adjustment.
The invention is a novel deep reinforcement learning method for automatic antenna parameter adjustment based on rasterized user position information. Its advantages: on the premise of simplifying live-network conditions and reducing time overhead and computational complexity, it handles the state and position information fed back by users in most live-network situations and gives the antenna configuration closest to the actual optimum, so that the weighted sum rate of all users under the antenna's coverage is always kept maximized; meanwhile, the novel deep reinforcement learning method handles the practical problems encountered during antenna tuning in greater detail, striving for the matching mode that best fits the data transmission trends of live-network users.
The key innovations of the method are as follows: compared with traditional manual antenna tuning, the invention introduces a deep reinforcement learning method; using a combination of offline modeling and online learning, it designs the parameters required by the grid-based user position information model to extract an antenna gain model; and through the novel deep reinforcement learning algorithm it provides a complete joint antenna parameter adjustment scheme for the large volume of user states and position information in a complex network, greatly reducing time overhead and computational complexity, offering strong scalability, and making the results fit the actual live-network situation more closely.
Drawings
Fig. 1 is an application scenario of the present invention: a dual-layer heterogeneous cellular network.
Fig. 2 is a flow chart of the present invention's novel reinforcement learning method for automatic antenna tuning based on rasterized user positions.
Fig. 3 is a simulation statistics chart of the number of states selecting each antenna configuration when a typical sector contains 4 macro users and the user SINR partition granularity is 3 dB, in an embodiment of the present invention.
Fig. 4 shows, for an embodiment of the present invention, simulation plots of how the percentage of states selecting the optimal antenna parameter configuration among all states varies with the user SINR partition granularity for different numbers of users in a sector, and how that percentage varies with the number of users in a sector for different user SINR partition granularities.
(Remark: the optimal antenna parameter configuration means the action that maximizes the reward in a given state, with a high percentage of all states selecting that action.)
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the application scenario of the method of the present invention is described first: the dual-layer heterogeneous cellular network comprises K macro base stations and 2K micro base stations. The K macro base stations, each configured with 3 directional antennas, are located at the centers of the K macro cells, each antenna covering a 120° sector; the 2K micro base stations are uniformly distributed over the K macro cells; and each macro cell contains on average U macro users, each configured with a single omnidirectional antenna. A macro user selects and connects to the nearby macro base station yielding the maximum signal-to-interference-plus-noise ratio (SINR); the connected macro base station is denoted k, and the signals generated by the remaining K-1 macro base stations are regarded as interference for that macro user. The micro base stations serve only specific users, and their signals are likewise regarded as interference to the macro users.
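To make the association rule concrete, the following minimal sketch selects the serving macro base station by maximum SINR, counting the remaining macro base stations and all micro base stations as interference as described above; the log-distance propagation model and all names and values are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

def received_power_dbm(tx_power_dbm, distance_m, path_loss_exp=3.5):
    """Toy log-distance path loss (an assumed model, not from the patent)."""
    return tx_power_dbm - 10.0 * path_loss_exp * np.log10(max(distance_m, 1.0))

def serving_cell_and_sinr(user_pos, macro_bs, micro_bs, noise_dbm=-104.0):
    """Connect a macro user to the macro BS yielding the maximum SINR.

    Signals from the remaining K-1 macro BSs and from all micro BSs are
    counted as interference, as the scenario describes."""
    dist = lambda bs: float(np.linalg.norm(np.asarray(user_pos) - np.asarray(bs["pos"])))
    p_macro = np.array([received_power_dbm(bs["tx_dbm"], dist(bs)) for bs in macro_bs])
    p_micro = np.array([received_power_dbm(bs["tx_dbm"], dist(bs)) for bs in micro_bs])
    lin = lambda dbm: 10.0 ** (dbm / 10.0)          # dBm -> mW
    total = lin(p_macro).sum() + lin(p_micro).sum() + lin(noise_dbm)
    sinr = lin(p_macro) / (total - lin(p_macro))    # each macro BS vs. everything else
    k = int(np.argmax(sinr))                        # serving macro BS index
    return k, 10.0 * np.log10(sinr[k])              # user SINR in dB

# Example with 2 macro and 2 micro base stations (all values assumed).
macro = [{"pos": [0, 0], "tx_dbm": 46}, {"pos": [500, 0], "tx_dbm": 46}]
micro = [{"pos": [120, 80], "tx_dbm": 30}, {"pos": [300, -60], "tx_dbm": 30}]
k, gamma = serving_cell_and_sinr([100, 50], macro, micro)
```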
With reference to fig. 2, the following two operating steps of the method of the invention are described:
(1) An offline modeling stage: by analyzing the user position information of a network coverage area with high user mobility and the signal-to-interference-plus-noise ratio (SINR) fed back by users, the cellular network is rasterized; through repeated snapshots, the user data in each grid cell are cluster-analyzed, and finally a user position information model of a typical macro base station sector is abstracted.
(2) An online learning stage:
① Designing the reinforcement learning model
a. Design mechanism for the state: the SINR ρ_u[n] of user u is defined by an equation rendered as an image in the original (the power received from the serving macro base station over the sum of interference and noise). For the U users in macro cell k, the SINR of user u is converted to the dB scale by
γ_u[n] = 10 log10 ρ_u[n],
and then state = [γ_1[n], γ_2[n], ..., γ_{U-1}[n], γ_U[n]];
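A small sketch of assembling this state vector from linear SINR values (names and example values assumed for illustration):

```python
import numpy as np

def state_from_sinr(rho):
    """state = [gamma_1[n], ..., gamma_U[n]]: per-user SINR in dB, from
    the linear SINR values rho_u[n] of the U macro users of cell k."""
    return 10.0 * np.log10(np.asarray(rho, dtype=float))

# Example: four macro users in a typical sector (values assumed).
state = state_from_sinr([12.5, 3.2, 45.0, 7.9])
```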
b. Design mechanism for the action:
action = [θt, θ3dB, φ3dB],
where the initial action is action_init = [15°, 10°, 75°]. The possible values of each parameter are:
downtilt θt ∈ {0°, 3°, 6°, 9°, 12°, 15°},
vertical half-power beamwidth θ3dB ∈ {4.4°, 6.8°, 9.4°, 10°, 13.5°},
horizontal half-power beamwidth φ3dB ∈ {45°, 55°, 65°, 70°, 75°, 85°};
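These value sets yield 6 × 5 × 6 = 180 candidate configurations, consistent with the 180 antenna parameter configurations mentioned in the simulation results below; a sketch of enumerating this action space:

```python
from itertools import product

DOWNTILT = [0, 3, 6, 9, 12, 15]            # downtilt theta_t, degrees
VERT_HPBW = [4.4, 6.8, 9.4, 10, 13.5]      # vertical half-power beamwidth, degrees
HORIZ_HPBW = [45, 55, 65, 70, 75, 85]      # horizontal half-power beamwidth, degrees

ACTIONS = list(product(DOWNTILT, VERT_HPBW, HORIZ_HPBW))
assert len(ACTIONS) == 180                  # the full action space
ACTION_INIT = (15, 10, 75)                  # the patent's initial action
```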
c. Design mechanism for the reward:
the reward is given by an equation rendered as an image in the original, with a second image equation defining the term it contains; α is defined as the learning rate of the reinforcement learning;
d. The optimization objective is to maximize the weighted sum rate R, given by an equation rendered as an image in the original, subject to three constraints C1, C2 and C3 (also rendered as images), where S denotes the set of all states and A the set of all actions;
② Designing the update rule of the hyper-parameters
The loss function is given by an equation rendered as an image in the original, with a second image defining the term it contains; the gradient-descent update rule for the hyper-parameter w is likewise an image equation; when n/5 is an integer the parameter is updated according to a further image equation, and otherwise it is not updated;
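The "update only when n/5 is an integer" rule reads like the periodic target-parameter synchronization used in DQN-style training; the following is a hedged sketch under that interpretation, in which the loss form, network shape, stand-in data and optimizer are all assumptions because the original equations are rendered as images:

```python
import numpy as np

class TinyQNet:
    """Minimal linear Q-network: maps a state vector to one value per action."""
    def __init__(self, n_state, n_action, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(n_state, n_action))
        self.lr = lr

    def q(self, s):
        return s @ self.w

    def step(self, s, a_idx, target):
        """One gradient-descent step on the squared TD error
        L(w) = (target - Q(s, a))^2 (an assumed DQN-style loss)."""
        td = self.q(s)[a_idx] - target
        grad = np.zeros_like(self.w)
        grad[:, a_idx] = 2.0 * td * s
        self.w -= self.lr * grad

online, target_net = TinyQNet(4, 180), TinyQNet(4, 180)
for n in range(1, 101):
    s = np.random.default_rng(n).random(4)             # stand-in state
    a_idx = int(np.argmax(online.q(s)))                # greedy action
    reward = -abs(float(online.q(s)[a_idx]))           # stand-in reward
    y = reward + 0.9 * float(np.max(target_net.q(s)))  # bootstrapped target
    online.step(s, a_idx, y)
    if n % 5 == 0:                                     # the patent's n/5 rule:
        target_net.w = online.w.copy()                 # sync, otherwise no update
```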
③ Design mechanism for the feature value:
x = [x_1, x_2, x_3, ..., x_{U-1}, x_U],
x_u = γ_u[0] + ΔA_u - Δβ_u(s, a) + ΔL_u,
where ΔA_u is the antenna gain variation during the movement of user u, Δβ_u(s, a) is the interference variation, and ΔL_u is the path loss variation;
design mechanism for dividing SINR value granularity
The SINR value of the user in the current network data is basically in the range of 0-30dB, and as the SINR value of the user is an infinite number of discrete values, a data initialization process is performed for the convenience of analysis:
for example, a rounding mode is adopted, the SINR granularity at this time is 1dB, and 0-30dB is divided into 31 discrete values; the SINR granularity may also be designed to be 2dB, which is divided into 16 discrete values, and so on. In the simulation implementation test, 2dB, 3dB, 4dB, 5dB and 6dB are adopted to divide the granularity of the SINR value, and the influence on the simulation implementation test result is researched.
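A minimal sketch of this quantization, interpreting it as rounding to the nearest multiple of the chosen granularity over 0-30 dB:

```python
import numpy as np

def quantize_sinr(gamma_db, granularity_db):
    """Snap SINR values (dB) onto a discrete grid over 0-30 dB."""
    clipped = np.clip(gamma_db, 0.0, 30.0)
    return np.round(clipped / granularity_db) * granularity_db

# 1 dB granularity -> 31 discrete values (0, 1, ..., 30);
# 2 dB granularity -> 16 discrete values (0, 2, ..., 30).
levels = np.unique(quantize_sinr(np.arange(0.0, 30.01, 0.01), 2.0))
assert len(levels) == 16
```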
Initialize all parameters and input a new state; from the current moment on, update the hyper-parameters by gradient descent, continuously emit the reward and action at each moment, and keep iterating R until the hyper-parameters converge; after convergence, output for that state the action occurring most often across the n iterations in which R reaches its maximum; repeat this procedure until every state has been used as input and its corresponding action obtained as output; finally, output all state-action pairs and the number of states selecting each action, thereby completing the whole automatic antenna parameter adjustment process.
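Putting the pieces together, the outer loop over discretized states and the closing statistics over actions might be sketched as follows; `train_until_converged` is a placeholder for the per-state training routine described above, and all names are illustrative assumptions:

```python
from collections import Counter

def best_action_for_state(state, train_until_converged, n_iters=50):
    """Train until the hyper-parameters converge, then return the action
    index occurring most often over the n iterations that maximize R."""
    action_history = train_until_converged(state, n_iters)  # list of indices
    return Counter(action_history).most_common(1)[0][0]

def tune_antennas(all_states, train_until_converged):
    """Map every discretized state to a recommended antenna configuration
    and count how many states select each action (the final statistics)."""
    pairs = {s: best_action_for_state(s, train_until_converged)
             for s in all_states}
    votes = Counter(pairs.values())                          # states per action
    return pairs, votes
```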
To demonstrate the practical performance of the method of the present invention, the applicant carried out multiple simulation tests; the network configuration model of the test system is the application scenario shown in fig. 1, and the simulation results are shown in fig. 3 and fig. 4. The simulations were run for different numbers of users in a typical sector and different user SINR partition granularities.
As can be seen from fig. 3, with 4 macro users in a typical sector and an SINR partition granularity of 3 dB, the statistics of the number of states selecting each optimal antenna configuration (configurations selected by too few states are not displayed) show that 94.78% of the states select 20 of the 180 antenna parameter configurations; in other words, switching among only 20 antenna parameter sets keeps 94.78% of the user states at a consistently high weighted sum rate.
As can be seen from fig. 4, with the method of the present invention, for the same SINR partition granularity the overall set of selected optimal antenna parameter configurations does not change as the number of UEs changes, indicating that increasing the number of users does not affect the selection of the optimal configuration combination (provided the quality of the communication environment is not seriously degraded), while the proportion of states selecting the optimal antenna parameter combination shows an increasing trend; with the same number of UEs per sector, the proportion of states selecting the optimal antenna parameter configuration combination basically increases as the SINR partition granularity is gradually reduced from 6 dB to 2 dB.
The above description is only an exemplary embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (3)

1. A novel reinforcement learning method for automatic antenna parameter adjustment based on rasterized user positions, used in the following scenario:
the dual-layer heterogeneous cellular network comprises a plurality of macro base stations, micro base stations and macro users, the base stations being configured with a plurality of directional antennas and the macro users with a single omnidirectional antenna; a macro user selects and connects to the nearby macro base station yielding the maximum signal-to-interference-plus-noise ratio (SINR); the micro base stations serve only specific users and their signals are regarded as interference to the macro users, and in addition a macro user is subject to interfering signals from the macro base stations of neighboring cells; in order to keep a plurality of users moving at high speed in a complex network environment at a consistently high weighted sum rate, a novel reinforcement learning method for automatic antenna tuning based on rasterized user positions is provided, jointly tuning three parameters: the antenna downtilt, the vertical half-power beamwidth and the horizontal half-power beamwidth; the method comprises the following two operating steps:
(1) an offline modeling stage: by repeatedly snapshotting a macro cell and its surrounding environment, three models are obtained based on the rasterization method, namely a base-station-user interference model, a path loss variation model, and a model of user position with corresponding mean SINR; the greatest advantage of the offline modeling is that, on the premise of abstracting a scalable model, it reduces the time overhead and computational complexity of online learning; (2) an online learning stage: based on the real-time SINR values fed back by users, the proposed novel reinforcement learning method predicts the users' movement states at each moment and iterates continuously until the hyper-parameters converge, then gives the antenna parameter configuration that maximizes the users' weighted sum rate.
2. The method of claim 1, wherein:
the concrete steps of the modeling stage of the lifted-off line are as follows:
(1) repeatedly snapshotting a macro cell and the surrounding environment thereof to acquire data;
(2) and setting the size of the grids, and rasterizing the macro cell, wherein all the grids are positioned on the same horizontal plane and have no gradient. When a user moves to an area contained in a certain grid, the user is considered to be positioned in the center of the grid, and the number of users contained in each grid in the same time has no upper limit requirement;
(3) calculating the distance between a user positioned at the center point of each grid of the macro cell and the macro base station, and the pitch angle and the azimuth angle of the user positioned in the grid relative to the macro base station;
(4) carrying out mean value processing on all user SINR values, path loss values and interference between users and a base station obtained in each grid respectively;
(5) and (4) combining the data processed in the steps (3) and (4) with the rasterized macro cell to obtain a base station-user interference model, a path loss change model, a user position and a corresponding mean SINR model.
3. The method of claim 1, wherein:
the specific steps of the proposed online learning phase are as follows
(1) The method comprises the steps of designing action, state, reward, an optimization target and a constraint condition required by a reinforcement learning model, and also comprising an updating rule of a hyper-parameter, wherein the state is used as input, and the action is used as output;
(2) initializing each parameter, and inputting a new state (namely a set of SINR values of each macro user);
(3) from the current moment, updating the hyper-parameters according to a gradient descent principle, continuously giving the reward and action at each moment, and continuously iterating R;
(4) repeating (2) - (3) until the hyperparameter converges;
(5) after the hyperparameter converges, outputting the action (namely a set of corresponding downward inclination angle, vertical half-power wave width and horizontal half-power wave width) which is most frequently appeared in the n iteration processes when R reaches the maximum under the state;
(6) repeating (2) - (5) until all states are used as input to obtain corresponding action as output;
(7) and outputting all the state-action pairs and outputting and selecting the number of the states of each action, thereby finishing all the processes of the automatic antenna parameter adjustment.
CN202011002214.4A 2020-09-22 2020-09-22 Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment Pending CN112187387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002214.4A CN112187387A (en) 2020-09-22 2020-09-22 Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011002214.4A CN112187387A (en) 2020-09-22 2020-09-22 Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment

Publications (1)

Publication Number Publication Date
CN112187387A true CN112187387A (en) 2021-01-05

Family

ID=73957095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002214.4A Pending CN112187387A (en) 2020-09-22 2020-09-22 Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment

Country Status (1)

Country Link
CN (1) CN112187387A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113300749A (en) * 2021-03-30 2021-08-24 北京邮电大学 Intelligent transmission beam optimization method based on machine learning enabling

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108375363A (en) * 2017-12-05 2018-08-07 中国移动通信集团福建有限公司 Antenna bearingt angular deflection check method, device, equipment and medium
CN109963287A (en) * 2017-12-26 2019-07-02 中国移动通信集团湖北有限公司 Antenna directional angle optimization method, device, equipment and medium
CN109379752A (en) * 2018-09-10 2019-02-22 中国移动通信集团江苏有限公司 Optimization method, device, equipment and the medium of Massive MIMO
CN109714093A (en) * 2018-11-05 2019-05-03 南京邮电大学 A kind of joint antenna selection method towards isomery cellular network
WO2020093631A1 (en) * 2018-11-06 2020-05-14 五邑大学 Antenna downtilt angle measurement method based on depth instance segmentation network
CN110572835A (en) * 2019-09-06 2019-12-13 中兴通讯股份有限公司 method and device for adjusting antenna parameters, electronic equipment and computer readable medium
CN111062466A (en) * 2019-12-11 2020-04-24 南京华苏科技有限公司 Method for predicting field intensity distribution of cell after antenna adjustment based on parameters and neural network
CN111309779A (en) * 2020-01-19 2020-06-19 北京互联无界科技有限公司 Method and system for realizing beam parameter sample collection based on clustering technology
CN111246497A (en) * 2020-04-10 2020-06-05 卓望信息技术(北京)有限公司 Antenna adjustment method based on reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EREN BALEVI等: "A Novel Deep Reinforcement Learning Algorithm for Online Antenna Tuning", 《2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM)》 *
EREN BALEVI等: "Online Antenna Tuning in Heterogeneous Cellular Networks With Deep Reinforcement Learning", 《IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING》 *


Similar Documents

Publication Publication Date Title
Meng et al. Power allocation in multi-user cellular networks: Deep reinforcement learning approaches
US20230171008A1 (en) Systems and methods for wireless signal configuration by a neural network
CN111683375B (en) Unmanned aerial vehicle deployment optimization method for unmanned aerial vehicle-assisted wireless cellular network
Xia et al. Generative neural network channel modeling for millimeter-wave UAV communication
CN113438002B (en) LSTM-based analog beam switching method, device, equipment and medium
CN113114343B (en) High-energy-efficiency intelligent dynamic beam forming method for multi-beam satellite
CN114422363B (en) Capacity optimization method and device for unmanned aerial vehicle-mounted RIS auxiliary communication system
CN112866904B (en) Channel-training-free large-dimension communication beam alignment method based on beam index map
CN113473480A (en) Improved reinforcement learning network coverage optimization method facing cellular network
Hashimoto et al. SICNN: Spatial interpolation with convolutional neural networks for radio environment mapping
CN113300749A (en) Intelligent transmission beam optimization method based on machine learning enabling
CN114980169A (en) Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase
CN103945518A (en) Beam-forming-based power distribution method for cognitive radio system
CN114826462B (en) Beam domain channel augmentation method for large-scale MIMO statistical port selection
CN113239632A (en) Wireless performance prediction method and device, electronic equipment and storage medium
CN112187387A (en) Novel reinforcement learning method based on rasterization user position automatic antenna parameter adjustment
Sun et al. Long-term transmit point association for coordinated multipoint transmission by stochastic optimization
Fonseca et al. Adaptive height optimization for cellular-connected UAVs: A deep reinforcement learning approach
Wen et al. Machine learning based mm-wave 60 GHz channel modeling for 5G wireless communication systems
CN110505604B (en) Method for accessing frequency spectrum of D2D communication system
Reininger et al. Multi-stage optimization for mobile radio network planning
CN114980156B (en) AP switch switching method of honeycomb millimeter wave-free large-scale MIMO system
CN113242068B (en) Intelligent communication beam collision avoidance method based on deep reinforcement learning
Tekgul et al. Sample-efficient learning of cellular antenna parameter settings
CN112351449B (en) Massive MIMO single-cell weight optimization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105