CN109831236B - Beam selection method based on Monte Carlo tree search assistance

Info

Publication number: CN109831236B
Application number: CN201811346507.7A
Authority: CN
Inventors: 陈特; 董彬虹; 陈延涛; 张存林; 曹蕾
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2021-06-01
Anticipated expiration: 2038-11-13
Also published as: CN109831236A

Abstract

The invention discloses a beam selection method assisted by Monte Carlo tree search. It belongs to the field of millimeter wave vehicle communication systems and mainly relates to a method for selecting the optimal beam between a millimeter wave base station and a dynamically moving vehicle. To address the shortcomings of the prior art, the invention provides a beam selection method based on a contextual multi-armed bandit assisted by Monte Carlo tree search. The method is an online learning method, can effectively mitigate performance loss and environmental blockage in millimeter wave transmission, and is suitable for communication between vehicles and millimeter wave base stations. First, the susceptibility of millimeter wave communication performance to attenuation is effectively addressed by exploiting the context information of the vehicle system; in addition, the Monte Carlo tree search method adopted by the invention handles large-scale network data well and better meets the requirements of practical communication environments.

Description

Beam selection method based on Monte Carlo tree search assistance

Technical Field

The invention belongs to the field of millimeter wave vehicle communication systems, and mainly relates to a method for selecting the optimal beam between a millimeter wave base station and a dynamically moving vehicle.

Background

In recent years, research on new-generation vehicle-to-base-station communication has focused on how to design multi-Gbps links, a technology considered key to implementing 5G vehicle-to-everything (V2X) communication. Multi-Gbps links provide high data rates, enabling vehicle communication systems to acquire accurate sensing data (e.g., ultra-high-definition real-time maps), which is critical for (semi-)autonomous vehicles. The 4G LTE-A system currently in use (operating below 6 GHz) often becomes congested during communication, which causes problems such as communication interruption. The 5G communication systems under development plan to overcome this obstacle by using the largely unexploited millimeter wave band (10-300 GHz). Millimeter wave communication is characterized by high frequency and short wavelength and, compared with traditional channels, the millimeter wave channel suffers from higher path loss and penetration loss. Recent research on vehicular communication systems has shown that (1) directional transmission and beamforming can compensate for the high path loss of millimeter wave communication, and (2) dense base station deployment can compensate for the short communication range (100-150 m) of the millimeter wave band. These solutions ensure the feasibility of millimeter wave communication. However, the design of millimeter wave communication systems also faces many new challenges. First, whereas omnidirectional transmission is generally adopted in the band below 6 GHz, the millimeter wave band requires precise beam alignment between the base station and the vehicle. Second, because of the high penetration loss, millimeter wave signals are prone to blockage by the external environment (e.g., buildings and foliage). Owing to these limitations, the performance of millimeter wave communication systems may be severely degraded by inaccurate beam selection. Thus, if the base station can perform dynamic beam selection based on its surroundings (e.g., to avoid blockage), performance degradation can be effectively reduced.

Traditional beam direction measurement analyzes the direction of beam selection from field measurement data; however, this manual measurement is time-consuming and does not scale to the densely deployed 5G cellular base stations of the future. In addition, this approach cannot effectively handle moving vehicles and environmental blockage. Given these drawbacks, base stations should be able to autonomously explore, learn, and adapt to their environment in order to achieve accurate beam selection. To this end, we propose to use a Monte Carlo tree search based method in the base station, which autonomously characterizes its surroundings by exploiting the collected context information. In particular, how to relate context information (e.g., the location of the user's vehicle) to the decision outcome (e.g., the beam selection) is critical to making the best decision. To better cope with the large-scale densification of 5G networks, we model the beam selection problem as a contextual multi-armed bandit problem and propose a low-complexity online learning algorithm based on Monte Carlo tree search for millimeter wave base stations. The algorithm enables the millimeter wave base station to learn its decisions autonomously from the relationship between previous decisions and the available context information, and the method adapts well to changes in dynamic systems, such as the occurrence of blockage and changes in traffic patterns.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a beam selection method based on a contextual multi-armed bandit assisted by Monte Carlo tree search. The method is an online learning method, can effectively mitigate performance loss and environmental blockage in millimeter wave transmission, and is suitable for communication between vehicles and millimeter wave base stations.

For convenience of describing the contents of the present invention, a model used in the present invention will be described first, and terms used in the present invention will be defined.

Introduction of the system model: in a radio coverage area, a millimeter wave base station (mmWave Base Station, mmWBS) is a radio transceiver station that carries information between terminals. The invention assumes that the base station is equipped with a processor capable of beam selection, which directionally selects the beams emitted by the large-scale antenna array of the millimeter wave base station. The beam set is assumed to be $F = \{f_1, f_2, \ldots\}$, and all beams have the same size. Considering an application scenario with massive MIMO, the beam set is determined by the number of antennas in the massive array, and thus the size $|F|$ of the beam set is assumed to be infinite. Beam selection at the millimeter wave base station can be described as selecting the optimal $M$ beams. Meanwhile, the invention considers the mobility of the vehicles and uses $N(t)$ to denote the number of vehicles served by the base station at the current time, where $t = 1, 2, \ldots$. The aim of the invention is to optimize the beam set at each time instant so that each vehicle is served by the optimal beam at each time instant.

Definition 1. As shown in FIG. 1, the feature space of the vehicles is denoted $A_T$,

where $m_T$ is the number of partitioned subspaces, i.e., the number of vehicle types, and $a_i$ is the feature space of the $i$-th vehicle type.

Definition 2. As shown in FIG. 1, the set of center points of the vehicle feature space can be represented as

where $v_i$ is the center of the $i$-th sub-vehicle feature space $a_i$,

and $d_u$ is the dimension of the vehicle context features.

Definition 3. The Monte Carlo tree employed in the present invention is a binary tree whose nodes can be represented in the form $(a_i, h, n)$, where $a_i$ is the sub-vehicle feature space type, i.e., the tree label, $h$ is the depth of the node, and $n$ denotes the node numbered $n$ among all nodes of depth $h$. The set of beams contained in each node is denoted by

and satisfies the following properties:

1.

2.

3.

The invention assigns the beams to the nodes of the Monte Carlo tree by clustering on beam features, so that the beams within each node have similar features.

To show the structure of the Monte Carlo tree and the clustering of beams on its nodes more clearly, FIG. 2 shows a Monte Carlo tree, and FIG. 3 shows the clustering of beams on each node of the tree in FIG. 2.
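The node labeling of Definition 3 can be illustrated with a small data structure. The following is a minimal Python sketch, not the patented implementation: the class name `Node`, its field names, and the rule of halving a node's beam list are illustrative assumptions (the text clusters beams by feature similarity). Only the label $(a_i, h, n)$ and the child indices $(h+1, 2n-1)$ and $(h+1, 2n)$ are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node (a_i, h, n) of a binary Monte Carlo tree (hypothetical layout)."""
    label: str                 # a_i: the sub-vehicle feature space this tree serves
    depth: int                 # h: depth of the node, the root has h = 0
    index: int                 # n: position among the nodes of depth h, starting at 1
    beams: List[int] = field(default_factory=list)  # beams clustered into this node
    left: Optional["Node"] = None    # child (a_i, h + 1, 2n - 1)
    right: Optional["Node"] = None   # child (a_i, h + 1, 2n)

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def split(self) -> None:
        """Create the two children. Halving the beam list is an assumption
        standing in for the feature-based clustering described in the text."""
        mid = len(self.beams) // 2
        self.left = Node(self.label, self.depth + 1, 2 * self.index - 1, self.beams[:mid])
        self.right = Node(self.label, self.depth + 1, 2 * self.index, self.beams[mid:])


root = Node("a_1", 0, 1, beams=list(range(8)))
root.split()
print((root.left.depth, root.left.index), (root.right.depth, root.right.index))
```

In this sketch the children together hold exactly the beams of their parent, so a descent through the tree narrows the candidate beams step by step.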

Definition 4.

denotes the total number of times the beams in tree node $(a_i, h, n)$ have been utilized up to time $t$, i.e., the total number of times the beams in the node have been selected by vehicles up to time $t$.

Definition 5. At time $t$, the actual reward of tree node $(a_i, h, n)$ can be expressed as

where $r_m$ denotes the reward obtained the $m$-th time the node was selected.

Definition 6. Taking into account the balance between exploration and exploitation that characterizes the multi-armed bandit, the reward of tree node $(a_i, h, n)$ at time $t$ is defined in the present invention as:

where $c > 0$, $l_1 > 0$, and $0 < \rho < 1$ are constants.

Definition 7. In addition to the characteristics of the multi-armed bandit, the present invention considers the relationship between parent nodes and child nodes in the Monte Carlo tree. The upper bound on the reward of tree node $(a_i, h, n)$ at time $t$,

is defined as follows. When node $(a_i, h, n)$ is a leaf node,

when

holds,

$E_{\max}$ denotes the maximum reward value at the current time; in all remaining cases,
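Definitions 6 and 7 can be illustrated with a hedged sketch. Because the exact expressions are not available in this text, the Python code below assumes the form used in hierarchical optimistic optimization: the empirical mean, plus a confidence term scaled by $c$, plus a depth-dependent term $l_1 \rho^h$, with the upper bound of an internal node capped by the larger of its children's bounds. These expressions, and the function names `reward_estimate` and `upper_bound`, are assumptions rather than the patent's definitions.

```python
import math

E_MAX = math.inf  # maximum reward value; the embodiment sets E_max to infinity


def reward_estimate(mean, count, t, depth, c=1.0, l1=1.0, rho=0.5):
    """Assumed exploration-adjusted reward: mean + confidence + depth term."""
    if count == 0:
        return E_MAX  # a node that has never been used gets the maximum reward
    confidence = math.sqrt(c * math.log(t) / count)
    return mean + confidence + l1 * rho ** depth


def upper_bound(estimate, child_bounds=None):
    """Assumed upper bound: the node's own estimate for a leaf, otherwise
    the estimate capped by the larger of the two child bounds."""
    if not child_bounds:          # leaf node
        return estimate
    return min(estimate, max(child_bounds))


e = reward_estimate(mean=0.4, count=10, t=100, depth=2)
print(round(e, 3), upper_bound(e, child_bounds=[0.9, 1.1]))
```

Under these assumptions, rarely used nodes keep a large confidence term and therefore remain attractive to explore, while the depth term shrinks as the tree is refined.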

Definition 8. In tree

the steps of searching for the optimal path are as follows:

Step 1. Initialize the optimal path $Path = (a_i, 0, 1)$ and the starting point of the current optimal path $(a_i, h, n) = (a_i, 0, 1)$.

Step 2. Iterative judgment: if the starting point $(a_i, h, n)$ of the current optimal path is not a leaf node and

holds, execute step 3; otherwise, execute step 4.

Step 3. If

holds, update the starting point of the current optimal path to

$(a_i, h, n) = (a_i, h+1, 2n)$, add tree node $(a_i, h+1, 2n)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n)$, and return to step 2; if

holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n-1)$, add tree node $(a_i, h+1, 2n-1)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n-1)$, and return to step 2.

Step 4. Output the optimal path $Path$ and the starting point $(a_i, h, n)$ of the current optimal path; this starting point is the only leaf node on the optimal path.

To describe the optimal path search more clearly, FIG. 4 shows the process of performing the optimal path search on the Monte Carlo tree of FIG. 2.
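The descent of Definition 8 can be sketched as follows. This is a minimal Python illustration under stated assumptions: one tree is stored as a dictionary keyed by $(h, n)$, a node counts as a leaf when its child $(h+1, 2n)$ is absent, and the comparison in step 3 (not available in this text) is assumed to pick the child with the larger reward upper bound. The additional condition of step 2 is likewise unavailable and is omitted. The names `optimal_path` and `bounds` are hypothetical.

```python
def optimal_path(bounds):
    """Walk from the root (0, 1) to a leaf, following the larger bound.

    bounds maps (h, n) -> reward upper bound of node (a_i, h, n) in one tree.
    """
    h, n = 0, 1
    path = [(h, n)]
    # A node is treated as a leaf when its children are not in the tree.
    while (h + 1, 2 * n) in bounds:
        left, right = (h + 1, 2 * n - 1), (h + 1, 2 * n)
        # Assumed comparison: move to the child with the larger upper bound.
        h, n = right if bounds[right] >= bounds[left] else left
        path.append((h, n))
    return path, (h, n)


tree = {(0, 1): 1.0, (1, 1): 0.7, (1, 2): 0.9, (2, 3): 0.8, (2, 4): 0.5}
path, leaf = optimal_path(tree)
print(path, leaf)
```

The returned leaf is the last element of the path, which matches the statement in step 4 that the final starting point is the only leaf node on the optimal path.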

Definition 9. In tree

the steps of the reverse update along the optimal path are as follows:

Step 1. In tree

find the optimal path $Path$ and the only leaf node $(a_i, h_{\max}, n)$ on the optimal path, where $h_{\max}$ is the maximum depth, at the current time, of tree

Initialize the iteration count to 1; the iteration starts at leaf node $(a_i, h, n)$, and the maximum number of iterations is $h_{\max}$.

Step 2. When the iteration count is $k$, the node to be updated is $(a_i, h, n^*)$, with

where $h = h_{\max} - k$ is the depth of the node currently being updated. Count the number of times the selected beams in the node are requested at time $t$, and use the sum of these counts as the reward of the beam selection at that time,

which may be expressed as

Step 3. Update the actual average reward of the node:

Step 4. Update the number of times the node has been utilized in the beam selection process:

Step 5. Update the beam selection reward of the node according to Definition 6.

Step 6. Update the upper bound of the beam selection reward of the node according to Definition 7.

Step 7. Set the iteration count $k = k + 1$; if $k > h_{\max}$, the iteration terminates and the reverse update of tree

ends; otherwise, execute step 2.

To describe the reverse update more clearly, FIG. 5 shows the process of performing the backtracking update along the optimal path of FIG. 4.
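The reverse update of Definition 9 can be sketched as a walk from the leaf back to the root. The Python sketch below is illustrative only: it updates every node on the path, takes the reward to be the sum of request counts for the selected beams held by the node, and maintains the running average and the usage count. The dictionary layout and the names `backward_update`, `node_beams`, and `requests` are assumptions. Recomputing the exploration-adjusted reward and its upper bound (steps 5 and 6) is left out because those formulas are not available in this text.

```python
def backward_update(path, stats, node_beams, selected, requests):
    """Update mean reward and usage count for each node on the path.

    path       : list of (h, n) keys from root to leaf
    stats      : (h, n) -> {"mean": float, "count": int}
    node_beams : (h, n) -> set of beam ids clustered into that node
    selected   : set C of beams selected at the current time
    requests   : beam id -> number of requests at the current time
    """
    for key in reversed(path):           # leaf first, root last
        # Reward: total requests for the selected beams inside this node.
        reward = sum(requests.get(b, 0) for b in node_beams[key] & selected)
        s = stats[key]
        s["mean"] = (s["mean"] * s["count"] + reward) / (s["count"] + 1)
        s["count"] += 1
    return stats


path = [(0, 1), (1, 2)]
stats = {k: {"mean": 0.0, "count": 0} for k in path}
node_beams = {(0, 1): {1, 2, 3, 4}, (1, 2): {3, 4}}
backward_update(path, stats, node_beams, selected={2, 3}, requests={2: 5, 3: 1})
print(stats)
```

Because a parent holds all beams of its children, the reward credited to a node in this sketch never decreases on the way up the path.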

Definition 10. The leaf expansion threshold $\eta_h(t)$ is expressed as

The leaf node expansion steps are as follows:

Step 1. The maximum number of iterations is $|\Lambda_a(t)|$, i.e., the number of trees in the set. Initialize the iteration count to 1.

Step 2. When the iteration count is $i$, compute for tree

the tree expansion threshold

Step 3. If

and

is a leaf node of tree

then the leaf node is expanded, i.e., the structure of tree

is updated:

and at the same time the rewards of node

and node

are set to:

Step 4. Update the iteration count $i = i + 1$.

Step 5. If $i > |\Lambda_a(t)|$, stop iterating; otherwise, execute step 3.
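The leaf expansion of Definition 10 can be sketched in a few lines. This Python sketch rests on assumptions: the threshold $\eta_h(t)$ is passed in by the caller because its formula is not available in this text, the expansion condition is taken to be "the usage count of the leaf has reached the threshold", and new children start with reward $E_{\max}$ as in the tree initialization. The names `expand_leaf` and `tree` are hypothetical.

```python
import math

E_MAX = math.inf  # the embodiment sets the maximum reward to infinity


def expand_leaf(tree, leaf, threshold):
    """Add children (h+1, 2n-1) and (h+1, 2n) to a leaf once it is used enough.

    tree      : (h, n) -> {"count": int, "reward": float}
    leaf      : key (h, n) of the leaf reached by the optimal path search
    threshold : expansion threshold eta_h(t), supplied by the caller
    """
    h, n = leaf
    is_leaf = (h + 1, 2 * n) not in tree
    # Assumed condition: expand when the usage count reaches the threshold.
    if is_leaf and tree[leaf]["count"] >= threshold:
        for child in ((h + 1, 2 * n - 1), (h + 1, 2 * n)):
            tree[child] = {"count": 0, "reward": E_MAX}
        return True
    return False


tree = {(0, 1): {"count": 9, "reward": 1.2},
        (1, 1): {"count": 2, "reward": 0.8},
        (1, 2): {"count": 7, "reward": 1.0}}
expanded = (expand_leaf(tree, (1, 2), threshold=5),
            expand_leaf(tree, (1, 1), threshold=5))
print(expanded, sorted(tree))
```

Starting new children at $E_{\max}$ means, under the assumed bound, that each of them is tried before the search settles on either one.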

The technical scheme of the invention is as follows:

Specifically, the method is a beam selection method that performs online learning with a contextual multi-armed bandit model assisted by Monte Carlo tree search. The core of the method is a contextual multi-armed bandit algorithm assisted by Monte Carlo tree search, and the process mainly comprises four parts: optimal path search, optimal beam selection, backtracking update along the optimal path, and expansion of the Monte Carlo tree. Before these, the partitioning of the vehicle context feature space and the initialization of the Monte Carlo trees can be regarded as the preprocessing part of the method.

The technical scheme of the invention is a beam selection method based on Monte Carlo tree search assistance, which comprises the following steps:

Step 1. Partition the user context feature space.

According to the context features of all vehicles, the vehicle feature space $A_T$ is divided into $m_T$ sub-vehicle feature spaces.

Step 2. Initialize the Monte Carlo trees.

When $t = 1$, initialize $m_T$ binary trees

where

denotes the binary tree of vehicle feature space $a_i$, $(a_i, 0, 1)$ denotes the root node of the binary tree, and $(a_i, 1, 1)$ and $(a_i, 1, 2)$ denote its two leaf nodes. Initialize the reward values of node $(a_i, 1, 1)$ and node $(a_i, 1, 2)$,

where $E_{\max}$ denotes the maximum reward value at the current time.

Step 3. At time $t$, observe the number $N(t)$ of vehicles served by the millimeter wave base station, extract the context feature $x(t)$ of each vehicle, and vectorize it; the context feature of the $j$-th vehicle can be represented as $x_j(t)$,

where $d_u$ denotes the dimension of the vehicle context feature.

Step 4. According to the extracted vehicle context features, each vehicle selects its vehicle type. The selection criterion is: if the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then

holds, where $\|\cdot\|_2$ denotes the two-norm and the set of vehicle feature space center points is represented as

Step 5. If the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then in tree

perform the optimal path search to obtain the leaf node with the highest reward value for the $j$-th vehicle; all beams on that leaf node are taken as the recommended optimal beams of the $j$-th vehicle at time $t$. Repeat step 5 until all vehicles served by the millimeter wave base station at the current time have been traversed.

Step 6. From the recommended optimal beams of all vehicles, select the $M$ beams with the best performance and put them into the beam selection set $C$ of the current time, $C = \{c_1(t), c_2(t), \ldots, c_M(t)\}$.

Step 7. Count the number of requests of each vehicle for each beam in the beam selection set $C$ at time $t$; the number of requests of the $j$-th vehicle for beam $m$ of the beam selection set $C$ can be represented as $d_{j,m}$, $j = 1, 2, \ldots, N(t)$, $m = 1, 2, \ldots, M$.

Step 8. For the $j$-th vehicle, in the tree of the corresponding feature space $a_i$,

update the reward values of the nodes and the number of times their beams have been selected by running the reverse update along the optimal path. Repeat step 8 until all vehicles have been traversed.

Step 9. From $a(t) = (a_i(t))$, $i = 1, 2, \ldots, N(t)$,

select the non-repeating set of vehicle feature subspaces $\Lambda_a(t)$.

Step 10. For each feature subspace $a_i$ in $\Lambda_a(t)$, take the corresponding tree

and determine whether to perform leaf node expansion. Repeat step 10 until all trees of the feature subspaces in $\Lambda_a(t)$ have been traversed.

Step 11. Set $t = t + 1$ and return to step 3.
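Step 4 above assigns each vehicle to a sub-feature space using a two-norm criterion over the center points. The following is a minimal Python sketch, assuming the criterion is "nearest center" (the inequality itself is not available in this text) and using made-up two-dimensional centers. The function name `assign_subspace` is hypothetical.

```python
import math


def assign_subspace(x, centers):
    """Return the index i of the center v_i closest to x in the two-norm."""
    distances = [math.dist(x, v) for v in centers]
    return distances.index(min(distances))


# Hypothetical centers v_1..v_4 of four sub-vehicle feature spaces (d_u = 2).
centers = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
vehicles = [(0.1, 0.2), (0.9, 0.6), (0.3, 0.8)]
assigned = [assign_subspace(x, centers) for x in vehicles]
print(assigned)
```

Indices here are zero-based, so index 0 corresponds to $a_1$. The set of distinct indices in `assigned` plays the role of the non-repeating set $\Lambda_a(t)$ in step 9.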

Further, the step of searching for the optimal path in step 5 is as follows:

Step 5.1. Initialize the optimal path $Path = (a_i, 0, 1)$ and the starting point of the current optimal path $(a_i, h, n) = (a_i, 0, 1)$.

Step 5.2. Iterative judgment: if the starting point $(a_i, h, n)$ of the current optimal path is not a leaf node and

holds, execute step 5.3; otherwise, execute step 5.4.

Step 5.3. If

holds, update the starting point of the current optimal path to

$(a_i, h, n) = (a_i, h+1, 2n)$, add tree node $(a_i, h+1, 2n)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n)$, and return to step 5.2; if

holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n-1)$, add tree node $(a_i, h+1, 2n-1)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n-1)$, and return to step 5.2.

Step 5.4. Output the optimal path $Path$ and the starting point $(a_i, h, n)$ of the current optimal path; this starting point is the only leaf node on the optimal path.

Further, in step 8, in tree

the steps of the reverse update along the optimal path are as follows:

Step 8.1. In tree

find the optimal path $Path$ and the only leaf node $(a_i, h_{\max}, n)$ on it, where $h_{\max}$ is the maximum depth of the tree at the current time. Initialize the iteration count to 1; the iteration starts at leaf node $(a_i, h, n)$, and the maximum number of iterations is $h_{\max}$.

Step 8.2. When the iteration count is $k$, the node to be updated is $(a_i, h, n^*)$, with

where $h = h_{\max} - k$ is the depth of the node currently being updated. Count the number of times the selected beams in the node are requested at time $t$, and use the sum of these counts as the reward at that time,

which may be expressed as

where $C$ is the beam selection set.

Step 8.3. Update the actual average reward of the node:

Step 8.4. Update the number of times the node has been utilized in the beam selection process:

Step 8.5. Update the beam selection reward of the node.

Step 8.6. Update the upper bound of the beam selection reward of the node.

Step 8.7. Set the iteration count $k = k + 1$; if $k > h_{\max}$, the iteration terminates and the reverse update of tree

ends; otherwise, execute step 8.2.

Further, the method for determining whether to perform leaf node expansion in step 10 is as follows:

The leaf expansion threshold is

Step 10.1. The maximum number of iterations is $|\Lambda_a(t)|$, i.e., the number of trees in the set. Initialize the iteration count to 1.

Step 10.2. When the iteration count is $i$, compute for tree

the tree expansion threshold

Step 10.3. If

and

is a leaf node of tree

then the leaf node is expanded, i.e., the structure of tree

is updated:

and at the same time the rewards of node

and node

are set to:

Step 10.4. Update the iteration count $i = i + 1$.

Step 10.5. If $i > |\Lambda_a(t)|$, stop iterating; otherwise, execute step 10.3.

The beneficial effects of the invention are as follows: first, the susceptibility of millimeter wave communication performance to attenuation is effectively addressed by exploiting the context information of the vehicle system; in addition, the Monte Carlo tree search method adopted by the invention handles large-scale network data well and better meets the requirements of practical communication environments.

Drawings

FIG. 1 is a schematic diagram of vehicle feature space partitioning;

FIG. 2 is a schematic diagram of a Monte Carlo tree structure according to the present invention;

FIG. 3 is a schematic diagram of beam selection feature partitioning;

FIG. 4 is a schematic diagram of a Monte Carlo tree optimal path method;

FIG. 5 is a diagram illustrating a Monte Carlo tree backtracking update method;

FIG. 6 is a flow chart of the beam selection method of the present invention.

Detailed Description

The technical solution of the present invention is described in detail below according to a specific embodiment. It should be understood that the scope of the present invention is not limited to the following examples, and any techniques implemented based on the present disclosure are within the scope of the present invention.

The data used by the specific embodiment of the present invention are described first. The data come from a database named MovieLens and comprise a total of 1000209 ratings of 3952 movies by 6040 users between 2000 and 2003. The present invention treats each user's rating of a movie as a beam selection made by a vehicle toward the millimeter wave base station.

Secondly, according to practical situations, the initialization settings of the parameters of the embodiment of the present invention are as follows:

The time horizon $T$ is set to 8760 hours, with consecutive time slots 1 hour apart. The user's context features are age and gender only, taking the values adult or minor and male or female respectively, i.e., the vehicle feature space $A_T$ is divided into $m_T = 4$ sub-vehicle feature spaces. The movies are divided into 10 feature categories by a semantic algorithm. The number of beams selected by the base station is set to $M = 16$, i.e., at most 16 beams can be selected. The maximum beam selection reward of a tree node is $E_{\max} = \infty$.

The three constants in Definition 6 are set to:

$\rho = 0.5$ and

FIG. 6 shows a flow chart of the implementation of the method proposed by the present invention. The method comprises the following steps:

Step 1. Partition the user context feature space, i.e., divide the vehicle feature space $A_T$ into 4 sub-vehicle feature spaces.

Step 2. Initialize the Monte Carlo trees, i.e., initialize 4 binary trees $\Gamma$ when $t = 1$, where

denotes the binary tree of vehicle feature space $a_i$,

and at the same time initialize the reward values of node $(a_i, 1, 1)$ and node $(a_i, 1, 2)$,

Step 3. At time $t$, observe the number $N(t)$ of vehicles served by the millimeter wave base station, extract the context feature $x(t)$ of each vehicle, and vectorize it, i.e., the context feature of the $j$-th vehicle can be represented as $x_j(t)$,

Step 4. Select the subspace type of each vehicle according to the extracted vehicle context features.

Step 5. If the $j$-th vehicle belongs to type $a_i$, then in tree

perform the optimal path search. Repeat step 5 until all vehicles served by the millimeter wave base station at the current time have been traversed.

Step 6. From the recommended optimal beams of all vehicles, select the $M$ beams with the highest frequency of occurrence and put them into the beam selection set $C$ of the current time, which may be denoted as $C = \{c_1(t), c_2(t), \ldots, c_M(t)\}$.

Step 7. Count the number of requests of each vehicle for each beam in the optimal beam selection set $C$ at time $t$. The number of requests of the $j$-th vehicle for beam $m$ of the optimal beam selection set $C$ may be denoted as $d_{j,m}$, $j = 1, 2, \ldots, N(t)$, $m = 1, 2, \ldots, M$.

Step 8. For the $j$-th vehicle, in the tree of the corresponding feature space $a_i$,

update the reward values of the nodes and the number of times their beams have been selected by backtracking along the optimal path. Repeat step 8 until all vehicles served by the millimeter wave base station at the current time have been traversed.

Step 9. From $a(t) = (a_i(t))$, $i = 1, 2, \ldots, N(t)$,

select the non-repeating set of vehicle feature subspaces $\Lambda_a(t)$.

Step 10. For each feature subspace $a_i$ in $\Lambda_a(t)$, take the corresponding tree

and determine whether to perform leaf node expansion. Repeat until all trees of the feature subspaces in $\Lambda_a(t)$ have been traversed.

Step 11. If $t < 8760$, set $t = t + 1$ and return to step 3; otherwise, exit the loop.
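Step 6 of the embodiment picks the $M$ beams that occur most often among the beams recommended to all vehicles. A minimal Python sketch of that frequency count follows. The function name `select_beams` and the sample data are made up for illustration, and tie-breaking simply follows the first-seen order used by `Counter.most_common`.

```python
from collections import Counter


def select_beams(recommended, m):
    """Pick the m beams that appear most often across all recommendations.

    recommended : list of beam-id lists, one list per vehicle
    m           : number of beams M the base station may select
    """
    counts = Counter(b for beams in recommended for b in beams)
    return [beam for beam, _ in counts.most_common(m)]


recommended = [[1, 2, 3], [2, 3], [3, 4], [2, 3]]
chosen = select_beams(recommended, 2)
print(chosen)
```

With $M = 16$ as in the embodiment, the same call would return up to 16 beams, or fewer if the vehicles were recommended fewer distinct beams.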

Claims

1. A beam selection method based on Monte Carlo tree search assistance, the method comprising:

step 1, dividing a user context feature space;

step 2, initializing and setting a Monte Carlo tree;

when $t = 1$, initializing $m_T$ binary trees

wherein

$E_{\max}$ denotes the maximum reward value at the current time;

$d_u$ denotes the dimension of the vehicle context feature;

step 4, according to the extracted vehicle context features, each vehicle selects its vehicle type; the selection criterion is that, if the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then

wherein $v_i$ denotes the center of the $i$-th sub-vehicle feature space $a_i$,

step 7, counting the number of requests of each vehicle for each beam in the beam selection set $C$ at time $t$; wherein the number of requests of the $j$-th vehicle for beam $m$ of the beam selection set $C$ may be represented as $d_{j,m}$, $j = 1, 2, \ldots, N(t)$, $m = 1, 2, \ldots, M$;

the reward value of the node and the number of times its beams have been selected are updated by the reverse update along the optimal path; repeating step 8 until all vehicles are traversed;

step 9, from $a(t) = (a_i(t))$, $i = 1, 2, \ldots, N(t)$,

selecting a non-repeating set of vehicle feature subspaces $\Lambda_a(t)$;

step 10, for each feature subspace $a_i$ in $\Lambda_a(t)$, the corresponding tree

step 11, setting $t = t + 1$ and returning to step 3.

2. The method of claim 1, wherein the step of searching for the optimal path in step 5 comprises:

if so, executing step 5.3; otherwise, executing step 5.4;

step 5.3, if

holds,

the starting point of the current optimal path is updated to $(a_i, h, n) = (a_i, h+1, 2n)$, and tree node $(a_i, h+1, 2n)$ is added to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n)$, returning to step 5.2;

if

holds, the starting point of the current optimal path is updated to $(a_i, h, n) = (a_i, h+1, 2n-1)$, and tree node $(a_i, h+1, 2n-1)$ is added to the optimal path,

i.e., $Path = Path \cup (a_i, h+1, 2n-1)$, returning to step 5.2;

3. The method of claim 1, wherein the step 8 is performed in a tree in a monte carlo tree search assisted beam selection method