CN109831236B - Beam selection method based on Monte Carlo tree search assistance - Google Patents


Info

Publication number
CN109831236B
Authority
CN
China
Prior art keywords
vehicle
tree
node
optimal path
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811346507.7A
Other languages
Chinese (zh)
Other versions
CN109831236A (en)
Inventor
陈特
董彬虹
陈延涛
张存林
曹蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201811346507.7A
Publication of CN109831236A
Application granted
Publication of CN109831236B
Legal status: Active
Anticipated expiration

Landscapes

  • Traffic Control Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a beam selection method based on Monte Carlo tree search assistance. It belongs to the field of millimeter wave vehicle communication systems and mainly relates to a method for selecting the optimal beam between a millimeter wave base station and a dynamically moving vehicle. To address the defects of the prior art, the invention provides a beam selection method based on a contextual multi-armed bandit assisted by Monte Carlo tree search. The method is an online learning method that can effectively mitigate performance loss and environmental blockage in millimeter wave transmission, and it is suitable for communication between vehicles and millimeter wave base stations. First, the susceptibility of millimeter wave communication to attenuation is effectively addressed by exploiting the context information of the vehicle system; in addition, the Monte Carlo tree search method adopted by the invention can handle large-scale network data well and better meets the requirements of practical communication environments.

Description

Beam selection method based on Monte Carlo tree search assistance
Technical Field
The invention belongs to the field of millimeter wave vehicle communication systems and mainly relates to a method for selecting the optimal beam between a millimeter wave base station and a dynamically moving vehicle.
Background
In recent years, research on next-generation vehicle-to-base-station communication has focused on how to design multi-Gbps links, a technology considered key to implementing 5G vehicle-to-everything (V2X) communication. Multi-Gbps links provide high data rates, enabling vehicle communication systems to acquire accurate sensor data (e.g., ultra-high-definition real-time maps), which is critical for (semi-)autonomous vehicles. The 4G LTE-A systems in use today (operating below 6 GHz) often suffer from congestion during communication, which causes problems such as communication interruption. The 5G communication systems under development plan to overcome this obstacle by using the underutilized millimeter wave band (10-300 GHz). Millimeter wave communication is characterized by high frequencies and short wavelengths, and compared with traditional channels, millimeter wave channels suffer from higher path loss and penetration loss. Recent research on vehicular communication systems has shown that (1) directional transmission and beamforming can compensate for the high path loss of millimeter wave communication, and (2) dense base station deployment can compensate for the short communication range of the millimeter wave band (100-150 m). These solutions make millimeter wave communication feasible. However, the design of millimeter wave communication systems also faces many new challenges. First, omnidirectional transmission is generally adopted in the band below 6 GHz, whereas the millimeter wave band requires precise beam alignment between the base station and the vehicle. Second, millimeter wave signals are prone to blockage (e.g., by buildings and foliage in the external environment) because of their high penetration loss.
Due to these limitations, the performance of millimeter wave communication systems may be severely hampered by inaccurate beam selection. Thus, if the base station can be made to perform dynamic beam selection based on its surroundings (e.g., to avoid blocking), performance degradation can be effectively reduced.
Traditional beam direction measurement analyzes the direction of beam selection from actual measurement data. However, such manual measurement is time-consuming and does not scale to the densely deployed 5G cellular base stations of the future. In addition, this approach does not effectively handle moving vehicles and environmental blockage. Given these drawbacks, base stations should be able to autonomously explore, learn and adapt to their environment in order to achieve accurate beam selection. To this end, we propose that the base station use a Monte Carlo tree search method that autonomously characterizes its surroundings by exploiting the collected context information. In particular, how to relate context information (e.g., the location of the user's vehicle) to decision outcomes (e.g., beam selection) is critical to making the best decision. To better cope with the large-scale densification of 5G networks, we model the beam selection problem as a contextual multi-armed bandit problem and propose a low-complexity online learning algorithm based on Monte Carlo tree search for millimeter wave base stations. The algorithm enables the millimeter wave base station to learn decisions autonomously from the relationship between previous decisions and the available context information, and it adapts well to dynamic conditions such as the occurrence of blockage and changes in traffic patterns.
Disclosure of Invention
To address the defects of the prior art, the invention provides a beam selection method based on a contextual multi-armed bandit assisted by Monte Carlo tree search. The method is an online learning method that can effectively mitigate performance loss and environmental blockage in millimeter wave transmission, and it is suitable for communication between vehicles and millimeter wave base stations.
For convenience of describing the contents of the present invention, a model used in the present invention will be described first, and terms used in the present invention will be defined.
Introduction of the system model: within a radio coverage area, a millimeter wave base station (mmWave Base Station, mmWBS) is a radio transceiver station that relays information between terminals. The invention assumes that the base station is equipped with a processor capable of beam selection, which directionally selects the beams emitted by the large-scale antenna array of the millimeter wave base station. The beam set is assumed to be $F = \{f_1, f_2, \ldots\}$, and all beams are of the same size. In a massive MIMO scenario the beam set is determined by the number of antennas, so the size $|F|$ of the beam set is assumed to be infinite. Beam selection at the millimeter wave base station can be described as selecting the optimal $M$ beams. The invention also takes vehicle mobility into account and uses $N(t)$ to denote the number of vehicles served by the base station at time $t$, where $t = 1, 2, \ldots, T$. The aim of the invention is to optimize the set of selected beams at each time so that every vehicle is served by its optimal beam at that time.
Definition 1. As shown in FIG. 1, the feature space of the vehicles is denoted $A_T$:
Figure RE-RE-GDA0001993485490000021
where $m_T$ denotes the number of partitioned subspaces, i.e., the number of vehicle types, and $a_i$ denotes the feature space of the $i$-th vehicle type.
Definition 2, as shown in fig. 1, the set of the center points of the vehicle feature space can be represented as
Figure RE-RE-GDA0001993485490000022
where $v_i$ denotes the center of the $i$-th sub-vehicle feature space $a_i$,
Figure RE-RE-GDA0001993485490000023
and $d_u$ denotes the dimension of the vehicle context features.
Definition 3. The Monte Carlo tree employed in the present invention is a binary tree whose nodes are written in the form $(a_i, h, n)$, where $a_i$ is a sub-vehicle feature space type and serves as the tree label, $h$ is the depth of the node in the tree, and $n$ is the index of the node among all nodes at depth $h$. The set of beams contained in each node is denoted
Figure RE-RE-GDA0001993485490000024
and satisfies the following properties:
1.
Figure RE-RE-GDA0001993485490000025
2.
Figure RE-RE-GDA0001993485490000026
3.
Figure RE-RE-GDA0001993485490000031
The invention assigns beams to the nodes of the Monte Carlo tree by clustering on beam features, so that the beams within one node have similar features.
To show the structure of the Monte Carlo tree and the clustering of beams on its nodes more clearly, FIG. 2 shows a Monte Carlo tree and FIG. 3 shows the clustering of beams on each node of the tree in FIG. 2.
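As a hedged illustration of the node labelling in Definition 3, the following Python sketch encodes a node as a tuple (label, h, n) and derives child and parent indices. The child indices (h+1, 2n-1) and (h+1, 2n) follow the optimal path search steps in Definition 8; the names `Node`, `children` and `parent`, and the parent formula derived from the child indices, are assumptions introduced here for illustration only.

```python
from typing import Tuple

# (tree label a_i, depth h, index n); n is 1-based within a depth level
Node = Tuple[str, int, int]


def children(node: Node) -> Tuple[Node, Node]:
    """Return the two children (h+1, 2n-1) and (h+1, 2n) of a node."""
    label, h, n = node
    return (label, h + 1, 2 * n - 1), (label, h + 1, 2 * n)


def parent(node: Node) -> Node:
    """Return the parent of a node; the root (depth 0) has none."""
    label, h, n = node
    if h == 0:
        raise ValueError("the root has no parent")
    return (label, h - 1, (n + 1) // 2)


root: Node = ("a1", 0, 1)
left, right = children(root)  # the two leaf nodes created at initialization
```

With this indexing, the root $(a_i, 0, 1)$ has the children $(a_i, 1, 1)$ and $(a_i, 1, 2)$, which matches the initialization of the binary trees described later in the text.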
Definition 4,
Figure RE-RE-GDA00019934854900000313
denotes the total number of times that the beams of tree node $(a_i, h, n)$ have been used up to time $t$, i.e., the total number of times vehicles have selected beams in that node up to time $t$.
Definition 5. At time $t$, the actual reward of tree node $(a_i, h, n)$ can be expressed as
Figure RE-RE-GDA0001993485490000032
where $r_m$ denotes the reward obtained the $m$-th time the node was selected.
Definition 6. To balance exploration and exploitation, which is characteristic of the multi-armed bandit, the reward of tree node $(a_i, h, n)$ at time $t$ is defined in the present invention as:
Figure RE-RE-GDA0001993485490000033
where $c > 0$, $l_1 > 0$ and $0 < \rho < 1$ are constants.
Definition 7. In addition to the multi-armed bandit characteristics, the invention considers the relationship between parent and child nodes in the Monte Carlo tree. The upper bound on the reward of tree node $(a_i, h, n)$ at time $t$,
Figure RE-RE-GDA0001993485490000034
is defined as follows. When node $(a_i, h, n)$ is a leaf node,
Figure RE-RE-GDA0001993485490000035
when
Figure RE-RE-GDA0001993485490000036
holds,
Figure RE-RE-GDA0001993485490000037
where $E_{max}$ denotes the maximum reward value at the current time; in all other cases,
Figure RE-RE-GDA0001993485490000038
Definition 8. In tree
Figure RE-RE-GDA0001993485490000039
The steps of searching for the optimal path are as follows:
Step 1. Initialize the optimal path $Path = (a_i, 0, 1)$ and the starting point of the current optimal path $(a_i, h, n) = (a_i, 0, 1)$,
Figure RE-RE-GDA00019934854900000310
Step 2. Iterative check: if the starting point $(a_i, h, n)$ of the current optimal path is not a leaf node and
Figure RE-RE-GDA00019934854900000311
holds, go to step 3; otherwise, go to step 4.
Step 3. If
Figure RE-RE-GDA00019934854900000312
holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n)$, add tree node $(a_i, h+1, 2n)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n)$, and return to step 2; if
Figure RE-RE-GDA0001993485490000041
holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n-1)$, add tree node $(a_i, h+1, 2n-1)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n-1)$, and return to step 2.
Step 4. Output the optimal path $Path$ and the starting point $(a_i, h, n)$ of the current optimal path; this starting point is the only leaf node on the optimal path.
To describe the optimal path search more clearly, FIG. 4 shows the search performed on the Monte Carlo tree of FIG. 2.
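The descent in Definition 8 can be sketched in Python as follows. The comparison conditions of steps 2 and 3 are given only as figures, so this sketch assumes that the search moves to the child with the larger reward upper bound (ties go to child 2n-1) and stops at the first node that has no children. The dictionary `b_values` and the function name `optimal_path` are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (depth h, index n) inside one tree


def optimal_path(b_values: Dict[Key, float]) -> Tuple[List[Key], Key]:
    """Walk from the root (0, 1) to a leaf, following the larger upper bound.

    b_values maps every existing node to its reward upper bound.
    Returns the visited path and the leaf that ends it.
    """
    node: Key = (0, 1)
    path = [node]
    while True:
        h, n = node
        left, right = (h + 1, 2 * n - 1), (h + 1, 2 * n)
        # Assumption: a node without stored children is a leaf.
        if left not in b_values or right not in b_values:
            return path, node
        # Assumption: descend into the child with the larger upper bound.
        node = right if b_values[right] > b_values[left] else left
        path.append(node)


bounds = {(0, 1): 1.0, (1, 1): 0.4, (1, 2): 0.9, (2, 3): 0.7, (2, 4): 0.2}
path, leaf = optimal_path(bounds)
```

In this toy tree the search visits the root, then node (1, 2), then node (2, 3), so the returned path contains exactly one leaf, as step 4 requires.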
Definition 9. In tree
Figure RE-RE-GDA0001993485490000042
The steps of reverse updating along the optimal path are as follows:
Step 1. In tree
Figure RE-RE-GDA0001993485490000043
find the optimal path $Path$ and the only leaf node $(a_i, h_{max}, n)$ on it, where $h_{max}$ is the maximum depth, at the current time, of tree
Figure RE-RE-GDA0001993485490000044
The iteration counter is initialized to 1, the iteration starts at leaf node $(a_i, h, n)$, and the maximum number of iterations is $h_{max}$.
Step 2. At iteration $k$, the node to be updated is $(a_i, h, n^*)$, with
Figure RE-RE-GDA0001993485490000045
where $h = h_{max} - k$ is the depth of the node being updated. Count the number of times the selected beams in the node are requested at time $t$, and use the sum of these counts as the reward of the beam selection at that time,
Figure RE-RE-GDA0001993485490000046
which can be expressed as
Figure RE-RE-GDA0001993485490000047
Step 3. Update the actual average reward of the node:
Figure RE-RE-GDA0001993485490000048
Step 4. Update the number of times the node has been used in the beam selection process:
Figure RE-RE-GDA0001993485490000049
Step 5. Update the beam selection reward of the node according to Definition 6:
Figure RE-RE-GDA00019934854900000410
Step 6. Update the upper bound of the beam selection reward of the node according to Definition 7:
Figure RE-RE-GDA00019934854900000411
Step 7. Set $k = k + 1$. If $k > h_{max}$, the iteration terminates and the reverse update of tree
Figure RE-RE-GDA00019934854900000412
ends; otherwise, go to step 2.
To describe the reverse update more clearly, FIG. 5 shows the backtracking update performed along the optimal path of FIG. 4.
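The reverse update of Definition 9 can be sketched as below. The formulas of Definitions 5 to 7 are available only as figures, so every formula in this Python sketch is an assumption modelled on common tree-based bandit algorithms: a running mean for the actual reward, a bonus of the form sqrt(c log t / count) + l1 * rho^h for the reward, and min(reward, max of the children's bounds) for the upper bound. The default values of `c` and `l1`, the class `Stats` and the function `backup` are hypothetical, and the sketch uses a single scalar reward for the whole path, whereas the text computes the reward per node from its own beams.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (depth h, index n)


@dataclass
class Stats:
    count: int = 0       # times the node's beams were used
    mean: float = 0.0    # actual average reward
    u: float = math.inf  # reward with exploration bonus (assumed form)
    b: float = math.inf  # reward upper bound (assumed form)


def backup(tree: Dict[Key, Stats], path: List[Key], reward: float, t: int,
           c: float = 2.0, l1: float = 1.0, rho: float = 0.5) -> None:
    """Update statistics from the leaf back to the root along the path."""
    for h, n in reversed(path):
        s = tree[(h, n)]
        # Running mean first, then the usage counter (steps 3 and 4).
        s.mean = (s.count * s.mean + reward) / (s.count + 1)
        s.count += 1
        # Assumed exploration bonus; not the patent's exact formula.
        s.u = s.mean + math.sqrt(c * math.log(t) / s.count) + l1 * rho ** h
        kids = [tree[k] for k in ((h + 1, 2 * n - 1), (h + 1, 2 * n))
                if k in tree]
        # Leaf: bound equals reward; otherwise cap by the best child bound.
        s.b = s.u if not kids else min(s.u, max(k.b for k in kids))


tree = {(0, 1): Stats(), (1, 1): Stats(), (1, 2): Stats()}
backup(tree, [(0, 1), (1, 2)], reward=3.0, t=1)
```

Processing the path leaf-first matters here: the bound of a parent reads the bounds of its children, so the children must already hold their updated values.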
Definition 10. The leaf expansion threshold $\eta_h(t)$ is given by
Figure RE-RE-GDA0001993485490000051
The leaf node expansion steps are as follows:
Step 1. The maximum number of iterations is $|\Lambda_{a(t)}|$, i.e., the number of trees in the set. The iteration counter is initialized to 1.
Step 2. At iteration $i$, compute for tree
Figure RE-RE-GDA0001993485490000052
the tree expansion threshold
Figure RE-RE-GDA0001993485490000053
Step 3. If
Figure RE-RE-GDA0001993485490000054
holds and
Figure RE-RE-GDA0001993485490000055
is a leaf node of tree
Figure RE-RE-GDA0001993485490000056
then expand the leaf node, i.e., update the structure of tree
Figure RE-RE-GDA0001993485490000057
as follows:
Figure RE-RE-GDA0001993485490000058
At the same time, set the rewards of node
Figure RE-RE-GDA0001993485490000059
and node
Figure RE-RE-GDA00019934854900000510
to:
Figure RE-RE-GDA00019934854900000511
Step 4. Update the iteration counter: $i = i + 1$.
Step 5. If $i > |\Lambda_{a(t)}|$, stop iterating; otherwise, go to step 3.
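The expansion test of Definition 10 can be sketched as follows. The threshold formula and the expansion condition are given only as figures, so this Python sketch assumes that a leaf is expanded once its usage count reaches a depth-dependent threshold, and that both new children start with the maximum reward value. The function `expand_leaf`, its parameters and the `threshold` callable are hypothetical placeholders, not the patent's formula.

```python
import math
from typing import Callable, Dict, Tuple

Key = Tuple[int, int]  # (depth h, index n)


def expand_leaf(counts: Dict[Key, int], rewards: Dict[Key, float], leaf: Key,
                threshold: Callable[[int], float],
                e_max: float = math.inf) -> bool:
    """Split a leaf into two children if it has been used often enough."""
    h, n = leaf
    left, right = (h + 1, 2 * n - 1), (h + 1, 2 * n)
    # Only true leaves can be expanded, and only past the threshold.
    if left in counts or counts[leaf] < threshold(h):
        return False
    for child in (left, right):
        counts[child] = 0         # new nodes have never been used
        rewards[child] = e_max    # assumed initial reward of new nodes
    return True


counts = {(0, 1): 5, (1, 1): 1, (1, 2): 4}
rewards = {(0, 1): 1.0, (1, 1): 0.5, (1, 2): 0.8}
expanded = expand_leaf(counts, rewards, (1, 2), threshold=lambda h: 3)
```

Starting new children at the maximum reward value makes them attractive to the next optimal path search, which is one common way to force unexplored nodes to be tried.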
The technical scheme of the invention is as follows:
Specifically, the method is a beam selection method that learns online using a contextual multi-armed bandit model assisted by Monte Carlo tree search. The core of the method is the Monte Carlo tree search assisted contextual multi-armed bandit algorithm, whose procedure consists of four parts: optimal path search, optimal beam selection, backtracking update along the optimal path, and expansion of the Monte Carlo tree. The partitioning of the vehicle context feature space and the initialization of the Monte Carlo trees that precede these parts can be regarded as the preprocessing part of the method.
The technical scheme of the invention is a beam selection method based on Monte Carlo tree search assistance, which comprises the following steps:
step 1, dividing a user context feature space;
According to the context features of all vehicles, the vehicle feature space $A_T$ is divided into $m_T$ sub-vehicle feature spaces;
step 2, initializing and setting a Monte Carlo tree;
When $t = 1$, initialize $m_T$ binary trees
Figure RE-RE-GDA00019934854900000512
Figure RE-RE-GDA00019934854900000513
Wherein
Figure RE-RE-GDA00019934854900000514
denotes the binary tree of vehicle feature space $a_i$; $(a_i, 0, 1)$ denotes the root node of the binary tree, and $(a_i, 1, 1)$ and $(a_i, 1, 2)$ denote its two leaf nodes. Initialize the reward values of node $(a_i, 1, 1)$ and node $(a_i, 1, 2)$:
Figure RE-RE-GDA00019934854900000515
where $E_{max}$ denotes the maximum reward value at the current time;
Step 3. At time $t$, observe the number $N(t)$ of vehicles served by the millimeter wave base station, extract the context features $x(t)$ of each vehicle and vectorize them; the context feature of the $j$-th vehicle can be represented as $x_j(t)$,
Figure RE-RE-GDA0001993485490000061
where $d_u$ denotes the dimension of the vehicle context features;
Step 4. According to the extracted vehicle context features, each vehicle selects its vehicle type. The selection criterion is: if the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then
Figure RE-RE-GDA0001993485490000062
holds, where $\|\cdot\|_2$ denotes the two-norm and the set of vehicle feature space center points is represented as
Figure RE-RE-GDA0001993485490000063
where $v_i$ denotes the center of the $i$-th sub-vehicle feature space $a_i$,
Figure RE-RE-GDA0001993485490000064
Step 5. If the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then in tree
Figure RE-RE-GDA0001993485490000065
perform the optimal path search to obtain the leaf node with the highest reward value for the $j$-th vehicle; all beams on that leaf node are used as the recommended optimal beams of the $j$-th vehicle at time $t$. Repeat step 5 until all vehicles served by the millimeter wave base station at the current time have been traversed;
Step 6. Select the $M$ best-performing beams from the recommended optimal beams of all vehicles and put them into the beam selection set $C$ of the current time, where $C = \{c_1(t), c_2(t), \ldots, c_M(t)\}$;
Step 7. Count the number of requests of each vehicle for each beam in the beam selection set $C$ at time $t$; the number of requests of the $j$-th vehicle for beam $m$ of the beam selection set $C$ can be represented as $d_{j,m}$, $j = 1, 2, \ldots, N(t)$, $m = 1, 2, \ldots, M$;
Step 8. For the $j$-th vehicle, on the tree
Figure RE-RE-GDA0001993485490000066
of the corresponding feature space $a_i$, update the node reward values and the beam selection counts in reverse along the optimal path; repeat step 8 until all vehicles have been traversed;
Step 9. From $a(t) = (a_i(t))$, $i = 1, 2, \ldots, N(t)$, select the set of distinct vehicle feature subspaces $\Lambda_{a(t)}$;
Step 10. For the tree
Figure RE-RE-GDA0001993485490000067
corresponding to each feature subspace $a_i$ in $\Lambda_{a(t)}$, determine whether to perform leaf node expansion; repeat step 10 until all trees of the feature subspaces in $\Lambda_{a(t)}$ have been traversed;
Step 11. Set $t = t + 1$ and return to step 3.
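Step 4 above assigns each vehicle to a sub-vehicle feature space through a two-norm criterion whose exact formula is given only as a figure. A minimal Python sketch, assuming the criterion is "choose the subspace whose center point $v_i$ is nearest to the context vector $x_j(t)$", is shown below; the function name `assign_subspace` and the example center points are hypothetical.

```python
import math
from typing import Sequence


def assign_subspace(x: Sequence[float],
                    centers: Sequence[Sequence[float]]) -> int:
    """Return the index i of the center v_i closest to x in Euclidean norm."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


# Four illustrative centers for a two-dimensional context feature.
centers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
vehicle_type = assign_subspace((0.9, 0.2), centers)
```

The returned index then selects the tree in which the optimal path search of step 5 is carried out for that vehicle.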
Further, the step of searching for the optimal path in step 5 is as follows:
Step 5.1. Initialize the optimal path $Path = (a_i, 0, 1)$ and the starting point of the current optimal path $(a_i, h, n) = (a_i, 0, 1)$,
Figure RE-RE-GDA0001993485490000071
Step 5.2. Iterative check: if the starting point $(a_i, h, n)$ of the current optimal path is not a leaf node and
Figure RE-RE-GDA0001993485490000072
holds, go to step 5.3; otherwise, go to step 5.4;
Step 5.3. If
Figure RE-RE-GDA0001993485490000073
holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n)$, add tree node $(a_i, h+1, 2n)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n)$, and return to step 5.2; if
Figure RE-RE-GDA0001993485490000074
holds, update the starting point of the current optimal path to $(a_i, h, n) = (a_i, h+1, 2n-1)$, add tree node $(a_i, h+1, 2n-1)$ to the optimal path, i.e., $Path = Path \cup (a_i, h+1, 2n-1)$, and return to step 5.2;
Step 5.4. Output the optimal path $Path$ and the starting point $(a_i, h, n)$ of the current optimal path; this starting point is the only leaf node on the optimal path.
Further, in step 8, the reverse update in tree
Figure RE-RE-GDA0001993485490000075
along the optimal path proceeds as follows:
Step 8.1. In tree
Figure RE-RE-GDA0001993485490000076
find the optimal path $Path$ and the only leaf node $(a_i, h_{max}, n)$ on it, where $h_{max}$ is the maximum depth, at the current time, of tree
Figure RE-RE-GDA0001993485490000077
The iteration counter is initialized to 1, the iteration starts at leaf node $(a_i, h, n)$, and the maximum number of iterations is $h_{max}$;
Step 8.2. At iteration $k$, the node to be updated is $(a_i, h, n^*)$, with
Figure RE-RE-GDA0001993485490000078
where $h = h_{max} - k$ is the depth of the node being updated. Count the number of times the selected beams in the node are requested at time $t$, and use the sum of these counts as the reward at that time,
Figure RE-RE-GDA0001993485490000079
which can be expressed as
Figure RE-RE-GDA00019934854900000710
C is the set of beam selections.
Step 8.3. Update the actual average reward of the node:
Figure RE-RE-GDA00019934854900000711
step 8.4, updating the utilized times of the node in the process of selecting the beam:
Figure RE-RE-GDA00019934854900000712
step 8.5, updating the selected beam reward of the node
Figure RE-RE-GDA00019934854900000713
Step 8.6, updating the upper bound of the selected beam reward of the node
Figure RE-RE-GDA0001993485490000081
Step 8.7. Set $k = k + 1$. If $k > h_{max}$, the iteration terminates and the reverse update of tree
Figure RE-RE-GDA0001993485490000082
ends; otherwise, go to step 8.2.
Further, the method for determining whether to perform leaf node expansion in step 10 is as follows:
the leaf expansion threshold is
Figure RE-RE-GDA0001993485490000083
Step 10.1. The maximum number of iterations is $|\Lambda_{a(t)}|$, i.e., the number of trees in the set; the iteration counter is initialized to 1;
Step 10.2. At iteration $i$, compute for tree
Figure RE-RE-GDA0001993485490000084
the tree expansion threshold
Figure RE-RE-GDA0001993485490000085
Step 10.3. If
Figure RE-RE-GDA0001993485490000086
holds and
Figure RE-RE-GDA0001993485490000087
is a leaf node of tree
Figure RE-RE-GDA0001993485490000088
then expand the leaf node, i.e., update the structure of tree
Figure RE-RE-GDA0001993485490000089
as follows:
Figure RE-RE-GDA00019934854900000810
At the same time, set the rewards of node
Figure RE-RE-GDA00019934854900000811
and node
Figure RE-RE-GDA00019934854900000812
to:
Figure RE-RE-GDA00019934854900000813
Step 10.4. Update the iteration counter: $i = i + 1$;
Step 10.5. If $i > |\Lambda_{a(t)}|$, stop iterating; otherwise, go to step 10.3.
The beneficial effects of the invention are as follows: first, the susceptibility of millimeter wave communication to attenuation is effectively addressed by exploiting the context information of the vehicle system; in addition, the Monte Carlo tree search method adopted by the invention can handle large-scale network data well and better meets the requirements of practical communication environments.
Drawings
FIG. 1 is a schematic diagram of vehicle feature space partitioning;
FIG. 2 is a schematic diagram of a Monte Carlo tree structure according to the present invention;
FIG. 3 is a schematic diagram of beam selection feature partitioning;
FIG. 4 is a schematic diagram of a Monte Carlo tree optimal path method;
FIG. 5 is a diagram illustrating a Monte Carlo tree backtracking update method;
fig. 6 is a flow chart of a beam selection method of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below according to a specific embodiment. It should be understood that the scope of the present invention is not limited to the following examples, and any techniques implemented based on the present disclosure are within the scope of the present invention.
The data used by the specific embodiment of the present invention are described first. The data come from the MovieLens dataset, which contains a total of 1000209 ratings of 3952 movies by 6040 users between 2000 and 2003. The present invention treats each user's rating of each movie as a beam selection by a vehicle at the millimeter wave base station.
Secondly, according to practical situations, the initialization settings of the parameters of the embodiment of the present invention are as follows:
The time horizon $T$ is set to 8760 hours, with 1 hour between consecutive time slots. The users' context features are age and gender only, i.e., adult or minor and male or female, so the vehicle feature space $A_T$ is divided into $m_T = 4$ sub-vehicle feature spaces. The features of the movies are divided into 10 features according to a semantic algorithm. The number of beams selected by the base station, $M$, is set to 16, i.e., at most 16 beams can be selected. The maximum beam selection reward of the tree nodes is $E_{max} = \infty$.
The three constants in Definition 6 are set to:
Figure RE-RE-GDA0001993485490000091
$\rho = 0.5$ and
Figure RE-RE-GDA0001993485490000092
fig. 6 shows a flow chart of the implementation of the proposed method of the present invention. The method comprises the following steps:
Step 1. Divide the user context feature space, i.e., divide the vehicle feature space $A_T$ into 4 sub-vehicle feature spaces.
Step 2. Initialize the Monte Carlo trees, i.e., when $t = 1$, initialize 4 binary trees $\Gamma$, where
Figure RE-RE-GDA0001993485490000093
denotes the binary tree of vehicle feature space $a_i$,
Figure RE-RE-GDA0001993485490000094
At the same time, initialize the reward values of node $(a_i, 1, 1)$ and node $(a_i, 1, 2)$:
Figure RE-RE-GDA0001993485490000095
Step 3. At time $t$, observe the number $N(t)$ of vehicles served by the millimeter wave base station, extract the context features $x(t)$ of each vehicle and vectorize them; the context feature of the $j$-th vehicle can be represented as $x_j(t)$,
Figure RE-RE-GDA0001993485490000096
Step 4. Each vehicle selects its subspace type according to the extracted vehicle context features.
Step 5. If the $j$-th vehicle belongs to type $a_i$, then in tree
Figure RE-RE-GDA0001993485490000097
perform the optimal path search. Repeat step 5 until all vehicles served by the millimeter wave base station at the current time have been traversed.
Step 6. From the recommended optimal beams of all vehicles, select the $M$ beams that occur most frequently and put them into the beam selection set $C$ of the current time, which can be denoted as $C = \{c_1(t), c_2(t), \ldots, c_M(t)\}$.
Step 7. Count the number of requests of each vehicle for each beam in the optimal beam selection set $C$ at time $t$. The number of requests of the $j$-th vehicle for beam $m$ of the optimal beam selection set $C$ can be denoted as $d_{j,m}$, $j = 1, 2, \ldots, N(t)$, $m = 1, 2, \ldots, M$.
Step 8. For the $j$-th vehicle, on the tree
Figure RE-RE-GDA0001993485490000101
of the corresponding feature space $a_i$, backtrack along the optimal path and update the node reward values and the beam selection counts. Repeat step 8 until all vehicles served by the millimeter wave base station at the current time have been traversed.
Step 9. From $a(t) = (a_i(t))$, $i = 1, 2, \ldots, N(t)$, select the set of distinct vehicle feature subspaces $\Lambda_{a(t)}$.
Step 10. For the tree
Figure RE-RE-GDA0001993485490000102
corresponding to each feature subspace $a_i$ in $\Lambda_{a(t)}$, determine whether to perform leaf node expansion. Repeat step 10 until all trees of the feature subspaces in $\Lambda_{a(t)}$ have been traversed.
Step 11. If $t < 8760$, set $t = t + 1$ and return to step 3; otherwise, exit the loop.
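Steps 6 and 9 of the embodiment above are simple set operations and can be sketched in Python as follows. The sketch assumes that each vehicle's recommendation is a list of beam identifiers and that ties in frequency may be broken arbitrarily; the function names `select_beams` and `distinct_subspaces` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def select_beams(recommended: Sequence[Sequence[Hashable]],
                 m: int) -> List[Hashable]:
    """Step 6: keep the m beams recommended most often across all vehicles."""
    counts = Counter(beam for rec in recommended for beam in rec)
    return [beam for beam, _ in counts.most_common(m)]


def distinct_subspaces(types: Sequence[int]) -> List[int]:
    """Step 9: the set of subspace types present at this time, no repeats."""
    return sorted(set(types))


# Recommendations of three vehicles and the subspace type of five vehicles.
recommended = [["f1", "f2"], ["f2", "f3"], ["f2", "f1"]]
chosen = select_beams(recommended, m=2)
active = distinct_subspaces([2, 0, 2, 1, 0])
```

Only the trees listed in `active` need to be checked for leaf expansion in step 10, which keeps the per-slot work proportional to the number of vehicle types actually observed.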

Claims (4)

1. A method for selecting beams based on monte carlo tree search assistance, the method comprising:
step 1, dividing a user context feature space;
According to the context features of all vehicles, the vehicle feature space $A_T$ is divided into $m_T$ sub-vehicle feature spaces;
step 2, initializing and setting a Monte Carlo tree;
When $t = 1$, initialize $m_T$ binary trees
Figure FDA0002987458570000011
Figure FDA0002987458570000012
Wherein
Figure FDA0002987458570000013
denotes the binary tree of vehicle feature space $a_i$; $(a_i, 0, 1)$ denotes the root node of the binary tree, and $(a_i, 1, 1)$ and $(a_i, 1, 2)$ denote its two leaf nodes. Initialize the reward values of node $(a_i, 1, 1)$ and node $(a_i, 1, 2)$:
Figure FDA0002987458570000014
where $E_{max}$ denotes the maximum reward value at the current time;
Step 3. At time $t$, observe the number $N(t)$ of vehicles served by the millimeter wave base station, extract the context features $x(t)$ of each vehicle and vectorize them; the context feature of the $j$-th vehicle can be represented as $x_j(t)$,
Figure FDA0002987458570000015
where $d_u$ denotes the dimension of the vehicle context features;
Step 4. According to the extracted vehicle context features, each vehicle selects its vehicle type. The selection criterion is: if the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then
Figure FDA0002987458570000016
holds, where $\|\cdot\|_2$ denotes the two-norm and the set of vehicle feature space center points is represented as
Figure FDA0002987458570000017
where $v_i$ denotes the center of the $i$-th sub-vehicle feature space $a_i$,
Figure FDA0002987458570000018
Step 5. If the $j$-th vehicle belongs to the vehicle sub-feature space $a_i$, then in tree
Figure FDA0002987458570000019
perform the optimal path search to obtain the leaf node with the highest reward value for the $j$-th vehicle; all beams on that leaf node are used as the recommended optimal beams of the $j$-th vehicle at time $t$. Repeat step 5 until all vehicles served by the millimeter wave base station at the current time have been traversed;
Step 6. Select the $M$ best-performing beams from the recommended optimal beams of all vehicles and put them into the beam selection set $C$ of the current time, where $C = \{c_1(t), c_2(t), \ldots, c_M(t)\}$;
Step 7, counting the number of times of requests of each vehicle to each beam in the beam selection set C at the t-th moment; wherein the number of requests for the jth vehicle to the beam m of the beam selection set C may be represented as dj,m,j=1,2,...,N(t),m=1,2,...,M;
Step 8, for the jth vehicle, in the corresponding characteristic space aiTree of (2)
Figure FDA0002987458570000021
update the reward values of the nodes and the number of times their beams have been selected, in reverse along the optimal path; repeat step 8 until all vehicles have been traversed;
Step 9, from a(t) = (a_i(t)), i = 1, 2, ..., N(t), select the non-repeating set of vehicle feature subspaces Λ_{a(t)};
Step 10, for the tree corresponding to each feature subspace a_i in Λ_{a(t)}
Figure FDA0002987458570000022
determine whether to perform leaf node expansion; repeat step 10 until the trees of all feature subspaces in Λ_{a(t)} have been traversed;
Step 11, set t = t + 1 and return to step 3.
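Steps 4 and 9 of claim 1 can be illustrated with a minimal Python sketch. The exact selection formula is not recoverable from the text above, so the sketch assumes the criterion is "closest center point v_i in the two-norm"; the function names, the sample centers, and the sample context vectors are all hypothetical.

```python
import math

def assign_subspace(x, centers):
    """Step 4 (assumed form): index i of the center v_i nearest to x in the two-norm."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))

def distinct_subspaces(assignments):
    """Step 9: non-repeating set of subspaces, kept in first-seen order."""
    return list(dict.fromkeys(assignments))

# Hypothetical center points v_1, v_2 and context vectors x_j(t), j = 1..3.
centers = [(0.0, 0.0), (10.0, 10.0)]
vehicles = [(1.0, 2.0), (9.0, 8.5), (0.5, 0.5)]

assignments = [assign_subspace(x, centers) for x in vehicles]
active = distinct_subspaces(assignments)
```

Only the trees of the subspaces listed in `active` would then need to be checked for leaf expansion in step 10.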
2. The method of claim 1, wherein the step of searching for the optimal path in step 5 comprises:
Step 5.1, initialize the optimal path Path = (a_i, 0, 1) and the starting point of the current optimal path (a_i, h, n) = (a_i, 0, 1),
Figure FDA0002987458570000023
Step 5.2, iterative judgment: if the starting point (a_i, h, n) of the current optimal path is not a leaf node and
Figure FDA0002987458570000024
holds, execute step 5.3; otherwise, execute step 5.4;
step 5.3, if
Figure FDA0002987458570000025
holds,
the starting point of the current optimal path is updated to (a_i, h, n) = (a_i, h+1, 2n), and tree node (a_i, h+1, 2n) is added to the optimal path, i.e., Path = Path ∪ (a_i, h+1, 2n); return to step 5.2;
if
Figure FDA0002987458570000026
holds, the starting point of the current optimal path is updated to (a_i, h, n) = (a_i, h+1, 2n-1), and tree node (a_i, h+1, 2n-1) is added to the optimal path,
i.e., Path = Path ∪ (a_i, h+1, 2n-1); return to step 5.2;
Step 5.4, output the optimal path Path and the starting point (a_i, h, n) of the current optimal path; the starting point at this time is the only leaf node on the optimal path.
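The descent in steps 5.1 to 5.4 can be sketched in Python as follows. The comparison conditions of steps 5.2 and 5.3 are not recoverable from the text, so this sketch assumes the search simply moves to the child with the larger score; the dictionary layout, the `score` values, and the function name are hypothetical. The a_i component of the node index is dropped because one tree is kept per subspace.

```python
def search_optimal_path(tree, root=(0, 1)):
    """Descend from the root to a leaf, collecting the visited nodes.

    `tree` maps a node index (h, n) to a hypothetical score. The children of
    (h, n) are (h+1, 2n-1) and (h+1, 2n); a node with neither child is a leaf.
    """
    h, n = root
    path = [root]
    while (h + 1, 2 * n) in tree or (h + 1, 2 * n - 1) in tree:
        left, right = (h + 1, 2 * n - 1), (h + 1, 2 * n)
        # Assumed rule: follow the child with the larger score.
        if tree.get(right, float("-inf")) >= tree.get(left, float("-inf")):
            h, n = right
        else:
            h, n = left
        path.append((h, n))
    return path, (h, n)

# Hypothetical tree of depth 2 in which only node (1, 2) has been expanded.
tree = {(0, 1): 0.0, (1, 1): 0.4, (1, 2): 0.9, (2, 3): 0.7, (2, 4): 0.2}
path, leaf = search_optimal_path(tree)
```

The returned `leaf` is the last element of `path`, matching step 5.4, where the final starting point is the only leaf node on the optimal path.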
3. The beam selection method based on Monte Carlo tree search assistance of claim 1, wherein in step 8, in the tree
Figure FDA0002987458570000027
the steps of reverse updating along the optimal path are as follows:
Step 8.1, in the tree
Figure FDA0002987458570000031
find the optimal path Path and the only leaf node (a_i, h_max, n) on the optimal path, where h_max is the maximum depth, at the current time, of the tree
Figure FDA0002987458570000032
; initialize the iteration count to 1, with the leaf node (a_i, h, n) as the iteration starting point; the maximum number of iterations is h_max;
Step 8.2, at iteration k, the node to be updated is (a_i, h, n*), and
Figure FDA0002987458570000033
where h = h_max - k denotes the depth of the node currently being updated; count the number of times the selected beams in the node are requested at time t, and use the sum of these counts as the reward at this time
Figure FDA0002987458570000034
which may specifically be expressed as
Figure FDA0002987458570000035
where C is the beam selection set;
Step 8.3, update the actual average reward of the node:
Figure FDA0002987458570000036
Step 8.4, update the number of times the node has been used in the beam selection process:
Figure FDA0002987458570000037
Step 8.5, update the selected-beam reward of the node:
Figure FDA0002987458570000038
Step 8.6, updating the upper bound of the selected beam reward of the node
Figure FDA0002987458570000039
Step 8.7, set k = k + 1; if k > h_max, terminate the iteration and end the reverse update of the tree
Figure FDA00029874585700000310
; otherwise, execute step 8.2.
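The reverse update of steps 8.1 to 8.7 can be sketched in Python as below. The reward definition follows the text of step 8.2 (sum of the request counts for the node's beams that are in C). The update formulas of steps 8.3 to 8.6 are not recoverable here, so the sketch substitutes a standard incremental average and a generic UCB1-style confidence bonus as stand-ins; these, together with all names and sample values, are assumptions and not the patent's formulas.

```python
import math

def backpropagate(stats, path, beams_in_node, requests, selected, t):
    """Walk the optimal path from leaf to root, updating per-node statistics."""
    for node in reversed(path):
        # Step 8.2: requests at time t for the node's beams that belong to C.
        reward = sum(requests.get(m, 0) for m in beams_in_node[node] if m in selected)
        s = stats[node]
        s["count"] += 1                                  # times the node was used
        s["mean"] += (reward - s["mean"]) / s["count"]   # incremental average
        # Assumed UCB1-style upper bound (stand-in for the patent's bound).
        s["upper"] = s["mean"] + math.sqrt(2 * math.log(t + 1) / s["count"])
    return stats

# Hypothetical data for one vehicle j: beams covered by each node on the path,
# the vehicle's request counts d_{j,m}, and the beam selection set C.
path = [(0, 1), (1, 2)]
beams_in_node = {(0, 1): {1, 2, 3, 4}, (1, 2): {3, 4}}
requests = {1: 5, 3: 2, 4: 1}
selected = {1, 3}
stats = {node: {"count": 0, "mean": 0.0, "upper": float("inf")} for node in path}

backpropagate(stats, path, beams_in_node, requests, selected, t=1)
```

Iterating over `reversed(path)` visits the leaf first and the root last, which corresponds to the leaf-to-root order described in the claim.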
4. The method of claim 1, wherein the determining whether to perform leaf node expansion in step 10 is performed by:
the leaf expansion threshold is
Figure FDA00029874585700000311
Step 10.1, the maximum number of iterations is |Λ_{a(t)}|, i.e., the number of trees in the set; initialize the iteration count to 1;
Step 10.2, at iteration i, compute for the tree
Figure FDA00029874585700000312
the tree expansion threshold
Figure FDA00029874585700000313
Step 10.3, if
Figure FDA00029874585700000314
and
Figure FDA00029874585700000315
is a leaf node of the tree
Figure FDA00029874585700000316
, expand the leaf node, i.e., update the structure of the tree
Figure FDA00029874585700000317
as follows:
Figure FDA00029874585700000318
and at the same time set the rewards of node
Figure FDA00029874585700000319
and node
Figure FDA00029874585700000320
to:
Figure FDA0002987458570000041
Step 10.4, set i = i + 1;
Step 10.5, if i > |Λ_{a(t)}|, stop the iteration; otherwise, execute step 10.3.
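The leaf expansion of step 10.3 can be sketched in Python for a single tree. The threshold formula and the two expansion conditions are not recoverable from the text, so this sketch assumes a leaf is expanded once its visit count reaches a given threshold and that both new children start with the optimistic reward E_max; the function name, the `counts` bookkeeping, and the sample values are hypothetical.

```python
def expand_leaves(tree, counts, threshold, e_max):
    """Split each leaf whose visit count has reached `threshold` into two children."""
    # Collect the leaves first so the tree is not mutated while scanning it.
    leaves = [(h, n) for (h, n) in tree
              if (h + 1, 2 * n - 1) not in tree and (h + 1, 2 * n) not in tree]
    for h, n in leaves:
        if counts.get((h, n), 0) >= threshold:   # assumed expansion condition
            for child in ((h + 1, 2 * n - 1), (h + 1, 2 * n)):
                tree[child] = e_max              # assumed optimistic initial reward
                counts[child] = 0
    return tree

# Hypothetical tree with two leaves; only (1, 2) has been visited often enough.
tree = {(0, 1): 0.0, (1, 1): 0.4, (1, 2): 0.9}
counts = {(1, 1): 1, (1, 2): 5}
expand_leaves(tree, counts, threshold=3, e_max=1.0)
```

In the claimed method this check would be repeated for every tree in Λ_{a(t)}, as in steps 10.1 to 10.5.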
CN201811346507.7A 2018-11-13 2018-11-13 Beam selection method based on Monte Carlo tree search assistance Active CN109831236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811346507.7A CN109831236B (en) 2018-11-13 2018-11-13 Beam selection method based on Monte Carlo tree search assistance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811346507.7A CN109831236B (en) 2018-11-13 2018-11-13 Beam selection method based on Monte Carlo tree search assistance

Publications (2)

Publication Number Publication Date
CN109831236A CN109831236A (en) 2019-05-31
CN109831236B true CN109831236B (en) 2021-06-01

Family

ID=66859211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811346507.7A Active CN109831236B (en) 2018-11-13 2018-11-13 Beam selection method based on Monte Carlo tree search assistance

Country Status (1)

Country Link
CN (1) CN109831236B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365375B (en) * 2019-06-26 2021-06-08 东南大学 Beam alignment and tracking method in millimeter wave communication system and computer equipment
CN111446999A (en) * 2020-03-26 2020-07-24 上海无线通信研究中心 Position-assisted beam alignment method and system based on multi-armed bandit
CN111526499B (en) * 2020-04-17 2022-05-17 中南大学 Vehicle-mounted terminal communication method based on online learning and millimeter wave beam selection
CN111645687A (en) * 2020-06-11 2020-09-11 知行汽车科技(苏州)有限公司 Lane changing strategy determining method, device and storage medium
CN111865446B (en) * 2020-07-29 2021-04-06 中南大学 Intelligent beam registration method and device realized by using context information of network environment
FI20215133A1 (en) 2021-02-10 2022-04-01 Nokia Solutions & Networks Oy Beam selection for cellular access nodes
CN114609589B (en) * 2022-03-09 2023-08-11 电子科技大学 Heuristic backtracking-based real-time phased array radar beam residence scheduling method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101572574A (en) * 2009-06-01 2009-11-04 中国民航大学 Smart antenna self-adapting interference suppression method based on least square-lowest mean square
CN105959044A (en) * 2016-04-21 2016-09-21 北京航空航天大学 Hierarchical codebook structure design method of joint method
CN107329136A (en) * 2017-06-13 2017-11-07 电子科技大学 MIMO radar multiple target adaptive tracking method based on the variable analysis moment
CN107689922A (en) * 2017-08-31 2018-02-13 青岛大学 Steiner optimal trees computational methods and device based on particle swarm optimization
CN108738045A (en) * 2018-04-17 2018-11-02 浙江工业大学 A kind of mobile edge calculations rate maximization approach based on depth deterministic policy gradient method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Survey of Monte Carlo Tree Search Methods";Cameron B. Browne等;《IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES》;20120301;全文 *
"基于串行策略的SCMA多用户检测算法";董彬虹等;《电子与信息学报》;20160524;全文 *

Also Published As

Publication number Publication date
CN109831236A (en) 2019-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant