CN108810139B - Monte Carlo tree search-assisted wireless caching method - Google Patents

Monte Carlo tree search-assisted wireless caching method

Info

Publication number
CN108810139B
Authority
CN
China
Prior art keywords
user
node
files
file
tree
Prior art date
Legal status
Active
Application number
CN201810599991.8A
Other languages
Chinese (zh)
Other versions
CN108810139A (en)
Inventor
高鹏宇
杜洋
董彬虹
祝武勇
崔亚迪
陈特
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201810599991.8A priority Critical patent/CN108810139B/en
Publication of CN108810139A publication Critical patent/CN108810139A/en
Application granted granted Critical
Publication of CN108810139B publication Critical patent/CN108810139B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/56: Provisioning of proxy services
    • H04L67/568: Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5681: Pre-fetching or pre-delivering data based on network characteristics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations

Abstract

The invention discloses a Monte Carlo tree search-assisted wireless caching method, belongs to the field of mobile communication, and mainly relates to a method by which a base station in mobile communication caches content demanded by nearby users in advance while the wireless network is idle. To solve the problems described below, the invention provides a wireless caching method that learns user preferences online with a Monte Carlo tree search-assisted contextual multi-armed bandit model. The method can learn online, from user context features, each user's current preference for a file, i.e., the file's popularity. Meanwhile, thanks to its efficient data processing, the Monte Carlo tree search-based method delivers good caching performance in the practical communication setting where the video file library keeps growing. In addition, because the invention considers user context features and file features simultaneously and clusters each of them separately, the cold start problem can be effectively suppressed.

Description

Monte Carlo tree search-assisted wireless caching method
Technical Field
The invention belongs to the field of mobile communication, and mainly relates to a method by which a base station in mobile communication caches content demanded by nearby users in advance while the wireless network is idle. Specifically, it is a Monte Carlo tree search-assisted contextual multi-armed bandit (MCTS-CMAB) wireless caching method.
Background
In recent years, with the growing popularity of mobile devices with multimedia functions (such as smartphones and tablets), new wireless service applications such as WeChat, online video, Taobao, and microblogs have emerged in large numbers. This has extended wireless mobile communication from its original voice-call function to entertainment, office, and social applications, and at the same time has driven a rapid increase in data traffic in wireless networks.
The explosive growth of mobile data traffic places a huge burden on the existing cellular network infrastructure; during communication peaks in particular, delays and interruptions occur easily and user experience deteriorates. Meanwhile, research shows that mobile video will account for a large proportion of future mobile data traffic. Therefore, based on the characteristics of video itself and the realities of hard-disk storage, a solution named wireless caching has been proposed. Its basic idea is to equip the wireless access point with a large-capacity memory and to cache popular videos into that memory in advance during off-peak hours (e.g., at night). In this way, when a user requests a video file that is already cached, the wireless access point can transmit the file to the user directly, localizing the traffic. This greatly reduces the delay that data would incur on the backhaul link and in the core network, and also lightens their load during peak hours. At the same time, reduced backhaul capacity occupancy frees more network resources for other services, indirectly improving system throughput.
To increase the probability that a user finds a video file of interest in a nearby cache and receives it successfully, a good caching strategy is particularly important, i.e., determining which popular files should be cached in advance. Among existing caching techniques, Equal Probability Random Caching (EPRC) and Cut-off Random Caching (CTRC) are the two most popular schemes. In equal-probability random caching, all files are cached at random with the same probability. In the cut-off random caching strategy, the files with the lowest request probabilities are truncated from the file library to form a candidate sub-library from which files are cached at random; its cache hit rate is therefore superior to equal-probability random caching.
However, these two caching schemes cannot be adopted in practical systems, mainly for the following reasons. 1. Both assume that file popularity follows some fixed distribution (usually a Zipf distribution). In actual communication, file popularity changes continuously over time. More importantly, the relationship between user preference and file popularity is far from clear, and the original caching schemes do not address it. 2. User context features (Context), such as age and gender, are not considered. File popularity is closely tied to user preference, and users with different features will necessarily prefer different files. 3. File features (Content features), such as comedy or documentary, are not considered. The number of files in the network grows daily; if each file is analyzed independently, existing caching methods cannot handle such an enormous data volume. 4. The cold start problem (Cold Start): lacking prior knowledge of files or users, existing caching methods cannot reach their optimal performance within a short time.
Disclosure of Invention
To solve the above problems, the invention provides a wireless caching method that learns user preferences online with a Monte Carlo tree search-assisted contextual multi-armed bandit model. The method can learn online, from user context features, each user's current preference for a file, i.e., the file's popularity. Meanwhile, thanks to its efficient data processing, the Monte Carlo tree search-based method delivers good caching performance in the practical communication setting where the video file library keeps growing. In addition, because the invention considers user context features and file features simultaneously and clusters each of them separately, the cold start problem can be effectively suppressed.
For convenience of describing the contents of the present invention, a model used in the present invention will be described first, and terms used in the present invention will be defined.
Introduction of the system model: within a radio coverage area, a Base Station (BS) is a radio transceiver station for information transmission between terminals. The invention assumes the base station is equipped with a memory capable of caching a certain number of streaming files. Suppose the file set is F = {f1, f2, ...} and all files are the same size. Considering the big-data reality of current networks, the file set keeps growing over time, so its size |F| is assumed to be infinite. The base station's capacity is described as being able to cache at most M files from the file set. Meanwhile, to better match real scenarios, user mobility is considered, and N(t) denotes the number of users served by the base station at time t, t = 1, 2, ..., T. The invention aims to optimize the cached file set at each moment so that user requests for cached files are maximized at every moment.
The Monte Carlo tree used in the invention is a binary tree whose nodes can be written as (a_i, h, n), where a_i is the user sub-feature space type, i.e., the label of the tree; h is the depth in the tree; and n indexes the node among all nodes at depth h. Files are placed into the nodes of the Monte Carlo tree by clustering on file features, so the file features within each node differ little.
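The node notation (a_i, h, n) can be sketched as a small data structure. This is an illustrative assumption, not the patent's implementation: the class name, the stored fields, and the child-indexing helper are all hypothetical, though the (h+1, 2n-1)/(h+1, 2n) child indices match the path-update rules used later in the description.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    space: int                 # a_i: label of the user sub-feature space (tree label)
    depth: int                 # h: depth in the binary tree (root has depth 0)
    index: int                 # n: index among all nodes at depth h (1-based)
    files: set = field(default_factory=set)  # files clustered into this node
    reward: float = 0.0        # estimated cache reward of the node
    used: int = 0              # times the node has been used for caching

    def child_ids(self):
        # With this indexing, node (h, n) has children (h+1, 2n-1) and (h+1, 2n).
        return (self.depth + 1, 2 * self.index - 1), (self.depth + 1, 2 * self.index)

root = TreeNode(space=1, depth=0, index=1, files={"f1", "f2", "f3"})
```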
The technical scheme of the invention is a wireless caching method based on Monte Carlo tree search assistance, which comprises the following steps:
step 1: partition the user feature space into m_T user sub-feature spaces according to user context features;
step 2: when t = 1, initialize m_T binary trees Γ, one per sub-feature space, where Γ^{a_i} denotes the binary tree of user sub-feature space a_i (the initial tree structure is an image-only formula in the original); at the same time, initialize the reward values of nodes (a_i, 1, 1) and (a_i, 1, 2); here (a_i, 0, 1) denotes the root node of the binary tree of user sub-feature space a_i, (a_i, 1, 1) denotes the 1st node at depth 1 of that tree, and (a_i, 1, 2) denotes the 2nd node at depth 1;
step 3: at time t, obtain the number N(t) of all users of the base station and extract each user's context features; the context feature of the jth user is denoted x_j(t);
step 4: assign each user to the corresponding user sub-feature space according to the current user context features;
step 5: if the jth user belongs to user sub-feature space a_i, perform an optimal-path search in the tree Γ^{a_i} to obtain the terminal leaf node with the highest reward value for the jth user (when reward values are equal, a path is chosen at random), and take all files on that leaf node as the jth user's recommended cache files at time t; repeat step 5 until all users of the base station at the current moment have been traversed;
step 6: select the M files that occur most frequently among all users' recommended cache files and put them into the current cache file set C;
step 7: count the number of requests by each user for each file in the cache file set C at time t; the number of requests by the jth user for file m of C is denoted d_{j,m}, j = 1, 2, ..., N(t), m = 1, 2, ..., M;
step 8: for the jth user, backtrack along the path in the binary tree Γ^{a_i} of the corresponding feature space a_i, updating each node's reward value and the number of times each node has been used; repeat step 8 until all users have been traversed;
step 9: for the tree Γ^{a_i} corresponding to each user sub-feature space a_i, judge whether leaf-node expansion should be performed, and if a leaf node needs expanding, grow a next-generation leaf node from it; repeat step 9 until the binary trees of all user sub-feature spaces have been traversed;
step 10: set t = t + 1 and return to step 3.
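The per-slot part of the steps above (map users to trees, collect leaf recommendations, keep the M most-voted files) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Tree`/`Leaf` stand-ins and the `best_leaf` name are hypothetical placeholders for the optimal-path search of step 5, and file/tree details are heavily simplified.

```python
from collections import Counter

class Leaf:
    def __init__(self, files):
        self.files = files

class Tree:
    # Stand-in for Γ^{a_i}: best_leaf() plays the role of the optimal-path
    # search of step 5 and simply returns a fixed leaf here.
    def __init__(self, leaf):
        self._leaf = leaf
    def best_leaf(self):
        return self._leaf

def cache_one_slot(trees, users, M):
    """Steps 4-6: map users to trees, collect leaf recommendations, keep top M."""
    votes = Counter()
    for user in users:
        leaf = trees[user["space"]].best_leaf()   # step 5, per user
        votes.update(leaf.files)                  # that user's recommended files
    return [f for f, _ in votes.most_common(M)]   # step 6: cache set C

trees = {0: Tree(Leaf(["f1", "f2"])), 1: Tree(Leaf(["f2", "f3"]))}
users = [{"space": 0}, {"space": 1}, {"space": 1}]
print(cache_one_slot(trees, users, 2))  # ['f2', 'f3']
```

`Counter.most_common` returns files by descending vote count, which directly implements the "M files with the highest occurrence frequency" selection.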
Further, the method in step 8 for updating each node's reward value is as follows: count the number of times the user requested, at time t, files in the node that are cached by the base station, and take the sum of these counts as the cache reward for this time (image-only formula in the original). The node's reward is then updated by an image-only formula in the original, in which the quantity shown denotes the number of times tree node (a_i, h, n) has been used by time t, i.e., the total number of times files in the node have been cached by the base station by time t.
Further, the reward value of each node is given by an image-only formula in the original, in which c > 0, l_1 > 0, and 0 < ρ < 1 are constants.
Further, the method in step 9 for judging whether a leaf node is expanded is:
step 1: calculate the leaf-node expansion threshold (image-only formula in the original);
step 2: if the stated condition holds and the node shown is a leaf node of the tree Γ^{a_i} (conditions given by image-only formulas in the original), expand that leaf node; otherwise do not expand it.
A wireless caching method based on Monte Carlo tree search assistance comprises the following steps:
step 1: classify all users connected to the base station according to user context features (the types of files the users access);
step 2: grow a binary tree for each user class according to its context features; the binary tree builds a detailed classification index of the files accessed most often by that class, out of a classification index of all the base station's files over a period of time;
step 3: for each user class, select one terminal node in its binary tree and take the files contained in that node as recommended files; the selection criterion is that the files in the selected terminal node have a higher click rate than those in the other terminal nodes;
step 4: pool the files recommended for all user classes as the base station's cache files.
Further, the method for growing each user class's binary tree in step 2 is:
step 2.1: take all files transmitted by the base station over a period of time as the root node of a binary tree, and divide the files in the root node into two classes by clustering, forming two child nodes;
step 2.2: compare the class's click rates on the files contained in the two child nodes, and select the child node with the larger click rate as the growth node;
step 2.3: divide the files contained in the growth node selected in step 2.2 into two classes by clustering, take them as next-generation child nodes, and select the growth node again as in step 2.2;
step 2.4: continue growing in the manner of step 2.3 until the number of user clicks on the files contained in a growth node falls below a threshold.
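Steps 2.1 to 2.4 can be sketched as a recursive split. The patent does not specify the clustering method, so the sketch assumes a simple one-dimensional two-means-style split on a scalar file feature; the `feature`/`clicks` dictionaries and the `min_clicks` threshold name are illustrative assumptions.

```python
def two_means_split(files, feature):
    # One-dimensional "clustering": split files around the mean feature value.
    pivot = sum(feature[f] for f in files) / len(files)
    left = [f for f in files if feature[f] < pivot]
    right = [f for f in files if feature[f] >= pivot]
    return left, right

def grow_tree(files, feature, clicks, min_clicks):
    node = {"files": files, "left": None, "right": None}
    if len(files) < 2:
        return node
    left, right = two_means_split(files, feature)          # steps 2.1 / 2.3
    if not left or not right:
        return node
    node["left"] = {"files": left, "left": None, "right": None}
    node["right"] = {"files": right, "left": None, "right": None}
    # step 2.2: the child with more clicks becomes the growth node
    grow = max((left, right), key=lambda fs: sum(clicks[f] for f in fs))
    if sum(clicks[f] for f in grow) >= min_clicks:          # step 2.4 stop rule
        sub = grow_tree(grow, feature, clicks, min_clicks)
        if grow is left:
            node["left"] = sub
        else:
            node["right"] = sub
    return node

feature = {"a": 1.0, "b": 2.0, "c": 8.0, "d": 9.0}
clicks = {"a": 1, "b": 1, "c": 5, "d": 5}
tree = grow_tree(["a", "b", "c", "d"], feature, clicks, min_clicks=3)
```

Only the higher-click branch keeps splitting, so the tree deepens toward the files this user class actually requests, which is the point of step 2.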
The beneficial effects of the invention are: first, by exploiting user context features, it effectively mitigates the cold start problem of existing caching methods; in addition, the Monte Carlo tree search it adopts handles network big data well and better fits real communication environments.
Drawings
FIG. 1 is a schematic diagram of user feature space partitioning;
FIG. 2 is a schematic diagram of a binary tree structure according to the present invention;
FIG. 3 is a schematic diagram of file feature partitioning;
FIG. 4 is a diagram of a binary tree optimal path method;
FIG. 5 is a diagram illustrating a binary tree trace-back update method;
fig. 6 is a flowchart of a wireless caching method according to the present invention.
Detailed Description
Considering the properties of the multi-armed bandit and the relationship between parent and child nodes in the binary tree, the invention defines the reward upper bound of tree node (a_i, h, n) at time t as follows (formulas are image-only in the original): when node (a_i, h, n) is a leaf node, the upper bound is given by one expression; when a stated condition holds, it is given by a second expression involving E_max, the maximum reward value at the current time; in the remaining cases it is given by a third expression.
the invention is in the tree
Figure BDA0001692828950000056
The steps of searching for the optimal path are as follows:
step 1, initializing the optimal Path ═ ai0,1) and the starting point (a) of the current optimal pathi,h,n)=(ai,0,1),
Figure BDA0001692828950000057
Step 2, iterative judgment: if the starting point (a) of the current optimal pathiH, n) are not leaf nodes and
Figure BDA0001692828950000058
if yes, executing step 3; otherwise, step 4 is executed.
Step 3, if
Figure BDA0001692828950000059
If true;
the starting point of the current optimal path is updated to (a)i,h,n)=(aiH +1,2n), and will tree node (a)iH +1,2n) is added to the optimal Path, i.e. Path ═ u (a)iH +1,2n), returning to the step 2; if it is
Figure BDA0001692828950000061
If yes, the starting point of the current optimal path is updated to be (a)i,h,n)=(aiH +1,2n-1), and will tree node (a)iH +1,2n-1) is added to the optimal path, i.e.Path=Path∪(aiH +1,2n-1), return to step 2.
Step 4, outputting the optimal Path Path and the starting point (a) of the current optimal PathiH, n), the starting point at this time is the only leaf node on the optimal path.
To more clearly describe the optimal path search, fig. 4 shows the process of performing the optimal path search on the binary tree of fig. 2.
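The optimal-path search amounts to a greedy descent from the root toward the child with the larger reward upper bound B. The sketch below assumes B values are stored per node and, as a simplification of the patent's random tie-breaking, prefers the right child (index 2n) on ties; the dictionary-based tree encoding is an illustrative assumption.

```python
def optimal_path(upper_bound, children, root):
    # Descend from the root, at each step moving to the child with the
    # larger reward upper bound B, until a leaf (no children entry) is reached.
    path = [root]
    node = root
    while node in children:
        left, right = children[node]
        node = right if upper_bound[right] >= upper_bound[left] else left
        path.append(node)
    return path, node                  # the path and its unique leaf

# Nodes keyed as (h, n); node (0, 1) is the root.
children = {(0, 1): ((1, 1), (1, 2)), (1, 2): ((2, 3), (2, 4))}
B = {(0, 1): 5.0, (1, 1): 1.0, (1, 2): 3.0, (2, 3): 2.0, (2, 4): 1.0}
path, leaf = optimal_path(B, children, (0, 1))
```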
The steps by which the invention updates the tree Γ^{a_i} in reverse along the optimal path are as follows:
Step 1: in the tree Γ^{a_i}, find the optimal path Path and the unique leaf node (a_i, h_max, n*) on it, where h_max is the maximum depth of the tree at the current time. Initialize the iteration count to 1 and the iteration starting point to the leaf node. The maximum number of iterations is h_max.
Step 2: at iteration k, the node to update is (a_i, h, n*), where h = h_max - k denotes the depth of the current update node (the index relation is image-only in the original). Count the number of requests at time t for the files cached in this node, and take the sum of these counts as the cache reward for this time (image-only formula in the original).
Step 3: update the node's actual average reward (image-only formula in the original).
Step 4: update the number of times the node has been used in the caching process (image-only formula in the original).
Step 5: update the node's cache reward according to definition 5 (image-only formula in the original).
Step 6: update the node's cache reward upper bound according to definition 6 (image-only formula in the original).
Step 7: set k = k + 1; if k > h_max, terminate the iteration and end the reverse update of the tree Γ^{a_i}; otherwise execute step 2.
To describe the backtracking update more clearly, fig. 5 shows the process of performing a backtracking update on the optimal path of fig. 4.
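The backtracking update walks the chosen path leaf-to-root, refreshing each node's average reward and usage count with the slot's cache reward. Since the exact formulas are image-only in the original, the sketch below assumes an incremental mean; the `stats` encoding is a hypothetical convenience.

```python
def backtrack(path, stats, slot_reward):
    # stats maps node -> [avg_reward, times_used]; walk leaf-to-root,
    # updating each node with an assumed incremental-mean rule.
    for node in reversed(path):
        avg, used = stats[node]
        stats[node] = [(avg * used + slot_reward) / (used + 1), used + 1]
    return stats

stats = {(0, 1): [0.0, 0], (1, 2): [2.0, 1]}
backtrack([(0, 1), (1, 2)], stats, 4.0)
```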
The invention's leaf-expansion threshold η_h(t) is given by an image-only formula in the original.
The leaf-node expansion steps are as follows:
Step 1: the maximum number of iterations is |Λ_a(t)|, i.e., the number of trees in the set. Initialize the iteration count to 1.
Step 2: at iteration i, compute the tree expansion threshold of tree Γ^{a_i} (image-only formula in the original).
Step 3: if the stated condition holds and the node shown is a leaf node of tree Γ^{a_i}, expand it, i.e., update the structure of the tree (image-only formulas in the original), and at the same time set the rewards of the two new nodes as shown (image-only formula in the original).
Step 4: update the iteration count, i = i + 1.
Step 5: if i > |Λ_a(t)|, stop iterating; otherwise execute step 2.
The technical solution of the present invention is described in detail below according to a specific embodiment. It should be understood that the scope of the present invention is not limited to the following examples, and any techniques implemented based on the present disclosure are within the scope of the present invention.
The data used by specific embodiments of the invention will first be described. The data come from a database named MovieLens: 1,000,209 ratings of 3,952 movies by 6,040 users between 2000 and 2003. The invention treats each user's rating of a movie as that user's cache request for the movie.
Secondly, according to practical situations, the initialization settings of the parameters of the embodiment of the present invention are as follows:
the slot length T is set to 8760 hours with a 1 hour difference between each slot. The user's contextual characteristics are age and gender only, adult and minor, male and female, respectively, i.e. the user's characteristic space aTIs divided into mT4 sub-user feature spaces. The features of the movie are divided into 10 features according to a semantic algorithm. The maximum capacity M of the base station is set to 200, i.e. 200 movies can be buffered at maximum. Maximum cache reward E of tree nodemaxInfinity. Three constants are set as:
Figure BDA00016928289500000712
ρ 0.5 and
Figure BDA00016928289500000713
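The embodiment's initialization, gathered into one configuration object for reference. T, m_T, M, E_max, and ρ come from the text; the remaining two constants (c and l_1) appear only as images in the original and are deliberately not filled in.

```python
# Parameter choices stated in the embodiment (c and l_1 omitted: image-only).
config = {
    "T": 8760,              # number of 1-hour time slots (one year)
    "m_T": 4,               # {adult, minor} x {male, female} sub-feature spaces
    "n_file_features": 10,  # movie feature classes from the semantic algorithm
    "M": 200,               # base-station cache capacity, in movies
    "E_max": float("inf"),  # maximum cache reward of a tree node
    "rho": 0.5,
}
```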
fig. 6 shows a flow chart of the method of the present invention. The method comprises the following steps:
Step 1: user context feature space division, i.e., divide the user feature space A_T into 4 user sub-feature spaces.
Step 2: binary tree initialization, i.e., when t = 1, initialize 4 binary trees Γ, where Γ^{a_i} denotes the binary tree of user feature space a_i (initial structure image-only in the original); at the same time, initialize the reward values of nodes (a_i, 1, 1) and (a_i, 1, 2) (image-only formula in the original).
Step 3: at time t, first observe the number N(t) of users served by the base station and extract each user's context features x(t) in vectorized form; the context feature of the jth user is denoted x_j(t) (an accompanying formula is image-only in the original).
Step 4: according to the extracted user context features, assign each user to a user type.
Step 5: if the jth user belongs to type a_i, perform the optimal-path search in the tree Γ^{a_i}. Repeat step 5 until all users served by the base station at the current moment have been traversed.
Step 6: select the M files occurring most frequently among all users' recommended cache files and put them into the current cache file set C, which can be written as C = {c_1(t), c_2(t), ..., c_M(t)}.
Step 7: count the number of requests by each user for each file in the cached file set C at time t. The number of requests by the jth user for file m of C is denoted d_{j,m}, j = 1, 2, ..., N(t), m = 1, 2, ..., M.
Step 8: for the jth user, in the tree Γ^{a_i} of the corresponding feature space a_i, backtrack along the optimal path and update the nodes' reward values and cached counts. Repeat step 8 until all users served by the base station at the current moment have been traversed.
Step 9: from a(t) = (a_i(t)), i = 1, 2, ..., N(t), select the set Λ_a(t) of non-repeating user feature subspaces.
Step 10: for the tree Γ^{a_i} corresponding to each feature subspace a_i in Λ_a(t), judge whether to perform leaf-node expansion, until all trees over the feature subspaces in Λ_a(t) have been traversed.
Step 11: if t < 8760, set t = t + 1 and return to step 3; otherwise exit the loop.

Claims (6)

1. A Monte Carlo tree search-assisted wireless caching method, comprising the following steps:
step 1: partitioning the user feature space into m_T user sub-feature spaces according to user context features;
step 2: when t = 1, initializing m_T binary trees Γ, one per sub-feature space, where Γ^{a_i} denotes the binary tree of user sub-feature space a_i (the initial tree structure is an image-only formula in the original); at the same time, initializing the reward values of nodes (a_i, 1, 1) and (a_i, 1, 2); wherein (a_i, 0, 1) denotes the root node of the binary tree of user sub-feature space a_i, (a_i, 1, 1) denotes the 1st node at depth 1 of that tree, and (a_i, 1, 2) denotes the 2nd node at depth 1;
step 3: at time t, obtaining the number N(t) of all users of the base station and extracting each user's context features, the context feature of the jth user being denoted x_j(t);
step 4: assigning each user to the corresponding user sub-feature space according to the current user context features;
step 5: if the jth user belongs to user sub-feature space a_i, performing an optimal-path search in the tree Γ^{a_i} to obtain the terminal leaf node with the highest reward value for the jth user (choosing a path at random when reward values are equal), and taking all files on that leaf node as the jth user's recommended cache files at time t; repeating step 5 until all users of the base station at the current moment have been traversed;
step 6: selecting the M files occurring most frequently among all users' recommended cache files and putting them into the current cache file set C;
step 7: counting the number of requests by each user for each file in the cache file set C at time t, the number of requests by the jth user for file m of C being denoted d_{j,m}, j = 1, 2, ..., N(t), m = 1, 2, ..., M;
step 8: for the jth user, backtracking along the path in the binary tree Γ^{a_i} of the corresponding sub-feature space a_i, updating each node's reward value and the number of times each node has been used; repeating step 8 until all users have been traversed;
step 9: for the tree Γ^{a_i} corresponding to each user sub-feature space a_i, judging whether to perform leaf-node expansion, and if a leaf node needs expanding, growing a next-generation leaf node from it; repeating step 9 until the binary trees of all user sub-feature spaces have been traversed;
step 10: setting t = t + 1 and returning to step 3.
2. The Monte Carlo tree search-assisted wireless caching method of claim 1, wherein the method in step 8 for updating each node's reward value is: counting the number of times the user requested, at time t, files in the node that are cached by the base station, and taking the sum of these counts as the cache reward for this time (image-only formula in the original); the node's reward is then updated by an image-only formula in the original, in which the quantity shown denotes the number of times tree node (a_i, h, n) has been used by time t, i.e., the total number of times files in the node have been cached by the base station by time t.
3. The Monte Carlo tree search-assisted wireless caching method of claim 2, wherein the reward value of each node is given by an image-only formula in the original, in which c > 0, l_1 > 0, and 0 < ρ < 1 are constants.
4. The Monte Carlo tree search-assisted wireless caching method of claim 3, wherein the method in step 9 for judging whether a leaf node is expanded is:
step 1: calculating the leaf-node expansion threshold (image-only formula in the original);
step 2: if the stated condition holds and the node shown is a leaf node of the tree Γ^{a_i} (conditions given by image-only formulas in the original), expanding that leaf node; otherwise not expanding it.
5. A Monte Carlo tree search-assisted wireless caching method, comprising the following steps:
step 1: classifying all users connected to the base station according to user context features;
step 2: growing a binary tree for each user class according to its context features, the binary tree building a detailed classification index of the files accessed most often by that class out of a classification index of all the base station's files over a period of time;
step 3: for each user class, selecting one terminal node in its binary tree, the files contained in that node serving as recommended files, the selection criterion being that the files in the selected terminal node have a higher click rate than those in the other terminal nodes;
step 4: pooling the files recommended for all user classes as the base station's cache files.
6. The Monte Carlo tree search-assisted wireless caching method as claimed in claim 5, wherein the method for growing the binary tree of each user class in step 2 is:
Step 2.1: take all files transmitted by the base station within a period of time as the root node of the binary tree, and divide the files in the root node into two classes by clustering, which become its two child nodes;
Step 2.2: compare the click rates of this user class on the files contained in the two child nodes, and select the child node with the larger click rate as the growth node;
Step 2.3: divide the files contained in the growth node selected in step 2.2 into two classes by clustering, take these as the next-generation child nodes, and select the growth node again using the method of step 2.2;
Step 2.4: continue growing in the same way as step 2.3 until the number of times the files contained in a growth node have been clicked by the user class falls below a given threshold.
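As a rough illustration of steps 2.1–2.4, the sketch below grows the tree greedily. The scalar file features and the simple two-centroid clustering are assumptions made for demonstration; the claim does not fix a particular clustering method.

```python
# Illustrative sketch of steps 2.1-2.4, assuming each file is described by a
# scalar feature (hypothetical) and clustering is a simple two-centroid split.

def two_means(files, feat, iters=10):
    """Step 2.1/2.3: split files into two clusters around min/max-initialised centroids."""
    c0, c1 = min(feat[f] for f in files), max(feat[f] for f in files)
    for _ in range(iters):
        left = [f for f in files if abs(feat[f] - c0) <= abs(feat[f] - c1)]
        right = [f for f in files if abs(feat[f] - c0) > abs(feat[f] - c1)]
        if left:
            c0 = sum(feat[f] for f in left) / len(left)
        if right:
            c1 = sum(feat[f] for f in right) / len(right)
    return left, right

def grow_binary_tree(files, feat, clicks, min_clicks):
    """Steps 2.2-2.4: repeatedly split the growth node and follow the child with
    more clicks, stopping once the node's click count drops below min_clicks."""
    node = list(files)
    while len(node) > 1 and sum(clicks[f] for f in node) >= min_clicks:
        left, right = two_means(node, feat)
        if not left or not right:  # clustering failed to split; stop growing
            break
        node = left if sum(clicks[f] for f in left) >= sum(clicks[f] for f in right) else right
    return node  # final growth node: its files are this user class's recommendation

feat = {0: 0.0, 1: 0.1, 2: 0.2, 3: 5.0, 4: 5.1, 5: 5.2}
clicks = {0: 1, 1: 1, 2: 1, 3: 10, 4: 20, 5: 30}
end_node = grow_binary_tree(list(feat), feat, clicks, min_clicks=15)
print(end_node)  # → [4]
```

Because each split keeps only the child with the larger click count, the tree indexes popular files at progressively finer granularity, as step 2 of claim 5 describes.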
CN201810599991.8A 2018-06-12 2018-06-12 Monte Carlo tree search-assisted wireless caching method Active CN108810139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810599991.8A CN108810139B (en) 2018-06-12 2018-06-12 Monte Carlo tree search-assisted wireless caching method

Publications (2)

Publication Number Publication Date
CN108810139A CN108810139A (en) 2018-11-13
CN108810139B true CN108810139B (en) 2021-02-02

Family

ID=64085465

Country Status (1)

Country Link
CN (1) CN108810139B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109982389B * 2019-03-05 2021-04-30 University of Electronic Science and Technology of China Wireless caching method based on multi-objective multi-armed bandit online learning
CN110247953B * 2019-05-13 2022-03-15 University of Electronic Science and Technology of China Wireless caching method for multi-objective online learning based on the super-Pareto principle
CN110262879B * 2019-05-17 2021-08-20 Hangzhou Dianzi University Monte Carlo tree search method balancing exploration and exploitation

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103208041A (en) * 2012-01-12 2013-07-17 国际商业机器公司 Method And System For Monte-carlo Planning Using Contextual Information
US9497243B1 (en) * 2014-09-30 2016-11-15 Amazon Technologies, Inc. Content delivery
CN107301215A (en) * 2017-06-09 2017-10-27 北京奇艺世纪科技有限公司 A kind of search result caching method and device, searching method and device

Non-Patent Citations (4)

Title
Kaiyang Guo et al., "Caching in Base Station with Recommendation via Q-Learning," 2017 IEEE Wireless Communications and Networking Conference (WCNC), May 11, 2017. *
Sabrina Müller et al., "Context-Aware Proactive Content Caching With Service Differentiation in Wireless Networks," IEEE Transactions on Wireless Communications, vol. 16, no. 2, February 2017. *
Hu Xi, "Design and Implementation of a D2D Collaborative Streaming Media Service System," China Masters' Theses Full-text Database (Electronic Journal), no. 4, April 15, 2018. *
Qi Kaiqiang et al., "Impact of the Dynamics of Content Popularity Distribution on Caching Performance at the Base Station," Journal of Signal Processing, vol. 33, no. 3, March 2017. *

Similar Documents

Publication Publication Date Title
Zhong et al. A deep reinforcement learning-based framework for content caching
Jiang et al. Multi-agent reinforcement learning based cooperative content caching for mobile edge networks
CN107404530B (en) Social network cooperation caching method and device based on user interest similarity
CN108810139B (en) Monte Carlo tree search-assisted wireless caching method
Müller et al. Smart caching in wireless small cell networks via contextual multi-armed bandits
CN105812834B (en) Video recommendations server, recommended method and pre-cache method based on clustering information
CN108777853B (en) Network edge caching method and system based on D2D
CN105979274A (en) Distributive cache storage method for dynamic self-adaptive video streaming media
Yin et al. A prediction-based coordination caching scheme for content centric networking
Zhao et al. Mobility-aware and interest-predicted caching strategy based on IoT data freshness in D2D networks
CN110913239B (en) Video cache updating method for refined mobile edge calculation
Zhang et al. Two time-scale caching placement and user association in dynamic cellular networks
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
Ren et al. Recommender system for mobile users: Enjoy Internet of Things socially with wireless device-to-device physical links
CN108377473B (en) File content distribution method and device in D2D wireless cache network
CN112911614B (en) Cooperative coding caching method based on dynamic request D2D network
Wang et al. Integrating social networks with mobile device-to-device services
CN110139125B (en) Video sharing method based on demand perception and resource caching in wireless mobile network
Ma et al. Socially aware distributed caching in device-to-device communication networks
CN109982389B (en) 2019-03-05 Wireless caching method based on multi-objective multi-armed bandit online learning
Du et al. Monte-carlo tree search aided contextual online learning approach for wireless caching
CN105447188A (en) Knowledge learning based peer-to-peer social network document retrieval method
Jia et al. Social-aware edge caching strategy of video resources in 5G ultra-dense network
Zhou et al. Reliable content dissemination in internet of vehicles using social big data
CN114245422A (en) Edge active caching method based on intelligent sharing in cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant