CN117528658A - Edge collaborative caching method and system based on federal deep reinforcement learning - Google Patents

Edge collaborative caching method and system based on federal deep reinforcement learning Download PDF

Info

Publication number
CN117528658A
Authority
CN
China
Prior art keywords
content
edge server
mobile user
cache
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311355982.1A
Other languages
Chinese (zh)
Inventor
李春林
赵明明
张维明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Wuhan University of Technology WUT
Original Assignee
National University of Defense Technology
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology, Wuhan University of Technology WUT filed Critical National University of Defense Technology
Priority to CN202311355982.1A priority Critical patent/CN117528658A/en
Publication of CN117528658A publication Critical patent/CN117528658A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/10Flow control between communication endpoints
    • H04W28/14Flow control between communication endpoints using intermediate storage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/568Storing data temporarily at an intermediate stage, e.g. caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to an edge collaborative caching method and system based on federal deep reinforcement learning. A content request and user popularity model is constructed from the content request probability of each mobile user within a time slice; a collaborative cache model is constructed from the average cache hit rate of the mobile users; and a content access delay model is constructed from the access delay incurred when a mobile user obtains content. A reward function is set based on the content request and user popularity model, the collaborative caching model and the access delay, and an objective function that minimizes the long-term average access delay of the edge cache is set through the content access delay model. The mobile users are pre-trained based on competition deep Q network learning and the reward function to obtain target parameters, and an edge collaborative caching algorithm based on federal deep reinforcement learning then solves for the optimal caching strategy from those target parameters. Without violating user privacy, the content access delay is reduced and the cache data hit rate is improved.

Description

Edge collaborative caching method and system based on federal deep reinforcement learning
Technical Field
The invention relates to the technical field of edge caching, in particular to an edge collaborative caching method and system based on federal deep reinforcement learning.
Background
In recent years, as the data volume of network media services in mobile networks has kept growing, the number of intelligent devices such as computers, smartphones and smart televisions has increased, and the demands that sensors and wearable devices place on mobile networks have risen accordingly. The number of users in distributed networks therefore keeps growing and generates massive amounts of heterogeneous network data. These data must be processed and analysed while a degree of privacy is preserved. Traditional cloud storage uploads all data to the cloud, which makes it difficult to guarantee user privacy; the sheer volume of data also demands large amounts of storage and computing resources, leading to low efficiency and a heavy network load.
In the prior art, researchers have proposed the mobile edge computing architecture, which sinks computing and storage resources from the cloud to clients at the network edge, relieving the load on the central network, reducing data-processing delay and supporting deep-learning-based resource management. Because communication between an edge server and a user is more efficient than communication with a cloud server, a mobile edge computing system can provide faster and more accurate computing and storage services for users in the mobile network and also helps protect user privacy. However, with the growing number of mobile users at the network edge, mobile data traffic now occupies most of the network resources in a mobile edge computing system. Edge servers with limited storage and computing resources therefore cannot process large volumes of real-time data in time or serve the large number of users in the mobile network, while communication delay at the network edge and the load on backhaul links both increase. To address these problems, researchers have proposed edge caching, which stores the data users access most frequently on the edge server, enabling high-speed access to the same content and greatly reducing redundant data transmission and access delay.
However, most current edge caching strategies only allocate resources for short-term demand and ignore the long-term factors that affect resource allocation. Existing strategies also rely on global information that is almost impossible to obtain and only reach an optimal solution for the system at a single moment, so they lack dynamic adaptability and long-term optimization.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an edge collaborative caching method and system based on federal deep reinforcement learning. Data are cached with the edge collaborative caching technique, and the edge collaborative caching algorithm based on federal deep reinforcement learning obtains the optimal caching strategy under the current conditions without requiring global information and without violating the privacy of mobile users, thereby reducing content access delay and improving the cache data hit rate.
In order to achieve the above purpose, the invention designs an edge collaborative caching method based on federal deep reinforcement learning, characterized by comprising the following steps:
step S1: based on the content request probability of the mobile user in the time slice, constructing a content request and user popularity model; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on access delay of the mobile user to acquire the content, constructing a content access delay model;
step S2: setting a reward function based on the content request, a user popularity model, a collaborative caching model and an access delay, and establishing a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
step S3: based on the reward function, the local edge server adopts a cache replacement algorithm to pretrain the decision model to obtain target parameters;
step S4: the target parameters are sent to a mobile user, and the mobile user updates the target parameters through a gradient descent method;
step S5: the updated target parameters are sent to a local edge server, the local edge server aggregates the target parameters, and the target parameters are updated to complete a federal deep reinforcement learning process;
step S6: steps S3 to S5 are repeated over multiple iterations until the set accuracy requirement is met, and the target parameters are then input into the decision model to obtain the optimal caching strategy.
Preferably, the expression of the content request and the user popularity model is:
p_{u,f}^t = r_{u,f}^t / Σ_{f'=1}^{F} r_{u,f'}^t
where r_{u,f}^t represents the number of requests made by mobile user u for content f within time slice t, Σ_{f'=1}^{F} r_{u,f'}^t is the sum of the number of requests issued by mobile user u in time slice t, and F represents the whole content library.
Preferably, the expression of the collaborative caching model is:
h̄_n^t = (1/U) Σ_{u=1}^{U} Σ_{f=1}^{F} p_{u,f}^t · x_{u,f}^t
where U represents the total number of mobile users, p_{u,f}^t represents the request probability, and x_{u,f}^t indicates whether the request of mobile user u for content f within time slice t is hit by the local edge server or the cooperative edge server.
Preferably, the method for constructing the content access delay model comprises the following steps:
1) When the content requested by mobile user u hits in the local edge server, the average transmission rate v_{u,n}^t from local edge server n to mobile user u within time slice t is:
v_{u,n}^t = B_{u,n} log2(1 + P_n h_{u,n} / σ_{u,n})
where B_{u,n} represents the transmission bandwidth allocated to user u, P_n represents the transmission power of edge server n, h_{u,n} represents the channel gain between mobile user u and edge server n, and σ_{u,n} represents the Gaussian noise power between mobile user u and edge server n;
2) When the content requested by mobile user u hits in the local edge server, the content access delay d_{u,f}^{loc,t} is:
d_{u,f}^{loc,t} = x_{u,f}^{loc,t} · S_f / v_{u,n}^t
where S_f denotes the size of the content and x_{u,f}^{loc,t} indicates whether the local edge server hits the content f requested by mobile user u;
3) If the local edge server fails to hit and the cooperative edge server hits, the content access delay d_{u,f}^{co,t} is:
d_{u,f}^{co,t} = x_{u,f}^{co,t} (1 − x_{u,f}^{loc,t}) (S_f / v_{u,n}^t + d_bs)
where d_bs represents the round-trip transmission delay between the local edge server and the cooperative edge server, and x_{u,f}^{co,t} indicates whether the cooperative edge server hits the content requested by mobile user u;
4) If both the local edge server and the cooperative edge server fail to hit, the content request is sent to the remote content provider, and the content access delay d_{u,f}^{cp,t} is:
d_{u,f}^{cp,t} = (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) (S_f / v_{u,n}^t + d_cp)
where d_cp represents the round-trip transmission delay between the remote content provider and the local edge server, and (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) = 1 indicates that both the local edge server and the cooperative edge server miss.
Preferably, the reward function is:
r_t = d_cp − d̄_t
wherein d_cp represents the round-trip transmission delay between the remote content provider and the local edge server, and d̄_t represents the average content access delay of the mobile users within time slice t.
Preferably, the pre-training in step S3 adopts a cache replacement algorithm based on competition deep Q network learning, which comprises the following steps:
step S31: initializing the evaluation network parameters and the target network parameters, and acquiring the request contents of all mobile users within a time slice;
step S32: if the requested content is cached by the local edge server or the cooperative edge server, the content is obtained directly and the iteration for this request ends;
step S33: if enough space is still available in the cache of the local edge server or the collaborative edge server, acquiring the content from the remote content provider, storing the content in the cache space of the local edge server or the collaborative edge server, and terminating the iteration;
step S34: if the cache of the local edge server or the cooperative edge server has insufficient space, selecting a cache action to calculate a Q value, calculating a reward value based on the reward function to form a cache experience, and adding the cache experience into an experience pool;
step S35: selecting cache experiences from the experience pool, training the neural network with the mean loss function, and updating the evaluation network parameters;
step S36: the steps are repeated until the iteration termination condition is met, and the target parameters are updated based on the evaluation network parameters.
Preferably, the cache replacement algorithm based on competition deep Q network learning divides the Q-value function into an advantage value A(s,a) and a state value V(s), and calculates the Q value by the following formula:
Q(s,a|ε) = V(s|ε) + ( A(s,a|ε) − (1/|A|) Σ_{a′∈A} A(s,a′|ε) )
where s represents the state, a represents the action, ε represents the evaluation network parameters, A is the action set and a′ ranges over the candidate actions.
Preferably, the formula for updating the target parameter in step S4 is:
ω_u(k) = g_n(k) − (η/|D_u|) Σ_{(x_uj, y_uj)∈D_u} ∇ f_u(g_n(k), x_uj, y_uj)
where k represents the number of iterations, g_n(k) represents the global parameters employed by mobile user u in the k-th iteration, η represents the step size, D_u represents the local data set of mobile user u, x_uj represents the j-th input sample of mobile user u, y_uj represents the label output of the corresponding federal learning task of mobile user u, and f_u(·) represents the loss function of the deep reinforcement learning model trained locally at mobile user u.
Preferably, the formula for aggregating the target parameters in step S5 is:
g_n(k+1) = Σ_{u∈U_n} (|D_u| / |D|) ω_u(k)
where k represents the number of iterations, U_n represents the set of participating mobile users, D_u represents the local data set owned by mobile user u, g_n(k+1) represents the global parameter of the (k+1)-th iteration, and D represents the union of all local data sets.
The invention also provides an edge collaborative caching system based on federal deep reinforcement learning, which is characterized by comprising a model building module, a caching strategy initializing module, a pre-training module, a distributed learning module and a caching strategy generating module;
the model construction module is used for constructing a content request and user popularity model based on the content request probability of the mobile user in the time slice; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on the access delay of the mobile user acquired content, constructing a content access delay model, and inputting the content access delay model into the pre-training module;
the cache policy initializing module sets a reward function based on the content request, a user popularity model, a collaborative cache model and access delay, and establishes a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
the pre-training module pre-trains the decision model through a cache replacement algorithm to obtain target parameters;
the distributed learning module is used for selecting a mobile user, performing federal deep reinforcement learning based on the target parameters, updating the target parameters and inputting the target parameters into the cache policy generation module;
and the cache policy generation module inputs the target parameters into the decision model to obtain an optimal cache policy.
Compared with the prior art, the invention has the beneficial effects that:
1. according to the invention, through the cooperative edge caching algorithm of federal deep reinforcement learning, under the condition of no network priori knowledge, a proper strategy is learned to deal with the cache replacement problem through interaction with the actual environment, so that the cache efficiency and performance are improved, and a dynamic self-adaptive resource allocation strategy is realized.
2. According to the method, all data information of the mobile user is not needed, only the target parameters uploaded by the mobile user are processed through federal deep reinforcement learning, so that a data caching strategy is obtained, and the content access delay is reduced and the cache data hit rate is improved on the premise that the privacy of the user is not violated.
3. The invention adopts competition deep Q network learning to pretrain, improves learning efficiency, accelerates model convergence rate, saves computing resources and obtains more accurate strategy.
Drawings
FIG. 1 is a flow chart diagram of the method of the present invention;
FIG. 2 is a flowchart of a pre-training algorithm based on competition depth Q network learning;
FIG. 3 is a flow chart of a collaborative edge caching algorithm based on federal deep reinforcement learning.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is evident that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present invention, based on the embodiments of the present invention.
As shown in fig. 1, a flow chart of an edge collaborative caching method based on federal deep reinforcement learning provided by the invention comprises the following steps:
step S1: based on the content request probability of the mobile user in the time slice, constructing a content request and user popularity model; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on access delay of the mobile user to acquire the content, constructing a content access delay model;
1) Content request and user popularity model
In the embodiment of the present invention, let N = {1, 2, ..., N} be the set of edge servers, where every edge server has the same cache capacity in each time slice. Let U = {1, 2, ..., U} be the group of mobile users under the coverage of edge server n, distributed within its cache service area. F = {1, 2, ..., F} is the content library supported by the remote content provider, and the invention sets each content to have the same size S_f, f ∈ F.
User demand changes continuously over time. Let T be a longer time period, divided into a number of shorter time slices t, and let u be a mobile user and n an edge server. When mobile user u is within the coverage of edge server n in some time slice t and issues a content request, edge server n receives the request; if edge server n does not hold the corresponding content, the request is forwarded to the neighbouring edge server, and if the content is still not found, the request is sent to the remote content provider.
According to existing research, the access pattern of mobile users for most streaming-media videos approximately follows Zipf's law: content requests concentrate on a small set of very popular videos, while the large body of ordinary videos is rarely accessed. User requests are therefore assumed to follow a Zipf distribution. The local popularity of cached content f at edge server n is denoted P_{n,f}, and the global popularity is P_f = Σ_{n∈N} P_{n,f}. For any edge server n, the preference of a mobile user is represented by the probability that user u requests content f within time slice t, calculated as:
p_{u,f}^t = r_{u,f}^t / Σ_{f'=1}^{F} r_{u,f'}^t
where r_{u,f}^t represents the number of requests made by mobile user u for content f within time slice t, and Σ_{f'=1}^{F} r_{u,f'}^t is the sum of the number of requests issued by user u in time slice t.
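As an illustrative sketch only (not part of the original disclosure), the request probability and local popularity above can be computed from a matrix of per-user request counts; the function names, the numpy dependency and the array layout request_counts[u, f] are assumptions made for this example.

import numpy as np

def request_probabilities(request_counts: np.ndarray) -> np.ndarray:
    """p[u, f] = r_{u,f}^t / sum_f' r_{u,f'}^t for one time slice.

    request_counts[u, f] holds the number of requests user u issued for content f."""
    totals = request_counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0          # users with no requests keep probability 0
    return request_counts / totals

def local_popularity(request_counts: np.ndarray) -> np.ndarray:
    """P_{n,f}: share of all requests seen by this edge server that target content f."""
    per_content = request_counts.sum(axis=0).astype(float)
    return per_content / max(per_content.sum(), 1.0)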
2) Collaborative caching model
Assume each edge server cooperates only with the edge server closest to it: for edge server n, the nearest edge server is the cooperative edge server bs_n. Within a time slice a mobile user can access only one edge server n and sends its request to n; if the requested content misses in edge server n, the request is forwarded to the cooperative edge server bs_n, and if bs_n has cached the corresponding content this is also counted as a cache hit. Therefore, for any edge server n in time slice t, the average cache hit rate of its users is:
h̄_n^t = (1/U) Σ_{u=1}^{U} Σ_{f=1}^{F} p_{u,f}^t · x_{u,f}^t
where U represents the total number of users, p_{u,f}^t represents the request probability (user popularity), and x_{u,f}^t indicates whether the request of user u for content f in time slice t is hit, with
x_{u,f}^t = x_{u,f}^{loc,t} + (1 − x_{u,f}^{loc,t}) x_{u,f}^{co,t}
where x_{u,f}^{loc,t} and x_{u,f}^{co,t} respectively indicate whether the local edge server and the cooperative edge server hit (1 for a hit, 0 for a miss). The local indicator is determined by:
x_{u,f}^{loc,t} = c_{n,f}^t · a_{u,n}^t
where c_{n,f}^t describes the relationship between the requested content f and the edge server cache: if content f is already cached in edge server n within time slice t, c_{n,f}^t = 1, and 0 otherwise. a_{u,n}^t describes the positional relationship between mobile user u and edge server n: if mobile user u is within the range covered by edge server n, i.e. l_{u,n}^t = ||loc_u^t − loc_n|| ≤ R, then a_{u,n}^t = 1, and 0 otherwise, where l_{u,n}^t is the distance between mobile user u and edge server n within time slice t, loc_u^t is the position of user u within time slice t, loc_n is the position of server n, and R is the coverage radius of server n. The cooperative-server indicator x_{u,f}^{co,t} is defined analogously for bs_n.
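The average cache hit rate above can be illustrated with a short sketch (an example only; the vector names cached_local, cached_coop and covered are assumptions). It evaluates the indicator x_{u,f}^t from the cache bitmaps and the coverage indicator and averages p·x over the users.

import numpy as np

def average_hit_rate(p, cached_local, cached_coop, covered):
    """p[u, f]         : request probability of user u for content f in slice t
    cached_local[f] : 1 if content f is cached at the local edge server n
    cached_coop[f]  : 1 if content f is cached at the cooperative edge server bs_n
    covered[u]      : 1 if user u lies within the coverage radius R of server n"""
    hit_local = np.outer(covered, cached_local)                   # x^{loc}
    hit_coop = (1 - hit_local) * np.outer(covered, cached_coop)   # x^{co}, only after a local miss
    x = hit_local + hit_coop                                      # hit by either server
    return float((p * x).sum() / p.shape[0])                      # (1/U) * sum_u sum_f p * x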
3) Content access delay model
When mobile user u is within the coverage of edge server n, a content request that hits in the local edge server n is served directly from it; on a miss the content is sought from the cooperative edge server, and if that also misses the content can only be obtained from the remote content provider, which clearly introduces additional access delay. Therefore, to reduce the average content access delay and raise the hit rate of content accesses in the local and cooperative edge servers, a cache replacement policy must decide in every time slice which cached content to replace. During content delivery the access delay is also influenced by factors such as the wireless channel state, so different user equipment experiences different delays. When the content requested by mobile user u hits in the local edge server, the average transmission rate v_{u,n}^t from local edge server n to user u within time slice t is:
v_{u,n}^t = B_{u,n} log2(1 + P_n h_{u,n} / σ_{u,n})
where B_{u,n} denotes the transmission bandwidth allocated to user u out of the bandwidth B_n of edge server n, P_n is the transmission power of edge server n, h_{u,n} is the channel gain between mobile user u and edge server n, and σ_{u,n} is the Gaussian noise power between mobile user u and edge server n.
When mobile user u is within the coverage of edge server n and sends a request for content f within time slice t, if the cache of edge server n is hit, the content access delay d_{u,f}^{loc,t} can be expressed as:
d_{u,f}^{loc,t} = x_{u,f}^{loc,t} · S_f / v_{u,n}^t
where S_f is the size of the content and v_{u,n}^t is the average transmission rate from the local edge server n to user u within time slice t.
If the local edge server fails to hit and the cooperative edge server hits, the content access delay d_{u,f}^{co,t} is:
d_{u,f}^{co,t} = x_{u,f}^{co,t} (1 − x_{u,f}^{loc,t}) (S_f / v_{u,n}^t + d_bs)
where d_bs represents the round-trip transmission delay between the local edge server and the cooperative edge server.
If both the local edge server and the cooperative edge server fail to hit, the content request is sent to the remote content provider, and the content access delay d_{u,f}^{cp,t} is:
d_{u,f}^{cp,t} = (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) (S_f / v_{u,n}^t + d_cp)
where d_cp represents the round-trip transmission delay between the remote content provider and the local edge server; this delay is clearly larger than the delay between the user and the local edge server and larger than the cooperative-server delay d_bs. x_{u,f}^{loc,t} and x_{u,f}^{co,t} indicate whether the local edge server and the cooperative edge server hit, respectively, so (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) = 1 means both miss and the content request is sent to the remote content provider.
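A compact sketch of the three-tier delay model follows (illustrative only; whether the last-hop wireless time is added in every branch is an assumption consistent with the description above, and the function names are invented for the example).

import math

def transmission_rate(B_un, P_n, h_un, sigma_un):
    """Average rate v_{u,n}^t from edge server n to user u (assumed Shannon-type form)."""
    return B_un * math.log2(1.0 + P_n * h_un / sigma_un)

def content_access_delay(S_f, rate_un, hit_local, hit_coop, d_bs, d_cp):
    """Delay for one request of size S_f: local hit, cooperative hit, or remote fetch."""
    wireless = S_f / rate_un             # last-hop transmission time to the user
    if hit_local:
        return wireless                   # d^{loc}
    if hit_coop:
        return wireless + d_bs            # d^{co}: extra round trip between edge servers
    return wireless + d_cp                # d^{cp}: fetched from the remote content provider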
Step S2: setting a reward function based on the content request, a user popularity model, a collaborative caching model and an access delay, and establishing a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
Specifically, the embodiment of the present invention solves a large-scale cache optimization problem with a reinforcement learning algorithm, the aim being an optimization strategy that accounts for the long-term impact of the current decision on resource allocation and maximizes the cumulative reward. For a given edge server, its state space S, action space A, transition probability P and reward function R are defined as follows.
1) State space S
It is assumed that each mobile user requests at most one content per time slice. In decision step t, the request state of mobile user u is q_{u,f}^t ∈ {0,1}: q_{u,f}^t = 0 means that user u makes no request to edge server n, and q_{u,f}^t = 1 means that user u requests content f from the edge server. The content cache state of the local edge server is c_{n,f}^t ∈ {0,1}, f ∈ F: c_{n,f}^t = 0 means that the edge server has not cached content f, and c_{n,f}^t = 1 otherwise. At each decision step t the state vector is s_t = (q_t, c_t), with s_t ∈ S.
2) Action space A
In each decision step t, it is assumed that the edge server can only replace one cached content or choose not to replace. In addition, the edge server needs to decide whether to provide the service through a local, collaborative edge server or a remote content provider.
In decision step t, the replacement component of the action describes how the user request is handled by the local edge server: a value of 0 means the requested content is already cached locally (no replacement), while a value of f means the local f-th content is to be replaced. A further flag indicates whether the user request is processed by the cooperative edge server (1 if so, 0 otherwise), and another flag indicates whether the request is processed by the remote content provider (1 if so, 0 otherwise). The bandwidth allocation of the participating devices forms an additional component of the action. At each decision step t the action a_t gathers these components and represents the system action in state s_t, with a_t ∈ A.
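For illustration only, the state vector s_t can be assembled as below; the flattened encoding (request indicators concatenated with the cache bitmap) is an assumption chosen for the example, not the only possible representation.

import numpy as np

def build_state(request_indicators: np.ndarray, cache_bitmap: np.ndarray) -> np.ndarray:
    """s_t = (request state, cache state) for one edge server in decision step t.

    request_indicators[u, f] = 1 if user u currently requests content f, else 0;
    cache_bitmap[f] = 1 if content f is cached locally, else 0."""
    return np.concatenate([request_indicators.ravel(), cache_bitmap]).astype(np.float32)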
3) Probability of transition
Since user mobility is Markovian, the probability that the system moves to the next decision step t+1 is P(s′|s,a) = Pr[s_{t+1} = s′ | s_t = s, a_t = a], where Σ_{s′∈S} P(s′|s,a) = 1, s ∈ S, a ∈ A.
4) Reward function
At each decision step t, the edge server observes the state s t And select action a t The edge server then receives a reward r from the status and action t =R(s t ,a t ). In order to minimize the average content access delay of the edge cache, embodiments of the present invention set the reward function as follows.
r_t = d_cp − d̄_t
wherein d_cp represents the round-trip transmission delay between the remote content provider and the local edge server, and d̄_t is the average content access delay of the users within time slice t.
Furthermore, the deterministic policy in the Markov decision process is a mapping a = π(s) from the state space to the action space. The objective of the invention is to maximize the long-term expected return starting from a random initial state s_1, i.e. to maximize the long-term return
R_L = Σ_{t≥1} γ^{t−1} r_t.
According to the Bellman equation, the corresponding state value is
V_π(s) = E_π[ Σ_{t≥1} γ^{t−1} r_t | s_1 = s ]
where 0 < γ < 1 denotes the discount factor. The goal is to find the optimal policy π*; a policy model is therefore constructed with the objective function
π* = arg max_π V_π(s).
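The reward and the discounted long-term return can be sketched as follows (an example only; that the per-step reward equals the gap between d_cp and the average access delay of the slice is the reading adopted here, and the function names are invented for the example).

def step_reward(d_cp: float, avg_access_delay: float) -> float:
    """r_t: the smaller the average content access delay in slice t, the larger the reward;
    d_cp (remote-provider round trip) serves as an upper bound on that delay."""
    return d_cp - avg_access_delay

def long_term_return(rewards, gamma: float = 0.9) -> float:
    """R_L = sum_t gamma^(t-1) * r_t, the quantity the optimal policy pi* maximizes."""
    total = 0.0
    for r in reversed(list(rewards)):
        total = r + gamma * total
    return total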
step S3: based on the reward function, the local edge server adopts a cache replacement algorithm to pretrain the decision model to obtain target parameters;
Specifically, as shown in fig. 2, the embodiment of the present invention adopts a cache replacement algorithm based on competition (dueling) deep Q network learning, which comprises the following steps:
step S31: initializing the evaluation network parameters and the target network parameters, and acquiring the request contents of all mobile users within a time slice;
step S32: if the requested content is cached by the local edge server or the cooperative edge server, the content is obtained directly and the iteration for this request ends;
step S33: if enough space is still available in the cache of the local edge server or the collaborative edge server, acquiring the content from the remote content provider, storing the content in the cache space of the local edge server or the collaborative edge server, and terminating the iteration;
step S34: if the cache of the local edge server or the cooperative edge server has insufficient space, selecting a cache action to calculate a Q value, calculating a reward value based on the reward function to form a cache experience, and adding the cache experience into an experience pool;
In a deep Q network, a neural network is used to approximate the Q-value function, which assigns a Q value Q(s,a) to every state-action pair and is updated as:
Q(s,a) ← Q(s,a) + α [ r + γ max_{a′} Q(s′,a′) − Q(s,a) ]
where s = s_t denotes the current state of decision step t, s′ = s_{t+1} the next state, a = a_t the current action of decision step t, a′ = a_{t+1} the next action, 0 < α < 1 is the learning rate, and 0 < γ < 1 is the discount factor.
Deep Q learning mainly comprises two neural networks: an evaluation network and a target network. The evaluation network produces the Q value of a given state-action pair and is trained to obtain the evaluation parameters ε, while the target network produces the target Q value and holds the target parameters ε⁻.
The loss function of the estimated neural network is shown in the formula:
and the target value supplied by the target network is:
y_t = r + γ max_{a′} Q(s′, a′ | ε⁻)
where the target network parameters ε⁻ are periodically copied from the evaluation network.
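As an illustrative sketch of the evaluation/target network update (PyTorch is assumed here purely for the example; the patent does not prescribe a framework), the temporal-difference loss above can be computed as:

import torch
import torch.nn.functional as F

def dqn_loss(eval_net, target_net, batch, gamma=0.9):
    """Mean-squared TD loss for the evaluation network; the target network, whose
    parameters are copied periodically from the evaluation network, supplies the target."""
    s, a, r, s_next = batch                                    # tensors drawn from the experience pool
    q_sa = eval_net(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a | eps)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values          # max_a' Q(s', a' | eps^-)
    return F.mse_loss(q_sa, r + gamma * q_next)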
The embodiment of the invention improves deep Q learning by constructing a competition (dueling) deep Q network, splitting the Q-value function into an advantage value A(s,a) and a state value V(s), which improves learning efficiency and speeds up convergence. The value of action a is expressed by the advantage value, while the value of state s is expressed by the state value; their sum gives the Q value of action a in state s, as shown in the following formula:
Q(s,a|ε)=A(s,a|ε)+V(s|ε)
In the competition deep Q network the state value exists independently of the action, but the formula above cannot identify the separate contributions of V(s) and A(s,a) to the final output, so the embodiment of the invention refines it to:
Q(s,a|ε) = V(s|ε) + ( A(s,a|ε) − (1/|A|) Σ_{a′∈A} A(s,a′|ε) )
Without this modification the Q function cannot fully separate the value contributed by state s: in some states no action has a substantial effect on the next state, and whichever action is taken the value stays high (or stays low). Judging an action's value from the state alone is therefore inaccurate, and the true value of an action can only be determined by combining it with the state in which it is taken.
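A minimal competition (dueling) deep Q network in this spirit is sketched below (again an example, assuming PyTorch; the layer sizes are arbitrary): a shared trunk feeds a state-value head V(s) and an advantage head A(s,a), recombined with the mean-advantage correction of the formula above.

import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # V(s | eps)
        self.advantage = nn.Linear(hidden, n_actions)   # A(s, a | eps)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)      # Q(s, a | eps) with mean-advantage correction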
The embodiment of the invention uses a greedy policy to select the cache action, a_t = arg max_a Q(s_t, a|ε), and replaces the corresponding cached content.
Step S35: selecting a cache experience from the experience pool, training a neural network through an average loss function, and updating and evaluating network parameters;
step S36: the steps are repeated until the iteration termination condition is met, and the target parameters are updated based on the evaluation network parameters.
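Steps S32 to S34 can be pictured with the following sketch of one decision step (illustrative only; the mapping from Q-network actions to cached contents, the epsilon-greedy exploration and the data structures are assumptions made for the example).

import random
import torch

def handle_request(f, local_cache, coop_cache, capacity, q_net, state, epsilon=0.1):
    """One pretraining decision step: serve hits, fill free space, otherwise evict."""
    if f in local_cache or f in coop_cache:          # S32: cache hit, nothing to replace
        return None
    if len(local_cache) < capacity:                  # S33: free space, fetch and store
        local_cache.add(f)
        return None
    # S34: cache full -- pick the cached content to replace (epsilon-greedy over Q values)
    if random.random() < epsilon:
        evict = random.choice(sorted(local_cache))
    else:
        with torch.no_grad():
            q = q_net(torch.as_tensor(state).float().unsqueeze(0))  # one Q value per action
        evict = max(local_cache, key=lambda c: float(q[0, c]))
    local_cache.discard(evict)
    local_cache.add(f)
    return evict   # recorded together with the reward as a cache experience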
As shown in fig. 3, the edge collaborative caching algorithm for federal deep reinforcement learning according to the embodiment of the present invention includes the following steps:
step S4: the target parameters are sent to a mobile user, and the mobile user updates the target parameters through a gradient descent method;
Specifically, the loss function of mobile user u for local model training is
F_u(ω) = (1/|D_u|) Σ_{j=1}^{|D_u|} f_u(ω, x_uj, y_uj).
The local iterative update within time slice t is:
ω_u(k) = g_n(k) − (η/|D_u|) Σ_{(x_uj, y_uj)∈D_u} ∇ f_u(g_n(k), x_uj, y_uj)
where η is the learning rate of the mobile-user training, D_u = {(x_uj, y_uj)} is the local data set owned by mobile user u, x_uj is the j-th input sample, y_uj is the label output of the corresponding federal learning task, f_u(·) is the loss function of the deep reinforcement learning model trained locally at mobile user u, and g_n(k) is the global parameter of the k-th iteration. Different machine learning tasks give rise to different loss functions; the deep reinforcement learning of this embodiment adopts a least-squares loss. To reach a local accuracy θ ∈ [0,1], the user equipment keeps performing local training iterations until the required local accuracy θ is achieved, i.e. local training is complete. The invention assumes that mobile user u needs L_u = μ log(1/θ) local iterations to reach local convergence. The embodiment sets the local accuracy θ_u and the constant μ_u to be the same for every mobile user, i.e. θ = θ_u and μ = μ_u, L_u = L, u ∈ U, where the constant μ_u depends on the data size and the machine learning task of each mobile user.
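For step S4, a minimal sketch of the local gradient-descent update is given below (an example only, using the least-squares loss f_u(ω; x, y) = ½(xᵀω − y)² mentioned above; the linear model and the fixed iteration count are assumptions).

import numpy as np

def local_update(global_w, X_u, y_u, eta=0.01, local_iters=10):
    """Mobile user u starts from the broadcast global parameters g_n(k) and runs
    L_u gradient-descent iterations on its local data set D_u = (X_u, y_u)."""
    w = np.array(global_w, dtype=float)
    for _ in range(local_iters):
        grad = X_u.T @ (X_u @ w - y_u) / len(y_u)   # average least-squares gradient over D_u
        w -= eta * grad
    return w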
Step S5: the updated target parameters are sent to a local edge server, the local edge server aggregates the target parameters, and the target parameters are updated to complete a federal deep reinforcement learning process;
Edge server n aggregates the parameters uploaded by the set of participating users U_n, updates the global parameters, broadcasts the updated global parameters to the users, and enters the next round of global training. The global model parameter aggregated at the (k+1)-th iteration is:
g_n(k+1) = Σ_{u∈U_n} (|D_u| / |D|) ω_u(k)
where D is the union of all local data sets.
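Step S5 corresponds to a weighted parameter average, sketched below (illustrative only; the helper names and the flat parameter vectors are assumptions). One federated round then chains the two sketches: local updates followed by aggregation.

import numpy as np

def aggregate(local_weights, local_sizes):
    """g_n(k+1) = sum_u (|D_u| / |D|) * w_u(k): dataset-size-weighted average of the
    parameters uploaded by the participating mobile users."""
    total = float(sum(local_sizes))
    g_next = np.zeros_like(np.asarray(local_weights[0], dtype=float))
    for w, n in zip(local_weights, local_sizes):
        g_next += (n / total) * np.asarray(w, dtype=float)
    return g_next

# One federated round (steps S4-S5), reusing local_update() from the previous sketch:
#   updated = [local_update(g_k, X_u, y_u) for (X_u, y_u) in user_datasets]
#   g_next  = aggregate(updated, [len(y_u) for (_, y_u) in user_datasets])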
Step S6: steps S3 to S5 are repeated over multiple iterations until the set accuracy requirement is met, and the target parameters are then input into the decision model to obtain the optimal caching strategy.
Specifically, the federal deep reinforcement learning model is configured with the aggregated global parameters, and the optimal caching strategy is then obtained by taking the mobile users' requests as input.
The invention also provides an edge collaborative caching system based on federal deep reinforcement learning, which is characterized by comprising a model building module, a caching strategy initializing module, a pre-training module, a distributed learning module and a caching strategy generating module;
the model construction module is used for constructing a content request and user popularity model based on the content request probability of the mobile user in the time slice; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on the access delay of the mobile user acquired content, constructing a content access delay model, and inputting the content access delay model into the pre-training module;
the cache policy initializing module sets a reward function based on the content request, a user popularity model, a collaborative cache model and access delay, and establishes a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
the pre-training module pre-trains the decision model through a cache replacement algorithm to obtain target parameters;
the distributed learning module is used for selecting a mobile user, performing federal deep reinforcement learning based on the target parameters, updating the target parameters and inputting the target parameters into the cache policy generation module;
and the cache policy generation module inputs the target parameters into the decision model to obtain an optimal cache policy.
It will be readily appreciated by those skilled in the art that the foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An edge collaborative caching method based on federal deep reinforcement learning, characterized by comprising the following steps:
step S1: based on the content request probability of the mobile user in the time slice, constructing a content request and user popularity model; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on access delay of the mobile user to acquire the content, constructing a content access delay model;
step S2: setting a reward function based on the content request, a user popularity model, a collaborative caching model and an access delay, and establishing a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
step S3: based on the reward function, the local edge server adopts a cache replacement algorithm to pretrain the decision model to obtain target parameters;
step S4: the target parameters are sent to a mobile user, and the mobile user updates the target parameters through a gradient descent method;
step S5: the updated target parameters are sent to a local edge server, the local edge server aggregates the target parameters, and the target parameters are updated to complete a federal deep reinforcement learning process;
step S6: steps S3 to S5 are repeated over multiple iterations until the set accuracy requirement is met, and the target parameters are then input into the decision model to obtain the optimal caching strategy.
2. The edge collaborative caching method based on federal deep reinforcement learning according to claim 1, wherein the method comprises the following steps: the expression of the content request and user popularity model is as follows:
p_{u,f}^t = r_{u,f}^t / Σ_{f'=1}^{F} r_{u,f'}^t
where r_{u,f}^t represents the number of requests made by mobile user u for content f within time slice t, Σ_{f'=1}^{F} r_{u,f'}^t is the sum of the number of requests issued by mobile user u in time slice t, and F represents the whole content library.
3. The edge collaborative caching method based on federal deep reinforcement learning according to claim 2, wherein: the expression of the collaborative caching model is as follows:
h̄_n^t = (1/U) Σ_{u=1}^{U} Σ_{f=1}^{F} p_{u,f}^t · x_{u,f}^t
where U represents the total number of mobile users, p_{u,f}^t represents the request probability, and x_{u,f}^t indicates whether the request of mobile user u for content f within time slice t is hit by the local edge server or the cooperative edge server.
4. The edge collaborative caching method based on federal deep reinforcement learning according to claim 3, wherein: the construction method of the content access delay model comprises the following steps:
1) When the content requested by mobile user u hits in the local edge server, the average transmission rate v_{u,n}^t from local edge server n to mobile user u within time slice t is:
v_{u,n}^t = B_{u,n} log2(1 + P_n h_{u,n} / σ_{u,n})
where B_{u,n} represents the transmission bandwidth allocated to user u, P_n represents the transmission power of edge server n, h_{u,n} represents the channel gain between mobile user u and edge server n, and σ_{u,n} represents the Gaussian noise power between mobile user u and edge server n;
2) When the content requested by mobile user u hits in the local edge server, the content access delay d_{u,f}^{loc,t} is:
d_{u,f}^{loc,t} = x_{u,f}^{loc,t} · S_f / v_{u,n}^t
where S_f denotes the size of the content and x_{u,f}^{loc,t} indicates whether the local edge server hits the content requested by mobile user u;
3) If the local edge server fails to hit and the cooperative edge server hits, the content access delay d_{u,f}^{co,t} is:
d_{u,f}^{co,t} = x_{u,f}^{co,t} (1 − x_{u,f}^{loc,t}) (S_f / v_{u,n}^t + d_bs)
where d_bs represents the round-trip transmission delay between the local edge server and the cooperative edge server, and x_{u,f}^{co,t} indicates whether the cooperative edge server hits the content requested by mobile user u;
4) If both the local edge server and the cooperative edge server fail to hit, the content request is sent to the remote content provider, and the content access delay d_{u,f}^{cp,t} is:
d_{u,f}^{cp,t} = (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) (S_f / v_{u,n}^t + d_cp)
where d_cp represents the round-trip transmission delay between the remote content provider and the local edge server, and (1 − x_{u,f}^{loc,t})(1 − x_{u,f}^{co,t}) = 1 indicates that both the local edge server and the cooperative edge server miss.
5. The edge collaborative caching method based on federal deep reinforcement learning according to claim 4, wherein: the reward function in step S2 is:
r_t = d_cp − d̄_t
wherein d_cp represents the round-trip transmission delay between the remote content provider and the local edge server, and d̄_t represents the average content access delay of the mobile users within time slice t.
6. The edge collaborative caching method based on federal deep reinforcement learning according to claim 1, wherein: in step S3, the pre-training adopts a cache replacement algorithm based on competition deep Q network learning, which comprises the following steps:
step S31: initializing the evaluation network parameters and the target network parameters, and acquiring the request contents of all mobile users within a time slice;
step S32: if the requested content is cached by the local edge server or the cooperative edge server, the content is obtained directly and the iteration for this request ends;
step S33: if enough space is still available in the cache of the local edge server or the collaborative edge server, acquiring the content from the remote content provider, storing the content in the cache space of the local edge server or the collaborative edge server, and terminating the iteration;
step S34: if the cache of the local edge server or the cooperative edge server has insufficient space, selecting a cache action to calculate a Q value, calculating a reward value based on the reward function to form a cache experience, and adding the cache experience into an experience pool;
step S35: selecting cache experiences from the experience pool, training the neural network with the mean loss function, and updating the evaluation network parameters;
step S36: the steps are repeated until the iteration termination condition is met, and the target parameters are updated based on the evaluation network parameters.
7. The edge collaborative caching method based on federal deep reinforcement learning according to claim 6, wherein: the cache replacement algorithm based on competition deep Q network learning divides the Q-value function into an advantage value A(s,a) and a state value V(s), and calculates the Q value by the following formula:
Q(s,a|ε) = V(s|ε) + ( A(s,a|ε) − (1/|A|) Σ_{a′∈A} A(s,a′|ε) )
where s represents the state, a represents the action, ε represents the evaluation network parameters, A is the action set and a′ ranges over the candidate actions.
8. The edge collaborative caching method based on federal deep reinforcement learning according to claim 1, wherein: in step S4, the formula for updating the target parameters is:
ω_u(k) = g_n(k) − (η/|D_u|) Σ_{(x_uj, y_uj)∈D_u} ∇ f_u(g_n(k), x_uj, y_uj)
where k represents the number of iterations, g_n(k) represents the global parameters employed by mobile user u in the k-th iteration, η represents the step size, D_u represents the local data set of mobile user u, x_uj represents the j-th input sample of mobile user u, y_uj represents the label output of the corresponding federal learning task of mobile user u, and f_u(ω_u, x_uj, y_uj) represents the loss function of the deep reinforcement learning model trained locally at mobile user u.
9. The edge collaborative caching method based on federal deep reinforcement learning according to claim 1, wherein the method comprises the following steps: the formula for aggregating the target parameters in step S5 is:
g_n(k+1) = Σ_{u∈U_n} (|D_u| / |D|) ω_u(k)
where k represents the number of iterations, U_n represents the set of participating mobile users, D_u represents the local data set owned by mobile user u, g_n(k+1) represents the global parameter of the (k+1)-th iteration, and D represents the union of all local data sets.
10. The edge collaborative caching system based on federal deep reinforcement learning is characterized in that: the system comprises a model building module, a cache strategy initializing module, a pre-training module, a distributed learning module and a cache strategy generating module;
the model construction module is used for constructing a content request and user popularity model based on the content request probability of the mobile user in the time slice; constructing a collaborative cache model based on the average cache hit rate of the mobile user; based on the access delay of the mobile user acquired content, constructing a content access delay model, and inputting the content access delay model into the pre-training module;
the cache policy initializing module sets a reward function based on the content request, a user popularity model, a collaborative cache model and access delay, and establishes a decision model aiming at minimizing the average access delay of the edge cache long-term data through a Markov decision process;
the pre-training module pre-trains the decision model through a cache replacement algorithm to obtain target parameters;
the distributed learning module is used for selecting a mobile user, performing federal deep reinforcement learning based on the target parameters, updating the target parameters and inputting the target parameters into the cache policy generation module;
and the cache policy generation module inputs the target parameters into the decision model to obtain an optimal cache policy.
CN202311355982.1A 2023-10-18 2023-10-18 Edge collaborative caching method and system based on federal deep reinforcement learning Pending CN117528658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311355982.1A CN117528658A (en) 2023-10-18 2023-10-18 Edge collaborative caching method and system based on federal deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311355982.1A CN117528658A (en) 2023-10-18 2023-10-18 Edge collaborative caching method and system based on federal deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN117528658A true CN117528658A (en) 2024-02-06

Family

ID=89753937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311355982.1A Pending CN117528658A (en) 2023-10-18 2023-10-18 Edge collaborative caching method and system based on federal deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117528658A (en)

Similar Documents

Publication Publication Date Title
CN111031102B (en) Multi-user, multi-task mobile edge computing system cacheable task migration method
Zhang et al. Toward edge-assisted video content intelligent caching with long short-term memory learning
CN110213627A (en) Flow medium buffer distributor and its working method based on multiple cell user mobility
WO2023168824A1 (en) Mobile edge cache optimization method based on federated learning
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN113114756A (en) Video cache updating method for self-adaptive code rate selection in mobile edge calculation
CN112752308B (en) Mobile prediction wireless edge caching method based on deep reinforcement learning
CN114553963B (en) Multi-edge node collaborative caching method based on deep neural network in mobile edge calculation
CN113687960B (en) Edge computing intelligent caching method based on deep reinforcement learning
Liu et al. Fedpa: An adaptively partial model aggregation strategy in federated learning
CN115809147B (en) Multi-edge collaborative cache scheduling optimization method, system and model training method
Wang et al. Deepchunk: Deep q-learning for chunk-based caching in wireless data processing networks
CN116346837A (en) Internet of things edge collaborative caching method based on deep reinforcement learning
Sheng et al. Edge caching for IoT transient data using deep reinforcement learning
Dai et al. Proactive caching over cloud radio access network with user mobility and video segment popularity awared
CN111447506B (en) Streaming media content placement method based on delay and cost balance in cloud edge environment
Peng et al. Value‐aware cache replacement in edge networks for Internet of Things
CN117528658A (en) Edge collaborative caching method and system based on federal deep reinforcement learning
CN113543160A (en) 5G slice resource allocation method and device, computing equipment and computer storage medium
CN116367231A (en) Edge computing Internet of vehicles resource management joint optimization method based on DDPG algorithm
CN115714814B (en) Edge cache replacement method based on multi-agent reinforcement learning
CN113709853B (en) Network content transmission method and device oriented to cloud edge collaboration and storage medium
CN114786200A (en) Intelligent data caching method based on cooperative sensing
CN115002409B (en) Dynamic task scheduling method for video detection and tracking
CN116209015B (en) Edge network cache scheduling method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination