CN116346837A - Internet of things edge collaborative caching method based on deep reinforcement learning - Google Patents

Internet of things edge collaborative caching method based on deep reinforcement learning

Info

Publication number
CN116346837A
Authority
CN
China
Prior art keywords
edge server
parameter
global
server
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310296228.9A
Other languages
Chinese (zh)
Inventor
郭永安
周沂
王宇翱
钱琪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310296228.9A priority Critical patent/CN116346837A/en
Publication of CN116346837A publication Critical patent/CN116346837A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps: (1) an edge server collects video cache information of terminal-layer user equipment and constructs a data set; (2) each distributed edge server trains on the data set through a model training module; (3) a central server receives the local gradient parameters of each distributed edge server and aggregates them into global gradient parameters; (4) the central server feeds the aggregated global gradient parameters into a parameter training module, trains the neural network, and outputs updated global model parameters; (5) steps (1)-(4) are repeated to obtain a prediction model of online video requests, and video content requests are predicted to obtain a prediction list of each user's online video requests; (6) according to the obtained prediction lists, the distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.

Description

Internet of things edge collaborative caching method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of edge caching, and particularly relates to an Internet of things edge collaborative caching method based on deep reinforcement learning.
Background
With the rapid growth of network users, video quality has been upgraded from traditional 1080P high-definition content to 4K and 8K levels, and complex human-computer interaction modes such as VR and AR have emerged, placing severe pressure on the backbone network. Edge caching can effectively reduce latency and backhaul load by placing the most popular content on servers closer to the requesting users. However, edge caching is challenging because the storage space of edge servers is limited and content popularity varies over time and space.
Reinforcement learning can adapt to environmental changes without any prior knowledge of the environment's dynamics, but conventional reinforcement learning algorithms are limited to dynamic environments with fully observable, low-dimensional state spaces. The state space of an actual edge caching environment is typically high-dimensional, and manually extracting all useful features from it is very difficult. Deep reinforcement learning can automatically determine an optimal strategy from the raw high-dimensional environment state, effectively overcoming the curse of dimensionality and providing an effective solution for real edge caching environments.
Most current edge caching systems based on deep reinforcement learning use centralized content caching, but a centralized scheme requires a central controller to collect the local parameters of all servers and then generate content caching decisions for them, and its computational complexity grows exponentially with the number of servers. Several studies have therefore confirmed the effectiveness of distributed content caching schemes.
Disclosure of Invention
The invention aims to: the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which applies deep reinforcement learning to content popularity perception and caching decisions for edge caching. By fully exploiting the adaptive capacity of deep reinforcement learning, it senses network state, user requests and content popularity and responds in a timely manner. The caching strategy is continuously optimized through learning, which improves the cache hit rate of the edge cache, makes efficient use of the storage and computing resources of the edge servers, and reduces latency and backhaul load.
The technical scheme is as follows: in order to solve the technical problems, the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
Further, the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area, and each distributed edge server builds a dynamic log file data set $X_m$ from this video cache information; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
Further, the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
Further, the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
further, the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
Further, the specific steps of the step six are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1. According to the invention, training models are deployed at the edge servers and the central server respectively; based on a deep reinforcement learning algorithm, the cache data undergo centralized model training and distributed cache operation, and this edge-cloud cooperation reduces the traffic on the backhaul link and optimizes the cache strategy in real time.
2. The invention realizes collaborative edge caching, allowing the distributed edge servers to cooperate with one another, which further improves the cache hit rate of the system. Because only parameters are transmitted, data are shared while user privacy is protected and communication overhead is reduced.
3. The invention realizes self-learning cache strategy adjustment: the adaptive capacity of deep reinforcement learning enables real-time analysis of data requests and design of corresponding cache strategies, and the learning ability of the deep reinforcement learning model improves as cached data accumulate, further increasing the cache hit rate.
Drawings
FIG. 1 is a general framework diagram of an edge collaborative caching system and an operation method of the Internet of things based on deep reinforcement learning;
FIG. 2 is a diagram of a central server architecture and workflow in accordance with the present invention;
FIG. 3 is a diagram of an edge server architecture and workflow in accordance with the present invention.
Detailed description of the embodiments:
embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
FIG. 1 is a general framework diagram of the deep reinforcement learning-based Internet of things edge collaborative caching system and its operation, showing the hierarchical relationship among the central server, the edge servers and the terminals in the invention. On this basis, the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
Further, the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area, and each distributed edge server builds a dynamic log file data set $X_m$ from this video cache information; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
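As a minimal illustration of step one, the following Python sketch shows how an edge server might accumulate the dynamic log file data set; the class name, field names and in-memory list storage are illustrative assumptions, not part of the patent:

```python
import time

class EdgeLogCollector:
    """Builds the dynamic log file data set X_m on edge server m."""

    def __init__(self, server_id):
        self.server_id = server_id  # identifies edge server m
        self.dataset = []           # dynamic log file data set X_m

    def record_request(self, user_id, content_id):
        """Append one cache-information record i from user device d."""
        self.dataset.append({
            "video_user_id": user_id,        # label 1: video user ID
            "timestamp": time.time(),        # label 2: request timestamp
            "video_content_id": content_id,  # label 3: video content ID
        })
```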
Further, the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
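To make steps 2.1 to 2.4 concrete, here is a minimal sketch of one local training pass in PyTorch; the layer sizes, the choice of cross-entropy as the loss $\ell$, and the function name are assumptions for illustration, not the patent's specification:

```python
import torch
import torch.nn as nn

def local_gradient(model, batch_x, batch_y):
    """One mini-batch pass on edge server m: forward X_m^beta through the
    DNN (step 2.2), compute the predictive loss p_m(omega_tau) (step 2.3),
    and return the local gradient parameter g_m^tau (step 2.4)."""
    loss_fn = nn.CrossEntropyLoss()  # assumed concrete form of the loss
    model.zero_grad()
    pred = model(batch_x)            # output matrix of the L-layer DNN
    loss = loss_fn(pred, batch_y)
    loss.backward()
    return [p.grad.detach().clone() for p in model.parameters()]

# An L-layer DNN with ReLU activations alpha_m, as described above; the
# input width 3 matches the three log fields, the other sizes are assumed.
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 100),  # assumed number of candidate videos
)
```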
Further, the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
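A corresponding sketch of the central server's parameter aggregation module, assuming the plain averaging given by the formula above:

```python
import torch

def aggregate(local_grads):
    """Average the local gradients g_m^tau of all M edge servers into the
    global gradient parameter G_tau (one tensor per model parameter)."""
    M = len(local_grads)
    return [torch.stack([g[i] for g in local_grads]).sum(dim=0) / M
            for i in range(len(local_grads[0]))]
```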
further, the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
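Steps 4.1 and 4.2 amount to an Adam-style update of the global model driven by the aggregated gradient. The sketch below assumes common default values for the decay rates $\rho_\eta$, $\rho_\delta$ and the constant $\epsilon$, which the patent does not fix:

```python
import torch

def global_update(omega, eta, delta, G, tau, lam=1e-3,
                  rho_eta=0.9, rho_delta=0.999, eps=1e-8):
    """One iteration of steps 4.1-4.2 (requires tau >= 1). omega, eta,
    delta and G are lists of tensors, one entry per model parameter."""
    # bias-corrected learning step lambda_tau (step 4.1)
    step = lam * (1 - rho_delta ** tau) ** 0.5 / (1 - rho_eta ** tau)
    new_omega, new_eta, new_delta = [], [], []
    for w, e, d, g in zip(omega, eta, delta, G):
        e = rho_eta * e + (1 - rho_eta) * g          # eta_{tau+1}
        d = rho_delta * d + (1 - rho_delta) * g * g  # delta_{tau+1}
        w = w - step * e / (d.sqrt() + eps)          # omega_{tau+1}, step 4.2
        new_omega.append(w)
        new_eta.append(e)
        new_delta.append(d)
    return new_omega, new_eta, new_delta  # broadcast to servers in step 4.3
```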
Further, the specific steps of the step six are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
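A minimal sketch of the collaborative cache fill in steps 6.1 to 6.3; representing each server's cache as a set of video IDs and measuring capacity in number of videos are simplifying assumptions:

```python
from collections import Counter

def cooperative_cache(prediction_lists, caches, capacity):
    """prediction_lists: predicted video IDs per user (step 6.1);
    caches: one set of cached video IDs per distributed edge server."""
    # step 6.2: count how often each video appears across prediction lists
    counts = Counter(v for user_list in prediction_lists for v in user_list)
    # step 6.3: fill caches by popularity, never duplicating a video that
    # another server already holds, until every server is full
    for video, _ in counts.most_common():
        if any(video in cache for cache in caches):
            continue  # an adjacent server already caches this video
        for cache in caches:
            if len(cache) < capacity:
                cache.add(video)
                break
    return caches
```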
As shown in FIG. 2, the central server comprises a parameter aggregation module, a parameter training module and a cache state global view module. The parameter aggregation module aggregates the local gradient parameters $g_m^\tau$ sent by each distributed edge server to obtain the global gradient parameter $G_\tau$; based on $G_\tau$, the parameter training module lets the central server compute the global model parameter $\omega_\tau$ of the next iteration and send it to each distributed edge server. Each distributed edge server adjusts its subsequent cache strategy according to the global model parameter $\omega_\tau$; the cache state global view module provides an overview of the cache status of all edge servers.
As shown in FIG. 3, the edge server comprises a model training module, a collaboration sharing module and a reward module. Each distributed edge server trains on its data set through the model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server. The collaboration sharing module receives the cache information of the other edge servers and, according to the local data requests, combines it with the global model parameter $\omega_\tau$ issued by the central server to help the model training module make better caching decisions. The reward module computes the reward value of a caching operation; since data cached on the local server and on adjacent servers both contribute to the system hit rate, the reward is set as a weighted sum of the two to promote cooperation between the edge servers.

Claims (6)

1. The method for collaborative caching of the edges of the Internet of things based on deep reinforcement learning is characterized by comprising the following steps of:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
2. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area and builds a dynamic log file data set $X_m$ from it; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
3. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
4. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 3, wherein the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
5. the method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 4, wherein the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
6. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific steps in the sixth step are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
CN202310296228.9A 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning Pending CN116346837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310296228.9A CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310296228.9A CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116346837A (en) 2023-06-27

Family

ID=86892560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310296228.9A Pending CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116346837A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116915781A (en) * 2023-09-14 2023-10-20 南京邮电大学 Edge collaborative caching system and method based on blockchain
CN116915781B (en) * 2023-09-14 2023-12-12 南京邮电大学 Edge collaborative caching system and method based on blockchain
CN117010485A (en) * 2023-10-08 2023-11-07 之江实验室 Distributed model training system and gradient protocol method in edge scene
CN117010485B (en) * 2023-10-08 2024-01-26 之江实验室 Distributed model training system and gradient protocol method in edge scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination