CN116346837A - Internet of things edge collaborative caching method based on deep reinforcement learning - Google Patents

Internet of things edge collaborative caching method based on deep reinforcement learning

Info

Publication number
CN116346837A
Authority
CN
China
Prior art keywords
edge server
parameter
global
server
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310296228.9A
Other languages
Chinese (zh)
Inventor
郭永安
周沂
王宇翱
钱琪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310296228.9A priority Critical patent/CN116346837A/en
Publication of CN116346837A publication Critical patent/CN116346837A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps: (1) an edge server collects video cache information of terminal-layer user equipment and constructs a data set; (2) each distributed edge server trains on the data set through a model training module; (3) a central server receives the local gradient parameters of each distributed edge server and aggregates them into global gradient parameters; (4) the central server feeds the aggregated global gradient parameters into a parameter training module, trains the neural network, and outputs updated global model parameters; (5) steps (1)-(4) are repeated to obtain a prediction model of online video requests, and video content requests are predicted to obtain a prediction list of each user's online video requests; (6) according to the obtained prediction lists, the distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.

Description

Internet of things edge collaborative caching method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of edge caching, and particularly relates to an Internet of things edge collaborative caching method based on deep reinforcement learning.
Background
With the rapid growth of network users, video quality has been upgraded from traditional 1080P high-definition content to 4K and 8K levels, and complex human-computer interaction modes such as VR and AR have emerged, placing severe pressure on the backbone network. Edge caching can effectively reduce latency and backhaul load by placing the most popular content on servers closer to the requesting users. However, edge caching is challenging because the storage space of edge servers is limited and content popularity varies over time and space.
Reinforcement learning can adapt to environmental changes without any prior knowledge of the environment's dynamics, but conventional reinforcement learning algorithms are limited to dynamic environments with fully observable, low-dimensional state spaces. The state space of an actual edge caching environment is typically high-dimensional, and manually extracting all useful features from it is very difficult. Deep reinforcement learning can automatically determine an optimal strategy from the raw high-dimensional environment state, effectively overcoming the curse of dimensionality and providing an effective solution for real edge caching environments.
Most current edge caching systems based on deep reinforcement learning use centralized content caching, but a centralized scheme requires a central controller to collect the local parameters of all servers and then generate content caching decisions for them, and its computational complexity grows exponentially with the number of servers. Several studies have therefore confirmed the effectiveness of distributed content caching schemes.
Disclosure of Invention
The invention aims to: the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which applies deep reinforcement learning to content popularity perception and caching decisions for edge caching. By fully exploiting the adaptive capacity of deep reinforcement learning, it senses network state, user requests and content popularity and responds in a timely manner. The caching strategy is continuously optimized through learning, which improves the cache hit rate of the edge cache, makes efficient use of the storage and computing resources of the edge servers, and reduces latency and backhaul load.
The technical scheme is as follows: in order to solve the technical problems, the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
Further, the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area, and each distributed edge server builds a dynamic log file data set $X_m$ from this video cache information; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
Further, the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
Further, the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
further, the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
Further, the specific steps of the step six are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
The beneficial effects are that: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1. According to the invention, training models are deployed at the edge servers and the central server respectively; based on a deep reinforcement learning algorithm, the cache data undergo centralized model training and distributed cache operation, and this edge-cloud cooperation reduces the traffic on the backhaul link and optimizes the cache strategy in real time.
2. The invention realizes collaborative edge caching, allowing the distributed edge servers to cooperate with one another, which further improves the cache hit rate of the system. Because only parameters are transmitted, data are shared while user privacy is protected and communication overhead is reduced.
3. The invention realizes self-learning cache strategy adjustment: the adaptive capacity of deep reinforcement learning enables real-time analysis of data requests and design of corresponding cache strategies, and the learning ability of the deep reinforcement learning model improves as cached data accumulate, further increasing the cache hit rate.
Drawings
FIG. 1 is a general framework diagram of an edge collaborative caching system and an operation method of the Internet of things based on deep reinforcement learning;
FIG. 2 is a diagram of a central server architecture and workflow in accordance with the present invention;
FIG. 3 is a diagram of an edge server architecture and workflow in accordance with the present invention.
Detailed description of the embodiments:
embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
FIG. 1 is a general framework diagram of the deep reinforcement learning-based Internet of things edge collaborative caching system and its operation, showing the hierarchical relationship among the central server, the edge servers and the terminals in the invention. On this basis, the invention provides an Internet of things edge collaborative caching method based on deep reinforcement learning, which comprises the following steps:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
Further, the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area, and each distributed edge server builds a dynamic log file data set $X_m$ from this video cache information; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
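As a minimal illustration of step one, the following Python sketch shows how an edge server might accumulate the dynamic log file data set; the class name, field names and in-memory list storage are illustrative assumptions, not part of the patent:

```python
import time

class EdgeLogCollector:
    """Builds the dynamic log file data set X_m on edge server m."""

    def __init__(self, server_id):
        self.server_id = server_id  # identifies edge server m
        self.dataset = []           # dynamic log file data set X_m

    def record_request(self, user_id, content_id):
        """Append one cache-information record i from user device d."""
        self.dataset.append({
            "video_user_id": user_id,        # label 1: video user ID
            "timestamp": time.time(),        # label 2: request timestamp
            "video_content_id": content_id,  # label 3: video content ID
        })
```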
Further, the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
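To make steps 2.1 to 2.4 concrete, here is a minimal sketch of one local training pass in PyTorch; the layer sizes, the choice of cross-entropy as the loss $\ell$, and the function name are assumptions for illustration, not the patent's specification:

```python
import torch
import torch.nn as nn

def local_gradient(model, batch_x, batch_y):
    """One mini-batch pass on edge server m: forward X_m^beta through the
    DNN (step 2.2), compute the predictive loss p_m(omega_tau) (step 2.3),
    and return the local gradient parameter g_m^tau (step 2.4)."""
    loss_fn = nn.CrossEntropyLoss()  # assumed concrete form of the loss
    model.zero_grad()
    pred = model(batch_x)            # output matrix of the L-layer DNN
    loss = loss_fn(pred, batch_y)
    loss.backward()
    return [p.grad.detach().clone() for p in model.parameters()]

# An L-layer DNN with ReLU activations alpha_m, as described above; the
# input width 3 matches the three log fields, the other sizes are assumed.
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 100),  # assumed number of candidate videos
)
```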
Further, the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
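A corresponding sketch of the central server's parameter aggregation module, assuming the plain averaging given by the formula above:

```python
import torch

def aggregate(local_grads):
    """Average the local gradients g_m^tau of all M edge servers into the
    global gradient parameter G_tau (one tensor per model parameter)."""
    M = len(local_grads)
    return [torch.stack([g[i] for g in local_grads]).sum(dim=0) / M
            for i in range(len(local_grads[0]))]
```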
further, the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
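Steps 4.1 and 4.2 amount to an Adam-style update of the global model driven by the aggregated gradient. The sketch below assumes common default values for the decay rates $\rho_\eta$, $\rho_\delta$ and the constant $\epsilon$, which the patent does not fix:

```python
import torch

def global_update(omega, eta, delta, G, tau, lam=1e-3,
                  rho_eta=0.9, rho_delta=0.999, eps=1e-8):
    """One iteration of steps 4.1-4.2 (requires tau >= 1). omega, eta,
    delta and G are lists of tensors, one entry per model parameter."""
    # bias-corrected learning step lambda_tau (step 4.1)
    step = lam * (1 - rho_delta ** tau) ** 0.5 / (1 - rho_eta ** tau)
    new_omega, new_eta, new_delta = [], [], []
    for w, e, d, g in zip(omega, eta, delta, G):
        e = rho_eta * e + (1 - rho_eta) * g          # eta_{tau+1}
        d = rho_delta * d + (1 - rho_delta) * g * g  # delta_{tau+1}
        w = w - step * e / (d.sqrt() + eps)          # omega_{tau+1}, step 4.2
        new_omega.append(w)
        new_eta.append(e)
        new_delta.append(d)
    return new_omega, new_eta, new_delta  # broadcast to servers in step 4.3
```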
Further, the specific steps of the step six are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
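A minimal sketch of the collaborative cache fill in steps 6.1 to 6.3; representing each server's cache as a set of video IDs and measuring capacity in number of videos are simplifying assumptions:

```python
from collections import Counter

def cooperative_cache(prediction_lists, caches, capacity):
    """prediction_lists: predicted video IDs per user (step 6.1);
    caches: one set of cached video IDs per distributed edge server."""
    # step 6.2: count how often each video appears across prediction lists
    counts = Counter(v for user_list in prediction_lists for v in user_list)
    # step 6.3: fill caches by popularity, never duplicating a video that
    # another server already holds, until every server is full
    for video, _ in counts.most_common():
        if any(video in cache for cache in caches):
            continue  # an adjacent server already caches this video
        for cache in caches:
            if len(cache) < capacity:
                cache.add(video)
                break
    return caches
```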
As shown in FIG. 2, the central server comprises a parameter aggregation module, a parameter training module and a cache state global view module. The parameter aggregation module aggregates the local gradient parameters $g_m^\tau$ sent by each distributed edge server to obtain the global gradient parameter $G_\tau$; based on $G_\tau$, the parameter training module lets the central server compute the global model parameter $\omega_\tau$ of the next iteration and send it to each distributed edge server. Each distributed edge server adjusts its subsequent cache strategy according to the global model parameter $\omega_\tau$; the cache state global view module provides an overview of the cache status of all edge servers.
As shown in FIG. 3, the edge server comprises a model training module, a collaboration sharing module and a reward module. Each distributed edge server trains on its data set through the model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server. The collaboration sharing module receives the cache information of the other edge servers and, according to the local data requests, combines it with the global model parameter $\omega_\tau$ issued by the central server to help the model training module make better caching decisions. The reward module computes the reward value of a caching operation; since data cached on the local server and on adjacent servers both contribute to the system hit rate, the reward is set as a weighted sum of the two to promote cooperation between the edge servers.

Claims (6)

1. The method for collaborative caching of the edges of the Internet of things based on deep reinforcement learning is characterized by comprising the following steps of:
step one: the edge server collects video cache information of terminal layer user equipment in the area to construct a dynamic log file data set, wherein data set elements comprise video user ID, time stamp and video content ID;
step two: each distributed edge server trains on its data set through a model training module; the input of the model training module's neural network is the dynamic log file data set labeled with video user ID, time stamp and video content ID, and after training the model training module outputs a local gradient parameter $g_m^\tau$ containing the cache information of the edge server, which is forwarded synchronously to the central server and the adjacent edge servers; the goal of the neural network in the distributed edge server is to obtain the local gradient parameter $g_m^\tau$, which reflects the cache state of the edge server;
step three: after receiving the local gradient parameters $g_m^\tau$ sent by each distributed edge server, the central server aggregates them through a parameter aggregation module to obtain the global gradient parameter $G_\tau$; the distributed edge servers share their local gradient parameters $g_m^\tau$ with one another through a collaboration sharing module, realizing cache information interaction among the edge servers;
step four: the central server feeds the aggregated global gradient parameter $G_\tau$ into a parameter training module and, after training the neural network, outputs the updated global model parameter $\omega_\tau$; the goal of the central server's neural network is to obtain the global model parameter $\omega_\tau$, which further optimizes the neural networks in the edge servers; the central server sends the global model parameter $\omega_\tau$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$;
step five: repeating steps one to four until the prediction model converges to obtain a prediction model of video users' online video requests, and predicting video content requests to obtain a prediction list of each user's online video requests; the trained prediction model continues to update itself automatically;
step six: according to the prediction lists of users' online video requests obtained from the prediction model, the plurality of distributed edge servers cache content collaboratively until each distributed edge server reaches its storage limit.
2. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific method of step one is as follows: each distributed edge server m collects the video cache information i of the terminal layer user equipment d within its coverage area and builds a dynamic log file data set $X_m$ from it; the data in the data set are classified by label into three types: video user ID, timestamp, and video content ID.
3. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific method of the second step is as follows:
step 2.1, dividing the data set; the label-classified dynamic log file data set $X_m$ is divided into mini-batches $X_m^\beta$ of size $\beta$, where $\beta$ denotes the batch size into which the training data set is divided and $M$ denotes the number of edge servers;
step 2.2, generating the output matrix; for the DNN neural network, the distributed edge server generates the output matrix layer by layer:

$$X_m^{l+1} = \alpha_m\left(W_l X_m^{l} + v_l\right)$$

where $X_m^l$ is the input matrix of the $l$-th layer of the neural network in the distributed edge server and $\alpha_m$ is the rectified linear unit activation function in the edge server, used to make the mapping of each neural network layer nonlinear; global model parameters $\omega = (W, v)$ covering all DNN layers are defined, with $W = [W_1, \ldots, W_l, \ldots, W_L]$ and $v = [v_1, \ldots, v_l, \ldots, v_L]$, where $W_l$ is a global weight matrix, $v_l$ is a global bias vector, and $L$ denotes the number of layers of the neural network;
step 2.3, calculating the predictive loss function; the output layer generates the prediction matrix $\hat{X}_m^\beta$, and the predictive loss $p_m(\omega_\tau)$ of edge server $m$ at mini-batch iteration $\tau$ is

$$p_m(\omega_\tau) = \frac{1}{\beta} \sum_{x \in X_m^\beta} \ell\left(\hat{x}, x\right)$$

where $\tau$ denotes one iteration of training on a mini-batch of samples; $t$ denotes the time at which iteration $\tau$ completes; $\omega_\tau$ denotes the global model parameters at iteration $\tau$; $x$ is an element of the distributed edge server input matrix $X_m^\beta$ and $\hat{x}$ is the corresponding element of the distributed edge server output matrix $\hat{X}_m^\beta$;
step 2.4, calculating the local gradient parameter; by computing

$$g_m^\tau = \nabla_\omega \, p_m(\omega_\tau)$$

the local gradient parameter $g_m^\tau$ of the distributed edge server is obtained.
4. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 3, wherein the formula for calculating the global gradient parameter $G_\tau$ in step three is as follows:

$$G_\tau = \frac{1}{M} \sum_{m=1}^{M} g_m^\tau$$
5. the method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 4, wherein the specific steps of the fourth step are as follows:
step 4.1, calculating the learning step $\lambda$; in the neural network deployed by the parameter training module of the central server, $\eta_\tau$ and $\delta_\tau$ are treated as estimates of the mean of the global gradient parameter $G_\tau$ and of its square $G_\tau^2$ respectively, i.e. the mean is estimated in order to predict the variance; at the current sample iteration $\tau$, the update formulas of $\eta_{\tau+1}$ and $\delta_{\tau+1}$ are as follows:

$$\eta_{\tau+1} = \rho_\eta \, \eta_\tau + \left(1 - \rho_\eta\right) G_\tau$$

$$\delta_{\tau+1} = \rho_\delta \, \delta_\tau + \left(1 - \rho_\delta\right) G_\tau^{2}$$

where $\rho_\eta$ and $\rho_\delta$ denote the exponential decay rates of $\eta_\tau$ and $\delta_\tau$ at iteration $\tau$; to update the global model parameter $\omega_\tau$, a learning step $\lambda$ is added to determine how strongly the global model $\omega_\tau$ is updated at each iteration $\tau$; the update formula of the learning step $\lambda$ is as follows:

$$\lambda_\tau = \lambda \, \frac{\sqrt{1 - \rho_\delta^{\,\tau}}}{1 - \rho_\eta^{\,\tau}}$$
step 4.2, calculating the global model parameter $\omega_{\tau+1}$ of the next iteration $\tau+1$:

$$\omega_{\tau+1} = \omega_\tau - \lambda_\tau \, \frac{\eta_{\tau+1}}{\sqrt{\delta_{\tau+1}} + \epsilon}$$

where $\omega_{\tau+1}$ is the parameter with which the edge servers learn the data set of the next iteration $\tau+1$, and $\epsilon$ denotes a constant;
step 4.3, sending the predicted global model parameter $\omega_{\tau+1}$ to each distributed edge server for a new round of updates of the local gradient parameters $g_m^\tau$, the global gradient parameter $G_\tau$ and the global model parameter $\omega_\tau$.
6. The method for collaborative caching of the edge of the internet of things based on deep reinforcement learning according to claim 1, wherein the specific steps in the sixth step are as follows:
step 6.1, each distributed edge server exchanges the prediction lists of the online video requests of the terminal layer users within its coverage area;
step 6.2, the number of requests for each video is counted from the frequency with which the different online videos appear in the prediction lists of different users;
step 6.3, the distributed edge servers cache collaboratively according to the video request counts; if the current server has already cached a video, the adjacent servers do not cache it again, until each edge server reaches its storage limit.
CN202310296228.9A 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning Pending CN116346837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310296228.9A CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310296228.9A CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116346837A (en) 2023-06-27

Family

ID=86892560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310296228.9A Pending CN116346837A (en) 2023-03-24 2023-03-24 Internet of things edge collaborative caching method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116346837A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116915781A (en) * 2023-09-14 2023-10-20 南京邮电大学 Edge collaborative caching system and method based on blockchain
CN116915781B (en) * 2023-09-14 2023-12-12 南京邮电大学 Edge collaborative caching system and method based on blockchain
CN117010485A (en) * 2023-10-08 2023-11-07 之江实验室 Distributed model training system and gradient protocol method in edge scene
CN117010485B (en) * 2023-10-08 2024-01-26 之江实验室 Distributed model training system and gradient protocol method in edge scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination