CN113923128B - Intelligent coded caching method based on federated reinforcement learning in a fog radio access network - Google Patents
Classifications
- H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
- H04L41/147: Network analysis or design for predicting network behaviour
- H04L67/10: Protocols in which an application is distributed across nodes in the network
- G06N3/04: Neural networks; Architecture, e.g. interconnection topology
- G06N3/08: Neural networks; Learning methods
Abstract
The invention discloses an intelligent coded caching method based on federated reinforcement learning in a fog radio access network, comprising the following steps: 1. the cloud center server builds and initializes a global prediction model, and a distributed learning agent with an initialized local prediction model is deployed at each fog access point; 2. the system observes the global state; 3. each fog access point observes its local state; 4. according to the implemented content placement decision, the system serves the requests received at each network edge node via multicast coded transmission; 5. each fog access point computes the theoretical delay and virtual feedback from its local request statistics and stores its local learning experience; 6. each fog access point independently trains its local model on this local learning experience; 7. every several time slots, each fog access point uploads its local model to the cloud center server to update the global model. The invention can reduce the fronthaul load of the system, reduce the transmission delay, and protect user privacy.
Description
Technical Field
The invention belongs to the field of intelligent edge-network caching in mobile communication systems, and particularly relates to an intelligent coded caching method based on federated reinforcement learning in a fog radio access network.
Background
With the widespread popularity of smart devices and the rapid development of mobile applications, mobile communication networks are subjected to increasingly heavy traffic loads. To handle this surging data traffic, the fog radio access network has been proposed as a novel network architecture, in which fog access points (F-APs) are deployed near users at the network edge and use their edge storage and computing capabilities to relieve the traffic pressure on the cloud center server and improve the user experience. Meanwhile, coded caching, as a new caching paradigm, effectively combines local caching with multicast transmission and can make fuller use of the limited storage space of edge devices.
However, most current research on coded caching under non-uniform popularity focuses on upper- and lower-bound analysis of the theoretical performance of fixed content placement strategies, so these non-optimal placement strategies cannot fully exploit the potential of coded caching to reduce the fronthaul load. In addition, existing non-uniform-popularity coded caching methods assume the idealized condition of fixed content popularity, ignoring the fact that content popularity changes over time, so that in practice the fronthaul load, delay, and other performance metrics fall short of the ideal. Therefore, a coded caching method for time-varying popularity is needed that reduces the content transmission delay while improving stability, thereby providing higher-quality and more reliable communication services for mobile users.
Disclosure of Invention
The invention aims to provide an intelligent coded caching method based on federated reinforcement learning in a fog radio access network, so as to solve the technical problems of reducing the fronthaul load, reducing the request delay, and protecting user privacy.
To solve the above technical problems, the specific technical solution of the invention is as follows:
A coded caching method based on federated reinforcement learning in a fog radio access network comprises the following steps:
Step 1, construct a global model θ_G at the cloud center server and initialize it; deploy a learning agent at each fog access point and initialize its local model θ_k;
Step 2, at the end of time slot t, the system observes the global state s(t), and each fog access point observes its local state s_k(t);
Step 3, based on the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t); at the same time, each fog access point uses its observed local state s_k(t) to predict and record a virtual content placement decision a_k(t);
Step 4, in time slot t+1, the system serves the requests received by each fog access point via multicast coded transmission, according to the implemented content placement decision a(t);
Step 5, at the end of time slot t+1, the local state becomes s_k(t+1); each fog access point counts the content requests of its served users in time slot t+1, computes the theoretical delay and the virtual feedback r_k(t) according to its virtual placement decision a_k(t), and stores the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T;
Step 6, each fog access point randomly samples its own local learning experience for training, so as to update its local model;
Step 7, every T_A time slots, each fog access point uploads its local model to the cloud center server for aggregation, and downloads the aggregated global model to replace its local model;
Step 8, repeat steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
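The overall procedure of steps 1 to 8 can be sketched as a simulation loop. The following is a minimal illustrative sketch rather than the patent's implementation: the "model" is a per-file score vector instead of a neural network, request frequencies are random, and all identifiers (init_model, place, aggregate) are hypothetical.

```python
import random

K, N, T_A = 4, 8, 5          # fog APs, files, aggregation period (illustrative)

def init_model():
    # stand-in for a neural network: one learnable score per file
    return [0.0] * N

def place(model, n_cache=3):
    # greedy placement: cache the n_cache files with the largest scores
    top = sorted(range(N), key=lambda n: -model[n])[:n_cache]
    return [1 if n in top else 0 for n in range(N)]

def aggregate(models, counts):
    # experience-weighted averaging of local models (Step 7)
    total = sum(counts)
    return [sum(m[n] * c / total for m, c in zip(models, counts))
            for n in range(N)]

local = [init_model() for _ in range(K)]
counts = [0] * K
for t in range(1, 21):
    for k in range(K):
        freqs = [random.random() for _ in range(N)]   # observed request frequencies (Step 2)
        a_k = place(local[k])                         # virtual placement decision (Step 3)
        for n in range(N):                            # toy local update (Step 6)
            local[k][n] += 0.1 * freqs[n]
        counts[k] += 1
    if t % T_A == 0:                                  # periodic aggregation (Step 7)
        g = aggregate(local, counts)
        local = [list(g) for _ in range(K)]
```

After the final aggregation, every fog access point holds an identical copy of the global model, which is exactly the synchronisation that step 7 describes.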
Further, step 1 specifically comprises the following steps:
Step 1.1, the cloud center server constructs a neural network Q(s, a; θ_G), where s is the current global state vector, a is the global content placement decision vector, and θ_G is the global network model parameter, randomly initialized;
Step 1.2, for each k in the fog access point index set, with K the number of fog access points, fog access point k constructs a neural network Q(s_k, a_k; θ_k), where s_k is the current local state vector, a_k is the local content placement decision vector, and θ_k is the local network model parameter, randomly initialized.
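Steps 1.1 and 1.2 each construct a randomly initialized Q-network. A minimal pure-Python sketch of such a network follows; the layer sizes, the ReLU activation, and all identifiers are illustrative assumptions, not the patent's architecture.

```python
import random

N = 8                      # number of files (illustrative)
STATE_DIM = 2 * N          # [previous decision, request frequencies]
HIDDEN = 16

def init_qnet(seed=None):
    # two-layer network scoring each file, randomly initialized (Steps 1.1 / 1.2)
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM)] for _ in range(HIDDEN)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(N)]
    return w1, w2

def qvalues(net, state):
    # forward pass: ReLU hidden layer, then a linear output per file
    w1, w2 = net
    h = [max(0.0, sum(w * s for w, s in zip(row, state))) for row in w1]
    return [sum(w * x for w, x in zip(row, h)) for row in w2]

theta_G = init_qnet(seed=0)                        # global model at the cloud center
theta = [init_qnet(seed=k) for k in range(1, 5)]   # one local model per fog access point
```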
Further, step 2 specifically comprises the following steps:
Step 2.1, at the end of time slot t, the system observes the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1, and f(t) = [f_1(t), ..., f_N(t)] is the statistical request frequency vector of all N files in the system during time slot t;
Step 2.2, at the same time, for each k in the fog access point index set, fog access point k observes its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1, and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
Further, step 3 specifically comprises the following steps:
Step 3.1, at the end of time slot t, based on the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t) = [c_1(t), ..., c_N(t)], where N_c(t) denotes the number of files in the coded cache, c_n(t) = 1 means that file n is selected, and c_n(t) = 0 means that file n is not selected;
Step 3.2, for each k in the fog access point index set, based on the observed local state s_k(t), fog access point k predicts and records a virtual coded-cache content placement decision a_k(t), in which the number of virtually cached files and the per-file selection indicators are defined analogously to step 3.1.
Further, step 3.1 specifically comprises the following steps:
Step 3.1.1, at the end of time slot t, the system predicts the actual coded-cache content placement decision a(t) using the global model according to the observed global state s(t);
Step 3.1.2, implement the content placement strategy: let 𝒯 denote a subset of fog access points with |𝒯| = L_t elements, where |·| denotes the number of elements of a set, L_t is a variable with L_t = KM / N_c(t), and M is the cache size of a fog access point; split each selected file n into C(K, L_t) sub-files of the same size, W_{n,𝒯}, one for each subset 𝒯 of size L_t, where n belongs to the selected file index set and C(A, B) denotes the number of combinations of B elements taken from A different elements; for each fog access point k, the placement content is the set of sub-files Z_k = { W_{n,𝒯} : k ∈ 𝒯, n in the selected file index set }.
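The splitting-and-placement rule of step 3.1.2 follows the standard coded-caching pattern: each selected file is split into C(K, L_t) sub-files indexed by the size-L_t subsets of fog access points, and fog access point k stores exactly the sub-files whose index set contains k. A sketch under these assumptions (the exact placement formula is not fully legible in this text, so the final set-builder form is a reconstruction):

```python
from itertools import combinations

K, M, N_c = 4, 2, 4              # fog APs, cache size, cached files (illustrative)
L_t = K * M // N_c               # subset size, from L_t = KM / N_c(t); here L_t = 2

# each selected file splits into C(K, L_t) equal sub-files W_{n, T}, one per subset T
subsets = list(combinations(range(K), L_t))

def placement(k):
    # fog AP k stores sub-file W_{n, T} of every selected file n for all T containing k
    return [(n, T) for n in range(N_c) for T in subsets if k in T]

Z_0 = placement(0)
```

Each access point stores the fraction L_t / K of every selected file, so the total stored amount is N_c * L_t / K = M files' worth, exactly filling the cache.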
further, the step 3.2 specifically includes the following steps:
step 3.2.1 forFog access point k based on observed local state s k (t) randomly selecting virtual content placement decisions with a probability of ε according to a greedy action selection policy, predicting virtual content placement decisions with a probability of 1- ε using its local model:
step 3.2.2 record content placement strategy a k (t) but not implemented;
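The ε-greedy rule of step 3.2.1 can be sketched as follows; the score-based greedy placement and all identifiers are illustrative assumptions.

```python
import random

N = 6            # number of files (illustrative)

def random_placement(n_cache, rng):
    # exploration branch: cache n_cache files chosen uniformly at random
    chosen = rng.sample(range(N), n_cache)
    return [1 if n in chosen else 0 for n in range(N)]

def greedy_placement(scores, n_cache):
    # exploitation branch: cache the n_cache files the local model scores highest
    top = sorted(range(N), key=lambda n: -scores[n])[:n_cache]
    return [1 if n in top else 0 for n in range(N)]

def epsilon_greedy(scores, n_cache=3, eps=0.1, rng=random):
    # explore with probability eps, otherwise exploit (Step 3.2.1)
    if rng.random() < eps:
        return random_placement(n_cache, rng)
    return greedy_placement(scores, n_cache)
```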
further, the step 4 specifically includes the following steps:
step 4.1, each fog access point receives a user request;
step 4.2, for cached requests, usingRepresenting its index set; use->Representing a fog access point set in which the request file is cached; let->Representing a set of fog access point subsets, wherein +.>For the fog access point subset, the cloud center server is +.>The contents of the multicast transmission are:
wherein,representing a bit exclusive or operation;
step 4.3, for uncached requests, the cloud center server goes toUnicast transmission requests content.
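The multicast coded delivery of step 4.2 XORs, for each subset 𝒮 of L_t + 1 fog access points, the sub-files W_{d_k, 𝒮\{k}}; each access point in 𝒮 cancels the sub-files it already caches and recovers its own missing sub-file. A toy sketch with K = 3 and L_t = 1 (the 4-byte payloads and all identifiers are illustrative):

```python
from itertools import combinations
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

K, L_t = 3, 1
demand = {0: 0, 1: 1, 2: 2}       # fog AP k requests file d_k
# W[(n, T)]: sub-file of file n indexed by subset T (cached at the APs in T)
W = {(n, (k,)): bytes([n * 16 + k] * 4) for n in range(3) for k in range(K)}

# server multicasts, to each subset S of L_t + 1 APs, XOR_{k in S} W_{d_k, S\{k}}
messages = {}
for S in combinations(range(K), L_t + 1):
    parts = [W[(demand[k], tuple(j for j in S if j != k))] for k in S]
    messages[S] = reduce(xor, parts)

# AP 0 decodes its missing sub-file W_{d_0, (1,)} from the message for S = (0, 1),
# cancelling W_{d_1, (0,)} which it holds in its own cache
decoded = xor(messages[(0, 1)], W[(demand[1], (0,))])
```

Because XOR is its own inverse, each multicast message serves every access point in the subset simultaneously, which is where the fronthaul saving of coded caching comes from.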
Further, step 5 specifically comprises:
Step 5.1, for each k in the fog access point index set, fog access point k divides the V requests it has received equally into K parts;
Step 5.2, each part is assumed to be received by one fog access point in the set {k, k_1, k_2, ..., k_{K-1}}, where k_1, k_2, ..., k_{K-1} are virtual fog access points that do not physically exist;
Step 5.3, for the i-th request of all K fog access points in this set, an indicator variable records whether the i-th request of virtual fog access point k' is file n or not; from these indicators the fronthaul load of the virtual coded-caching system is computed, with terms of the form min(A, B) denoting the smaller of the two values A and B;
Step 5.4, in time slot t, the theoretical delay of fog access point k is computed from its fronthaul load, where d_f is the delay for the cloud center server to completely transmit one file to a fog access point, and d_a is the delay for a fog access point to completely transmit one file to a user;
Step 5.5, the virtual feedback r_k(t) is computed from the theoretical delay with weights μ_1 and μ_2 satisfying μ_1 + μ_2 = 1 and 0 < μ_1 < μ_2 < 1;
Step 5.6, the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T is stored in the local experience replay pool.
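The local experience storage of step 5.6 is a standard replay pool. A minimal sketch, assuming a bounded pool that evicts the oldest experiences when full (the capacity and all identifiers are illustrative):

```python
import random
from collections import deque

class ReplayPool:
    # local pool holding [s_k(t), a_k(t), r_k(t), s_k(t+1)] experience tuples
    def __init__(self, capacity=1000):
        self.pool = deque(maxlen=capacity)   # oldest entries drop out at capacity

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform random sampling, as in step 6.1
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))

pool = ReplayPool(capacity=4)
for t in range(6):           # experiences for t = 0..1 are evicted
    pool.store([t], [1, 0], -float(t), [t + 1])
```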
Further, step 6 specifically comprises the following steps:
Step 6.1, for each k in the fog access point index set, fog access point k randomly draws experiences [s_k(j), a_k(j), r_k(j), s_k(j+1)]^T from its local experience replay pool;
Step 6.2, train the local model θ_k using gradient descent and update it.
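Step 6.2 performs a gradient-descent update from the sampled experiences. The sketch below uses a toy linear Q-function and a one-step temporal-difference target; the patent's actual network and loss are not spelled out in this text, so every detail here is an illustrative assumption.

```python
GAMMA, ALPHA = 0.9, 0.05       # discount factor and learning rate (illustrative)

def q_value(theta, state, action):
    # toy linear Q-function over concatenated state/action features
    feats = state + action
    return sum(w * f for w, f in zip(theta, feats))

def local_update(theta, batch, candidate_actions):
    # one gradient-descent step per sampled experience (Step 6.2)
    for s, a, r, s_next in batch:
        target = r + GAMMA * max(q_value(theta, s_next, a2)
                                 for a2 in candidate_actions)
        err = q_value(theta, s, a) - target
        feats = s + a
        for i, f in enumerate(feats):
            theta[i] -= ALPHA * err * f    # gradient of (err^2)/2 w.r.t. theta_i
    return theta

theta_k = [0.0] * 4
batch = [([1.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])]
actions = [[1.0, 0.0], [0.0, 1.0]]
theta_k = local_update(theta_k, batch, actions)
```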
The intelligent coded caching method based on federated reinforcement learning in a fog radio access network has the following advantages:
1. For the more challenging scenario of time-varying content popularity in a fog radio access network, the invention uses federated reinforcement learning to track the time-varying popularity and adaptively make content placement decisions, which effectively reduces the fronthaul load, lowers the transmission delay, and keeps performance stable, making the method suitable for more realistic scenarios.
2. The invention uses a virtual coded caching method: by assuming virtual fog access points, the theoretical delay and virtual feedback of a virtual content placement strategy can be computed, which solves the problem that a single fog access point cannot execute coded caching and therefore cannot collect local training data, enabling distributed local training.
3. The invention uses federated learning to make full use of the edge computing capability of the fog access nodes: the global prediction model for the coded-cache content placement strategy is obtained through distributed training and model aggregation, avoiding repeated uploading of training data to the cloud center server, thereby reducing the bandwidth waste on the fronthaul link and the risk of user privacy leakage.
Drawings
Fig. 1 is a schematic flow chart of the coded caching method based on federated reinforcement learning in a fog radio access network according to the present invention;
Fig. 2 shows simulation results of the average delay performance of the coded caching method based on federated reinforcement learning in a fog radio access network according to the present invention.
Detailed Description
To better understand the purpose, structure, and function of the present invention, the intelligent coded caching method based on federated reinforcement learning in a fog radio access network is described in detail below with reference to the accompanying drawings.
This embodiment provides an intelligent coded caching method based on federated reinforcement learning in a fog radio access network, as shown in Fig. 1, comprising the following steps:
step 1, constructing a global model theta at a cloud center server G And initializing, arranging a learning agent at each fog access point and initializing its local model θ k 。
The step 1 specifically comprises the following steps:
step 1.1, constructing a neural network Q (s, a; theta) by a cloud center server G ) Where s is the current global state vector, a is the global content placement decision vector, θ G Is a global network model parameter and is randomly initialized;
step 1.2 forFor the fog access point index set, K is the number of fog access points, and the fog access point K constructs a neural network Q (s k ,a k ;θ k ) Wherein s is k A is the current local state vector, a k Placing a decision vector, θ, for local content k Is a local network model parameter and is randomly initialized.
Step 2, at the end of time slot t, the system observes the global state s(t), and each fog access point observes its local state s_k(t).
Step 2 specifically comprises the following steps:
Step 2.1, at the end of time slot t, the system observes the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1, and f(t) = [f_1(t), ..., f_N(t)] is the statistical request frequency vector of all N files in the system during time slot t;
Step 2.2, at the same time, for each k in the fog access point index set, fog access point k observes its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1, and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
Step 3, based on the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t); at the same time, each fog access point predicts and records a virtual content placement decision a_k(t) based on its observed local state s_k(t).
Step 3 specifically comprises the following steps:
Step 3.1, at the end of time slot t, based on the observed global state s(t), the system uses the global model to predict the actual coded-cache content placement decision a(t) = [c_1(t), ..., c_N(t)], where N_c(t) denotes the number of files in the coded cache, c_n(t) = 1 means that file n is selected, and c_n(t) = 0 means that file n is not selected. The content placement strategy is then implemented: let 𝒯 denote a subset of fog access points with |𝒯| = L_t elements, where |·| denotes the number of elements of a set, L_t is a variable with L_t = KM / N_c(t), and M is the cache size of a fog access point; split each selected file n into C(K, L_t) sub-files of the same size, W_{n,𝒯}, one for each subset 𝒯 of size L_t, where C(A, B) denotes the number of combinations of B elements taken from A different elements; for each fog access point k, the placement content is the set of sub-files Z_k = { W_{n,𝒯} : k ∈ 𝒯, n in the selected file index set }.
Step 3.2, at the same time, for each k in the fog access point index set, based on the observed local state s_k(t), fog access point k follows an ε-greedy action selection policy: with probability ε it randomly selects a virtual content placement decision, and with probability 1-ε it predicts the virtual content placement decision using its local model. The virtual content placement decision a_k(t) is recorded but not implemented.
Step 4, in time slot t+1, the system serves the requests received by each fog access point via multicast coded transmission, according to the content placement decision a(t).
Step 4 specifically comprises the following steps:
Step 4.1, each fog access point receives user requests;
Step 4.2, for requests whose files are cached, denote their index set, and denote the set of fog access points at which the requested files are cached; for each subset 𝒮 of L_t + 1 such fog access points, the cloud center server multicasts to 𝒮 the coded content ⊕_{k∈𝒮} W_{d_k, 𝒮\{k}}, where ⊕ denotes the bitwise XOR operation and d_k is the file requested at fog access point k;
Step 4.3, for uncached requests, the cloud center server transmits the requested content by unicast to the corresponding fog access points.
Step 5, at the end of time slot t+1, the local state becomes s_k(t+1); each fog access point counts the content requests of its served users in time slot t+1, computes the theoretical delay and the virtual feedback r_k(t) according to its virtual placement decision a_k(t), and stores the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T.
Step 5 specifically comprises the following steps:
Step 5.1, for each k in the fog access point index set, fog access point k divides the V requests it has received equally into K parts;
Step 5.2, each part is assumed to be received by one fog access point in the set {k, k_1, k_2, ..., k_{K-1}}, where k_1, k_2, ..., k_{K-1} are virtual fog access points that do not physically exist;
Step 5.3, for the i-th request of all K fog access points in this set, an indicator variable records whether the i-th request of virtual fog access point k' is file n or not; from these indicators the fronthaul load of the virtual coded-caching system is computed, with terms of the form min(A, B) denoting the smaller of the two values A and B;
Step 5.4, in time slot t, the theoretical delay of fog access point k is computed from its fronthaul load, where d_f is the delay for the cloud center server to completely transmit one file to a fog access point, and d_a is the delay for a fog access point to completely transmit one file to a user;
Step 5.5, the virtual feedback r_k(t) is computed from the theoretical delay with weights μ_1 and μ_2 satisfying μ_1 + μ_2 = 1 and 0 < μ_1 < μ_2 < 1;
Step 5.6, the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T is stored in the local experience replay pool.
Step 6, each fog access point randomly samples its own local learning experience for training, so as to update its local model.
Step 6 specifically comprises the following steps:
Step 6.1, for each k in the fog access point index set, fog access point k randomly draws experiences [s_k(j), a_k(j), r_k(j), s_k(j+1)]^T from its local experience replay pool;
Step 6.2, train the local model θ_k using gradient descent and update it.
Step 7, every T_A time slots, each fog access point uploads its local model to the cloud center server for aggregation, and downloads the aggregated global model to replace its local model.
Step 7 specifically comprises the following steps:
Step 7.1, for each k in the fog access point index set, fog access point k uploads its local model θ_k to the cloud center server;
Step 7.2, the cloud center server aggregates the local models, weighting each by D_k, the total number of learning experiences drawn by fog access point k during training over the T_A time slots;
Step 7.3, for each k in the fog access point index set, fog access point k downloads the updated global model θ_G and sets θ_k = θ_G.
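The aggregation formula of step 7.2 is not legible in this text; an experience-count-weighted average (FedAvg-style), consistent with the stated role of D_k, would be θ_G = Σ_k (D_k / Σ_{k'} D_{k'}) θ_k. A minimal sketch under that assumption:

```python
def federated_aggregate(local_models, experience_counts):
    # theta_G = sum_k (D_k / sum_j D_j) * theta_k : experience-weighted averaging
    total = sum(experience_counts)
    dim = len(local_models[0])
    return [sum(m[i] * d / total for m, d in zip(local_models, experience_counts))
            for i in range(dim)]

theta = [[1.0, 2.0], [3.0, 4.0]]     # local models from two fog access points
D = [1, 3]                           # experiences drawn during the last T_A slots
theta_G = federated_aggregate(theta, D)
```

Weighting by D_k gives fog access points that contributed more training experience a proportionally larger say in the global model, the same rationale as FedAvg's weighting by local dataset size.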
Step 8, repeat steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
In the simulation results of Fig. 2, LFU (Least Frequently Used) is a conventional uncoded caching method; NUCC ("Coded Caching Under Non-Uniform Content Popularity Distributions with Multiple Requests", Abdollah Ghaffari Sheshjavani et al., 2020 IEEE Wireless Communications and Networking Conference) and APCC ("Coded Caching Under Arbitrary Popularity Distributions", Jinbei Zhang et al., IEEE Transactions on Information Theory, 2018) are two coded caching methods designed for non-uniform popularity; Centralized is a centralized extension of the present invention; and Proposed is the method of the present patent. Compared with LFU, NUCC, APCC, and the other methods, the method of the invention achieves more stable and lower-delay content transmission, with no obvious performance loss relative to the centralized extension.
It will be understood that the invention has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (9)
1. A coded caching method based on federated reinforcement learning in a fog radio access network, characterized by comprising the following steps:
step 1, constructing a global model θ_G at the cloud center server and initializing it, deploying a learning agent at each fog access point and initializing its local model θ_k;
step 2, at the end of time slot t, the system observing the global state s(t), and each fog access point observing its local state s_k(t);
step 3, the system predicting and implementing the actual content placement decision a(t) using the global model according to the observed global state s(t), while each fog access point predicts and records a virtual content placement decision a_k(t) according to its observed local state s_k(t);
step 4, in time slot t+1, the system serving the requests received by each fog access point via multicast coded transmission according to the implemented actual content placement decision a(t);
step 5, at the end of time slot t+1, the local state becoming s_k(t+1); each fog access point counting the content requests of its served users in time slot t+1, computing the theoretical delay and virtual feedback r_k(t) according to its virtual content placement decision a_k(t), and storing the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T;
step 6, each fog access point randomly sampling its own local learning experience for training, so as to update its local model;
step 7, every T_A time slots, uploading the local model of each fog access point to the cloud center server for aggregation, and downloading the aggregated global model to replace the local model;
and step 8, repeatedly executing steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
2. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein step 1 specifically comprises the following steps:
step 1.1, the cloud center server constructing a neural network Q(s, a; θ_G), where s is the current global state vector, a is the global content placement decision vector, and θ_G is the global network model parameter, randomly initialized;
step 1.2, for each k in the fog access point index set, K being the number of fog access points, fog access point k constructing a neural network Q(s_k, a_k; θ_k), where s_k is the current local state vector, a_k is the local content placement decision vector, and θ_k is the local network model parameter, randomly initialized.
3. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 2, wherein step 2 specifically comprises the following steps:
step 2.1, at the end of time slot t, the system observing the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1, and f(t) is the statistical request frequency vector of all N files in the system during time slot t;
step 2.2, at the same time, for each k in the fog access point index set, fog access point k observing its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1, and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
4. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 3, wherein step 3 specifically comprises the following steps:
step 3.1, at the end of time slot t, the system predicting and implementing the actual content placement decision a(t) = [c_1(t), ..., c_N(t)] using the global model based on the observed global state s(t), where N_c(t) denotes the number of files in the coded cache, c_n(t) = 1 means that file n is selected, and c_n(t) = 0 means that file n is not selected;
step 3.2, for each k in the fog access point index set, based on the observed local state s_k(t), fog access point k predicting and recording a virtual content placement decision a_k(t) using its local model, in which the number of virtually cached files and the per-file selection indicators are defined analogously to step 3.1.
5. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 4, wherein step 3.1 specifically comprises the following steps:
step 3.1.1, at the end of time slot t, the system predicting the actual content placement decision a(t) using the global model according to the observed global state s(t);
step 3.1.2, implementing the content placement strategy: letting 𝒯 denote a subset of fog access points with |𝒯| = L_t elements, where |·| denotes the number of elements of a set, L_t is a variable with L_t = KM / N_c(t), and M is the cache size of a fog access point; splitting each selected file n into C(K, L_t) sub-files of the same size, W_{n,𝒯}, one for each subset 𝒯 of size L_t, where C(A, B) denotes the number of combinations of B elements taken from A different elements; for each fog access point k, the placement content being the set of sub-files Z_k = { W_{n,𝒯} : k ∈ 𝒯, n in the selected file index set }.
6. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 5, wherein step 3.2 specifically comprises the following steps:
step 3.2.1, for each k in the fog access point index set, based on the observed local state s_k(t), fog access point k following an ε-greedy action selection policy: randomly selecting a virtual content placement decision with probability ε, and predicting the virtual content placement decision using its local model with probability 1-ε;
step 3.2.2, recording the virtual content placement decision a_k(t) without implementing it.
7. The code caching method based on federal reinforcement learning in a foggy radio access network according to claim 6, wherein the step 4 specifically comprises the steps of:
step 4.1, each fog access point receives a user request;
Step 4.2: for cached requests, let D c denote their index set, and let K c denote the set of fog access points in which the requested files are cached; for each fog access point subset S in the set of fog access point subsets, the content multicast by the cloud center server is ⊕ k∈S W d_k (t),S\{k} ,
wherein ⊕ denotes the bitwise exclusive-or operation, and S\{k} is the set difference between the fog access point subset S and the set {k};
Step 4.3: for uncached requests, the cloud center server unicasts the requested content to the fog access points in K \ K c , the set difference between the fog access point index set K and the set K c of fog access points in which the requested files are cached.
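The XOR multicast of step 4.2 can be sketched as follows. This is a minimal illustration under assumed names: `demands[k]` stands in for the file requested by fog access point k, and `subfiles[(n, T)]` is a byte string standing in for subfile W n,T :

```python
from functools import reduce

def multicast_payload(subset, demands, subfiles):
    """For a fog access point subset S, the server multicasts the bitwise
    XOR of the subfiles W_{d_k, S\\{k}} over all k in S; each receiver
    cancels the subfiles it already caches to recover the one it wants."""
    parts = []
    for k in subset:
        others = tuple(sorted(set(subset) - {k}))    # the set S \ {k}
        parts.append(subfiles[(demands[k], others)])
    # XOR the equal-length byte strings together
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), parts)

subfiles = {(0, (1,)): b"\x0f", (1, (0,)): b"\xf0"}
payload = multicast_payload({0, 1}, demands={0: 0, 1: 1}, subfiles=subfiles)
assert payload == b"\xff"
# Receiver 0 caches W_{1,(0,)} = 0xf0 and XORs it out to recover W_{0,(1,)}:
assert bytes(x ^ y for x, y in zip(payload, b"\xf0")) == b"\x0f"
```

One multicast message thus serves every access point in S simultaneously, which is where the fronthaul load reduction over per-request unicast comes from.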
8. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 7, wherein step 5 specifically comprises:
Step 5.1: each fog access point k equally divides the V requests it receives into K parts;
Step 5.2: each fog access point k aggregates the parts as requests from the virtual fog access point set {k, k 1 , k 2 , ..., k K-1 }, wherein k 1 , k 2 , ..., k K-1 are virtually existing fog access points;
Step 5.3: for the i-th request of all K virtual fog access points in the aggregation of fog access point k, define the number of cached requests via an indicator that equals 1 if the i-th request of virtual fog access point k' is file n and 0 if it is not; the fronthaul link load R k (t) of fog access point k is then computed from this count, wherein min(A, B) denotes the smaller of the two values A and B;
Step 5.4: in time slot t, the theoretical delay D k (t) of fog access point k is expressed in terms of the fronthaul load R k (t), wherein d f is the delay for the cloud center server to completely transmit one file to a fog access point, and d a is the delay for a fog access point to completely transmit one file to a user;
Step 5.5: the virtual feedback r k (t) is a weighted combination with weights μ 1 and μ 2 , wherein μ 1 + μ 2 = 1 and 0 < μ 1 < μ 2 < 1;
Step 5.6: store the learning experience [s k (t), a k (t), r k (t), s k (t+1)] T in the local experience replay pool.
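The local experience replay pool of steps 5.6 and 6.1 can be sketched as follows. This is a minimal illustration with hypothetical names; the capacity value is an assumption, not taken from the patent:

```python
import random
from collections import deque

class ReplayPool:
    """Store (s, a, r, s_next) experience tuples with bounded capacity
    and sample uniformly at random for local training."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)   # oldest experiences are evicted

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self):
        return random.choice(self.pool)      # uniform random draw

pool = ReplayPool(capacity=2)
pool.store("s0", "a0", 1.0, "s1")
pool.store("s1", "a1", 0.5, "s2")
pool.store("s2", "a2", 0.2, "s3")            # evicts the oldest entry
assert len(pool.pool) == 2
assert pool.sample() in pool.pool
```

Random sampling breaks the temporal correlation between consecutive time slots, which stabilizes the gradient-descent training of the local model.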
9. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 8, wherein step 6 specifically comprises the following steps:
Step 6.1: for each fog access point k, randomly sample an experience [s k (j), a k (j), r k (j), s k (j+1)] T from the local experience replay pool;
Step 6.2: train the local model θ k using gradient descent and update it.
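The local update of step 6.2 can be sketched as one plain gradient-descent step. This is a minimal illustration: lists of floats stand in for the neural-network weights θ k , and the learning rate is a hypothetical value:

```python
def gradient_step(theta, grad, lr=0.01):
    """Apply one gradient-descent update theta <- theta - lr * grad
    to the local model parameters."""
    return [w - lr * g for w, g in zip(theta, grad)]

theta = [1.0, -2.0]
theta = gradient_step(theta, grad=[10.0, -10.0], lr=0.1)
assert theta == [0.0, -1.0]
```

In the federated scheme, the updated θ k values from all fog access points are then uploaded and aggregated into the global model rather than the raw experiences, which keeps local request data private.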
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111258088.3A CN113923128B (en) | 2021-10-27 | 2021-10-27 | Intelligent coding caching method based on federal reinforcement learning in fog wireless access network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113923128A CN113923128A (en) | 2022-01-11 |
CN113923128B true CN113923128B (en) | 2024-02-13 |
Family
ID=79243126
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108184252A (en) * | 2017-11-28 | 2018-06-19 | 东南大学 | A kind of asynchronous request code cache method of mist wireless access network |
CN111314862A (en) * | 2020-02-19 | 2020-06-19 | 东南大学 | Caching method with recommendation under deep reinforcement learning in fog wireless access network |
CN111340277A (en) * | 2020-02-19 | 2020-06-26 | 东南大学 | Popularity prediction model and method based on federal learning in fog wireless access network |
CN111935784A (en) * | 2020-08-12 | 2020-11-13 | 重庆邮电大学 | Content caching method based on federal learning in fog computing network |
CN112579544A (en) * | 2020-12-18 | 2021-03-30 | 北京邮电大学 | File caching method and device, electronic equipment and storage medium |
CN113255004A (en) * | 2021-06-16 | 2021-08-13 | 大连理工大学 | Safe and efficient federal learning content caching method |
CN113382059A (en) * | 2021-06-08 | 2021-09-10 | 东南大学 | Collaborative caching method based on federal reinforcement learning in fog wireless access network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||