CN113923128A - Intelligent coded caching method based on federated reinforcement learning in a fog radio access network - Google Patents

Intelligent coded caching method based on federated reinforcement learning in a fog radio access network

Info

Publication number
CN113923128A
Authority
CN
China
Prior art keywords: fog, access point, local, global, fog access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111258088.3A
Other languages
Chinese (zh)
Other versions
CN113923128B (en)
Inventor
蒋雁翔
陈颖琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202111258088.3A
Publication of CN113923128A
Application granted
Publication of CN113923128B
Legal status: Active
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/147 - Network analysis or design for predicting network behaviour
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/10 - Protocols in which an application is distributed across nodes in the network


Abstract

The invention discloses an intelligent coded caching method based on federated reinforcement learning in a fog radio access network, comprising the following steps: 1. the cloud center server builds and initializes a global prediction model, and distributed learning agents deployed at the fog access points build and initialize local prediction models; 2. the system observes the global state; 3. each fog access point observes its local state; 4. according to the implemented content placement decision, the system serves the requests received by each network edge node via multicast coded transmission; 5. each fog access point calculates the theoretical delay and virtual feedback according to its local request conditions and stores the local learning experience; 6. each fog access point independently performs local model training using its local learning experience; 7. every several time slots, each fog access point uploads its local model to the cloud center server to update the global model. The invention can reduce the fronthaul load of the system, reduce the transmission delay, and protect user privacy.

Description

Intelligent coded caching method based on federated reinforcement learning in a fog radio access network
Technical Field
The invention belongs to the field of intelligent caching in edge networks of mobile communication systems, and particularly relates to an intelligent coded caching method based on federated reinforcement learning in a fog radio access network.
Background
With the wide popularization of smart devices and the rapid development of mobile applications, mobile communication networks bear increasingly heavy traffic loads. To handle this growing data traffic, the fog radio access network (F-RAN) has been proposed as a new network architecture, in which fog access points (F-APs) are deployed at the network edge close to users and use their edge storage and edge computing capabilities to relieve the traffic pressure on the cloud center server, thereby improving the user experience. Meanwhile, coded caching, as a new caching paradigm, effectively combines local caching with multicast transmission and can more fully utilize the limited storage space of edge devices.
However, most existing research on coded caching under non-uniform popularity focuses on upper- and lower-bound analysis of the theoretical performance of fixed content placement strategies; such non-optimal placement strategies cannot fully exploit the potential of coded caching to reduce the fronthaul load. In addition, current non-uniform popularity coded caching methods assume the idealized condition that content popularity is fixed, ignoring the fact that popularity changes over time, so that the fronthaul load, delay, and other performance metrics cannot reach their ideal values in practice. Therefore, a coded caching method for time-varying popularity is needed, one that reduces the content transmission delay while improving stability and provides higher-quality, more reliable communication services for mobile users.
Disclosure of Invention
The invention aims to provide an intelligent coded caching method based on federated reinforcement learning in a fog radio access network, in order to solve the technical problems of reducing the fronthaul load, reducing the request delay, and protecting user privacy.
In order to solve the above technical problems, the specific technical scheme of the invention is as follows:
A coded caching method based on federated reinforcement learning in a fog radio access network comprises the following steps:
Step 1: construct a global model θ_G on the cloud center server and initialize it; deploy a learning agent at each fog access point and initialize its local model θ_k;
Step 2: at the end of time slot t, the system observes the global state s(t), and each fog access point observes its local state s_k(t);
Step 3: according to the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t); meanwhile, each fog access point uses its observed local state s_k(t) to predict and record a virtual content placement decision a_k(t);
Step 4: in time slot t+1, according to the implemented content placement decision a(t), the system serves the requests received by each fog access point via multicast coded transmission;
Step 5: at the end of time slot t+1, the local state becomes s_k(t+1); each fog access point counts the content requests of its served users in time slot t+1, calculates the theoretical delay and the virtual feedback r_k(t) according to its virtual placement decision a_k(t), and stores the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T;
Step 6: each fog access point randomly samples its local learning experience for training, realizing the local model update;
Step 7: every T_A time slots, all fog access points upload their local models to the cloud center server for integration, and download the integrated global model to replace their local models;
Step 8: repeat steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
Further, the step 1 specifically includes the following steps:
Step 1.1: the cloud center server constructs a neural network Q(s, a; θ_G), where s is the current global state vector, a is the global content placement decision vector, and θ_G denotes the global network model parameters, which are randomly initialized;
Step 1.2: for each k ∈ 𝒦 = {1, 2, ..., K}, where 𝒦 is the fog access point index set and K is the number of fog access points, fog access point k constructs a neural network Q(s_k, a_k; θ_k), where s_k is the current local state vector, a_k is the local content placement decision vector, and θ_k denotes the local network model parameters, which are randomly initialized.
Further, the step 2 specifically includes the following steps:
Step 2.1: at the end of time slot t, the system observes the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1 and f(t) is the statistical request frequency vector of all N files in the system during time slot t;
Step 2.2: at the same time, for each k ∈ 𝒦, fog access point k observes its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1 and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
Further, the step 3 specifically includes the following steps:
Step 3.1: at the end of time slot t, according to the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t) = [c_1(t), c_2(t), ..., c_N(t)], where N_c(t) denotes the number of files placed in the coded cache, c_n(t) = 1 indicates that file n is selected, and c_n(t) = 0 indicates that file n is not selected;
Step 3.2: for each k ∈ 𝒦, according to the observed local state s_k(t), fog access point k uses its local model to predict and record the virtual coded-cache content placement decision a_k(t) = [c_{k,1}(t), ..., c_{k,N}(t)], where N_{c,k}(t) denotes the number of files placed in the virtual coded cache, c_{k,n}(t) = 1 indicates that file n is selected, and c_{k,n}(t) = 0 indicates that file n is not selected.
Further, the step 3.1 specifically includes the following steps:
Step 3.1.1: at the end of time slot t, according to the observed global state s(t), the system predicts the actual coded-cache content placement decision a(t) using the global model;
Step 3.1.2: the content placement strategy is implemented: let 𝒮 ⊆ 𝒦 denote a fog access point subset with |𝒮| = L_t elements, where L_t is a variable with L_t = KM/N_c(t), M is the cache size of a fog access point, and |·| denotes the number of elements in a set. Each selected file n is split into C(K, L_t) subfiles of the same size W_{n,𝒮}, one for each subset 𝒮, where n ∈ 𝒩_c(t), 𝒩_c(t) denotes the selected file index set, and C(A, B) denotes the number of combinations of B elements taken from A different elements. For each k ∈ 𝒦, the placement content of fog access point k is Z_k = {W_{n,𝒮} : k ∈ 𝒮, n ∈ 𝒩_c(t)}.
Further, the step 3.2 specifically includes the following steps:
Step 3.2.1: for each k ∈ 𝒦, according to the observed local state s_k(t) and an ε-greedy action selection strategy, fog access point k selects a virtual content placement decision at random with probability ε, and predicts it with its local model with probability 1-ε;
Step 3.2.2: the content placement decision a_k(t) is recorded but not implemented.
further, the step 4 specifically includes the following steps:
step 4.1, each fog access point receives a user request;
step 4.2, for the cached request, use
Figure BDA00033247488400000411
Representing its index set; by using
Figure BDA00033247488400000412
Representing a fog access point set cached with the request file; let
Figure BDA00033247488400000413
Represents a set of mist access point subsets, wherein
Figure BDA00033247488400000414
For the fog access point subset, the cloud center server is connected to
Figure BDA00033247488400000415
The content of the multicast transmission is:
Figure BDA00033247488400000416
wherein the content of the first and second substances,
Figure BDA00033247488400000417
representing a bit exclusive or operation;
step 4.3, for the request which is not cached, the cloud center server sends the request to the server
Figure BDA00033247488400000418
The request content is unicast transmitted.
Further, the step 5 specifically includes the following steps:
Step 5.1: for each k ∈ 𝒦, fog access point k evenly divides the V requests it received into K parts;
Step 5.2: each resulting request collection is assumed to be received by the fog access point set {k, k_1, k_2, ..., k_{K-1}}, where k_1, k_2, ..., k_{K-1} are virtual fog access points that do not physically exist;
Step 5.3: for this fog access point set, the number of cached requests among the i-th requests of all K fog access points is counted, distinguishing whether the i-th request of virtual fog access point k′ is file n or not, and the load drawn from the fronthaul link is computed accordingly [formula images not reproduced in the source], where min(A, B) denotes the smaller of A and B;
Step 5.4: in time slot t, the theoretical delay of fog access point k is computed from this load [formula image not reproduced in the source], where d_f is the delay of completely transmitting one file from the cloud center server and d_a is the delay of completely transmitting one file from the fog access point to the user;
Step 5.5: the virtual feedback r_k(t) is computed as a weighted combination with weights μ_1 and μ_2 satisfying μ_1 + μ_2 = 1 and 0 < μ_1 < μ_2 < 1 [formula image not reproduced in the source];
Step 5.6: the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T is stored in the local experience replay pool.
Further, the step 6 specifically includes the following steps:
Step 6.1: for each k ∈ 𝒦, fog access point k randomly draws an experience [s_k(j), a_k(j), r_k(j), s_k(j+1)]^T from its local experience replay pool;
Step 6.2: the local model parameters θ_k are updated by training with gradient descent.
The intelligent coded caching method based on federated reinforcement learning in a fog radio access network has the following advantages:
1. Aiming at the more challenging scenario of time-varying content popularity in the fog radio access network, the invention uses federated reinforcement learning to track the time-varying popularity and adaptively make content placement decisions; it can effectively reduce the fronthaul load, reduce the transmission delay, and keep the performance stable, and its applicable scenario is closer to reality.
2. The invention uses a virtual coded caching method that calculates the theoretical delay and virtual feedback of a virtual content placement strategy by assuming virtual fog access points, thereby solving the local-training-data collection problem caused by the fact that a single fog access point cannot execute coded caching by itself, and realizing distributed local training.
3. The method uses federated learning to fully utilize the edge computing capability of the fog access nodes: the global prediction model for the coded-cache content placement strategy is obtained through distributed training and model integration, which avoids repeatedly uploading training data to the cloud center server, thereby reducing the bandwidth waste on the fronthaul link and lowering the risk of user privacy disclosure.
Drawings
Fig. 1 is a schematic flow chart of the coded caching method based on federated reinforcement learning in a fog radio access network according to the present invention;
Fig. 2 is a simulation diagram of the average delay performance of the coded caching method based on federated reinforcement learning in a fog radio access network.
Detailed Description
In order to better understand the purpose, structure, and function of the present invention, the intelligent coded caching method based on federated reinforcement learning in a fog radio access network is described in detail below with reference to the accompanying drawings.
This embodiment provides a coded caching method based on federated reinforcement learning in a fog radio access network, as shown in fig. 1, which includes the following steps:
step 1, constructing a global model theta on a cloud center serverGAnd initializing, arranging a learning agent at each fog access point and initializing its local model θk
The step 1 specifically comprises the following steps:
step 1.1, the cloud center server constructs a neural network Q (s, a; theta)G) Where s is the current global state vector, a is the global content placement decision vector, θGAre global network model parameters and are randomly initialized;
step 1.2, for
Figure BDA0003324748840000071
For the fog access point index set, K is the number of fog access points, and the fog access points K construct a neural network Q(s)k,ak;θk) Wherein s iskFor the current local state vector, akPlacement of decision vectors, θ, for local contentkIs a local network model parameter andis randomly initialized.
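The construction in step 1 can be sketched as follows. This is an illustrative Python sketch only: the two-layer architecture, the hidden-layer size, and the initialization scale are assumptions, since the patent does not specify the network topology of Q(s, a; θ).

```python
# Illustrative sketch of step 1: the cloud server and each fog access point
# build a small Q-network with randomly initialized parameters theta.
# Architecture and sizes are assumptions, not taken from the patent.
import numpy as np

def init_q_network(state_dim, action_dim, hidden=32, seed=0):
    """Randomly initialize parameters theta of a two-layer Q-network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, action_dim)),
        "b2": np.zeros(action_dim),
    }

def q_values(theta, s):
    """Forward pass: Q-values of every candidate placement action."""
    h = np.maximum(0.0, s @ theta["W1"] + theta["b1"])  # ReLU hidden layer
    return h @ theta["W2"] + theta["b2"]

# One global model theta_G on the cloud server, K local models at the F-APs.
K, state_dim, action_dim = 3, 8, 4
theta_G = init_q_network(state_dim, action_dim, seed=42)
theta_local = [init_q_network(state_dim, action_dim, seed=k) for k in range(K)]
```

Each local model is initialized independently here; after the first integration in step 7, all models are replaced by the integrated θ_G.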
Step 2: at the end of time slot t, the system observes the global state s(t), and each fog access point observes its local state s_k(t).
The step 2 specifically includes the following steps:
Step 2.1: at the end of time slot t, the system observes the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1 and f(t) is the statistical request frequency vector of all N files in the system during time slot t;
Step 2.2: at the same time, for each k ∈ 𝒦, fog access point k observes its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1 and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
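The local state observation of step 2.2 amounts to concatenating the previous placement decision with the per-file request frequencies. A minimal sketch follows; normalizing the raw request counts into frequencies is an assumption, as the patent only says "request frequency vector":

```python
# Sketch of step 2.2: s_k(t) = [a_k(t-1), f_k(t)], i.e. the previous local
# caching decision concatenated with the request-frequency vector over N files.
# The count normalization is an illustrative assumption.
import numpy as np

def observe_local_state(prev_decision, request_counts):
    """Build s_k(t) from a_k(t-1) and the per-file request counts f_k(t)."""
    freq = request_counts / max(request_counts.sum(), 1)  # counts -> frequencies
    return np.concatenate([prev_decision, freq])

a_prev = np.array([1, 0, 1, 0])     # a_k(t-1): previous placement over N = 4 files
counts = np.array([10, 5, 3, 2])    # requests observed at F-AP k in slot t
s_k = observe_local_state(a_prev, counts)
```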
Step 3: according to the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t); meanwhile, each fog access point uses its observed local state s_k(t) to predict and record a virtual content placement decision a_k(t).
The step 3 specifically includes the following steps:
Step 3.1: at the end of time slot t, according to the observed global state s(t), the system uses the global model to predict the actual coded-cache content placement decision a(t) = [c_1(t), c_2(t), ..., c_N(t)], where N_c(t) denotes the number of files placed in the coded cache, c_n(t) = 1 indicates that file n is selected, and c_n(t) = 0 indicates that file n is not selected. The content placement strategy is then implemented: let 𝒮 ⊆ 𝒦 denote a fog access point subset with |𝒮| = L_t elements, where L_t is a variable with L_t = KM/N_c(t), M is the cache size of a fog access point, and |·| denotes the number of elements in a set. Each selected file n is split into C(K, L_t) subfiles of the same size W_{n,𝒮}, one for each subset 𝒮, where n ∈ 𝒩_c(t), 𝒩_c(t) denotes the selected file index set, and C(A, B) denotes the number of combinations of B elements taken from A different elements. For each k ∈ 𝒦, the placement content of fog access point k is Z_k = {W_{n,𝒮} : k ∈ 𝒮, n ∈ 𝒩_c(t)}.
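The subset-based placement of step 3.1.2 can be sketched as follows. This is an illustrative sketch of the subfile-to-subset assignment only (the structure follows the classical Maddah-Ali/Niesen placement, which the described subset and subfile notation matches); actual file contents and L_t = KM/N_c(t) sizing are abstracted away:

```python
# Sketch of step 3.1.2: each selected file n is split into C(K, L_t) subfiles
# W_{n,S}, one per fog-access-point subset S with |S| = L_t, and F-AP k caches
# exactly the subfiles whose subset S contains k.
from itertools import combinations

def coded_placement(K, L_t, selected_files):
    """Return {k: set of (file n, subset S) subfile labels cached at F-AP k}."""
    aps = range(K)
    cache = {k: set() for k in aps}
    for n in selected_files:
        for S in combinations(aps, L_t):   # every subset of size L_t
            for k in S:
                cache[k].add((n, S))       # F-AP k stores subfile W_{n,S}
    return cache

# K = 3 fog access points, subsets of size L_t = 2, two selected files.
cache = coded_placement(K=3, L_t=2, selected_files=[0, 1])
```

Each fog access point ends up holding C(K-1, L_t-1) subfiles per selected file, which is what makes every size-L_t subset jointly cover each file exactly once.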
Step 3.2: at the same time, for each k ∈ 𝒦, according to the observed local state s_k(t) and an ε-greedy action selection strategy, fog access point k selects a virtual content placement decision a_k(t) at random with probability ε, and predicts it with its local model with probability 1-ε. The content placement decision a_k(t) is recorded but not implemented.
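The ε-greedy selection in step 3.2 can be sketched in a few lines. The `q_fn` callable and the explicit candidate-action list are illustrative assumptions standing in for the local Q-network and the placement decision space:

```python
# Sketch of step 3.2: with probability epsilon pick a random virtual placement
# (exploration); otherwise take the argmax of the local Q-model (exploitation).
import random

def epsilon_greedy(q_fn, state, actions, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.choice(actions)                         # explore
    return max(actions, key=lambda a: q_fn(state, a))      # exploit

# Tiny stand-in Q-function: action "a1" has the higher predicted value.
q_table = {("s", "a0"): 0.1, ("s", "a1"): 0.9}
best = epsilon_greedy(lambda s, a: q_table[(s, a)], "s", ["a0", "a1"], epsilon=0.0)
```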
Step 4: in time slot t+1, according to the implemented content placement decision a(t), the system serves the requests received by each fog access point via multicast coded transmission.
The step 4 specifically includes the following steps:
Step 4.1: each fog access point receives a user request;
Step 4.2: for the cached requests, let 𝒩_r denote their index set and 𝒦_r denote the set of fog access points that have cached the requested files; for each fog access point subset 𝒮 ⊆ 𝒦_r in the subset collection, the cloud center server multicasts to 𝒮 the coded content ⊕_{k∈𝒮} W_{d_k, 𝒮∖{k}}, where ⊕ denotes the bitwise XOR operation and d_k denotes the file requested at fog access point k;
Step 4.3: for the requests that are not cached, the cloud center server transmits the requested content to the corresponding fog access points by unicast.
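The XOR multicast of step 4.2 can be demonstrated concretely. In this sketch subfiles are equal-length byte strings; each member of the multicast group recovers its missing subfile by XORing the coded message with the subfiles it already caches (a standard property of XOR coding, which the patent's bitwise-XOR delivery relies on):

```python
# Sketch of step 4.2: one coded multicast message serves a whole subset S.
# Each F-AP cancels the subfiles it caches and recovers the one it wants.
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Subfiles wanted by F-APs 0, 1, 2 respectively; in the placement of step 3.1.2
# each of these subfiles is cached by the other two members of S = {0, 1, 2}.
wanted = [b"AAAA", b"BBBB", b"CCCC"]
coded = reduce(xor, wanted)          # single multicast message for S

# F-AP 0 XORs out the subfiles it holds (those wanted by F-APs 1 and 2):
recovered = xor(xor(coded, wanted[1]), wanted[2])
```

One transmission thus replaces three unicast transmissions, which is the source of the fronthaul-load reduction claimed by the method.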
Step 5: at the end of time slot t+1, the local state becomes s_k(t+1); each fog access point counts the content requests of its served users in time slot t+1, calculates the theoretical delay and the virtual feedback r_k(t) according to its virtual placement decision a_k(t), and stores the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T.
The step 5 specifically includes the following steps:
Step 5.1: for each k ∈ 𝒦, fog access point k evenly divides the V requests it received into K parts;
Step 5.2: each resulting request collection is assumed to be received by the fog access point set {k, k_1, k_2, ..., k_{K-1}}, where k_1, k_2, ..., k_{K-1} are virtual fog access points that do not physically exist;
Step 5.3: for this fog access point set, the number of cached requests among the i-th requests of all K fog access points is counted, distinguishing whether the i-th request of virtual fog access point k′ is file n or not, and the load drawn from the fronthaul link is computed accordingly [formula images not reproduced in the source], where min(A, B) denotes the smaller of A and B;
Step 5.4: in time slot t, the theoretical delay of fog access point k is computed from this load [formula image not reproduced in the source], where d_f is the delay of completely transmitting one file from the cloud center server and d_a is the delay of completely transmitting one file from the fog access point to the user;
Step 5.5: the virtual feedback r_k(t) is computed as a weighted combination with weights μ_1 and μ_2 satisfying μ_1 + μ_2 = 1 and 0 < μ_1 < μ_2 < 1 [formula image not reproduced in the source];
Step 5.6: the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T is stored in the local experience replay pool.
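The local experience replay pool of step 5.6 can be sketched as a bounded buffer of [s, a, r, s'] transitions. The capacity and eviction policy (drop-oldest) are illustrative assumptions; the patent only specifies that experiences are stored locally and sampled at random in step 6:

```python
# Sketch of step 5.6: each F-AP keeps a bounded local replay pool of
# transitions [s_k(t), a_k(t), r_k(t), s_k(t+1)]. Capacity is an assumption.
from collections import deque
import random

class ReplayPool:
    def __init__(self, capacity=1000):
        self.buf = deque(maxlen=capacity)   # oldest experiences are evicted

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch, rng):
        """Draw `batch` distinct experiences uniformly at random."""
        return [self.buf[i] for i in rng.sample(range(len(self.buf)), batch)]

pool = ReplayPool(capacity=4)
for t in range(6):                           # slots t = 0..5; t = 0, 1 get evicted
    pool.store(f"s{t}", f"a{t}", -t, f"s{t+1}")
```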
Step 6: each fog access point randomly samples its local learning experience for training, realizing the local model update.
The step 6 specifically includes the following steps:
Step 6.1: for each k ∈ 𝒦, fog access point k randomly draws an experience [s_k(j), a_k(j), r_k(j), s_k(j+1)]^T from its local experience replay pool;
Step 6.2: the local model parameters θ_k are updated by training with gradient descent.
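A single gradient-descent update of step 6.2 can be sketched on a linear Q-function. The linear parameterization, learning rate, and discount factor are illustrative assumptions standing in for the patent's neural-network update; the structure (squared TD error, bootstrapped target) is the standard Q-learning step:

```python
# Sketch of step 6.2: one gradient step on 0.5 * (Q(s,a) - target)^2, where
# target = r + gamma * max_a' Q(s', a'). Linear Q is an assumption for brevity.
import numpy as np

def td_update(theta, s, a_idx, r, s_next, lr=0.1, gamma=0.9):
    """theta: (n_actions, state_dim) weights of a linear Q-function."""
    q_sa = theta[a_idx] @ s
    target = r + gamma * np.max(theta @ s_next)   # bootstrapped TD target
    theta = theta.copy()
    theta[a_idx] -= lr * (q_sa - target) * s      # gradient of the squared error
    return theta

theta0 = np.zeros((2, 2))                          # 2 actions, state dim 2
s, s_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta1 = td_update(theta0, s, a_idx=0, r=1.0, s_next=s_next)
```

After the step, Q(s, a) moves toward the TD target, which is the convergence mechanism the local training relies on.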
Step 7: every T_A time slots, all fog access points upload their local models to the cloud center server for integration, and download the integrated global model to replace their local models.
The step 7 specifically includes the following steps:
Step 7.1: for each k ∈ 𝒦, fog access point k uploads its local model θ_k to the cloud center server;
Step 7.2: the cloud center server integrates the uploaded models by the weighted average θ_G = Σ_{k∈𝒦} (D_k / Σ_{k′∈𝒦} D_{k′}) θ_k, where D_k is the total number of learning experiences drawn by fog access point k during training over the T_A time slots;
Step 7.3: for each k ∈ 𝒦, fog access point k downloads the updated global model θ_G and sets θ_k = θ_G.
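The integration of step 7.2 is an experience-count-weighted model average (federated averaging). A minimal sketch, assuming each local model is a flat parameter vector:

```python
# Sketch of step 7.2: theta_G = sum_k (D_k / sum_k' D_k') * theta_k,
# i.e. federated averaging weighted by each F-AP's experience count D_k.
import numpy as np

def federated_average(local_models, experience_counts):
    """Weighted average of local model parameter vectors."""
    weights = np.array(experience_counts, dtype=float)
    weights /= weights.sum()                      # D_k / sum_k' D_k'
    return sum(w * m for w, m in zip(weights, local_models))

# Two F-APs; the second trained on three times as many experiences.
locals_ = [np.array([1.0, 3.0]), np.array([3.0, 1.0])]
theta_G = federated_average(locals_, experience_counts=[1, 3])
```

Only model parameters travel over the fronthaul, never the raw request data, which is how the method reduces fronthaul bandwidth usage and limits privacy exposure.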
Step 8: repeat steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
In the simulation results shown in fig. 2, LFU (Least Frequently Used) is a conventional non-coded caching method; NUCC (Abdollah Ghaffari Sheshjavani et al., "Coded Caching under Non-Uniform Content Popularity Distribution with Multiple Requests," 2020 Wireless Communications and Networking Conference) and APCC ("Coded Caching under Arbitrary Popularity Distributions," published in 2018 in IEEE Transactions on Information Theory) are two coded caching methods designed for non-uniform popularity; Centralized is a centralized extension of the method of the present invention; and Proposed is the method of the present patent. Compared with LFU, NUCC, APCC, and other methods, the proposed method achieves more stable, lower-delay content transmission, with no obvious performance loss relative to the centralized extension.
It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (9)

1. A coded caching method based on federated reinforcement learning in a fog radio access network, characterized by comprising the following steps:
Step 1: construct a global model θ_G on the cloud center server and initialize it; deploy a learning agent at each fog access point and initialize its local model θ_k;
Step 2: at the end of time slot t, the system observes the global state s(t), and each fog access point observes its local state s_k(t);
Step 3: according to the observed global state s(t), the system uses the global model to predict and implement the actual coded-cache content placement decision a(t); meanwhile, each fog access point uses its observed local state s_k(t) to predict and record a virtual content placement decision a_k(t);
Step 4: in time slot t+1, according to the implemented content placement decision a(t), the system serves the requests received by each fog access point via multicast coded transmission;
Step 5: at the end of time slot t+1, the local state becomes s_k(t+1); each fog access point counts the content requests of its served users in time slot t+1, calculates the theoretical delay and the virtual feedback r_k(t) according to its virtual placement decision a_k(t), and stores the local learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T;
Step 6: each fog access point randomly samples its local learning experience for training, realizing the local model update;
Step 7: every T_A time slots, all fog access points upload their local models to the cloud center server for integration, and download the integrated global model to replace their local models;
Step 8: repeat steps 2 to 7 until the fluctuation of the global model's prediction performance over consecutive time slots is less than 5%.
2. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein the step 1 specifically includes the following steps:
Step 1.1: the cloud center server constructs a neural network Q(s, a; θ_G), where s is the current global state vector, a is the global content placement decision vector, and θ_G denotes the global network model parameters, which are randomly initialized;
Step 1.2: for each k ∈ 𝒦 = {1, 2, ..., K}, where 𝒦 is the fog access point index set and K is the number of fog access points, fog access point k constructs a neural network Q(s_k, a_k; θ_k), where s_k is the current local state vector, a_k is the local content placement decision vector, and θ_k denotes the local network model parameters, which are randomly initialized.
3. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein the step 2 specifically includes the following steps:
Step 2.1: at the end of time slot t, the system observes the global state s(t) = [a(t-1), f(t)], where a(t-1) is the global caching decision made by the system in time slot t-1 and f(t) is the statistical request frequency vector of all N files in the system during time slot t;
Step 2.2: at the same time, for each k ∈ 𝒦, fog access point k observes its local state s_k(t) = [a_k(t-1), f_k(t)], where a_k(t-1) is the local caching decision recorded in time slot t-1 and f_k(t) is the request frequency vector of all N files at fog access point k during time slot t.
4. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein step 3 specifically comprises the following steps:
step 3.1, at the end of time slot t, according to the observed global state s(t), the system uses the global model to predict and implement the actual coded cache content placement decision a(t) = [N_c(t), c_1(t), c_2(t), ..., c_N(t)]^T, where N_c(t) denotes the number of coded-cached files, c_n(t) = 1 indicates that file n is selected, and c_n(t) = 0 indicates that file n is not selected;
step 3.2, for each k ∈ 𝒦, according to its observed local state s_k(t), fog access point k uses its local model to predict and record the virtual coded cache content placement decision a_k(t) = [N_c^k(t), c_1^k(t), c_2^k(t), ..., c_N^k(t)]^T, where N_c^k(t) denotes the number of coded-cached files, c_n^k(t) = 1 indicates that file n is selected, and c_n^k(t) = 0 indicates that file n is not selected.
5. The coded caching method based on federated reinforcement learning in a fog radio access network as claimed in claim 4, wherein step 3.1 specifically comprises the following steps:
step 3.1.1, at the end of time slot t, according to the observed global state s(t), the system predicts the actual coded cache content placement decision with the global model: a(t) = argmax_a Q(s(t), a; θ_G);
step 3.1.2, implementing the content placement strategy: let 𝒯 ⊆ 𝒦 denote a fog access point subset with |𝒯| = L_t elements, where L_t = KM/N_c(t) is a variable, M is the cache size of a fog access point, and |·| denotes the number of elements in a set; split each selected file n into C(K, L_t) subfiles of equal size, {W_{n,𝒯} : 𝒯 ⊆ 𝒦, |𝒯| = L_t}, where n ∈ 𝒩_c(t), 𝒩_c(t) denotes the selected file index set, and C(A, B) denotes the number of combinations of B elements taken from A different elements; for each k ∈ 𝒦, the placement content of fog access point k is: Z_k = {W_{n,𝒯} : n ∈ 𝒩_c(t), k ∈ 𝒯}.
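The subset-indexed placement of step 3.1.2 can be sketched as follows, under the assumption (standard in Maddah-Ali/Niesen-style coded caching) that KM/N_c(t) is an integer; the function and variable names are illustrative.

```python
from itertools import combinations

def coded_placement(selected_files, K, M):
    """Split each selected file n into C(K, L_t) equal subfiles W[n, T], one per
    fog-access-point subset T of size L_t = K*M/N_c(t); fog access point k
    caches every subfile whose index subset T contains k."""
    n_c = len(selected_files)   # N_c(t): number of coded-cached files
    l_t = K * M // n_c          # L_t = KM/N_c(t), assumed to be an integer
    subsets = list(combinations(range(1, K + 1), l_t))
    cache = {k: [] for k in range(1, K + 1)}
    for n in selected_files:
        for t_set in subsets:
            for k in t_set:
                cache[k].append((n, t_set))
    return cache

# K = 4 access points, cache size M = 1 file, files 1 and 2 selected: L_t = 2.
placement = coded_placement([1, 2], K=4, M=1)
```

Each access point ends up with 6 subfiles of size 1/C(4, 2) = 1/6 of a file, i.e. exactly its cache size M = 1, which is the consistency check the L_t = KM/N_c(t) choice guarantees.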
6. The coded caching method based on federated reinforcement learning in a fog radio access network as claimed in claim 4, wherein step 3.2 specifically comprises the following steps:
step 3.2.1, for each k ∈ 𝒦, according to its observed local state s_k(t), fog access point k follows an ε-greedy action selection strategy: with probability ε it selects a random virtual content placement decision, and with probability 1-ε it predicts the virtual content placement decision with its local model, a_k(t) = argmax_a Q_k(s_k(t), a; θ_k);
step 3.2.2, the content placement decision a_k(t) is recorded but not implemented.
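The ε-greedy choice of step 3.2.1 might look like this, with a plain list of candidate Q-values standing in for the local model's predictions:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random decision index with probability epsilon, otherwise the
    argmax of the local model's predicted Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# With epsilon = 0 the greedy decision (index of the largest Q-value) is taken.
greedy_choice = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
random_choice = epsilon_greedy([0.1, 0.7, 0.3], epsilon=1.0)
```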
7. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein step 4 specifically comprises the following steps:
step 4.1, each fog access point receives user requests;
step 4.2, for the cached requests, let 𝒩_r denote their index set and let 𝒦_n denote the set of fog access points caching requested file n; let 𝒮 ⊆ 𝒦 denote a fog access point subset drawn from the set of such subsets; the cloud center server multicasts to 𝒮 the content ⊕_{k∈𝒮} W_{n_k, 𝒮\{k}}, where ⊕ denotes the bitwise exclusive-or (XOR) operation and n_k is the file requested through fog access point k;
step 4.3, for the requests that are not cached, the cloud center server unicasts the requested content to the corresponding fog access points k ∈ 𝒦.
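The XOR multicast of step 4.2 works because each access point can cancel the subfiles it already caches from the coded transmission; a byte-level sketch:

```python
def xor_blocks(blocks):
    """Bitwise XOR of equal-length subfile payloads, as in the coded multicast."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

w_a = b"\x0f\xf0"              # subfile needed by one access point
w_b = b"\x33\xcc"              # subfile needed by the other
coded = xor_blocks([w_a, w_b]) # one multicast transmission serves both
# An access point that caches w_b recovers w_a by XORing the coded block with w_b.
recovered = xor_blocks([coded, w_b])
```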
8. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein step 5 specifically comprises:
step 5.1, for each k ∈ 𝒦, fog access point k evenly divides the V received requests into K parts;
step 5.2, each request collected by fog access point k is regarded as received by the fog access point set {k, k_1, k_2, ..., k_{K-1}}, where k_1, k_2, ..., k_{K-1} are virtual fog access points;
step 5.3, for the fog access point set {k, k_1, k_2, ..., k_{K-1}}, the number of cached copies of the i-th request across all K fog access points is expressed as the sum of indicators of whether the i-th request of each virtual fog access point k' is a cached file n (indicator 1) or not (indicator 0); the load drawn from the fronthaul link is then computed from these counts, where the form min(A, B) denotes the smaller value of A and B;
step 5.4, in time slot t, the theoretical delay of fog access point k is expressed as D_k(t), a function of d_f and d_a, where d_f is the delay of completely transmitting one file from the cloud center server, and d_a is the delay of completely transmitting one file from the fog access point to the user;
step 5.5, the virtual feedback r_k(t) is computed as a weighted combination with weights μ_1 and μ_2 satisfying μ_1 + μ_2 = 1 and 0 < μ_1 < μ_2 < 1;
step 5.6, the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]^T is stored in the local experience replay pool.
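The weighted virtual feedback of step 5.5 and the experience storage of step 5.6 can be sketched as below. The two delay-derived terms are treated as given numbers, since the claim's exact delay expression is only available as an image in the original; all names are illustrative.

```python
def virtual_feedback(term1, term2, mu1, mu2):
    """Weighted combination under the claim's constraints
    mu1 + mu2 = 1 and 0 < mu1 < mu2 < 1."""
    assert abs(mu1 + mu2 - 1.0) < 1e-9 and 0 < mu1 < mu2 < 1
    return mu1 * term1 + mu2 * term2

def store_experience(pool, s, a, r, s_next):
    """Append the learning experience [s_k(t), a_k(t), r_k(t), s_k(t+1)]."""
    pool.append((s, a, r, s_next))

pool = []
r = virtual_feedback(1.0, 2.0, mu1=0.3, mu2=0.7)
store_experience(pool, s=[0, 1], a=0, r=r, s_next=[1, 1])
```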
9. The coded caching method based on federated reinforcement learning in a fog radio access network according to claim 1, wherein step 6 specifically comprises the following steps:
step 6.1, for each k ∈ 𝒦, fog access point k randomly draws an experience [s_k(j), a_k(j), r_k(j), s_k(j+1)]^T from its local experience replay pool;
step 6.2, trains with gradient descent and updates the local model parameters θ_k.
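A one-step sketch of steps 6.1 and 6.2, with a toy linear Q-function standing in for the local network; the TD-target form and the learning rate are standard DQN assumptions rather than specifics of the patent.

```python
import random

def q_values(theta, s):
    """Toy linear Q-function: one parameter per action, Q(s, a) = theta[a] * s."""
    return [w * s for w in theta]

def local_update(theta, pool, gamma=0.9, lr=0.01, rng=random):
    """Sample one experience uniformly from the local replay pool and take a
    gradient-descent step on the squared TD error."""
    s, a, r, s_next = rng.choice(pool)
    target = r + gamma * max(q_values(theta, s_next))
    td_error = q_values(theta, s)[a] - target
    new_theta = list(theta)
    new_theta[a] -= lr * td_error * s   # d Q(s, a) / d theta[a] = s
    return new_theta

theta = [0.5, 0.2]
pool = [(1.0, 0, 1.0, 0.0)]   # one stored experience (s, a, r, s_next)
updated = local_update(theta, pool)
```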
CN202111258088.3A 2021-10-27 2021-10-27 Intelligent coding caching method based on federal reinforcement learning in fog wireless access network Active CN113923128B (en)

Publications (2)

Publication Number Publication Date
CN113923128A (en) 2022-01-11
CN113923128B CN113923128B (en) 2024-02-13


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108184252A (en) * 2017-11-28 2018-06-19 东南大学 A kind of asynchronous request code cache method of mist wireless access network
CN111314862A (en) * 2020-02-19 2020-06-19 东南大学 Caching method with recommendation under deep reinforcement learning in fog wireless access network
CN111340277A (en) * 2020-02-19 2020-06-26 东南大学 Popularity prediction model and method based on federal learning in fog wireless access network
CN111935784A (en) * 2020-08-12 2020-11-13 重庆邮电大学 Content caching method based on federal learning in fog computing network
CN112579544A (en) * 2020-12-18 2021-03-30 北京邮电大学 File caching method and device, electronic equipment and storage medium
CN113255004A (en) * 2021-06-16 2021-08-13 大连理工大学 Safe and efficient federal learning content caching method
CN113382059A (en) * 2021-06-08 2021-09-10 东南大学 Collaborative caching method based on federal reinforcement learning in fog wireless access network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant