CN113255004B - Safe and efficient federated learning content caching method - Google Patents

Safe and efficient federated learning content caching method

Info

Publication number
CN113255004B
CN113255004B (application CN202110666876.XA)
Authority
CN
China
Prior art keywords
model
local
user
generator
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110666876.XA
Other languages
Chinese (zh)
Other versions
CN113255004A (en)
Inventor
邓娜
王凯伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202110666876.XA
Publication of CN113255004A
Application granted
Publication of CN113255004B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a safe and efficient federated learning content caching method. First, each user downloads the discriminator shared model and the generator shared model from an edge server and locally trains the WGAN discriminator local model and the WGAN generator local model; in addition, the user performs gradient clipping and model correction locally, which protects gradient privacy and reduces communication cost. Second, each user sends its corrected model update to the edge server, and the edge server aggregates the updates into new shared models and sends them to each user for the next round of training. The training methods of the first two steps are repeated until the model is trained. Finally, each user sends the pseudo data generated by its local generator to the edge server, and the server predicts the popularity trend of the content and makes a caching decision. The invention effectively protects user privacy and prevents users' private data from being leaked; it accurately predicts content popularity trends and achieves a high cache hit rate with efficient communication.

Description

Safe and efficient federated learning content caching method
Technical Field
The invention belongs to the technical field of wireless communication and relates to a safe and efficient federated learning content caching method.
Background
Currently, with the rapid development of the Internet and mobile technology, mobile data traffic is increasing dramatically, which puts tremendous strain on already crowded mobile networks. On the one hand, heavy data traffic results in high backhaul load and long delays; on the other hand, users want to acquire high-quality content quickly and efficiently. It is therefore necessary to deploy caching devices at the edge of the mobile network. A caching device avoids repeated data transmission and brings data closer to the user side, thereby reducing the backhaul traffic load between users and the Internet, relieving the backbone network, and lowering the service delay experienced by mobile users. However, the capacity of a caching device is limited, and how to use it efficiently, i.e., how to design an efficient caching decision scheme, has become a popular research topic.
In general, caching schemes fall into two types: reactive caching and active caching. Reactive caching decides what content to cache based on users' past access requests, e.g., first-in first-out (FIFO), least recently used (LRU), and least frequently used (LFU). These policies have hysteresis because they only react to historical access requests. Although reactive caching responds quickly to changes in requested content, it does not consider the popularity trend of the content, resulting in a lower cache hit rate. In contrast, active caching first predicts content popularity using historical requests and context information, and then actively selects and caches content that is likely to be popular in the future, which helps improve the cache hit rate. At present, machine learning algorithms are widely applied to predict content popularity trends and are incorporated into active content caching.
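For concreteness, a reactive policy such as LRU can be stated in a few lines. The Python sketch below is illustrative only and not part of the invention; it shows why such policies merely react to history: the cache state depends solely on past requests, with no prediction of future popularity.

    from collections import OrderedDict

    class LRUCache:
        """Reactive baseline: on a miss, the least recently used item is evicted."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.store = OrderedDict()

        def request(self, content_id):
            hit = content_id in self.store
            if hit:
                self.store.move_to_end(content_id)    # refresh recency on a hit
            else:
                if len(self.store) >= self.capacity:
                    self.store.popitem(last=False)    # evict least recently used
                self.store[content_id] = True         # cache the new content
            return hit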
In a typical machine learning algorithm, data operations are performed in a central processor. In other words, user data is highly centralized, i.e., all data must be collected at a central processor for unified training. Examples include methods using reinforcement learning (see N. Zhang, K. Zheng and M. Tao, "Using Grouped Linear Prediction and Accelerated Reinforcement Learning for Online Content Caching," in 2018 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1-6, 2018), collaborative filtering (see E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82-89, 2014), and linear regression (see K. N. Doan, T. Van Nguyen, T. Q. S. Quek and H. Shin, "Content-Aware Proactive Caching for Backhaul Offloading in Cellular Network," IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3128-3140, May 2018) to predict content popularity trends. However, such centralized machine learning methods require collecting users' information and behavior data to train the model, and in practice users are unwilling to provide such data because it typically involves their privacy and sensitive information.
Federated learning is a distributed framework that can effectively protect the security of participants' (i.e., users') private data. During federated learning training, an edge server maintains a shared global model, and each participant trains locally on the shared model to produce a local model. The edge server and the participants exchange updates of the shared global model and local model parameters rather than private data. Specifically, the edge server receives each participant's local model update and aggregates them into a new shared model for the next round of training. The participants receive the new shared model, train it on local data, generate new local models, and send the updates back to the edge server. The training process repeats this cycle until completion. However, existing federated-learning-based active content caching methods, while guaranteeing the security of participants' private data, still leak some participant privacy. For example, federated learning has been applied in fog radio access networks (see Y. Wu, Y. Jiang, M. Bennis, F. Zheng, X. Gao and X. You, "Content Popularity Prediction in Fog Radio Access Networks: A Federated Learning Based Approach," ICC 2020 - 2020 IEEE International Conference on Communications (ICC), 2020, pp. 1-6); although this work incorporates federated learning concepts, it requires users to send personal content preferences to the fog access point, which exposes users' privacy. Autoencoders and hybrid filtering have been incorporated into federated learning (see Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas, "Federated learning based proactive content caching in edge computing," in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-6); although users' private data is trained locally, each user must upload a content recommendation list, which exposes individual content preferences. Neural networks have been used to predict weighted popularity (see K. Qi and C. Yang, "Popularity Prediction with Federated Learning for Proactive Caching at Wireless Edge," in 2020 IEEE Wireless Communications and Networking Conference (WCNC), 2020, pp. 1-6); although individual content preferences are not exposed, the total number of each user's access requests, i.e., their activity level, is still revealed.
In addition, the above methods all rest on the federated learning assumption that model gradients can be shared safely (exchanging model parameter updates between edge servers and participants essentially shares local model gradients) without exposing the participants' private training data, which holds only in the ideal case. In reality, the security of model gradient sharing is not fully guaranteed: a participant's model gradients can still reveal certain properties of the private training data, e.g., via an attribute classifier, or even expose the private training data completely, as in deep leakage from gradients (see L. Zhu, Z. Liu and S. Han, "Deep Leakage from Gradients," in 2019 Conference on Neural Information Processing Systems (NeurIPS), 2019). In short, it is entirely possible to recover private training data from a participant's local model and gradients. It is therefore necessary to rethink the security of model gradients in federated learning, explore gradient-level secure federated learning methods, and find federated learning active content caching methods that effectively protect user privacy while efficiently predicting content popularity trends.
Based on the above, the invention provides a safe and efficient federated learning content caching method that accurately predicts content popularity trends, achieves a near-ideal cache hit rate, and guarantees gradient-level privacy, thereby protecting users' private data well. Specifically, Wasserstein Generative Adversarial Networks (WGAN) are used as the training framework for predicting content popularity trends, and two shared global models are maintained in the edge server based on federated learning: a discriminator shared model and a generator shared model. Considering the security of users' private data, each user locally trains a WGAN discriminator model and a WGAN generator model on private data, starting from the shared models. Meanwhile, considering model gradient security, each user must locally perform gradient clipping and model correction before sending its model parameter updates to the edge server, which aggregates them into new shared models for the next round of training. During training, the communication cost is determined by the model updates sent and received between users and the edge server; since each user sends only the clipped and corrected model parameter updates, the security of model gradients and private data is guaranteed and the communication cost between the edge server and the users is effectively reduced. Finally, each user sends the pseudo data generated by its local generator to the edge server, which processes the pseudo data, predicts content popularity trends, and caches the hottest content. In this process the server receives only pseudo data, which exposes neither a user's content preferences nor their activity level, yet effectively reflects the popularity trend of content across the whole user group, so a high cache hit rate is achieved without leaking user privacy. The invention is therefore a safe and efficient federated learning content caching algorithm that accurately predicts and caches trending content while effectively protecting each user's privacy, and it is highly feasible in practical communication systems. The invention is supported by the National Natural Science Foundation of China (No. 61701071).
Disclosure of Invention
A content caching strategy must not only be efficient and achieve a high cache hit rate, but must also protect user privacy, such as users' private data, content preferences, and activity levels. To address these problems, the invention provides a safe and efficient federated learning active content caching method that caches popular content by accurately predicting content popularity trends, effectively protects each user's privacy and model gradients, and reduces the communication cost between the edge server and the users.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a safe and efficient federal learning content caching method comprises the following four contents:
First, each user downloads the discriminator shared model and the generator shared model from the edge server and, starting from the shared models, locally trains the WGAN discriminator local model and the WGAN generator local model using private data. In addition, the user performs gradient clipping and model correction locally to protect gradient privacy and reduce communication cost.
Second, each user sends the corrected model update to the edge server, and the edge server aggregates the updates into new shared models and sends them to each user for the next round of training.
Third, the training methods of the first two contents are repeated until model training is complete.
Fourth, each user sends the pseudo data generated by its local generator to the edge server; the edge server predicts the popularity trend of the content, scores the contents by popularity, and finally makes a caching decision.
The steps will be described in detail below.
Step one: information collection and model building
Step 1.1 collecting information: the edge server base station collects information from two kinds of sources:
1) The edge server base station obtains content library information from the network side, including the total number of content items, the content used to respond to user access requests, the content to be cached, and so on.
2) The edge server obtains information from the user side, including the list of connected users, the total number of users, and so on.
Step 1.2 model building
Two shared global models are built in the edge server base station: a discriminator shared model and a generator shared model. For the discriminator shared model, its input layer is first determined from the content library information; its structure is then designed, including the number of layers, the number of nodes per layer, and the activation functions; finally, an output layer is built to complete a regression task and fit the Wasserstein distance. For the generator shared model, its input layer is determined by the dimension of the input noise; its structure is then designed, including the number of layers, the number of nodes per layer, and the activation functions; finally, its output layer is determined from the content library information so that it can generate pseudo data. After the shared models are built, all model parameters are initialized uniformly, and the two shared global models are ready.
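As an illustration, the two shared models can be realized as small fully connected networks. The PyTorch sketch below is only one possible instantiation: the depth, layer widths, and noise dimension are assumptions made here for illustration (the content library size follows the MovieLens 100K embodiment described later), not values fixed by the invention.

    import torch.nn as nn

    NUM_CONTENTS = 1682   # content library size (cf. the MovieLens 100K embodiment)
    NOISE_DIM = 100       # assumed dimension of the generator's input noise

    # Discriminator shared model: one data sample in, one unbounded score out.
    # The single linear output completes the regression task that fits the
    # Wasserstein distance (no sigmoid, unlike a standard GAN discriminator).
    discriminator = nn.Sequential(
        nn.Linear(NUM_CONTENTS, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

    # Generator shared model: random noise in, one pseudo data sample out,
    # with the same dimension as the real data.
    generator = nn.Sequential(
        nn.Linear(NOISE_DIM, 128), nn.ReLU(),
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, NUM_CONTENTS),
    )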
Step two: training process of local model
Step 2.1 downloading sharing model
The users participating in the training first download the discriminator shared model and the generator shared model from the edge server base station, and then create a local discriminator model and a local generator model whose structure is the same as that of the shared models.
Step 2.2 training local discriminator
The goal of the discriminator in a generative adversarial network is to distinguish the pseudo data generated by the generator from the real data. In WGAN, the Wasserstein distance is smooth, which is advantageous because it reflects the distance between two non-overlapping distributions and thus provides a meaningful gradient. To this end, the discriminator in WGAN is used to fit the Wasserstein distance so as to distinguish real data from fake data, i.e., to minimize:

$$L_D = \mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})] - \mathbb{E}_{x \sim P_r}[f_w(x)]$$

where $f_w$ is the user's local discriminator, $P_r$ is the real data distribution, and $P_g$ is the pseudo data distribution generated by the local generator. To minimize $L_D$, the efficient gradient update method root mean square propagation (RMSProp) is employed. It applies an exponentially weighted average of squared gradients to the model weights, thereby speeding up model convergence. Specifically, during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:
$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$. Once the accumulated gradient momentum is obtained, the model parameters can be updated based on it.
However, before the model parameters are updated, the gradients need to be clipped in order to ensure the security of the model gradients. To this end, only the gradients that contribute the most are updated, i.e., the gradients whose magnitudes rank in the top a% of all gradients in the current layer, where a% ∈ (0, 1) is a preset gradient clipping coefficient. The small gradients that do not participate in the update are saved rather than discarded and are added back to the corresponding gradients at the next training round, so that a large amount of gradient information is not lost. As training proceeds, small gradients accumulate round by round until they enter the top a% and are then applied to the model parameters. For example, if the layers of the model are $L_1, L_2, \ldots, L_n$, then for layer $L_i$, whose parameters have $n_i$ gradients in total, only the largest $k_i = n_i \cdot a\%$ gradients are used for the model update; the remaining gradients are saved locally by the user and added back to the corresponding gradients of layer $L_i$ at the next training round. If $k_i$ is not an integer, it is rounded up. The fraction of gradients used for updating thus drops from 100% to a%, i.e., gradient clipping.
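A minimal NumPy sketch of this clipping rule for one layer follows; the function name and the default a = 20% are illustrative (the embodiment below also uses 20%).

    import numpy as np

    def clip_gradient(grad, residual, a=0.2):
        # Add back the small gradients saved at earlier rounds, then keep only
        # the top-a% of this layer's gradients by magnitude; the rest become
        # the new residual, saved locally for the next round.
        g = grad + residual
        k = int(np.ceil(a * g.size))               # k_i = n_i * a%, rounded up
        threshold = np.sort(np.abs(g), axis=None)[-k]
        mask = np.abs(g) >= threshold
        clipped = np.where(mask, g, 0.0)           # gradients used for the update
        new_residual = np.where(mask, 0.0, g)      # small gradients kept back
        return clipped, new_residual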
Then the model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate, and a small constant $\epsilon$ is added to keep the denominator from being 0 and stabilize the computation.
Finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; this restricts all parameters of the model to a range $[-c, c]$, where $c$ is a constant, which in turn bounds the model gradient. To this end, the model parameters are corrected to the range $[-c, c]$: values below $-c$ are set to $-c$, and values above $c$ are set to $c$, so that the Lipschitz continuity condition is satisfied. This is the model correction.
Gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server. This completes one round of training of the local discriminator.
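Putting the update rule and the model correction together, one local training step per parameter tensor can be sketched as follows. The values η = 0.015 and c = 0.1 follow the embodiment below; β is not fixed by the description, so the common RMSProp default of 0.9 is assumed here.

    import numpy as np

    def rmsprop_step(w, dw, m_w, beta=0.9, eta=0.015, eps=1e-8, c=0.1):
        # dw is assumed to be the clipped gradient from the sketch above.
        m_w = beta * m_w + (1.0 - beta) * dw ** 2     # accumulated gradient momentum
        w = w - eta * dw / (np.sqrt(m_w) + eps)       # RMSProp parameter update
        return np.clip(w, -c, c), m_w                 # model correction to [-c, c]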
Step 2.3 training the local Generator
The goal of the generator in a generative adversarial network is to generate data realistic enough to fool the discriminator. In WGAN, the generator's gradient does not vanish, because the Wasserstein distance always provides a meaningful gradient. Considering that the generator's input is independent of the real data distribution, generating realistic pseudo data only requires minimizing:

$$L_G = -\mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})]$$

where $f_w$ is the user's local discriminator and $P_g$ is the pseudo data distribution generated by the local generator. To minimize $L_G$, the RMSProp gradient update method is again employed. It applies an exponentially weighted average of squared gradients to the model weights, damping the oscillation of the gradient so that the model converges quickly. Specifically, during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:
$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$. From this the accumulated gradient momentum is obtained, and the model parameters are then updated based on it.
However, before the model parameters are updated, the gradients need to be clipped, in order to guarantee low communication cost and to guard against unknown risks even though the generator's input is random noise. To this end, only the gradients that contribute the most are updated, i.e., the gradients whose magnitudes rank in the top a% of all gradients in the current layer, where a% ∈ (0, 1) is the preset gradient clipping coefficient. The small gradients that do not participate in the update are saved rather than discarded and are added back to the corresponding gradients at the next training round, so that a large amount of gradient information is not lost. As training proceeds, small gradients accumulate round by round until they enter the top a% and are then applied to the model parameters. For example, if the layers of the model are $L_1, L_2, \ldots, L_n$, then for layer $L_i$, whose parameters have $n_i$ gradients in total, only the largest $k_i = n_i \cdot a\%$ gradients are used for the model update; the remaining gradients are saved locally by the user and added back to the corresponding gradients of layer $L_i$ at the next training round. If $k_i$ is not an integer, it is rounded up. The fraction of gradients used for updating thus drops from 100% to a%, i.e., gradient clipping.
Then the model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate, and a small constant $\epsilon$ is added to keep the denominator from being 0 and stabilize the computation.
Finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; this restricts all parameters of the model to a range $[-c, c]$, where $c$ is a constant, which in turn bounds the model gradient. To this end, the model parameters are corrected to the range $[-c, c]$: values below $-c$ are set to $-c$, and values above $c$ are set to $c$, so that the Lipschitz continuity condition is satisfied. This is the model correction.
Gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server. This completes one round of training of the local generator.
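The two WGAN objectives of steps 2.2 and 2.3 can be computed together in one pass. The PyTorch sketch below assumes f_w and g are the local discriminator and generator modules; the batch handling and noise dimension are illustrative.

    import torch

    def wgan_losses(f_w, g, real, noise_dim=100):
        z = torch.randn(real.size(0), noise_dim)   # noise, independent of the real data
        fake = g(z)                                # pseudo data drawn from P_g
        # L_D = E_{x~P_g}[f_w(x)] - E_{x~P_r}[f_w(x)]
        loss_d = f_w(fake.detach()).mean() - f_w(real).mean()
        # L_G = -E_{x~P_g}[f_w(x)]: raise the discriminator's score on pseudo data
        loss_g = -f_w(fake).mean()
        return loss_d, loss_g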
Step three: aggregation process for shared models
Step 3.1 upload model update
Each user participating in the training first uploads the updates of the local discriminator and local generator models, from which the edge server generates a new discriminator shared model and a new generator shared model, respectively. In round $t$ of training, the uploaded local model updates are:

$$H_{Dt}^{n} = W_{Dt}^{n} - W_{Dt}, \qquad H_{Gt}^{n} = W_{Gt}^{n} - W_{Gt}$$

where $n$ is the index of the user, $H_{Dt}^{n}$ is the local discriminator model update, $H_{Gt}^{n}$ is the local generator model update, $W_{Dt}^{n}$ is the local discriminator model, $W_{Gt}^{n}$ is the local generator model, $W_{Dt}$ is the discriminator shared model, and $W_{Gt}$ is the generator shared model. After the edge server obtains the model updates, it generates new shared models by aggregation.
Step 3.2 generating a sharing model
The edge server processes all the local model updates it has obtained and, based on federated averaging, aggregates them into two new shared models: a discriminator shared model and a generator shared model. First, the two types of local model updates are aggregated separately; in round $t$ of training, the aggregated shared model updates are:

$$H_{Dt} = \frac{1}{N_t} \sum_{n=1}^{N_t} H_{Dt}^{n}, \qquad H_{Gt} = \frac{1}{N_t} \sum_{n=1}^{N_t} H_{Gt}^{n}$$

where $H_{Dt}$ is the shared discriminator model update, $H_{Gt}$ is the shared generator model update, and $N_t$ is the number of users participating in the training round. The model updates are then added to the shared models to generate the new shared models:
$$W_{D(t+1)} = W_{Dt} + \eta_{Dt} H_{Dt}$$
$$W_{G(t+1)} = W_{Gt} + \eta_{Gt} H_{Gt}$$
where $W_{D(t+1)}$ is the new discriminator shared model, $W_{G(t+1)}$ is the new generator shared model, and $\eta_{Dt}$ and $\eta_{Gt}$ are the learning rates used when aggregating the discriminators and generators, respectively. The new discriminator shared model and generator shared model thus obtained are used in the next round of training.
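The aggregation of step three reduces to a few lines when each model is viewed as a flat parameter vector. A sketch, with η = 1 as in the embodiment below:

    import numpy as np

    def aggregate(shared, local_models, eta=1.0):
        # H^n_t = W^n_t - W_t, H_t = (1/N_t) * sum_n H^n_t,
        # W_(t+1) = W_t + eta_t * H_t (federated averaging).
        updates = [w_n - shared for w_n in local_models]
        h = np.mean(updates, axis=0)
        return shared + eta * h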
Step four: predicting popularity trends of content
Step 4.1 completion of model training
Repeat the local model training process and the shared model aggregation process of steps two and three until the number of training rounds reaches a preset value or the model accuracy stabilizes.
Step 4.2 obtaining dummy data
After training, each user sends the pseudo data generated by its local generator to the edge server; the pseudo data has the same dimension as the real data but different content, i.e.:

$$D_n = G_w^{n}(x)$$

where $n$ is the index of the user, $D_n$ is the pseudo data generated by the local generator of the $n$-th user, $G_w^{n}$ is the local generator of the $n$-th user, and $x$ is the local generator's input, a random noise vector. The edge server thus obtains the pseudo data generated by the local generators of the $N$ users.
Step 4.3 content popularity trend prediction
The edge server predicts the popularity trend of the content from the pseudo data $D_n$ and ranks the contents by popularity.
First, the $N$ pieces of pseudo data are added dimension-wise to obtain a total score $D$ reflecting the global popularity trend of the content:

$$D = \sum_{n=1}^{N} D_n$$

where $N$ is the total number of users providing pseudo data. The popularity trend of the content is then predicted from the content scores in $D$: the larger the score, the more likely the content is to be popular, so the contents can be sorted in descending order of score.
Step 4.4 caching popular content
Finally, taking the caching capacity of the edge server base station into account, the M contents with the highest scores in $D$ are selected and downloaded from the Internet into the caching entity of the edge server base station, enabling fast and efficient responses to user access requests.
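Steps 4.2 to 4.4 amount to a dimension-wise sum followed by a top-M selection. A NumPy sketch; the user and content counts in the usage line follow the MovieLens 100K embodiment, and the pseudo data here is random stand-in data.

    import numpy as np

    def cache_decision(pseudo_data, M):
        # pseudo_data: (N_users, N_contents) array of generator outputs.
        total_score = np.sum(pseudo_data, axis=0)    # D = sum_n D_n, dimension-wise
        return np.argsort(total_score)[::-1][:M]     # indices of the M hottest contents

    # illustrative usage
    cache_list = cache_decision(np.random.rand(943, 1682), M=100)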
The beneficial effects of the invention are as follows. The safe and efficient federated learning active content caching method provided by the invention has two advantages: safety and efficiency. In terms of safety, the invention effectively protects user privacy, including users' private data, content preferences, and activity levels; the model gradients during training are also well protected, preventing users' private data from being leaked. In terms of efficiency, the invention accurately predicts content popularity trends, thereby achieving a high cache hit rate. In addition, gradient clipping and model correction are performed before model updates are transmitted, greatly reducing the communication cost between the server and the users and enabling efficient communication. The invention is therefore a safe and efficient federated learning active content caching method.
Drawings
Fig. 1 is a schematic diagram of the system architecture of the present invention.
Fig. 2 is a user side workflow diagram of the present invention.
Fig. 3 is an edge server side workflow diagram of the present invention.
FIG. 4 is a comparison of the present invention with other reference algorithms in terms of cache hit rate.
Detailed Description
The invention is further illustrated below with reference to specific examples.
A specific embodiment of the invention is described using the MovieLens 100K dataset as an example. Created by the GroupLens team, it is a classic dataset often used to evaluate content caching methods. In this dataset, 943 users rated 1682 movies, giving 100,000 ratings in total. Movies represent the content requested by users, and ratings represent users' preferences for the content. This is generally reasonable, because a user has always finished watching a movie before rating it. On this dataset, the invention aims to predict the popularity trend of movies and cache the popular movies in the edge server base station in advance, while protecting user privacy.
A safe and efficient federated learning content caching method comprising:
step one: information collection and model building
Step 1.1 collecting information: the edge server base station collects information from two kinds of sources:
1) The edge server base station obtains content library information from the network side. The content library in this example is the MovieLens 100K movie database, which includes the total number of movies, context information, the specific content of the movies, and so on.
2) The edge server obtains information from the user side; in this example, this includes the list of users who have watched movies in MovieLens 100K, the total number of users connected to the edge server, and so on.
Step 1.2 model building
Based on the MovieLens 100K dataset of this example, two shared global models are built in the edge server base station: a discriminator shared model and a generator shared model. For the discriminator shared model, its input layer is first determined from the total number of movies in the movie library, here 1682, with a rating range of 0-5. Its structure is then designed, including the number of layers, the number of nodes per layer, and the activation functions, and finally an output layer is built to complete a regression task and fit the Wasserstein distance. For the generator shared model, its input layer is determined by the dimension of the input noise; its structure is then designed, including the number of layers, the number of nodes per layer, and the activation functions; finally, its output layer is determined from the total number of movies in the movie library and the rating range, so that it can generate pseudo data. After the shared models are built, all model parameters are initialized uniformly, and the two shared global models are ready.
Step two: training process of local model
Step 2.1 downloading sharing model
The 943 users participating in training in this example first download the discriminator shared model and the generator shared model from the edge server base station, then create local discriminator models and local generator models whose structure is the same as that of the shared models.
Step 2.2 training local discriminator
The goal of the discriminator in a generative adversarial network is to distinguish the pseudo data generated by the generator from the real data. In this example, each user uses a local discriminator to fit the Wasserstein distance so as to distinguish real data from fake data, i.e., to minimize:

$$L_D = \mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})] - \mathbb{E}_{x \sim P_r}[f_w(x)]$$

where $f_w$ is the user's local discriminator, $P_r$ is the real data distribution, and $P_g$ is the pseudo data distribution generated by the local generator. To minimize $L_D$, the efficient gradient update method root mean square propagation (RMSProp) is employed. It applies an exponentially weighted average of squared gradients to the model weights, thereby speeding up model convergence. Specifically, during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:
$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$. Once the accumulated gradient momentum is obtained, the model parameters can be updated based on it.
However, before the model parameters are updated, the gradients need to be clipped in order to ensure the security of the model gradients. To this end, only the gradients that contribute the most are updated, i.e., the gradients whose magnitudes rank in the top a% of all gradients in the current layer, where a% ∈ (0, 1) is the preset gradient clipping coefficient; in this example a% is set to 20%. The small gradients that do not participate in the update are saved rather than discarded and are added back to the corresponding gradients at the next training round, so that a large amount of gradient information is not lost. As training proceeds, small gradients accumulate round by round until they enter the top 20% and are then applied to the model parameters. The fraction of gradients used for updating thus drops from 100% to 20%, i.e., gradient clipping.
Then the model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate, 0.015 in this example, and a small constant $\epsilon$ is added to keep the denominator from being 0 and stabilize the computation.
Finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; this restricts all parameters of the model to a range $[-c, c]$, where $c$ is a constant, which in turn bounds the model gradient. To this end, in this example the model parameters are corrected to the range $[-0.1, 0.1]$: values below $-0.1$ are set to $-0.1$, and values above $0.1$ are set to $0.1$, so that the Lipschitz continuity condition is satisfied. This is the model correction.
Gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server. This completes one round of training of the local discriminator.
Step 2.3 training the local Generator
The goal of the generator in a generative adversarial network is to generate data realistic enough to fool the discriminator. In this example, each user trains the generator model locally to generate pseudo data that is hard to distinguish from real data, which only requires minimizing:

$$L_G = -\mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})]$$

where $f_w$ is the user's local discriminator and $P_g$ is the pseudo data distribution generated by the local generator. To minimize $L_G$, the RMSProp gradient update method is again employed. It applies an exponentially weighted average of squared gradients to the model weights, damping the oscillation of the gradient so that the model converges quickly. Specifically, during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:
$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$. From this the accumulated gradient momentum is obtained, and the model parameters are then updated based on it.
However, before the model parameters are updated, the gradients need to be clipped, in order to guarantee low communication cost and to guard against unknown risks even though the generator's input is random noise. To this end, only the gradients that contribute the most are updated, i.e., the gradients whose magnitudes rank in the top a% of all gradients in the current layer, where a% ∈ (0, 1) is the preset gradient clipping coefficient; in this example a% is set to 20%. The small gradients that do not participate in the update are saved rather than discarded and are added back to the corresponding gradients at the next training round, so that a large amount of gradient information is not lost. As training proceeds, small gradients accumulate round by round until they enter the top 20% and are then applied to the model parameters. The fraction of gradients used for updating thus drops from 100% to 20%, i.e., gradient clipping.
Then the model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate, 0.03 in this example, and a small constant $\epsilon$ is added to keep the denominator from being 0 and stabilize the computation.
Finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; this restricts all parameters of the model to a range $[-c, c]$, where $c$ is a constant, which in turn bounds the model gradient. To this end, in this example the model parameters are corrected to the range $[-0.1, 0.1]$: values below $-0.1$ are set to $-0.1$, and values above $0.1$ are set to $0.1$, so that the Lipschitz continuity condition is satisfied. This is the model correction.
Gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server. This completes one round of training of the local generator.
Step three: aggregation process for shared models
Step 3.1 upload model update
In this example, each user participating in the training first uploads the updates of the local discriminator and local generator models, from which the edge server generates a new discriminator shared model and a new generator shared model, respectively. In round $t$ of training, the uploaded local model updates are:

$$H_{Dt}^{n} = W_{Dt}^{n} - W_{Dt}, \qquad H_{Gt}^{n} = W_{Gt}^{n} - W_{Gt}$$

where $n$ is the index of the user, $H_{Dt}^{n}$ is the local discriminator model update, $H_{Gt}^{n}$ is the local generator model update, $W_{Dt}^{n}$ is the local discriminator model, $W_{Gt}^{n}$ is the local generator model, $W_{Dt}$ is the discriminator shared model, and $W_{Gt}$ is the generator shared model. After the edge server obtains the model updates, it generates new shared models by aggregation.
Step 3.2 generating a sharing model
The edge server processes all the local model updates it has obtained and, based on federated averaging, aggregates them into two new shared models: a discriminator shared model and a generator shared model. First, the two types of local model updates are aggregated separately; in round $t$ of training, the aggregated shared model updates are:

$$H_{Dt} = \frac{1}{N_t} \sum_{n=1}^{N_t} H_{Dt}^{n}, \qquad H_{Gt} = \frac{1}{N_t} \sum_{n=1}^{N_t} H_{Gt}^{n}$$

where $H_{Dt}$ is the shared discriminator model update, $H_{Gt}$ is the shared generator model update, and $N_t$ is the number of users participating in the training round. The model updates are then added to the shared models to generate the new shared models:
$$W_{D(t+1)} = W_{Dt} + \eta_{Dt} H_{Dt}$$
$$W_{G(t+1)} = W_{Gt} + \eta_{Gt} H_{Gt}$$
where $W_{D(t+1)}$ is the new discriminator shared model, $W_{G(t+1)}$ is the new generator shared model, and $\eta_{Dt}$ and $\eta_{Gt}$ are the learning rates used when aggregating the discriminators and generators, respectively; both are set to 1 in this example. The new discriminator shared model and generator shared model thus obtained are used in the next round of training.
Step four: predicting popularity trends of content
Step 4.1 completion of model training
Repeat the local model training process and the shared model aggregation process of steps two and three until the number of training rounds reaches a preset value or the model accuracy stabilizes. In this example, the number of training rounds is preset to 50.
Step 4.2 obtaining dummy data
After training is finished, each user sends the pseudo data generated by its local generator to the edge server. In this example, each user generates one piece of pseudo data, 943 pieces in total. The pseudo data has the same dimension as the real data but different content, i.e.:

$$D_n = G_w^{n}(x)$$

where $n$ is the index of the user, $D_n$ is the pseudo data generated by the local generator of the $n$-th user, $G_w^{n}$ is the local generator of the $n$-th user, and $x$ is the local generator's input, a random noise vector. The edge server thus obtains the pseudo data generated by the 943 users' local generators.
Step 4.3 content popularity trend prediction
In this example, the edge server predicts the popularity trend of the content from the 943 pieces of pseudo data $D_n$ and ranks the contents by popularity.
First, the 943 pieces of pseudo data are added dimension-wise to obtain a total score $D$ reflecting the global popularity trend of the content:

$$D = \sum_{n=1}^{N} D_n$$

where $N$ is the total number of users providing pseudo data, 943 in this example. The popularity trend of the content is then predicted from the content scores in $D$: the larger the score, the more likely the content is to be popular, so the contents can be sorted in descending order of score.
Step 4.4 caching popular content
Finally, considering the caching capacity of the edge server base station, the M movies with the highest scores in $D$ are selected and downloaded from the Internet movie library into the caching entity of the edge server base station, enabling fast and efficient responses to user access requests. Given the size of the MovieLens 100K movie library in this example, setting M between 50 and 400 is a common and reasonable range.
In this example the cache hit rate is used to measure the caching performance of the model, tested over 20,000 access requests. When a user makes an access request, if the movie the user wants is in the caching entity of the edge server base station, the movie is sent to the user directly from the base station; this is called a cache success. Otherwise, if the movie is not in the caching entity of the edge server base station, it is downloaded from the Internet; this is called a cache failure. The cache hit rate is the ratio of the number of cache successes to the total number of accesses, i.e.:

$$\text{cache hit rate} = \frac{N_S}{N_S + N_F}$$

where $N_S$ is the number of cache successes and $N_F$ is the number of cache failures. In this example the gradient clipping coefficient a% is set to 20%, i.e., only 20% of the gradients are updated in each training round.
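The metric itself is straightforward to compute over a request trace; a small sketch with illustrative names:

    def cache_hit_rate(requests, cached_contents):
        # A request found in the cache counts toward N_S; any other request
        # counts toward N_F, so len(requests) = N_S + N_F.
        cached = set(cached_contents)
        n_s = sum(1 for r in requests if r in cached)
        return n_s / len(requests)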
We compare the cache hit rate of the invention with other algorithms: the ideal algorithm Oracle, used as an upper bound on the cache hit rate; the m-ε-Greedy algorithm; a random algorithm; and FPCC, an algorithm based on federated learning and hybrid filtering (see Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas, "Federated learning based proactive content caching in edge computing," in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-6). The cache hit rate of each algorithm is shown in FIG. 4. The algorithm proposed by the invention comes very close to the ideal algorithm Oracle and outperforms all the other reference algorithms. This shows that the proposed algorithm achieves a high cache hit rate, effectively utilizes the caching capability of the edge server, and is an efficient active content caching algorithm.
In addition, in this example the user privacy protection of the invention is compared with that of other algorithms:
1) Compared with the reinforcement learning method (see N. Zhang, K. Zheng and M. Tao, "Using Grouped Linear Prediction and Accelerated Reinforcement Learning for Online Content Caching," in 2018 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1-6, 2018), the collaborative filtering method (see E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82-89, 2014), and the linear regression method (see K. N. Doan, T. Van Nguyen, T. Q. S. Quek and H. Shin, "Content-Aware Proactive Caching for Backhaul Offloading in Cellular Network," IEEE Transactions on Wireless Communications, vol. 17, no. 5, pp. 3128-3140, May 2018), the invention ensures that users' private data is kept only locally for training.
2) Furthermore, compared with the federated learning method in fog radio access networks (see Y. Wu, Y. Jiang, M. Bennis, F. Zheng, X. Gao and X. You, "Content Popularity Prediction in Fog Radio Access Networks: A Federated Learning Based Approach," ICC 2020 - 2020 IEEE International Conference on Communications (ICC), 2020, pp. 1-6), the autoencoder and hybrid filtering method (see Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas, "Federated learning based proactive content caching in edge computing," in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1-6), and the neural network weighted popularity prediction method (see K. Qi and C. Yang, "Popularity Prediction with Federated Learning for Proactive Caching at Wireless Edge," in 2020 IEEE Wireless Communications and Networking Conference (WCNC), 2020, pp. 1-6), the invention ensures that individual users' content preferences and activity levels do not leak.
3) Finally, the invention performs gradient clipping and model correction locally at the user during each training round, achieving gradient-level privacy protection for the user, i.e., the user's private data cannot be exposed through the user's local model parameters and gradients.
Based on the above three points, the invention is a safe active content caching method that effectively protects user privacy and prevents data leakage.
In summary, the algorithm provided by the invention is an efficient and safe federated learning active content caching method. It accurately predicts content popularity trends and caches popular content in the edge server to achieve a high cache hit rate, while gradient clipping and model correction reduce the communication cost of transmission, which reflects its efficiency. In addition, the method effectively protects each user's privacy, achieving gradient-level privacy protection while preventing private data and behavior data from being leaked, which reflects its safety.
The examples described above are only embodiments of the invention and are not to be understood as limiting the scope of the patent; it should be pointed out that those skilled in the art can make several variations and modifications without departing from the concept of the invention, and these fall within the protection scope of the invention.

Claims (2)

1. A safe and efficient federated learning content caching method, characterized in that: first, each user downloads a discriminator shared model and a generator shared model from an edge server and, starting from the shared models, locally trains the WGAN discriminator local model and the WGAN generator local model using private data; in addition, the user performs gradient clipping and model correction locally to protect gradient privacy and reduce communication cost; second, each user sends the corrected model update to the edge server, and the edge server aggregates the updates into new shared models and sends them to each user for the next round of training; the training methods in the first two contents are repeated until model training is complete; finally, each user sends the pseudo data generated by the local generator to the edge server, and the edge server predicts the popularity trend of the content, scores the contents by popularity, and finally makes a caching decision; the specific contents are as follows:
step one: information collection and model building
Step 1.1, an edge server base station collects information;
Step 1.2 modeling
Two shared global models are built in the edge server base station: a discriminator shared model and a generator shared model; for the discriminator shared model, its input layer is first determined from the content library information, its structure is designed, and finally an output layer is built to complete a regression task and fit the Wasserstein distance; for the generator shared model, its input layer is determined by the dimension of the input noise, its structure is designed, and finally its output layer is determined from the content library information so as to generate pseudo data; after the shared models are built, all model parameters are initialized uniformly to obtain the two shared global models;
Step two: training process of local model
Step 2.1 downloading sharing model
First, the users participating in training download the discriminator shared model and the generator shared model from the edge server base station and build a local discriminator model and a local generator model whose structure is the same as that of the shared models;
Step 2.2 training local discriminator
The goal of the discriminator in a generative adversarial network is to distinguish the pseudo data generated by the generator from the real data; the discriminator in WGAN is used to fit the Wasserstein distance so as to distinguish real data from fake data, i.e., to minimize:

$$L_D = \mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})] - \mathbb{E}_{x \sim P_r}[f_w(x)]$$

where $f_w$ is the user's local discriminator, $P_r$ is the real data distribution, and $P_g$ is the pseudo data distribution generated by the local generator;
The efficient gradient update method root mean square propagation (RMSProp) is adopted to minimize $L_D$; during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:

$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$; the model parameters are updated according to the accumulated gradient momentum;
Prior to updating the model parameters, the gradients are clipped: only the gradients with large contributions are selected for updating, i.e., the gradients whose magnitudes rank in the top a% of all gradients of the current layer; the small gradients that do not participate in the update are saved and added back to the corresponding gradients at the next training round;
The model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate and $\epsilon$ is a small constant;
finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; the model parameters are corrected to the range $[-c, c]$: values below $-c$ are set to $-c$, and values above $c$ are set to $c$, so as to satisfy the Lipschitz continuity condition, i.e., model correction;
gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server; this completes one round of training of the local discriminator;
step 2.3 training the local Generator
The goal of the generator in a generative adversarial network is to generate data realistic enough to fool the discriminator; considering that the generator's input is independent of the real data distribution, generating realistic pseudo data only requires minimizing:

$$L_G = -\mathbb{E}_{\tilde{x} \sim P_g}[f_w(\tilde{x})]$$

where $f_w$ is the user's local discriminator and $P_g$ is the pseudo data distribution generated by the local generator;
The RMSProp gradient update method is also used to minimize $L_G$; during round $t$ of training, the accumulated gradient momentum $m_{wt}$, $m_{bt}$ is:

$$m_{wt} = \beta \cdot m_{w(t-1)} + (1-\beta) \cdot dw^2$$
$$m_{bt} = \beta \cdot m_{b(t-1)} + (1-\beta) \cdot db^2$$

where $\beta$ is the gradient accumulation coefficient, and $dw^2$ and $db^2$ are the squared gradients of the model parameters $w$ and $b$; from this the accumulated gradient momentum is obtained, and the model parameters are then updated based on it;
Before updating the model parameters, the gradients need to be clipped; only the gradients with large contributions are selected for updating, i.e., the gradients whose magnitudes rank in the top a% of all gradients of the current layer; the small gradients that do not participate in the update are saved and added back to the corresponding gradients at the next training round;
The model parameters $w$, $b$ are updated with the clipped gradients; the update in round $t$ is:

$$w_t = w_{t-1} - \eta \cdot \frac{dw}{\sqrt{m_{wt}} + \epsilon}, \qquad b_t = b_{t-1} - \eta \cdot \frac{db}{\sqrt{m_{bt}} + \epsilon}$$

where $\eta$ is the learning rate, and a small constant $\epsilon$ is added to keep the denominator from being 0 and stabilize the computation;
finally, to satisfy the Lipschitz continuity condition of the Wasserstein distance, it must be ensured that $\|f_w\|_L \leq K$, where $K$ is a constant; the model parameters are corrected to the range $[-c, c]$: values below $-c$ are set to $-c$, and values above $c$ are set to $c$, so as to satisfy the Lipschitz continuity condition, i.e., model correction;
gradient clipping and model correction effectively protect the user's private data while reducing the communication cost between the user and the edge server; this completes one round of training of the local generator;
step three: aggregation process for shared models
Step 3.1 upload model update
Each user participating in training firstly uploads the updates of the local discriminator and the local generator model, and the edge server can generate a new discriminator sharing model and a new generator sharing model respectively; in the training process of the t-th round, the uploaded local model is updated as follows:
H^n_{D,t} = W^n_{D,t} − W_{D,t}
H^n_{G,t} = W^n_{G,t} − W_{G,t}
where n is the index of the user, H^n_{D,t} is the model update of the local discriminator, H^n_{G,t} is the model update of the local generator, W^n_{D,t} is the local discriminator model, W^n_{G,t} is the local generator model, W_{D,t} is the shared discriminator model, and W_{G,t} is the shared generator model; after the edge server obtains these model updates, it aggregates them to generate the new shared models;
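On the user side, the uploaded update is simply the difference between the locally trained model and the shared model received at the start of the round; a one-line sketch under the same NumPy assumptions as above:

```python
def local_model_update(W_local, W_shared):
    """H_t^n = W_t^n - W_t: the update the n-th user uploads to the edge server."""
    return W_local - W_shared
```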
Step 3.2 Generating the shared models
The edge server processes all of the collected local model updates and, based on federated averaging, aggregates them into two new shared models: a shared discriminator model and a shared generator model;
First, the two types of local model updates are aggregated separately; in the t-th round of training, the aggregated shared model updates are:
H_{D,t} = (1/N_t)·Σ_{n=1}^{N_t} H^n_{D,t}
H_{G,t} = (1/N_t)·Σ_{n=1}^{N_t} H^n_{G,t}
where H_{D,t} is the shared discriminator model update, H_{G,t} is the shared generator model update, and N_t is the number of users participating in the t-th round of training;
The shared model updates are then added to the current shared models to generate the new shared models:
W_{D,t+1} = W_{D,t} + η_{D,t}·H_{D,t}
W_{G,t+1} = W_{G,t} + η_{G,t}·H_{G,t}
where W_{D,t+1} is the new shared discriminator model, W_{G,t+1} is the new shared generator model, and η_{D,t} and η_{G,t} are the learning rates used when aggregating the discriminator models and the generator models respectively;
At this point, a new shared discriminator model and a new shared generator model have been obtained for use in the next round of training;
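The server-side aggregation in step 3.2 can be sketched as follows; parameter tensors are again assumed to be NumPy arrays, and the defaults η_D = η_G = 1.0 are assumptions, not prescribed values:

```python
def aggregate(W_Dt, W_Gt, H_D_updates, H_G_updates, eta_D=1.0, eta_G=1.0):
    """Federated averaging: mean the users' uploaded updates, then add the
    averaged updates to the current shared models."""
    N_t = len(H_D_updates)                 # number of users in this round
    H_Dt = sum(H_D_updates) / N_t          # shared discriminator model update
    H_Gt = sum(H_G_updates) / N_t          # shared generator model update
    return W_Dt + eta_D * H_Dt, W_Gt + eta_G * H_Gt
```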
Step four: Predicting the popularity trend of the content
Step 4.1 Completing the model training
The local model training process of step two and the shared model aggregation process of step three are repeated until the number of training rounds reaches a preset value or the model accuracy no longer changes;
Step 4.2 Obtaining the pseudo data
After the training is completed, each user sends the pseudo data generated by its local generator to the edge server; the pseudo data has the same dimension as the real data but different content, namely:
D_n = G_n(x)
where n is the index of the user, D_n is the pseudo data generated by the local generator of the n-th user, G_n is the local generator of the n-th user, and x, the input of the local generator, is random noise;
At this point, the edge server has obtained the pseudo data generated by the local generators of the N users;
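Producing the pseudo data is a single forward pass through the trained local generator; a minimal sketch, assuming `G_n` is a callable that maps a noise vector to a content-score vector:

```python
import numpy as np

def make_pseudo_data(G_n, noise_dim, seed=None):
    """D_n = G_n(x): feed random noise x into the trained local generator
    and return pseudo data with the same dimension as the real data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(noise_dim)   # random-noise input x
    return G_n(x)                        # pseudo data D_n sent to the edge server
```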
Step 4.3 Predicting the content popularity trend
The edge server predicts the popularity trend of the content from the pseudo data D_n and sorts the contents by popularity;
First, the N pieces of pseudo data are added element-wise, dimension by dimension, to obtain a total score D that reflects the global popularity trend of the content:
D = Σ_{n=1}^{N} D_n
where N is the total number of users providing pseudo data;
The popularity trend of the content is then predicted from the content scores in D, with the contents sorted in descending order of score;
Step 4.4 Caching the popular content
Taking the caching capacity of the edge server base station into consideration, the M content items with the highest scores in D are selected and downloaded into the caching entity of the edge server base station, so that user access requests can be served quickly and efficiently;
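Steps 4.3 and 4.4 together amount to an element-wise sum, a descending sort, and a top-M selection; a minimal sketch under the same NumPy assumptions:

```python
import numpy as np

def cache_decision(pseudo_data_list, M):
    """Sum the N users' pseudo data element-wise to get the global score D,
    then return the indices of the M highest-scoring content items to cache."""
    D = np.sum(pseudo_data_list, axis=0)   # D = sum_n D_n, one score per content item
    ranking = np.argsort(D)[::-1]          # content indices in descending score order
    return ranking[:M]                     # the M items to download into the cache
```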
In step 2.2, a% ∈ (0, 1) is a preset gradient clipping coefficient;
In step 2.3, a% ∈ (0, 1) is a preset gradient clipping coefficient.
2. The federated learning content caching method according to claim 1, wherein in step 1.1 the process by which the edge server base station collects information comprises two aspects, according to the source of the information:
1) The edge server base station obtains content library information from the network side, including the total number of content items, the content used to respond to user access requests, and the content to be cached;
2) The edge server obtains information from the user side, including the list of connected users and the total number of users.