CN111488528A - Content cache management method and device and electronic equipment


Info

Publication number: CN111488528A
Application number: CN202010348220.9A
Authority: CN (China)
Prior art keywords: matrix, cache, user, content, fog
Inventors: 江帆, 袁增, 宫家志, 周继军, 王颖, 张兰
Original language: Chinese (zh)
Current assignee: Xi'an University of Posts and Telecommunications
Original assignee: Xi'an University of Posts and Telecommunications
Application filed by Xi'an University of Posts and Telecommunications
Legal status: Pending

Classifications

    • G06F16/9535 — Information retrieval: search customisation based on user profiles and personalisation
    • G06F16/906 — Information retrieval: clustering; classification
    • G06N20/00 — Machine learning
    • H04L41/12 — Network management: discovery or management of network topologies
    • H04L41/145 — Network management: network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L67/568 — Network services: storing data temporarily at an intermediate stage, e.g. caching


Abstract

The present invention relates to the field of wireless communication, and in particular, to a content cache management method, a content cache management apparatus, and an electronic device. The method comprises the following steps: acquiring historical request data of each user for the cached data and constructing a user preference matrix; constructing a content popularity matrix according to the network topology relation matrix corresponding to the network architecture to which the users belong and the user preference matrix, the content popularity matrix describing the content popularity of each fog node and each base station in the network architecture for the cached data; constructing a cache probability matrix for the cached data based on the user preference matrix and the content popularity matrix; and acquiring a cache state matrix by using a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations, so as to cache the data according to the cache state matrix. The method optimizes the content caching mode and achieves a better cache hit rate and timeliness.

Description

Content cache management method and device and electronic equipment
Technical Field
The present invention relates to the field of wireless communication, and in particular, to a content cache management method, a content cache management apparatus, and an electronic device.
Background
The F-RAN (Fog Radio Access Network) architecture can be used to alleviate the load pressure of fifth-generation (5G) wireless networks by using the computing and storage capabilities of edge devices, that is, by placing hot content as close as possible to the requesting edge users, thereby effectively reducing content request delay.
However, the content caching problem associated with the F-RAN architecture faces a number of challenges due to the limited storage capacity of terminals and the dynamic changes in user demands. In particular, how and when content should be efficiently stored in a suitable edge device, such as a fog node (FAP) or a fog user equipment (FUE), to achieve a higher cache request hit rate becomes a crucial issue in content caching.
In view of the above, there is a need in the art to develop a new content caching method.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The embodiments of the invention provide a content cache management method, a content cache management apparatus, and an electronic device, so as to optimize the content caching mode at least to a certain extent and achieve a better cache hit rate and timeliness.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of an embodiment of the present invention, a content cache management method is provided, including:
acquiring historical request data of each user for the cache data and constructing a user preference matrix;
constructing a content popularity matrix according to a network topology relation matrix corresponding to the network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data;
constructing a caching probability matrix for the cached data based on the user preference matrix and the content popularity matrix;
and acquiring a cache state matrix by utilizing a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations, so as to cache the cached data according to the cache state matrix.
According to an aspect of an embodiment of the present invention, there is provided a content cache management apparatus including:
the user preference matrix building module is used for obtaining historical request data of each user for the cache data and building a user preference matrix;
the content popularity matrix construction module is used for constructing a content popularity matrix according to a network topology relation matrix corresponding to the network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data;
a cache probability matrix construction module for constructing a cache probability matrix for the cache data based on the user preference matrix and the content popularity matrix;
and the cache state matrix acquisition module is used for acquiring a cache state matrix by utilizing a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations, so as to cache the cached data according to the cache state matrix.
In some embodiments of the present invention, based on the foregoing solution, the user preference matrix building module includes:
a history information matrix construction unit, configured to count the historical request data in a preset period, obtain statistics of each user's file requests for the cached data, and generate a corresponding historical information matrix; the cached data comprises files, and each file is configured with a corresponding theme;
and the theme model training unit is used for processing by using the theme model by taking the historical information matrix and the relation matrix of the file and the corresponding theme as input parameters to obtain the user preference matrix and the user request probability matrix.
In some embodiments of the present invention, based on the foregoing solution, the apparatus further includes:
and the user classification module is used for clustering the users according to the user preference data in the user preference matrix so as to obtain all kinds of information and the kinds to which all the users belong.
In some embodiments of the present invention, based on the foregoing solution, the content popularity matrix building module includes:
and constructing the content popularity matrix by combining a transmission distance weight parameter between any two network devices in the network architecture, a network topology relation parameter between any two network devices, a conditional probability parameter of a file request of the fog user device and a probability parameter of a file request sent by the fog user device.
In some embodiments of the present invention, based on the foregoing scheme, the cache state matrix obtaining module includes:
a basic reward value configuration unit, configured to configure a basic reward value corresponding to each communication link mode of the network architecture;
the reward matrix construction unit is used for randomly selecting a fog user device and another network device in the network architecture, calculating a corresponding reward value based on a basic reward value of a corresponding communication link mode and a future reward, and iterating until a target state is reached to obtain a corresponding reward matrix;
the reward matrix optimization unit is used for optimizing the reward matrix by utilizing the cache probability matrix to obtain an optimized reward matrix;
the current cache hit rate calculation unit is used for randomly selecting a file request of the fog user equipment, and determining a corresponding optimal communication link according to the optimized reward matrix so as to determine the current cache hit rate corresponding to the optimal communication link;
an optimal communication link selection unit, configured to, when the current cache hit rate is greater than an initial cache hit rate of a network device corresponding to the optimal communication link, and the network device corresponding to the optimal communication link is a base station, obtain, according to the spatial correlation of the base station, another base station with a highest correlation with the base station;
a first cache execution unit, configured to not execute caching when the other base station has cache data corresponding to the file request;
and a second cache execution unit, configured to execute caching when no cached data corresponding to the file request exists in the other base station, update the remaining cache space of the base station, and update the corresponding cache hit rate.
In some embodiments of the present invention, based on the foregoing scheme, the basic reward value configuration unit includes:
configuring the basic reward value of the D2D communication link as a first reward value; configuring the basic reward value of the FAP-FUE communication link as a second reward value; configuring the basic reward value of the BS-FUE communication link as a third reward value;
wherein the first reward value > the second reward value > the third reward value; and the first reward value for a D2D communication link between FUEs of the same class is greater than the first reward value for a D2D communication link between FUEs of different classes.
In some embodiments of the present invention, based on the foregoing solution, the apparatus further includes:
the first updating module is used for updating the file cache state of the fog user equipment according to the file request of the fog user equipment; and updating the cache state matrix based on the updated file cache state of the fog user equipment.
In some embodiments of the present invention, based on the foregoing solution, the apparatus further includes: the second updating module is used for responding to a timing instruction of a timer and acquiring the updated user preference matrix and the updated content popularity matrix; and updating the corresponding values of the fog nodes and the base stations in the cache state matrix according to the updated user preference matrix and the updated content popularity matrix.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content cache management method as described in the above embodiments.
In the technical solutions provided by some embodiments of the present invention, a user preference matrix is first constructed using users' historical request data to measure user preferences. Then, a network topology relation matrix is constructed by combining the topological relations among the different devices in the network, and the predicted user preferences are used to further predict the change of content popularity and construct a content popularity matrix. Finally, based on the Q-learning method, an optimal cache state matrix is obtained using the cache probability matrix and the correlation between base stations, achieving an optimal cache placement and thereby maximizing the cache hit rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the invention may be applied;
FIG. 2 schematically shows a flow diagram of a content cache management method according to one embodiment of the invention;
fig. 3 is a diagram schematically illustrating the change in cache hit rate over time in the content caching phase when F = 100 according to an embodiment of the present invention;
fig. 4 is a diagram schematically illustrating the change in cache hit rate over time in the content update phase when F = 100 according to an embodiment of the present invention;
FIG. 5 schematically shows a diagram of the impact of the number of files on the performance of the proposed method according to one embodiment of the invention;
FIG. 6 is a schematic diagram that schematically illustrates the impact of the amount of FUE on the performance of the proposed method, in accordance with an embodiment of the present invention;
fig. 7 schematically shows a block diagram of a content cache management apparatus according to an embodiment of the present invention;
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present invention can be applied.
As shown in fig. 1, the network architecture 100 considers a multi-cell scenario with an F-RAN architecture. As shown in fig. 1, the following network devices (NEs) may be included: M FAPs (fog nodes 101), E BSs (base stations 102), and K randomly distributed FUEs (fog user equipments 103). The FUE set, FAP set, and BS set are denoted as $\mathcal{K} = \{1, 2, \ldots, K\}$, $\mathcal{M} = \{1, 2, \ldots, M\}$, and $\mathcal{E} = \{1, 2, \ldots, E\}$, respectively. The radius of a cell is denoted $d_{BS}$, and the maximum communication distances of the FAP-FUE and D2D links are denoted $d_{FAP}$ and $d_{FUE}$, respectively. Assume that the computing and storage capabilities of all FAPs are identical and that the cloud computing center stores F popular content files (e.g., popular search videos, popular movies, and highly rated novels), where the set of content files is denoted $\mathcal{F} = \{1, 2, \ldots, F\}$.
As shown in fig. 1, the content file as the cache data may be first downloaded from the cloud computing center 104 and cached in each network device, and then may be shared among the fog user device 103, the fog node 101, and the base station 102.
It should be understood that the number of network devices in fig. 1 is merely illustrative. There may be any number of network devices of various types, as desired for implementation.
In an embodiment of the present invention, a user corresponds to a fog user equipment (FUE) and may send a file request to the cloud computing center through the FUE; according to the file request, the file may be cached locally, or cached in a FAP or BS.
It should be noted that the content caching management method provided by the embodiment of the present invention is executed between the FAP, the BS, and the FUE.
In recent years, with the rapid development of the Internet of Things (IoT) and the popularization of smart terminals, mobile users are increasingly eager to obtain network services with a higher quality of experience and are willing to spend more for them. Therefore, the limited capacity of the backhaul link faces a great challenge from rapidly increasing mobile multimedia service demands and growing data traffic. To solve the above problems, the prior art provides the Fog Radio Access Network (F-RAN) architecture, which fully utilizes the computing and storage capabilities of edge devices to relieve the load pressure of fifth-generation (5G) wireless networks; that is, hot content is placed as close as possible to the requesting edge users, thereby effectively reducing content request delay. However, the content caching problem associated with the F-RAN faces a number of challenges due to the limited storage capacity of terminals and the dynamic changes in user demands. In particular, how and when content should be efficiently stored in a suitable edge device, such as a fog node (FAP) or a fog user equipment (FUE), to achieve a higher cache request hit rate becomes a crucial issue in content caching.
Moreover, considering the huge differences in coverage and in computing and storage capacity among the user side, the fog nodes, and the base stations, conventional methods cannot be applied to a dynamically changing F-RAN network.
In view of the problems in the related art, the embodiment of the present invention first provides a content cache management method, and details of implementation of the technical solution of the embodiment of the present invention are described in detail below:
fig. 2 schematically shows a flow chart of a content cache management method according to an embodiment of the invention. The method can be executed by the cooperation of the terminal equipment and the server side; wherein, the server may be the cloud computing center shown in fig. 1; the terminal device may be the other network device shown in fig. 1. Referring to fig. 2, the content cache management method includes at least steps S11 through S14, which are described in detail below.
For the network architecture shown in fig. 1, to describe the relationship between network devices, we use the matrix $A := [a_{i,j}]_{(K+M+E)\times(K+M+E)}$ to represent the network topology relationships between network devices, where the rows and columns are made up of all the network devices in the network architecture.

If the distance between FUE i and FUE j is less than or equal to $d_{FUE}$, then $a_{i,j} = 1$; otherwise $a_{i,j} = 0$. Similarly, considering the communication distance constraints $d_{FAP}$ and $d_{BS}$, $a_{i,j}$ can be expressed as:

$$a_{i,j} = \begin{cases} 1, & d_{i,j} \le d_{FUE},\ i,j \in \mathcal{K} \\ 1, & d_{i,j} \le d_{FAP},\ i \in \mathcal{K},\ j \in \mathcal{M} \\ 1, & d_{i,j} \le d_{BS},\ i \in \mathcal{K},\ j \in \mathcal{E} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where $d_{i,j}$ represents the distance between NE i and NE j.

Additionally, due to the selfishness of some users (i.e., the corresponding FUEs), some users are reluctant to share their stored files with nearby users. Thus, the present embodiment introduces a matrix $S := [s_{i,j}]_{K \times K}$ to represent the sharing behavior among users:

$$s_{i,j} = \begin{cases} 1, & \text{user } i \text{ is willing to share its cached files with user } j \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Therefore, considering the sharing behavior among users, the network topology relation matrix A needs to be optimized and redefined: if $s_{i,j} = 1$ and $a_{i,j} = 1$, then $a_{i,j} = 1$, i.e., the two network devices can communicate only when their distance is within the preset range and user i is willing to share; otherwise $a_{i,j} = 0$. Accordingly, equation (1) can be optimized as:

$$a_{i,j} = s_{i,j} \cdot a_{i,j},\quad i,j \in \mathcal{K} \qquad (3)$$
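As an illustration, the construction of the topology matrix A per equations (1)-(3) can be sketched as follows (the distance thresholds follow the simulation settings quoted later in this document; the device counts, positions, and sharing behavior are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

K, M, E = 20, 4, 2                       # FUEs, FAPs, BSs (illustrative counts)
d_FUE, d_FAP, d_BS = 30.0, 40.0, 150.0   # maximum communication distances (m)

# Random 2-D positions; devices are ordered FUEs, then FAPs, then BSs.
pos = rng.uniform(0.0, 300.0, size=(K + M + E, 2))
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)

def link_range(i: int, j: int) -> float:
    """Distance threshold for a candidate link between devices i and j."""
    if i < K and j < K:
        return d_FUE                     # FUE-FUE (D2D)
    lo, hi = min(i, j), max(i, j)
    if lo < K and K <= hi < K + M:
        return d_FAP                     # FUE-FAP
    if lo < K and hi >= K + M:
        return d_BS                      # FUE-BS
    return 0.0                           # no direct caching link assumed otherwise

N = K + M + E
A = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(N):
        if i != j and d[i, j] <= link_range(i, j):
            A[i, j] = 1

# Sharing matrix S (equation (2)); random willingness here for illustration.
S = rng.integers(0, 2, size=(K, K))
np.fill_diagonal(S, 1)

# Equation (3): user-user links survive only if user i is willing to share.
A[:K, :K] *= S
```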
In addition, the present embodiment uses the cache hit rate to reflect caching performance, where $P_{hit}$ represents the average probability that a user obtains a requested file within a limited time:

$$P_{hit} = \sum_{k=1}^{K} g_k \sum_{f=1}^{F} p_{k,f} \cdot \max\Big( c_{k,f},\ \max_{i} a_{k,i}\, c_{i,f} \Big) \qquad (4)$$

where $g_k$ represents the probability that FUE k sends a request; $p_{k,f} = P(f|k)$ represents the conditional probability that FUE k requests file f, with $\sum_{f=1}^{F+1} p_{k,f} = 1$ and $p_{k,f} \in [0,1]$; and $c_{i,f}$ indicates whether NE i caches file f.
In one embodiment of the invention, the goal is to maximize the cache hit rate under the condition that the storage capacity of each NE is limited. Therefore, the optimization problem of the caching method is constructed as:

$$P1:\ \max P_{hit} \qquad (5)$$

$$\text{s.t.}\quad \sum_{f=1}^{F} c_{k,f}\, B_f \le L_k,\ \forall k \in \mathcal{K}; \qquad \sum_{f=1}^{F} c_{m,f}\, B_f \le L_{FAP},\ \forall m \in \mathcal{M}; \qquad \sum_{f=1}^{F} c_{e,f}\, B_f \le L_{BS},\ \forall e \in \mathcal{E}$$

where $B_f$ represents the size of file f, and $L_k$, $L_{FAP}$, and $L_{BS}$ represent the sizes of the remaining storage capacities of FUE k, a FAP, and a BS, respectively. The parameters $g_k$, $p_{k,f}$, and $c_{i,f}$ are coupled to each other.
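For concreteness, the objective can be evaluated as in the following sketch (a minimal sketch that assumes, consistent with equation (4) above, that a request is a hit when the requesting FUE itself, or any device reachable from it, caches the file; variable names follow the earlier topology example and are assumptions):

```python
import numpy as np

def cache_hit_rate(A: np.ndarray, C: np.ndarray, g: np.ndarray,
                   P_user: np.ndarray) -> float:
    """Evaluate P_hit.

    A: (N, N) topology matrix whose first K rows/columns are FUEs.
    C: (N, F) cache state matrix, C[i, f] = 1 if NE i caches file f.
    g: (K,) request probabilities g_k.
    P_user: (K, F) user preference matrix p_{k,f}.
    """
    K = g.shape[0]
    # reachable[k, f] = 1 if FUE k itself, or any NE it can reach, caches f.
    reachable = np.minimum(1, C[:K] + A[:K] @ C)
    return float(np.sum(g[:, None] * P_user * reachable))
```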
In step S11, the history request data for the cache data by each user is acquired and a user preference matrix is constructed.
In one embodiment of the invention, user preference may be defined as the conditional probability (denoted $p_{k,f}$) that each file is requested when a user sends file requests to the cloud computing center through a network device (i.e., the fog user equipment FUE). Since user preferences always change over time, a dynamic edge caching method is implemented at the FUE end using a user preference prediction method, so as to improve the cache hit rate.
Step S111, counting the historical request data in a preset period, acquiring data statistics of file requests of each user for the cache data, and generating a corresponding historical information matrix; the cache data comprises files, and each file is configured with a corresponding theme;
and step S112, taking the historical information matrix and the relation matrix of the file and the corresponding theme as input parameters, and processing by using a theme model to obtain the user preference matrix and the user request probability matrix.
Specifically, the historical request data may be obtained from the log information or historical request information of each FUE in the network architecture. The files requested by each FUE from the cloud computing center within a preset time period are counted, and the user historical information matrix $N := [n_{k,f}]_{K\times(F+1)}$ is constructed, where $n_{k,f}$ indicates the total number of times FUE k requests file f within the limited time.

The matrix $P^U := [p_{k,f}]_{K\times F}$ is established to represent the personal preferences of all users (i.e., all corresponding FUEs).
The PLSA (Probabilistic Latent Semantic Analysis) model can include user variables and file variables, with topic variables used to represent the topic of each file, such as file types of video, voice, and document, or topics such as sports, entertainment, and news.
In this embodiment, the topic set is defined as $\mathcal{Z} = \{z_1, z_2, \ldots, z_Z\}$, and each file is configured to have only one topic. The relationship matrix between files and topics is defined as $X := [x_{f,z}]_{(F+1)\times Z}$, where $x_{f,z} = 1$ if file f belongs to topic z, and $x_{f,z} = 0$ otherwise.
For an FUE, the file requests are modeled with the following parameters: (1) $g_k$, indicating the probability that FUE k sends a request; (2) $P(z_j|k)$, representing the conditional probability that FUE k selects topic $z_j$; and (3) $P(f|z_j)$, denoting the conditional probability of file f given topic $z_j$. The initial values of these parameters can be obtained from the file-topic relationship matrix X and the user historical information matrix $N := [n_{k,f}]_{K\times(F+1)}$, where $n_{k,f}$ indicates the total number of times FUE k requests file f within the limited time. File (F+1) is the empty file, indicating that FUE k has sent no request.

The initial values of the parameters are configured as:

$$g_k^{(0)} = \frac{\sum_{f=1}^{F} n_{k,f}}{\sum_{f=1}^{F+1} n_{k,f}} \qquad (7)$$

$$P^{(0)}(f|z_j) = \frac{\sum_{k=1}^{K} n_{k,f}\, x_{f,z_j}}{\sum_{f'=1}^{F+1} \sum_{k=1}^{K} n_{k,f'}\, x_{f',z_j}} \qquad (8)$$

$$P^{(0)}(z_j|k) = \frac{\sum_{f=1}^{F+1} n_{k,f}\, x_{f,z_j}}{\sum_{f=1}^{F+1} n_{k,f}} \qquad (9)$$
Based on the above, the joint probability that FUE k requests file f can be expressed as:

$$P(k,f) = g_k \sum_{j=1}^{Z} P(z_j|k)\, P(f|z_j) \qquad (10)$$

where $g_k \in G$, and G represents the matrix of the probabilities that all users request files.
For the PLSA algorithm, the parameters $P(f|z_j)$ and $P(z_j|k)$ may be estimated using the EM algorithm until the algorithm converges. The EM algorithm comprises two steps: an expectation step and a maximization step. In the expectation step, the posterior probability of the topic variable $z_j$ is calculated using the currently known parameters:

$$P(z_j|k,f) = \frac{P(f|z_j)\, P(z_j|k)}{\sum_{j'=1}^{Z} P(f|z_{j'})\, P(z_{j'}|k)} \qquad (11)$$
In the maximization step, the parameters $P(f|z_j)$ and $P(z_j|k)$ are updated by maximizing the log-likelihood function l, which can be expressed as:

$$l = \sum_{k=1}^{K} \sum_{f=1}^{F+1} n_{k,f} \log P(k,f) \qquad (12)$$

where the updated $P(f|z_j)$ and $P(z_j|k)$ are, respectively:

$$P(f|z_j) = \frac{\sum_{k=1}^{K} n_{k,f}\, P(z_j|k,f)}{\sum_{f'=1}^{F+1} \sum_{k=1}^{K} n_{k,f'}\, P(z_j|k,f')} \qquad (13)$$

$$P(z_j|k) = \frac{\sum_{f=1}^{F+1} n_{k,f}\, P(z_j|k,f)}{\sum_{f=1}^{F+1} n_{k,f}} \qquad (14)$$
After the joint probability P(k,f) and the user request probability $g_k$ are obtained, the user preference $p_{k,f}$ is obtained by $p_{k,f} = P(k,f)/g_k$ (equation (6)).
For example, in detail, obtaining the request probability matrix G and the user preference matrix $P^U$ may include the following steps:
21) for a statistical period π, count the total number of files requested by each user side within the period, and generate the historical information matrix N based on these counts;
22) initializing a threshold value delta;
23) calculate $g_k^{(0)}$, $P^{(0)}(f|z_j)$, and $P^{(0)}(z_j|k)$ according to equations (7), (8), and (9), respectively;
24) at each iteration, calculate the posterior probability $P^{(i)}(z_j|k,f)$ of the current topic variable $z_j$ according to equation (11);
25) calculate the current conditional probability $P^{(i)}(f|z_j)$ of file f given topic $z_j$, and the conditional probability $P^{(i)}(z_j|k)$ that FUE k selects topic $z_j$, according to equations (13) and (14);
26) calculate the current log-likelihood function l(i) according to equation (12);
27) judge the iteration condition: if the difference between the current log-likelihood function l(i) and the previous log-likelihood function l exceeds the threshold δ, return to step 24) and carry out the next iteration; otherwise, the iteration condition is satisfied: assign the current log-likelihood function l(i) to l to obtain the maximum log-likelihood function, thereby obtaining the conditional probability $P(f|z_j)$ of file f given topic $z_j$ and the conditional probability $P(z_j|k)$ that FUE k selects topic $z_j$;
28) calculate the joint probability P(k,f) according to equation (10);
29) obtain the user request probability matrix G and the user preference matrix $P^U$ according to the total probability formula and the conditional probability formula, respectively:

$$g_k = \sum_{f=1}^{F+1} P(k,f) \quad \text{and} \quad p_{k,f} = P(k,f)/g_k$$
Using the PLSA algorithm to obtain the user request probability matrix G and the user preference matrix $P^U$ allows potential topics to be discovered in a large number of files, overcoming the inaccurate file-similarity calculation of traditional methods. In addition, because the volume of users' historical data is large, the calculation of the user preference matrix can be completed offline, which avoids occupying excessive network resources and degrading computational efficiency.
In an embodiment of the present invention, the method for obtaining the user preference matrix may be summarized as:

Input: historical information matrix N
Initialize: threshold ξ ← 0.5; i ← 1; l(0) ← 0 (log-likelihood function)
(iterate the expectation and maximization steps of equations (11)-(14), monitoring the log-likelihood of equation (12), until it converges within ξ)
Output: G and $P^U$.
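A minimal sketch of this PLSA/EM procedure is given below (illustrative only: random normalized initializations stand in for the exact forms of equations (7)-(9), and the function and variable names are assumptions, not the patent's):

```python
import numpy as np

def plsa_user_preference(N, X, tol=1e-4, max_iter=200, seed=0):
    """Estimate G (request probabilities) and P_U (user preferences) by EM.

    N: (K, F+1) request counts; the last column is the 'empty' file (no request).
    X: (F+1, Z) file-topic indicators (one topic per file).
    """
    rng = np.random.default_rng(seed)
    K, F1 = N.shape
    Z = X.shape[1]
    # g_k: fraction of user k's observations that are real requests (eq. (7)-style).
    g = N[:, :-1].sum(axis=1) / np.maximum(N.sum(axis=1), 1)
    # Random normalized initializations respecting the file-topic mask X.
    Pf_z = rng.random((F1, Z)) * X
    Pf_z /= np.maximum(Pf_z.sum(axis=0, keepdims=True), 1e-12)
    Pz_k = rng.random((K, Z))
    Pz_k /= Pz_k.sum(axis=1, keepdims=True)

    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step (eq. (11)): posterior P(z | k, f), shape (K, F+1, Z).
        joint = Pz_k[:, None, :] * Pf_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        # M-step (eqs. (13) and (14)).
        weighted = N[:, :, None] * post
        Pf_z = weighted.sum(axis=0)
        Pf_z /= np.maximum(Pf_z.sum(axis=0, keepdims=True), 1e-12)
        Pz_k = weighted.sum(axis=1)
        Pz_k /= np.maximum(Pz_k.sum(axis=1, keepdims=True), 1e-12)
        # Log-likelihood (eq. (12), up to terms constant in the parameters).
        ll = float((N * np.log(np.maximum(joint.sum(axis=2), 1e-12))).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    P_joint = g[:, None] * (Pz_k @ Pf_z.T)                  # eq. (10)
    P_U = P_joint[:, :-1] / np.maximum(g[:, None], 1e-12)   # eq. (6)
    return g, P_U
```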
In an embodiment of the present invention, the method may further include: clustering the users according to the user preference data in the user preference matrix, so as to acquire the information of each category and the category to which each user belongs.
In particular, because the contents preferred by different users may be identical or highly similar, the invention clusters users according to their preferences to improve the content hit rate, so as to achieve efficient distribution and avoid unnecessary redundant caching on the user equipment. For example, user clustering can be implemented using the SOM algorithm, which simulates a feed-forward neural network with self-organizing features. Its network topology consists of two layers, an input layer and an output layer, both composed of neurons that are fully interconnected by connection weights. Suppose there are Y input neurons and U output neurons, and the connection weight matrix is configured as $V := [v_{i,j}]$, $i \in [1, Y]$, $j \in [1, U]$, where $v_{i,j}$ represents the connection weight between the i-th input neuron and the j-th output neuron.
For example, the SOM-based clustering process may specifically include the following:
31) obtain initial values of the connection weight matrix V between the input and output neurons in a random manner; in the input layer, the i-th user's preference vector $P^U(i)$ is taken as the input data;
32) calculate the Euclidean distance $O_j(n)$ between the input data $P^U(i)$ and the weight vector of each output neuron; the neuron whose weight vector is closest to the input vector is the winning neuron, i.e.,

$$j^* = \arg\min_j O_j(n) = \arg\min_j \left\| P^U(i) - V_j \right\| \qquad (15)$$
33) select the topological neighborhood. Let $Y_{i,j}$ be the lateral distance between neurons i and j on the neural network topology; the topological neighborhood is selected according to

$$\Phi_{j^*,j}(t) = \exp\!\left( -\frac{Y_{j^*,j}^2}{2\sigma^2(t)} \right) \qquad (16)$$

where σ(t) is an exponential function that decreases over time, $\sigma(t) = \sigma_0 \exp(-t/T_\sigma)$, and $\sigma_0$ and $T_\sigma$ are constants.
34) update the weight matrix V according to

$$v_{i,j} = v_{i,j} + \Delta v_{i,j} \qquad (17)$$

$$\Delta v_{i,j} = \eta(t)\, \Phi_{j^*,j}(t)\, \big( P^U(i) - v_{i,j} \big) \qquad (18)$$

where η(t) represents the learning rate, which decays to 0 as learning proceeds, i.e., the magnitude of the weight adjustment becomes smaller.
35) repeat steps 32)-34) until Φ(t) decreases to 0 or the total number of iterations reaches the maximum. After the above process is performed, the category results reach stability, meaning that the i-th user is classified into a certain category.
By performing the above SOM-based classification for all users (i.e., all corresponding FUEs), users with similar preferences can be effectively grouped into a plurality of categories, as illustrated by the sketch below.
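The following is a minimal sketch of such clustering over the rows of $P^U$ (the 1-D output grid, decay schedules, and iteration count are illustrative assumptions):

```python
import numpy as np

def som_cluster(P_U, grid=4, iters=500, sigma0=2.0, eta0=0.5, seed=0):
    """Cluster user preference vectors with a 1-D SOM; returns a label per user."""
    rng = np.random.default_rng(seed)
    K, F = P_U.shape
    V = rng.random((grid, F))        # connection weights, one row per output neuron
    # Lateral distances Y_{i,j} between output neurons on the 1-D grid.
    lateral = np.abs(np.arange(grid)[:, None] - np.arange(grid)[None, :])

    for t in range(iters):
        sigma = sigma0 * np.exp(-t / iters)     # sigma(t) = sigma_0 exp(-t/T_sigma)
        eta = eta0 * np.exp(-t / iters)         # decaying learning rate eta(t)
        i = int(rng.integers(K))
        x = P_U[i]
        winner = int(np.argmin(np.linalg.norm(V - x, axis=1)))   # eq. (15)
        phi = np.exp(-lateral[winner] ** 2 / (2 * sigma ** 2))   # eq. (16)
        V += eta * phi[:, None] * (x - V)                        # eqs. (17)-(18)

    # Assign each user to its winning neuron (its category).
    return np.argmin(np.linalg.norm(P_U[:, None, :] - V[None, :, :], axis=2), axis=1)
```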
In step S12, a content popularity matrix is constructed according to the network topology relationship matrix corresponding to the network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data;
in an embodiment of the present invention, the content popularity matrix may be constructed by combining a transmission distance weight parameter between any two network devices in the network architecture, a network topology relation parameter between any two network devices, a conditional probability parameter of a file request by the fog user device, and a probability parameter of a file request sent by the fog user device.
In particular, for the fog nodes and base stations in the network architecture, it is not individual preferences that matter but the common preferences of all users; thus, user preferences cannot be applied directly on the fog node and base station side. The present embodiment therefore uses content popularity to represent the common content preferences of the different fog nodes and base stations, and the user preference prediction results described above can be used to predict content popularity in an online manner.

Considering actual parameters in a real application scenario, such as the users' cache status and communication ranges, the content popularity matrix is predicted using the user preference matrix $P^U$ and the network topology relation matrix A, and represents the optimal content caching mode of the fog nodes and base stations. Specifically, the matrix $P^{FB} := [p_{i,f}]_{(M+E)\times F}$ indicates the content popularity of all fog nodes and base stations, where

$$p_{i,f} = \frac{\sum_{k=1}^{K} w_{k,i}\, a_{k,i}\, g_k\, p_{k,f}}{\sum_{f'=1}^{F} \sum_{k=1}^{K} w_{k,i}\, a_{k,i}\, g_k\, p_{k,f'}},\quad i \in [1, M+E] \qquad (19)$$

and $w_{k,i}$ represents the transmission distance weight between the two network devices, defined by equation (20) so that a smaller transmission distance between FUE k and device i yields a larger weight.
in an embodiment of the invention, the obtaining the content popularity matrix may include:
inputting: pU;A;G;
Figure BDA0002470956320000142
Output PFB.
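As an illustration, the aggregation of equation (19) can be sketched as follows (the inverse-distance weight is an assumption standing in for equation (20), and all names are illustrative):

```python
import numpy as np

def content_popularity(P_U, A, g, d, K):
    """Popularity matrix P_FB for the M+E fog nodes and base stations.

    P_U: (K, F) user preferences; A: (N, N) topology matrix; g: (K,) request
    probabilities; d: (N, N) pairwise distances; the first K devices are FUEs.
    """
    N = A.shape[0]
    rows = []
    for i in range(K, N):                      # each FAP / BS
        a = A[:K, i].astype(float)             # FUEs that can reach device i
        w = 1.0 / np.maximum(d[:K, i], 1.0)    # assumed weight (eq. (20) stand-in)
        score = ((w * a * g)[:, None] * P_U).sum(axis=0)   # eq. (19) numerator
        rows.append(score / max(score.sum(), 1e-12))       # normalize over files
    return np.vstack(rows)                     # shape (M+E, F)
```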
In step S13, a caching probability matrix for the cached data is constructed based on the user preference matrix and the content popularity matrix.
In one embodiment of the present invention, the user preference matrix $P^U$ and the content popularity matrix $P^{FB}$ described above are combined to construct the cache probability matrix of the whole system, denoted as:

$$P := \begin{bmatrix} P^U \\ P^{FB} \end{bmatrix} \in \mathbb{R}^{(K+M+E)\times F}$$
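In code this is a simple row-wise stack (a sketch following the earlier naming assumptions):

```python
import numpy as np

def cache_probability(P_U, P_FB):
    """Stack user preferences and fog node / BS popularity into P."""
    return np.vstack([P_U, P_FB])   # row i gives device i's caching probabilities
```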
in step S14, a cache state matrix is obtained by using a reinforcement learning model in combination with the spatial correlation between the cache probability matrix and the base station, so as to cache the cache data according to the cache state matrix.
In an embodiment of the present invention, specifically, the step S14 may include:
step S141, configuring basic reward values corresponding to each communication link mode of the network architecture;
step S142, randomly selecting a fog user device and another network device in the network architecture, calculating a corresponding reward value based on a basic reward value and a future reward of a corresponding communication link mode, and iterating until a target state is reached to obtain a corresponding reward matrix;
step S143, optimizing the reward matrix by using the cache probability matrix to obtain an optimized reward matrix;
step S144, a file request of the fog user equipment is randomly selected, and a corresponding optimal communication link is determined according to the optimized reward matrix so as to determine a current cache hit rate corresponding to the optimal communication link;
step S145, when the current cache hit rate is greater than the initial cache hit rate of the network device corresponding to the optimal communication link, and the network device corresponding to the optimal communication link is a base station, acquiring the other base station with the highest correlation with that base station according to the spatial correlation of the base stations;
step S146, when the other base station already holds cached data corresponding to the file request, not executing caching; or
step S147, when no cached data corresponding to the file request exists in the other base station, executing caching, updating the remaining cache space of the base station, and updating the corresponding cache hit rate.
For example, in this embodiment, the reward in the Q-learning algorithm may be used to indicate a reward value that the network device can obtain when caching the file.
In this embodiment, the basic reward matrix may be defined as $R := [r_{i,j}]_{(K+M+E)\times(K+M+E)}$. The basic reward value of a D2D (Device-to-Device) communication link is configured as a first reward value; the basic reward value of a fog node-fog user equipment (FAP-FUE) communication link is configured as a second reward value; and the basic reward value of a base station-fog user equipment (BS-FUE) communication link is configured as a third reward value, where first reward value > second reward value > third reward value, and the first reward value of a D2D communication link between FUEs of the same class is greater than that of a D2D communication link between FUEs of different classes.
In particular, the configuration of the base prize value may take into account the following factors:
First, consider the selection of the content caching communication link scheme. Specifically, to relieve the load pressure on the fronthaul and backhaul links, the reward value for selecting a D2D communication link is set highest, the reward value for selecting a BS-FUE communication link is set lowest, and the reward value for selecting a FAP-FUE communication link is set in between:

$$r_{i,e} = \mu\theta, \quad r_{i,m} = \gamma\theta, \quad r_{i,j} = \theta \qquad (21)$$

where $i,j \in \mathcal{K}$, $m \in \mathcal{M}$, $e \in \mathcal{E}$; $\mu, \gamma \in (0,1)$ with $\mu < \gamma$; and θ is a constant.
Second, preferably, the preference similarity of users (i.e., the user classes) can also be considered. Specifically, if a D2D communication link can be established between two FUEs and the two FUEs belong to the same class, the instant reward is higher than for a D2D communication link between FUEs of different classes:

$$r_{i,j} = \lambda\, r_{i,k},\quad \lambda > 1 \qquad (22)$$

where FUE j belongs to the same class as FUE i, and FUE k belongs to a different class.
Third, the network topology relationship may also be considered. In particular, if a communication link can be established between two network devices, $r_{i,j}$ is given a non-zero value; otherwise $r_{i,j}$ is zero. Therefore, based on the network topology matrix, equation (22) is optimized as:

$$r_{i,j} = a_{i,j} \cdot r_{i,j} \qquad (23)$$

where $i,j \in [1, K+M+E]$.
Fourth, the communication distance between two network devices may also be considered: in the content caching process, a smaller communication distance brings a smaller transmission delay. Thus, $r_{i,j}$ can be further optimized as:

$$r_{i,j} = w_{i,j} \cdot r_{i,j} \qquad (24)$$
In this embodiment, the matrix $Q := [Q_{i,j}]_{(K+M+E)\times(K+M+E)}$ is used to represent the Q-table values in the Q-learning method; its rows and columns represent the current state and the next state, i.e., the currently connected network device and the next connected network device, respectively. Then $Q_{i,j}$ can be expressed as:

$$Q_{i,j} = r_{i,j} + \alpha \cdot \max Q_{j,\text{all}} \qquad (25)$$

where α represents the discount factor, i.e., the impact of future rewards on the current decision, with $\alpha \in [0,1]$; i and j represent the current state and the next state, respectively, with $i,j \in [1, K+M+E]$; $Q_{j,\text{all}}$ denotes the $1 \times (K+M+E)$ row of Q values from network device j to all other network devices in the network, and $\max Q_{j,\text{all}}$ denotes its maximum value.
When calculating the optimal cache state matrix C, in order to better maximize the cache hit rate, in addition to the cache probability matrix P and the reward matrix Q, the present invention also considers the spatial correlation of base stations, $H := [h_{i,j}]_{E\times E}$, where $h_{i,j}$ represents the spatial correlation between base station i and base station j.
specifically, in detail, obtaining the optimal buffer status matrix C may include the following steps:
401) inputting a cache probability matrix P, a base station spatial correlation degree H and a clustering result;
402) initialize the maximum iteration number episode, the Q-table matrix Q, the cache state matrix C, and the cache hit rate $p_{hit}$;
403) Calculating a matrix R according to the above formula (24);
404) the system randomly selects one user (FUE) i and one communication link (i.e. another communication device j);
405) calculating a corresponding Q value according to the above formula (25);
406) return to step 404) until the Q-table is stable or the upper limit of the iteration count is reached;
407) the system randomly selects one FUE i;
408) optimize the reward matrix Q by using the cache probability matrix, i.e., Q = Q × P;
409) selecting the best communication link, namely another communication device j, according to the maximum value of Q corresponding to FUE i;
410) calculate the corresponding current cache hit rate $P_{hit}$ according to equation (4);
411) if $P_{hit}$ is greater than $p_{hit}$, and communication device j is a base station, and another base station whose spatial correlation with device j is greater than the threshold has already cached the file (i.e., $c_{i,f} = 1$ for that base station i), return to the previous step; otherwise, if the remaining space of communication device j is larger than the size of file f, device j caches the file, the remaining space of device j is recalculated, and $P_{hit}$ is assigned to $p_{hit}$;
412) return to step 409) until all files are cached;
413) return to step 407) until the maximum number of iterations is completed.
By considering user preferences when configuring the basic reward values, different basic reward values are configured for users of the same category and users of different categories, so that communication links between same-category network devices are favored in subsequent selection, yielding a higher cache hit rate.
In an embodiment of the invention, obtaining the optimal cache state matrix may be summarized as:

Input: P; A; G; H
Initialization: episode ← 10^6; Q ← 0; C ← 0; $p_{hit}$ ← 0
(execute steps 401)-413) above)
Output: the cache state matrix C.
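A compact sketch of this Q-learning placement loop is given below (illustrative only: the learning schedule, the greedy placement rule, and the correlation threshold simplify steps 401)-413), and all names are assumptions):

```python
import numpy as np

def q_learning_placement(P, A, R, H, sizes, space, K, E,
                         episodes=2000, alpha=0.9, h_thresh=0.8, seed=0):
    """Greedy cache placement guided by a learned Q-table.

    P: (N, F) cache probability matrix; A: (N, N) topology; R: (N, N) rewards;
    H: (E, E) base-station spatial correlation; sizes: (F,) file sizes;
    space: (N,) remaining storage per device; the last E devices are BSs.
    """
    rng = np.random.default_rng(seed)
    N, F = P.shape
    Q = np.zeros((N, N))
    C = np.zeros((N, F), dtype=int)

    # Learn the Q-table (eq. (25)): Q_{i,j} = r_{i,j} + alpha * max Q_{j,all}.
    for _ in range(episodes):
        i = int(rng.integers(K))                # a random FUE
        candidates = np.flatnonzero(A[i])       # devices it can connect to
        if candidates.size == 0:
            continue
        j = int(rng.choice(candidates))
        Q[i, j] = R[i, j] + alpha * Q[j].max()

    # Place files greedily along the best links, weighting Q by P (step 408)).
    for _ in range(episodes):
        i = int(rng.integers(K))
        f = int(rng.integers(F))
        j = int(np.argmax(Q[i] * P[:, f] * A[i]))
        if j >= N - E:                          # j is a base station: skip if a
            b = j - (N - E)                     # highly correlated BS caches f
            peers = np.flatnonzero(H[b] > h_thresh) + (N - E)
            if C[peers, f].any():
                continue
        if C[j, f] == 0 and space[j] >= sizes[f]:
            C[j, f] = 1
            space[j] -= sizes[f]
    return C
```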
In an embodiment of the present invention, the method may further include: updating the file cache state of the fog user equipment according to the file request of the fog user equipment; and updating the cache state matrix based on the updated file cache state of the fog user equipment.
Specifically, during the content delivery phase, each user requests files according to his or her own preferences. Since user preferences and content popularity change dynamically over time, they need to be updated in time to obtain the optimal cache probability matrix. In this embodiment, dynamic updating of user preferences is considered first: because a user can send a file request at any time according to his or her needs, and the state of the user's cached files is updated immediately, no specific update time needs to be set. By executing the method described above, the latest user preference matrix, content popularity matrix, and cache state matrix are obtained. Finally, each user only needs to execute the cache updating method provided by the present invention to update the values corresponding to its FUE in the cache state matrix C, and thus update part of the cache.
In an embodiment of the present invention, the method may further include: responding to a timing instruction of a timer, and acquiring the updated user preference matrix and the updated content popularity matrix;
and updating the corresponding values of the fog nodes and the base stations in the cache state matrix according to the updated user preference matrix and the updated content popularity matrix.
In particular, since content popularity is used on the fog node and base station side, only the content update process of the fog nodes and base stations needs to be discussed. Unlike the user update process, the update process of the fog nodes and base stations is passive, because they cannot obtain the relevant information required from users in time. Therefore, the present embodiment sets an update timer, and the fog nodes and base stations refresh their cached content files during the off-peak hours of each time period. In each update period, the latest user preference matrix and content popularity matrix are obtained according to the method described above. Then, based on the updated user preference matrix and content popularity matrix, only the partial values of the cache state matrix C corresponding to the fog nodes and base stations are updated using the method described above.
For example, in detail, the above updating method may include the following steps:
51) input initial parameters: the user preference matrix $P^U$; the network topology relation matrix A; the cache hit rate $P_{hit}$; and the current cache state matrix C;
52) FUE k requests file f; if file f is not cached locally, proceed to step 53);
53) select the best communication link for FUE k according to the network topology relation matrix A and the current cache state matrix C (i.e., choose to establish a communication link with communication device j), and jump to step 54); otherwise, jump to step 55);
54) if the remaining storage space of FUE k is larger than or equal to the size of file f, FUE k caches the file and the remaining storage space of FUE k is recalculated; otherwise, the cached file with the minimum local user preference value is found and deleted, and the remaining storage space of FUE k is recalculated, until the remaining cache space is larger than or equal to the size of file f; FUE k then caches the file and the remaining storage space is recalculated;
55) update the cache state matrix C;
56) output the cache state matrix C.
In an embodiment of the present invention, the update method may be summarized as:

Input: $P^U$; A; $P_{hit}$; C
(execute steps 51)-56) above for each file request)
Output: C and $P^U$.
In one embodiment of the present invention, to demonstrate the effectiveness of the proposed content caching and content updating methods in terms of cache hit rate, the proposed algorithms are compared with existing methods. Specifically, three caching algorithms are considered: the D2D caching algorithm, the BS caching algorithm, and the FAP caching algorithm. In addition, the proposed updating algorithm is compared with the LRU updating algorithm, the LFU updating algorithm, and the FAP updating algorithm. In the simulation, the number of files is set to F ∈ [100, 3000], and the maximum communication distances of the BS, the FAP, and the FUE are set to 150 meters, 40 meters, and 30 meters, respectively.
Referring to fig. 3, fig. 3 shows how the cache hit rates of all the content caching methods vary with time when the number of files F = 100. It can be seen that the cache hit rate increases over time, because caching decisions become more accurate as users issue more content requests. In addition, it can be observed that the cache hit rate of the proposed method is superior to the other methods. The FAP caching method and the BS caching method suffer from a cold start problem due to insufficient user data in the initial period. On the other hand, although the D2D caching method avoids the cold start problem, it only considers caching at the users; the method provided by the invention considers not only FUE caching but also FAP and BS caching, which facilitates effective caching of content files in the whole system, thereby obtaining a higher cache hit rate.
Referring to fig. 4, fig. 4 shows the change of the cache hit rate in the content update stage over time when the number of files F = 100. It can be observed that the cache hit rates of the other three update methods decrease over time, while the performance of the proposed update method is less affected by time. This is because the method proposed by the present invention employs a content file prediction method during the content update process and considers not only FUE caching but also the content update of the FAPs and BSs, whereas the LFU and LRU methods update content without any content prediction, so their cache hit rates are lower than the proposed method.
Referring to fig. 5, fig. 5 shows how the cache hit rate varies over time as the number of files changes. As can be seen from the figure, as the number of files increases, the cache hit rate decreases accordingly. This is because, with the same network device storage capacity, a larger number of files F means more topics and more dispersed user preferences, so the average number of files that each NE needs to cache is larger, resulting in a decreased cache hit rate.
Referring to FIG. 6, FIG. 6 shows the cache hit rate over time as the number of FUEs changes. It can be observed that as the number of users increases, the cache hit rate also increases. This is because as the number of users increases, the caching decision may be more accurate with the same number of files.
According to the data caching management method in the embodiment of the invention, a distributed content caching method based on user preference prediction and content popularity prediction is realized. The optimization problem is established with the maximization of the cache hit rate as the goal, considering the storage and capacity constraints of each user. Then, user preference is predicted offline using a topic model in combination with user selfishness, and all users are clustered based on the Self-Organizing Map (SOM) method; the prediction results are then used to predict the content popularity of each fog node and base station online. Finally, to obtain the optimal content caching matrix, a Q-learning-based content caching method is proposed, combining the cache probability matrix with the user clustering results and the correlation between base stations. In addition, to track changes in content popularity in time, the user preference matrix is updated based on users' file requests, and the content popularity matrix and network topology relation matrix are updated accordingly, so that the FUE-related part of the cache state matrix can be updated. For the fog nodes and base stations, the updated user preference and content popularity matrices are used, and an update timer refreshes their cached content files during off-peak periods, so that the corresponding parts of the cache state matrix are updated periodically. A more reasonable cache updating method is thus configured while the network state is guaranteed. Simulation results show that, compared with existing algorithms, the proposed content caching and updating methods fully consider changes in user preference and content popularity and achieve a better cache hit rate.
The following describes an embodiment of an apparatus of the present invention, which may be used to implement the content cache management method in the foregoing embodiments. For details not disclosed in the apparatus embodiment, please refer to the embodiments of the content cache management method described above.
Fig. 7 schematically shows a block diagram of a content cache management apparatus according to an embodiment of the present invention.
Referring to fig. 7, a content cache management apparatus 700 according to an embodiment of the present invention includes: a user preference matrix building module 701, a content popularity matrix building module 702, a cache probability matrix building module 703 and a cache state matrix obtaining module 704. Wherein,
the user preference matrix building module 701 may be configured to obtain historical request data of each user for the cached data and build a user preference matrix.
The content popularity matrix building module 702 may be configured to build a content popularity matrix according to a network topology relationship matrix corresponding to a network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data.
The caching probability matrix constructing module 703 may be configured to construct a caching probability matrix for the cached data based on the user preference matrix and the content popularity matrix.
The cache state matrix obtaining module 704 may be configured to obtain a cache state matrix by using a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations, so as to cache the cache data according to the cache state matrix.
In an embodiment of the present invention, the user preference matrix building module 701 includes: the system comprises a historical information matrix construction unit and a topic model training unit. Wherein,
the history information matrix construction unit may be configured to count the history request data in a preset period, obtain data statistics of file requests of each user for the cache data, and generate a corresponding history information matrix; the cache data comprises files, and each file is configured with a corresponding topic.
The topic model training unit may be configured to use the historical information matrix and the relationship matrix between the file and the corresponding topic as input parameters, and perform processing using a topic model to obtain the user preference matrix and the user request probability matrix.
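By way of illustration only, the sketch below estimates a user preference matrix with latent Dirichlet allocation from scikit-learn; the demo request data, the topic count, and the derivation of per-user file request probabilities are assumptions, and the sketch omits the file-topic relation matrix that the embodiment additionally feeds to its topic model.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_users, n_files, n_topics = 50, 200, 8

# History information matrix: per-user, per-file request counts
# accumulated over the preset statistics period.
history = rng.poisson(lam=0.3, size=(n_users, n_files))

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
# Each row approximates one user's topic distribution, i.e. one row
# of the user preference matrix.
preference = lda.fit_transform(history)

# Per-user file request probabilities: topic weights mixed with the
# learned (row-normalized) topic-file distributions.
topic_file = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
request_prob = preference @ topic_file  # shape (n_users, n_files)
print(preference.shape, request_prob.shape)
```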
In one embodiment of the present invention, the apparatus 700 may further include: and a user classification module.
The user classification module may be configured to cluster the users according to the user preference data in the user preference matrix to obtain information of each category and a category to which each user belongs.
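A minimal sketch of this clustering step, assuming the third-party MiniSom package and an arbitrary 3x3 map, might look as follows; the stand-in preference matrix and training length are demo choices.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(1)
# Stand-in user preference matrix: 50 users, 8 topic weights each.
preference = rng.dirichlet(np.ones(8), size=50)

# A 3x3 map allows up to 9 user categories; each user's winning
# neuron serves as its category label.
som = MiniSom(3, 3, input_len=preference.shape[1], sigma=1.0,
              learning_rate=0.5, random_seed=1)
som.train_random(preference, num_iteration=500)

labels = [som.winner(u) for u in preference]  # (row, col) of winning neuron
print(labels[:5])
```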
In one embodiment of the present invention, the content popularity matrix construction module 702 may be configured to:
construct the content popularity matrix by combining a transmission distance weight parameter between any two network devices in the network architecture, a network topology relation parameter between any two network devices, a conditional probability parameter of a file request of a fog user device, and a probability parameter of a file request being sent by a fog user device.
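As a purely numerical illustration, the following sketch aggregates these four parameters into a popularity matrix; the exponential distance weighting and the randomly generated topology are assumptions, not the embodiment's exact formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_files, n_nodes = 50, 200, 6  # FUEs; files; fog nodes and BSs

request_prob = rng.dirichlet(np.ones(n_files), size=n_users)  # P(file | FUE)
request_rate = rng.dirichlet(np.ones(n_users))                # P(request by FUE)
topology = rng.integers(0, 2, size=(n_nodes, n_users))        # 1 if connected
distance = rng.uniform(10, 300, size=(n_nodes, n_users))      # meters (demo)

# Transmission distance weight: closer connected users contribute more.
weight = np.exp(-distance / 100.0) * topology

# Content popularity matrix: nodes x files, each row a weighted mixture
# of the connected users' request probabilities.
popularity = weight @ (request_rate[:, None] * request_prob)
popularity /= popularity.sum(axis=1, keepdims=True) + 1e-12
print(popularity.shape)  # (n_nodes, n_files)
```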
In an embodiment of the present invention, the cache state matrix obtaining module 704 may include: the device comprises a basic reward value configuration unit, a reward matrix construction unit, a reward matrix optimization unit, a current cache hit rate calculation unit, an optimal communication link selection unit, a first cache execution unit and a second cache execution unit. Wherein,
the basic reward value configuration unit may be configured to configure a basic reward value corresponding to each communication link mode of the network architecture.
The reward matrix construction unit may be configured to randomly select a fog user device and another network device in the network architecture, calculate a corresponding reward value based on a basic reward value of a corresponding communication link mode and a future reward, and iterate until a target state is reached to obtain a corresponding reward matrix.
The reward matrix optimization unit may be configured to optimize the reward matrix using the cache probability matrix to obtain an optimized reward matrix.
The current cache hit rate calculation unit may be configured to randomly select a file request of a fog user device, and determine a corresponding optimal communication link according to the optimized reward matrix, so as to determine a current cache hit rate corresponding to the optimal communication link.
The optimal communication link selection unit may be configured to, when the current cache hit rate is greater than the initial cache hit rate of the network device corresponding to the optimal communication link and that network device is a base station, obtain, according to the spatial correlation of the base station, the other base station with the highest correlation with it.
The first cache execution unit may be configured to not execute caching when the other base station has cache data corresponding to the file request.
The second cache execution unit may be configured to execute caching, update the remaining cache space of the base station, and update the corresponding cache hit rate when the other base station does not have cache data corresponding to the file request.
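To make the role of the base-station spatial correlation in these last units concrete, the sketch below implements one plausible decision rule: skip caching when the most-correlated neighboring base station already holds the requested file. The correlation matrix, cache contents, and capacities are randomly generated demo data.

```python
import numpy as np

rng = np.random.default_rng(5)
n_bs, n_files = 4, 20
bs_corr = rng.random((n_bs, n_bs))          # spatial correlation between BSs
np.fill_diagonal(bs_corr, 0.0)
bs_cache = rng.integers(0, 2, size=(n_bs, n_files))
remaining = rng.integers(1, 5, size=n_bs)   # remaining cache slots per BS

def maybe_cache(bs, file_idx):
    """Cache at `bs` only if its most-correlated neighbor lacks the file."""
    neighbor = int(bs_corr[bs].argmax())
    if bs_cache[neighbor, file_idx] == 1:
        return False                        # neighbor already covers the request
    if remaining[bs] > 0 and bs_cache[bs, file_idx] == 0:
        bs_cache[bs, file_idx] = 1
        remaining[bs] -= 1
        return True
    return False

print(maybe_cache(0, 3))
```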
In one embodiment of the present invention, the basic reward value configuration unit includes:
configuring the basic reward value of the D2D communication link as a first reward value; configuring the basic reward value of the FAP-FUE communication link as a second reward value; configuring the basic reward value of the BS-FUE communication link as a third reward value;
wherein the first reward value > second reward value > third reward value; and the first reward value for a D2D communication link between homogeneous FUEs is greater than the first reward value for a D2D communication link between non-homogeneous FUEs.
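A generic tabular Q-learning sketch consistent with this reward hierarchy is given below; the concrete reward constants, the state/action space, the discount factor, and the learning rate are illustrative assumptions rather than the embodiment's tuned values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative basic rewards: D2D > FAP-FUE > BS-FUE, with same-cluster
# D2D links rewarded above cross-cluster ones.
REWARD = {"d2d_same": 10.0, "d2d_diff": 8.0, "fap_fue": 5.0, "bs_fue": 2.0}

n_states, n_actions = 20, 4                 # demo sizes
modes = list(REWARD)
link_mode = rng.choice(len(modes), size=(n_states, n_actions))
R = np.vectorize(lambda m: REWARD[modes[m]])(link_mode)

Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.9, 0.5                     # discount and learning rate (assumed)

for episode in range(200):
    s = rng.integers(n_states)
    for _ in range(50):
        a = rng.integers(n_actions)         # random exploration (demo policy)
        s_next = rng.integers(n_states)     # demo transition model
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# The converged Q table plays the role of the reward matrix that is then
# weighted by the caching probability matrix to select optimal links.
best_links = Q.argmax(axis=1)
print(best_links[:10])
```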
In one embodiment of the invention, the apparatus further comprises a first update module.
The first updating module may be configured to update a file cache state of the fog user equipment according to a file request of the fog user equipment; and updating the cache state matrix based on the updated file cache state of the fog user equipment.
In one embodiment of the invention, the apparatus further comprises a second updating module.
The second updating module may be configured to obtain the updated user preference matrix and the updated content popularity matrix in response to a timing instruction of a timer; and updating the corresponding values of the fog nodes and the base stations in the cache state matrix according to the updated user preference matrix and the updated content popularity matrix.
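The two update paths might be realized as in the following sketch; the row layout of the cache state matrix, the refresh capacity, and the assumption that `refresh_nodes` is invoked off-peak (e.g. by a timer) are demo choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n_fues, n_nodes, n_files = 8, 3, 40  # rows 0..7: FUEs; rows 8..10: nodes/BSs
cache_state = np.zeros((n_fues + n_nodes, n_files), dtype=int)
popularity = rng.dirichlet(np.ones(n_files), size=n_nodes)  # updated matrix

def on_file_request(fue_idx, file_idx):
    """Event-driven path: a FUE's own request updates its row at once."""
    cache_state[fue_idx, file_idx] = 1

def refresh_nodes(capacity=5):
    """Timed path: called off-peak to re-cache each fog node/BS's
    top-`capacity` files under the updated popularity matrix."""
    for i in range(n_nodes):
        row = n_fues + i
        cache_state[row, :] = 0
        cache_state[row, np.argsort(popularity[i])[-capacity:]] = 1

on_file_request(2, 7)
refresh_nodes()
print(cache_state.sum(axis=1))
```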
The details of each module in the content cache management apparatus have been described in detail in the corresponding content cache management method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Fig. 8 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the invention.
It should be noted that the computer system 900 of the electronic device shown in fig. 8 is only an example and should not impose any limitation on the functions or the scope of use of the embodiments of the present invention.
As shown in fig. 8, a computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores various programs and data necessary for system operation. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An Input/Output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read therefrom is installed into the storage section 908 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. When executed by the Central Processing Unit (CPU) 901, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiment of the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A content cache management method, comprising:
acquiring historical request data of each user for the cache data and constructing a user preference matrix;
constructing a content popularity matrix according to a network topology relation matrix corresponding to the network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data;
constructing a caching probability matrix for the cached data based on the user preference matrix and the content popularity matrix;
and acquiring a cache state matrix by using a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations, so as to cache the cache data according to the cache state matrix.
2. The method of claim 1, wherein constructing the user preference matrix according to historical request data of each user for the cached data comprises:
counting the historical request data in a preset period, acquiring data statistics of file requests of each user for the cache data, and generating a corresponding historical information matrix; the cache data comprises files, and each file is configured with a corresponding topic;
and processing, by using a topic model, the historical information matrix and the relation matrix between the files and the corresponding topics as input parameters, to obtain the user preference matrix and the user request probability matrix.
3. The method of claim 2, further comprising:
clustering the users according to the user preference data in the user preference matrix to acquire the information of each category and the category to which each user belongs.
4. The method according to claim 1, wherein the constructing a content popularity matrix according to the network topology relationship matrix corresponding to the network architecture to which the user belongs and the user preference matrix comprises:
constructing the content popularity matrix by combining a transmission distance weight parameter between any two network devices in the network architecture, a network topology relation parameter between any two network devices, a conditional probability parameter of a file request of a fog user device, and a probability parameter of a file request being sent by a fog user device.
5. The method according to claim 1 or 3, wherein the obtaining a cache state matrix by using a reinforcement learning model in combination with the cache probability matrix and the spatial correlation between base stations comprises:
configuring a basic reward value corresponding to each communication link mode of the network architecture;
randomly selecting a fog user device and another network device in the network architecture, calculating a corresponding reward value based on a basic reward value of a corresponding communication link mode and future rewards, and iterating until a target state is reached to obtain a corresponding reward matrix;
optimizing the reward matrix by using the cache probability matrix to obtain an optimized reward matrix;
randomly selecting a file request of a fog user device, and determining a corresponding optimal communication link according to the optimized reward matrix so as to determine a current cache hit rate corresponding to the optimal communication link;
when the current cache hit rate is greater than the initial cache hit rate of the network equipment corresponding to the optimal communication link and the network equipment corresponding to the optimal communication link is a base station, acquiring the other base station with the highest correlation with the base station according to the spatial correlation of the base station;
when the other base station has cache data corresponding to the file request, not executing caching; or
when the other base station does not have cache data corresponding to the file request, executing caching, updating the remaining cache space of the base station, and updating the corresponding cache hit rate.
6. The method according to claim 5, wherein the configuring the basic reward value corresponding to each communication link mode of the network architecture comprises:
configuring the basic reward value of a device-to-device communication link as a first reward value; configuring the basic reward value of a fog node-fog user equipment communication link as a second reward value; configuring the basic reward value of a base station-fog user equipment communication link as a third reward value;
wherein the first reward value > second reward value > third reward value; and a first reward value for a device-to-device communication link between homogeneous fog user devices is greater than a first reward value for a device-to-device communication link between non-homogeneous fog user devices.
7. The method of claim 1, further comprising:
updating the file cache state of the fog user equipment according to the file request of the fog user equipment; and updating the cache state matrix based on the updated file cache state of the fog user equipment.
8. The method of claim 1, further comprising:
responding to a timing instruction of a timer, and acquiring the updated user preference matrix and the updated content popularity matrix;
and updating the corresponding values of the fog nodes and the base stations in the cache state matrix according to the updated user preference matrix and the updated content popularity matrix.
9. A content cache management apparatus, comprising:
the user preference matrix building module is used for obtaining historical request data of each user for the cache data and building a user preference matrix;
the content popularity matrix construction module is used for constructing a content popularity matrix according to a network topology relation matrix corresponding to the network architecture to which the user belongs and the user preference matrix; the content popularity matrix is used for describing the content popularity of each fog node and each base station in the network architecture aiming at the cache data;
a cache probability matrix construction module for constructing a cache probability matrix for the cache data based on the user preference matrix and the content popularity matrix;
and the cache state matrix acquisition module is used for acquiring a cache state matrix by utilizing a reinforcement learning model in combination with the spatial correlation between the cache probability matrix and the base station so as to cache the cache data according to the cache state matrix.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content cache management method of any one of claims 1 to 8.
CN202010348220.9A 2020-04-28 2020-04-28 Content cache management method and device and electronic equipment Pending CN111488528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348220.9A CN111488528A (en) 2020-04-28 2020-04-28 Content cache management method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010348220.9A CN111488528A (en) 2020-04-28 2020-04-28 Content cache management method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111488528A true CN111488528A (en) 2020-08-04

Family

ID=71797016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348220.9A Pending CN111488528A (en) 2020-04-28 2020-04-28 Content cache management method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111488528A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160259826A1 (en) * 2015-03-02 2016-09-08 International Business Machines Corporation Parallelized Hybrid Sparse Matrix Representations for Performing Personalized Content Ranking
CN109921997A (en) * 2019-01-11 2019-06-21 西安电子科技大学 A kind of name data network caching method, buffer and storage medium
CN109873869A (en) * 2019-03-05 2019-06-11 东南大学 A kind of edge cache method based on intensified learning in mist wireless access network
CN110968816A (en) * 2019-12-23 2020-04-07 广东技术师范大学 Content caching method and device based on reinforcement learning and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN JIANG et al.: "Learning-Based Content Caching with Update Strategy for Fog Radio Access Networks", 2019 IEEE/CIC International Conference on Communications in China (ICCC) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022126836A1 (en) * 2020-12-18 2022-06-23 南京邮电大学 Cache-enabled d2d communication joint recommendation and caching method
CN112995636A (en) * 2021-03-09 2021-06-18 浙江大学 360-degree virtual reality video transmission system based on edge calculation and active cache and parameter optimization method
CN112995636B (en) * 2021-03-09 2022-03-25 浙江大学 360-degree virtual reality video transmission system based on edge calculation and active cache and parameter optimization method
CN113573103A (en) * 2021-09-26 2021-10-29 深圳飞骧科技股份有限公司 Distributed mobile network video cache placement method, system and related equipment
CN113573103B (en) * 2021-09-26 2022-01-28 深圳飞骧科技股份有限公司 Distributed mobile network video cache placement method, system and related equipment
WO2023045253A1 (en) * 2021-09-26 2023-03-30 深圳飞骧科技股份有限公司 Distributed mobile network video cache placement method and system, and related device
CN114025017A (en) * 2021-11-01 2022-02-08 杭州电子科技大学 Network edge caching method, device and equipment based on deep cycle reinforcement learning
CN114025017B (en) * 2021-11-01 2024-04-16 杭州电子科技大学 Network edge caching method, device and equipment based on deep circulation reinforcement learning
CN115484568A (en) * 2022-08-12 2022-12-16 北京邮电大学 Cache data transmission method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN112860350B (en) Task cache-based computation unloading method in edge computation
CN111488528A (en) Content cache management method and device and electronic equipment
Elgendy et al. Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms
CN113242568B (en) Task unloading and resource allocation method in uncertain network environment
CN111262940B (en) Vehicle-mounted edge computing application caching method, device and system
CN114116198B (en) Asynchronous federal learning method, system, equipment and terminal for mobile vehicle
CN113055489B (en) Implementation method of satellite-ground converged network resource allocation strategy based on Q learning
WO2021254114A1 (en) Method and apparatus for constructing multitask learning model, electronic device and storage medium
CN111694664B (en) Calculation unloading distribution method of edge server
CN113485826B (en) Load balancing method and system for edge server
Tang et al. Representation and reinforcement learning for task scheduling in edge computing
CN111314862B (en) Caching method with recommendation under deep reinforcement learning in fog wireless access network
CN111813539A (en) Edge computing resource allocation method based on priority and cooperation
Lin et al. Feedback delay-tolerant proactive caching scheme based on federated learning at the wireless edge
CN110401936A (en) A kind of task unloading and resource allocation methods based on D2D communication
CN115905687A (en) Cold start-oriented recommendation system and method based on meta-learning graph neural network
CN117459112A (en) Mobile edge caching method and equipment in LEO satellite network based on graph rolling network
Li et al. Multi-edge collaborative offloading and energy threshold-based task migration in mobile edge computing environment
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Somesula et al. Deadline-aware caching using echo state network integrated fuzzy logic for mobile edge networks
Qu et al. Resource allocation for MEC system with multi-users resource competition based on deep reinforcement learning approach
CN118139116A (en) Internet of vehicles computing task unloading method based on particle swarm optimization strategy
Yin et al. An improved ant colony optimization job scheduling algorithm in fog computing
CN114385359B (en) Cloud edge task time sequence cooperation method for Internet of things
Hoiles et al. Risk-averse caching policies for YouTube content in femtocell networks using density forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200804