CN116209015A - Edge network cache scheduling method, system and storage medium - Google Patents

Edge network cache scheduling method, system and storage medium Download PDF

Info

Publication number
CN116209015A
Authority
CN
China
Prior art keywords
base station
network
scheduling model
cache
global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310465386.2A
Other languages
Chinese (zh)
Other versions
CN116209015B (en)
Inventor
魏振春
罗子成
吕增威
石雷
徐娟
樊玉琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Manufacturing Institute of Hefei University of Technology
Original Assignee
Intelligent Manufacturing Institute of Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Manufacturing Institute of Hefei University of Technology
Priority to CN202310465386.2A priority Critical patent/CN116209015B/en
Publication of CN116209015A publication Critical patent/CN116209015A/en
Application granted granted Critical
Publication of CN116209015B publication Critical patent/CN116209015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/10Flow control between communication endpoints
    • H04W28/14Flow control between communication endpoints using intermediate storage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to the field of wireless communications and edge computing, and in particular to a method, a system, and a storage medium for edge network cache scheduling. A cache scheduling model and a global scheduling model sharing the same network structure are first constructed for the small base stations and the large base station, and model parameters are distributed to the small base stations through the large base station during training. Each small base station then performs local training to obtain a model gradient; the large base station aggregates the gradients uploaded by the small base stations into a global aggregated gradient, feeds it back to the small base stations, and uses it to update each small base station's cache scheduling model until the global scheduling model converges, at which point an optimal caching strategy is formulated from the cache scheduling model and the global scheduling model. The invention improves the convergence rate and global fairness of the global scheduling model and greatly accelerates the convergence of the cache scheduling model.

Description

Edge network cache scheduling method, system and storage medium
Technical Field
The present invention relates to the field of wireless communications and edge computing, and in particular, to a method, a system, and a storage medium for scheduling an edge network cache.
Background
With the growing number and quality of multimedia services on mobile networks, the resulting mass of data places tremendous strain on backbone and mobile networks. Edge computing is an effective way to relieve the backbone network burden: edge caches use cache servers to store content closer to the end user. Because the communication resources and cache capacity of a cache server are limited, an effective caching strategy is critical.
In recent years, deep reinforcement learning has become an important tool for collaborative caching in edge computing. Federated reinforcement learning protects the privacy of terminal devices, and transmitting only model weights greatly reduces network resource consumption compared with traditional learning. However, in existing federated reinforcement learning methods applied to edge caching, the non-IID (non-independent and identically distributed) nature of the data lets some data over-represent the whole. As a result, the global model produced by federated training performs very differently across terminal devices, making it unusable on some devices and dampening those devices' enthusiasm for participating in the federation.
Disclosure of Invention
To overcome defects of traditional federated reinforcement learning such as slow network convergence and poor global-model fairness, the invention provides an edge network cache scheduling method that achieves good global fairness in edge network cache scheduling while training its scheduling models quickly.
The invention provides an edge network cache scheduling method for edge network cache scheduling in a wireless communication network comprising a cloud server, a large base station, and small base stations, where the edge network consists of the large base station and the small base stations within its coverage. The method decides the cache content of the large base station through a global scheduling model, and decides the cache content of each small base station through a cache scheduling model in one-to-one correspondence with that small base station; the global scheduling model and the cache scheduling models are all neural network models; the input of the global scheduling model is the global state of the edge network where the large base station is located, and the output is the action probability distribution of the large base station; the input of a cache scheduling model is the state of the corresponding small base station, and the output is the action probability distribution of the corresponding small base station; the cache scheduling model at least comprises a network structure identical to that of the global scheduling model;
the state of the small base station comprises the content storage state of the small base station and the task request state of a client in the coverage area of the small base station; the actions of the small base station comprise content to be cached and content to be deleted; the state of the large base station is a set of states of all small base stations in a coverage area of the large base station, and the actions of the large base station comprise contents to be cached and contents to be deleted of the large base station;
the training process of the global scheduling model and the buffer scheduling model is as follows:
s1, acquiring a cache scheduling model corresponding to each small base station and a global scheduling model corresponding to a large base station;
s2, randomly selecting part of small base stations from the small base stations as reference base stations, and configuring the parameters of the cache scheduling model corresponding to each reference base station as network parameters of the global scheduling model;
s3, carrying out local training on the cache scheduling model corresponding to each reference base station, selecting Z reference base stations which complete the local training of the cache scheduling model as alternative base stations, and acquiring gradients and accumulated rewards of each alternative base station;
s4, calculating cosine distances between gradients of different alternative base stations, clustering according to the cosine distances, and forming K alternative base stations from Z alternative base stationsClusters, 1+.K+.10; calculating the global network parameters of each cluster, and enabling the global network parameters of the kth cluster to be marked as w k T+1
Figure SMS_1
,/>
Figure SMS_2
Representing the average gradient of the candidate base stations contained in the kth cluster; w (w) G T Representing network parameters of the global scheduling model, T representing a round, T+1 representing a next round of T; replacing network parameters of a cache scheduling model of the alternative base station of each cluster with global network parameters w corresponding to the cluster k T+1 The method comprises the steps of carrying out a first treatment on the surface of the The initial value of T is 0;
s5, arranging Z alternative base stations according to a descending order of the accumulated rewards, and marking the gradient of the alternative base station positioned at the Z-th position as G z T 1 +.z, constructing the alternate gradient set { G- z T 1 +.z +.Z +.; acquiring the theta candidate base stations ordered earlier in the descending order of the jackpot prize as target base stations, and the gradient of the sigma target base station as G # σ 1 +.sigma +.theta, constructing the target gradient set { G- # σ 1 +.sigma +.θ }; θ is an integer value obtained by rounding α×z, α is a set fairness and robustness control factor;
s6, let eta # =G # σ The method comprises the steps of carrying out a first treatment on the surface of the The initial value of sigma is 1;
s7, let η=g z T The method comprises the steps of carrying out a first treatment on the surface of the The initial value of z is 1;
s8, judging whether the specification of eta multiplied by eta is satisfied # <0; if yes, G in the target gradient set # σ Updated to eta #2 ×η # /||η|| 2 ,||η|| 2 Representing the two norms of η, and then executing step S9; if not, executing step S9;
s9, judging whether Z is larger than or equal to Z; if not, updating z to z+1, and returning to the step S7; if yes, the following step S10 is executed;
s10, judging whether sigma is larger than or equal to theta; if not, the sigma is updated to sigma+1, z is initialized, and then the step S6 is returned; if yes, then select ladderThe first theta gradients in the degree set are replaced by the corresponding gradients in the target gradient set one by one, and the gradient mean value g in the alternative gradient set is calculated G T+1 Calculating a transition term w G T+1 =w G T +g G T+1 The method comprises the steps of carrying out a first treatment on the surface of the Then updating T to T+1, returning to step S2, and updating the network parameters of the global scheduling model to w when the updating times of T reach the set value c4 G T+1 And fixes the global schedule model and the cache schedule model.
Preferably, in S3, the gradient of an alternative base station is the difference between the network parameters of its cache scheduling model after completing local training and the network parameters of the current global scheduling model.
Preferably, the cache scheduling model is composed of a current network and a target network, the current network and the target network having the same network structure as the global scheduling model; the local training of the cache scheduling model corresponding to the nth small base station B_n in S3 comprises steps S31 to S37; 1 ≤ n ≤ N, where N is the number of small base stations in the edge network;
S31, setting a cumulative reward NR_n and a sample accumulation value AR; the network parameters of the current network in the cache scheduling model corresponding to small base station B_n are denoted w_n and the network parameters of the target network are denoted w_n^#; updating the network parameters of the current network and the target network to the network parameters w_G of the current global scheduling model; initializing the state S, and initializing AR and NR_n to 0;
s32, inputting the state S into a current network, outputting action probability distribution by the current network, and selecting a decision action A by combining the action probability distribution;
s33, acquiring a state S after the state S executes the action A # A prize R; updating jackpot NR n Equal to NR n +R; the reward R is action reward for executing the action A when the small base station is in the state S;
s34, updating AR to AR+1, and constructing samples { S, A, S } # R, done is stored in a set experience playback set D, and done is a task completion mark; if AR<c3, done=0; if ar=c3, done=1; c3 is a set value; judging experience returnWhether the number of samples in the set reaches a set value c2 or not; if not, make S updated to S # Then returns to S32; if yes, the following step S35 is executed;
s35, sampling I samples from the experience playback set D, wherein the ith sample is named as { S } i ,A i ,S i # ,R i ,done i },S i Representing the state in the ith sample, A i Representing the action in the ith sample, S i # Representing the next state in the ith sample, R i Indicating the prize in the ith sample, done i Indicating that the task completion flag in the ith sample is 1 +.i; calculating a loss function L (Q) by combining the I samples;
s36, updating the parameter w of the current network by combining gradient back propagation of L (Q) through the neural network n If the number of parameter updating times of the current network is an integer multiple of the set value c1, updating the parameter w of the target network n # So that w n # =w n
S37, judging whether the AR reaches c3; if not, make S updated to S # AR is updated to AR+1, then the state S is input into the current network, the decision action A is selected by combining the action probability distribution output by the current network, and a sample { S, A, S ] is constructed by combining the state S and the decision action A # R, done } and store in experience playback set D; then returning to step S35; the local training of the cache scheduling model is complete.
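A compressed sketch of the local training loop S31 to S37 follows (Python with PyTorch). The environment wrapper env, the helpers select_action and compute_loss (sketched after the corresponding preferred embodiments below), and all other identifiers are assumptions for illustration, not the claimed implementation; the default constants follow the values used later in the embodiment.

import copy
import random
import torch

def local_training(q_net, w_global, env, select_action, compute_loss,
                   c1=100, c2=128, c3=200, batch_i=64, lr=1e-3):
    # q_net: current network Q, initialised from the global parameters (S31)
    # w_global: detached copy of those parameters, for the proximal term in L(Q)
    target_net = copy.deepcopy(q_net)                 # S31: Q# starts equal to Q
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay, nr, updates = [], 0.0, 0                  # playback set D, NR_n
    state = env.reset()                               # initial state S
    for ar in range(1, c3 + 1):                       # AR: sample accumulation value
        action = select_action(q_net, state)          # S32: decision action A
        next_state, reward = env.step(action)         # S33: S#, R
        nr += reward                                  # NR_n <- NR_n + R
        done = 1.0 if ar == c3 else 0.0               # S34: task completion flag
        replay.append((state, action, next_state, reward, done))
        state = next_state
        if len(replay) < c2:                          # wait until |D| reaches c2
            continue
        sample = random.sample(replay, batch_i)       # S35: draw I samples
        batch = tuple(torch.stack([torch.as_tensor(s[j]) for s in sample])
                      for j in range(5))              # collate into tensors
        loss = compute_loss(q_net, target_net, w_global, batch)
        opt.zero_grad(); loss.backward(); opt.step()  # S36: backpropagation
        updates += 1
        if updates % c1 == 0:                         # sync Q# every c1 updates
            target_net.load_state_dict(q_net.state_dict())
    return nr                                         # cumulative reward NR_n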
Preferably, in S32 an ε-greedy method is used to select the decision action according to the action probability distribution.
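A minimal sketch of such a selection rule follows; the exploration rate ε = 0.1 and the n_actions attribute are assumptions, not values given by the invention.

import random
import torch

def select_action(q_net, state, epsilon=0.1):
    # with probability epsilon explore a random action; otherwise exploit by
    # taking the action with the highest probability in the current network output
    if random.random() < epsilon:
        return random.randrange(q_net.n_actions)      # assumed attribute
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))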
Preferably, the action reward of small base station B_n on time slot t is denoted R(t, n); then:
R(t,n) = R_c(t,n) − R_d(t,n);
R_c(t, n) denotes the hit reward on time slot t+1 for the content cached by small base station B_n on time slot t, i.e., the number of contents that are cached by B_n on time slot t and requested on time slot t+1 by clients within its coverage;
R_d(t, n) denotes the negative penalty on time slot t+1 caused by the cache content deleted by B_n on time slot t, i.e., the number of contents that were cached by B_n on time slot t−1, deleted on time slot t, and requested on time slot t+1 by clients within its coverage.
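A minimal sketch of this reward in plain Python, assuming contents are represented by identifiers and the cached/deleted collections are sets:

def action_reward(cached_t, deleted_t, requests_t1):
    # cached_t:    contents cached by B_n on slot t
    # deleted_t:   contents cached on slot t-1 but deleted on slot t
    # requests_t1: contents requested by covered clients on slot t+1
    r_c = sum(1 for c in requests_t1 if c in cached_t)   # hit reward R_c(t,n)
    r_d = sum(1 for c in requests_t1 if c in deleted_t)  # deletion penalty R_d(t,n)
    return r_c - r_d                                     # R(t,n)

# e.g. action_reward({1, 2, 3}, {7}, [2, 2, 7, 9]) == 2 - 1 == 1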
Preferably, the loss function in S35 is calculated as follows:
L(Q) = [Σ_{i=1}^{I} (y_i − Q(S_i, A_i, w_n))² + (µ/2)·‖w_G − w_n‖²] / I
Q(S_i, A_i, w_n) denotes the probability corresponding to action A_i in the action probability distribution output by the current network; Σ denotes summation over i ∈ [1, I]; µ denotes a regularization parameter; w_G denotes the network parameters of the current global scheduling model and w_n the network parameters of the cache scheduling model; ‖w_G − w_n‖ denotes the two-norm of w_G − w_n; y_i is the current target value;
when done_i = 1, y_i = R_i;
when done_i = 0,
y_i = R_i + γ·Q^#(S_i^#, a*, w_n^#), where a* = argmax_{a∈a^#} Q(S_i^#, a, w_n);
Q(S_i^#, a, w_n) denotes the probability corresponding to action a in the action probability distribution output by the current network for state S_i^#; max_{a∈a^#} Q(S_i^#, a, w_n) denotes the maximum probability in that distribution, a^# denoting the action traversal space; Q^#(S_i^#, a*, w_n^#) denotes the probability corresponding to the target action in the action probability distribution output by the target network for input state S_i^#, the target action a* being the action corresponding to the maximum probability in the distribution output by the current network for input S_i^#;
γ denotes an attenuation factor; R_i denotes the reward in the ith sample, i.e., the reward for the small base station executing action A_i in state S_i.
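A sketch of this loss in PyTorch follows; consistent with the definitions above, the target action is chosen by the current network and evaluated by the target network, and the proximal term pulls w_n toward the global parameters. Tensor layouts and the default values of γ and µ are assumptions.

import torch
import torch.nn.functional as F

def compute_loss(q_net, target_net, w_global, batch, gamma=0.9, mu=0.01):
    states, actions, next_states, rewards, dones = batch
    # Q(S_i, A_i, w_n): value of the taken action under the current network
    q_sa = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # target action a*: argmax of the current network's output at S_i^#
        a_star = q_net(next_states).argmax(dim=1, keepdim=True)
        # evaluated by the target network Q#; y_i = R_i when done_i = 1
        q_next = target_net(next_states).gather(1, a_star).squeeze(1)
        y = rewards + gamma * (1.0 - dones) * q_next
    # proximal term (mu/2)||w_G - w_n||^2 toward the global parameters
    prox = sum((p - g).pow(2).sum() for p, g in zip(q_net.parameters(), w_global))
    i = rewards.shape[0]
    return (F.mse_loss(q_sa, y, reduction="sum") + mu * prox / 2) / i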
Preferably, in S4 a K-means algorithm is used to cluster the alternative base stations according to the cosine distances between their gradients.
Preferably, the cache scheduling model and the global scheduling model are trained by adopting historical data in an initial state; in the working process, the global scheduling model and the cache scheduling model are updated and trained regularly, and when the global scheduling model is updated and trained, network parameters of the global scheduling model are used as initialization parameters of the cache scheduling model.
The invention also provides an edge network cache scheduling system and a storage medium, which provide a carrier for the edge network cache scheduling method.
The invention provides an edge network cache scheduling system, which comprises a memory and a processor, wherein a computer program is stored in the memory, the processor is connected with the memory, and the processor is used for executing the computer program to realize the edge network cache scheduling method.
The storage medium provided by the invention stores a computer program, and the computer program is used for realizing the edge network cache scheduling method when being executed.
The invention has the advantages that:
(1) In the edge network cache scheduling method, the cache scheduling model corresponding to each small base station is trained locally in combination with local particularities, ensuring adaptability to local conditions; the gradients of the small base stations' cache scheduling models are incorporated into the training of the large base station's global scheduling model, and parameters are updated using the global aggregated gradient, which improves the convergence speed and global fairness of the global scheduling model.
(2) The cache scheduling model of a small base station takes the network parameters of the large base station's global scheduling model as its initialization parameters, which greatly accelerates the convergence of the cache scheduling model.
(3) The loss function constructed in the invention takes into account how well a small base station hits the task requests of clients within its coverage, greatly improving the hit rate of small base station cache scheduling and promoting both rapid convergence of the cache scheduling model and the formulation of an optimal caching strategy.
(4) The global scheduling model takes into account the cache scheduling conditions of all small base stations within the coverage of the large base station. Through the cooperation of the global scheduling model and the cache scheduling models, the small base stations cache the content required by clients within their coverage, while the large base station selects its cache content in view of the demand across the whole edge network and the small base stations' cache states; this greatly improves the fairness and quality of edge network caching and fully satisfies client demand under the edge network.
(5) The invention provides a simple and clear reward algorithm, making the calculation of rewards simple and efficient.
(6) The small base stations serving as alternative base stations are clustered according to the cosine distances between the gradients of their cache scheduling models, and the network parameters of the cache scheduling models within a cluster are then updated from the network parameters of the global scheduling model and the average gradient within the cluster, further improving the convergence rate of the cache scheduling model. At the same time, the cosine distance provides a similarity assessment of each alternative base station's situation, alternative base stations in similar states are aggregated with one another, and cross-reference among them further improves the fairness and robustness of the cache scheduling model.
(7) Reference base stations, alternative base stations, and target base stations are selected layer by layer from the small base stations, so that all small base stations are eventually updated while avoiding the network stagnation caused by frequently updating all of them at once. The large base station and the small base stations are updated in a nested manner: the large base station provides real-time support for the content demand of the whole edge network, and the small base stations are updated in turn, so that those not participating in training can still serve clients within their coverage, relieving the caching pressure on the large base station and guaranteeing normal operation of the edge network.
Drawings
FIG. 1 is a flow chart of a method for scheduling edge network buffers;
FIG. 2 is a flow chart of a local training method of a cache scheduling model;
FIG. 3 is a diagram illustrating statistics of cache hit rates for a small base station in an embodiment;
FIG. 4 is another diagram illustrating statistics of cache hit rates for a small base station in an embodiment;
FIG. 5 is a diagram showing the comparison of the cache hit rate of a large base station in the embodiment.
Detailed Description
The edge network cache scheduling method provided by the embodiment is suitable for a wireless communication network comprising a cloud server, a large base station and a small base station; the communication coverage of the large base station is larger than that of the small base station; the wireless communication network comprises a plurality of large base stations, wherein each large base station comprises a plurality of small base stations in the coverage area; the large base station and the small base stations in the coverage area form an edge network;
the cache scheduling method decides the cache content of the large base station through the global scheduling model, and decides the cache content of the small base station through the cache scheduling model corresponding to the small base station one by one.
Referring to fig. 1 and 2, the method for scheduling an edge network buffer according to the present embodiment includes the following steps S1 to S10.
S1, acquiring the cache scheduling model corresponding to each small base station and the global scheduling model corresponding to the large base station; the cache scheduling model comprises a current network Q and a target network Q^#, both of which are Q networks; the input of the Q network of a cache scheduling model is the state of the corresponding small base station, and the output is the action probability distribution of the corresponding small base station; the global scheduling model adopts a Q network whose input is the global state of the edge network where the large base station is located and whose output is the action probability distribution of the large base station; the network parameters of the current global scheduling model are denoted w_G, and the network parameters of the global scheduling model on round T are denoted w_G^T; the network parameters of the current network Q of the nth small base station are denoted w_n, and the network parameters of its target network Q^# are denoted w_n^#.
Specifically, in this embodiment, the edge network includes N small base stations, with small base station set B = {B_1, B_2, …, B_n, …, B_N}, B_n denoting the nth small base station, 1 ≤ n ≤ N; the state of small base station B_n on time slot t is denoted S(t, n), and the global state of the edge network on time slot t is denoted S(t).
Thus, the input of the cache scheduling model corresponding to small base station B_n on time slot t is S(t, n), and the output is the action probability distribution decided by B_n on time slot t; the input of the global scheduling model on time slot t is S(t), and the output is the action probability distribution decided by the large base station on time slot t.
S(t)={S(t,1),S(t,2),…,S(t,n),…,S(t,N)}
S(t,n)={M(t,n),Q(t,n)}
M(t, n) is the content storage state of small base station B_n on time slot t and can be expressed as a set of binary values, i.e., M(t, n) = {M_1(t,n), M_2(t,n), …, M_f(t,n), …, M_F(t,n)}, where M_f(t,n) is a binary value: if small base station B_n caches content C_f on time slot t, then M_f(t,n) = 1; if not, M_f(t,n) = 0; C_f denotes the fth content, the total number of contents being F, i.e., 1 ≤ f ≤ F; the content set is denoted C = {C_1, C_2, …, C_f, …, C_F};
Q(t, n) is the task request state on time slot t of the clients within the coverage of small base station B_n and can be expressed as Q(t, n) = {Q_1(t,n), Q_2(t,n), …, Q_kn(t,n), …, Q_Kn(t,n)}, Q_kn(t,n) ∈ C ∪ {0}, where Q_kn(t,n) denotes the task request on time slot t of client u(n, kn), client u(n, kn) being the knth client within the communication coverage of small base station B_n, 1 ≤ kn ≤ Kn, Kn denoting the total number of clients within the coverage of B_n; Q_kn(t,n) = C_f means that the task request of client u(n, kn) on time slot t is content C_f; Q_kn(t,n) = 0 means that client u(n, kn) has no task request on time slot t.
The action of small base station B_n on time slot t is denoted A(t, n), A(t, n) = {A_d(t,n), A_c(t,n)};
A_c(t, n) denotes the set of contents to be cached by small base station B_n on time slot t, and A_d(t, n) denotes the set of cached contents to be deleted by small base station B_n on time slot t;
The action of the large base station on time slot t is denoted A_G(t); A_G(t) comprises the set of contents to be cached by the large base station on time slot t and the set of cached contents to be deleted by the large base station on time slot t.
S2, from the N small base stations B_1, B_2, …, B_n, …, B_N, randomly selecting H small base stations, denoted the set Br = {Br_1, Br_2, …, Br_h, …, Br_H}, and configuring the parameters of the cache scheduling model corresponding to each small base station in Br as w_G^T; the initial value of T is 0, and w_G^0 is the initialization network parameters; h is an ordinal, 1 ≤ h ≤ H.
Notably, if a new model is being trained, the initialization network parameters w_G^0 are random initialization parameters; if training is performed during use, the initialization network parameters w_G^0 are the network parameters of the current global scheduling model.
S3, performing local training on the cache scheduling model of each small base station in Br, selecting Z small base stations that complete the local training of the cache scheduling model as alternative base stations, and obtaining the gradient set g^T and cumulative reward set NR^T of the alternative base stations; the gradient of the zth alternative base station is denoted g_z^T and its cumulative reward NR_z^T, 1 ≤ z ≤ Z; g^T = {g_1^T, g_2^T, …, g_z^T, …, g_Z^T}, NR^T = {NR_1^T, NR_2^T, …, NR_z^T, …, NR_Z^T};
The local training of the cache scheduling model corresponding to the nth small base station B_n comprises steps S31 to S37;
s31, make small base station B n The network parameters of the current network Q in the corresponding cache scheduling model are denoted as w n Target network Q # The network parameters of (a) are denoted as w n # Let w n =w G T ,w n # =w n The method comprises the steps of carrying out a first treatment on the surface of the Setting cumulative rewards NR n And a sample cumulative value AR; w (w) G T Representing network parameters of a global scheduling model on the round T;
initialization state s=s (t, n), AR and NR n All initialized to 0;
s32, inputting the state S into the current network Q, outputting action probability distribution on the time slot t by the current network Q, and selecting a decision action A by combining probabilities corresponding to the actions, wherein the decision action A is the small base station B n Action a (t, n) on time slot t; in the specific implementation, can be adopted
Figure SMS_7
The greedy method selects the decision action a=a (t, n) on the time slot t according to the action probability distribution.
S33, the state after executing action A in state S is recorded as the next state S^#, with reward R; the cumulative reward NR_n is updated to NR_n + R; the reward R is the action reward for executing action A when the small base station is in state S;
Let the action reward of small base station B_n on time slot t be denoted R(t, n); then:
R(t,n) = R_c(t,n) − R_d(t,n);
R_c(t, n) denotes the hit reward on time slot t+1 for the content cached by small base station B_n on time slot t, i.e., the overlap between the content cached by B_n on time slot t and the task requests on time slot t+1 of clients within its coverage; thus R_c(t, n) is the number of contents cached by B_n on time slot t and requested by clients within its coverage on time slot t+1;
R_d(t, n) denotes the negative penalty on time slot t+1 caused by the cache content deleted by B_n on time slot t, i.e., the overlap between the cache content deleted by B_n on time slot t and the task requests on time slot t+1 of clients within its coverage; thus R_d(t, n) is the number of contents cached by B_n on time slot t−1, deleted on time slot t, and requested by clients within its coverage on time slot t+1.
Recording the state after action A is executed in state S as the next state S^#, define R_c(S, S^#) as the number of contents cached by the small base station in state S and requested by clients within its coverage in the next state S^#, and R_d(S, S^#) as the number of contents deleted by action A and requested by clients within the coverage in the next state S^#; the reward of action A is then R = R_c(S, S^#) − R_d(S, S^#).
S34, updating AR to AR+1, constructing a sample {S, A, S^#, R, done} and storing it in a preset experience playback set D; done is a task completion flag; if AR < c3, done = 0; if AR = c3, done = 1; c3 is a set value; judging whether the number of samples in the experience playback set reaches a set value c2; if not, updating S to S^# and then returning to S32; if yes, executing the following step S35;
S35, sampling I samples from the experience playback set D, the ith sample being denoted {S_i, A_i, S_i^#, R_i, done_i}, where S_i denotes the state in the ith sample, A_i the action, S_i^# the next state, R_i the reward, and done_i the task completion flag, 1 ≤ i ≤ I; the current target value y_i and the loss function L(Q) are calculated with the following formulas:
when done i =1,y i =R i
When done i =0,
Figure SMS_8
Q(S i # ,a,w n ) Indicating that the current network Q is in the state S i # The probability corresponding to the action a in the action probability distribution output at the time,
Figure SMS_9
representing the maximum probability, a, in the probability distribution of the actions of the Q network output # Representing an action traversal space;
Figure SMS_10
representing a target network Q # In the input state S i # The probability corresponding to the target action in the action probability distribution output at the time, wherein the target action is the state S of the current network Q at the input i # The action corresponding to the maximum probability in the action probability distribution output at the time;
gamma represents an attenuation factor; r is R i Indicating the prize in the ith sample, i.e. the small cell is in state S i Execute action A at time i Is a reward of (a);
L(Q)=[Σ i=1 I (y i -Q(S i ,A i ,w n )) 2 +µ||w G T -w n || 2 /2]/I
Q(S i ,A i ,w n ) Action A in action probability distribution representing current network Q output i The corresponding probabilities; sigma represents summation, and the summation range is i E [1, I]The method comprises the steps of carrying out a first treatment on the surface of the The [ mu ] represents a regular term parameter; i W G T -w n || 2 Representing w G T -w n Is a binary norm of (2);
s36, updating the parameter w of the current network by combining gradient back propagation of L (Q) through the neural network n If when atThe parameter w of the target network is updated if the number of parameter updates of the previous network is an integer multiple of the set value c1 n # So that w n # =w n
S37, judging whether the AR reaches a set c3; if not, make S updated to S # AR is updated to AR+1, then the state S is input into the current network, the decision action A is selected by combining the action probability distribution output by the current network, and a sample { S, A, S ] is constructed by combining the state S and the decision action A # R, done } and store in experience playback set D; then returning to step S35; the local training of the cache scheduling model is complete.
S4, calculating the cosine distances between the gradients of different alternative base stations, clustering the cosine distances using the K-means algorithm (K-means clustering algorithm) so that the Z alternative base stations form K clusters, calculating the global network parameters of each cluster, the global network parameters of the kth cluster being denoted w_k^{T+1}, and replacing the network parameters of the cache scheduling model of each alternative base station in a cluster with the global network parameters w_k^{T+1} of that cluster:
w_k^{T+1} = w_G^T + ḡ_k^T
where ḡ_k^T denotes the average gradient of the alternative base stations contained in the kth cluster. In a specific operation, K may take the value 3.
In specific implementation, any existing clustering algorithm may be used in this step to form the K clusters from the Z alternative base stations.
Specifically, in this embodiment, the kth cluster on round T is denoted mBS_k^T; cluster mBS_k^T contains J_k^T alternative base stations, and the average gradient of the alternative base stations in cluster mBS_k^T is denoted ḡ_k^T; then:
cluster set mBS^T = {mBS_1^T, mBS_2^T, …, mBS_k^T, …, mBS_K^T}
intra-cluster base station number set J^T = {J_1^T, J_2^T, …, J_k^T, …, J_K^T}
intra-cluster average gradient set ḡ^T = {ḡ_1^T, ḡ_2^T, …, ḡ_k^T, …, ḡ_K^T}
where 1 ≤ k ≤ K.
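The clustering and per-cluster parameter update of this step can be sketched as follows (a hand-rolled K-means over cosine similarity in NumPy). The embodiment only requires K-means over the cosine distances, so this is one possible instantiation with illustrative identifiers, assuming Z ≥ K flattened gradient vectors.

import numpy as np

def cosine_kmeans(grads, k=3, iters=20, seed=0):
    g = np.stack(grads)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)   # unit vectors, so the
    rng = np.random.default_rng(seed)                  # dot product is the cosine
    centers = g[rng.choice(len(g), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(g @ centers.T, axis=1)      # nearest center by cosine
        for j in range(k):
            members = g[labels == j]
            if len(members):                           # keep old center if empty
                c = members.mean(axis=0)
                centers[j] = c / np.linalg.norm(c)
    return labels

def cluster_parameters(w_global, grads, labels, k=3):
    # w_k^{T+1} = w_G^T + average gradient of the kth cluster (as reconstructed above)
    out = []
    for j in range(k):
        members = [grads[i] for i in np.flatnonzero(labels == j)]
        out.append(w_global + np.mean(members, axis=0) if members else w_global.copy())
    return out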
S5, arranging the Z alternative base stations in descending order of cumulative reward, the gradient of the alternative base station at the zth position being denoted G_z^T, 1 ≤ z ≤ Z, and constructing the alternative gradient set {G_z^T | 1 ≤ z ≤ Z}; taking the θ alternative base stations ranked highest in the alternative gradient set as target base stations, the gradient of the σth target base station being denoted G_σ^#, 1 ≤ σ ≤ θ, and constructing the target gradient set {G_σ^# | 1 ≤ σ ≤ θ}; θ is an integer value obtained by rounding α×Z, where rounding to the nearest integer, rounding up, rounding down, or the like may be employed; α is a set fairness and robustness control factor.
S6, let η^# = G_σ^#; the initial value of σ is 1;
S7, let η = G_z^T; the initial value of z is 1;
S8, judging whether η·η^# < 0 is satisfied; if yes, updating G_σ^# in the target gradient set to η^# − (η·η^#)·η/‖η‖², ‖η‖ denoting the two-norm of η, and then executing step S9; if not, executing step S9 directly;
S9, judging whether z = Z is satisfied; if not, updating z to z+1 and returning to step S7; if yes, executing the following step S10;
S10, judging whether σ = θ is satisfied; if not, updating σ to σ+1, reinitializing z, and returning to step S6; if yes, replacing the first θ gradients in the alternative gradient set one by one with the corresponding gradients in the target gradient set, calculating the gradient mean g_G^{T+1} over the alternative gradient set, and calculating w_G^{T+1} = w_G^T + g_G^{T+1}; then updating T to T+1 and returning to step S2; when T reaches the set value c4, updating the network parameters of the global scheduling model to w_G^{T+1} and fixing the global scheduling model and the cache scheduling models.
The invention is illustrated below with reference to specific examples.
In this embodiment, an edge network comprising 1 large base station and 30 small base stations in a wireless communication network is used for scene simulation; the cache capacity of each small base station is 30 contents and the total number of contents is 100. It is assumed that the content request sequences received by the small base stations numbered 1-15 obey Zipf's law with slope 1.2, i.e., zipf(1.2); the content request sequences received by the small base stations numbered 16-25 obey zipf(1.5) with slope 1.5; and the content request sequences received by the small base stations numbered 26-30 obey zipf(0.8) with slope 0.8.
In this embodiment, the above-mentioned edge network cache scheduling method is adopted to train the cache scheduling model of the small base station and the global scheduling model of the large base station, and in the training process, let z=25, k=3; c1 =100, c2=128, c3=200, c4=10; i=64;
in this embodiment, for convenience of presentation, the small base stations numbered 1-15 calculate average buffer hit rates under different fairness and robustness control factors α, the small base stations numbered 16-25 calculate average buffer hit rates under different fairness and robustness control factors α, and the small base stations numbered 26-30 calculate average buffer hit rates under different fairness and robustness control factors α, and specific statistical results are shown in fig. 3 and 4. The cache hit rate of the large base station under different fairness and robustness control factors α is shown in fig. 5. The cache hit rate of the base station is the ratio of the number of the content requested by the client in the coverage area of the base station in the cache content of the base station to the total number of the cache content of the base station.
As can be seen from FIG. 3 and FIG. 4, the larger α is, the lower the standard deviation of the small base stations' cache hit rates and the lower the overall average hit rate; that is, a larger α yields higher fairness in cache hit rate across different small base stations, while a smaller α yields higher cache accuracy for base stations whose request sequences obey Zipf's law with a particular slope. When α is 0, small base stations 1-15, whose content request sequences obey zipf(1.2) with slope 1.2, achieve an average cache hit rate of 0.781; when α is 0.6, small base stations 16-25, whose content request sequences obey zipf(1.5) with slope 1.5, achieve an average cache hit rate of 0.753; when α is 0.8, the average cache hit rates of small base stations 1-15, 16-25, and 26-30 all lie within the interval (0.66, 0.7). The clear linkage between the small base stations' cache hit rates and α demonstrates that the proposed cache scheduling method achieves good results on the small base stations.
FIG. 5 compares the cache hit rate of the proposed cache scheduling method on the large base station under different α values, as well as against other existing algorithms. With the proposed method, the smaller the α value and the larger the cache capacity of the large base station, the higher the large base station's hit rate; and the method's cache hit rate on the large base station is superior to existing edge network caching algorithms such as Least Recently Used (LRU), First-In First-Out (FIFO), and Random replacement.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. An edge network cache scheduling method for edge network cache scheduling in a wireless communication network comprising a cloud server, a large base station and small base stations, wherein the edge network consists of the large base station and the small base stations within the coverage of the large base station; characterized in that the edge network cache scheduling method decides the cache content of the large base station through a global scheduling model, and decides the cache content of each small base station through a cache scheduling model in one-to-one correspondence with that small base station; the global scheduling model and the cache scheduling models are all neural network models; the input of the global scheduling model is the global state of the edge network where the large base station is located, and the output is the action probability distribution of the large base station; the input of a cache scheduling model is the state of the corresponding small base station, and the output is the action probability distribution of the corresponding small base station; the cache scheduling model at least comprises a network structure identical to that of the global scheduling model;
the state of the small base station comprises the content storage state of the small base station and the task request state of a client in the coverage area of the small base station; the actions of the small base station comprise content to be cached and content to be deleted; the state of the large base station is a set of states of all small base stations in a coverage area of the large base station, and the actions of the large base station comprise contents to be cached and contents to be deleted of the large base station;
the training process of the global scheduling model and the cache scheduling model is as follows:
s1, acquiring a cache scheduling model corresponding to each small base station and a global scheduling model corresponding to a large base station;
s2, randomly selecting part of small base stations from the small base stations as reference base stations, and configuring the parameters of the cache scheduling model corresponding to each reference base station as network parameters of the global scheduling model;
s3, carrying out local training on the cache scheduling model corresponding to each reference base station, selecting Z reference base stations which complete the local training of the cache scheduling model as alternative base stations, and acquiring gradients and accumulated rewards of each alternative base station;
s4, calculating cosine distances between gradients of different alternative base stations, clustering according to the cosine distances, and forming K clusters of Z alternative base stations, wherein K is smaller than or equal to 1 and smaller than or equal to 10; calculating the global network parameters of each cluster, and enabling the global network parameters of the kth cluster to be marked as w k T +1
Figure QLYQS_1
,/>
Figure QLYQS_2
Representing the average gradient of the candidate base stations contained in the kth cluster;w G T representing network parameters of the global scheduling model, T representing a round, T+1 representing a next round of T; replacing network parameters of a cache scheduling model of the alternative base station of each cluster with global network parameters w corresponding to the cluster k T+1 The method comprises the steps of carrying out a first treatment on the surface of the The initial value of T is 0;
s5, arranging Z alternative base stations according to a descending order of the accumulated rewards, and marking the gradient of the alternative base station positioned at the Z-th position as G z T 1 +.z, constructing the alternate gradient set { G- z T 1 +.z +.Z +.; acquiring the theta candidate base stations ordered earlier in the descending order of the jackpot prize as target base stations, and the gradient of the sigma target base station as G # σ 1 +.sigma +.theta, constructing the target gradient set { G- # σ 1 +.sigma +.θ }; θ is an integer value obtained by rounding α×z, α is a set fairness and robustness control factor;
s6, let eta # =G # σ The method comprises the steps of carrying out a first treatment on the surface of the The initial value of sigma is 1;
s7, let η=g z T The method comprises the steps of carrying out a first treatment on the surface of the The initial value of z is 1;
s8, judging whether the specification of eta multiplied by eta is satisfied # <0; if yes, G in the target gradient set # σ Updated to eta #2 ×η # /||η|| 2 ,||η|| 2 Representing the two norms of η, and then executing step S9; if not, executing step S9;
s9, judging whether Z is larger than or equal to Z; if not, updating z to z+1, and returning to the step S7; if yes, the following step S10 is executed;
s10, judging whether sigma is larger than or equal to theta; if not, the sigma is updated to sigma+1, z is initialized, and then the step S6 is returned; if yes, replacing the first theta gradients in the candidate gradient set one by one to be the corresponding gradients in the target gradient set, and calculating a gradient mean value g in the candidate gradient set G T+1 Calculating a transition term w G T+1 =w G T +g G T+1 The method comprises the steps of carrying out a first treatment on the surface of the Then updating T to T+1, returning to step S2, and updating the network parameters of the global scheduling model to w when the updating times of T reach the set value c4 G T+1 And fixes the global schedule model and the cache schedule model.
2. The edge network cache scheduling method of claim 1, wherein in S3 the gradient of an alternative base station is the difference between the network parameters of its cache scheduling model after completing local training and the network parameters of the current global scheduling model.
3. The edge network cache scheduling method of claim 2, wherein the cache scheduling model is composed of a current network and a target network, the current network and the target network having the same network structure as the global scheduling model; the local training of the cache scheduling model corresponding to the nth small base station B_n in S3 comprises steps S31 to S37; 1 ≤ n ≤ N, where N is the number of small base stations in the edge network;
S31, setting a cumulative reward NR_n and a sample accumulation value AR; the network parameters of the current network in the cache scheduling model corresponding to small base station B_n are denoted w_n and the network parameters of the target network are denoted w_n^#; updating the network parameters of the current network and the target network to the network parameters w_G of the current global scheduling model; initializing the state S, and initializing AR and NR_n to 0;
s32, inputting the state S into a current network, outputting action probability distribution by the current network, and selecting a decision action A by combining the action probability distribution;
s33, acquiring a state S after the state S executes the action A # A prize R; updating jackpot NR n Equal to NR n +R; the reward R is action reward for executing the action A when the small base station is in the state S;
s34, updating AR to AR+1, and constructing samples { S, A, S } # R, done is stored in a set experience playback set D; done is a task completion flag; if AR<c3, done=0; if ar=c3, done=1; c3 is a set value; judging whether the number of samples in the experience playback set reaches a set value c2 or not; if not, make S updated to S # Then returns to S32; if yes, the following step S35 is executed;
s35, sampling I samples from the experience playback set D, wherein the ith sample is named as { S } i ,A i ,S i # ,R i ,done i },S i Representing the state in the ith sample, A i Representing the action in the ith sample, S i # Representing the next state in the ith sample, R i Indicating the prize in the ith sample, done i Indicating that the task completion flag in the ith sample is 1 +.i; calculating a loss function L (Q) by combining the I samples;
s36, updating the parameter w of the current network by combining gradient back propagation of L (Q) through the neural network n The method comprises the steps of carrying out a first treatment on the surface of the If the number of parameter updating times of the current network is an integer multiple of the set value c1, updating the parameter w of the target network n # So that w n # =w n
S37, judging whether the AR reaches c3; if not, make S updated to S # AR is updated to AR+1, then the state S is input into the current network, the decision action A is selected by combining the action probability distribution output by the current network, and a sample { S, A, S ] is constructed by combining the state S and the decision action A # R, done } and store in experience playback set D; then returning to step S35; the local training of the cache scheduling model is complete.
4. The edge network cache scheduling method of claim 3, wherein in S32 an ε-greedy method is used to select the decision action according to the action probability distribution.
5. The edge network cache scheduling method of claim 3, wherein the action reward of small base station B_n on time slot t is denoted R(t, n); then:
R(t,n) = R_c(t,n) − R_d(t,n);
R_c(t, n) denotes the hit reward on time slot t+1 for the content cached by small base station B_n on time slot t, i.e., the number of contents that are cached by B_n on time slot t and requested on time slot t+1 by clients within its coverage;
R_d(t, n) denotes the negative penalty on time slot t+1 caused by the cache content deleted by B_n on time slot t, i.e., the number of contents that were cached by B_n on time slot t−1, deleted on time slot t, and requested on time slot t+1 by clients within its coverage.
6. The edge network cache scheduling method of claim 3, wherein the loss function in S35 is calculated as follows:
L(Q) = [Σ_{i=1}^{I} (y_i − Q(S_i, A_i, w_n))² + (µ/2)·‖w_G − w_n‖²] / I
Q(S_i, A_i, w_n) denotes the probability corresponding to action A_i in the action probability distribution output by the current network; Σ denotes summation over i ∈ [1, I]; µ denotes a regularization parameter; w_G denotes the network parameters of the current global scheduling model and w_n the network parameters of the cache scheduling model; ‖w_G − w_n‖ denotes the two-norm of w_G − w_n; y_i is the current target value;
when done_i = 1, y_i = R_i;
when done_i = 0,
y_i = R_i + γ·Q^#(S_i^#, a*, w_n^#), where a* = argmax_{a∈a^#} Q(S_i^#, a, w_n);
Q(S_i^#, a, w_n) denotes the probability corresponding to action a in the action probability distribution output by the current network for state S_i^#; max_{a∈a^#} Q(S_i^#, a, w_n) denotes the maximum probability in the action probability distribution output by the current network, a^# denoting the action traversal space; Q^#(S_i^#, a*, w_n^#) denotes the probability corresponding to the target action in the action probability distribution output by the target network for state S_i^#, the target action a* being the action corresponding to the maximum probability in the distribution output by the current network for input S_i^#;
γ denotes an attenuation factor; R_i denotes the reward in the ith sample, i.e., the reward for the small base station executing action A_i in state S_i.
7. The edge network cache scheduling method of claim 1, wherein in S4 the candidate base stations are clustered according to the cosine distances between their gradients using the K-means algorithm.
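One common way to realise this (an assumption; the claim does not fix the exact construction) is to L2-normalise each flattened gradient, after which Euclidean K-means is equivalent to clustering by cosine distance. A sketch using scikit-learn:

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_base_stations(gradients, n_clusters=2, seed=0):
        # gradients: one flattened 1-D numpy array per candidate base station.
        g = np.stack(gradients)
        g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)  # unit vectors
        # On unit vectors, squared Euclidean distance = 2 * cosine distance,
        # so K-means here groups base stations by gradient direction.
        return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(g)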
8. The edge network cache scheduling method of claim 1, wherein in the initial state the cache scheduling model and the global scheduling model are trained with historical data; during operation, the global scheduling model and the cache scheduling models are updated and retrained periodically, and when the global scheduling model is retrained, its network parameters are used as the initialization parameters of the cache scheduling models.
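In a parameter-server realisation (an assumption, not specified by the claim) this warm start is a plain parameter copy; a PyTorch sketch with illustrative names:

    def warm_start_local_models(global_net, local_nets):
        # Claim 8: after the global scheduling model is retrained, its parameters
        # initialise each base station's cache scheduling model.
        for net in local_nets:
            net.load_state_dict(global_net.state_dict())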
9. An edge network cache scheduling system, comprising a memory and a processor, wherein the memory stores a computer program, the processor is connected to the memory, and the processor is configured to execute the computer program to implement the edge network cache scheduling method according to any one of claims 1 to 8.
10. A storage medium storing a computer program which, when executed, is adapted to implement the edge network cache scheduling method of any one of claims 1 to 8.
CN202310465386.2A 2023-04-27 2023-04-27 Edge network cache scheduling method, system and storage medium Active CN116209015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310465386.2A CN116209015B (en) 2023-04-27 2023-04-27 Edge network cache scheduling method, system and storage medium


Publications (2)

Publication Number Publication Date
CN116209015A 2023-06-02
CN116209015B CN116209015B (en) 2023-06-27

Family

ID=86514971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310465386.2A Active CN116209015B (en) 2023-04-27 2023-04-27 Edge network cache scheduling method, system and storage medium

Country Status (1)

Country Link
CN (1) CN116209015B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347925A (en) * 2014-12-31 2019-02-15 华为技术有限公司 Caching method, caching edges server, caching Core server and caching system
US20190014488A1 (en) * 2017-07-06 2019-01-10 Futurewei Technologies, Inc. System and method for deep learning and wireless network optimization using deep learning
CN107612987A (en) * 2017-09-08 2018-01-19 浙江大学 A kind of service provision optimization method based on caching towards edge calculations
EP3648436A1 (en) * 2018-10-29 2020-05-06 Commissariat à l'énergie atomique et aux énergies alternatives Method for clustering cache servers within a mobile edge computing network
CN110995858A (en) * 2019-12-17 2020-04-10 大连理工大学 Edge network request scheduling decision method based on deep Q network
WO2022139879A1 (en) * 2020-12-24 2022-06-30 Intel Corporation Methods, systems, articles of manufacture and apparatus to optimize resources in edge networks
US11595269B1 (en) * 2021-09-13 2023-02-28 International Business Machines Corporation Identifying upgrades to an edge network by artificial intelligence
CN113902021A (en) * 2021-10-13 2022-01-07 北京邮电大学 High-energy-efficiency clustering federal edge learning strategy generation method and device
CN114143891A (en) * 2021-11-30 2022-03-04 南京工业大学 FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
CN116010054A (en) * 2022-12-28 2023-04-25 哈尔滨工业大学 Heterogeneous edge cloud AI system task scheduling frame based on reinforcement learning
CN115809147A (en) * 2023-01-16 2023-03-17 合肥工业大学智能制造技术研究院 Multi-edge cooperative cache scheduling optimization method, system and model training method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YIN Yufeng et al.: "Policy Gradient Method based Energy Efficience Task Scheduling in Mobile Edge Blockchain", 2020 IEEE 6th International Conference on Computer and Communications (ICCC) *
LIU Hao: "Intelligent Edge Task Offloading and Collaborative Scheduling", CNKI *
ZHANG Wenxian; DU Yongwen; ZHANG Xiquan: "Lightweight Task Offloading Optimization for Multi-user Mobile Edge Computing", Journal of Chinese Computer Systems, no. 10 *
FANG Xiaoyang: "Distributed Base Station Cache Replacement Strategy Based on Q-learning", Journal of Information Engineering University *

Also Published As

Publication number Publication date
CN116209015B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN111031102A (en) Multi-user, multi-task mobile edge computing system cacheable task migration method
WO2023159986A1 (en) Collaborative caching method in hierarchical network architecture
CN110602653A (en) Pre-caching method based on track prediction
CN115344395B (en) Heterogeneous task generalization-oriented edge cache scheduling and task unloading method and system
CN115809147B (en) Multi-edge collaborative cache scheduling optimization method, system and model training method
CN115297170A (en) Cooperative edge caching method based on asynchronous federation and deep reinforcement learning
CN113158544B (en) Edge pre-caching strategy based on federal learning under vehicle-mounted content center network
CN113188544A (en) Unmanned aerial vehicle base station path planning method based on cache
CN108521640B (en) Content distribution method in cellular network
CN110913239B (en) Video cache updating method for refined mobile edge calculation
CN113918829A (en) Content caching and recommending method based on federal learning in fog computing network
CN114374949B (en) Information freshness optimization-based power control mechanism in Internet of vehicles
CN113141634B (en) VR content caching method based on mobile edge computing network
CN112702443B (en) Multi-satellite multi-level cache allocation method and device for satellite-ground cooperative communication system
CN116209015B (en) Edge network cache scheduling method, system and storage medium
CN111447506B (en) Streaming media content placement method based on delay and cost balance in cloud edge environment
CN115756873B (en) Mobile edge computing and unloading method and platform based on federation reinforcement learning
CN116362345A (en) Edge caching method and system based on multi-agent reinforcement learning and federal learning
CN116249162A (en) Collaborative caching method based on deep reinforcement learning in vehicle-mounted edge network
CN115587266A (en) Air-space-ground integrated internet intelligent edge caching method
Zhang et al. Cache-enabled adaptive bit rate streaming via deep self-transfer reinforcement learning
CN114980324A (en) Slice-oriented low-delay wireless resource scheduling method and system
CN113342529A (en) Mobile edge calculation unloading method based on reinforcement learning under cell-free large-scale multi-antenna architecture
CN117811846B (en) Network security detection method, system, equipment and medium based on distributed system
CN114035858B (en) Distributed computing unloading method for mobile edge computation under cell-free large-scale MIMO based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant