CN109617991B - Value-function-approximation-based coded cooperative caching method for small stations in ultra-dense heterogeneous networks - Google Patents


Info

Publication number
CN109617991B
CN109617991B (application CN201811634918.6A)
Authority
CN
China
Prior art keywords
state
base station
file
time slot
small
Prior art date
Legal status
Active
Application number
CN201811634918.6A
Other languages
Chinese (zh)
Other versions
CN109617991A (en)
Inventor
潘志文
高深
刘楠
尤肖虎
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201811634918.6A
Publication of CN109617991A
Application granted
Publication of CN109617991B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/10 Flow control between communication endpoints
    • H04W28/14 Flow control between communication endpoints using intermediate storage

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network. A reinforcement-learning method with value-function approximation expresses the value function as a function of state and action and takes maximizing the average cumulative number of file requests served directly by the small stations as the optimization target. By continuously interacting with the environment, the method adapts to the environment's dynamic changes, mines the latent transition pattern of file requests, obtains an approximation formula for the value function, and from it derives a cooperative caching decision matched to that transition pattern; the macro base station then encodes the cooperative caching decision and delivers the coded caching result to each small station. Because the caching decision is driven by the file-request transition pattern that reinforcement learning mines from the real network, no assumption about the prior distribution of the data is needed, which makes the method better suited to practical systems; through real-time interaction with the environment it can track time-varying file popularity and produce a corresponding caching strategy, and the procedure is simple and feasible and does not require solving an NP-hard problem.

Description

Value-function-approximation-based coded cooperative caching method for small stations in ultra-dense heterogeneous networks
Technical Field
The invention belongs to the technical field of wireless network deployment in mobile communication, and particularly relates to a coded cooperative caching method for small stations in ultra-dense heterogeneous networks.
Background
With the popularization of intelligent terminals and the growth of Internet services, and in order to meet users' demands for high data rates and high quality of service, the ultra-dense heterogeneous network will become one of the key technologies of the fifth-generation mobile communication system (5G). Deploying dense small stations within the coverage of a macro base station effectively improves the communication quality of users at the network edge, and thereby raises spectral efficiency and system throughput. However, because the small stations connect to the macro base station through wireless backhaul links, densely deployed small stations put great strain on those links, and the heavily loaded wireless backhaul becomes the bottleneck of the network. Ultra-dense network architectures therefore need to be combined with other architectures or technologies to serve users better, and moving functionality to the mobile network edge is a suitable option. Edge storage is an important concept in the mobile edge architecture: caching files at the small stations reduces the massive data transfers of peak periods, which effectively lowers the backhaul load, reduces transmission delay, and improves user experience. In an ultra-dense heterogeneous network the small stations are numerous and closely spaced, so a user is generally within the coverage of several small stations; if the small stations transmit files to the user cooperatively, their limited cache space can be used more fully. The edge-caching problem in ultra-dense heterogeneous networks is therefore worth in-depth study.
Existing caching techniques model the caching decision as an optimization problem. First, file popularity is generally assumed to be constant over time, whereas in a real network it changes continually; a method that solves an optimization problem under constant popularity cannot track these changes, so the resulting caching decision does not fit the real network well. Second, even if the constant popularity is replaced by instantaneous popularity, the optimization problem has to be re-solved every time the popularity changes, which causes huge network overhead; moreover, the modeled optimization problem is often NP-hard (non-deterministic polynomial-time hard) and very difficult to solve. Finally, because a caching decision is made from file requests that have already occurred in order to prepare for requests that will occur, the traditional optimization-based approach cannot mine the transition pattern of file requests in the network, so the decision it produces is not optimal for the requests to come.
Disclosure of Invention
To solve the technical problems described in the background, the invention provides a value-function-approximation-based coded cooperative caching method for small stations in ultra-dense heterogeneous networks; the latent transition pattern of file requests is mined with the value-function-approximation method, and a cooperative caching strategy superior to that of traditional methods is obtained.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
The macro base station and the small stations within its coverage act as the machine. The macro base station is responsible for determining the action to be executed by the small stations in the state of each time slot and for sending it to the small stations, and each small station is responsible for executing the action. The state comprises the file popularity of the current time slot and the cooperative caching decision made in the previous time slot, and the action is the cooperative caching decision made in the current time slot to serve the file requests of the next time slot. A reinforcement-learning method with value-function approximation expresses the value function as a function of state and action and takes maximizing the average cumulative number of file requests served directly by the small stations as the optimization target; by continuously interacting with the environment it adapts to the environment's dynamic changes, mines the latent transition pattern of file requests, obtains an approximation formula for the value function, and from it derives a cooperative caching decision matched to that pattern. The macro base station then encodes the cooperative caching decision and delivers the coded cooperative caching result to each small station.
Further, the method comprises the following steps:
Step 1, collect network information and set parameters:
Collect the macro base station set M, the small station set P, the requested-file set C_1, and the number p_m of small stations within the coverage of the m-th macro base station, m ∈ M. Obtain the small-station cache space K, which is determined by the operator according to network conditions and hardware cost. The operator divides a day into T time slots according to the file-request pattern in the ultra-dense heterogeneous network, sets the starting time of each slot, and divides each slot into three phases in temporal order: a file-transmission phase, an information-exchange phase, and a caching-decision phase;
Step 2, formulate a base-station cooperative caching scheme based on MDS coding:
Denote the cooperative caching decision vector of the small stations by a(t); each element a_c(t) ∈ [0,1], c ∈ C_1, is the proportion of the c-th file cached in the t-th slot, and the set of files with a_c(t) ≠ 0 is the set of files cached in slot t, denoted C'(t). The c-th file contains B information bits, and the m-th macro base station encodes these B information bits with an MDS code into a number of check bits given by [formula], where d is the number of small stations whose received signal power exceeds a threshold set by the operator according to network conditions. All check bits are divided into small-station candidate bits and macro-base-station candidate bits; the small-station candidate bits comprise p_m * B bits, i.e., each small station has B mutually non-overlapping candidate bits, and in slot t each small station caches the first a_c(t) * B of its own candidate bits;
the macro base station randomly selects (1 - d * a_c(t)) * B bits from its candidate bits to cache. By the coding property of MDS, a file request can recover the whole file once at least B check bits have been obtained;
Step 3, formulate a base-station cooperative transmission scheme:
For each file request, the user first obtains d * a_c(t) * B bits from the d small stations covering it. If d * a_c(t) ≥ 1, the macro base station does not need to transmit any data; otherwise the macro base station selects the small station closest to the user among the d small stations, transmits (1 - d * a_c(t)) * B bits to it, and that small station forwards the bits to the user. The data transmitted by the macro base station is called the backhaul-link load;
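The bit-accounting of steps 2 and 3 can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not the patented implementation; the helper names (plan_cache_bits, serve_request) and the assumption that the macro base station always holds at least B candidate check bits of its own are introduced here for clarity.

```python
# Minimal sketch of the MDS-coded cooperative caching rule of steps 2-3.
# Assumptions (not stated verbatim in the patent text): each of the d covering
# small stations holds B non-overlapping candidate check bits, and the macro
# base station holds at least B further candidate bits.

def plan_cache_bits(a_c: float, B: int, d: int) -> tuple[int, int]:
    """Return (bits cached per covering small station, bits cached at the macro).

    a_c is the caching proportion a_c(t) for file c, B the file size in
    information bits, d the number of covering small stations.
    """
    small_station_bits = int(a_c * B)            # first a_c(t)*B candidate bits
    macro_bits = max(0, int((1 - d * a_c) * B))  # (1 - d*a_c(t))*B candidate bits
    return small_station_bits, macro_bits

def serve_request(a_c: float, B: int, d: int) -> int:
    """Return the backhaul load (bits the macro must send) for one request."""
    from_small_stations = d * int(a_c * B)       # d*a_c(t)*B bits from small stations
    if from_small_stations >= B:                 # d*a_c(t) >= 1: served locally
        return 0
    # Otherwise the macro forwards the missing bits via the closest small station.
    return B - from_small_stations

if __name__ == "__main__":
    # Example: B = 1000 bits, d = 3 covering small stations, a_c(t) = 0.2.
    print(plan_cache_bits(0.2, 1000, 3))   # (200, 400)
    print(serve_request(0.2, 1000, 3))     # 400 bits of backhaul load
```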
Step 4, describe the reinforcement-learning task as a Markov decision process (MDP):
Establish the reinforcement-learning quadruple <X, A, P_{x->x'}^a, R_{x->x'}^a>, where X is the state space, A is the action space, P_{x->x'}^a is the state-transition probability, i.e., the probability of moving to state x' when action a is executed in state x, and R_{x->x'}^a is the reward for that transition;
the specific form of reinforcement learning quadruple is as follows:
An action space: since the number of elements of the caching decision vector equals the number of files C in the set C_1, the action space is a C-dimensional continuous space; a_c(t) is quantized into L discrete values, with L determined by the operator according to the computing capability of the macro base station. The discretized action space is A = {a_1, a_2, ..., a_|A|}; every action vector a_j, j ∈ {1, 2, ..., |A|}, must satisfy the small-station cache-space constraint given by [formula]. The total number of action vectors satisfying this condition is |A|, and the caching decision of the t-th slot satisfies a(t) ∈ A;
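A small sketch of how the discretized action space A could be enumerated follows; the concrete cache-space constraint sum_c a_c * B ≤ K used below is our reading of the condition referenced above and should be treated as an assumption.

```python
from itertools import product

def enumerate_actions(C: int, L: int, B: int, K: int) -> list[tuple[float, ...]]:
    """Enumerate action vectors a = (a_1, ..., a_C), a_c in {0, 1/L, ..., 1},
    keeping those that fit into the small-station cache space K (assumed
    constraint: sum_c a_c * B <= K).  Only practical for small C and L."""
    levels = [i / L for i in range(L + 1)]
    actions = []
    for a in product(levels, repeat=C):
        if sum(a) * B <= K:
            actions.append(a)
    return actions

# Example: 3 files of B = 100 bits, quantization L = 2, cache space K = 150 bits.
A = enumerate_actions(C=3, L=2, B=100, K=150)
print(len(A), A[:5])
```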
State space: the total numbers of file requests at the p_m small stations within the coverage of the m-th macro base station in the t-th slot are recorded as the vector N(t) = [N_1(t), N_2(t), ..., N_C(t)], and the total file popularity as the vector Θ(t) = [θ_1(t), θ_2(t), ..., θ_C(t)], where θ_c(t) = N_c(t) / Σ_{c'=1}^{C} N_{c'}(t), c = 1, ..., C. The state of the t-th slot is recorded as x(t) = [Θ(t), a(t-1)]. Let H = {Θ_1, Θ_2, ..., Θ_|H|} be the set of total-file-popularity vectors; after quantization Θ(t) is an element of H. The state space is then X = {x_1, x_2, ..., x_{|H||A|}}, and the state satisfies x(t) ∈ X;
Probability of state transition: after action a(t) is executed in the t-th slot it acts on the current state x(t), and the environment moves from the current state to the next state x(t+1) with a latent transition probability P_{x(t)->x(t+1)}^{a(t)}; this transition probability is unknown;
Reward: at the same time as the environment moves to x(t+1), it gives the machine a reward R_{x(t)->x(t+1)}^{a(t)}, defined here as the number of file requests served directly by the small stations:
[formula]
In the above formula, u[.] denotes the step function, the first file count is the number of files that must be transmitted to update the cache in the caching-decision phase of the t-th slot, and the second file count is the number of files transmitted by the macro base station in the information-exchange phase of the (t+1)-th slot;
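To make the reward concrete, the sketch below counts only the requests of slot t+1 whose file the small stations can recover on their own, i.e., files with d * a_c(t) ≥ 1; the two transmission-count terms that also appear in the patented reward expression are omitted, so this is a simplified, assumed form.

```python
def direct_service_reward(requests_next: list[int], a: list[float], d: int) -> int:
    """Simplified reward: number of requests in slot t+1 whose file can be
    recovered from the d covering small stations alone (d * a_c(t) >= 1).
    The transmission-count correction terms of the original formula are omitted."""
    return sum(n_c for n_c, a_c in zip(requests_next, a) if d * a_c >= 1)

# Example: 3 files, d = 4 covering small stations.
print(direct_service_reward([120, 30, 50], [0.25, 0.0, 0.1], d=4))  # 120
```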
Step 5, define the reinforcement-learning objective:
Define a deterministic policy function π(x), x ∈ X; under this policy the action executed in state x(t) is a(t) = π(x(t)). The state value function is
V^π(x(t)) = E_π[ Σ_{k=0}^{∞} γ^k R_{x(t+k)->x(t+k+1)}^{a(t+k)} ],
which represents the cumulative reward obtained by following policy π from state x(t); the discount factor 0 ≤ γ < 1 measures how strongly the action π(x(t)) executed in slot t influences future states;
After the state value function is obtained, the state-action value function, i.e., the Q function, follows:
Q^π(x(t), a'(t)) = Σ_{x(t+1)} P_{x(t)->x(t+1)}^{a'(t)} [ R_{x(t)->x(t+1)}^{a'(t)} + γ V^π(x(t+1)) ],
where Q^π(x(t), a'(t)) represents the cumulative reward obtained by executing action a'(t) in state x(t) and following policy π thereafter;
Replace x(t), x(t+1), a'(t) by x, x', a respectively. The goal is to find the policy that maximizes the expected cumulative reward; it is denoted π*(x), and the corresponding optimal value functions are V*(x) and Q*(x, a). Under the optimal policy,
V*(x) = max_{a ∈ A} Q*(x, a),
namely:
Q*(x, a) = Σ_{x'} P_{x->x'}^a [ R_{x->x'}^a + γ max_{a' ∈ A} Q*(x', a') ];
Step 6, formulate the Q-learning procedure based on value-function approximation:
(601) Express the Q function by value-function approximation, i.e., as a function of state and action. With the instantaneous reward R_{x(t)->x(t+1)}^{a'(t)} obtained when action a'(t) is executed in state x(t), the Q function is approximated as
[formula]
In the above formula, ω_1 and ω_2 are the weights of the two parts, with ω_1 >> ω_2, and β, η_i, ξ_i are unknown parameters that must be obtained by learning;
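The exact approximation formula is available only as an image in the source, so the fragment below shows one plausible linear parameterization in β, η_i, ξ_i that is consistent with the surrounding description; the feature choices and the ω_1 >> ω_2 weighting are illustrative assumptions, not the patented expression.

```python
import numpy as np

def q_approx(theta, a_prev, a, beta, eta, xi, d, w1=10.0, w2=0.1):
    """One *assumed* linear Q-function approximation in the spirit of step (601):
    a dominant term that rewards, for each file i, the fraction min(d*a_i, 1)
    of a request that the small stations can serve directly, weighted by the
    learned coefficient eta_i and the popularity theta_i, plus a small term
    (weights xi_i) tied to the previous caching decision a_prev."""
    theta, a_prev, a = map(np.asarray, (theta, a_prev, a))
    served = np.minimum(d * a, 1.0)                 # fraction recoverable from small stations
    main_term = beta + np.sum(eta * theta * served)
    change_term = -np.sum(xi * np.abs(a - a_prev))  # penalize re-caching cost
    return w1 * main_term + w2 * change_term

# Example with C = 3 files.
print(q_approx([0.6, 0.3, 0.1], [0, 0, 0], [0.25, 0.25, 0.0],
               beta=0.0, eta=np.ones(3), xi=np.ones(3), d=2))
```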
(602) Solve the cooperative caching decision:
a(t) = argmax_{a'(t) ∈ A} Q(x(t), a'(t));
(603) Establish the Q-learning target. Substituting the optimal-policy relation into the Bellman equation, the true value of the cumulative reward obtained by executing action a(t) in state x(t) is computed as
Q_target(x(t), a(t)) = R_{x(t)->x(t+1)}^{a(t)} + γ Q(x(t+1), â(t+1)),
where â(t+1) is the estimated action in state x(t+1);
(604) Define the loss function:
L(β, η, ξ) = E_π[ (Q_target(x(t), a(t)) - Q(x(t), a(t)))^2 ],
where η = [η_1, η_2, ..., η_C], ξ = [ξ_1, ξ_2, ..., ξ_C], and E_π denotes the expectation with respect to policy π;
the parameters β, η and ξ are updated according to the loss function;
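A sketch of the stochastic-gradient update of step (604) on the squared difference between the target value and the approximation; the gradient expressions follow from the assumed linear form used in the previous sketch, not from the image-only formula in the source.

```python
import numpy as np

def sgd_update(params, features, q_estimate, q_target, delta=0.05):
    """One stochastic-gradient step on the loss (q_target - q_estimate)^2 for a
    Q-function that is linear in its parameters: params and features are
    dictionaries with matching keys ('beta', 'eta', 'xi'); the feature of
    'beta' is 1 and the others are the per-file feature vectors of the assumed
    linear form.  Returns the updated parameters."""
    td_error = q_target - q_estimate
    return {name: params[name] + delta * td_error * features[name]
            for name in params}

# Example with C = 2 files.
params = {"beta": 0.0, "eta": np.zeros(2), "xi": np.zeros(2)}
features = {"beta": 1.0, "eta": np.array([0.3, 0.1]), "xi": np.array([0.0, 0.2])}
params = sgd_update(params, features, q_estimate=0.0, q_target=5.0)
print(params)
```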
Step 7, set the current slot t = 1, randomly set the starting state x(t) = [Θ(t), a(t-1)], and initialize the parameters β_p = 0, η_p = 0, ξ_p = 0. The operator sets γ in the range [0, 1) according to how fast the network changes, determines the value of the update step δ in the range (0, 1] according to the order of magnitude of the parameters to be updated, and sets the number of training slots t_total according to the network scale;
Step 8, in a cache decision stage of a t time slot, a strategy of an epsilon-greedy method is used for taking a cooperative cache decision a (t) to be executed under a state x (t);
step 9, the macro base station carries out MDS coding on the files needing to be cached according to the step 2, and transmits the coded data packets to the small station for caching;
step 10, in the file transmission stage of the t +1 time slot, a user requests a file, and the base station performs cooperative transmission to serve the user according to the step 3;
step 11, in an information exchange stage of a t +1 time slot, reporting the file request times of all the small stations in the coverage range of each macro base station to the macro base station in the t +1 time slot, summarizing the total file request times by the macro base station to be recorded as a vector N (t +1), and calculating the total file popularity to be recorded as a vector theta (t + 1);
step 12, the state to be shifted to is x (t +1) ═ Θ (t +1), a (t)]Calculating a reward function
Figure BDA0001929822840000072
Step 13, estimating the action to be executed in the state x (t + 1):
Figure BDA0001929822840000073
step 14, updating parameters in the Q function approximation formula according to the step (604);
step 15, if t ═ ttotalIf yes, stopping training and entering step 16; otherwise, t is t +1, enter the next time slot, go back to step 8, continue training;
and step 16, starting from the t time slot, determining a cooperative caching decision based on the Q function approximation formula obtained by training, and serving a file request of the next time slot.
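Steps 7 through 16 describe one learning update per slot; the loop below ties the previous sketches together. The environment interface (observe_popularity, step) and the restriction to a small enumerated action set are simplifications introduced here for illustration.

```python
import random

def train(env, actions, q_fn, update_fn, gamma=0.9, epsilon=0.1, t_total=1000):
    """Skeleton of the per-slot Q-learning loop of steps 7-16.  `env` is assumed
    to expose observe_popularity() and step(a) -> (next_popularity, reward);
    q_fn(state, a) returns the approximate Q value and update_fn(state, a,
    target) performs the stochastic-gradient step of step (604)."""
    a_prev = actions[0]
    theta = env.observe_popularity()
    for t in range(t_total):
        state = (theta, a_prev)
        # Step 8: epsilon-greedy choice of the caching decision.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda cand: q_fn(state, cand))
        # Steps 9-12: cache, serve slot t+1, observe popularity and reward.
        theta_next, reward = env.step(a)
        next_state = (theta_next, a)
        # Step 13: greedy action estimate in the next state.
        a_hat = max(actions, key=lambda cand: q_fn(next_state, cand))
        # Steps (603)-(604) / 14: TD target and parameter update.
        target = reward + gamma * q_fn(next_state, a_hat)
        update_fn(state, a, target)
        theta, a_prev = theta_next, a
```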
Further, in step 3, the determination method of d is as follows:
let the probability of a user being served by d' cell be pd'Firstly, based on the base station deployment situation of the operator, p is obtained by calculation according to the historical data of the user positiond': in a time period tau, the positions of U users are respectively recorded at intervals of tau ', tau and tau' are automatically determined by an operator according to the network operation condition, and the number of base stations d 'with the base station number being d' is recorded, wherein the received signal power of the user U belongs to {1,2, …, U } at each position is greater than a threshold value
Figure BDA0001929822840000081
The historical positions of the U users are used for calculation to obtain:
Figure BDA0001929822840000082
in the above formula, the first and second carbon atoms are,
Figure BDA0001929822840000083
indicating the number of positions where i base stations provide service for the user u in the historical position of the user u;
then, d is selected as the probability value pdThe 'largest d':
Figure BDA0001929822840000084
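The selection of d can be illustrated as follows; the position-log format (one list of covering-base-station counts per user) is a simplification introduced here.

```python
from collections import Counter

def choose_d(coverage_log: dict[int, list[int]]) -> int:
    """coverage_log[u] is the list, over user u's recorded positions, of how
    many base stations exceeded the received-power threshold at that position.
    p_{d'} is estimated as the fraction of all recorded positions covered by
    exactly d' base stations, and d is the d' with the largest estimate."""
    counts = Counter()
    total = 0
    for positions in coverage_log.values():
        counts.update(positions)
        total += len(positions)
    p = {d_prime: n / total for d_prime, n in counts.items()}
    return max(p, key=p.get)

# Example: two users, positions recorded every tau' over a period tau.
print(choose_d({1: [2, 3, 3, 4], 2: [3, 3, 2, 5]}))  # -> 3
```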
Further, in step (602), since ω_1 >> ω_2, the ω_2 term is omitted and the caching decision becomes
[formula]
The above equation is solved as follows:
(1) determine the largest element of the caching decision vector from l_max * d / L ≥ 1, where l_max / L is the value of the largest element; within the range satisfying this inequality the smaller l_max the better, so l_max = ⌈L/d⌉, where ⌈.⌉ denotes rounding up;
(2) compute the number z_i of occurrences of each element i/L, i = 1, 2, ..., l_max, in the caching decision vector according to the base-station cache space:
[formula]
where ⌊.⌋ denotes rounding down;
(3) determine the position of each element: sort the coefficients η_i θ_i(t), i = 1, 2, ..., C, in descending order; the j-th coefficient after sorting, corresponding to the h_j-th file before sorting, is used to preliminarily assign the element positions:
[formula]
Then adjust the elements for which the condition 1 - l_max * d / L < 0 holds: starting from the last sorted position and looping down to j = 1, find the smallest j' satisfying the required conditions, decrease the corresponding element by 1/L and increase the other element by 1/L;
the same solution procedure is also used to estimate â(t+1) in step 13.
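The allocation rule of step (602) can be paraphrased as: give the largest cache proportions to the files with the largest coefficients η_i θ_i(t), subject to the quantization and cache-space limits. The sketch below implements that paraphrase; the exact tie-breaking, the z_i formula and the final adjustment loop of the original are not reproduced, so it should be read as an approximation of the described procedure.

```python
import math

def greedy_cache_decision(eta, theta, d, L, B, K):
    """Assign quantized proportions i/L greedily: files are sorted by
    eta_i * theta_i in descending order and each in turn receives the largest
    proportion that (a) does not exceed l_max/L with l_max = ceil(L/d), the
    smallest quantized proportion for which d*a_c >= 1, and (b) still fits into
    the remaining cache space K.  This follows the spirit, not the letter, of
    the solution to step (602)."""
    C = len(theta)
    l_max = math.ceil(L / d)                  # smallest l with l*d/L >= 1
    a = [0.0] * C
    remaining = K
    order = sorted(range(C), key=lambda i: eta[i] * theta[i], reverse=True)
    for i in order:
        for l in range(l_max, 0, -1):         # try the largest proportion first
            if (l / L) * B <= remaining:
                a[i] = l / L
                remaining -= (l / L) * B
                break
    return a

# Example: 4 files, d = 2, L = 4 levels, B = 100 bits, cache space K = 120 bits.
print(greedy_cache_decision(eta=[1, 1, 1, 1], theta=[0.4, 0.3, 0.2, 0.1],
                            d=2, L=4, B=100, K=120))
```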
Further, in step 8, a cooperative caching decision is selected according to step (602) with a probability of 1-epsilon; randomly selecting one satisfying condition by probability epsilon
Figure BDA0001929822840000101
And
Figure BDA0001929822840000102
to coordinate caching decisions.
Further, in step (604), a random gradient descent method is adopted to update parameters β, η, ξ in the Q-function approximation expression:
Figure BDA0001929822840000103
in the above formula betac
Figure BDA0001929822840000104
Parameter, β, representing the current time slotp
Figure BDA0001929822840000105
Represents the parameter of the previous time slot, and the updating step length is represented by delta less than or equal to 1 and more than 0.
Adopt the beneficial effect that above-mentioned technical scheme brought:
the invention provides service for users by utilizing small-station cooperative coding caching and cooperative transmission, makes caching decision by mining the transfer mode of the file request in the collected real network through reinforcement learning, is used as a data-driven machine learning method, does not need any hypothesis on prior distribution of data, and is more suitable for an actual system; and through real-time interaction with the environment, the popularity of the time-varying file can be tracked, a corresponding caching strategy is made, the process is simple and feasible, and the NP-hard problem does not need to be solved.
The invention makes a cooperative caching decision based on a value function approximation method, the macro base station collects state information through continuous interaction with the environment, makes a corresponding cooperative caching decision and transmits the decision to each small station, so that the most accurate files can be cached by effectively utilizing the limited storage space of the small stations, the number of file requests directly served by the small stations is obviously increased, and the load of a return link of a system is reduced.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
The invention provides a value-function-approximation-based coded cooperative caching method for small stations in ultra-dense heterogeneous networks, whose objective is to maximize the average cumulative number of file requests served directly by the small stations under the constraint that the total size of the files cached at a small station does not exceed its cache space. The method mines the transition pattern of file requests with reinforcement learning and formulates the coded cooperative caching scheme for the small stations according to the mined pattern. The reinforcement-learning task is described as an MDP (Markov Decision Process). The macro base station and the small stations within its coverage act as the machine: the macro base station determines the action to be executed and sends it to the small stations, the small stations execute the action and thereby change the environment, and the environment returns a reward to the machine according to a reward function; through continuous interaction with the environment the machine learns the action that the small stations should execute in the state of each time slot. The state is the part of the environment observed by the macro base station and comprises the file popularity of the current slot and the cooperative caching decision made in the previous slot; the action is the cooperative caching decision made in the current slot to serve the file requests of the next slot. The reward function is defined from the goal of the caching decision, here the number of file requests served directly by the small stations. Value-function approximation is a reinforcement-learning method suited to tasks with a huge discrete state space or a continuous state space: the value function is expressed as a function of state and action, maximization of the average cumulative number of file requests served directly by the small stations is taken as the optimization target, and through continuous interaction with the environment the method adapts to its dynamic changes, mines the latent transition pattern of file requests, obtains an approximation formula for the value function, and from it derives the matching cooperative caching decision. The macro base station encodes the files with an MDS (Maximum Distance Separable) code and finally delivers the coded cooperative caching result to each small station, so that the number of file requests served directly by the small stations is clearly increased and the backhaul-link load of the system is reduced.
An embodiment is given below taking an LTE-A system as an example; as shown in FIG. 1, the specific steps are as follows:
The first step: collect network information and set parameters:
Collect the macro base station set M, the small station set P, the requested-file set C_1 (which contains C files), and the number p_m of small stations within the coverage of the m-th macro base station, m ∈ M. Obtain the small-station cache space K, which is determined by the operator according to network conditions and hardware cost. The operator divides a day into T time slots according to the file-request pattern in the ultra-dense heterogeneous network, sets the starting time of each slot, and divides each slot into three phases in temporal order: a file-transmission phase, an information-exchange phase, and a caching-decision phase.
The second step is that: formulating a base station cooperation caching scheme based on MDS coding:
the cooperative caching decision vector of the small station is recorded as a (t) ═ a1(t),a2(t),…,aC(t)]Wherein 0 is not less than ac(t) is less than or equal to 1, C epsilon C represents the proportion of the C-th file buffered in the t-th time slot substation, acA file set (namely a file set cached in a t time slot) with (t) ≠ 0 is marked as C' (t), the file C contains B information bits, and the macro base station m generates the B information bits by encoding through MDS
Figure BDA0001929822840000121
Individual check bits:
Figure BDA0001929822840000122
wherein d is the number of small stations with received signal power greater than a threshold value, the threshold value is determined by an operator according to the network operation condition, and all the stations are connected with the network
Figure BDA0001929822840000123
Each check bit is divided into a small station candidate bit and a macro base station candidate bit, wherein the small station candidate bit comprises pmB bits, that is, each small station has B candidate bits which are not repeated mutually, and each small station selects a front a from the respective candidate bits in the t time slotc(t) buffering B bits;
the macro base station randomly selects (1-da) from candidate bits thereofc(t)) B bits are buffered, and according to the coding property of the MDS, the whole file can be recovered by acquiring at least B check bits once when the file is requested.
The third step: and (3) formulating a base station cooperative transmission scheme:
each file request of a user first gets da from the d substations covering itc(t) B bits, if dac(t) is more than or equal to 1, the macro base station does not need to transmit data; otherwise, the macro base station selects one small station closest to the user from the d small stations and transmits (1-da)c(t)) B bits to the small station, which then transmits the bits to the user, the data transmitted by the macro base station is called backhaul link load. d, a determination method:
the probability of a user being served by d' cell is pd'Firstly, based on the base station deployment situation of the operator, p is obtained by calculation according to the historical data of the user positiond': in a time period tau, the positions of U users are respectively recorded at intervals of tau ', tau and tau' are automatically determined by an operator according to the network operation condition, and the number of base stations d 'with the base station number being d' is recorded, wherein the received signal power of the user U belongs to {1,2, …, U } at each position is greater than a threshold value
Figure BDA0001929822840000131
The historical positions of the U users are used for calculation to obtain:
Figure BDA0001929822840000132
wherein
Figure BDA0001929822840000133
Indicating the number of locations where i base stations served user u in the historical location of user u.
Choosing d as the probability value pd'Maximum d':
Figure BDA0001929822840000134
The fourth step: describe the reinforcement-learning task as an MDP:
Establish the reinforcement-learning quadruple <X, A, P_{x->x'}^a, R_{x->x'}^a>, where X is the state space, A is the action space, P_{x->x'}^a is the state-transition probability, i.e., the probability of moving to state x' when action a is executed in state x, and R_{x->x'}^a is the reward for that transition;
The specific form of the reinforcement-learning quadruple in this problem is as follows:
1. Action space: the action is defined as the cooperative caching decision vector of the small stations, and the actions the machine may take form the action space. The number of elements of the caching decision vector equals the number of files C, so the action space is a C-dimensional continuous space with 0 ≤ a_c ≤ 1 in each dimension, c ∈ C_1; a_c is quantized into L discrete values, with L determined by the operator according to the computing capability of the macro base station. The discretized action space is A = {a_1, a_2, ..., a_|A|}; every action vector a_j, j ∈ {1, 2, ..., |A|}, must satisfy the small-station cache-space constraint given by [formula]. The total number of action vectors satisfying this condition is |A|, and the caching decision of the t-th slot satisfies a(t) ∈ A.
2. State space: the state is the machine's perception of the environment and consists of the file-popularity vector and the cooperative caching decision vector of the small stations. In the t-th slot, the total numbers of file requests at the p_m small stations within the coverage of the m-th macro base station are recorded as the vector N(t) = [N_1(t), N_2(t), ..., N_C(t)], and the total file popularity as the vector Θ(t) = [θ_1(t), θ_2(t), ..., θ_C(t)], where θ_c(t) = N_c(t) / Σ_{c'=1}^{C} N_{c'}(t), c = 1, ..., C. The state of the t-th slot is x(t) = [Θ(t), a(t-1)]. Let H = {Θ_1, Θ_2, ..., Θ_|H|} be the set of total-file-popularity vectors, so that after quantization Θ(t) is an element of H; the state space is X = {x_1, x_2, ..., x_{|H||A|}}, and the state satisfies x(t) ∈ X.
3. Probability of state transition: after action a(t) is executed in the t-th slot it acts on the current state x(t), and the environment moves from the current state to the next state x(t+1) with a latent transition probability P_{x(t)->x(t+1)}^{a(t)}; this transition probability is unknown.
4. Reward: at the same time as the environment moves to x(t+1), it gives the machine a reward R_{x(t)->x(t+1)}^{a(t)}, defined here as the number of file requests served directly by the small stations:
[formula] (3)
where u[.] is the step function, equal to 1 when the value in brackets is greater than 0 and 0 otherwise; the first file count is the number of files that must be transmitted to update the cache in the caching-decision phase of the t-th slot, and the second file count is the number of files transmitted by the macro base station in the information-exchange phase of the (t+1)-th slot.
The fifth step: and (3) clear reinforcement learning target:
defining a deterministic policy function pi (X), X ∈ X, according to which the action a to be executed under state X (t) is known (t) ═ pi (X (t)); defining a gamma discount expected accumulated prize function:
Figure BDA0001929822840000152
wherein EπMeaning that the expectation for the strategy pi is made,
Figure BDA0001929822840000153
representing the cumulative reward due to the use of the policy π, starting from state x (t), 0 ≦ γ < 1 is a measure of the degree of influence of the action π (x (t)) performed for the t slot on the future state.
After obtaining the state value function, a state-action value function (Q function) is obtained:
Figure BDA0001929822840000154
Figure BDA0001929822840000155
representing the accumulated reward brought by using the strategy pi after the action a' (t) is executed from the state x (t), and the equations (4) and (5) are called as Bellman equations.
Replace x(t), x(t+1), a'(t) by x, x', a respectively. The goal is to find the policy that maximizes the expected cumulative reward; it is denoted π*(x), and the corresponding optimal value functions are V*(x) and Q*(x, a). From equations (4) and (5) under the optimal policy one obtains:
V*(x) = max_{a ∈ A} Q*(x, a)   (6)
namely:
Q*(x, a) = Σ_{x'} P_{x->x'}^a [ R_{x->x'}^a + γ max_{a' ∈ A} Q*(x', a') ]   (7)
Equations (6) and (7) reveal how a non-optimal policy can be improved, namely by replacing the action the policy selects with the currently optimal action:
π'(x) = argmax_{a ∈ A} Q^π(x, a)   (8)
When the reinforcement-learning quadruple is known, the Bellman equations can be solved with a value-iteration or policy-iteration algorithm based on equation (8) to obtain the optimal policy.
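For contrast with the model-free approach adopted in the next step, the following is a minimal tabular value-iteration sketch for the case, mentioned above, in which the full quadruple (transition probabilities and rewards) is known; the toy MDP at the bottom is invented purely for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """P[x][a] is a list of (prob, next_state) pairs and R[x][a] the expected
    reward; returns the optimal state values and a greedy policy.  Usable only
    when the transition probabilities are known, which is not the case in the
    caching problem treated by this method."""
    states, V = list(P.keys()), {x: 0.0 for x in P}
    while True:
        delta = 0.0
        for x in states:
            q = {a: R[x][a] + gamma * sum(p * V[x2] for p, x2 in P[x][a])
                 for a in P[x]}
            best = max(q.values())
            delta = max(delta, abs(best - V[x]))
            V[x] = best
        if delta < tol:
            break
    policy = {x: max(P[x], key=lambda a: R[x][a] + gamma * sum(p * V[x2] for p, x2 in P[x][a]))
              for x in states}
    return V, policy

# Toy 2-state, 2-action MDP (purely illustrative).
P = {"s0": {"cache": [(1.0, "s1")], "skip": [(1.0, "s0")]},
     "s1": {"cache": [(0.5, "s0"), (0.5, "s1")], "skip": [(1.0, "s0")]}}
R = {"s0": {"cache": 1.0, "skip": 0.0}, "s1": {"cache": 2.0, "skip": 0.5}}
print(value_iteration(P, R))
```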
The sixth step: the Q-learning procedure based on value-function approximation when the state-transition probability is unknown:
Because the state-transition probability is unknown, the optimal policy cannot be obtained with a policy-iteration or value-iteration algorithm; for the same reason the conversion from the state value function to the Q function is difficult, so the Q function is estimated directly.
1. Q-function approximation: to overcome the difficulty of storing and exhaustively searching a Q-table over the large state and action spaces, the Q function is represented by value-function approximation, i.e., as a function of state and action. With the instantaneous reward R_{x(t)->x(t+1)}^{a'(t)}, and taking slot t as an example, when action a'(t) is executed in state x(t) the Q function is approximated as
[formula]
where ω_1 and ω_2 are the weights of the two parts, with ω_1 >> ω_2, and β, η_i, ξ_i are unknown parameters that must be learned.
2. Selection of the cooperative caching decision:
a(t) = argmax_{a'(t) ∈ A} Q(x(t), a'(t))
Since ω_1 >> ω_2, the ω_2 term is omitted, which gives the caching decision
[formula] (11)
Equation (11) seeks the cooperative caching decision that maximizes the value in brackets. As the bracketed expression shows, the factor η_i θ_i(t) multiplying (1 - d * a'_i(t)) directly determines the size of that value: the larger η_i θ_i(t) is, the smaller the corresponding (1 - d * a'_i(t)) should be, so that the bracketed value becomes larger. The solution of equation (11) therefore proceeds as follows:
(1) Determine the largest element of the caching decision vector from l_max * d / L ≥ 1, where l_max / L is the value of the largest element; within the range satisfying this inequality the smaller l_max the better, so l_max = ⌈L/d⌉, where ⌈.⌉ denotes rounding up;
(2) compute the number z_i of each element i/L, i = 1, 2, ..., l_max, in the caching decision vector:
[formula]
where ⌊.⌋ denotes rounding down;
(3) determine the position of each element: sort the coefficients η_i θ_i(t), i = 1, 2, ..., C, in descending order; the j-th coefficient after sorting, corresponding to the h_j-th file before sorting, is used to preliminarily assign the element positions:
[formula]
Then adjust the elements for which the condition 1 - l_max * d / L < 0 holds: starting from the last sorted position and looping down to j = 1, find the smallest j' satisfying the required conditions, decrease the corresponding element by 1/L and increase the other element by 1/L.
3. Q-learning target:
Substituting equation (6) into equation (5) gives:
[formula] (14)
Equation (14) yields the method for computing the true value of the cumulative reward obtained by executing action a(t) in state x(t):
Q_target(x(t), a(t)) = R_{x(t)->x(t+1)}^{a(t)} + γ Q(x(t+1), â(t+1)),
where â(t+1), the estimated action in state x(t+1), is obtained according to item 2 above.
Define the loss function:
L(β, η, ξ) = E_π[ (Q_target(x(t), a(t)) - Q(x(t), a(t)))^2 ]
where the parameter vectors are η = [η_1, η_2, ..., η_C] and ξ = [ξ_1, ξ_2, ..., ξ_C]; the goal of Q-learning is to bring the estimated and true values of the Q function as close as possible, i.e., to minimize the loss function.
4. Update the parameters β, η, ξ in the Q-function approximation expression by stochastic gradient descent:
[formula] (17)
where the subscript c denotes the parameter values of the current slot, the subscript p the parameter values of the previous slot, and 0 < δ ≤ 1 is the update step.
The seventh step: set the current slot t = 1, randomly set the starting state x(t) = [Θ(t), a(t-1)], and initialize the parameters β_p = 0, η_p = 0, ξ_p = 0. The operator sets γ in the range [0, 1) according to how fast the network changes, determines the value of the update step δ in the range (0, 1] according to the order of magnitude of the parameters to be updated, and sets the number of training slots t_total according to the network scale.
Eighth step: in the caching-decision phase of the t-th slot, use an ε-greedy policy to take the cooperative caching decision a(t) to be executed in state x(t): with probability 1 - ε select the cooperative caching decision according to item 2 of the sixth step; with probability ε select at random a cooperative caching decision from the action vectors that satisfy the quantization and cache-space conditions.
The ninth step: the macro base station MDS-encodes the files to be cached according to the second step and transmits the coded data packets to the small stations for caching.
The tenth step: in the file-transmission phase of the (t+1)-th slot, users request files and the base stations serve them by cooperative transmission according to the third step.
The eleventh step: in the information-exchange phase of the (t+1)-th slot, all small stations within the coverage of each macro base station report their file-request counts for slot t+1 to the macro base station; the macro base station aggregates the total request counts into the vector N(t+1) and computes the total file popularity, recorded as the vector Θ(t+1).
The twelfth step: the state moved to is x(t+1) = [Θ(t+1), a(t)]; compute the reward R_{x(t)->x(t+1)}^{a(t)} according to equation (3).
The thirteenth step: estimate the action to be executed in state x(t+1) according to item 2 of the sixth step: â(t+1) = argmax_{a' ∈ A} Q(x(t+1), a').
The fourteenth step: update the parameters in the Q-function approximation formula according to equation (17).
The fifteenth step: if t = t_total, stop training and go to the sixteenth step; otherwise set t = t+1, enter the next slot, return to the eighth step and continue training.
The sixteenth step: from slot t onward, determine the cooperative caching decision according to item 2 of the sixth step, based on the Q-function approximation formula obtained by training, to serve the file requests of the next slot.
According to the above procedure, during the learning of the Q function the macro base station and the small stations within its coverage act as the machine, the file popularity together with the small stations' cooperative caching decision serves as the state, the cooperative caching decision serves as the action, and the number of file requests served directly by the small stations serves as the reward function. Through continuous interaction with the environment, and with maximization of the cumulative reward as the target, the Q-function approximation formula is learned and the cooperative caching decision for each state is derived; the macro base station then MDS-encodes the files to be cached and delivers the coding result to the small stations for cooperative caching. The method finds the pattern in the data with reinforcement learning instead of solving an optimization problem based on an assumed data distribution. It can track file popularity that changes in real time, fully mines and exploits the latent transition pattern of file requests to make cooperative caching decisions, and is better suited to practical systems; it clearly increases the number of file requests served directly by the small stations, effectively reduces the backhaul-link load of the system, improves system performance, and improves user experience.
The embodiments are only for illustrating the technical idea of the present invention, and the technical idea of the present invention is not limited thereto, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the scope of the present invention.

Claims (5)

1. A value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network, characterized in that: the macro base station and the small stations within its coverage act as the machine; the macro base station is responsible for determining the action to be executed by the small stations in the state of each time slot and for sending it to the small stations, and the small stations are responsible for executing the action; the state comprises the file popularity of the current time slot and the cooperative caching decision made in the previous time slot, and the action is the cooperative caching decision made in the current time slot to serve the file requests of the next time slot; a reinforcement-learning method with value-function approximation expresses the value function as a function of state and action, takes maximizing the average cumulative number of file requests served directly by the small stations as the optimization target, continuously interacts with the environment to adapt to its dynamic changes, mines the latent transition pattern of file requests to obtain an approximation formula for the value function, and from it derives a cooperative caching decision matched to that pattern; the macro base station encodes the cooperative caching decision and delivers the coded cooperative caching result to each small station;
the method comprises the following steps:
step 1, collecting network information, and setting parameters:
collect the macro base station set M, the small station set P, the requested-file set C_1, and the number p_m of small stations within the coverage of the m-th macro base station, m ∈ M; obtain the small-station cache space K, which is determined by the operator according to network conditions and hardware cost; the operator divides a day into T time slots according to the file-request pattern in the ultra-dense heterogeneous network, sets the starting time of each slot, and divides each slot into three phases in temporal order: a file-transmission phase, an information-exchange phase, and a caching-decision phase;
step 2, formulating a base station cooperation caching scheme based on MDS coding:
denote the cooperative caching decision vector of the small stations by a(t); each element a_c(t) ∈ [0,1], c ∈ C_1, is the proportion of the c-th file cached in the t-th slot, and the set of files with a_c(t) ≠ 0 is the set of files cached in slot t, denoted C'(t); the c-th file contains B information bits, and the m-th macro base station encodes these B information bits with an MDS code into a number of check bits given by [formula], where d is the number of small stations whose received signal power exceeds a threshold set by the operator according to network conditions; all check bits are divided into small-station candidate bits and macro-base-station candidate bits, the small-station candidate bits comprising p_m * B bits, i.e., each small station has B mutually non-overlapping candidate bits, and in slot t each small station caches the first a_c(t) * B of its own candidate bits; the macro base station randomly selects (1 - d * a_c(t)) * B bits from its candidate bits to cache; by the coding property of MDS, a file request can recover the whole file once at least B check bits have been obtained;
step 3, formulating a base station cooperative transmission scheme:
for each file request, the user first obtains d * a_c(t) * B bits from the d small stations covering it; if d * a_c(t) ≥ 1, the macro base station does not need to transmit any data; otherwise the macro base station selects the small station closest to the user among the d small stations, transmits (1 - d * a_c(t)) * B bits to it, and that small station forwards the bits to the user; the data transmitted by the macro base station is called the backhaul-link load;
step 4, describing a reinforcement learning task by using a Markov Decision Process (MDP):
establish the reinforcement-learning quadruple <X, A, P_{x->x'}^a, R_{x->x'}^a>, where X is the state space, A is the action space, P_{x->x'}^a is the state-transition probability, i.e., the probability of moving to state x' when action a is executed in state x, and R_{x->x'}^a is the reward for that transition;
the specific form of the reinforcement-learning quadruple is as follows:
an action space: since the number of elements of the caching decision vector equals the number of files C in the set C_1, the action space is a C-dimensional continuous space; a_c(t) is quantized into L discrete values, with L determined by the operator according to the computing capability of the macro base station; the discretized action space is A = {a_1, a_2, ..., a_|A|}, and every action vector a_j, j ∈ {1, 2, ..., |A|}, must satisfy the small-station cache-space constraint given by [formula]; the total number of action vectors satisfying this condition is |A|, and the caching decision of the t-th slot satisfies a(t) ∈ A;
state space: the total numbers of file requests at the p_m small stations within the coverage of the m-th macro base station in the t-th slot are recorded as the vector N(t) = [N_1(t), N_2(t), ..., N_C(t)], and the total file popularity as the vector Θ(t) = [θ_1(t), θ_2(t), ..., θ_C(t)], where θ_c(t) = N_c(t) / Σ_{c'=1}^{C} N_{c'}(t); the state of the t-th slot is recorded as x(t) = [Θ(t), a(t-1)]; let H = {Θ_1, Θ_2, ..., Θ_|H|} be the set of total-file-popularity vectors, so that after quantization Θ(t) is an element of H; the state space is X = {x_1, x_2, ..., x_{|H||A|}}, and the state satisfies x(t) ∈ X;
probability of state transition: after action a(t) is executed in the t-th slot it acts on the current state x(t), and the environment moves from the current state to the next state x(t+1) with a latent transition probability P_{x(t)->x(t+1)}^{a(t)}; this transition probability is unknown;
reward: at the same time as the environment moves to x(t+1), it gives the machine a reward R_{x(t)->x(t+1)}^{a(t)}, defined here as the number of file requests served directly by the small stations:
[formula]
in the above formula, u[.] denotes the step function, the first file count is the number of files that must be transmitted to update the cache in the caching-decision phase of the t-th slot, and the second file count is the number of files transmitted by the macro base station in the information-exchange phase of the (t+1)-th slot;
step 5, defining a reinforcement learning target:
define a deterministic policy function π(x), x ∈ X; under this policy the action executed in state x(t) is a(t) = π(x(t)); the state value function is
V^π(x(t)) = E_π[ Σ_{k=0}^{∞} γ^k R_{x(t+k)->x(t+k+1)}^{a(t+k)} ],
which represents the cumulative reward obtained by following policy π from state x(t); the discount factor 0 ≤ γ < 1 measures how strongly the action π(x(t)) executed in slot t influences future states;
after the state value function is obtained, the state-action value function, i.e., the Q function, follows:
Q^π(x(t), a'(t)) = Σ_{x(t+1)} P_{x(t)->x(t+1)}^{a'(t)} [ R_{x(t)->x(t+1)}^{a'(t)} + γ V^π(x(t+1)) ],
where Q^π(x(t), a'(t)) represents the cumulative reward obtained by executing action a'(t) in state x(t) and following policy π thereafter;
replace x(t), x(t+1), a'(t) by x, x', a respectively; the goal is to find the policy that maximizes the expected cumulative reward, denoted π*(x), with corresponding optimal value functions V*(x) and Q*(x, a); under the optimal policy,
V*(x) = max_{a ∈ A} Q*(x, a),
namely:
Q*(x, a) = Σ_{x'} P_{x->x'}^a [ R_{x->x'}^a + γ max_{a' ∈ A} Q*(x', a') ];
step 6, formulating a Q-learning process based on value function approximation:
(601) express the Q function by value-function approximation, i.e., as a function of state and action; with the instantaneous reward R_{x(t)->x(t+1)}^{a'(t)} obtained when action a'(t) is executed in state x(t), the Q function is approximated as
[formula]
in the above formula, ω_1 and ω_2 are the weights of the two parts, with ω_1 >> ω_2, and β, η_i, ξ_i are unknown parameters that must be obtained by learning;
(602) solve the cooperative caching decision:
a(t) = argmax_{a'(t) ∈ A} Q(x(t), a'(t));
(603) establish the Q-learning target: the true value of the cumulative reward obtained by executing action a(t) in state x(t) is computed as
Q_target(x(t), a(t)) = R_{x(t)->x(t+1)}^{a(t)} + γ Q(x(t+1), â(t+1)),
where â(t+1) is the estimated action in state x(t+1);
(604) define the loss function:
L(β, η, ξ) = E_π[ (Q_target(x(t), a(t)) - Q(x(t), a(t)))^2 ],
where η = [η_1, η_2, ..., η_C], ξ = [ξ_1, ξ_2, ..., ξ_C], and E_π denotes the expectation with respect to policy π;
the parameters β, η and ξ are updated according to the loss function;
step 7, set the current slot t = 1, randomly set the starting state x(t) = [Θ(t), a(t-1)], and initialize the parameters β_p = 0, η_p = 0, ξ_p = 0; the operator sets γ in the range [0, 1) according to how fast the network changes, determines the value of the update step δ in the range (0, 1] according to the order of magnitude of the parameters to be updated, and sets the number of training slots t_total according to the network scale;
Step 8, in a cache decision stage of a t time slot, a strategy of an epsilon-greedy method is used for taking a cooperative cache decision a (t) to be executed under a state x (t);
step 9, the macro base station carries out MDS coding on the files needing to be cached according to the step 2, and transmits the coded data packets to the small station for caching;
step 10, in the file transmission stage of the t +1 time slot, a user requests a file, and the base station performs cooperative transmission to serve the user according to the step 3;
step 11, in an information exchange stage of a t +1 time slot, reporting the file request times of all the small stations in the coverage range of each macro base station to the macro base station in the t +1 time slot, summarizing the total file request times by the macro base station to be recorded as a vector N (t +1), and calculating the total file popularity to be recorded as a vector theta (t + 1);
step 12, the state transitioned to is x(t+1) = [Θ(t+1), a(t)], and the reward function R(t+1) is calculated;
step 13, estimating the action to be executed in the state x(t+1):

$$\hat{a}(t+1) = \arg\max_{a'} Q\big(x(t+1), a'\big)$$
step 14, updating parameters in the Q function approximation formula according to the step (604);
step 15, if t = t_total, stopping training and entering step 16; otherwise, setting t = t + 1, entering the next time slot, and returning to step 8 to continue training;
and step 16, starting from the t time slot, determining a cooperative caching decision based on the Q function approximation formula obtained by training, and serving a file request of the next time slot.
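Putting steps 7-16 together, a compact training-loop sketch could look as follows; it reuses epsilon_greedy, q_learning_target and the parameter-update sketch given under claim 5 below, and observe_requests is a placeholder for the per-slot interaction of steps 9-12 (caching, transmission and information exchange).

```python
def train(q, candidate_actions, observe_requests, t_total, gamma=0.9, epsilon=0.1, delta=0.01):
    # observe_requests(a) is a placeholder for steps 9-12: it should apply the caching
    # decision a for one slot and return (reward R(t+1), popularity vector Theta(t+1)).
    num_files = len(q.eta)
    theta = [1.0 / num_files] * num_files      # arbitrary initial popularity for x(1)
    prev_a = candidate_actions[0]              # a(t-1) component of the random start state
    for t in range(1, t_total + 1):
        a_t = epsilon_greedy(q, theta, prev_a, candidate_actions, epsilon)           # step 8
        reward, next_theta = observe_requests(a_t)                                   # steps 9-12
        y = q_learning_target(q, reward, next_theta, a_t, candidate_actions, gamma)  # step 13
        sgd_update(q, theta, prev_a, a_t, y, delta)                                  # step 14 (claim 5 sketch)
        theta, prev_a = next_theta, a_t                                              # enter slot t+1
    return q
```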
2. The method for cooperative caching of codes of small stations in ultra-dense heterogeneous network based on value function approximation as claimed in claim 1, wherein in step 3, the determination method of d is as follows:
let the probability that a user is served by d' base stations be p_{d'}; first, based on the operator's base station deployment, p_{d'} is calculated from historical user position data: within a time period τ, the positions of U users are recorded at intervals of τ' (τ and τ' are determined by the operator according to the network operating conditions), and for each user u ∈ {1, 2, …, U}, the number of recorded positions at which exactly d' base stations provide received signal power above a given threshold is counted;
the historical positions of the U users are then used to calculate:

$$p_{d'} = \frac{\sum_{u=1}^{U} n_u(d')}{\sum_{i}\sum_{u=1}^{U} n_u(i)}$$

in the above formula, n_u(i) denotes the number of positions, among the historical positions of user u, at which i base stations provide service for user u;
then, d is selected as the value of d' with the maximum probability p_{d'}:

$$d = \arg\max_{d'} p_{d'}$$
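A sketch of the estimation in claim 2, assuming each recorded user position has already been reduced to the number of base stations whose received power exceeds the threshold at that position; the empirical probabilities p_{d'} and their maximizer d then follow directly.

```python
from collections import Counter

def estimate_d(counts_per_position):
    # counts_per_position: for every recorded user position, the number of base stations
    # whose received signal power at that position exceeds the threshold.
    hist = Counter(counts_per_position)
    total = len(counts_per_position)
    p = {d_prime: c / total for d_prime, c in hist.items()}   # empirical p_{d'}
    return max(p, key=p.get), p                               # d = argmax_{d'} p_{d'}

d, p = estimate_d([2, 3, 3, 2, 3, 4, 3])
print(d, p)   # d = 3 is the most likely number of serving base stations
```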
3. The ultra-dense heterogeneous network small station coding cooperative caching method based on value function approximation as claimed in claim 1, wherein in step (602), since ω₁ ≫ ω₂, the term weighted by ω₂ is omitted from the approximate Q function, and the caching decision is obtained as:

[equation image FDA0002926919210000076: simplified caching decision obtained by maximizing the remaining ω₁-weighted part of the approximate Q function]
the solution of the above equation is as follows:
according tomaxd/L is more than or equal to 1 to determine the maximum value of the elements in the cache decision vector, LmaxIs the denominator of the largest element, since l is within the range satisfying the inequalitymaxThe smaller the size, the better, therefore
Figure FDA0002926919210000077
Figure FDA0002926919210000078
Represents rounding up;
secondly, the number z_i of elements equal to i/L, i = 1, 2, …, l_max, in the caching decision vector is calculated according to the caching space of the base station:

[equation image FDA0002926919210000079: expression for z_i in terms of the small-station caching space]

where ⌊·⌋ represents rounding down;
determining the position of each element: the coefficients η_i·θ_i(t), i = 1, 2, …, C, are arranged in descending order; the j-th coefficient after sorting, denoted [equation image FDA0002926919210000081], corresponds to the h_j-th file before sorting; the positions of the elements are first preliminarily determined:

[equation image FDA0002926919210000082: preliminary assignment of the element values to the file positions according to the sorted coefficients]
then, the elements [equation image FDA0002926919210000083] satisfying the condition 1 − l_max·d/L < 0 are adjusted: starting from [equation image FDA0002926919210000084] down to j = 1, the following steps are looped to adjust the elements of the action vector: from [equation image FDA0002926919210000085], find the smallest j' that satisfies the conditions [equation image FDA0002926919210000086] and [equation image FDA0002926919210000087]; the element [equation image FDA0002926919210000088] is reduced by 1/L, and the element [equation image FDA0002926919210000089] is increased by 1/L;
the above solution procedure is also used in step 13 to estimate the action $\hat{a}(t+1)$ for the state x(t+1).
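The sketch below only illustrates the flavor of the construction in claim 3 under simplifying assumptions: l_max = ⌈L/d⌉ as above, a per-station cache budget of M whole files, and a greedy assignment of the largest cached fractions to the files with the largest coefficients η_i·θ_i(t); the exact counts z_i and the final adjustment loop of the claim are not reproduced.

```python
import math

def build_cache_decision(coeffs, L, d, M):
    # coeffs: eta_i * theta_i(t) for each file i; L: MDS segments per file;
    # d: typical number of serving stations; M: per-station cache budget in files.
    C = len(coeffs)
    l_max = math.ceil(L / d)              # largest numerator so that l_max * d / L >= 1
    a = [0.0] * C
    budget = int(M * L)                   # cache budget measured in coded packets
    # Assumption: give the largest feasible fraction l_max/L to the most valuable
    # files first, then continue until the packet budget runs out.
    for i in sorted(range(C), key=lambda i: coeffs[i], reverse=True):
        take = min(l_max, budget)
        if take == 0:
            break
        a[i] = take / L
        budget -= take
    return a

print(build_cache_decision([0.5, 0.2, 0.9, 0.1], L=10, d=3, M=1))   # [0.4, 0.2, 0.4, 0.0]
```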
4. The method for cooperative caching of codes of small stations in the ultra-dense heterogeneous network based on value function approximation as claimed in claim 3, wherein in step 8, the cooperative caching decision is selected according to step (602) with probability 1 − ε; with probability ε, one cooperative caching decision is selected at random from those satisfying the constraints [equation image FDA00029269192100000811] and [equation image FDA00029269192100000812].
5. The method for cooperative caching of codes of small stations in the ultra-dense heterogeneous network based on value function approximation as claimed in claim 1, wherein in step (604), a stochastic gradient descent method is used to update the parameters β, η, ξ:

$$[\beta_c,\ \eta_c,\ \xi_c] = [\beta_p,\ \eta_p,\ \xi_p] - \delta\, \nabla_{\beta,\eta,\xi} L(\beta_p, \eta_p, \xi_p)$$

in the above formula, β_c, η_c, ξ_c represent the parameters of the current time slot, β_p, η_p, ξ_p represent the parameters of the previous time slot, and 0 < δ ≤ 1 is the update step.
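A sketch of the stochastic gradient descent update of claim 5, written for the illustrative LinearQ form used in the earlier sketches, so the gradients below are those of that assumed linear expression rather than of the exact patented formula.

```python
import numpy as np

def sgd_update(q, theta, prev_a, a_t, target, delta=0.01):
    # One semi-gradient descent step on the squared error (y(t) - Q(x(t), a(t)))^2,
    # moving the previous-slot parameters (beta_p, eta_p, xi_p) stored in q to the
    # current-slot values (beta_c, eta_c, xi_c).
    theta, prev_a, a_t = map(np.asarray, (theta, prev_a, a_t))
    err = target - q.value(theta, prev_a, a_t)          # y(t) - Q(x(t), a(t))
    # Gradients of the assumed LinearQ form with respect to beta, eta and xi:
    q.beta = q.beta + delta * err * q.w1
    q.eta = q.eta + delta * err * q.w1 * theta * a_t
    q.xi = q.xi + delta * err * q.w2 * np.abs(a_t - prev_a)
```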
CN201811634918.6A 2018-12-29 2018-12-29 Value function approximation-based cooperative caching method for codes of small stations of ultra-dense heterogeneous network Active CN109617991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811634918.6A CN109617991B (en) 2018-12-29 2018-12-29 Value function approximation-based cooperative caching method for codes of small stations of ultra-dense heterogeneous network

Publications (2)

Publication Number Publication Date
CN109617991A CN109617991A (en) 2019-04-12
CN109617991B true CN109617991B (en) 2021-03-30

Family

ID=66015366

Country Status (1)

Country Link
CN (1) CN109617991B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110138836B (en) * 2019-04-15 2020-04-03 北京邮电大学 Online cooperative caching method based on optimized energy efficiency
CN110381540B (en) * 2019-07-22 2021-05-28 天津大学 Dynamic cache updating method for responding popularity of time-varying file in real time based on DNN
CN111311996A (en) * 2020-03-27 2020-06-19 湖南有色金属职业技术学院 Online education informationization teaching system based on big data
CN112218337B (en) * 2020-09-04 2023-02-28 暨南大学 Cache strategy decision method in mobile edge calculation
CN112672402B (en) * 2020-12-10 2022-05-03 重庆邮电大学 Access selection method based on network recommendation in ultra-dense heterogeneous wireless network
CN112911717B (en) * 2021-02-07 2023-04-25 中国科学院计算技术研究所 Transmission method for MDS (data packet System) encoded data packet of forwarding network
CN113132466B (en) * 2021-03-18 2022-03-15 中山大学 Multi-access communication method, device, equipment and medium based on code cache
CN115118728B (en) * 2022-06-21 2024-01-19 福州大学 Edge load balancing task scheduling method based on ant colony algorithm

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103929781A (en) * 2014-04-09 2014-07-16 东南大学 Cross-layer interference coordination optimization method in super dense heterogeneous network
CN104782172A (en) * 2013-09-18 2015-07-15 华为技术有限公司 Small station communication method, device and system
CN104955077A (en) * 2015-05-15 2015-09-30 北京理工大学 Heterogeneous network cell clustering method and device based on user experience speed
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN108882269A (en) * 2018-05-21 2018-11-23 东南大学 The super-intensive network small station method of switching of binding cache technology
CN110445825A (en) * 2018-05-04 2019-11-12 东南大学 Super-intensive network small station coding cooperative caching method based on intensified learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5857405B2 (en) * 2010-12-28 2016-02-10 ソニー株式会社 Information processing apparatus, playback control method, program, and content playback system

Non-Patent Citations (1)

Title
Group-based resource allocation in two-tier OFDMA femtocell networks; Zhang Haibo; Journal of Electronics & Information Technology; 2016-02-29; Vol. 38, No. 2; pp. 262-268 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant