CN109617991A - Coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation - Google Patents
Coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation
- Publication number
- CN109617991A CN109617991A CN201811634918.6A CN201811634918A CN109617991A CN 109617991 A CN109617991 A CN 109617991A CN 201811634918 A CN201811634918 A CN 201811634918A CN 109617991 A CN109617991 A CN 109617991A
- Authority
- CN
- China
- Prior art keywords
- state
- small station
- time slot
- base station
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/10—Flow control between communication endpoints
- H04W28/14—Flow control between communication endpoints using intermediate storage
Abstract
The invention discloses a coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation. A reinforcement learning method with value function approximation is adopted: the value function is expressed as a function of state and action, and the optimization objective is to maximize the average accumulated number of file requests served directly by the small stations. Through continuous interaction with the environment, the method adapts to the environment's dynamic changes, mines the latent transfer pattern of file requests, and obtains an approximate expression of the value function, from which a cooperative caching decision matched to the file-request transfer pattern is derived. The macro base station encodes the cooperative caching decision and delivers the coded cooperative caching result to each small station. Because the present invention mines the transfer pattern of file requests in the live network through reinforcement learning to formulate caching decisions, it requires no assumption about the prior distribution of the data and is therefore better suited to real systems. Moreover, by interacting with the environment in real time, it can track time-varying file popularity and make the corresponding caching policies; the process is simple and feasible and requires no solution of an NP-hard problem.
Description
Technical field
The invention belongs to the technical field of wireless network deployment in mobile communications, and in particular relates to a coded cooperative caching method for small stations in ultra-dense heterogeneous networks.
Background technique
With the popularization of intelligent terminals and the development of Internet services, ultra-dense heterogeneous networks are expected to become one of the key technologies of the fifth-generation mobile communication system (5G) in order to satisfy users' requirements for high data rates and high quality of service. By densely deploying small stations within macro base station coverage, the communication quality of network-edge users can be effectively improved, thereby raising spectral efficiency and system throughput. However, since the small stations connect to the macro base station over wireless backhaul links, densely deployed small stations put enormous pressure on those links, and the heavily loaded wireless backhaul becomes the network bottleneck. The ultra-dense network architecture urgently needs to be combined with other network architectures or technologies to serve users better, and mobile network edge computing is a suitable choice. Edge storage is an important concept in the mobile edge architecture: caching files at the small stations reduces the massive data transfers of peak periods, effectively relieving the system backhaul load, lowering transmission delay, and improving user experience. In an ultra-dense heterogeneous network the small stations are numerous and closely spaced, and a user usually lies within the coverage of several small stations; if the small stations transmit files to the user cooperatively, their limited cache space can be utilized more fully. The edge caching problem in ultra-dense heterogeneous networks is therefore worth deep study.
Existing caching techniques usually model the caching decision as an optimization problem. First, they often assume that file popularity does not change over time, whereas popularity in a real network changes constantly; methods that solve an optimization problem under constant popularity cannot track this continuous variation, so the resulting caching decisions do not fit the real network well. Second, even if constant popularity is replaced by instantaneous popularity, the optimization problem must be re-solved every time popularity changes, incurring huge network overhead; moreover the modeled problem is often NP-hard (non-deterministic polynomial-time hard) and extremely difficult to solve. Finally, caching inherently makes decisions based on file-request behavior that has already occurred in the network in order to prepare for requests yet to come; methods that formulate caching decisions by solving a conventional optimization problem cannot mine the transfer pattern of file requests, so the resulting decisions are not optimal for the file requests about to occur.
Summary of the invention
In order to solve the technical problems raised in the background above, the present invention provides a coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation; the value-function-approximation method mines the latent transfer pattern of file requests and obtains a cooperative caching policy superior to conventional methods.
In order to achieve the above technical purposes, the technical solution of the present invention is as follows:
In the coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation, a macro base station together with the small stations within its coverage acts as the machine: the macro base station determines the small-station action to be executed in each time slot's state and distributes it to every small station, and each small station executes the action. The state comprises the file popularity of the current time slot and the cooperative caching decision made in the previous time slot; the action is the cooperative caching decision made in the current time slot to serve the file requests of the next time slot. A reinforcement learning method with value function approximation is adopted: the value function is expressed as a function of state and action, the optimization objective is to maximize the average accumulated number of file requests served directly by the small stations, and through continuous interaction with the environment the method adapts to its dynamic changes, mines the latent file-request transfer pattern, obtains an approximate expression of the value function, and thereby obtains a cooperative caching decision matched to that pattern. The macro base station encodes the cooperative caching decision and delivers the coded cooperative caching result to each small station.
Further, the method comprises the following steps:
Step 1: acquire the network information and set parameters.
Acquire the macro base station set M, the small station set P, and the file request set C1 in the network, and the number of small stations p_m, m ∈ M, within the coverage of the m-th macro base station. Obtain the small station cache space K, which the operator determines according to network operation conditions and hardware cost. The operator divides one period into T time slots according to the file-request situation in the ultra-dense heterogeneous network and sets the start time of each slot; each slot is divided in order of occurrence into three phases: the file transmission phase, the information exchange phase, and the caching decision phase.
Step 2: formulate the base station cooperative caching scheme based on MDS coding.
Denote the cooperative caching decision vector of the small stations by a(t); each element a_c(t) ∈ [0,1], c ∈ C1, represents the fraction of the c-th file cached at a small station in slot t. The set of files with a_c(t) ≠ 0 is the set of files cached in slot t, denoted C'(t). The c-th file contains B information bits, from which the m-th macro base station generates check bits by MDS coding:
In the above formula, d is the number of small stations whose received signal power exceeds a threshold, which the operator determines according to network operation conditions. All generated check bits are divided into two parts, small station candidate bits and macro base station candidate bits; the small station candidate bits comprise p_m·B bits, i.e., each small station has B mutually non-overlapping candidate bits, and in slot t each small station selects the first a_c(t)·B bits from its own candidate bits to cache.
The macro base station arbitrarily selects (1 − d·a_c(t))·B bits from its candidate bits to cache. By the MDS coding property, a file request that obtains at least B check bits can recover the entire file.
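As an illustration of the bookkeeping in steps 2 and 3, the following sketch computes how many coded bits a request gathers at the edge and how many must cross the backhaul. It is only a toy accounting of the MDS property (any B distinct check bits recover the file), not an actual MDS encoder; the function name and the numeric values are illustrative.

```python
def cache_allocation(B, d, a_c):
    """Toy accounting for the MDS-coded split: each of the d covering small
    stations caches a_c*B of its own candidate check bits, so a request
    collects d*a_c*B coded bits at the edge; by the MDS property any B
    distinct check bits recover the file, so the macro base station only
    ships the shortfall (the backhaul link load of step 3)."""
    from_edge = min(B, d * a_c * B)         # bits served directly by small stations
    from_macro = max(0.0, B - d * a_c * B)  # bits the macro must transmit
    return from_edge, from_macro

edge, macro = cache_allocation(B=1000, d=2, a_c=0.25)
# 2 stations x 250 bits = 500 from the edge, 500 over the backhaul
```

When d·a_c ≥ 1 the shortfall is zero and the request is served entirely by the small stations, which is exactly the event the reward in step 4 counts.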
Step 3: formulate the base station cooperative transmission scheme.
Each user file request first obtains d·a_c(t)·B bits from the d small stations covering the user. If d·a_c(t) ≥ 1, the macro base station need not transmit any data; otherwise the macro base station selects from the d small stations the one closest to the user and transmits (1 − d·a_c(t))·B bits to that small station, which forwards them to the user. The data transmitted by the macro base station is called the backhaul link load.
Step 4: describe the reinforcement learning task as a Markov decision process (MDP).
Establish the reinforcement learning four-tuple ⟨X, A, P, R⟩, where X is the state space, A the action space, P the state transition probability (the probability of transferring to state x' when action a is executed in state x), and R the reward brought by the transition.
The concrete form of the reinforcement learning four-tuple is as follows.
Action space: since the number of elements in the caching decision vector equals the number of elements C of the set C1, the action space is a C-dimensional continuous space. Each dimension a_c(t) is quantized into L discrete values, L being determined by the operator according to the macro station's computing capability. The discretized action space is then A = {a_1, a_2, …, a_|A|}, where any action vector a_j, j ∈ {1, 2, …, |A|}, must satisfy the cache-space constraint; the total number of action vectors satisfying the constraint is |A|, and the caching decision of slot t satisfies a(t) ∈ A.
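For small C and L the discretized action space can be enumerated directly. The patent's feasibility constraint is given by a formula not reproduced in this text; the sketch below assumes it is the natural cache-space constraint that the cached fractions sum to at most K.

```python
from itertools import product

def feasible_actions(C, L, K):
    """Enumerate the discretized action space A: each of the C dimensions is
    quantized to multiples of 1/L, and a vector is kept when the (assumed)
    cache-space constraint sum(a) <= K holds."""
    grid = [i / L for i in range(L + 1)]
    return [a for a in product(grid, repeat=C) if sum(a) <= K + 1e-9]

A = feasible_actions(C=2, L=2, K=1)
# with C=2, L=2, K=1 there are |A| = 6 feasible action vectors
```

|A| grows as (L+1)^C, which is one reason the patent later avoids Q-table methods and approximates the Q function instead.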
State space: in slot t, the total numbers of file requests at the p_m small stations within the m-th macro station's coverage are recorded as the vector N(t) = [N_1(t), N_2(t), …, N_C(t)], and the overall file popularity as the vector Θ(t) = [θ_1(t), θ_2(t), …, θ_C(t)], where θ_c(t) = N_c(t)/∑_{c'∈C1} N_{c'}(t), c ∈ C1. The state of slot t is then x(t) = [Θ(t), a(t−1)]. Let H = {Θ_1, Θ_2, …, Θ_|H|} be the set of overall file popularities; after quantization, Θ(t) is an element of the set H, and the state space is X = {x_1, x_2, …, x_|H||A|}, with x(t) ∈ X.
State transition probability: after action a(t) is executed in slot t and applied to the current state x(t), the environment moves from the current state to the next state x(t+1) with a latent transition probability, which is unknown.
Reward: while transferring to x(t+1), the environment gives the machine a reward, defined here as the number of file requests served directly by the small stations:
In the above formula, u[·] is the step function; one term counts the number of files that must be transmitted to update the small station caches in the caching decision phase of slot t, and the other the number of files transmitted by the macro base station in the information exchange phase of slot (t+1).
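The reward formula itself is not reproduced in this text, so the sketch below keeps only its leading part: it counts the requests that the small stations serve without macro help via the step function u[·]. The cache-update and macro-transmission penalty terms are omitted, and the boundary convention d·a_c ≥ 1 is an assumption.

```python
def reward_sketch(N_next, a, d):
    """Count the file requests served directly by the small stations: the
    request count N_next[c] contributes when the cached fraction satisfies
    d*a_c >= 1 (the u[.] step); the penalty terms of the patent's full
    reward formula are omitted in this sketch."""
    u = lambda x: 1 if x >= 0 else 0       # step function u[.] (assumed boundary)
    return sum(n * u(d * ac - 1) for n, ac in zip(N_next, a))

r = reward_sketch(N_next=[10, 5, 3], a=[0.5, 0.2, 0.0], d=2)
# only file 0 has d*a_c >= 1, so r = 10
```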
Step 5: state the reinforcement learning objective.
Define the deterministic policy function π(x), x ∈ X; under this policy the action to be executed in state x(t) is known, a(t) = π(x(t)). The state value function is then:
In the above formula, the value represents the accumulated reward obtained by following policy π from state x(t), and the discount factor 0 ≤ γ < 1 measures the degree to which the action π(x(t)) executed in slot t influences future states.
With the state value function in hand, the state-action value function, i.e., the Q function, is obtained:
In the above formula, the value represents the accumulated reward obtained by executing action a'(t) from state x(t) and then following policy π.
Replacing x(t), x(t+1), a'(t) by x, x', a respectively, the objective is to find the policy that maximizes the expected accumulated reward, denoted π*(x); under the optimal policy the optimal value function is obtained:
That is:
Step 6: formulate the Q-learning process based on value function approximation.
(601) Represent the Q function by value function approximation, i.e., express the Q function as a function of state and action. Inspired by the instantaneous reward, when action a'(t) is executed in state x(t) the Q function is approximated as:
In the above formula, ω_1 and ω_2 are the weights of the two parts, with ω_1 ≫ ω_2; β, η_i, and ξ_i are unknown parameters to be obtained by learning.
(602) Solve for the cooperative caching decision.
(603) Establish the target of Q-learning:
According to the above formula, the true value of the accumulated reward brought by executing action a(t) in state x(t) is calculated:
In the above formula, the second term uses the action estimated in state x(t+1).
(604) Define the loss function:
In the above formula, η = [η_1, η_2, …, η_C], ξ = [ξ_1, ξ_2, …, ξ_C], and E_π denotes expectation with respect to policy π. The parameters β, η, ξ are updated according to the loss function.
Step 7: set the current slot t = 1, randomly set the initial state x(t) = [Θ(t), a(t−1)], and set the initial parameter values β_p = 0, η_p = 0, ξ_p = 0. The operator sets the value of γ, in the range [0, 1), according to the speed of network change, determines the update step size δ, in the range (0, 1], according to the order of magnitude of the parameters being updated, and sets the number of training slots t_total according to the network size.
Step 8: in the caching decision phase of slot t, use the ε-greedy strategy to choose the cooperative caching decision a(t) to be executed in state x(t).
Step 9: the macro base station MDS-encodes the files to be cached according to step 2 and transmits the coded packets to the small stations for caching.
Step 10: in the file transmission phase of slot t+1, users request files and the base stations serve them by cooperative transmission according to step 3.
Step 11: in the information exchange phase of slot t+1, all small stations within each macro base station's coverage report their file request counts of slot t+1 to the macro base station; the macro base station aggregates the total request counts into the vector N(t+1) and computes the overall file popularity vector Θ(t+1).
Step 12: the state transferred to is x(t+1) = [Θ(t+1), a(t)]; compute the reward function.
Step 13: estimate the action to be executed in state x(t+1).
Step 14: update the parameters in the Q-function approximation formula according to step (604).
Step 15: if t = t_total, stop training and go to step 16; otherwise set t = t+1, enter the next slot, return to step 8, and continue training.
Step 16: from slot t onward, determine the cooperative caching decision from the Q-function approximation formula obtained by training, serving the file requests of the next slot.
Further, in step 3, d is determined as follows.
Let p_d' be the probability that a user is served by d' small stations. Based on the operator's base station deployment, p_d' is computed from historical user-position data: within a period τ, record the position of each of U users every τ' time interval, τ and τ' being determined by the operator according to network operation conditions; for user u ∈ {1, 2, …, U}, record at each position the number d' of base stations whose received signal power exceeds a threshold, and count the number of positions at which the base station count is d'. Using the historical positions of the U users, compute:
In the above formula, the count denotes the number of positions in user u's history at which i base stations can provide service for user u.
Then choose d as the d' that maximizes the probability value p_d'.
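The empirical rule above can be sketched as follows; the history is toy data standing in for the τ/τ' measurement campaign.

```python
from collections import Counter

def choose_d(coverage_history):
    """Step-3 rule for d: estimate p_d' as the empirical frequency of each
    covering-station count d' over all recorded user positions, then pick
    the d' with the largest probability."""
    counts = Counter(coverage_history)
    total = sum(counts.values())
    probs = {dp: n / total for dp, n in counts.items()}
    return max(probs, key=probs.get), probs

d, probs = choose_d([2, 3, 2, 2, 4, 3, 2, 2])
# d' = 2 covers 5 of the 8 recorded positions, so d = 2 with p_2 = 0.625
```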
Further, in step (602): since ω_1 ≫ ω_2, the second part is omitted, giving the caching decision:
The above formula is solved as follows:
1. Determine the maximum element of the caching decision vector from l_max·d/L ≥ 1, where l_max is the numerator of the greatest element; since within the range satisfying the inequality the smaller l_max the better, l_max = ⌈L/d⌉, where ⌈·⌉ denotes rounding up.
2. According to the small station cache space, compute the number z_i of occurrences of each element value i/L, i = 1, 2, …, l_max, in the caching decision vector:
where ⌊·⌋ denotes rounding down.
3. Determine the position of each element: sort the coefficients η_i·θ_i(t), i = 1, 2, …, C, in descending order; the j-th element after sorting corresponds to the h_j-th file before sorting. First tentatively assign each element's position; then adjust the elements satisfying the condition 1 − l_max·d/L < 0, from the last index down to j = 1, repeating the following step to adjust the action vector: find the smallest j' satisfying the stated conditions, subtract 1/L from its element, and add 1/L to the other.
The same solution method is used for the estimation in step 13.
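The three-step procedure above can be approximated by a simple greedy pass: cap every fraction at l_max/L with l_max = ⌈L/d⌉, then pour the cache budget into files in descending order of η_i·θ_i(t). Since the exact z_i placement formulas are not reproduced in this text, this is a hedged approximation of the procedure, not a transcription.

```python
import math

def greedy_cache_decision(scores, d, L, K):
    """Greedy sketch of step (602): l_max = ceil(L/d) is the smallest cap
    with l_max*d/L >= 1; fractions are multiples of 1/L, and the budget K
    is spent on files in descending order of score eta_i*theta_i(t)."""
    l_max = math.ceil(L / d)
    cap = l_max / L
    a = [0.0] * len(scores)
    budget = K
    for i in sorted(range(len(scores)), key=lambda i: -scores[i]):
        give = math.floor(min(cap, budget) * L) / L   # quantize to the 1/L grid
        a[i] = give
        budget -= give
        if budget < 1 / L:
            break
    return a

a = greedy_cache_decision(scores=[0.5, 0.9, 0.1], d=2, L=4, K=1.0)
# cap = 0.5: the two highest-score files get 0.5 each, the third gets 0
```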
Further, in step 8: with probability 1 − ε, choose the cooperative caching decision according to step (602); with probability ε, randomly choose a cooperative caching decision satisfying the feasibility conditions.
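The ε-greedy rule of step 8 in a minimal form; the feasible-action list and the ε value are illustrative.

```python
import random

def epsilon_greedy(greedy_action, feasible, eps, rng):
    """With probability 1-eps exploit the step-(602) decision; with
    probability eps explore a uniformly random feasible caching decision."""
    return rng.choice(feasible) if rng.random() < eps else greedy_action

rng = random.Random(0)
feasible = [(0.0, 0.5), (0.5, 0.0), (0.5, 0.5)]
picks = [epsilon_greedy((0.5, 0.5), feasible, eps=0.1, rng=rng)
         for _ in range(1000)]
# the greedy decision dominates; exploration occurs on roughly 10% of slots
```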
Further, in step (604), the parameters β, η, ξ in the Q-function approximation expression are updated by stochastic gradient descent:
In the above formulas, β_c etc. denote the parameters of the current slot, β_p etc. the parameters of the previous slot, and 0 < δ ≤ 1 is the update step size.
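The specific β/η/ξ update formulas are not reproduced in this text, so the sketch below shows the generic stochastic-gradient TD update they instantiate, for a linear approximation Q(x, a) = w·φ(x, a); all names are illustrative.

```python
def sgd_td_update(w, phi, r, gamma, q_next, delta):
    """One stochastic-gradient step on the squared TD error: the target
    r + gamma*Q(x', a') minus the estimate w . phi gives the TD error, and
    each weight moves by delta * td_error * its feature (the gradient of
    the linear Q with respect to that weight)."""
    q = sum(wi * pi for wi, pi in zip(w, phi))
    td_error = r + gamma * q_next - q
    return [wi + delta * td_error * pi for wi, pi in zip(w, phi)]

w = sgd_td_update(w=[0.0, 0.0], phi=[1.0, 2.0], r=1.0,
                  gamma=0.9, q_next=0.0, delta=0.1)
# td_error = 1.0, so the weights move to [0.1, 0.2]
```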
The above technical scheme brings the following beneficial effects.
The present invention serves users through small station cooperative coded caching and cooperative transmission. It formulates caching decisions by mining, through reinforcement learning, the transfer pattern of the file requests collected in the live network; as a data-driven machine learning method it makes no assumption about the prior distribution of the data and is therefore better suited to real systems. By interacting with the environment in real time it can track time-varying file popularity and make the corresponding caching policies, and the process is simple and feasible, requiring no solution of an NP-hard problem.
The present invention formulates cooperative caching decisions based on value function approximation: through continuous interaction with the environment, the macro base station collects state information, makes the corresponding cooperative caching decision, and delivers it to each small station, efficiently using the small stations' limited storage space to cache the most suitable files, significantly increasing the number of file requests served directly by the small stations, and reducing the system backhaul link load.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the attached drawing.
The present invention proposes a coded cooperative caching method for small stations in ultra-dense heterogeneous networks based on value function approximation, whose objective is to maximize the average accumulated number of file requests served directly by the small stations under the constraint that the total size of the files cached at a small station does not exceed its cache space. The method mines the transfer pattern of file requests through reinforcement learning and formulates the small station coded cooperative caching scheme according to the mined pattern. The reinforcement learning task is described as an MDP (Markov Decision Process): a macro base station together with the small stations within its coverage acts as the machine; the macro base station determines the action to be executed and distributes it to the small stations, each small station executes the action and thereby changes the environment, and the environment feeds a reward back to the machine according to the reward function. Through continuous interaction with the environment, the machine learns the small station action to be executed in the state of each time slot. The state here is the partial description of the environment observed by the macro base station, comprising the file popularity of the current slot and the cooperative caching decision made in the previous slot; the action is the cooperative caching decision made in the current slot to serve the file requests of the next slot. The reward function is defined according to the goal of the caching decision, here the number of file requests served directly by the small stations. Value function approximation is a reinforcement learning method suited to tasks over huge discrete state spaces or continuous state spaces: the value function is expressed as a function of state and action, the optimization objective is to maximize the average accumulated number of file requests served directly by the small stations, and through continuous interaction with the environment the method adapts to its dynamic changes, mines the latent file-request transfer pattern, obtains an approximate expression of the value function, and thereby obtains a cooperative caching decision matched to that pattern. The macro base station, combining the MDS (Maximum Distance Separable) coding method, encodes the files and finally delivers the coded cooperative caching result to each small station, significantly increasing the number of file requests served directly by the small stations and reducing the system backhaul link load.
An embodiment is given below taking an LTE-A system as an example; as shown in Fig. 1, the specific steps are as follows:
Step 1: acquire the network information and set parameters.
Acquire the macro base station set M, the small station set P, and the file request set C1 in the network, and the number of small stations p_m, m ∈ M, within the coverage of the m-th macro base station; the set C1 contains C files. Obtain the small station cache space K, which the operator determines according to network operation conditions and hardware cost. The operator divides one period into T time slots according to the file-request situation in the ultra-dense heterogeneous network and sets the start time of each slot; each slot is divided in order of occurrence into three phases: the file transmission phase, the information exchange phase, and the caching decision phase.
Step 2: formulate the base station cooperative caching scheme based on MDS coding.
Denote the cooperative caching decision vector of the small stations by a(t) = [a_1(t), a_2(t), …, a_C(t)], where 0 ≤ a_c(t) ≤ 1, c ∈ C1, represents the fraction of the c-th file cached at a small station in slot t. The set of files with a_c(t) ≠ 0 (i.e., the files cached in slot t) is denoted C'(t). File c contains B information bits, from which macro base station m generates check bits by MDS coding:
where d is the number of small stations whose received signal power exceeds a threshold, which the operator determines according to network operation conditions. All generated check bits are divided into two parts, small station candidate bits and macro base station candidate bits; the small station candidate bits comprise p_m·B bits, i.e., each small station has B mutually non-overlapping candidate bits, and in slot t each small station selects the first a_c(t)·B bits from its own candidate bits to cache.
The macro base station arbitrarily selects (1 − d·a_c(t))·B bits from its candidate bits to cache. By the MDS coding property, a file request that obtains at least B check bits can recover the entire file.
Step 3: formulate the base station cooperative transmission scheme.
Each user file request first obtains d·a_c(t)·B bits from the d small stations covering the user. If d·a_c(t) ≥ 1, the macro base station need not transmit any data; otherwise the macro base station selects from the d small stations the one closest to the user and transmits (1 − d·a_c(t))·B bits to that small station, which forwards them to the user. The data transmitted by the macro base station is called the backhaul link load. d is determined as follows:
Let p_d' be the probability that a user is served by d' small stations. Based on the operator's base station deployment, p_d' is computed from historical user-position data: within a period τ, record the position of each of U users every τ' time interval, τ and τ' being determined by the operator according to network operation conditions; for user u ∈ {1, 2, …, U}, record at each position the number d' of base stations whose received signal power exceeds a threshold, and count the number of positions at which the base station count is d'. Using the historical positions of the U users, compute:
where the count denotes the number of positions in user u's history at which i base stations can provide service for user u.
Choose d as the d' that maximizes the probability value p_d'.
Step 4: describe the reinforcement learning task as an MDP with four-tuple ⟨X, A, P, R⟩, where X is the state space, A the action space, P the state transition probability (the probability of transferring to state x' when action a is executed in state x), and R the reward brought by the transition.
The concrete form of the reinforcement learning four-tuple in this problem is as follows.
1. Action space: the action is defined as the cooperative caching decision vector of the small stations, and the actions the machine can take constitute the action space. Since the number of elements in the caching decision vector equals the number of files C, the action space here is a C-dimensional continuous space; each dimension 0 ≤ a_c ≤ 1, c ∈ C1, is quantized into L discrete values, L being determined by the operator according to the macro station's computing capability. The discretized action space is then A = {a_1, a_2, …, a_|A|}, where any action vector a_j, j ∈ {1, 2, …, |A|}, must satisfy the cache-space constraint; the total number of action vectors satisfying the constraint is |A|, and the caching decision of slot t satisfies a(t) ∈ A.
2. State space: the state is the machine's description of its perceived environment, composed of the file popularity vector and the small stations' cooperative caching decision vector. In slot t, the total numbers of file requests at the p_m small stations within the m-th macro station's coverage are recorded as the vector N(t) = [N_1(t), N_2(t), …, N_C(t)], and the overall file popularity as the vector Θ(t) = [θ_1(t), θ_2(t), …, θ_C(t)], where θ_c(t) = N_c(t)/∑_{c'∈C1} N_{c'}(t), c ∈ C1. The state of slot t is then x(t) = [Θ(t), a(t−1)]. Let H = {Θ_1, Θ_2, …, Θ_|H|} be the set of overall file popularities; after quantization, Θ(t) is an element of the set H, and the state space is X = {x_1, x_2, …, x_|H||A|}, with x(t) ∈ X.
3. State transition probability: after action a(t) is executed in slot t and applied to the current state x(t), the environment moves from the current state to the next state x(t+1) with a latent transition probability, which is unknown.
4. Reward: while transferring to x(t+1), the environment gives the machine a reward, defined here as the number of file requests served directly by the small stations:
where u[·] is the step function, equal to 1 when the value in brackets is greater than 0 and 0 otherwise; one term is the number of files that must be transmitted to update the small station caches in the caching decision phase of slot t, and the other the number of files transmitted by the macro base station in the information exchange phase of slot (t+1).
Step 5: state the reinforcement learning objective.
Define the deterministic policy function π(x), x ∈ X; under this policy the action to be executed in state x(t) is known, a(t) = π(x(t)). Define the state value function as the expected γ-discounted accumulated reward:
where E_π denotes expectation with respect to policy π, the value represents the accumulated reward obtained by following policy π from state x(t), and the discount factor 0 ≤ γ < 1 measures the degree to which the action π(x(t)) executed in slot t influences future states.
With the state value function in hand, the state-action value function (the Q function) can be obtained:
whose value represents the accumulated reward obtained by executing action a'(t) from state x(t) and then following policy π. Formulas (4) and (5) are known as the Bellman equations.
Replacing x(t), x(t+1), a'(t) by x, x', a respectively, the objective is to find the policy that maximizes the expected accumulated reward, denoted π*(x); under the optimal policy, formulas (4) and (5) give the optimal value function:
That is:
Formulas (6) and (7) reveal how a non-optimal policy is improved, namely by changing the action the policy selects into the currently optimal action:
When the reinforcement learning four-tuple is known, the value iteration algorithm or the policy iteration algorithm can solve the Bellman equations based on formula (8) and obtain the optimal policy.
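For contrast with the model-free setting of step 6, the following toy value iteration solves the Bellman optimality equation when the transition probabilities are known; the two-state MDP and the data structures P and R are illustrative, not drawn from the patent.

```python
def value_iteration(P, R, gamma, iters=200):
    """Value iteration for the Bellman optimality equation (8) when the
    model IS known: P[s][a] is a list of (prob, next_state) pairs and
    R[s][a] the immediate reward; repeated Bellman backups converge to the
    optimal value function, from which the greedy policy is read off."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * V[sp] for p, sp in P[s][a])
                 for a in range(n_a)) for s in range(n_s)]
    policy = [max(range(n_a),
                  key=lambda a: R[s][a] + gamma * sum(p * V[sp]
                                                      for p, sp in P[s][a]))
              for s in range(n_s)]
    return V, policy

# two states, two actions: action 1 in state 0 pays 1 and stays in state 0
P = [[[(1.0, 1)], [(1.0, 0)]],
     [[(1.0, 1)], [(1.0, 1)]]]
R = [[0.0, 1.0], [0.0, 0.0]]
V, pi = value_iteration(P, R, gamma=0.5)
# V[0] converges to 1/(1-0.5) = 2 and the optimal policy picks action 1
```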
Step 6: Q-learning based on value function approximation when the state transition probability is unknown:
Since the state transition probability is unknown, the optimal policy cannot be obtained by policy iteration or value iteration; for the same reason, converting the state value function into the Q function is difficult, so the Q function is estimated directly.
1. Q function approximation: to avoid the storage and traversal-search difficulty of a Q-table caused by the large state and action spaces, the Q function is represented by value function approximation, i.e., expressed as a function of the state and the action. Inspired by the instantaneous reward, taking time slot t as an example, in state x(t) with action a'(t), the Q function approximation is expressed as:
where ω1 and ω2 are the weights of the two parts, with ω1 >> ω2; β, ηi, and ξi are unknown parameters that must be obtained by fitting.
2. Selection of the cooperative caching decision:
Since ω1 >> ω2, the second part is omitted, yielding the caching decision:
Equation (11) seeks the cooperative caching strategy that maximizes the value inside the brackets. From that expression, the factor (1-da'i(t)) multiplied by ηiθi(t) directly determines the bracketed value: the larger ηiθi(t) is, the smaller the corresponding (1-da'i(t)) should be in order to make the bracketed value larger. The solution procedure of Equation (11) is therefore as follows:
1. Determine from lmaxd/L ≥ 1 the maximum value of the elements in the caching decision vector, where lmax is the numerator of the largest element. Since, within the range satisfying the inequality, the smaller lmax is the better, lmax = ⌈L/d⌉, where ⌈·⌉ denotes rounding up.
2. Compute the number zi of occurrences of each element value i/L, i = 1, 2, ..., lmax, in the caching decision vector:
where ⌊·⌋ denotes rounding down.
3. Determine the position of each element: sort the coefficients ηiθi(t), i = 1, 2, ..., C, in descending order; the j-th coefficient after sorting corresponds to the hj-th file before sorting. First, preliminarily determine the position of each element; then adjust the elements satisfying the condition 1-lmaxd/L < 0: starting from the last sorted position down to j = 1, repeat the following step to adjust the elements of the action vector: find the minimum j' satisfying the stated conditions, subtract 1/L from the former element, and add 1/L to the latter.
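The three-step procedure above amounts to a greedy allocation. Since the exact formula for the counts zi is given by an equation not reproduced in the text, the sketch below simply fills the cache budget K with the largest admissible value lmax/L, file by file, in descending order of the coefficients ηiθi(t); the function name and simplification are illustrative, not the patent's exact algorithm:

```python
import math

def cache_decision(coef, L, d, K):
    """Greedy sketch of the Equation (11) solver (illustrative simplification).

    coef : list of eta_i * theta_i(t), one value per file
    L    : quantization level; elements of a(t) are multiples of 1/L
    d    : number of covering small stations
    K    : cache budget; the sum of the elements of a(t) must not exceed K
    """
    C = len(coef)
    l_max = math.ceil(L / d)      # step 1: smallest numerator with l_max*d/L >= 1
    a = [0] * C                   # decision vector, in units of 1/L
    budget = K * L                # total 1/L units available
    # step 3: files with larger eta_i*theta_i(t) get larger cached fractions
    for i in sorted(range(C), key=lambda i: -coef[i]):
        take = min(l_max, budget)
        a[i] = take
        budget -= take
        if budget == 0:
            break
    return [x / L for x in a]

a = cache_decision([0.5, 0.3, 0.2], L=4, d=2, K=1)
print(a)   # the two most popular files each get fraction l_max/L = 0.5
```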
3. The objective of Q-learning:
Substituting Equation (6) into Equation (5) gives:
Equation (14) gives the method for computing the true value of the cumulative reward brought by executing action a(t) in state x(t):
where the second term is the estimated value of the action at state x(t+1), estimated according to item 2 of this step.
Define the loss function:
where the parameter vectors are η = [η1, η2, ..., ηC] and ξ = [ξ1, ξ2, ..., ξC]. The objective of Q-learning is to drive the estimated Q value toward the true value, i.e., to minimize the loss function.
4. Update the parameters β, η, ξ in the Q function approximation expression by stochastic gradient descent:
where βc and the corresponding η, ξ entries denote the parameters of the current time slot, βp and the corresponding entries denote the parameters of the previous time slot, and 0 < δ ≤ 1 is the update step size.
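Items 1 to 4 amount to a linear function approximation of Q trained by temporal-difference updates. A generic sketch of this mechanism (the feature map, the stand-in reward, and the parameter vector `w` are illustrative; the patent's exact expression (10) with weights ω1, ω2 is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4                       # feature dimension (stands in for beta, eta_i, xi_i)
w = np.zeros(dim)             # parameters initialised to 0, as in Step 7
gamma, delta = 0.9, 0.05      # discount factor and update step size

def q_hat(features, w):
    """Q(x, a) approximated as a linear function of state-action features."""
    return features @ w

for _ in range(200):               # simulated transitions with random features
    phi = rng.random(dim)          # features of (x(t), a(t))
    phi_next = rng.random(dim)     # features of (x(t+1), estimated best action)
    r = phi.sum()                  # stand-in instantaneous reward
    target = r + gamma * q_hat(phi_next, w)      # "true value", as in Equation (14)
    loss_grad = (q_hat(phi, w) - target) * phi   # gradient of the squared loss
    w = w - delta * loss_grad                    # SGD update, as in Equation (17)

print(w)
```

This is the standard semi-gradient TD(0) update: only the estimate at the current state is differentiated, while the target is treated as a constant.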
Step 7: Set the current time slot t = 1, randomly set the initial state x(t) = [Θ(t), a(t-1)], and set the initial parameter values βp = 0, ηp = 0, ξp = 0. The operator sets the value of γ according to the speed of network change, in the range [0, 1); determines the value of δ according to the order of magnitude of the parameters to be updated, in the range (0, 1]; and sets the number of training time slots ttotal according to the network size.
Step 8: In the caching decision stage of time slot t, select the cooperative caching decision a(t) to be executed in state x(t) by the ε-greedy strategy: with probability 1-ε, select the cooperative caching decision according to item 2 of Step 6; with probability ε, randomly select a cooperative caching decision satisfying the stated constraints.
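The exploration rule of Step 8 can be sketched as follows (the greedy decision from Step 6 and the random feasible decision are stand-in callables, not the patent's actual solvers):

```python
import random

def epsilon_greedy(greedy_decision, random_feasible_decision, eps=0.1):
    """With probability 1-eps exploit the learned decision, else explore."""
    if random.random() < eps:
        return random_feasible_decision()   # random a(t) meeting the constraints
    return greedy_decision()                # decision from item 2 of Step 6

a_t = epsilon_greedy(lambda: [0.5, 0.5, 0.0],     # hypothetical greedy decision
                     lambda: [0.25, 0.25, 0.5],   # hypothetical feasible decision
                     eps=0.1)
print(a_t)
```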
Step 9: The macro base station performs MDS coding on the files to be cached according to Step 2 and transmits the coded data packets to the small stations for caching.
Step 10: In the file transmission stage of time slot (t+1), users request files, and the base stations serve the users by cooperative transmission according to Step 3.
Step 11: In the information exchange stage of time slot (t+1), all small stations within each macro base station's coverage report their file request counts of time slot (t+1) to the macro base station; the macro base station aggregates the total file request counts into the vector N(t+1) and computes the overall file popularity, denoted by the vector Θ(t+1).
Step 12: The state transferred to is x(t+1) = [Θ(t+1), a(t)]; compute the reward function according to Equation (3).
Step 13: Estimate the action to be executed at state x(t+1) according to item 2 of Step 6:
Step 14: Update the parameters in the Q function approximation expression according to Equation (17).
Step 15: If t = ttotal, stop training and proceed to Step 16; otherwise set t = t+1, enter the next time slot, return to Step 8, and continue training.
Step 16: From time slot t onward, determine the cooperative caching decision according to item 2 of Step 6 based on the trained Q function approximation expression, serving the file requests of the next time slot.
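Steps 7 to 16 form one training loop per time slot. A structural skeleton of that loop (all callbacks are placeholders for the operations defined in the steps above, not implementations of them):

```python
def train(t_total, env, select_action, mds_encode_and_push,
          compute_reward, update_parameters):
    """Skeleton of the per-time-slot loop of Steps 7-16 (callbacks are placeholders)."""
    state = env.initial_state()                     # Step 7: x(1) = [Theta(1), a(0)]
    for t in range(1, t_total + 1):
        a = select_action(state)                    # Step 8: epsilon-greedy decision
        mds_encode_and_push(a)                      # Step 9: MDS-code, push to small stations
        next_state = env.step(a)                    # Steps 10-12: serve, exchange, observe
        r = compute_reward(state, a, next_state)    # Step 12: reward of Equation (3)
        update_parameters(state, a, r, next_state)  # Steps 13-14: TD parameter update
        state = next_state                          # Step 15: advance to next time slot
    return state                                    # Step 16: trained approximation in use
```

The environment object here bundles what the patent splits across the file transmission and information exchange stages; in a real deployment these would be measurements reported by the small stations.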
From the above process it can be seen that, during Q function learning, the macro base station and the small stations within its coverage act as the machine; the file popularity and the small stations' cooperative caching decision serve as the state; the cooperative caching decision serves as the action; and the number of file requests directly served by the small stations serves as the reward function. By continuously interacting with the environment, with maximization of the cumulative reward as the objective, the Q function approximation expression is learned, and from it the cooperative caching decision in each state is obtained; the macro base station then encodes the files to be cached with MDS coding and delivers the coded results to each small station for cooperative caching. This method uses reinforcement learning to find patterns in the data, without solving an optimization problem based on an assumed data distribution. It can track file popularity that varies in real time, fully mines and exploits the latent file request transition pattern to formulate cooperative caching decisions, is better suited to real systems, significantly increases the number of file requests directly served by the small stations, effectively reduces the backhaul link load, improves the system performance, and enhances the user experience.
The embodiments merely illustrate the technical idea of the present invention and do not limit its protection scope; any change made to the technical scheme according to the technical idea proposed by the present invention falls within the protection scope of the present invention.
Claims (6)
1. A value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network, characterized in that: a macro base station and the small stations within its coverage act together as the machine; the macro base station is responsible for determining the action to be executed by the small stations in the state of each time slot and issuing it to each small station, and each small station is responsible for executing the action; the state comprises the file popularity of the current time slot and the cooperative caching decision made in the previous time slot, and the action is the cooperative caching decision made in the current time slot to serve the file requests of the next time slot; using a value-function-approximation-based reinforcement learning method, the value function is expressed as a function of the state and the action, with maximization of the average cumulative number of file requests directly served by the small stations as the optimization objective; by continuously interacting with the environment and adapting to its dynamic changes, the latent file request transition pattern is mined, the approximate expression of the value function is obtained, and in turn the cooperative caching decision matched to the file request transition pattern is obtained; the macro base station performs coding according to the cooperative caching decision and delivers the coded caching results to each small station.
2. The value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network according to claim 1, characterized by comprising the following steps:
Step 1: Acquire the network information and set the parameters:
Acquire the macro base station set M, the small station set P, the file request set C1, and the number of small stations pm, m ∈ M, within the coverage of the m-th macro base station in the network; obtain the small station cache space K, which the operator determines according to the network operation situation and the hardware cost; the operator divides one period into T time slots according to the file request situation in the ultra-dense heterogeneous network and sets the start time of each time slot; each time slot is divided, in order of occurrence, into three stages: the file transmission stage, the information exchange stage, and the caching decision stage;
Step 2: Formulate the MDS-coding-based base station cooperative caching scheme:
The cooperative caching decision vector of the small stations is denoted a(t); each element ac(t) ∈ [0,1], c ∈ C1, represents the fraction of the c-th file cached by a small station in time slot t; the set of files with ac(t) ≠ 0 is the set of files cached in time slot t, denoted C'(t); the c-th file contains B information bits, and the m-th macro base station encodes the B information bits by MDS coding to generate the following number of check bits:
In the above formula, d is the number of small stations whose received signal power exceeds a threshold, the threshold being determined by the operator according to the network operation situation. All the check bits are divided into two parts, the small station candidate bits and the macro base station candidate bits, where the small station candidate bits comprise pmB bits, i.e., each small station has B mutually non-overlapping candidate bits; in time slot t, each small station selects ac(t)B bits from its own candidate bits to cache, and the macro base station arbitrarily selects (1-dac(t))B bits from its candidate bits to cache. According to the MDS coding property, a file request that obtains at least B check bits can recover the entire file;
Step 3: Formulate the base station cooperative transmission scheme:
For each file request, the user first obtains dac(t)B bits from the d small stations covering it; if dac(t) ≥ 1, the macro base station need not transmit any data; otherwise the macro base station selects from the d small stations the one nearest to the user and transmits (1-dac(t))B bits to that small station, which then forwards these bits to the user; the data transmitted by the macro base station is referred to as the backhaul link load;
Step 4: Describe the reinforcement learning task as a Markov decision process (MDP):
Establish the reinforcement learning four-tuple, where X denotes the state space, A denotes the action space, the state transition probability denotes the probability of transferring to state x' when action a is executed in state x, and the reward denotes the return brought by that transfer;
The concrete form of the reinforcement learning four-tuple is as follows:
Action space: since the number of elements in the caching decision vector equals the number C of elements in the set C1, the action space is a C-dimensional continuous space; each dimension ac(t) is quantized into L discrete values, L being determined by the operator according to the macro station's computing capability; the discretized action space is then A = {a1, a2, ..., a|A|}, where any action vector aj, j ∈ {1, 2, ..., |A|}, must satisfy the cache space constraint; the total number of action vectors satisfying this condition is |A|, and the caching decision of time slot t satisfies a(t) ∈ A;
State space: in time slot t, the total file request counts of the pm small stations within the m-th macro station's coverage are denoted by the vector N(t) = [N1(t), N2(t), ..., NC(t)], and the overall file popularity by the vector Θ(t) = [θ1(t), θ2(t), ..., θC(t)]; the state of time slot t is then denoted x(t) = [Θ(t), a(t-1)]; let H = {Θ1, Θ2, ..., Θ|H|} be the set of quantized overall file popularities, with Θ(t) an element of the set H; the state space is then denoted X = {x1, x2, ..., x|H||A|}, and x(t) ∈ X;
State transition probability: after the action a(t) of time slot t is executed, it acts on the current state x(t), and the environment transfers from the current state to the next state x(t+1) with a latent transition probability, which is unknown;
Reward: while the environment transfers to x(t+1), it gives the machine a reward, defined here as the number of file requests directly served by the small stations:
In the above formula, u[·] denotes the unit step function, one term denotes the number of files that must be transmitted to update the small station caches in the caching decision stage of time slot t, and the other denotes the number of files transmitted by the macro base station in the information exchange stage of time slot (t+1);
Step 5: Clarify the reinforcement learning objective:
Define the deterministic policy function π(x), x ∈ X; according to this policy, the action to be executed at state x(t) is a(t) = π(x(t)), and the state value function is:
In the above formula, the value function represents the cumulative reward obtained by following policy π from state x(t), and 0 ≤ γ < 1 is a discount factor measuring the influence of the action π(x(t)) executed in time slot t on future states;
After the state value function is obtained, the state-action value function, i.e., the Q function, is obtained:
In the above formula, the Q function represents the cumulative reward obtained by executing action a'(t) in state x(t) and following policy π thereafter;
Replacing x(t), x(t+1), a'(t) with x, x', a respectively, the goal is to find the policy that maximizes the expected cumulative reward, denoted π*(x), with the corresponding optimal value function; under the optimal policy it is obtained that:
Namely:
Step 6: Formulate the Q-learning process based on value function approximation:
(601) The Q function is represented by value function approximation, i.e., expressed as a function of the state and the action; inspired by the instantaneous reward, at state x(t) with action a'(t), the Q function approximation is expressed as:
In the above formula, ω1 and ω2 are the weights of the two parts, with ω1 >> ω2; β, ηi, and ξi are unknown parameters that must be obtained through learning;
(602) Solve the cooperative caching decision:
(603) Establish the objective of Q-learning:
According to the above formula, compute the true value of the cumulative reward brought by executing action a(t) at state x(t):
In the above formula, the second term is the estimated value of the action at state x(t+1);
(604) Define the loss function:
In the above formula, η = [η1, η2, ..., ηC], ξ = [ξ1, ξ2, ..., ξC], and Eπ denotes the expectation taken with respect to policy π;
Update the parameters β, η, ξ according to the loss function;
Step 7: Set the current time slot t = 1, randomly set the initial state x(t) = [Θ(t), a(t-1)], and set the initial parameter values βp = 0, ηp = 0, ξp = 0; the operator sets the value of γ according to the speed of network change, in the range [0, 1); determines the value of the update step size δ according to the order of magnitude of the parameters to be updated, in the range (0, 1]; and sets the number of training time slots ttotal according to the network size;
Step 8: In the caching decision stage of time slot t, select the cooperative caching decision a(t) to be executed in state x(t) by the ε-greedy strategy;
Step 9: The macro base station performs MDS coding on the files to be cached according to Step 2 and transmits the coded data packets to the small stations for caching;
Step 10: In the file transmission stage of time slot t+1, users request files, and the base stations serve the users by cooperative transmission according to Step 3;
Step 11: In the information exchange stage of time slot t+1, all small stations within each macro base station's coverage report their file request counts of time slot t+1 to the macro base station; the macro base station aggregates the total file request counts into the vector N(t+1) and computes the overall file popularity, denoted by the vector Θ(t+1);
Step 12: The state transferred to is x(t+1) = [Θ(t+1), a(t)]; compute the reward function:
Step 13: Estimate the action to be executed at state x(t+1):
Step 14: Update the parameters in the Q function approximation expression according to step (604);
Step 15: If t = ttotal, stop training and proceed to Step 16; otherwise set t = t+1, enter the next time slot, return to Step 8, and continue training;
Step 16: From time slot t onward, determine the cooperative caching decision based on the trained Q function approximation expression, serving the file requests of the next time slot.
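The bit accounting of Steps 2 and 3 in claim 2 can be checked numerically: a request gathers d·ac·B coded bits from the covering small stations, and the macro base station supplies the shortfall (1-d·ac)·B over the backhaul only when d·ac < 1. A sketch under these assumptions (function name illustrative):

```python
def backhaul_load(a_c, d, B):
    """Bits the macro base station must transmit for one request of file c.

    a_c : fraction of file c cached per small station (element of a(t))
    d   : number of small stations covering the user
    B   : information bits per file; by the MDS property, any B coded
          bits suffice to recover the file
    """
    from_small = d * a_c * B                  # bits collected from the d small stations
    shortfall = max(0.0, (1 - d * a_c) * B)   # remainder fetched over the backhaul
    assert from_small + shortfall >= B        # every request is satisfiable
    return shortfall

print(backhaul_load(a_c=0.25, d=2, B=1000))   # 2*0.25 < 1, macro sends 500.0 bits
print(backhaul_load(a_c=0.5,  d=2, B=1000))   # d*a_c >= 1, no backhaul load: 0.0
```

This makes the design trade-off explicit: raising ac for popular files drives their backhaul load to zero, which is exactly what the reward function encourages.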
3. The value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network according to claim 2, characterized in that, in Step 3, the method for determining d is as follows:
Let pd' be the probability that a user is served by d' small stations; based on the operator's base station deployment, pd' is computed from historical user location data: within a period τ, the positions of U users are recorded at intervals of τ', with τ and τ' determined by the operator according to the network operation situation; for each user u ∈ {1, 2, ..., U}, record at each position the number d' of base stations whose received signal power exceeds a threshold, and denote the number of positions at which the base station number is d'; using the historical positions of the U users, compute:
In the above formula, the numerator counts the positions in the history of user u at which i base stations can provide service for user u;
Then d is chosen as the d' that maximizes the probability value pd':
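Claim 3's estimate of d is an empirical mode: count, over all recorded user positions, how often exactly d' base stations exceed the power threshold, then pick the most frequent d'. A sketch with hypothetical position records (function name and data are illustrative):

```python
from collections import Counter

def choose_d(records):
    """records[u] = list of base-station counts d' observed at user u's positions.

    Returns the d' with the highest empirical probability p_d'.
    """
    counts = Counter()
    for user_positions in records:
        counts.update(user_positions)    # pool the positions of all U users
    total = sum(counts.values())
    p = {dp: n / total for dp, n in counts.items()}   # empirical p_d'
    return max(p, key=p.get)

# hypothetical history of U = 3 users, tau'/tau giving 3 positions each
d = choose_d([[2, 2, 3], [2, 1, 2], [3, 2, 2]])
print(d)   # 2 occurs most often
```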
4. The value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network according to claim 2, characterized in that, in step (602), since ω1 >> ω2, the second part is omitted, yielding the caching decision:
The solution procedure of the above formula is as follows:
1. Determine from lmaxd/L ≥ 1 the maximum value of the elements in the caching decision vector, where lmax is the numerator of the largest element; since, within the range satisfying the inequality, the smaller lmax is the better, lmax = ⌈L/d⌉, where ⌈·⌉ denotes rounding up;
2. Compute, according to the base station cache space, the number zi of occurrences of each element value i/L, i = 1, 2, ..., lmax, in the caching decision vector:
where ⌊·⌋ denotes rounding down;
3. Determine the position of each element: sort the coefficients ηiθi(t), i = 1, 2, ..., C, in descending order; the j-th coefficient after sorting corresponds to the hj-th file before sorting; first, preliminarily determine the position of each element; then adjust the elements satisfying the condition 1-lmaxd/L < 0: starting from the last sorted position down to j = 1, repeat the following step to adjust the elements of the action vector: find the minimum j' satisfying the stated conditions, subtract 1/L from the former element, and add 1/L to the latter;
The estimate in Step 13 is likewise obtained by the above solution method.
5. The value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network according to claim 4, characterized in that, in Step 8, the cooperative caching decision is selected according to step (602) with probability 1-ε, and a cooperative caching decision satisfying the stated constraints is selected at random with probability ε.
6. The value-function-approximation-based coded cooperative caching method for small stations in an ultra-dense heterogeneous network according to claim 2, characterized in that, in step (604), the parameters β, η, ξ in the Q function approximation expression are updated by stochastic gradient descent:
In the above formula, βc and the corresponding entries denote the parameters of the current time slot, βp and the corresponding entries denote the parameters of the previous time slot, and 0 < δ ≤ 1 denotes the update step size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811634918.6A CN109617991B (en) | 2018-12-29 | 2018-12-29 | Value function approximation-based cooperative caching method for codes of small stations of ultra-dense heterogeneous network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109617991A true CN109617991A (en) | 2019-04-12 |
CN109617991B CN109617991B (en) | 2021-03-30 |
Family
ID=66015366
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110138836A (en) * | 2019-04-15 | 2019-08-16 | 北京邮电大学 | An online cooperative caching method based on optimized energy efficiency |
CN110381540A (en) * | 2019-07-22 | 2019-10-25 | 天津大学 | Dynamic cache updating method based on DNN for real-time response to time-varying file popularity |
CN111311996A (en) * | 2020-03-27 | 2020-06-19 | 湖南有色金属职业技术学院 | Online education informationization teaching system based on big data |
CN112218337A (en) * | 2020-09-04 | 2021-01-12 | 暨南大学 | Cache strategy decision method in mobile edge calculation |
CN112672402A (en) * | 2020-12-10 | 2021-04-16 | 重庆邮电大学 | Access selection method based on network recommendation in ultra-dense heterogeneous wireless network |
CN112911717A (en) * | 2021-02-07 | 2021-06-04 | 中国科学院计算技术研究所 | Method for transmitting MDS (Multi-request System) coded data packet of fronthaul network |
CN113132466A (en) * | 2021-03-18 | 2021-07-16 | 中山大学 | Multi-access communication method, device, equipment and medium based on code cache |
CN115118728A (en) * | 2022-06-21 | 2022-09-27 | 福州大学 | Ant colony algorithm-based edge load balancing task scheduling method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120166594A1 (en) * | 2010-12-28 | 2012-06-28 | Sony Corporation | Information processing apparatus, reproduction control method, program, and content reproduction system |
CN103929781A (en) * | 2014-04-09 | 2014-07-16 | 东南大学 | Cross-layer interference coordination optimization method in super dense heterogeneous network |
CN104782172A (en) * | 2013-09-18 | 2015-07-15 | 华为技术有限公司 | Small station communication method, device and system |
CN104955077A (en) * | 2015-05-15 | 2015-09-30 | 北京理工大学 | Heterogeneous network cell clustering method and device based on user experience speed |
CN106358308A (en) * | 2015-07-14 | 2017-01-25 | 北京化工大学 | Resource allocation method for reinforcement learning in ultra-dense network |
CN108882269A (en) * | 2018-05-21 | 2018-11-23 | 东南大学 | Ultra-dense network small station handover method combined with caching technology |
CN110445825A (en) * | 2018-05-04 | 2019-11-12 | 东南大学 | Reinforcement-learning-based coded cooperative caching method for small stations in an ultra-dense network |
Non-Patent Citations (2)
Title |
---|
PO-HAN HUANG: "Cross-Tier Cooperation for Optimal Resource Utilization in Ultra-Dense Heterogeneous Networks", 《IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY》 *
ZHANG HAIBO: "Group-based resource allocation in OFDMA femtocell two-tier networks", 《Journal of Electronics & Information Technology》 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||