CN106170131A - A robust hierarchical game-learning resource allocation method for layered heterogeneous networks under channel state uncertainty - Google Patents

A robust hierarchical game-learning resource allocation method for layered heterogeneous networks under channel state uncertainty

Info

Publication number
CN106170131A
Authority
CN
China
Prior art keywords
sbs
strategy
mbs
interference
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610736690.6A
Other languages
Chinese (zh)
Inventor
邵鸿翔
张建照
赵杭生
杨健
曹龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
36th Institute Of Central Military Commission Equipment Development Department
Original Assignee
36th Institute Of Central Military Commission Equipment Development Department
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 36th Institute Of Central Military Commission Equipment Development Department
Priority to CN201610736690.6A
Publication of CN106170131A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/14 Spectrum sharing arrangements between different networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/18 TPC being performed according to specific parameters
    • H04W52/24 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/243 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account interferences
    • H04W52/244 Interferences in heterogeneous networks, e.g. among macro and femto or pico cells or other sector / system interference [OSI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/53 Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/02 Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
    • H04W84/04 Large scale networks; Deep hierarchical networks
    • H04W84/042 Public Land Mobile systems, e.g. cellular systems

Abstract

A robust hierarchical game-learning resource allocation method for heterogeneous layered cellular networks under imperfect channel information, belonging to the field of wireless communication technology. Existing research on interference control in layered heterogeneous networks is mostly based on idealized perfect channel information, and user revenue drops as the channel uncertainty varies. To address these problems, this resource allocation method proposes a discrete-strategy resource allocation scheme based on a robust bilayer Stackelberg game. Through robust processing and economic constraints, and on the premise of guaranteeing the QoS (quality of service) of macrocell users, the upper-layer macrocell charges the lower-layer small cells for spectrum sharing and maximizes the upper-layer utility, so that an optimal compromise is reached between spectrum-utilization efficiency and robustness. Compared with resource allocation that assumes perfect channel information, the method lets the whole system adapt to changing channel conditions and obtain superior strategy selections, so that the system obtains robust revenue and QoS guarantees.

Description

A robust hierarchical game-learning resource allocation method for layered heterogeneous networks under channel state uncertainty
I. Technical field
The present invention relates to a solution to the resource allocation problem in 5G layered heterogeneous networks. For the interference management problem of heterogeneous wireless networks under imperfect channel information, the invention proposes a discrete-strategy resource allocation scheme based on a robust bilayer game. The invention belongs to the field of wireless communication technology.
II. Background
With the continuous growth of demand from new media applications, 5G cellular networks are expected to improve capacity 1000-fold over present 4G cellular networks, and dense networking is becoming one of the key technologies of next-generation communication. Deploying small-cell base stations (Small-cell Base Station, SBS) around a macrocell base station (Macro-cell Base Station, MBS) extends coverage, improves energy efficiency, and raises user transmission rates, thereby improving user experience. Heterogeneous two-layer cellular networks mainly use two spectrum-usage modes: (1) Orthogonal exclusive mode (split-spectrum): the layers do not interfere with one another and management is simple, but spectrum efficiency is low. (2) Shared multiplexing mode (shared-spectrum): this increases the spatial reuse efficiency of the spectrum and is better suited to large-scale small-cell deployments, but it causes cross-layer interference between the small cells and the macrocell and co-layer interference among the small cells, so interference must be controlled and coordinated. Without suitable interference coordination, base stations severely interfere with one another and waste a large amount of transmit power. Interference control and coordination has therefore become a difficult point in resource allocation for heterogeneous wireless networks at the present stage.
Game theory is a method for handling decisions among participants with interacting interests. It is suited to system optimization problems composed of rational participants and is widely applied to resource allocation in multi-user networks, such as power and channel allocation. The bilayer Stackelberg game is widely used to analyze and solve resource allocation problems in hierarchical wireless networks. However, existing game-based resource allocation research assumes that the channel state information (Channel State Information, CSI) between all users and base stations is known, and makes decisions accordingly. In practice, especially in heterogeneous two-layer networks, base stations belong to different operators and information exchange between them is difficult to achieve; even when channel information can be obtained, it is only valid for a limited time. In addition, out of consideration for individual privacy and security, base stations in a two-layer network are unwilling to form coalitions to exchange information, so centralized resource allocation that requires coordinating all base stations is hard to implement. How to allocate resources in a heterogeneous two-layer network in a distributed manner under imperfect channel information is therefore a thorny problem.
Existing literature is mostly based on the perfect-channel-information assumption, under which the parameters and objective functions involved can be obtained accurately. Because of the random, dynamic nature of wireless channels, the assumption in existing models that base stations and users of different layers perfectly obtain one another's information is unrealistic. Under uncertainty, resource allocation strategies derived under perfect channel information are likely to degrade the performance of the real system. Moreover, most existing work considers resource allocation over continuous quantities. Compared with continuous resource allocation strategies, discrete-strategy resource allocation simplifies transmission design and data processing and reduces the information-exchange overhead between base stations; for example, downlink transmission in 3GPP LTE cellular networks supports only discrete power control. However, existing discrete strategy selection methods have high computational complexity and cannot adapt to a environment that changes in real time or to users' decision needs.
III. Summary of the invention
The main purpose of the present invention is to overcome the above shortcomings of existing resource allocation approaches. It proposes a bilayer radio-resource allocation optimization framework for layered heterogeneous small-cell networks under a channel-state uncertainty model, together with a distributed hierarchical learning algorithm. The proposed scheme searches for equilibrium discrete strategies of the macro base station and the small-cell base stations, and effectively suppresses the revenue loss caused by channel-state uncertainty.
The purpose of the invention is achieved by the following technical solution:
The invention is based on a downlink OFDM hierarchical cellular network consisting of one macro base station and N small-cell base stations, as shown in Fig. 1. The cells are linked by digital subscriber line (Digital Subscriber Line, DSL), which serves as the control channel for exchanging information. Each base station serves multiple users by time multiplexing. The macro base station and the small-cell base stations share and reuse the network spectrum. For ease of analysis, each small-cell base station is assumed to serve only one small-cell user per time slot. Because the small-cell base stations use the same spectrum as the macro base station, cross-layer and co-layer interference between base stations is unavoidable. To protect the communication quality of macro base station users, an interference price is used to constrain the transmit power of the lower-layer small-cell base stations, and the aggregate interference from the small-cell base stations to the macro base station is required to stay below a threshold Z. Thus, if the communication of a lower-layer small-cell base station affects the macro base station, it must pay for the interference it causes, so each small-cell base station needs to optimize its own power strategy. The upper-layer macro base station, in turn, wishes to maximize its total revenue from interference charges on the lower-layer small-cell base stations while keeping the interference to its users within the service constraint.
We adopt a two-layer framework based on the Stackelberg game. The upper-layer participant, as leader, holds the advantaged position: it decides first and broadcasts its decision to the lower layer. The lower-layer participants, as followers, react passively to the upper-layer decision, each selecting its best strategy from its feasible strategy set. The invention uses the single-leader, multi-follower form. The MBS acts first as leader and issues the unit interference price. Each SBS, as follower, selects the optimal power allocation strategy that maximizes its utility given the price set by the upper-layer MBS. The utility reflects a participant's payoff from its chosen strategy and can be expressed as a function of the strategies.
The specific steps of the method are as follows:
1. Utility analysis and expression for the lower-layer small cells
In a heterogeneous wireless network, the SBSs are rational and selfish and do not negotiate; each independently selects the strategy that maximizes its own payoff, which constitutes a non-cooperative game. We define the utility function of a lower-layer SBS as consisting of the rate (capacity) gain, the energy cost paid, and the cost of interference to the upper layer. Whether or not the interference from the MBS to the SBSs is considered does not affect the analysis, so for ease of treatment the invention does not address macrocell power control. The payoff of a lower-layer user then depends on its own transmit power, the interference from neighboring SBSs, and the channel state. For the lower-layer small cells, the signal-to-interference-plus-noise ratio received by the user of SBS i can be written as:
$$\gamma_i(p_i, p_{-i}) = \frac{p_i h_{ii}}{\sum_{j \neq i} p_j h_{ji} + \sigma_0}, \quad \forall i \in \{1, 2, \ldots, N\} \tag{1}$$
In formula (1), $\sigma_0$ denotes the received Gaussian noise power, $p_i$ the transmit power of lower-layer SBS i, $p_{-i}$ the power strategies of the SBSs other than SBS i, and $h_{ji}$ the channel gain of the interference from SBS j to the user of SBS i, with $i, j \in \{1, 2, \ldots, N\}$ and N the total number of SBSs. Then $\sum_{j \neq i} p_j h_{ji}$ is the interference caused to SBS i by the other base stations using the shared channel. The utility function of lower-layer SBS i can be defined as:
$$U_i(p_i, p_{-i}, u_i, \lambda_0) = W \log\left(1 + \gamma_i(p_i, p_{-i})\right) - u_i p_i - \lambda_0 g_{i0} p_i \tag{2}$$
Formula (2) consists of three parts: the capacity gain of the SBS, its energy cost, and its payment for the interference caused to the MBS. Here W denotes the bandwidth, $g_{i0}$ the channel gain from SBS i to the MBS user, $u_i$ the unit energy price, and $\lambda_0$ the unit interference price, i.e. what the SBS pays for its interference to the MBS.
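Formulas (1) and (2) can be evaluated directly for a given power profile. The following is a minimal sketch, not the patent's implementation: the function names `sinr` and `sbs_utility`, the use of the natural logarithm, and the sample values of `W`, `u` and `lam0` are assumptions made for illustration; the channel gains are the nominal values quoted later in the embodiment.

```python
import math

def sinr(i, p, h, sigma0):
    """Formula (1): SINR at the user of SBS i.
    h[j][i] is the channel gain from SBS j to the user of SBS i."""
    interference = sum(p[j] * h[j][i] for j in range(len(p)) if j != i)
    return p[i] * h[i][i] / (interference + sigma0)

def sbs_utility(i, p, h, g0, sigma0, W, u, lam0):
    """Formula (2): rate gain minus energy cost minus interference payment.
    The natural logarithm is an assumption; the source only writes 'log'."""
    rate_gain = W * math.log(1.0 + sinr(i, p, h, sigma0))
    energy_cost = u[i] * p[i]
    interference_payment = lam0 * g0[i] * p[i]
    return rate_gain - energy_cost - interference_payment

# Hypothetical two-SBS example (powers and prices are illustrative only).
h = [[1.0, 0.1], [0.1, 1.0]]   # h[j][i]
g0 = [0.2, 0.3]                # gains from each SBS to the MBS user
p = [1.0, 2.0]                 # transmit powers
print(sinr(0, p, h, sigma0=0.01))
print(sbs_utility(0, p, h, g0, sigma0=0.01, W=1.0, u=[0.1, 0.1], lam0=3.0))
```

Raising `lam0` lowers the utility of every non-zero power level, which is the lever the MBS uses to steer the SBS power choices.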
2. Utility analysis and expression for the upper-layer macrocell
For the upper-layer MBS, the goal is to maximize the total payment it collects from the lower-layer SBSs for their interference, on the condition that the interference stays tolerable (i.e. the aggregate interference from all SBSs to the macrocell user of the MBS is below the threshold Z). The utility function of the upper-layer MBS can therefore be defined as:
$$U_0(\lambda_0, p_i) = \sum_{i=1}^{N} \lambda_0 g_{i0} p_i \tag{3}$$
In formula (3), $p_i$ can be expressed as a function of the interference price. This is the focus of the bilayer strategy-selection game: how much power a lower-layer SBS transmits depends on the upper-layer interference price.
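The MBS side of the game is a one-line computation plus a feasibility check. This is a hedged sketch: `mbs_revenue` and `interference_ok` are hypothetical helper names, and the threshold values used below are illustrative rather than taken from the patent.

```python
def mbs_revenue(lam0, p, g0):
    """Formula (3): total interference payment collected by the MBS."""
    return sum(lam0 * g * pi for g, pi in zip(g0, p))

def interference_ok(p, g0, Z):
    """Aggregate interference from all SBSs to the MBS user must not exceed Z."""
    return sum(g * pi for g, pi in zip(g0, p)) <= Z

g0 = [0.2, 0.3]
p = [1.0, 2.0]
print(mbs_revenue(3.0, p, g0))        # revenue at unit price 3.0
print(interference_ok(p, g0, Z=1.0))  # aggregate interference 0.8 against Z = 1.0
```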
3. Optimization problems of the two layers with known channel state information
For a lower-layer small cell, raising the transmit power improves the payoff from the transmission rate, but it also increases the interference to the MBS and the SBS's own energy consumption, and hence its cost. Each lower-layer user must therefore select a suitable power strategy that maximizes its utility and balances payoff against cost. For each SBS user the problem can be modeled as:
Problem 1:
The MBS maximizes its own revenue within the range of interference it can tolerate, so the upper-layer objective can be formulated as a constrained optimization problem, namely:
Problem 2:
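On small discrete strategy sets, the leader-follower structure described in the prose above can be checked by exhaustive search. The sketch below is an illustration under stated assumptions, not the patent's method (which uses learning rather than enumeration): it enumerates pure-strategy Nash equilibria of the followers for each price and lets the leader keep the feasible price with the highest revenue. The power levels `[0.0, 0.5, 1.0]`, `W = 1`, a common energy price `u = 0.1`, the natural logarithm and `Z = 1.0` are all hypothetical choices.

```python
import math
from itertools import product

def utility(i, p, h, g0, sigma0, W, u, lam0):
    """SBS utility of formula (2) with a common unit energy price u (assumption)."""
    interf = sum(p[j] * h[j][i] for j in range(len(p)) if j != i)
    rate = W * math.log(1.0 + p[i] * h[i][i] / (interf + sigma0))
    return rate - u * p[i] - lam0 * g0[i] * p[i]

def follower_equilibria(lam0, powers, h, g0, sigma0, W, u):
    """All pure-strategy Nash equilibria of the lower-layer game at price lam0."""
    n = len(g0)
    equilibria = []
    for prof in product(powers, repeat=n):
        stable = True
        for i in range(n):
            current = utility(i, prof, h, g0, sigma0, W, u, lam0)
            for alt in powers:
                deviation = prof[:i] + (alt,) + prof[i + 1:]
                if utility(i, deviation, h, g0, sigma0, W, u, lam0) > current + 1e-12:
                    stable = False
        if stable:
            equilibria.append(prof)
    return equilibria

def solve_leader(prices, powers, h, g0, sigma0, W, u, Z):
    """Leader picks the price whose follower equilibrium is feasible and pays most."""
    best = None
    for lam0 in prices:
        for prof in follower_equilibria(lam0, powers, h, g0, sigma0, W, u):
            interference = sum(g * pi for g, pi in zip(g0, prof))
            if interference <= Z:
                revenue = lam0 * interference
                if best is None or revenue > best[0]:
                    best = (revenue, lam0, prof)
    return best

best = solve_leader(prices=[2.5, 3.0, 3.5, 4.0, 4.5], powers=[0.0, 0.5, 1.0],
                    h=[[1.0, 0.1], [0.1, 1.0]], g0=[0.2, 0.3],
                    sigma0=0.01, W=1.0, u=0.1, Z=1.0)
print(best)  # (revenue, price, follower power profile)
```

Enumeration costs grow exponentially with the number of SBSs and strategies, which is one reason a distributed learning procedure is attractive for larger networks.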
4. Robust optimization problem when the interference channel state information is not fully known
Because the SBSs and the MBS belong to different individuals or operators and backhaul link capacity is very limited, perfect CSI generally cannot be obtained. The SBSs also lack a mechanism for sharing CSI among themselves. The invention therefore considers realistic imperfect channel information and introduces a channel uncertainty model to describe the random dynamics of the wireless channel. Each base station is assumed to know only its own channel gain $h_{ii}$, and not to know exactly the co-layer interference channel gains $h_{ji}$ or the cross-layer interference channel gain $g_{i0}$. We express each channel gain as the sum of a nominal estimate and an uncertain part, i.e. $h_{ji} = \bar{h}_{ji} + \Delta h_{ji}$ and $g_{i0} = \bar{g}_{i0} + \Delta g_{i0}$. Starting from the worst case caused by the uncertainty of the channel information, the Stackelberg game is converted into a bilayer max-min problem.
The utility function of a lower-layer SBS becomes:
$$\max \min U_i(p_i, p_{-i}, u_i, \lambda_0) = W \log\left(1 + \frac{p_i h_{ii}}{\sum_{j \neq i} p_j (\bar{h}_{ji} + \Delta h_{ji}) + \sigma_0}\right) - u_i p_i - \lambda_0 (\bar{g}_{i0} + \Delta g_{i0}) p_i \tag{6}$$
Similarly, the utility function of the upper-layer MBS becomes:
$$\max \min U_0(\lambda_0, p_i) = \sum_{i=1}^{N} \lambda_0 (\bar{g}_{i0} + \Delta g_{i0}) p_i \quad \text{s.t.} \quad \sum_{i=1}^{N} (\bar{g}_{i0} + \Delta g_{i0}) p_i \le Z \tag{7}$$
Using the column-wise uncertainty model and the Cauchy inequality, the upper bound on the uncertain component of the channel gain and the maximum interference caused by the uncertainty can be characterized respectively as:
$$|\Delta g_{i0}| \le \varepsilon_{i0} \tag{8-1}$$
$$\sum_{j \neq i} p_j \Delta h_{ji} \le \left[\sum_{j \neq i} |p_j|^2 \sum_{j \neq i} |\Delta h_{ji}|^2\right]^{\frac{1}{2}} \le \varepsilon_{ji} \sqrt{\sum_{j \neq i} p_j^2} \tag{8-2}$$
where $\varepsilon$ denotes the uncertainty bound. Using formula (8), the original problem can be converted into a robust bilayer game under maximum channel uncertainty, i.e. the max-min problems of formulas (6) and (7) reduce to:
Problem 3:
Problem 4:
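One way to read the worst-case treatment is to substitute the bounds (8-1) and (8-2) into formulas (6) and (7). The sketch below does exactly that and nothing more; it is an assumption about how the bounds are applied, since the reduced problem statements are not reproduced in this text. The names `robust_sbs_utility`, `robust_interference_ok`, `eps_h` and `eps_g`, the shared bounds across SBSs, and the natural logarithm are all hypothetical.

```python
import math

def robust_sbs_utility(i, p, h_nom, g0_nom, sigma0, W, u, lam0, eps_h, eps_g):
    """Worst-case SBS utility: nominal interference plus the (8-2) bound,
    and nominal cross-layer gain plus the (8-1) bound."""
    others = [j for j in range(len(p)) if j != i]
    nominal = sum(p[j] * h_nom[j][i] for j in others)
    worst_extra = eps_h * math.sqrt(sum(p[j] ** 2 for j in others))
    sinr = p[i] * h_nom[i][i] / (nominal + worst_extra + sigma0)
    payment = lam0 * (g0_nom[i] + eps_g) * p[i]
    return W * math.log(1.0 + sinr) - u * p[i] - payment

def robust_interference_ok(p, g0_nom, eps_g, Z):
    """Worst-case version of the constraint in formula (7)."""
    return sum((g + eps_g) * pi for g, pi in zip(g0_nom, p)) <= Z

h_nom = [[1.0, 0.1], [0.1, 1.0]]
g0_nom = [0.2, 0.3]
p = [1.0, 1.0]
nominal = robust_sbs_utility(0, p, h_nom, g0_nom, 0.01, 1.0, 0.1, 3.0, 0.0, 0.0)
robust = robust_sbs_utility(0, p, h_nom, g0_nom, 0.01, 1.0, 0.1, 3.0, 0.05, 0.05)
print(nominal, robust)  # the robust value is the more pessimistic of the two
```

With both bounds set to zero the function collapses to the perfect-CSI utility of formula (2), which is a convenient sanity check.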
5. Distributed bilayer Q-learning algorithm
In the proposed bilayer game framework, every participating user has a finite, discrete strategy set. The invention uses a reinforcement Q-learning algorithm to find the equilibrium solution. We assume that all game participants are rational and select the strategy that maximizes their utility. Denote the available strategy set of user i by $S_i$, where $|S_i|$ is the number of strategies in the set. For the two layers specifically, the strategy set of a lower-layer SBS user is its set of power strategies, and the strategy set of the upper-layer MBS user is its set of price strategies. The strategy space of all users can be expressed as $S = \times_{i \in N \cup \{0\}} S_i$, where $\times$ denotes the Cartesian product. At iteration t, the probability vector over the strategies of user i is $\pi_i^t$; the probabilities over each user's strategy set must sum to 1.
The expected utility of user i can then be expressed as:
$$u_i(\pi_i^t, \pi_{-i}^t) = E\left[U_i \mid \pi_i^t, \pi_{-i}^t\right] = \sum_{s' \in S} U_i(s') \prod_{j \in N \cup \{0\}} \pi_{j, a_j}^t \tag{11}$$
where $s_{i,a_i}$ denotes the strategy that user i selects according to its current strategy probability vector $\pi_i^t$.
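Formula (11) is an expectation over all joint strategy profiles, weighted by the product of each user's selection probabilities. A minimal sketch follows; `expected_utility` and its argument layout are hypothetical, and the toy payoff function is purely illustrative.

```python
from itertools import product

def expected_utility(utility_fn, strategy_sets, probs):
    """Formula (11): sum over joint profiles of U(profile) times the product
    of the probabilities with which each user picks its part of the profile.
    strategy_sets[k] lists user k's strategies; probs[k][a] is the
    probability that user k selects strategy index a."""
    total = 0.0
    index_ranges = [range(len(s)) for s in strategy_sets]
    for idx in product(*index_ranges):
        profile = tuple(strategy_sets[k][a] for k, a in enumerate(idx))
        weight = 1.0
        for k, a in enumerate(idx):
            weight *= probs[k][a]
        total += weight * utility_fn(profile)
    return total

# Toy check: user 0 mixes evenly over {1, 2}; user 1 always plays 10.
value = expected_utility(lambda s: s[0] * s[1],
                         [[1, 2], [10, 20]],
                         [[0.5, 0.5], [1.0, 0.0]])
print(value)  # 15.0
```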
The utility-maximization objective of the upper-layer MBS can then be written as:
Problem 5:
Similarly, utility maximization for a lower-layer SBS can be written as:
Problem 6:
From the above analysis, we give the definition of the Stackelberg equilibrium (SE) of the bilayer reinforcement learning algorithm.
Definition 2: When a strategy selection simultaneously satisfies the equilibrium conditions on the utilities of the upper-layer and lower-layer base stations, that strategy selection is a stable strategy solution of the bilayer learning.
Theorem 2: For a given upper-layer MBS strategy $\pi_0$, the lower-layer SBSs always admit a mixed-strategy solution $(\pi_i, \pi_{-i}, \pi_0)$ that satisfies the lower-layer equilibrium condition, which yields the Nash equilibrium of the lower layer.
In Q-learning, each user's strategies are parameterized by a Q function that represents the relative utility of each specific strategy. Define $Q_i^t(s_{i,a_i}^t)$ as the Q function of the strategy $s_{i,a_i}^t$ selected by user i at iteration t according to its strategy probabilities $\pi_i^t$. Through the interaction among the users' strategies and with the environment, the reward of each strategy is obtained and the Q function is updated. After a strategy is selected, the corresponding Q value is updated by formula (14):
$$Q_i^{t+1}(s_{i,a_i}^{t+1}) = (1 - \kappa_i^t)\, Q_i^t(s_{i,a_i}^t) + \kappa_i^t\, u_i(s_{i,a_i}^t, \pi_{-i}^t) \tag{14}$$
where $\kappa_i^t$ denotes the learning rate and $u_i(s_{i,a_i}^t, \pi_{-i}^t)$ is the expected payoff of the strategy selected by user i at iteration t, as given by formula (15):
$$u_i(s_{i,a_i}^t, \pi_{-i}^t) = \sum_{a_{-i}^t \in S_{-i}} U_i(s_{i,a_i}^t, S_{-i}^t) \prod_{j \in N \cup \{0\} \setminus i} \pi_{j, a_j}^t \tag{15}$$
Each BS user then updates its strategy probabilities according to the Boltzmann distribution of formula (16):
$$\pi_i^t(s_{i,a_i}) = \frac{\exp\left[Q_i^t(s_{i,a_i}^t) / \psi_i\right]}{\sum_{a' \in S_i} \exp\left[Q_i^t(s_{i,a'}^t) / \psi_i\right]} \tag{16}$$
where $\psi_i > 0$ is the temperature coefficient, which controls whether strategy selection leans toward exploration or exploitation. As $\psi_i$ tends to 0 the user only exploits, selecting the strategy that maximizes the Q value. Conversely, as $\psi_i$ tends to $\infty$ the user only explores: its strategy selection is completely random and its strategy probability distribution $\pi_i^t$ is uniform. The upper-layer MBS updates its Q function iteratively according to formulas (14) and (16). Assume that the upper-layer MBS updates its pricing strategy once every period c. In the bilayer learning iteration, the upper-layer MBS first issues the price, which is the only public information, to all lower-layer SBSs. After receiving the interference price, each lower-layer SBS finds its best-response power strategy through the learning algorithm and feeds it back to the upper-layer MBS at the end of each period, so that the MBS can update its pricing strategy according to the power strategies reported by the lower layer. The algorithm runs as a nested iterative loop.
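The two update rules above, the Q-value recursion of formula (14) and the Boltzmann selection of formula (16), are small enough to write out. This is a sketch under assumptions: the function names are hypothetical, and subtracting the maximum Q value before exponentiating is a standard numerical-stability step that does not change the resulting probabilities.

```python
import math

def q_update(q_old, reward, kappa):
    """Formula (14): blend the old Q value with the newly observed payoff."""
    return (1.0 - kappa) * q_old + kappa * reward

def boltzmann(q_values, psi):
    """Formula (16): selection probabilities from Q values at temperature psi.
    Shifting by max(q_values) avoids overflow and leaves the ratios unchanged."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / psi) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

q = [1.0, 2.0, 3.0]
print(boltzmann(q, psi=0.01))  # nearly all mass on the highest Q value
print(boltzmann(q, psi=1e6))   # close to uniform
print(q_update(1.0, 3.0, kappa=0.5))
```

A small temperature makes the user exploit its current best strategy; a large temperature makes it explore almost uniformly, matching the two limits described in the text.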
The Q function of lower-layer SBS i is updated by formula (17):
$$Q_i^{t+1}(s_{i,a_i}^{t+1}) = (1 - \kappa_i^t)\, Q_i^t(s_{i,a_i}^t) + \kappa_i^t\, \bar{u}_i(s_{i,a_i}^t, s_{0,a_0}) \tag{17}$$
where the estimated expected utility $\bar{u}_i(s_{i,a_i}^t, s_{0,a_0})$ can be expressed as:
Here the estimate is built from the number of times each joint combination of upper-layer and lower-layer strategies is selected within a period. The upper-layer MBS and the lower-layer SBSs are thus updated on different time units: the lower-layer users complete one update iteration per time slot, with T time slots forming a period, while the upper-layer user completes one update iteration per period c. Each layer's strategy update is obtained by Q-learning from the result of the other layer's latest update. The lower layer executes formula (17) at the end of each time slot to update its Q function. Similarly, the upper-layer MBS user executes formula (19) at the end of each period c to update its Q function:
$$Q_0^{c+1}(s_{0,a_0}) = (1 - \kappa_0)\, Q_0^c(s_{0,a_0}) + \kappa_0\, u_0^c(s_{0,a_0}, \pi_{-i}^{cT}) \tag{19}$$
In practice, when a user's strategy set is relatively large, the convergence time grows exponentially and becomes a major bottleneck. The proposed algorithm makes full use of the environment information at each step and updates the Q values of all strategies in a single iteration, so that it converges quickly to a pure-strategy equilibrium point. The concrete steps are shown in Table 1.
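The idea of updating every strategy's Q value in one iteration can be sketched as follows: holding the other users' current choices fixed, a user evaluates the utility each of its own strategies would have earned and applies the Q recursion to all of them. The helper names `counterfactual_utilities` and `update_all_q` and the toy payoff are hypothetical; the patent's Table 1 is not reproduced in this text, so this is an interpretation rather than the published procedure.

```python
def counterfactual_utilities(i, profile, strategies, utility_fn):
    """Utility user i would obtain with each of its strategies,
    keeping every other user's current strategy fixed."""
    results = []
    for s in strategies:
        trial = list(profile)
        trial[i] = s
        results.append(utility_fn(i, tuple(trial)))
    return results

def update_all_q(q_values, utilities, kappa):
    """Apply the Q recursion to every strategy, not only the one just played."""
    return [(1.0 - kappa) * q + kappa * u for q, u in zip(q_values, utilities)]

# Toy payoff: user i prefers strategies close to 1.0 (illustrative only).
payoff = lambda i, prof: -(prof[i] - 1.0) ** 2
utils = counterfactual_utilities(0, (0.0, 2.0), [0.0, 1.0, 2.0], payoff)
q = update_all_q([0.0, 0.0, 0.0], utils, kappa=0.5)
print(utils, q)
```

Compared with updating only the played strategy, this uses one observation of the opponents to refresh the whole Q vector, which is where the faster convergence claimed in the text would come from.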
Table 1: Improved bilayer Q-learning algorithm
The beneficial effects of the invention are as follows:
On the premise of protecting the communication quality of macro base station users, the proposed heterogeneous bilayer robust model effectively suppresses the loss of user revenue caused by variations in uncertainty. The proposed algorithm converges in a short time and obtains superior strategy selections.
IV. Brief description of the drawings
Fig. 1 is a system schematic of the downlink OFDM cellular network;
Fig. 2 is the flow chart of the bilayer Q-learning algorithm;
Fig. 3 is a schematic illustrating the performance of the proposed framework;
V. Detailed description
An embodiment of the invention is shown in Fig. 1: the network consists of one macro base station and 2 small-cell base stations. Each base station serves multiple users by time multiplexing. The macro base station and the small-cell base stations share and reuse the network spectrum. For ease of analysis, each small-cell base station is assumed to serve only one small-cell user per time slot.
1) Utility analysis and expression for the lower-layer small cells
$$U_i(p_i, p_{-i}, u_i, \lambda_0) = W \log\left(1 + \frac{p_i h_{ii}}{\sum_{j \neq i} p_j h_{ji} + \sigma_0}\right) - u_i p_i - \lambda_0 g_{i0} p_i$$
The utility consists of three parts: the capacity gain of the SBS, its energy cost, and its payment for the interference caused to the MBS. Here W denotes the bandwidth, $\sigma_0$ the received Gaussian noise power, $p_i$ the transmit power of lower-layer SBS i, $p_{-i}$ the power strategies of the SBSs other than SBS i, and $h_{ji}$ the channel gain of the interference from SBS j to the user of SBS i, so that $\sum_{j \neq i} p_j h_{ji}$ is the interference caused to SBS i by the other base stations using the same-frequency channel. $g_{i0}$ denotes the channel gain from SBS i to the MBS user, $u_i$ the unit energy price, and $\lambda_0$ the unit interference price, i.e. what the SBS pays for its interference to the MBS.
Each lower-layer user must select a suitable power strategy that maximizes its utility and balances payoff against cost. For each SBS user the problem can be modeled as:
Problem 1:
2) Utility analysis and expression for the upper-layer macrocell
$$U_0(\lambda_0, p_i) = \sum_{i=1}^{N} \lambda_0 g_{i0} p_i$$
The MBS maximizes its own revenue within the range of interference it can tolerate, so the upper-layer objective can be formulated as a constrained optimization problem, namely:
Problem 2:
3) Robust optimization problem when the interference channel state information is not fully known
The invention uses a channel uncertainty model to describe the random dynamics of the wireless channel. A base station can obtain its own channel gain $h_{ii}$ through channel measurement (channel-quality indicator measurement), but does not know exactly the co-layer interference channel gains $h_{ji}$ or the cross-layer interference channel gain $g_{i0}$. We express each channel gain as the sum of a nominal estimate and an uncertain part, i.e. $h_{ji} = \bar{h}_{ji} + \Delta h_{ji}$ and $g_{i0} = \bar{g}_{i0} + \Delta g_{i0}$. Starting from the worst case caused by the uncertainty of the channel information, the Stackelberg game is converted into a bilayer max-min problem. Using the column-wise uncertainty model and the Cauchy inequality, the upper bound on the uncertain component of the channel gain and the maximum interference caused by the uncertainty can be characterized respectively as:
$$|\Delta g_{i0}| \le \varepsilon_{i0}$$
$$\sum_{j \neq i} p_j \Delta h_{ji} \le \left[\sum_{j \neq i} |p_j|^2 \sum_{j \neq i} |\Delta h_{ji}|^2\right]^{\frac{1}{2}} \le \varepsilon_{ji} \sqrt{\sum_{j \neq i} p_j^2}$$
where $\varepsilon$ denotes the uncertainty bound. Using the above bounds, the original problem can be converted into a robust bilayer game under maximum channel uncertainty; the max-min versions of Problems 1 and 2 reduce to:
Problem 3:
Problem 4:
4) Distributed bilayer Q-learning algorithm
Assume that the nominal channel gains from SBS1 and SBS2 to the MBS user are $g_{10} = 0.2$ and $g_{20} = 0.3$ respectively, that the normalized channel gain from each SBS to its own user is $h_{1,1} = h_{2,2} = 1$, and that the nominal interference channel gains between the lower-layer SBSs are $h_{1,2} = h_{2,1} = 0.1$. The noise power is $\sigma_0 = 0.01$ dBmW. Let the interference-price strategy set of the MBS be $\pi_0 = [2.5, 3, 3.5, 4, 4.5]$ and let each SBS have a discrete power allocation strategy set, where the maximum SBS transmit power is $p_{max} = 100$ dBmW. Each period consists of T = 100 time slots, and the upper layer iterates over C = 100 periods.
Step 1: Start the upper-layer loop, which runs until the maximum number of periods c = C is reached. (Initialize the Q functions of all users, with every strategy equally probable.)
(1) In each period, the MBS selects a pricing strategy $s_{0,a_0}$ according to its strategy probability set $\pi_0$ and broadcasts it to all lower-layer SBSs.
Step 2: Lower-layer learning process, t = 1:T
(1) Each SBS i selects its power strategy $s_{i,a_i}$ according to its own strategy probability set $\pi_i^t$.
(2) Each SBS i calculates its utility from the feedback information and updates its estimated expected utility.
(3) Each SBS i calculates the utilities of its other $|S_i| - 1$ strategies.
(4) Each SBS i updates its Q values and strategy probability set according to formulas (17) and (16).
(5) At the end of the T time slots, every SBS passes its final strategy to the MBS.
This completes the iterative update of the lower-layer strategies.
Step 3: The MBS calculates its utility for period c.
Step 4: The MBS updates its Q values and strategy probability set according to formulas (19) and (16).
Step 5: The MBS selects its upper-layer strategy according to the updated strategy probability set.
This completes the iterative update of the upper-layer strategy. Set c = c + 1 and return to Step 1.
When the iteration ends, output the optimal strategies of the 1 macrocell and the 2 small-cell base stations.
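Steps 1 to 5 can be put together as a nested loop. The following is a rough sketch under several loudly stated assumptions, not the patent's Table 1: the SBS power levels are hypothetical (the embodiment's power set is not reproduced in this text), perfect nominal channel gains are used, the MBS earns zero revenue whenever the threshold Z is exceeded, lower-layer Q values carry over between periods, and the constants `kappa`, `psi`, `W`, `u` and `Z` are illustrative.

```python
import math
import random

def boltzmann(q_values, psi):
    """Formula (16), shifted by the maximum for numerical stability."""
    q_max = max(q_values)
    w = [math.exp((q - q_max) / psi) for q in q_values]
    total = sum(w)
    return [x / total for x in w]

def sbs_utility(i, p, lam0, h, g0, sigma0, W, u):
    """Formula (2) with nominal gains and the natural logarithm (assumptions)."""
    interf = sum(p[j] * h[j][i] for j in range(len(p)) if j != i)
    rate = W * math.log(1.0 + p[i] * h[i][i] / (interf + sigma0))
    return rate - u * p[i] - lam0 * g0[i] * p[i]

def bilevel_q_learning(prices, powers, h, g0, sigma0=0.01, W=1.0, u=0.1,
                       Z=1.0, C=100, T=100, kappa=0.1, psi=0.1, seed=0):
    rng = random.Random(seed)
    n = len(g0)
    q0 = [0.0] * len(prices)                      # upper-layer Q values
    q = [[0.0] * len(powers) for _ in range(n)]   # lower-layer Q values
    for _c in range(C):                           # Step 1: upper-layer loop
        a0 = rng.choices(range(len(prices)), weights=boltzmann(q0, psi))[0]
        lam0 = prices[a0]                         # broadcast price
        for _t in range(T):                       # Step 2: lower-layer loop
            picks = [rng.choices(range(len(powers)),
                                 weights=boltzmann(q[i], psi))[0]
                     for i in range(n)]
            p = [powers[a] for a in picks]
            for i in range(n):                    # update Q of all strategies
                for a, s in enumerate(powers):
                    trial = p[:i] + [s] + p[i + 1:]
                    reward = sbs_utility(i, trial, lam0, h, g0, sigma0, W, u)
                    q[i][a] = (1.0 - kappa) * q[i][a] + kappa * reward
        # Steps 3-4: MBS utility from the last reported powers, then Q update.
        interference = sum(g * pi for g, pi in zip(g0, p))
        revenue = lam0 * interference if interference <= Z else 0.0
        q0[a0] = (1.0 - kappa) * q0[a0] + kappa * revenue
    best_price = prices[max(range(len(prices)), key=q0.__getitem__)]
    best_powers = [powers[max(range(len(powers)), key=q[i].__getitem__)]
                   for i in range(n)]
    return best_price, best_powers

price, chosen = bilevel_q_learning(
    prices=[2.5, 3.0, 3.5, 4.0, 4.5], powers=[0.0, 0.5, 1.0],
    h=[[1.0, 0.1], [0.1, 1.0]], g0=[0.2, 0.3])
print(price, chosen)
```

The result depends on the random seed and on the learning constants, so it should be read as a demonstration of the loop structure rather than as a reproduction of the patent's reported performance.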

Claims (4)

1. the distribution of the Radio Resource bilayer in channel status ambiguous model lower leaf isomery microcellulor network Optimization Framework And one distributed layer learning algorithm.Realized by the communication system at isomery cellular network: this system includes macrocellular system System and little micro cellular system two parts, macrocell system includes base station MBS and some macrocell user;Little micro cellular system includes L little microcell base station SBS and some little microcellulor users, each base station only services a cellulor user at a time slot. Because the frequency spectrum that SBS with MBS use is identical, the most inevitably there is the cross-layer between different base station and disturb with layer.
W represents bandwidth, σ0Represent the Gaussian noise power received, piRepresent the transmitting power of lower floor SBS i, p-iRepresent except SBS The power policy of other SBS outside i, hjiRepresent the channel gain that SBS i user is disturbed by SBS j, thenRepresent and use The interference that SBS i is brought by other base stations of shared channel.gi0Represent the SBS i channel gain to MBS user, uiIt it is energy consumption list Position price, λ0Unit interference price, be equivalent to SBS will for the interference of MBS is paid, and limit SBS must to the accumulated interference of MBS Must be less than threshold value Z.Specifically comprising the following steps that of the method
1) lower floor's cellulor effectiveness and optimization problem represent
u i ( p i , p - i , u i , λ 0 ) = W log ( 1 + p i h i i Σ j ≠ i p j h j i + σ 0 ) - u i p i - λ 0 g i 0 p i - - - ( 1 )
Being made up of 3 parts, represent the interference that MBS is brought by the capacity gain of SBS, power consumption cost and SBS respectively, wherein lower floor uses Family must select suitable power policy to maximize the effectiveness of oneself, to reach the balance of income and cost.Each SBS is used For family, problem can be modeled as:
Problem 1:
2) upper strata macrocellular effectiveness and optimization problem represent
MBS to maximize self benefits in the range of its interference can be born, thus the target on upper strata can be established as belt restraining excellent Change problem, it may be assumed that
Problem 2:
3) Robust optimization problem when the interference channel state information is not fully known
The present invention uses a channel uncertainty model to describe the random dynamics of the wireless channel. Through channel measurement (channel-quality indicator measurement), a base station can obtain its own channel gain $h_{ii}$, but it does not know exactly the co-tier interference channel gain $h_{ji}$ or the cross-tier interference channel gain $g_{i0}$. Each of these is therefore expressed as the sum of a nominal estimate and an uncertain component.
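Written out, a decomposition of this kind could take the following form. The bar notation for the nominal estimates is an assumption made here for illustration only; the Δ terms follow the notation of the uncertainty bounds stated later in the claims.

```latex
h_{ji} = \bar{h}_{ji} + \Delta h_{ji}, \qquad g_{i0} = \bar{g}_{i0} + \Delta g_{i0}
```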
Problem 3:
Problem 4:
4) Distributed two-layer Q-learning algorithm
Let the interference price strategy set of the MBS be $\pi_0$ and the power allocation strategy set of SBS i be $\pi_i^t$, where the maximum transmit power of an SBS is $p_{max}$. Each time period consists of T time slots. The upper layer iterates C times (once per time period), each iteration denoted c; the lower layer iterates T times, each iteration denoted t.
In Q-learning, a user's strategy is parameterized by a Q function that represents the relative utility of each specific strategy. Define $Q_i^t(s_i, a_i^t)$ as the Q function of the strategy $a_i^t$ that user i selects at iteration t according to its strategy probabilities $\pi_i^t$. Through the interaction between the users' strategies and the environment, each strategy earns a corresponding reward, which is used to update the Q function. After strategy $a_i^t$ is selected, the corresponding Q-value is updated by formula (7),
$$Q_i^{t+1}(s_i, a_i^{t+1}) = (1 - \kappa_i^t)\, Q_i^t(s_i, a_i^t) + \kappa_i^t\, u_i(s_i, a_i^t, \pi_{-i}^t), \tag{7}$$
where $\kappa_i^t$ denotes the learning rate and $u_i(s_i, a_i^t, \pi_{-i}^t)$ is the expected return of the strategy that user i selects at iteration t, as given by formula (8).
$$u_i(s_i, a_i^t, \pi_{-i}^t) = \sum_{a_{-i}^t \in S_{-i}} u_i(s_i, a_i^t, S_{-i}^t) \prod_{j \in N \cup \{0\} \setminus i} \pi_{j, a_j^t}, \tag{8}$$
Each base-station user updates its strategy according to the Boltzmann distribution in formula (9).
$$\pi_i^t(s_i, a_i) = \frac{\exp\left[Q_i^t(s_i, a_i^t)/\psi_i\right]}{\sum_{a_i \in S} \exp\left[Q_i^t(s_i, a_i^t)/\psi_i\right]}, \tag{9}$$
where $\psi_i > 0$ is the temperature coefficient, which controls whether strategy selection leans toward exploration or exploitation. According to formulas (7) and (9), the upper-layer MBS updates its Q function iteratively. Assume the upper-layer MBS updates its pricing strategy once every time period c. In the two-layer learning iteration, the upper-layer MBS first issues the price, the only public information, to all lower-layer SBSs. After receiving the interference price, the lower layer finds its optimal-response power strategies through the learning algorithm and feeds them back to the upper-layer MBS at the end of each time period, so that the MBS can update its pricing strategy from the power strategy information reported by the lower layer. The algorithm runs as nested iteration loops.
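The Q-value update (7) and the Boltzmann strategy rule (9) can be sketched in a few lines of Python. This is an illustrative sketch only: it treats the problem as stateless (a single state s_i), takes the learning rate and temperature as plain arguments, and uses hypothetical function names that do not appear in the patent.

```python
import math

def q_update(q_old, expected_utility, kappa):
    """Formula (7): blend the old Q-value with the newly estimated utility."""
    return (1.0 - kappa) * q_old + kappa * expected_utility

def boltzmann_policy(q_values, psi):
    """Formula (9): turn Q-values into strategy probabilities, psi > 0."""
    q_max = max(q_values)  # shifting by the maximum avoids overflow; ratios are unchanged
    weights = [math.exp((q - q_max) / psi) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Usage: one strategy receives a positive reward, then the policy is recomputed.
q = [0.0, 0.0, 0.0]
q[1] = q_update(q[1], expected_utility=1.0, kappa=0.5)
cold = boltzmann_policy(q, psi=0.1)   # low temperature: leans toward exploitation
hot = boltzmann_policy(q, psi=10.0)   # high temperature: leans toward exploration
```

A small temperature concentrates probability on the strategy with the largest Q-value, while a large temperature keeps the distribution close to uniform.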
2. The robust optimization problem according to claim 1, wherein a two-layer Stackelberg game framework is proposed: the lower-layer utility and the upper-layer utility (6) together form the Stackelberg game (5). The goal of the game is to find the Stackelberg equilibrium (SE), at which neither upper-layer nor lower-layer users can improve their own utility by unilaterally changing strategy. A single-leader, multi-follower form is used here. The MBS acts first as the leader and issues the unit interference price. The SBSs act as followers and, given the price of the upper-layer MBS, select the optimal power allocation strategy to maximize their utility.
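The leader-follower structure can be illustrated with a small Python sketch. It is not the patented learning procedure: the followers here use closed-form best responses to utility (1) with a natural logarithm, the leader searches a finite price grid, and the leader's payoff is assumed to be the interference revenue λ0 Σ g_i0 p_i subject to the threshold Z, because the upper-layer problem itself is not reproduced in the text. All names are hypothetical.

```python
def follower_equilibrium(lam0, h, g0, W, sigma0, u, p_max, sweeps=200):
    """Iterate follower best responses to utility (1) for a fixed price lam0.

    A fixed number of sweeps is used; convergence is not guaranteed in general.
    """
    n = len(g0)
    p = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            interference = sum(p[j] * h[j][i] for j in range(n) if j != i)
            p_star = W / (u[i] + lam0 * g0[i]) - (interference + sigma0) / h[i][i]
            p[i] = min(max(p_star, 0.0), p_max)
    return p

def leader_choice(prices, Z, **net):
    """Return (price, revenue, powers) with the highest assumed revenue
    among prices whose accumulated interference stays within Z, or None."""
    best = None
    for lam0 in prices:
        p = follower_equilibrium(lam0, **net)
        interference = sum(g * x for g, x in zip(net["g0"], p))
        revenue = lam0 * interference  # assumed leader payoff, see lead-in
        if interference <= Z and (best is None or revenue > best[1]):
            best = (lam0, revenue, p)
    return best
```

The leader anticipates the followers' reaction to each candidate price, which is the defining feature of a Stackelberg game as opposed to a simultaneous-move game.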
3. The robust processing according to claim 1, wherein, starting from the worst case caused by channel information uncertainty, the co-tier interference channel gain $h_{ji}$ and the cross-tier interference channel gain $g_{i0}$, for which complete information cannot be obtained, are each expressed as the sum of a nominal estimate and an uncertain value. Using the column-wise uncertainty model and the Cauchy-Schwarz inequality, the upper bound of the uncertain component of the channel gain and the maximum interference caused by the uncertainty can be characterized, respectively, as:
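$$|\Delta g_{i0}| \le \varepsilon_{i0}$$
$$\sum_{j \neq i} p_j \Delta h_{ji} \le \left[\sum_{j \neq i} |p_j|^2 \sum_{j \neq i} |\Delta h_{ji}|^2\right]^{\frac{1}{2}} \le \varepsilon_{ji} \sqrt{\sum_{j \neq i} p_j^2}$$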
where ε denotes the upper bound of the uncertainty. Using the expressions above, the original problem can be converted into a robust two-layer game problem under the maximum channel uncertainty.
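To make the worst-case bounds concrete, the short Python sketch below computes the largest co-tier interference and cross-tier gain consistent with them. It assumes that the uncertainty vector of the interfering gains toward SBS i has Euclidean norm at most eps, and that the nominal gains are supplied by the caller; the function names and the nominal-gain arguments are hypothetical.

```python
import math

def worst_case_interference(i, p, h_nominal, eps):
    """Nominal co-tier interference at SBS i plus the margin eps * ||p_{-i}||_2."""
    others = [j for j in range(len(p)) if j != i]
    nominal = sum(p[j] * h_nominal[j][i] for j in others)
    margin = eps * math.sqrt(sum(p[j] ** 2 for j in others))
    return nominal + margin

def worst_case_cross_gain(g_nominal, eps0):
    """Largest cross-tier gain consistent with |delta g| <= eps0."""
    return g_nominal + eps0
```

The margin is tight: it is reached when the gain errors are proportional to the interfering powers, which is the equality case of the Cauchy-Schwarz inequality.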
4. The distributed two-layer Q-learning method according to claim 1, wherein the Q-values of all strategies are updated in a single iteration, which uses the environmental information from each interaction more efficiently and noticeably improves the convergence speed of the algorithm. The specific operating steps of the algorithm are:
Step 1: Start the upper-layer loop and run it until c = C, the maximum number of time periods. (Initialize the Q functions of all users so that every strategy is equally probable.)
(1) In each time period, the MBS selects a pricing strategy according to its strategy probability set $\pi_0$ and broadcasts it to all lower-layer SBSs.
Step 2: Lower-layer learning process, t = 1:T
(1) Each SBS i selects its power strategy $(s_i, a_i)$ according to its own strategy probability set $\pi_i^t$.
(2) Each SBS i computes its utility from the feedback information and updates its estimated expected utility according to the corresponding formula.
(3) Each SBS i computes, according to the corresponding formula, the utilities of its other $|S_i| - 1$ strategies.
(4) Each SBS i updates its Q-values and strategy probability set according to the corresponding formulas.
(5) When the T time slots end, all SBSs pass their final strategies to the MBS.
This completes the iterative update of the lower-layer strategies.
Step 3: The MBS computes its utility for time period c.
Step 4: The MBS updates its Q-value and strategy probability set according to the corresponding formulas.
Step 5: The MBS selects its upper-layer strategy according to the updated strategy probability set.
This completes the iterative update of the upper-layer strategy. Set c = c + 1 and return to Step 1.
When the iteration ends, output the optimal strategies of the 1 macrocell and the 2 small-cell base stations.
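The nested loop in Steps 1 to 5 can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the patent: strategies are finite grids of prices and powers, the learning is stateless, the learning rate kappa and temperature psi are constants shared by all players, each SBS evaluates utility (1) with a natural logarithm, and the MBS reward is taken to be the interference revenue when the threshold Z is respected and zero otherwise. All names are hypothetical.

```python
import math
import random

def boltzmann(q, psi):
    """Boltzmann strategy probabilities for a list of Q-values."""
    m = max(q)
    w = [math.exp((x - m) / psi) for x in q]
    s = sum(w)
    return [x / s for x in w]

def utility(i, p, lam0, h, g0, W, sigma0, u):
    """Utility (1) of SBS i for power vector p and price lam0."""
    interference = sum(p[j] * h[j][i] for j in range(len(p)) if j != i)
    rate = W * math.log(1.0 + p[i] * h[i][i] / (interference + sigma0))
    return rate - u[i] * p[i] - lam0 * g0[i] * p[i]

def two_layer_q_learning(prices, powers, Z, C, T, kappa, psi, rng, **net):
    n = len(net["g0"])
    q_mbs = [0.0] * len(prices)                 # equal Q-values: uniform initial policy
    q_sbs = [[0.0] * len(powers) for _ in range(n)]
    for c in range(C):                          # Step 1: upper-layer loop over periods
        k = rng.choices(range(len(prices)), weights=boltzmann(q_mbs, psi))[0]
        lam0 = prices[k]                        # price broadcast to all SBSs
        for t in range(T):                      # Step 2: lower-layer learning
            choice = [rng.choices(range(len(powers)),
                                  weights=boltzmann(q_sbs[i], psi))[0]
                      for i in range(n)]
            p = [powers[a] for a in choice]
            for i in range(n):
                for a, power in enumerate(powers):   # update every strategy's Q-value
                    trial = p[:i] + [power] + p[i + 1:]
                    q_sbs[i][a] = ((1 - kappa) * q_sbs[i][a]
                                   + kappa * utility(i, trial, lam0, **net))
        interference = sum(g * x for g, x in zip(net["g0"], p))
        reward = lam0 * interference if interference <= Z else 0.0  # Step 3 (assumed payoff)
        q_mbs[k] = (1 - kappa) * q_mbs[k] + kappa * reward          # Step 4
    return q_mbs, q_sbs
```

After the loops finish, a greedy strategy for each player can be read off as the entry with the largest Q-value, for example `prices[q_mbs.index(max(q_mbs))]`. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.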
CN201610736690.6A 2016-08-22 2016-08-22 A kind of sane layering Game Learning resource allocation methods of channel status condition of uncertainty lower leaf heterogeneous network Pending CN106170131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610736690.6A CN106170131A (en) 2016-08-22 2016-08-22 A kind of sane layering Game Learning resource allocation methods of channel status condition of uncertainty lower leaf heterogeneous network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610736690.6A CN106170131A (en) 2016-08-22 2016-08-22 A kind of sane layering Game Learning resource allocation methods of channel status condition of uncertainty lower leaf heterogeneous network

Publications (1)

Publication Number Publication Date
CN106170131A true CN106170131A (en) 2016-11-30

Family

ID=57376582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610736690.6A Pending CN106170131A (en) 2016-08-22 2016-08-22 A kind of sane layering Game Learning resource allocation methods of channel status condition of uncertainty lower leaf heterogeneous network

Country Status (1)

Country Link
CN (1) CN106170131A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102104889A (en) * 2011-03-22 2011-06-22 北京邮电大学 Cross-layer optimization system and method based on impedance matching
CN102307351A (en) * 2011-08-29 2012-01-04 中山大学 Game-theory-based spectrum allocation method, communication equipment and system
US20140213271A1 (en) * 2013-01-28 2014-07-31 Snu R&Db Foundation Apparatus and method for performing inter-cell interference coordination using limited channel state information in heterogeneous network
CN103906246A (en) * 2014-02-25 2014-07-02 北京邮电大学 Wireless backhaul resource scheduling method under honeycomb heterogeneous network
CN104796993A (en) * 2015-04-21 2015-07-22 西安交通大学 Stackelberg game-based cross-layer resource allocation method of heterogeneous network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHENGRONG BU: "Interference-Aware Energy-Efficient Resource Allocation for OFDMA-Based Heterogeneous Networks With Incomplete Channel State Information", 《IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY》 *
YOUMING SUN: "Capacity offloading in two-tier small cell networks over unlicensed band: A hierarchical learning framework", 《2015 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP)》 *
张志才: "Femtocell网络的绿色节能技术研究", 《中国博士学位论文全文数据库信息科技辑》 *
徐勇军: "下垫式认知无线电网络动态资源分配问题研究", 《信息科技辑》 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106937295A (en) * 2017-02-22 2017-07-07 沈阳航空航天大学 Heterogeneous network high energy efficiency power distribution method based on game theory
CN107276704B (en) * 2017-05-10 2020-08-04 重庆邮电大学 Optimal robust power control method based on energy efficiency maximization in two-layer Femtocell network
CN107276704A (en) * 2017-05-10 2017-10-20 重庆邮电大学 The maximized optimal robustness Power control method of efficiency is based in two layers of Femtocell network
CN107360029A (en) * 2017-07-11 2017-11-17 王焱华 A kind of cloud computing method and device
CN107995034A (en) * 2017-11-30 2018-05-04 华北电力大学(保定) A kind of dense cellular network energy and business collaboration method
CN107995034B (en) * 2017-11-30 2020-12-08 华北电力大学(保定) Energy and service cooperation method for dense cellular network
CN108156665A (en) * 2018-02-28 2018-06-12 北京科技大学 A kind of resource allocation methods in isomery cloud small cell network
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN108521673B (en) * 2018-04-09 2022-11-01 湖北工业大学 Resource allocation and power control joint optimization method based on reinforcement learning in heterogeneous network
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
CN108834108B (en) * 2018-05-03 2021-04-02 中国人民解放军陆军工程大学 D2D cooperative relay selection method for resisting half-duplex active eavesdropping and based on virtual decision
CN108834108A (en) * 2018-05-03 2018-11-16 中国人民解放军陆军工程大学 Fight the D2D cooperative relaying selection method based on virtual decision that half-duplex is actively eavesdropped
CN110472764A (en) * 2018-05-09 2019-11-19 沃尔沃汽车公司 Coordinate the method and system serviced in many ways using half cooperation Nash balance based on intensified learning
CN110472764B (en) * 2018-05-09 2023-08-11 沃尔沃汽车公司 Method and system for coordinating multiparty services using semi-collaborative Nash balancing based on reinforcement learning
CN109787996A (en) * 2019-02-21 2019-05-21 北京工业大学 A kind of spoof attack detection method based on DQL algorithm in mist calculating
CN109787996B (en) * 2019-02-21 2021-11-12 北京工业大学 Camouflage attack detection method based on DQL algorithm in fog calculation
CN110749881A (en) * 2019-09-17 2020-02-04 南京航空航天大学 Unmanned aerial vehicle cluster robust power control method based on improved double-layer game
CN110636523A (en) * 2019-09-20 2019-12-31 中南大学 Millimeter wave mobile backhaul link energy efficiency stabilization scheme based on Q learning
CN111491315A (en) * 2019-12-18 2020-08-04 中国人民解放军陆军工程大学 Model and layered learning algorithm for expanding delay and energy consumption compromise in unmanned aerial vehicle network
CN111491315B (en) * 2019-12-18 2023-06-27 中国人民解放军陆军工程大学 System based on delay and energy consumption compromise model in extended unmanned aerial vehicle network
CN112752291A (en) * 2020-12-15 2021-05-04 中国联合网络通信集团有限公司 Uplink rate evaluation method and device
CN112752291B (en) * 2020-12-15 2022-12-13 中国联合网络通信集团有限公司 Uplink rate evaluation method and device

Similar Documents

Publication Publication Date Title
CN106170131A (en) A kind of sane layering Game Learning resource allocation methods of channel status condition of uncertainty lower leaf heterogeneous network
Du et al. Contract design for traffic offloading and resource allocation in heterogeneous ultra-dense networks
CN1992962B (en) Inter-cell interference coordination method based on evolution network architecture of 3G system
Liu et al. Dynamic spectrum access algorithm based on game theory in cognitive radio networks
CN104955077B (en) A kind of heterogeneous network cell cluster-dividing method and device based on user experience speed
Sun et al. Location optimization and user association for unmanned aerial vehicles assisted mobile networks
Abozariba et al. NOMA-based resource allocation and mobility enhancement framework for IoT in next generation cellular networks
CN102395136B (en) Telephone traffic distribution calculation method based on neighbor cell field intensity information and system thereof
CN102438313B (en) Communication alliance dispatching method based on CR (cognitive radio)
CN107249217A (en) The Joint Task unloading of ad hoc mobile cloud network and resource allocation methods
CN104902488B (en) The configuration method of each layer network base station in layered heterogeneous network
CN106792824A (en) Cognitive heterogeneous wireless network robust resource allocation algorithm
CN107466099A (en) A kind of interference management self-organization method based on non-orthogonal multiple access
CN103269487B (en) Dynamic disturbance management method based on game theory in femtocell network downlink
CN103856996A (en) Power control-access control combined method
CN106060851A (en) Secure resource optimization method under congestion control in heterogeneous cloud wireless access network
CN104579444B (en) Interference alignment schemes in a kind of isomery cellular network
CN107454601A (en) The wireless dummy mapping method of inter-cell interference is considered under a kind of super-intensive environment
CN104159314B (en) The distributed energy saving resources distribution method of heterogeneous network
CN108848535A (en) A kind of mist calculating environmental resource distribution method towards shared model
Cheung et al. SINR-based random access for cognitive radio: Distributed algorithm and coalitional game
Lyu et al. Analysis and optimization for large-scale LoRa networks: Throughput fairness and scalability
CN104540203A (en) Performance optimizing method for wireless body area network based on independent sets
CN110493800A (en) Super-intensive networking resources distribution method based on Game with Coalitions in a kind of 5G network
CN104486767B (en) Dynamic ABS disturbance restraining methods based on sub-clustering in isomery cellular network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210007 post battalion, Qinhuai District, Nanjing, Jiangsu Province, No. 18

Applicant after: National University of Defense Technology

Address before: 210007 post battalion, Qinhuai District, Nanjing, Jiangsu Province, No. 18

Applicant before: The 36th Institute of Central Military Commission Equipment Development Department

RJ01 Rejection of invention patent application after publication

Application publication date: 20161130