CN106358308A

CN106358308A - Resource allocation method for reinforcement learning in ultra-dense network

Info

Publication number: CN106358308A
Application number: CN201510409462.3A
Authority: CN
Inventors: 张海君; 王文韬; 孙梦颖; 郝匀琴; 周平; 阳欣豪
Original assignee: Beijing University of Chemical Technology
Current assignee: Beijing University of Chemical Technology
Priority date: 2015-07-14
Filing date: 2015-07-14
Publication date: 2017-01-25

Abstract

A resource allocation method for reinforcement learning in an ultra-dense network is provided. The invention relates to the field of ultra-dense networks in 5G (fifth generation) mobile communications and provides a method for allocating resources between a home node B and a macro node B, between a home node B and another home node B and between a home node B and a mobile user in a dense deployment network; the method is implemented through power control, each femotcell is considered as an intelligent body to jointly adjust transmitting powers of home node Bs, the densely deployed home node Bs are avoided causing severe jamming to a macro node B and an adjacent B when transmitting at maximum powder, and system throughput is maximized; user delay QoS is considered, and traditional 'Shannon capacity' is replaced with 'available capacity' that may ensure user delay; a supermodular game model is utilized such that whole network power distribution gains Nash equilibrium; the reinforcement learning method Q-learning is utilized such that the home node B has learning function, and optimal power distribution can be achieved; by using the resource allocation method, it is possible to effectively improve the system capacity of an ultra-dense network at the premise of satisfying user delay.

Description

A kind of resource allocation methods of the intensified learning in super-intensive network

Technical field

Present document relates to moving communicating field, especially, the present invention is the method for a kind of resource allocation of super-intensive heterogeneous network (ultra dense network, udn) in the 5th generation (5th-generation) GSM.

Background technology

Nowadays mobile network has been enter into the quick stage of popularization, and meanwhile 5g technology is all energetically being studied in countries in the world, and 5g standard also begins to show up prominently.Take the lead in, using cognitive radio power technology, being automatically determined the frequency range of offer by network, realize the distinguishing feature that multi-internet integration is 5g.The 5g of China works also achieved with preliminary achievement.The main target of 5g network is Consumer's Experience, network is carried out redesigning in terms of capacity, speed, time delay three, optimize.5g network will accommodate substantial amounts of terminal device simultaneously, therefore will jointly meet the demand of mobile service flow growth by higher spectrum reuse efficiency, more frequency spectrum resource and more dense cell deployment.This makes the new challenge of 5g network faces.

Because, in traditional GSM, network design, operation maintenance depend on greatly manually, need to expend a large amount of human and material resources.Therefore, just have self-organizing network (self-organization network, Son concept), realizes deployment, maintenance and the optimization of network by the self organization ability of communication network.And have the access of many low power nodes in 5g system so that network structure is more complicated, form super-intensive heterogeneous network.Due to the rare of Radio Resource so that many operators wish to make full use of frequency spectrum resource in high band, use double-layer network in following 5g system, there is two kinds of user is grand user and domestic consumer respectively simultaneously.In order to make full use of frequency spectrum resource, double-layer network shares identical frequency range, but have also been introduced co-channel interference simultaneously, and existing technology can not solve cross-layer interference problem present in 5g network well, based on this, the self-optimization techniques in self-organization network are applied among super-intensive network distribute to realize the self-organizing of resource by primary study of the present invention.

In order to realize resource self-organizing distribution, classical intensified learning q-learning algorithm is applied in super-intensive network.Q-learning algorithm is based on discrete state, and discrete variable refers to the size of power in the present invention.The housing choice behavior of q-learning and behavior value function select different strategies respectively, by measuring channel parameter in real time, dynamically carry out resource self-organizing distribution, and then reach the purpose of suppression interference.

Content of the invention

Present invention is generally directed to energy saving resources assignment problem is it is proposed that a kind of resource allocation methods based on q-learning algorithm and equilibria of Supermodular Games in super-intensive self-organizing network.The method optimizes the energy utilization efficiency in network, ensures qos, improves power system capacity.

In order to solve the above problems, the invention provides a kind of Resource Allocation Formula of efficiency optimization:

Step 1: initialization Studying factors, for each state s and each action a, initialize evaluation function, transmission strategy, guess, positive scalar.

Step 2: initialization Home eNodeB state s, transmission power p, signal interference ratio etc..

Step 3: according to transmission strategy, the action of selected active procedure.

Step 4: according to the current signal interference ratio of the feedback information detection of target receiver, by identifying whether current transmission power rank and relatively current signal interference ratio determine subsequent time state more than thresholding.

Step 5: if the current Signal to Interference plus Noise Ratio of domestic consumer is more than threshold value, return (return, i.e. Efficiency Function) is calculated by Reward Program, otherwise, zero setting will be returned.

Step 6: the return value obtaining in step 4 is asked with expectation, using the q more new formula based on guess, obtains new q value, update evaluation function.

Step 7: according to the q value obtaining in step 5, according to greedy strategy, update the strategy of user.

Step 8: using guess more new formula, obtain the guess of other Home eNodeB subsequent time behaviors, make user enter next state.Proceed to step 2.

Step 9: terminate this learning process, the radio resource allocation of each Home eNodeB completes, prepare scheduling of resource next time.

In step 1, each Home eNodeB only allows a user to access, and the setting of q value needs estimation in advance.

In step 3, strategy, i.e. the probability of selection action, the maximum action of select probability.

In step 4, the computing formula of signal interference ratio

In formula,Represent Home eNodeB to the channel gain of internal user, the channel gain of Home eNodeB to other base station user.Represent the interference to Home eNodeB for the grand user., represent the transmission power of this base station and interference base station respectively, refer to Gaussian noise.

In steps of 5, the Efficiency Function of each base station refers to

In formula, refer to the available capacity of Home eNodeB k, under the requirement ensureing a fixed response time qos, wireless channel can reach the transfer rate of maximum.Represent qos delay parameter, bigger, delay requirement is higher.Represent the transmission power to user for the Home eNodeB.Represent the power being consumed in communication network.

Above formula shows that subchannel is linear, and the transmission power obtaining on every sub-channels is linear with general power

(in formula, refer to a frame data transmission duration in conjunction with Shannon public affairs capacity formula, b represents communication bandwidth.Obtain the expression of utility function

In step 6, the more new formula of q value,

In formula, it is Studying factors, there is attenuation characteristic, it is discount factor.Refer to the state in next moment.

In step 7, the more new formula of strategy

Value is the positive parameter of temperature, and value is bigger, and the probability convergence of action is identical.

In step 8, guess more new formula

In formula, refer to the guess of previous moment.Refer to subsequent time, in current state and strategy when taking current action, refer to the strategy of previous moment.

From technical method, this method with maximize each Home eNodeB efficiency as target, consider the same layer interference in super-intensive self-organizing network, cross-layer is disturbed and qos time delay, constantly adjust the transmission power of Home eNodeB using q-learning algorithm, it is finally reached Nash Equilibrium, so that the efficiency of each Home eNodeB is optimized.Both ensure the qos time delay of user, improve the efficiency of Home eNodeB it is achieved that the RRM of home base station network again.

Below by the drawings and specific embodiments, technical scheme is further elaborated.

Brief description

For the elaboration embodiments of the invention that become apparent from and existing technical scheme, below technical scheme is illustrated that the explanation accompanying drawing used in accompanying drawing and description of the prior art does simple introduction, obviously, on the premise of not paying creative work, those of ordinary skill in the art can obtain other accompanying drawings by this accompanying drawing.

Fig. 1 show the system architecture diagram comprising Home eNodeB and macro base station in the embodiment of the present invention；

The flow chart that Fig. 2 show Home eNodeB power distribution in the embodiment of the present invention.

Specific embodiment

Main idea is that, by being simulated to communication environment, set up model, initialize Studying factors, guess, transmission strategy and evaluation function q, the state of detection present channel, state instruction parameter has signal interference ratio, transmission power, status etc..According to transmission policy selection current action, the signal interference ratio detecting is compared with given threshold value, if being more than threshold value, obtain a return, if being less than threshold value, the return obtaining is set to zero, obtain new q value using based on the q more new formula of guess, and according to q value by the strategy of greedy strategy acquisition subsequent time and guess, renewal subsequent time state, and enter next communications status, repeat above-mentioned learning process.Power allocation scheme is assessed as Performance evaluation criterion using q value, finds and make the maximum power allocation scheme of Total Return function in super-intensive subzone network.

Fig. 1 show the system architecture diagram simultaneously including that Home eNodeB is disposed with frequency with macro base station, and it comprises a macro base station, multiple femto base station and its user, grand user.

Step 101: setting learning process initial time t=0.

Step 102: initialization Studying factors, for each state s and each action a, initialize evaluation function, transmission strategy, guess, positive scalar.

Step 103: initialization Home eNodeB state s, transmission power p, signal interference ratio etc..

Step 104: according to transmission strategy, select the action of active procedure.

Step 105: according to the current signal interference ratio of the feedback information detection of target receiver, by identifying whether current transmission power rank and relatively current signal interference ratio determine subsequent time state more than thresholding.

Step 106: if current signal interference ratio is more than thresholding, will pass through Reward Program and calculate a return, otherwise, return is set to zero.

Step 107: in step 4, all return value obtaining are asked expectation, using the q more new formula based on guess, obtain new q value, update evaluation function.

Step 108: by the q value obtaining according to the new transmission strategy of the acquisition of greedy strategy.

Step 109: using guess more new formula, obtain the guess to other Home eNodeB subsequent time behaviors.

Step 110: enter next state determined by step 105, enter next moment t=t+1, and proceed to step 102.

Step 111: terminate this learning process, the radio resource allocation of each Home eNodeB completes, prepare scheduling of resource next time.

Claims

1. a kind of resource allocation methods of the intensified learning in super-intensive network are it is characterised in that comprise the following steps:

Step 1: initialization Studying factors, for each state s and each action a, initialize evaluation function, transmission strategy, guess, positive scalar；

Step 2: initialization Home eNodeB state s, transmission power p, signal interference ratio etc.；

Step 3: according to transmission strategy, the action of selected active procedure；

Step 4: according to the current signal interference ratio of the feedback information detection of target receiver, by identifying whether current transmission power rank and relatively current signal interference ratio determine subsequent time state more than thresholding；

Step 5: if the current Signal to Interference plus Noise Ratio of domestic consumer is more than threshold value, return (return, i.e. Efficiency Function) is calculated by Reward Program, otherwise, zero setting will be returned；

Step 6: the return value obtaining in step 4 is asked with expectation, using the more new formula based on guess q, obtains new q value, update evaluation function；

Step 7: according to the q value obtaining in step 5, according to greedy strategy, update the strategy of user；

Step 8: using guess more new formula, obtain the guess of other Home eNodeB subsequent time behaviors, make user enter next state, proceed to step 2；

2. super-intensive network according to claim 1 resource allocation methods it is characterised in that:

In described step 1, using equilibria of Supermodular Games model, the intelligent body that each Home eNodeB is viewed as having fair competition relation tries to achieve optimal objective so as to reach Nash Equilibrium.

3. super-intensive network according to claim 1 resource allocation methods it is characterised in that:

In described step 5, using Efficiency Function as q-learning algorithm evaluation function, Efficiency Function formula is

, the available capacity of its middle finger Home eNodeB k, under the requirement ensureing a fixed response time qos, wireless channel can reach the transfer rate of maximum；Represent qos delay parameter, bigger, delay requirement is higher, represent the transmission power to user for the Home eNodeB, represent the power being consumed in communication network,

Above formula shows that subchannel is linear, and the transmission power obtaining on every sub-channels is linear with general power, (in formula, refer to a frame data transmission duration in conjunction with Shannon public affairs capacity formula, b represents communication bandwidth, obtains the expression of utility function.

4. super-intensive network according to claim 1 resource allocation methods it is characterised in that:

In described step 8, target household base station requires no knowledge about the strategy to other base station strategies, but the strategy of other base stations is guessed, adjusts the strategy of oneself so as to convergence Nash Equilibrium, its guess more new formula is in formula, refers to the guess of previous moment,Refer to subsequent time, in current state and strategy when taking current action, refer to the strategy of previous moment.

5. super-intensive network according to claim 1 resource allocation methods it is characterised in that:

In described step 6, based on the q more new formula of guess it is:

In formula, it is Studying factors, there is attenuation characteristic, it is discount factor, refers to the state in next moment, go to assess the quality of return value in this iteration with q value.