CN107995034A

CN107995034A - Energy and traffic cooperation method for dense cellular networks

Info

Publication number: CN107995034A
Application number: CN201711236163.XA
Authority: CN
Inventors: 李保罡; 吕亚波; 赵伟; 刘涛
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2017-11-30
Filing date: 2017-11-30
Publication date: 2018-05-04
Anticipated expiration: 2037-11-30
Also published as: CN107995034B

Abstract

An embodiment of the invention discloses an energy and traffic cooperation method for dense cellular networks, applicable to online resource allocation across multiple base stations. Matching theory is first used to group users with their corresponding base stations. This produces user-centric clusters and reduces the base-station group that must be coordinated to the size of one cluster. A multi-agent reinforcement learning algorithm then performs base-station power allocation and energy cooperation between base stations.

Description

Energy and traffic cooperation method for dense cellular networks

Technical field

The present invention relates to the field of wireless communication, and in particular to an energy and traffic cooperation method for dense cellular networks.

Background

Ultra-dense networks are regarded as one of the most promising technologies for 5G. Small cells with short coverage radii offer lower interference, high spectrum reuse and high data rates. At the same time, the large number of cellular base stations brings unprecedented energy costs, and base-station energy saving has become a research hotspot in recent years.

The prior art addresses resource allocation only for a single energy-harvesting cell or for two cells. Energy cooperation among many base stations in dense network scenarios has received little study. How to coordinate energy and traffic in a dense cellular network is therefore a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

To solve the above technical problem, embodiments of the present invention provide an energy and traffic cooperation method for dense cellular networks.

Embodiments of the present invention provide the following technical solution:

An energy and traffic cooperation method for a dense cellular network, the method comprising:

generating preference lists for user terminals and base stations according to a utility function;

obtaining user-base-station clusters from the preference lists using a many-to-many matching algorithm; and

within each user-base-station cluster, obtaining a policy for base-station power allocation and inter-base-station energy cooperation using a reinforcement learning algorithm.

Generating the preference lists according to the utility function specifically comprises the following. A utility function V_{nk,m} is defined to represent the amount of data that base station n can send to terminal m on its k-th channel. The preference lists of base stations and users are then generated from the transmission data rate V_{nk,m} and the channel gain g_{nk,m}.

Obtaining the power allocation and energy cooperation policy within a user-base-station cluster with the multi-agent reinforcement learning algorithm specifically comprises:

Step 1: determine the action set, that is, all possible action values an agent can output, and extract a state representation from the environment to serve as the agent's observation of the environment;

Step 2: each agent observes the current state of the environment and enters the exploration phase;

Step 3: each agent aims to maximize the system average sum rate and selects an action rationally according to its own observation. An action consists of the base station's transmit power and its energy cooperation. Two policies are available for decision-making: a randomized experimental policy and a deterministic baseline policy;

Step 4: after all base stations have made their decisions, compute the reward returned by the environment; each agent updates its corresponding state-action values;

Step 5: repeat steps 3 and 4 until the exploration phase ends. Then compare the newly learned policy with the baseline policy and output the better one as the policy for this state.

Compared with the prior art, the above technical solution has the following advantages:

The method can be applied to online resource allocation across multiple base stations. Matching theory is first used to group users with their corresponding base stations, which forms user-centric clusters. Conventional matching algorithms handle this differently. The invention treats the problem as a three-way matching of users, channels and base stations, and represents each base station together with one of its channels by a single utility function. All three can therefore be paired in a single matching process. This avoids the two-level matching of conventional methods and lowers computational complexity while guaranteeing optimality. In the power allocation phase, the base-station group is reduced to the size of one cluster. An online reinforcement learning method then carries out base-station power allocation and inter-base-station energy cooperation.

Brief description of the drawings

To illustrate the embodiments of the present invention and the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the energy and traffic cooperation method for dense cellular networks provided by an embodiment of the invention.

Detailed description of the embodiments

As described in the background section, providing an energy and traffic cooperation method for dense cellular networks is a problem that those skilled in the art urgently need to solve.

In view of this, the present invention proposes an energy and traffic cooperation method for dense cellular networks. Its core idea is as follows. Prior-art online resource allocation algorithms target only a single cell or two cells. A dense cellular network contains a large number of base stations, and in this multi-agent setting conventional reinforcement learning is not guaranteed to converge. Existing multi-agent reinforcement learning methods do guarantee convergence, but when the number of base stations is very large they converge very slowly and need a long learning time. Because of these drawbacks, conventional reinforcement learning is unsuitable for resource allocation in dense networks. To overcome them, the inventors propose to use matching theory first to match base stations, users and resource blocks, which groups users with their corresponding base stations. The resulting user-centric clusters reduce the base-station group to the size of one cluster. Reinforcement learning then carries out base-station power allocation and inter-base-station energy cooperation.

In other words, the massive number of access points in a dense network makes global optimization of power allocation and energy cooperation very difficult. The invention therefore proposes a distributed solution. Clustering reduces the number of base stations involved in power allocation, which eases the task of the reinforcement learning algorithm and ensures convergence to an optimal policy within the algorithm's limited learning period.

Referring to Fig. 1, an embodiment of the present invention provides an energy and traffic cooperation method applied to a dense cellular network, the method comprising:

Step 101: generate preference lists for user terminals and base stations according to a utility function.

Generating the preference lists according to the utility function specifically comprises:

Define the utility function V_{nk,m}, which represents the amount of data base station n can send to terminal m on its k-th channel. Generate the preference lists of base stations and users from the transmission data rate V_{nk,m} and the channel gain g_{nk,m}.

Step 102: obtain user-base-station clusters from the preference lists using a many-to-many matching algorithm.

Specifically, user terminals and base stations are matched by their preference lists under a many-to-many matching model, which yields the user-base-station clusters.

Step 103: within each user-base-station cluster, obtain the base-station power allocation and inter-base-station energy cooperation policy using a reinforcement learning algorithm.

Within a user-base-station cluster, the presence of multiple agents makes the environment non-stationary, so the algorithm may fail to converge. To solve this problem, an exploration phase is introduced and multi-agent learning is modeled as a stage game. Each agent may explore a limited number of times in a given state. Actions are generated with different probabilities by a randomized experimental policy and a deterministic baseline policy, and the cumulative reward of each is computed. When the exploration phase ends, the experimental and baseline policies are compared and the final policy is output.

Specifically, within a user-base-station cluster, information is collected on all base stations in the cluster, including battery level, channel state information, harvested energy and data packets. The exploration phase then begins, and the base stations in the cluster attempt power allocation. Each base station decides with the experimental policy with probability p and with the baseline policy with probability 1-p, and records the action taken and the reward obtained. Each agent updates its policy from the reward received and stores this information in its local knowledge base. The exploration step repeats until the exploration phase ends. The learned policy is then compared with the baseline policy and the final policy is output.
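The per-step choice between the experimental and baseline policies can be sketched as below. The uniform random experimental policy, the plain list used as the local knowledge base and the `reward_fn` callback are hypothetical choices made for illustration. The text does not specify them.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Tuple[float, float]  # (transmit power, cooperation energy)


def explore_step(state: State, baseline: Dict[State, Action],
                 actions: Sequence[Action], p: float,
                 rng: random.Random) -> Tuple[Action, bool]:
    """With probability p take an experimental (uniformly random) action,
    otherwise the baseline action. Returns (action, was_experiment)."""
    if rng.random() < p:
        return rng.choice(list(actions)), True
    return baseline[state], False


def run_exploration_phase(state: State, baseline: Dict[State, Action],
                          actions: Sequence[Action], p: float, steps: int,
                          reward_fn: Callable[[State, Action], float],
                          rng: random.Random) -> List[Tuple[State, Action, float]]:
    """Repeat the exploration step in one state and log (s, a, r) tuples
    into a local knowledge base (a plain list in this sketch)."""
    knowledge_base: List[Tuple[State, Action, float]] = []
    for _ in range(steps):
        a, _ = explore_step(state, baseline, actions, p, rng)
        knowledge_base.append((state, a, reward_fn(state, a)))
    return knowledge_base
```

With p = 0 the agent only replays its baseline. A small positive p produces the occasional experiment that is later compared against the baseline.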

As the above embodiment shows, the method can be applied to online resource allocation across multiple base stations. Matching theory first groups users with their corresponding base stations into user-centric clusters, which reduces the base-station group to the size of one cluster. Reinforcement learning then carries out base-station power allocation and inter-base-station energy cooperation.

Moreover, the method for present invention institute, by terminal, base station, the matching process simplification of three variables of resource block, by once Matching process realizes the pairing of above three amount.Resource block and base station are subjected to unified arrangement, with k-th of i-th of base station Channel is a matching amount and terminal coupling, and the list of preferences of terminal is base station and the synthesis of channel.Moreover, the above process is realized The sub-clustering of base station, and realize the distribution of power after sub-clustering and energy cooperation, still, due to portfolio and energy capture Uneven, the clustering model of above-mentioned formation will no longer be over time optimal Clustering Model, that is, need restarting to match Algorithm sub-clustering, we mainly consider the energy residual situation of base station, when the remaining capacity for having base station in cluster is not enough to support this when In gap during the transmitting of data, just it is again started up matching algorithm and forms new cluster.Since more agent algorithms are tactful in the present invention Study, each agency is rationality in addition, and when member changes in cluster, what is learnt before is tactful equally applicable, is not required to again Iterative study.

Obtaining the power allocation and energy cooperation policy within a user-base-station cluster with the reinforcement learning algorithm specifically comprises:

Step 1: determine the action set A, that is, all possible action values an agent can output, and extract a state representation S from the environment to serve as the agent's observation of the environment.

Step 2: each agent observes the current state s^t of the environment and enters the exploration phase.

Step 3: each agent aims to maximize the system average sum rate and selects an action rationally according to its own observation. An action consists of the base station's transmit power and its energy cooperation. Two policies are available for decision-making: a randomized experimental policy and a deterministic baseline policy.

Step 4: after all base stations have made their decisions, compute the reward returned by the environment; each agent updates its corresponding state-action value Q(s, a).

Step 5: repeat steps 3 and 4 until the exploration phase ends. Then compare the newly learned policy with the baseline policy and output the better one as the policy for this state.

The key steps of the above method are explained below:

1. Generation of the preference lists

Matching pairs users, base stations and resource blocks. A resource block can be used by only one base station and one user at a time, so base stations and resource blocks are ranked jointly at the user side. This avoids matching three quantities separately and keeps the algorithm simple. Frequent switching between access points makes a terminal's connection unstable. To avoid this, a terminal prefers base stations that have more stored energy and good channel quality. Combining these two factors, we define the utility function V_{nk,m}, the amount of data base station n can send to terminal m on its k-th channel, where B_n is the battery level of base station n and g_{nk,m} is the channel gain of the k-th channel that base station n uses to serve terminal m. p_{nk} is the transmit power of base station n on channel k, and σ² is the additive white Gaussian noise power. I_{nk,m} = Σ_{i≠n} p_{ik} g_{ik,m} is the co-channel interference from the other base stations, with p_{ik} the transmit power of the i-th base station on the k-th channel.

Each terminal ranks the base-station channels by this utility, which gives the terminal's preference list over base stations. A base station's preference list over terminals is determined by the channel gain from the base station to each terminal. The model contains N base stations, each with K mutually orthogonal subchannels, and M users. A terminal's preference list is therefore a ranking of the NK base-station channels SBS_1, ..., SBS_{NK}, and a base-station channel's preference list is a ranking of the users UE_1, ..., UE_M. Here SBS_i is the i-th entry obtained by expanding the base stations over their channels (the index of one base-station channel), and UE_i is the i-th user.

2. Matching process

Each base station has K orthogonal subchannels and can therefore serve K terminals simultaneously. Terminals can also use multi-connectivity; we assume that each user can connect to at most L different subchannels. The matching of base stations, terminals and channels is therefore a many-to-many matching. The detailed procedure is as follows.

1) While an unmatched terminal remains, pick any one and perform the following operations.

2) Request: the chosen terminal m sends a pairing request to base station n that names the k-th channel it wants to be matched on. (n, k) is the highest-ranked entry in terminal m's preference list among those that have not yet rejected terminal m.

Response: if the requested channel of base station n is idle, the request is accepted. Otherwise, the base station compares terminal i, currently paired on channel k, with the requesting terminal m. Following its preference list over terminals, it accepts the request of the higher-priority terminal and rejects the other, which is added to the unmatched terminal list.

3) Stop when the unmatched terminal list is empty; otherwise, return to 1).

4) Matching ends; return the set of pairs.
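The request/response loop above resembles terminal-proposing deferred acceptance. The sketch below is one possible reading of it, not the patent's exact procedure. Each (n, k) channel holds at most one terminal, and each terminal holds at most `max_links` channels (the L of the text). A displaced terminal returns to the unmatched list and continues down its preference list.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[int, int]  # (base station n, channel k)


def many_to_many_match(term_prefs: Dict[int, List[Entry]],
                       chan_prefs: Dict[Entry, List[int]],
                       max_links: int) -> Dict[Entry, int]:
    """Terminal-proposing matching with per-terminal quota `max_links`."""
    rank = {e: {m: r for r, m in enumerate(p)} for e, p in chan_prefs.items()}
    next_idx = {m: 0 for m in term_prefs}   # next entry each terminal proposes to
    held: Dict[int, Set[Entry]] = {m: set() for m in term_prefs}
    owner: Dict[Entry, Optional[int]] = {e: None for e in chan_prefs}
    unmatched = deque(term_prefs)           # terminals that may still propose
    while unmatched:                        # 3) stop when the list is empty
        m = unmatched.popleft()             # 1) pick an unmatched terminal
        while len(held[m]) < max_links and next_idx[m] < len(term_prefs[m]):
            e = term_prefs[m][next_idx[m]]  # 2) best entry not yet tried
            next_idx[m] += 1
            cur = owner[e]
            if cur is None:                 # idle channel: accept
                owner[e] = m
                held[m].add(e)
            elif rank[e][m] < rank[e][cur]:  # requester preferred: swap
                owner[e] = m
                held[m].add(e)
                held[cur].discard(e)
                unmatched.append(cur)       # rejected terminal re-enters the list
            # otherwise m is rejected and moves on to its next entry
    return {e: m for e, m in owner.items() if m is not None}  # 4) pairs
```

Every proposal advances a terminal's pointer, so the loop ends after at most one proposal per terminal-entry pair.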

3. Action discretization and state feature extraction

Assume a cluster contains m terminals, served by m corresponding channels. The state of the cluster is formed from information about the base stations that own these m channels: for each base station i, its pending data packet size, harvested energy, battery level and channel gain. An agent's action consists of its transmit power and the energy exchanged in cooperation between two base stations. To keep the action set simple, the invention uses a finite set of transmit-power values and cooperation-energy values. These are integer multiples of the step sizes δ_p and δ_E, which are the minimum transmit-power unit and the minimum cooperation-energy unit respectively.
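A minimal sketch of the discretized action set and the cluster state vector follows. The numbers of power and energy levels (`num_p`, `num_e`) and the inclusion of a zero level are hypothetical parameters. The text says only that the values are multiples of δ_p and δ_E.

```python
from itertools import product
from typing import List, Sequence, Tuple


def action_set(delta_p: float, num_p: int,
               delta_e: float, num_e: int) -> List[Tuple[float, float]]:
    """All (transmit power, cooperation energy) pairs whose values are
    integer multiples of delta_p and delta_e, starting from zero."""
    powers = [i * delta_p for i in range(num_p)]
    energies = [j * delta_e for j in range(num_e)]
    return list(product(powers, energies))


def cluster_state(packets: Sequence[float], harvested: Sequence[float],
                  battery: Sequence[float],
                  gains: Sequence[float]) -> Tuple[float, ...]:
    """Concatenate (packet size, harvested energy, battery, gain) per base station."""
    state: List[float] = []
    for d, e, b, g in zip(packets, harvested, battery, gains):
        state.extend([d, e, b, g])
    return tuple(state)
```

The action set grows as num_p × num_e. This is why coarse step sizes keep the per-agent learning problem small.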

4. Value function approximation

The action-value function is approximated by a linear function: Q̂_i(s^t, a) is written as the sum of products of a finite number of feature functions φ_{i,l}(s^t, a), l = 1, ..., M, and the weight vector θ_i:

$$\hat{Q}_i(s^t,a)=\sum_{l=1}^{M}\phi_{i,l}(s^t,a)\,\theta_{i,l}=\Phi(s^t,a)^{\top}\theta_i$$

where Φ(s^t, a) = (φ_{i,1}(s^t, a), ..., φ_{i,M}(s^t, a)) is the set of feature functions of the state-action pair, each φ_{i,l}(s^t, a) is a feature function, and θ_i is the weight vector. In the invention the feature functions use tile coding. Once the feature functions are fixed, updating the action-value function Q̂_i(s, a) reduces to adjusting the weight vector. The weights are adjusted to minimize the mean squared error, that is, the difference between Q_i(s, a) and Q̂_i(s, a). The update is

$$\theta_i \leftarrow \theta_i+\alpha\left[Q_i(s^t,a)-\hat{Q}_i(s^t,a)\right]\Phi(s^t,a)$$

5. System average sum rate

In a dense network scenario, each cell has non-uniform energy harvesting and a storage battery of limited capacity. Between two adjacent time slots, the battery level of base station n changes according to four quantities: the energy it consumes sending data in slot t, the energy it shares with each other base station i in slot t, its harvested energy, and the energy transfer efficiency η. η scales the share of a transfer that actually arrives. Clearly, the energy a base station uses in the current slot cannot exceed the energy stored in its battery.

Energy harvesting is causal: energy harvested in the current time slot can be used only in the next or later time slots. The energy required to send data in a slot must therefore not exceed the battery level available at the start of that slot.
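Below is one hypothetical way to write the battery dynamics and the causality constraint in code. The exact update equation is not reproduced in this text. The order of terms here is an assumption consistent with the constraints described above: spend from the stored level, add received and harvested energy, then cap at capacity.

```python
from typing import List, Sequence


def battery_step(battery: Sequence[float], harvested: Sequence[float],
                 tx_energy: Sequence[float], shared: Sequence[Sequence[float]],
                 eta: float, capacity: float) -> List[float]:
    """One-slot battery update for N base stations (hypothetical form).

    shared[n][i] is the energy base station n sends to base station i in
    this slot (shared[n][n] is assumed to be 0); a fraction eta arrives.
    Harvested and received energy are added after spending, so neither can
    be used in the slot it arrives (energy causality).
    """
    n_bs = len(battery)
    new = []
    for n in range(n_bs):
        spent = tx_energy[n] + sum(shared[n])
        if spent > battery[n] + 1e-12:
            raise ValueError(f"base station {n} spends more than its stored energy")
        received = eta * sum(shared[i][n] for i in range(n_bs) if i != n)
        level = battery[n] - spent + received + harvested[n]
        new.append(min(level, capacity))  # finite battery capacity
    return new
```

Here base station 0 gives 1 unit to base station 1 at efficiency 0.8, so base station 1 gains 0.8 units for the next slot.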

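The downlink signal-to-interference-plus-noise ratio (SINR) of the link from base station n to terminal m on channel k in time slot t is

$$\gamma_{nk,m}^t=\frac{p_{nk}^t\,g_{nk,m}^t}{\sigma^2+\sum_{i\neq n}p_{ik}^t\,g_{ik,m}^t}$$

The rate per unit bandwidth at which base station n transmits data to terminal m on its k-th channel in time slot t is then

$$R_{nk,m}^t=\log_2\left(1+\gamma_{nk,m}^t\right)$$

and the sum rate of all base stations in the system, summed over the matched triples (n, k, m), is

$$R^t=\sum_{n}\sum_{k}\sum_{m}R_{nk,m}^t$$

The goal of the invention is to maximize the system average sum rate over a finite horizon of T time slots,

$$\max\ \frac{1}{T}\sum_{t=1}^{T}R^t$$

subject to the battery-dynamics, energy-usage and energy-causality constraints above.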
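The SINR, per-link rate and average sum rate above can be computed as in this sketch. Rates are in bit/s/Hz. `power`, `gain` and `links` are hypothetical input layouts keyed by base station, channel and terminal.

```python
import math
from typing import Dict, List, Tuple

Link = Tuple[int, int, int]  # (base station n, channel k, terminal m)


def link_rates(links: List[Link], power: Dict[Tuple[int, int], float],
               gain: Dict[Link, float], noise: float) -> Dict[Link, float]:
    """Per-link rate log2(1 + SINR) for the matched links.

    Interference on (n, k, m) comes from every other base station i that
    transmits on the same channel k, through gain g_{ik,m}.
    """
    rates = {}
    for (n, k, m) in links:
        interference = sum(p * gain[(i, kk, m)]
                           for (i, kk), p in power.items()
                           if kk == k and i != n)
        sinr = power[(n, k)] * gain[(n, k, m)] / (noise + interference)
        rates[(n, k, m)] = math.log2(1.0 + sinr)
    return rates


def average_sum_rate(per_slot_rates: List[Dict[Link, float]]) -> float:
    """System average sum rate (1/T) * sum_t R^t over T slots."""
    return sum(sum(r.values()) for r in per_slot_rates) / len(per_slot_rates)
```

In a symmetric two-cell example with own gain 6, cross gain 1 and unit noise, each link has SINR 3 and rate 2 bit/s/Hz.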

6. Power allocation algorithm

This part describes in detail the power allocation algorithm for the multiple base stations in a cluster. The algorithm is modeled on a Markov game. It uses multi-agent reinforcement learning to allocate downlink transmit power and to arrange energy cooperation between base stations, with the aim of maximizing system throughput under energy constraints.

Conventional multi-agent reinforcement learning algorithms may fail to converge. The cause is that several learning agents act at once, which makes the external environment non-stationary, and an agent cannot learn a stable decision policy in a changing environment. To address this, the invention explores each state repeatedly and models this process as a stage game. Within a stage, an agent decides with its baseline policy but, with a small probability, explores new policies using a randomized experimental policy. The multiple learning agents thus learn optimal policy responses in a stationary environment. After a limited number of explorations, the newly learned policy is compared with the original baseline policy, and the better one is output as the optimal policy for the current state. The detailed procedure is as follows.

1) Set the parameters: the experimentation probability p_i of agent i, the learning rate α, and the inertia value λ_i.

2) Initialize the policy π_i of agent i; this policy is agent i's best response to the other agents' policies.

3) Sense the environment state s and start the exploration phase.

4) With probability 1-p_i, decide with the initial (baseline) policy, a_i^t = π_i(s^t); with probability p_i, explore with a randomized policy that draws a_i^t at random from the action set A.

5) The agent receives the reward r_i(s^t, a_1, ..., a_i, ..., a_m) from the environment and observes the next state s^{t+1}.

6) Update the value function for all state-action pairs involved, using the linear value-function approximation of Section 4.

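7) For every state, compute the best-response set, that is, the actions that maximize Q̂_i(s, a). If the exploration-phase (baseline) policy belongs to the best-response set, the next exploration phase keeps the same policy. Otherwise, the policy for the next exploration phase is drawn from the best-response set, and the inertia λ_i sets the probability of keeping the current policy.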
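The following sketches step 7 under an assumed inertia rule, since the specific rule is not reproduced in this text. A baseline action that is already a best response is kept. Any other baseline action is kept with probability λ_i and replaced by a uniformly chosen best response with probability 1 - λ_i. This mirrors the exploration-phase scheme of decentralized Q-learning, but the source does not confirm it as the patent's exact rule.

```python
import random
from typing import Dict, Hashable, List


def best_response_set(q: Dict[Hashable, Dict[Hashable, float]], state: Hashable,
                      tol: float = 1e-9) -> List[Hashable]:
    """Actions whose Q-value is within `tol` of max_a Q(state, a)."""
    best = max(q[state].values())
    return [a for a, v in q[state].items() if v >= best - tol]


def update_baseline(baseline: Dict[Hashable, Hashable],
                    q: Dict[Hashable, Dict[Hashable, float]],
                    lam: float, rng: random.Random,
                    tol: float = 1e-9) -> Dict[Hashable, Hashable]:
    """End-of-phase update with inertia `lam` (assumed rule, see lead-in)."""
    new = {}
    for s, a in baseline.items():
        br = best_response_set(q, s, tol)
        if a in br or rng.random() < lam:
            new[s] = a                # already optimal, or kept by inertia
        else:
            new[s] = rng.choice(br)   # switch to a random best response
    return new
```

Inertia slows policy switching. This matters because all agents in a cluster update at the same time, and simultaneous switches could otherwise keep the environment from settling.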

In summary, the invention targets dense network scenarios in which the energy available to base stations is limited. It proposes a method for clustering base stations, terminals and resource blocks, together with online power allocation and base-station energy cooperation.

To manage the large number of base stations in a dense network, the invention uses a distributed matching algorithm to form user-centric clusters. This reduces the base-station group to the size of one cluster and thereby lowers the complexity of power allocation.

In the invention a terminal can connect to multiple base stations simultaneously, and a base station can serve multiple terminals, so the clusters formed by matching may overlap. From the user's perspective, the base stations serving the same terminal form a cluster, and energy cooperation among them is achieved by transferring electric energy directly. From the base station's perspective, a base station connected to several users belongs to several clusters. Energy cooperation between clusters can then be achieved by the base station adjusting its transmit power to different terminals. At the system level, these two mechanisms regulate the flow of energy between base stations and keep base-station energy balanced.
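The overlapping clusters can be read directly from the matching result, as in this sketch. `pairs` is assumed to map each matched (base station, channel) entry to its terminal, in the format of a matching step's output.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def user_centric_clusters(pairs: Dict[Tuple[int, int], int]) -> Dict[int, Set[int]]:
    """Map each terminal m to the set of base stations serving it (its cluster)."""
    clusters: Dict[int, Set[int]] = defaultdict(set)
    for (n, _k), m in pairs.items():
        clusters[m].add(n)
    return dict(clusters)


def memberships(clusters: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Map each base station n to the terminals whose clusters it belongs to."""
    out: Dict[int, Set[int]] = defaultdict(set)
    for m, stations in clusters.items():
        for n in stations:
            out[n].add(m)
    return dict(out)
```

A base station that appears in several clusters, such as base station 1 serving terminals 0 and 1 in the test, is the point where inter-cluster energy cooperation through transmit-power adjustment takes place.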

The invention simplifies the three-variable matching of terminals, base stations and resource blocks so that a single matching process pairs all three. Resource blocks and base stations are arranged jointly. The k-th channel of the i-th base station is treated as one matching entity and matched with terminals, so a terminal's preference list combines base stations and channels.

The invention introduces the concept of an exploration phase into multi-agent reinforcement learning. During an exploration phase, an agent follows a fixed policy π and, with a small probability, experiments with other policies. This gives the agents a stationary environment in which to learn an optimal global decision policy, which ensures convergence.

The invention considers that a base station's data rate in the model is limited by its local energy. When the preference lists are generated, the base station's current battery level B is used as one factor and combined with the channel gain g. Together they define the utility function, an estimate of the amount of data the base station can send on each channel.

The invention takes into account the efficiency of energy transfer between base stations. Base stations in a cluster can cooperate in two ways: by transferring energy, or by adjusting their transmit powers to the same terminal. The online reinforcement learning algorithm of the invention balances the two.

The state information s of a base station is continuous-valued. A linear value-function approximation is therefore introduced, in which a linear function approximates, stores and predicts the Q-value of each state. Combined with tile coding, this handles the continuous state space of the model.

Conventional matching algorithms handle three parties with two-level matching: two parties are matched first, and one of them is then matched with the third. This is more complex than the process of the invention.

For ultra-dense networks under energy constraints, the invention proposes a method in which base stations use suitable transmit power and energy cooperation strategies to maximize the long-term system average sum rate. First, a matching algorithm clusters the large number of base stations. Unlike existing literature, the invention treats a base station together with its resource blocks as a single matching entity and generates one preference list. A many-to-many matching algorithm then matches base stations, users and resource blocks. Users initiate the matching process, which gives more weight to user satisfaction and in effect produces a user-centric overlapping cluster model. Compared with conventional two-level matching of three parties, the method simplifies the matching process and makes it more concise and easier to understand. For the overlapping clusters obtained after matching, the invention proposes an online multi-agent reinforcement learning algorithm for power allocation. Multiple learning agents create a dynamic environment that can prevent convergence. To address this, each state is explored repeatedly and the process is modeled as a stage game. Within a stage, an agent decides with its baseline policy but, with a small probability, explores new policies using a randomized experimental policy, so the agents learn optimal policy responses in a stationary environment. After a limited number of explorations, the newly learned policy is compared with the original baseline policy, and the better one is output as the optimal policy for the current state. The multi-agent algorithm of the invention ensures convergence. Because clustering reduces the number of base stations, the environment each agent observes is also simpler, and the algorithm converges to the optimal policy faster.

In existing systems, renewable energy harvested from the environment (such as solar or wind energy) is non-uniform, fluctuating and intermittent. The invention proposes a theoretical framework and a concrete implementation for maximizing throughput at base stations powered by such sources.

The proposed method of base-station energy harvesting and multi-base-station energy cooperation supports energy saving and emission reduction while guaranteeing quality of service. It also lowers operators' operating costs and yields higher economic benefit.

The invention enables distributed, self-sustaining operation of base stations. Each base station obtains electric energy through energy harvesting, and a storage battery of limited capacity buffers that energy to smooth out the fluctuations in what is harvested. Meanwhile, energy cooperation between base stations lets energy be shared across large numbers of networked base stations, which further improves the stability of base-station operation.

The parts of this specification are described progressively. Each part focuses on its differences from the others, and the parts may be cross-referenced for identical or similar content.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

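1. An energy and traffic cooperation method for a dense cellular network, characterized in that the method comprises:

generating preference lists for user terminals and base stations according to a utility function;

obtaining user-base-station clusters from the preference lists using a many-to-many matching algorithm; and

within each user-base-station cluster, obtaining a policy for base-station power allocation and inter-base-station energy cooperation using a reinforcement learning algorithm.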

2. The method according to claim 1, characterized in that generating the preference lists for user terminals and base stations according to the utility function specifically comprises:

defining a utility function V_{nk,m} that represents the amount of data the n-th base station can send to terminal m on the k-th channel, and generating the preference lists of base stations and users from the transmission data rate V_{nk,m} and the channel gain g_{nk,m}.

3. The method according to claim 1, characterized in that obtaining, within the user-base-station cluster, the policy for base-station power allocation and inter-base-station energy cooperation using a multi-agent reinforcement learning algorithm specifically comprises:

Step 1: determining the action set, that is, all possible action values an agent can output, and extracting a state representation from the environment to serve as the agent's observation of the environment;

Step 2: each agent observing the current state of the environment and entering the exploration phase;

Step 3: each agent, aiming to maximize the system average sum rate, selecting an action rationally according to its own observation, wherein an action consists of the base station's transmit power and its energy cooperation, and two policies are available for decision-making: a randomized experimental policy and a deterministic baseline policy;

Step 4: after all base stations have made their decisions, computing the reward returned by the environment, each agent updating its corresponding state-action values;

Step 5: repeating steps 3 and 4 until the exploration phase ends, then comparing the newly learned policy with the baseline policy and outputting the better one as the policy for this state.