CN116132997A - Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm - Google Patents

Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm

Info

Publication number
CN116132997A
CN116132997A (application CN202310082022.6A)
Authority
CN
China
Prior art keywords
base station
heterogeneous network
state
algorithm
energy efficiency
Prior art date
Legal status
Pending
Application number
CN202310082022.6A
Other languages
Chinese (zh)
Inventor
李君�
刘子怡
刘兴鑫
李晨
Current Assignee
Wuxi University
Original Assignee
Wuxi University
Priority date
Filing date
Publication date
Application filed by Wuxi University filed Critical Wuxi University
Priority to CN202310082022.6A priority Critical patent/CN116132997A/en
Publication of CN116132997A publication Critical patent/CN116132997A/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W16/00: Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18: Network planning tools
    • H04W16/22: Traffic simulation tools or models
    • H04W88/00: Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/08: Access point devices
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method for optimizing energy efficiency in a hybrid power supply heterogeneous network based on the A2C algorithm, which comprises: determining the user positions of the small base stations according to the number and distribution of the macro base station and the small base stations; regarding a single small base station as an agent and establishing the state space, action space, and reward function of a Markov decision process; randomly obtaining a state from the interaction of a small-cell user with the environment; transferring the transition (s_t, a_t, r_t, s_{t+1}) to the critic network; transmitting the optimal action learned by each small base station to the macro base station as a state, and repeatedly deploying small base stations within the coverage of the macro base station to obtain the optimal small base station deployment strategy, i.e., the optimal resource allocation; users connect to the corresponding small base stations to obtain better channels, maximizing the energy efficiency of the heterogeneous network system. By applying the A2C algorithm from reinforcement learning, the invention improves the energy efficiency of the heterogeneous network, approximates the state-action value function with a Gaussian distribution, saves conventional grid resources, and reduces the cost of grid communication energy consumption.

Description

Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm
Technical Field
The invention relates to the technical field of physical layers of communication systems, in particular to a method for optimizing energy efficiency in a hybrid power supply heterogeneous network based on an A2C algorithm.
Background
With the growing number of terminals and the rapid increase in data traffic demands, conventional single-layer networks can no longer keep pace with current technological development, and wireless communication networks face severe challenges. To relieve this pressure, researchers proposed the heterogeneous network, and the current radio access network has evolved into a two-tier architecture consisting of macro base stations that satisfy wide-area access requirements and small base stations that satisfy high-density local access requirements. To support high-speed mobile data services and provide better coverage, next-generation cellular networks are expected to widely deploy micro or small cell base stations that can offload some users and traffic from traditional macro base stations. While this increases system capacity and enhances network coverage, deploying a large number of small base stations also brings new challenges, such as inter-cell interference, wasted resources, and huge energy consumption, so optimizing grid resources from a global perspective is increasingly important. As the number of base stations and the price of energy rise, energy efficiency has become a key issue in grid management.
Disclosure of Invention
The invention provides a method for optimizing energy efficiency in a hybrid power supply heterogeneous network based on an A2C algorithm, which optimizes energy efficiency based on the A2C algorithm in reinforcement learning.
In order to achieve the above effects, the technical scheme of the invention is as follows:
the method for optimizing the energy efficiency in the hybrid power supply heterogeneous network based on the A2C algorithm comprises the following steps:
step 1: constructing a heterogeneous network system according to an optimization target, wherein the system consists of a macro base station, small base stations and users, a single small base station is regarded as an intelligent body, a state space, an action space and a reward function of a Markov decision process are established, and the signal-to-interference-and-noise ratio from the macro base station to the users is calculated;
step 2: calculating total conventional power of the macro base station, and constructing an actor network and a critic network according to a defined Markov decision process; training a heterogeneous network system by utilizing an A2C algorithm;
step 3: random acquisition state s of small base station user and environment interaction t State s t Information is transmitted to an actor network, and the actor network is used for transmitting information according to the state s of the current environment t And the self state of the intelligent agent to select proper action a t Obtain instant rewards r t And state s at time t+1 t+1 Obtain action information(s) t ,a t ,r t ,s t+1 );
Step 4: action information(s) t ,a t ,r t ,s t+1 ) Transmitting to the critic network, updating the parameters of the critic network and maximizing rewards to obtain an optimal small base station deployment strategy, namely, optimal resource allocation; the users are connected to the corresponding small base stations to obtain better channels, and the energy efficiency of the heterogeneous network system is maximized.
The heterogeneous network system comprises a macro base station and a plurality of small base stations, and users are uniformly distributed in the coverage area of the base stations.
Further, in step 1, K small base stations and M users are deployed within the macro base station; the set of small base stations is κ = {0, 1, 2, ..., K} and the set of users is M = {0, 1, 2, ..., M}. Assuming that the conventional macro base station and the subordinate small base stations use the same radio spectrum, the signal-to-interference-plus-noise ratio γ_m(t) of user m during time slot t is:

γ_m(t) = g_{k,m}(t) p_{k,m}(t) / ( Σ_{i∈κ, i≠k} g_{i,m}(t) p_i(t) + σ_m² ),

where g_{k,m}(t) is the average channel gain from the serving base station k to user m at time slot t; p_{k,m}(t) is the transmit power allocated to user m; g_{i,m}(t) is the average channel gain from each interfering base station i to user m at time slot t; p_i(t) = Σ_{u∈u_i(t)} p_{i,u}(t) is the total radio transmission power of base station i for the users u_i(t) it serves; and σ_m² is the variance of the additive white Gaussian noise at user m.
Further, in step 1, the total bandwidth W of the agent is divided into W/B sub-channels of bandwidth B; during time slot t, user m is allocated b_m(t) ∈ {0, 1, ..., W/B} sub-channels, subject to

Σ_{m∈M} b_m(t) ≤ W/B.

The total information rate r_sum(t) achieved by the agent during time slot t is:

r_sum(t) = Σ_{m∈M} b_m(t) · B · log2(1 + γ_m(t)).
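For illustration, the per-slot SINR and sum rate above can be computed as in the following minimal sketch; the array layout and the function and variable names are assumptions made for this example, not part of the original disclosure.

```python
import numpy as np

def sinr(m, k, g, p_serving, p_total, noise_var):
    """gamma_m(t): SINR of user m served by base station k in one time slot.

    g         : (K+1, M) array of average channel gains g_{i,m}(t)
    p_serving : transmit power p_{k,m}(t) spent on user m
    p_total   : (K+1,) array of total transmit powers p_i(t)
    noise_var : AWGN variance sigma_m^2 at user m
    """
    interference = sum(g[i, m] * p_total[i]
                       for i in range(g.shape[0]) if i != k)
    return g[k, m] * p_serving / (interference + noise_var)

def sum_rate(b, gamma, bandwidth=1.0):
    """r_sum(t) = sum over m of b_m(t) * B * log2(1 + gamma_m(t))."""
    b, gamma = np.asarray(b, float), np.asarray(gamma, float)
    return float(np.sum(b * bandwidth * np.log2(1.0 + gamma)))
```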
the small base station acquires energy from the power grid and the renewable energy collection device.
Further, wireless transmission power drawn from the conventional power grid is taken as positive, and wireless transmission power drawn from the renewable-energy device is taken as negative. At time slot t, the total power p_k^{tot}(t) of the macro base station is:

p_k^{tot}(t) = p_k^{st} + η · p_k(t),

where p_k^{st} denotes the static power, covering the circuit boards of processors, cooling components, and the like; η is a coefficient factor related to the efficiency of the wireless power amplifier; and p_k(t) is the total wireless transmission power of all users u(t) associated with the macro base station in time slot t.
the static power
Figure BDA0004067819750000029
To obtain the static power of each base station from the regular grid, the total regular power of the macro base station at time slot t +.>
Figure BDA00040678197500000210
The method comprises the following steps:
Figure BDA0004067819750000031
wherein p is k,u (t) represents the power used by the user in the macro base station, u k (t) represents a user.
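As a minimal sketch of this hybrid power accounting (the names and the split into two helper functions are illustrative assumptions):

```python
import numpy as np

def total_power(p_users, p_static, eta=1.0):
    """p_k^tot(t) = p_k^st + eta * p_k(t), where p_k(t) sums the
    transmit powers of all users associated with the base station."""
    return p_static + eta * float(np.sum(p_users))

def conventional_power(p_users, p_static, eta=1.0):
    """Grid-supplied power p_k^co(t): static power plus only the positive
    part of each user's transmit power; negative entries are covered by
    the renewable-energy device under the sign convention above."""
    return p_static + eta * float(np.sum(np.maximum(p_users, 0.0)))
```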
Further, the energy efficiency of the agent in step 1 during time slot t is defined as the ratio ρ_t of the information rate to the conventional power consumption:

ρ_t = r_sum(t) / p^{co}(t).
further, in step 1, the states are the signal-to-interference-and-noise ratio γ (t) of each user and the battery power e (t) of each small cell, that is:
s t =(γ 1 (t),γ 2 (t),...,γ M (t),e 1 (t),e 2 (t),...,e K (t));
action a t Represented as the number u of users in macro base station k (t), number of subchannels b for user m m (t), transmission power p of user m k,m (t), action a t =(u k (t),b m (t),p k,m (t));
Rewards r t Equivalent to the energy efficiency of heterogeneous network systems, i.e. r t =ρ t
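The MDP tuple defined above maps directly onto a few small helpers; the following is a sketch under assumed names, not code from the filing.

```python
import numpy as np

def make_state(gamma, e):
    """s_t = (gamma_1(t), ..., gamma_M(t), e_1(t), ..., e_K(t))."""
    return np.concatenate([np.asarray(gamma, float), np.asarray(e, float)])

def make_action(n_users, n_subchannels, tx_power):
    """a_t = (u_k(t), b_m(t), p_{k,m}(t))."""
    return (n_users, n_subchannels, tx_power)

def reward(rate_sum, conv_power):
    """r_t = rho_t: information rate over conventional power consumption."""
    return rate_sum / conv_power
```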
Further, step 2 trains the heterogeneous network system with the A2C algorithm as follows:
Step 2.1: establish the heterogeneous network system according to the optimization target; set up the reinforcement learning framework, initialize the state space, select the optimal action according to the state obtained from the agent's interaction with the A2C environment, and transmit the optimal action to the macro base station as state information;
Step 2.2: input the state s_t into the heterogeneous network system, and update the state-action value function and state value function of the heterogeneous network system;
Step 2.3: run multiple rounds of iterative training on the heterogeneous network until the reward function converges, obtaining the trained heterogeneous network.
Further, in step 2.2, the state-action value function Q_w(s, a) and the state value function V_v(s) of the A2C algorithm are updated: Q_w(s, a) is approximated with a Gaussian distribution; V_v(s) is approximated with tile coding, using 32 tilings of 4*4 = 16 tiles each; eligibility traces are introduced so that all action values visited within an episode are updated to different degrees, with the update decaying over elapsed time; and an advantage function is set in the actor network.
Further, the state-action value function Q_w(s, a) is expressed as:

Q_w(s, a) = w^T ψ(s, a),

where w^T is the parameter vector, w = (w_1, w_2, ..., w_n)^T; the parameter θ = (θ_1, θ_2, ..., θ_n)^T constructs the policy π_θ(a|s), written π_θ(s, a) = Pr(a|s, θ), with π_θ(s, a) ~ N(μ(s), σ²), where μ(s) = θ^T ψ(s) is the mean and σ is the standard deviation; ψ(s) = (γ_1, γ_2, ..., γ_M, e_1, e_2, ..., e_K)^T, where γ denotes the signal-to-interference-plus-noise ratio of each user and e denotes the battery level of each small base station.
Further, the state value function V_v(s) is expressed as:

V_v(s) = w_{It(1)}·1 + w_{It(2)}·1 + ... + w_{It(32)}·1,

where w_{It(1)}, w_{It(2)}, ..., w_{It(32)} are the weights of the activated tiles and It(1), It(2), ..., It(32) are the index values of the activated tiles; all other tiles contribute 0.
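A tile-coded value function of this shape can be sketched as below. The patent does not state which two state components are tiled or how the tilings are offset, so the 2-D projection and the uniform offsets here are assumptions.

```python
import numpy as np

class TileCodedValue:
    """V_v(s) via tile coding: 32 offset 4x4 tilings over a 2-D projection
    of the state; V is the sum of one activated weight per tiling."""

    def __init__(self, n_tilings=32, tiles=4, lo=0.0, hi=1.0):
        self.n_tilings, self.tiles = n_tilings, tiles
        self.lo, self.span = lo, hi - lo
        self.w = np.zeros((n_tilings, tiles, tiles))  # one weight per tile
        # each tiling is shifted by a fraction of one tile width
        self.shift = np.linspace(0.0, 1.0, n_tilings, endpoint=False)

    def active(self, x, y):
        """Tile indices It(1)..It(32): the activated tile in each tiling."""
        out = []
        for t in range(self.n_tilings):
            i = int(np.clip((x - self.lo) / self.span * self.tiles
                            + self.shift[t], 0, self.tiles - 1))
            j = int(np.clip((y - self.lo) / self.span * self.tiles
                            + self.shift[t], 0, self.tiles - 1))
            out.append((t, i, j))
        return out

    def value(self, x, y):
        return sum(self.w[t, i, j] for t, i, j in self.active(x, y))

    def update(self, x, y, td_error, alpha=0.02):
        for t, i, j in self.active(x, y):
            self.w[t, i, j] += alpha / self.n_tilings * td_error
```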
Further, in step 4, the transition (s_t, a_t, r_t, s_{t+1}) is transmitted to the critic network; the critic network evaluates the action return value q_t of the agent and updates the critic network parameters with the TD error (in temporal-difference learning, the deviation between the bootstrapped estimate and the current value); (s_t, a_t, q_t) is transmitted to the actor network, which updates the action selection probabilities along the policy gradient to maximize the reward r_t; the optimal action learned by each small base station is transmitted to the macro base station as a state, and small base stations are repeatedly deployed within the coverage of the macro base station to obtain the optimal small base station deployment strategy, i.e., the optimal resource allocation; users connect to the corresponding small base stations to obtain better channels, maximizing the energy efficiency of the heterogeneous network system.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
according to the invention, the heterogeneous network model is trained by using the A2C algorithm in reinforcement learning, so that the energy efficiency of a heterogeneous network system is improved, the state action value function is approximated by using a Gaussian distribution method, the state value function is approximated by using tile coding, the actor network is helped to know the strategy gradient, the variance of the strategy gradient is further reduced by using the advantage function, and the A2C algorithm is enabled to converge faster.
Drawings
The drawings are for illustrative purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
FIG. 1 is a schematic flow chart of an energy efficiency optimization method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a heterogeneous network system according to an embodiment of the present invention;
fig. 3 is a training flowchart of a heterogeneous network system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
For ease of understanding, referring to fig. 1-2, an embodiment of a method for optimizing energy efficiency in a hybrid power heterogeneous network based on an A2C algorithm provided by the present invention includes the following steps:
step 1: determining the user position of the small base station according to the quantity distribution condition of the macro base station and the small base station; a single small base station is regarded as an intelligent agent, and a state space, an action space and a reward function of a Markov decision process are established; setting the rewards as the energy efficiency of the agent;
step 2: constructing an actor network and a critic network according to a defined Markov decision process; training a heterogeneous network system by utilizing an A2C algorithm;
step 3: small base station userAmbient interaction randomly obtains state s t State s t Transmitting to an actor network, wherein the actor network is used for transmitting the state s of the current environment to the actor network t And the self state of the intelligent agent to select the proper action a t Obtain instant rewards r t And state s at time t+1 t+1 Obtain action information(s) t ,a t ,r t ,s t+1 );
Step 4: action information(s) t ,a t ,r t ,s t+1 ) Pass to the critic network, and the action taken by the critic network on the agent returns q t Evaluating the value, and updating the critic network parameters by adopting TD errors; will(s) t ,a t ,q t ) Transmitting to an actor network, updating action selection probability according to strategy gradient, and maximizing rewards r t The method comprises the steps of carrying out a first treatment on the surface of the Transmitting the optimal actions obtained by learning of each small base station as states to a macro base station, and repeatedly deploying the small base stations in the coverage area of the macro base station to obtain an optimal small base station deployment strategy, namely optimal resource allocation; the users are connected to the corresponding small base stations to obtain better channels, and the energy efficiency of the heterogeneous network system is maximized. When the state of the critic network changes, a new small base station deployment strategy can be obtained only by re-inputting a new state into the actor network.
The invention optimizes the energy efficiency by using the A2C algorithm in reinforcement learning, saves the resources of the traditional power grid and saves the cost of the energy consumption of the communication network.
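The interaction loop of steps 3 and 4 can be summarized in the following sketch; the env, actor, and critic objects and their method names are hypothetical interfaces assumed for illustration.

```python
def train_episode(env, actor, critic, beta=0.99, steps=50):
    """One episode of the step 3/4 loop: the actor picks a_t from s_t, the
    critic scores the transition with a TD error, and both networks update."""
    s = env.reset()                      # random initial state s_t
    for _ in range(steps):
        a = actor.sample(s)              # a_t ~ pi_theta(.|s_t), Gaussian policy
        s_next, r, done = env.step(a)    # instant reward r_t and state s_{t+1}
        # TD error: bootstrapped target minus the current estimate V_v(s_t)
        td = r + beta * critic.value(s_next) - critic.value(s)
        critic.update(s, td)             # move V_v(s_t) toward the target
        actor.update(s, a, td)           # policy-gradient step weighted by td
        if done:
            break
        s = s_next
```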
Example 2
This embodiment builds on Embodiment 1 and describes a specific implementation, so as to further demonstrate the technical effects. The method is specified as follows:
specifically, in step 1, K small base stations and M users are deployed in the macro base station, where the set of small base stations is κ= {0,1,2, … …, K }, the set of users is m= {0,1,2, … … M }, and if the conventional macro base station and the subordinate small base stations use the same radio spectrum, the signal-to-interference-and-noise ratio γ of the user M during the time slot t m The method comprises the following steps:
Figure BDA0004067819750000061
/>
wherein k represents macro base station, g k,m (t) is the average channel gain of macro base station at time slot t, p for user m k,m (t) is the transmit power of the small cell; g i,m (t) is the average channel gain of user m from other interfering base stations i at time slot t; p is p i (t) is for user u i (t) the total radio transmission power of the serving base station i,
Figure BDA0004067819750000062
Figure BDA0004067819750000063
is the variance of the additive white gaussian noise at user m.
Specifically, in step 1, the total bandwidth W of the agent is divided into W/B sub-channels of bandwidth B; during time slot t, user m is allocated b_m(t) ∈ {0, 1, ..., W/B} sub-channels, subject to

Σ_{m∈M} b_m(t) ≤ W/B.

The total information rate r_sum(t) achieved by the agent during time slot t is:

r_sum(t) = Σ_{m∈M} b_m(t) · B · log2(1 + γ_m(t)).
specifically, the wireless transmission power obtained by the conventional power grid is set to be a positive value, and the wireless transmission power obtained in the renewable energy source equipment is set to be a negative value; at time slot t, the total power of the macro base station
Figure BDA0004067819750000066
The method comprises the following steps:
Figure BDA0004067819750000067
in the method, in the process of the invention,
Figure BDA0004067819750000068
representing static power; η is a coefficient factor related to the efficiency of the wireless power amplifier; p is p k (t) is the total wireless transmission power of all users u (t) associated with the macro base station in time slot t;
the static power
Figure BDA0004067819750000069
To obtain the static power of each base station from the regular grid, the total regular power of the macro base station at time slot t +.>
Figure BDA00040678197500000610
The method comprises the following steps:
Figure BDA00040678197500000611
wherein p is k,u (t) represents the power used by the user in the macro base station, u k (t) represents a user.
Specifically, the energy efficiency of the agent in step 1 during time slot t is defined as the ratio ρ_t of the information rate to the conventional power consumption:

ρ_t = r_sum(t) / p^{co}(t).
specifically, in step 1, the states are a signal-to-interference-and-noise ratio γ (t) of each user and a battery power e (t) of each small cell, that is:
s t =(γ 1 (t),γ 2 (t),...,γ M (t),e 1 (t),e 2 (t),...,e K (t));
action a t Represented as the number u of users in macro base station k (t), number of subchannels b for user m m (t), transmission power p of user m k,m (t), action a t =(u k (t),b m (t),p k,m (t)); rewards r t Equivalent to the energy efficiency of heterogeneous network systems, i.e. r t =ρ t
Specifically, as shown in fig. 3, step 2 trains the heterogeneous network system with the A2C algorithm as follows. Step 2.1: set up the reinforcement learning framework, initialize the state space, select the optimal action according to the state obtained from the agent's interaction with the A2C environment, and transmit the optimal action to the macro base station as state information;
Step 2.2: input the state s_t into the heterogeneous network system, and update the state-action value function and state value function of the heterogeneous network system;
Step 2.3: run multiple rounds of iterative training on the heterogeneous network until the reward function converges, obtaining the trained heterogeneous network.
In a specific implementation, assume 20 users within one macro base station, with randomly distributed positions and a randomly generated initial state; the bandwidth of each sub-channel is set to B = 1 Hz, and the information rate of each user is r_m ∈ [0, 1]. The maximum transmit power of a single user is set to P_max = 1 W, so the transmit power of user m is p_{k,m} ∈ [-1, 1]. The static power p_k^{st} is fixed, η = 1, the discount coefficient β = 0.99, and the decay factor λ = 0.5. Most parameters are normalized: the renewable energy collected at a small base station is normalized to e_k ∈ [0, 1], and the signal-to-interference-plus-noise ratio between a small base station and its served user m is normalized to γ_m ∈ [0, 1]. The learning rate of the critic network is 0.02, the learning rate of the actor network is gradually increased from 0.01 to 0.03, and training runs for 10000 episodes of 50 steps each.
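Collected in one place, the embodiment's settings look like the following configuration sketch (the static power value appears only as an image in the original, so it is left unset here; all key names are assumptions):

```python
config = {
    "n_users": 20,                   # users per macro base station, random positions
    "subchannel_bandwidth_hz": 1.0,  # B = 1 Hz
    "user_rate_range": (0.0, 1.0),   # normalized information rate r_m
    "p_max_w": 1.0,                  # per-user cap, so p_{k,m} in [-1, 1]
    "p_static_w": None,              # p_k^st: fixed; value not recoverable here
    "eta": 1.0,                      # power-amplifier coefficient factor
    "discount_beta": 0.99,
    "trace_decay_lambda": 0.5,
    "critic_lr": 0.02,
    "actor_lr": (0.01, 0.03),        # ramped up over the course of training
    "episodes": 10000,
    "steps_per_episode": 50,
}
```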
Specifically, in step 2.2, the state-action value function Q_w(s, a) and the state value function V_v(s) of the A2C algorithm are updated: Q_w(s, a) is approximated with a Gaussian distribution; V_v(s) is approximated with tile coding, using 32 tilings of 4*4 = 16 tiles each, to cope with the huge state-action space. The critic network introduces eligibility traces so that all action values visited within an episode are updated to different degrees, with the update decaying over elapsed time, and an advantage function is set in the actor network.
The advantage function is expressed as: A(s_t, a_t) = Q_w(s_t, a_t) - [τ·V_v(s_t) + (1 - τ)·V_v(s_{t-1})]. The Q table underlying the state-action value function is obtained by reinforcement-learning iteration over the different actions executed under the various states.
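A sketch of the advantage computation and an eligibility-trace weight update consistent with this description (the accumulating-trace form and the parameter names are assumptions):

```python
import numpy as np

def advantage(q_sa, v_s, v_s_prev, tau=0.5):
    """A(s_t, a_t) = Q_w(s_t, a_t) - [tau*V_v(s_t) + (1 - tau)*V_v(s_{t-1})]."""
    return q_sa - (tau * v_s + (1.0 - tau) * v_s_prev)

class TracedWeights:
    """Linear value weights with an eligibility trace: one TD error nudges
    every recently visited feature, the most recent the most."""

    def __init__(self, n_features, lam=0.5, beta=0.99, alpha=0.02):
        self.w = np.zeros(n_features)
        self.z = np.zeros(n_features)   # eligibility trace
        self.lam, self.beta, self.alpha = lam, beta, alpha

    def step(self, features, td_error):
        self.z = self.beta * self.lam * self.z + features   # decay, accumulate
        self.w += self.alpha * td_error * self.z
```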
The state-action value function Q_w(s, a) is expressed as:

Q_w(s, a) = w^T ψ(s, a),

where w^T is the parameter vector, w = (w_1, w_2, ..., w_n)^T; the parameter θ = (θ_1, θ_2, ..., θ_n)^T constructs the policy π_θ(a|s), written π_θ(s, a) = Pr(a|s, θ), with π_θ(s, a) ~ N(μ(s), σ²), where μ(s) = θ^T ψ(s) is the mean and σ is the standard deviation; ψ(s) = (γ_1, γ_2, ..., γ_M, e_1, e_2, ..., e_K)^T, where γ denotes the signal-to-interference-plus-noise ratio of each user and e denotes the battery level of each small base station.
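The Gaussian parameterized policy above admits a compact sketch; the linear mean mu(s) = theta^T psi(s) and the fixed sigma are assumptions consistent with the feature vector psi(s) defined in the text.

```python
import numpy as np

class GaussianPolicy:
    """pi_theta(a|s) = N(mu(s), sigma^2), with linear mean mu(s) = theta^T psi(s)."""

    def __init__(self, n_features, sigma=0.1, lr=0.01):
        self.theta = np.zeros(n_features)
        self.sigma, self.lr = sigma, lr

    def mean(self, psi):
        return float(self.theta @ psi)

    def sample(self, psi):
        """Draw a continuous random action from N(mu(s), sigma^2)."""
        return np.random.normal(self.mean(psi), self.sigma)

    def update(self, psi, a, adv):
        # score function: grad_theta log pi = (a - mu(s)) / sigma^2 * psi(s)
        score = (a - self.mean(psi)) / self.sigma ** 2 * psi
        self.theta += self.lr * adv * score   # gradient ascent on the reward
```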
Since the Bellman equation cannot be solved exactly over such a huge state-action space, function approximation is used instead. The difference between the A2C algorithm and the plain actor-critic (AC) algorithm is that A2C introduces a baseline to construct an advantage function, which further reduces variance and makes the algorithm converge faster; the algorithm therefore maintains two sets of parameters, the state-action value function Q_w(s, a) and the state value function V_v(s).
The state value function V_v(s) is expressed as:

V_v(s) = w_{It(1)}·1 + w_{It(2)}·1 + ... + w_{It(32)}·1,

where w_{It(1)}, w_{It(2)}, ..., w_{It(32)} are the weights of the activated tiles and It(1), It(2), ..., It(32) are their index values; tiles that are not activated have feature value 0 and do not contribute to the state value function.
Aiming at the problem of energy waste in grid communication and at reducing the grid's energy cost and burden, the invention provides a novel small base station with an integrated energy-harvesting device that substitutes renewable energy (such as wind and solar energy) for grid supply. The novel small base station is powered by renewable energy, falling back to the conventional grid only when the renewable energy stored in the device is exhausted. In the A2C environment, the novel small base stations are deployed within the coverage of a traditional macro base station; the small base stations offload traffic from the macro base station, and the macro base station makes joint user-scheduling and resource-allocation decisions. The A2C algorithm is applied to the macro base station and the small base stations, which are randomly deployed within the macro base station's coverage. Users connected to a base station are divided into macro-base-station users and small-base-station users, with generally more users connected to the macro base station than to a small base station. The energy efficiency of the hybrid-powered heterogeneous network system is improved with the A2C algorithm.
The invention provides a method for optimizing the energy efficiency of the hybrid power supply based on the A2C algorithm from reinforcement learning: the actor network uses a Gaussian distribution as the parameterized policy to generate continuous random actions and updates the policy parameters by gradient ascent; the critic network uses tile coding to estimate the performance of the policy and to help the actor network estimate the policy gradient, with the advantage function further reducing the variance of the policy gradient.
The pseudocode of the A2C algorithm in the embodiment of the present invention is given in Table 1.
Table 1: A2C algorithm pseudocode (reproduced as an image in the original publication).
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (10)

1. The method for optimizing the energy efficiency in the hybrid power supply heterogeneous network based on the A2C algorithm is characterized by comprising the following steps:
step 1: constructing a heterogeneous network system according to an optimization target, the system consisting of a macro base station, small base stations, and users; regarding a single small base station as an agent; establishing the state space, action space, and reward function of a Markov decision process; and calculating the signal-to-interference-plus-noise ratio from the macro base station to the users;
step 2: calculating the total conventional power of the macro base station, constructing an actor network and a critic network according to the defined Markov decision process, and training the heterogeneous network system with the A2C algorithm;
step 3: randomly obtaining a state s_t from the interaction of a small-cell user with the environment, transmitting the state s_t to the actor network, the actor network selecting a suitable action a_t according to the current environment state s_t and the agent's own state, obtaining an instant reward r_t and the state s_{t+1} at time t+1, and yielding the transition (s_t, a_t, r_t, s_{t+1});
step 4: transmitting the transition (s_t, a_t, r_t, s_{t+1}) to the critic network, updating the critic network parameters and maximizing the reward to obtain the optimal small base station deployment strategy, i.e., the optimal resource allocation; users connect to the corresponding small base stations to obtain better channels, maximizing the energy efficiency of the heterogeneous network system.
2. The method for optimizing energy efficiency in a hybrid power supply heterogeneous network based on the A2C algorithm according to claim 1, wherein the signal-to-interference-plus-noise ratio from the macro base station to a user is calculated in step 1 as follows: K small base stations and M users are deployed within the macro base station, and the user positions of the small base stations are determined according to the number and distribution of the macro base station and the small base stations; the set of small base stations is κ = {0, 1, 2, ..., K} and the set of users is M = {0, 1, 2, ..., M}; assuming the legacy macro base station and the attached small base stations use the same radio spectrum, the signal-to-interference-plus-noise ratio γ_m(t) of user m during time slot t is:

γ_m(t) = g_{k,m}(t) p_{k,m}(t) / ( Σ_{i∈κ, i≠k} g_{i,m}(t) p_i(t) + σ_m² ),

where g_{k,m}(t) is the average channel gain from the serving base station k to user m at time slot t; p_{k,m}(t) is the transmit power allocated to user m; g_{i,m}(t) is the average channel gain from each interfering base station i to user m at time slot t; p_i(t) = Σ_{u∈u_i(t)} p_{i,u}(t) is the total radio transmission power of base station i for the users u_i(t) it serves; and σ_m² is the variance of the additive white Gaussian noise at user m.
3. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 2, wherein calculating the total conventional power of the macro base station in step 2 specifically comprises: the total bandwidth W of the heterogeneous network system is divided into W/B sub-channels of bandwidth B; during time slot t, user m is allocated b_m(t) ∈ {0, 1, ..., W/B} sub-channels, subject to

Σ_{m∈M} b_m(t) ≤ W/B;

the total information rate r_sum(t) achieved by the heterogeneous network system during time slot t is:

r_sum(t) = Σ_{m∈M} b_m(t) · B · log2(1 + γ_m(t));

wireless transmission power drawn from the conventional power grid is taken as positive, and wireless transmission power drawn from the renewable-energy device is taken as negative; at time slot t, the total power p_k^{tot}(t) of the macro base station is:

p_k^{tot}(t) = p_k^{st} + η · p_k(t),

where p_k^{st} denotes the static power; η is a coefficient factor related to the efficiency of the wireless power amplifier; and p_k(t) is the total wireless transmission power of all users u(t) associated with the macro base station in time slot t;

the static power p_k^{st} is drawn entirely from the conventional grid, so the total conventional power p_k^{co}(t) of the macro base station at time slot t is:

p_k^{co}(t) = p_k^{st} + η · Σ_{u∈u_k(t)} max{p_{k,u}(t), 0},

where p_{k,u}(t) denotes the power used by user u in the macro base station and u_k(t) denotes the set of associated users.
4. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 3, wherein the energy efficiency of the heterogeneous network system in step 1 during time slot t is defined as the ratio ρ_t of the information rate to the conventional power consumption:

ρ_t = r_sum(t) / p^{co}(t).
5. the method for optimizing energy efficiency in a hybrid power heterogeneous network based on an A2C algorithm according to claim 4, wherein in step 1, the states are a signal-to-interference-and-noise ratio γ (t) of each user and a battery power e (t) of each small base station, namely:
s t =(γ 1 (t),γ 2 (t),...,γ M (t),e 1 (t),e 2 (t),...,e K (t));
action a t Represented as the number u of users in macro base station k (t), number of subchannels b for user m m (t), transmission power p of user m k,m (t), action a t =(u k (t),b m (t),p k,m (t));
Rewards r t Equivalent to the energy efficiency of heterogeneous network systems, i.e. r t =ρ t
6. The method for optimizing energy efficiency in a hybrid power supply heterogeneous network based on the A2C algorithm according to claim 5, wherein step 2 trains the heterogeneous network system with the A2C algorithm as follows:
step 2.1: setting up the reinforcement learning framework, initializing the state space, selecting the optimal action according to the state obtained from the agent's interaction with the A2C environment, and transmitting the optimal action to the macro base station as state information;
step 2.2: inputting the state s_t into the heterogeneous network system, and updating the state-action value function and state value function of the heterogeneous network system;
step 2.3: running multiple rounds of iterative training on the heterogeneous network until the reward function converges, obtaining the trained heterogeneous network.
7. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 6, wherein step 2.2 updates the state-action value function Q_w(s, a) and the state value function V_v(s) of the A2C algorithm: Q_w(s, a) is approximated with a Gaussian distribution; V_v(s) is approximated with tile coding, using 32 tilings of 4*4 = 16 tiles each; eligibility traces are introduced so that all action values visited within an episode are updated to different degrees, with the update decaying over elapsed time; and an advantage function is set in the actor network.
8. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 7, wherein the state-action value function Q_w(s, a) is expressed as:

Q_w(s, a) = w^T ψ(s, a),

where w^T is the parameter vector, w = (w_1, w_2, ..., w_n)^T; the parameter θ = (θ_1, θ_2, ..., θ_n)^T constructs the policy π_θ(a|s), written π_θ(s, a) = Pr(a|s, θ), with π_θ(s, a) ~ N(μ(s), σ²), where μ(s) = θ^T ψ(s) is the mean and σ is the standard deviation; ψ(s) = (γ_1, γ_2, ..., γ_M, e_1, e_2, ..., e_K)^T, where γ denotes the signal-to-interference-plus-noise ratio of each user and e denotes the battery level of each small base station.
9. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 8, wherein the state value function V_v(s) is expressed as:

V_v(s) = w_{It(1)}·1 + w_{It(2)}·1 + ... + w_{It(32)}·1,

where w_{It(1)}, w_{It(2)}, ..., w_{It(32)} are the weights of the activated tiles and It(1), It(2), ..., It(32) are the index values of the activated tiles; all other tiles contribute 0.
10. The method for optimizing energy efficiency in a hybrid power heterogeneous network based on the A2C algorithm according to claim 9, wherein step 4 is specifically: the transition (s_t, a_t, r_t, s_{t+1}) is transmitted to the critic network; the critic network evaluates the action return value q_t of the agent and updates the critic network parameters with the TD error; (s_t, a_t, q_t) is transmitted to the actor network, which updates the action selection probabilities along the policy gradient to maximize the reward r_t; the optimal action learned by each small base station is transmitted to the macro base station as a state, and small base stations are repeatedly deployed within the coverage of the macro base station to obtain the optimal small base station deployment strategy, i.e., the optimal resource allocation; users connect to the corresponding small base stations to obtain better channels, maximizing the energy efficiency of the heterogeneous network system.
CN202310082022.6A 2023-01-17 2023-01-17 Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm Pending CN116132997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310082022.6A CN116132997A (en) 2023-01-17 2023-01-17 Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310082022.6A CN116132997A (en) 2023-01-17 2023-01-17 Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm

Publications (1)

Publication Number Publication Date
CN116132997A true CN116132997A (en) 2023-05-16

Family

ID=86297064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310082022.6A Pending CN116132997A (en) 2023-01-17 2023-01-17 Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm

Country Status (1)

Country Link
CN (1) CN116132997A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726143A (en) * 2024-02-07 2024-03-19 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning
CN117726143B (en) * 2024-02-07 2024-05-17 山东大学 Environment-friendly micro-grid optimal scheduling method and system based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN109474980B (en) Wireless network resource allocation method based on deep reinforcement learning
Wei et al. User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach
CN110417496B (en) Cognitive NOMA network stubborn resource allocation method based on energy efficiency
Ng et al. Energy-efficient resource allocation in OFDMA systems with hybrid energy harvesting base station
CN110708711B (en) Heterogeneous energy-carrying communication network resource allocation method based on non-orthogonal multiple access
Wang et al. Joint interference alignment and power control for dense networks via deep reinforcement learning
CN109413724A (en) A kind of task unloading and Resource Allocation Formula based on MEC
CN111586720A (en) Task unloading and resource allocation combined optimization method in multi-cell scene
Budhiraja et al. Cross-layer interference management scheme for D2D mobile users using NOMA
CN104378772B (en) Towards the small base station deployment method of the amorphous covering of cell in a kind of cellular network
CN106231610B (en) Based on the resource allocation methods of sub-clustering in Femtocell double-layer network
CN111446992B (en) Method for allocating resources with maximized minimum energy efficiency in wireless power supply large-scale MIMO network
CN107343268B (en) Non-orthogonal multicast and unicast transmission beamforming method and system
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN116132997A (en) Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm
Chuang et al. Dynamic multiobjective approach for power and spectrum allocation in cognitive radio networks
Zhang et al. A dynamic power allocation scheme in power-domain NOMA using actor-critic reinforcement learning
Liu et al. Robust resource allocation in two-tier NOMA heterogeneous networks toward 5G
CN113473580A (en) Deep learning-based user association joint power distribution strategy in heterogeneous network
CN105490794A (en) Packet-based resource distribution method for orthogonal frequency division multiple access (OFDMA) femtocell double-layer network
CN108965034B (en) Method for associating user to network under ultra-dense deployment of small cell base station
CN114615730A (en) Content coverage oriented power distribution method for backhaul limited dense wireless network
CN114521023A (en) SWIPT-assisted NOMA-MEC system resource allocation modeling method
CN107426775B (en) Distributed multi-user access method for high-energy-efficiency heterogeneous network
CN112954806A (en) Chord graph coloring-based joint interference alignment and resource allocation method in heterogeneous network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination