CN114051252B - Multi-user intelligent transmitting power control method in radio access network - Google Patents


Info

Publication number
CN114051252B
Authority
CN
China
Prior art keywords: wireless access device, power control, network, control strategy
Legal status: Active
Application number
CN202111145720.3A
Other languages
Chinese (zh)
Other versions
CN114051252A (en)
Inventor
张先超
赵耀
张庆华
Current Assignee
Jiaxing University
Original Assignee
Jiaxing University
Priority date
Filing date
Publication date
Application filed by Jiaxing University filed Critical Jiaxing University
Priority to CN202111145720.3A priority Critical patent/CN114051252B/en
Publication of CN114051252A publication Critical patent/CN114051252A/en
Application granted granted Critical
Publication of CN114051252B publication Critical patent/CN114051252B/en


Classifications

    • H04W24/02 Arrangements for optimising operational condition
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • H04W52/146 Uplink power control
    • Y02D30/70 Reducing energy consumption in wireless communication networks


Abstract

The invention relates to a multi-user intelligent transmit power control method in a wireless access network, which comprises the following steps: modeling and analyzing the communication system of each wireless access device accessing the network to obtain the global channel state and global queue state of the wireless access devices; determining a power control strategy for each wireless access device based on a multi-agent Markov decision process; determining an optimization target model of the power control strategy according to the average uplink transmit power consumption and average uplink communication delay of the wireless access devices under the strategy; training the power control strategy with a multi-agent deep reinforcement learning method to obtain a trained strategy network; and having each wireless access device perform intelligent transmit power control according to the trained strategy network. The invention reduces the delay and power consumption of the whole uplink communication system, provides high-quality communication service with limited resources, and, thanks to its low complexity and distributed decision-making, has good realizability and scalability.

Description

Multi-user intelligent transmitting power control method in radio access network
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method for controlling multi-user intelligent transmission power in a radio access network.
Background
With the rapid development of the mobile internet and artificial intelligence technology in recent years, intelligent wireless access devices such as smartphones, Augmented Reality (AR), and Virtual Reality (VR), and intelligent applications such as telemedicine, Industry 4.0, and autonomous driving have entered a stage of explosive growth. This means that a large number of wireless access devices will access the communication network, and the communication performance requirements of these intelligent wireless access devices are more severe and diverse than those of earlier mobile phones. To guarantee the communication service quality and experience of access users, the limited wireless communication resources must be configured reasonably. Among these resources, transmit power plays a direct and critical role: too little power naturally yields poor communication quality, while too much power introduces multi-user interference that also reduces communication quality; at the same time, the high power consumption of wireless access devices is itself a serious concern. The control of multi-user transmit power in future wireless access networks is therefore a key problem in the current wireless communication field.
However, current power control methods based on models and numerical optimization algorithms face problems such as difficult modeling, excessive algorithm complexity, and overly long solution times in future complex access networks, and must be re-optimized to adapt to new parameters whenever the environment changes, making them difficult to use for power control in practice. Therefore, an intelligent power control method is proposed herein that considers the complex channel environment and user demand queues and performs distributed intelligent control of the multi-user transmit power in the wireless access network based on multi-agent deep reinforcement learning, realizing high-quality communication service with low power consumption and low delay.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a method for controlling multi-user intelligent transmitting power in a wireless access network, which solves the problem that the prior art is difficult to be applied to a future wireless access network.
The technical scheme provided by the invention is as follows:
the invention discloses a multi-user intelligent transmitting power control method in a wireless access network, which comprises the following steps:
modeling and analyzing the communication system of each wireless access device accessing the network to obtain the global channel state and the global queue state of the wireless access devices;
determining a power control strategy for each wireless access device based on a multi-agent Markov decision process; determining an optimization target model of the power control strategy according to the average uplink transmit power consumption and the average uplink communication delay of the wireless access devices under the power control strategy;
training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
and each wireless access device performs intelligent transmitting power control according to the trained strategy network.
Further, each wireless access device accessing the network performs uplink communication with a single base station in an OFDMA access mode, and the number of assignable OFDMA subcarriers is smaller than the number of wireless access devices; the OFDMA uses non-orthogonal multiplexing of subcarriers, so information of more than one wireless access device is carried on the same subcarrier.
Further, with the non-orthogonal multiplexing, the achievable data rate at which the base station receives wireless access device k on subcarrier m is:

$C_{k,m}(t) = \log_2\!\left(1 + \frac{H_{k,m}(t)\,P_{k,m}(t)}{\Gamma\left(\sum_{j \neq k} H_{j,m}(t)\,P_{j,m}(t) + N_0\right)}\right)$

where $H_{k,m}(t)$ is the channel state information of wireless access device k on subcarrier m at time t; $P_{k,m}(t)$ is the transmit power of wireless access device k on subcarrier m at time t; $H_{j,m}(t)$ is the channel state information of wireless access device j on subcarrier m at time t; $P_{j,m}(t)$ is the transmit power of wireless access device j on subcarrier m at time t; $\Gamma$ is the SINR gap due to the signal modulation and multiplexing mode; $N_0$ is the noise power.
Further, the queue dynamics of wireless access device k are:

$L_k(t+1) = \max\!\left(L_k(t) - \sum_{m=1}^{M} C_{k,m}(t),\, 0\right) + I_k(t)$

where $L_k(t)$ is the length of the queue to be transmitted of wireless access device k at time t; $I_k(t)$ is the amount of packet information arriving at wireless access device k at time t; $C_{k,m}(t)$ is the achievable data rate at which the base station receives wireless access device k on subcarrier m; M is the number of subcarriers.
Further, in step S2, based on the Markov decision process, wireless access device k selects an action $a_k$ according to its corresponding power control policy $\pi_k$; the system enters the next state S(t+1) according to the current state S(t) of the wireless access devices and the actions of all wireless access devices; at each state transition, every wireless access device obtains a corresponding reward function $r_k(t) = r(S(t), a_k(t), S(t+1))$ and an observation $o_k(t+1)$ of its new state. Under the power control policy, each wireless access device pursues maximization of its own long-term return

$R_k = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t\, r_k(t)\right]$

where $\gamma$ is the discount factor and T is the time horizon.
Further, according to the low-power-consumption and low-delay objectives, the optimization target model of the power control strategy formulates the multi-device transmit power control problem in the wireless access network as:

$\min_{\pi_k}\; \alpha_k \bar{P}_k^{\pi_k} + \beta_k \bar{D}_k^{\pi_k}, \quad \text{s.t. } 0 \le \sum_{m=1}^{M} P_{k,m}(t) \le P_{\max}$

where $\alpha_k$ and $\beta_k$ are the positive weights of the power consumption and delay of wireless access device k, respectively;

$\bar{P}_k^{\pi_k} = \mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M} P_{k,m}(t)\right]$ and $\bar{D}_k^{\pi_k} = \mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} \frac{L_k(t)}{\lambda_k}\right]$

are, under control policy $\pi_k$, the average uplink transmit power consumption and the average uplink communication delay of wireless access device k; $P_{\max}$ is the maximum transmit power of the wireless access device; $P_{k,m}(t)$ is the transmit power of wireless access device k on subcarrier m at time t; M is the number of subcarriers.

The reward of each wireless access device in the optimization target model is:

$r_k(t) = -\frac{1}{K}\sum_{k=1}^{K}\left(\alpha_k \sum_{m=1}^{M} P_{k,m}(t) + \beta_k \frac{L_k(t)}{\lambda_k}\right)$

where K is the number of wireless access devices; $L_k(t)$ is the queue length of wireless access device k at time t; $\lambda_k$ is the average packet arrival rate of wireless access device k.
Further, the training the power control strategy by using the multi-agent deep reinforcement learning method comprises the following steps:
step S301, in each iteration round, operating the power control strategy of each wireless access device in the time length T; the central node of the wireless access network collects the actions, states and rewards of each wireless access device;
step S302, calculating estimated advantage values of all wireless access devices;
step S303, traversing all wireless access devices, wherein each wireless access device acquires channel state information in rewards and observation values of the wireless access device from the central node, acquires queue state information from the wireless access device, and combines the queue state information to obtain a final observation value of the wireless access device;
step S304, according to the final observed value, each wireless access device locally uses a gradient descent method to update the corresponding strategy parameters;
step S305, the central node updates the corresponding advantage function network parameters of each wireless access device by using a gradient descent method;
step S306, adding 1 to the round number, and starting the iterative training process from step S301 again;
after iteration is carried out to the maximum round times, the algorithm converges, and the trained strategy network is output.
Further, in step S302, the advantage function used to calculate a wireless access device's estimated advantage value is:

$\hat{A}_k(t) = \sum_{n=0}^{N-1} (\gamma\lambda)^n\, \delta_k(t+n), \qquad \delta_k(t+n) = r_k(t+n) + \gamma V_k(S(t+n+1);\phi_k) - V_k(S(t+n);\phi_k)$

where the time parameter n = 0, 1, 2, …, N−1, and N is the number of time points corresponding to the time horizon T; $\gamma, \lambda \in [0,1]$ are discount factors that balance estimation bias and variance; $V_k(S(t);\phi_k)$ is the centralized value function at the state S(t) at time t under the neural network parameters $\phi_k$ of wireless access device k; $r_k(t)$ is the reward of wireless access device k.
Further, in step S305, the minimization loss function with which the central node updates each wireless access device's advantage function network parameters by gradient descent is:

$\mathcal{L}(\phi_k) = \mathbb{E}\left[\left(\sum_{n=0}^{N-1} (\gamma\lambda)^n\, \delta_k(t+n)\right)^{2}\right]$
Further, in step S304, the objective function with which each wireless access device locally updates its corresponding policy parameters by the gradient method is:

$J(\theta_k) = \mathbb{E}\left[\min\left(l_k(t;\theta_k)\,\hat{A}_k(t),\; \mathrm{clip}\!\left(l_k(t;\theta_k),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_k(t)\right)\right]$

where $l_k(t;\theta_k)$ is the likelihood ratio between the new and old policies when adjusting the parameters $\theta_k$ of control policy $\pi_k$; $\mathrm{clip}(l_k(t;\theta_k), 1-\epsilon, 1+\epsilon)$ limits $l_k(t;\theta_k)$ to the interval $[1-\epsilon, 1+\epsilon]$; $\epsilon$ is the clipping tolerance; $\hat{A}_k(t)$ is the estimate of the advantage function.
The invention has the beneficial effects that:
the invention takes the requirement of a future wireless access network as a starting point, considers the environmental variability and complexity of the future wireless access network, provides a multi-user intelligent power control method, reduces the time delay and the power consumption of the whole uplink communication system, provides high-quality communication service by using limited resources, and has good realizability and expandability due to low complexity and distributed decision.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
Fig. 1 is a flowchart of a method for controlling intelligent transmission power of multiple users according to an embodiment of the present invention;
FIG. 2 is a framework diagram of multi-agent deep reinforcement learning in an embodiment of the invention;
FIG. 3 is a flowchart of a multi-agent proximal policy optimization method in an embodiment of the present invention;
FIG. 4 is a pseudo-code example graph of a multi-agent proximal policy optimization algorithm in an embodiment of the invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below with reference to the attached drawing figures, which form a part of the present application and, together with the embodiments of the present invention, serve to explain the principles of the invention.
In this embodiment, the communication system takes uplink communication between a base station and terrestrial wireless access devices as an example: 50 wireless access devices are randomly placed in an area 1 km in diameter and communicate uplink with a single base station; the total available communication bandwidth is 10 MHz; the number of OFDMA available subcarriers is 20; the path loss of the communication channel is 120.9 + 37.6 log10(d) (in dB), where d is the distance between the transmitting and receiving ends; the Doppler frequency is set to 10 Hz; and the SINR gap is Γ = 7.5 dB. The average packet arrival rate is 4 Mbps, the maximum transmit power of the wireless access devices is 38 dBm, the total time horizon of 1 s is divided into 1000 time blocks, and the discount coefficients are γ = 0.98 and λ = 0.96, respectively. Training is run for 10000 iterations in total.
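For concreteness, the embodiment's simulation parameters can be gathered into a small configuration sketch; the Python names below are ours, not the patent's, and only the values come from the text:

```python
import math

# Illustrative configuration for the embodiment's simulation setup
# (key names are our own; values are those stated in the description).
CONFIG = {
    "num_devices": 50,         # K wireless access devices
    "area_diameter_km": 1.0,
    "bandwidth_mhz": 10.0,
    "num_subcarriers": 20,     # M (< K, so subcarriers are multiplexed)
    "doppler_hz": 10.0,
    "sinr_gap_db": 7.5,        # Gamma
    "arrival_rate_mbps": 4.0,  # lambda_k
    "max_tx_power_dbm": 38.0,  # P_max
    "time_blocks": 1000,       # 1 s horizon split into 1000 blocks
    "gamma": 0.98,
    "lambda": 0.96,
    "train_iterations": 10000,
}

def path_loss_db(d_km: float) -> float:
    """Path loss 120.9 + 37.6*log10(d) in dB, with d in km."""
    return 120.9 + 37.6 * math.log10(d_km)
```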
The implementation of the method requires that an environment simulation platform is firstly built (or in an actual environment) to train and learn the power control strategies of a plurality of wireless access devices. After the algorithm converges, the trained strategy is applied to the actual wireless access network, and the wireless access equipment is used as an intelligent agent for intelligent power control. Each agent makes intelligent power control decisions through the collected own user information (queue state information) and part of environment information (own channel state information). Thus, the multi-user high-quality communication service of the wireless access network with long-term low power consumption and low time delay is realized.
The disclosed method for controlling multi-user intelligent transmitting power in a wireless access network in this embodiment, as shown in fig. 1, includes the following steps:
step S101, modeling and analyzing the communication system of each wireless access device accessing the network to obtain the global channel state and the global queue state of the wireless access devices;
step S102, determining a power control strategy for each wireless access device based on a multi-agent Markov decision process; determining an optimization target model of the power control strategy according to the average uplink transmit power consumption and the average uplink communication delay of the wireless access devices under the power control strategy;
step S103, training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
step S104, each wireless access device performs intelligent transmitting power control according to the trained strategy network.
The present embodiment optimizes the communication service quality of multiple users in the radio access network, so in step S101, modeling analysis is performed on the communication system of the radio access device, including:
1) Calculating the transmission rate of the wireless access equipment;
each wireless access device accessing to the network performs uplink communication with a single base station in an OFDMA access mode, and the number of allocable subcarriers of the OFDMA is smaller than the number of the wireless access devices; the OFDMA allows non-orthogonal multiplexing of carriers, and carries information of more than one wireless access device on the same subcarrier.
Specifically, in the communication system of this embodiment, K intelligent wireless access devices perform uplink communication with a single base station in an OFDMA access manner, and the number of allocable OFDMA subcarriers is M with M < K, to better simulate the future situation in which a large number of wireless access devices access the network. In addition, to further reduce queueing delay and improve spectrum utilization, non-orthogonal multiplexing of subcarriers is allowed here, meaning that more than one wireless access device may be carried on the same subcarrier. Let the transmit power of the k-th wireless access device on subcarrier m at time t be $P_{k,m}(t)$ and the transmitted signal be $x_{k,m}(t)$. The signal received by the base station on subcarrier m from the k-th wireless access device at time t can be expressed as:

$y_{k,m}(t) = h_{k,m}(t)\sqrt{P_{k,m}(t)}\, x_{k,m}(t) + z_{k,m}(t)$

where $h_{k,m}(t)$ is the complex channel coefficient between wireless access device k and the base station on subcarrier m at time t, and $z_{k,m}(t)$ is independent identically distributed complex Gaussian white noise with noise power $N_0$. Let

$H(t) = \{H_{k,m}(t)\}_{K \times M}$

denote the global channel state information (CSI), where $H_{k,m}(t) = |h_{k,m}(t)|^2$ is the instantaneous channel gain on subcarrier m between wireless access device k and the base station at time t. A Rayleigh fading channel model, common in wireless access networks, is adopted here; to characterize the dynamic behavior of the channel, the channel coefficient is expressed, following the Jakes fading model, as a first-order complex Gaussian Markov process:

$h_{k,m}(t+1) = \rho\, h_{k,m}(t) + \sqrt{1-\rho^2}\, e_{k,m}(t)$

where $h_{k,m}(t)$ and the channel update process $e_{k,m}(t)$ are independently identically distributed circularly symmetric complex Gaussian random variables with unit variance. The correlation coefficient is $\rho = J_0(2\pi f_d T)$, where $J_0(\cdot)$ is the zeroth-order Bessel function and $f_d$ is the maximum Doppler frequency.
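A minimal sketch of this first-order Gauss–Markov channel update follows; the function names are ours, the Bessel function is computed from its power series since only the Python standard library is assumed, and the 1 ms slot length is inferred from the 1 s horizon split into 1000 blocks:

```python
import math
import random

def bessel_j0(x: float, terms: int = 20) -> float:
    """Zeroth-order Bessel function of the first kind, via its power series."""
    return sum((-1) ** k * (x / 2) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def channel_step(h: complex, rho: float, rng: random.Random) -> complex:
    """One step of the Jakes first-order complex Gaussian Markov process:
    h(t+1) = rho*h(t) + sqrt(1-rho^2)*e(t), with e ~ CN(0, 1)."""
    e = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    return rho * h + math.sqrt(1.0 - rho ** 2) * e

# Correlation rho = J0(2*pi*f_d*T) for f_d = 10 Hz, assuming T = 1 ms slots
rho = bessel_j0(2 * math.pi * 10.0 * 0.001)
```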
Since subcarrier multiplexing is allowed here, the base station will receive signals from multiple terrestrial wireless access devices on one OFDMA resource block; for any one of them, the signals of the other wireless access devices are treated as noise, so the received signal rate of a wireless access device depends on the signal-to-interference-plus-noise ratio (SINR). Given the channel state information H(t) and transmit powers $P(t) = \{P_{k,m}(t)\}_{K \times M}$, the achievable data rate at which the base station receives wireless access device k on subcarrier m can be expressed as:

$C_{k,m}(t) = \log_2\!\left(1 + \frac{H_{k,m}(t)\,P_{k,m}(t)}{\Gamma\left(\sum_{j \neq k} H_{j,m}(t)\,P_{j,m}(t) + N_0\right)}\right)$

where $\Gamma$ is the SINR gap due to the signal modulation and multiplexing mode.
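Under the same notation, the per-subcarrier achievable rate can be computed as in this hedged sketch (names are ours; linear-scale inputs are assumed, with `gamma_lin` the SINR gap Γ in linear units and unit-bandwidth normalization):

```python
import math

def achievable_rate(H_m, P_m, k, gamma_lin, noise_power):
    """Rate at which the base station decodes device k on one subcarrier:
    C = log2(1 + H_k*P_k / (Gamma * (sum_{j != k} H_j*P_j + N0)))."""
    interference = sum(H_m[j] * P_m[j] for j in range(len(H_m)) if j != k)
    sinr = H_m[k] * P_m[k] / (gamma_lin * (interference + noise_power))
    return math.log2(1.0 + sinr)
```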
2) Modeling and analyzing the queue dynamics of the communication wireless access equipment;
In a radio access network, the aspect of communication service most directly perceived by a wireless access device user is communication delay, and user demand is represented at the bottom layer of communication by the size of the data packets. High-quality communication service therefore means realizing low-delay transmission while using communication resources efficiently, whatever the user demand. The ultimate purpose of continually increasing the communication rate is to satisfy users' large data-transmission demands more quickly; if a user's data demand is small, the power and communication rate can be reduced to save power consumption while also reducing interference to other users. Therefore, taking the delay performance index into account, the dynamic information of the data packet queue is modeled and analyzed.
Assume the data packets transmitted by the wireless access devices enter the queue to be transmitted randomly according to a Poisson process, and let the average packet arrival rate of wireless access device k be $\lambda_k$. Let $I(t) = (I_1(t), \ldots, I_K(t))$ be the amount of packet information arriving at the wireless access devices at time t, with mathematical expectation $\mathbb{E}[I_k(t)] = \lambda_k$. Let $L_k(t) \in [0, \infty)$ be the queue length of wireless access device k at time t, and let $L(t) = (L_1(t), \ldots, L_K(t)) \in [0, \infty)^K$ be the global queue state information (QSI). For wireless access device k, the queue dynamics can be expressed as:

$L_k(t+1) = \max\!\left(L_k(t) - \sum_{m=1}^{M} C_{k,m}(t),\, 0\right) + I_k(t)$
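The queue recursion can be sketched directly (argument names are ours): the served traffic is the sum of the device's rates over its subcarriers, the queue cannot go negative, and new arrivals are added afterwards.

```python
def queue_step(L_k, rates_k, I_k):
    """Queue update L_k(t+1) = max(L_k(t) - sum_m C_{k,m}(t), 0) + I_k(t).

    L_k      -- current queue length of device k
    rates_k  -- iterable of achievable rates C_{k,m}(t) over subcarriers m
    I_k      -- amount of newly arriving packet information
    """
    return max(L_k - sum(rates_k), 0.0) + I_k
```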
after the system environment, state models (i.e., CSI and QSI) are built in step S101, the power control strategy and optimization target models are designed in step S102, including:
1) Establishing a power control strategy model
Because both the wireless channel environment and the wireless access device queue dynamics have the Markov property, and a distributed control strategy is adopted here in which each wireless access device makes autonomous decisions from the partial state information it observes, the dynamic decision process is modeled as a multi-agent Markov decision process, i.e., a partially observed Markov game.
Specifically, let S = (H, L) be the global state, let $\mathcal{A}_k$ be the action set of wireless access device k, and let $o_k$ be the observation set of wireless access device k; it is assumed here that a wireless access device can observe its own channel state information $H_{k,m}(t)$ and queue state information $L_k(t)$. Wireless access device k selects an action according to a stochastic policy, $a_k(t) \sim \pi_k(a_k(t) \mid o_k(t))$, and the system then enters the next state according to the state transition function $S(t+1) \sim P(S(t+1) \mid S(t), a_1(t), \ldots, a_K(t))$. Each wireless access device obtains a corresponding reward function $r_k(t) = r(S(t), a_k(t), S(t+1))$ and an observation $o_k(t+1)$ of its new state. Each wireless access device pursues maximization of its own long-term return

$R_k = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t\, r_k(t)\right]$

where $\gamma$ is the discount coefficient and T is the time horizon.
2) Determining an optimization target model of the power control strategy according to the average uplink transmission power consumption and the average uplink communication time delay of the wireless access equipment under the power control strategy;
from the above model building we can further build specific targets and facing problems. First of all, the object of the invention is to reduce the communication power consumption of a radio access device, in a control strategy pi k The average uplink transmit power consumption of wireless access device k can be expressed as
Figure BDA0003285353840000102
In addition, communication delay of wireless access equipment is reduced, and control strategy pi is adopted k In the following, according to the littermate rule, the average uplink communication delay of radio access device k can be expressed as
Figure BDA0003285353840000103
Where T is the time range. According to the mathematical expression and the established low-power consumption and low-delay target, the problem of establishing multi-user intelligent transmitting power control in the wireless access network is as follows:
Figure BDA0003285353840000104
the objective of the problem is to minimize the weighted power consumption and the delay, alpha k And beta k Respectively, the power consumption and the time delay of the wireless access equipment are corresponding positive weight. According to the objective, defining rewards for each wireless access device as
Figure BDA0003285353840000105
Cooperation must be established between wireless access devices to achieve such team-type goals.
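A minimal sketch of this shared team reward, assuming the weighted power-plus-delay form described above (argument names are ours):

```python
def team_reward(P, L, lam, alpha, beta):
    """Shared reward r(t) = -(1/K) * sum_k (alpha_k * sum_m P_{k,m}(t)
                                             + beta_k * L_k(t) / lambda_k).

    P     -- list of per-device lists of transmit powers over subcarriers
    L     -- list of per-device queue lengths
    lam   -- list of per-device average arrival rates lambda_k
    alpha -- per-device power weights; beta -- per-device delay weights
    """
    K = len(L)
    return -sum(alpha[k] * sum(P[k]) + beta[k] * L[k] / lam[k]
                for k in range(K)) / K
```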
Specifically, in step S103, a multi-agent deep reinforcement learning method is applied to obtain an optimal power control policy of each wireless access device;
the multi-agent deep reinforcement learning technology applied in this embodiment is specifically a multi-agent proximity strategy optimization method, and the overall framework is centralized training and distributed execution, as shown in fig. 2, and the optimal power control strategy is obtained by performing multi-agent deep reinforcement learning based on an actor-arbiter algorithm.
To obtain the optimal power control policy, policy evaluation and policy improvement must be iterated continuously. In a Markov game with multiple agents, the value of a policy is determined by the global state value and the actions of every agent, so policy $\pi_k$ is evaluated in a centralized manner. To reduce the evaluation variance, a generalized advantage function is adopted here to evaluate the policy. Specifically, define the centralized value function of agent k following policy $\pi_k$ as $V^{\pi_k}(S(t)) = \mathbb{E}[R_k \mid S(t)]$ and its action-value function as $Q^{\pi_k}(S(t), a_k(t)) = \mathbb{E}[R_k \mid S(t), a_k(t)]$; the advantage function can then be expressed as

$A^{\pi_k}(S(t), a_k(t)) = Q^{\pi_k}(S(t), a_k(t)) - V^{\pi_k}(S(t)).$

In reality, the exact value of the advantage function cannot be obtained, and it must be estimated with a deep neural network. Setting the advantage function network parameters to $\phi = \{\phi_1, \ldots, \phi_K\}$, the estimate of the advantage function can be written as:

$\hat{A}_k(t) = \sum_{n=0}^{N-1} (\gamma\lambda)^n\, \delta_k(t+n) \qquad (8)$

where $\gamma, \lambda \in [0,1]$ are discount factors that balance estimation bias and variance, $\delta_k(t+n) = r_k(t+n) + \gamma V_k(S(t+n+1);\phi_k) - V_k(S(t+n);\phi_k)$ is the temporal-difference term, and n is a time parameter indicating the time point the policy has run to. Expanding (8) gives:

$\hat{A}_k(t) = \delta_k(t) + (\gamma\lambda)\,\delta_k(t+1) + \cdots + (\gamma\lambda)^{N-1}\,\delta_k(t+N-1) \qquad (9)$

The network parameters $\phi = \{\phi_1, \ldots, \phi_K\}$ are updated by minimizing the loss function:

$\mathcal{L}(\phi_k) = \mathbb{E}\left[\left(\hat{A}_k(t)\right)^{2}\right] \qquad (10)$

The above advantage function evaluation process is implemented at a central node (e.g., a wireless access point such as a base station).
With the advantage function required for policy evaluation in hand, the advantage values are sent back to each wireless access device for distributed policy improvement. The basic idea of the improvement is to adjust the policy parameters $\theta = \{\theta_1, \ldots, \theta_K\}$ to maximize the objective function $J(\theta_k) = \mathbb{E}[R_k]$. To improve training stability and prevent excessively large changes during policy training, the proximal policy optimization algorithm changes the objective function to:

$J(\theta_k) = \mathbb{E}\left[\min\left(l_k(t;\theta_k)\,\hat{A}_k(t),\; \mathrm{clip}\!\left(l_k(t;\theta_k),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_k(t)\right)\right] \qquad (11)$

where the likelihood ratio between the new and old policies is

$l_k(t;\theta_k) = \frac{\pi_{\theta_k}(a_k(t) \mid o_k(t))}{\pi_{\theta_k^{\mathrm{old}}}(a_k(t) \mid o_k(t))}$

and $\mathrm{clip}(l_k(t;\theta_k), 1-\epsilon, 1+\epsilon)$ limits $l_k(t;\theta_k)$ to the interval $[1-\epsilon, 1+\epsilon]$, with $\epsilon$ the clipping tolerance. The policy improvement requires only each wireless access device's own partial observation and can therefore be executed at the wireless access device.
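The clipped surrogate of formula (11) can be illustrated per sample as follows — a sketch of the standard PPO clipping, not the patent's exact network code:

```python
def clipped_objective(ratio, advantage, eps):
    """Per-sample PPO surrogate: min(l*A, clip(l, 1-eps, 1+eps)*A).

    ratio     -- likelihood ratio l between new and old policies
    advantage -- advantage estimate A_hat
    eps       -- clipping tolerance epsilon
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```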
More specifically, in the communication system, the multi-agent proximal policy optimization method implemented based on the actor-critic framework is shown in fig. 3; the method specifically comprises the following steps:
step S301, in each iteration round, the power control strategy of each wireless access device is operated within the time length T
Figure BDA0003285353840000124
The central node collects the actions, states and rewards of each wireless access device to obtain { S (t), a } 1 (t),…,a K (t), r (t); wherein the initial power control strategy is a random strategy;
the center node is a base station or other wireless access equipment serving as the center node;
step S302, calculating the estimated advantage values of all wireless access devices;
the advantage function used to calculate a wireless access device's estimated advantage value is formula (9);
step S303, traversing all wireless access devices, wherein each wireless access device acquires channel state information in rewards and observation values of the wireless access device from a central node, acquires queue state information from the wireless access device, and combines the queue state information to obtain a final observation value of the wireless access device;
step S304, according to the final observed value, each wireless access device locally updates its corresponding policy parameter θ by the gradient method;
the local update of each wireless access device follows the objective function of formula (11);
step S305, updating the corresponding dominance function network parameter phi of each wireless access device by using a gradient descent method at the central node;
wherein the gradient descent method used by the central node is performed according to the minimization loss function of formula (10);
step S306, adding 1 to the round number, and starting the iterative training process from step S301 again;
after the iteration reaches the maximum number of rounds, the algorithm has converged, the training process ends, and the trained strategy network is output.
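The estimated dominance values of step S302 can be sketched as follows, assuming formula (9) follows the standard generalized-advantage recursion suggested by the surrounding text; the function and variable names are illustrative, not from the patent:

```python
def estimated_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Dominance (advantage) estimate in the style of formula (9):
    A_hat(t) = sum_{n=0}^{N-1} (gamma*lam)^n * delta(t+n), where
    delta(t) = r(t) + gamma * V(S(t+1)) - V(S(t)).
    `values` holds V(S(0)) .. V(S(T)), one entry longer than `rewards`."""
    T = len(rewards)
    deltas = [rewards[t] + gamma * values[t + 1] - values[t] for t in range(T)]
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):  # backward pass: A(t) = delta(t) + gamma*lam*A(t+1)
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

# With gamma = lam = 1 and a flat value function, advantages reduce to reward-to-go:
print(estimated_advantages([1.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Smaller λ shortens the horizon of the estimate, trading variance for bias, which is the role the text assigns to the two discount factors.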
Specifically, in step S104, when each wireless access device performs intelligent transmitting power control according to the trained strategy network, each wireless access device selects the optimal transmitting power to access the wireless communication network according to its own trained strategy network π_k(a_k(t)|o_k(t)), even in a complex, changing environment. At this point, centralized training is no longer performed, and intelligent decisions are made in a fully distributed manner.
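As a sketch of this fully distributed decision step, each device would simply sample its transmit power from its own trained strategy π_k(a|o); the names below are hypothetical, not from the patent:

```python
import random

def select_power(policy_probs, power_levels):
    """Distributed inference: a device samples its transmit power a ~ pi_k(a | o_k)."""
    return random.choices(power_levels, weights=policy_probs, k=1)[0]

# A degenerate (one-hot) strategy always picks the same power level:
print(select_power([0.0, 1.0, 0.0], [5.0, 7.0, 9.0]))  # 7.0
# A trained stochastic strategy samples among the allowed levels:
print(select_power([0.2, 0.5, 0.3], [0.0, 0.1, 0.2]))
```

No communication with the central node is needed at this stage, matching the "fully distributed" claim.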
As shown in fig. 4, this embodiment also provides a pseudo-code example of the whole multi-agent proximal policy optimization algorithm, which uses double-layer nested for statements to optimize the power control strategy of each network-accessing wireless access device.
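That double-layer nested loop can be sketched as the following runnable skeleton, with the environment and learners stubbed out; every function name here is an illustrative stand-in, not the patent's pseudo code:

```python
import random

def train(num_rounds, num_devices, T, step_env,
          estimate_advantages, update_policy, update_value):
    """Skeleton of the double-nested loop of steps S301-S306 (names illustrative)."""
    for episode in range(num_rounds):                  # S306: advance the round counter
        trajectory = []
        for t in range(T):                             # S301: run the strategies for T steps
            state, actions, rewards = step_env(t)      # central node collects (S, a_1..a_K, r)
            trajectory.append((state, actions, rewards))
        advantages = estimate_advantages(trajectory)   # S302: per-device dominance values
        for k in range(num_devices):                   # S303-S304: local strategy updates
            update_policy(k, trajectory, advantages[k])
        update_value(trajectory)                       # S305: central dominance-network update

# Stub environment and learners, just to show the control flow:
calls = {"policy": 0, "value": 0}
train(
    num_rounds=3, num_devices=2, T=4,
    step_env=lambda t: ("S", [0, 1], [random.random(), random.random()]),
    estimate_advantages=lambda traj: [[0.0] * len(traj)] * 2,
    update_policy=lambda k, traj, adv: calls.__setitem__("policy", calls["policy"] + 1),
    update_value=lambda traj: calls.__setitem__("value", calls["value"] + 1),
)
```

The outer loop iterates rounds and the inner loops iterate time steps and devices, which is the nesting the embodiment describes.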
In summary, the multi-user intelligent transmitting power control method in the wireless access network of this embodiment reduces the time delay and power consumption of the whole uplink communication system and provides high-quality communication service with limited resources; owing to its low complexity and distributed decision making, it has good realizability and scalability.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (1)

1. A multi-user intelligent transmitting power control method in a wireless access network is characterized by comprising the following steps:
modeling and analyzing a communication system of each wireless access device accessing the network to obtain a global channel state and a global queue state of the wireless access devices;
determining a power control strategy of each wireless access device based on a multi-entity Markov decision process; determining an optimization target model of the power control strategy according to the average uplink transmitting power consumption and the average uplink communication time delay of the wireless access devices under the power control strategy;
training the power control strategy by using a multi-agent deep reinforcement learning method to obtain a trained strategy network;
each wireless access device performs intelligent transmitting power control according to the trained strategy network;
each wireless access device accessing the network performs uplink communication with a single base station in an OFDMA access mode, wherein the number of assignable OFDMA subcarriers is smaller than the number of wireless access devices; the subcarriers are non-orthogonally multiplexed, and the information of more than one wireless access device is carried on the same subcarrier;
modeling and analyzing the communication system of each wireless access device accessing the network to obtain the global channel state and the global queue state of the wireless access devices, comprising the following steps:
1) Calculating the transmission rate of the wireless access equipment;
in the non-orthogonal multiplexing, the achievable data rate at which the base station receives wireless access device k on subcarrier m is:
$$C_{k,m}(t) = \log_2\!\left(1 + \frac{H_{k,m}(t)\,P_{k,m}(t)}{\Gamma\left(\sum_{j\neq k} H_{j,m}(t)\,P_{j,m}(t) + N_0\right)}\right)$$
wherein H_{k,m}(t) is the channel state information of wireless access device k on subcarrier m at time t; P_{k,m}(t) is the transmitting power of wireless access device k on subcarrier m at time t; H_{j,m}(t) is the channel state information of wireless access device j on subcarrier m at time t; P_{j,m}(t) is the transmitting power of wireless access device j on subcarrier m at time t; Γ is the SINR gap caused by the signal modulation and multiplexing mode; N_0 is the noise power;
2) Modeling and analyzing the queue dynamics of the communication wireless access equipment;
the queue dynamics of wireless access device k on subcarrier m are determined as:
Figure QLYQS_2
wherein I_k(t) is the length of the queue to be transmitted by wireless access device k at time t; C_{k,m}(t) is the achievable data rate at which the base station receives wireless access device k on subcarrier m; M is the number of subcarriers;
based on a Markov decision process, wireless access device k selects an action a_k according to its corresponding power control strategy π_k; the next state S(t+1) is entered according to the current state S(t) of the wireless access devices and the actions of all wireless access devices; at the time of the state transition, each wireless access device obtains a corresponding reward function r_k(t) = r(S(t), a_k(t), S(t+1)) and the observation o_k(t+1) of its new state; under the power control strategy, each wireless access device pursues maximization of its own long-term return
$$R_k = \mathbb{E}\left[\sum_{t=0}^{T-1} \gamma^t\, r_k(t)\right]$$
Wherein gamma is a discount factor and T is a time length;
the optimization target model of the power control strategy establishes the multi-wireless-access-device transmitting power control problem in the wireless access network according to the low-power-consumption and low-delay targets, the transmitting power control problem being:
Figure QLYQS_4
wherein α_k and β_k are the positive weights corresponding to the power consumption and the time delay of the wireless access device, respectively;
Figure QLYQS_5
Figure QLYQS_6
are, under control strategy π_k, the average uplink transmitting power consumption and the average uplink communication time delay of wireless access device k; P_max is the maximum transmitting power of the wireless access device; P_{k,m}(t) is the transmitting power of wireless access device k on subcarrier m at time t; M is the number of subcarriers;
the rewards for each wireless access device in the optimization objective model are:
Figure QLYQS_7
wherein K is the number of wireless access devices; L_k(t) is the queue dynamics of wireless access device k on subcarrier m; λ_k is the average arrival rate of data packets of wireless access device k;
the process of training the power control strategy by using the multi-agent deep reinforcement learning method comprises the following steps:
step S301, in each iteration round, operating the power control strategy of each wireless access device in the time length T; the central node of the wireless access network collects the actions, states and rewards of each wireless access device;
step S302, calculating estimated dominance values of all wireless access devices;
step S303, traversing all wireless access devices, wherein each wireless access device acquires its reward and the channel state information of its observation from the central node, acquires queue state information locally, and combines them to obtain its final observation value;
step S304, according to the final observed value, each wireless access device locally uses a gradient descent method to update the corresponding strategy parameters;
step S305, the central node updates the corresponding dominant function network parameters of each wireless access device by using a gradient descent method;
step S306, adding 1 to the round number, and starting the iterative training process from step S301 again;
after the iteration reaches the maximum number of rounds, the algorithm has converged, and the trained strategy network is output;
in step S302, the dominance function for calculating the estimated dominance value of the wireless access device is:
$$\hat{A}_k(t) = \sum_{n=0}^{N-1} (\gamma\lambda)^n\, \delta_k(t+n), \qquad \delta_k(t) = r_k(t) + \gamma\, V_k(S(t+1);\phi_k) - V_k(S(t);\phi_k)$$
wherein the time parameter n = 0, 1, 2, …, N−1; N−1 is the number of time points corresponding to the time length T; γ, λ ∈ [0, 1] are discount factors that balance the bias and variance of the estimate; V_k(S(t); φ_k) is the centralized cost function for the state S(t) of the wireless access devices at time t under the neural network parameters φ_k of wireless access device k; r_k(t) is the reward of wireless access device k;
in step S305, the minimization loss function according to which the central node updates the corresponding dominance function network parameters of each wireless access device with the gradient descent method is:
Figure QLYQS_9
in step S304, the objective function according to which each wireless access device locally updates the corresponding strategy parameter with the gradient descent method is:
$$J(\theta_k) = \mathbb{E}\left[\min\left(l_k(t;\theta_k)\,\hat{A}_k(t),\ \operatorname{clip}\left(l_k(t;\theta_k),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_k(t)\right)\right]$$
wherein l_k(t;θ_k) denotes the likelihood ratio between the new and old strategies when adjusting the parameter θ_k of control strategy π_k; clip(l_k(t;θ_k), 1−ε, 1+ε) denotes limiting l_k(t;θ_k) to the interval [1−ε, 1+ε]; ε is the error;
$\hat{A}_k(t)$ is the estimate of the dominance function.
CN202111145720.3A 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network Active CN114051252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111145720.3A CN114051252B (en) 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network


Publications (2)

Publication Number Publication Date
CN114051252A CN114051252A (en) 2022-02-15
CN114051252B true CN114051252B (en) 2023-05-26

Family

ID=80204660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111145720.3A Active CN114051252B (en) 2021-09-28 2021-09-28 Multi-user intelligent transmitting power control method in radio access network

Country Status (1)

Country Link
CN (1) CN114051252B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117135655A (en) * 2023-08-15 2023-11-28 华中科技大学 Intelligent OFDMA resource scheduling method, system and terminal of delay-sensitive WiFi
CN117412323A (en) * 2023-09-27 2024-01-16 华中科技大学 WiFi network resource scheduling method and system based on MAPPO algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492691A (en) * 2020-11-26 2021-03-12 辽宁工程技术大学 Downlink NOMA power distribution method of deep certainty strategy gradient
CN112700663A (en) * 2020-12-23 2021-04-23 大连理工大学 Multi-agent intelligent signal lamp road network control method based on deep reinforcement learning strategy

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713B (en) * 2020-06-04 2021-06-08 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492691A (en) * 2020-11-26 2021-03-12 辽宁工程技术大学 Downlink NOMA power distribution method of deep certainty strategy gradient
CN112700663A (en) * 2020-12-23 2021-04-23 大连理工大学 Multi-agent intelligent signal lamp road network control method based on deep reinforcement learning strategy

Also Published As

Publication number Publication date
CN114051252A (en) 2022-02-15

Similar Documents

Publication Publication Date Title
CN111666149B (en) Ultra-dense edge computing network mobility management method based on deep reinforcement learning
Mei et al. Intelligent radio access network slicing for service provisioning in 6G: A hierarchical deep reinforcement learning approach
CN110531617B (en) Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device and unmanned aerial vehicle base station
CN113162682B (en) PD-NOMA-based multi-beam LEO satellite system resource allocation method
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
Shi et al. Drone-cell trajectory planning and resource allocation for highly mobile networks: A hierarchical DRL approach
CN114051252B (en) Multi-user intelligent transmitting power control method in radio access network
CN110113190A (en) Time delay optimization method is unloaded in a kind of mobile edge calculations scene
CN113162679A (en) DDPG algorithm-based IRS (inter-Range instrumentation System) auxiliary unmanned aerial vehicle communication joint optimization method
CN112118601A (en) Method for reducing task unloading delay of 6G digital twin edge computing network
Elnahas et al. Game theoretic approaches for cooperative spectrum sensing in energy-harvesting cognitive radio networks
CN110809306A (en) Terminal access selection method based on deep reinforcement learning
Xu et al. Multi-agent reinforcement learning based distributed transmission in collaborative cloud-edge systems
CN111526592B (en) Non-cooperative multi-agent power control method used in wireless interference channel
CN116456493A (en) D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm
Wang et al. Distributed reinforcement learning for age of information minimization in real-time IoT systems
CN109982434A (en) Wireless resource scheduling integrated intelligent control system and method, wireless communication system
Wang et al. Decentralized learning based indoor interference mitigation for 5G-and-beyond systems
CN115640131A (en) Unmanned aerial vehicle auxiliary computing migration method based on depth certainty strategy gradient
Elsayed et al. Deep reinforcement learning for reducing latency in mission critical services
Guan et al. An intelligent wireless channel allocation in HAPS 5G communication system based on reinforcement learning
Wu et al. 3D aerial base station position planning based on deep Q-network for capacity enhancement
Qi et al. Energy-efficient resource allocation for UAV-assisted vehicular networks with spectrum sharing
CN114885340B (en) Ultra-dense wireless network power distribution method based on deep migration learning
CN116260871A (en) Independent task unloading method based on local and edge collaborative caching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant