CN109474980B - Wireless network resource allocation method based on deep reinforcement learning - Google Patents
- Publication number: CN109474980B (application CN201811535056.1A)
- Authority
- CN
- China
- Prior art keywords: reinforcement learning, deep reinforcement, subcarrier, learning model
- Legal status: Active (status is an assumption, not a legal conclusion)
Classifications
- H04W52/143 — Downlink power control
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- H04W52/241 — TPC using SIR or other wireless path parameters, taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
- H04W52/265 — TPC using transmission rate or quality of service (QoS), taking into account the QoS
- H04W52/346 — TPC management, distributing total power among users or channels
- H04W72/53 — Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
- H04W72/542 — Allocation or scheduling criteria based on quality criteria, using measured or perceived quality
- H04W72/543 — Allocation or scheduling criteria based on quality criteria, based on requested quality, e.g. QoS
- H04W72/044 — Wireless resource allocation based on the type of the allocated resource
Abstract
The invention provides a wireless network resource allocation method based on deep reinforcement learning, which maximizes energy efficiency in a time-varying channel environment at low complexity. The method comprises the following steps: establishing a deep reinforcement learning model; modeling the time-varying channel environment between a base station and its user terminals as a finite-state time-varying Markov channel, determining normalized channel coefficients, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users; according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to the channel coefficients, determining a return function based on the allocated downlink power, and feeding the return function back to the deep reinforcement learning model; training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function, and determining the locally optimal power allocation in the time-varying channel environment. The invention relates to the fields of wireless communication and artificial intelligence decision making.
Description
Technical Field
The invention relates to the field of wireless communication and artificial intelligence decision making, in particular to a wireless network resource allocation method based on deep reinforcement learning.
Background
Starting from the Long Term Evolution (LTE) era, the networking architecture has shifted from purely macro networks to macro-micro cooperation, and the sustainable development of the macro cell faces many challenges, such as unpredictable traffic growth, ubiquitous access demands, random hotspot deployment, and the considerable cost pressure of the macro cell itself. The advantages of small base stations (Small Cells), such as micro base stations and home base stations, in accurate coverage and blind-spot supplementation have therefore come to the fore, and they have gradually become an important element of network deployment, working cooperatively with macro base stations and sharing their service load. The fifth generation of mobile communication (5G) is an extension of 4G; 5G is not a single radio access technology but a general term for the solution formed by the evolution and integration of several new and existing radio access technologies. 5G networks are now coming into public view, and the user-experienced data rate is generally considered the most important 5G performance index. The technical features of 5G can be summarized by a few numbers: a 1000x capacity boost, support for 100 billion+ connections, a peak rate of 10 Gb/s, and latency of 1 ms or less. The main 5G technologies include very-large-scale multi-antenna systems, novel multiple access technologies, and ultra-dense networks, in which the deployment of small base stations together with macro base stations forms an ultra-dense heterogeneous network that provides ubiquitous services to users.
With the rapid growth in the number of mobile users, the deployment of small base stations is also becoming ultra-dense, and the energy consumption of wireless communication is very large. Given the serious environmental pollution and increasingly scarce energy in China, green communication is inevitably a direction worth researching and exploring. Therefore, on the basis of satisfying user data demands and quality of service, achieving higher energy efficiency through a reasonable resource allocation scheme is an important research direction.
Disclosure of Invention
The invention aims to provide a wireless network resource allocation method based on deep reinforcement learning, so as to solve the problem that wireless resource allocation in a time-varying channel environment cannot be effectively realized in the prior art.
In order to solve the above technical problem, an embodiment of the present invention provides a wireless network resource allocation method based on deep reinforcement learning, including:
S101, establishing two convolutional neural networks q_eval and q_target with identical parameters, forming a deep reinforcement learning model;
S102, modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users;
S103, according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to the channel coefficients, determining the system energy efficiency based on the allocated downlink power, determining a return function based on the system energy efficiency, and feeding the return function back to the deep reinforcement learning model;
S104, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function; if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range, or the values are above the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment.
Further, the normalized channel coefficient is represented as:

H_{n,k} = h_{n,k} / σ_k²

wherein H_{n,k} represents the normalized channel coefficient, i.e. the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} represents the channel gain between the base station and user terminal n on subcarrier k; and σ_k² represents the noise power on subcarrier k.
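The normalization above is a plain element-wise division; a minimal sketch with illustrative values (not taken from the patent):

```python
import numpy as np

def normalized_channel_coefficients(h, noise_power):
    """H[n, k] = h[n, k] / sigma_k^2: normalize each channel gain by the
    noise power of its subcarrier (broadcast over the user dimension)."""
    return h / noise_power

h = np.array([[2.0, 4.0], [1.0, 8.0]])  # channel gains h_{n,k}, 2 users x 2 subcarriers
sigma2 = np.array([0.5, 2.0])           # noise power sigma_k^2 per subcarrier
H = normalized_channel_coefficients(h, sigma2)
```

The resulting matrix H_{N,K} is exactly what is later fed to the convolutional neural network as the state.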
Further, inputting the normalized channel coefficients into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users comprises:

inputting the normalized channel coefficients into the convolutional neural network q_eval, which selects the action with the maximum output return value as the decision action through the decision formula

a = argmax_{a'} Q(s, a'; θ_eval)

and allocates subcarriers to the users;

wherein θ_eval represents the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a'; θ_eval) represents the return value obtained when the convolutional neural network q_eval with weights θ_eval performs action a' in state s, where s is the input normalized channel coefficients; and a represents the decision action of the deep reinforcement learning model, i.e. the optimal subcarrier allocation result, which is obtained from the index of the action with the maximum return value.
Further, the downlink power allocated to a user is represented as:

p_{n,k} = P'_k · H_{n,k}^(−a) / Σ_{i∈U_k} H_{i,k}^(−a)

wherein p_{n,k} represents the downlink transmit power allocated by the base station to user terminal n on subcarrier k; P'_k represents the downlink transmit power allocated by the base station on subcarrier k; a represents the attenuation factor; U_k denotes the users multiplexed on subcarrier k; and K_max represents the maximum number of users that can be multiplexed on each subcarrier under the complexity that the successive interference canceller in the non-orthogonal multiple access network can currently bear.
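A fractional, channel-inverse power split consistent with this description can be sketched as follows; the closed form, the helper name, and the two-user example are illustrative assumptions, not necessarily the patent's exact formula:

```python
def inverse_channel_power_split(P_k, H_users, a):
    """Split the subcarrier power P'_k among the users multiplexed on it,
    in inverse proportion to their normalized channel coefficients H,
    shaped by the attenuation factor a (larger a gives more power to
    weaker-channel users, as NOMA requires)."""
    weights = [H ** (-a) for H in H_users]
    total = sum(weights)
    return [P_k * w / total for w in weights]

# Two users multiplexed on one subcarrier: the weaker channel (H = 1.0)
# receives the larger share of the subcarrier power P'_k = 2.0.
p = inverse_channel_power_split(2.0, [1.0, 4.0], a=1.0)
```

Whatever the exact closed form, the split conserves the per-subcarrier budget: the allocated powers sum to P'_k.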
Further, determining the system energy efficiency based on the allocated downlink power comprises:

determining the maximum undistorted information transmission rate r_{n,k} from the base station on subcarrier k to user terminal n;

determining the system power consumption U_P(X) according to the determined normalized channel coefficients between the base station and the users, the subcarrier allocation result, and the allocated downlink power;

determining the system energy efficiency according to the determined r_{n,k} and U_P(X).
Further, the maximum undistorted information transmission rate r_{n,k} from the base station on subcarrier k to user terminal n is expressed as:

r_{n,k} = log₂(1 + γ_{n,k})

wherein γ_{n,k} represents the signal-to-noise ratio of the signal received by user terminal n on subcarrier k;
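The rate formula above is the standard Shannon form and can be computed directly:

```python
import math

def max_rate(sinr):
    """Maximum undistorted rate r_{n,k} = log2(1 + gamma_{n,k}) in
    bit/s/Hz; multiply by the subcarrier bandwidth B_SC for bit/s."""
    return math.log2(1.0 + sinr)

r = max_rate(3.0)  # gamma = 3 gives 2 bit/s/Hz
```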
the system power consumption U_P(X) is represented as:

U_P(X) = (1/ψ) Σ_{n∈N} Σ_{k∈K} x_{n,k} p_{n,k} + p_k

wherein p_k represents the circuit power consumption, ψ represents the base station energy recovery factor, and x_{n,k} indicates whether user terminal n uses subcarrier k.
Further, the system energy efficiency is expressed as:

EE = Σ_{n∈N} Σ_{k∈K} ee_{n,k}, with ee_{n,k} = B_SC · x_{n,k} · r_{n,k} / U_P(X)

wherein ee_{n,k} represents the energy efficiency contributed by subcarrier k to user terminal n, B_SC denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers available under the current base station.
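Energy efficiency as total delivered rate over total consumed power can be sketched as follows; the drain-efficiency power model (sum of transmit powers divided by ψ, plus circuit power) is an assumption consistent with the definitions above:

```python
def system_energy_efficiency(rates, powers, x, B_sc, p_circuit, psi):
    """Sum B_SC * x[n][k] * r[n][k] over all user/subcarrier pairs and
    divide by the total consumed power U_P(X)."""
    N, K = len(x), len(x[0])
    total_rate = sum(B_sc * x[n][k] * rates[n][k]
                     for n in range(N) for k in range(K))
    total_power = sum(x[n][k] * powers[n][k]
                      for n in range(N) for k in range(K)) / psi + p_circuit
    return total_rate / total_power

rates = [[2.0, 0.0], [0.0, 4.0]]   # r_{n,k} in bit/s/Hz (illustrative)
powers = [[1.0, 0.0], [0.0, 2.0]]  # p_{n,k} in W (illustrative)
x = [[1, 0], [0, 1]]               # user-subcarrier association matrix
ee = system_energy_efficiency(rates, powers, x, B_sc=1.0, p_circuit=1.0, psi=1.0)
```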
Further, determining the return function based on the system energy efficiency and feeding the return function back to the deep reinforcement learning model comprises:

penalizing, through a weakly supervised algorithm based on value return, any system energy efficiency that does not meet the preset modeling constraints, according to the type of constraint violated, to obtain the return function after the deep reinforcement learning model makes a decision action, and feeding the return function back to the deep reinforcement learning model; the return function takes a piecewise form: reward_t equals the system energy efficiency when all constraints are satisfied, and is penalized by the coefficient ξ_case1, ξ_case2 or ξ_case3 when the corresponding constraint is violated;

wherein reward_t represents the return function calculated during the t-th training; R_min represents the minimum standard of user quality of service, i.e. the minimum downlink transmission rate; H_inter represents the normalized channel coefficient corresponding to the shortest distance between the nearest base station working on the same subcarrier frequency and the base station currently being optimized; I_k represents the upper limit of the cross-tier interference that the k-th subcarrier frequency band can bear; and ξ_case1–ξ_case3 represent the penalty coefficients applied to the system energy efficiency in the three cases that violate the modeling constraints.
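The weakly supervised penalty can be sketched as follows; the multiplicative form and the names of the three constraint checks are assumptions, since the text only identifies the penalty coefficients ξ_case1–ξ_case3 and the quantities R_min, H_inter and I_k:

```python
def reward(ee, qos_ok, interference_ok, power_ok, xi=(0.5, 0.5, 0.5)):
    """Weakly supervised return: the system energy efficiency, scaled
    down by a penalty coefficient xi_case1..3 for each violated
    modeling constraint (QoS rate below R_min, cross-tier interference
    above I_k, power budget exceeded -- illustrative labels)."""
    r = ee
    if not qos_ok:           # some user rate fell below R_min
        r *= xi[0]
    if not interference_ok:  # interference on subcarrier k exceeded I_k
        r *= xi[1]
    if not power_ok:         # per-subcarrier power budget exceeded
        r *= xi[2]
    return r
```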
Further, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function, wherein if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range, or the values are above the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment, comprises:
storing the return function, the channel environment, the decision action, and the next state after the transition as a quadruple in the memory playback unit (memory) of the deep reinforcement learning model, where the memory is represented as:

memory: D(t) = {e(1), ..., e(t)}

e(t) = (s(t), a(t), r(t), s(t+1))

wherein s(t) represents the input state during the t-th training of the deep reinforcement learning model; a(t) represents the decision action made by the deep reinforcement learning model during the t-th training; r(t) represents the return function reward_t obtained after the deep reinforcement learning model performs action a(t) during the t-th training; and s(t+1) represents the next state, updated according to the finite-state time-varying Markov channel, for the (t+1)-th training;
randomly selecting stored data from the memory playback unit of the deep reinforcement learning model for training the two convolutional neural networks and performing the gradient descent update, wherein gradient descent only updates the convolutional neural network q_eval; at fixed intervals during training of the deep reinforcement learning model, the parameters θ_target of q_target are set to the parameters θ_eval of q_eval;
and if the difference between the system energy efficiency values obtained several consecutive times and the preset threshold is within the preset range, or the values are above the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment.
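The memory playback unit holding the quadruples e(t) = (s(t), a(t), r(t), s(t+1)) can be sketched as follows; this is a minimal illustrative implementation, not the patent's code:

```python
import random
from collections import deque

class ReplayMemory:
    """Memory playback unit D(t): stores quadruples e(t) = (s, a, r, s_next)
    up to a fixed capacity and serves uniform random minibatches."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

mem = ReplayMemory(capacity=1000)
mem.store("s0", 3, 1.5, "s1")
batch = mem.sample(1)
```

Sampling uniformly from the memory breaks the temporal correlation between consecutive channel states, which is what makes the gradient descent update stable.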
Further, the gradient descent update formula is expressed as:

θ_eval ← θ_eval − α ∇_{θ_eval} [ r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval) ]²

wherein α represents the training learning rate; λ represents the discount factor for the decision body's evaluation of future return; max_{a'} Q(s(t+1), a'; θ_target) represents, when the input is the next state s(t+1) of the current memory e(t), the return of the action a' that the convolutional neural network q_target with weights θ_target decides can harvest the maximum return; Q(s(t), a(t); θ_eval) represents the return value obtained when the convolutional neural network q_eval with weights θ_eval performs action a(t) with input state s(t) of the current memory e(t); and ∇_{θ_eval} represents the gradient operation with respect to the parameters θ_eval of the convolutional neural network.
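The target and the error inside the squared loss can be written out as plain functions; this mirrors the standard DQN update using the document's symbols r(t), λ, θ_target and θ_eval:

```python
def td_target(r, next_q_target, lam):
    """DQN target y = r(t) + lambda * max_a' Q(s(t+1), a'; theta_target)."""
    return r + lam * max(next_q_target)

def td_error(r, q_sa_eval, next_q_target, lam):
    """The quantity whose square the gradient step descends on:
    y - Q(s(t), a(t); theta_eval)."""
    return td_target(r, next_q_target, lam) - q_sa_eval

# Example: reward 1.0, discount 0.5, target net's best next-state value 2.0,
# eval net's current estimate 2.0 -> the estimate already matches the target.
err = td_error(r=1.0, q_sa_eval=2.0, next_q_target=[0.5, 2.0], lam=0.5)
```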
The technical scheme of the invention has the following beneficial effects:
In this scheme, two convolutional neural networks q_eval and q_target are established, forming a deep reinforcement learning model; the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel, the normalized channel coefficients between the base station and the users are determined and input into the convolutional neural network q_eval, the action with the maximum output return value is selected as the decision action, and subcarriers are allocated to the users; according to the subcarrier allocation result, downlink power is allocated to the users multiplexed on each subcarrier in inverse proportion to the channel coefficients, the system energy efficiency is determined based on the allocated downlink power, a return function is determined based on the system energy efficiency, and the return function is fed back to the deep reinforcement learning model; the convolutional neural networks q_eval and q_target in the deep reinforcement learning model are trained according to the determined return function, and if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range, or the values are above the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment. Because the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel, the method accounts for the high complexity of the time-varying channel while using the deep reinforcement learning model to shift the computational cost into the training process; decision actions are therefore selected at low complexity, the locally optimal allocation of subcarriers from the base station to the user terminals in the time-varying channel environment is determined, and the energy efficiency in the time-varying channel environment is maximized.
Drawings
Fig. 1 is a schematic flowchart of a method for allocating wireless network resources based on deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a detailed flowchart of a method for allocating wireless network resources based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a wireless network resource allocation method based on deep reinforcement learning, aiming at the problem that the wireless resource allocation in a time-varying channel environment cannot be effectively realized in the prior art.
As shown in fig. 1, a method for allocating wireless network resources based on deep reinforcement learning according to an embodiment of the present invention includes:
s101, establishing a convolutional neural network q with two same parameterseval、qtargetConstructing a Deep enhanced learning model (Deep Q Network, DQN);
s102, modeling the time-varying channel environment between the base station and the user terminal as a time-varying Markov channel in a finite state, determining a normalized channel coefficient between the base station and the user, and inputting the normalized channel coefficient into a convolutional neural network qevalSelecting the action with the maximum output return value as a decision action, and allocating subcarriers to the user;
s103, distributing downlink power to the users multiplexed on each subcarrier based on the inverse ratio of the channel coefficient according to the subcarrier distribution result, determining system energy efficiency based on the distributed downlink power, determining a return function based on the system energy efficiency, and feeding the return function back to the deep reinforcement learning model;
s104, training a convolutional neural network q in the deep reinforcement learning model according to the determined return functioneval、qtargetIf the difference value between the system energy efficiency value obtained continuously for multiple times and the preset threshold value is within the preset range or higher than the preset threshold value, the currently allocated downlink power is locally and optimally allocated under the time-varying channel environment.
The wireless network resource allocation method based on deep reinforcement learning of this embodiment establishes two convolutional neural networks q_eval and q_target forming a deep reinforcement learning model; models the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determines the normalized channel coefficients between the base station and the users, inputs them into the convolutional neural network q_eval, selects the action with the maximum output return value as the decision action, and allocates subcarriers to the users; according to the subcarrier allocation result, allocates downlink power to the users multiplexed on each subcarrier in inverse proportion to the channel coefficients, determines the system energy efficiency based on the allocated downlink power, determines a return function based on the system energy efficiency, and feeds the return function back to the deep reinforcement learning model; and trains the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function, wherein if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range, or the values are above the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment. Because the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel, the method accounts for the high complexity of the time-varying channel while using the deep reinforcement learning model to shift the computational cost into the training process; decision actions are therefore selected at low complexity, the locally optimal allocation of subcarriers from the base station to the user terminals in the time-varying channel environment is determined, and the energy efficiency in the time-varying channel environment is maximized.
The deep reinforcement learning in this embodiment is a decision method based on artificial intelligence, characterized by a decision body making sequential decisions in a dynamically changing environment. The states, actions, and rewards required for deep reinforcement learning can be constructed so that, when the deep reinforcement learning model is trained, the decision body automates and optimizes its decision actions. The wireless network resource allocation method based on deep reinforcement learning can simulate a time-varying channel environment and, with low computational complexity, optimize the allocation of wireless network resources in a time-varying network scenario to the maximum extent, achieving both quick decisions and improved energy efficiency. The trained deep reinforcement learning model can be used continuously to manage wireless resources in a time-varying channel environment and to make quick, high-return decisions. In wide-range wireless network optimization, the deep reinforcement learning model can be computed in a distributed fashion, reducing complexity.
In order to better understand the method for allocating wireless network resources based on deep reinforcement learning in this embodiment, the method is described in detail, and the specific steps may include:
A11, constructing the deep reinforcement learning model (DQN)
In this embodiment, two convolutional neural networks q_eval and q_target with identical parameters are first established, forming a deep reinforcement learning model; the decision process of the deep reinforcement learning model is determined by the Q function Q(s, a; θ), where θ represents the weight parameters of a convolutional neural network, and the weight parameters of the convolutional neural networks q_eval and q_target are θ_eval and θ_target respectively, which are identical at initialization; the Q function Q(s, a; θ) represents the return value obtained by the convolutional neural network with weights θ when it performs action a in state s.
In this embodiment, each convolutional neural network consists of two convolutional layers, two pooling layers, and two fully connected layers. Each training input has shape [n_samples, N, K]: the first dimension n_samples represents the number of input samples, and the second and third dimensions ([N, K]) represent one input sample, i.e. a normalized channel coefficient matrix of dimension [N, K]. In each training step, n_samples normalized channel coefficient matrices of dimension [N, K] are input to the convolutional neural network; the output covers, for the current channel state, all possible actions and the return value Q_action_val obtained by each action. The data structure of Q_action_val is a one-dimensional vector [Action_num], where Action_num represents the number of all possible actions. Since the number of input channel states is n_samples and each state yields a return value for every action, the output is a two-dimensional matrix formed by n_samples one-dimensional vectors [Action_num].
A12, modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users
In this embodiment, several co-frequency small base stations (SBS) are deployed within a certain range; the small base stations include outdoor micro base stations, pico base stations, and indoor home base stations. Within the coverage of each small base station, 6 user terminals (UE) and 3 subcarriers (SC) available in the non-orthogonal multiple access network are distributed in a certain area centered on the small base station. In this embodiment, an independent deep reinforcement learning model runs on each small base station, achieving the effect of distributed processing. Parameters of the small base station and the user terminals are initialized, including but not limited to: the normalized channel coefficient H_{n,k} between the SBS and UE_n on subcarrier k, the channel bandwidth B allocated to the base station, the subcarrier channel bandwidth B_SC, the circuit power consumption p_k, etc., where UE_n denotes user terminal n and SC_k denotes subcarrier k. At the same time, the user-subcarrier association matrix X_{N,K} and the finite-state time-varying Markov channel (FSMC) transition probability matrix are initialized, where N represents the set of user terminals and K represents the set of subcarriers available under the current base station. The initialized user-subcarrier association matrix X_{N,K} and finite-state time-varying Markov channel transition probability matrix are used for subsequent optimization of the user association matrix and calculation of the updated channel state.
In this embodiment, the channel environment of the optimization scenario, a finite-state time-varying Markov channel, obtains initial coordinates through random spatial scattering, and the initial normalized channel coefficient matrix is calculated; the obtained values are quantized into ten levels with quantization boundaries bound_0, ..., bound_9, and the optimized scenario evolves according to the time-varying Markov channel transition probability matrix. An element of the transition probability matrix is the transition probability indicator p_{i,j}, where i represents the current state and j represents the next state (the state after an action is performed in the current state), so p_{i,j} represents the probability of transitioning from the current state i to the next state j. It is stipulated that p_{i,j} takes its maximum value when i = j, i.e. the probability of keeping the original channel state is largest, and the probability of transitioning to the second-adjacent state is half the probability of transitioning to the adjacent state; at each iteration the environment is updated according to the transition probability matrix.
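A transition probability matrix with these properties (self-transition dominant, probability halving per extra step away, rows normalized) can be constructed as follows; the weight values themselves are illustrative assumptions:

```python
def fsmc_transition_matrix(n_states, stay_weight=4.0):
    """Finite-state Markov channel transition matrix: the weight of
    staying in state i is largest, each additional step away from i
    halves the weight, and each row is normalized to sum to 1."""
    P = []
    for i in range(n_states):
        row = [stay_weight * (0.5 ** abs(i - j)) for j in range(n_states)]
        s = sum(row)
        P.append([w / s for w in row])
    return P

P = fsmc_transition_matrix(10)  # ten quantized channel states bound_0..bound_9
```

At each iteration the next channel state can then be drawn from the row of the current state.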
In this embodiment, the user-subcarrier association matrix X_{N,K} may use a user-subcarrier allocation indicator x_{n,k}, where x_{n,k} indicates whether user terminal n uses subcarrier k; in a specific application, for example, a binary 1 (x_{n,k} = 1) indicates that user terminal n uses subcarrier k, and a binary 0 (x_{n,k} = 0) indicates that user terminal n does not use subcarrier k, i.e. does not apply for resources on subcarrier k. All possible subcarrier allocations are calculated as follows:

the number of combinations C is introduced; if the upper limit on the number of users multiplexed per subcarrier in the non-orthogonal multiple access network is 2 and each user can only use one subcarrier (these numbers can be adjusted for the practical application), there are Action_num possible allocations in total. For convenience of explanation, this embodiment uses a small-capacity small-base-station network model as a simplified case for calculation. The Action_num possible subcarrier allocation methods are stored in a list structure, denoted Action_list, whose list index corresponds to a possible subcarrier allocation method; the subcarrier allocation method can be matched according to the index value, which reduces the complexity of DQN processing, and the DQN decision action is accordingly designed as an integer in [0, Action_num − 1], where each subcarrier allocation method corresponds to a user-subcarrier association matrix X_{N,K}.
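Under the embodiment's numbers (6 users, 3 subcarriers, at most 2 users multiplexed per subcarrier), the action list can be enumerated as follows; the assumption that every user is assigned exactly one subcarrier follows the text's "each user can only use one subcarrier":

```python
from itertools import product

N_USERS, N_SC, MAX_MUX = 6, 3, 2  # scenario from the embodiment

def enumerate_actions():
    """Enumerate every user-to-subcarrier assignment in which each user
    occupies exactly one subcarrier and no subcarrier multiplexes more
    than MAX_MUX users; the list index is the DQN decision action."""
    actions = []
    for assign in product(range(N_SC), repeat=N_USERS):
        if all(assign.count(k) <= MAX_MUX for k in range(N_SC)):
            actions.append(assign)
    return actions

action_list = enumerate_actions()  # Action_list; len(action_list) is Action_num
```

Each tuple maps users 0..5 to a subcarrier index and is trivially convertible to the binary association matrix X_{N,K}.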
In this embodiment, a ratio of gain to noise between the base station and the user terminal is used as a normalized channel coefficient, and the normalized channel coefficient is determined by the following formula:
wherein H_{n,k} denotes the normalized channel coefficient, expressed as the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k, calculated from Rayleigh fast fading and distance-induced large-scale fading; since the typical service range of a small base station is an indoor environment, two layers of wall loss are added; σ_k² = E[|z_k|²] denotes the noise power on subcarrier k, where E[·] denotes the mathematical expectation and z_k denotes additive white Gaussian noise with mean 0 and variance σ_k².
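The gain-to-noise normalization can be sketched as below; the path-loss exponent, wall-loss value, and noise power are illustrative assumptions, not values from the text.

```python
import numpy as np

def normalized_channel_coeff(distance_m, noise_power_w, rng,
                             path_loss_exp=3.0, wall_loss_db=10.0):
    """H_{n,k} = channel gain / noise power on subcarrier k."""
    fading_amp = rng.rayleigh(scale=1.0)          # Rayleigh fast-fading amplitude
    large_scale = distance_m ** (-path_loss_exp)  # distance-induced large-scale fading
    wall = 10 ** (-2 * wall_loss_db / 10.0)       # two layers of wall loss (indoor)
    h = (fading_amp ** 2) * large_scale * wall    # channel gain h_{n,k}
    return h / noise_power_w

rng = np.random.default_rng(1)
H = normalized_channel_coeff(distance_m=20.0, noise_power_w=1e-12, rng=rng)
```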
In this embodiment, the normalized channel coefficient is input to the convolutional neural network q_eval; via the decision formula a = argmax_{a'} Q(s, a'; θ_eval), q_eval selects the action with the largest output return value as the decision action and allocates subcarriers to the users;
wherein the Q function Q (s, a'; theta)eval) Representing a convolutional neural network qevalThe decision body executes the return value obtained by the action a' in a state s, wherein the state s is an input normalized channel coefficient; a represents the decision action of the deep reinforcement learning model, namely the optimal subcarrier allocation result, and is a possible XN,KAnd represents the correlation matrix of the user terminal n and the subcarrier k.
In this embodiment, the input of the deep reinforcement learning model's DQN is the state s of the DQN decision body, i.e., the normalized channel coefficient (specifically, the two-dimensional normalized channel coefficient matrix H_{N,K}); the output is a one-dimensional vector Q_action_val. The action a' with the largest value in Q_action_val is selected as the decision action for subcarrier allocation (the optimal subcarrier allocation result). Accordingly, the index of the largest value in Q_action_val is matched against Action_list to obtain the current decision action X_{N,K}, i.e., the user-subcarrier correlation matrix at which the base-station-to-user subcarrier allocation attains a locally optimal value. Matching the subcarrier allocation method by index value in this way reduces the complexity of DQN processing.
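The argmax-and-index mechanics can be illustrated with a stand-in network: the patent's q_eval is convolutional, but any function mapping H_{N,K} to one Q-value per action behaves the same at decision time. The shapes, the linear map, and the placeholder Action_list entries are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 3, 2
num_actions = 6
# placeholder Action_list entries standing in for the enumerated X_{N,K} matrices
action_list = [rng.integers(0, 2, size=(N, K)) for _ in range(num_actions)]

theta_eval = rng.normal(size=(num_actions, N * K))  # stand-in weights theta_eval

def q_eval(state, theta):
    """Return Q_action_val: one Q-value per possible subcarrier allocation."""
    return theta @ state.ravel()

H = rng.random((N, K))                # state s: normalized channel coefficient matrix
q_action_val = q_eval(H, theta_eval)
a = int(np.argmax(q_action_val))      # a = argmax_{a'} Q(s, a'; theta_eval)
X = action_list[a]                    # index into Action_list -> decision X_{N,K}
```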
A13, according to the optimal subcarrier allocation result, allocate downlink power with a fractional-order algorithm under fixed subcarrier allocation: the users multiplexed on each subcarrier are allocated downlink power in inverse proportion to the channel gain coefficient (a user with larger channel gain is allocated less power, and a user with smaller channel gain is allocated more power).
In this embodiment, the downlink power allocated to the user is represented as:
wherein p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k denotes the downlink transmit power allocated by the base station on subcarrier k; a denotes an attenuation factor subject to the constraint 0 < a < 1, whose value is fixed within the same sub-optimization round and does not change across users or subcarriers; K_max denotes the maximum number of users multiplexed on each subcarrier in the non-orthogonal multiple access network under the complexity that current successive interference cancellation (SIC) can bear.
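Since the formula image is not reproduced here, the following sketch assumes the standard fractional (FTPA-style) form, where each user's share on a subcarrier is proportional to H^(−a); this makes power inversely related to channel gain, as the text requires. The gain values and attenuation factor are illustrative.

```python
import numpy as np

def fractional_power(H_k, p_subcarrier, a=0.4):
    """Split a subcarrier's power among its multiplexed users, inversely to gain."""
    weights = np.asarray(H_k, dtype=float) ** (-a)  # larger gain -> smaller weight
    return p_subcarrier * weights / weights.sum()

H_k = np.array([8.0, 2.0])            # normalized gains of the two multiplexed users
p = fractional_power(H_k, p_subcarrier=1.0)
# the stronger user (H = 8.0) receives the smaller power share
```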
A14, determining the maximum undistorted information transmission rate r_{n,k} from base station subcarrier k to user terminal n
In this embodiment, the maximum undistorted information transmission rate r_{n,k} from base station subcarrier k to user terminal n is expressed as:
r_{n,k} = log2(1 + γ_{n,k})
wherein γ_{n,k} represents the signal-to-noise ratio of the signal obtained by user terminal n from subcarrier k.
In this embodiment, in a non-orthogonal multiple access network, the normalized channel coefficients of users multiplexed on the same subcarrier are arranged in a descending order, and are represented as:
|H_{1,k}| ≥ |H_{2,k}| ≥ … ≥ |H_{n,k}| ≥ |H_{n+1,k}| ≥ … ≥ |H_{Kmax,k}|
Based on the optimal decoding order of the successive interference canceller, when user terminal i is located before j in the above sequence, the interference from user terminal j can be successfully decoded and removed, while user terminal j receives the signal of user terminal i and accepts it as interference. In the non-orthogonal multiple access network, considering fairness among users and the principle of reducing co-channel interference, a user with good channel conditions is allocated less power when power is allocated: in the above example, if H_{i,k} > H_{j,k}, then p_{i,k} < p_{j,k}, consistent with the allocation rule of the fractional-order algorithm in A13.
To reduce co-channel interference and computational complexity as much as possible in the small-base-station scenario, the number of users multiplexed per subcarrier is predefined as K_max; the maximum information transmission rates of user terminals i and j are logarithmic functions of the signal-to-interference-plus-noise ratio (SINR). χ_INNER = p_{i,k} H_{j,k} denotes the intra-layer co-channel interference experienced by user terminal j under the service of the current base station.
In this embodiment, the maximum transmission rate of the user terminal i and the user terminal j is represented as:
namely:
r_{i,k} = log2(1 + p_{i,k} H_{i,k}),
r_{j,k} = log2(1 + p_{j,k} H_{j,k} / (1 + χ_INNER))
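With noise-normalized gains, the two rates above can be computed as follows; this assumes the standard two-user SIC form in which χ_INNER = p_{i,k} H_{j,k} appears in the denominator of user j's SINR. The numeric values are illustrative.

```python
import numpy as np

def noma_rates(p_i, p_j, H_i, H_j):
    """Two-user NOMA rates on one subcarrier (user i decodes before user j)."""
    r_i = np.log2(1 + p_i * H_i)                    # i removes j's signal via SIC
    chi_inner = p_i * H_j                           # intra-layer interference seen by j
    r_j = np.log2(1 + p_j * H_j / (1 + chi_inner))  # j treats i's signal as interference
    return r_i, r_j

r_i, r_j = noma_rates(p_i=0.36, p_j=0.64, H_i=8.0, H_j=2.0)
```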
a16, determining the system power consumption UP(X)
In this embodiment, considering that the small base station has an energy recovery unit, the system power consumption U_P(X) is expressed as:
in this example, pkRepresents the power consumed by the circuit; psi denotes the base station energy recovery coefficient, which can be modified according to the actual hardware properties.
A17, according to the determined γ_{n,k} and U_P(X), determining the system energy efficiency
In this embodiment, based on the obtained maximum undistorted information transmission rate r_{n,k} from base station subcarrier k to user terminal n and the system power consumption U_P(X), the energy efficiency ee_{n,k} of subcarrier k to user terminal n is calculated:
wherein B_k represents the channel bandwidth of subcarrier k.
In this embodiment, the system energy efficiency is expressed as:
wherein ee_{n,k} represents the energy efficiency of subcarrier k to user terminal n, B_k denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers available under the current base station.
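Since the formula images are not reproduced here, the following assumes the common sum-rate-over-power form of system energy efficiency; treat it as a sketch under that assumption, not the patent's exact expression.

```python
import numpy as np

def system_energy_efficiency(rates, bandwidths, power_consumption):
    """Total weighted rate (bit/s) divided by total consumed power (W)."""
    rates = np.asarray(rates, dtype=float)            # rates[n][k] = r_{n,k}
    bandwidths = np.asarray(bandwidths, dtype=float)  # bandwidths[k] = B_k
    total_rate = float(np.sum(rates * bandwidths))    # sum over users and subcarriers
    return total_rate / power_consumption

ee = system_energy_efficiency(rates=[[1.0, 2.0], [3.0, 4.0]],
                              bandwidths=[1.0, 1.0],
                              power_consumption=2.0)
```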
A17, determining a return function based on the system energy efficiency, and feeding the return function back to the deep reinforcement learning model
In this embodiment, when the system energy efficiency does not meet the preset modeling constraints (the modeling constraints are determined by factors such as the inter-user fairness principle, the minimum quality-of-service standard, and the upper limit of cross-layer interference), a weakly supervised, value-return-based penalty is applied to the system energy efficiency according to the type of violated constraint; this yields the return function after the deep reinforcement learning model makes a decision action, and the return function is fed back to the deep reinforcement learning model. The reward function is represented as:
wherein reward_t represents the return function calculated during the t-th training; R_min represents the minimum standard of user quality of service (QoS), i.e., the minimum downlink transmission rate; H_inter represents the normalized channel coefficient for the shortest distance between the nearest base station operating at the same subcarrier frequency and the currently optimized base station, which may be calculated according to the method in step A12; I_k represents the upper limit of cross-layer (cross-station) interference that the k-th subcarrier band can bear, set and adjusted according to the specific application; ξ_case1 ~ ξ_case3 represent the penalty coefficients on energy efficiency for the three cases that do not meet the modeling constraints.
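The value-return penalty can be sketched as below. The three violation cases and the penalty coefficients ξ are assumptions based on the constraint types named in the text (QoS minimum, cross-layer interference cap, attenuation-factor range).

```python
def penalized_reward(ee, rates, r_min, cross_interference, i_k,
                     attenuation_a, xi=(0.5, 0.5, 0.5)):
    """Scale the energy-efficiency reward down for each violated constraint class."""
    r = ee
    if any(rate < r_min for rate in rates):       # case 1: QoS minimum violated
        r *= xi[0]
    if cross_interference > i_k:                  # case 2: cross-layer interference cap exceeded
        r *= xi[1]
    if not (0 < attenuation_a < 1):               # case 3: attenuation factor out of range
        r *= xi[2]
    return r
```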
In addition, the following should be noted: when the system energy efficiency is used directly as the return function, x_{n,k} and a must also satisfy other constraints. Combined with the constraints above, the constraints to be met by x_{n,k} and a are as follows:
wherein BS_peak represents the peak power of the small base station. Condition 1 forces each user terminal to be associated with exactly 1 subcarrier at a time. Condition 2 limits the maximum number of users multiplexed on the same subcarrier in the non-orthogonal multiple access network to K_max, in order to reduce intra-station interference and the complexity of the successive interference canceller. Condition 3 is the QoS constraint: the information transmission rate of all user terminals served by the base station should exceed the minimum user quality-of-service limit. Condition 4 is the limit on the maximum transmit power of the base station on subcarrier k. Condition 5 is an effective interference coordination mechanism that limits the interference of the currently optimized base station to other base stations. Condition 6 is the limit on the attenuation factor when allocating power.
A18, storing the return function, channel environment, decision action, and transitioned secondary state into the DQN memory playback unit
In this embodiment, the return function, channel environment, decision action, and transitioned secondary state are stored as a quadruple in the DQN memory playback unit memory, where the memory is represented as:
memory:D(t)={e(1),...,e(t)}
e(t)=(s(t),a(t),r(t),s(t+1))
wherein s(t) represents the normalized channel coefficient (state) input during the t-th training of the model; a(t) represents the decision action, i.e., the user-subcarrier correlation matrix, made by the DQN during the t-th training of the deep reinforcement learning model; r(t) represents the reward function reward_t obtained after the DQN completes action a(t) during the t-th training; s(t+1) represents the normalized channel coefficient (secondary state) after updating according to the finite-state time-varying Markov channel at the (t+1)-th training of the deep reinforcement learning model.
In this embodiment, each tuple e(t) is stored by defining a memory playback class and organizing the memory as an object-array or dictionary data structure.
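A minimal sketch of such a memory playback class, using a bounded deque of (s, a, r, s') quadruples with uniform random batch sampling; the capacity value is illustrative.

```python
import random
from collections import deque

class ReplayMemory:
    """Memory playback unit: stores e(t) = (s(t), a(t), r(t), s(t+1))."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Uniformly sample a fixed-size batch for gradient-descent updates."""
        return random.sample(list(self.buffer), batch_size)

memory = ReplayMemory(capacity=100)
for t in range(5):
    memory.store(s=t, a=t % 2, r=float(t), s_next=t + 1)
batch = memory.sample(3)
```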
A19, training the deep reinforcement learning model in a batch-processing mode: batches of memory data of a fixed size are randomly selected from the DQN memory playback unit for learning and gradient-descent updating of the two convolutional neural networks.
In this embodiment, the memory data are processed using a loss function Loss(θ), i.e., the squared temporal-difference error Loss(θ_eval) = E[(r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval))²].
the gradient descent update formula is expressed as:
wherein η represents the training learning rate; λ represents the discount factor for the decision body's evaluation of future returns; max_{a'} Q(s(t+1), a'; θ_target) represents the maximum-return action a' decided, when the input is the secondary state s(t+1) of the current memory e(t), by the convolutional neural network q_target with weights θ_target; Q(s(t), a(t); θ_eval) represents the return value obtained when the convolutional neural network q_eval with weights θ_eval executes action a(t) with input state s(t) of the current memory e(t); ∇_{θ_eval} represents performing the gradient-descent operation on the convolutional neural network with parameters θ_eval, i.e., modifying the parameters θ_eval of q_eval so that the difference between the outputs of q_target and q_eval is minimized.
In this embodiment, the subtraction involving Q(s(t), a(t); θ_eval) updates only the output entry of the selected action: if memory unit e(1) selected action 2, the gradient-descent update formula modifies only the value at position [1, 2] of the network output, and the values corresponding to the remaining actions in that dimension are unchanged. To ensure training stability, gradient descent updates only the parameters of the convolutional neural network q_eval.
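One update step can be illustrated with a stand-in linear network in place of the convolutional ones; the point is that θ_target stays frozen and only the output entry of the stored action receives a gradient. The shapes, learning rate, and discount factor are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
num_actions, state_dim = 6, 4
theta_eval = rng.normal(size=(num_actions, state_dim))  # trainable q_eval weights
theta_target = theta_eval.copy()                        # frozen q_target weights
eta, lam = 0.01, 0.9                                    # learning rate, discount factor

def td_update(s, a, r, s_next):
    """Gradient step on q_eval only, for the single action a of a stored memory."""
    target = r + lam * float(np.max(theta_target @ s_next))  # max_a' Q(s', a'; theta_target)
    error = target - float((theta_eval @ s)[a])              # TD error of the chosen action
    theta_eval[a] += eta * error * s                         # update only row a of theta_eval

s, s_next = rng.random(state_dim), rng.random(state_dim)
td_update(s, a=2, r=1.0, s_next=s_next)  # e.g. memory e(1) selected action 2
```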
A20, during training of the deep reinforcement learning model, updating the q_target parameters to the q_eval parameters every fixed number of iterations, expressed as:
wherein C_iter is a counter representing training, used to record the number of training iterations; C_max denotes the update interval of the q_target parameters with respect to the q_eval parameters, and is also the upper limit of C_iter; C_iter is therefore reset to zero when it equals C_max.
A21, with the q_target network parameters and q_eval network parameters updated by steps A19 and A20: if the difference between the system energy efficiency values of multiple consecutive optimizations and a preset threshold (specified value) is within a preset range, or the values are higher than the preset threshold, the deep reinforcement learning model can be considered suitable for wireless resource allocation in this time-varying channel environment; the currently allocated downlink power is then the locally optimal power allocation in the time-varying channel environment, the current deep reinforcement learning model achieves the locally optimal allocation of network resources in the time-varying environment, and the obtained deep reinforcement learning model can continue to be used in the actual time-varying channel environment;
A22, otherwise, update the environment according to the transition probability matrix, and judge whether C_iter = C_max holds. If so, set C_iter = 0 and θ_target = θ_eval, then execute step A12; otherwise, execute step A12 directly, until the difference between the recalculated system energy efficiency value and the preset threshold is within the preset range, or the value is higher than the preset threshold, at which point the best optimization in the time-varying channel environment is achieved.
In this embodiment, as the number of optimizations t increases, the return value of the DQN model in the time-varying channel environment gradually rises from low to high. This process constitutes the wireless network resource allocation method based on deep reinforcement learning, thereby optimizing subcarrier and power allocation in the time-varying channel environment.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (1)
1. A wireless network resource allocation method based on deep reinforcement learning is characterized by comprising the following steps:
S101, establishing two convolutional neural networks q_eval and q_target with identical parameters to form a deep reinforcement learning model;
S102, modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficient between the base station and the users, inputting the normalized channel coefficient into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users;
S103, distributing downlink power to the users multiplexed on each subcarrier in inverse proportion to the channel coefficient according to the subcarrier allocation result, determining the system energy efficiency based on the allocated downlink power, determining a return function based on the system energy efficiency, and feeding the return function back to the deep reinforcement learning model;
S104, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function; if the difference between the system energy efficiency values obtained a plurality of consecutive times and the preset threshold is within the preset range, or the system energy efficiency values obtained a plurality of consecutive times are higher than the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment;
wherein the normalized channel coefficient is represented as:
wherein H_{n,k} denotes the normalized channel coefficient, expressed as the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k; σ_k² denotes the noise power on subcarrier k;
wherein inputting into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to the users comprises the following steps:
inputting the normalized channel coefficients into the convolutional neural network q_eval; via the decision formula, q_eval selects the action with the maximum output return value as the decision action and allocates subcarriers to the users;
wherein θ_eval represents the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a'; θ_eval) represents the return value obtained when the convolutional neural network q_eval with weights θ_eval executes action a' in state s, the state s being the input normalized channel coefficient; a represents the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, which is obtained from the index of the action with the maximum return value;
wherein, the downlink power allocated to the user is represented as:
wherein p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k denotes the downlink transmit power allocated by the base station on subcarrier k; α denotes the attenuation factor; K_max denotes the maximum number of users multiplexed on each subcarrier in the non-orthogonal multiple access network under the complexity borne by the current successive interference canceller;
wherein determining system energy efficiency based on the allocated downlink power comprises:
determining the maximum undistorted information transmission rate r_{n,k} from base station subcarrier k to user terminal n;
determining the system power consumption U_P(X) according to the determined normalized channel coefficient between the base station and the users, the subcarrier allocation result, and the allocated downlink power;
determining the system energy efficiency according to the determined r_{n,k} and U_P(X);
wherein the maximum undistorted information transmission rate r_{n,k} from base station subcarrier k to user terminal n is expressed as:
r_{n,k} = log2(1 + γ_{n,k})
wherein γ_{n,k} represents the signal-to-noise ratio of the signal obtained by user terminal n from subcarrier k;
the system power consumption U_P(X) is expressed as:
wherein p_k denotes the circuit power consumption, ψ denotes the base station energy recovery coefficient, and x_{n,k} indicates whether user terminal n uses subcarrier k;
wherein the system energy efficiency is expressed as:
wherein ee_{n,k} denotes the energy efficiency of subcarrier k to user terminal n, B_k denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers available under the current base station;
wherein the determining a reward function based on the system energy efficiency and feeding back the reward function to the deep reinforcement learning model comprises:
applying a weakly supervised, value-return-based penalty to the system energy efficiency that does not meet the preset modeling constraints, according to the type of constraint violated, to obtain the return function after the deep reinforcement learning model makes a decision action, and feeding the return function back to the deep reinforcement learning model; wherein the reward function is represented as:
wherein reward_t represents the return function calculated during the t-th training; R_min represents the minimum standard of user quality of service, i.e., the minimum downlink transmission rate; H_inter represents the normalized channel coefficient corresponding to the shortest distance between the nearest base station operating at the same subcarrier frequency and the currently optimized base station; I_k represents the upper limit of cross-layer interference that the k-th subcarrier band can bear; ξ_case1 ~ ξ_case3 represent the penalty coefficients on the system energy efficiency for the three cases that do not meet the modeling constraints;
wherein training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined return function, such that if the difference between the system energy efficiency values obtained a plurality of consecutive times and the preset threshold is within the preset range, or the system energy efficiency values obtained a plurality of consecutive times are higher than the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment, comprises:
storing the return function, channel environment, decision action, and transitioned secondary state as a quadruple into the memory playback unit memory of the deep reinforcement learning model, wherein the memory is represented as:
memory:D(t)={e(1),...,e(t)}
e(t)=(s(t),a(t),r(t),s(t+1))
wherein s(t) represents the input state during the t-th training of the deep reinforcement learning model; a(t) represents the decision action made by the deep reinforcement learning model during the t-th training; r(t) represents the reward function reward_t obtained after the deep reinforcement learning model completes action a(t) during the t-th training; s(t+1) represents the secondary state after updating according to the finite-state time-varying Markov channel at the (t+1)-th training of the deep reinforcement learning model;
randomly selecting memory data from the memory playback unit of the deep reinforcement learning model for learning the two convolutional neural networks and performing gradient-descent updates, wherein gradient descent updates only the parameters of the convolutional neural network q_eval, and every fixed number of iterations during training of the deep reinforcement learning model, the q_target parameter θ_target is updated to the q_eval parameter θ_eval;
If the difference value between the system energy efficiency value obtained for a plurality of continuous times and the preset threshold value is within the preset range, or the system energy efficiency value obtained for a plurality of continuous times is higher than the preset threshold value, the currently allocated downlink power is the local optimal power allocation under the time-varying channel environment;
wherein the gradient descent update formula is represented as:
wherein η represents the training learning rate; λ represents the discount factor for the decision body's evaluation of future returns; max_{a'} Q(s(t+1), a'; θ_target) represents the maximum-return action a' decided, when the input is the secondary state s(t+1) of the current memory e(t), by the convolutional neural network q_target with weights θ_target; Q(s(t), a(t); θ_eval) represents the return value obtained when the convolutional neural network q_eval with weights θ_eval executes action a(t) with input state s(t) of the current memory e(t); ∇_{θ_eval} represents performing the gradient-descent operation on the convolutional neural network with parameters θ_eval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811535056.1A CN109474980B (en) | 2018-12-14 | 2018-12-14 | Wireless network resource allocation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109474980A CN109474980A (en) | 2019-03-15 |
CN109474980B true CN109474980B (en) | 2020-04-28 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105407535A (en) * | 2015-10-22 | 2016-03-16 | 东南大学 | High energy efficiency resource optimization method based on constrained Markov decision process |
CN106358308A (en) * | 2015-07-14 | 2017-01-25 | 北京化工大学 | Resource allocation method for reinforcement learning in ultra-dense network |
CN106909728A (en) * | 2017-02-21 | 2017-06-30 | 电子科技大学 | A kind of FPGA interconnection resources configuration generating methods based on enhancing study |
CN108307510A (en) * | 2018-02-28 | 2018-07-20 | 北京科技大学 | A kind of power distribution method in isomery subzone network |
CN108712748A (en) * | 2018-04-12 | 2018-10-26 | 天津大学 | A method of the anti-interference intelligent decision of cognitive radio based on intensified learning |
CN108737057A (en) * | 2018-04-27 | 2018-11-02 | 南京邮电大学 | Multicarrier based on deep learning recognizes NOMA resource allocation methods |
CN108989099A (en) * | 2018-07-02 | 2018-12-11 | 北京邮电大学 | Federated resource distribution method and system based on software definition Incorporate network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180121766A1 (en) * | 2016-09-18 | 2018-05-03 | Newvoicemedia, Ltd. | Enhanced human/machine workforce management using reinforcement learning |
US20180091981A1 (en) * | 2016-09-23 | 2018-03-29 | Board Of Trustees Of The University Of Arkansas | Smart vehicular hybrid network systems and applications of same |
- 2018-12-14 CN CN201811535056.1A patent/CN109474980B/en active Active
Non-Patent Citations (2)
Title |
---|
Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach; Ying He et al.; IEEE Transactions on Vehicular Technology; 2017-10-06; full text * |
Power Allocation in Multi-cell Networks Using Deep Reinforcement Learning; Yong Zhang et al.; 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall); 2018-08-30; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN109474980A (en) | 2019-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109474980B (en) | Wireless network resource allocation method based on deep reinforcement learning | |
CN110493826B (en) | Heterogeneous cloud wireless access network resource allocation method based on deep reinforcement learning | |
CN109729528B (en) | D2D resource allocation method based on multi-agent deep reinforcement learning | |
CN106358308A (en) | Resource allocation method for reinforcement learning in ultra-dense network | |
CN107426820B (en) | Resource allocation method for improving energy efficiency of multi-user game in cognitive D2D communication system | |
CN110708711A (en) | Heterogeneous energy-carrying communication network resource allocation method based on non-orthogonal multiple access | |
Wang et al. | Joint interference alignment and power control for dense networks via deep reinforcement learning | |
CN105451322B (en) | A QoS-based channel allocation and power control method in D2D networks |
AlQerm et al. | Enhanced machine learning scheme for energy efficient resource allocation in 5G heterogeneous cloud radio access networks | |
CN107708157A (en) | Energy-efficiency-based dense small-cell network resource allocation method |
CN106792451B (en) | D2D communication resource optimization method based on multi-population genetic algorithm | |
CN109982437B (en) | D2D communication spectrum allocation method based on location-aware weighted graph | |
CN113316154B (en) | Joint intelligent allocation method for licensed and unlicensed D2D communication resources |
Coskun et al. | Three-stage resource allocation algorithm for energy-efficient heterogeneous networks | |
Shahid et al. | Self-organized energy-efficient cross-layer optimization for device to device communication in heterogeneous cellular networks | |
Zhang et al. | Resource optimization-based interference management for hybrid self-organized small-cell network | |
CN105490794B (en) | Group-based resource allocation method for the femtocell OFDMA two-tier network |
Yu et al. | Interference coordination strategy based on Nash bargaining for small‐cell networks | |
CN110139282B (en) | Energy acquisition D2D communication resource allocation method based on neural network | |
CN114867030A (en) | Double-time-scale intelligent wireless access network slicing method | |
CN110677175A (en) | Sub-channel scheduling and power distribution joint optimization method based on non-orthogonal multiple access system | |
CN114423028A (en) | CoMP-NOMA (coordinated multi-point-non-orthogonal multiple Access) cooperative clustering and power distribution method based on multi-agent deep reinforcement learning | |
Khodmi et al. | Joint user-channel assignment and power allocation for non-orthogonal multiple access in a 5G heterogeneous ultra-dense networks | |
CN110677176A (en) | Combined compromise optimization method based on energy efficiency and spectrum efficiency | |
CN109275163B (en) | Non-orthogonal multiple access joint bandwidth and rate allocation method based on structured ordering characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |