CN109474980A - A wireless network resource allocation method based on deep reinforcement learning - Google Patents
A wireless network resource allocation method based on deep reinforcement learning
- Publication number
- CN109474980A CN109474980A CN201811535056.1A CN201811535056A CN109474980A CN 109474980 A CN109474980 A CN 109474980A CN 201811535056 A CN201811535056 A CN 201811535056A CN 109474980 A CN109474980 A CN 109474980A
- Authority
- CN
- China
- Prior art keywords
- eval
- depth
- subcarrier
- indicate
- learning model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/06—TPC algorithms
- H04W52/14—Separate analysis of uplink or downlink
- H04W52/143—Downlink power control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/24—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
- H04W52/241—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/26—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
- H04W52/265—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/30—TPC using constraints in the total amount of available transmission power
- H04W52/34—TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
- H04W52/346—TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/54—Allocation or scheduling criteria for wireless resources based on quality criteria
- H04W72/542—Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/54—Allocation or scheduling criteria for wireless resources based on quality criteria
- H04W72/543—Allocation or scheduling criteria for wireless resources based on quality criteria based on requested quality, e.g. QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
Abstract
The present invention provides a wireless network resource allocation method based on deep reinforcement learning that can maximize the energy efficiency in a time-varying channel environment with low complexity. The method comprises: establishing a deep reinforcement learning model; modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to users; according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determining the reward function based on the allocated downlink power, and feeding the reward function back to the deep reinforcement learning model; and, according to the determined reward function, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model to determine the locally optimal power allocation under the time-varying channel environment. The present invention relates to the fields of wireless communication and artificial-intelligence decision making.
Description
Technical field
The present invention relates to the fields of wireless communication and artificial-intelligence decision making, and in particular to a wireless network resource allocation method based on deep reinforcement learning.
Background technique
In the Long Term Evolution (LTE) era, the networking architecture shifted from macro-only networks to macro-micro cooperation. Macro cells face many development challenges, such as unexpected traffic growth, ubiquitous access demands, randomly appearing hotspots, and the considerable cost pressure of the macro cells themselves. Small cells such as microcells and femtocells therefore stand out for their precise coverage and ability to fill blind spots, and have increasingly become an important complement to macro base stations in network deployment, sharing the macro base stations' service load. Fifth-generation (5G) mobile communication is the evolution beyond 4G: 5G is not a single radio access technology but the umbrella term for the integrated solution combining various new radio access technologies with the evolution of existing ones. 5G networks are now entering public view, and the industry generally regards user-experienced data rate as the most important 5G performance indicator. The technical characteristics of 5G can be summarized in a few numbers: a 1000x capacity increase, support for 100 billion+ connections, a peak rate of 10 Gb/s, and latency below 1 ms. Key 5G technologies include massive MIMO, novel multiple-access techniques, and ultra-dense networks, in which the deployment of small and macro base stations forms an ultra-dense heterogeneous network that provides ubiquitous service to users.
With the sharp increase in the number of mobile users, small-cell deployments are also becoming ultra-dense, and the energy consumed by wireless communications is enormous. Given China's serious environmental pollution and increasingly strained energy supply, green communication is a direction worth researching and exploring. Therefore, achieving higher energy efficiency through reasonable resource allocation, while still meeting user data demands and quality of service, is an important research direction. However, the prior art lacks an effective optimization method that accounts for the influence of time-varying channels, simulates a practical time-varying channel environment, and allocates network resources with low computational complexity to obtain higher energy efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide a wireless network resource allocation method based on deep reinforcement learning, so as to solve the problem in the prior art that radio resource allocation in a time-varying channel environment cannot be realized effectively.
To solve the above technical problem, an embodiment of the present invention provides a wireless network resource allocation method based on deep reinforcement learning, comprising:
S101: establishing a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target with identical parameters;
S102: modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to users;
S103: according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determining the system energy efficiency based on the allocated downlink power, determining the reward function based on the system energy efficiency, and feeding the reward function back to the deep reinforcement learning model;
S104: according to the determined reward function, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range, or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
Further, the normalized channel coefficient is expressed as:

H_{n,k} = h_{n,k} / σ_k²

where H_{n,k} is the normalized channel coefficient, denoting the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k; and σ_k² denotes the noise power on subcarrier k.
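For illustration (not part of the patent text), the normalized channel coefficient H_{n,k} = h_{n,k}/σ_k², the ratio of channel gain to subcarrier noise power described above, can be sketched in a few lines; the array shapes below are illustrative:

```python
import numpy as np

def normalized_channel_coefficients(h, noise_power):
    """Normalize per-(user, subcarrier) channel gains h[n, k] by the
    noise power sigma2[k] of each subcarrier: H[n, k] = h[n, k] / sigma2[k]."""
    h = np.asarray(h, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return h / noise_power  # broadcasts sigma2[k] across all users

# Example: 2 user terminals, 3 subcarriers
h = np.array([[2.0, 4.0, 1.0],
              [3.0, 6.0, 2.0]])
sigma2 = np.array([1.0, 2.0, 0.5])
H = normalized_channel_coefficients(h, sigma2)
```

The resulting matrix H is the state s fed into the convolutional neural network q_eval.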
Further, inputting the coefficients into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to users comprises:

inputting the normalized channel coefficients into the convolutional neural network q_eval, which selects the action with the maximum output return value as the decision action through the decision formula

a = argmax_{a′} Q(s, a′; θ_eval)

and allocates subcarriers to the users;

where θ_eval denotes the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a′; θ_eval) denotes the return obtained when the network with weights θ_eval executes action a′ in state s; the state s is the input normalized channel coefficients; and a denotes the decision action of the deep reinforcement learning model, i.e. the optimal subcarrier allocation result, which is obtained from the index of the action with the maximum return value.
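The greedy selection rule a = argmax_{a′} Q(s, a′; θ_eval) reduces to taking the index of the largest entry in the network's output vector Q_action_val; a minimal sketch (the vector values are illustrative):

```python
import numpy as np

def greedy_action(q_values):
    """Return the index of the action with the maximum predicted return,
    i.e. a = argmax_{a'} Q(s, a'; theta_eval). The index is later mapped
    to a subcarrier allocation via the action list."""
    return int(np.argmax(q_values))

q_out = np.array([0.1, 1.7, 0.4, 1.2])  # Q_action_val for one input state
a = greedy_action(q_out)                # picks index 1, the largest return
```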
Further, the downlink power allocated to a user is characterized as follows: p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p′_k denotes the downlink transmit power allocated by the base station on subcarrier k; A denotes the decay factor; and K_max denotes the maximum number of users that can be multiplexed on each subcarrier in the non-orthogonal multiple access network, under the complexity that the current successive interference cancellation receiver can bear.
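The patent's exact power formula is not reproduced in this text. As an illustrative sketch only, under the assumption that the per-subcarrier power p′_k is split among the multiplexed users with weights inversely proportional to their normalized channel coefficients H_{n,k} (so that users with weaker channels receive more power, the usual non-orthogonal multiple access ordering):

```python
def inverse_ratio_power_split(p_k, coeffs):
    """Split the subcarrier power p_k among multiplexed users in inverse
    proportion to their normalized channel coefficients (assumption: this
    concretizes the patent's 'inverse ratio based on channel coefficients').
    `coeffs` maps a user id to its normalized coefficient H_{n,k}."""
    weights = {n: 1.0 / h for n, h in coeffs.items()}
    total = sum(weights.values())
    return {n: p_k * w / total for n, w in weights.items()}

# Two users multiplexed on one subcarrier carrying 8.0 units of power:
alloc = inverse_ratio_power_split(8.0, {"ue1": 1.0, "ue2": 3.0})
# ue1, with the weaker channel, receives three times the power of ue2
```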
Further, determining the system energy efficiency based on the allocated downlink power comprises:

determining the maximum distortion-free information rate r_{n,k} from the base station to user terminal n on subcarrier k;

determining the system power consumption U_P(X) according to the determined normalized channel coefficients between the base station and the users, the subcarrier allocation result, and the allocated downlink power;

determining the system energy efficiency according to the determined r_{n,k} and U_P(X).
Further, the maximum distortion-free information rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:

r_{n,k} = log2(1 + γ_{n,k})

where γ_{n,k} denotes the signal-to-noise ratio obtained by user terminal n from subcarrier k.

The system power consumption U_P(X) is characterized as follows: p_k denotes the circuit power consumption, ψ denotes the base-station energy recovery coefficient, and x_{n,k} indicates whether user terminal n uses subcarrier k.
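The Shannon-style rate formula r_{n,k} = log2(1 + γ_{n,k}) above is directly computable; a minimal sketch:

```python
import math

def achievable_rate(snr):
    """Maximum distortion-free information rate r_{n,k} = log2(1 + gamma_{n,k})
    in bit/s/Hz, for a given signal-to-noise ratio gamma (linear scale)."""
    return math.log2(1.0 + snr)

r = achievable_rate(3.0)  # log2(1 + 3) = 2.0 bit/s/Hz
```

Multiplying by the subcarrier bandwidth B_SC would give the rate in bit/s.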
Further, the system energy efficiency is characterized as follows: ee_{n,k} denotes the energy efficiency of subcarrier k to user terminal n, B_SC denotes the subcarrier channel bandwidth, N denotes the set of user terminals, and K denotes the set of subcarriers usable by the current base station.
Further, determining the reward function based on the system energy efficiency and feeding the reward function back to the deep reinforcement learning model comprises:

using a weakly supervised algorithm based on value return, penalizing any system energy efficiency that does not satisfy the preset modeling constraints according to the type of violated constraint, obtaining the reward function after the deep reinforcement learning model makes a decision action, and feeding the reward function back to the deep reinforcement learning model;

where reward_t denotes the reward function computed in the t-th training step; R_min denotes the minimum quality-of-service standard of a user, i.e. the minimum downlink transmission rate; H_inter denotes the normalized channel coefficient corresponding to the shortest distance between the currently optimized base station and the nearest base station operating on the same subcarrier frequency; I_k denotes the upper bound on the cross-layer interference that the k-th subcarrier band can bear; and ξ_case1 to ξ_case3 denote the penalty coefficients applied to the system energy efficiency in the three cases that violate the modeling constraints.
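The patent's exact reward formula is not reproduced in this text. As an illustrative sketch only: the weakly supervised penalty mechanism can be modeled as starting from the system energy efficiency and scaling it down by a coefficient for each violated constraint; the three constraint checks and the coefficients xi_rate, xi_interference, xi_power below are hypothetical stand-ins for ξ_case1 to ξ_case3, and the multiplicative form is an assumption:

```python
def penalized_reward(ee, rate, r_min, interference, i_k, power, p_max,
                     xi_rate=0.5, xi_interference=0.5, xi_power=0.5):
    """Sketch of a value-return reward: begin with the system energy
    efficiency `ee` and apply one penalty coefficient per violated
    constraint (minimum rate R_min, cross-layer interference cap I_k,
    power budget). Constraint set and coefficients are illustrative."""
    reward = ee
    if rate < r_min:
        reward *= xi_rate          # QoS (minimum downlink rate) violated
    if interference > i_k:
        reward *= xi_interference  # cross-layer interference cap violated
    if power > p_max:
        reward *= xi_power         # power budget violated (assumed case)
    return reward

# No constraint violated: the reward equals the energy efficiency.
ok = penalized_reward(10.0, rate=2.0, r_min=1.0, interference=0.1, i_k=0.5,
                      power=1.0, p_max=2.0)
# Rate constraint violated: the reward is scaled down by xi_rate.
bad = penalized_reward(10.0, rate=0.5, r_min=1.0, interference=0.1, i_k=0.5,
                       power=1.0, p_max=2.0)
```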
Further, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined reward function, and taking the currently allocated downlink power as the locally optimal power allocation under the time-varying channel environment if the difference between the system energy efficiency values obtained in several consecutive iterations and the preset threshold lies within the preset range or exceeds the preset threshold, comprises:

storing the reward function, the channel environment, the decision action and the next state reached as a four-tuple in the memory replay unit (memory) of the deep reinforcement learning model, where the memory is expressed as:

Memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))

where s(t) denotes the state input in the t-th training step of the deep reinforcement learning model; a(t) denotes the decision action made by the model in the t-th training step; r(t) denotes the reward function reward_t obtained after the model executes action a(t) in the t-th training step; and s(t+1) denotes the next state for the (t+1)-th training step, updated according to the finite-state time-varying Markov channel;

randomly sampling memory data from the replay unit of the deep reinforcement learning model for the learning and gradient-descent updates of the two convolutional neural networks, where gradient descent only updates the parameters of the convolutional neural network q_eval; every fixed number of training steps, the parameters θ_target of q_target are updated to the parameters θ_eval of q_eval;

if the difference between the system energy efficiency values obtained in several consecutive iterations and the preset threshold lies within the preset range, or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
Further, the gradient-descent update formula is expressed as:

θ_eval ← θ_eval − α · ∇_{θ_eval} [ r(t) + λ · max_{a′} Q(s(t+1), a′; θ_target) − Q(s(t), a(t); θ_eval) ]²

where α denotes the training learning rate; λ denotes the discount factor applied to the decision body's next state; max_{a′} Q(s(t+1), a′; θ_target) denotes the maximal return that the convolutional neural network q_target with weights θ_target can harvest when its input is the next state s(t+1) of the current memory e(t), achieved by the action a′ it selects; Q(s(t), a(t); θ_eval) denotes the return obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) with the state s(t) of the current memory e(t) as input; and ∇_{θ_eval} denotes the gradient-descent operation on the network with parameters θ_eval.
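The update above can be checked numerically; a minimal sketch of the temporal-difference target y = r(t) + λ·max_{a′} Q(s(t+1), a′; θ_target) and the squared error that gradient descent minimizes, with plain lists standing in for the convolutional networks' outputs (values illustrative):

```python
def td_target(reward, discount, q_target_next):
    """y = r(t) + lambda * max_{a'} Q(s(t+1), a'; theta_target)."""
    return reward + discount * max(q_target_next)

def td_loss(reward, discount, q_target_next, q_eval_sa):
    """Squared temporal-difference error minimized when updating q_eval;
    q_eval_sa stands for Q(s(t), a(t); theta_eval)."""
    return (td_target(reward, discount, q_target_next) - q_eval_sa) ** 2

y = td_target(reward=1.0, discount=0.9, q_target_next=[0.5, 2.0, 1.0])
loss = td_loss(reward=1.0, discount=0.9, q_target_next=[0.5, 2.0, 1.0],
               q_eval_sa=2.0)
```

Holding θ_target fixed between periodic copies of θ_eval, as the patent describes, keeps this target stable during training.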
The beneficial effects of the above technical solution of the present invention are as follows:

In the above scheme, a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target is established; the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel; the normalized channel coefficients between the base station and the users are determined and input into the convolutional neural network q_eval, which selects the action with the maximum output return value as the decision action and allocates subcarriers to users; according to the subcarrier allocation result, downlink power is allocated to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients; the system energy efficiency is determined based on the allocated downlink power, the reward function is determined based on the system energy efficiency, and the reward function is fed back to the deep reinforcement learning model; according to the determined reward function, the convolutional neural networks q_eval and q_target in the deep reinforcement learning model are trained; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment. In this way, by modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, and by using a deep reinforcement learning model that shifts the computational complexity into the training of the model while accounting for the high complexity of the time-varying channel, decision actions can be chosen with low complexity, the locally optimal subcarrier allocation from the base station to the user terminals under the time-varying channel environment can be determined, and the energy efficiency in the time-varying channel environment can be maximized.
Brief description of the drawings

Fig. 1 is a schematic flow chart of the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention;

Fig. 2 is a detailed flow chart of the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention.
Specific embodiment
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, detailed descriptions are given below in conjunction with the accompanying drawings and specific embodiments.

Aiming at the existing problem that radio resource allocation in a time-varying channel environment cannot be realized effectively, the present invention provides a wireless network resource allocation method based on deep reinforcement learning.
As shown in Fig. 1, the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention comprises:

S101: establishing a deep reinforcement learning model (Deep Q Network, DQN) composed of two convolutional neural networks q_eval and q_target with identical parameters;

S102: modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to users;

S103: according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determining the system energy efficiency based on the allocated downlink power, determining the reward function based on the system energy efficiency, and feeding the reward function back to the deep reinforcement learning model;

S104: according to the determined reward function, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range, or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
The wireless network resource allocation method based on deep reinforcement learning according to the embodiment of the present invention establishes a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target; models the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel; determines the normalized channel coefficients between the base station and the users and inputs them into the convolutional neural network q_eval, which selects the action with the maximum output return value as the decision action and allocates subcarriers to users; according to the subcarrier allocation result, allocates downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determines the system energy efficiency based on the allocated downlink power, determines the reward function based on the system energy efficiency, and feeds the reward function back to the deep reinforcement learning model; and, according to the determined reward function, trains the convolutional neural networks q_eval and q_target in the model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment. In this way, by modeling the time-varying channel environment as a finite-state time-varying Markov channel and shifting the computational complexity into the training of the deep reinforcement learning model, decision actions can be chosen with low complexity, the locally optimal subcarrier allocation from the base station to the user terminals under the time-varying channel environment can be determined, and the energy efficiency in the time-varying channel environment can be maximized.

Deep reinforcement learning in this embodiment is a decision-making method based on artificial intelligence, characterized by the sequential decisions that a decision body makes in a dynamically changing environment. Constructing a deep reinforcement learning model requires states, actions and rewards, and the decision body can automatically optimize its decision actions while the model is being trained. The wireless network resource allocation method based on deep reinforcement learning described in this embodiment can simulate a time-varying channel environment and, with low computational complexity, maximally optimize the allocation of wireless network resources in time-varying network scenarios, so that fast decision making and improved energy efficiency are achieved together. The trained deep reinforcement learning model can continue to manage the radio resources of the time-varying channel environment and make fast, high-reward decisions. In large-scale radio network optimization, this deep reinforcement learning model can be run in a distributed fashion to reduce complexity.
For a better understanding of the wireless network resource allocation method based on deep reinforcement learning described in this embodiment, the method is described in detail below; the specific steps may include:
A11: constructing the deep reinforcement learning model DQN

In this embodiment, a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target with identical parameters is first established. The decision process of the deep reinforcement learning model is determined by the Q function Q(s, a; θ), where θ denotes the weight parameters of a convolutional neural network; the parameters of q_eval and q_target are θ_eval and θ_target respectively, initialized identically. The Q function Q(s, a; θ) denotes the return obtained when the convolutional neural network with weights θ executes action a in state s.

In this embodiment, each convolutional neural network consists of two convolutional layers, two pooling layers and two fully connected layers. The training input each time is [n_samples, N, K]: the first dimension n_samples denotes the number of input samples, and the second and third dimensions ([N, K]) denote one input sample, i.e. a normalized channel coefficient matrix of dimension [N, K]. Each training step inputs n_samples normalized channel coefficient matrices; for each [N, K] matrix fed into the convolutional neural network, the output is the return value Q_action_val obtained for every possible action under the current channel state. The data structure of Q_action_val is a one-dimensional vector [Action_num], where Action_num denotes the number of all possible actions. With n_samples input channel states, and the return values [Action_num] of all actions produced for each state, the output is a two-dimensional matrix composed of n_samples one-dimensional vectors [Action_num].
A12: modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them into the convolutional neural network q_eval, selecting the action with the maximum output return value as the decision action, and allocating subcarriers to users

In this embodiment, multiple co-frequency small base stations (SBS) are deployed within a certain range; small base stations include outdoor microcells, picocells and indoor femtocells. Within the range of each small base station, 6 user terminals (UE) and 3 available subcarriers (SC) of the non-orthogonal multiple access network are set, scattered randomly over a certain area centered on the small base station. In this embodiment, an independent deep reinforcement learning model runs on each small base station, achieving distributed processing. The parameters of the small base stations and the user terminals are initialized; the parameters include but are not limited to: the normalized channel coefficient H_{n,k} between the SBS and UE_n on subcarrier k, the channel bandwidth B allocated to this base station, the subcarrier channel bandwidth B_SC, the circuit power consumption p_k, etc., where UE_n denotes user terminal n and SC_k denotes subcarrier k. At the same time, the user-subcarrier association matrix X_{N,K} and the transition probability matrix of the finite-state time-varying Markov channel (Finite State Markov Channel, FSMC) are initialized, where N denotes the set of user terminals and K denotes the set of subcarriers usable by the current base station; the initialized user-subcarrier association matrix X_{N,K} and FSMC transition probability matrix serve the subsequent optimization of the user association matrix and the updating of the channel state.
In this embodiment, the optimized channel environment is a finite-state time-varying Markov channel. Initial coordinates are obtained by scattering points randomly in space, and the initial normalized channel coefficient matrix is computed; its values are quantized into ten levels with quantization boundaries bound_0, ..., bound_9. The optimization scenario evolves according to the FSMC transition probability matrix. The elements of the transition probability matrix are denoted by the probability transition indicator p_{i,j}, where i denotes the current state, j denotes the next state (the state reached after executing an action in the current state), and p_{i,j} denotes the probability of transferring from current state i to next state j. It is stipulated that p_{i,j} takes its maximum value when i = j, keeping the probability of remaining in the original channel state largest; the probability of transferring to the second-nearest adjacent state is half of the probability of transferring to the nearest adjacent state; and at each iteration the environment is updated according to the transition probability matrix.
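The per-iteration channel update can be sketched as sampling the next quantized state from one row of the FSMC transition probability matrix. The 3-state matrix below is illustrative only, following the stated rule that the self-transition probability dominates and the nearest neighbour is twice as likely as the second-nearest:

```python
import random

def step_fsmc(state, transition_matrix, rng=random.random):
    """Sample the next channel state j from row `state` of the FSMC
    transition probability matrix P, where P[i][j] = p_{i,j}."""
    u = rng()
    cumulative = 0.0
    for j, p in enumerate(transition_matrix[state]):
        cumulative += p
        if u < cumulative:
            return j
    return len(transition_matrix[state]) - 1  # guard against rounding

# Illustrative 3-state matrix: self-transition most likely, nearest
# neighbour twice as likely as the second-nearest.
P = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
next_state = step_fsmc(1, P)
```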
In the present embodiment, the subcarrier associated matrix X of user-N,KElement can distribute indicator x with user-subcarriern,k
It indicates, xn,kIndicate whether user terminal n uses subcarrier k, in a particular application, for example, binary one (x can be usedn,k=1)
Indicate that user terminal n uses subcarrier k, with Binary Zero (xn,k=0) it indicates that user terminal n does not use subcarrier k, that is, does not have
Apply to the resource for using subcarrier k.All possible subcarrier distribution calculation method is as follows:
Introduce the number of combinations C. Assume the upper limit on subcarrier reuse in the non-orthogonal multiple access network is 2, and that each user can use only one subcarrier (adjustable according to the practical application); the total number of allocation patterns is denoted Action_num. For ease of description, the present embodiment uses the simplified case of a low-capacity small-base-station network model. The Action_num possible subcarrier allocations are stored in a list, denoted Action_list; each list index corresponds to one possible subcarrier allocation, so that an allocation can be matched from its index value. To reduce the complexity of DQN processing, the DQN decision action is designed as an integer in [0, Action_num - 1], where each subcarrier allocation corresponds to one user-subcarrier association matrix X_{N,K}.
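A minimal sketch of building Action_list by brute-force enumeration (N users, K subcarriers, at most K_max = 2 users per subcarrier, exactly one subcarrier per user):

```python
from itertools import product

def enumerate_actions(n_users, n_subcarriers, k_max=2):
    """Action_list: every assignment in which each user occupies exactly
    one subcarrier and no subcarrier carries more than k_max users.
    Each entry is a user-subcarrier association matrix X (list of rows);
    the DQN then acts on the integer index in [0, Action_num - 1]."""
    action_list = []
    for assign in product(range(n_subcarriers), repeat=n_users):
        if all(assign.count(k) <= k_max for k in range(n_subcarriers)):
            X = [[1 if assign[n] == k else 0 for k in range(n_subcarriers)]
                 for n in range(n_users)]
            action_list.append(X)
    return action_list
```

For example, 4 users on 2 subcarriers with k_max = 2 admit only the assignments that split the users two-and-two, i.e., C(4,2) = 6 valid allocations.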
In the present embodiment, the ratio of the gain between the base station and the user terminal to the noise is used as the normalized channel coefficient, determined by the following formula:

H_{n,k} = h_{n,k} / σ_k²

where H_{n,k} is the normalized channel coefficient, i.e., the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} is the channel gain between the base station and user terminal n on subcarrier k, computed from Rayleigh fast fading and distance-dependent large-scale fading; since the typical service range of a small base station is an indoor environment, a two-wall penetration loss is added; σ_k² = E[|z_k|²] is the noise power on subcarrier k, where E[·] denotes the mathematical expectation and z_k is additive white Gaussian noise with mean 0 and variance σ_k².
In the present embodiment, the normalized channel coefficients are input to the convolutional neural network q_eval, which selects the action with the largest output return value as the decision action through the decision formula

a = argmax_{a'} Q(s, a'; θ_eval)

and allocates subcarriers to the users accordingly; where the Q function Q(s, a'; θ_eval) is the return value obtained by the convolutional neural network q_eval when the decision body executes action a' in state s, the state s being the input normalized channel coefficients; a is the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, one possible X_{N,K}, the association matrix of user terminals and subcarriers.
In the present embodiment, the input of the deep reinforcement learning model DQN is the state s of the DQN decision body, i.e., the normalized channel coefficients (specifically, the two-dimensional normalized channel coefficient matrix H_{N,K}); the output is a one-dimensional vector Q_action_val. The action a' with the largest value in Q_action_val is selected as the decision action for subcarrier allocation (the optimal subcarrier allocation result); therefore, the index of the largest-valued action in Q_action_val is matched against Action_list to obtain the current decision action X_{N,K}, i.e., the user-subcarrier association matrix at the locally optimal allocation of subcarriers from the base station to the user terminals. Matching the subcarrier allocation from the index value in this way reduces the complexity of DQN processing.
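The selection step reduces to an argmax over the DQN's output vector followed by a list lookup; a sketch with hypothetical variable names:

```python
def select_action(q_action_val, action_list):
    """Decision rule a = argmax_{a'} Q(s, a'; theta_eval): take the index
    of the largest value in the output vector Q_action_val and match it
    against Action_list to recover the user-subcarrier matrix X."""
    idx = max(range(len(q_action_val)), key=lambda i: q_action_val[i])
    return idx, action_list[idx]
```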
A13: According to the optimal subcarrier allocation result, allocate downlink power to the users multiplexed on each subcarrier by the fractional order algorithm under fixed subcarrier allocation, i.e., on the same subcarrier power is allocated in inverse proportion to the channel gain coefficients (a user with a larger channel gain is allocated less power, and a user with a smaller channel gain is allocated relatively more power).
In the present embodiment, the downlink power allocated to a user is expressed as:

where p_{n,k} is the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k is the downlink transmit power allocated by the base station on subcarrier k; a is the decay factor, constrained by 0 < a < 1, whose value is fixed within one optimization process and cannot change across users or subcarriers; K_max is the maximum number of users multiplexed on each subcarrier in the non-orthogonal multiple access network, under the complexity that the current successive interference cancellation (SIC) receiver can bear.
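The embodiment's exact expression is not reproduced here, but a standard fractional transmit power allocation (FTPA) rule matching the description (power inversely proportional to a power of the channel gain, with decay factor a in (0, 1)) can be sketched as:

```python
def ftpa_power(p_k, gains, a=0.4):
    """Fractional power allocation sketch: the weight H^(-a) shrinks as
    the normalized gain H grows, so the stronger user receives less of
    the subcarrier power budget p_k.  a = 0.4 is an assumed value; the
    text only requires 0 < a < 1, fixed within one optimization run."""
    weights = [g ** (-a) for g in gains]
    total = sum(weights)
    return [p_k * w / total for w in weights]
```

With gains [4.0, 1.0], the first (stronger) user is allocated strictly less power than the second, and the two shares sum to p_k.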
A14: Determine the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k.

In the present embodiment, the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:

r_{n,k} = log2(1 + γ_{n,k})

where γ_{n,k} is the signal-to-noise ratio obtained by user terminal n from subcarrier k.
In the present embodiment, in the non-orthogonal multiple access network, the normalized channel coefficients of the users multiplexed on the same subcarrier are sorted in descending order:

|H_{1,k}| ≥ |H_{2,k}| ≥ ... ≥ |H_{n,k}| ≥ |H_{n+1,k}| ≥ ... ≥ |H_{Kmax,k}|

Based on the optimal decoding order of the successive interference canceller, a user terminal i located before j in this order can successfully decode and remove the interference from user terminal j, whereas user terminal j receives the signal of user terminal i together with its own and treats it as interference. In the non-orthogonal multiple access network, considering fairness between users and the principle of reducing co-channel interference, a user with good channel conditions is allocated less power when power is distributed: in the above example, if H_{i,k} > H_{j,k}, then p_{i,k} < p_{j,k}, consistent with the allocation principle of the fractional order algorithm in A13.
Considering the reduction of co-channel interference and computational complexity in the small-base-station scene, the reuse number of each subcarrier is predefined as K_max = 2. The maximum information transmission rates of user terminals i and j are logarithmic functions of the signal-to-interference-plus-noise ratio (SINR). χ_INNER = p_{i,k}H_{j,k} denotes the intra-tier co-channel interference suffered by user terminal j under the service of the current base station.

In the present embodiment, the maximum transmission rates of user terminals i and j are expressed as:

r_{i,k} = log2(1 + γ_{i,k}), r_{j,k} = log2(1 + γ_{j,k}), γ_{i,k} = p_{i,k}H_{i,k}, γ_{j,k} = p_{j,k}H_{j,k} / (χ_INNER + 1)

that is:

r_{i,k} = log2(1 + p_{i,k}H_{i,k}), r_{j,k} = log2(1 + p_{j,k}H_{j,k} / (p_{i,k}H_{j,k} + 1))
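Under these definitions the pair rates can be computed directly; a sketch assuming user i is the stronger user (|H_i| ≥ |H_j|), so that SIC removes j's signal at i while j suffers the intra-tier interference χ_INNER = p_i · H_j:

```python
import math

def noma_pair_rates(p_i, p_j, H_i, H_j):
    """Rates of a two-user NOMA pair on one subcarrier.  Because H is
    normalized by the noise power, the noise term in the SINR is 1."""
    gamma_i = p_i * H_i                         # i decodes after SIC
    gamma_j = p_j * H_j / (p_i * H_j + 1.0)     # j sees i as interference
    return math.log2(1 + gamma_i), math.log2(1 + gamma_j)
```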
A16: Determine the system power consumption U_P(X).

In the present embodiment, considering that the small base station has an energy recovery unit, the system power consumption U_P(X) is expressed as:

where p_k is the circuit power consumption and ψ is the base station energy recovery coefficient, which can change according to the actual hardware properties.
A17: According to the determined γ_{n,k} and U_P(X), determine the system energy efficiency.

In the present embodiment, from the obtained maximum distortion-free information transmission rate r_{n,k} of the base station to user terminal n on subcarrier k and the system power consumption U_P(X), the energy efficiency ee_{n,k} of subcarrier k to user terminal n is calculated, where B_k denotes the channel bandwidth of subcarrier k.

In the present embodiment, the system energy efficiency is expressed as:

where ee_{n,k} is the energy efficiency of subcarrier k to user terminal n, B_k is the channel bandwidth of subcarrier k, N is the set of user terminals, and K is the set of subcarriers usable under the current base station.
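Assuming the per-pair form ee_{n,k} = B_k · r_{n,k} / U_P(X) (bits per joule), which is consistent with the quantities defined above but not spelled out in the available text, the system energy efficiency summed over the allocated pairs can be sketched as:

```python
def system_energy_efficiency(X, rates, B, U_P):
    """Sum B_k * r_{n,k} / U_P(X) over the user/subcarrier pairs selected
    by the association matrix X (X[n][k] == 1).  The per-pair form
    ee = B * r / U_P is an assumption for illustration."""
    return sum(B[k] * rates[n][k] / U_P
               for n in range(len(X))
               for k in range(len(X[0]))
               if X[n][k] == 1)
```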
A17: Determine the reward function based on the system energy efficiency, and feed the reward function back to the deep reinforcement learning model.

In the present embodiment, when the system energy efficiency does not meet the preset modeling constraints (which are determined by factors such as the user-fairness principle, the minimum quality-of-service standard, and the cross-tier interference upper limit), a weakly supervised algorithm based on value return punishes the system energy efficiency according to the type of constraint violated, yielding the reward function after the deep reinforcement learning model makes its decision action; the reward function is then fed back to the deep reinforcement learning model. The reward function is expressed as:

where reward_t is the reward computed at the t-th training step; R_min is the minimum quality-of-service (QoS) standard, i.e., the minimum downlink transmission rate; H_inter is the normalized channel coefficient corresponding to the shortest distance between the nearest base station working on the same subcarrier frequency and the currently optimized base station, computable by the method of step A12; I_k is the cross-tier (cross-station) interference upper limit that the k-th subcarrier band can bear, set and adjusted according to the concrete application; ξ_case1 ~ ξ_case3 are the penalty coefficients on the energy efficiency for the three cases of violating the modeling constraints.
It should further be understood that when the system energy efficiency is used directly as the reward function, x_{n,k} and a must also satisfy other constraints. Combined with the above, the constraints that x_{n,k} and a must satisfy are as follows:

where BS_peak is the peak power of the small base station. Condition 1, Σ_{k∈K} x_{n,k} = 1, forces a user terminal to be associated with only one subcarrier at a time. Condition 2, Σ_{n∈N} x_{n,k} ≤ K_max, limits the number of users multiplexed on the same subcarrier in the non-orthogonal multiple access network to K_max, in order to reduce intra-station interference and the complexity of the successive interference canceller. Condition 3, Σ_{k∈K} x_{n,k} r_{n,k} ≥ R_min, is the QoS constraint: the information transmission rate of every user terminal served by the base station must exceed the minimum QoS limit. Condition 4 limits the maximum transmit power from the base station on subcarrier k. Condition 5 is an effective interference coordination mechanism that limits the interference of the currently optimized base station on other base stations. Condition 6, 0 < a < 1, constrains the decay factor used when allocating power.
A18: Store the reward function, the channel environment, the decision action, and the next state transferred to, in the DQN memory replay unit.

In the present embodiment, the reward function, channel environment, decision action, and the next state (the state transferred to) are stored as a four-tuple in the DQN memory replay unit memory, expressed as:

Memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))

where s(t) is the normalized channel coefficients (the state) input at the t-th training of the model; a(t) is the decision action made by the DQN at the t-th training of the deep reinforcement learning model, i.e., the user-subcarrier association matrix; r(t) is the reward function reward_t obtained after the DQN makes action a(t) at the t-th training; s(t+1) is the normalized channel coefficients (the next state) at the (t+1)-th training, updated according to the finite-state time-varying Markov channel.

In the present embodiment, a memory replay class is defined, and the memory is set as an object array or dictionary data structure in which each tuple e(t) is stored.
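The memory replay class described above can be sketched with a bounded deque (the capacity value is an assumption):

```python
import random
from collections import deque

class ReplayMemory:
    """Memory D(t) = {e(1), ..., e(t)}, each e = (s, a, r, s_next)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # old memories roll off

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Random fixed-size batch for the batch-mode training of A19."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```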
A19: Train the deep reinforcement learning model in batch mode: randomly draw a fixed-size batch of memories from the DQN memory replay unit for the learning and gradient-descent updates of the two convolutional neural networks.

In the present embodiment, the memories are processed with the loss function Loss(θ), expressed as:

Loss(θ) = E[(r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) - Q(s(t), a(t); θ_eval))²]

and the gradient descent update is expressed as:

θ_eval ← θ_eval - η ∇_{θ_eval} Loss(θ)

where η is the training learning rate; λ is the discount factor assessing the decision body's next state; max_{a'} Q(s(t+1), a'; θ_target) is the maximal-reward action value found by the convolutional neural network q_target with weights θ_target when the input is the next state s(t+1) of the current memory e(t); Q(s(t), a(t); θ_eval) is the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) in the state s(t) of the current memory e(t); ∇_{θ_eval} denotes a gradient descent operation on the network with parameters θ_eval, i.e., the parameters θ_eval of q_eval are modified so as to minimize the difference between the outputs of q_target and q_eval.

In the present embodiment, the subtraction of Q(s(t), a(t); θ_eval) is an operation at the corresponding action index position: for example, if memory unit e(1) selected action 2, the gradient descent update only changes the value at position [1, 2] of the two networks' outputs, and the values corresponding to the remaining actions in the first dimension are unchanged. To guarantee training stability, gradient descent only updates the parameters of the convolutional neural network q_eval.
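The per-action-index update can be illustrated with tabular stand-ins for the two networks (dicts mapping a state to a list of action values, an assumption for illustration): only the entry at the taken action moves, and only q_eval is modified.

```python
def td_update(q_eval, q_target, batch, eta=0.1, lam=0.9):
    """One batch step: for each memory e = (s, a, r, s_next) the TD
    target is r + lam * max q_target[s_next], and the squared-error
    gradient step moves only q_eval[s][a].  eta and lam are the
    learning rate and discount factor."""
    for s, a, r, s_next in batch:
        target = r + lam * max(q_target[s_next])
        q_eval[s][a] += eta * (target - q_eval[s][a])  # other actions untouched
```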
A20: During the training of the deep reinforcement learning model, every fixed number of steps the parameters of q_target are updated to the parameters of q_eval, expressed as:

θ_target = θ_eval

where C_iter is the counter in training, recording the number of training steps; C_max is the update interval between the parameters of q_target and q_eval, and is also the period of C_iter; therefore C_iter is zeroed when it equals C_max.
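The periodic parameter copy of step A20 can be sketched as follows (the networks are again represented as state → action-value dicts, an assumption for illustration):

```python
def maybe_sync(q_eval, q_target, c_iter, c_max):
    """Increment the training counter C_iter; when it reaches C_max,
    copy theta_eval into theta_target and zero the counter."""
    c_iter += 1
    if c_iter == c_max:
        for s in q_eval:
            q_target[s] = list(q_eval[s])
        c_iter = 0
    return c_iter
```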
A21: With the q_target and q_eval network parameters obtained after the updates of steps A19 and A20, if for several consecutive optimizations the difference between the optimized system energy efficiency value and a preset threshold (a designated value) is within a preset range, or exceeds the preset threshold, the deep reinforcement learning model is considered applicable to wireless resource allocation in this time-varying channel environment: the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment, the network resource allocation of the current deep reinforcement learning model has reached the local optimum under this time-varying environment, and the resulting model can be used continuously in the practical time-varying channel environment.

A22: Otherwise, update the environment according to the transition probability matrix and judge whether C_iter = C_max holds; if so, set C_iter = 0 and θ_target = θ_eval, then execute step A12; otherwise, execute step A12 directly, until the difference between the recalculated system energy efficiency value and the preset threshold is within the preset range, or exceeds the preset threshold, at which point the optimum optimization under the time-varying channel environment is reached.
In the present embodiment, as the optimization count t increases, the return value of the DQN model in the time-varying channel environment gradually rises from a low level toward a higher one; this trend shows that the wireless network resource allocation method based on deep reinforcement learning can realize the optimization of subcarrier and power allocation in a time-varying channel environment.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.

The above is a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as within the protection scope of the present invention.
Claims (10)
1. A wireless network resource allocation method based on deep reinforcement learning, characterized by comprising:
S101: establishing a deep reinforcement learning model constituted by two convolutional neural networks q_eval and q_target with identical parameters;
S102: modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, inputting them to the convolutional neural network q_eval, and selecting the action with the largest output return value as the decision action to allocate subcarriers to the users;
S103: according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to the channel coefficients, determining the system energy efficiency based on the allocated downlink power, determining a reward function based on the system energy efficiency, and feeding the reward function back to the deep reinforcement learning model;
S104: training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined reward function; if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range, or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
2. The wireless network resource allocation method based on deep reinforcement learning according to claim 1, characterized in that the normalized channel coefficient is expressed as:

H_{n,k} = h_{n,k} / σ_k²

where H_{n,k} is the normalized channel coefficient, i.e., the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} is the channel gain between the base station and user terminal n on subcarrier k; σ_k² is the noise power on subcarrier k.
3. The wireless network resource allocation method based on deep reinforcement learning according to claim 2, characterized in that said inputting to the convolutional neural network q_eval and selecting the action with the largest output return value as the decision action to allocate subcarriers to the users comprises:
inputting the normalized channel coefficients to the convolutional neural network q_eval, which selects the action with the largest output return value as the decision action through the decision formula a = argmax_{a'} Q(s, a'; θ_eval) and allocates subcarriers to the users;
where θ_eval denotes the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a'; θ_eval) is the return value obtained when the convolutional neural network q_eval with weights θ_eval executes action a' in state s, the state s being the input normalized channel coefficients; a is the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, which is obtained from the index of the action with the largest return value.
4. The wireless network resource allocation method based on deep reinforcement learning according to claim 3, characterized in that the downlink power allocated to a user is expressed as:

where p_{n,k} is the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k is the downlink transmit power allocated by the base station on subcarrier k; a is the decay factor; K_max is the maximum number of users multiplexed on each subcarrier in the non-orthogonal multiple access network, under the complexity that the current successive interference canceller can bear.
5. The wireless network resource allocation method based on deep reinforcement learning according to claim 4, characterized in that said determining the system energy efficiency based on the allocated downlink power comprises:
determining the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k;
determining the system power consumption U_P(X) according to the determined normalized channel coefficients between the base station and the users, the subcarrier allocation result, and the allocated downlink power;
determining the system energy efficiency according to the determined r_{n,k} and U_P(X).
6. The wireless network resource allocation method based on deep reinforcement learning according to claim 5, characterized in that the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:

r_{n,k} = log2(1 + γ_{n,k})

where γ_{n,k} is the signal-to-noise ratio obtained by user terminal n from subcarrier k;
the system power consumption U_P(X) is expressed as:

where p_k is the circuit power consumption, ψ is the base station energy recovery coefficient, and x_{n,k} indicates whether user terminal n uses subcarrier k.
7. The wireless network resource allocation method based on deep reinforcement learning according to claim 6, characterized in that the system energy efficiency is expressed as:

where ee_{n,k} is the energy efficiency of subcarrier k to user terminal n, B_k is the channel bandwidth of subcarrier k, N is the set of user terminals, and K is the set of subcarriers usable under the current base station.
8. The wireless network resource allocation method based on deep reinforcement learning according to claim 7, characterized in that said determining the reward function based on the system energy efficiency and feeding the reward function back to the deep reinforcement learning model comprises:
punishing a system energy efficiency that does not meet the preset modeling constraints with a weakly supervised algorithm based on value return, according to the type of constraint violated, to obtain the reward function after the deep reinforcement learning model makes its decision action, and feeding the reward function back to the deep reinforcement learning model; wherein the reward function is expressed as:

where reward_t is the reward computed at the t-th training step; R_min is the minimum quality-of-service standard, i.e., the minimum downlink transmission rate; H_inter is the normalized channel coefficient corresponding to the shortest distance between the nearest base station working on the same subcarrier frequency and the currently optimized base station; I_k is the cross-tier interference upper limit that the k-th subcarrier band can bear; ξ_case1 ~ ξ_case3 are the penalty coefficients on the system energy efficiency for the three cases of violating the modeling constraints.
9. The wireless network resource allocation method based on deep reinforcement learning according to claim 8, characterized in that said training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined reward function, such that if the difference between the system energy efficiency values obtained several consecutive times and a preset threshold is within a preset range or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment, comprises:
storing the reward function, the channel environment, the decision action, and the next state transferred to as a four-tuple in the memory replay unit memory of the deep reinforcement learning model, wherein the memory is expressed as:

Memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))

where s(t) is the state input at the t-th training of the deep reinforcement learning model; a(t) is the decision action made by the deep reinforcement learning model at the t-th training; r(t) is the reward function reward_t obtained after the deep reinforcement learning model makes action a(t) at the t-th training; s(t+1) is the next state at the (t+1)-th training, updated according to the finite-state time-varying Markov channel;
randomly drawing memories from the memory replay unit of the deep reinforcement learning model for the learning and gradient-descent updates of the two convolutional neural networks, wherein gradient descent only updates the parameters of the convolutional neural network q_eval, and, every fixed number of steps during the training of the deep reinforcement learning model, the parameters θ_target of q_target are updated to the parameters θ_eval of q_eval;
if the difference between the system energy efficiency values obtained several consecutive times and the preset threshold is within the preset range, or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
10. The wireless network resource allocation method based on deep reinforcement learning according to claim 9, characterized in that the gradient descent update is expressed as:

θ_eval ← θ_eval - η ∇_{θ_eval} Loss(θ)

where η is the training learning rate; λ is the discount factor assessing the decision body's next state; max_{a'} Q(s(t+1), a'; θ_target) is the maximal-reward action value found by the convolutional neural network q_target with weights θ_target when the input is the next state s(t+1) of the current memory e(t); Q(s(t), a(t); θ_eval) is the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) in the state s(t) of the current memory e(t); ∇_{θ_eval} denotes a gradient descent operation on the network with parameters θ_eval.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811535056.1A CN109474980B (en) | 2018-12-14 | 2018-12-14 | Wireless network resource allocation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109474980A true CN109474980A (en) | 2019-03-15 |
CN109474980B CN109474980B (en) | 2020-04-28 |
Family
ID=65675169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811535056.1A Active CN109474980B (en) | 2018-12-14 | 2018-12-14 | Wireless network resource allocation method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109474980B (en) |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109962728A (en) * | 2019-03-28 | 2019-07-02 | 北京邮电大学 | A kind of multi-node combination Poewr control method based on depth enhancing study |
CN110035478A (en) * | 2019-04-18 | 2019-07-19 | 北京邮电大学 | A kind of dynamic multi-channel cut-in method under high-speed mobile scene |
CN110084245A (en) * | 2019-04-04 | 2019-08-02 | 中国科学院自动化研究所 | The Weakly supervised image detecting method of view-based access control model attention mechanism intensified learning, system |
CN110167176A (en) * | 2019-04-25 | 2019-08-23 | 北京科技大学 | A kind of wireless network resource distribution method based on distributed machines study |
CN110380776A (en) * | 2019-08-22 | 2019-10-25 | 电子科技大学 | A kind of Internet of things system method of data capture based on unmanned plane |
CN110401975A (en) * | 2019-07-05 | 2019-11-01 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of method, apparatus and electronic equipment of the transmission power adjusting internet of things equipment |
CN110430613A (en) * | 2019-04-11 | 2019-11-08 | 重庆邮电大学 | Resource allocation methods of the multicarrier non-orthogonal multiple access system based on efficiency |
CN110635833A (en) * | 2019-09-25 | 2019-12-31 | 北京邮电大学 | Power distribution method and device based on deep learning |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
CN110972309A (en) * | 2019-11-08 | 2020-04-07 | 厦门大学 | Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning |
CN111211831A (en) * | 2020-01-13 | 2020-05-29 | 东方红卫星移动通信有限公司 | Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method |
CN111428903A (en) * | 2019-10-31 | 2020-07-17 | 国家电网有限公司 | Interruptible load optimization method based on deep reinforcement learning |
CN111431646A (en) * | 2020-03-31 | 2020-07-17 | 北京邮电大学 | Dynamic resource allocation method in millimeter wave system |
CN111526592A (en) * | 2020-04-14 | 2020-08-11 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111542107A (en) * | 2020-05-14 | 2020-08-14 | 南昌工程学院 | Mobile edge network resource allocation method based on reinforcement learning |
WO2020191686A1 (en) * | 2019-03-27 | 2020-10-01 | 华为技术有限公司 | Neural network-based power distribution method and device |
CN111867110A (en) * | 2020-06-17 | 2020-10-30 | 三明学院 | Wireless network channel separation energy-saving method based on switch switching strategy |
CN111885720A (en) * | 2020-06-08 | 2020-11-03 | 中山大学 | Multi-user subcarrier power distribution method based on deep reinforcement learning |
CN111930501A (en) * | 2020-07-23 | 2020-11-13 | 齐齐哈尔大学 | Wireless resource allocation method based on unsupervised learning and oriented to multi-cell network |
CN112104400A (en) * | 2020-04-24 | 2020-12-18 | 广西华南通信股份有限公司 | Combined relay selection method and system based on supervised machine learning |
CN112770398A (en) * | 2020-12-18 | 2021-05-07 | 北京科技大学 | Far-end radio frequency end power control method based on convolutional neural network |
WO2021088441A1 (en) * | 2019-11-08 | 2021-05-14 | Huawei Technologies Co., Ltd. | Systems and methods for multi-user pairing in wireless communication networks |
CN112988229A (en) * | 2019-12-12 | 2021-06-18 | 上海大学 | Convolutional neural network resource optimization configuration method based on heterogeneous computation |
CN113115355A (en) * | 2021-04-29 | 2021-07-13 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113395757A (en) * | 2021-06-10 | 2021-09-14 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN113490184A (en) * | 2021-05-10 | 2021-10-08 | 北京科技大学 | Smart factory-oriented random access resource optimization method and device |
CN114126025A (en) * | 2021-11-02 | 2022-03-01 | 中国联合网络通信集团有限公司 | Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server |
CN114142912A (en) * | 2021-11-26 | 2022-03-04 | 西安电子科技大学 | Resource control method for guaranteeing time coverage continuity of high-dynamic air network |
CN114360305A (en) * | 2021-12-15 | 2022-04-15 | 广州创显科教股份有限公司 | Classroom interactive teaching method and system based on 5G network |
CN114928549A (en) * | 2022-04-20 | 2022-08-19 | 清华大学 | Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105407535A (en) * | 2015-10-22 | 2016-03-16 | 东南大学 | Energy-efficient resource optimization method based on a constrained Markov decision process |
CN106358308A (en) * | 2015-07-14 | 2017-01-25 | 北京化工大学 | Reinforcement-learning-based resource allocation method in ultra-dense networks |
CN106909728A (en) * | 2017-02-21 | 2017-06-30 | 电子科技大学 | Reinforcement-learning-based FPGA interconnection resource configuration generation method |
US20180091981A1 (en) * | 2016-09-23 | 2018-03-29 | Board Of Trustees Of The University Of Arkansas | Smart vehicular hybrid network systems and applications of same |
US20180121766A1 (en) * | 2016-09-18 | 2018-05-03 | Newvoicemedia, Ltd. | Enhanced human/machine workforce management using reinforcement learning |
CN108307510A (en) * | 2018-02-28 | 2018-07-20 | 北京科技大学 | Power allocation method in heterogeneous small-cell networks |
CN108712748A (en) * | 2018-04-12 | 2018-10-26 | 天津大学 | Reinforcement-learning-based anti-jamming intelligent decision method for cognitive radio |
CN108737057A (en) * | 2018-04-27 | 2018-11-02 | 南京邮电大学 | Deep-learning-based multicarrier cognitive NOMA resource allocation method |
CN108989099A (en) * | 2018-07-02 | 2018-12-11 | 北京邮电大学 | Joint resource allocation method and system based on a software-defined integrated network |
- 2018-12-14: CN application CN201811535056.1A granted as patent CN109474980B (legal status: Active)
Non-Patent Citations (2)
Title |
---|
Ying He et al.: "Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach", IEEE Transactions on Vehicular Technology * |
Yong Zhang et al.: "Power Allocation in Multi-cell Networks Using Deep Reinforcement Learning", 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall) * |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020191686A1 (en) * | 2019-03-27 | 2020-10-01 | 华为技术有限公司 | Neural network-based power distribution method and device |
CN113615277A (en) * | 2019-03-27 | 2021-11-05 | 华为技术有限公司 | Power distribution method and device based on neural network |
CN113615277B (en) * | 2019-03-27 | 2023-03-24 | 华为技术有限公司 | Power distribution method and device based on neural network |
CN109962728A (en) * | 2019-03-28 | 2019-07-02 | 北京邮电大学 | Multi-node joint power control method based on deep reinforcement learning |
CN110084245A (en) * | 2019-04-04 | 2019-08-02 | 中国科学院自动化研究所 | Weakly supervised image detection method and system based on visual attention mechanism reinforcement learning |
CN110430613A (en) * | 2019-04-11 | 2019-11-08 | 重庆邮电大学 | Energy-efficiency-based resource allocation method for multicarrier non-orthogonal multiple access systems |
CN110035478A (en) * | 2019-04-18 | 2019-07-19 | 北京邮电大学 | Dynamic multi-channel access method for high-speed mobility scenarios |
CN110167176A (en) * | 2019-04-25 | 2019-08-23 | 北京科技大学 | Wireless network resource allocation method based on distributed machine learning |
CN110167176B (en) * | 2019-04-25 | 2021-06-01 | 北京科技大学 | Wireless network resource allocation method based on distributed machine learning |
CN110401975A (en) * | 2019-07-05 | 2019-11-01 | 深圳市中电数通智慧安全科技股份有限公司 | Method, apparatus and electronic device for adjusting the transmission power of Internet-of-Things devices |
CN110380776A (en) * | 2019-08-22 | 2019-10-25 | 电子科技大学 | UAV-based data collection method for Internet-of-Things systems |
CN110380776B (en) * | 2019-08-22 | 2021-05-14 | 电子科技大学 | Internet of things system data collection method based on unmanned aerial vehicle |
CN110635833A (en) * | 2019-09-25 | 2019-12-31 | 北京邮电大学 | Power distribution method and device based on deep learning |
CN110635833B (en) * | 2019-09-25 | 2020-12-15 | 北京邮电大学 | Power distribution method and device based on deep learning |
CN111428903A (en) * | 2019-10-31 | 2020-07-17 | 国家电网有限公司 | Interruptible load optimization method based on deep reinforcement learning |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
CN110972309B (en) * | 2019-11-08 | 2022-07-19 | 厦门大学 | Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning |
US11246173B2 (en) | 2019-11-08 | 2022-02-08 | Huawei Technologies Co. Ltd. | Systems and methods for multi-user pairing in wireless communication networks |
CN110972309A (en) * | 2019-11-08 | 2020-04-07 | 厦门大学 | Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning |
WO2021088441A1 (en) * | 2019-11-08 | 2021-05-14 | Huawei Technologies Co., Ltd. | Systems and methods for multi-user pairing in wireless communication networks |
CN112988229A (en) * | 2019-12-12 | 2021-06-18 | 上海大学 | Convolutional neural network resource optimization configuration method based on heterogeneous computation |
CN111211831A (en) * | 2020-01-13 | 2020-05-29 | 东方红卫星移动通信有限公司 | Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method |
CN111431646B (en) * | 2020-03-31 | 2021-06-15 | 北京邮电大学 | Dynamic resource allocation method in millimeter wave system |
CN111431646A (en) * | 2020-03-31 | 2020-07-17 | 北京邮电大学 | Dynamic resource allocation method in millimeter wave system |
CN111526592B (en) * | 2020-04-14 | 2022-04-08 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN111526592A (en) * | 2020-04-14 | 2020-08-11 | 电子科技大学 | Non-cooperative multi-agent power control method used in wireless interference channel |
CN112104400A (en) * | 2020-04-24 | 2020-12-18 | 广西华南通信股份有限公司 | Combined relay selection method and system based on supervised machine learning |
CN111542107A (en) * | 2020-05-14 | 2020-08-14 | 南昌工程学院 | Mobile edge network resource allocation method based on reinforcement learning |
CN111885720B (en) * | 2020-06-08 | 2021-05-28 | 中山大学 | Multi-user subcarrier power distribution method based on deep reinforcement learning |
CN111885720A (en) * | 2020-06-08 | 2020-11-03 | 中山大学 | Multi-user subcarrier power distribution method based on deep reinforcement learning |
CN111867110B (en) * | 2020-06-17 | 2023-10-03 | 三明学院 | Wireless network channel separation energy-saving method based on switch switching strategy |
CN111867110A (en) * | 2020-06-17 | 2020-10-30 | 三明学院 | Wireless network channel separation energy-saving method based on switch switching strategy |
CN111930501A (en) * | 2020-07-23 | 2020-11-13 | 齐齐哈尔大学 | Wireless resource allocation method based on unsupervised learning and oriented to multi-cell network |
CN112770398A (en) * | 2020-12-18 | 2021-05-07 | 北京科技大学 | Remote radio unit power control method based on a convolutional neural network |
CN113115355A (en) * | 2021-04-29 | 2021-07-13 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113115355B (en) * | 2021-04-29 | 2022-04-22 | 电子科技大学 | Power distribution method based on deep reinforcement learning in D2D system |
CN113490184A (en) * | 2021-05-10 | 2021-10-08 | 北京科技大学 | Smart factory-oriented random access resource optimization method and device |
CN113395757A (en) * | 2021-06-10 | 2021-09-14 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN113395757B (en) * | 2021-06-10 | 2023-06-30 | 中国人民解放军空军通信士官学校 | Deep reinforcement learning cognitive network power control method based on improved return function |
CN114126025B (en) * | 2021-11-02 | 2023-04-28 | 中国联合网络通信集团有限公司 | Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server |
CN114126025A (en) * | 2021-11-02 | 2022-03-01 | 中国联合网络通信集团有限公司 | Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server |
CN114142912A (en) * | 2021-11-26 | 2022-03-04 | 西安电子科技大学 | Resource control method for guaranteeing time coverage continuity of high-dynamic air network |
CN114360305A (en) * | 2021-12-15 | 2022-04-15 | 广州创显科教股份有限公司 | Classroom interactive teaching method and system based on 5G network |
CN114928549A (en) * | 2022-04-20 | 2022-08-19 | 清华大学 | Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN109474980B (en) | 2020-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109474980A (en) | Wireless network resource allocation method based on deep reinforcement learning | |
CN109729528B (en) | D2D resource allocation method based on multi-agent deep reinforcement learning | |
CN110493826A (en) | Heterogeneous cloud radio access network resource allocation method based on deep reinforcement learning | |
CN109600178B (en) | Optimization method for energy consumption, time delay and minimization in edge calculation | |
Wang et al. | Joint interference alignment and power control for dense networks via deep reinforcement learning | |
CN106358308A (en) | Reinforcement-learning-based resource allocation method in ultra-dense networks | |
CN113596785B (en) | D2D-NOMA communication system resource allocation method based on deep Q network | |
CN110213776B (en) | WiFi unloading method based on Q learning and multi-attribute decision | |
Zhu et al. | Machine-learning-based opportunistic spectrum access in cognitive radio networks | |
CN107426773A (en) | Energy-efficiency-oriented distributed resource allocation method and device in wireless heterogeneous networks | |
Xu et al. | Resource allocation algorithm based on hybrid particle swarm optimization for multiuser cognitive OFDM network | |
CN106358300A (en) | Distributed resource allocation method in microcell networks | |
Zhang et al. | Deep learning based user association in heterogeneous wireless networks | |
Yu et al. | Interference coordination strategy based on Nash bargaining for small‐cell networks | |
CN114423028B (en) | CoMP-NOMA cooperative clustering and power distribution method based on multi-agent deep reinforcement learning | |
Ouamri et al. | Double deep q-network method for energy efficiency and throughput in a uav-assisted terrestrial network | |
CN107071881A (en) | Game-theory-based distributed energy allocation method for small-cell networks | |
CN110139282A (en) | Neural-network-based resource allocation method for energy-harvesting D2D communication | |
CN115811788B (en) | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning | |
CN105873127A (en) | Heuristic user connection load balancing method based on random decision | |
Li et al. | Dynamic power allocation in IIoT based on multi-agent deep reinforcement learning | |
CN112243283B (en) | Cell-Free Massive MIMO network clustering calculation method based on successful transmission probability | |
Chen et al. | A reinforcement learning based joint spectrum allocation and power control algorithm for D2D communication underlaying cellular networks | |
Chen et al. | A multi-agent reinforcement learning based power control algorithm for D2D communication underlaying cellular networks | |
CN115915454A (en) | SWIPT-assisted downlink resource allocation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||