CN109474980A - Wireless network resource allocation method based on deep reinforcement learning - Google Patents

Wireless network resource allocation method based on deep reinforcement learning Download PDF

Info

Publication number
CN109474980A
CN109474980A (application CN201811535056.1A)
Authority
CN
China
Prior art keywords
eval
depth
subcarrier
indicate
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811535056.1A
Other languages
Chinese (zh)
Other versions
CN109474980B (en)
Inventor
张海君
刘启瑞
皇甫伟
董江波
隆克平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN201811535056.1A priority Critical patent/CN109474980B/en
Publication of CN109474980A publication Critical patent/CN109474980A/en
Application granted granted Critical
Publication of CN109474980B publication Critical patent/CN109474980B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/06TPC algorithms
    • H04W52/14Separate analysis of uplink or downlink
    • H04W52/143Downlink power control
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/24TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/241TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/lo
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/26TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
    • H04W52/265TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/30TPC using constraints in the total amount of available transmission power
    • H04W52/34TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/346TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/53Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/54Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/542Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/54Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/543Allocation or scheduling criteria for wireless resources based on quality criteria based on requested quality, e.g. QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource

Abstract

The present invention provides a wireless network resource allocation method based on deep reinforcement learning, which can maximize energy efficiency in a time-varying channel environment at low complexity. The method includes: establishing a deep reinforcement learning model; modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients, feeding them into the convolutional neural network q_eval, selecting the action with the largest output return value as the decision action, and allocating subcarriers to the users; according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determining the reward function from the allocated downlink power, and feeding the reward function back to the deep reinforcement learning model; and, according to the determined reward function, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model to determine the locally optimal power allocation in the time-varying channel environment. The present invention relates to the fields of wireless communication and artificial-intelligence decision making.

Description

Wireless network resource allocation method based on deep reinforcement learning
Technical field
The present invention relates to the fields of wireless communication and artificial-intelligence decision making, and in particular to a wireless network resource allocation method based on deep reinforcement learning.
Background art
In the Long Term Evolution (LTE) era, the networking architecture shifted from macro-only networks to macro-micro cooperation. Macro cells (Macro Cell) face many challenges to sustainable development, such as unexpected traffic growth, the demand for ubiquitous access, random hotspot deployment, and the comparatively high cost of the macro cells themselves. Small cells (Small Cell) such as microcells and femtocells, with their advantages of precise coverage and blind-spot supplementation, have therefore emerged and increasingly become an important means of cooperating with macro base stations in network deployment and sharing their service load. The fifth generation of mobile communication is the extension beyond 4G; 5G is not a single radio access technology, but the general name for a set of solutions integrating new radio access technologies with the evolution of existing ones. 5G networks are now entering public view, and the industry generally regards the user-experienced data rate as the most important 5G performance indicator. The technical characteristics of 5G can be summarized with a few numbers: a 1000x capacity increase, support for more than 100 billion connections, a peak rate of 10 Gb/s, and latency below 1 ms. The main 5G technologies include massive multi-antenna systems, new multiple access techniques, and ultra-dense networks, in which the deployment of small base stations together with macro base stations constitutes an ultra-dense heterogeneous network that provides ubiquitous service to users.
With the sharp increase in the number of mobile users, small base stations also tend to be deployed ultra-densely, and the energy consumption brought by the wireless communication field is enormous. Given serious environmental pollution and increasingly scarce energy, green communication is inevitably a direction worth researching and exploring. Therefore, on the basis of meeting user data demand and quality of service, achieving higher energy efficiency through reasonable resource allocation is an important research direction. However, the prior art does not yet provide an effective optimization method that takes the influence of time-varying channels into account, simulates a practical time-varying channel environment, allocates network resources with low computational complexity, and obtains higher energy efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide a wireless network resource allocation method based on deep reinforcement learning, so as to solve the problem in the prior art that wireless resource allocation in a time-varying channel environment cannot be realized effectively.
To solve the above technical problem, an embodiment of the present invention provides a wireless network resource allocation method based on deep reinforcement learning, comprising:
S101: establish a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target with identical parameters;
S102: model the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determine the normalized channel coefficients between the base station and the users, feed them into the convolutional neural network q_eval, select the action with the largest output return value as the decision action, and allocate subcarriers to the users;
S103: according to the subcarrier allocation result, allocate downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determine the system energy efficiency from the allocated downlink power, determine the reward function from the system energy efficiency, and feed the reward function back to the deep reinforcement learning model;
S104: according to the determined reward function, train the convolutional neural networks q_eval and q_target in the deep reinforcement learning model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range, or the values exceed the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment.
Further, the normalized channel coefficient is defined as the ratio of the channel gain to the noise power:
H_{n,k} = h_{n,k} / σ_k²
where H_{n,k} is the normalized channel coefficient, representing the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k; and σ_k² denotes the noise power on subcarrier k.
Further, feeding the normalized channel coefficients into the convolutional neural network q_eval, selecting the action with the largest output return value as the decision action, and allocating subcarriers to the users includes:
feeding the normalized channel coefficients into the convolutional neural network q_eval, which selects the action with the largest output return value as the decision action through the decision formula a = argmax_{a'} Q(s, a'; θ_eval), and allocating subcarriers to the users accordingly;
where θ_eval denotes the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a'; θ_eval) denotes the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a' in state s, the state s being the input normalized channel coefficients; and a denotes the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, which is obtained from the index of the action with the largest return value.
Further, the downlink power allocated to the users is expressed in terms of the following quantities: p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k denotes the downlink transmit power allocated by the base station on subcarrier k; a denotes the decay factor; and K_max denotes the maximum number of users multiplexed on each subcarrier under the complexity that the current successive interference cancellation receiver can bear in the non-orthogonal multiple access network.
Further, determining the system energy efficiency from the allocated downlink power includes:
determining the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k;
determining the system power consumption U_P(X) from the determined normalized channel coefficients between the base station and the users, the subcarrier allocation result, and the allocated downlink power;
determining the system energy efficiency from the determined r_{n,k} and U_P(X).
Further, the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:
r_{n,k} = log2(1 + γ_{n,k})
where γ_{n,k} denotes the signal-to-noise ratio obtained by user terminal n on subcarrier k.
The system power consumption U_P(X) is expressed in terms of the following quantities: p_k denotes the circuit power consumption, ψ denotes the base station energy recovery coefficient, and x_{n,k} indicates whether user terminal n uses subcarrier k.
Further, the system energy efficiency is expressed in terms of the following quantities: ee_{n,k} denotes the energy efficiency from subcarrier k to user terminal n, B_SC denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers usable by the current base station.
Further, determining the reward function from the system energy efficiency and feeding the reward function back to the deep reinforcement learning model includes:
penalizing, with a weakly supervised algorithm based on value returns, the system energy efficiency that does not satisfy the preset modeling constraints according to the type of constraint violated, obtaining the reward function after the deep reinforcement learning model makes a decision action, and feeding the reward function back to the deep reinforcement learning model; where, in the reward function,
reward_t denotes the reward function computed in the t-th training iteration; R_min denotes the minimum quality-of-service standard of a user, i.e., the minimum downlink transmission rate; H_inter denotes the normalized channel coefficient corresponding to the shortest distance between the currently optimized base station and the nearest base station operating on the same subcarrier frequency; I_k denotes the upper limit of cross-tier interference that the k-th subcarrier band can bear; and ξ_case1–ξ_case3 denote the penalty coefficients applied to the system energy efficiency in the three cases that violate the modeling constraints.
Further, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined reward function, and concluding that the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range or the values exceed the preset threshold, includes:
storing the reward function, the channel environment, the decision action, and the transferred-to next state, as a four-tuple, in the memory replay unit (memory) of the deep reinforcement learning model, where the memory is expressed as:
memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))
where s(t) denotes the state input in the t-th training of the deep reinforcement learning model; a(t) denotes the decision action made by the deep reinforcement learning model in the t-th training; r(t) denotes the reward function reward_t obtained by the deep reinforcement learning model after making action a(t) in the t-th training; and s(t+1) denotes the next state, updated according to the finite-state time-varying Markov channel, for the (t+1)-th training of the deep reinforcement learning model;
randomly sampling memory data from the memory replay unit of the deep reinforcement learning model for the learning and gradient-descent update of the two convolutional neural networks, where the gradient descent only updates the parameters of the convolutional neural network q_eval, and every fixed number of iterations during the training of the deep reinforcement learning model the parameter θ_target of q_target is updated to the parameter θ_eval of q_eval;
if the difference between the system energy efficiency values obtained in several consecutive iterations and the preset threshold lies within the preset range or the values exceed the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment.
Further, the gradient-descent update formula is expressed as:
θ_eval ← θ_eval + α [ r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval) ] ∇_{θ_eval} Q(s(t), a(t); θ_eval)
where α denotes the training learning rate; λ denotes the discount factor used to evaluate the next state of the decision agent; max_{a'} Q(s(t+1), a'; θ_target) denotes the return of the action a' that can harvest the maximal reward, selected by the convolutional neural network q_target with weights θ_target when the input is the next state s(t+1) of the current memory e(t); Q(s(t), a(t); θ_eval) denotes the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) with the state s(t) of the current memory e(t) as input; and ∇_{θ_eval} denotes the gradient-descent operation on the convolutional neural network with parameters θ_eval.
The beneficial effects of the above technical solution of the present invention are as follows:
In the above scheme, a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target is established; the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel; the normalized channel coefficients between the base station and the users are determined and fed into the convolutional neural network q_eval, the action with the largest output return value is selected as the decision action, and subcarriers are allocated to the users; according to the subcarrier allocation result, downlink power is allocated to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, the system energy efficiency is determined from the allocated downlink power, the reward function is determined from the system energy efficiency, and the reward function is fed back to the deep reinforcement learning model; according to the determined reward function, the convolutional neural networks q_eval and q_target in the deep reinforcement learning model are trained; and if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range or the values exceed the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment. In this way, by modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, the computational complexity of handling the highly complex time-varying channel is shifted into the training of the deep reinforcement learning model, so that decision actions can be chosen at low complexity, the locally optimal subcarrier allocation from the base station to the user terminals in the time-varying channel environment can be determined, and the energy efficiency in the time-varying channel environment can be maximized.
Detailed description of the invention
Fig. 1 is a schematic flowchart of the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention;
Fig. 2 is a detailed schematic flowchart of the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention.
Specific embodiment
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Aiming at the existing problem that wireless resource allocation in a time-varying channel environment cannot be realized effectively, the present invention provides a wireless network resource allocation method based on deep reinforcement learning.
As shown in Fig. 1, the wireless network resource allocation method based on deep reinforcement learning provided by an embodiment of the present invention includes:
S101: establish a deep reinforcement learning model (Deep Q Network, DQN) composed of two convolutional neural networks q_eval and q_target with identical parameters;
S102: model the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determine the normalized channel coefficients between the base station and the users, feed them into the convolutional neural network q_eval, select the action with the largest output return value as the decision action, and allocate subcarriers to the users;
S103: according to the subcarrier allocation result, allocate downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determine the system energy efficiency from the allocated downlink power, determine the reward function from the system energy efficiency, and feed the reward function back to the deep reinforcement learning model;
S104: according to the determined reward function, train the convolutional neural networks q_eval and q_target in the deep reinforcement learning model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range, or the values exceed the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment.
In the wireless network resource allocation method based on deep reinforcement learning described in the embodiment of the present invention, a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target is established; the time-varying channel environment between the base station and the user terminals is modeled as a finite-state time-varying Markov channel; the normalized channel coefficients between the base station and the users are determined and fed into the convolutional neural network q_eval, the action with the largest output return value is selected as the decision action, and subcarriers are allocated to the users; according to the subcarrier allocation result, downlink power is allocated to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, the system energy efficiency is determined from the allocated downlink power, the reward function is determined from the system energy efficiency, and the reward function is fed back to the deep reinforcement learning model; according to the determined reward function, the convolutional neural networks q_eval and q_target are trained; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range or the values exceed the preset threshold, the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment. In this way, by modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, the computational complexity of handling the highly complex time-varying channel is shifted into the training of the deep reinforcement learning model, so that decision actions can be chosen at low complexity, the locally optimal subcarrier allocation from the base station to the user terminals in the time-varying channel environment can be determined, and the energy efficiency in the time-varying channel environment can be maximized.
Deep reinforcement learning in this embodiment is a decision-making technique based on artificial intelligence. Its characteristic is the sequential decision making performed by a decision agent in a dynamically changing environment; the states, actions, and rewards required for reinforcement learning are constructed so that the decision agent can automate and optimize its decision actions while training the deep reinforcement learning model. The wireless network resource allocation method based on deep reinforcement learning described in this embodiment can simulate a time-varying channel environment and, at low computational complexity, optimize the allocation of wireless network resources in a time-varying network scenario as far as possible, so that fast decision making and energy efficiency are promoted jointly. The trained deep reinforcement learning model can continue to manage wireless resources in the time-varying channel environment and make fast, high-reward decisions. In large-scale wireless network optimization, this deep reinforcement learning model can be computed in a distributed manner to reduce complexity.
To better understand the wireless network resource allocation method based on deep reinforcement learning described in this embodiment, the method is described in detail below; the specific steps may include:
A11: build the deep reinforcement learning model (DQN)
In this embodiment, a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target with identical parameters is first established. The decision process of the deep reinforcement learning model is determined by the Q function Q(s, a; θ), where θ denotes the weight parameters of a convolutional neural network; the parameters of the convolutional neural networks q_eval and q_target are θ_eval and θ_target respectively, and the two are initialized identically. The Q function Q(s, a; θ) denotes the return value obtained by the convolutional neural network with weights θ when executing action a in state s.
In this embodiment, each convolutional neural network consists of two convolutional layers, two pooling layers, and two fully connected layers. The training input each time is [n_samples, N, K]: the first dimension n_samples denotes the number of input samples, and the second and third dimensions ([N, K]) denote one input sample, i.e., a normalized channel coefficient matrix of dimension [N, K]. Each training iteration thus inputs n_samples normalized channel coefficient matrices, each of dimension [N, K]. The output is the return value Q_action_val obtained for every action possible under the current channel state; the data structure of Q_action_val is a one-dimensional vector [Action_num], where Action_num denotes the number of all possible actions. Since the number of input channel states is n_samples and the return values [Action_num] of all actions are produced for each state, the output is a two-dimensional matrix composed of n_samples one-dimensional vectors [Action_num].
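For illustration only, a minimal Python (PyTorch) sketch of such an evaluation/target network pair is given below. The two-convolution / two-pooling / two-fully-connected structure and the input and output shapes follow the description above; the kernel sizes, channel counts, hidden width, and the placeholder ACTION_NUM value are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of q_eval / q_target: 2 conv + 2 pooling + 2 fully connected layers.

    Input : [n_samples, 1, N, K] normalized channel coefficient matrices
    Output: [n_samples, Action_num] return values, one per candidate subcarrier allocation
    """
    def __init__(self, n_users=6, n_subcarriers=3, action_num=90):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),             # pool along the user dimension only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(3, 1)),
        )
        with torch.no_grad():                              # infer the flattened feature size
            flat = self.features(torch.zeros(1, 1, n_users, n_subcarriers)).numel()
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, 64), nn.ReLU(),
            nn.Linear(64, action_num),                     # Q value for every candidate action
        )

    def forward(self, h_norm):                             # h_norm: [n_samples, 1, N, K]
        return self.head(self.features(h_norm))

ACTION_NUM = 90                                            # assumed size of the allocation list Action_list
q_eval = QNetwork(action_num=ACTION_NUM)
q_target = QNetwork(action_num=ACTION_NUM)
q_target.load_state_dict(q_eval.state_dict())              # identical initial parameters
```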
A12: model the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determine the normalized channel coefficients between the base station and the users, feed them into the convolutional neural network q_eval, select the action with the largest output return value as the decision action, and allocate subcarriers to the users
In this embodiment, multiple co-frequency small base stations (SBS) are deployed within a certain area; the small base stations include outdoor micro base stations, picocell base stations, and indoor femtocell base stations (Home eNodeBs). Within the range of each small base station, 6 user terminals (UE) and 3 available subcarriers (SC) of the non-orthogonal multiple access network are set, scattered randomly over a certain area centered on the small base station. In this embodiment, an independent deep reinforcement learning model runs on each small base station, achieving the effect of distributed processing. The parameters of the small base station and the user terminals are initialized, including but not limited to: the normalized channel coefficient H_{n,k} between the SBS and UE_n on subcarrier SC_k, the channel bandwidth B allocated to this base station, the subcarrier channel bandwidth B_SC, the circuit power consumption p_k, and so on, where UE_n denotes user terminal n and SC_k denotes subcarrier k. At the same time, the user–subcarrier association matrix X_{N,K} and the finite-state time-varying Markov channel (Finite State Markov Channel, FSMC) transition probability matrix are initialized, where N denotes the set of user terminals and K denotes the set of subcarriers usable by the current base station. The initialized user–subcarrier association matrix X_{N,K} and the finite-state Markov channel transition probability matrix are used in the subsequent optimization of the user association matrix and in updating the channel state.
In this embodiment, the optimized channel environment is a finite-state time-varying Markov channel. Initial coordinates are obtained by scattering points randomly in space, the initial normalized channel coefficient matrix is computed, and its values are quantized into ten levels with quantization boundaries bound_0, ..., bound_9. The optimization scenario evolves according to the time-varying Markov channel transition probability matrix. The elements of the transition probability matrix can be denoted by the transition indicator p_{i,j}, where i denotes the current state, j denotes the next state (the state after the action is executed in the current state), and p_{i,j} denotes the probability of transferring from the current state i to the next state j. It is specified that p_{i,j} takes its maximum value when i = j, so that the probability of keeping the original channel state is the largest; the probability of transferring to the second-nearest adjacent state is half the probability of transferring to the nearest adjacent state; and the environment is updated according to the transition probability matrix at each iteration.
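A small sketch of this finite-state Markov channel evolution follows. The ten quantization levels and the rule that the second-nearest state receives half the probability of the nearest state come from the description above; the concrete probability masses are assumed values.

```python
import numpy as np

N_STATES = 10                                      # ten quantization levels bound_0 .. bound_9

def build_transition_matrix(p_stay=0.6, p_adj=0.2):
    """FSMC transition matrix sketch: staying in the same state is most likely,
    moving one level has probability p_adj, moving two levels half of that;
    rows are renormalized at the boundaries."""
    P = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES):
        P[i, i] = p_stay
        for step, prob in ((1, p_adj), (2, p_adj / 2)):
            if i - step >= 0:
                P[i, i - step] = prob
            if i + step < N_STATES:
                P[i, i + step] = prob
        P[i] /= P[i].sum()
    return P

def step_channel(states, P, rng=np.random.default_rng()):
    """Advance every (user, subcarrier) channel state by one Markov step."""
    flat = states.ravel()
    new = np.array([rng.choice(N_STATES, p=P[s]) for s in flat])
    return new.reshape(states.shape)

P = build_transition_matrix()
channel_states = np.zeros((6, 3), dtype=int)       # N = 6 users, K = 3 subcarriers
channel_states = step_channel(channel_states, P)
```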
In this embodiment, the elements of the user–subcarrier association matrix X_{N,K} are the user–subcarrier allocation indicators x_{n,k}; x_{n,k} indicates whether user terminal n uses subcarrier k. In a specific application, for example, a binary 1 (x_{n,k} = 1) can indicate that user terminal n uses subcarrier k, and a binary 0 (x_{n,k} = 0) can indicate that user terminal n does not use subcarrier k, i.e., has not been granted the resource of subcarrier k. All possible subcarrier allocations are computed as follows:
The number of combinations C is introduced. Assuming that the subcarrier multiplexing upper limit of the non-orthogonal multiple access network is defined as 2 and that each user can use only one subcarrier (this can be adjusted according to the practical application), the total number of allocation types follows from the corresponding combinatorial count. For ease of description, this embodiment is computed with the simplified case of a low-capacity small base station network model. The Action_num possible subcarrier allocation schemes are stored in a list structure, denoted Action_list; each list index corresponds to one possible subcarrier allocation scheme, so the allocation scheme can be matched by the index value. To reduce the processing complexity of the DQN, the DQN decision action is designed as an integer in [0, Action_num − 1]; each subcarrier allocation scheme corresponds to one user–subcarrier association matrix X_{N,K}.
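One possible enumeration of this action list in Python is sketched below; the constraints encoded (each user associated with exactly one subcarrier, at most two users multiplexed per subcarrier) follow the simplified embodiment, and the matrix representation is otherwise an assumption.

```python
import itertools
import numpy as np

N_USERS, N_SC, MUX_LIMIT = 6, 3, 2

def build_action_list():
    """Enumerate candidate user-subcarrier association matrices X_{N,K}.

    Assumption: every user is associated with exactly one subcarrier and at most
    MUX_LIMIT users share a subcarrier (the NOMA multiplexing upper limit of 2).
    """
    action_list = []
    for assignment in itertools.product(range(N_SC), repeat=N_USERS):
        counts = np.bincount(assignment, minlength=N_SC)
        if np.all(counts <= MUX_LIMIT):
            X = np.zeros((N_USERS, N_SC), dtype=int)
            X[np.arange(N_USERS), assignment] = 1
            action_list.append(X)
    return action_list

Action_list = build_action_list()
Action_num = len(Action_list)            # DQN decision actions are indices 0 .. Action_num - 1
```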
In this embodiment, the ratio of the gain to the noise between the base station and the user terminal is used as the normalized channel coefficient, which is determined by the following formula:
H_{n,k} = h_{n,k} / σ_k²
where H_{n,k} is the normalized channel coefficient, representing the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k, computed from Rayleigh fast fading and distance-dependent large-scale fading, with a two-wall penetration loss added since the typical service range of a small base station is an indoor environment; σ_k² denotes the noise power on subcarrier k, where E[·] denotes the mathematical expectation and the noise is additive white Gaussian noise with zero mean and variance σ_k².
In this embodiment, the normalized channel coefficients are fed into the convolutional neural network q_eval, and q_eval selects the action with the largest output return value as the decision action through the decision formula a = argmax_{a'} Q(s, a'; θ_eval), allocating subcarriers to the users;
where the Q function Q(s, a'; θ_eval) denotes the return value obtained by the decision agent of the convolutional neural network q_eval when executing action a' in state s, the state s being the input normalized channel coefficients; a denotes the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, which is one of the possible X_{N,K} and represents the association matrix between user terminals n and subcarriers k.
In this embodiment, the input of the deep reinforcement learning model DQN is the state s of the DQN decision agent, i.e., the normalized channel coefficients (specifically, the two-dimensional normalized channel coefficient matrix H_{N,K}); the output is the one-dimensional vector Q_action_val. In Q_action_val, the action a' with the largest value is selected as the decision action for subcarrier allocation (the optimal subcarrier allocation result); therefore, the index of the largest-valued action in Q_action_val is matched into Action_list to obtain the current decision action X_{N,K}, i.e., the user–subcarrier association matrix X_{N,K} at the locally optimal subcarrier allocation from the base station to the user terminals. Matching the subcarrier allocation scheme by index value in this way reduces the processing complexity of the DQN.
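A minimal sketch of this decision step, reusing the QNetwork instance q_eval and the Action_list from the sketches above (the purely greedy argmax shown here omits any exploration schedule, which the patent does not specify):

```python
import numpy as np
import torch

def select_allocation(q_eval, h_norm, action_list):
    """Greedy decision: feed the normalized channel matrix into q_eval, take the
    index of the largest Q value, and map it to a user-subcarrier association matrix."""
    state = torch.as_tensor(h_norm, dtype=torch.float32).reshape(1, 1, *h_norm.shape)
    with torch.no_grad():
        q_values = q_eval(state).squeeze(0)       # one-dimensional vector Q_action_val
    action_index = int(torch.argmax(q_values))    # a = argmax_a' Q(s, a'; theta_eval)
    return action_index, action_list[action_index]

# Example usage with a random normalized channel coefficient matrix H_{N,K}
H = np.random.rayleigh(size=(6, 3)).astype(np.float32)
a_idx, X = select_allocation(q_eval, H, Action_list)
```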
A13: according to the optimal subcarrier allocation result, allocate downlink power to the users multiplexed on each subcarrier with a fractional-order algorithm under the fixed subcarrier allocation, i.e., on the same subcarrier power is allocated in inverse proportion to the channel gain coefficients (users with larger channel gain are allocated less power, and users with smaller channel gain are allocated relatively more power).
In this embodiment, the downlink power allocated to the users is expressed in terms of the following quantities: p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k denotes the downlink transmit power allocated by the base station on subcarrier k; a denotes the decay factor, with constraint 0 < a < 1, and the value of a is fixed within one optimization run and cannot be changed per user or per subcarrier; K_max denotes the maximum number of users multiplexed on each subcarrier under the complexity that the current successive interference cancellation (Successive Interference Cancellation, SIC) receiver can bear in the non-orthogonal multiple access network.
A14: determine the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k
In this embodiment, the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:
r_{n,k} = log2(1 + γ_{n,k})
where γ_{n,k} denotes the signal-to-noise ratio obtained by user terminal n on subcarrier k.
In this embodiment, in the non-orthogonal multiple access network, the users multiplexed on the same subcarrier are sorted by normalized channel coefficient in descending order, expressed as:
|H_{1,k}| ≥ |H_{2,k}| ≥ … ≥ |H_{n,k}| ≥ |H_{n+1,k}| ≥ … ≥ |H_{Kmax,k}|
Based on the optimal decoding order of successive interference cancellation, a user terminal i that precedes j in this order can successfully decode and remove the interference from user terminal j, while user terminal j receives the signal of user terminal i and treats it as interference. In the non-orthogonal multiple access network, considering fairness among users and the principle of reducing co-channel interference, users with good channel conditions are allocated less power; i.e., in the example above, if H_{i,k} > H_{j,k}, then p_{i,k} < p_{j,k} is allocated, which is consistent with the allocation principle of the fractional-order algorithm in step A13.
Considering the minimization of co-channel interference and computational complexity in the small base station scenario, the multiplexing number of each subcarrier is predefined as K_max = 2. The maximum information transmission rates of user terminal i and user terminal j are logarithmic functions of the signal-to-interference-plus-noise ratio (Signal to Interference plus Noise Ratio, SINR). χ_INNER = p_{i,k} H_{j,k} denotes the intra-tier co-channel interference experienced by user terminal j under the service of the current base station.
In this embodiment, the peak transmission rates of user terminal i and user terminal j are expressed as:
r_{i,k} = log2(1 + γ_{i,k}), r_{j,k} = log2(1 + γ_{j,k}), γ_{i,k} = p_{i,k} H_{i,k}, γ_{j,k} = p_{j,k} H_{j,k} / (χ_INNER + 1)
that is:
r_{i,k} = log2(1 + p_{i,k} H_{i,k}), r_{j,k} = log2(1 + p_{j,k} H_{j,k} / (p_{i,k} H_{j,k} + 1))
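A small sketch of this two-user NOMA rate computation, following the formulas above (the γ_{j,k} expression is our reconstruction from the definition of χ_INNER and should be read as an assumption):

```python
import numpy as np

def pair_rates(p_i, p_j, H_i, H_j):
    """Rates of the strong user i (decodes and cancels j via SIC) and the weak user j
    (treats the signal intended for i as intra-tier interference chi_INNER)."""
    r_i = np.log2(1.0 + p_i * H_i)                      # interference-free after SIC
    chi_inner = p_i * H_j                               # co-channel interference at user j
    r_j = np.log2(1.0 + p_j * H_j / (chi_inner + 1.0))
    return r_i, r_j

r_i, r_j = pair_rates(p_i=0.3, p_j=0.7, H_i=3.1, H_j=0.8)
```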
A16: determine the system power consumption U_P(X)
In this embodiment, the small base station is considered to have an energy recovery unit, and the system power consumption U_P(X) is expressed in terms of the following quantities: p_k denotes the circuit power consumption, and ψ denotes the base station energy recovery coefficient, which can be changed according to the actual hardware properties.
A17: determine the system energy efficiency from the determined r_{n,k} and U_P(X)
In this embodiment, the energy efficiency ee_{n,k} from subcarrier k to user terminal n is computed from the obtained maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k and the system power consumption U_P(X), where B_SC denotes the channel bandwidth of subcarrier k.
In this embodiment, the system energy efficiency is expressed in terms of the following quantities: ee_{n,k} denotes the energy efficiency from subcarrier k to user terminal n, B_SC denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers usable by the current base station.
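The exact energy-efficiency expression is not reproduced in this text. Purely as an assumed illustration, the sketch below takes the system energy efficiency as the bandwidth-weighted sum rate of the active links divided by the system power consumption, with a simple power model built from the quantities defined above:

```python
import numpy as np

def system_energy_efficiency(R, P, X, B_sc=180e3, p_circuit=0.1, psi=0.05):
    """Assumed illustration of the energy-efficiency objective.

    R : [N, K] per-link rates r_{n,k} (bit/s/Hz),  P : [N, K] allocated powers p_{n,k},
    X : [N, K] binary association matrix x_{n,k}.
    Power model (assumption): transmit plus circuit power of active links,
    reduced by the energy-recovery fraction psi.
    """
    U_p = (1.0 - psi) * np.sum(X * (P + p_circuit))     # system power consumption U_P(X)
    return B_sc * np.sum(X * R) / U_p                   # summed link throughput per watt

ee = system_energy_efficiency(R=np.ones((6, 3)), P=0.2 * np.ones((6, 3)),
                              X=np.eye(6, 3, dtype=int))
```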
A17: determine the reward function from the system energy efficiency and feed the reward function back to the deep reinforcement learning model
In this embodiment, for the system energy efficiency that does not satisfy the preset modeling constraints (the modeling constraints are determined by factors such as the fairness principle among users, the minimum quality-of-service standard, and the cross-tier interference upper limit), a weakly supervised algorithm based on value returns penalizes the system energy efficiency according to the type of constraint violated; the reward function after the deep reinforcement learning model makes its decision action is thus obtained and fed back to the deep reinforcement learning model; where, in the reward function,
reward_t denotes the reward function computed in the t-th training iteration; R_min denotes the minimum quality-of-service (Quality of Service, QoS) standard of a user, i.e., the minimum downlink transmission rate; H_inter denotes the normalized channel coefficient corresponding to the shortest distance between the currently optimized base station and the nearest base station operating on the same subcarrier frequency, which can be computed by the method in step A12; I_k denotes the upper limit of cross-tier (cross-station) interference that the k-th subcarrier band can bear, set as an adjustable interference cap according to the specific application; ξ_case1–ξ_case3 denote the penalty coefficients applied to the energy efficiency in the three cases that violate the modeling constraints.
It should also be understood that, when the system energy efficiency is directly used as the reward function, x_{n,k} and the decay factor a must also satisfy further constraints. Combined with the above, the constraints to be satisfied by x_{n,k} and a are as follows, where BS_peak denotes the small base station peak power: Condition 1 forces a user terminal to be associated with only one subcarrier at a time. Condition 2 limits the maximum number of users multiplexed on the same subcarrier in the non-orthogonal multiple access network to K_max, in order to reduce intra-station interference and the complexity of successive interference cancellation. Condition 3 is the QoS constraint: the information transmission rate of every user terminal served by the base station must exceed the minimum QoS limit. Condition 4 limits the maximum transmit power from the base station on subcarrier k. Condition 5 is an effective interference-coordination mechanism that limits the interference caused by the currently optimized base station to other base stations. Condition 6 limits the decay factor used in power allocation.
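A schematic of the penalty-based reward is sketched below. The specific penalty form (subtracting a ξ-weighted share of the energy efficiency for each violated constraint type) and the way the three violation cases are detected are assumptions; the patent only states that the energy efficiency is punished according to the type of violated constraint.

```python
import numpy as np

def compute_reward(ee, rates, X, P, R_min, P_max_k, I_k, H_inter,
                   xi=(0.5, 0.5, 0.5)):
    """Assumed penalty-based reward: start from the system energy efficiency ee and
    subtract a penalty for each violated modeling constraint type.

    rates, X, P are [N, K] arrays of per-link rates, association indicators x_{n,k},
    and allocated powers p_{n,k}.
    """
    reward = ee
    served = list(zip(*X.nonzero()))                   # (user, subcarrier) pairs in use
    if any(rates[n, k] < R_min for n, k in served):    # case 1: QoS rate below R_min
        reward -= xi[0] * ee
    per_sc_power = (X * P).sum(axis=0)
    if np.any(per_sc_power > P_max_k):                 # case 2: subcarrier power budget exceeded
        reward -= xi[1] * ee
    if np.any(per_sc_power * H_inter > I_k):           # case 3: cross-tier interference above I_k
        reward -= xi[2] * ee
    return reward
```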
A18: store the reward function, the channel environment, the decision action, and the transferred-to next state in the DQN memory replay unit
In this embodiment, the reward function, the channel environment, the decision action, and the transferred-to next state are stored as a four-tuple in the DQN memory replay unit (memory), expressed as:
memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))
where s(t) denotes the normalized channel coefficients (state) input in the t-th training of the model; a(t) denotes the decision action made by the DQN in the t-th training of the deep reinforcement learning model, i.e., the user–subcarrier association matrix; r(t) denotes the reward function reward_t obtained by the DQN after making action a(t) in the t-th training of the deep reinforcement learning model; and s(t+1) denotes the normalized channel coefficients (next state), updated according to the finite-state time-varying Markov channel, for the (t+1)-th training of the deep reinforcement learning model.
In this embodiment, a memory replay class is defined, and the memory is set as a data structure of an object array or dictionary in which each group e(t) is stored.
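A minimal replay-memory sketch along these lines (the fixed capacity and uniform random sampling are assumptions):

```python
import random
from collections import deque

class ReplayMemory:
    """Stores transitions e(t) = (s(t), a(t), r(t), s(t+1)) and samples mini-batches."""
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory()
```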
A19: train the deep reinforcement learning model in batch mode, randomly sampling a fixed-size batch of memory data from the DQN memory replay unit for the learning and gradient-descent update of the two convolutional neural networks.
In this embodiment, the memory data are processed with the loss function Loss(θ), which is expressed as:
Loss(θ) = E[(r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval))²]
The gradient-descent update formula is expressed as:
θ_eval ← θ_eval + α [ r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval) ] ∇_{θ_eval} Q(s(t), a(t); θ_eval)
where α denotes the training learning rate; λ denotes the discount factor used to evaluate the next state of the decision agent; max_{a'} Q(s(t+1), a'; θ_target) denotes the return of the action a' that can harvest the maximal reward, selected by the convolutional neural network q_target with weights θ_target when the input is the next state s(t+1) of the current memory e(t); Q(s(t), a(t); θ_eval) denotes the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) with the state s(t) of the current memory e(t) as input; and ∇_{θ_eval} denotes the gradient-descent operation on the convolutional neural network with parameters θ_eval, i.e., modifying the parameters θ_eval of q_eval so that the difference between the outputs of q_target and q_eval is minimized.
In this embodiment, the subtraction involving Q(s(t), a(t); θ_eval) operates only on the index position of the corresponding action; for example, if memory unit e(1) selected action 2, then the gradient-descent update formula only updates the value at position [1, 2] in the two convolutional neural networks, and the values corresponding to the remaining actions in the first dimension are unchanged. To guarantee training stability, the gradient descent only updates the parameters of the convolutional neural network q_eval.
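A compact sketch of one such batch update, reusing the QNetwork pair and the ReplayMemory sketches above; the optimizer choice, learning rate, discount factor, and batch handling are assumptions:

```python
import torch
import torch.nn.functional as F

GAMMA = 0.9                                      # discount factor lambda
optimizer = torch.optim.SGD(q_eval.parameters(), lr=0.01)

def train_step(batch):
    """One gradient-descent update of q_eval on a sampled batch; q_target stays frozen."""
    states, actions, rewards, next_states = zip(*batch)
    s  = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in states]).unsqueeze(1)
    s2 = torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in next_states]).unsqueeze(1)
    a  = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    r  = torch.as_tensor(rewards, dtype=torch.float32)

    q_sa = q_eval(s).gather(1, a).squeeze(1)      # Q(s(t), a(t); theta_eval), chosen action only
    with torch.no_grad():
        target = r + GAMMA * q_target(s2).max(dim=1).values
    loss = F.mse_loss(q_sa, target)               # Loss(theta) as written above
    optimizer.zero_grad()
    loss.backward()                               # gradients flow into q_eval only
    optimizer.step()
    return loss.item()
```

Here, batch would be obtained as memory.sample(batch_size) once enough transitions have been stored.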
A20: every fixed number of iterations during the training of the deep reinforcement learning model, update the parameters of q_target to the parameters of q_eval, i.e., θ_target = θ_eval;
where C_iter denotes the counter during training, used to record the number of training iterations, and C_max denotes the update interval between the parameters of q_target and the parameters of q_eval, which is also the period of C_iter; therefore C_iter is reset to zero when it equals C_max.
A21: with the q_target network parameters and q_eval network parameters obtained after the updates of steps A19 and A20, if the difference between the system energy efficiency values of several consecutive optimizations and a preset threshold (specified value) lies within the preset range, or the values exceed the preset threshold, the deep reinforcement learning model can be considered suitable for wireless resource allocation in this time-varying channel environment; the currently allocated downlink power is the locally optimal power allocation in the time-varying channel environment, the current deep reinforcement learning model has reached the local optimum of network resource allocation under this time-varying environment, and the resulting deep reinforcement learning model can continue to be used in the practical time-varying channel environment;
A22: otherwise, update the environment according to the transition probability matrix and check whether C_iter = C_max holds; if it holds, set C_iter = 0 and θ_target = θ_eval, and then execute step A12; otherwise, directly execute step A12, until the difference between the recomputed system energy efficiency value and the preset threshold lies within the preset range or the value exceeds the preset threshold, at which point the best optimization in the time-varying channel environment is reached.
In this embodiment, as the number of optimizations t increases, the return value of the DQN model in the time-varying channel environment gradually rises from a low level toward a higher level; this process is the wireless network resource allocation method based on deep reinforcement learning, thereby realizing the optimization of subcarrier and power allocation in the time-varying channel environment.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above are preferred embodiments of the present invention. It should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as within the protection scope of the present invention.

Claims (10)

1. A wireless network resource allocation method based on deep reinforcement learning, characterized by comprising:
S101: establishing a deep reinforcement learning model composed of two convolutional neural networks q_eval and q_target with identical parameters;
S102: modeling the time-varying channel environment between the base station and the user terminals as a finite-state time-varying Markov channel, determining the normalized channel coefficients between the base station and the users, feeding them into the convolutional neural network q_eval, selecting the action with the largest output return value as the decision action, and allocating subcarriers to the users;
S103: according to the subcarrier allocation result, allocating downlink power to the users multiplexed on each subcarrier in inverse proportion to their channel coefficients, determining the system energy efficiency from the allocated downlink power, determining the reward function from the system energy efficiency, and feeding the reward function back to the deep reinforcement learning model;
S104: according to the determined reward function, training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model; if the difference between the system energy efficiency values obtained in several consecutive iterations and a preset threshold lies within a preset range, or the values exceed the preset threshold, the currently allocated downlink power being the locally optimal power allocation in the time-varying channel environment.
2. The wireless network resource allocation method based on deep reinforcement learning according to claim 1, characterized in that the normalized channel coefficient is expressed as the ratio of the channel gain to the noise power,
where H_{n,k} is the normalized channel coefficient, representing the normalized channel gain between the base station and user terminal n on subcarrier k; h_{n,k} denotes the channel gain between the base station and user terminal n on subcarrier k; and σ_k² denotes the noise power on subcarrier k.
3. The wireless network resource allocation method based on deep reinforcement learning according to claim 2, characterized in that feeding the normalized channel coefficients into the convolutional neural network q_eval, selecting the action with the largest output return value as the decision action, and allocating subcarriers to the users comprises:
feeding the normalized channel coefficients into the convolutional neural network q_eval, which selects the action with the largest output return value as the decision action through the decision formula a = argmax_{a'} Q(s, a'; θ_eval), and allocating subcarriers to the users accordingly;
where θ_eval denotes the weight parameters of the convolutional neural network q_eval; the Q function Q(s, a'; θ_eval) denotes the return value obtained by the convolutional neural network q_eval with weights θ_eval when executing action a' in state s, the state s being the input normalized channel coefficients; and a denotes the decision action of the deep reinforcement learning model, i.e., the optimal subcarrier allocation result, which is obtained from the index of the action with the largest return value.
4. The wireless network resource allocation method based on deep reinforcement learning according to claim 3, characterized in that the downlink power allocated to the users is expressed in terms of the following quantities:
p_{n,k} denotes the downlink transmit power allocated by the base station to user terminal n on subcarrier k; p'_k denotes the downlink transmit power allocated by the base station on subcarrier k; a denotes the decay factor; and K_max denotes the maximum number of users multiplexed on each subcarrier under the complexity that the current successive interference cancellation receiver can bear in the non-orthogonal multiple access network.
5. The wireless network resource allocation method based on deep reinforcement learning according to claim 4, characterized in that determining the system energy efficiency from the allocated downlink power comprises:
determining the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k;
determining the system power consumption U_P(X) from the determined normalized channel coefficients between the base station and the users, the subcarrier allocation result, and the allocated downlink power;
determining the system energy efficiency from the determined r_{n,k} and U_P(X).
6. The wireless network resource allocation method based on deep reinforcement learning according to claim 5, characterized in that the maximum distortion-free information transmission rate r_{n,k} from the base station to user terminal n on subcarrier k is expressed as:
r_{n,k} = log2(1 + γ_{n,k})
where γ_{n,k} denotes the signal-to-noise ratio obtained by user terminal n on subcarrier k;
and the system power consumption U_P(X) is expressed in terms of the following quantities: p_k denotes the circuit power consumption, ψ denotes the base station energy recovery coefficient, and x_{n,k} indicates whether user terminal n uses subcarrier k.
7. The wireless network resource allocation method based on deep reinforcement learning according to claim 6, characterized in that the system energy efficiency is expressed in terms of the following quantities:
ee_{n,k} denotes the energy efficiency from subcarrier k to user terminal n, B_SC denotes the channel bandwidth of subcarrier k, N denotes the set of user terminals, and K denotes the set of subcarriers usable by the current base station.
8. The wireless network resource allocation method based on deep reinforcement learning according to claim 7, characterized in that determining the reward function from the system energy efficiency and feeding the reward function back to the deep reinforcement learning model comprises:
penalizing, with a weakly supervised algorithm based on value returns, the system energy efficiency that does not satisfy the preset modeling constraints according to the type of constraint violated, obtaining the reward function after the deep reinforcement learning model makes a decision action, and feeding the reward function back to the deep reinforcement learning model; where, in the reward function,
reward_t denotes the reward function computed in the t-th training iteration; R_min denotes the minimum quality-of-service standard of a user, i.e., the minimum downlink transmission rate; H_inter denotes the normalized channel coefficient corresponding to the shortest distance between the currently optimized base station and the nearest base station operating on the same subcarrier frequency; I_k denotes the upper limit of cross-tier interference that the k-th subcarrier band can bear; and ξ_case1–ξ_case3 denote the penalty coefficients applied to the system energy efficiency in the three cases that violate the modeling constraints.
9. The wireless network resource allocation method based on deep reinforcement learning according to claim 8, characterized in that training the convolutional neural networks q_eval and q_target in the deep reinforcement learning model according to the determined reward function, and taking the currently allocated downlink power as the locally optimal power allocation under the time-varying channel environment if the difference between the system energy efficiency values obtained over several consecutive iterations and a preset threshold stays within a preset range or exceeds the preset threshold, comprises:
Storing the reward function, the channel environment, the decision action and the next state transferred to, as a quadruple, into the memory replay unit (memory) of the deep reinforcement learning model, wherein the memory is expressed as:
Memory: D(t) = {e(1), ..., e(t)}
e(t) = (s(t), a(t), r(t), s(t+1))
wherein s(t) denotes the state input to the deep reinforcement learning model at the t-th training iteration; a(t) denotes the decision action made by the deep reinforcement learning model at the t-th training iteration; r(t) denotes the reward function reward_t obtained after the deep reinforcement learning model performs action a(t) at the t-th training iteration; s(t+1) denotes the next state at the (t+1)-th training iteration, updated according to the finite-state time-varying Markov channel;
Randomly sampling data from the memory replay unit of the deep reinforcement learning model for the two convolutional neural networks to learn from and to perform gradient-descent updates, wherein gradient descent updates only the parameters of the convolutional neural network q_eval, and every fixed number of iterations during training of the deep reinforcement learning model the parameters θ_target of q_target are updated to the parameters θ_eval of q_eval;
If the difference between the system energy efficiency values obtained over several consecutive iterations and the preset threshold stays within the preset range or exceeds the preset threshold, the currently allocated downlink power is the locally optimal power allocation under the time-varying channel environment.
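A minimal sketch of the memory replay unit D(t) and the periodic copy of θ_eval into θ_target described in claim 9; the capacity, the sync interval and the PyTorch-style parameter copy are placeholders rather than the patented implementation.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", "state action reward next_state")  # e(t)

class ReplayMemory:
    """Memory D(t) = {e(1), ..., e(t)} with uniform random sampling."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def maybe_sync_target(step, q_eval, q_target, every=200):
    """Every fixed number of training steps, copy theta_eval into theta_target."""
    if step % every == 0:
        q_target.load_state_dict(q_eval.state_dict())  # assumes torch.nn.Module networks
```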
10. The wireless network resource allocation method based on deep reinforcement learning according to claim 9, characterized in that the gradient-descent update formula is expressed as:
θ_eval ← θ_eval + α [ r(t) + λ max_{a'} Q(s(t+1), a'; θ_target) − Q(s(t), a(t); θ_eval) ] ∇_{θ_eval} Q(s(t), a(t); θ_eval)
wherein α denotes the training learning rate; λ denotes the discount factor with which the decision agent evaluates the next state; max_{a'} Q(s(t+1), a'; θ_target) denotes the maximal return found by the convolutional neural network q_target with weights θ_target over actions a' when the input is the next state s(t+1) of the current memory e(t); Q(s(t), a(t); θ_eval) denotes the return obtained by the convolutional neural network q_eval with weights θ_eval when executing action a(t) with the state s(t) of the current memory e(t) as input; ∇_{θ_eval} denotes the gradient operation with respect to the parameters θ_eval of the convolutional neural network q_eval.
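A hedged PyTorch sketch of the claim 10 update: the temporal-difference target is built from q_target while the gradient step touches only θ_eval; the batch layout and hyper-parameters are assumptions.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_eval, q_target, optimizer, batch, gamma=0.9):
    """One gradient-descent step on theta_eval using the target
    r(t) + gamma * max_a' Q(s(t+1), a'; theta_target)."""
    states, actions, rewards, next_states = batch           # tensors sampled from memory
    q_sa = q_eval(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                                   # theta_target is frozen here
        target = rewards + gamma * q_target(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()                                         # gradients flow to theta_eval only
    optimizer.step()
    return loss.item()
```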
CN201811535056.1A 2018-12-14 2018-12-14 Wireless network resource allocation method based on deep reinforcement learning Active CN109474980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811535056.1A CN109474980B (en) 2018-12-14 2018-12-14 Wireless network resource allocation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811535056.1A CN109474980B (en) 2018-12-14 2018-12-14 Wireless network resource allocation method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN109474980A true CN109474980A (en) 2019-03-15
CN109474980B CN109474980B (en) 2020-04-28

Family

ID=65675169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811535056.1A Active CN109474980B (en) 2018-12-14 2018-12-14 Wireless network resource allocation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109474980B (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109962728A (en) * 2019-03-28 2019-07-02 北京邮电大学 A kind of multi-node combination Poewr control method based on depth enhancing study
CN110035478A (en) * 2019-04-18 2019-07-19 北京邮电大学 A kind of dynamic multi-channel cut-in method under high-speed mobile scene
CN110084245A (en) * 2019-04-04 2019-08-02 中国科学院自动化研究所 The Weakly supervised image detecting method of view-based access control model attention mechanism intensified learning, system
CN110167176A (en) * 2019-04-25 2019-08-23 北京科技大学 A kind of wireless network resource distribution method based on distributed machines study
CN110380776A (en) * 2019-08-22 2019-10-25 电子科技大学 A kind of Internet of things system method of data capture based on unmanned plane
CN110401975A (en) * 2019-07-05 2019-11-01 深圳市中电数通智慧安全科技股份有限公司 A kind of method, apparatus and electronic equipment of the transmission power adjusting internet of things equipment
CN110430613A (en) * 2019-04-11 2019-11-08 重庆邮电大学 Resource allocation methods of the multicarrier non-orthogonal multiple access system based on efficiency
CN110635833A (en) * 2019-09-25 2019-12-31 北京邮电大学 Power distribution method and device based on deep learning
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN110972309A (en) * 2019-11-08 2020-04-07 厦门大学 Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
CN111428903A (en) * 2019-10-31 2020-07-17 国家电网有限公司 Interruptible load optimization method based on deep reinforcement learning
CN111431646A (en) * 2020-03-31 2020-07-17 北京邮电大学 Dynamic resource allocation method in millimeter wave system
CN111526592A (en) * 2020-04-14 2020-08-11 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111542107A (en) * 2020-05-14 2020-08-14 南昌工程学院 Mobile edge network resource allocation method based on reinforcement learning
WO2020191686A1 (en) * 2019-03-27 2020-10-01 华为技术有限公司 Neural network-based power distribution method and device
CN111867110A (en) * 2020-06-17 2020-10-30 三明学院 Wireless network channel separation energy-saving method based on switch switching strategy
CN111885720A (en) * 2020-06-08 2020-11-03 中山大学 Multi-user subcarrier power distribution method based on deep reinforcement learning
CN111930501A (en) * 2020-07-23 2020-11-13 齐齐哈尔大学 Wireless resource allocation method based on unsupervised learning and oriented to multi-cell network
CN112104400A (en) * 2020-04-24 2020-12-18 广西华南通信股份有限公司 Combined relay selection method and system based on supervised machine learning
CN112770398A (en) * 2020-12-18 2021-05-07 北京科技大学 Far-end radio frequency end power control method based on convolutional neural network
WO2021088441A1 (en) * 2019-11-08 2021-05-14 Huawei Technologies Co., Ltd. Systems and methods for multi-user pairing in wireless communication networks
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN113115355A (en) * 2021-04-29 2021-07-13 电子科技大学 Power distribution method based on deep reinforcement learning in D2D system
CN113395757A (en) * 2021-06-10 2021-09-14 中国人民解放军空军通信士官学校 Deep reinforcement learning cognitive network power control method based on improved return function
CN113490184A (en) * 2021-05-10 2021-10-08 北京科技大学 Smart factory-oriented random access resource optimization method and device
CN114126025A (en) * 2021-11-02 2022-03-01 中国联合网络通信集团有限公司 Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server
CN114142912A (en) * 2021-11-26 2022-03-04 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network
CN114360305A (en) * 2021-12-15 2022-04-15 广州创显科教股份有限公司 Classroom interactive teaching method and system based on 5G network
CN114928549A (en) * 2022-04-20 2022-08-19 清华大学 Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105407535A (en) * 2015-10-22 2016-03-16 东南大学 High energy efficiency resource optimization method based on constrained Markov decision process
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN106909728A (en) * 2017-02-21 2017-06-30 电子科技大学 A kind of FPGA interconnection resources configuration generating methods based on enhancing study
US20180091981A1 (en) * 2016-09-23 2018-03-29 Board Of Trustees Of The University Of Arkansas Smart vehicular hybrid network systems and applications of same
US20180121766A1 (en) * 2016-09-18 2018-05-03 Newvoicemedia, Ltd. Enhanced human/machine workforce management using reinforcement learning
CN108307510A (en) * 2018-02-28 2018-07-20 北京科技大学 A kind of power distribution method in isomery subzone network
CN108712748A (en) * 2018-04-12 2018-10-26 天津大学 A method of the anti-interference intelligent decision of cognitive radio based on intensified learning
CN108737057A (en) * 2018-04-27 2018-11-02 南京邮电大学 Multicarrier based on deep learning recognizes NOMA resource allocation methods
CN108989099A (en) * 2018-07-02 2018-12-11 北京邮电大学 Federated resource distribution method and system based on software definition Incorporate network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN105407535A (en) * 2015-10-22 2016-03-16 东南大学 High energy efficiency resource optimization method based on constrained Markov decision process
US20180121766A1 (en) * 2016-09-18 2018-05-03 Newvoicemedia, Ltd. Enhanced human/machine workforce management using reinforcement learning
US20180091981A1 (en) * 2016-09-23 2018-03-29 Board Of Trustees Of The University Of Arkansas Smart vehicular hybrid network systems and applications of same
CN106909728A (en) * 2017-02-21 2017-06-30 电子科技大学 A kind of FPGA interconnection resources configuration generating methods based on enhancing study
CN108307510A (en) * 2018-02-28 2018-07-20 北京科技大学 A kind of power distribution method in isomery subzone network
CN108712748A (en) * 2018-04-12 2018-10-26 天津大学 A method of the anti-interference intelligent decision of cognitive radio based on intensified learning
CN108737057A (en) * 2018-04-27 2018-11-02 南京邮电大学 Multicarrier based on deep learning recognizes NOMA resource allocation methods
CN108989099A (en) * 2018-07-02 2018-12-11 北京邮电大学 Federated resource distribution method and system based on software definition Incorporate network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YING HE等: "Integrated Networking, Caching, and Computing for Connected Vehicles: A Deep Reinforcement Learning Approach", 《IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY》 *
YONG ZHANG等: "Power Allocation in Multi-cell Networks Using Deep Reinforcement Learning", 《2018 IEEE 88TH VEHICULAR TECHNOLOGY CONFERENCE (VTC-FALL)》 *

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020191686A1 (en) * 2019-03-27 2020-10-01 华为技术有限公司 Neural network-based power distribution method and device
CN113615277A (en) * 2019-03-27 2021-11-05 华为技术有限公司 Power distribution method and device based on neural network
CN113615277B (en) * 2019-03-27 2023-03-24 华为技术有限公司 Power distribution method and device based on neural network
CN109962728A (en) * 2019-03-28 2019-07-02 北京邮电大学 A kind of multi-node combination Poewr control method based on depth enhancing study
CN110084245A (en) * 2019-04-04 2019-08-02 中国科学院自动化研究所 The Weakly supervised image detecting method of view-based access control model attention mechanism intensified learning, system
CN110430613A (en) * 2019-04-11 2019-11-08 重庆邮电大学 Resource allocation methods of the multicarrier non-orthogonal multiple access system based on efficiency
CN110035478A (en) * 2019-04-18 2019-07-19 北京邮电大学 A kind of dynamic multi-channel cut-in method under high-speed mobile scene
CN110167176A (en) * 2019-04-25 2019-08-23 北京科技大学 A kind of wireless network resource distribution method based on distributed machines study
CN110167176B (en) * 2019-04-25 2021-06-01 北京科技大学 Wireless network resource allocation method based on distributed machine learning
CN110401975A (en) * 2019-07-05 2019-11-01 深圳市中电数通智慧安全科技股份有限公司 A kind of method, apparatus and electronic equipment of the transmission power adjusting internet of things equipment
CN110380776A (en) * 2019-08-22 2019-10-25 电子科技大学 A kind of Internet of things system method of data capture based on unmanned plane
CN110380776B (en) * 2019-08-22 2021-05-14 电子科技大学 Internet of things system data collection method based on unmanned aerial vehicle
CN110635833A (en) * 2019-09-25 2019-12-31 北京邮电大学 Power distribution method and device based on deep learning
CN110635833B (en) * 2019-09-25 2020-12-15 北京邮电大学 Power distribution method and device based on deep learning
CN111428903A (en) * 2019-10-31 2020-07-17 国家电网有限公司 Interruptible load optimization method based on deep reinforcement learning
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN110972309B (en) * 2019-11-08 2022-07-19 厦门大学 Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning
US11246173B2 (en) 2019-11-08 2022-02-08 Huawei Technologies Co. Ltd. Systems and methods for multi-user pairing in wireless communication networks
CN110972309A (en) * 2019-11-08 2020-04-07 厦门大学 Ultra-dense wireless network power distribution method combining graph signals and reinforcement learning
WO2021088441A1 (en) * 2019-11-08 2021-05-14 Huawei Technologies Co., Ltd. Systems and methods for multi-user pairing in wireless communication networks
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
CN111431646B (en) * 2020-03-31 2021-06-15 北京邮电大学 Dynamic resource allocation method in millimeter wave system
CN111431646A (en) * 2020-03-31 2020-07-17 北京邮电大学 Dynamic resource allocation method in millimeter wave system
CN111526592B (en) * 2020-04-14 2022-04-08 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN111526592A (en) * 2020-04-14 2020-08-11 电子科技大学 Non-cooperative multi-agent power control method used in wireless interference channel
CN112104400A (en) * 2020-04-24 2020-12-18 广西华南通信股份有限公司 Combined relay selection method and system based on supervised machine learning
CN111542107A (en) * 2020-05-14 2020-08-14 南昌工程学院 Mobile edge network resource allocation method based on reinforcement learning
CN111885720B (en) * 2020-06-08 2021-05-28 中山大学 Multi-user subcarrier power distribution method based on deep reinforcement learning
CN111885720A (en) * 2020-06-08 2020-11-03 中山大学 Multi-user subcarrier power distribution method based on deep reinforcement learning
CN111867110B (en) * 2020-06-17 2023-10-03 三明学院 Wireless network channel separation energy-saving method based on switch switching strategy
CN111867110A (en) * 2020-06-17 2020-10-30 三明学院 Wireless network channel separation energy-saving method based on switch switching strategy
CN111930501A (en) * 2020-07-23 2020-11-13 齐齐哈尔大学 Wireless resource allocation method based on unsupervised learning and oriented to multi-cell network
CN112770398A (en) * 2020-12-18 2021-05-07 北京科技大学 Far-end radio frequency end power control method based on convolutional neural network
CN113115355A (en) * 2021-04-29 2021-07-13 电子科技大学 Power distribution method based on deep reinforcement learning in D2D system
CN113115355B (en) * 2021-04-29 2022-04-22 电子科技大学 Power distribution method based on deep reinforcement learning in D2D system
CN113490184A (en) * 2021-05-10 2021-10-08 北京科技大学 Smart factory-oriented random access resource optimization method and device
CN113395757A (en) * 2021-06-10 2021-09-14 中国人民解放军空军通信士官学校 Deep reinforcement learning cognitive network power control method based on improved return function
CN113395757B (en) * 2021-06-10 2023-06-30 中国人民解放军空军通信士官学校 Deep reinforcement learning cognitive network power control method based on improved return function
CN114126025B (en) * 2021-11-02 2023-04-28 中国联合网络通信集团有限公司 Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server
CN114126025A (en) * 2021-11-02 2022-03-01 中国联合网络通信集团有限公司 Power adjustment method for vehicle-mounted terminal, vehicle-mounted terminal and server
CN114142912A (en) * 2021-11-26 2022-03-04 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network
CN114360305A (en) * 2021-12-15 2022-04-15 广州创显科教股份有限公司 Classroom interactive teaching method and system based on 5G network
CN114928549A (en) * 2022-04-20 2022-08-19 清华大学 Communication resource allocation method and device of unauthorized frequency band based on reinforcement learning

Also Published As

Publication number Publication date
CN109474980B (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN109474980A (en) A kind of wireless network resource distribution method based on depth enhancing study
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN110493826A (en) A kind of isomery cloud radio access network resources distribution method based on deeply study
CN109600178B (en) Optimization method for energy consumption, time delay and minimization in edge calculation
Wang et al. Joint interference alignment and power control for dense networks via deep reinforcement learning
CN106358308A (en) Resource allocation method for reinforcement learning in ultra-dense network
CN113596785B (en) D2D-NOMA communication system resource allocation method based on deep Q network
CN110213776B (en) WiFi unloading method based on Q learning and multi-attribute decision
Zhu et al. Machine-learning-based opportunistic spectrum access in cognitive radio networks
CN107426773A (en) Towards the distributed resource allocation method and device of efficiency in Wireless Heterogeneous Networks
Xu et al. Resource allocation algorithm based on hybrid particle swarm optimization for multiuser cognitive OFDM network
CN106358300A (en) Distributed resource distribution method in microcellular network
Zhang et al. Deep learning based user association in heterogeneous wireless networks
Yu et al. Interference coordination strategy based on Nash bargaining for small‐cell networks
CN114423028B (en) CoMP-NOMA cooperative clustering and power distribution method based on multi-agent deep reinforcement learning
Ouamri et al. Double deep q-network method for energy efficiency and throughput in a uav-assisted terrestrial network
CN107071881A (en) A kind of small cell network distributed energy distribution method based on game theory
CN110139282A (en) A kind of energy acquisition D2D communication resource allocation method neural network based
CN115811788B (en) D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
CN105873127A (en) Heuristic user connection load balancing method based on random decision
Li et al. Dynamic power allocation in IIoT based on multi-agent deep reinforcement learning
CN112243283B (en) Cell-Free Massive MIMO network clustering calculation method based on successful transmission probability
Chen et al. A reinforcement learning based joint spectrum allocation and power control algorithm for D2D communication underlaying cellular networks
Chen et al. A multi-agent reinforcement learning based power control algorithm for D2D communication underlaying cellular networks
CN115915454A (en) SWIPT-assisted downlink resource allocation method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant