CN109862610A - D2D user resource allocation method based on deep reinforcement learning DDPG algorithm - Google Patents

D2D user resource allocation method based on deep reinforcement learning DDPG algorithm

Info

Publication number
CN109862610A
CN109862610A CN201910013868.8A
Authority
CN
China
Prior art keywords
user
channel
moment
cellular user
data rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910013868.8A
Other languages
Chinese (zh)
Other versions
CN109862610B (en)
Inventor
李强
张雪艳
楼瀚琼
葛晓虎
肖泳
黄晓庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910013868.8A priority Critical patent/CN109862610B/en
Publication of CN109862610A publication Critical patent/CN109862610A/en
Application granted granted Critical
Publication of CN109862610B publication Critical patent/CN109862610B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm. Using information about the cellular users and the D2D users, a deep reinforcement learning method obtains the optimal joint optimization strategy for D2D channel allocation and transmit power. By selecting a suitable transmit power and allocated channel, a D2D user reduces its interference to the cellular user while maximizing its own information rate, so that efficient resource allocation is achieved without affecting the cellular user's QoS, the throughput of the cellular network is improved, and the requirements of green communications are met. The DDPG algorithm effectively solves the joint optimization of D2D channel allocation and power control: it not only performs stably when optimizing over a continuous action space, but also needs far fewer time steps than DQN to find the optimal solution. Compared with value-function-based DRL methods, the deep policy gradient method built on the Actor-Critic (AC) framework optimizes the strategy more efficiently and converges faster.

Description

D2D user resource allocation method based on deep reinforcement learning DDPG algorithm
Technical field
The invention belongs to the field of wireless communication technology, and more particularly relates to a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm.
Background technique
With the rapid growth of wireless local services, the traffic-carrying pressure on cellular networks keeps increasing. Device-to-Device (D2D) communication, as one of the key 5G communication technologies, allows adjacent terminals to exchange data directly with each other under the control of the base station, forming a data-sharing network that reuses the channel resources of the cellular network, thereby relieving the base station load, improving spectrum utilization and increasing system throughput.
D2D communication is a new technique that allows terminals to communicate directly by sharing local resources. It can increase the spectrum utilization efficiency of the cellular system, relieve the load on cellular base stations, reduce terminal transmit power and raise overall system throughput, alleviating to some extent the shortage of spectrum resources in wireless communication systems. A D2D pair can communicate in three modes: 1. cellular mode, the same pattern as traditional cellular communication, in which the information transfer between the two users is relayed by the base station; this mode is usually chosen when the two users are far apart; 2. dedicated-channel mode, in which the two users communicate directly on a dedicated channel without relaying through the base station; 3. shared-channel mode, in which the two users also communicate directly but, unlike the dedicated-channel mode, the D2D pair shares a channel with a cellular user (CU).
Applying D2D technology in a cellular communication network can effectively offload base station traffic and improve spectrum utilization, but when a D2D pair shares the channel of a cellular user it interferes with the user already occupying that channel, degrading that user's performance and hence the system performance. Therefore, how D2D users autonomously select suitable communication channels and transmit powers directly affects the quality of service of the entire communication system.
Summary of the invention
In view of the drawbacks of the prior art, the object of the invention is to solve the technical problem that, in the prior art, a D2D user sharing the channel of a cellular user interferes with the user already occupying that channel and degrades that user's performance.
To achieve the above object, in a first aspect, an embodiment of the invention provides a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, in which the D2D users and the cellular users communicate in shared-channel mode. The method comprises the following steps:
Step S1. Collect the achievable data rate and transmit power of the D2D users, the achievable data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users;
Step S2. Establish a deep reinforcement learning model from the achievable data rate and transmit power of the D2D users, the achievable data rate and target data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm;
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
Specifically, the achievable data rate R_m(t) of the m-th D2D pair at moment t is calculated as
R_m(t) = B·log2(1 + Γ_m(t)), Γ_m(t) = P_m^d(t)·h_m(t) / (P_c·h_c(t) + σ_1²)
where B is the channel bandwidth, Γ_m(t) is the receive SINR of the m-th D2D pair at moment t, P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_m(t) is the channel coefficient between the two devices forming the D2D pair, h_c(t) is the channel coefficient between the cellular user and the D2D user sharing its channel, and σ_1² is the additive white Gaussian noise power in the communication link between the cellular user and the D2D user sharing its channel;
The achievable data rate R_c(t) at moment t of the cellular user sharing its channel with the m-th D2D pair is calculated as
R_c(t) = B·log2(1 + Γ_c(t)), Γ_c(t) = P_c·h_c'(t) / (P_m^d(t)·h_m'(t) + σ_2²)
where B is the channel bandwidth, Γ_c(t) is the receive SINR at moment t of the cellular user sharing its channel with the m-th D2D pair, P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_c'(t) is the channel coefficient between the cellular user and the base station, h_m'(t) is the channel coefficient between the D2D user and the base station, σ_2² is the additive white Gaussian noise power in the communication link between the D2D user and the base station, 1 ≤ m ≤ M, and M is the total number of D2D pairs within the signal coverage area of the base station.
Specifically, for the m-th D2D pair, the channel-sharing information at moment t is the channel-sharing indicator ρ_m^n(t): if ρ_m^n(t) = 1, the n-th channel is shared by its cellular user and the m-th D2D pair, and ρ_m^i(t) = 0 for every other channel i ≠ n, where 1 ≤ m ≤ M, 1 ≤ n ≤ N, M is the total number of D2D pairs within the signal coverage area of the base station, and N is the total number of channels available at the base station.
Specifically, the established deep reinforcement learning model comprises:
The state space is the cellular user's satisfaction with its quality of service; the state at moment t is defined as s_m^n(t). If the m-th D2D pair shares the n-th channel, then
s_m^n(t) = 1 if R_c(t) ≥ R_th, and s_m^n(t) = 0 otherwise,
where R_th is the target data rate of the cellular user, R_c(t) is the achievable data rate of the cellular user, and s_m^n(t) is the state at moment t when the m-th D2D pair shares the n-th channel;
The action space of a D2D user contains two variables, the transmit power and the shared channel, and is written
a_m(t) = {P_m^d(t), ρ_m^n(t)},
where P_m^d(t) is the transmit power of the m-th D2D pair at moment t and ρ_m^n(t) indicates whether the n-th channel is shared by its cellular user and the m-th D2D pair;
The reward function of a D2D user is
r_m(t) = R_m(t) if R_c(t) ≥ R_th, and r_m(t) = Ψ otherwise,
where R_c(t) is the achievable data rate of the cellular user, R_th is the target data rate of the cellular user, R_m(t) is the achievable data rate of the D2D pair, and Ψ is a negative constant;
The valuation function Q(s_m^n(t), a_m(t)) denotes the discounted reward obtained by starting from state s_m^n(t) and then selecting and executing action a_m(t); the Q-value update function is
Q(s_m^n(t), a_m(t)) = r_m(t) + γ·max_{a_m(t+1)∈A} Q(s_m^n(t+1), a_m(t+1)),
where r_m(t) is the immediate reward function, γ is the discount factor, s_m^n(t+1) is the state at moment (t+1) when the m-th D2D pair shares the n-th channel, a_m(t+1) is the action of the m-th D2D pair at moment (t+1), A is the action space formed by the actions a_m(t), and N is the total number of channels available at the base station.
Specifically, optimizing the deep reinforcement learning model with the DDPG algorithm comprises the following steps:
S301. Initialize the training round p to 1;
S302. Initialize the time step t within round p to 1;
S303. The online Actor policy network outputs the action a_t for the input state s_t, obtains the immediate reward r_t and moves to the next state s_{t+1}, yielding the training sample (s_t, a_t, r_t, s_{t+1});
S304. Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. Randomly sample T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, which is fed to the online Actor policy network, the online Critic evaluation network, the target Actor policy network and the target Critic evaluation network;
S306. From the sampled data set, the target Actor policy network outputs the action a'_{i+1} for state s_{i+1}; the target Critic evaluation network takes state s_{i+1} and the action a'_{i+1} output by the target Actor policy network and outputs the valuation function Q'(s_{i+1}, a'_{i+1} | θ') to the loss gradient function; the online Critic evaluation network takes state s_i, action a_i and the immediate reward r_i, outputs the valuation function Q(s_i, a_i | θ) to the sampled policy gradient and the loss function gradient, and updates the online Critic evaluation network parameter θ according to the loss function gradient; the online Actor policy network outputs the action a_i to the sampled policy gradient and updates the online Actor policy network parameter δ according to the sampled policy gradient, 1 ≤ i ≤ T;
S307. Update the target network parameters δ' and θ' from the online network parameters δ and θ, respectively:
δ′←τδ+(1-τ)δ′;
θ′←τθ+(1-τ)θ′;
where τ is the weight of the online network parameters;
S308. Judge whether t < K holds, where K is the total number of time steps in round p; if so, set t = t + 1 and go to step S303, otherwise go to step S309;
S309. Judge whether p < I holds, where I is the set threshold on the number of training rounds; if so, set p = p + 1 and go to step S302, otherwise the optimization ends and the optimized deep reinforcement learning model is obtained.
Specifically, the parameter update gradient is the sampled deterministic policy gradient
∇_δ J(δ) ≈ E[ ∇_a Q(s_i, a | θ)|_{a=π(s_i|δ)} · ∇_δ π(s_i | δ) ],
with δ updated along ∇_δ J(δ) and θ updated by gradient descent on the loss function Loss.
Specifically, step S4 is: input the state information s_m(t) of the system at the current moment and output the optimal action policy, thereby obtaining the optimal D2D transmit power and allocated channel.
In a second aspect, an embodiment of the invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the D2D user resource allocation method described in the first aspect above.
In general, compared with the prior art, the above technical solutions contemplated by the invention have the following beneficial effects:
1. Using information about the cellular users and the D2D users, the invention proposes a deep reinforcement learning optimization strategy: a deep reinforcement learning method obtains the optimal joint optimization strategy for D2D channel allocation and transmit power. By selecting a suitable transmit power and allocated channel, a D2D user reduces its interference to the cellular user while maximizing its own information rate, so that efficient resource allocation is achieved without affecting the cellular user's QoS, the throughput of the cellular network is improved, and the requirements of green communications are met.
2. Using the DDPG algorithm, the invention effectively solves the joint optimization problem of D2D channel allocation and power control: it not only performs stably when optimizing over a continuous action space, but also needs far fewer time steps than DQN to find the optimal solution. Compared with value-function-based DRL methods, the deep policy gradient method built on the AC framework optimizes the strategy more efficiently and converges faster.
Detailed description of the invention
Fig. 1 is a flow chart of a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the D2D user resource allocation model provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the deep reinforcement learning framework based on the Actor-Critic model provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of the DDPG algorithm framework provided by an embodiment of the invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described here are merely illustrative of the invention and are not intended to limit it.
The object of the invention is to maximize the information rate of the D2D users and improve spectrum utilization by jointly optimizing their transmit power and channel allocation strategy without affecting the QoS of the cellular users. Using deep learning, the AC-based DDPG algorithm framework is applied to the system model to obtain the optimal D2D power control and channel allocation strategy in the cellular network; that is, for any D2D user, an optimal set of transmit power and shared channel information is obtained that maximally improves the network capacity while guaranteeing the cellular user's QoS.
As shown in Fig. 1, in a D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, the D2D users and the cellular users communicate in shared-channel mode, and the method comprises the following steps:
Step S1. Collect the achievable data rate and transmit power of the D2D users, the achievable data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users;
Step S2. Establish a deep reinforcement learning model from the achievable data rate and transmit power of the D2D users, the achievable data rate and target data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm;
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
Step S1. Collect the achievable data rate and transmit power of the D2D users, the achievable data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users.
As shown in Fig. 2, in the D2D user resource allocation model there are multiple cellular users and D2D users within the coverage area of a base station (BS). A D2D user can only transmit information by sharing a channel with a cellular user; each channel can be allocated to only one cellular user, and each cellular user can share its channel with only one D2D pair at any given moment. Because the channel is shared, the cellular user and the D2D users interfere with each other.
Assume there are M D2D pairs within the signal coverage area of a base station, the base station serves N cellular users and allocates N available channels, and each channel can be allocated to only one cellular user.
For the m-th D2D pair, the channel-sharing information at moment t is the channel-sharing indicator ρ_m^n(t): if ρ_m^n(t) = 1, the n-th channel is shared by its cellular user and the m-th D2D pair, and ρ_m^i(t) = 0 for every other channel i ≠ n.
Assuming that users working on different channels do not interfere with each other, the instantaneous received signal-to-interference-plus-noise ratios (SINR) of the cellular user and of the D2D user at moment t are calculated separately.
The receive SINR of the m-th D2D pair at moment t is calculated as
Γ_m(t) = P_m^d(t)·h_m(t) / (P_c·h_c(t) + σ_1²)
where P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_m(t) is the channel coefficient between the two devices forming the D2D pair, h_c(t) is the channel coefficient between the cellular user and the D2D user sharing its channel, and σ_1² is the additive white Gaussian noise power in the communication link between the cellular user and the D2D user sharing its channel.
The achievable data rate of the corresponding D2D pair at moment t is calculated as
R_m(t) = B·log2(1 + Γ_m(t))
where B is the channel bandwidth and Γ_m(t) is the receive SINR of the m-th D2D pair at moment t.
The receive SINR at moment t of the cellular user sharing its channel with the m-th D2D pair is calculated as
Γ_c(t) = P_c·h_c'(t) / (P_m^d(t)·h_m'(t) + σ_2²)
where P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_c'(t) is the channel coefficient between the cellular user and the base station, h_m'(t) is the channel coefficient between the D2D user and the base station, and σ_2² is the additive white Gaussian noise power in the communication link between the D2D user and the base station.
The achievable data rate of the corresponding cellular user at moment t is calculated as
R_c(t) = B·log2(1 + Γ_c(t))
where B is the channel bandwidth and Γ_c(t) is the receive SINR at moment t of the cellular user sharing its channel with the m-th D2D pair.
When the achievable data rate of the cellular user is greater than or equal to its target data rate, the cellular user is satisfied with the service quality; otherwise it is not. By setting the target data rate of the cellular user, the service quality of the communication system is controlled.
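For illustration only, the following minimal Python sketch (not part of the patent text) computes the two SINRs and achievable rates defined above for one D2D pair and its shared cellular channel; the function name and all numeric values are assumptions chosen for the example.

import numpy as np

# Illustrative sketch of the rate model above (assumed values, not from the patent).
def achievable_rates(p_d, p_c, h_m, h_c, h_c_prime, h_m_prime,
                     sigma1_sq, sigma2_sq, bandwidth):
    """Return (R_m, R_c): achievable rates of the D2D pair and the cellular user."""
    gamma_m = (p_d * h_m) / (p_c * h_c + sigma1_sq)              # D2D receive SINR
    gamma_c = (p_c * h_c_prime) / (p_d * h_m_prime + sigma2_sq)  # cellular receive SINR
    return bandwidth * np.log2(1.0 + gamma_m), bandwidth * np.log2(1.0 + gamma_c)

# Example: check the cellular QoS against an arbitrarily chosen target rate R_th.
r_m, r_c = achievable_rates(p_d=0.1, p_c=0.2, h_m=1e-3, h_c=1e-5,
                            h_c_prime=1e-4, h_m_prime=1e-5,
                            sigma1_sq=1e-9, sigma2_sq=1e-9, bandwidth=1e6)
qos_satisfied = r_c >= 2e6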
Step S2. Establish a deep reinforcement learning model from the achievable data rate and transmit power of the D2D users, the achievable data rate and target data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users.
To efficiently solve the joint optimization problem in a high-dimensional continuous space, a deep reinforcement learning model is established with the D2D user as the agent, and a deep reinforcement learning optimization strategy is proposed using information about the cellular users and the D2D users. Under the premise of guaranteeing the cellular users' QoS, efficient resource allocation is achieved and system capacity is improved by jointly optimizing the D2D users' transmit power and channel allocation strategy.
Based on the cellular and D2D communication system model, and given the cellular users' available channels and transmit power, a deep reinforcement learning model with the D2D user as the agent is established. Reinforcement learning has four main elements: policy, reward, action and environment. The goal of reinforcement learning is to learn an optimal policy so that the actions selected by the agent obtain the maximum reward from the environment. The reward can be computed with a function, also called the reward function. To measure the long-term effect of reinforcement learning, a value function is commonly used in place of the reward function; it measures not only the immediate reward of an action but also the reward accumulated over the sequence of states that may follow. The environment is the state space, the actions are the action space allowed in each state, and the reward is the positive or negative value obtained when some action is selected and some state is entered.
State space: the state space is defined as the cellular user's satisfaction with its quality of service, and the state at moment t is defined as s_m^n(t). If the m-th D2D pair shares the n-th channel, then
s_m^n(t) = 1 if R_c(t) ≥ R_th, and s_m^n(t) = 0 otherwise,
where R_th is the target data rate of the cellular user, R_c(t) is the achievable data rate of the cellular user, and s_m^n(t) is the state at moment t when the m-th D2D pair shares the n-th channel. When R_c(t) ≥ R_th, the QoS of the cellular user on the n-th channel is satisfied and s_m^n(t) = 1; when R_c(t) < R_th, the QoS of the cellular user on the n-th channel is not satisfied and s_m^n(t) = 0.
Action space: the interference to the cellular user can be reduced, and the achievable data rate of the D2D user maximized, by adjusting the D2D user's shared channel or transmit power. At moment t the m-th D2D pair can select only one power level and one shared channel, so the action space of a D2D user contains two variables and is written
a_m(t) = {P_m^d(t), ρ_m^n(t)},
where P_m^d(t) is the transmit power of the m-th D2D pair at moment t, ρ_m^n(t) indicates whether the n-th channel is shared by its cellular user and the m-th D2D pair, and A is the action space formed by the actions a_m(t).
Reward function: a D2D user obtains a corresponding reward for the action it takes. The reward function of a D2D user is defined as
r_m(t) = R_m(t) if R_c(t) ≥ R_th, and r_m(t) = Ψ otherwise,
where R_c(t) is the achievable data rate of the cellular user, R_th is the target data rate of the cellular user, R_m(t) is the achievable data rate of the D2D pair, and Ψ is a negative constant representing the cost of selecting a certain action, i.e. the action cost. When the QoS of the cellular user is satisfied, the reward of the D2D user is its own achievable data rate; otherwise the D2D user pays the cost of having selected that action.
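As a purely illustrative sketch (the constants and names are assumptions, not values from the patent), the state and reward definitions above can be written as:

# Illustrative mapping of the state/reward definitions (PSI and R_TH are assumed values).
PSI = -1.0        # negative action cost Ψ
R_TH = 2e6        # cellular target data rate R_th

def state(r_c):
    """Binary QoS-satisfaction state of the cellular user on the shared channel."""
    return 1.0 if r_c >= R_TH else 0.0

def reward(r_m, r_c):
    """D2D reward: its achievable rate when the cellular QoS holds, else the penalty Ψ."""
    return r_m if r_c >= R_TH else PSI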
In the invention, the deep reinforcement learning algorithm is built on Q-learning, a model-free reinforcement learning algorithm. The valuation function Q(s_m^n(t), a_m(t)) denotes the maximum discounted reward obtained by starting from state s_m^n(t) and then selecting and executing action a_m(t); the Q-value update function is
Q(s_m^n(t), a_m(t)) = r_m(t) + γ·max_{a_m(t+1)∈A} Q(s_m^n(t+1), a_m(t+1)),
where r_m(t) is the reward function and γ is the discount factor, which represents the importance of future rewards: if γ is close to 0, the D2D user mainly considers the immediate reward; if γ is close to 1, the D2D user mainly looks ahead to future rewards. s_m^n(t+1) is the state at moment (t+1) when the m-th D2D pair shares the n-th channel, and a_m(t+1) is the action of the m-th D2D pair at moment (t+1).
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm.
The action space in the deep reinforcement learning model contains two variables, transmit power and shared channel, and the transmit power varies continuously within a certain range. To solve this joint optimization problem in a high-dimensional, and in particular continuous, action space, the Deep Deterministic Policy Gradient (DDPG) algorithm, which combines Q-learning with neural networks on the Actor-Critic (AC) framework, is introduced. The DDPG algorithm has both an Actor policy network and a Critic evaluation network, and the parameters of both networks are optimized by training. DDPG adopts the Actor-Critic architecture of reinforcement learning and consists of four neural networks: two Actor policy networks with identical structure, namely the online Actor policy network and the target Actor policy network, and two Critic evaluation networks with identical structure, namely the online Critic evaluation network and the target Critic evaluation network. The target Actor policy network and the target Critic evaluation network are mainly used to generate the training data set, while the online Actor policy network and the online Critic evaluation network are mainly used in training to optimize the network parameters. As shown in Fig. 3, in the AC framework the Actor is responsible for learning the policy through the policy gradient, and the Critic is responsible for estimating the value function through policy evaluation. On one hand, the Actor learns the policy, and the policy improvement relies on the value function estimated by the Critic; on the other hand, the Critic estimates the value function, which is itself a function of the policy. The policy and the value function depend on and influence each other, and therefore need to be optimized iteratively during training.
The input of the Actor policy network is s_t and its output is an action a_t. The policy network is used for policy function approximation, π(s_t | δ) ≈ π*(s_t), where δ is the Actor policy network parameter. In general, the parameter δ of π(s_t | δ) should be updated in the direction that increases the Q value. Define J(δ) = E_s[Q(a_t, s_t | θ)], where E_s[·] denotes the expectation and a_t = π(s_t | δ); finding the optimal behavior policy of the D2D user is then the process of maximizing J(δ).
The input of the Critic evaluation network is the state of the D2D user at moment t and the action taken, (s_t, a_t), and its output is the corresponding Q(s_t, a_t | θ) and the state s_{t+1}. The Critic evaluation network is used for valuation function approximation, Q(s_t, a_t | θ) ≈ Q*(s_t, a_t), where θ is the Critic evaluation network parameter; the loss function between the target network and the online network is reduced by updating θ:
Loss = E[Q′(s_t, a_t′ | θ′) − Q(s_t, a_t | θ)]²
where Q′(s_t, a_t′ | θ′) is the valuation function of the target network and Q(s_t, a_t | θ) is the valuation function of the online network.
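The patent does not specify network architectures; the following minimal PyTorch sketch, with layer sizes, activations and a tanh power scaling chosen purely for illustration, shows one way such an Actor policy network and Critic evaluation network could be laid out.

import torch
import torch.nn as nn

# Illustrative Actor/Critic shapes only; all sizes are assumptions, not patent values.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_power):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh())
        self.max_power = max_power

    def forward(self, s):
        # Continuous action (e.g. transmit power level) scaled into its allowed range.
        return self.net(s) * self.max_power

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        # Q(s, a | θ): scalar valuation of the state-action pair.
        return self.net(torch.cat([s, a], dim=-1))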
The DDPG optimization algorithm uses experience replay. As a supervised learning model, a deep neural network requires its sample data to be mutually independent, but the samples obtained by the Q-learning procedure are highly correlated in time; if these data sequences were used directly for training, the neural network would overfit and be hard to converge. In the DDPG algorithm, the transition sample (s_t, a_t, r_t, s_{t+1}) obtained by the agent interacting with the environment at each time step is stored in an experience replay pool, and T samples (s_i, a_i, r_i, s_{i+1}), 1 ≤ i ≤ T, are then randomly drawn from the replay pool to train the neural networks, so that the sampled data can be regarded as mutually uncorrelated.
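A minimal sketch of such an experience replay pool (the capacity and interface are illustrative assumptions):

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation of the trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)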
From the sample data (s_i, a_i, r_i, s_{i+1}), J(δ) = E_s[Q(a_i, s_i | θ)] and the loss function Loss = E[Q′(s_i, a_i′ | θ′) − Q(s_i, a_i | θ)]² are obtained, and the neural network parameters are then optimized by gradient descent. The parameter update gradient is the sampled deterministic policy gradient
∇_δ J(δ) ≈ E[ ∇_a Q(s_i, a | θ)|_{a=π(s_i|δ)} · ∇_δ π(s_i | δ) ],
with δ updated along ∇_δ J(δ) and θ updated by gradient descent on the loss function Loss.
The DDPG algorithm improves the learning efficiency of the system and enhances the stability of the learning process. The online networks update their parameters with gradients obtained by algorithms such as Stochastic Gradient Descent, while the target networks update their parameters through soft updates. The target network parameters change only slowly and provide the information needed for the online network updates during training; the online network parameters are updated in real time, and after a specified number of steps the online network parameters are copied to the target networks. The introduction of the target networks makes the learning process more stable and training easier to converge; the system obtained after training for a certain number of iteration steps is the optimal system.
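The soft update of step S307 below, δ′ ← τδ + (1−τ)δ′ and θ′ ← τθ + (1−τ)θ′, can be sketched as follows (the default value of τ is an illustrative assumption):

import torch

def soft_update(target_net, online_net, tau=0.005):
    # Polyak averaging: target <- tau * online + (1 - tau) * target
    with torch.no_grad():
        for tgt, src in zip(target_net.parameters(), online_net.parameters()):
            tgt.data.copy_(tau * src.data + (1.0 - tau) * tgt.data)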
As shown in Fig. 4, optimizing the deep reinforcement learning model with the DDPG algorithm comprises the following steps:
S301. Initialize the training round p to 1;
S302. Initialize the time step t within round p to 1;
S303. The online Actor policy network outputs the action a_t for the input state s_t, obtains the immediate reward r_t and moves to the next state s_{t+1}, yielding the training sample (s_t, a_t, r_t, s_{t+1});
S304. Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. Randomly sample T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, which is fed to the online Actor policy network, the online Critic evaluation network, the target Actor policy network and the target Critic evaluation network;
S306. From the sampled data set, the target Actor policy network outputs the action a'_{i+1} for state s_{i+1}; the target Critic evaluation network takes state s_{i+1} and the action a'_{i+1} output by the target Actor policy network and outputs the valuation function Q'(s_{i+1}, a'_{i+1} | θ') to the loss gradient function; the online Critic evaluation network takes state s_i, action a_i and the immediate reward r_i, outputs the valuation function Q(s_i, a_i | θ) to the sampled policy gradient and the loss function gradient, and updates the online Critic evaluation network parameter θ according to the loss function gradient; the online Actor policy network outputs the action a_i to the sampled policy gradient and updates the online Actor policy network parameter δ according to the sampled policy gradient, 1 ≤ i ≤ T;
S307. Update the target network parameters δ' and θ' from the online network parameters δ and θ, respectively:
δ′←τδ+(1-τ)δ′;
θ′←τθ+(1-τ)θ′,
where τ is the weight of the online network parameters.
S308. Judge whether t < K holds, where K is the total number of time steps in round p; if so, set t = t + 1 and go to step S303, otherwise go to step S309;
S309. Judge whether p < I holds, where I is the set threshold on the number of training rounds; if so, set p = p + 1 and go to step S302, otherwise the optimization ends and the optimized deep reinforcement learning model is obtained.
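For illustration, one gradient update of the S303-S307 loop could look like the sketch below, which reuses the Actor, Critic, ReplayBuffer and soft_update sketches above; the batch size, the discount factor γ, τ, the Adam-style optimizers, and the TD target r_i + γ·Q′(s_{i+1}, π′(s_{i+1})) (the standard DDPG formulation) are all our assumptions, not values taken from the patent.

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, buffer, batch_size=64,
                gamma=0.99, tau=0.005):
    # Step S305: sample a mini-batch of stored transitions.
    batch = buffer.sample(batch_size)
    s, a, r, s_next = zip(*batch)
    s = torch.as_tensor(s, dtype=torch.float32).reshape(batch_size, -1)
    a = torch.as_tensor(a, dtype=torch.float32).reshape(batch_size, -1)
    r = torch.as_tensor(r, dtype=torch.float32).reshape(batch_size, 1)
    s_next = torch.as_tensor(s_next, dtype=torch.float32).reshape(batch_size, -1)

    # Step S306 (Critic): move Q(s_i, a_i | θ) toward the target networks' estimate.
    with torch.no_grad():
        q_target = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Step S306 (Actor): ascend the sampled policy gradient, i.e. maximise Q(s, π(s | δ)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Step S307: soft update of both target networks.
    soft_update(target_actor, actor, tau)
    soft_update(target_critic, critic, tau)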
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
Using the deep reinforcement learning model trained by the DDPG algorithm, a D2D user can obtain the optimal channel allocation and power control strategy: input the state information s_m(t) of the system at the current moment and output the optimal action policy, obtaining the optimal D2D transmit power and allocated channel, so that the capacity of the communication system is improved without affecting the cellular user's QoS.
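A deployment sketch of step S4 (function and variable names are illustrative assumptions): after training, the online Actor maps the current state s_m(t) to the joint transmit-power / channel action.

import torch

def allocate(actor, s_m_t):
    # Map the current state of the m-th D2D pair to its optimal action.
    with torch.no_grad():
        state = torch.as_tensor(s_m_t, dtype=torch.float32).unsqueeze(0)
        action = actor(state)
    return action.squeeze(0)   # e.g. (transmit power, shared-channel choice)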
The above is only a preferred specific embodiment of the application, but the protection scope of the application is not limited thereto. Any change or substitution that can readily be thought of by a person skilled in the art within the technical scope disclosed by the application shall fall within the protection scope of the application. Therefore, the protection scope of the application shall be subject to the protection scope of the claims.

Claims (8)

1. A D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm, characterized in that the D2D users and the cellular users communicate in shared-channel mode, and the method comprises the following steps:
Step S1. Collect the achievable data rate and transmit power of the D2D users, the achievable data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users, and set the target data rate of the cellular users;
Step S2. Establish a deep reinforcement learning model from the achievable data rate and transmit power of the D2D users, the achievable data rate and target data rate of the cellular users, and the channel-sharing information between the D2D users and the cellular users;
Step S3. Optimize the deep reinforcement learning model with the DDPG algorithm;
Step S4. Obtain the optimal D2D transmit power and channel allocation strategy from the optimized deep reinforcement learning model.
2. The D2D user resource allocation method according to claim 1, characterized in that the achievable data rate R_m(t) of the m-th D2D pair at moment t is calculated as
R_m(t) = B·log2(1 + Γ_m(t)), Γ_m(t) = P_m^d(t)·h_m(t) / (P_c·h_c(t) + σ_1²)
where B is the channel bandwidth, Γ_m(t) is the receive SINR of the m-th D2D pair at moment t, P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_m(t) is the channel coefficient between the two devices forming the D2D pair, h_c(t) is the channel coefficient between the cellular user and the D2D user sharing its channel, and σ_1² is the additive white Gaussian noise power in the communication link between the cellular user and the D2D user sharing its channel;
the achievable data rate R_c(t) at moment t of the cellular user sharing its channel with the m-th D2D pair is calculated as
R_c(t) = B·log2(1 + Γ_c(t)), Γ_c(t) = P_c·h_c'(t) / (P_m^d(t)·h_m'(t) + σ_2²)
where B is the channel bandwidth, Γ_c(t) is the receive SINR at moment t of the cellular user sharing its channel with the m-th D2D pair, P_m^d(t) is the transmit power of the m-th D2D pair at moment t, P_c is the transmit power of the cellular user, h_c'(t) is the channel coefficient between the cellular user and the base station, h_m'(t) is the channel coefficient between the D2D user and the base station, σ_2² is the additive white Gaussian noise power in the communication link between the D2D user and the base station, 1 ≤ m ≤ M, and M is the total number of D2D pairs within the signal coverage area of the base station.
3. The D2D user resource allocation method according to claim 1, characterized in that, for the m-th D2D pair, the channel-sharing information at moment t is the channel-sharing indicator ρ_m^n(t): if ρ_m^n(t) = 1, the n-th channel is shared by its cellular user and the m-th D2D pair, and ρ_m^i(t) = 0 for every other channel i ≠ n, where 1 ≤ m ≤ M, 1 ≤ n ≤ N, M is the total number of D2D pairs within the signal coverage area of the base station, and N is the total number of channels available at the base station.
4. The D2D user resource allocation method according to claim 1, characterized in that the established deep reinforcement learning model comprises:
the state space is the cellular user's satisfaction with its quality of service; the state at moment t is defined as s_m^n(t); if the m-th D2D pair shares the n-th channel, then
s_m^n(t) = 1 if R_c(t) ≥ R_th, and s_m^n(t) = 0 otherwise,
where R_th is the target data rate of the cellular user, R_c(t) is the achievable data rate of the cellular user, and s_m^n(t) is the state at moment t when the m-th D2D pair shares the n-th channel;
the action space of a D2D user contains two variables, the transmit power and the shared channel, and is written
a_m(t) = {P_m^d(t), ρ_m^n(t)},
where P_m^d(t) is the transmit power of the m-th D2D pair at moment t and ρ_m^n(t) indicates whether the n-th channel is shared by its cellular user and the m-th D2D pair;
the reward function of a D2D user is
r_m(t) = R_m(t) if R_c(t) ≥ R_th, and r_m(t) = Ψ otherwise,
where R_c(t) is the achievable data rate of the cellular user, R_th is the target data rate of the cellular user, R_m(t) is the achievable data rate of the D2D pair, and Ψ is a negative constant;
the valuation function Q(s_m^n(t), a_m(t)) denotes the discounted reward obtained by starting from state s_m^n(t) and then selecting and executing action a_m(t); the Q-value update function is
Q(s_m^n(t), a_m(t)) = r_m(t) + γ·max_{a_m(t+1)∈A} Q(s_m^n(t+1), a_m(t+1)),
where r_m(t) is the immediate reward function, γ is the discount factor, s_m^n(t+1) is the state at moment (t+1) when the m-th D2D pair shares the n-th channel, a_m(t+1) is the action of the m-th D2D pair at moment (t+1), A is the action space formed by the actions a_m(t), and N is the total number of channels available at the base station.
5. The D2D user resource allocation method according to claim 1, characterized in that optimizing the deep reinforcement learning model with the DDPG algorithm comprises the following steps:
S301. Initialize the training round p to 1;
S302. Initialize the time step t within round p to 1;
S303. The online Actor policy network outputs the action a_t for the input state s_t, obtains the immediate reward r_t and moves to the next state s_{t+1}, yielding the training sample (s_t, a_t, r_t, s_{t+1});
S304. Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;
S305. Randomly sample T training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a data set, which is fed to the online Actor policy network, the online Critic evaluation network, the target Actor policy network and the target Critic evaluation network;
S306. From the sampled data set, the target Actor policy network outputs the action a'_{i+1} for state s_{i+1}; the target Critic evaluation network takes state s_{i+1} and the action a'_{i+1} output by the target Actor policy network and outputs the valuation function Q'(s_{i+1}, a'_{i+1} | θ') to the loss gradient function; the online Critic evaluation network takes state s_i, action a_i and the immediate reward r_i, outputs the valuation function Q(s_i, a_i | θ) to the sampled policy gradient and the loss function gradient, and updates the online Critic evaluation network parameter θ according to the loss function gradient; the online Actor policy network outputs the action a_i to the sampled policy gradient and updates the online Actor policy network parameter δ according to the sampled policy gradient, 1 ≤ i ≤ T;
S307. Update the target network parameters δ' and θ' from the online network parameters δ and θ, respectively:
δ′←τδ+(1-τ)δ′;
θ′←τθ+(1-τ)θ′;
where τ is the weight of the online network parameters;
S308. Judge whether t < K holds, where K is the total number of time steps in round p; if so, set t = t + 1 and go to step S303, otherwise go to step S309;
S309. Judge whether p < I holds, where I is the set threshold on the number of training rounds; if so, set p = p + 1 and go to step S302, otherwise the optimization ends and the optimized deep reinforcement learning model is obtained.
6. The D2D user resource allocation method according to claim 5, characterized in that the parameter update gradient is the sampled deterministic policy gradient
∇_δ J(δ) ≈ E[ ∇_a Q(s_i, a | θ)|_{a=π(s_i|δ)} · ∇_δ π(s_i | δ) ],
with δ updated along ∇_δ J(δ) and θ updated by gradient descent on the loss function Loss.
7. The D2D user resource allocation method according to claim 1, characterized in that step S4 is: input the state information s_m(t) of the system at the current moment and output the optimal action policy, thereby obtaining the optimal D2D transmit power and allocated channel.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium; when executed by a processor, the computer program implements the D2D user resource allocation method according to any one of claims 1 to 7.
CN201910013868.8A 2019-01-08 2019-01-08 D2D user resource allocation method based on deep reinforcement learning DDPG algorithm Active CN109862610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910013868.8A CN109862610B (en) 2019-01-08 2019-01-08 D2D user resource allocation method based on deep reinforcement learning DDPG algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910013868.8A CN109862610B (en) 2019-01-08 2019-01-08 D2D user resource allocation method based on deep reinforcement learning DDPG algorithm

Publications (2)

Publication Number Publication Date
CN109862610A true CN109862610A (en) 2019-06-07
CN109862610B CN109862610B (en) 2020-07-10

Family

ID=66894095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910013868.8A Active CN109862610B (en) 2019-01-08 2019-01-08 D2D user resource allocation method based on deep reinforcement learning DDPG algorithm

Country Status (1)

Country Link
CN (1) CN109862610B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038545A (en) * 2017-12-06 2018-05-15 湖北工业大学 Fast learning algorithm based on Actor-Critic neutral net continuous controls
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
CN108924935A (en) * 2018-07-06 2018-11-30 西北工业大学 A kind of power distribution method in NOMA based on nitrification enhancement power domain

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ACHRAF MOUSSAID等: "Deep Reinforcement Learning-based Data Transmission for D2D Communications", 《2018 14TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB)》 *
EDUARDO BEJAR等: "Deep Reinforcement Learning Based Neuro-Control for a Two-Dimensional Magnetic Positioning System", 《2018 4TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS》 *
JIAYING YIN等: "JOINT CONTENT POPULARITY PREDICTION AND CONTENT DELIVERY POLICY FOR CACHE-ENABLED D2D NETWORKS: A DEEP REINFORCEMENT LEARNING APPROACH", 《2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP)》 *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110267338A (en) * 2019-07-08 2019-09-20 西安电子科技大学 Federated resource distribution and Poewr control method in a kind of D2D communication
CN110518580A (en) * 2019-08-15 2019-11-29 上海电力大学 A kind of active distribution network running optimizatin method for considering microgrid and actively optimizing
CN110518580B (en) * 2019-08-15 2023-04-28 上海电力大学 Active power distribution network operation optimization method considering micro-grid active optimization
CN110505604A (en) * 2019-08-22 2019-11-26 电子科技大学 A kind of method of D2D communication system access frequency spectrum
CN110505604B (en) * 2019-08-22 2021-07-09 电子科技大学 Method for accessing frequency spectrum of D2D communication system
CN110493826B (en) * 2019-08-28 2022-04-12 重庆邮电大学 Heterogeneous cloud wireless access network resource allocation method based on deep reinforcement learning
CN110493826A (en) * 2019-08-28 2019-11-22 重庆邮电大学 A kind of isomery cloud radio access network resources distribution method based on deeply study
CN110784882B (en) * 2019-10-28 2022-06-28 南京邮电大学 Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN110784882A (en) * 2019-10-28 2020-02-11 南京邮电大学 Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN110769514A (en) * 2019-11-08 2020-02-07 山东师范大学 Heterogeneous cellular network D2D communication resource allocation method and system
CN110769514B (en) * 2019-11-08 2023-05-12 山东师范大学 Heterogeneous cellular network D2D communication resource allocation method and system
CN112953601B (en) * 2019-12-10 2023-03-24 中国科学院深圳先进技术研究院 Application of optimization-driven hierarchical deep reinforcement learning in hybrid relay communication
CN112953601A (en) * 2019-12-10 2021-06-11 中国科学院深圳先进技术研究院 Application of optimization-driven hierarchical deep reinforcement learning in hybrid relay communication
CN111083767B (en) * 2019-12-23 2021-07-27 哈尔滨工业大学 Heterogeneous network selection method based on deep reinforcement learning
CN111083767A (en) * 2019-12-23 2020-04-28 哈尔滨工业大学 Heterogeneous network selection method based on deep reinforcement learning
CN111181618A (en) * 2020-01-03 2020-05-19 东南大学 Intelligent reflection surface phase optimization method based on deep reinforcement learning
CN111211831A (en) * 2020-01-13 2020-05-29 东方红卫星移动通信有限公司 Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method
CN111313996A (en) * 2020-03-31 2020-06-19 四川九强通信科技有限公司 AP channel allocation and power control joint optimization method based on reinforcement learning
CN111726811B (en) * 2020-05-26 2023-11-14 国网浙江省电力有限公司嘉兴供电公司 Slice resource allocation method and system for cognitive wireless network
CN111726811A (en) * 2020-05-26 2020-09-29 国网浙江省电力有限公司嘉兴供电公司 Slice resource allocation method and system for cognitive wireless network
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN112202672A (en) * 2020-09-17 2021-01-08 华中科技大学 Network route forwarding method and system based on service quality requirement
CN112202672B (en) * 2020-09-17 2021-07-02 华中科技大学 Network route forwarding method and system based on service quality requirement
CN112019249A (en) * 2020-10-22 2020-12-01 中山大学 Intelligent reflecting surface regulation and control method and device based on deep reinforcement learning
CN112383965B (en) * 2020-11-02 2023-04-07 哈尔滨工业大学 Cognitive radio power distribution method based on DRQN and multi-sensor model
CN112383965A (en) * 2020-11-02 2021-02-19 哈尔滨工业大学 Cognitive radio power distribution method based on DRQN and multi-sensor model
CN112492686A (en) * 2020-11-13 2021-03-12 辽宁工程技术大学 Cellular network power distribution method based on deep double-Q network
CN112492686B (en) * 2020-11-13 2023-10-13 辽宁工程技术大学 Cellular network power distribution method based on deep double Q network
CN112533237A (en) * 2020-11-16 2021-03-19 北京科技大学 Network capacity optimization method for supporting large-scale equipment communication in industrial internet
CN112492691A (en) * 2020-11-26 2021-03-12 辽宁工程技术大学 Downlink NOMA power distribution method of deep certainty strategy gradient
CN112492691B (en) * 2020-11-26 2024-03-26 辽宁工程技术大学 Downlink NOMA power distribution method of depth deterministic strategy gradient
CN112511197A (en) * 2020-12-01 2021-03-16 南京工业大学 Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning
CN112601284B (en) * 2020-12-07 2023-02-28 南京邮电大学 Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning
CN112601284A (en) * 2020-12-07 2021-04-02 南京邮电大学 Downlink multi-cell OFDMA resource allocation method based on multi-agent deep reinforcement learning
CN112991384A (en) * 2021-01-27 2021-06-18 西安电子科技大学 DDPG-based intelligent cognitive management method for emission resources
CN112991384B (en) * 2021-01-27 2023-04-18 西安电子科技大学 DDPG-based intelligent cognitive management method for emission resources
CN113093124A (en) * 2021-04-07 2021-07-09 哈尔滨工程大学 DQN algorithm-based real-time allocation method for radar interference resources
CN113115344B (en) * 2021-04-19 2021-12-14 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113115344A (en) * 2021-04-19 2021-07-13 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113163426A (en) * 2021-04-25 2021-07-23 东南大学 High-density AP distribution scene GCN-DDPG wireless local area network parameter optimization method and system
CN113115355A (en) * 2021-04-29 2021-07-13 电子科技大学 Power distribution method based on deep reinforcement learning in D2D system
CN113473419B (en) * 2021-05-20 2023-07-07 南京邮电大学 Method for accessing machine type communication device into cellular data network based on reinforcement learning
CN113473419A (en) * 2021-05-20 2021-10-01 南京邮电大学 Method for accessing machine type communication equipment to cellular data network based on reinforcement learning
CN113453358A (en) * 2021-06-11 2021-09-28 南京信息工程大学滨江学院 Joint resource allocation method of wireless energy-carrying D2D network
CN113342537B (en) * 2021-07-05 2023-11-14 中国传媒大学 Satellite virtual resource allocation method, device, storage medium and equipment
CN113342537A (en) * 2021-07-05 2021-09-03 中国传媒大学 Satellite virtual resource allocation method, device, storage medium and equipment
CN113766661B (en) * 2021-08-30 2023-12-26 北京邮电大学 Interference control method and system for wireless network environment
CN113766661A (en) * 2021-08-30 2021-12-07 北京邮电大学 Interference control method and system for wireless network environment
CN113795049A (en) * 2021-09-15 2021-12-14 马鞍山学院 Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning
CN113795049B (en) * 2021-09-15 2024-02-02 马鞍山学院 Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning
CN113923605A (en) * 2021-10-25 2022-01-11 浙江大学 Distributed edge learning system and method for industrial internet
CN113991654A (en) * 2021-10-28 2022-01-28 东华大学 Energy internet hybrid energy system and scheduling method thereof
CN113991654B (en) * 2021-10-28 2024-01-23 东华大学 Energy internet hybrid energy system and scheduling method thereof
CN114423070A (en) * 2022-02-10 2022-04-29 吉林大学 D2D-based heterogeneous wireless network power distribution method and system
CN114423070B (en) * 2022-02-10 2024-03-19 吉林大学 Heterogeneous wireless network power distribution method and system based on D2D
CN114630299A (en) * 2022-03-08 2022-06-14 南京理工大学 Information age-perceptible resource allocation method based on deep reinforcement learning
CN114630299B (en) * 2022-03-08 2024-04-23 南京理工大学 Information age perceivable resource allocation method based on deep reinforcement learning
CN114727316B (en) * 2022-03-29 2023-01-06 江南大学 Internet of things transmission method and device based on depth certainty strategy
CN114727316A (en) * 2022-03-29 2022-07-08 江南大学 Internet of things transmission method and device based on depth certainty strategy
CN115002720A (en) * 2022-06-02 2022-09-02 中山大学 Internet of vehicles channel resource optimization method and system based on deep reinforcement learning
CN116367223A (en) * 2023-03-30 2023-06-30 广州爱浦路网络技术有限公司 XR service optimization method and device based on reinforcement learning, electronic equipment and storage medium
CN116367223B (en) * 2023-03-30 2024-01-02 广州爱浦路网络技术有限公司 XR service optimization method and device based on reinforcement learning, electronic equipment and storage medium
CN116739323B (en) * 2023-08-16 2023-11-10 北京航天晨信科技有限责任公司 Intelligent evaluation method and system for emergency resource scheduling
CN116739323A (en) * 2023-08-16 2023-09-12 北京航天晨信科技有限责任公司 Intelligent evaluation method and system for emergency resource scheduling

Also Published As

Publication number Publication date
CN109862610B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN109862610A (en) A kind of D2D subscriber resource distribution method based on deeply study DDPG algorithm
CN109803344B (en) A kind of unmanned plane network topology and routing joint mapping method
Wilhelmi et al. Collaborative spatial reuse in wireless networks via selfish multi-armed bandits
Li et al. Incentive mechanisms for device-to-device communications
CN110493826A (en) A kind of isomery cloud radio access network resources distribution method based on deeply study
CN109474980A (en) A kind of wireless network resource distribution method based on depth enhancing study
CN102006658B (en) Chain game based synergetic transmission method in wireless sensor network
CN109729528A (en) A kind of D2D resource allocation methods based on the study of multiple agent deeply
Zhou et al. The partial computation offloading strategy based on game theory for multi-user in mobile edge computing environment
CN104579523B (en) Cognition wireless network frequency spectrum perception and the access united optimization method of decision-making
CN102833759B (en) Cognitive radio spectrum allocation method enabling OFDM (orthogonal frequency division multiplexing) master user to realize maximum revenue
CN102438313B (en) Communication alliance dispatching method based on CR (cognitive radio)
Ji et al. Power optimization in device-to-device communications: A deep reinforcement learning approach with dynamic reward
CN109819422B (en) Stackelberg game-based heterogeneous Internet of vehicles multi-mode communication method
CN107105455A (en) It is a kind of that load-balancing method is accessed based on the user perceived from backhaul
CN113316154A (en) Authorized and unauthorized D2D communication resource joint intelligent distribution method
Han et al. Joint resource allocation in underwater acoustic communication networks: A game-based hierarchical adversarial multiplayer multiarmed bandit algorithm
CN113795049A (en) Femtocell heterogeneous network power self-adaptive optimization method based on deep reinforcement learning
Yan et al. Self-imitation learning-based inter-cell interference coordination in autonomous HetNets
Mohanavel et al. Deep Reinforcement Learning for Energy Efficient Routing and Throughput Maximization in Various Networks
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
Benamor et al. Mean field game-theoretic framework for distributed power control in hybrid noma
CN103957565B (en) Resource allocation methods based on target SINR in distributed wireless networks
Balcı et al. Fairness aware deep reinforcement learning for grant-free NOMA-IoT networks
CN115811788A (en) D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190607

Assignee: WUHAN JINGLI ELECTRONIC TECHNOLOGY Co.,Ltd.

Assignor: HUAZHONG University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2022420000134

Denomination of invention: A D2D User Resource Allocation Method Based on Deep Reinforcement Learning DDPG Algorithm

Granted publication date: 20200710

License type: Common License

Record date: 20221125

EE01 Entry into force of recordation of patent licensing contract