CN113115355A - Power distribution method based on deep reinforcement learning in D2D system - Google Patents
- Publication number
- CN113115355A CN113115355A CN202110475005.XA CN202110475005A CN113115355A CN 113115355 A CN113115355 A CN 113115355A CN 202110475005 A CN202110475005 A CN 202110475005A CN 113115355 A CN113115355 A CN 113115355A
- Authority
- CN
- China
- Prior art keywords
- network
- link
- power
- agent
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0473—Wireless resource allocation based on the type of the allocated resource the resource being transmission power
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/06—Testing, supervising or monitoring using simulated traffic
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention belongs to the technical field of wireless communication, and particularly relates to a power allocation method based on deep reinforcement learning in a D2D system. In the scheme of the invention, a deep neural network is constructed independently for each link pair, and the channel information of all links need not be obtained in real time. Each link predicts the communication environment around it from partial historical information and the decision information of other links, and the link pairs cooperate with one another to make real-time power decisions that maximize the weighted sum rate of the whole network, thereby realizing an iteration-free power allocation method based on deep reinforcement learning.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a power distribution method based on deep reinforcement learning in a D2D system.
Background
Network operators worldwide have shown strong interest in the development and application of 5G. A basic idea of 5G is to exploit direct connections between mobile users to relieve the burden on the base station. To improve the energy efficiency of cellular networks and increase system throughput, device-to-device (D2D) communication is considered a viable solution. In a D2D network, multiple D2D link pairs coexist with full frequency reuse in a cell, so the interference between links becomes very complex. In the D2D scenario, system capacity is generally optimized by managing interference through power control. Most conventional power control algorithms rely on continuous iteration over real-time channel information, and real-time power adjustment is very difficult due to time-consuming channel estimation and complex matrix operations.
Disclosure of Invention
Aiming at the problems of traditional power control, the invention provides an iteration-free power allocation method based on deep reinforcement learning in a D2D system.
The technical scheme of the invention is as follows:
A power allocation method based on deep reinforcement learning in a D2D system, assuming there are N link pairs, i.e. N agents, in the D2D system, includes the following steps:
S1, information collection: each of the N link pairs receives outdated channel and power information, together with the power decision information of the other links, from a central controller (CC) to obtain its own observation vector;
S2, network construction: each link pair independently creates a network and establishes its own experience storage pool (Replay Buffer);
S3, online decision and network training: each link makes an online power decision from the outdated observation vector collected in step S1 at the previous moment, and stores the state, action, reward and observation vector obtained from the agent's interaction with the environment into the experience pool. Meanwhile, each link randomly selects a batch of data from its experience pool to train the network built in S2 and update the network parameters; the network with updated parameters is used for the next online decision.
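The per-link loop of steps S1-S3 can be sketched as follows; this is a minimal illustration under our own naming, with a random placeholder policy standing in for the actor network described later:

```python
import random
from collections import deque

class LinkAgent:
    """Minimal per-link sketch of steps S1-S3 (illustrative names)."""
    def __init__(self, p_max=38.0, buffer_size=10000, batch_size=32):
        self.p_max = p_max
        self.replay = deque(maxlen=buffer_size)   # S2: per-link experience pool
        self.batch_size = batch_size

    def decide(self, obs):
        # S3: online power decision from the outdated observation vector (S1).
        # Placeholder policy; the patent's scheme uses the actor network here.
        return random.uniform(0.0, self.p_max)

    def step(self, obs, env_step):
        action = self.decide(obs)
        state, reward, next_obs = env_step(action)             # interact with environment
        self.replay.append((state, action, reward, next_obs))  # store transition
        if len(self.replay) >= self.batch_size:
            batch = random.sample(self.replay, self.batch_size)
            self.train(batch)                                  # update network parameters
        return action

    def train(self, batch):
        pass  # actor/critic gradient updates would go here
```

The updated parameters are then used at the next decision, so decision quality improves as training proceeds.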
The invention provides a power control model based on a deep reinforcement learning network, which mainly comprises the following parts of online decision and training:
Data: the D2D system provides channel information and power data for the offline module and the online module, respectively. For the offline module, the D2D system provides labeled sample data as a training set; for the online module, the D2D system provides (unlabeled) sampled data as the data to be decided on.
Network construction: a network is constructed independently for each link according to a specific format; the network is responsible for giving the specific power decision and the loss function of the network from the input information.
Online training: through online training, continuous power allocation is treated as a multi-agent cooperation task. The system establishes a fixed-size experience pool (Replay Buffer) for each link pair to store data. Each link pair independently takes data out of its own experience pool; the output of online-trained reinforcement learning can then be modeled as a posterior probability, from which a cost function suitable for power allocation is developed (for example, the cost function based on the maximum posterior probability designed by the invention). Given a training set, a trained network is obtained through continuous online training and feedback.
Online decision: during online training, the power allocation result output by the network is used as the real-time power allocation result. Meanwhile, the data collected during online decisions is stored into the experience pool as training data for later training. The effect of the online decisions improves as online training proceeds.
For the input and hidden layers of the deep neural network, the invention uses a ReLU-type activation function in its smooth (softplus) form

ReLU(x) = log(1 + exp(x))
The output layer uses the tanh function to determine the final power output level; the output value is then mapped to a transmit power as described below.
The power allocation mechanism based on multi-agent deep reinforcement learning provided by the invention is a universal reinforcement learning framework; it is applicable to any type of network and can be generalized to different networks.
The advantage of the scheme of the invention is that a deep neural network is constructed independently for each link pair, and the channel information of all links need not be obtained in real time. The communication environment around the current link is predicted from partial historical information and the decision information of other links, and all link pairs cooperate to make real-time power decisions that maximize the weighted sum rate of the whole network, thereby realizing an iteration-free power allocation method based on deep reinforcement learning.
Drawings
Fig. 1 shows a D2D communication system model in the present invention;
fig. 2 shows a frame structure of a D2D communication system in the present invention;
FIG. 3 shows a network structure of each pair of links in the present invention;
FIG. 4 shows a power decision flow for each pair of link users in the present invention;
FIG. 5 illustrates a performance comparison between the reinforcement-learning-based power allocation scheme proposed by the present invention and other power allocation schemes for different numbers of test links;
FIG. 6 shows the actor network training loss variation of a pair of links in the reinforcement-learning-based power allocation scheme proposed by the present invention;
FIG. 7 shows the critic network training loss variation of a pair of links in the reinforcement-learning-based power allocation scheme proposed by the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Fig. 1 shows the D2D network model in the present invention. The system is composed of a cellular mobile communication system and a D2D communication system. In this example, the macrocell base station reserves a small portion of dedicated cellular spectrum for the D2D communication system, so the cellular mobile communication system and the D2D communication system do not interfere with each other, and the macrocell base station only serves as a relay that helps the D2D communication devices exchange small amounts of control information with low delay. Assume there are M D2D communication devices and 1 channel in this example system, as shown in Fig. 1. Define g_ij^t as the channel gain from transmitter i to receiver j at frame t. The time correlation of the channel is modeled as a first-order Gauss-Markov process, and the Jakes model is used to express the change of the small-scale fading of frame t, namely
g_ij^t = θ_ij |h_ij^t|^2, with h_ij^t = ρ h_ij^{t-1} + sqrt(1 − ρ^2) e_ij^t,

where θ_ij represents the large-scale fading, which obeys the ITU-1411 outdoor model for short range with 5 MHz bandwidth and 2.4 GHz carrier frequency and depends on the distance between the two communication nodes; h_ij^t represents the small-scale fading, whose value at time zero obeys a complex Gaussian distribution with mean μ and variance σ^2, and whose innovations e_ij^t are independent, identically distributed CN(0, 1) random variables; ρ is the correlation coefficient between adjacent frame channels, ρ = J_0(2π f_d T), where J_0 denotes the zeroth-order Bessel function of the first kind and f_d is the maximum Doppler frequency. The small-scale fading remains constant within one frame but varies from frame to frame. The signal-to-interference-plus-noise ratio (SINR) of user i at time t is

γ_i^t = p_i^t g_ii^t / ( Σ_{j≠i} p_j^t g_ji^t + σ^2 ),

where p_i^t denotes the transmit power of user i at time t, p^t = (p_1^t, …, p_N^t) denotes the power vector of all users in the network at time t, and σ^2 represents the power of the additive white Gaussian noise. The rate of user i at time t is

C_i^t = log2(1 + γ_i^t).
The present invention aims to find an efficient power allocation scheme that maximizes the weighted sum of the rates of all D2D users, i.e.

maximize_{p^t} Σ_{i=1}^{N} w_i^t C_i^t, subject to 0 ≤ p_i^t ≤ P_max,

where w_i^t represents the weight of user i at time t, typically assigned according to the user's long-term average rate. The weights ensure user fairness in the network: users with poor channel conditions are allowed more transmission opportunities.
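The SINR, per-user rate, and weighted sum rate defined above can be computed as follows. This is a minimal sketch; the layout of the gain matrix G (row = transmitter, column = receiver) is our assumption:

```python
import numpy as np

def sinr(p, G, noise):
    """SINR of each user: gamma_i = p_i*G[i,i] / (sum_{j!=i} p_j*G[j,i] + noise).
    G[i, j] is the channel gain from transmitter i to receiver j."""
    p = np.asarray(p, dtype=float)
    desired = p * np.diag(G)          # p_i * g_ii for each link i
    received = G.T @ p                # total power arriving at each receiver
    return desired / (received - desired + noise)

def weighted_sum_rate(p, G, noise, w):
    """Objective: sum_i w_i * log2(1 + gamma_i)."""
    return float(np.sum(np.asarray(w) * np.log2(1.0 + sinr(p, G, noise))))
```

For example, with an identity gain matrix (no cross-link interference), unit powers, and unit noise, each link sees SINR 1 and rate 1 bit/s/Hz.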
In a large D2D network it is practically difficult to obtain real-time CSI because of the large overhead and latency of the backhaul network. The invention therefore assumes that only past information is available, so only the expectation of the real-time weighted sum rate can be maximized:

maximize_{p^t} E[ Σ_{i=1}^{N} w_i^t C_i^t | H^t ],

where the past information H^t consists of the outdated channel gains and power decisions collected by the central controller. The problem in the above equation is non-convex and difficult to solve with conventional optimization methods, requiring high-dimensional integration and complex matrix operations. The invention uses deep reinforcement learning to obtain the power allocation result directly from past information, skipping the complex matrix operations.
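The first-order Gauss-Markov fading model with ρ = J_0(2π f_d T) described above can be sketched as follows; the numerical midpoint-rule evaluation of J_0 and the illustrative frame length T = 1 ms are our own choices:

```python
import math
import random

def bessel_j0(x, n=2000):
    """Zeroth-order Bessel function of the first kind via the integral
    J0(x) = (1/pi) * int_0^pi cos(x*sin(theta)) dtheta (midpoint rule)."""
    h = math.pi / n
    return sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi

def next_fading(h_prev, rho, rng=random):
    """First-order Gauss-Markov step: h_t = rho*h_{t-1} + sqrt(1-rho^2)*e_t,
    with innovation e_t ~ CN(0, 1)."""
    e = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    return rho * h_prev + math.sqrt(1.0 - rho * rho) * e

# rho = J0(2*pi*f_d*T), e.g. f_d = 10 Hz and T = 1 ms (T is illustrative)
rho = bessel_j0(2.0 * math.pi * 10.0 * 0.001)
```

With ρ close to 1 the channel is strongly correlated across frames, so outdated observations remain informative, which is what the scheme relies on.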
Fig. 2 shows the frame structure of user communications in the D2D network in the present invention. Each D2D link pair divides the data frame of one time slot into three parts. In the first part (the frame header), the D2D link pair receives the outdated interference information and the previous power decision information from the CC, and then feeds the processed information into the neural network for the power decision. In the second part, the D2D link pair transmits data at the allocated power while collecting interference information in real time. Finally, in the third part (the frame tail), the interference information at this time and the link's own power allocation information are transmitted back to the CC.
Fig. 3 shows the reinforcement learning network structure of each pair of links in the present invention, and Fig. 4 shows the decision flow of the algorithm. There are three main components in the network: the Replay Buffer, the Main Net and the Target Net.
The Replay Buffer is responsible for storing the sample data tuples generated by the Main Net. During network training, data tuples are taken out of the Replay Buffer according to a certain strategy, which can be random sampling or some designed weighted selection strategy.
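A fixed-size replay buffer with the random or weighted take-out strategies just described might look like this (a sketch; the class and method names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience pool; the oldest tuples are evicted when full."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, k, weights=None):
        # Uniform random sampling by default; an optional weight vector
        # implements a designed selection strategy (e.g. favouring recent data).
        pool = list(self.data)
        if weights is None:
            return random.sample(pool, k)
        return random.choices(pool, weights=weights, k=k)

    def __len__(self):
        return len(self.data)
```

The `maxlen` deque gives the fixed-size behaviour for free: pushing into a full buffer silently drops the oldest transition.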
The Main Net and the Target Net have exactly the same network structure; each contains an actor network and a critic network. The actor network receives the state information of the link and outputs a power decision value, and the critic network evaluates the actor's current output, i.e. judges whether the power decision is good.
The Main Net has two functions: it generates real-time data tuples and stores them into the Replay Buffer, and it is updated in real time after the actor and critic networks calculate their loss functions.
The Target Net has only one role: providing the target Q value when calculating the loss function. It fixes the Q value to stabilize the network, preventing the target value from jumping continuously and degrading training. The parameters of the Target Net are overwritten by the parameters of the Main Net after a fixed period of time or a fixed number of training steps.
The more important variables in the network are introduced below:
1) Action space A: in each time slot, each agent needs to decide its own transmit power level. The network in the invention does not need power discretization; that is, it can make power decisions over a continuous action space, which conventional algorithms cannot do. The action space A is therefore defined as the continuous interval [0, P_max], and the dimension of the network's action space is infinite. For link pair i, the action a_i^t in time slot t is a real number chosen arbitrarily from [0, P_max]. The decision vector stored into the experience pool for the current link is defined from these actions.
2) State space S: as the basis for power decisions, the state must provide the network with enough information for the agent to have sufficient knowledge of the surrounding communication environment and to support correct decisions. In a communication network, the communication environment around a link consists of three parts: the communication quality between the link's own transmitter and receiver, the interference of the local transmitter to other receivers, and the interference of other transmitters to the local receiver. With these three pieces of information, the link can sense the surrounding communication environment. Define s_i^t as the state information set of agent i in time slot t, and K as the number of state information items. The components of s_i^t are detailed below.
First, for a particular D2D pair, it is the local CSI that best represents the communication quality between the current transmitter and receiver, so:
another determinant factor affecting the rate of the link is power information:
third, the rate of the link at the last time may also represent the communication environment around the link:
the interference of the link transmitter to other receivers is expressed as:
the interference of other link transmitters to the local link receiver is represented as:
sixthly, in the algorithm, the network can make an accurate power decision by sensing the communication environment of the surrounding links when making an independent decision. Therefore, it is necessary to inform the channel information of the links around the link. Thus:
in the above formula, d ═ rank (a, b) means that a ranks the d-th bit in descending order of values in the set b from large to small.
3) Reward function r: in order to make each link aware of the surrounding communication environment while maximizing the global sum rate, three parts are considered in the design of the reward function.
First, the most direct feedback for measuring the quality of a link's power allocation must be the transmission rate itself, so the first component of the reward function is:
Second, it is desirable that the links learn to cooperate with each other: if the reward function contained only the link's own rate, a link would certainly cause large interference to the surrounding links, so the interference information around the link is also added to the reward function. The interference information is mainly divided into two categories: the first is the interference caused by the link to other link pairs through its transmission; the second is the interference caused by other links to the current link.
wherein:
the first expression represents the rate of link j after the interference caused by link i is removed. In addition:
the second expression represents the rate that the current link could achieve if none of the remaining links affected the current link i.
The meaning of reward function (17) is that the rate of link i at the current moment is reduced by the influence of the current link on the actual rates of the other links, and increased by the influence of the other links on its own rate.
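One concrete form consistent with this description is sketched below; the exact equation (17) is not recoverable from the text, so this version (link i's rate minus the rate loss it inflicts on each neighbour, a form common in distributed power-control work) is an assumption:

```python
import numpy as np

def rates(p, G, noise):
    """Per-link rates log2(1 + SINR) for power vector p and gain matrix G
    (G[i, j]: gain from transmitter i to receiver j)."""
    p = np.asarray(p, dtype=float)
    desired = p * np.diag(G)
    interference = G.T @ p - desired
    return np.log2(1.0 + desired / (interference + noise))

def reward_i(i, p, G, noise):
    """Assumed sketch of eq. (17): r_i = C_i - sum_{j!=i} (C_j^{-i} - C_j),
    where C_j^{-i} is link j's rate with link i's interference removed."""
    C = rates(p, G, noise)
    p_silent = np.array(p, dtype=float)
    p_silent[i] = 0.0                    # remove link i's transmission
    C_no_i = rates(p_silent, G, noise)   # neighbours' rates without i's interference
    penalty = sum(C_no_i[j] - C[j] for j in range(len(p)) if j != i)
    return float(C[i] - penalty)
```

When there is no cross-link interference the penalty vanishes and the reward reduces to the link's own rate.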
The overall algorithm flow for link i is as follows:
first, the current state is determinedInputting into main network to obtain currentAndand combining action training of other linksObtaining the state vector of the next momentWill be provided withStored as tuples into a data experience pool.
Second, the sampled data tuples are input directly into the Main Net to obtain the evaluation value corresponding to the current latest policy.
Third, the next-moment data s_i^{t+1} in the data tuple is input into the Target Net to calculate the next-moment action a_i^{t+1} of the current link, and, together with the next-moment actions of the other links, an evaluation value is calculated.
Finally, the loss function is calculated from the above information to update the Main Net. In addition, the network updates the parameters of the Target Net in a soft-update manner, i.e. the parameters move a small step toward the Main Net at each training iteration, which reduces the variance of the network. It is also worth emphasizing that the output of the actor network after the tanh activation function lies in the range (-1, 1), which does not directly correspond to a power value; a mapping between the actor network output x and the power p_i is therefore designed:
p_i = P_max × (x + 1)/2 (22)
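Equation (22) and the soft update of the Target Net can be sketched as follows; the soft-update rate tau is illustrative, since the patent does not give its value:

```python
def tanh_to_power(x, p_max):
    """Eq. (22): map actor output x in (-1, 1) to a power in (0, p_max)."""
    return p_max * (x + 1.0) / 2.0

def soft_update(target_params, main_params, tau=0.005):
    """Soft target update: theta' <- tau*theta + (1 - tau)*theta'.
    Parameters are represented here as plain name->value dicts.
    (tau = 0.005 is an assumed illustrative value.)"""
    return {k: tau * main_params[k] + (1.0 - tau) * target_params[k]
            for k in target_params}
```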
In the following, the performance of the proposed solution is illustrated by simulation results. First, consider a network of 4 D2D link pairs. The transmitters of all link pairs are randomly distributed in a square area with side length 50 m, and the distance between the receiver and transmitter of each link pair is uniformly distributed between 2 m and 50 m. The maximum transmit power of a D2D transmitter is set to P_max = 38 dBm, the background noise power to σ^2 = -114 dBm, the Doppler shift to 10 Hz, and the correlation coefficient between adjacent channels to ρ = 0.01. The path loss model is 32.45 + 20 log10(f) + 20 log10(d) - G_t - G_r (in dB), where f (MHz) is the carrier frequency, d (km) is the distance, G_t denotes the transmit antenna gain, and G_r denotes the receive antenna gain. The invention sets f = 2.4 GHz and G_t = G_r = 2.5 dB. The multi-agent deep reinforcement learning algorithm is implemented using TensorFlow.
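The path loss model above, with the stated antenna gains as defaults, can be computed as:

```python
import math

def path_loss_db(f_mhz, d_km, g_t_db=2.5, g_r_db=2.5):
    """PL = 32.45 + 20*log10(f[MHz]) + 20*log10(d[km]) - Gt - Gr (in dB)."""
    return (32.45 + 20.0 * math.log10(f_mhz) + 20.0 * math.log10(d_km)
            - g_t_db - g_r_db)
```

For example, at f = 2400 MHz and d = 0.05 km (the 50 m maximum link distance), the loss is about 69 dB.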
FIG. 5 compares the performance of the proposed multi-agent reinforcement-learning-based power allocation scheme with other power allocation schemes in different test areas. The three comparison algorithms are a full power transmission strategy (MPT), an FP scheme utilizing real-time channel information, and the AA scheme. With only 4 link pairs, the network of the invention stabilizes after about 60,000 training steps, and its performance is even more surprising: the proposed algorithm is about 20% better than the FP algorithm and about 50% better than the AA algorithm. Showing such performance on only four links demonstrates the effectiveness of the algorithm. It is worth emphasizing that training is performed while the positions of the 4 links change constantly; only changing link positions can test whether the network really learns to use the interference data around the links to infer the real-time communication environment and make decisions. Some previous reinforcement learning algorithms are trained with the geographic positions of the links fixed; although this can achieve good results, it is meaningless in an actual communication system, because the positions of link pairs cannot stay fixed forever, and once they change those algorithms become invalid and must be retrained. The significance of the present algorithm is that the network needs no retraining while the positions of the link pairs change constantly, so it remains effective at all times.
The loss variation of the reinforcement learning networks during training is shown below, taking agent 1 as an example, so that the unsupervised framework of the algorithm is clearer and more intuitive. First, Fig. 6 shows the loss curve of the actor network of one link pair. The loss function of the actor network keeps increasing until about 40,000 steps, indicating that the performance of the network is deteriorating. After about 40,000 training steps, the network finally explores a strategy that reduces the loss function, so the loss then decreases continuously; after 60,000 training steps it finally stabilizes. Second, the loss function of the critic network is shown in Fig. 7; the critic loss is to be minimized so as to reduce the gap between the actual and expected Q values. Within the first 30,000 training steps, the change of the critic loss is irregular: the network is still exploring, the randomness of the actions is relatively high, and different strategies are continuously tried. Consistent with the trend of the actor network, the loss function of the critic network also stabilizes after about 40,000 training steps.
Claims (3)
1. A power allocation method based on deep reinforcement learning in a D2D system, wherein N link pairs, i.e. N agents, are assumed in the D2D system, the method comprising the following steps:
S1, each agent receives outdated channel and power information, together with the power decision information of the other links, from the central controller to obtain its own observation vector;
S2, each agent independently creates a power allocation network based on deep learning and establishes an experience storage pool;
S3, based on the outdated observation vector of the previous moment obtained in step S1, each agent makes an online decision with the power allocation network to obtain the power allocation result at the current moment, stores the state, action, reward and observation vector obtained from the agent's interaction with the environment into the experience pool, meanwhile takes data out of its own experience storage pool to train the network and update the network parameters, and uses the network with updated parameters for the next online decision.
2. The power allocation method based on deep reinforcement learning in a D2D system of claim 1, wherein in step S2, the specific structure of the power allocation network created by each agent individually is as follows: the power allocation network comprises a Main network for training and a Target network for calculation, and the input and output of the Main network are connected with the experience storage pool;
the structures of the Main network and the Target network are completely the same, each comprising an actor network for receiving the state information of a link and outputting a power decision value, and a critic network for evaluating the current output; the Main network is updated in real time after the actor and critic networks calculate the loss functions, and the Target network is used to calculate a target Q value, fixing the Q value to stabilize the network.
3. The power allocation method based on deep reinforcement learning in a D2D system of claim 2, wherein in step S3, the state, action and reward obtained by the interaction between the agent and the environment are defined as follows:
the state is defined as the state information set of agent i in time slot t, with K the number of state information items, comprising: the channel gain from transmitter i to receiver j at the previous moment; the power information at the previous moment; the interference of the link's transmitter to other receivers; σ^2, the power of the additive white Gaussian noise; the interference suffered by the link's receiver from other link transmitters; the rate of the link at the previous moment; the SINR of user i at time t; and the channel information of the links around the current link, all taken from past information;
the action space is defined as [0, Pmax]; for agent i, the decision vector the agent currently stores into the experience pool is defined from its action in time slot t, a real number chosen arbitrarily from [0, Pmax], where Pmax is the maximum power;
the reward is defined from the weight w and the rate of link j after the interference generated by link i is removed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110475005.XA CN113115355B (en) | 2021-04-29 | 2021-04-29 | Power distribution method based on deep reinforcement learning in D2D system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110475005.XA CN113115355B (en) | 2021-04-29 | 2021-04-29 | Power distribution method based on deep reinforcement learning in D2D system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113115355A true CN113115355A (en) | 2021-07-13 |
CN113115355B CN113115355B (en) | 2022-04-22 |
Family
ID=76720455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110475005.XA Active CN113115355B (en) | 2021-04-29 | 2021-04-29 | Power distribution method based on deep reinforcement learning in D2D system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113115355B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114257994A (en) * | 2021-11-25 | 2022-03-29 | 西安电子科技大学 | D2D network robust power control method, system, equipment and terminal |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109474980A * | 2018-12-14 | 2019-03-15 | University of Science and Technology Beijing | Wireless network resource allocation method based on deep reinforcement learning
CN109729528A * | 2018-12-21 | 2019-05-07 | Beijing University of Posts and Telecommunications | D2D resource allocation method based on multi-agent deep reinforcement learning
CN109862610A * | 2019-01-08 | 2019-06-07 | Huazhong University of Science and Technology | D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm
CN110213814A * | 2019-07-04 | 2019-09-06 | University of Electronic Science and Technology of China | Distributed power allocation method based on deep neural networks
US20190370086A1 * | 2019-08-15 | 2019-12-05 | Intel Corporation | Methods and apparatus to manage power of deep learning accelerator systems
WO2020135312A1 * | 2018-12-26 | 2020-07-02 | Shanghai Jiao Tong University | Artificial neural network-based dynamic positioning and thrust allocation apparatus and method
CN111901862A * | 2020-07-07 | 2020-11-06 | Xi'an Jiaotong University | User clustering and power allocation method, device and medium based on deep Q-network
CN112261725A * | 2020-10-23 | 2021-01-22 | Anhui University of Science and Technology | Intelligent decision method for data packet transmission based on deep reinforcement learning
- 2021-04-29: CN application CN202110475005.XA granted as patent CN113115355B (status: Active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109474980A * | 2018-12-14 | 2019-03-15 | University of Science and Technology Beijing | Wireless network resource allocation method based on deep reinforcement learning
CN109729528A * | 2018-12-21 | 2019-05-07 | Beijing University of Posts and Telecommunications | D2D resource allocation method based on multi-agent deep reinforcement learning
WO2020135312A1 * | 2018-12-26 | 2020-07-02 | Shanghai Jiao Tong University | Artificial neural network-based dynamic positioning and thrust allocation apparatus and method
CN109862610A * | 2019-01-08 | 2019-06-07 | Huazhong University of Science and Technology | D2D user resource allocation method based on the deep reinforcement learning DDPG algorithm
CN110213814A * | 2019-07-04 | 2019-09-06 | University of Electronic Science and Technology of China | Distributed power allocation method based on deep neural networks
US20190370086A1 * | 2019-08-15 | 2019-12-05 | Intel Corporation | Methods and apparatus to manage power of deep learning accelerator systems
CN112396172A * | 2019-08-15 | 2021-02-23 | Intel Corporation | Method and apparatus for managing power of deep learning accelerator system
CN111901862A * | 2020-07-07 | 2020-11-06 | Xi'an Jiaotong University | User clustering and power allocation method, device and medium based on deep Q-network
CN112261725A * | 2020-10-23 | 2021-01-22 | Anhui University of Science and Technology | Intelligent decision method for data packet transmission based on deep reinforcement learning
Non-Patent Citations (2)
Title |
---|
JIAQI SHI: "Distributed Deep Learning Power Allocation for D2D Network Based on Outdated Information", 2020 IEEE Wireless Communications and Networking Conference (WCNC) *
LYU YAPING: "Deep Learning Based Downlink Power Allocation for Femtocell Base Stations", Computer Engineering *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114257994A * | 2021-11-25 | 2022-03-29 | Xidian University | Robust power control method, system, device and terminal for D2D networks
CN114257994B | 2021-11-25 | 2024-04-26 | Xidian University | Robust power control method, system, device and terminal for D2D networks
Also Published As
Publication number | Publication date |
---|---|
CN113115355B (en) | 2022-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3635505B1 (en) | System and method for deep learning and wireless network optimization using deep learning | |
US10375585B2 (en) | System and method for deep learning and wireless network optimization using deep learning | |
Li et al. | Downlink transmit power control in ultra-dense UAV network based on mean field game and deep reinforcement learning | |
CN109962728B (en) | Multi-node joint power control method based on deep reinforcement learning | |
US11533115B2 (en) | Systems and methods for wireless signal configuration by a neural network | |
CN110213814B (en) | Distributed power distribution method based on deep neural network | |
WO2021036414A1 (en) | Co-channel interference prediction method for satellite-to-ground downlink under low earth orbit satellite constellation | |
CN111526592B (en) | Non-cooperative multi-agent power control method used in wireless interference channel | |
CN114698128B (en) | Anti-interference channel selection method and system for cognitive satellite-ground network | |
US11284361B2 (en) | System and method for device-to-device communication | |
CN106604288B | Adaptive on-demand coverage allocation method and apparatus for nodes in wireless sensor networks |
CN113239632A (en) | Wireless performance prediction method and device, electronic equipment and storage medium | |
Adeel et al. | Critical analysis of learning algorithms in random neural network based cognitive engine for LTE systems |
CN113115355B (en) | Power distribution method based on deep reinforcement learning in D2D system | |
CN115499921A (en) | Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network | |
CN113382060B (en) | Unmanned aerial vehicle track optimization method and system in Internet of things data collection | |
Bhadauria et al. | QoS based deep reinforcement learning for V2X resource allocation | |
CN110505604B (en) | Method for accessing frequency spectrum of D2D communication system | |
CN115811788B (en) | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning | |
CN110753367B (en) | Safety performance prediction method for mobile communication system | |
Liu et al. | A deep reinforcement learning based adaptive transmission strategy in space-air-ground integrated networks | |
CN113747386A (en) | Intelligent power control method in cognitive radio network spectrum sharing | |
Ren et al. | Joint spectrum allocation and power control in vehicular communications based on dueling double DQN | |
CN114268348A | Cell-free massive MIMO power allocation method based on deep reinforcement learning |
Adeel et al. | Random neural network based power controller for inter-cell interference coordination in LTE-UL |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |