CN111491358B - Adaptive modulation and power control system based on energy acquisition and optimization method - Google Patents

Adaptive modulation and power control system based on energy acquisition and optimization method Download PDF

Info

Publication number
CN111491358B
Authority
CN
China
Prior art keywords
transmitter
action
power
average
receiver
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010325108.3A
Other languages
Chinese (zh)
Other versions
CN111491358A (en
Inventor
杨佳雨
胡杰
杨鲲
冷甦鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202010325108.3A priority Critical patent/CN111491358B/en
Publication of CN111491358A publication Critical patent/CN111491358A/en
Application granted granted Critical
Publication of CN111491358B publication Critical patent/CN111491358B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04TPC
    • H04W52/18TPC being performed according to specific parameters
    • H04W52/26TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
    • H04W52/262TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account adaptive modulation and coding [AMC] scheme
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an adaptive modulation and power control system and optimization method based on energy harvesting, applied to the technical field of wireless communication networks. The system comprises a transmitter, a receiver, a Rayleigh fading channel, and a channel estimation module. According to the feedback information from the channel estimation module, the transmitter adaptively adjusts its transmit power and modulation mode under the constraints of an average power limit, an average bit error rate limit, and an average energy harvesting limit, while the receiver adaptively adjusts the power splitting factor. The receiver comprises a rechargeable battery; it stores a part of the received energy in the battery by power splitting, and the remaining energy is used to transmit data to the transmitter through the Rayleigh fading channel. The invention effectively addresses the energy-supply problem of low-power receivers in the future Internet of Things and realizes the vision of a green network.

Description

Adaptive modulation and power control system based on energy acquisition and optimization method
Technical Field
The invention belongs to the technical field of wireless communication networks, and particularly relates to a self-adaptive link technology applied to an SWIPT system.
Background
In recent years, Simultaneous Wireless Information and Power Transfer (SWIPT) has received considerable attention as a means of extending the lifetime of energy-limited nodes. In a SWIPT application scenario, a transmitter sends information and energy to a receiver over a wireless channel. In a conventional transmission scheme, the modulation mode and transmit power are fixed, which is called a non-adaptive scheme. Such a scheme does not take full advantage of a time-varying fading channel: to guarantee reliable transmission in all states of the time-varying channel, a non-adaptive system is designed for the worst-case channel state, and this design principle results in inefficient use of the channel capacity. In order to obtain maximum throughput under different channel conditions, it is necessary to introduce adaptive link techniques (including adaptive modulation, adaptive power control, and adaptive energy transfer control) into the SWIPT system.
In addition, artificial intelligence techniques have matured rapidly. Because they allow machines and devices to perceive their surroundings more intelligently, much as humans do, and to react to the environment, they are now applied in many fields. In the field of communications, artificial intelligence techniques are also applied at the various communication layers. For example, the physical layer may perform intelligent modulation and coding by deep learning, the MAC layer may perform resource allocation according to reinforcement learning, and the network layer may intelligently help each device find an optimal route. The combination of communication and machine learning is making networks more intelligent.
Unlike conventional adaptive link techniques, in a SWIPT system the receiver operates using only the energy collected from the energy signal received over the wireless channel, so there is a trade-off between the amount of information transferred and the amount of energy transferred; the adaptive link control scheme must therefore be designed to optimize the collected energy simultaneously with the throughput, thereby ensuring the performance and stability of the system. Although conventional optimization methods consider a time-varying channel, the related research assumes that the channel transition probabilities are known to the system, which is not reasonable because channel transition probabilities are difficult to estimate accurately in the real world.
Disclosure of Invention
In order to solve the technical problems, the invention provides an adaptive modulation and power control scheme and an optimization method based on energy collection and deep reinforcement learning.
The technical scheme adopted by the invention is as follows: an adaptive modulation link control system, comprising a transmitter, a receiver, a Rayleigh fading channel, and a channel estimation module;
the transmitter adaptively adjusts the transmitting power and the modulation mode of the transmitter under the constraints of average power limit, average bit error rate limit and average energy harvesting limit according to the feedback information of the channel estimation module;
the receiver comprises a rechargeable battery; the receiver stores a part of the received energy in the battery by power splitting, and the remaining energy is used to transmit data to the transmitter through the Rayleigh fading channel.
The transmitter maintains two deep neural networks, denoted respectively the target network and the evaluation network. The target network is used to select the action strategy and to output the expected return value r_t + α·max Q(s_{t+1}, a_{t+1}) corresponding to the selected action strategy, where r_t denotes the reward function; the evaluation network is used to estimate the current value function Q(s_t, a_t). The action strategy refers to the modulation mode.
The second technical scheme adopted by the invention is as follows: a deep neural network optimization method based on deep reinforcement learning comprises the following steps:
B1, randomly initializing the weight parameters θ of the evaluation network and the weights θ⁻ of the target network;
B2, the transmitter obtains from the current evaluation network the action a_t that maximizes Q(s_t, a_t); the action a_t is executed with probability 1−ε, and with probability ε an action is randomly selected from the action candidate set for exploration;
B3, each action corresponds to a reward function value, and the state of the transmitter transitions from s_t to s_{t+1};
B4, a sliding window is used to control the samples (s_t, a_t, r_t, s_{t+1}) stored in the experience pool, where s_t denotes the state of the transmitter at time t, s_{t+1} denotes the state of the transmitter at time t+1, and r_t denotes the reward function value at time t;
B5, the evaluation network and the target network take samples from the experience pool and update the network parameters by a back-propagation algorithm based on gradient descent;
B6, the weight parameters of the evaluation network are assigned to the target network so that θ⁻ = θ.
The action candidate set in step B2 is specifically formed as follows: the transmitter obtains from the evaluation network the action strategy a_t corresponding to the maximum value function Q(s_t, a_t), and selects the action strategies whose modulation order is the same as or adjacent to that of a_t to form the action candidate set.
The reward function of step B3 is set according to constraints including average power limit, average bit error rate limit, and average energy harvesting limit.
When the constraint is satisfied, the reward function takes the value of the spectrum utilization rate corresponding to the action;
when the constraint is not satisfied, the reward function takes a negative value equal to the degree to which the constraint is not satisfied.
Step B4 further includes initially setting the sliding window to 2.
The invention has the following beneficial effects: by combining energy harvesting with wireless communication, it effectively addresses the energy-supply problem of future low-power Internet of Things receivers and realizes the vision of a green network. Meanwhile, based on deep reinforcement learning, intelligent decisions are made for the intelligent nodes in the network, and a Prioritized Experience Generation (PEG) technique is used to improve the convergence of the deep reinforcement learning algorithm, so that the algorithm can converge and learn a higher-performance strategy in the data-and-energy integrated transmission scenario. Applying this strategy to data-and-energy integrated cooperative transmission makes the wireless network more intelligent.
Drawings
Fig. 1 is a flowchart of an adaptive link control design and optimization method based on energy collection and deep reinforcement learning according to the present invention.
Fig. 2 is a system diagram of adaptive modulation, adaptive power control and adaptive energy control according to the present invention.
FIG. 3 illustrates where PEG takes effect in the reinforcement learning algorithm of the present invention, compared with conventional Prioritized Experience Replay (PER);
wherein Fig. 3(a) shows that the prioritized experience replay mechanism is effective when good experience and bad experience are balanced; Fig. 3(b) shows a case where PER fails: when the experience pool contains only bad experience, effective learning cannot be achieved even with PER; Fig. 3(c) illustrates the principle of the Prioritized Experience Generation (PEG) proposed by the present invention.
Fig. 4 is a deep reinforcement learning DQN algorithm framework of an embodiment of the invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the following further explains the technical contents of the present invention with reference to fig. 1 to 4.
For the understanding of the present invention, the following terms will be explained first:
WPT: wireless power transfer.
WIT: wireless information transfer.
h: the channel gain.
(s_t, a_t, r_t, s_{t+1}): transition format of the reinforcement learning algorithm.
ε-greedy strategy: exploration strategy of the original DQN.
γ: instantaneous signal-to-noise ratio.
B̄ER: average bit error rate performance.
P̄_EH: average energy harvesting performance.
P̄_t: average transmit power.
ρ_PS: power splitting factor.
Mod: modulation mode.
P_0: target value of the average harvested energy constraint.
P_t0: target value of the average transmit power constraint.
BER_0: target value of the average bit error rate constraint.
In the case where the channel transition probability is unknown, a reinforcement learning based approach may be very effective. In the reinforcement learning method, the optimal control strategy is learned by repeatedly interacting with the environment (i.e., the channel) without assuming prior information of the channel transition probability.
The invention provides a self-adaptive link control design and optimization method based on energy collection and deep reinforcement learning, which comprises the following steps as shown in figure 1:
s1, constructing a self-adaptive link control system (including self-adaptive power control, self-adaptive modulation and self-adaptive energy control) based on an energy acquisition technology;
in this embodiment, a point-to-point SWIPT system consisting of one intelligent transmitter and one receiver is considered. They all have only one antenna. The receiver is assumed to have a rechargeable battery. The receiver uses power splitting to store a portion of the received energy in a battery, and the remaining energy is used to transmit data to the transmitter. The transmitter adaptively adjusts the transmitting power and the modulation mode of the transmitter under the constraints of average power limit, average bit error rate limit and average energy harvesting limit according to the feedback information of channel estimation, and the receiver adaptively adjusts the modulation rate division factor. For example, the transmitter makes adjustments to the transmit power based on various performance indicators of the current system operation, selects an appropriate modulation scheme based on a policy, and stabilizes the average performance indicator within the performance constraints by dividing the power, some for energy transmission, some for data transmission, and adaptively adjusting the ratio during division (adaptive power division factor control).
We consider that the transmitter has complete Channel State Information (CSI). Assuming the wireless channel experiences quasi-static, Rayleigh flat fading, the downlink channel gain from the transmitter to the receiver can be expressed as
g = |h|²·α
where α denotes the large-scale fading component, including path loss and log-normal shadow fading, which remains unchanged over multiple time slots. Based on a first-order Gauss-Markov process, the invention considers a correlated time-varying fading channel in which the small-scale Rayleigh fading component h varies as
h_t = ρ·h_{t−1} + e_t
where h ~ CN(0, 1) is a circularly symmetric complex Gaussian (CSCG) random variable with unit variance, the channel innovation process e_1, e_2, … consists of independent and identically distributed CSCG random variables satisfying CN(0, 1−ρ²), and the correlation coefficient is ρ = J_0(2π·f_d·T), where J_0(·) is the zeroth-order Bessel function of the first kind and f_d is the maximum Doppler frequency.
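This correlated fading model can be simulated directly. The following Python sketch is an illustration only; the Doppler frequency f_d, slot duration T, and number of slots are assumed values not specified in the text, and scipy.special.j0 supplies the zeroth-order Bessel function of the first kind.

import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind

def simulate_rayleigh(num_slots, f_d=10.0, T=1e-3, seed=0):
    # h_t = rho * h_{t-1} + e_t, with e_t ~ CN(0, 1 - rho^2) and rho = J0(2*pi*f_d*T)
    rng = np.random.default_rng(seed)
    rho = j0(2 * np.pi * f_d * T)
    h = np.zeros(num_slots, dtype=complex)
    h[0] = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # h_0 ~ CN(0, 1)
    for t in range(1, num_slots):
        e = (rng.standard_normal() + 1j * rng.standard_normal()) * np.sqrt((1 - rho ** 2) / 2)
        h[t] = rho * h[t - 1] + e
    return h

h = simulate_rayleigh(1000)
g_small = np.abs(h) ** 2  # multiply by the large-scale component alpha to obtain g = |h|^2 * alpha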
The correlation profile of the channel (i.e., the probability distribution of channel transitions) is assumed to be unknown to the system. The distance between the transmitter and the receiver is d and the path loss exponent is λ. Suppose the transmit power of the transmitter is P_t, the average received power at the energy harvesting branch is P_r, the power splitting factor is ρ_PS, and the noise power is σ²; the received signal-to-noise ratio γ can then be expressed as
γ = (1 − ρ_PS)·P_t·g / (d^λ·σ²)
For energy harvesting, we use a general linear model and assume the EH circuit conversion efficiency is a constant η. For ease of analysis, the symbol period is set to 1. Considering the EH power threshold, the EH circuit output power P_EH is determined by the following formula, where P_th denotes the received power threshold and (a)⁺ denotes max(a, 0):
P_EH = η·(P_r − P_th)⁺
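As an illustration of the received-power split and the thresholded linear EH model, a small Python helper is sketched below. It assumes, as in the reconstruction above, that ρ_PS is the fraction of received power routed to the energy harvesting branch and that the remaining fraction feeds information reception; the numeric defaults (η, P_th, σ²) are placeholders, not values taken from the patent.

def receiver_split(P_t, g, d, lam, rho_ps, sigma2=1e-9, eta=0.6, P_th=1e-6):
    # Total received power after distance-dependent path loss
    P_rx = P_t * g / (d ** lam)
    # Power entering the EH circuit and its output P_EH = eta * (P_r - P_th)^+
    P_r = rho_ps * P_rx
    P_eh = eta * max(P_r - P_th, 0.0)
    # SNR of the information branch: gamma = (1 - rho_PS) * P_t * g / (d^lambda * sigma^2)
    snr = (1.0 - rho_ps) * P_rx / sigma2
    return snr, P_eh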
S2, designing a reinforcement learning improvement, PEG, suited to the adaptive modulation scenario, in view of the shortcomings of conventional deep reinforcement learning. Prioritized experience generation consists of two parts: first, enabling the intelligent transmitter to explore efficiently during training; second, controlling the state s_{t+1} of the next experience.
Enabling the intelligent transmitter to explore efficiently during training: when performing an action, only those actions that may become part of the optimal strategy are selected (actions that do not cause the average performance to fluctuate too much), while clearly inferior actions (actions that cause the average performance indicators to drift significantly) are not considered, so no computing power is wasted on trial-and-error learning of bad action strategies. Compared with the original DQN algorithm, the ε-greedy strategy is therefore no longer used for exploration. Combining this with the rules of the adaptive modulation scenario: once an action strategy that roughly satisfies the current performance constraints has been found, a higher-performance strategy can only be obtained by exploring modulation modes whose order is the same as or adjacent to that of the current action strategy. According to this characteristic of the adaptive modulation scenario, a new exploration strategy is designed: each time the current best strategy a_t is obtained, exploration is performed with probability ε (ε may take a larger value, e.g., 0.4, at the beginning of exploration to enhance searching; it then gradually decreases with training and finally decays to 0.05). The action candidate set is updated according to the above rule (i.e., modulation modes with the same or adjacent order as the current action strategy are selected), and an action strategy is then randomly selected from this reduced candidate set for training. After the algorithm has learned a reasonably good decision, this exploration strategy reduces the action space to be searched and accelerates the process of finding the optimal decision scheme. The action candidate set here is a subset of the action space.
The selectable modulation modes include BPSK, QPSK, 8QAM, 16QAM, 64QAM, 256QAM and other constellations. For example, if the modulation mode corresponding to the current a_t is 64QAM, the action candidate set in this embodiment consists of 16QAM, 64QAM, and 256QAM; QPSK (4QAM) is not selected because the difference in order is too large.
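A minimal Python sketch of this candidate-set rule follows; the ordered modulation list matches the example in the preceding paragraph, and treating the list index as the "order" is an implementation assumption.

MODULATIONS = ["BPSK", "QPSK", "8QAM", "16QAM", "64QAM", "256QAM"]  # lowest to highest order

def candidate_modulations(current_mod):
    # Keep only the current modulation and its immediate neighbours in order
    i = MODULATIONS.index(current_mod)
    return MODULATIONS[max(0, i - 1): i + 2]

print(candidate_modulations("64QAM"))  # ['16QAM', '64QAM', '256QAM'], as in the embodiment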
Controlling the state s_{t+1} of the next experience and reducing transitions that deviate too much from the optimal strategy: in the scenario of this embodiment, it is desirable that the state sequences experienced during training all satisfy the performance constraints. Learning from experience near the performance targets makes it easier to learn a useful strategy, converge quickly, and obtain better performance. However, during algorithm training, since the intelligent transmitter has not yet fully learned an appropriate strategy, the strategies obtained by "trial-and-error" exploration produce transitions that violate the constraints, and the experience generated in such cases has no learning value for subsequent exploration. To this end, a "forgetting mechanism" (implemented with a sliding window) is introduced: after the intelligent transmitter makes a mistake, the influence of that mistake is forgotten after a short time. For example, after a single erroneous action is performed, the average bit error rate performance deviates from the constraint range; after several state transitions, that action slides out of the window and the average bit error rate performance returns to a state that better satisfies the constraint. (In the initial stage of exploration, because the algorithm has not yet learned a good strategy and generates a large amount of bad experience, the sliding window can take a small value, e.g., storing only 2 steps of information; as training progresses, the window is gradually enlarged so that the algorithm takes longer-term performance into account.) In this way the algorithm can automatically escape from bad states within a short time, and more state transitions that satisfy the performance constraints appear in the experience pool.
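The "forgetting mechanism" can be realized with a fixed-length buffer whose window is enlarged as training progresses. A small Python sketch follows; the class name and interface are illustrative, not prescribed by the text.

from collections import deque

class SlidingAverage:
    # Average a performance metric over a sliding window so that a single bad
    # action stops influencing the averaged state after a few slots
    def __init__(self, window=2):        # the window starts small (2) early in training
        self.buf = deque(maxlen=window)

    def update(self, value):
        self.buf.append(value)
        return sum(self.buf) / len(self.buf)

    def grow(self, new_window):          # enlarge the window as training progresses
        self.buf = deque(self.buf, maxlen=new_window)

avg_ber = SlidingAverage(window=2)
for ber in (1e-3, 5e-2, 1e-3, 1e-3):     # the single bad slot is forgotten after two updates
    print(avg_ber.update(ber))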
Different experiences influence the algorithm differently: experience that satisfies the current constraints allows the intelligent transmitter to better learn a high-performance strategy, whereas experience that deviates from the constraints is of little help and can even be harmful. Experience that meets the constraints is hereinafter referred to as good experience, and experience that deviates from the constraints as bad experience. As shown in Fig. 3(a), when the ratio of good to bad experience is balanced, the Prioritized Experience Replay (PER) mechanism can be effective, i.e., the learning effect is improved by sampling good experience more frequently; Fig. 3(b) shows a case where PER fails: when the experience pool contains only bad experience, effective learning cannot be achieved even with PER; Fig. 3(c) shows the working principle of the Prioritized Experience Generation (PEG) proposed by the present invention, which uses simple Experience Replay (ER) and modifies the agent's exploration of the environment (① in Fig. 3(c)) and the process of generating samples from the environment and putting them into the experience pool (② in Fig. 3(c)), so that more experience satisfying the current constraints is placed in the experience pool.
S3, making optimization decisions based on deep reinforcement learning for the intelligent transmitter in the system, comprising the following steps:
s31, determining the error rate performance and the energy harvesting performance of the receiver;
[Formulas for the receiver's bit error rate performance, energy harvesting performance, and their running averages]
s32, determining the state value and the state space of the deep reinforcement learning of the transmitter;
As can be seen from the optimization problem of this embodiment, the optimization objective (i.e., the average channel capacity) is closely related to the current signal-to-noise ratio γ, the current channel quality h, and the power splitting factor ρ_PS. In addition, because of the constraints on the average power, average bit error rate, and so on, the state must reflect the changes in these environment and performance quantities. The state is defined as follows:
s_t = (h_{t−1}, h_t, P_t^{(t−1)}, Mod^{(t−1)}, γ^{(t−1)}, P̄_EH^{(t−1)}, B̄ER^{(t−1)}, P̄_t^{(t−1)})
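For concreteness, the state tuple can be packed into a real-valued vector before being fed to the neural networks. The sketch below is an assumption about the encoding (complex channel gains represented by their magnitudes, the modulation mode by an integer index), which the text does not specify.

import numpy as np

def build_state(h_prev, h_curr, p_prev, mod_index_prev, snr_prev, avg_eh, avg_ber, avg_pt):
    # Flatten s_t = (h_{t-1}, h_t, P_t^{(t-1)}, Mod^{(t-1)}, gamma^{(t-1)},
    #               avg P_EH, avg BER, avg P_t) into a vector for the DQN input layer
    return np.array([abs(h_prev), abs(h_curr), p_prev, mod_index_prev,
                     snr_prev, avg_eh, avg_ber, avg_pt], dtype=np.float32)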
s33, determining the action value and the action space of the deep reinforcement learning of the transmitter;
In time slot t, the transmitter determines the modulation mode Mod^{(t)} of the signal at time t, the transmit power P_t^{(t)}, and the power splitting factor ρ_PS^{(t)}. The action of the transmitter in time slot t is therefore
a_t = (Mod^{(t)}, P_t^{(t)}, ρ_PS^{(t)})
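Because DQN requires a finite action set, the triple (Mod, P_t, ρ_PS) is typically discretized. The grids in the sketch below are purely illustrative; the text does not specify the power or splitting-factor levels.

from itertools import product

MODS = ["BPSK", "QPSK", "8QAM", "16QAM", "64QAM", "256QAM"]
POWER_LEVELS = [0.1, 0.5, 1.0, 2.0]      # example transmit power levels (W)
RHO_PS_LEVELS = [0.2, 0.4, 0.6, 0.8]     # example power splitting factors

ACTIONS = list(product(MODS, POWER_LEVELS, RHO_PS_LEVELS))
print(len(ACTIONS))                      # 96 discrete actions, one per DQN output neuron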
S34, determining a return function of deep reinforcement learning of the transmitter;
When the constraints are satisfied, the reward is the spectrum utilization rate achieved by the executed action, R(s, a) = C.
When the constraints are not satisfied, the reward is the negative of the degree to which the constraints are violated; the specific formulas are as follows.
R(s, a) = R_PEH + R_BER + R_PT
R_PEH = −(P_0 − P̄_EH)⁺
R_BER = −(B̄ER − BER_0)⁺
R_PT = −(P̄_t − P_t0)⁺
where (·)⁺ denotes max(·, 0), so each penalty term is non-zero only when the corresponding constraint is violated.
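Under the penalty form reconstructed above (one-sided deviations from the three targets), the reward can be computed as in the following sketch. The function signature and the rule of returning C only when every penalty term vanishes are assumptions consistent with the text, not a verbatim formula from the patent.

def reward(C, avg_eh, avg_ber, avg_pt, P0, BER0, Pt0):
    # R_PEH, R_BER, R_PT: negative one-sided constraint violations
    r_peh = -max(P0 - avg_eh, 0.0)       # harvested energy below its target P_0
    r_ber = -max(avg_ber - BER0, 0.0)    # average BER above its target BER_0
    r_pt = -max(avg_pt - Pt0, 0.0)       # average transmit power above its target P_t0
    penalty = r_peh + r_ber + r_pt
    # Spectrum utilization C when all constraints hold, the (negative) penalty otherwise
    return C if penalty == 0.0 else penalty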
S35, the energy-harvesting-based adaptive link control transmitter performs deep reinforcement learning and decision making.
The intelligent transmitter maintains two deep neural networks, a target network and an evaluation network; the evaluation network is responsible for estimating the system return, and the target network is responsible for selecting the action value. At the beginning of time t, the intelligent transmitter first feeds its current state s_t into the target network, the target network outputs the expected return value of each action, and the intelligent transmitter selects the action a_t with the maximum expected return. The intelligent transmitter then computes the current average bit error rate, average transmit power, and average energy harvesting power to obtain the next state value s_{t+1}. The current state-action-reward-next-state tuple (s_t, a_t, r_t, s_{t+1}) is then stored in a memory buffer; the buffer size can be chosen as 1000, i.e., 1000 state-transition samples are stored. A mini-batch of data, e.g., 64 samples, is then selected from the memory buffer, and the weight parameters θ of the neural network are updated by back-propagation using mini-batch gradient descent. The neural network is a fully connected network with 3 hidden layers, and the activation function is the hyperbolic tangent function tanh. The deep reinforcement learning process of the intelligent transmitter is shown in Fig. 4.
The deep reinforcement learning process is specifically as follows. First, the weight parameters θ of the evaluation network and the weights θ⁻ of the target network are randomly initialized. Then the following cycle is carried out: the agent obtains from the current evaluation network the action a_t that maximizes Q(s_t, a_t); the action a_t is executed with probability 1−ε, and with probability ε an action is randomly selected from the action candidate set for exploration. Each action yields a reward value r_t and causes the state of the intelligent transmitter to transition from s_t to s_{t+1}; in forming the state s_{t+1}, the PEG technique is used (i.e., the sliding window controls how the average performance parameters mentioned above are updated). Once the complete tuple (s_t, a_t, r_t, s_{t+1}) is obtained, it is stored in the experience pool. The evaluation network and the target network take samples from the experience pool and update the network parameters by a back-propagation algorithm based on gradient descent. For example, after every 100 exploration steps, the parameters of the evaluation network are assigned to the target network so that θ⁻ = θ.
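A condensed PyTorch sketch of this training loop is given below. The hyper-parameters taken from the text are the experience pool of 1000 transitions, the mini-batch of 64 samples, the fully connected network with 3 tanh hidden layers, the ε schedule decaying from 0.4 towards 0.05, and the target-network synchronization every 100 steps; the state dimension, action count, hidden width, learning rate, discount factor, and the stand-in environment transition are assumptions made only so that the sketch is self-contained.

import random
from collections import deque
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 8, 96           # 8 state entries; 96 discretized actions (see above)

def make_net():
    # Fully connected network with 3 hidden layers and tanh activations, as in the text
    return nn.Sequential(
        nn.Linear(STATE_DIM, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, NUM_ACTIONS))

eval_net, target_net = make_net(), make_net()
target_net.load_state_dict(eval_net.state_dict())          # B1: theta_minus = theta
optimizer = torch.optim.SGD(eval_net.parameters(), lr=1e-3)
memory = deque(maxlen=1000)                                 # experience pool of 1000 transitions
gamma, eps, batch_size = 0.9, 0.4, 64                       # eps decays from 0.4 towards 0.05

def select_action(state, candidates):
    # B2: greedy w.r.t. the evaluation network with prob. 1-eps, otherwise a random
    # action from the (restricted) candidate set -- the PEG exploration rule
    if random.random() < eps:
        return random.choice(candidates)
    with torch.no_grad():
        q = eval_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax())

def learn():
    # B5: mini-batch gradient step on the temporal-difference error
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)
    s, a, r, s2 = map(np.array, zip(*batch))
    s = torch.as_tensor(s, dtype=torch.float32)
    s2 = torch.as_tensor(s2, dtype=torch.float32)
    a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
    r = torch.as_tensor(r, dtype=torch.float32)
    q = eval_net(s).gather(1, a).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(1).values   # r_t + gamma * max Q(s_{t+1}, .)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Stand-in environment: in the real system, the chosen (modulation, power, rho_PS)
# action would be applied and the next state and reward computed as described above.
state = np.zeros(STATE_DIM, dtype=np.float32)
for step in range(1000):
    action = select_action(state, candidates=list(range(NUM_ACTIONS)))  # PEG would restrict this set
    next_state = np.random.randn(STATE_DIM).astype(np.float32)          # stand-in transition
    r = 0.0                                                             # stand-in reward
    memory.append((state, action, r, next_state))                       # B4 (window control omitted)
    learn()
    if step % 100 == 0:                                                 # B6: periodic target sync
        target_net.load_state_dict(eval_net.state_dict())
    state = next_state
    eps = max(0.05, eps * 0.995)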
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and the scope of protection is not limited to the specifically recited embodiments and examples. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (3)

1. An adaptive modulation link control system, comprising a transmitter, a receiver, a Rayleigh fading channel, and a channel estimation module;
the transmitter adaptively adjusts the transmitting power and the modulation mode of the transmitter under the constraints of average power limit, average bit error rate limit and average energy harvesting limit according to the feedback information of the channel estimation module;
the receiver comprises a rechargeable battery; the receiver stores a part of the received energy in the battery by power splitting, and the remaining energy is used for transmitting data to the transmitter through the Rayleigh fading channel;
the process of the adaptive modulation link control based on the adaptive modulation link control system is as follows:
a1, determining the error rate performance and the energy harvesting performance of a receiver;
a2, determining the state value and the state space of the deep reinforcement learning of the transmitter; the state value is recorded as s_t:
s_t = (h_{t−1}, h_t, P_t^{(t−1)}, Mod^{(t−1)}, γ^{(t−1)}, P̄_EH^{(t−1)}, B̄ER^{(t−1)}, P̄_t^{(t−1)})
wherein h_{t−1} denotes the channel quality corresponding to time slot t−1, h_t denotes the channel quality corresponding to time slot t, P_t^{(t−1)} denotes the transmit power corresponding to time slot t−1, Mod^{(t−1)} denotes the modulation mode corresponding to time slot t−1, γ^{(t−1)} denotes the signal-to-noise ratio corresponding to time slot t−1, P̄_EH^{(t−1)} denotes the average energy harvesting performance corresponding to time slot t−1, B̄ER^{(t−1)} denotes the average bit error rate performance corresponding to time slot t−1, and P̄_t^{(t−1)} denotes the average transmit power corresponding to time slot t−1;
a3, determining the action value and the action space of the deep reinforcement learning of the transmitter;
in time slot t, the transmitter determines the modulation mode Mod^{(t)} of the signal at time t, the transmit power P_t^{(t)}, and the power splitting factor ρ_PS^{(t)};
the action space of the transmitter in time slot t is:
a_t = (Mod^{(t)}, P_t^{(t)}, ρ_PS^{(t)});
a4, determining the reward value r_t of the deep reinforcement learning of the transmitter;
when the constraint is satisfied, the reward value r_t is the spectrum utilization rate R(s_t, a_t) = C_t corresponding to the executed action;
when the constraint is not satisfied, the reward value r_t is a negative value equal to the degree to which the constraint is not satisfied;
a5, carrying out deep reinforcement learning and decision making based on an improved prioritized experience generation method; the transmitter maintains two deep neural networks, denoted respectively the target network and the evaluation network, wherein the target network is used for selecting the action strategy and outputting the expected reward value r_t + α·max Q(s_{t+1}, a_{t+1}) corresponding to the selected action strategy, and the evaluation network is used for estimating the current value function Q(s_t, a_t);
step a5 specifically includes the following steps:
B1, randomly initializing the weight parameters θ of the evaluation network and the weights θ⁻ of the target network;
B2, the transmitter obtains from the current evaluation network the action a_t that maximizes Q(s_t, a_t); the action a_t is executed with probability 1−ε, and with probability ε an action is randomly selected from the action candidate set for exploration; the action candidate set in step B2 is specifically formed as follows: the transmitter obtains from the evaluation network the action strategy a_t corresponding to the maximum value function Q(s_t, a_t), and selects the action strategies whose order is the same as or adjacent to that of a_t to form the action candidate set;
B3, each action corresponds to a reward value r_t and causes the state of the intelligent transmitter to transition from s_t to s_{t+1};
B4, a sliding window is used to control the samples (s_t, a_t, r_t, s_{t+1}) stored in the experience pool, where s_t denotes the state of the transmitter at time t and s_{t+1} denotes the state of the transmitter at time t+1;
B5, the evaluation network and the target network take samples from the experience pool, and update the network parameters by a back-propagation algorithm based on gradient descent;
B6, the parameters of the evaluation network are assigned to the target network so that θ⁻ = θ.
2. The adaptive modulation link control system of claim 1, wherein the reward value r_t of step B3 is set according to constraints including an average power limit, an average bit error rate limit, and an average energy harvesting limit.
3. The adaptive modulation link control system of claim 2, wherein the step B4 further comprises initially setting the sliding window to 2.
CN202010325108.3A 2020-04-23 2020-04-23 Adaptive modulation and power control system based on energy acquisition and optimization method Active CN111491358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010325108.3A CN111491358B (en) 2020-04-23 2020-04-23 Adaptive modulation and power control system based on energy acquisition and optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010325108.3A CN111491358B (en) 2020-04-23 2020-04-23 Adaptive modulation and power control system based on energy acquisition and optimization method

Publications (2)

Publication Number Publication Date
CN111491358A CN111491358A (en) 2020-08-04
CN111491358B (en) 2021-10-26

Family

ID=71813667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010325108.3A Active CN111491358B (en) 2020-04-23 2020-04-23 Adaptive modulation and power control system based on energy acquisition and optimization method

Country Status (1)

Country Link
CN (1) CN111491358B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102278037B1 (en) * 2019-10-22 2021-07-15 성균관대학교산학협력단 Method for controlling receiver by transmitter for simultaneous wireless information and power transfer operating in dual mode, adaptive mode switching method based on machine learning, and apparatus for performing the same
CN112508172A (en) * 2020-11-23 2021-03-16 北京邮电大学 Space flight measurement and control adaptive modulation method based on Q learning and SRNN model
CN114126021B (en) * 2021-11-26 2024-04-09 福州大学 Power distribution method of green cognitive radio based on deep reinforcement learning
CN114533321A (en) * 2022-04-18 2022-05-27 深圳市宏丰科技有限公司 Control circuit and method for tooth washing device
CN114980293B (en) * 2022-05-07 2023-08-11 电子科技大学长三角研究院(湖州) Intelligent self-adaptive power control method for large-scale OFDM system
CN117579136B (en) * 2024-01-17 2024-04-02 南京控维通信科技有限公司 AUPC and ACM control method for reverse burst by network control system in TDMA

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101340592A (en) * 2008-08-14 2009-01-07 上海交通大学 Energy control system for video transmission under hybrid radio environment
KR101710012B1 (en) * 2015-11-10 2017-02-24 성균관대학교산학협력단 Energy harvesting method and apparatus in a receiver and a receiver using said method, and blind modulation manner detecting method and apparatus for the energy harvesting
CN108449803A (en) * 2018-04-02 2018-08-24 太原理工大学 Predictable energy management in rechargeable wireless sensor network and mission planning algorithm
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Joint Interleaver and Modulation Design For Multi-User SWIPT-NOMA; Yizhe Zhao et al.; IEEE Transactions on Communications; 2019-10-31; Vol. 67, No. 10; full text *
Optimal Power Splitting for Simultaneous Wireless Information and Power Transfer in Amplify-and-Forward Multiple-Relay Systems; Derek Kwaku Pobi Asiedu et al.; IEEE; 2018-01-30; full text
A robust time-slot resource allocation and multi-user selection algorithm for wireless data-and-energy integrated networks; Yang Jiayu et al.; Chinese Journal on Internet of Things; 2019-09-30; Vol. 3, No. 3; full text *

Also Published As

Publication number Publication date
CN111491358A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN111491358B (en) Adaptive modulation and power control system based on energy acquisition and optimization method
Ortiz et al. Reinforcement learning for energy harvesting point-to-point communications
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
Ortiz et al. Reinforcement learning for energy harvesting decode-and-forward two-hop communications
CN112383922B (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN111666149A (en) Ultra-dense edge computing network mobility management method based on deep reinforcement learning
Chen et al. Genetic algorithm-based optimization for cognitive radio networks
CN108075975B (en) Method and system for determining route transmission path in Internet of things environment
CN114513855B (en) Edge computing unloading decision and resource allocation method based on wireless energy-carrying communication
CN105519030A (en) Computer program product and apparatus for fast link adaptation in a communication system
CN110267274A (en) A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user
Mashhadi et al. Deep reinforcement learning based adaptive modulation with outdated CSI
Ji et al. Reconfigurable intelligent surface enhanced device-to-device communications
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
CN115065728A (en) Multi-strategy reinforcement learning-based multi-target content storage method
CN112788629B (en) Online combined control method for power and modulation mode of energy collection communication system
CN112738849B (en) Load balancing regulation and control method applied to multi-hop environment backscatter wireless network
Zhang et al. Deep Deterministic Policy Gradient for End-to-End Communication Systems without Prior Channel Knowledge
CN111556511B (en) Partial opportunistic interference alignment method based on intelligent edge cache
CN109951239B (en) Adaptive modulation method of energy collection relay system based on Bayesian classifier
Masadeh et al. Look-ahead and learning approaches for energy harvesting communications systems
Huang et al. Joint AMC and resource allocation for mobile wireless networks based on distributed MARL
CN115665763A (en) Intelligent information scheduling method and system for wireless sensor network
Cui et al. Hierarchical learning approach for age-of-information minimization in wireless sensor networks
Alajmi et al. An efficient actor critic drl framework for resource allocation in multi-cell downlink noma

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant