CN113207129A - Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm - Google Patents
Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm
- Publication number
- CN113207129A (application CN202110506184.9A)
- Authority
- CN
- China
- Prior art keywords
- sue
- channel
- access
- dynamic spectrum
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/373—Predicting channel quality or other radio frequency [RF] parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/382—Monitoring; Testing of propagation channels for resource allocation, admission control or handover
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Electromagnetism (AREA)
- Quality & Reliability (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention relates to a dynamic spectrum access method based on a confidence interval upper bound (UCB) algorithm and a deep reinforcement learning (DRL) algorithm, and belongs to the field of wireless communication. The method specifically comprises the following steps: S1: constructing a distributed dynamic spectrum access system model; S2: constructing the cumulative expected reward function of each secondary user equipment (SUE); S3: obtaining the optimal access strategy, according to the historical experience and the state-action pairs of the accessed channels, so as to maximize the cumulative expected reward; S4: solving for the access strategy with a method combining the DQN algorithm of deep reinforcement learning with the UCB algorithm, and obtaining the optimal access strategy through continuous iteration. Even when the dynamic variation rule of the channels is unknown, the invention can obtain a dynamic spectrum access strategy corresponding to the optimal one obtainable when the channel state transition rule is known.
Description
Technical Field
The invention belongs to the field of wireless communication, and relates to a dynamic spectrum access method based on a confidence interval upper bound algorithm and a DRL algorithm.
Background
In recent years, increasing spectrum resources has been one of the key approaches for future wireless communication networks to cope with the exponential growth of data traffic. However, radio spectrum is an expensive and scarce resource. The current shortage of radio spectrum makes it difficult for wireless operators to obtain sufficient proprietary licensed bands. On the other hand, experimental tests and surveys from academia and industry indicate that the static spectrum allocation policy results in insufficient utilization of the allocated licensed bands: the utilization of most bands is under 30%, and that of more than half is under 20%. These statistics reflect the fact that radio spectrum resources are under-utilized, which has prompted the industry to reconsider the current static spectrum allocation policy and to adopt dynamic spectrum access to improve spectrum utilization.
To realize spectrum coexistence between cognitive users and primary users, various spectrum access strategies have been proposed, which mainly fall into two access mechanisms. The first is Listen Before Talk (LBT), also known as the interweave scheme, in which an SUE can access a band only if it detects that the band is available. Although this scheme effectively avoids strong interference to the primary user, the opportunities for the SUE to access the shared band are quite limited. This is because, under LBT, spectrum access depends entirely on the current spectrum sensing result. In reality, due to the randomness of the wireless environment, limited or absent cooperation among cognitive users, and other practical factors, the sensing result may carry a large error. This leads to false alarms or missed detections of primary user activity, and hence to incorrect channel access decisions by cognitive users. The second spectrum access scheme is spectrum sharing, also referred to as the underlay scheme. In this scheme, cognitive users coexist with primary users on a shared frequency band and adjust their transmit power levels so that the cumulative interference experienced at the primary users stays below a tolerable threshold. This scheme rests on the strong assumption that the channel state information between the cognitive user's transmitter and the primary user's receiver is already known for power control. In reality, however, such channel state information is often difficult to obtain without a central controller. Even with a central controller, exchanging this channel state information may impose heavy control overhead on the underlying network, making the scheme difficult to implement in practice.
In summary, in view of the various defects and shortcomings of the conventional dynamic spectrum access, a new dynamic spectrum access method is needed to solve the above problems.
Disclosure of Invention
In view of the above, and aiming at the various defects and shortcomings of conventional dynamic spectrum access, the present invention provides a dynamic spectrum access method based on the combination of a confidence interval upper bound algorithm and a deep reinforcement learning (DRL) algorithm, which, even when the dynamic variation rule of the channels is unknown, obtains a dynamic spectrum access strategy approaching the optimal one obtainable when the channel state transition rule is known.
In order to achieve the purpose, the invention provides the following technical scheme:
a dynamic spectrum access method based on a confidence interval upper bound algorithm and a DRL algorithm specifically comprises the following steps:
s1: constructing a distributed dynamic spectrum access system model;
s2: constructing the cumulative expected reward function of each Secondary User Equipment (SUE);
s3: according to the historical experience o_l(t) of the l-th SUE over the M time slots before slot t and the state-action pairs of the accessed channels, obtaining the optimal access strategy so as to maximize the cumulative expected reward;
s4: and solving the access strategy by adopting a method of combining a DQN algorithm and a confidence interval upper bound algorithm in deep reinforcement learning, and obtaining the optimal access strategy through continuous iteration.
Further, in step S1, the constructed distributed dynamic spectrum access system model specifically includes: a primary user network consisting of N Primary Users (PUs) and a secondary user network consisting of L SUEs. It is assumed that there are N orthogonal channels and each PU transmits on a unique wireless channel to avoid interference between PUs. The operating state of a PU on its channel is either active (labeled 1) or idle (labeled 0), and the PUs communicate in their channels in a TDMA fashion. The state of a channel is determined by the state of its PU: occupied (0) or idle (1). The joint state of all channels is described by a discrete Markov model with 2^N states, whose state space is expressed as:

S = { s = (s_1, s_2, ..., s_n, ..., s_N) | s_n = 0 or 1, n = 1, 2, ..., N },

where s_n = 0 or 1 represents the two states of each channel: occupied (0) or idle (1).
Further, in step S1, the state transition probabilities of a single channel are expressed as the matrix:

P_n = [ p_00  p_01 ; p_10  p_11 ],

where p_ij denotes the probability of transitioning from state i to state j. Assuming the channel is stationary, the transition matrix P_n is constant and independent of time.
Further, in step S1, it is assumed that each SUE has data to transmit, so each SUE selects at least one channel to access; the access action spaces of different SUEs are identical, so the action space of the l-th SUE is used as representative. The access action of the l-th SUE in time slot t is expressed as:

a_l(t) ∈ {1, 2, ..., n, ..., N},

where a_l(t) denotes the channel the l-th SUE accesses to transmit data in time slot t. Suppose that after the SUE accesses the n-th channel in time slot t, the SUE transmitter receives, over the control channel, the feedback f_n^l(t) sent by the receiver for the accessed channel. After the SUE accesses the n-th channel, one of three situations occurs: (1) the SUE transmits successfully; (2) SUEs collide and interfere with each other; (3) the SUE interferes with a PU. Corresponding to these three cases, the feedback f_n^l(t) is set to one of three distinct values.
Further, in step S1, the reward value is set to the value of the feedback signal f_n^l(t), and the cumulative discounted reward obtained by the l-th SUE is expressed as:

R_l = Σ_t γ^t r_l(t),

where 0 ≤ γ ≤ 1 is a discount factor representing the influence of future rewards on the current action, and r_l(t) denotes the reward the l-th SUE obtains for a successful transmission on the channel.
Further, in step S2, the cumulative expected reward function of the SUE is expressed as:

V_{π_l}(o_l(t)) = E[ Σ_{τ≥t} γ^(τ-t) r_l(τ) | o_l(t) ],

where o_l(t) denotes the historical experience of the l-th SUE over the M time slots before slot t, and L is the number of SUEs.
Further, in step S2, according to the historical experience o_l(t) of the l-th SUE over the M time slots before slot t, an action (access channel) is selected to maximize the cumulative expected reward, so the optimal access strategy of the SUE is:

π_l* = argmax_{π_l} V_{π_l}(o_l(t)).
further, in step S3, the method of combining the DQN algorithm and the confidence interval upper bound algorithm in the deep reinforcement learning is used to solve the access policy, which specifically includes: when the SUE takes action, the action is selected asWherein the content of the first and second substances,indicating action before t time slotThe selected times, sigma, represent the uncertainty measure, control the degree of exploration;showing the historical experience given by the ith SUE at the t time slotActing as a stateIs expressed as
The invention has the following beneficial effects: the invention adapts to a dynamically changing cognitive radio environment. Specifically, with deep reinforcement learning, spectrum access selection depends not only on the current spectrum sensing result but also on what has been learned from past spectrum states. In this way, the negative effect of traditional imperfect access methods on spectrum access performance can be greatly reduced. In addition, deep reinforcement learning enables the cognitive user equipment to obtain more accurate channel states and useful channel state prediction/statistical information, such as the behavior pattern of the primary users. Spectrum access based on the invention can also greatly reduce collisions between cognitive user equipment and primary users. Furthermore, the confidence-interval-based exploration strategy accelerates the exploration and convergence of the deep reinforcement learning.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
fig. 1 is a diagram of a dynamic spectrum access scenario;
FIG. 2 is a state transition model of a channel;
fig. 3 is a flowchart of a dynamic spectrum access method based on a combination of confidence intervals and deep reinforcement learning.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1 to fig. 3, the implementation of the dynamic spectrum access method based on the combination of confidence intervals and deep reinforcement learning specifically involves the following four initial conditions and six main steps.
Initial condition 1:
the system model is a dynamic multi-channel access problem in a specific cell, and the structure of the system model is shown in fig. 1. In a dynamic multi-channel access scene, a primary user network composed of N PUs and a secondary user network composed of L SUEs are considered. Assuming that there are N orthogonal channels, each PU transmits on a unique wireless channel to avoid interference between PUs; the SUE may find a free channel among the N channels for transmission at any time. Since the channel may not be accessed or a failed transmission may occur while accessing the channel, a feedback signal is needed between transceivers to flag whether the transmission was successful. Specifically, when the SUE receiver successfully receives a packet from a channel, it transmits a feedback signal to its corresponding transmitting end through the common control channel of the SUE system itself at the end of the slot. The operating state of a PU on a channel may be represented as both active (labeled 1) and inactive (labeled 0), and communicates in the channel in a TDMA fashion. The state of the channel is determined by the PU state: assuming that the PU on the channel n is active, the channel is in a busy state, and the state of the channel is 0; conversely, in the time slot t, if the nth channel is in the idle state, it is denoted as 1.
Initial condition 2:
the states of the channels conform to a discrete Markov model, and the state space of the N channels is represented as follows:
S={s=(s1,...,sn,...,sN)∣sn0 or 1, N ═ 1,2,.., N } (1)
Wherein s isn0 or 1 respectively represents two states per channel: occupied (0) and idle (1).
The state of each channel is described by a Markov chain, and the state transition probabilities of the n-th channel are expressed as:

P_n = [ p_00  p_01 ; p_10  p_11 ]   (2)

where p_ij denotes the probability of transitioning from state i to state j; the transition matrix P_n is constant and independent of time. Because an SUE can access only one channel at the beginning of each time slot and cannot observe the states of all channels, the dynamic multi-channel access problem considered here is a partially observable Markov decision process (POMDP), and the invention solves it with deep reinforcement learning.
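As an illustration, the two-state Markov channel of equation (2) can be simulated as follows; the function name, the default seed, and the parameterization by the off-diagonal entries p01 and p10 are mine, not the patent's:

```python
import numpy as np

def simulate_channel(p01, p10, T, s0=1, rng=None):
    """Simulate one two-state Markov channel.

    State 1 = idle, 0 = occupied by the PU; p01 = P(0 -> 1) and
    p10 = P(1 -> 0) are the off-diagonal entries of P_n."""
    rng = rng or np.random.default_rng(0)
    s, states = s0, []
    for _ in range(T):
        # draw the next state from the row of P_n for the current state
        if s == 0:
            s = 1 if rng.random() < p01 else 0
        else:
            s = 0 if rng.random() < p10 else 1
        states.append(s)
    return states
```

For a stationary chain, the long-run fraction of idle slots approaches p01 / (p01 + p10), which is one way to sanity-check a simulated trace.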
Initial condition 3:
assuming that each SUE has a need for transmitting data, each SUE should select at least one channel to access for transmitting data, and different SUEs access action spaces are the same, and are summarized by the action space of the ith SUE. The access action of the ith SUE in the time slot t is represented as:
al(t)∈{1,2,...,N} (3)
wherein, al(t) denotes the channel within the time slot t that the ith SUE is to access and transmit data. Suppose that after the SUE accesses the nth channel at the t time slot, the feedback of the nth channel accessed by the SUE sent by the receiving end through the control channel is received by the SUE sending end asAfter the SUE accesses the nth channel, three situations occur: (1) successful transmission of the SUE; (2) the SUEs collide with each other to interfere with each other; (3) SUEs create interference to PUs. Corresponding to the three cases, feedback is set to Namely, it is
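A minimal sketch of the three-way feedback mapping just described; the concrete values 1 / 0 / -1 are an illustrative assumption, since the patent only requires three distinct values:

```python
def feedback(channel_idle, other_sue_on_channel):
    """Map the three access outcomes to a scalar feedback value.

    The values 1 / 0 / -1 are an illustrative choice; the patent only
    requires three distinct values for the three cases."""
    if not channel_idle:
        return -1  # case (3): the SUE interferes with the PU
    if other_sue_on_channel:
        return 0   # case (2): SUEs collide with each other
    return 1       # case (1): successful transmission
```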
Initial condition 4:
suppose that the ith SUE is based on historical experience at the t slotAdopt strategy pil(t) after accessing the nth channel, the SUE transmitting end receives the feedback signal of the nth channel accessed by the SUE transmitted by the receiving end through the control channelWhether the data of the ith SUE is successfully transmitted depends on the state of the channel occupied by the PU and other SUE access action strategies, and if the channel is occupied by the PU or the SUE accesses the channel to transmit data, the data transmission of the ith SUE fails. To generally represent the quality of the transmission of the ith SUE on the nth channel, one may combineThe reward for successful transmission is set to the transmission rate on the channel, e.g.Wherein, B is the nth channel bandwidth. To simplify the calculation process, in an embodiment the reward value is set as a feedback signalThe value of (c). The cumulative discount reward earned by this ith SUE may be expressed as:
wherein, gamma is more than or equal to 0 and less than or equal to 1, which is a discount factor and represents the influence of future rewards on the current action.
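The cumulative discounted reward Σ_t γ^t r_l(t) can be computed directly from a reward trace; this helper is illustrative (the function name is mine):

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward: sum over t of gamma**t * r(t),
    with discount factor 0 <= gamma <= 1."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total
```

With gamma = 0.5, the reward trace [1, 1, 1] yields 1 + 0.5 + 0.25 = 1.75.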
Step 1:
the dynamic spectrum access policy is distributed, and access result information is not shared between SUEs. Each SUE has its own DQN network to make channel access decisions independently. According to the initial condition 4, the goal of each SUE is to find a strategy pi suitable for the current dynamic spectrum environment, and prompt the SUE to take a proper access action, so that the cumulative discount reward of the SUE itself is maximized. An act of mapping observations of historical timeslots to next timeslots, the cumulative expected reward function for the ith SUE may be expressed as
Wherein γ ∈ (0,1) represents a decay factor,an action taken at t time slot for the ith SUE, s represents a state in reinforcement learning;historical experience showing the first SUE M time slots before t time slot, including its accessAnd its observed channel state. The optimization-aware strategy formula that can be derived from equation (7) is expressed as:
step 2:
the merits of the measurement strategy can be measured by a state action value function, namely a Q function, in addition to the equation (5). Under strategy π, the Q function of the ith SUE is expressed as:
where s and a represent states and actions in reinforcement learning.
The access strategy of the l-th SUE can then be solved through the Q value:

π_l* = argmax_a Q_{π_l}(s, a)
the access strategy of dynamic spectrum access is distributed, because access results and historical experience information are not shared among the SUEs, each SUE has its own deep reinforcement learning to decide the decision of accessing the channel, but the strategy solving mode between different SUEs is the same, only it needs to be noticed that the same channel may be accessed between different SUEs, thereby causing interference between SUEs, in order to avoid the conflict of accessing the same channel between SUEs, the strategy of other SUEs also needs to be learned between different SUEs in the invention, which mainly learns through the difference of reward values (namely feedback signals).
Step 3:
and solving the access strategy by adopting a method of combining a DQN algorithm and an upper bound of confidence interval (UCB) in deep reinforcement learning. Firstly, initializing variables in the learning process: initializing the size of an experience playback pool E to be D; ② initialize two nets in DQN of the first SUEComplexing: current network and target network, respectively denoted asAndsetting the weight of the current network as theta and the weight of the target network as theta-θ; ③ 10 for initial learning rate alpha-4The activation function in the neural network is ReLU, and the attenuation factor γ is 0.9.
Step 4:
In slot t, the l-th SUE feeds its historical experience o_l(t) and the actions taken into the neural network, which outputs the Q values Q(o_l(t), a) of all state-action pairs for this state. When the l-th SUE takes an action based on the upper-confidence-bound policy in slot t, the optimal action is expressed as:

a_l(t) = argmax_a [ Q(o_l(t), a) + σ · sqrt( log t / N_t(a) ) ]

where N_t(a) denotes the number of times the l-th SUE has selected action a before slot t, σ denotes an uncertainty measure that controls the degree of exploration, and Q(o_l(t), a) denotes the Q value of taking action a in the state given by the historical experience o_l(t) of the l-th SUE at slot t.
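The UCB action rule above can be sketched as follows; the function name and the convention of exploring untried actions first are my assumptions:

```python
import numpy as np

def ucb_action(q_values, counts, t, sigma=1.0):
    """Select argmax_a [ Q(a) + sigma * sqrt(log t / N_t(a)) ].

    Any action never tried before slot t is explored first (a common
    convention; the patent does not spell out this corner case)."""
    counts = np.asarray(counts, dtype=float)
    if (counts == 0).any():
        return int(np.argmax(counts == 0))  # first untried action
    bonus = sigma * np.sqrt(np.log(t) / counts)
    return int(np.argmax(np.asarray(q_values, dtype=float) + bonus))
```

Note how a rarely tried action with a slightly lower Q value can still win: the bonus term sqrt(log t / N_t(a)) grows as its visit count stays small.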
The Q value is updated with the DQN + UCB method; the Q-value update formula of the l-th SUE is expressed as:

Q(o_l(t), a_l(t)) ← (1 − α_N) Q(o_l(t), a_l(t)) + α_N [ r_l(t) + γ max_a Q(o_l(t+1), a) + b_N ]

where N denotes the number of times the current state-action pair (o_l(t), a_l(t)) has been selected before slot t, and b_N = c · sqrt( H^3 · ι / N ) is the algorithm's exploration bonus for the current state-action pair, with c > 0 a constant and H the number of iteration steps per round. In dynamic spectrum access one round is a single access (or non-access) decision, so H is generally set to 1 (the H^3 term matters only when a round contains many actions, e.g. a maze scenario where one round runs from the starting point to the ending point, and is immaterial in this scenario). Exploration is most efficient when ι = log(|S||A|T/p), where p ∈ (0,1), |S| denotes the number of states, |A| the number of actions, and T the algorithm running time.
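A tabular sketch of this UCB-bonused Q update; the dictionary bookkeeping and the learning rate α = (H+1)/(H+N) follow the common UCB-Hoeffding formulation and are assumptions for illustration, not the patent's exact network update:

```python
import math

def ucb_q_update(q, n, s, a, r, q_next_max, gamma=0.9, c=1.0, H=1, iota=1.0):
    """One tabular Q update with a UCB exploration bonus.

    Uses alpha = (H + 1) / (H + N) and bonus b = c * sqrt(H**3 * iota / N),
    where N counts visits to the state-action pair (s, a)."""
    n[(s, a)] = n.get((s, a), 0) + 1
    N = n[(s, a)]
    alpha = (H + 1) / (H + N)
    b = c * math.sqrt(H ** 3 * iota / N)
    q[(s, a)] = (1 - alpha) * q.get((s, a), 0.0) \
        + alpha * (r + gamma * q_next_max + b)
    return q[(s, a)]
```

On the first visit alpha is 1, so the old Q value is discarded entirely; as N grows, both the step size and the bonus shrink and the estimate stabilizes.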
During interactive learning with the environment, the access action a_l(t), the historical experience o_l(t) as the state, the reward r_l(t), and the newly generated state o_l(t+1) are stored as a training sample (o_l(t), a_l(t), r_l(t), o_l(t+1)) in the experience replay pool E; when the number of training samples in the pool exceeds M, the oldest training samples are deleted. In subsequent DQN training, samples are repeatedly drawn at random from the replay pool and fed into the neural network for training, which breaks the correlation between data.
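The replay pool with oldest-sample eviction can be sketched as follows (the class and method names are mine):

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool; when full, the oldest sample is dropped."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def store(self, obs, action, reward, next_obs):
        self.buf.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        # a random minibatch breaks the correlation between successive samples
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))
```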
Step 5:
as can be seen from step 3, in DQN, there are two types of neural networks: one is the current networkAn estimate representing a cumulative discount reward for resolving all actions; one is a target networkFor generating the target values, both networks have the same structure. In DQN, the loss function is calculated by time difference, i.e. the loss function is expressed as:
wherein the content of the first and second substances,representing the target value generated by the target network.
The weights θ in the loss function L(θ) are updated with the Adam optimizer. Every T_s time slots the target network is updated by setting θ− = θ.
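The temporal-difference loss with a separate target network can be illustrated with a linear Q function standing in for the patent's neural network (the linear parameterization is an assumption for brevity):

```python
import numpy as np

def td_loss(theta, theta_target, batch, gamma=0.9):
    """Mean squared temporal-difference error with a target network.

    theta and theta_target are (num_actions x obs_dim) weight matrices of
    a linear Q function Q(s, a) = (theta @ s)[a]; the target value is
    y = r + gamma * max_a' Q(s', a'; theta_target)."""
    err = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * float(np.max(theta_target @ s_next))  # fixed target
        err += (y - float((theta @ s)[a])) ** 2
    return err / len(batch)
```

Keeping theta_target frozen between periodic syncs (θ− = θ every T_s slots) is what stabilizes the moving target in the loss.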
Step 6: after a period of iterative learning, each SUE gradually obtains its own optimal access strategy π_l*.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (8)
1. A dynamic spectrum access method based on a confidence interval upper bound algorithm and a DRL algorithm is characterized by comprising the following steps:
s1: constructing a distributed dynamic spectrum access system model;
s2: constructing a cumulative expected reward function for the SUE;
s3: according to the historical experience o_l(t) of the l-th SUE over the M time slots before slot t and the state-action pairs of the accessed channels, obtaining the optimal access strategy so as to maximize the cumulative expected reward;
s4: and solving the access strategy by adopting a method of combining a DQN algorithm and a confidence interval upper bound algorithm in deep reinforcement learning, and obtaining the optimal access strategy through continuous iteration.
2. The dynamic spectrum access method according to claim 1, wherein the distributed dynamic spectrum access system model constructed in step S1 specifically includes: a primary user network consisting of N PUs and a secondary user network consisting of L SUEs; it is assumed that there are N orthogonal channels and each PU transmits on a unique wireless channel; the operating state of a PU on its channel is either active or idle, labeled "1" and "0" respectively; the states of all channels are represented by a discrete Markov model with 2^N states, whose state space is expressed as: S = { s = (s_1, s_2, ..., s_n, ..., s_N) | s_n = 0 or 1, n = 1, 2, ..., N }, where s_n = 0 or 1 represents the two states of each channel: occupied or idle.
4. The dynamic spectrum access method according to claim 2, wherein in step S1, assuming each SUE has data to transmit, each SUE accesses a channel; the access action spaces of different SUEs are identical and are represented by the action space of the l-th SUE; the access action of the l-th SUE in time slot t is expressed as:

a_l(t) ∈ {1, 2, ..., n, ..., N}

where a_l(t) denotes the channel the l-th SUE accesses to transmit data in time slot t; it is assumed that after the SUE accesses the n-th channel in time slot t, the SUE transmitter receives, over the control channel, the feedback f_n^l(t) sent by the receiver for the accessed channel; after the SUE accesses the n-th channel, one of three situations occurs: (1) the SUE transmits successfully; (2) SUEs collide and interfere with each other; (3) the SUE interferes with a PU; corresponding to these three cases, the feedback f_n^l(t) is set to one of three distinct values.
5. The dynamic spectrum access method according to claim 4, wherein in step S1 a reward value is set to the value of the feedback signal f_n^l(t), and the cumulative discounted reward obtained by the l-th SUE is expressed as:

R_l = Σ_t γ^t r_l(t)

where 0 ≤ γ ≤ 1 is a discount factor representing the influence of future rewards on the current action, and r_l(t) denotes the reward for a successful transmission of the l-th SUE on the channel.
6. The dynamic spectrum access method according to claim 5, wherein in step S2 the cumulative expected reward function of the SUE is expressed as:

V_{π_l}(o_l(t)) = E[ Σ_{τ≥t} γ^(τ-t) r_l(τ) | o_l(t) ]
8. The dynamic spectrum access method according to claim 7, wherein in step S3 the access strategy is solved by combining the DQN algorithm of deep reinforcement learning with the confidence interval upper bound algorithm, which specifically includes: when the SUE takes an action, the action is selected as

a_l(t) = argmax_a [ Q(o_l(t), a) + σ · sqrt( log t / N_t(a) ) ]

where N_t(a) denotes the number of times the action has been selected before slot t, σ denotes an uncertainty measure controlling the degree of exploration, and Q(o_l(t), a) denotes the Q value of taking action a in the state given by the historical experience o_l(t) of the l-th SUE at slot t.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110506184.9A CN113207129B (en) | 2021-05-10 | 2021-05-10 | Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110506184.9A CN113207129B (en) | 2021-05-10 | 2021-05-10 | Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113207129A true CN113207129A (en) | 2021-08-03 |
CN113207129B CN113207129B (en) | 2022-05-20 |
Family
ID=77030590
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110506184.9A Active CN113207129B (en) | 2021-05-10 | 2021-05-10 | Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113207129B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102256262A (en) * | 2011-07-14 | 2011-11-23 | 南京邮电大学 | Multi-user dynamic spectrum accessing method based on distributed independent learning |
CN103209419A (en) * | 2013-04-25 | 2013-07-17 | 西安电子科技大学 | User demand orientated dynamic spectrum accessing method capable of improving network performance |
US20180098330A1 (en) * | 2016-09-30 | 2018-04-05 | Drexel University | Adaptive Pursuit Learning Method To Mitigate Small-Cell Interference Through Directionality |
CN108833040A (en) * | 2018-06-22 | 2018-11-16 | 电子科技大学 | Smart frequency spectrum cooperation perceptive method based on intensified learning |
CN111654342A (en) * | 2020-06-03 | 2020-09-11 | 中国人民解放军国防科技大学 | Dynamic spectrum access method based on reinforcement learning with priori knowledge |
-
2021
- 2021-05-10 CN CN202110506184.9A patent/CN113207129B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102256262A (en) * | 2011-07-14 | 2011-11-23 | 南京邮电大学 | Multi-user dynamic spectrum accessing method based on distributed independent learning |
CN103209419A (en) * | 2013-04-25 | 2013-07-17 | 西安电子科技大学 | User demand orientated dynamic spectrum accessing method capable of improving network performance |
US20180098330A1 (en) * | 2016-09-30 | 2018-04-05 | Drexel University | Adaptive Pursuit Learning Method To Mitigate Small-Cell Interference Through Directionality |
CN108833040A (en) * | 2018-06-22 | 2018-11-16 | 电子科技大学 | Smart frequency spectrum cooperation perceptive method based on intensified learning |
CN111654342A (en) * | 2020-06-03 | 2020-09-11 | 中国人民解放军国防科技大学 | Dynamic spectrum access method based on reinforcement learning with priori knowledge |
Non-Patent Citations (4)
Title |
---|
CHEN DAI: "Contextual Multi-Armed Bandit for Cache-Aware Decoupled Multiple Association in UDNs: A Deep Learning Approach", 《IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING》 * |
YU ZHANG: "Multi-Agent Deep Reinforcement Learning-Based Cooperative Spectrum Sensing With Upper Confidence Bound Exploration", 《IEEE》 * |
宁文丽: "基于强化学习的频谱感知策略研究", 《CNKI硕士期刊》 * |
王董礼等: "基于UCB的短波认知信道选择算法", 《铁道学报》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113207129B (en) | 2022-05-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5274140B2 (en) | Method for reducing inter-cell interference in a radio frequency division multiplexing network | |
CN111726811B (en) | Slice resource allocation method and system for cognitive wireless network | |
CN112188503B (en) | Dynamic multichannel access method based on deep reinforcement learning and applied to cellular network | |
US11777636B2 (en) | Joint link-level and network-level intelligent system and method for dynamic spectrum anti-jamming | |
CN113423110B (en) | Multi-user multi-channel dynamic spectrum access method based on deep reinforcement learning | |
CN110492955B (en) | Spectrum prediction switching method based on transfer learning strategy | |
CN110891276A (en) | Multi-user anti-interference channel access system and dynamic spectrum cooperative anti-interference method | |
CN112153744B (en) | Physical layer security resource allocation method in ICV network | |
EP2566273A1 (en) | Method for dynamically determining sensing time in cognitive radio network | |
CN116744311B (en) | User group spectrum access method based on PER-DDQN | |
CN113207129B (en) | Dynamic spectrum access method based on confidence interval upper bound algorithm and DRL algorithm | |
CN108449151B (en) | Spectrum access method in cognitive radio network based on machine learning | |
CN114501667A (en) | Multi-channel access modeling and distributed implementation method considering service priority | |
KR101073294B1 (en) | DYNAMIC FREQUENCY SELECTION SYSTEM AND METHOD BASED ON GENETIC ALGORITHM For COGNITIVE RADIO SYSTEM | |
CN114126021B (en) | Power distribution method of green cognitive radio based on deep reinforcement learning | |
CN116709567A (en) | Joint learning access method based on channel characteristics | |
Kaytaz et al. | Distributed deep reinforcement learning with wideband sensing for dynamic spectrum access | |
CN113890653B (en) | Multi-agent reinforcement learning power distribution method for multi-user benefits | |
CN115278896A (en) | MIMO full duplex power distribution method based on intelligent antenna | |
CN112367131B (en) | Jump type spectrum sensing method based on reinforcement learning | |
CN114916087A (en) | Dynamic spectrum access method based on India buffet process in VANET system | |
CN111866979B (en) | Base station and channel dynamic allocation method based on multi-arm slot machine online learning mechanism | |
CN104660392A (en) | Prediction based joint resource allocation method for cognitive OFDM (orthogonal frequency division multiplexing) network | |
CN113473419B (en) | Method for accessing machine type communication device into cellular data network based on reinforcement learning | |
Chen et al. | Dynamic Spectrum Access Scheme of Joint Power Control in Underlay Mode Based on Deep Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |