CN109039505B - Channel state transition probability prediction method in cognitive radio network - Google Patents
Channel state transition probability prediction method in cognitive radio network
- Publication number: CN109039505B
- Application number: CN201810696652.1A
- Authority
- CN
- China
- Prior art keywords
- state
- channel
- channel state
- probability
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/391—Modelling the propagation channel
- H04B17/3913—Predictive models, e.g. based on neural network models
Abstract
The invention provides a channel state transition probability prediction method for cognitive radio networks based on a two-stage Q learning method. The method comprises two learning processes: a channel state transition probability learning process and a channel state learning process. Both are based on Q learning, and the channel state learning process learns the state of a channel according to the channel state transition probability produced by the channel state transition probability learning process, then outputs the learned channel state. The method can predict a continuously changing authorized channel state in real time, requires no probability distribution to be assumed in advance for authorized users or interferers, and explicitly accounts for the possibility that the sampled channel states used for predicting the transition probability are incorrect.
Description
Technical Field
The invention relates to a method for predicting channel state transition probability in a cognitive radio network, and belongs to the technical field of radio networks.
Background
Over the past decades, wireless networks have supported ever-increasing data rates to meet consumer demand for fast, secure, and intelligent wireless services. However, current wireless systems face a spectrum-resource bottleneck that makes it difficult to improve performance within the limited available frequency band, so new communication paradigms are needed to further enhance wireless network performance. The industry predicts that wireless networks will require more spectrum resources to serve more high-rate users, and that current spectrum resources must be utilized more efficiently. Cognitive radio technology emerged to make better use of the limited spectrum: in a cognitive radio network, a cognitive radio user can dynamically utilize licensed spectrum without interfering with the normal communications of authorized users. However, the design of cognitive radio systems faces many security challenges. In particular, cognitive radio users may be attacked by jammers. A jammer equipped with cognitive functions can sense the channels occupied by unlicensed users and attack those channels, disrupting normal communication between cognitive radio users. Therefore, to ensure normal communication of cognitive radio users and to avoid interference to authorized users, prediction of channel state information is very important.
In a cognitive radio network, a cognitive radio user can generally sense the state of a licensed channel through spectrum sensing. However, due to hardware and energy limitations, a cognitive radio user may not be able to sense all licensed spectrum at the same time; even if it could, sensing the entire licensed spectrum would consume a large amount of time and introduce substantial delay into the user's communication. Therefore, to obtain a more accurate spectrum state within a limited time, more information about the channel is needed, the most important being the channel state transition probability. Many existing articles assume that the channel state is determined by the state of an authorized user and that the cognitive radio user knows the authorized user's state transition information. In a real cognitive radio network, however, this information is difficult for a cognitive radio user to obtain, so the cognitive radio user needs to predict the state information of the authorized channel. To date, only a small fraction of the related research has predicted the state transition information of authorized channels in cognitive radio networks.
Prior research on channel state transition probability prediction in cognitive radio networks includes the following:
Liu et al., in "Prediction of Exponentially Distributed Primary User Traffic for Dynamic Spectrum Access" (IEEE GLOBECOM 2012), proposed an authorized-user activity prediction method based on maximum likelihood. This method assumes that the activities of authorized users follow an exponential distribution and that sensing is perfect. The probability that an authorized user occupies a channel is first estimated from samples; the channel occupancy time is then estimated from this probability; the transition probability of the channel occupancy is estimated from the estimated occupancy probability and occupancy time; and finally the channel state at the next moment is estimated from the perfectly sensed channel state and the estimated transition probability. However, in a real cognitive radio network, the sensing result may be imperfect due to limits on sensing time and sensing method; the occupancy intervals must follow an exponential distribution; and a long sampling process is required before each prediction. The method therefore introduces long delays into the cognitive radio user's communication.
Song et al., in "Understanding the Predictability of Wireless Spectrum: A Large-scale Empirical Study" (IEEE ICC 2010), proposed a Markov-based channel prediction method. The method predicts the channel state of a slot from the first K channel states preceding it, by counting how many of those K states are occupied and how many are unoccupied. However, this approach encounters different error rates in different situations and in some cases faces a large prediction error rate. Furthermore, the channel states in the sequence used for prediction themselves suffer from sensing errors.
F. H. Panahi et al., in "Optimal Channel-Sensing Scheme for Cognitive Radio Systems Based on Fuzzy Q-Learning" (IEICE Transactions on Communications, 2014), proposed a channel transition probability prediction algorithm based on the Baum-Welch algorithm. However, this method must estimate the transition probability over a sizeable sample space before the probability can be used to predict the channel, and if the transition probability changes, a large amount of time must be spent resampling and re-estimating it. The algorithm therefore cannot predict the channel state in real time, which increases the communication delay of the cognitive radio user; moreover, the moment at which resampling becomes necessary cannot be predicted.
S. Filippi et al., in "An Estimation Algorithm of Channel State Transition Probability for Cognitive Radio Systems" (IEEE CrownCom 2008), proposed a maximum likelihood estimator that can estimate the channel transition probability relatively accurately. The method assumes, however, that the cognitive radio user observes the correct channel state; in practice, limits on sensing time, sensing method, and the surrounding communication environment mean that the true channel state may not be available. In addition, the cognitive radio user must collect channel state information over a long period to estimate the transition probability, which again increases communication delay.
Akbulut et al., in "Estimation of Time-Varying Channel State Transition Probabilities for Cognitive Radio Systems by Means of Particle Swarm Optimization" (Radioengineering, 2012), proposed a particle-swarm-based transition probability prediction method that can track varying transition probabilities. However, this method also faces the problem that the channel state information it uses may differ from the actual channel state.
Summarizing these studies, the following major problems exist in current channel state prediction methods for cognitive radio networks:
1. many articles default to the same state of the observed channel state sequence as the actual channel state. However, due to the spectrum sensing time of the unauthorized user, the spectrum sensing method, and the limitation of the communication environment, the observed channel state may deviate from the actual channel state to some extent.
2. Most articles must predict the channel state transition probability in advance, which delays the cognitive radio user's communication, and most cannot predict it in real time. Their transition probabilities therefore apply only to networks in which the transition probability never changes, whereas in a real cognitive radio network it may be time varying.
3. The prediction methods in some articles require the traffic to follow a particular probability distribution and therefore cannot be used in general cognitive radio networks.
Disclosure of Invention
The technical problem is as follows: the invention designs a channel state transition probability prediction method in a cognitive radio network based on a two-stage Q learning method. The prediction method comprises two Q-learning-based processes: a channel state transition probability learning process and a channel state learning process. The input of the channel state transition probability learning process is the channel state learned by the channel state learning process, and its output is the predicted channel state transition probability; the input of the channel state learning process is the output of the channel state transition probability learning process, and its output is the learned channel state. Through continuous learning of the channel, the cognitive radio user deduces the state transition probability and can therefore correctly infer the channel state at the next moment, avoiding both attacks by an interferer and interference to authorized users. The method can predict a continuously changing authorized channel state in real time, requires no probability distribution to be assumed in advance for authorized users or interferers, and explicitly accounts for the possibility that the sampled channel states used for predicting the transition probability are incorrect.
The technical scheme of the invention is as follows:
the invention relates to a channel state transition probability prediction method in a cognitive radio network, which comprises two learning processes, namely a channel state transition probability learning process and a channel state learning process;
the channel state transition probability learning process and the channel state learning process are both based on a Q learning method, and the channel state learning process can learn the state of a channel according to the channel state transition probability learned by the channel state transition probability learning process and output the learned channel state.
The basic elements of the Q learning method underlying the channel state transition probability learning process are as follows:
the state is as follows: the authorized channel states are 0 and 1; when the state is 0, the authorized channel is idle; when the state is 1, the authorized channel is occupied by an authorized user, attacked by an interferer, or both;
the actions are as follows: the predicted authorized channel state at the next moment, i.e., 0 or 1;
reward: determined by whether the predicted next-moment authorized channel state matches the channel state learned by the channel state learning process;
the Q learning process of the channel state transition probability learning process is as follows:
(2a) initializing parameters in a Q-learning method
The parameters initialized in the Q learning method are the Q values associating the current channel state with each candidate next-moment channel state; during initialization, all of these Q values are set to 0;
(2b) action decision process
An action is selected according to the Q values of the next states corresponding to the current state, i.e., the state of the channel at the next moment is predicted;
actions are selected by a ξ-greedy method: with probability ξ the cognitive radio user selects the next-moment state corresponding to the maximum Q value, and with probability 1-ξ selects a next-moment state whose Q value is not the maximum;
the selection rule is:

$S_{t+1} = \arg\max_{s'} Q(S_t, s')$ with probability $\xi$; otherwise $S_{t+1}$ is a state with a non-maximum Q value, selected with probability $1-\xi$;

where $S_t$ is the state of the channel at time $t$; $S_{t+1}$ is the state of the channel at time $t+1$; and $Q(\cdot)$ denotes the Q value associating channel state $S_t$ with each candidate next-moment channel state;
(2c) updating the Q value
The Q value is updated according to the channel state learned in the channel state learning process:

$Q(S_t, S_{t+1}) \leftarrow Q(S_t, S_{t+1}) + \alpha\big[R + \gamma \max_{s'} Q(S_t, s') - Q(S_t, S_{t+1})\big]$

where $Q(S_t, S_{t+1})$ is the Q value corresponding to the next state selected in the current state; $\max_{s'} Q(S_t, s')$ is the maximum Q value corresponding to the current channel state; $\alpha$ is the learning rate; $\gamma$ is the discount factor; and $R$ is the reward earned for performing the selected action;
when the action selected in the channel state transition probability learning process is the same as the channel state learned by the channel state learning process, the reward is recorded as 1; when it differs, the reward is recorded as -1;
(2d) computing channel state transition probabilities
The channel state transition probability is calculated from the Q values of the next states corresponding to each current state:

$P_{0,j} = \dfrac{q_{0,j}}{q_{0,0} + q_{0,1}}, \qquad P_{1,j} = \dfrac{q_{1,j}}{q_{1,0} + q_{1,1}}, \qquad j \in \{0, 1\}$

where $P_{i,j}$ is the probability of the channel transitioning from state $i$ to state $j$, and $q_{i,j}$ is the Q value of predicting next state $j$ from current state $i$.
A channel state learning process comprising the steps of:
the basic elements of the channel state learning process based on the Q learning method are respectively as follows:
the state is as follows: the state probability of the authorized channel, i.e., the probability that the state of the authorized channel is 0 and the probability that it is 1;
the actions are as follows: transmitting data or not transmitting data;
reward: determined by whether data is transmitted and whether a collision occurs;
the Q learning process of the channel state learning process is as follows:
(3a) predicting grant channel state probability
The cognitive radio user calculates the probability of each state of the authorized channel from the spectrum sensing result and the channel state transition probabilities output by the channel state transition probability learning process:

$P_i^{t+1} = \dfrac{y_i \sum_{j} P_j^{t} P_{j,i}}{\sum_{k} y_k \sum_{j} P_j^{t} P_{j,k}}, \qquad i, j, k \in \{0, 1\}$

where $P_i^{t}$ is the probability that the authorized channel is in state $i$ at time $t$; $P_i^{t+1}$ is that probability at time $t+1$; $y_i$ is the probability that the channel state observed from the sensing result is $i$; and $P_{i,j}$ is the probability of the authorized channel transitioning from state $i$ to state $j$;
(3b) selecting corresponding action
Selecting a corresponding action according to the predicted state probability of the authorized channel,
the specific selection process is as follows: if the probability that the authorized channel is idle is greater than the probability that it is occupied, i.e. $P_0^{t+1} > P_1^{t+1}$, the authorized channel is considered idle, data transmission is performed with probability η, and no data is transmitted with probability 1-η; otherwise, the authorized channel is considered occupied, no data is transmitted with probability η, and data transmission is performed with probability 1-η;
(3c) calculating rewards
If the selected action is data transmission and no collision with an authorized user or interferer occurs, the reward is 1; if the selected action is data transmission and a collision with an authorized user or interferer occurs, the reward is -1; if the selected action is not to transmit data, the reward is 0;
(3d) updating the Q value
The Q value updating method is:

$Q(s, a) \leftarrow Q(s, a) + \beta\big[R_a + \chi \max_{a'} Q(s, a') - Q(s, a)\big]$

where $Q(s, a)$ is the Q value corresponding to the action selected in the current state; $\max_{a'} Q(s, a')$ is the maximum Q value over the actions available in the current state; $\beta$ is the learning rate; $\chi$ is the discount factor; and $R_a$ is the reward earned for performing the selected action;
(3e) outputting grant channel state
If the selected action is data transmission and does not conflict with the authorized user and the interferers, outputting the authorized channel state as idle; otherwise, the output channel state is occupied.
The invention achieves the following beneficial effects:
(1) The invention takes into account whether the perceived channel state is correct. Through the channel state learning process, the cognitive radio user continuously refines the perceived channel state, obtains a correct channel state, and thereby avoids both interfering with authorized users and being attacked maliciously by interferers.
(2) The invention can evaluate the channel state transition probability in real time, including time-varying transition probabilities, avoiding evaluation delay and reducing the transition probability prediction error rate.
(3) The prediction method does not require assuming in advance that the activities of authorized users or interfering users follow any probability distribution, so it applies more generally to cognitive radio networks.
Drawings
FIG. 1 is an overall block diagram of the channel state transition probability prediction method of the present invention;
FIG. 2 is a flow chart of a channel state transition probability prediction method of the present invention;
fig. 3 is a flowchart of a channel state prediction method according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1, the present invention relates to a method for predicting channel state transition probability in a cognitive radio network, which includes two learning processes, namely a channel state transition probability learning process and a channel state learning process;
the channel state transition probability learning process and the channel state learning process are both based on a Q learning method, and the channel state learning process can learn the state of a channel according to the channel state transition probability learned by the channel state transition probability learning process and output the learned channel state.
The basic elements of the Q learning method underlying the channel state transition probability learning process are as follows:
the state is as follows: the authorized channel states are 0 and 1; when the state is 0, the authorized channel is idle; when the state is 1, the authorized channel is occupied by an authorized user, attacked by an interferer, or both;
the actions are as follows: the predicted authorized channel state at the next moment, i.e., 0 or 1;
reward: determined by whether the predicted next-moment authorized channel state matches the channel state learned by the channel state learning process;
as shown in fig. 2, the Q learning process of the channel state transition probability learning process is as follows:
(2a) initializing parameters in a Q-learning method
The parameters initialized in the Q learning method are the Q values associating the current channel state with each candidate next-moment channel state; during initialization, all of these Q values are set to 0;
(2b) action decision process
An action is selected according to the Q values of the next states corresponding to the current state, i.e., the state of the channel at the next moment is predicted;
actions are selected by a ξ-greedy method: with probability ξ the cognitive radio user selects the next-moment state corresponding to the maximum Q value, and with probability 1-ξ selects a next-moment state whose Q value is not the maximum;
the selection rule is:

$S_{t+1} = \arg\max_{s'} Q(S_t, s')$ with probability $\xi$; otherwise $S_{t+1}$ is a state with a non-maximum Q value, selected with probability $1-\xi$;

where $S_t$ is the state of the channel at time $t$; $S_{t+1}$ is the state of the channel at time $t+1$; and $Q(\cdot)$ denotes the Q value associating channel state $S_t$ with each candidate next-moment channel state;
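As a concrete illustration, the ξ-greedy selection of step (2b) can be sketched in Python for the two-state channel; the function name, table layout, and `rng` parameter are illustrative assumptions, not part of the patent:

```python
import random

def xi_greedy_next_state(q_row, xi, rng=random.random):
    """Select the predicted next channel state for one current state.

    q_row: [Q(s, next=0), Q(s, next=1)] -- Q values of the two candidate
           next-moment channel states (states are 0 and 1).
    xi:    probability of choosing the maximum-Q next state.
    rng:   zero-argument callable returning a uniform draw in [0, 1).
    """
    greedy = 0 if q_row[0] >= q_row[1] else 1   # state with the maximum Q value
    if rng() < xi:
        return greedy                           # exploit with probability xi
    return 1 - greedy                           # otherwise take the non-maximum state
```

Passing a deterministic `rng` makes the rule easy to verify in isolation.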
(2c) updating the Q value
The Q value is updated according to the channel state learned in the channel state learning process:

$Q(S_t, S_{t+1}) \leftarrow Q(S_t, S_{t+1}) + \alpha\big[R + \gamma \max_{s'} Q(S_t, s') - Q(S_t, S_{t+1})\big]$

where $Q(S_t, S_{t+1})$ is the Q value corresponding to the next state selected in the current state; $\max_{s'} Q(S_t, s')$ is the maximum Q value corresponding to the current channel state; $\alpha$ is the learning rate; $\gamma$ is the discount factor; and $R$ is the reward earned for performing the selected action;
when the action selected in the channel state transition probability learning process is the same as the channel state learned by the channel state learning process, the reward is recorded as 1; when it differs, the reward is recorded as -1;
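The update of step (2c), together with its reward rule, amounts to one Q-learning step. The following sketch assumes an illustrative 2x2 Q table and function name (not taken from the patent), with the maximum taken over the current state's row as the description states:

```python
def update_transition_q(q, s_t, s_pred, s_learned, alpha=0.1, gamma=0.9):
    """One update of step (2c) for the transition-probability learner.

    q:         2x2 table, q[s][s_next] = Q value of predicting s_next in state s.
    s_t:       current channel state (0 or 1).
    s_pred:    next state selected by the xi-greedy decision (the action).
    s_learned: channel state reported by the channel state learning process.
    Returns the reward that was applied.
    """
    r = 1 if s_pred == s_learned else -1   # reward rule of step (2c)
    best = max(q[s_t])                     # maximum Q value in the current state
    q[s_t][s_pred] += alpha * (r + gamma * best - q[s_t][s_pred])
    return r
```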
(2d) computing channel state transition probabilities
The channel state transition probability is calculated from the Q values of the next states corresponding to each current state:

$P_{0,j} = \dfrac{q_{0,j}}{q_{0,0} + q_{0,1}}, \qquad P_{1,j} = \dfrac{q_{1,j}}{q_{1,0} + q_{1,1}}, \qquad j \in \{0, 1\}$

where $P_{i,j}$ is the probability of the channel transitioning from state $i$ to state $j$, and $q_{i,j}$ is the Q value of predicting next state $j$ from current state $i$.
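Step (2d) can then be realized as a row-wise normalization of the Q table. A minimal sketch, assuming nonnegative Q values (in practice the values would be shifted or clipped to keep the normalization well defined):

```python
def transition_probs(q):
    """Step (2d): normalize each row of the Q table into transition
    probabilities, P[i][j] = q[i][j] / (q[i][0] + q[i][1]).

    Assumes each row has a positive sum.
    """
    return [[row[0] / (row[0] + row[1]), row[1] / (row[0] + row[1])]
            for row in q]
```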
A channel state learning process comprising the steps of:
the basic elements of the channel state learning process based on the Q learning method are respectively as follows:
the state is as follows: the state probability of the authorized channel, i.e., the probability that the state of the authorized channel is 0 and the probability that it is 1;
the actions are as follows: transmitting data or not transmitting data;
reward: determined by whether data is transmitted and whether a collision occurs;
as shown in fig. 3, the Q learning process of the channel state learning process is as follows:
(3a) predicting grant channel state probability
The cognitive radio user calculates the probability of each state of the authorized channel from the spectrum sensing result and the channel state transition probabilities output by the channel state transition probability learning process:

$P_i^{t+1} = \dfrac{y_i \sum_{j} P_j^{t} P_{j,i}}{\sum_{k} y_k \sum_{j} P_j^{t} P_{j,k}}, \qquad i, j, k \in \{0, 1\}$

where $P_i^{t}$ is the probability that the authorized channel is in state $i$ at time $t$; $P_i^{t+1}$ is that probability at time $t+1$; $y_i$ is the probability that the channel state observed from the sensing result is $i$; and $P_{i,j}$ is the probability of the authorized channel transitioning from state $i$ to state $j$;
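One way to realize step (3a) is the prediction-correction step of a two-state hidden Markov filter: propagate the state probabilities through the transition matrix, weight by the sensing observation, and renormalize. This is a sketch under that interpretation; the function name and exact normalization are assumptions:

```python
def predict_state_probs(p_t, y, P):
    """Step (3a): combine the transition-based prediction with the
    spectrum sensing observation and renormalize.

    p_t: [P0, P1] channel state probabilities at time t.
    y:   [y0, y1] probabilities that the sensed state is 0 or 1.
    P:   2x2 matrix, P[i][j] = probability of moving from state i to j.
    """
    prior = [sum(p_t[j] * P[j][i] for j in range(2)) for i in range(2)]
    post = [y[i] * prior[i] for i in range(2)]
    z = post[0] + post[1]
    return [post[0] / z, post[1] / z]
```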
(3b) selecting corresponding action
Selecting a corresponding action according to the predicted state probability of the authorized channel,
the specific selection process is as follows: if the probability that the authorized channel is idle is greater than the probability that it is occupied, i.e. $P_0^{t+1} > P_1^{t+1}$, the authorized channel is considered idle, data transmission is performed with probability η, and no data is transmitted with probability 1-η; otherwise, the authorized channel is considered occupied, no data is transmitted with probability η, and data transmission is performed with probability 1-η;
(3c) calculating rewards
If the selected action is data transmission and no collision with an authorized user or interferer occurs, the reward is 1; if the selected action is data transmission and a collision with an authorized user or interferer occurs, the reward is -1; if the selected action is not to transmit data, the reward is 0;
(3d) updating the Q value
The Q value updating method is:

$Q(s, a) \leftarrow Q(s, a) + \beta\big[R_a + \chi \max_{a'} Q(s, a') - Q(s, a)\big]$

where $Q(s, a)$ is the Q value corresponding to the action selected in the current state; $\max_{a'} Q(s, a')$ is the maximum Q value over the actions available in the current state; $\beta$ is the learning rate; $\chi$ is the discount factor; and $R_a$ is the reward earned for performing the selected action;
(3e) outputting grant channel state
If the selected action is data transmission and does not conflict with the authorized user and the interferers, outputting the authorized channel state as idle; otherwise, the output channel state is occupied.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (2)
1. A channel state transition probability prediction method in a cognitive radio network is characterized by comprising two learning processes, namely a channel state transition probability learning process and a channel state learning process;
the channel state transition probability learning process and the channel state learning process are based on a Q learning method, and the channel state learning process can learn the state of a channel according to the channel state transition probability learned by the channel state transition probability learning process and output the learned channel state;
the basic elements of the Q learning method on which the channel state transition probability learning process is based are as follows:
the state is as follows: the states of the authorized channel are 0 and 1; when the state of the authorized channel is 0, the authorized channel is idle, and when the state of the authorized channel is 1, the authorized channel is occupied by an authorized user, attacked by an interferer, or both;
the actions are as follows: the predicted grant channel state at the next time instant, i.e., 0 or 1;
reward: determined according to whether the predicted authorized channel state at the next moment differs from the channel state learned by the channel state learning process;
the Q learning process of the channel state transition probability learning process is as follows:
(2a) initializing parameters in a Q-learning method
The parameter in the Q learning method is the Q value corresponding to the channel state to be predicted at the next moment for the current channel state; during initialization, this Q value is set to 0;
(2b) action decision process
Selecting action according to the Q value of the next state corresponding to the current state, namely predicting the state of the channel at the next moment;
in the action decision process, actions are selected by the ξ-greedy method: with probability ξ the cognitive radio user selects the next-time state corresponding to the maximum Q value, and with probability 1−ξ it selects a next-time state corresponding to a non-maximum Q value;
the selection rule is
S_{t+1} = argmax_S Q(S_t, S) with probability ξ, and a non-maximum state with probability 1−ξ,
wherein S_t is the state of the channel at time t; S_{t+1} is the state of the channel at time t+1; Q(·) denotes the Q values of channel state S_t for the different next-time channel states;
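For the binary channel of claim 1, the ξ-greedy decision reduces to choosing between the greedy state and its complement. A minimal sketch, assuming states 0/1 and an injectable random source (both illustrative):

```python
import random

def xi_greedy_next_state(q, current, xi=0.9, rng=random.random):
    """Predict the next channel state from Q(current, .) with xi-greedy.

    With probability xi pick the next state with the maximum Q value;
    otherwise pick the other (non-maximum) state.
    """
    greedy = max((0, 1), key=lambda s: q[(current, s)])
    if rng() < xi:
        return greedy
    return 1 - greedy
```

With only two states the "non-maximum" choice is simply `1 - greedy`; with more states it would be a draw over the remaining states.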
(2c) updating the Q value
The Q value is updated according to the channel state learned in the channel state learning process:
Q(S_t, S_{t+1}) ← Q(S_t, S_{t+1}) + α[R + γ·max_S Q(S_t, S) − Q(S_t, S_{t+1})],
wherein Q(S_t, S_{t+1}) is the Q value corresponding to the next state selected in the current state; max_S Q(S_t, S) is the maximum Q value corresponding to the current channel state; α is the learning rate; γ is the discount factor; R is the reward obtained for performing the selected action;
when the action selected in the channel state transition probability learning process is the same as the channel state learned by the channel state learning process, the reward is recorded as 1; if the selected action differs from the learned channel state, the reward is recorded as −1;
(2d) computing channel state transition probabilities
The channel state transition probability is calculated from the Q values of the next states corresponding to the current state:
P_{0,j} = q_{0,j} / (q_{0,0} + q_{0,1}), P_{1,j} = q_{1,j} / (q_{1,0} + q_{1,1}),
wherein P_{0,j} is the probability of the channel transitioning from state 0 to state j; P_{1,j} is the probability of the channel transitioning from state 1 to state j; q_{0,j} is the Q value of next state j corresponding to current state 0; q_{0,0} is the Q value of next state 0 corresponding to current state 0; q_{0,1} is the Q value of next state 1 corresponding to current state 0; q_{1,j} is the Q value of next state j corresponding to current state 1; q_{1,0} is the Q value of next state 0 corresponding to current state 1; q_{1,1} is the Q value of next state 1 corresponding to current state 1.
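The computation of step (2d) appears as an image in the source; assuming the natural per-row normalization over the listed q values (an assumption; a softmax would be an alternative if Q values can go negative), the step can be sketched as:

```python
def transition_probs(q):
    """Normalize the per-state Q values into a 2x2 transition matrix.

    q: dict (i, j) -> Q value of predicting next state j from current state i.
    Assumes non-negative Q values with a non-zero row sum (an assumption,
    not stated in the source).
    """
    P = {}
    for i in (0, 1):
        row = q[(i, 0)] + q[(i, 1)]
        for j in (0, 1):
            P[(i, j)] = q[(i, j)] / row
    return P
```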
2. The channel state transition probability prediction method in a cognitive radio network according to claim 1, wherein the channel state learning process comprises the following steps:
the basic elements of the Q learning method on which the channel state learning process is based are as follows:
the state is as follows: the state probability of the grant channel, that is, the probability that the state of the grant channel is 0 and the probability that the state of the grant channel is 1;
the actions are as follows: transmitting data or not transmitting data;
reward: determined according to whether data is transmitted and whether a collision occurs;
the Q learning process of the channel state learning process is as follows:
(3a) predicting grant channel state probability
The cognitive radio calculates the probability of each state of the authorized channel according to the spectrum sensing result and the channel state transition probability output by the channel state transition probability learning process,
wherein p_i^t is the probability that the authorized channel state is i at time t, i = 0 or 1; p_i^{t+1} is the probability that the authorized channel state is i at time t+1, i = 0 or 1; Y_i is the probability that the authorized channel state observed from the sensing result is i, i = 0 or 1; P_{i,j} is the probability of the authorized channel transitioning from state i to state j, j = 0 or 1;
(3b) selecting corresponding action
Selecting the corresponding action according to the predicted state probability of the authorized channel,
the specific selection process is as follows: if the probability that the authorized channel is idle is greater than the probability that it is occupied, i.e. p_0^{t+1} > p_1^{t+1}, the authorized channel is considered to be in the idle state; data transmission is then performed with probability η and withheld with probability 1−η; otherwise the authorized channel is considered to be in the occupied state, and data transmission is withheld with probability η and performed with probability 1−η;
(3c) calculating rewards
If the selected action is data transmission and it does not collide with authorized users or interferers, the reward is 1; if the selected action is data transmission and it collides with an authorized user or an interferer, the reward is −1; if the selected action is not to transmit data, the reward is 0;
(3d) updating the Q value
The Q value is updated as
Q(s, a) ← Q(s, a) + β[R_α + χ·max_{a'} Q(s, a') − Q(s, a)],
wherein Q(s, a) is the Q value corresponding to the action a selected in the current state s; max_{a'} Q(s, a') is the maximum Q value over the actions in the current state; β is the learning rate; χ is the discount factor; R_α is the reward obtained for performing the selected action;
(3e) outputting grant channel state
If the selected action is data transmission and does not conflict with the authorized user and the interferers, outputting the authorized channel state as idle; otherwise, the output channel state is occupied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810696652.1A CN109039505B (en) | 2018-06-29 | 2018-06-29 | Channel state transition probability prediction method in cognitive radio network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109039505A CN109039505A (en) | 2018-12-18 |
CN109039505B true CN109039505B (en) | 2021-02-09 |
Family
ID=65521930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810696652.1A Expired - Fee Related CN109039505B (en) | 2018-06-29 | 2018-06-29 | Channel state transition probability prediction method in cognitive radio network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109039505B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110380802A (en) * | 2019-06-14 | 2019-10-25 | 中国人民解放军陆军工程大学 | Single-user dynamic spectrum anti-interference system and method based on software radio platform |
CN110601826B (en) * | 2019-09-06 | 2021-10-08 | 北京邮电大学 | Self-adaptive channel distribution method in dynamic DWDM-QKD network based on machine learning |
CN111181669B (en) * | 2020-01-03 | 2022-02-11 | 中国科学院上海高等研究院 | Self-adaptive spectrum sensing method, system, medium and terminal based on pre-evaluation processing |
CN111211831A (en) * | 2020-01-13 | 2020-05-29 | 东方红卫星移动通信有限公司 | Multi-beam low-orbit satellite intelligent dynamic channel resource allocation method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102665219A (en) * | 2012-04-20 | 2012-09-12 | 南京邮电大学 | Dynamic frequency spectrum allocation method of home base station system based on OFDMA |
CN105120468A (en) * | 2015-07-13 | 2015-12-02 | 华中科技大学 | Dynamic wireless network selection method based on evolutionary game theory |
CN105357158A (en) * | 2015-10-26 | 2016-02-24 | 天津大学 | Method for node to access multiple channels exactly and efficiently in underwater cognitive network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10091785B2 (en) * | 2014-06-11 | 2018-10-02 | The Board Of Trustees Of The University Of Alabama | System and method for managing wireless frequency usage |
Non-Patent Citations (2)
Title |
---|
Intelligent Spectrum Management Based on Transfer Actor-Critic Learning for Rateless Transmissions in Cognitive Radio Networks; Koushik A.M. et al.; IEEE Transactions on Mobile Computing; 2018-05-01; vol. 17, no. 5; pp. 1204-1215 *
Transmission scheduling scheme based on deep Q learning in wireless networks; Zhu Jiang et al.; Journal on Communications (通信学报); 2018-04; vol. 39, no. 4; pp. 36-43 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210209 |