CN103179675A - Epsilon-greed based online sequential perceiving and opportunity accessing method - Google Patents
Epsilon-greed based online sequential perceiving and opportunity accessing method
- Publication number
- CN103179675A
- Authority
- CN
- China
- Prior art keywords
- channel
- perception
- time slot
- access
- sequential
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
Disclosed is an ε-greedy based online sequential sensing and opportunistic access method. In each time slot, the user senses channels sequentially and accesses an idle channel opportunistically for transmission. The method comprises a step of initializing the relevant parameters and a step of making access decisions based on online learning in each time slot. It actively learns the environment and adapts to its dynamic changes. Because it is an online decision method, the system adjusts its next decision in real time according to each decision and its feedback, so that the long-term cumulative throughput of the system is maximized.
Description
Technical field
The present invention relates to the field of cognitive radio in wireless communication technology, and specifically to an online learning method for the optimal sequential sensing order in opportunistic spectrum access systems whose channel statistics are unknown.
Background technology
Driven by the proposals of spectrum regulators and by progress in cognitive radio technology, dynamic spectrum access (DSA) has been widely recognized as an effective means of improving spectrum utilization. To protect primary-user communication from interference, a cognitive user must perform spectrum sensing on a channel before accessing it, to make sure the channel is idle. Limited by hardware, a cognitive terminal can usually sense only a small part of the whole band at a time. How the sensing order is arranged therefore directly affects the throughput and access delay of the system. A key difficulty in achieving optimal channel sensing and access is that the channel statistics are hard to estimate accurately, especially in practical heterogeneous network scenarios, where the availability probability and link quality differ from channel to channel.
Owing to its inherent adaptivity and effectiveness, online learning is widely used in dynamic wireless networks. By restricting the cognitive user to sensing one channel per time slot, existing research on online access models the problem as a classical multi-armed bandit (MAB). That is, in each slot the user only has to select one channel to access, based on the statistics of the channel rewards, so as to maximize the cumulative system throughput. Although this simple "one channel per slot" model is reasonable in synchronous periodic sensing systems, it is unsuitable for more distributed cognitive networks, point-to-point communication scenarios in particular. On the one hand, the channel sensing time is usually far shorter than the transmission slot (for example, the sensing time of a TV channel is typically on the order of 10 milliseconds, whereas the transmission slot under the primary-user protection constraint is 2 seconds). When the user finds the current channel occupied, switching directly to the next channel and sensing it is therefore more reasonable and effective than waiting on the original channel for the next transmission slot. On the other hand, owing to the randomness of radio channel states, switching channels to sense usually yields more transmission opportunities, that is, a multichannel diversity gain. Moreover, since the number of available channels is large (for example, more than half of users have more than 20 available TV channels), this diversity gain is considerable.
On this basis, the present invention proposes a sequential channel sensing and access strategy based on online learning for heterogeneous channel networks with unknown statistics. Unlike previous methods, which restrict the user to sensing and accessing one channel per slot, the model of this scheme allows the user to sense channels sequentially within each slot and to access opportunistically for transmission. By dynamically adjusting the sensing order and access strategy in real time, the cumulative throughput of the system over a given period is maximized.
Summary of the invention
The present invention proposes an ε-greedy based online sequential sensing and opportunistic access method for dynamic spectrum environments, which solves the problem of learning the sequential sensing order and optimizing the cumulative throughput when the statistical information is unknown.
The present invention is realized by the following technical solution:
An online sequential sensing and opportunistic access method based on the ε-greedy algorithm, in which, in each time slot, the user senses channels sequentially and accesses opportunistically for transmission.
The method comprises a step of initializing the relevant parameters and a step, carried out in each time slot, of making access decisions based on online learning.
In the present invention, the step of initializing the relevant parameters specifically comprises:
1.1 For each channel i, i ∈ {1, ..., N}, initialize the idle probability estimate $\hat{\theta}_i = 0$ and the count of times the channel has been sensed $n_i = 0$;
1.2 Initialize the candidate channel set $S_0 = \{1, \ldots, N\}$, where N is the total number of channels;
1.3 Initialize the control parameter of the ε-greedy algorithm, $\varepsilon = \varepsilon_0$. The value of $\varepsilon_0$ depends on the total number of channels N; according to the number of channels N in the network scenario, $\varepsilon_0$ takes a value between 0.5 and 2.5.
In the present invention, the relation between the value of the algorithm control parameter $\varepsilon_0$ and the total number of channels N is as shown in Table 1:
Total number of channels N | ≤2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|
Value of ε_0 | 0 | 0.08 | 0.16 | 0.31 | 0.44 | 0.65 | 0.78 | 0.98 | 1.17 |

Total number of channels N | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | ≥19 |
---|---|---|---|---|---|---|---|---|---|
Value of ε_0 | 1.35 | 1.55 | 1.72 | 1.90 | 2.15 | 2.22 | 2.31 | 2.39 | 2.41 |

Table 1.
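The initialization step and the Table 1 lookup can be sketched in Python as follows. This is only an illustrative sketch: the names `EPS0_TABLE`, `initial_epsilon` and `initialize` are hypothetical, and starting the idle probability estimates at zero is assumed from the embodiment, which states that all estimates are zero in the first slot.

```python
# Table 1 from the text: epsilon_0 as a function of the total number of channels N.
EPS0_TABLE = {
    3: 0.08, 4: 0.16, 5: 0.31, 6: 0.44, 7: 0.65, 8: 0.78, 9: 0.98, 10: 1.17,
    11: 1.35, 12: 1.55, 13: 1.72, 14: 1.90, 15: 2.15, 16: 2.22, 17: 2.31, 18: 2.39,
}


def initial_epsilon(n_channels: int) -> float:
    """Return epsilon_0 for N channels, following Table 1."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    if n_channels <= 2:
        return 0.0
    if n_channels >= 19:
        return 2.41
    return EPS0_TABLE[n_channels]


def initialize(n_channels: int):
    """Steps 1.1 to 1.3: zeroed estimates and counts, full candidate set, epsilon_0."""
    channels = range(1, n_channels + 1)
    theta_hat = {i: 0.0 for i in channels}   # idle probability estimates (assumed start at 0)
    n_sensed = {i: 0 for i in channels}      # how often each channel has been sensed
    candidates = set(channels)               # S_0 = {1, ..., N}
    return theta_hat, n_sensed, candidates, initial_epsilon(n_channels)
```

For example, `initialize(10)` returns ten zeroed estimates and counts, the candidate set {1, ..., 10}, and the Table 1 value 1.17.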
In the present invention, the step of making channel access decisions based on online learning in any time slot j specifically comprises:
Step 0. For each channel i, i ∈ {1, ..., N}, initialize the idle probability estimate $\hat{\theta}_i = 0$ and the sensing count $n_i = 0$;
Step 1. Reset the candidate channel set, $S = S_0$, and adjust the algorithm control parameter ε for the current slot;
Step 2. Select the channel $i^*$ with the largest idle probability estimate in the current candidate channel set S;
Step 3. Randomly select a channel ψ from the candidate channel set S for sensing: with probability $1 - \varepsilon$ select $i^*$, and with probability ε select at random among the channels in S. The sensing result indicates either that the channel is idle, that is, available, or that the channel is occupied, that is, unavailable;
Step 4. Update the idle probability estimate $\hat{\theta}_i(j)$ and the sensing count $n_i(j)$ of each channel according to the sensing result, where $n_i(j)$ denotes the number of times channel i has been sensed up to time slot j;
Step 5. Update the current channel set by removing the sensed channel, $S = S \setminus \{\psi\}$. If the currently sensed channel ψ is unavailable, return to Step 2 and continue by sensing the next channel; otherwise, access the currently idle channel ψ and transmit data.
Step 6. When the time slot ends, return to Step 1 and begin the sequential sensing and access of the next time slot.
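A single slot of the procedure above might look like the following Python sketch. It rests on several assumptions that are not stated in the text: the function name `run_slot` and the `is_idle` callback (a stand-in for real spectrum sensing) are hypothetical, the estimate is updated as a running average, ties in the best estimate go to the lowest channel index, and a sensed channel is dropped from the candidate set for the rest of the slot.

```python
import random


def run_slot(theta_hat, n_sensed, eps, is_idle, rng=random):
    """Sense channels one at a time in a single slot until an idle one is found.

    theta_hat, n_sensed: per-channel estimate and sensing count (updated in place).
    eps: exploration probability for this slot, already clipped to [0, 1].
    is_idle: callable(channel) -> bool, a stand-in for real spectrum sensing.
    Returns (accessed_channel_or_None, number_of_sensing_steps).
    """
    candidates = set(theta_hat)  # Step 1: S = S_0
    steps = 0
    while candidates:
        ordered = sorted(candidates)
        # Step 2: channel with the largest current estimate in S (ties: lowest index).
        best = max(ordered, key=lambda i: theta_hat[i])
        # Step 3: exploit with probability 1 - eps, otherwise explore uniformly in S.
        chosen = rng.choice(ordered) if rng.random() < eps else best
        steps += 1
        idle = is_idle(chosen)
        # Step 4: running-average update (assumed form) of the estimate and the count.
        n_sensed[chosen] += 1
        theta_hat[chosen] += (float(idle) - theta_hat[chosen]) / n_sensed[chosen]
        # Step 5: drop the sensed channel; stop as soon as an idle one is found.
        candidates.discard(chosen)
        if idle:
            return chosen, steps
    return None, steps  # every channel was busy in this slot
```

With `eps = 0` the loop is purely greedy, which makes it easy to trace by hand: on three untried channels where only channel 3 is idle, it senses 1, 2 and 3 in turn and accesses channel 3 at the third step.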
Compared with the prior art, the present invention has the following advantages:
1. It actively learns the environment and adapts to its dynamic changes. The proposed method is an online decision method: the system adjusts its next decision in real time according to each decision and its feedback, thereby maximizing the long-term cumulative throughput of the system.
2. It obtains multichannel diversity gain. In previous online learning schemes, the user can select only one channel per slot for sensing and access, and when the selected channel is sensed as occupied, the user must wait until the next slot. In the proposed scheme, the user senses channels sequentially and accesses opportunistically on that basis, which significantly improves both the learning speed and the system throughput.
3. The computational complexity of the learning method is low. Under the proposed method, at each channel selection instant in every slot the user only needs to choose according to the channel reward statistic (a single index), and the choice space is of size N, so the computational complexity is at most O(N).
4. The storage complexity of the learning method is low. In the proposed scheme, the user only needs to store two variables per channel: the statistical average idle probability and the sensing count. The statistics update incurs no additional storage overhead, so the storage complexity is very low.
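The storage claim in item 4 can be illustrated with an incremental average, which needs only the current estimate and the count. The sketch below assumes the estimate is a plain sample mean of the sensing results; the function name `update` is hypothetical.

```python
def update(theta_hat: float, n: int, idle: bool):
    """Fold one sensing result into the running average; O(1) time and memory."""
    n += 1
    theta_hat += (float(idle) - theta_hat) / n
    return theta_hat, n


# Usage: the running estimate matches the batch average without storing history.
observations = [True, False, True, True]
theta_hat, n = 0.0, 0
for obs in observations:
    theta_hat, n = update(theta_hat, n, obs)
batch_average = sum(observations) / len(observations)
```

After the four observations, the running estimate and the batch average agree, and only two numbers per channel were ever kept.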
Description of drawings
Fig. 1 is a schematic diagram of sequential sensing and access in the present invention.
Fig. 2 is a performance comparison between the scheme proposed in the present invention and the traditional sequential sensing and access scheme without learning.
Fig. 3 is a performance comparison between the scheme proposed in the present invention and the traditional periodic sensing and access scheme based on online learning.
Embodiment
The online-learning-based sequential channel sensing and opportunistic access method provided by the present invention, as shown in Fig. 1, is embodied as follows:
Consider a cognitive radio system comprising N channels, with channel set {1, 2, ..., N}. As shown in Fig. 1, the access and transmission process based on sequential sensing is as follows: in each time slot, the user senses the channels sequentially in a certain order until an idle channel is found, then accesses that channel and transmits data at rate R for the remainder of the slot. When the user accesses a channel after k sensing steps in the current slot, the instantaneous transmission reward obtained is $R(T - k\tau_s)$, where R is the transmission rate, T is the duration of one transmission opportunity, and $\tau_s$ is the time overhead of sensing one channel.
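As a small worked example of the reward expression, the sketch below evaluates R(T - kτ_s) for a few values of k. The numbers are illustrative assumptions: T = 2 s and τ_s = 10 ms are borrowed from the TV-channel figures quoted in the background, R = 1 is an arbitrary unit rate, and clamping the remaining time at zero is an addition for robustness rather than part of the text. The function name `slot_reward` is hypothetical.

```python
def slot_reward(k: int, rate: float, slot_len: float, sense_time: float) -> float:
    """Instantaneous reward R * (T - k * tau_s) when access follows the k-th sensing step."""
    remaining = slot_len - k * sense_time
    return rate * max(remaining, 0.0)  # clamp is an added safeguard, not from the text


# Assumed values: unit rate, 2 s slot, 10 ms per sensing step.
first_try = slot_reward(1, rate=1.0, slot_len=2.0, sense_time=0.01)
fifth_try = slot_reward(5, rate=1.0, slot_len=2.0, sense_time=0.01)
```

With these values, even accessing on the fifth sensing step loses only about 2% of the reward of accessing on the first, which is why continuing to sense within the slot is attractive.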
The problem solved by the present invention is: without knowing the channel statistics, to provide a learning strategy that selects the sensing order slot by slot so that the cumulative system throughput is maximized. To this end, an online learning method based on ε-greedy is proposed. The basic idea of the ε-greedy dynamic ordering method is that, during learning, in every slot the user relies on the current channel statistics estimates: with probability $1 - \varepsilon$ it selects the channel with the highest current estimated mean, and with probability ε it selects at random among all channels, while updating the channel statistics estimates. In the algorithm, ε is initialized to $\varepsilon_0$ and changes with the slot index j. The value of ε decreases slowly as learning proceeds, so the user's policy gradually converges as the channel statistics estimates become accurate. The concrete implementation steps are as follows:
(1) Parameter initialization, which completes the following work:
1.1 For each channel i, i ∈ {1, ..., N}, initialize the idle probability estimate $\hat{\theta}_i = 0$ and the count of times the channel has been sensed $n_i = 0$;
1.2 Initialize the candidate channel set $S_0 = \{1, \ldots, N\}$;
1.3 Initialize the algorithm control parameter $\varepsilon_0$. $\varepsilon_0$ is usually a constant between 0.5 and 2.5; the best value of $\varepsilon_0$ is taken from Table 1.
(2) In each time slot j, carry out online learning and decision-making as follows:
2.1 Reset the candidate channel set, $S = S_0$, and at the same time adjust the learning parameter ε;
2.2 Determine the channel with the best statistical estimate in the current candidate channel set, that is, the channel in S with the largest idle probability estimate;
2.3 Randomly determine the k-th channel to sense in slot j: with probability $1 - \varepsilon$ select the best-estimate channel, and with probability ε select at random among the channels in S;
2.4 According to the sensing result, update the channel statistics estimate and the related parameters, where the state of channel i in time slot j indicates either that the channel is idle (available) or that it is occupied (unavailable);
2.5 Finally, if the channel is unavailable, return to step 2.2, switch channels, and continue by sensing the next channel; otherwise, access the currently idle channel and transmit data.
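The random choice in step 2.3 can be made explicit as a probability distribution over the candidate set. The sketch below assumes the usual ε-greedy reading: the exploration mass ε is spread uniformly over all candidates (including the best one) and the remaining 1 - ε goes to the best-estimate channel. The function name `selection_probabilities` is hypothetical.

```python
def selection_probabilities(theta_hat, candidates, eps):
    """Probability of sensing each candidate next under epsilon-greedy."""
    if not candidates:
        raise ValueError("candidate set is empty")
    eps = min(max(eps, 0.0), 1.0)             # keep eps a valid probability
    ordered = sorted(candidates)
    best = max(ordered, key=lambda i: theta_hat[i])
    probs = {i: eps / len(ordered) for i in ordered}  # uniform exploration share
    probs[best] += 1.0 - eps                          # exploitation share
    return probs


# Usage with three candidates and eps = 0.3: channel 2 has the best estimate.
probs = selection_probabilities({1: 0.2, 2: 0.9, 3: 0.5}, {1, 2, 3}, 0.3)
```

Here channel 2 is chosen with probability 0.8 and each of the other two with probability 0.1. Once ε reaches 1 or more (as the larger Table 1 values allow in early slots), the clipped distribution is uniform.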
An embodiment of the present invention:
An example of the present invention is as follows; the parameter settings do not affect generality. As shown in Fig. 1, consider a cognitive radio network with 10 candidate channels, that is, N = 10. The idle probability of each channel is $\theta_i \in [0, 1]$; in this embodiment, the idle probabilities are as shown in Table 2:
Channel i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Idle probability θ_i | 0.3 | 0.7 | 0.6 | 0.2 | 0.8 | 0.5 | 0.7 | 0.6 | 0.9 | 0.4 |

Table 2
Obviously, the optimal channel sensing order is [9, 5, 7, 2, 3, 8, 6, 10, 1, 4] (of course, since some channels have equal idle probabilities, the order obtained by swapping those channels is also optimal).
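The ordering stated above can be reproduced by sorting the Table 2 channels by decreasing idle probability, as in the short sketch below. The names `THETA` and `best_order` are hypothetical. Python's sort is stable, so tied channels keep their index order, which gives one of the equally good orders rather than the exact list quoted.

```python
# Idle probabilities from Table 2, keyed by channel index.
THETA = {1: 0.3, 2: 0.7, 3: 0.6, 4: 0.2, 5: 0.8,
         6: 0.5, 7: 0.7, 8: 0.6, 9: 0.9, 10: 0.4}


def best_order(theta):
    """Channels sorted by decreasing idle probability (ties keep index order)."""
    return sorted(theta, key=lambda i: -theta[i])


order = best_order(THETA)
```

The result, [9, 5, 2, 7, 3, 8, 6, 10, 1, 4], differs from the order in the text only in the tied pair 2 and 7.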
If the user had perfect statistical information in advance, it could sense and access sequentially according to the optimal order. In most practical scenarios, however, the user does not know the channel statistics and can only keep learning the distribution through sensing. The present invention addresses exactly this case by making sequential sensing and access decisions based on online learning. The concrete steps are as follows:
Initialize the parameters:
For each channel i, i ∈ {1, ..., 10}, the channel idle probability estimate at the initial time is taken to be $\hat{\theta}_i = 0$. At the same time, the count of times each channel has been sensed is $n_i = 0$, and the candidate channel set is $S_0 = \{1, \ldots, 10\}$.
According to the relation between the optimized control parameter and the number of channels shown in Table 1, initialize $\varepsilon_0 = 1.2$ (since N = 10; Table 1 lists 1.17).
Thus, the detailed process of the ε-greedy based online sequential sensing and opportunistic access method proposed by the present invention is as follows:
In the first time slot, that is, when j = 1:
1.1 The idle probability estimates of all channels are equal, namely zero. A channel is therefore selected at random for sensing;
1.2 If the channel is sensed as idle, access it and transmit. Otherwise, select a channel at random among the remaining channels and sense it;
In the remaining time slots, that is, when j ≥ 2:
2.1 Update the control parameter ε in every slot;
2.2 Since the channel statistics estimates have changed, determine, from the current estimates, the channel with the best statistical estimate in the current candidate channel set;
2.3 Randomly determine a channel to sense: with probability $1 - \varepsilon$ the best-estimate channel, and with probability ε a channel chosen at random from the candidate set;
2.4 If the channel is sensed as idle, access it; otherwise, repeat steps 2.2 and 2.3 until a channel is accessed for transmission;
2.5 During sequential sensing, update the channel statistics estimates and related parameters according to the sensing results;
2.6 Since the algorithm proposed by the present invention is an online decision algorithm, no termination condition needs to be set. It runs online until the data transmission process ends.
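The whole procedure for the N = 10 example can be put together in a short simulation. The sketch below is a hedged illustration, not the patent's reference implementation. In particular, the decay schedule ε_j = min(1, ε_0 / j) is an assumption chosen only because it decreases slowly and equals 1 in the first slot; the text does not state the schedule. Channel states are assumed to be independent Bernoulli draws with the Table 2 probabilities, and R = 1, T = 2 s and τ_s = 10 ms are illustrative values. The name `simulate` is hypothetical.

```python
import random

# Idle probabilities from Table 2, keyed by channel index.
THETA = {1: 0.3, 2: 0.7, 3: 0.6, 4: 0.2, 5: 0.8,
         6: 0.5, 7: 0.7, 8: 0.6, 9: 0.9, 10: 0.4}


def simulate(theta, eps0, n_slots, seed=0, rate=1.0, slot_len=2.0, sense_time=0.01):
    """Run epsilon-greedy sequential sensing; return (mean reward, estimates, counts)."""
    rng = random.Random(seed)
    theta_hat = {i: 0.0 for i in theta}
    n_sensed = {i: 0 for i in theta}
    total = 0.0
    for j in range(1, n_slots + 1):
        eps = min(1.0, eps0 / j)       # ASSUMED decay schedule, not given in the text
        candidates = sorted(theta)     # candidate set is reset at the start of each slot
        k = 0
        while candidates:
            if rng.random() < eps:     # explore: uniform over the remaining candidates
                ch = rng.choice(candidates)
            else:                      # exploit: best current estimate
                ch = max(candidates, key=lambda i: theta_hat[i])
            k += 1
            idle = rng.random() < theta[ch]   # simulated sensing result
            n_sensed[ch] += 1
            theta_hat[ch] += (float(idle) - theta_hat[ch]) / n_sensed[ch]
            candidates.remove(ch)
            if idle:                   # access and transmit for the rest of the slot
                total += rate * (slot_len - k * sense_time)
                break
    return total / n_slots, theta_hat, n_sensed


mean_reward, estimates, counts = simulate(THETA, eps0=1.17, n_slots=2000)
```

The loop has no termination condition beyond the number of simulated slots, mirroring step 2.6: in a real deployment it would simply run for as long as data is being transmitted.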
Fig. 2 shows a simulated throughput comparison between the traditional sequential sensing and access scheme without learning and the method proposed in this patent. As can be seen from Fig. 2, because the proposed method introduces an efficient online learning algorithm, its system throughput quickly approaches, over time, the optimal performance attainable with perfect statistics. Compared with the traditional sequential sensing and access scheme without learning (also called the random sequential sensing and access scheme), it has a clear advantage.
Fig. 3 is a performance comparison between the scheme proposed in the present invention and the traditional periodic sensing and access scheme based on online learning. Traditional online access schemes are based on a periodic sensing and access mechanism: only one channel is selected for sensing in each slot, it is accessed if idle, and the user waits for the next slot if it is occupied. Online learning algorithms in this setting cannot effectively exploit multichannel diversity. As shown in Fig. 3, the online learning scheme proposed here is far superior to existing online learning schemes in both throughput and learning speed.
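The multichannel diversity argument can be checked numerically for the Table 2 example, even without any learning. The sketch below computes the expected per-slot reward of a given sensing order under known statistics, assuming independent channel states and the illustrative values R = 1, T = 2 s and τ_s = 10 ms. It does not reproduce Fig. 2 or Fig. 3; it only compares sensing the single best channel per slot with sensing all channels sequentially. The name `expected_reward` is hypothetical.

```python
# Idle probabilities from Table 2, keyed by channel index.
THETA = {1: 0.3, 2: 0.7, 3: 0.6, 4: 0.2, 5: 0.8,
         6: 0.5, 7: 0.7, 8: 0.6, 9: 0.9, 10: 0.4}


def expected_reward(order, theta, rate=1.0, slot_len=2.0, sense_time=0.01):
    """Expected per-slot reward when channels are sensed in the given order."""
    total, all_busy_so_far = 0.0, 1.0
    for k, ch in enumerate(order, start=1):
        # Access happens at step k only if the first k - 1 channels were all busy.
        total += all_busy_so_far * theta[ch] * rate * (slot_len - k * sense_time)
        all_busy_so_far *= 1.0 - theta[ch]
    return total


descending = sorted(THETA, key=lambda i: -THETA[i])
single_best = expected_reward(descending[:1], THETA)          # one channel per slot
sequential = expected_reward(descending, THETA)               # sense until idle
worst_first = expected_reward(list(reversed(descending)), THETA)
```

Sensing only channel 9 yields 0.9 × 1.99 = 1.791 per slot, while sequential sensing in descending order is strictly higher because every further channel adds a positive term. The reversed order is strictly worse than the descending one, which is consistent with the optimal order stated for Table 2.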
Claims (5)
1. An online sequential sensing and opportunistic access method based on the ε-greedy algorithm, characterized in that, in each time slot, the user senses channels sequentially and accesses opportunistically for transmission.
2. The ε-greedy based online sequential sensing and opportunistic access method according to claim 1, characterized in that it comprises a step of initializing the relevant parameters and a step, carried out in each time slot, of making access decisions based on online learning.
3. The ε-greedy based online sequential sensing and opportunistic access method according to claim 2, characterized in that the step of initializing the relevant parameters specifically comprises:
1.1 For each channel i, i ∈ {1, ..., N}, initialize the idle probability estimate $\hat{\theta}_i = 0$ and the count of times the channel has been sensed $n_i = 0$;
1.2 Initialize the candidate channel set $S_0 = \{1, \ldots, N\}$, where N is the total number of channels;
1.3 Initialize the control parameter of the ε-greedy algorithm, $\varepsilon = \varepsilon_0$. The value of $\varepsilon_0$ depends on the total number of channels N; according to the number of channels N in the network scenario, $\varepsilon_0$ takes a value between 0.5 and 2.5.
4. The ε-greedy based online sequential sensing and opportunistic access method according to claim 3, characterized in that the relation between the value of the algorithm control parameter $\varepsilon_0$ and the total number of channels N is as shown in Table 1;
Table 1.
5. The ε-greedy based online sequential sensing and opportunistic access method according to claim 2, characterized in that the step of making channel access decisions based on online learning in any time slot j specifically comprises:
Step 0. For each channel i, i ∈ {1, ..., N}, initialize the idle probability estimate $\hat{\theta}_i = 0$ and the sensing count $n_i = 0$;
Step 1. Reset the candidate channel set, $S = S_0$, and adjust the algorithm control parameter ε for the current slot;
Step 2. Select the channel $i^*$ with the largest idle probability estimate in the current candidate channel set S;
Step 3. Randomly select a channel ψ from the candidate channel set S for sensing: with probability $1 - \varepsilon$ select $i^*$, and with probability ε select at random among the channels in S. The sensing result indicates either that the channel is idle, that is, available, or that the channel is occupied, that is, unavailable;
Step 4. Update the idle probability estimate $\hat{\theta}_i(j)$ and the sensing count $n_i(j)$ of each channel according to the sensing result, where $n_i(j)$ denotes the number of times channel i has been sensed up to time slot j;
Step 5. Update the current channel set by removing the sensed channel, $S = S \setminus \{\psi\}$. If the currently sensed channel ψ is unavailable, return to Step 2 and continue by sensing the next channel; otherwise, access the currently idle channel ψ and transmit data.
Step 6. When the time slot ends, return to Step 1 and begin the sequential sensing and access of the next time slot.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310006343.4A CN103179675B (en) | 2013-01-08 | 2013-01-08 | Online sequential perception based on ε-greediness and chance cut-in method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310006343.4A CN103179675B (en) | 2013-01-08 | 2013-01-08 | Online sequential perception based on ε-greediness and chance cut-in method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103179675A true CN103179675A (en) | 2013-06-26 |
CN103179675B CN103179675B (en) | 2016-05-04 |
Family
ID=48639229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310006343.4A Active CN103179675B (en) | 2013-01-08 | 2013-01-08 | Online sequential perception based on ε-greediness and chance cut-in method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103179675B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101466111A (en) * | 2009-01-13 | 2009-06-24 | 中国人民解放军理工大学通信工程学院 | Dynamic spectrum access method based on policy planning constrain Q study |
CN102413078A (en) * | 2011-11-09 | 2012-04-11 | 南京邮电大学 | Method for searching idle channel in cognitive sensor network |
Non-Patent Citations (2)
Title |
---|
HAI JIANG, ET.AL: ""Optimal Selection of Channel Sensing Order in Cognitive Radio"", 《IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS》 * |
王金刚等: ""认知无线电多信道下的感知时间优化"", 《通信技术》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107046468A (en) * | 2017-06-14 | 2017-08-15 | 电子科技大学 | A kind of physical layer certification thresholding determines method and system |
CN107046468B (en) * | 2017-06-14 | 2020-10-02 | 电子科技大学 | Physical layer authentication threshold determination method and system |
CN110351886A (en) * | 2019-06-29 | 2019-10-18 | 中国人民解放军军事科学院国防科技创新研究院 | Opportunistic spectrum access method based on sideband observation information multi-arm Slot Machine model |
CN110351884A (en) * | 2019-06-29 | 2019-10-18 | 中国人民解放军军事科学院国防科技创新研究院 | A kind of spectrum opportunities cut-in method based on the double-deck multi-arm Slot Machine statistical model |
CN110351884B (en) * | 2019-06-29 | 2020-08-28 | 中国人民解放军军事科学院国防科技创新研究院 | Spectrum opportunity access method based on double-layer multi-arm tiger machine statistical model |
CN110856181A (en) * | 2019-11-20 | 2020-02-28 | 长江师范学院 | Distributed service matching sequential spectrum access decision method |
CN110856181B (en) * | 2019-11-20 | 2021-07-20 | 长江师范学院 | Distributed service matching sequential spectrum access decision method |
CN113472843A (en) * | 2021-05-24 | 2021-10-01 | 国网山东省电力公司电力科学研究院 | Greedy algorithm based MQTT protocol QoS mechanism selection method |
Also Published As
Publication number | Publication date |
---|---|
CN103179675B (en) | 2016-05-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |