CN103179675A

CN103179675A - Epsilon-greedy based online sequential sensing and opportunistic access method

Info

Publication number: CN103179675A
Application number: CN2013100063434A
Authority: CN
Inventors: 王金龙; 吴启晖; 李柏文; 郑学强
Original assignee: COMMUNICATION ENGINEERING COLLEGE SCIENCE & ENGINEEIRNG UNIV PLA
Current assignee: COMMUNICATION ENGINEERING COLLEGE SCIENCE & ENGINEEIRNG UNIV PLA
Priority date: 2013-01-08
Filing date: 2013-01-08
Publication date: 2013-06-26
Anticipated expiration: 2033-01-08
Also published as: CN103179675B

Abstract

Disclosed is an ε-greedy based online sequential sensing and opportunistic access method. In each time slot, the user senses channels sequentially and opportunistically accesses an idle channel for transmission. The method comprises a step of initializing the relevant parameters and a step, executed in every time slot, of making access decisions based on online learning. It actively learns the environment and adapts to dynamic changes in it. Moreover, the method makes decisions online: after each decision, the system adjusts its next decision in real time according to the feedback it receives, so that the long-term cumulative throughput of the system is maximized.

Description

Online sequential sensing and opportunistic access method based on ε-greedy learning

Technical field

The present invention relates to cognitive radio within wireless communication technology, and specifically to an online learning method for the optimal sequential sensing order in opportunistic spectrum access systems whose channel statistics are unknown.

Background

Driven by proposals from spectrum regulators and by advances in cognitive radio technology, dynamic spectrum access (DSA) has been widely recognized as an effective means of improving spectrum utilization. To protect primary-user communications from interference, a cognitive user must sense a channel before accessing it to make sure the channel is idle. Because of hardware limitations, a cognitive terminal can usually sense only a small part of the whole band at a time. The order in which channels are sensed therefore directly affects system throughput and access delay. A key difficulty in achieving optimal channel sensing and access is that the statistical distribution of the channels is hard to estimate accurately, especially in practical heterogeneous networks, where different channels differ in both availability probability and link quality.

Owing to its inherent adaptivity and effectiveness, online learning is widely used in dynamic wireless networks. By restricting the cognitive user to sensing one channel per time slot, existing online-access studies model the problem with the classical multi-armed bandit (MAB) framework: in each slot the user selects one channel to access based only on the statistics of the channel rewards, so as to maximize the cumulative system throughput. Although this simple "one channel per slot" model is reasonable for synchronous periodic-sensing systems, it is ill-suited to more distributed cognitive networks, especially point-to-point communication scenarios. First, the channel sensing time is usually far shorter than a transmission slot (for example, sensing a TV channel typically takes about 10 ms, whereas the transmission slot under primary-user protection constraints is 2 s). When sensing reveals that the current channel is occupied, switching directly to the next channel and sensing it is therefore more reasonable and effective than waiting on the original channel for the next slot. Second, because channel states are random, switching channels and sensing again usually yields more transmission opportunities, i.e., a multichannel diversity gain. And since the number of available channels is large (for example, more than half of users have more than 20 TV channels available), this diversity gain is considerable.

On this basis, the present invention proposes an online-learning-based sequential channel sensing and access strategy for heterogeneous channel networks with unknown statistics. Unlike previous methods, which limit each slot to sensing and accessing a single channel, this scheme lets the user sense channels sequentially within each slot and opportunistically access one for transmission. By adjusting the sensing order and access strategy dynamically in real time, it maximizes the aggregate system throughput over a given period.

Summary of the invention

The present invention proposes an ε-greedy based online sequential sensing and opportunistic access method for dynamic spectrum environments. It solves the problems of learning the sequential sensing order and optimizing aggregate throughput when the channel statistics are unknown.

The present invention is realized by the following technical solution:

An online sequential sensing and opportunistic access method based on the ε-greedy algorithm: in each time slot, the user senses channels sequentially and opportunistically accesses a channel for transmission.

The method comprises a step of initializing the relevant parameters and a step, executed in each time slot, of making access decisions based on online learning.

The parameter-initialization step specifically comprises:

1.1 For each channel i ∈ {1, …, N}, initialize the idle probability estimate θ̂_i = 0 and the sensing count n_i = 0;

1.2 Initialize the candidate channel set S₀ = {1, …, N}, where N is the total number of channels;

1.3 Initialize the control parameter of the ε-greedy algorithm, ε = ε₀. The value of ε₀ depends on the total number of channels N; according to the number of channels in the network scenario, ε₀ typically takes a value between 0.5 and 2.5.

The relationship between the algorithm control parameter ε₀ and the total number of channels N is given in Table 1:

Total number of channels N	≤2	3	4	5	6	7	8	9	10
Parameter ε₀	0	0.08	0.16	0.31	0.44	0.65	0.78	0.98	1.17

Total number of channels N	11	12	13	14	15	16	17	18	≥19
Parameter ε₀	1.35	1.55	1.72	1.90	2.15	2.22	2.31	2.39	2.41

Table 1.
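Table 1 can be encoded as a simple lookup. The sketch below is illustrative only; the function name `eps0_for_channels` is hypothetical, and it maps N ≤ 2 to 0 and N ≥ 19 to 2.41 exactly as the table's end columns state.

```python
# Table 1: recommended epsilon_0 for each total channel count N (3..18).
EPS0_TABLE = {
    3: 0.08, 4: 0.16, 5: 0.31, 6: 0.44, 7: 0.65, 8: 0.78, 9: 0.98,
    10: 1.17, 11: 1.35, 12: 1.55, 13: 1.72, 14: 1.90, 15: 2.15,
    16: 2.22, 17: 2.31, 18: 2.39,
}


def eps0_for_channels(n: int) -> float:
    """Return the Table 1 value of epsilon_0 for n channels (hypothetical helper)."""
    if n < 1:
        raise ValueError("need at least one channel")
    if n <= 2:
        return 0.0    # column "<=2"
    if n >= 19:
        return 2.41   # column ">=19"
    return EPS0_TABLE[n]


print(eps0_for_channels(10))  # 1.17
```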

In any time slot j, the online-learning-based channel access decision step specifically comprises:

Step 0. For each channel i ∈ {1, …, N}, initialize the idle probability estimate θ̂_i = 0 and the sensing count n_i = 0 (as in step 1.1);

Step 1. Reset the candidate channel set S and adjust the algorithm control parameter ε as follows:

S = S_0

\epsilon = \min\left\{1,\ \epsilon_0 \frac{\ln(j+1)}{j+1}\right\}
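The Step 1 schedule is easy to evaluate directly. This minimal sketch (the name `epsilon_schedule` is hypothetical) computes ε for a 1-based slot index j. Since ln(x)/x decreases for x > e, ε is non-increasing from j = 2 onward, and it is capped at 1 when ε₀ is large.

```python
import math


def epsilon_schedule(j: int, eps0: float) -> float:
    """Exploration probability for slot j (1-based): min{1, eps0*ln(j+1)/(j+1)}."""
    if j < 1:
        raise ValueError("slots are numbered from 1")
    return min(1.0, eps0 * math.log(j + 1) / (j + 1))


# Example with eps0 = 1.17 (Table 1, N = 10): exploration fades as j grows.
for j in (1, 10, 100, 1000):
    print(j, round(epsilon_schedule(j, 1.17), 4))
```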

Step 2. Select the channel i^* with the largest idle probability estimate in the current candidate set S:

i^{*} = \arg\max_{i \in S} \hat{\theta}_i

Step 3. Select the channel to sense from the candidate set S at random with the following probabilities:

\psi_k^j = \begin{cases} i^{*}, & \text{with probability } 1 - \epsilon \\ \text{a channel chosen at random from } S, & \text{with probability } \epsilon \end{cases}

where ψ_k^j denotes the channel sensed in the k-th sensing step of slot j. Record the sensing result a_{ψ_k^j}^j, the availability of the sensed channel in slot j: a_{ψ_k^j}^j = 1 means the channel is idle (available), and a_{ψ_k^j}^j = 0 means it is occupied (unavailable);
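Steps 2 and 3 together form one ε-greedy draw. The sketch below is a hedged illustration, not the patent's exact procedure: the name `choose_channel` is hypothetical, the exploration draw is assumed to be uniform over the current candidate set S, and ties in the estimates are assumed to be broken uniformly at random (this reproduces the random first choice described for slot j = 1, when all estimates are zero).

```python
import random


def choose_channel(theta_hat, candidates, eps, rng):
    """Pick the next channel to sense from the candidate set S.

    With probability eps: explore, i.e. a uniformly random candidate.
    Otherwise: exploit, i.e. a candidate with the largest estimate,
    ties broken uniformly at random.
    """
    cands = sorted(candidates)
    if not cands:
        raise ValueError("candidate set is empty")
    if rng.random() < eps:
        return rng.choice(cands)                                  # explore
    best = max(theta_hat[i] for i in cands)
    return rng.choice([i for i in cands if theta_hat[i] == best])  # exploit


rng = random.Random(0)
theta_hat = {1: 0.2, 2: 0.7, 3: 0.5}
print(choose_channel(theta_hat, {1, 2, 3}, eps=0.0, rng=rng))  # 2
```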

Step 4. Update the idle probability estimate θ̂_i(j) and the sensing count n_i(j) of each channel as follows:

n_i(j) = \begin{cases} n_i(j-1) + 1, & i = \psi_k^j \\ n_i(j-1), & i \neq \psi_k^j \end{cases}

\hat{\theta}_i(j) = \begin{cases} \dfrac{\hat{\theta}_i(j-1)\, n_i(j-1) + a_i^j}{n_i(j-1) + 1}, & i = \psi_k^j \\ \hat{\theta}_i(j-1), & i \neq \psi_k^j \end{cases}

where n_i(j) is the number of times channel i has been sensed as of slot j, and θ̂_i(j) is the idle probability estimate of channel i in slot j;
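The Step 4 update is an incremental sample mean, so only θ̂_i and n_i need to be stored per channel. A minimal sketch (the name `update_estimate` is hypothetical); it assumes each channel is sensed at most once per slot, which holds when the sensed channel is removed from S as in Step 5.

```python
def update_estimate(theta_hat, counts, channel, available):
    """Step 4: incremental sample-mean update for the sensed channel only."""
    a = 1.0 if available else 0.0          # a_i^j: 1 = idle, 0 = occupied
    n = counts[channel]
    theta_hat[channel] = (theta_hat[channel] * n + a) / (n + 1)
    counts[channel] = n + 1                # channels not sensed keep their values


theta_hat = {1: 0.0, 2: 0.0}
counts = {1: 0, 2: 0}
for obs in (True, False, True, True):      # four observations of channel 1
    update_estimate(theta_hat, counts, 1, obs)
print(round(theta_hat[1], 4), counts[1])   # 0.75 4
```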

Step 5. Update the current candidate set by removing the sensed channel, S = S \setminus \{\psi_k^j\}. If the currently sensed channel ψ_k^j is unavailable, i.e., a_{ψ_k^j}^j = 0, return to Step 2 and sense the next channel; otherwise, access the idle channel ψ_k^j and transmit data.

Step 6. The slot ends; return to Step 1 and begin sequential sensing and access in the next slot.
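Steps 1 to 5 can be combined into a single-slot routine. The following is a simulation sketch under stated assumptions, not the patent's reference implementation: the name `run_slot` is hypothetical, the sensing outcome is simulated as an independent Bernoulli draw with the true idle probability `theta_true[ch]`, exploration is uniform over S with random tie-breaking, and if every channel is found occupied the routine simply returns `None` (the user is assumed to wait for the next slot, a case the text does not specify).

```python
import math
import random


def run_slot(j, theta_true, theta_hat, counts, eps0, rng):
    """Simulate one slot of sequential sensing and opportunistic access.

    Returns the 1-based position k of the accessed channel, or None if all
    channels were sensed busy (assumed: the user then waits for the next slot).
    """
    eps = min(1.0, eps0 * math.log(j + 1) / (j + 1))    # Step 1
    candidates = set(theta_true)                        # S = S0
    k = 0
    while candidates:
        k += 1
        if rng.random() < eps:                          # Step 3: explore
            ch = rng.choice(sorted(candidates))
        else:                                           # Steps 2-3: exploit
            best = max(theta_hat[i] for i in candidates)
            ch = rng.choice(sorted(i for i in candidates if theta_hat[i] == best))
        idle = rng.random() < theta_true[ch]            # simulated sensing result
        n = counts[ch]                                  # Step 4
        theta_hat[ch] = (theta_hat[ch] * n + (1.0 if idle else 0.0)) / (n + 1)
        counts[ch] = n + 1
        candidates.discard(ch)                          # Step 5: S = S \ {psi}
        if idle:
            return k                                    # access and transmit
    return None
```

For example, with three hypothetical channels `{1: 0.3, 2: 0.8, 3: 0.5}` and ε₀ = 0.08 (Table 1, N = 3), calling `run_slot(j, ...)` for j = 1, 2, … (Step 6) gradually steers the first sensing toward channel 2.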

Compared with the prior art, the present invention has the following advantages:

1. It actively learns the environment and adapts to dynamic environmental changes. The proposed method makes decisions online: after each decision, the system adjusts its next decision in real time according to the feedback, thereby maximizing the long-term cumulative throughput of the system.

2. It obtains a multichannel diversity gain. In previous online learning schemes, the user could select only one channel per slot for sensing and access; when the selected channel was found occupied, the user had to wait for the next slot. In the proposed scheme, the user senses channels sequentially and accesses opportunistically on that basis, which significantly improves both learning speed and system throughput.

3. The learning method has low computational complexity. Each time the user selects a channel within a slot, the choice depends only on each channel's statistical reward (a single index) over a choice space of size N, so the computational complexity is at most O(N).

4. The learning method has low storage complexity. In the proposed scheme, the user stores only two variables per channel: the average idle probability and the sensing count. The statistics update incurs no additional storage overhead, so the storage complexity is very low.

Description of the drawings

Fig. 1 is a schematic diagram of sequential sensing and access in the present invention.

Fig. 2 compares the performance of the proposed scheme with a traditional sequential sensing and access scheme without learning.

Fig. 3 compares the performance of the proposed scheme with a traditional periodic-sensing access scheme based on online learning.

Embodiment

The online-learning-based sequential channel sensing and opportunistic access method provided by the invention, illustrated in Fig. 1, is implemented as follows:

Consider a cognitive radio system with N channels, the channel set being {1, 2, …, N}. As shown in Fig. 1, access and transmission based on sequential sensing proceed as follows: in each slot, the user senses channels one after another in some order until it finds an idle channel, then accesses that channel and transmits data at rate R for the remainder of the slot. When the user accesses a channel after k sensing steps (i.e., the k-th channel sensed in the current slot is the first idle one), the instantaneous transmission reward it obtains is R(T − kτ_s), where R is the transmission rate, T is the duration of one transmission opportunity, and τ_s is the time overhead of sensing one channel.
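The reward model R(T − kτ_s) gives a closed form for the expected reward of any fixed sensing order. This sketch assumes (our assumption, not stated in the text) that channel states are independent within a slot, channel i being idle with probability θ_i; the k-th channel is then the first idle one with probability θ_{o_k} ∏_{m<k}(1 − θ_{o_m}). The function and parameter names are hypothetical.

```python
def expected_slot_reward(order, theta, rate, slot_len, tau_s):
    """Expected reward of sensing channels in `order` within one slot.

    Accessing the k-th sensed channel yields rate * (slot_len - k * tau_s);
    channels are assumed independent, channel i idle with prob. theta[i].
    """
    expected = 0.0
    p_all_busy = 1.0                 # prob. that every channel sensed so far was busy
    for k, ch in enumerate(order, start=1):
        expected += p_all_busy * theta[ch] * rate * (slot_len - k * tau_s)
        p_all_busy *= 1.0 - theta[ch]
    return expected


theta = {1: 0.5, 2: 0.9}
print(round(expected_slot_reward([2, 1], theta, 1.0, 1.0, 0.1), 4))  # 0.85
print(round(expected_slot_reward([1, 2], theta, 1.0, 1.0, 0.1), 4))  # 0.81
```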

The problem solved by the invention is the following: without prior knowledge of the channel statistics, provide a learning strategy that selects the sensing order sequentially (i.e., slot by slot) so as to maximize the aggregate system throughput. To this end, an online learning method based on ε-greedy is proposed. The basic idea of the ε-greedy dynamic ordering method is as follows: during learning, in each slot the user, based on its current channel statistical estimates, selects the channel with the highest current estimated mean with probability 1 − ε and selects a channel at random among all channels with probability ε, while updating the channel statistical estimates. In the algorithm, ε is initialized to ε₀ and varies with time j; in slot j,

\epsilon = \min\left\{1,\ \epsilon_0 \frac{\ln(j+1)}{j+1}\right\}

Since ln(j+1)/(j+1) decreases for j ≥ 2, ε decreases slowly as learning proceeds, so the user's strategy gradually converges as the channel statistics become accurate. The concrete implementation steps are as follows:

(1) Parameter initialization, which completes the following:

1.1 For each channel i ∈ {1, …, N}, initialize the idle probability estimate θ̂_i = 0 and the sensing count n_i = 0;

1.2 Initialize the candidate channel set S₀ = {1, …, N};

1.3 Initialize the algorithm control parameter ε₀. ε₀ is usually a constant between 0.5 and 2.5; the best value of ε₀ is chosen according to Table 1.

(2) In each slot j, perform online learning and decision-making as follows:

2.1 Reset the candidate channel set, S = S₀, and adjust the learning parameter:

\epsilon = \min\left\{1,\ \epsilon_0 \frac{\ln(j+1)}{j+1}\right\}

2.2 Determine the channel preferred by the statistical estimates within the current candidate set:

i^{*} = \arg\max_{i \in S} \hat{\theta}_i

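2.3 Select at random the k-th channel to sense in slot j with the following probabilities:

\psi_k^j = \begin{cases} i^{*}, & \text{with probability } 1 - \epsilon \\ \text{a channel chosen at random from } S, & \text{with probability } \epsilon \end{cases}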

2.4 According to the sensing result, update the channel statistical estimates and related parameters as follows:

n_i(j) = \begin{cases} n_i(j-1) + 1, & i = \psi_k^j \\ n_i(j-1), & i \neq \psi_k^j \end{cases}

\hat{\theta}_i(j) = \begin{cases} \dfrac{\hat{\theta}_i(j-1)\, n_i(j-1) + a_i^j}{n_i(j-1) + 1}, & i = \psi_k^j \\ \hat{\theta}_i(j-1), & i \neq \psi_k^j \end{cases}

S = S \setminus \{\psi_k^j\}

where a_i^j is the availability of channel i in slot j: a_i^j = 1 means the channel is idle (available), and a_i^j = 0 means it is occupied (unavailable).

2.5 Finally, if the channel is unavailable (i.e., a_{ψ_k^j}^j = 0), return to step 2.2, switch channels and sense the next channel; otherwise, access the current idle channel and transmit data.

Embodiment of the present invention:

An example of the present invention follows; the parameter settings do not limit generality. As shown in Fig. 1, consider a cognitive radio network with 10 candidate channels, i.e., N = 10. The idle probability of channel i is θ_i ∈ [0, 1]; in this embodiment, the idle probabilities are as shown in Table 2:

Channel i	1	2	3	4	5	6	7	8	9	10
Idle probability θ_i	0.3	0.7	0.6	0.2	0.8	0.5	0.7	0.6	0.9	0.4

Table 2.

Clearly, the optimal channel sensing order is [9, 5, 7, 2, 3, 8, 6, 10, 1, 4] (since some channels have equal idle probabilities, the order obtained by swapping such channels is also optimal).
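Why decreasing idle probability is optimal can be checked with a short exchange argument (our own, assuming independent channels): swapping two adjacent channels a and b, sensed at positions k and k + 1, changes the expected reward by P·R·(θ_a − θ_b)·τ_s, where P is the probability that all earlier channels were busy. So any order with a lower-θ channel ahead of a higher-θ one can be improved, and the gain comes entirely from the sensing cost τ_s. The sketch below verifies this for Table 2, with R = T = 1 and τ_s = 0.01 as assumed illustrative units, and brute-forces all orders of a 6-channel subset.

```python
from itertools import permutations

# Table 2 idle probabilities.
theta = {1: 0.3, 2: 0.7, 3: 0.6, 4: 0.2, 5: 0.8,
         6: 0.5, 7: 0.7, 8: 0.6, 9: 0.9, 10: 0.4}


def expected_reward(order, theta, tau=0.01):
    """Expected R*(T - k*tau_s) with R = T = 1 and tau_s = tau (assumed units)."""
    total, busy = 0.0, 1.0
    for k, ch in enumerate(order, start=1):
        total += busy * theta[ch] * (1.0 - k * tau)
        busy *= 1.0 - theta[ch]
    return total


# Sort by decreasing idle probability (stable: ties keep channel-index order).
greedy = sorted(theta, key=lambda i: -theta[i])
print(greedy)  # [9, 5, 2, 7, 3, 8, 6, 10, 1, 4]

# Brute force over a subset with distinct probabilities (6! = 720 orders).
subset = [1, 4, 5, 6, 9, 10]
best = max(permutations(subset), key=lambda o: expected_reward(o, theta))
print(list(best))  # [9, 5, 6, 10, 1, 4]
```

The greedy order differs from the one quoted above only by swapping the tied channels 2 and 7, and both give the same expected reward.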

If the user had perfect statistical information in advance, it could sense and access sequentially in the optimal order. In most practical scenarios, however, the user does not know the channel statistics and can only learn their distribution gradually through sensing. The present invention addresses exactly this case, making sequential sensing and access decisions based on online learning. The specific steps are as follows:

Parameter initialization:

For each channel i ∈ {1, …, 10}, the idle probability estimate at the initial time is set to θ̂_i = 0, and the sensing count is n_i = 0; the candidate channel set is S₀ = {1, …, 10}.

According to the optimal control-parameter versus channel-number relation in Table 1, initialize ε₀ = 1.2 (since N = 10; Table 1 gives 1.17).

The detailed procedure of the proposed ε-greedy based online sequential sensing and opportunistic access method is then as follows:

In the first slot, i.e., j = 1:

1.1 The idle probability estimates of all channels are equal (all zero), so a channel is selected at random for sensing;

1.2 If the channel is sensed idle, access it and transmit; otherwise, select another channel at random from the remaining channels and sense it;

1.3 Repeat step 1.2 until a channel is accessed for transmission, updating n_i and θ̂_i along the way as follows:

n_i = \begin{cases} n_i + 1, & i = \psi_k^j \\ n_i, & i \neq \psi_k^j \end{cases}

\hat{\theta}_i = \begin{cases} \dfrac{\hat{\theta}_i\, n_i + a_i^j}{n_i + 1}, & i = \psi_k^j \\ \hat{\theta}_i, & i \neq \psi_k^j \end{cases}
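Because every estimate starts at zero, the first slot amounts to sensing in a uniformly random order, which is also how the no-learning "random sequential sensing and access" baseline of Fig. 2 behaves in every slot. A minimal sketch of one such slot; the name `random_order_slot`, the Bernoulli channel model, and the units R = T = 1 are assumptions for illustration.

```python
import random


def random_order_slot(theta, tau, rng):
    """One slot of sequential sensing in a uniformly random order.

    Returns (reward, sensed_channels); reward = 1 - k*tau with R = T = 1
    (assumed units), or 0.0 if every channel is found busy.
    """
    order = list(theta)
    rng.shuffle(order)
    for k, ch in enumerate(order, start=1):
        if rng.random() < theta[ch]:      # simulated sensing: channel idle
            return 1.0 - k * tau, order[:k]
    return 0.0, order
```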

In the remaining slots, i.e., j ≥ 2:

2.1 Update the control parameter in every slot:

\epsilon = \min\left\{1,\ \epsilon_0 \frac{\ln j}{j}\right\}

2.2 Since the channel statistical estimates change, determine, from the current estimates, the preferred channel in the current candidate set:

i^{*} = \arg\max_{i \in S} \hat{\theta}_i

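2.3 Select a channel at random for sensing with the following probabilities:

\psi_k^j = \begin{cases} i^{*}, & \text{with probability } 1 - \epsilon \\ \text{a channel chosen at random from } S, & \text{with probability } \epsilon \end{cases}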

2.4 If the channel is sensed idle, access it; otherwise, repeat steps 2.2 and 2.3 until a channel is accessed for transmission;

2.5 During sequential sensing, update the channel statistical estimates and related parameters according to the sensing results as follows:

n_i(j) = \begin{cases} n_i(j-1) + 1, & i = \psi_k^j \\ n_i(j-1), & i \neq \psi_k^j \end{cases}

\hat{\theta}_i(j) = \begin{cases} \dfrac{\hat{\theta}_i(j-1)\, n_i(j-1) + a_i^j}{n_i(j-1) + 1}, & i = \psi_k^j \\ \hat{\theta}_i(j-1), & i \neq \psi_k^j \end{cases}

S = S \setminus \{\psi_k^j\}

2.6 Because the proposed algorithm is an online decision algorithm, no special termination condition is needed; it runs online until the data transmission process ends.
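The embodiment can be reproduced end to end in simulation. This is a hedged sketch rather than the code behind Figs. 2 and 3: the name `simulate`, the Bernoulli channel model, the units R = T = 1 with τ_s = 0.01, and random tie-breaking are our assumptions. Note that with the embodiment's schedule ε = min{1, ε₀ ln j / j}, slot j = 1 gives ε = 0, and since all estimates are then zero, random tie-breaking yields the random sensing order described for the first slot. No particular numerical outcome is claimed here.

```python
import math
import random

THETA = {1: 0.3, 2: 0.7, 3: 0.6, 4: 0.2, 5: 0.8,
         6: 0.5, 7: 0.7, 8: 0.6, 9: 0.9, 10: 0.4}   # Table 2


def simulate(num_slots, eps0=1.2, tau=0.01, seed=0):
    """Run the embodiment's learner; return (avg reward per slot, estimates, counts).

    Per-slot reward is 1 - k*tau (R = T = 1, tau_s = tau; assumed units),
    or 0 if all channels are busy.
    """
    rng = random.Random(seed)
    theta_hat = {i: 0.0 for i in THETA}
    counts = {i: 0 for i in THETA}
    total = 0.0
    for j in range(1, num_slots + 1):
        eps = min(1.0, eps0 * math.log(j) / j)      # step 2.1 of the embodiment
        cands = set(THETA)                          # S = S0
        k = 0
        while cands:
            k += 1
            if rng.random() < eps:                  # explore
                ch = rng.choice(sorted(cands))
            else:                                   # exploit, random tie-break
                best = max(theta_hat[i] for i in cands)
                ch = rng.choice(sorted(i for i in cands if theta_hat[i] == best))
            idle = rng.random() < THETA[ch]         # simulated sensing
            n = counts[ch]                          # step 2.5 update
            theta_hat[ch] = (theta_hat[ch] * n + (1.0 if idle else 0.0)) / (n + 1)
            counts[ch] = n + 1
            cands.discard(ch)
            if idle:
                total += 1.0 - k * tau              # access and transmit
                break
    return total / num_slots, theta_hat, counts


avg, est, cnt = simulate(5000)
print(round(avg, 4), {i: round(v, 2) for i, v in est.items()})
```

Comparing `avg` against the expected reward of the optimal order [9, 5, 7, 2, 3, 8, 6, 10, 1, 4] under the same units is one way to observe the convergence the text describes.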

Fig. 2 shows simulated throughput for the traditional sequential sensing and access scheme without learning and for the method proposed in this patent. As Fig. 2 shows, because the proposed method incorporates an efficient online learning algorithm, its system throughput quickly approaches the optimal performance achievable with perfect statistics as time increases. It therefore has a clear advantage over the traditional sequential sensing and access scheme without learning (also called the random sequential sensing and access scheme).

Fig. 3 compares the performance of the proposed scheme with a traditional periodic-sensing access scheme based on online learning. The traditional online access scheme, based on a periodic sensing and access mechanism, selects only one channel to sense in each slot: if the channel is idle, the user accesses it; if it is occupied, the user waits for the next slot. Online learning algorithms in this setting cannot effectively exploit multichannel diversity. As Fig. 3 shows, the proposed online learning scheme is far superior to existing online learning schemes in both throughput and learning speed.

Claims

1. An online sequential sensing and opportunistic access method based on the ε-greedy algorithm, characterized in that in each time slot the user senses channels sequentially and opportunistically accesses a channel for transmission.

2. The online sequential sensing and opportunistic access method based on ε-greedy according to claim 1, characterized in that it comprises a step of initializing the relevant parameters and a step, executed in each time slot, of making access decisions based on online learning.

3. The online sequential sensing and opportunistic access method based on the ε-greedy algorithm according to claim 2, characterized in that the step of initializing the relevant parameters specifically comprises:

1.1 For each channel i ∈ {1, …, N}, initialize the idle probability estimate θ̂_i = 0 and the sensing count n_i = 0;

4. The online sequential sensing and opportunistic access method based on the ε-greedy algorithm according to claim 3, characterized in that the relationship between the algorithm control parameter ε₀ and the total number of channels N is as shown in Table 1;

Table 1.

5. The online sequential sensing and opportunistic access method based on ε-greedy according to claim 2, characterized in that the online-learning-based channel access decision step performed in any time slot j specifically comprises:

Step 0. For each channel i ∈ {1, …, N}, initialize the idle probability estimate