CN108880709B

CN108880709B - Distributed multi-user dynamic spectrum access method in a cognitive radio network

Info

Publication number: CN108880709B
Application number: CN201810737835.3A
Authority: CN
Inventors: 李立欣; 杨佩彤; 张会生; 高昂; 梁微; 李旭
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2018-07-06
Filing date: 2018-07-06
Publication date: 2019-05-07
Anticipated expiration: 2038-07-06
Also published as: CN108880709A

Abstract

The invention discloses a distributed multi-user dynamic spectrum access method in a cognitive radio network. The method is as follows. Step 1: construct a system model in which n authorized users and k cognitive users in a cell share b channels, where n, k and b are nonzero natural numbers, n equals b, and k > n. Step 2: perform spectrum selection and access for the cognitive users using the DQN algorithm, specifically: set an initial Q function; treat each cognitive user as an agent; each agent selects an action according to the DQN algorithm, that is, it selects one of the b channels for transmission; compute the average utility value of the system and set the reward value using evolutionary game theory; then train a neural network, used as a function approximator, to obtain the updated Q function. Step 3: repeat Step 2 until the resulting system capacity converges to a constant value. Under high spectrum demand, the method achieves high spectrum utilization.

Description

Distributed multi-user dynamic spectrum access method in a cognitive radio network

Technical field

The invention belongs to the field of wireless communication technology, and in particular relates to a distributed multi-user dynamic spectrum access method in a cognitive radio network.

Background technique

With the rapid development of wireless communication technology, the spread of wireless devices and the continuous growth of new services, spectrum resources have become increasingly scarce. Fixed spectrum allocations can no longer meet users' communication requirements, and the spectrum shortage caused by low spectrum utilization keeps getting worse, so that wireless communication systems are constrained by spectrum resources while driving economic and social development. Cognitive radio has become the key technology for addressing low spectrum utilization. Its main idea is first to detect which spectrum is idle and then to select and access that idle spectrum intelligently, which greatly improves spectrum utilization.

To improve users' quality of experience and relieve spectrum pressure, much work on dynamic spectrum management in cognitive radio networks has recently been completed, and this work has greatly improved spectrum utilization. It nevertheless has limitations. Some approaches can reach a Nash equilibrium and an evolutionarily stable equilibrium, but they place high demands on the environment, and because the users do not go through a learning process, convergence is slow. Others study military networks and rest on the premise that each secondary radio node in the military network is allocated spectrum resources according to its priority, which limits their applicability. Still others do not consider balance and coordination among the users, so the system is relatively unstable.

Summary of the invention

The technical problem to be solved by the present invention is to provide, in view of the above shortcomings of the prior art, a distributed multi-user dynamic spectrum access method in a cognitive radio network that achieves high spectrum utilization under high spectrum demand.

To solve the above technical problem, the technical solution adopted by the present invention is a distributed multi-user dynamic spectrum access method in a cognitive radio network, as follows:

Step 1: construct a system model in which n authorized users and k cognitive users in a cell share b channels, where n, k and b are nonzero natural numbers, n equals b, and k > n;

Step 2: perform spectrum selection and access for the cognitive users using the DQN algorithm, specifically:

Set an initial Q function and treat each cognitive user as an agent. Each agent selects an action according to the DQN algorithm, that is, it selects one of the b channels for transmission. The average utility value of the system is computed, and the reward value is set using evolutionary game theory. A neural network, used as a function approximator, is then trained to obtain the updated Q function;

Step 3: repeat Step 2 until the resulting system capacity converges to a constant value.

Further, the average utility value of the system is $\bar{u}$, calculated as follows:

First, the transmission utility value $u_i$, 1 ≤ i ≤ k, of each cognitive user is determined, using the signal-to-noise ratio as the utility value:

$$u_i = SNR_i = \begin{cases} \dfrac{(1 - y_p)\,S_i}{N_p}, & \text{if the } i\text{-th cognitive user monopolizes channel } p \\[2ex] \dfrac{(1 - y_p)\,S_i}{N_p + S_{i-}}, & \text{otherwise} \end{cases} \qquad (1)$$

Then:

$$\bar{u} = \frac{1}{k}\sum_{i=1}^{k} u_i$$

Where:

$SNR_i$ is the signal-to-noise ratio obtained by the i-th cognitive user;

$y_p$ is the state of the authorized user on channel p: 1 indicates that the channel is occupied by its authorized user, and 0 indicates that it is not;

$S_i$ is the signal power sent by the i-th cognitive user;

$N_p$ is the noise power of channel p;

$S_{i-}$ is the sum of the signal powers sent by the other users that select the same channel as the i-th cognitive user;

Further, the reward value is determined using evolutionary game theory as follows:

The reward function is set as:

$$r = \begin{cases} +1, & \dot{x}_i \ge 0 \\ -1, & \dot{x}_i < 0 \end{cases}$$

Where:

$r$ is the reward value;

$\dot{x}_i$ is the rate of change of the proportion of all cognitive users that select the same channel as the i-th user;

If the rate of change is greater than or equal to 0, the reward value is +1; if the rate of change is less than 0, the reward value is -1.

Further, the neural network is trained as follows:

Set the error function:

$$L_i(\theta_i) = E\left[\left(r + \beta \max_{a'} Q(s', a'; \theta_i^{-}) - Q(s, a; \theta_i)\right)^2\right]$$

Train the neural network and update the network parameters θ so as to approximate the Q function value;

Where:

θ denotes the network parameters;

$\theta_i$ is the parameter value of one of the networks;

$\theta_i^{-}$ is the parameter value of the other network;

$E$ denotes taking the average;

$\max_{a'}$ denotes taking the maximum of the network output over a';

$r$ is the reward value and β is the discount factor;

s denotes the state;

a denotes the action, that is, which channel is selected;

s' is the state at the next time step;

a' denotes which channel is selected at the next time step.

The distributed multi-user dynamic spectrum access method in a cognitive radio network of the present invention has the following advantages: 1. By combining deep reinforcement learning with evolutionary game theory, it provides a new method for distributed multi-user dynamic spectrum access in cognitive radio networks.

2. Dynamic spectrum access uses the DQN algorithm as its main framework. Each user runs the DQN algorithm as an independent agent to select channels and learn, so that system capacity keeps increasing while the collision rate between users falls.

3. Evolutionary game theory is introduced, and the reward function of the reinforcement learning algorithm is set using the replicator dynamics model, so as to balance the independent learning of the distributed users.

Brief description of the drawings

Fig. 1: structure of the cognitive radio network;

Fig. 2: structure of the system spectrum environment;

Fig. 3: schematic diagram of reinforcement learning;

Fig. 4: flow chart of the DQN algorithm;

Fig. 5: simulated comparison of system capacity with and without the DQN-RD method;

Fig. 6: simulated comparison of user collision rate with and without the DQN-RD method;

Fig. 7: simulated system capacity using the DQN-RD method when only the number of channels changes.

Specific embodiment

In the distributed multi-user dynamic spectrum access method in a cognitive radio network of the present invention, as shown in Fig. 1, Step 1 constructs a system model in which n authorized users and k cognitive users in a cell share b channels, where n, k and b are nonzero natural numbers, n equals b, and k > n. The n authorized users are each authorized to use one of these channels, and n and k are assumed to be constant. The spectrum environment of the system is shown in Fig. 2: the n authorized users are authorized to use the b channels, while the k unauthorized (cognitive) users can only exploit the spectrum opportunities that arise in the idle periods when the authorized users are not transmitting. Time is divided into equal-length slots, authorized and unauthorized users keep slot synchronization, and data packets are sized so that each can be transmitted within one slot. All unauthorized users always have packets to send. Each unauthorized user is able to learn and decide independently, and uses an independent learning algorithm to select the best channel on which to attempt access.
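
By way of non-limiting illustration, the constraints of the system model can be captured in a few lines of Python. This is a minimal sketch under stated assumptions: the class name `SystemModel`, the method `sample_occupancy` and the parameter `busy_prob` (the probability that an authorized user transmits in a slot) are hypothetical names introduced here, and the independent per-slot occupancy draw is an assumed traffic model, not one stated in the description.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemModel:
    """n authorized users and k cognitive users sharing b channels."""
    n_authorized: int
    k_cognitive: int
    b_channels: int

    def __post_init__(self):
        # n, k and b are nonzero natural numbers, n equals b, and k > n.
        if min(self.n_authorized, self.k_cognitive, self.b_channels) < 1:
            raise ValueError("n, k and b must be nonzero natural numbers")
        if self.n_authorized != self.b_channels:
            raise ValueError("n must equal b")
        if self.k_cognitive <= self.n_authorized:
            raise ValueError("k must be greater than n")

    def sample_occupancy(self, busy_prob, rng):
        """Draw y_p for one slot: 1 if the authorized user occupies channel p.

        Assumption: each authorized user transmits independently with
        probability busy_prob in every slot.
        """
        return [1 if rng.random() < busy_prob else 0
                for _ in range(self.b_channels)]
```

With the simulation values given later in the description, `SystemModel(100, 300, 100)` is accepted, whereas a model with k ≤ n raises `ValueError`.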

The dynamic spectrum selection and access problem can be expressed as a discrete-time Markov decision process with continuous state and action spaces, and in a mobile environment the state transition probabilities and the expected rewards of all states are usually unknown; the power allocation problem is therefore expressed as a Markov process. In general, a Markov decision process is represented by a four-tuple M = <S, A, P, R>, where S is the state set, A is the action set, P is the state transition probability and R is the reward function.

Spectrum selection and access for the cognitive users are performed using the DQN algorithm, specifically:

Set an initial Q function and treat each cognitive user as an agent. Each agent selects an action according to the DQN algorithm, that is, it selects one of the b channels for transmission. The average utility value of the system is computed, and the reward value is set using evolutionary game theory. A neural network, used as a function approximator, is then trained to obtain the updated Q function. These steps are repeated until the resulting system capacity converges to a constant value.

In the system model studied by the present invention, each cognitive user acts as an agent and independently executes the DQN algorithm to perform channel selection and access. At time t, the action set available to each agent is A = {a₁, a₂, ..., a_b}, that is, the b channels that each cognitive user can select at time t. The state set is S = {s₁, s₂, ..., s_b}, where the state at time t comprises two items: the channel p selected by the agent (1 ≤ p ≤ b) and the utility value u_i (1 ≤ i ≤ k) obtained after transmitting on channel p. The reward function R is set by drawing on evolutionary game theory.

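The signal-to-noise ratio is introduced as the utility value of the system, as follows:

$$u_i = SNR_i = \begin{cases} \dfrac{(1 - y_p)\,S_i}{N_p}, & \text{if the } i\text{-th cognitive user monopolizes channel } p \\[2ex] \dfrac{(1 - y_p)\,S_i}{N_p + S_{i-}}, & \text{otherwise} \end{cases} \qquad (1)$$

Where: $y_p$ is the state of the authorized user on channel p; $S_i$ is the signal power sent by the i-th cognitive user; $N_p$ is the noise power of channel p; $S_{i-}$ is the sum of the signal powers sent by the other users that select the same channel as the i-th cognitive user.

Then the average utility value of the system is:

$$\bar{u} = \frac{1}{k}\sum_{i=1}^{k} u_i$$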
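
The utility calculation can be illustrated with a short Python sketch. It assumes, as one reading of the definitions of $y_p$, $S_i$, $N_p$ and $S_{i-}$, that the utility is zero on a channel occupied by its authorized user and that the power of co-channel cognitive users adds to the noise term; the function names `transmission_utilities` and `average_utility` are hypothetical.

```python
def transmission_utilities(choices, occupancy, signal_power, noise_power):
    """Return the utility u_i = SNR_i of every cognitive user.

    choices[i]      channel index p selected by cognitive user i
    occupancy[p]    y_p, 1 if the authorized user occupies channel p
    signal_power[i] S_i, transmit power of cognitive user i
    noise_power[p]  N_p, noise power of channel p
    """
    # Total cognitive-user power on each selected channel.
    channel_power = {}
    for i, p in enumerate(choices):
        channel_power[p] = channel_power.get(p, 0.0) + signal_power[i]

    utilities = []
    for i, p in enumerate(choices):
        # S_{i-}: power of the other users on the same channel
        # (zero when user i has the channel to itself).
        interference = channel_power[p] - signal_power[i]
        snr = (1 - occupancy[p]) * signal_power[i] / (noise_power[p] + interference)
        utilities.append(snr)
    return utilities


def average_utility(utilities):
    """Average utility of the system over all k cognitive users."""
    return sum(utilities) / len(utilities)
```

For example, with four users choosing channels `[0, 0, 1, 2]`, unit powers, unit noise and channel 2 occupied by its authorized user, the two users sharing channel 0 each obtain 0.5, the user alone on channel 1 obtains 1.0, and the user on channel 2 obtains 0.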

The reward value is set using the replicator dynamics equation:

$$\dot{x}_i = \varepsilon\, x_i \left(U - \bar{U}\right)$$

Where: ε is the factor that influences the speed of evolution;

$x_i$ is the proportion of all cognitive users that select the same channel as the i-th user;

$U$ is the expected utility obtained by the individuals in the population that select that channel for access, where the population is the set of all k cognitive users;

$\bar{U}$ is the average expected utility of the population;

The reward function is set as:

$$r = \begin{cases} +1, & \dot{x}_i \ge 0 \\ -1, & \dot{x}_i < 0 \end{cases}$$

Where:

$r$ is the reward value;

$\dot{x}_i$ is the rate of change for the channel selected by the i-th user. If the rate of change is greater than or equal to 0, the reward value is +1; if the rate of change is less than 0, the reward value is -1.
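
As a non-limiting illustration, the reward rule can be sketched in Python. The sketch assumes the standard replicator dynamics form, in which the rate of change is ε times the share x_i times the difference between the channel's expected utility and the population average; it further assumes that the expected utility of a channel is estimated as the mean utility of the users currently on it. The function name `replicator_rewards` and the parameter name `evolution_speed` are hypothetical.

```python
from collections import Counter


def replicator_rewards(choices, utilities, evolution_speed=1.0):
    """Return the reward (+1 or -1) of every cognitive user.

    choices[i]   channel selected by cognitive user i
    utilities[i] utility u_i obtained by cognitive user i
    """
    k = len(choices)
    users_on = Counter(choices)
    population_mean = sum(utilities) / k

    # Assumed estimate of U: mean utility of the users on each channel.
    utility_sum = {}
    for p, u in zip(choices, utilities):
        utility_sum[p] = utility_sum.get(p, 0.0) + u

    rewards = []
    for p in choices:
        share = users_on[p] / k                      # x_i
        channel_mean = utility_sum[p] / users_on[p]  # U
        rate = evolution_speed * share * (channel_mean - population_mean)
        rewards.append(1 if rate >= 0 else -1)
    return rewards
```

Because the factor ε and the share x_i are both positive, the sign of the rate of change, and hence the reward, depends only on whether the selected channel performs at least as well as the population average.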

The DQN algorithm used by the present invention combines Q-learning with a neural network. In the DQN algorithm, a neural network serves as the function approximator for the Q function. The basic idea of training the neural network is to minimize a cost function over the network parameters, thereby obtaining the optimal network parameters.

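Therefore, in the Q network, the error function is set as:

$$L_i(\theta_i) = E\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right] \qquad (5)$$

$$y_i = r + \beta \max_{a'} Q(s', a'; \theta_i^{-}) \qquad (6)$$

Once the gradient of the error function with respect to the parameter θ has been found, the neural network can be trained with a method such as stochastic gradient descent, and the parameters are updated to obtain the optimal Q value. Formulas (5) and (6) use a separate symbol for the network's approximation in order to distinguish it from the Q function and Q value of the present invention.

Where:

θ denotes the network parameters;

$\theta_i$ is the parameter value of one of the networks;

$\theta_i^{-}$ is the parameter value of the other network;

$E$ denotes taking the average;

$\max_{a'}$ denotes taking the maximum of the network output over a';

$r$ is the reward value and β is the discount factor;

s denotes the state;

a denotes the action, that is, which channel is selected;

s' is the state at the next time step;

a' denotes which channel is selected at the next time step.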

Formulas (5) and (6) are known in the prior art and are applied here to the model of the present invention.
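
For illustration only, the mean squared temporal-difference error commonly used to train a DQN can be sketched in plain Python. The sketch assumes the usual two-network arrangement, with one network being trained and a second network supplying the target value; the networks are stood in for by arbitrary callables, and the names `dqn_loss`, `q_online` and `q_target` are hypothetical.

```python
def dqn_loss(batch, q_online, q_target, beta=0.9):
    """Mean squared error between target values and online Q estimates.

    batch     list of transitions (s, a, r, s_next)
    q_online  callable: state -> list of Q values, one per channel
              (the network being trained)
    q_target  callable: state -> list of Q values, one per channel
              (the second network, used only to form the target)
    beta      discount factor
    """
    squared_errors = []
    for s, a, r, s_next in batch:
        # Target: reward plus discounted best next-step value.
        target = r + beta * max(q_target(s_next))
        squared_errors.append((target - q_online(s)[a]) ** 2)
    # E[...] is approximated by the average over the batch.
    return sum(squared_errors) / len(squared_errors)
```

In a full implementation the callables would be neural networks and the loss would be minimized by stochastic gradient descent with respect to the parameters of `q_online` only.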

Q-learning is one of the most commonly used reinforcement learning algorithms. It represents the value of a state-action pair by a Q value. The Q function Q(s, a) is the expected reward obtained by selecting action a in state s and following the policy thereafter. The update rule of the Q function is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_t + \beta \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$

where α ∈ (0,1] is the learning rate, β ∈ (0,1] is the discount factor, and $r_t$ is the reward.
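
The standard tabular Q-learning update can be written out as a small Python function. This is a minimal sketch: the table layout (a list of rows, one row per state and one entry per action) and the name `q_update` are assumptions made for illustration.

```python
def q_update(q_table, s, a, r, s_next, alpha=0.1, beta=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a).

    q_table[s][a] holds the current estimate of Q(s, a);
    alpha is the learning rate and beta is the discount factor.
    """
    best_next = max(q_table[s_next])
    td_error = r + beta * best_next - q_table[s][a]
    q_table[s][a] += alpha * td_error
    return q_table[s][a]
```

For instance, starting from `q = [[0.0, 0.0], [1.0, 2.0]]`, the call `q_update(q, 0, 1, 1.0, 1, alpha=0.5, beta=0.5)` moves Q(0, 1) halfway toward the target 1.0 + 0.5 × 2.0 = 2.0, giving 1.0.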

In the system model that the present invention studies, each cognitive user is as an executor, independent execution DQN algorithm Carry out channel selection and access.I-th of executor is in the selectable behavior aggregate A={ a of moment t₁,a₂,...,a_b, it is held in moment t The b channel that passerby can select；State set S={ s₁,s₂,...,s_bIndicate, the state s of moment t_bIncluding two data: The selected channel p of executor (1≤p≤N) and the effectiveness u obtained after transmission on channel p_i(1≤i≤K)。

As shown in Figs. 3 and 4, in each time slot every cognitive user, acting as an independent agent, selects an action according to the DQN algorithm, that is, it selects one of the b channels for transmission. After transmission, the transmission utility $u_i$ of each cognitive user is obtained, and the channel capacity of each cognitive user and the average capacity of the system are computed. The average utility $\bar{u}$ is then calculated from the transmission utility values $u_i$ of the cognitive users, and the strategy evolution rate $\dot{x}_i$ of each cognitive user is calculated from the replicator dynamics equation of evolutionary game theory. The reward value is obtained from the sign of $\dot{x}_i$ according to the reward function rule. Finally, the neural network is trained, the Q value is updated, and the action for the next time slot is selected, until the resulting system capacity converges to a constant value.
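
The per-slot loop described above can be sketched end to end in Python. To keep the example short and dependency-free, the sketch deliberately swaps the neural network for a per-agent table of Q values with one entry per channel, so it is a tabular stand-in and not the DQN itself. Everything else is also an assumption made for illustration: the function name `run_simulation`, ε-greedy exploration, equal transmit powers, a fixed probability that an authorized user is active, a utility that is zero on occupied channels, and a fixed number of slots in place of a convergence test.

```python
import random


def run_simulation(k=6, b=3, slots=200, alpha=0.1, beta=0.9,
                   explore=0.1, busy_prob=0.3, seed=0):
    """Run independent learners and return the average utility per slot."""
    rng = random.Random(seed)
    signal, noise = 1.0, 0.1           # assumed equal powers for all users
    q = [[0.0] * b for _ in range(k)]  # one Q row per agent (tabular stand-in)
    history = []
    for _ in range(slots):
        # y_p: assumed independent activity of each authorized user.
        occupancy = [1 if rng.random() < busy_prob else 0 for _ in range(b)]

        # 1. Each agent selects a channel (epsilon-greedy on its own Q row).
        choices = []
        for i in range(k):
            if rng.random() < explore:
                choices.append(rng.randrange(b))
            else:
                choices.append(max(range(b), key=lambda p: q[i][p]))

        # 2. Transmission utilities and the average utility of the system.
        load = [choices.count(p) for p in range(b)]
        utils = [(1 - occupancy[p]) * signal / (noise + (load[p] - 1) * signal)
                 for p in choices]
        mean_u = sum(utils) / k

        # 3. Reward from the sign of the replicator rate; 4. Q update.
        for i, p in enumerate(choices):
            members = [utils[j] for j in range(k) if choices[j] == p]
            rate = (load[p] / k) * (sum(members) / len(members) - mean_u)
            reward = 1 if rate >= 0 else -1
            q[i][p] += alpha * (reward + beta * max(q[i]) - q[i][p])

        history.append(mean_u)
    return history
```

The returned list can be plotted against the slot index to inspect whether the average utility settles; no particular convergence behaviour is claimed for this simplified stand-in.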

The proposed scheme was analysed by simulation. As shown in Fig. 1, the simulation considers k = 300 secondary (cognitive) users and b = 100 channels, with n = 100 primary (authorized) users each authorized to use one of these 100 channels.

Fig. 5 compares the channel access schemes in terms of system capacity. The line with diamonds represents the DQN-RD method, and the line with circles represents random access. As the number of time slots increases, the system capacity under the DQN-RD method also increases, and it is essentially stable after 550 slots. The system capacity under random access fluctuates and shows no increasing trend. Although the blue line with circles may lie above the cyan line with diamonds in some slots at the start of learning, it does not show a rising trend.

Fig. 6 compares the two access schemes in terms of user collision rate. The collision rate is the probability that different secondary users select the same channel. Fig. 6 shows that with the DQN-RD algorithm the user collision rate gradually decreases, so channel utilization increases and the system gradually stabilizes. This is because the utility function accounts for the interference between secondary users that access the same channel, so learning effectively reduces such conflicts. By comparison, the random access scheme produces randomly fluctuating results.
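
One way to measure the collision rate in a single slot is sketched below in Python. The description defines the collision rate only as the probability that different secondary users select the same channel, so the estimator used here, the fraction of users that share their channel with at least one other user, is an assumption, as is the function name `collision_rate`.

```python
from collections import Counter


def collision_rate(choices):
    """Fraction of cognitive users whose channel is also chosen by another user.

    choices[i] is the channel selected by cognitive user i in this slot.
    """
    users_on = Counter(choices)
    collided = sum(1 for p in choices if users_on[p] > 1)
    return collided / len(choices)
```

For example, `collision_rate([0, 0, 1, 2])` is 0.5 because two of the four users share channel 0, while `collision_rate([0, 1, 2])` is 0.0.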

Fig. 7 shows how system capacity varies with the number of channels when the DQN-RD algorithm is used. The other parameters are held constant and only the number of channels changes. As can be seen, for the same number of users, the algorithm effectively improves system capacity regardless of how many available channels are added.

Claims

1. A distributed multi-user dynamic spectrum access method in a cognitive radio network, characterized in that the method is as follows:

Step 1: construct a system model in which n authorized users and k cognitive users in a cell share b channels, where n, k and b are nonzero natural numbers, n equals b, and k > n;

Step 2: perform spectrum selection and access for the cognitive users using the DQN algorithm, specifically: set an initial Q function and treat each cognitive user as an agent; each agent selects an action according to the DQN algorithm, that is, it selects one of the b channels for transmission; compute the average utility value of the system and set the reward value using evolutionary game theory; then train a neural network, used as a function approximator, to obtain the updated Q function;

Step 3: repeat Step 2 until the resulting system capacity converges to a constant value;

The average utility value of the system is $\bar{u}$, calculated as follows:

First, the transmission utility value $u_i$, 1 ≤ i ≤ k, of each cognitive user is determined, using the signal-to-noise ratio as the utility value:

$$u_i = SNR_i = \begin{cases} \dfrac{(1 - y_p)\,S_i}{N_p}, & \text{if the } i\text{-th cognitive user monopolizes channel } p \\[2ex] \dfrac{(1 - y_p)\,S_i}{N_p + S_{i-}}, & \text{otherwise} \end{cases}$$

Then:

$$\bar{u} = \frac{1}{k}\sum_{i=1}^{k} u_i$$

$y_p$ is the state of the authorized user on channel p: 1 indicates that the channel is occupied by its authorized user, and 0 indicates that it is not;

$S_i$ is the signal power sent by the i-th cognitive user;

$N_p$ is the noise power of channel p;

$S_{i-}$ is the sum of the signal powers sent by the other users that select the same channel as the i-th cognitive user;

The reward value is determined using evolutionary game theory as follows:

The reward value is set using the replicator dynamics equation:

$$\dot{x}_i = \varepsilon\, x_i \left(U - \bar{U}\right)$$

Where: ε is the factor that influences the speed of evolution;

$x_i$ is the proportion of all cognitive users that select the same channel as the i-th user;

$U$ is the expected utility obtained by the individuals in the population that select that channel for access, where the population is the set of all k cognitive users;

$\bar{U}$ is the average expected utility of the population;

The reward function is set as:

$$r = \begin{cases} +1, & \dot{x}_i \ge 0 \\ -1, & \dot{x}_i < 0 \end{cases}$$

Where:

$r$ is the reward value;

When the rate of change $\dot{x}_i$ is greater than or equal to 0, the reward value is +1; when the rate of change is less than 0, the reward value is -1.