CN113613337B

CN113613337B - User cooperation anti-interference method for beam forming communication

Info

Publication number: CN113613337B
Application number: CN202110896542.1A
Authority: CN
Inventors: 任国春; 徐煜华; 张云鹏; 徐逸凡; 方贵
Original assignee: Army Engineering University of PLA
Current assignee: Army Engineering University of PLA
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2023-06-20
Anticipated expiration: 2041-08-05
Also published as: CN113613337A

Abstract

The invention discloses a user cooperation anti-interference method for beamforming communication. It models the adversarial relationship between multiple users and a jammer as a game in which the jammer is the leader and the users are followers. The jammer continually adjusts its jamming strategy to maximize its jamming utility, and the cooperative anti-jamming behavior among users is modeled as a potential game. First, the strategies of the users and the jammer are initialized: each randomly selects a communication or jamming channel, and each user's flag bit is set to 0. Then all users simultaneously perform a channel-sounding or channel-updating operation, compute the corresponding utility, exchange their quality-of-experience (QoE) satisfaction with one another, and update their flag bits according to the selected strategy; this is iterated until the anti-jamming strategies of all users converge. The jammer then updates its Q table and adjusts its strategy, and the process repeats until the jamming strategy converges. The invention speeds up convergence by assigning different learning parameters to different users, and improves the anti-jamming efficiency of the network through information-level cooperation among users.

Description

User cooperation anti-interference method for beam forming communication

Technical Field

The invention belongs to the technical field of wireless communication, and particularly relates to a user cooperation anti-interference method for beam forming communication.

Background

With the development of wireless technology, demand for communication services worldwide has grown explosively, and in hotspot areas users are often ultra-densely distributed. This makes it very difficult for users to coordinate their frequency usage and to resist malicious jamming attacks. To address this problem, prior work has proposed avoiding jamming attacks through frequency hopping (F. Yao and L. Jia, "A Collaborative Multi-Agent Reinforcement Learning Anti-Jamming Algorithm in Wireless Networks," IEEE Wireless Communications Letters, vol. 8, no. 4, pp. 1024-1027, Aug. 2019). However, most previous studies use only the maximized throughput of the whole network as the optimization target: they neither consider users' actual service requirements nor place those requirements inside the decision-making closed loop. As a result, the optimization objective often fails to match what users actually need, and resources are wasted.

In addition, existing anti-jamming algorithms have two problems: (1) the lack of a cooperation mechanism among users means that each user resists jamming on its own, so the collective intelligence of the user population is not exploited; (2) asynchronous update algorithms are prevalent, i.e., only one user updates its strategy per iteration, which slows convergence.

Disclosure of Invention

The invention aims to provide a cooperative anti-jamming model and a corresponding anti-jamming learning algorithm that improve user quality of experience (QoE) and reduce the impact of jamming.

The technical solution for achieving this purpose is as follows. A malicious jammer is assumed to adapt its jamming strategy to the communication users' frequency usage so as to maximize its jamming utility. First, the adversarial relationship between the users and the jammer is modeled as a Stackelberg game. Second, to model the relationship among users, and taking into account the asymmetric mutual interference among users under space-division multiple access, a non-cooperative game model with local altruism is proposed. Third, to avoid the resource waste caused by blindly increasing throughput, a QoE model based on the mean opinion score (MOS) is proposed, and user utility is quantified by QoE level. The local altruistic game among users is then proved to be an exact potential game, and the network-wide optimal user strategy is further proved to be a pure-strategy Nash equilibrium of that game. Finally, a user cooperation anti-jamming algorithm is designed that achieves network-wide optimization using only local information.

An anti-jamming algorithm comprising the steps of:

Step 1: model the cooperative anti-jamming problem in a multi-user, single-jammer scenario as a single-leader, multi-follower Stackelberg game whose players are all the users and the jammer in the system.

Step 2: the jammer randomly selects one channel to jam, and its utility function is defined as the sum of the jamming power it applies to all users on that channel. The users select anti-jamming channels in response to the jamming strategy. To reduce inter-user interference in this process, a local cooperation model is adopted: cooperation among users is analyzed within the potential game framework, and each user takes the benefit of its neighbor users into account. Accordingly, a user's utility function is defined as the sum of its own QoE satisfaction and that of its neighbor users.

Step 3: all users adjust their anti-jamming strategies simultaneously; each user selects a channel according to its current flag bit and its strategies and payoffs in the previous two time slots. Because users influence the network to different degrees, the invention assigns each user its own learning parameter to speed up convergence of the algorithm.

Step 4: return to Step 3; the users keep selecting strategies through exploration and learning until the jamming strategy and the anti-jamming strategies of all users converge or the preset number of iterations is reached.

Step 5: the jammer evaluates its utility u_j(k) and updates its Q table.

Step 6: the jammer updates its strategy; return to Step 3 until the maximum number of cycles is reached.

Further, the cooperative anti-jamming problem in the multi-user, single-jammer scenario described in Step 1 is modeled as a single-leader, multi-follower Stackelberg game, expressed as:

$$\mathcal{G}=\left\{\mathcal{N},\ j,\ \{\mathcal{A}_n\}_{n\in\mathcal{N}},\ \mathcal{A}_j,\ \{u_n\}_{n\in\mathcal{N}},\ u_j\right\},$$

where $\mathcal{N}$ is the user set, $j$ is the malicious jammer, $\mathcal{A}_n$ and $\mathcal{A}_j$ are the strategy sets of user $n$ and the jammer, respectively, and $u_n$ and $u_j$ are the utility functions of user $n$ and the jammer, respectively.

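Further, the local cooperation model among users described in Step 2 is modeled as an exact potential game, as follows. The potential function among users is defined as

$$\Phi(a_n,a_{-n},c_j)=\sum_{k\in\mathcal{N}} q_k\left(a_k,a_{\mathcal{I}_k},c_j\right),$$

where $a_n$ is the channel access strategy of user $n$, $a_{-n}$ denotes the channel access strategies of all other users, $c_j$ is the channel selected by the jammer, $q_k$ is the QoE satisfaction of user $k$, $\mathcal{J}_n$ is the set of users interfered by user $n$, $\mathcal{I}_n$ is the set of users that cause interference to user $n$, and $a_{\mathcal{I}_k}$ denotes the channel selections of the users in $\mathcal{I}_k$. The function is the sum of the QoE satisfaction of all users in the network.

The proof that this model is a potential game proceeds as follows.

If any user $n$ unilaterally changes its strategy from $a_n$ to $\tilde{a}_n$, the change in its utility function $u_n$ (its own QoE satisfaction plus that of the users in $\mathcal{J}_n$) is

$$\Delta u_n=u_n(\tilde{a}_n,a_{-n},c_j)-u_n(a_n,a_{-n},c_j)=\left[q_n(\tilde{a}_n,a_{\mathcal{I}_n},c_j)-q_n(a_n,a_{\mathcal{I}_n},c_j)\right]+\sum_{k\in\mathcal{J}_n}\left[q_k(a_k,\tilde{a}_{\mathcal{I}_k},c_j)-q_k(a_k,a_{\mathcal{I}_k},c_j)\right],$$

where $\tilde{a}_{\mathcal{I}_k}$ denotes the channel selections of the users in $\mathcal{I}_k$ after user $n$ switches to $\tilde{a}_n$.

The same unilateral change alters the potential function by

$$\Delta\Phi=\Phi(\tilde{a}_n,a_{-n},c_j)-\Phi(a_n,a_{-n},c_j)=\Delta u_n+\sum_{k\in\mathcal{N}\setminus(\{n\}\cup\mathcal{J}_n)}\left[q_k(a_k,\tilde{a}_{\mathcal{I}_k},c_j)-q_k(a_k,a_{\mathcal{I}_k},c_j)\right],$$

where $\mathcal{N}\setminus(\{n\}\cup\mathcal{J}_n)$ denotes the set $\mathcal{N}$ with $\{n\}\cup\mathcal{J}_n$ deleted. A user $k$ in this remaining set is not interfered by user $n$, so $n\notin\mathcal{I}_k$ and $q_k$ does not depend on $a_n$; every term of the last sum is therefore zero. It follows that

$$\Delta\Phi=\Delta u_n,$$

so the local cooperation model among users is an exact potential game.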
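The exact-potential argument can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: the random interference graph, the binary satisfaction function `satisfaction` (1 if a user's channel is neither jammed nor shared with an interferer) and all helper names are hypothetical stand-ins, not the MOS-based satisfaction of the invention. It enumerates every strategy profile and every unilateral deviation and confirms that the change in $u_n$ equals the change in $\Phi$.

```python
import itertools
import random


def make_toy_game(num_users=5, seed=0):
    """Hypothetical toy instance: a random directed interference graph."""
    rng = random.Random(seed)
    # interferers[n] plays the role of I_n (users whose beams hit user n).
    interferers = {
        n: {m for m in range(num_users) if m != n and rng.random() < 0.4}
        for n in range(num_users)
    }
    # victims[n] plays the role of J_n; m is in J_n exactly when n is in I_m.
    victims = {
        n: {m for m in range(num_users) if n in interferers[m]}
        for n in range(num_users)
    }
    return interferers, victims


def satisfaction(n, a, c_j, interferers):
    """Hypothetical binary q_n: 1.0 unless user n is jammed or shares a channel
    with one of its interferers."""
    if a[n] == c_j or any(a[m] == a[n] for m in interferers[n]):
        return 0.0
    return 1.0


def utility(n, a, c_j, interferers, victims):
    """u_n = q_n + sum of q_k over the users k in J_n."""
    return satisfaction(n, a, c_j, interferers) + sum(
        satisfaction(k, a, c_j, interferers) for k in victims[n]
    )


def potential(a, c_j, interferers):
    """Phi = sum of q_k over all users."""
    return sum(satisfaction(k, a, c_j, interferers) for k in range(len(a)))


def is_exact_potential(num_users=5, num_channels=3, c_j=0, seed=0):
    """Check Delta u_n == Delta Phi for every profile and unilateral deviation."""
    interferers, victims = make_toy_game(num_users, seed)
    for profile in itertools.product(range(num_channels), repeat=num_users):
        a = list(profile)
        for n in range(num_users):
            for new_channel in range(num_channels):
                b = a.copy()
                b[n] = new_channel
                du = (utility(n, b, c_j, interferers, victims)
                      - utility(n, a, c_j, interferers, victims))
                dphi = potential(b, c_j, interferers) - potential(a, c_j, interferers)
                if du != dphi:
                    return False
    return True


print(is_exact_potential())  # True
```

The check passes for any graph built this way because the reciprocity $m\in\mathcal{J}_n \Leftrightarrow n\in\mathcal{I}_m$ is enforced when `victims` is constructed, which is exactly the property the proof relies on.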

Further, in Step 3 all users adjust their anti-jamming strategies simultaneously, and each user selects a channel according to its current flag bit and its strategies and payoffs in the previous two time slots. Specifically:

If the flag bit $Y_n(t-1)=0$, user $n$ updates its channel according to an exploration rule, where $M$ is the number of channels available to the user and the rule is governed by the learning parameter of user $n$. If $a_n(t)=a_n(t-1)$, the flag bit $Y_n(t)$ is set to 0; otherwise it is set to 1.

If the flag bit $Y_n(t-1)=1$, user $n$ chooses between its two most recent channels with Boltzmann probabilities:

$$\Pr\left[a_n(t)=a_n(t-1)\right]=\frac{\exp\left(\beta u_n(t-1)\right)}{\exp\left(\beta u_n(t-1)\right)+\exp\left(\beta u_n(t-2)\right)},\qquad \Pr\left[a_n(t)=a_n(t-2)\right]=1-\Pr\left[a_n(t)=a_n(t-1)\right],$$

where $\beta$ is the learning rate, and $u_n(t-1)$ and $u_n(t-2)$ are the utilities of user $n$ in slots $t-1$ and $t-2$, respectively. After the update, the flag bit is set to $Y_n(t)=0$.
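One synchronous slot of this flag-driven rule for a single user can be sketched as follows. This is a minimal sketch under two assumptions: the $Y_n(t-1)=1$ branch is taken to be a two-point Boltzmann (logit) choice between $a_n(t-1)$ and $a_n(t-2)$ driven by $\beta$, $u_n(t-1)$ and $u_n(t-2)$, and the exploration rule of the $Y_n(t-1)=0$ branch is replaced by a hypothetical constant `explore_prob` (the dependence on the user's learning parameter is not modeled). The function name `user_step` is illustrative only.

```python
import math
import random


def user_step(a_prev, a_prev2, u_prev, u_prev2, flag_prev,
              num_channels, explore_prob, beta, rng=random):
    """One synchronous time slot for a single user n.

    a_prev, a_prev2: channels a_n(t-1), a_n(t-2)
    u_prev, u_prev2: utilities u_n(t-1), u_n(t-2)
    explore_prob:    hypothetical stand-in for the exploration rule
    Returns (a_n(t), Y_n(t)).
    """
    if flag_prev == 0:
        # Channel sounding: keep a_n(t-1), or occasionally try another channel.
        a_new = a_prev
        if num_channels > 1 and rng.random() < explore_prob:
            a_new = rng.choice([c for c in range(num_channels) if c != a_prev])
        return a_new, (0 if a_new == a_prev else 1)
    # Channel updating: P(keep a_n(t-1)) = e^{b u(t-1)} / (e^{b u(t-1)} + e^{b u(t-2)}),
    # written as a logistic function; the exponent is capped to avoid overflow.
    exponent = min(700.0, beta * (u_prev2 - u_prev))
    p_keep = 1.0 / (1.0 + math.exp(exponent))
    a_new = a_prev if rng.random() < p_keep else a_prev2
    return a_new, 0
```

Because every user calls this function on the same slot index with its own history, all users can update in parallel, which is the synchronous behavior the method contrasts with asynchronous one-user-per-iteration learning.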

Further, the learning parameter of user $n$ is set as a function of $x_n(t)$. When $x_n$ is sufficiently large, the user cooperation anti-jamming algorithm gradually converges to the network-wide optimum; different learning parameters are assigned to different users mainly to accelerate convergence. Specifically:

$$x_n(t)=\Gamma_n\,\varepsilon(t),\qquad \varepsilon(t)=\varepsilon(0)+t\,\Delta\varepsilon,$$

where $\varepsilon(t)$ is a time-varying quantity, $\varepsilon(0)$ is its initial value, $\Delta\varepsilon$ is the step size, $t$ is the iteration number, and $\Gamma_n$ indicates how strongly user $n$ affects the network.
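The schedule $x_n(t)=\Gamma_n\varepsilon(t)$ is straightforward to compute; what this section does not pin down is how $\Gamma_n$ is measured. The sketch below assumes, purely for illustration, $\Gamma_n = 1 + |\mathcal{J}_n|$ (one plus the number of users that user $n$ interferes with); both that choice and the helper name `exploration_parameter` are hypothetical.

```python
def exploration_parameter(gamma_n, t, eps0, delta_eps):
    """x_n(t) = Gamma_n * eps(t), where eps(t) = eps(0) + t * delta_eps."""
    return gamma_n * (eps0 + t * delta_eps)


# Hypothetical influence measure: Gamma_n = 1 + |J_n|.
victims = {0: {1, 2}, 1: set(), 2: {0}}
gammas = {n: 1 + len(v) for n, v in victims.items()}

for t in (0, 10, 100):
    xs = {n: exploration_parameter(g, t, eps0=1.0, delta_eps=0.05)
          for n, g in gammas.items()}
    print(t, xs)
```

Since $\varepsilon(t)$ grows linearly in $t$ for every user, $\Gamma_n$ acts as a per-user scale factor: users with a larger influence measure reach a given value of $x_n$ in fewer iterations.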

Further, in Step 5 the jammer evaluates its utility $u_j(k)$ and updates its Q table, as follows.

The jammer evaluates its current utility, i.e., the total jamming power received by the users on its channel:

$$u_j=\sum_{n\in\mathcal{N}} p_j\,h\left(f_{c_j},d_{jn}\right)\delta\left(a_n,c_j\right),$$

where $p_j$ is the jamming power, $f_{c_j}$ is the jamming frequency, $d_{jn}$ is the distance between the jammer and user $n$, $h(f_{c_j},d_{jn})$ is the channel gain, which depends on the jamming frequency and distance, and $\delta(x,y)$ is an indicator function equal to 1 if $x=y$ and 0 otherwise.

The Q table is updated as

$$Q^{k+1}\left(c_j(k)\right)=(1-\lambda)\,Q^{k}\left(c_j(k)\right)+\lambda\,u_j(k),$$

where $Q^{k+1}$ and $Q^{k}$ are the jammer's Q values in periods $k+1$ and $k$, $c_j(k)$ is the channel jammed in period $k$, $u_j(k)$ is the jammer's utility in period $k$, and $\lambda\in(0,1)$ is the learning rate that controls the convergence speed of Q-learning.
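The jammer's utility evaluation and its single-entry Q update can be sketched as below. The list `gains[n]` is an assumed precomputed stand-in for $h(f_{c_j}, d_{jn})$; the section does not give the propagation model, so any concrete gain values used with this code are hypothetical.

```python
def jammer_utility(c_j, user_channels, p_j, gains):
    """u_j = sum of p_j * h(f_{c_j}, d_jn) over the users n with a_n == c_j.

    gains[n] is an assumed stand-in for h(f_{c_j}, d_jn).
    """
    return sum(p_j * gains[n] for n, a_n in enumerate(user_channels) if a_n == c_j)


def q_update(q, c_j, u_j, lam):
    """Q^{k+1}(c_j) = (1 - lam) * Q^k(c_j) + lam * u_j; other entries unchanged."""
    new_q = list(q)
    new_q[c_j] = (1.0 - lam) * q[c_j] + lam * u_j
    return new_q


# Two of three users sit on channel 0, which the jammer attacks.
q = [0.0] * 4
u = jammer_utility(0, [0, 1, 0], p_j=2.0, gains=[0.5, 0.25, 0.25])
q = q_update(q, 0, u, lam=0.1)
```

Only the entry of the channel actually jammed in period $k$ is updated, so channels that are never tried keep their previous Q values.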

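Further, the jamming strategy in Step 6 is updated as follows.

The jammer updates its channel selection strategy using the Boltzmann function:

$$\pi_j^{k}\left(c_j(k)\right)=\frac{\exp\left(Q^{k}\left(c_j(k)\right)/\tau\right)}{\sum_{c\in\mathcal{A}_j}\exp\left(Q^{k}(c)/\tau\right)},$$

where $\tau$ is the temperature coefficient, which sets the trade-off between exploration and exploitation, the sum runs over the jammer's channel set $\mathcal{A}_j$, and $\pi_j^{k}(c_j(k))$ is the probability that the jammer selects channel $c_j(k)$ in period $k$.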
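A Boltzmann (softmax) policy over the Q table can be sketched as follows. Subtracting the largest Q value before exponentiating leaves the probabilities unchanged but avoids overflow when $Q/\tau$ is large; the helper names are illustrative.

```python
import math
import random


def boltzmann_probs(q, tau):
    """pi(c) = exp(Q(c)/tau) / sum_c' exp(Q(c')/tau), computed stably."""
    top = max(q)
    weights = [math.exp((v - top) / tau) for v in q]
    total = sum(weights)
    return [w / total for w in weights]


def sample_channel(q, tau, rng=random):
    """Draw the jamming channel c_j(k) from the Boltzmann policy."""
    probs = boltzmann_probs(q, tau)
    return rng.choices(range(len(q)), weights=probs, k=1)[0]
```

A large $\tau$ flattens the distribution toward uniform (more exploration), while a small $\tau$ concentrates it on the channel with the highest Q value (more exploitation).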

Compared with the prior art, the invention has the following notable advantages: (1) For the multi-user anti-jamming problem, it provides a framework that models both the strongly adversarial relationship between the users and the jammer and the cooperative relationship among the users. (2) It accounts for users' diverse service demands: to avoid the resource waste caused by users blindly pursuing higher throughput, it proposes an MOS-based QoE model and a user-demand-centered optimization mechanism, quantifies user utility by QoE level, and exploits the diversity of user demands to improve system performance. (3) Using the finite improvement property of potential games, it designs a multi-user synchronous anti-jamming algorithm, and, exploiting the fact that each user affects the network to a different degree, it assigns different learning parameters to different users to speed up convergence.

Drawings

Fig. 1 is a schematic diagram of the multi-user, single-jammer network in the hierarchical anti-jamming model for heterogeneous service requirements according to the invention.

Fig. 2 compares the convergence of the algorithm of the invention with that of a prior-art asynchronous learning algorithm.

Fig. 3 is a schematic diagram of the anti-jamming performance of the algorithm of the invention as the jamming power varies.

Detailed Description

Referring to Fig. 1, in the hierarchical anti-jamming model for multi-user service requirements of the invention, the system has two millimeter-wave picocell base stations 50 m apart, and the users are randomly distributed within a circle of radius 100 m centered on a base station. The jammer is located roughly 100-200 m from the two base stations. The number of available channels is M = 4, the channel bandwidth is B = 1 MHz, and the noise power spectral density is N₀ = -130 dB/Hz.

The invention targets a hierarchical anti-jamming model for multi-user service demands in which the jammer is modeled as the leader and the users as followers. The adversarial relationship between the jammer and the users is modeled as a Stackelberg game in order to find a way to avoid jamming, and the cooperative relationship among users is modeled as a potential game in order to find a way to eliminate co-channel interference. The cooperation among users proposed by the invention is information-level cooperation: neighboring users exchange their QoE satisfaction.

Based on the relationship between network-wide QoE satisfaction and user strategies, the invention proves the existence of a Nash equilibrium and a Stackelberg equilibrium, thereby mapping user behavior precisely onto system performance and providing theoretical guidance for designing the corresponding anti-jamming algorithm.

The invention discloses a user cooperation anti-jamming algorithm for the hierarchical anti-jamming model oriented to heterogeneous service demands, which comprises the following steps:

Step 2: the jammer randomly selects one channel to jam, and its utility function is defined as the sum of the jamming power it applies to all users on that channel. The users select anti-jamming channels in response to the jamming strategy. To reduce inter-user interference in this process, a local cooperation model is adopted: cooperation among users is analyzed within the potential game framework, and each user takes the benefit of its neighbor users into account. Accordingly, a user's utility function is defined as the sum of its own QoE satisfaction and that of its neighbor users.

Step 3: all users adjust their anti-jamming strategies simultaneously; each user selects a channel according to its current flag bit and its strategies and payoffs in the previous two time slots. Because users influence the network to different degrees, each user is assigned its own learning parameter, which speeds up convergence of the algorithm.

The specific embodiments of the present invention are as follows:

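1. The adversarial relationship between the multiple users and the jammer is modeled as a Stackelberg game, expressed as

$$\mathcal{G}=\left\{\mathcal{N},\ j,\ \{\mathcal{A}_n\}_{n\in\mathcal{N}},\ \mathcal{A}_j,\ \{u_n\}_{n\in\mathcal{N}},\ u_j\right\},$$

where $\mathcal{N}$ is the user set, $j$ is the malicious jammer, and $\mathcal{A}_n$ and $\mathcal{A}_j$ are the strategy sets of user $n$ and the jammer, respectively.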

2. Users run a variety of services, so their throughput requirements differ; in other words, the same throughput may correspond to different QoE satisfaction under different services. The QoE satisfaction is computed as follows.

User $n$ can access only one base station at a time; the base station accessed by user $n$ is denoted $S_n$. The distance between $S_n$ and user $n$ is denoted $d_{S_n n}$, and the direction angle from $S_n$ to user $n$ is denoted $\varphi_{S_n n}$.

When $S_n$ serves user $n$ using beamforming, its directional gain $g_{nm}$ in the direction of user $m$ is determined by these angles and by $\theta_n$, the main-lobe width of the beam that $S_n$ uses to serve user $n$.

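The coverage area of the beam serving user $n$ is denoted $\mathcal{C}_n$.

The set of potential users interfered by user $n$ is then defined as

$$\mathcal{J}_n=\left\{m\in\mathcal{N}\setminus\{n\}:\ m \text{ lies in } \mathcal{C}_n\right\},$$

where $\mathcal{C}_n$ is the coverage area of the beam serving user $n$.

The set of potential users that cause interference to user $n$ is defined as

$$\mathcal{I}_n=\left\{m\in\mathcal{N}\setminus\{n\}:\ n \text{ lies in } \mathcal{C}_m,\ g_{mn}\ge g_0\right\},$$

where $\mathcal{C}_m$ is the coverage area of the beam serving user $m$; $g_{mn}$ is the gain of $S_m$ in the direction of user $n$ when $S_m$ serves user $m$ using beamforming; $g_0$ is the beam gain threshold, taken as 0.01; and $\mathcal{N}\setminus\{n\}$ is the set of all users except user $n$.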
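The neighbor sets can be computed from geometry once a coverage model is fixed. The sketch below assumes a hypothetical sector model in which $\mathcal{C}_n$ is the angular sector of width $\theta_n$ centered on the direction from $S_n$ to user $n$ (no range limit), takes a caller-supplied gain matrix `gain[n][m]` as a stand-in for $g_{nm}$, and applies the threshold $g_0$ to both sets so that $m\in\mathcal{J}_n$ exactly when $n\in\mathcal{I}_m$, the reciprocity the potential-game proof needs. Neither the sector model nor this symmetric use of $g_0$ is taken from the invention.

```python
import math


def in_main_lobe(bs, target, probe, width):
    """True if `probe` lies in the main lobe of angular width `width` (radians)
    that base station `bs` steers toward `target` (hypothetical sector model)."""
    aim = math.atan2(target[1] - bs[1], target[0] - bs[0])
    ang = math.atan2(probe[1] - bs[1], probe[0] - bs[0])
    diff = (ang - aim + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(diff) <= width / 2


def neighbor_sets(users, serving_bs, widths, gain, g0=0.01):
    """Return (J, I): J[n] = users hit by the beam serving n, I[n] = users whose
    serving beams hit n. gain[n][m] is an assumed stand-in for g_nm."""
    num = len(users)
    J = {
        n: {m for m in range(num)
            if m != n
            and in_main_lobe(serving_bs[n], users[n], users[m], widths[n])
            and gain[n][m] >= g0}
        for n in range(num)
    }
    # Reciprocity: m is in I[n] exactly when n is in J[m].
    I = {n: {m for m in range(num) if n in J[m]} for n in range(num)}
    return J, I
```

Deriving `I` from `J` rather than recomputing it guarantees the two sets stay consistent even if the coverage test is later replaced by a more detailed antenna model.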

Thus, the total of the external malicious jamming and the inter-user interference suffered by user $n$ is

$$D_n=\sum_{m\in\mathcal{I}_n} p_m\,g_{mn}\,h\left(f_{a_m},d_{S_m n}\right)\delta\left(a_m,a_n\right)+p_j\,h\left(f_{c_j},d_{jn}\right)\delta\left(a_n,c_j\right),$$

where $\mathcal{I}_n$ is the set of users that cause interference to user $n$; $f_{c_j}$ is the jamming frequency; $f_{a_m}$ is the frequency of channel $a_m$; $a_m$, $a_n$ and $c_j$ are the channels selected by user $m$, user $n$ and the jammer, respectively; $g_{mn}$ is the directional gain in the direction of user $n$ when $S_m$ serves user $m$ using beamforming; $h(f_{a_m},d_{S_m n})$ is the channel gain of the channel on which user $m$ is located, over the distance from $S_m$ to user $n$; $h(f_{c_j},d_{jn})$ is the channel gain of the channel on which the jammer is located; $p_m$ is the transmit power of user $m$; $d_{jn}$ is the distance from the jammer to user $n$; $p_j$ is the jamming power; and $\delta(x,y)$ is an indicator function equal to 1 if $x=y$ and 0 otherwise.

Therefore, the communication rate of user $n$ is

$$R_n=B\log_2\left(1+\frac{p_n\,h\left(f_{a_n},d_{S_n n}\right)}{N_0 B+D_n}\right),$$

where $B$ is the channel bandwidth; $p_n$ is the transmit power of user $n$; $d_{S_n n}$ is the distance from base station $S_n$ to user $n$; $N_0$ is the noise power spectral density; $D_n$ is the sum of the external malicious jamming and the mutual interference suffered by user $n$; and $h(f_{a_n},d_{S_n n})$ is the channel gain of the channel on which user $n$ is located.
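The interference sum $D_n$ and the rate $R_n$ can be sketched as below. This assumes the noise power is $N_0 B$ with $N_0$ in linear units (W/Hz) and takes all channel gains as precomputed inputs: `h_user[m][n]` is a hypothetical stand-in for the gain of user $m$'s channel toward user $n$, `h_jam[n]` for $h(f_{c_j}, d_{jn})$, and `h_serve[n]` for $h(f_{a_n}, d_{S_n n})$. No propagation model is implied.

```python
import math


def total_interference(n, channels, c_j, interferers, p, g, h_user, p_j, h_jam):
    """D_n = sum_{m in I_n} p_m g_mn h_mn delta(a_m, a_n) + p_j h_jn delta(a_n, c_j)."""
    d = sum(p[m] * g[m][n] * h_user[m][n]
            for m in interferers[n] if channels[m] == channels[n])
    if channels[n] == c_j:  # user n sits on the jammed channel
        d += p_j * h_jam[n]
    return d


def rate(n, bandwidth, p, h_serve, n0, d_n):
    """R_n = B log2(1 + p_n h_n / (N0 B + D_n)); N0 in W/Hz (assumed linear)."""
    return bandwidth * math.log2(1.0 + p[n] * h_serve[n] / (n0 * bandwidth + d_n))


# Example parameters from the embodiment: B = 1 MHz, N0 = -130 dB/Hz in linear units.
B = 1e6
N0 = 10 ** (-130 / 10)
```

Converting $N_0 = -130$ dB/Hz to linear units before use matters here: plugging the dB value directly into the denominator would give a meaningless result.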

The MOS function is defined as

$$\mathrm{MOS}=\epsilon\log_{10}\left(R/\gamma\right),$$

where $R$ is the throughput of the user, and $\epsilon$ and $\gamma$ are constants set according to the user's maximum and minimum throughput requirements, so their values differ across users with different service requirements. The mapping between MOS values and the five QoE levels is given in Table 1.
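One way to size $\epsilon$ and $\gamma$ from a service's throughput requirements is to pin the MOS curve at its endpoints. The sketch below assumes, as a hypothetical calibration not stated in this document, that the minimum required throughput $R_{\min}$ maps to MOS 1 and the maximum $R_{\max}$ maps to MOS 5. Solving $\epsilon\log_{10}(R_{\min}/\gamma)=1$ and $\epsilon\log_{10}(R_{\max}/\gamma)=5$ gives $\epsilon = 4/\log_{10}(R_{\max}/R_{\min})$ and $\gamma = R_{\min}/10^{1/\epsilon}$.

```python
import math


def mos_constants(r_min, r_max, mos_min=1.0, mos_max=5.0):
    """Hypothetical calibration: MOS(r_min) = mos_min and MOS(r_max) = mos_max."""
    eps = (mos_max - mos_min) / math.log10(r_max / r_min)
    gamma = r_min / 10 ** (mos_min / eps)
    return eps, gamma


def mos(r, eps, gamma):
    """MOS = eps * log10(R / gamma)."""
    return eps * math.log10(r / gamma)


# Example: a service needing 0.5 Mbit/s at minimum and 5 Mbit/s for best quality.
eps, gamma = mos_constants(0.5e6, 5e6)
```

Two services with different requirement ranges then get different $(\epsilon, \gamma)$ pairs, so the same throughput can yield different MOS values, which is the service-dependence the QoE model relies on.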

Table 1: Mean Opinion Score (MOS)

Further, a satisfaction function is used to quantify the user's different experience levels, giving the satisfaction of user n at each QoE level.

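Based on the above analysis, the optimization objective is to maximize the network-wide QoE return, i.e., the sum of user satisfaction:

$$\max_{\{a_n\}_{n\in\mathcal{N}}}\ \sum_{n\in\mathcal{N}} q_n\left(a_n,a_{\mathcal{I}_n},c_j\right),$$

where $q_n$ is the satisfaction of user $n$ and $\mathcal{I}_n$ is the set of users that cause interference to user $n$.

Accordingly, the utility function of user $n$ is

$$u_n\left(a_n,a_{-n},c_j\right)=q_n\left(a_n,a_{\mathcal{I}_n},c_j\right)+\sum_{k\in\mathcal{J}_n} q_k\left(a_k,a_{\mathcal{I}_k},c_j\right),$$

where $\mathcal{J}_n$ is the set of users interfered by user $n$.

The optimization problem of user $n$ over its strategy set $\mathcal{A}_n$ can be expressed as

$$\max_{a_n\in\mathcal{A}_n}\ u_n\left(a_n,a_{-n},c_j\right).$$

All users together form the lower-level sub-game

$$\mathcal{G}_u=\left\{\mathcal{N},\ \{\mathcal{A}_n\}_{n\in\mathcal{N}},\ \{u_n\}_{n\in\mathcal{N}}\right\}.$$

For the jammer, the objective is to maximize the cumulative jamming on all users, and its utility function is

$$u_j=\sum_{n\in\mathcal{N}} p_j\,h\left(f_{c_j},d_{jn}\right)\delta\left(a_n,c_j\right).$$

The decision optimization problem of the jammer over its strategy set $\mathcal{A}_j$ is

$$\max_{c_j\in\mathcal{A}_j}\ u_j.$$

The upper-level sub-game is

$$\mathcal{G}_j=\left\{j,\ \mathcal{A}_j,\ u_j\right\}.$$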

3. Each user selects its channel as follows:

(1) Initialization: each user $n\in\mathcal{N}$ selects a channel $a_n(0)$ from its available channel set $\mathcal{A}_n$ uniformly at random and sets its flag bit $Y_n(0)=0$.

(2) Channel sounding: if $Y_n(t-1)=0$, user $n$ updates its channel according to an exploration rule, where $M$ is the number of channels available to the user and the user's learning parameter can be regarded as the learning rate of user $n$. If $a_n(t)=a_n(t-1)$, the flag bit $Y_n(t)$ is set to 0; otherwise it is set to 1.

(3) Channel updating: if $Y_n(t-1)=1$, user $n$ chooses between its two most recent channels with Boltzmann probabilities:

$$\Pr\left[a_n(t)=a_n(t-1)\right]=\frac{\exp\left(\beta u_n(t-1)\right)}{\exp\left(\beta u_n(t-1)\right)+\exp\left(\beta u_n(t-2)\right)},\qquad \Pr\left[a_n(t)=a_n(t-2)\right]=1-\Pr\left[a_n(t)=a_n(t-1)\right],$$

where $\beta$ is the learning parameter, and $u_n(t-1)$ and $u_n(t-2)$ are the utilities of user $n$ in time slots $t-1$ and $t-2$, respectively. After the update, the flag bit is set to $Y_n(t)=0$.

4. Repeat steps 1 to 3, with all users simultaneously performing exploration, learning, and channel access, until the channel selections of all users converge or the preset number of iterations is reached.

The local cooperation model can be proved to be a potential game, so it has at least one Nash equilibrium, and the finite improvement property of potential games can be used to design the corresponding anti-jamming algorithm.

5. The jammer evaluates its utility $u_j(k)$ and updates its Q value as follows:

$$Q^{k+1}\left(c_j(k)\right)=(1-\lambda)\,Q^{k}\left(c_j(k)\right)+\lambda\,u_j(k),\tag{6-25}$$

where $\lambda\in(0,1)$ is the learning rate that controls the convergence speed of Q-learning.

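Similar to the users, the jammer updates its own channel selection strategy using the Boltzmann function:

$$\pi_j^{k}\left(c_j(k)\right)=\frac{\exp\left(Q^{k}\left(c_j(k)\right)/\tau\right)}{\sum_{c\in\mathcal{A}_j}\exp\left(Q^{k}(c)/\tau\right)},$$

where $\tau$ is the temperature coefficient and the sum runs over the jammer's channel set $\mathcal{A}_j$.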

6. Return to step 3 until the maximum number of cycles is reached.

Example 1

One embodiment of the invention is described below. Matlab is used for system simulation, and the parameter settings do not affect generality. The system has two millimeter-wave picocell base stations 50 m apart, and the users are randomly distributed within a circle of radius 100 m centered on a base station. The jammer is located roughly 100-200 m from the two base stations. The number of available channels is M = 4, the channel bandwidth is B = 1 MHz, the noise power spectral density is N₀ = -130 dB/Hz, and the learning parameter is β = t/2500. The jammer's learning rate is λ = 0.1, and its temperature coefficient τ is set as a function of K and k, where K is the total number of simulation periods and k is the current simulation period.

The specific procedure of the user cooperation anti-jamming algorithm disclosed by the invention is as follows:

Step 1: set t = 0 and k = 0, and initialize the mixed strategy of the jammer.

Step 2: in the kth period, the interference depends on probability

Selecting a channel c _j (k) The method comprises the steps of carrying out a first treatment on the surface of the Every user +.>

From its set of available channels->

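During this period, all users simultaneously perform the following process for t = 1, 2, …:

Channel sounding: if $Y_n(t-1)=0$, user $n$ updates its channel according to the exploration rule, where $M$ is the number of channels available to the user; $Y_n(t)$ is set to 0 if $a_n(t)=a_n(t-1)$ and to 1 otherwise.

Channel updating: if $Y_n(t-1)=1$, user $n$ chooses between its two most recent channels with Boltzmann probabilities:

$$\Pr\left[a_n(t)=a_n(t-1)\right]=\frac{\exp\left(\beta u_n(t-1)\right)}{\exp\left(\beta u_n(t-1)\right)+\exp\left(\beta u_n(t-2)\right)},\qquad \Pr\left[a_n(t)=a_n(t-2)\right]=1-\Pr\left[a_n(t)=a_n(t-1)\right],$$

where $\beta$ is the learning parameter, and $u_n(t-1)$ and $u_n(t-2)$ are the utilities of user $n$ in slots $t-1$ and $t-2$, respectively. After the update, the flag bit is set to $Y_n(t)=0$.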

Step 3: the jammer obtains its utility $u_j(k)$.

Step 4: the jammer updates its Q value according to

$$Q^{k+1}\left(c_j(k)\right)=(1-\lambda)\,Q^{k}\left(c_j(k)\right)+\lambda\,u_j(k).$$

Step 5: set k = k + 1 and go to Step 2, until the maximum number of cycles is reached.
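The two-timescale structure of Steps 1 to 5 (jammer Q-learning across periods, synchronous flag-based user learning within each period) can be sketched end to end on a toy network. Everything model-specific here is a hypothetical stand-in chosen only to make the loop runnable: a random interference graph instead of beam geometry, binary satisfaction instead of the MOS-based levels, unit jamming power and gain, a constant `explore_prob` in place of the learning-parameter-driven exploration rule, and a constant `tau` in place of the $K$- and $k$-dependent temperature. Only $\beta = t/2500$ and $\lambda = 0.1$ follow Example 1.

```python
import math
import random


def simulate(num_users=6, num_channels=4, periods=50, slots=300,
             lam=0.1, tau=0.5, explore_prob=0.1, seed=1):
    """Toy two-timescale loop: jammer Q-learning outside, users' synchronous
    flag-based learning inside. All models are hypothetical stand-ins."""
    rng = random.Random(seed)
    # Hypothetical interference graph with m in J[n] <=> n in I[m].
    I = {n: {m for m in range(num_users) if m != n and rng.random() < 0.3}
         for n in range(num_users)}
    J = {n: {m for m in range(num_users) if n in I[m]} for n in range(num_users)}

    def q_sat(n, a, c_j):  # hypothetical binary QoE satisfaction
        return 0.0 if a[n] == c_j or any(a[m] == a[n] for m in I[n]) else 1.0

    def u_user(n, a, c_j):  # u_n = q_n + sum over k in J_n of q_k
        return q_sat(n, a, c_j) + sum(q_sat(k, a, c_j) for k in J[n])

    q_table = [0.0] * num_channels
    for _period in range(periods):
        # Jammer draws c_j(k) from its Boltzmann policy over the Q table.
        top = max(q_table)
        weights = [math.exp((v - top) / tau) for v in q_table]
        c_j = rng.choices(range(num_channels), weights=weights, k=1)[0]
        # Users: uniformly random initial channels, all flags Y_n(0) = 0.
        a1 = [rng.randrange(num_channels) for _ in range(num_users)]
        a2 = list(a1)
        u1 = [u_user(n, a1, c_j) for n in range(num_users)]
        u2 = list(u1)
        flags = [0] * num_users
        for t in range(1, slots + 1):
            beta = t / 2500.0  # schedule from Example 1
            a_new, new_flags = [], []
            for n in range(num_users):  # every user decides in the same slot
                if flags[n] == 0:  # channel sounding
                    ch = a1[n]
                    if num_channels > 1 and rng.random() < explore_prob:
                        ch = rng.choice([c for c in range(num_channels) if c != a1[n]])
                    new_flags.append(0 if ch == a1[n] else 1)
                else:  # channel updating: logit choice between a_n(t-1), a_n(t-2)
                    x = min(700.0, beta * (u2[n] - u1[n]))
                    ch = a1[n] if rng.random() < 1.0 / (1.0 + math.exp(x)) else a2[n]
                    new_flags.append(0)
                a_new.append(ch)
            u_new = [u_user(n, a_new, c_j) for n in range(num_users)]
            a2, a1, u2, u1, flags = a1, a_new, u1, u_new, new_flags
        # Jammer utility with unit power and gain (hypothetical), then Q update.
        u_j = float(sum(1 for ch in a1 if ch == c_j))
        q_table[c_j] = (1.0 - lam) * q_table[c_j] + lam * u_j
    final_satisfaction = sum(q_sat(n, a1, c_j) for n in range(num_users))
    return q_table, final_satisfaction


q_table, satisfied = simulate()
```

With only a few hundred slots per period, $\beta = t/2500$ stays small, so the users' choices remain fairly random; the sketch is meant to show the control flow and bookkeeping (flags, two-slot history, simultaneous updates), not to reproduce the convergence behavior reported in Fig. 2.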

Referring to Fig. 2, which shows the convergence of the cooperative anti-jamming algorithm, the comparison algorithm is an asynchronous learning algorithm in which only one user updates its strategy per iteration. The figure shows that the synchronous learning algorithm proposed by the invention learns markedly faster.

Referring to Fig. 3, which shows the impact of jamming power on the network satisfaction rate for different numbers of users: the network satisfaction rate remains essentially unchanged as the jamming power increases, which indicates that the proposed method helps users avoid the jammed channel and achieves good anti-jamming performance.

In summary, the hierarchical anti-jamming model and the user cooperation anti-jamming algorithm for multi-user service requirements provided by the invention take into account that a malicious jammer can adapt its jamming strategy to the communication users' frequency usage so as to maximize its jamming utility, and model the adversarial relationship between the users and the jammer as a Stackelberg game. In addition, by accounting for the asymmetric mutual interference among users under space-division multiple access, the invention provides a user cooperation anti-jamming algorithm that effectively improves the network satisfaction rate. The comparison with an asynchronous learning algorithm demonstrates a marked improvement in convergence speed, and the performance comparison under different jamming powers demonstrates the effectiveness of the proposed anti-jamming algorithm.

Claims

1. A user cooperation anti-interference method for beamforming communication, characterized in that the jammer is modeled as the leader and the users as followers; the jammer always aims to cause maximum interference to the users, while the users, taking their own service requirements into account, use the anti-jamming algorithm to maximize the network-wide user satisfaction, i.e., the network satisfaction rate; the method comprises the following steps:

Step 1: the cooperative anti-jamming problem in the multi-user, single-jammer scenario is modeled as a single-leader, multi-follower Stackelberg game, expressed as:

$$\mathcal{G}=\left\{\mathcal{N},\ j,\ \{\mathcal{A}_n\}_{n\in\mathcal{N}},\ \mathcal{A}_j,\ \{u_n\}_{n\in\mathcal{N}},\ u_j\right\},$$

where $\mathcal{N}$ is the user set, $j$ is the malicious jammer, $\mathcal{A}_n$ and $\mathcal{A}_j$ are the strategy sets of user $n$ and the jammer, respectively, and $u_n$ and $u_j$ are the utility functions of user $n$ and the jammer, respectively;

Step 2: the jammer randomly selects one channel to jam, and its utility function is defined as the sum of the jamming power it applies to all users on that channel; the users select anti-jamming channels in response to the jamming strategy, cooperation among users is analyzed within the potential game framework, and each user takes the benefit of its neighbor users into account; a user's utility function is defined as the sum of its own QoE satisfaction and that of its neighbor users;

in the local cooperation model, the utility function $u_n$ of a user is defined as the sum of its own QoE satisfaction and that of its neighbor users, expressed as:

$$u_n\left(a_n,a_{-n},c_j\right)=q_n\left(a_n,a_{\mathcal{I}_n},c_j\right)+\sum_{k\in\mathcal{J}_n} q_k\left(a_k,a_{\mathcal{I}_k},c_j\right),$$

where $\mathcal{J}_n$ is the set of users interfered by user $n$; $\mathcal{I}_n$ is the set of users that cause interference to user $n$; $\mathcal{I}_k$ is the set of users that cause interference to user $k$; $a_{\mathcal{I}_n}$ denotes the channel selections of all users in $\mathcal{I}_n$; $a_{\mathcal{I}_k}$ denotes the channel selections of all users in $\mathcal{I}_k$; $q_n$ and $q_k$ are the QoE satisfaction of user $n$ and user $k$, respectively;

the QoE level underlying $q_n$ is a function of the user's throughput and specific service requirements, and this mapping can be represented by the MOS function;

the MOS function is defined as:

$$\mathrm{MOS}=\epsilon\log_{10}\left(R/\gamma\right),\tag{6-3}$$

where $R$ is the throughput of the user; $\epsilon$ and $\gamma$ are constants whose values are determined by the user's maximum and minimum throughput requirements, so they differ across users with different service requirements;

satisfaction of user n at different QoE levels is expressed as:

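the local cooperation model has been proved to be an exact potential game, as follows:

the potential function is expressed as:

$$\Phi\left(a_n,a_{-n},c_j\right)=\sum_{k\in\mathcal{N}} q_k\left(a_k,a_{\mathcal{I}_k},c_j\right);$$

when any user $n$ unilaterally changes its strategy from $a_n$ to $\tilde{a}_n$, the resulting change in satisfaction equals the change in the potential function, namely:

$$\Phi(\tilde{a}_n,a_{-n},c_j)-\Phi(a_n,a_{-n},c_j)=\left[q_n(\tilde{a}_n,a_{\mathcal{I}_n},c_j)-q_n(a_n,a_{\mathcal{I}_n},c_j)\right]+\sum_{k\in\mathcal{J}_n}\left[q_k(a_k,\tilde{a}_{\mathcal{I}_k},c_j)-q_k(a_k,a_{\mathcal{I}_k},c_j)\right]+\sum_{k\in\mathcal{N}\setminus(\{n\}\cup\mathcal{J}_n)}\left[q_k(a_k,\tilde{a}_{\mathcal{I}_k},c_j)-q_k(a_k,a_{\mathcal{I}_k},c_j)\right]=u_n(\tilde{a}_n,a_{-n},c_j)-u_n(a_n,a_{-n},c_j),$$

where the last sum is zero because users outside $\{n\}\cup\mathcal{J}_n$ are not interfered by user $n$; $a_n$ is the original channel access strategy of user $n$; $\tilde{a}_n$ is the changed channel access strategy of user $n$; $\tilde{a}_{\mathcal{I}_k}$ denotes the channel selections of all users in $\mathcal{I}_k$ after user $n$ changes its strategy to $\tilde{a}_n$; $a_{-n}$ denotes the channel access of the remaining users; $c_j$ is the channel selected by the jammer; $\mathcal{J}_n$ is the set of users interfered by user $n$; $\mathcal{I}_n$ is the set of users that cause interference to user $n$; and $\mathcal{N}\setminus(\{n\}\cup\mathcal{J}_n)$ denotes the set $\mathcal{N}$ with $\{n\}\cup\mathcal{J}_n$ deleted;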

Step 3: all users adjust their anti-jamming strategies simultaneously, and each user selects a channel according to its current flag bit and its strategies and payoffs in the previous two time slots; each user is assigned its own learning parameter according to its influence on the whole network, which speeds up convergence of the algorithm;

the learning parameter of each user is assigned according to its influence on the network, specifically:

the learning parameter of user $n$ is set as a function of $x_n(t)$, where

$$x_n(t)=\Gamma_n\times\varepsilon(t),\qquad \varepsilon(t)=\varepsilon(0)+t\,\Delta\varepsilon;$$

$\Gamma_n$ indicates the degree of influence of user $n$ on the network, $\varepsilon(0)$ is the initial value, $\Delta\varepsilon$ is the step size, and $t$ is the iteration number;

Step 4: repeat Steps 1 to 3; the users select strategies through exploration and learning until the jamming strategy and the anti-jamming strategies of all users converge or the preset number of iterations is reached;

Step 5: the jammer evaluates its utility $u_j(k)$ and updates its Q table, specifically:

the jammer evaluates its current utility

$$u_j=\sum_{n\in\mathcal{N}} p_j\,h\left(f_{c_j},d_{jn}\right)\delta\left(a_n,c_j\right),$$

where $p_j$ is the jamming power; $f_{c_j}$ is the jamming frequency; $d_{jn}$ is the distance between the jammer and user $n$; $h(f_{c_j},d_{jn})$ is the channel gain, which depends on the jamming frequency and distance; and $\delta(x,y)$ equals 1 if $x=y$ and 0 otherwise;

the Q table is updated as:

$$Q^{k+1}\left(c_j(k)\right)=(1-\lambda)\,Q^{k}\left(c_j(k)\right)+\lambda\,u_j(k),$$

where $Q^{k+1}$ is the jammer's Q value in period $k+1$; $c_j(k)$ is the channel jammed in period $k$; $Q^{k}$ is the jammer's Q value in period $k$; $u_j(k)$ is the jammer's utility in period $k$; and $\lambda\in(0,1)$ is the learning rate that controls the convergence speed of Q-learning;

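Step 6: the jammer updates its strategy, returning to Step 3 until the maximum number of cycles is reached;

the jamming strategy is updated as follows:

$$\pi_j^{k}\left(c_j(k)\right)=\frac{\exp\left(Q^{k}\left(c_j(k)\right)/\tau\right)}{\sum_{c\in\mathcal{A}_j}\exp\left(Q^{k}(c)/\tau\right)},$$

where $\tau$ is the temperature coefficient, representing the trade-off between exploration and exploitation, the sum runs over the jammer's channel set $\mathcal{A}_j$, and $\pi_j^{k}(c_j(k))$ is the probability that the jammer selects channel $c_j(k)$ in period $k$.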