CN115296705A - Active monitoring method in MIMO communication system - Google Patents

Active monitoring method in MIMO communication system

Info

Publication number
CN115296705A
Authority
CN
China
Prior art keywords
antenna
transmitter
listener
parameter
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210470392.2A
Other languages
Chinese (zh)
Other versions
CN115296705B (en)
Inventor
Lan Tang (唐岚)
Delin Guo (郭德邻)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202210470392.2A priority Critical patent/CN115296705B/en
Publication of CN115296705A publication Critical patent/CN115296705A/en
Application granted granted Critical
Publication of CN115296705B publication Critical patent/CN115296705B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B 7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B 7/0413 MIMO systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/06 Testing, supervising or monitoring using simulated traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 72/00 Local resource management
    • H04W 72/04 Wireless resource allocation
    • H04W 72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W 72/0473 Wireless resource allocation based on the type of the allocated resource the resource being transmission power
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an active monitoring method in a MIMO communication system comprising a suspicious transmitter A, a suspicious receiver B and a legitimate monitor E. The transmitter A sends information to the receiver B; the monitor E makes decisions based on partially known channels to improve its monitoring performance, while A makes corresponding decisions to thwart the monitoring, which gives rise to a monitoring and anti-monitoring game between the source node A and the monitor E. The invention designs a reinforcement learning algorithm to optimize the transmit-power strategies of the monitor E and the transmitter A and to obtain the Nash equilibrium of the monitoring and anti-monitoring game between them.

Description

Active monitoring method in MIMO communication system
Technical Field
The invention belongs to the field of wireless communication, and particularly relates to an active monitoring method in a Multiple Input Multiple Output (MIMO) system, and more particularly relates to an optimization method of a power distribution strategy based on a multi-agent reinforcement learning algorithm (FSP-SAC).
Background
In recent decades, wireless communication has played a very important role in people's daily life by providing an efficient and convenient way to connect people.
At present, physical-layer monitoring can be divided into two categories: passive monitoring and active monitoring. Passive monitoring simply receives the leaked radio signal with a silent receiver. However, with the deployment of high-band channels and MIMO in fifth-generation (5G) networks, the beams carrying the signals become increasingly narrow and directional, so passive monitoring can hardly capture useful information. According to information theory, the information can be decoded in the physical-layer sense as long as the capacity of the communication channel is smaller than the capacity of the monitoring channel. Therefore, to improve monitoring efficiency, active monitoring methods that use interference signals to reduce the capacity of the communication channel have begun to be widely used.
Existing active monitoring methods usually consider only a static monitored target. With the development of anti-monitoring measures, more and more illegal information transmitters intelligently adjust their transmission power and use noise signals to reduce the capacity of the monitoring channel. This creates a game between monitoring and anti-monitoring and greatly complicates legitimate active monitoring. It is therefore important to construct a method that finds the Nash equilibrium of this monitoring and anti-monitoring game.
Disclosure of Invention
The purpose of the invention is as follows: in view of the problems and deficiencies of the prior art, an object of the present invention is to provide an active monitoring method in a MIMO communication system that optimizes the transmit-power strategy of the monitor and the transmit-power strategy of the suspicious source node so that the two strategies reach a Nash equilibrium.
The technical scheme is as follows: to achieve the above object, the present invention adopts an active monitoring method in a MIMO communication system, comprising the following steps:
(1) In each time slot t, a multi-antenna transmitter (transmitter for short) A transmits an information signal x_AB(t) to a multi-antenna receiver (receiver for short) B and transmits an interference signal x_AE(t) to a multi-antenna listener (listener for short) E, in order to reduce the channel capacity from A to E and thus prevent listening by E. The action of transmitter A is a_A(t) = {Q_AB(t), Q_AE(t)}, where Q_AB(t) and Q_AE(t) are the covariance matrices of x_AB(t) and x_AE(t), respectively. Based on its local channel information o_A(t), transmitter A selects an action using a strategy π_A: A samples from the conditional probability distribution π_A(·|o_A(t)), and the sampled value is the selected action a_A(t).
(2) In each time slot t, the listener E transmits an interference signal x_E(t) to the receiver B, in order to reduce the channel capacity between transmitter A and receiver B and thereby increase the listening success rate. The action of listener E is a_E(t) = Q_E(t), where Q_E(t) is the covariance matrix of x_E(t). Based on its local channel information o_E(t), listener E selects an action using a strategy π_E: E samples from the conditional probability distribution π_E(·|o_E(t)), and the sampled value is the selected action a_E(t).
(3) In each time slot t, after both the transmitter A and the listener E have performed their actions, they receive rewards r_A(t) and r_E(t), respectively. Let π = {π_A, π_E}. The average reward function of transmitter A is defined as J_A(π) = E_π[r_A(t)], where E_π[·] denotes the mathematical expectation over the time slots t under the joint strategy π; the average reward function of listener E is J_E(π) = E_π[r_E(t)]. The strategy π_A is optimized to maximize J_A(π) and the strategy π_E is optimized to maximize J_E(π), so as to reach the Nash equilibrium of the monitoring and anti-monitoring game.
Further, the step (3) further comprises the following steps:
1) For any device n, where n ∈ {A, E}, initialize a strategy β_n parameterized by θ_n, a strategy π̄_n parameterized by ψ_n, a value function V_n parameterized by ω_n and a value function Q_n parameterized by φ_n, and assign the parameter φ_n to the parameter φ̄_n of the target value function Q̄_n.
2) In each time slot t, device n selects an action using the strategy β_n with probability 0.1 and using the strategy π̄_n with probability 0.9, and stores the collected data d_n(t) in a first storage area M_n^RL, where, when n = A, the data are d_A(t) = (o_A(t), a_A(t), r_A(t), o_A(t+1)), and when n = E, the data are d_E(t) = (o_E(t), a_E(t), r_E(t), o_E(t+1)). If the action was selected by the strategy β_n, the data (o_n(t), a_n(t)) are also stored in a second storage area M_n^SL.
3) Randomly select a mini-batch of length L from M_n^RL and compute the gradient Δω_n of the loss of the value function V_n, where ∇_x denotes the gradient with respect to the variable x, the actions used in this gradient are sampled from the strategy β_n, and the temperature parameter α ∈ [0,1]; compute the gradient Δθ_n of the loss of the strategy β_n; and compute the gradient Δφ_n of the loss of the value function Q_n, where the discount factor γ ∈ (0,1). Then randomly select a mini-batch of length L from M_n^SL and compute the gradient Δψ_n of the loss of the strategy π̄_n. Update the parameters θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, ψ_n ← ψ_n + ηΔψ_n, where η is the learning rate with value range (0,1), ν is the moving-average parameter with value range (0,1), and the symbol ← means that the value on the right of the arrow is assigned to the variable on the left. Then return to step 2) until the strategy parameters θ_n no longer change.
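The parameter updates in this step are plain gradient ascent on the respective losses plus a moving-average update of the target parameters. A minimal Python sketch of this update rule, assuming the gradients Δθ_n, Δω_n, Δφ_n, Δψ_n have already been computed as arrays (the dictionary keys and the Polyak form of the target update are illustrative assumptions, not code from the patent), is:

```python
import numpy as np

def apply_updates(params, grads, eta=3e-4, nu=0.005):
    """Gradient step x <- x + eta * dx for each learned parameter vector,
    followed by moving-average tracking of the target-value parameters."""
    for name in ("theta", "omega", "phi", "psi"):
        params[name] = params[name] + eta * grads["d" + name]
    # target parameters phi_bar track phi with moving-average coefficient nu
    params["phi_bar"] = nu * params["phi"] + (1.0 - nu) * params["phi_bar"]
    return params
```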
Beneficial effects: by designing the FSP-SAC algorithm and introducing deep reinforcement learning, the invention overcomes the curse of dimensionality faced in high-dimensional games, and by combining fictitious play with deep reinforcement learning it solves the problem that common single-agent reinforcement learning algorithms have difficulty converging in game problems, so that the algorithm gradually converges to the Nash equilibrium.
Drawings
FIG. 1 is a diagram of a system model of the present invention;
FIG. 2 is a graph comparing the performance of the method used in the present invention with other methods;
FIG. 3 is a graph comparing the performance of the method used in the present invention with other methods.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which should be understood as merely illustrative and not limiting the scope of the invention; after reading the present specification, equivalent modifications made by those skilled in the art fall within the scope defined by the appended claims.
As shown in fig. 1, the communication system considered consists of a multi-antenna transmitter (transmitter) A, a multi-antenna receiver (receiver) B, and a multi-antenna listener (listener) E. Let the number of transmit antennas of transmitter A be N_A and the number of receive antennas of receiver B be N_B. The listener E has two groups of antennas: one group of N_E^J antennas used to transmit interference signals, and another group of N_E^L antennas used to listen to the signal from A. In time slot t, the channel matrices between transmitter A and receiver B, between transmitter A and listener E, and between listener E and receiver B are denoted H_AB(t) ∈ C^{N_B×N_A}, H_AE(t) ∈ C^{N_E^L×N_A} and H_EB(t) ∈ C^{N_B×N_E^J}, respectively, where C^{i×j} denotes the space of complex matrices of size i×j.
In each time slot t, the signal transmitted by transmitter A is composed of an information signal x_AB(t) and an artificial noise signal x_AE(t), where x_AB(t) is expressed as x_AB(t) = W_AB(t)s_AB(t) with precoding matrix W_AB(t), and x_AE(t) is expressed as x_AE(t) = W_AE(t)s_AE(t) with precoding matrix W_AE(t). So that the artificial noise does not interfere with the information signal, W_AB(t) is formed from the N_B right singular vectors of H_AB(t) corresponding to its non-zero singular values, and W_AE(t) is formed from the remaining N_A − N_B right singular vectors corresponding to zero singular values. The total signal transmitted by transmitter A in time slot t is denoted x_A(t) = x_AB(t) + x_AE(t).
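As a concrete illustration of this null-space precoding step, the following Python sketch builds candidate precoders from the SVD of H_AB(t). The variable names (H_AB, W_AB, W_AE) mirror the reconstructed notation above and are assumptions; the patent does not publish implementation code.

```python
import numpy as np

def build_precoders(H_AB: np.ndarray, N_B: int):
    """Split the right singular vectors of H_AB into a signal precoder
    (non-zero singular values) and an artificial-noise precoder
    (null-space directions), so the noise does not disturb receiver B."""
    # H_AB has shape (N_B, N_A); full_matrices=True keeps all N_A right
    # singular vectors, including those spanning the null space of H_AB.
    _, _, Vh = np.linalg.svd(H_AB, full_matrices=True)
    V = Vh.conj().T                  # columns are right singular vectors
    W_AB = V[:, :N_B]                # directions seen by receiver B
    W_AE = V[:, N_B:]                # null-space directions for artificial noise
    return W_AB, W_AE

# Minimal usage example with a random channel (illustrative dimensions only)
N_A, N_B = 4, 2
H_AB = (np.random.randn(N_B, N_A) + 1j * np.random.randn(N_B, N_A)) / np.sqrt(2)
W_AB, W_AE = build_precoders(H_AB, N_B)
print(np.linalg.norm(H_AB @ W_AE))   # ≈ 0: the artificial noise vanishes at B
```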
The signal received by receiver B is
y_B(t) = H_AB(t)x_A(t) + H_EB(t)x_E(t) + n_B,   (1)
where x_E(t) is the interference signal transmitted by listener E to B and n_B is Gaussian white noise. The signal received by listener E is
y_E(t) = H_AE(t)x_A(t) + n_E,   (2)
where n_E is the Gaussian white noise at the listener.
From equation (1), the covariance matrix of the information signal received at B is
Σ_B(t) = H_AB(t)Q_AB(t)H_AB(t)^H,   (3)
where Q_AB(t) is the covariance matrix of x_AB(t) and the superscript (·)^H denotes the conjugate transpose of a matrix or vector. The covariance matrix of the received interference is
W_B(t) = H_AB(t)Q_AE(t)H_AB(t)^H + H_EB(t)Q_E(t)H_EB(t)^H + σ²I_{N_B},   (4)
where σ²I_{N_B} is the covariance matrix of n_B, σ² is the noise power and I_x denotes an identity matrix of size x. According to equations (3) and (4), the channel capacity between transmitter A and receiver B is
C_B(t) = log det(I_{N_B} + W_B(t)^{-1}Σ_B(t)),   (5)
where the function det denotes the determinant of a matrix and the superscript (·)^{-1} denotes the matrix inverse.
According to equation (2), the covariance matrix of the information signal at the listener E is
Σ_E(t) = H_AE(t)Q_AB(t)H_AE(t)^H,   (6)
and the covariance matrix of the interference at E is
W_E(t) = H_AE(t)Q_AE(t)H_AE(t)^H + σ²I,   (7)
where I is an identity matrix matching the size of E's listening antenna array. From equations (6) and (7), the channel capacity between transmitter A and listener E is
C_E(t) = log det(I + W_E(t)^{-1}Σ_E(t)).   (8)
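The capacity expressions (5) and (8) can be evaluated with a few lines of linear algebra. The sketch below is an illustrative implementation under the reconstructed notation above (the function name and the use of base-2 logarithms are assumptions, not details taken from the patent).

```python
import numpy as np

def mimo_capacity(H_sig, Q_sig, interference_terms, sigma2):
    """log-det capacity log2 det(I + W^{-1} Sigma), with Sigma the covariance
    of the useful signal and W the covariance of interference plus noise."""
    n_rx = H_sig.shape[0]
    Sigma = H_sig @ Q_sig @ H_sig.conj().T                # received signal covariance
    W = sigma2 * np.eye(n_rx, dtype=complex)              # white-noise part
    for H, Q in interference_terms:                       # colored interference terms
        W = W + H @ Q @ H.conj().T
    M = np.eye(n_rx) + np.linalg.solve(W, Sigma)          # I + W^{-1} Sigma
    return float(np.log2(np.linalg.det(M).real))

# C_B as in (5): useful signal through H_AB; interference = artificial noise
# through H_AB plus E's jamming through H_EB.
# C_E as in (8): useful signal through H_AE; interference = artificial noise
# through H_AE plus thermal noise.
```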
In time slot t, transmitter A and listener E can each obtain only partial channel information, which we call local observation information (or local channel information). In time slot t, the local observation information of transmitter A is defined as o_A(t) ∈ O_A, where O_A is the local observation space of A; the local observation information of listener E is defined as o_E(t) ∈ O_E, where O_E is the local observation space of E; and the global state is defined as s_t ∈ S, where S is the global state space.
In each time slot t, transmitter A decides the power allocation of the transmitted signals x_AB(t) and x_AE(t), i.e. the covariance matrices Q_AB(t) and Q_AE(t). The power of the information signal is p_AB(t) = tr(Q_AB(t)), where tr denotes the trace of a matrix, and the power of the artificial noise is p_AE(t) = tr(Q_AE(t)). If transmitter A does not know the channel H_AE(t), the noise power p_AE(t) can be assumed to be equally distributed over each artificial noise stream, i.e. Q_AE(t) = (p_AE(t)/(N_A − N_B)) I_{N_A−N_B}. If each stream of the signal is assumed to be uncorrelated with the others, then Q_AB(t) and Q_AE(t) are both positive semidefinite symmetric matrices. The signal power and the artificial noise power must satisfy the total power constraint
tr(Q_AB(t)) + tr(Q_AE(t)) ≤ P_A^max,
where P_A^max is the maximum power of transmitter A. The action of transmitter A in time slot t is defined as a_A(t) = {Q_AB(t), Q_AE(t)}.
The covariance matrix of the interference signal x_E(t) of listener E is Q_E(t). Assuming that each stream of the signal is uncorrelated with the others, Q_E(t) is a positive semidefinite symmetric matrix. The power of the interference signal must satisfy the total power constraint
tr(Q_E(t)) ≤ P_E^max,
where P_E^max is the maximum transmit power of listener E. The action of E in time slot t is defined as a_E(t) = Q_E(t), and the joint action is defined as a_t = (a_A(t), a_E(t)).
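To make the action space concrete, the following sketch constructs positive-semidefinite covariance matrices and scales them so the trace constraints above hold. Parameter names such as P_A_max, P_E_max and the power-split fraction rho mirror the constraints in the text and are otherwise assumptions.

```python
import numpy as np

def random_psd(n: int) -> np.ndarray:
    """Random positive-semidefinite matrix, used as an unnormalized covariance."""
    X = np.random.randn(n, n) + 1j * np.random.randn(n, n)
    return X @ X.conj().T

def transmitter_action(N_A, N_B, P_A_max, rho=0.5):
    """Split A's power budget: a fraction rho for the information signal,
    the rest spread evenly over the N_A - N_B artificial-noise streams."""
    Q_AB = random_psd(N_B)
    Q_AB *= rho * P_A_max / np.trace(Q_AB).real          # tr(Q_AB) = rho * P_A_max
    p_AE = (1.0 - rho) * P_A_max
    Q_AE = (p_AE / (N_A - N_B)) * np.eye(N_A - N_B)      # equal power per noise stream
    return Q_AB, Q_AE

def listener_action(N_EJ, P_E_max):
    """Jamming covariance of listener E scaled to its power budget."""
    Q_E = random_psd(N_EJ)
    return Q_E * (P_E_max / np.trace(Q_E).real)

Q_AB, Q_AE = transmitter_action(N_A=4, N_B=2, P_A_max=1.0)
assert np.trace(Q_AB).real + np.trace(Q_AE).real <= 1.0 + 1e-9
```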
According to information theory, if C_E(t) is greater than C_B(t), the listener can decode, with arbitrarily small error in the physical-layer sense, the information that transmitter A transmits to receiver B. The reward function of listener E is therefore defined as in equation (9), where 1{·} denotes the Boolean indicator function, which outputs 1 when the input condition is true and 0 otherwise; the reward is scaled so that its variation with the action is amplified exponentially. Whenever information transmitted by A is eavesdropped, a penalty is imposed for each portion of the eavesdropped data. Transmitter A aims to reduce the amount of eavesdropped information while maximizing the transmission rate, so the reward function of transmitter A is defined as in equation (10), where ζ > 0 is a coefficient that balances the transmission rate against the information-leakage penalty. We abbreviate r_E(s_t, a_t) and r_A(s_t, a_t) as r_E(t) and r_A(t), respectively.
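One plausible instantiation of these rewards, written purely as an assumption consistent with the description (an indicator-gated eavesdropping reward for E and a rate-minus-penalty reward for A, without the exponential scaling mentioned above), is:

```python
def listener_reward(C_B: float, C_E: float) -> float:
    # Assumed form: E is rewarded with the eavesdropped rate only when it can
    # decode the transmission, i.e. when C_E >= C_B (physical-layer condition).
    return C_B if C_E >= C_B else 0.0

def transmitter_reward(C_B: float, C_E: float, zeta: float = 2.0) -> float:
    # Assumed form: transmission rate minus zeta times the leakage penalty.
    return C_B - zeta * listener_reward(C_B, C_E)
```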
the strategy for defining the emitter A is pi A The selection of the action is carried out,
Figure BDA00036222001500000717
is based on
Figure BDA00036222001500000718
The observed information is
Figure BDA00036222001500000719
When A chooses an action using the probability distribution
Figure BDA00036222001500000720
The policy for definition of listener E is π E The selection of the action is carried out,
Figure BDA00036222001500000721
is based on
Figure BDA00036222001500000722
The observed information is
Figure BDA00036222001500000723
The transmitter A uses the probability distribution to select an action
Figure BDA00036222001500000724
The joint strategy of the two is expressed as pi = (pi) AE ). The objective function of the transmitter A is
Figure BDA0003622200150000081
Meaning that under the conditions of the strategy pi,
Figure BDA0003622200150000082
mathematical expectation in the time dimension, i.e. the average prize value. Likewise, the listener E has an objective function of
Figure BDA0003622200150000083
The optimization objectives for transmitter a are:
Figure BDA0003622200150000084
the optimization objectives for listener E are:
Figure BDA0003622200150000085
to solve the problems (11) and (12), a common reinforcement learning algorithm can be applied to the transmitter a and the listener E to solve the problems, but the learning results are difficult to converge due to the fact that the strategies of the two parties are changed. Therefore, a multi-agent reinforcement learning algorithm FSP-SAC is designed to learn respectively optimal strategies pi for the transmitter A and the monitor E A And pi E . The method stabilizes the learning process by training an average strategy and an optimal response strategy.
We present the process of solving the problems (11) and (12) using FSP-SAC, since both problems (11) and (12) are solved using FSP-SAC, for simplicity of description, one of the transmitter a or the listener E is represented by n, i.e. n ∈ { a, E }, and the algorithm process is as follows:
1) For any n ∈ {A, E}, initialize a best-response strategy β_n parameterized by θ_n, an average strategy π̄_n parameterized by ψ_n, a value function V_n parameterized by ω_n, and a value function Q_n parameterized by φ_n. To stabilize the learning process, also initialize a target value function Q̄_n and assign the parameter φ_n to its parameter φ̄_n.
2) Data collection: the notation x ∼ p(x) means that x obeys the probability distribution p(x). In each time slot t, given the local observation information o_n(t), device n selects an action with probability 0.1 using the strategy β_n, i.e. a_n(t) ∼ β_n(·|o_n(t)), and with probability 0.9 using the strategy π̄_n, i.e. a_n(t) ∼ π̄_n(·|o_n(t)). We call this probabilistic selection a mixed strategy, and the mixture of β_n and π̄_n is denoted π_n. After both transmitter A and listener E have performed their actions, the system transitions to the next state, and n receives the reward r_n(t) and observes the local observation information o_n(t+1) of the next time slot. The collected data (o_n(t), a_n(t), r_n(t), o_n(t+1)) are stored in the storage area M_n^RL. If the action was selected by the strategy β_n, the data (o_n(t), a_n(t)) are also stored in another storage area M_n^SL. Assuming that the data-collection phase lasts T steps, when t = T the data collection ends and the optimization learning phase begins.
3) Reinforcement-learning phase: the data in M_n^RL are used to update β_n, V_n and Q_n. A mini-batch τ_RL of length L is randomly selected from M_n^RL. First, the gradient Δω_n of the loss of the value function V_n is computed, where ∇_x denotes the gradient with respect to the variable x, the actions used in this gradient are sampled from the strategy β_n rather than taken from the sample τ_RL, and the temperature parameter α ∈ [0,1]. Next, the gradient Δθ_n of the loss of the strategy β_n is computed. Then the gradient Δφ_n of the loss of the value function Q_n is computed, where the discount factor γ ∈ (0,1). The parameters are then updated: θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, where η is the learning rate with value range (0,1), ν is the moving-average parameter with value range (0,1), and the symbol ← means that the value on the right of the arrow is assigned to the variable on the left.
4) Supervised-learning phase: the data in M_n^SL are used to update π̄_n. A mini-batch of length B is randomly selected from M_n^SL, the gradient Δψ_n of the supervised loss of the average strategy π̄_n is computed, and the parameters are updated: ψ_n ← ψ_n + ηΔψ_n. Then return to step 2) until the strategy π_n converges to a steady state.
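The overall structure of this FSP-SAC procedure can be summarized by the following Python-style skeleton. It is a sketch of the control flow only, under stated assumptions: the agent and environment objects (best_response, average_policy, rl_buffer, sl_buffer, sac_update, supervised_update, env.step) are placeholder names the reader must supply, not code disclosed in the patent.

```python
import random

ETA_BR = 0.1        # probability of playing the best-response strategy beta_n
T_COLLECT = 1000    # length of the data-collection phase (T in the description)

def fsp_sac(agents, env, n_iterations):
    """Skeleton of the FSP-SAC loop: mixed-strategy data collection into two
    buffers, then a SAC (best-response) update and a supervised
    (average-strategy) update for every device n in {A, E}."""
    for _ in range(n_iterations):
        obs = env.reset()
        # --- data collection with the mixed strategy ---------------------
        for _t in range(T_COLLECT):
            actions, used_br = {}, {}
            for n, agent in agents.items():
                use_br = random.random() < ETA_BR
                policy = agent.best_response if use_br else agent.average_policy
                actions[n] = policy.sample(obs[n])
                used_br[n] = use_br
            next_obs, rewards = env.step(actions)        # both act, system transitions
            for n, agent in agents.items():
                agent.rl_buffer.add(obs[n], actions[n], rewards[n], next_obs[n])
                if used_br[n]:                           # best-response data only
                    agent.sl_buffer.add(obs[n], actions[n])
            obs = next_obs
        # --- optimization phase -------------------------------------------
        for agent in agents.values():
            rl_batch = agent.rl_buffer.sample(128)       # length-L mini-batch
            agent.sac_update(rl_batch)                   # updates theta, omega, phi, target
            sl_batch = agent.sl_buffer.sample(128)       # length-B mini-batch
            agent.supervised_update(sl_batch)            # updates psi (average strategy)
```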
Finally, we simulate the system. The simulation parameters are set as follows: σ² = 10⁻⁸ mW, N_A = 4. The distances between transmitter A, receiver B and listener E are all 200 m, and the path-loss exponent is 3.48. The coefficient ζ = 2 in formula (10). The strategy and value functions are parameterized by multilayer perceptrons (a type of artificial neural network) with ReLU (Rectified Linear Unit) activation, each with two hidden layers of 128 neurons. η = 0.0003, α = 0.05, ν = 0.005, γ = 0.99, T = 1000, L = 128.
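As an illustration of the parameterization described above (two hidden layers of 128 ReLU units), a minimal network definition might look like the following sketch. The class name and the Gaussian-policy head are assumptions, since the patent does not publish its network code.

```python
import torch
import torch.nn as nn

class PolicyMLP(nn.Module):
    """Two hidden layers of 128 ReLU units, as in the simulation setup;
    outputs mean and log-std of a Gaussian over the (flattened) action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.mean = nn.Linear(128, act_dim)
        self.log_std = nn.Linear(128, act_dim)

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        return self.mean(h), self.log_std(h).clamp(-20, 2)

# learning rate eta = 0.0003 as in the simulation setup
optimizer = torch.optim.Adam(PolicyMLP(16, 8).parameters(), lr=3e-4)
```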
In FIG. 3 we compare several other methods: the SAC (Soft Actor-Critic) method is from "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", and the WoLF-PPO method is from "Win or Learn Fast Proximal Policy Optimisation". FIG. 2 and FIG. 3 show the learning curves of transmitter A and listener E, respectively. It can be seen that the result curve of the multi-agent reinforcement learning algorithm FSP-SAC converges asymptotically to a steady-state value, while the other learning algorithms used for comparison fluctuate severely; the invention therefore solves the problem that other reinforcement learning methods have difficulty converging in game problems. According to the relationship between convergence and rationality in games, and since FSP-SAC inherits rationality from SAC, it can be concluded that the result of the FSP-SAC method converges to the Nash equilibrium.

Claims (2)

1. An active monitoring method in a MIMO communication system comprises the following steps:
(1) In each time slot t, the multi-antenna transmitter A transmits an information signal x_AB(t) to the multi-antenna receiver B and transmits an interference signal x_AE(t) to the multi-antenna listener E, in order to reduce the channel capacity from the multi-antenna transmitter A to the multi-antenna listener E and thus prevent listening by the multi-antenna listener E. The action of the multi-antenna transmitter A is a_A(t) = {Q_AB(t), Q_AE(t)}, where Q_AB(t) and Q_AE(t) are the covariance matrices of x_AB(t) and x_AE(t), respectively. Based on its local channel information o_A(t), the multi-antenna transmitter A selects an action using a strategy π_A: it samples from the conditional probability distribution π_A(·|o_A(t)), and the sampled value is the selected action a_A(t).
(2) In each time slot t, the multi-antenna listener E transmits an interference signal x_E(t) to the multi-antenna receiver B, in order to reduce the channel capacity between the multi-antenna transmitter A and the multi-antenna receiver B and thereby increase the listening success rate. The action of the multi-antenna listener E is a_E(t) = Q_E(t), where Q_E(t) is the covariance matrix of x_E(t). Based on its local channel information o_E(t), the multi-antenna listener E selects an action using a strategy π_E: it samples from the conditional probability distribution π_E(·|o_E(t)), and the sampled value is the selected action a_E(t).
(3) In each time slot t, after the multi-antenna transmitter A and the multi-antenna listener E have performed their actions, they receive rewards r_A(t) and r_E(t), respectively. Let π = {π_A, π_E}. The average reward function of the multi-antenna transmitter A is defined as J_A(π) = E_π[r_A(t)], where E_π[·] denotes the mathematical expectation over the time slots t under the joint strategy π; the average reward function of the multi-antenna listener E is J_E(π) = E_π[r_E(t)]. The strategy π_A is optimized to maximize J_A(π) and the strategy π_E is optimized to maximize J_E(π), so as to reach the Nash equilibrium of the monitoring and anti-monitoring game.
2. The active listening method in a MIMO communication system according to claim 1, wherein said step (3) further comprises the steps of:
1) For any device n, where n ∈ {A, E}, initialize a strategy β_n parameterized by θ_n, a strategy π̄_n parameterized by ψ_n, a value function V_n parameterized by ω_n and a value function Q_n parameterized by φ_n, and assign the parameter φ_n to the parameter φ̄_n of the target value function Q̄_n.
2) In each time slot t, device n selects an action using the strategy β_n with probability 0.1 and using the strategy π̄_n with probability 0.9, and stores the collected data d_n(t) in a first storage area M_n^RL, where, when n = A, the data are d_A(t) = (o_A(t), a_A(t), r_A(t), o_A(t+1)), and when n = E, the data are d_E(t) = (o_E(t), a_E(t), r_E(t), o_E(t+1)); if the action was selected by the strategy β_n, the data (o_n(t), a_n(t)) are also stored in a second storage area M_n^SL.
3) Randomly select a mini-batch of length L from M_n^RL and compute the gradient Δω_n of the loss of the value function V_n, where ∇_x denotes the gradient with respect to the variable x, the actions used in this gradient are sampled from the strategy β_n, and the temperature parameter α ∈ [0,1]; compute the gradient Δθ_n of the loss of the strategy β_n; and compute the gradient Δφ_n of the loss of the value function Q_n, where the discount factor γ ∈ (0,1). Randomly select a mini-batch of length L from M_n^SL and compute the gradient Δψ_n of the loss of the strategy π̄_n. Then update the parameters θ_n ← θ_n + ηΔθ_n, ω_n ← ω_n + ηΔω_n, φ_n ← φ_n + ηΔφ_n, φ̄_n ← νφ_n + (1 − ν)φ̄_n, ψ_n ← ψ_n + ηΔψ_n, where η is the learning rate with value range (0,1), ν is the moving-average parameter with value range (0,1), and the symbol ← means that the value on the right of the arrow is assigned to the variable on the left; then return to step 2) until the strategy parameters θ_n no longer change.
CN202210470392.2A 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system Active CN115296705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210470392.2A CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210470392.2A CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Publications (2)

Publication Number Publication Date
CN115296705A true CN115296705A (en) 2022-11-04
CN115296705B CN115296705B (en) 2023-11-21

Family

ID=83819503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210470392.2A Active CN115296705B (en) 2022-04-28 2022-04-28 Active monitoring method in MIMO communication system

Country Status (1)

Country Link
CN (1) CN115296705B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090088176A1 (en) * 2007-09-27 2009-04-02 Koon Hoo Teo Method for Reducing Inter-Cell Interference in Wireless OFDMA Networks
US10069592B1 (en) * 2015-10-27 2018-09-04 Arizona Board Of Regents On Behalf Of The University Of Arizona Systems and methods for securing wireless communications
CN112840600A (en) * 2018-08-20 2021-05-25 瑞典爱立信有限公司 Immune system for improving sites using generation of countermeasure networks and reinforcement learning
WO2021136070A1 (en) * 2019-12-30 2021-07-08 三维通信股份有限公司 Resource allocation method for simultaneous wireless information and power transfer, device, and computer
CN111726845A (en) * 2020-07-01 2020-09-29 南京大学 Base station switching selection and power distribution method in multi-user heterogeneous network system
CN112087749A (en) * 2020-08-27 2020-12-15 华北电力大学(保定) Cooperative active eavesdropping method for realizing multiple listeners based on reinforcement learning
CN114363908A (en) * 2022-01-13 2022-04-15 重庆邮电大学 A2C-based unlicensed spectrum resource sharing method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DELIN GUO: "A Proactive Eavesdropping Game in MIMO Systems Based on Multiagent Deep Reinforcement Learning", 《 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS》 *
DELIN GUO: "Eavesdropping Game Based on Multi-Agent Deep Reinforcement Learning", 《 2022 IEEE 23RD INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATION (SPAWC)》 *
吴伟;胡冰;胡峰;: "基于全双工的主动监听系统中合法通信速率最大化方法设计", 南京邮电大学学报(自然科学版), no. 02 *
李奕男: "基于博弈论的移动Ad hoc网络入侵检测模型", 《电子与信息学报》 *

Also Published As

Publication number Publication date
CN115296705B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
Zhang et al. NAS-AMR: Neural architecture search-based automatic modulation recognition for integrated sensing and communication systems
CN109617584B (en) MIMO system beam forming matrix design method based on deep learning
CN112600772B (en) OFDM channel estimation and signal detection method based on data-driven neural network
CN109274456B (en) Incomplete information intelligent anti-interference method based on reinforcement learning
Zhao et al. Cognitive radio engine design based on ant colony optimization
CN109302262A (en) A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
CN108712748B (en) Cognitive radio anti-interference intelligent decision-making method based on reinforcement learning
CN111224905B (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN112906035B (en) Method for generating frequency division duplex system key based on deep learning
CN118054828B (en) Intelligent super-surface-oriented beam forming method, device, equipment and storage medium
CN109412661A (en) A kind of user cluster-dividing method under extensive mimo system
CN114449584B (en) Distributed computing unloading method and device based on deep reinforcement learning
Zhang et al. Deep reinforcement learning-empowered beamforming design for IRS-assisted MISO interference channels
Fan et al. Demodulator based on deep belief networks in communication system
Zhang et al. Resource management for heterogeneous semantic and bit communication systems
CN115296705A (en) Active monitoring method in MIMO communication system
Omid et al. Deep Reinforcement Learning-Based Secure Standalone Intelligent Reflecting Surface Operation
Zhou et al. QoS-aware power management with deep learning
Zhang et al. Beyond supervised power control in massive MIMO network: Simple deep neural network solutions
Dai et al. Power allocation for multiple transmitter-receiver pairs under frequency-selective fading based on convolutional neural network
Miao et al. A novel millimeter wave channel estimation algorithm based on IC-ELM
CN112087275A (en) Cooperative spectrum sensing method based on birth and death process and viscous hidden Markov model
Tingting et al. Dynamic threshold spectrum sensing method based on DQN combined with clustered cooperative sensing architecture
Zhang et al. Machine Learning enabled Heterogeneous Semantic and Bit Communication
Tan et al. Personalized Recognition for Distributed Jamming in Dynamic Environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant