CN114884547A - Active monitoring method based on deep reinforcement learning - Google Patents

Active monitoring method based on deep reinforcement learning

Info

Publication number
CN114884547A
Authority
CN
China
Prior art keywords
transmitter
listener
receiver
power
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210312148.3A
Other languages
Chinese (zh)
Inventor
唐岚
陈家乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202210312148.3A priority Critical patent/CN114884547A/en
Publication of CN114884547A publication Critical patent/CN114884547A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/02Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0619Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
    • H04B7/0636Feedback format
    • H04B7/0639Using selective indices, e.g. of a codebook, e.g. pre-distortion matrix index [PMI] or for beam selection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • H04B7/022Site diversity; Macro-diversity
    • H04B7/024Co-operative use of antennas of several sites, e.g. in co-ordinated multipoint or co-operative multiple-input multiple-output [MIMO] systems
    • H04B7/0413MIMO systems
    • H04B7/0456Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • H04B7/0621Feedback content
    • H04B7/0626Channel coefficients, e.g. channel state information [CSI]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses an active monitoring method based on deep reinforcement learning, and belongs to the field of communication. In massive MIMO-OFDM systems, conventional passive and active listening schemes become inefficient or even ineffective when the listener E and the suspect receiver D are not within the coverage area of the same communication beam. To realize lawful monitoring of a massive MIMO-OFDM system, the listener acts as a pseudo-relay to perform beam induction and data listening. While the transmitter S performs beam scanning, the listener E induces the transmitter to select a beam that is favorable for listening by optimizing the relay precoding matrix. In the data listening phase, the listener E increases the listening rate by optimizing the relay power allocation factor and the power gain factor. Because the channel state information of the suspicious communication link is unknown, the optimal precoding matrix and power allocation factor are searched for using a deep reinforcement learning algorithm, MADDPG. Computer simulation verifies the validity of the proposed design.

Description

Active monitoring method based on deep reinforcement learning
Technical Field
The invention belongs to the field of communication, and particularly relates to an active monitoring method based on deep reinforcement learning, and more particularly relates to an active monitoring method in a large-scale MIMO-OFDM (Multiple Input Multiple Output-Orthogonal Frequency Division Multiplexing) system based on deep reinforcement learning.
Background
MIMO-OFDM technology is considered a key technology of fifth generation (5G) mobile networks. However, when advanced beamforming techniques are employed in 5G base stations, directional narrow beams make conventional listening methods inefficient or even ineffective. Therefore, in order to realize lawful interception of suspicious links, it is important to study an interception scheme in a narrow beam scenario.
Existing work on monitoring can be divided into three categories: passive monitoring, jamming-based active monitoring, and spoofing-relay-based active monitoring. In passive listening, the listener remains silent, i.e., it only listens to the data sent by the transmitter. This method is effective only when the listening channel is better than the suspect channel. To overcome this drawback, jamming-based active monitoring was introduced: the listener sends jamming signals toward the suspect receiver, forcing the transmitter to reduce its rate so that the information can be decoded by the listener. To implement active listening more flexibly, a listening method called spoofing relay was proposed. When the listening channel is better than the suspect channel, this method can maximize the listening rate by disguising the listener as a relay. However, when the transmitter sends information to the suspect user over a directional beam, none of the above schemes allows a listener outside the beam coverage to listen successfully.
Disclosure of Invention
Object of the invention: in view of the shortcomings of the prior art, the invention studies a scheme by which a listener outside the beam coverage in a MIMO-OFDM system can successfully monitor a suspicious communication link, and provides an active monitoring method for large-scale MIMO-OFDM systems based on deep reinforcement learning, so that communication data can be monitored successfully even when the transmitter communicates with the suspect receiver over narrow beams.
The technical scheme is as follows: an active monitoring method based on deep reinforcement learning comprises the following steps:
(1) the transmitter S performs analog beam scanning in a time-division manner according to the beam codebook;
(2) during the beam scanning phase of the transmitter S, the listener E determines the best beam index j* that is advantageous to itself, according to its own beam quality measurements and the beam report fed back by the receiver D to the transmitter S;
(3) the listener E induces the transmitter S to select the best beam index j* by optimizing the forwarding precoding matrix;
(4) in the communication phase in which beam j* has been determined, the listener E acts as a pseudo-relay for data forwarding, so as to maintain the communication beam and improve the data listening rate.
Further, the step (2) comprises the following steps:
1) the receiver D and the listener E each receive the beam quality measurement reference signal sent by the transmitter S and calculate the beam quality from the received signal; the receiver D forms a beam quality report and feeds it back to the transmitter S as a reference for beam selection;
2) the listener E determines the best beam index j* according to its own beam quality measurements and the beam quality report fed back to the transmitter S by the receiver D, which the listener E overhears, taking power consumption into account through a formula that trades off the beam induction success rate against power consumption.
The step (3) comprises the following steps:
(1) the listener formulates an optimization problem: minimize the total transmit power of the listener under the constraint of successful beam induction; the form of the optimal precoding matrix is derived from this optimization problem, and the optimal precoding matrix is found to depend on the channel state information between the transmitter S and the receiver D;
(2) the listener E trains a first fitting network using the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm to determine the transmission parameters of a first forwarding matrix, and then forwards the beam quality measurement reference signal to the receiver D using the first forwarding matrix determined by these parameters, so as to induce the receiver D to send a distorted beam measurement report, thereby causing the transmitter S to select a beam that is beneficial to the listener E.
The step (4) comprises the following steps:
i. the listener E receives the data sent by the transmitter S and formulates an optimization problem: maximize the data listening rate under the constraints that listening succeeds and that the transmit power does not exceed the upper limit of the forwarding power;
ii. the listener trains a second fitting network using the MADDPG algorithm to determine the power allocation factor and the power gain factor, so that part of the received power is used for decoding and part for forwarding the signal; the communication data is then forwarded to the receiver D using a second forwarding matrix, so as to maintain the communication beam and improve the data listening rate.
Further, sub-step (2) of step (3) comprises the following steps:
first, the beam induction problem is modeled as a first multi-agent cooperative MDP (Markov Decision Process) problem;
secondly, according to the form of the optimal precoding matrix, the problem of searching for the optimal precoding matrix is converted into the problem of searching for a pair of constants, which accelerates training; at a given time, the action on a single subcarrier is the angle and the amplitude of the precoding matrix, and the joint action is the set of the actions on all subcarriers;
thirdly, at a given time, the state on a single subcarrier consists of the beam report information obtained by listening to and parsing the feedback channel together with the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
fourthly, the reward function at a given time encourages successful beam induction and penalizes excessive energy consumption.
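As an illustration only, the per-subcarrier action and the reward described in the steps above might be sketched as follows. The function names, the scaling of the actor outputs, the bound `w_max`, and the reward weights are all hypothetical; the description only states that the action is an angle and an amplitude and that the reward encourages successful induction while penalizing energy use.

```python
import numpy as np

def to_action(raw: np.ndarray, w_max: float):
    """Map raw actor outputs in [-1, 1] to (angle, amplitude) per subcarrier.

    raw has shape (K, 2): column 0 drives the angle, column 1 the amplitude.
    ASSUMPTION: the tanh-style range and the bound w_max are illustrative.
    """
    raw = np.clip(raw, -1.0, 1.0)
    angle = np.pi * raw[:, 0]                     # angle in [-pi, pi]
    amplitude = 0.5 * (raw[:, 1] + 1.0) * w_max   # amplitude in [0, w_max]
    return angle, amplitude

def induction_reward(induced: bool, tx_power: float,
                     success_bonus: float = 1.0,
                     power_weight: float = 0.1) -> float:
    """Reward successful beam induction and penalize transmit power.

    ASSUMPTION: a linear power penalty; the actual reward shape is not given.
    """
    return (success_bonus if induced else 0.0) - power_weight * tx_power

angle, amplitude = to_action(np.array([[0.0, 1.0], [-1.0, -1.0]]), w_max=2.0)
```

With this shaping, a successful induction at low power always scores higher than either a failed induction or a successful one at higher power, which matches the stated intent of the reward.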
Further, the step ii includes the following steps:
I. the data listening problem is modeled as a second multi-agent cooperative MDP problem;
II. according to the form of the optimal precoding matrix, the problem of searching for the optimal precoding matrix is converted into the problem of searching for a pair of constants, which accelerates training; at a given time, the action on a single subcarrier consists of the power gain factor and the power allocation factor, and the joint action is the set of the actions on all subcarriers;
III. at a given time, the state on a single subcarrier consists of the signal-to-interference-plus-noise ratio obtained by listening to the feedback channel together with the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
IV. the reward function at a given time encourages each subcarrier to maximize the listening rate under the constraints of successful listening and limited power.
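A minimal sketch of such a reward is given below. It uses the condition, stated later in the description, that listening succeeds only when R_E ≥ R_D and that the listening rate then equals R_D. The Shannon-rate form log2(1 + SNR), the fixed penalty, and the function names are assumptions made for illustration.

```python
import math

def listening_rate(snr_e: float, snr_d: float) -> float:
    """Listening rate on one subcarrier.

    ASSUMPTION: rates follow log2(1 + SNR). Listening succeeds only when
    R_E >= R_D, in which case the listening rate is R_D; otherwise it is 0.
    """
    r_e = math.log2(1.0 + snr_e)
    r_d = math.log2(1.0 + snr_d)
    return r_d if r_e >= r_d else 0.0

def listening_reward(snr_e: float, snr_d: float, tx_power: float,
                     p_max: float, penalty: float = 1.0) -> float:
    """Reward the listening rate, penalizing violation of the power limit.

    ASSUMPTION: a constant penalty when tx_power exceeds p_max.
    """
    if tx_power > p_max:
        return -penalty
    return listening_rate(snr_e, snr_d)
```

For example, `listening_reward(3.0, 1.0, tx_power=0.5, p_max=1.0)` evaluates the rate of the suspect link because the listener's rate is at least as high, whereas swapping the two SNRs yields zero.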
Advantageous effects: the method is suitable for monitoring a suspicious communication link in a narrow-beam large-scale MIMO-OFDM system. During the beam scanning and beam determination process of the transmitter, the listener induces the beam selection by optimizing a precoding matrix. While the transmitter transmits data, the listening rate is maximized by optimizing a power allocation factor and a power gain factor. Since it is difficult for the listener to obtain the channel information between the suspicious nodes, the invention provides a MADDPG-based learning scheme to help the listener perform beam induction and data listening. The method can effectively induce the transmitter S to select a beam beneficial to the listener E, which lays the foundation for the subsequent data listening process; it also enables the listener E to readjust the power allocation factor and the power gain factor, effectively maintaining the communication link and improving the data listening rate.
Drawings
FIG. 1 is a diagram of the active listening model in a large-scale MIMO-OFDM system according to the present invention;
FIG. 2 is a diagram of the operation of the listener E in the different transmission phases of the transmitter S (BS and DT are abbreviations for the beam scanning and data transmission phases);
FIG. 3 is a diagram of the transceiver of the listener E;
FIG. 4 is a graph of the beam induction success rate versus transmit power under different N_te configurations;
FIG. 5 is a graph of the transmit power of the listener E versus the transmit power of the transmitter S under different N_te configurations;
FIG. 6 is a graph of the average listening rate under different P_S and N_ts;
FIG. 7 is a graph of the average listening rate of various listening methods.
Detailed Description
The invention provides a monitoring method for large-scale MIMO-OFDM systems that builds on conventional pseudo-relay monitoring, in which a legitimate full-duplex relay performs beam induction and data listening. The invention assumes that analog beamforming is employed at the suspect transmitter and that beam scanning is used to select the optimal beam vector. Beam induction is carried out during the beam scanning phase. Its purpose is to induce the suspect receiver to report, and thus the suspect transmitter to select, a beam that is favorable to the listener. To achieve this, the listener acts as a relay, amplifying and forwarding the measurement reference signal of the desired beam to the suspect receiver. At this stage, the objective is to minimize the total transmit power of the listener under the constraint of successful beam induction by optimizing the listener's precoding matrix. Through mathematical derivation, a closed-form expression of the optimal precoding matrix is obtained, which depends on the CSI (Channel State Information) of the suspect communication pair. Because the listener does not know the CSI between the suspect pair, the invention uses a DRL (Deep Reinforcement Learning) algorithm, MADDPG (Multi-Agent Deep Deterministic Policy Gradient), to determine the transmission parameters of all subcarriers. Once beam induction is achieved, the listener can perform data listening and improve the listening rate by continuing to act as a pseudo-relay. At this stage, the power allocation factor and the power gain factor are optimized to maximize the listening rate. Again, since the listener does not know the CSI of the suspect communication pair, the invention uses MADDPG to optimize the listener's relay parameters.
Embodiments of the invention are described in detail below with reference to the accompanying drawings:
The application scenario of the present invention is shown in FIG. 1: the invention considers a lawful listening system consisting of a pair of suspicious communication nodes (transmitter S and receiver D) and a legitimate listener E. The transmitter S and the receiver D are equipped with N_ts transmit antennas and N_rd receive antennas, respectively. The invention assumes that both the transmitter S and the receiver D employ large-scale MIMO-OFDM arrays with analog beamforming to transmit and receive information. The analog beam is selected from a predefined discrete codebook, and the codebook of the transmitter S is denoted as
Figure BDA0003567487890000051
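For illustration, a predefined discrete analog codebook with unit-modulus entries could be built as in the sketch below. It assumes a DFT codebook for a uniform linear array; the description does not specify how the codebook is constructed, and the names `dft_codebook`, `n_ts`, and `n_beams` are hypothetical.

```python
import numpy as np

def dft_codebook(n_ts: int, n_beams: int) -> np.ndarray:
    """Return an (n_ts, n_beams) matrix whose columns are candidate beams.

    ASSUMPTION: a DFT codebook; every entry has unit modulus, as required
    of phase-shifter-based analog beamforming.
    """
    n = np.arange(n_ts).reshape(-1, 1)      # antenna index
    b = np.arange(n_beams).reshape(1, -1)   # beam index
    return np.exp(-2j * np.pi * n * b / n_beams)

codebook = dft_codebook(n_ts=16, n_beams=16)
```

Beam scanning then amounts to transmitting with one column of `codebook` per time slot.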
The listener E acts as a full-duplex pseudo-relay, receiving the signal from the transmitter S through N_re antennas while forwarding the signal to the receiver D through N_te antennas. To improve the listening quality, the listener E employs digital beamforming on each subcarrier. The present invention assumes that all channels in the system remain unchanged within each RB (Resource Block) but may vary from RB to RB according to a Markov model.
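One common way to simulate a channel that is constant within a resource block and evolves between blocks is a first-order Gauss-Markov process. The sketch below assumes that model; the description only says "a Markov model", so the correlation coefficient `rho` and the function name are illustrative.

```python
import numpy as np

def next_channel(h: np.ndarray, rho: float, rng: np.random.Generator) -> np.ndarray:
    """Advance a channel matrix by one resource block.

    ASSUMPTION: first-order Gauss-Markov evolution
    H_next = rho * H + sqrt(1 - rho^2) * N, with N ~ CN(0, 1) entries.
    """
    noise = (rng.standard_normal(h.shape)
             + 1j * rng.standard_normal(h.shape)) / np.sqrt(2.0)
    return rho * h + np.sqrt(1.0 - rho**2) * noise

rng = np.random.default_rng(0)
# Example: a 4 x 8 channel with unit-variance complex Gaussian entries.
h = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2.0)
h_next = next_channel(h, rho=0.9, rng=rng)
```

Setting `rho` close to 1 gives a slowly varying channel, while `rho = 0` makes consecutive blocks independent.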
In the solution of the invention, as shown in FIG. 2, for the transmitter S the whole process of each transport block is divided into two phases: a BS (Beam Scanning) phase and a DT (Data Transmission) phase. For the listener E, however, the listening process comprises three phases: beam selection, beam induction, and spoofed data forwarding. In the beam selection phase, the listener acquires beam quality information by listening to the feedback channel. In particular, when the transmitter S uses the beamforming vector
Figure BDA0003567487890000052
for transmission, the signals received by the receiver D and the listener E on the k-th subcarrier can be expressed as
Figure BDA0003567487890000053
And
Figure BDA0003567487890000054
where s_k is the transmitted signal of the transmitter and
Figure BDA0003567487890000055
Figure BDA0003567487890000056
denotes expectation, f_j is a beamforming vector with |f_j(n)| = 1, n = 1, ..., N_ts, j is the beam index,
Figure BDA0003567487890000057
and
Figure BDA0003567487890000058
are, respectively, the transmit power on the k-th subcarrier, the channel matrix between the transmitter S and the receiver D, and the channel matrix between the transmitter S and the listener E, and
Figure BDA0003567487890000061
denotes the matrix dimensions.
Figure BDA0003567487890000062
and
Figure BDA0003567487890000063
are zero-mean additive white Gaussian noise vectors with covariance matrix σ²I. At the receiver D, the analog beamformer
Figure BDA0003567487890000064
Figure BDA0003567487890000065
is used to process the received signal, with ||v_D||² = N_rd, where ||·|| denotes the modulus of a vector or the Frobenius norm of a matrix. At the listener E, the received signal on the k-th subcarrier is processed by a digital beamformer
Figure BDA0003567487890000066
In the BS phase of the transmitter S, the SNRs (Signal-to-Noise Ratios) on the k-th subcarrier at the receiver D and the listener E are
Figure BDA0003567487890000067
and
Figure BDA0003567487890000068
The receiver D calculates the average SNR over all subcarriers
Figure BDA0003567487890000069
where K is the number of subcarriers, and
Figure BDA00035674878900000610
selects the J candidate beams with the largest
Figure BDA00035674878900000611
values, and then feeds back
Figure BDA00035674878900000612
and the corresponding beam indices to S, where J is the maximum number of beams that can be fed back. The present invention assumes that the listener E can obtain this feedback information by listening to the feedback channel between the transmitter S and the receiver D. When the beam selected by the transmitter S results in a low
Figure BDA00035674878900000613
it is difficult for the listener E to listen to the communication of the transmitter S; the listener E therefore acts as a pseudo-relay to induce the beam selection of the transmitter S. For the listener E, the ideal beam should provide a high SNR for both the listener E and the receiver D, because a low
Figure BDA00035674878900000614
consumes more of the listener E's forwarding power. The listener E therefore determines the required optimal beam index according to
Figure BDA00035674878900000615
where δ is a trade-off factor that balances the listening success rate against the power consumption of the listener.
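The beam report of the receiver D and the beam choice of the listener E can be sketched as follows. The top-J report follows the text above. The scoring rule in `pick_target_beam` is only an assumed stand-in for the selection formula, which is not legible in this copy: it simply combines the listener's own average SNR with δ times the receiver's average SNR. All names are hypothetical.

```python
import numpy as np

def beam_report(snr_d: np.ndarray, num_feedback: int):
    """Receiver D side: average SNR per beam, then keep the best J beams.

    snr_d[j, k] is the SNR of beam j on subcarrier k at receiver D.
    Returns (beam indices, average SNRs), best beam first.
    """
    avg = snr_d.mean(axis=1)
    top = np.argsort(avg, kind="stable")[::-1][:num_feedback]
    return top, avg[top]

def pick_target_beam(avg_snr_e, avg_snr_d, delta: float) -> int:
    """Listener E side: choose the beam index to induce.

    ASSUMPTION: a linear trade-off score, used here only for illustration.
    """
    score = np.asarray(avg_snr_e) + delta * np.asarray(avg_snr_d)
    return int(np.argmax(score))

snr_d = np.array([[1.0, 3.0], [4.0, 6.0], [2.0, 2.0]])   # 3 beams, 2 subcarriers
indices, values = beam_report(snr_d, num_feedback=2)
target = pick_target_beam(avg_snr_e=[0.5, 0.2, 3.0],
                          avg_snr_d=[2.0, 5.0, 2.0], delta=0.1)
```

In this toy case the receiver's best beam is index 1, while the listener prefers index 2 because its own SNR on that beam is much higher; inducing the transmitter toward index 2 is the task of the next phase.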
After determining the required optimal beam index j*, the listener E induces the transmitter S to select j* during the next BS phase. In the DT phase, the transmitter S transmits the communication data to the receiver D, and the listener E acts as an AF (Amplify and Forward) spoofing relay, listening while forwarding the data.
As shown in FIG. 3,
Figure BDA00035674878900000616
α_k, and g_k respectively denote the receive beamforming vector, the transmit beamforming vector, the power allocation factor, and the power gain factor of the listener E on subcarrier k. During the beam induction phase, the received signal is amplified and forwarded with α_k = 1, i.e., no information needs to be decoded. In the data forwarding phase, the received signal power is divided into two parts, one for decoding and one for forwarding. The invention analyzes how to optimize
Figure BDA0003567487890000071
to achieve the maximum listening rate.
During the beam scanning of the transmitter S, the listener E amplifies and forwards the pilot signal that the transmitter S sends to measure the beam quality between the transmitter S and the receiver D. The present invention assumes that the delay of the AF relay used by the listener is much smaller than the symbol duration and is therefore negligible. Because the listener E operates in full duplex, its received signal on subcarrier k is
Figure BDA0003567487890000072
where
Figure BDA0003567487890000073
is the self-interference channel,
Figure BDA0003567487890000074
is the precoding matrix of the listener E, and
Figure BDA0003567487890000075
is the signal received by the listener at the previous time instant. As can be seen from (6), if W_k lies in the null space of
Figure BDA0003567487890000076
then
Figure BDA0003567487890000077
and (6) becomes
Figure BDA0003567487890000078
Let
Figure BDA0003567487890000079
Figure BDA00035674878900000710
be the right singular matrix corresponding to the zero singular values of
Figure BDA00035674878900000711
The precoding matrix can then be written as
Figure BDA00035674878900000712
where
Figure BDA00035674878900000713
is the new matrix to be optimized. To ensure r_0 > 0, the present invention requires N_te > N_re, i.e., the listener E needs more transmit antennas than receive antennas to suppress self-interference. After self-interference is eliminated, the transmitted signal
Figure BDA00035674878900000714
can be expressed as
Figure BDA00035674878900000715
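The null-space construction described above can be checked numerically. The sketch below computes the right singular vectors of a self-interference channel that correspond to zero singular values and confirms that any precoder built from them produces no self-interference. The factorization `w = u0 @ w_tilde` and all variable names are assumptions for illustration; the channel is random.

```python
import numpy as np

def null_space_basis(h_ee: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return an orthonormal basis of the right null space of h_ee.

    h_ee has shape (N_re, N_te); the result has shape (N_te, N_te - rank).
    """
    _, s, vh = np.linalg.svd(h_ee, full_matrices=True)
    rank = int(np.sum(s > tol * s.max())) if s.size else 0
    # Rows of vh beyond the rank are the right singular vectors
    # associated with zero singular values.
    return vh[rank:].conj().T

rng = np.random.default_rng(1)
n_re, n_te = 2, 4   # more transmit than receive antennas, so r_0 > 0
h_ee = rng.standard_normal((n_re, n_te)) + 1j * rng.standard_normal((n_re, n_te))

u0 = null_space_basis(h_ee)
# Hypothetical free matrix to be optimized; any choice keeps w in the null space.
w_tilde = rng.standard_normal((u0.shape[1], n_re)) + 1j * rng.standard_normal((u0.shape[1], n_re))
w = u0 @ w_tilde
residual = np.linalg.norm(h_ee @ w)   # self-interference leakage, ideally ~0
```

With N_te = 4 and N_re = 2 the null space has dimension 2, which is why more transmit than receive antennas are needed: with N_te ≤ N_re and a full-rank channel, no non-zero precoder would survive.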
The transmit power of the listener E,
Figure BDA00035674878900000716
is
Figure BDA00035674878900000717
The signal received by the receiver D after receive beamforming can be expressed as
Figure BDA00035674878900000718
where
Figure BDA00035674878900000719
is the channel matrix between the listener E and the receiver D on the k-th subcarrier, and
Figure BDA00035674878900000720
and
Figure BDA00035674878900000721
are newly constructed equivalent channels. The received SNR of the receiver D on the k-th subcarrier can then be expressed as
Figure BDA00035674878900000722
To induce the transmitter S to select the best beam index j* with the minimum transmit power, the problem can be formulated as
Figure BDA0003567487890000081
where
Figure BDA0003567487890000082
and
Figure BDA0003567487890000083
To obtain
Figure BDA0003567487890000084
the present invention decomposes problem (10) into K independent subproblems by letting
Figure BDA0003567487890000085
where
Figure BDA0003567487890000086
Figure BDA0003567487890000087
Each of the K subproblems can therefore be expressed as
Figure BDA0003567487890000088
To solve (11), the invention first proves a lemma: the optimal solution of problem (11) can be expressed as
Figure BDA0003567487890000089
where
Figure BDA00035674878900000810
The lemma is proved as follows:
For brevity, the subcarrier index k is omitted in the following proof. To prove the lemma, suppose that a feasible solution of (11) is
Figure BDA00035674878900000811
where
Figure BDA00035674878900000812
and w' is the amplitude parameter of the feasible solution. The power consumption P(W') corresponding to the feasible solution is
Figure BDA00035674878900000813
The invention then constructs a matrix
Figure BDA00035674878900000814
where
Figure BDA00035674878900000815
The inequality follows from the fact that ||AB|| ≤ ||A|| ||B|| for any matrices (vectors) A and B. It is shown below that the new matrix W'' is not only feasible for problem (11) but also yields an objective value no larger than P(W'). Let
Figure BDA00035674878900000816
Substituting W'' into the numerator and denominator of β_k in (9) gives
Figure BDA00035674878900000817
Figure BDA00035674878900000818
where (13) and (14) follow from the triangle inequality. Based on (13) and (14), it follows that β(W'') ≥ β(W') ≥ β_D, which shows that W'' is feasible for problem (11). Substituting W'' into the objective function of (11) gives
Figure BDA0003567487890000091
where (15) follows from the Cauchy-Schwarz inequality
Figure BDA0003567487890000092
In summary, for any feasible solution W' of problem (11), another solution of the form
Figure BDA0003567487890000093
can always be constructed that attains an objective value no larger than that of W', which proves the lemma.
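For reference, the standard inequalities that the proof relies on can be stated in general form as follows. These are textbook facts written in generic notation, not a reconstruction of the patent's numbered equations: submultiplicativity of the Frobenius norm, the triangle inequality, and the Cauchy-Schwarz inequality.

```latex
\|AB\|_F \le \|A\|_F \, \|B\|_F , \qquad
|a + b| \le |a| + |b| , \qquad
\left| \mathbf{x}^{H} \mathbf{y} \right| \le \|\mathbf{x}\|_2 \, \|\mathbf{y}\|_2 .
```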
Substituting (12) into the objective function in (11), it can be seen that
Figure BDA0003567487890000094
is an increasing function of w_k. Therefore, by gradually increasing w_k in (12) from a small value until the constraint in (11) is satisfied, the unique unknown variable w_k can be found. The lemma applies to any given
Figure BDA0003567487890000095
The optimal solution of (10) therefore has the same form as (12). In theory, the optimal solution of (10) can be obtained by considering all possible combinations of
Figure BDA0003567487890000096
For a given
Figure BDA0003567487890000097
(11) provides an upper bound for the solution of (10). The listener E can use the precoding matrix in (12) only if it knows all the channels
Figure BDA0003567487890000098
The invention assumes that the listener E can obtain, by listening to the pilot signal, the equivalent channel vectors
Figure BDA0003567487890000099
However, due to the non-cooperative relationship between the transmitter S and the listener E, it is difficult to obtain
Figure BDA00035674878900000910
Thus, the present invention relies on DRL, using the feedback β and β_D to adjust
Figure BDA00035674878900000911
Figure BDA00035674878900000912
so as to minimize P_E. By employing the MADDPG learning framework to determine
Figure BDA00035674878900000913
in real time, the transmitter S is induced to select the beam required by the listener E. Finally, W_k in (7) can be expressed as the product of a column vector
Figure BDA00035674878900000914
and a row vector
Figure BDA00035674878900000915
as shown in FIG. 3.
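The search over w_k described above, increasing it from a small value until the SNR constraint is met, can be sketched as a simple geometric sweep. The SNR model is passed in as a callable because the actual expression depends on channels that are not reproduced here; the toy model in the example, the step factor, and the function name are hypothetical.

```python
from typing import Callable, Optional

def smallest_feasible_gain(snr_at: Callable[[float], float],
                           snr_target: float,
                           w_start: float = 1e-3,
                           growth: float = 1.05,
                           w_max: float = 1e3) -> Optional[float]:
    """Increase w geometrically until snr_at(w) >= snr_target.

    Because transmit power grows with w, the first feasible w found is
    (up to the step size) the minimum-power choice. Returns None if no
    feasible w is found below w_max.
    """
    w = w_start
    while w <= w_max:
        if snr_at(w) >= snr_target:
            return w
        w *= growth
    return None

# ASSUMPTION: a toy SNR model that increases with w, for illustration only.
def toy_snr(w: float) -> float:
    return (0.5 + 2.0 * w) ** 2

w_found = smallest_feasible_gain(toy_snr, snr_target=9.0)
```

In the toy model the exact threshold is w = 1.25, and the sweep returns a value within one growth step above it. When the channels are unknown, as the text notes, this direct evaluation is not available and the learning-based adjustment takes its place.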
Successful beam induction does not mean that listening will succeed. In the data transmission phase, if the listener E does not forward data to the receiver D, the error rate of the receiver D may rise above the threshold and trigger a beam recovery procedure, thereby switching beams. Therefore, to realize both data relaying and listening at the listener E under amplify-and-forward (AF) relay operation, the received signal
Figure BDA00035674878900000916
is split into two parts: one part is used for forwarding information to increase the signal-to-noise ratio of the receiver D, and the other part is used for decoding information to listen to the message sent by the transmitter S. Due to the introduction of α_k, the power gain factor w_k in
Figure BDA00035674878900000917
needs to be re-optimized. Define
Figure BDA00035674878900000918
as the normalized beamforming vector; the transmitted signal of the listener E,
Figure BDA00035674878900000919
is then expressed as
Figure BDA00035674878900000920
where g_k is the power gain factor used to control the transmit power during the data listening phase, and α_k is the power distribution factor. It should be noted that
Figure BDA0003567487890000101
and
Figure BDA0003567487890000102
are consistent with those used in beam induction, since the invention aims to improve the signal-to-noise ratio of the receiver D in both stages. Similarly to (8), the received signal of the receiver D in the data transmission stage can be written as
Figure BDA0003567487890000103
For given
Figure BDA0003567487890000104
and
Figure BDA0003567487890000105
the received signal-to-noise ratios of the receiver D and the listener E can be calculated as
Figure BDA0003567487890000106
and
Figure BDA0003567487890000107
The goal of the listener E is then to optimize
Figure BDA0003567487890000108
so as to maximize the listening rate under the transmit power constraint. Thus, the optimization problem can be expressed as
Figure BDA0003567487890000109
where
Figure BDA00035674878900001010
Figure BDA00035674878900001011
and P_M are, respectively, the total transmit power and the power constraint of the listener E. The invention assumes that the listener E can listen successfully only when R_E ≥ R_D, in which case the corresponding listening rate is R_D. If the listener E knows the global CSI, (18) can be solved using the Lagrangian multiplier method. However, when
Figure BDA00035674878900001012
is unknown, the optimal
Figure BDA00035674878900001013
cannot be obtained. Rather than assuming knowledge of
Figure BDA00035674878900001014
a more reasonable assumption is that, by listening to the uplink control channel between the transmitter S and the receiver D, the listener E can obtain
Figure BDA00035674878900001015
Therefore, DRL is adopted: taking
Figure BDA00035674878900001016
as the observed state, the agent interacts with the system to determine
Figure BDA00035674878900001017
By training the neural network with MADDPG,
Figure BDA00035674878900001018
is given in real time, thereby improving the listening rate of the listener E at a controllable transmit power.
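The listening condition stated above (listening succeeds only when R_E ≥ R_D, and the listening rate is then R_D) can be written as a small helper. This is a minimal sketch; treating the rate as zero when the condition fails is an assumption made here for illustration and is not stated in the text, and the function name is hypothetical.

```python
def effective_listening_rate(rate_e, rate_d):
    """Listening rate under the rule 'success only if R_E >= R_D'.

    rate_e: achievable rate at the listener E (R_E)
    rate_d: achievable rate at the receiver D (R_D)
    Assumption: a failed listening attempt contributes a rate of 0.
    """
    return rate_d if rate_e >= rate_d else 0.0


# Listener decodes at least as fast as the receiver: rate equals R_D.
ok_rate = effective_listening_rate(rate_e=3.2, rate_d=2.5)
# Listener cannot keep up with the receiver: listening fails.
failed_rate = effective_listening_rate(rate_e=1.0, rate_d=2.5)
```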
Based on the above analysis, when the CSI between the transmitter S and the receiver D is unknown, the beam induction and data listening problems are expressed as MDP (Markov Decision Process) problems. The most intuitive deep learning solution is to treat all subcarriers as one agent and obtain the policy through the Actor-Critic network of a single DDPG. However, in practical implementations, it is often more difficult to train one policy with a large action space than to train multiple policies with small action spaces. Thus, in both phases, the present invention treats each subcarrier as a separate agent, and the agents cooperate to achieve a common goal. The present invention therefore adopts the MADDPG learning architecture, which includes K Actors (policies) and a centralized Critic (value function). During the training phase, the Actors and the Critic are updated with global data, including the global state, the shared reward, and all actions, which are defined later.
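The structure described above (one Actor per subcarrier, one centralized Critic that sees the global state and all actions) can be sketched in a few lines. This is a structural illustration only, under several assumptions: the linear layers, the dimensions, the class names, and the use of simple concatenation for the global state are all hypothetical, and no training step is shown.

```python
import math
import random


class LinearActor:
    """Per-subcarrier policy: maps a local state to a bounded action."""

    def __init__(self, state_dim, action_dim, rng):
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                        for _ in range(action_dim)]

    def act(self, local_state):
        # tanh keeps every action component inside [-1, 1].
        return [math.tanh(sum(w * s for w, s in zip(row, local_state)))
                for row in self.weights]


class CentralCritic:
    """Shared critic: scores the global state together with all K actions."""

    def __init__(self, input_dim, rng):
        self.weights = [rng.gauss(0.0, 0.1) for _ in range(input_dim)]

    def q_value(self, global_state, joint_action):
        features = list(global_state) + list(joint_action)
        return sum(w * x for w, x in zip(self.weights, features))


def build_agents(num_subcarriers, state_dim, action_dim, seed=0):
    """Create K actors and one critic sized for the joint input."""
    rng = random.Random(seed)
    actors = [LinearActor(state_dim, action_dim, rng)
              for _ in range(num_subcarriers)]
    critic = CentralCritic(num_subcarriers * (state_dim + action_dim), rng)
    return actors, critic


def joint_step(actors, critic, local_states):
    """Each actor acts on its own state; the critic evaluates everything."""
    actions = [actor.act(state) for actor, state in zip(actors, local_states)]
    joint_action = [a for action in actions for a in action]
    # Assumption: the global state is a plain concatenation of local states.
    global_state = [s for state in local_states for s in state]
    return actions, critic.q_value(global_state, joint_action)
```

Each actor only needs its own subcarrier's state at execution time, while the critic's joint input is what lets training account for the shared reward.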
The beam induction problem is modeled as a first multi-agent cooperative MDP problem. According to the form of the optimal precoding matrix, the problem of finding the optimal precoding matrix is converted into the problem of finding a pair of constants (w_k, θ_k), where θ_k is the angle that the MADDPG algorithm determines for
Figure BDA0003567487890000111
thereby speeding up the training process. At time t, the action on the k-th subcarrier is denoted by
Figure BDA0003567487890000112
Thus, the action of all subcarriers is
Figure BDA0003567487890000113
At time t, the state on each subcarrier k is
Figure BDA0003567487890000114
where β and β_D are obtained by listening to and analyzing beam reports on the feedback channel. The global state s_t is the union of the non-overlapping information in all subcarrier states
Figure BDA0003567487890000115
i.e.
Figure BDA0003567487890000116
The reward at time t is defined as r_t = -a_1 P_E - a_2 (β - β_D - B)^2 + a_3 I(β, β_D), where
Figure BDA0003567487890000117
are positive factors for balancing the induction success rate and the power consumption of the listener, B is a constant used to increase the probability of selecting the best beam index j*, and I(x, y) is an indicator function with I(x, y) = 1 when x ≥ y and I(x, y) = 0 otherwise. The reward function encourages successful beam induction while penalizing excessive energy consumption.
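The beam induction reward r_t = -a_1 P_E - a_2 (β - β_D - B)^2 + a_3 I(β, β_D) can be evaluated directly. The sketch below follows that formula term by term; the function and parameter names, and the numeric values in the example, are illustrative assumptions rather than values from the patent.

```python
def indicator(x, y):
    """I(x, y): 1 when x >= y, otherwise 0."""
    return 1.0 if x >= y else 0.0


def beam_induction_reward(p_e, beta, beta_d, a1, a2, a3, margin_b):
    """r_t = -a1*P_E - a2*(beta - beta_D - B)^2 + a3*I(beta, beta_D)."""
    power_penalty = a1 * p_e
    margin_penalty = a2 * (beta - beta_d - margin_b) ** 2
    success_bonus = a3 * indicator(beta, beta_d)
    return -power_penalty - margin_penalty + success_bonus


# Successful induction (beta >= beta_D): -0.5*2 - 1*(5-3-1)^2 + 10*1 = 8
r_success = beam_induction_reward(p_e=2.0, beta=5.0, beta_d=3.0,
                                  a1=0.5, a2=1.0, a3=10.0, margin_b=1.0)
# Failed induction (beta < beta_D): -0.5*2 - 1*(2-3-1)^2 + 0 = -5
r_failure = beam_induction_reward(p_e=2.0, beta=2.0, beta_d=3.0,
                                  a1=0.5, a2=1.0, a3=10.0, margin_b=1.0)
```

The quadratic term pulls β toward β_D + B, so the agent is rewarded for clearing the threshold by a margin of about B without spending power on a much larger gap.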
The data listening problem is modeled as a second multi-agent cooperative MDP problem. According to the form of the optimal precoding matrix, the problem of finding the optimal precoding matrix is converted into the problem of finding a pair of constants (g_k, α_k), where g_k and α_k respectively denote the power gain factor and the power distribution ratio of the listener on subcarrier k, thereby speeding up the training process. At time t, the action on the k-th subcarrier is
Figure BDA0003567487890000118
Thus, the action of all subcarriers is
Figure BDA00035674878900001117
At time t, the state on each subcarrier k is
Figure BDA0003567487890000119
Figure BDA00035674878900001110
where
Figure BDA00035674878900001111
is obtained by listening to the feedback channel information. The global state s_t is the union of the non-overlapping information in all subcarrier states
Figure BDA00035674878900001112
i.e.
Figure BDA00035674878900001113
The reward r_t at time t is defined as
Figure BDA00035674878900001114
where
Figure BDA00035674878900001115
are positive coefficients used to balance the listening rate and the power consumption of the relay, and C is a constant used to boost the listening rate. The reward function encourages the subcarriers to maximize R_D under the constraints R_E > R_D and P_E ≤ P_M.
FIG. 4 shows the relationship between the beam induction success rate and the transmit power P_S for different N_te during the beam induction phase. The invention allocates the same transmit power on each subcarrier, i.e.
Figure BDA00035674878900001116
The induction rate is obtained by counting the number of trials in which β ≥ β_D over 10^5 Monte Carlo simulations. In the passive approach, the listener E remains silent while the transmitter S performs beam scanning. In this case, when the listener E and the receiver D are far apart, the receiver D selects the best beam index j* only with low probability. The results show that the success rate of the proposed MADDPG-based method is close to 100%. These results verify the effectiveness of the method under different system configurations.
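The counting procedure behind the induction rate (the fraction of Monte Carlo trials with β ≥ β_D) can be sketched as below. The `draw_trial` callback is a hypothetical placeholder for one simulated beam scan; the patent's channel model is not reproduced here, and the uniform draws in the example are purely synthetic.

```python
import random


def induction_success_rate(draw_trial, num_trials=100_000, seed=0):
    """Estimate P(beta >= beta_D) by counting successes over random trials.

    draw_trial(rng) must return a (beta, beta_D) pair for one simulated
    beam scan; it stands in for the system simulation, which is not
    specified here.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(num_trials):
        beta, beta_d = draw_trial(rng)
        if beta >= beta_d:
            successes += 1
    return successes / num_trials


# Synthetic example: two independent uniform draws, so the estimate
# should land near one half.
rate = induction_success_rate(lambda rng: (rng.random(), rng.random()),
                              num_trials=10_000)
```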
FIG. 5 shows the relationship between P_E and P_S under different N_te configurations during the beam induction phase. In the optimal case, P_E is the optimal objective value of (10), calculated with known
Figure BDA0003567487890000121
In the MADDPG-based scheme, P_E is calculated using the parameters learned by MADDPG. It can be seen that although P_E increases with P_S, configuring a larger N_te effectively reduces P_E. Combining FIG. 3 and FIG. 4, it can be seen that even though
Figure BDA0003567487890000122
is unknown, the present invention can still achieve beam induction using the strategy learned by MADDPG, with a transmit power only slightly higher than the theoretical minimum.
In FIG. 6, for the data listening stage, the optimal solution is obtained by solving (18). The passive approach with SBM (Successful Beam steering) means that the agent achieves beam induction in the BS phase but remains silent in the DT phase. As shown in FIG. 6, after successful induction the listening rate varies with P_S or N_ts, and R_E ≥ R_D can be ensured by adjusting the transmission parameters. Meanwhile, the listening rate of the proposed method is close to the optimal solution and is clearly superior to that of the passive listening method with SBM.
FIG. 7 compares the listening schemes under different power constraints P_M, with the conventional active jamming scheme plotted for comparison. The results show that the listening rate obtained by the proposed MADDPG scheme is close to the optimal solution and increases with P_M. When P_M > 55 dBm,
Figure BDA0003567487890000123
the listening rate approaches the maximum R_E. The listening performance of the passive scheme without SBM is independent of the transmit power of the listener E, and passive listening with SBM outperforms the method without SBM. The average listening rate of the jamming scheme is limited by the power constraint of the listener E, since it cannot guarantee R_E ≥ R_D when the power limit P_M is relatively low.
Simulations show that the active listening method based on deep reinforcement learning for massive MIMO-OFDM systems can effectively induce the transmitter S to select a beam favorable to the listener E, laying the foundation for the subsequent data listening process. The listener E then readjusts the power distribution factor and the power gain factor, which effectively maintains the communication link and improves the data listening rate. Combining the two stages enables listening to narrow-beam communication in a massive MIMO-OFDM system.

Claims (6)

1. An active monitoring method based on deep reinforcement learning comprises the following steps:
(1) the transmitter S performs analog beam scanning in a time-division manner according to the beam precoding codebook;
(2) during the beam scanning performed by the transmitter S, the listener E determines the best beam index j* favorable to itself according to its own beam quality report and the beam report fed back to the transmitter S by the receiver D;
(3) the listener E induces the transmitter S to select the best beam index j* by optimizing the forwarding precoding matrix;
(4) in the communication stage on the beam determined by the best beam index j*, the listener E acts as a pseudo relay that forwards data, thereby maintaining the communication beam and improving the data listening rate.
2. The active listening method based on deep reinforcement learning of claim 1, wherein the step (2) comprises the following steps:
1) the receiver D and the listener E each receive the beam quality measurement reference signal sent by the transmitter S and calculate the beam quality from the received signal, and the receiver D forms a beam quality report and feeds it back to the transmitter S as a reference for beam selection;
2) the listener E, according to its own beam quality report and the overheard beam quality report fed back to the transmitter S by the receiver D, and taking power consumption into account, determines the best beam index j* according to a trade-off formula between the beam induction success rate and the power consumption.
3. The active listening method based on deep reinforcement learning of claim 1, wherein the step (3) comprises the following steps:
(1) the listener E formulates an optimization problem: minimizing the total transmit power of the listener under the constraint of successful beam induction; the form of the optimal precoding matrix is derived from the optimization problem, showing that the optimal precoding matrix depends on the channel state information between the transmitter S and the receiver D;
(2) the listener E trains a first fitting network using the MADDPG algorithm to determine the transmission parameters of a first forwarding matrix, and then forwards the beam quality measurement reference signal to the receiver D using the first forwarding matrix determined by the transmission parameters, so as to induce the receiver D to send an erroneous beam measurement report, thereby causing the transmitter S to select a beam that is favorable to the listener E.
4. The active listening method based on deep reinforcement learning of claim 1, wherein the step (4) comprises the following steps:
i. the listener E receives the data sent by the transmitter S and formulates an optimization problem: maximizing the data listening rate under the conditions that listening succeeds and the transmit power is below the upper limit of the forwarding power;
ii. the listener E trains a second fitting network using the MADDPG algorithm to determine a power distribution factor and a power gain factor, with one part of the power used for decoding and the other part used for forwarding the signal, and then forwards the communication data to the receiver D using a second forwarding matrix, thereby maintaining the communication beam and improving the data listening rate.
5. The active listening method based on deep reinforcement learning of claim 3, wherein the step (2) comprises the following steps:
firstly, modeling the beam induction problem as a first multi-agent cooperative MDP problem;
secondly, according to the form of the optimal precoding matrix, converting the problem of finding the optimal precoding matrix into the problem of finding a pair of constants, thereby accelerating the training process; at a specific moment, the action on a single subcarrier consists of the angle and the amplitude of the precoding matrix, so that the action of all subcarriers is the set of the single-subcarrier actions;
thirdly, at a specific moment, the state on a single subcarrier is the combination of the beam report information obtained by listening to and analyzing the feedback channel and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
and fourthly, designing the reward function at a specific moment to encourage successful beam induction and penalize excessive energy consumption.
6. The active listening method based on deep reinforcement learning of claim 4, wherein the step ii comprises the following steps:
I. modeling the data listening problem as a second multi-agent cooperative MDP problem;
II. according to the form of the optimal precoding matrix, converting the problem of finding the optimal precoding matrix into the problem of finding a pair of constants, thereby accelerating the training process; at a specific moment, the action on a single subcarrier consists of the power gain factor and the power distribution factor, so that the action of all subcarriers is the set of the single-subcarrier actions;
III. at a specific moment, the state on a single subcarrier is the combination of the signal-to-interference-plus-noise ratio obtained by listening to the feedback channel information and the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
IV. designing the reward at a specific moment to encourage the subcarriers to maximize the listening rate under the constraints of successful listening and limited power.
CN202210312148.3A 2022-03-28 2022-03-28 Active monitoring method based on deep reinforcement learning Pending CN114884547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210312148.3A CN114884547A (en) 2022-03-28 2022-03-28 Active monitoring method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210312148.3A CN114884547A (en) 2022-03-28 2022-03-28 Active monitoring method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114884547A true CN114884547A (en) 2022-08-09

Family

ID=82669000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210312148.3A Pending CN114884547A (en) 2022-03-28 2022-03-28 Active monitoring method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114884547A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination