CN114884547A - Active monitoring method based on deep reinforcement learning - Google Patents

Active monitoring method based on deep reinforcement learning

Info

Publication number
CN114884547A
Authority
CN
China
Prior art keywords
listener
transmitter
power
receiver
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210312148.3A
Other languages
Chinese (zh)
Inventor
唐岚
陈家乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202210312148.3A
Publication of CN114884547A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0619 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
    • H04B7/0636 Feedback format
    • H04B7/0639 Using selective indices, e.g. of a codebook, e.g. pre-distortion matrix index [PMI] or for beam selection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/022 Site diversity; Macro-diversity
    • H04B7/024 Co-operative use of antennas of several sites, e.g. in co-ordinated multipoint or co-operative multiple-input multiple-output [MIMO] systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0456 Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0619 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
    • H04B7/0621 Feedback content
    • H04B7/0626 Channel coefficients, e.g. channel state information [CSI]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Radio Transmission System (AREA)

Abstract

The invention discloses an active monitoring method based on deep reinforcement learning and belongs to the field of communications. In massive MIMO-OFDM systems, conventional passive and active monitoring schemes become inefficient or even ineffective when the listener E and the suspicious receiver D are not covered by the same communication beam. To achieve lawful monitoring of a massive MIMO-OFDM system, the listener acts as a pseudo-relay that performs beam induction and data monitoring. While the transmitter S performs beam sweeping, the listener E induces the transmitter to select a beam favorable to monitoring by optimizing the relay precoding matrix. In the data monitoring stage, the listener E increases the monitoring rate by optimizing the relay power allocation factor and the power gain factor. Because the channel state information of the suspicious communication link is unknown, the optimal precoding matrix and power allocation factor are searched for with the deep reinforcement learning algorithm MADDPG. Computer simulations verify the effectiveness of the proposed design.

Description

Active monitoring method based on deep reinforcement learning

Technical Field

The invention belongs to the field of communications. It relates to an active monitoring method based on deep reinforcement learning, and more specifically to an active monitoring method, based on deep reinforcement learning, for massive MIMO-OFDM (Multiple Input Multiple Output-Orthogonal Frequency Division Multiplexing) systems.

Background

MIMO-OFDM is regarded as a key technology for fifth-generation (5G) mobile networks. However, when advanced beamforming is used at 5G base stations, the directional narrow beams make traditional monitoring methods inefficient or even ineffective. To achieve lawful monitoring of suspicious links, it is therefore essential to study monitoring schemes for narrow-beam scenarios.

The existing literature on monitoring falls into three categories: passive monitoring, jamming-based active monitoring, and spoofing-relay-based active monitoring. In passive monitoring, the listener stays silent and only listens to the data sent by the transmitter. This approach works only when the monitoring channel is better than the suspicious channel. To overcome this drawback, jamming-based active monitoring was introduced: the listener sends a jamming signal toward the suspicious receiver, forcing the transmitter to lower its rate so that the listener can decode the information. To make active monitoring more flexible, a method called spoofing relay was proposed. When the monitoring channel is better than the suspicious channel, this method maximizes the monitoring rate by disguising the listener as a relay. However, when the transmitter sends information to the suspicious user over a directional beam, none of these schemes allows a listener outside the beam coverage to monitor successfully.

Summary of the Invention

Purpose of the invention: In view of the above shortcomings of the prior art, the invention studies how a listener outside the beam coverage in a MIMO-OFDM system can successfully monitor a suspicious communication link. It proposes an active monitoring method for massive MIMO-OFDM systems based on deep reinforcement learning, so that the communication data can still be monitored successfully even when the transmitter communicates with the suspicious receiver over a narrow beam.

Technical solution: an active monitoring method based on deep reinforcement learning, comprising the following steps:

(1) Transmitter S performs analog beam sweeping in a time-division manner according to a predefined beam codebook;

(2) While transmitter S performs beam sweeping, listener E determines the optimal beam index $j^*$ that is favorable to itself, based on its own beam quality report and on the beam report that receiver D feeds back to transmitter S;

(3) Listener E induces transmitter S to select the optimal beam index $j^*$ by optimizing its forwarding precoding matrix;

(4) In the communication stage after beam $j^*$ has been determined, listener E acts as a pseudo-relay that forwards data, maintaining the communication beam and increasing the data monitoring rate.

Further, step (2) comprises the following steps:

1) Receiver D and listener E each receive the beam quality measurement reference signal sent by transmitter S and compute the beam quality from the received signal; receiver D forms a beam quality report and feeds it back to transmitter S as a reference for beam selection;

2) Listener E combines its own beam quality report with the beam quality report that receiver D feeds back to transmitter S, which E obtains by monitoring the feedback channel. Taking power consumption into account as well, it determines the optimal beam index $j^*$ according to a formula that trades off beam induction success rate against power consumption.

Step (3) comprises the following steps:

㈠ The listener formulates an optimization problem: minimize the listener's total transmit power subject to successful beam induction. The form of the optimal precoding matrix is derived from this problem, showing that the optimal precoding matrix depends on the channel state information of transmitter S and receiver D;

㈡ Listener E uses the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm to train a first fitting network that determines the transmission parameters of a first forwarding matrix. It then uses the first forwarding matrix defined by these parameters to forward the beam quality measurement reference signal to receiver D, inducing receiver D to send an erroneous beam measurement report so that transmitter S selects a beam favorable to listener E.

Step (4) comprises the following steps:

i Listener E receives the data transmitted by transmitter S and formulates an optimization problem: maximize the data monitoring rate subject to successful monitoring and to the transmit power not exceeding the forwarding power limit;

ii The listener uses the MADDPG algorithm to train a second fitting network that determines the power allocation factor and the power gain factor, so that part of the power is used for decoding and part for forwarding the signal. It then uses a second forwarding matrix to forward the communication data to receiver D, maintaining the communication beam and increasing the data monitoring rate.

Further, step ㈡ comprises the following steps:

① Model the beam induction problem as a first multi-agent cooperative MDP (Markov Decision Process) problem;

② Based on the form of the optimal precoding matrix, convert the search for the optimal precoding matrix into the search for a pair of constants, which speeds up training. At a given time, the action on a single subcarrier is the angle and amplitude of the precoding matrix, so the action over all subcarriers is the set of the single-subcarrier actions;

③ At a given time, the state on a single subcarrier consists of the beam report information obtained by monitoring and analyzing the feedback channel, plus the known channel information; the global state is the union of the non-overlapping information in all subcarrier states;

④ The reward function at a given time is designed to encourage successful beam induction while penalizing excessive energy consumption.
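The reward described in step ④ can be illustrated with a minimal sketch. The text only says that the reward encourages successful induction and penalizes excessive energy use; the additive form, the names `success_bonus` and `power_weight`, and the rule "induction succeeds when the beam reported as best equals the target beam" are assumptions made for this example.

```python
def induction_reward(reported_best_beam, target_beam, relay_power,
                     success_bonus=1.0, power_weight=0.1):
    """Hypothetical per-step reward for the beam-induction stage.

    Adds a bonus when the beam that D reports as best equals the beam E wants,
    and subtracts a penalty proportional to the power E spent on relaying.
    """
    success = reported_best_beam == target_beam
    return (success_bonus if success else 0.0) - power_weight * relay_power

r_hit = induction_reward(reported_best_beam=3, target_beam=3, relay_power=2.0)
r_miss = induction_reward(reported_best_beam=1, target_beam=3, relay_power=2.0)
```

With these illustrative weights, a successful step is rewarded more than a failed one at equal power, and among successful steps the cheaper one scores higher, which matches the stated goal of minimizing the listener's transmit power subject to successful induction.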

Further, step ii comprises the following steps:

I Model the data monitoring problem as a second multi-agent cooperative MDP problem;

II Based on the form of the optimal precoding matrix, convert the search for the optimal precoding matrix into the search for a pair of constants, which speeds up training. At a given time, the action of a single subcarrier is the power gain factor and the power allocation factor, so the action over all subcarriers is the set of the single-subcarrier actions;

III At a given time, the state on a single subcarrier consists of the signal-to-interference-plus-noise ratio obtained through monitoring and from the feedback channel information, plus the known channel information; the global state is the union of the non-overlapping information in all subcarrier states;

IV The reward at a given time is designed to encourage the subcarriers to maximize the monitoring rate under the constraints of successful monitoring and the power limit.
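A minimal sketch of a reward in the spirit of step IV follows. The patent text does not give the formula here, so several points are assumptions: monitoring on a subcarrier is taken to succeed when the SNR at E is at least the SNR at D, the monitoring rate on a successful subcarrier is taken as log2(1 + SNR at D), and each violated constraint costs a fixed `penalty`. The function name and its arguments are hypothetical.

```python
import numpy as np

def listening_reward(snr_e, snr_d, tx_power, p_max, penalty=1.0):
    """Hypothetical reward for the data-monitoring stage.

    snr_e, snr_d: per-subcarrier SNRs (linear scale) at listener E and receiver D.
    tx_power, p_max: relay transmit power and its upper limit.
    """
    snr_e = np.asarray(snr_e, dtype=float)
    snr_d = np.asarray(snr_d, dtype=float)
    success = snr_e >= snr_d                       # assumed success criterion
    rate = np.where(success, np.log2(1.0 + snr_d), 0.0)
    violations = np.count_nonzero(~success) + int(tx_power > p_max)
    return float(rate.mean() - penalty * violations)

r_partial = listening_reward([3.0, 1.0], [1.0, 3.0], tx_power=1.0, p_max=2.0)
r_full = listening_reward([3.0, 3.0], [1.0, 3.0], tx_power=1.0, p_max=2.0)
```

Treating the constraints as penalties rather than hard limits is a common way to let a policy-gradient learner explore infeasible actions early in training; whether the patent does so is not stated in this section.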

Beneficial effects: The invention is suited to monitoring suspicious communication links in narrow-beam massive MIMO-OFDM systems. While the transmitter determines its beam through beam sweeping, the listener achieves beam induction by optimizing the precoding matrix. While the transmitter transmits data, the monitoring rate is maximized by optimizing the power allocation factor and the power gain factor. Because it is difficult for the listener to obtain the channel information between the suspicious nodes, the invention proposes a MADDPG-based learning scheme to assist beam induction and data monitoring. The proposed method not only effectively induces transmitter S to select a beam favorable to listener E, laying the foundation for the subsequent data monitoring, but also lets listener E readjust the power splitting factor and the power gain factor, effectively maintaining the communication link and increasing the data monitoring rate.

Brief Description of the Drawings

Fig. 1 is a diagram of the active monitoring model in the massive MIMO-OFDM system of the invention;

Fig. 2 shows the role of listener E in the different transmission stages of transmitter S (BS and DT abbreviate the beam sweeping and data transmission stages);

Fig. 3 is a structural diagram of the transceiver of listener E;

Fig. 4 shows the beam induction success rate versus transmit power for different $N_{te}$ configurations;

Fig. 5 shows the transmit power of listener E versus the transmit power of transmitter S for different $N_{te}$ configurations;

Fig. 6 shows the average monitoring rate for different values of $P_S$ and $N_{ts}$;

Fig. 7 shows the average monitoring rate of various monitoring methods.

Detailed Description

Building on traditional pseudo-relay monitoring, the invention proposes a monitoring method for massive MIMO-OFDM systems in which a legitimate full-duplex relay performs beam induction and data monitoring. The invention assumes that the suspicious transmitter uses analog beamforming and selects the best beam vector through beam sweeping. Beam induction is carried out during the beam sweeping stage. Its purpose is to induce the suspicious receiver to select a beam that favors the listener. To this end, the listener acts as a relay that amplifies the measurement reference signal of the desired beam and forwards it to the suspicious receiver. In this stage, the goal is to minimize the listener's total transmit power, subject to successful beam induction, by optimizing the listener's precoding matrix. A closed-form expression for the optimal precoding matrix is obtained by mathematical derivation; it depends on the CSI (Channel State Information) of the suspicious communication pair. When the listener does not know the CSI between them, the invention uses the DRL (Deep Reinforcement Learning) algorithm MADDPG (Multi-Agent Deep Deterministic Policy Gradient) to determine the transmission parameters of all subcarriers. Once beam induction has been achieved, the listener can monitor the data and increase the monitoring rate by continuing to act as a pseudo-relay. In this stage, the power splitting factor and the power gain factor are optimized to maximize the monitoring rate. Likewise, because the listener does not know the CSI of the suspicious communication pair, the invention again uses MADDPG to optimize the listener's relay parameters.

Embodiments of the invention are described in detail below with reference to the drawings.

The application scenario is shown in Fig. 1. The invention considers a legitimate monitoring system consisting of a pair of suspicious communication nodes (transmitter S and receiver D) and a legitimate listener E. Transmitter S and receiver D are equipped with $N_{ts}$ transmit antennas and $N_{rd}$ receive antennas, respectively. Both are assumed to use massive MIMO-OFDM arrays with analog beamforming to transmit and receive information. The analog beams are selected from a predefined discrete codebook; the codebook of transmitter S is denoted by [Figure BDA0003567487890000051]. Listener E acts as a full-duplex pseudo-relay: it receives the signal from transmitter S through $N_{re}$ antennas and simultaneously forwards it to receiver D through $N_{te}$ antennas. To improve monitoring quality, listener E applies digital beamforming on every subcarrier. All channels in the system are assumed to remain constant within each RB (Resource Block) but may vary from one RB to the next according to a Markov model.
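To make the idea of a predefined discrete codebook of analog beams concrete, here is a minimal sketch. The patent's own codebook is given only as an image, so the DFT construction below is an assumption chosen because its entries have unit modulus, which is the property an analog (phase-shifter) beamformer needs. The function name `dft_codebook` is hypothetical.

```python
import numpy as np

def dft_codebook(n_ts: int, n_beams: int) -> np.ndarray:
    """Return an (n_ts, n_beams) matrix whose columns are unit-modulus beams."""
    n = np.arange(n_ts)[:, None]        # antenna index
    b = np.arange(n_beams)[None, :]     # beam index
    return np.exp(-2j * np.pi * n * b / n_beams)

F = dft_codebook(n_ts=16, n_beams=16)

# Time-division sweep: the transmitter uses one column per sweep slot.
sweep_order = [F[:, b] for b in range(F.shape[1])]
```

With as many beams as antennas, the columns of this codebook are mutually orthogonal, so each sweep slot probes a distinct spatial direction.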

In the scheme of the invention, as shown in Fig. 2, each transmission block of transmitter S is divided into two stages: the BS (Beam Sweeping) stage and the DT (Data Transmission) stage. For listener E, the monitoring process comprises three stages: beam selection, beam induction, and spoofed data forwarding. In the beam selection stage, the listener obtains beam quality information by monitoring the feedback channel. Specifically, when transmitter S transmits with the beamforming vector [Figure BDA0003567487890000052], the signals received by receiver D and listener E on the $k$-th subcarrier can be expressed as

Figure BDA0003567487890000053

and

Figure BDA0003567487890000054

where $s_k$ is the signal sent by the transmitter, with [Figure BDA0003567487890000055], and [Figure BDA0003567487890000056] denotes expectation; $f_j$ is the beamforming vector with $|f_j(n)| = 1$, $n = 1, \ldots, N_{ts}$; $j$ is the beam index; [Figure BDA0003567487890000057] and [Figure BDA0003567487890000058] give the transmit power on the $k$-th subcarrier, the channel matrix between transmitter S and receiver D, and the channel matrix between transmitter S and listener E; and [Figure BDA0003567487890000061] denotes the matrix dimension. [Figure BDA0003567487890000062] and [Figure BDA0003567487890000063] are zero-mean additive white Gaussian noise with covariance matrix $\sigma^2 I$. At receiver D, the received signal is processed with the analog beamformer [Figure BDA0003567487890000064] [Figure BDA0003567487890000065], which satisfies $\|v_D\|^2 = N_{rd}$, where $\|\cdot\|$ denotes the norm of a vector or the Frobenius norm of a matrix. At listener E, the received signal on the $k$-th subcarrier is processed with the digital beamformer [Figure BDA0003567487890000066]. In the BS stage, the SNR (Signal to Noise Ratio) on the $k$-th subcarrier at receiver D and at listener E is

Figure BDA0003567487890000067

and

Figure BDA0003567487890000068
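The SNR expressions are available here only as image placeholders. As a minimal sketch, assume the usual single-stream form: for a combiner $v$, channel $H_k$, beam $f_j$, transmit power $P_k$ and noise variance $\sigma^2$, the post-combining SNR is $P_k |v^H H_k f_j|^2 / (\sigma^2 \|v\|^2)$. The function name and array layout below are hypothetical, and the patent's exact expressions may differ.

```python
import numpy as np

def sweep_snr(H, F, v, p, sigma2):
    """SNR per subcarrier and beam under the assumed single-stream model.

    H: (K, N_r, N_ts) channels, F: (N_ts, B) codebook, v: (N_r,) combiner,
    p: (K,) transmit powers, sigma2: noise variance. Returns a (K, B) array.
    """
    gain = np.einsum("r,krt,tb->kb", v.conj(), H, F)   # v^H H_k f_j
    noise = sigma2 * np.linalg.norm(v) ** 2            # variance of v^H n
    return p[:, None] * np.abs(gain) ** 2 / noise

# Toy case: one subcarrier, one receive antenna, two transmit antennas.
H = np.array([[[1.0, 1.0]]], dtype=complex)
F = np.array([[1.0, 1.0], [1.0, -1.0]], dtype=complex)  # two unit-modulus beams
snr = sweep_snr(H, F, v=np.array([1.0 + 0j]), p=np.array([2.0]), sigma2=0.5)
avg_snr = snr.mean(axis=0)                              # average over subcarriers
```

In the toy case the first beam aligns with the channel and the second cancels it, which is the kind of contrast between beams that the sweep is meant to reveal.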

Receiver D computes the average SNR over all subcarriers, [Figure BDA0003567487890000069], where $K$ is the number of subcarriers. From [Figure BDA00035674878900000610] it selects the $J$ beams with the largest [Figure BDA00035674878900000611] as candidate beams, and then feeds [Figure BDA00035674878900000612] and the corresponding indices back to S, where $J$ is the maximum number of beams that can be fed back. The invention assumes that listener E can obtain this feedback by monitoring the feedback channel between transmitter S and receiver D. When the beam selected by transmitter S results in a low [Figure BDA00035674878900000613], it is difficult for listener E to monitor the information transmitted by transmitter S; listener E therefore acts as a pseudo-relay to induce the beam selection of transmitter S. For listener E, the ideal beam should provide a high SNR to both listener E and receiver D, because a beam with a low [Figure BDA00035674878900000614] consumes more of listener E's forwarding power. Listener E therefore determines the desired optimal beam index according to

Figure BDA00035674878900000615

where $\delta$ is a trade-off factor that balances the listener's monitoring success rate against its power consumption.
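The feedback and selection logic above can be sketched as follows. The selection formula itself is only available as an image, so the weighted sum used in `pick_target_beam` is an assumption, as is the restriction to the $J$ beams that D reported (the only beams for which E can learn D's SNR from the feedback channel). All function and variable names are hypothetical.

```python
import numpy as np

def top_j_report(avg_snr_d, J):
    """Indices and values of the J beams with the largest average SNR at D."""
    idx = np.argsort(avg_snr_d)[::-1][:J]
    return idx, avg_snr_d[idx]

def pick_target_beam(idx, snr_d_reported, avg_snr_e, delta):
    """Assumed rule: weighted sum of E's own SNR and D's reported SNR."""
    score = delta * avg_snr_e[idx] + (1.0 - delta) * snr_d_reported
    return int(idx[np.argmax(score)])

avg_snr_d = np.array([1.0, 9.0, 5.0, 7.0])    # measured by D, per beam
avg_snr_e = np.array([8.0, 0.5, 6.0, 2.0])    # measured by E, per beam
idx, reported = top_j_report(avg_snr_d, J=3)
j_star = pick_target_beam(idx, reported, avg_snr_e, delta=0.7)
```

In this toy example a large $\delta$ favors the beam that E hears well, while $\delta = 0$ reduces to D's own preferred beam, which would need no relay power to induce.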

After determining the desired optimal beam index $j^*$, listener E induces transmitter S to select $j^*$ during the next BS stage. In the DT stage, transmitter S transmits communication data to receiver D, and listener E acts as an AF (Amplify and Forward) spoofing relay that monitors the data while forwarding it.

As shown in Fig. 3, [Figure BDA00035674878900000616], $\alpha_k$ and $g_k$ denote, respectively, the receive beamforming vector, the transmit beamforming vector, the power allocation factor, and the power gain factor of listener E on subcarrier $k$. In the beam induction stage, the received signal is amplified and forwarded with $\alpha_k = 1$; that is, no information needs to be decoded. In the data forwarding stage, the received signal power is split into two parts, one for decoding and one for forwarding. The invention analyzes how to optimize [Figure BDA0003567487890000071] to achieve the maximum monitoring rate.
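The role of $\alpha_k$ and $g_k$ can be illustrated with a minimal sketch. It assumes that $\alpha_k$ is the fraction of received power sent to the forwarding branch (consistent with $\alpha_k = 1$ meaning "no decoding") and that $g_k$ is a power gain, so amplitudes scale with its square root. Beamforming is omitted, and the function name is hypothetical.

```python
import numpy as np

def split_and_forward(y, alpha, g):
    """Split received samples y by power.

    A fraction alpha of the power is forwarded with power gain g; the
    remaining fraction 1 - alpha is passed to the decoder.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    forwarded = np.sqrt(alpha * g) * y
    decoded = np.sqrt(1.0 - alpha) * y
    return forwarded, decoded

def mean_power(x):
    return float(np.mean(np.abs(x) ** 2))

y = np.array([2.0, 2.0j, -2.0, -2.0j])         # mean power 4
fwd, dec = split_and_forward(y, alpha=0.25, g=8.0)
fwd_bs, dec_bs = split_and_forward(y, alpha=1.0, g=8.0)   # beam-induction setting
```

The split exposes the trade-off that the learning agent has to resolve: a larger $\alpha_k$ strengthens the relayed signal that keeps the induced beam attractive to D, while a smaller $\alpha_k$ leaves more power for E's own decoding.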

During the beam sweep of transmitter S, listener E amplifies and forwards the pilot signal that it receives from transmitter S, which S sends to measure the beam quality between transmitter S and receiver D. The invention assumes that the delay of the AF relay used by the listener is much smaller than the symbol duration and can therefore be ignored. Because listener E operates in full duplex, its received signal on subcarrier $k$ is

Figure BDA0003567487890000072

where [Figure BDA0003567487890000073] is the self-interference channel, [Figure BDA0003567487890000074] is the precoding matrix of listener E, and [Figure BDA0003567487890000075] is the signal received by the listener at the previous time instant. From (6) it can be seen that if $W_k$ lies in the null space of [Figure BDA0003567487890000076], then [Figure BDA0003567487890000077], so that in (6) [Figure BDA0003567487890000078]. Let [Figure BDA0003567487890000079] [Figure BDA00035674878900000710] be the right singular matrix corresponding to the zero singular values of [Figure BDA00035674878900000711]; the precoding matrix can then be written as

Figure BDA00035674878900000712

where
Figure BDA00035674878900000713
is the new matrix to be optimized. To guarantee r_0 > 0, the invention requires N_te > N_re, i.e. listener E needs more transmit antennas than receive antennas in order to suppress self-interference. After self-interference is eliminated, the transmitted signal
Figure BDA00035674878900000714
can be expressed as
Figure BDA00035674878900000715
The transmit power of listener E,
Figure BDA00035674878900000716
is
Figure BDA00035674878900000717
The signal received by receiver D after receive beamforming can be expressed as

Figure BDA00035674878900000718 (8)

where
Figure BDA00035674878900000719
is the channel matrix between listener E and receiver D on the k-th subcarrier, and
Figure BDA00035674878900000720
and
Figure BDA00035674878900000721
are the newly constructed equivalent channels. The received signal-to-noise ratio (SNR) of receiver D on the k-th subcarrier can then be expressed as

Figure BDA00035674878900000722 (9)
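
The self-interference cancellation described in this passage (choosing the precoder in the null space of the self-interference channel, which requires N_te > N_re) can be illustrated with a minimal sketch. It assumes the smallest possible configuration, N_re = 1 receive antenna and N_te = 2 transmit antennas, so the self-interference channel is a 1x2 row vector whose null space is spanned by a single vector. The function names and channel values are hypothetical and are not taken from the patent.

```python
# Minimal sketch (assumptions: N_re = 1, N_te = 2, hypothetical channel values).
# For a 1x2 self-interference channel h = [h1, h2], the vector v = [h2, -h1]
# satisfies h1*v1 + h2*v2 = 0, so a precoder built from v sends no signal
# back into the listener's own receive antenna.

def null_space_vector(h):
    """Return a unit-norm vector v with h[0]*v[0] + h[1]*v[1] == 0."""
    h1, h2 = h
    norm = (abs(h1) ** 2 + abs(h2) ** 2) ** 0.5
    if norm == 0:
        raise ValueError("self-interference channel must be non-zero")
    return [h2 / norm, -h1 / norm]


def self_interference(h, v, symbol):
    """Signal leaking into the receive antenna when `symbol` is sent along v."""
    return (h[0] * v[0] + h[1] * v[1]) * symbol


h_si = [0.8 + 0.3j, -0.2 + 0.5j]   # hypothetical self-interference channel
v = null_space_vector(h_si)
leak = self_interference(h_si, v, symbol=1.0 + 1.0j)
print(abs(leak))                   # numerically zero
```

With a single transmit antenna (N_te = N_re = 1) and a non-zero channel, the only vector satisfying the null-space condition is the zero vector, which is consistent with the requirement that the listener have more transmit than receive antennas.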

To induce transmitter S to select the optimal beam index j* with minimum transmit power, the problem can be formulated as

Figure BDA0003567487890000081 (10)

where
Figure BDA0003567487890000082
and
Figure BDA0003567487890000083
To obtain a closed-form solution for
Figure BDA0003567487890000084
the invention decomposes problem (10) into K independent subproblems. Let
Figure BDA0003567487890000085
where
Figure BDA0003567487890000086
Figure BDA0003567487890000087
The K subproblems can therefore be formulated as

Figure BDA0003567487890000088 (11)

To solve (11), the invention first proves a lemma: the optimal solution of problem (11) can be expressed as

Figure BDA0003567487890000089 (12)

where
Figure BDA00035674878900000810
The lemma is proved as follows.

For brevity, the subcarrier index k is omitted in the following proof. To prove the lemma, assume that a feasible solution of (11) is
Figure BDA00035674878900000811
where
Figure BDA00035674878900000812
and w is the amplitude parameter of the feasible solution. The power consumption P(W′) of this feasible solution is
Figure BDA00035674878900000813
The invention then constructs the matrix
Figure BDA00035674878900000814
where
Figure BDA00035674878900000815
in which the inequality follows from the fact that ||AB|| ≤ ||A|| ||B|| for arbitrary matrices (or vectors) A and B. It is shown below that the new matrix W″ is not only feasible for problem (11) but also achieves an objective value smaller than P(W′). Let
Figure BDA00035674878900000816
Substituting W″ into the numerator and the denominator of β_k in (9) gives

Figure BDA00035674878900000817 (13)

Figure BDA00035674878900000818 (14)

where (13) and (14) follow from the triangle inequality. From (13) and (14), it follows that β(W″) ≥ β(W′) ≥ β_D, which shows that W″ is feasible for problem (11). Substituting W″ into the objective function of (11) gives

Figure BDA0003567487890000091 (15)

where (15) follows from the Cauchy-Schwarz inequality
Figure BDA0003567487890000092
In summary, for any solution W′ of problem (11), one can always construct another
Figure BDA0003567487890000093
that achieves a smaller objective value, which proves the lemma.

Substituting (12) into the objective function of (11) shows that
Figure BDA0003567487890000094
is an increasing function of w_k. Therefore, the only unknown variable w_k in (12) can be found by gradually increasing w_k from a small value until the constraint in (11) is satisfied. Since the lemma holds for any given
Figure BDA0003567487890000095
the optimal solution of (10) has the same form as (12). In theory, the optimal solution of (10) can be obtained by considering all possible combinations of
Figure BDA0003567487890000096
For a given
Figure BDA0003567487890000097
the solution of (11) provides an upper bound on the solution of (10). Listener E can use the precoding matrix in (12) only if it knows all of the channels
Figure BDA0003567487890000098
The invention assumes that listener E can obtain the equivalent channel vector
Figure BDA0003567487890000099
by monitoring the pilot signals. However, because of the non-cooperative relationship between transmitter S and listener E, it is difficult to obtain
Figure BDA00035674878900000910
Therefore, the invention uses DRL to adjust
Figure BDA00035674878900000911
Figure BDA00035674878900000912
according to the feedback β and β_D so as to minimize P_E. By adopting the MADDPG learning framework,
Figure BDA00035674878900000913
is determined in real time to induce transmitter S to select the beam desired by listener E. Finally, W_k in (7) can be expressed as the product of the column vector
Figure BDA00035674878900000914
and the row vector
Figure BDA00035674878900000915
as shown in Figure 3.
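
The search for w_k described in this passage (start from a small value and increase it until the constraint in (11) holds) can be sketched as follows. The function `snr_of_w` is a hypothetical stand-in for β_k evaluated with a precoder of the form (12); the patent's actual expression is given by (9) and is not reproduced here. The sketch assumes the SNR is non-decreasing in w over the search range, and it refines the coarse search by bisection, which is an implementation choice rather than something the patent specifies.

```python
def smallest_feasible_gain(snr_of_w, target_snr, w_start=1e-3, growth=2.0,
                           w_max=1e6, tol=1e-9):
    """Return (approximately) the smallest w with snr_of_w(w) >= target_snr.

    Assumes snr_of_w is non-decreasing in w on [w_start, w_max].
    Returns None if the target cannot be met below w_max.
    """
    lo, hi = 0.0, w_start
    # Coarse phase: grow w geometrically until the constraint is satisfied.
    while snr_of_w(hi) < target_snr:
        lo, hi = hi, hi * growth
        if hi > w_max:
            return None
    # Fine phase: bisection between the last infeasible and first feasible w.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if snr_of_w(mid) >= target_snr:
            hi = mid
        else:
            lo = mid
    return hi


def toy_snr(w):
    """Hypothetical SNR model: direct-path SNR 1.0 plus a relayed term 2*w^2."""
    return 1.0 + 2.0 * w ** 2


w_star = smallest_feasible_gain(toy_snr, target_snr=9.0)
print(round(w_star, 6))
```

Because the transmit power grows with w_k, the smallest feasible w_k is also the one that minimizes the listener's power on that subcarrier.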

Successful beam induction does not guarantee that monitoring will succeed. During the data transmission phase, if listener E does not forward the data to receiver D, the bit error rate at receiver D may exceed the threshold and trigger the beam recovery procedure, which switches the beam. Therefore, to let listener E both relay and monitor the data under AF relay operation, the received signal
Figure BDA00035674878900000916
is split into two parts: one part is forwarded to increase the SNR of receiver D, and the other part is decoded to monitor the message sent by transmitter S. Because of the introduction of α_k, the power gain factor w_k in
Figure BDA00035674878900000917
needs to be re-optimized. Define
Figure BDA00035674878900000918
as the normalized beamforming vector; the transmitted signal of listener E,
Figure BDA00035674878900000919
is then expressed as

Figure BDA00035674878900000920 (16)

where g_k is the power gain factor, which controls the transmit power in the data monitoring stage, and α_k is the power allocation factor. Note that
Figure BDA0003567487890000101
and
Figure BDA0003567487890000102
are the same as in the beam induction stage, because in both stages the invention aims to improve the SNR of receiver D. Similar to (8), the signal received by receiver D in the data transmission stage can be written as

Figure BDA0003567487890000103 (17)
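
The power splitting controlled by α_k and g_k can be sketched as below. This is only an illustration under an assumption inferred from the text: since α_k = 1 corresponds to pure forwarding with no decoding, α_k is taken to be the fraction of received power that is forwarded and 1 - α_k the fraction kept for decoding. The function name and the sample value are hypothetical.

```python
import math


def split_received_signal(y, alpha, gain):
    """Split a received sample y into a forwarded part and a decoding part.

    Assumption (inferred, not stated as a formula in the text): alpha in
    [0, 1] is the fraction of received power that is forwarded, and
    1 - alpha is the fraction kept for decoding. `gain` plays the role of g_k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    forwarded = gain * math.sqrt(alpha) * y   # sent on to receiver D
    decoded = math.sqrt(1.0 - alpha) * y      # kept by listener E
    return forwarded, decoded


y = 0.6 - 0.8j                                # hypothetical sample, |y|^2 = 1
fwd, dec = split_received_signal(y, alpha=0.75, gain=2.0)
print(abs(fwd) ** 2, abs(dec) ** 2)           # forwarded and decoding power
```

Setting alpha to 1 reproduces the beam induction case, in which the whole received signal is forwarded and nothing is left for decoding.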

For given
Figure BDA0003567487890000104
and
Figure BDA0003567487890000105
the received SNRs of receiver D and listener E can be calculated as
Figure BDA0003567487890000106
and
Figure BDA0003567487890000107
The goal of listener E is then to optimize
Figure BDA0003567487890000108
so that the monitoring rate is maximized under the transmit power constraint. The optimization problem can therefore be expressed as

Figure BDA0003567487890000109 (18)

where
Figure BDA00035674878900001010
Figure BDA00035674878900001011
and P_M are the total transmit power and the power constraint of listener E, respectively. The invention assumes that listener E can monitor successfully only when R_E ≥ R_D, in which case the monitoring rate is R_D. If listener E knows the global CSI, the solution of (18) can be derived with the Lagrange multiplier method. However, when
Figure BDA00035674878900001012
is unknown, the optimal
Figure BDA00035674878900001013
cannot be obtained. A more reasonable assumption than knowing
Figure BDA00035674878900001014
is that
Figure BDA00035674878900001015
can be obtained by monitoring the uplink control channel between transmitter S and receiver D. Therefore, DRL is used with
Figure BDA00035674878900001016
as the observed state, interacting with the system to determine
Figure BDA00035674878900001017
By training neural networks with MADDPG,
Figure BDA00035674878900001018
is produced in real time, which improves the monitoring rate of listener E under a controllable transmit power.
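
The success condition stated in this passage (monitoring works only when R_E ≥ R_D, and the monitoring rate then equals R_D) can be sketched as follows. The rate expression used here, a sum of log2(1 + SNR) over subcarriers, is an assumption made for illustration; the patent's own rate definitions are contained in formulas that are not reproduced in this text. The SNR values are hypothetical.

```python
import math


def sum_rate(snrs):
    """Sum of log2(1 + SNR) over subcarriers (linear SNR values, assumed model)."""
    return sum(math.log2(1.0 + s) for s in snrs)


def monitoring_rate(snr_d, snr_e):
    """Return R_D if the listener's rate R_E is at least R_D, else 0."""
    r_d = sum_rate(snr_d)
    r_e = sum_rate(snr_e)
    return r_d if r_e >= r_d else 0.0


# Hypothetical per-subcarrier SNRs at receiver D and listener E.
print(monitoring_rate(snr_d=[1.0, 3.0], snr_e=[3.0, 7.0]))
print(monitoring_rate(snr_d=[3.0, 7.0], snr_e=[1.0, 3.0]))
```

The first call represents a case where the listener can decode everything D receives; in the second, R_E < R_D and monitoring fails.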

Based on the above analysis, when the CSI between transmitter S and receiver D is unknown, the beam induction and data monitoring problems are formulated as MDP (Markov Decision Process) problems. Treating all subcarriers as a single agent and learning the policy with the Actor-Critic networks of a single DDPG is the most intuitive deep learning solution. In practice, however, training one policy with a large action space is usually harder than training several policies with small action spaces. In both stages, the invention therefore treats each subcarrier as a separate agent, and the agents cooperate to achieve a common goal. Accordingly, the invention adopts the MADDPG learning architecture, which comprises K Actors (policies) and one centralized Critic (value function). During training, the Actors and the Critic are updated with global data, including the global state, the shared reward and all actions, which are defined below.

The beam induction problem is modeled as a first multi-agent cooperative MDP. Based on the form of the optimal precoding matrix, the search for the optimal precoding matrix is reduced to the search for a pair of constants (w_k, θ_k), where θ_k is the MADDPG algorithm's estimate of
Figure BDA0003567487890000111
which speeds up training. At time t, the action of the k-th subcarrier is denoted by
Figure BDA0003567487890000112
The joint action of all subcarriers is therefore
Figure BDA0003567487890000113
At time t, the state on each subcarrier k is
Figure BDA0003567487890000114
where β and β_D are obtained by monitoring and analyzing the beam reports on the feedback channel. The global state s_t is the union of the non-overlapping information in all subcarrier states
Figure BDA0003567487890000115
i.e.
Figure BDA0003567487890000116
The reward r_t at time t is defined as r_t = -a_1 P_E - a_2 (β - β_D - B)^2 + a_3 I(β, β_D), where
Figure BDA0003567487890000117
are positive coefficients that balance the listener's induction success rate against its power consumption, B is a constant used to increase the probability that the optimal beam index j* is selected, and I(x, y) is a Boolean function with I(x, y) = 1 when x ≥ y and I(x, y) = 0 otherwise. The reward function encourages successful beam induction while penalizing excessive energy consumption.
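
The beam induction reward r_t = -a_1 P_E - a_2 (β - β_D - B)^2 + a_3 I(β, β_D) translates directly into code. The sketch below follows that formula; the coefficient values and inputs in the usage lines are hypothetical, chosen only to show how the three terms combine.

```python
def induction_reward(p_e, beta, beta_d, a1, a2, a3, b):
    """Reward for the beam induction stage.

    p_e    : total transmit power of the listener (P_E)
    beta   : SNR achieved on the desired beam
    beta_d : SNR threshold that must be met for successful induction
    a1..a3 : positive weighting coefficients
    b      : margin constant B
    """
    indicator = 1.0 if beta >= beta_d else 0.0   # I(beta, beta_d)
    return -a1 * p_e - a2 * (beta - beta_d - b) ** 2 + a3 * indicator


# Hypothetical values: induction succeeds with exactly the margin B.
print(induction_reward(p_e=2.0, beta=5.0, beta_d=4.0,
                       a1=0.1, a2=1.0, a3=10.0, b=1.0))
# Hypothetical values: induction fails, so the bonus term vanishes.
print(induction_reward(p_e=2.0, beta=3.0, beta_d=4.0,
                       a1=0.1, a2=1.0, a3=10.0, b=1.0))
```

The quadratic term pulls β toward β_D + B rather than rewarding arbitrarily large SNR, which discourages spending more power than successful induction needs.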

The data monitoring problem is modeled as a second multi-agent cooperative MDP. Based on the form of the optimal precoding matrix, the search for the optimal precoding matrix is reduced to the search for a pair of constants (g_k, α_k), where g_k and α_k denote the listener's power gain factor and power allocation ratio on subcarrier k, respectively, which speeds up training. At time t, the action of the k-th subcarrier is
Figure BDA0003567487890000118
The joint action of all subcarriers is therefore
Figure BDA00035674878900001117
At time t, the state on each subcarrier k is
Figure BDA0003567487890000119
Figure BDA00035674878900001110
where
Figure BDA00035674878900001111
is obtained by monitoring and from the feedback channel information. The global state s_t is the union of the non-overlapping information in all subcarrier states
Figure BDA00035674878900001112
i.e.
Figure BDA00035674878900001113
The reward r_t at time t is defined as
Figure BDA00035674878900001114
where
Figure BDA00035674878900001115
are positive coefficients that balance the relay's monitoring rate against its power consumption, and C is a constant used to raise the monitoring rate. The reward function encourages the subcarriers to maximize R_D under the constraints R_E > R_D and P_E ≤ P_M.
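
The inputs of the centralized Critic, namely the joint action of all subcarriers and a global state formed as the union of non-overlapping per-subcarrier information, can be assembled as in the sketch below. The dictionary keys (`sinr_0`, `p_max`, and so on) and the numeric values are hypothetical; the patent defines the actual state contents in formulas that are not reproduced here.

```python
def joint_action(per_subcarrier_actions):
    """Flatten per-subcarrier actions (g_k, alpha_k) into one joint action."""
    flat = []
    for g_k, alpha_k in per_subcarrier_actions:
        flat.extend([g_k, alpha_k])
    return flat


def global_state(per_subcarrier_states):
    """Union of the agents' states, keeping each named quantity only once."""
    merged = {}
    for state in per_subcarrier_states:
        for name, value in state.items():
            merged.setdefault(name, value)   # shared entries are not duplicated
    return merged


# Two hypothetical agents (K = 2) that share the power limit `p_max`.
actions = [(1.5, 0.7), (2.0, 0.6)]
states = [{"sinr_0": 3.2, "p_max": 10.0}, {"sinr_1": 1.1, "p_max": 10.0}]
print(joint_action(actions))
print(global_state(states))
```

Each Actor only needs its own entry of `states`, while the Critic is given both the merged state and the flattened joint action during training.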

Figure 4 shows the beam induction success rate versus the transmit power P_S for different N_te in the beam induction stage. The same transmit power is allocated to every subcarrier, i.e.
Figure BDA00035674878900001116
The induction rate is obtained by counting the number of trials with β ≥ β_D in 10^5 Monte Carlo simulations. In the passive method, listener E remains silent while transmitter S performs beam scanning. In this case, when listener E and receiver D are far apart, receiver D selects the optimal beam index j* only with low probability. The results show that the success rate of the proposed MADDPG-based method is close to 100%, which verifies the effectiveness of the method under different system configurations.
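
The induction rate estimate described here, the fraction of Monte Carlo trials in which β ≥ β_D, can be sketched as follows. The trial function `toy_trial` is a hypothetical placeholder that draws a noisy β around a fixed threshold; it does not model the patent's channel or the learned policy, and the trial count is reduced from 10^5 to keep the example fast.

```python
import random


def induction_success_rate(run_trial, num_trials, seed=0):
    """Fraction of trials with beta >= beta_d.

    run_trial(rng) must return a (beta, beta_d) pair for one simulated trial.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    successes = 0
    for _ in range(num_trials):
        beta, beta_d = run_trial(rng)
        if beta >= beta_d:
            successes += 1
    return successes / num_trials


def toy_trial(rng):
    """Hypothetical trial: beta is noisy around 1.2, the threshold is 1.0."""
    return 1.2 + rng.gauss(0.0, 0.1), 1.0


rate = induction_success_rate(toy_trial, num_trials=10_000)
print(rate)
```

In an actual evaluation, `run_trial` would draw a channel realization, apply the learned precoder, and return the resulting β together with β_D.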

Figure 5 shows the relationship between P_E and P_S for different N_te configurations in the beam induction stage. In the optimal scheme, P_E is the optimal objective value of (10), computed with known
Figure BDA0003567487890000121
In the MADDPG-based scheme, P_E is computed with the parameters learned by MADDPG. Although P_E increases with P_S, a larger N_te effectively reduces P_E. Combining Figures 3 and 4, it can be seen that even when
Figure BDA0003567487890000122
is unknown, the invention can still achieve beam induction with the induction policy learned by MADDPG, at a transmit power only slightly higher than the theoretical minimum.

As shown in Figure 6, in the data monitoring stage the optimal solution is obtained by solving (18). The passive method with SBM (Successful Beam Misleading) means that the agent achieves beam induction in the BS phase but remains silent in the DT phase. As Figure 6 shows, after successful induction the monitoring rate increases with P_S or N_ts, and the invention can guarantee R_E ≥ R_D by adjusting the transmission parameters. Moreover, the monitoring rate of the proposed method is close to the optimal solution and clearly better than that of the passive monitoring method with SBM.

Figure 7 compares several monitoring schemes under different power constraints P_M and also plots the conventional active jamming scheme for comparison. The results show that the monitoring rate achieved by the proposed MADDPG scheme is close to the optimal solution and increases with P_M. When P_M > 55 dBm,
Figure BDA0003567487890000123
the monitoring rate approaches the maximum R_E. The monitoring performance of the passive scheme without SBM is independent of the transmit power of listener E, and passive monitoring with SBM outperforms the method without SBM. The average monitoring rate of the jamming scheme is limited by the power constraint of listener E, because it cannot guarantee R_E ≥ R_D when the power limit P_M is relatively low.

Simulations demonstrate that the proposed active monitoring method for massive MIMO-OFDM systems based on deep reinforcement learning can effectively induce transmitter S to select a beam that favors listener E, laying the foundation for the subsequent data monitoring process. It also enables listener E to readjust the power allocation factor and the power gain factor, which effectively maintains the communication link and improves the data monitoring rate. Together, the two stages realize monitoring of narrow-beam communication in massive MIMO-OFDM systems.

Claims (6)

1. An active monitoring method based on deep reinforcement learning, comprising the following steps:
(1) transmitter S performs analog beam scanning in a time-division manner according to a beam precoding codebook;
(2) while transmitter S performs beam scanning, listener E determines the optimal beam index j* that is favorable to itself, based on its own beam quality report and on the beam report that receiver D feeds back to transmitter S;
(3) listener E induces transmitter S to select the optimal beam index j* by optimizing the forwarding precoding matrix;
(4) in the communication stage after the optimal beam index j* has been determined, listener E acts as a pseudo relay for data forwarding, maintaining the communication beam and improving the data monitoring rate.

2. The active monitoring method based on deep reinforcement learning according to claim 1, characterized in that step (2) comprises the following steps:
1) receiver D and listener E each receive the beam quality measurement reference signals sent by transmitter S and calculate the beam quality from the received signals; receiver D forms a beam quality report and feeds it back to transmitter S as a reference for beam selection;
2) listener E uses its own beam quality report and the beam quality report that receiver D feeds back to transmitter S, which it obtains by monitoring, while also taking power consumption into account, and finally determines the optimal beam index j* according to a formula that trades off beam induction success rate against power consumption.

3. The active monitoring method based on deep reinforcement learning according to claim 1, characterized in that step (3) comprises the following steps:
㈠ the listener formulates an optimization problem: minimize the listener's total transmit power under the constraint that beam induction succeeds; the form of the optimal precoding matrix is derived from the optimization problem, showing that the optimal precoding matrix depends on the channel state information of transmitter S and receiver D;
㈡ listener E trains a first fitting network with the MADDPG algorithm to determine the transmission parameters of a first forwarding matrix, and then uses the first forwarding matrix determined by those parameters to forward the beam quality measurement reference signals to receiver D, inducing receiver D to send an erroneous beam measurement report so that transmitter S selects the beam that is favorable to listener E.

4. The active monitoring method based on deep reinforcement learning according to claim 1, characterized in that step (4) comprises the following steps:
i listener E receives the transmission data sent by transmitter S and formulates an optimization problem: maximize the data monitoring rate under the conditions that monitoring succeeds and that the transmit power is below the upper limit of the forwarding power;
ii the listener trains a second fitting network with the MADDPG algorithm to determine the power allocation factor and the power gain factor, so that part of the power is used for decoding and part for forwarding the signal, and then uses a second forwarding matrix to forward the communication data to receiver D, maintaining the communication beam and improving the data monitoring rate.

5. The active monitoring method based on deep reinforcement learning according to claim 3, characterized in that step ㈡ comprises the following steps:
① the beam induction problem is modeled as a first multi-agent cooperative MDP problem;
② based on the form of the optimal precoding matrix, the search for the optimal precoding matrix is reduced to the search for a pair of constants, which speeds up training; at a given time, the action on a single subcarrier consists of the angle and the amplitude of the precoding matrix, and the action of all subcarriers is the set of the single-subcarrier actions;
③ at a given time, the state on a single subcarrier consists of the beam report information obtained by monitoring and analyzing the feedback channel plus the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
④ the reward function at a given time is designed to encourage successful beam induction while penalizing excessive energy consumption.

6. The active monitoring method based on deep reinforcement learning according to claim 4, characterized in that step ii comprises the following steps:
I the data monitoring problem is modeled as a second multi-agent cooperative MDP problem;
II based on the form of the optimal precoding matrix, the search for the optimal precoding matrix is reduced to the search for a pair of constants, which speeds up training; at a given time, the action of a single subcarrier consists of the power gain factor and the power allocation factor, and the action of all subcarriers is the set of the single-subcarrier actions;
III at a given time, the state on a single subcarrier consists of the signal-to-interference-plus-noise ratio obtained by monitoring and from the feedback channel information plus the known channel information, and the global state is the union of the non-overlapping information of all subcarrier states;
IV the reward at a given time is designed to encourage the subcarriers to maximize the monitoring rate under the constraints of successful monitoring and limited power.
Application number: CN202210312148.3A
Priority date: 2022-03-28
Filing date: 2022-03-28
Title: Active monitoring method based on deep reinforcement learning
Legal status: Pending
Publication: CN114884547A (en), published 2022-08-09
Family ID: 82669000
Country status: CN (1), CN114884547A (en)


Similar Documents

Publication Publication Date Title
Almohamad et al. Smart and secure wireless communications via reflecting intelligent surfaces: A short survey
Forouzesh et al. Covert communication and secure transmission over untrusted relaying networks in the presence of multiple wardens
CN105515717A (en) Cooperative relay security transmission method based on artificial noise interference
CN104283629B (en) Channel secure transmission method
Bai et al. Throughput maximization for multipath secure transmission in wireless ad-hoc networks
KR101155629B1 (en) Method for selective transmit/receive antenna repetition
CN113411105A (en) AP selection method of non-cell large-scale antenna system
Kang et al. Scheduling versus contention for massive random access in massive MIMO systems
Elhoushy et al. Nearest APs-based downlink pilot transmission for high secrecy rates in cell-free massive MIMO
Sun et al. IRS-assisted RF-powered IoT networks: System modeling and performance analysis
CN117354791A (en) Safe transmission method, system, equipment and medium in millimeter wave internet of vehicles multi-base-station multi-user scene
Gharagezlou et al. Energy efficient power allocation with joint antenna and user selection in massive MIMO systems
CN108418651B (en) Safe transmission method of bidirectional wireless power supply relay system
Xu et al. Beam‐domain SWIPT in massive MIMO system with energy‐constrained terminals
Chen et al. Proactive eavesdropping in massive MIMO-OFDM systems via deep reinforcement learning
Feng et al. Random caching design for multi-user multi-antenna HetNets with interference nulling
CN114884547A (en) Active monitoring method based on deep reinforcement learning
CN103297108A (en) Upstream beam forming method for multisource multi-relay collaborative network
CN107994934B (en) A Secure Transmission Method Based on Symbol Separation and Beamforming in Untrusted Relay Networks
Guo et al. Massive MIMO aided secure multi-pair relaying with power control
Wang et al. Joint reliability optimization and beamforming design for STAR-RIS-Aided Multi-user MISO URLLC systems
CN114567352A (en) Wireless energy-carrying communication method based on multi-antenna relay system
CN115276744A (en) A millimeter wave system transmission method based on rate division multiple access technology
CN107425887B (en) A beamforming method in a multi-antenna untrusted relay network
Su et al. Hybrid Resource Allocation Scheme in Secure Intelligent Reflecting Surface-Assisted IoT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination