CN116321236A

CN116321236A - Energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system

Info

Publication number: CN116321236A
Application number: CN202310367185.9A
Authority: CN
Inventors: 宋清洋; 李慧; 孙巍
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-06-23

Abstract

The invention relates to the technical field of wireless communication networks and discloses an energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system. S1: each AP performs channel estimation from the pilot sequences sent by the legitimate users and the active eavesdropper, obtains their channel state information and beam coefficients, and sends both to the central processing unit (CPU) over the backhaul link. S2: the CPU calculates the interference vectors among the legitimate users and between the legitimate users and the active eavesdropper, and sends them to all APs and reconfigurable intelligent surfaces connected to the CPU. S3: each AP adjusts its power, and each reconfigurable intelligent surface adjusts its phase, according to the interference vectors from the CPU and the transmission data of the legitimate users and the active eavesdropper. The invention improves the energy efficiency of the system while maintaining secure communication.

Description

Energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system

Technical Field

The invention relates to the technical field of wireless communication networks, and in particular to an energy efficiency optimization design method for a RIS-assisted secure cell-free massive MIMO system.

Background

Over the last few decades, multiple-input multiple-output (MIMO) technology has received widespread attention. However, because conventional wireless communication networks use a cellular structure, the performance of multi-cell MIMO systems is often limited by inter-cell interference. To solve this problem, a new user-centric network paradigm, the cell-free network, has recently been proposed. Unlike conventional cellular designs, a cell-free network uses user-centric transmission: all access points (APs) in the network jointly serve all users, with no cell boundaries. Efficient cooperation among the distributed APs effectively mitigates inter-cell interference and gives the approach high potential for improving network spectral efficiency and energy efficiency. The technique is regarded as a candidate for future communication systems and has attracted growing research interest in recent years in areas such as resource allocation, precoding/beamforming, and channel estimation. However, deploying a large number of distributed devices raises cost and power consumption, so the resulting energy efficiency is less than optimal.

Fortunately, a new technology known as the reconfigurable intelligent surface (RIS) has been identified as a low-cost, low-energy way to improve spectral efficiency, and it can effectively address this problem. A RIS is made up of a large number of low-cost passive elements whose phase shifts are controlled by simple programmable PIN diodes; it can reflect the signal and form a directional beam toward the user. Unlike the large-scale phased array antennas in a cell-free network, which are implemented with phase shifters, a RIS does not require additional hardware such as complex digital phase-shifting circuitry, which greatly reduces the energy consumption and complexity of signal processing. Deploying RISs in a cell-free network therefore allows the same services to be delivered at lower power consumption, which addresses the problems described above.

The present scheme (a RIS-assisted cell-free massive MIMO system) targets the high cost and high energy consumption caused by the large-scale deployment required in existing cell-free networks, and proposes using RISs to carry part of the communication so as to achieve relatively low-cost, low-energy communication. Furthermore, a cell-free network is vulnerable to an active eavesdropper during the uplink channel estimation phase.

Therefore, how to optimize and improve the energy efficiency of a RIS-assisted cell-free massive MIMO system while maintaining secure communication is a problem to be solved.

Disclosure of Invention

The invention provides an energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system to solve the above problems.

The invention is realized by the following technical scheme:

An energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system, wherein the system comprises one central processing unit (CPU), M multi-antenna APs, K single-antenna legitimate users, R reconfigurable intelligent surfaces each with N reflecting units, and one single-antenna active eavesdropper. The specific steps are:

S1: in the uplink training stage, channel estimation is performed from the pilot sequences sent by the legitimate users and the eavesdropper to obtain their channel state information and beam coefficients, which are sent to the CPU over the backhaul link;

S2: in the downlink transmission stage, the CPU calculates the interference vectors from the received channel state information, the received beam coefficients and the spatial relationship among the antennas, and sends the interference vectors to all APs and reconfigurable intelligent surfaces connected to the CPU;

S3: each AP adjusts its power, and each reconfigurable intelligent surface adjusts its phase, according to the interference vectors from the CPU and the transmission data of the legitimate users and the eavesdropper; after adjustment, the APs and the reconfigurable intelligent surfaces send the transmission data to all legitimate users and the active eavesdropper connected to them.

As an optimization, in S3 the phase adjustment and power adjustment according to the interference vectors from the CPU and the transmission data of the legitimate users proceed as follows:

S3.1: set the relevant constraints under which the AP sends, and the reconfigurable intelligent surface reflects, the transmission information to the legitimate users and the eavesdropper, according to the interference vectors and the transmission data;

S3.2: establish a total energy consumption model of the RIS-assisted secure cell-free massive MIMO system;

S3.3: combine the constraints and the total energy consumption model into an optimization problem;

S3.4: solve the optimization problem with a deep deterministic policy gradient algorithm with prioritized experience replay to obtain the optimal phases of the reconfigurable intelligent surfaces and the optimal power control coefficients of the APs;

S3.5: adjust the phases of the reconfigurable intelligent surfaces according to the optimal phases, and adjust the power of the APs according to the optimal power control coefficients.

As an optimization, in S3.1 the relevant constraints are:

C₁: for each AP m, the backhaul link load does not exceed the maximum capacity of the backhaul link between the mth AP and the CPU (backhaul link capacity constraint);

C₂: 0 ≤ η_mk ≤ 1, the value range of the power control coefficient;

C₃: R_sec ≥ R_th, i.e. the secrecy rate of the eavesdropped legitimate user is not less than the minimum secrecy rate, which ensures that this user can communicate securely;

C₄: the total power control coefficient of a transmission does not exceed 1 (transmit power constraint);

C₅: S_ek ≥ S_ok, i.e. the spectral efficiency of each legitimate user is not lower than the minimum spectral efficiency, which guarantees the communication quality of the legitimate users;

C₆: |θ_r,n| = 1, i.e. the gain magnitude of each element of the reconfigurable intelligent surface is 1 (unit-modulus constraint on the reflection coefficient).

Here S_ek is the spectral efficiency of the kth legitimate user; τ_c is the length of the coherence interval and τ_p is the pilot sequence length; η_mk is the power control coefficient for the signal transmitted by the mth AP to the kth user; θ_r,n is the reflection coefficient of the nth reflecting unit on the rth reconfigurable intelligent surface; a_m is the factor by which the data rate carried on the mth fronthaul link exceeds the total achievable rate of the mth AP; R_sec is the secrecy rate of the eavesdropped user and R_th is the secrecy rate threshold; S_ok is the minimum spectral efficiency; M is the total number of APs, K is the total number of users, and N is the total number of reflecting units in a reconfigurable intelligent surface.
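The constraints whose form can be read from the text can be checked numerically. The following is a minimal sketch, not the method of the invention: the helper name `check_constraints`, the array shapes, and the reading of C₄ as a per-AP sum over users are assumptions, and C₁ is left out because its formula is not reproduced in the text.

```python
import numpy as np

def check_constraints(eta, theta, se_users, r_sec, r_th, s_min, tol=1e-9):
    """Return one boolean per constraint C2-C6 (hypothetical helper).

    eta      : (M, K) real array of power control coefficients eta_mk
    theta    : (R, N) complex array of RIS reflection coefficients theta_rn
    se_users : (K,) spectral efficiency of each legitimate user, bit/s/Hz
    r_sec    : secrecy rate of the eavesdropped user
    r_th     : secrecy rate threshold
    s_min    : minimum spectral efficiency per user
    """
    eta = np.asarray(eta, dtype=float)
    theta = np.asarray(theta, dtype=complex)
    se_users = np.asarray(se_users, dtype=float)
    return {
        "C2": bool(np.all((eta >= 0.0) & (eta <= 1.0))),
        "C3": bool(r_sec >= r_th),
        # Assumption: the "total" coefficient is the sum over users for each AP.
        "C4": bool(np.all(eta.sum(axis=1) <= 1.0 + tol)),
        "C5": bool(np.all(se_users >= s_min)),
        "C6": bool(np.allclose(np.abs(theta), 1.0, atol=1e-6)),
    }
```

A candidate (η, θ) pair would count as feasible for these five constraints only when every entry of the returned dictionary is true.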

As an optimization, the total energy consumption model of the RIS-assisted secure cell-free massive MIMO system is built from the following terms: P_m is the power consumed by the amplifier and the circuits at the mth AP, including the power consumption of the transceiver chain and of signal processing; P_k is the circuit power consumption of the kth user; P_r,n is the (low) power consumed by the nth reflecting unit of the rth reconfigurable intelligent surface; and P_fh,m is the power consumed by the fronthaul link that connects the CPU and the mth AP and carries data between them. M is the total number of APs, K is the total number of users, N is the total number of reflecting units in a reconfigurable intelligent surface, and R is the total number of reconfigurable intelligent surfaces.

As an optimization, the total throughput of the RIS-assisted secure cell-free massive MIMO system is computed from the system bandwidth B and the per-user spectral efficiency S_ek({η_mk, θ_r,n}), taken over all K users.
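The power terms and the throughput defined above combine into the energy efficiency that serves as the optimization objective. The sketch below is illustrative only: it assumes that the total power is a plain sum of the four listed terms, that the throughput is B times the sum of the per-user spectral efficiencies, and that energy efficiency is throughput divided by total power. All function names are hypothetical.

```python
import numpy as np

def total_power(p_ap, p_user, p_ris_elem, p_fh):
    """Total power in watts, assumed to be the plain sum of the four terms."""
    return float(np.sum(p_ap) + np.sum(p_user) + np.sum(p_ris_elem) + np.sum(p_fh))

def throughput(bandwidth_hz, se_users):
    """Total throughput in bit/s: bandwidth times the summed spectral efficiency."""
    return float(bandwidth_hz * np.sum(se_users))

def energy_efficiency(bandwidth_hz, se_users, p_ap, p_user, p_ris_elem, p_fh):
    """Energy efficiency in bit/Joule, assumed to be throughput over total power."""
    power = total_power(p_ap, p_user, p_ris_elem, p_fh)
    return throughput(bandwidth_hz, se_users) / power
```

Here `p_ap` has one entry per AP, `p_user` one per user, `p_ris_elem` one per reflecting unit (an R by N array), and `p_fh` one per fronthaul link.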

As an optimization, the power consumption P_m of the amplifier and the circuits at the mth AP is modeled with the following quantities: 0 < α_m ≤ 1 is the efficiency of the power amplifier, N₀ is the noise power, b is the number of antennas of the AP, P_tc,m is the power required to operate the circuit components associated with each antenna of the AP, and ρ_d is the maximum normalized transmit power of the AP.

As an optimization, the power P_fh,m consumed by the fronthaul link connecting the CPU and the mth AP is:

P_fh,m = P_0,m + B·S_e({η_mk, θ_r,n})·P_bt,m,

where P_0,m is the fixed power consumption of each backhaul, B is the system bandwidth, P_bt,m is the traffic-dependent power consumption in W/(Gbit/s), and S_e({η_mk, θ_r,n}) is the overall spectral efficiency of the system.
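The fronthaul expression above can be evaluated directly. The sketch below is a hedged illustration: the function name is hypothetical, and it assumes that B·S_e is converted from bit/s to Gbit/s so that it matches the stated unit of P_bt,m.

```python
def fronthaul_power(p_fixed_w, bandwidth_hz, spectral_eff, p_traffic_w_per_gbps):
    """P_fh,m = P_0,m + B * S_e * P_bt,m, with B * S_e expressed in Gbit/s."""
    rate_gbps = bandwidth_hz * spectral_eff / 1e9  # bit/s -> Gbit/s
    return p_fixed_w + rate_gbps * p_traffic_w_per_gbps
```

For example, with P_0,m = 0.825 W, B = 20 MHz, P_bt,m = 0.25 W/(Gbit/s) and an assumed overall spectral efficiency of 5 bit/s/Hz, the traffic-dependent term is 0.1 Gbit/s × 0.25 W/(Gbit/s) = 0.025 W, so the fixed term dominates.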

As an optimization, the optimization problem is solved with the DDPG-PER algorithm, which proceeds as follows:

T1: initialize the parameters of the Actor network and the Critic network, set up a prioritized experience replay buffer, and define the learning rates of the Actor and the Critic, the reward discount factor, the number of training episodes, and the number of training steps per episode;

T2: in each training episode, the Actor network generates an action from the current state;

T3: the action obtained in T2 is applied to the environment, which returns a reward and the next state;

T4: the experience, consisting of the current state, the action, the reward and the next state, is stored in the prioritized experience replay buffer;

T5: a batch of experiences is sampled from the prioritized experience replay buffer according to the weight of each experience, so that high-priority experiences are favored, and is used to train the Actor network and the Critic network;

T6: the Critic network computes the Q value in the current state, and the parameters of the value function are updated;

T7: with the current state as input, the Actor network generates an action, the Q value of that action is computed, and this value is used to update the Actor network parameters;

T8: steps T2 to T7 are repeated until the preset number of training steps is reached or a stopping condition is met;

T9: the trained Actor network is used to make decisions, which yields the final policy.
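Steps T4 and T5 rely on a prioritized experience replay buffer. The following is a minimal sketch of one common variant (proportional prioritization with importance-sampling weights), not the implementation of the invention. The class name, the parameters `alpha`, `beta` and `eps`, and their default values are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (assumed)
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def add(self, state, action, reward, next_state):
        # New experiences get the current maximum priority so that they
        # are likely to be sampled soon after being stored.
        p = max(self.priorities, default=1.0)
        item = (state, action, reward, next_state)
        if len(self.data) < self.capacity:
            self.data.append(item)
            self.priorities.append(p)
        else:
            self.data[self.pos] = item  # overwrite the oldest entry
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights offset the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority follows the magnitude of the TD error of each experience.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + self.eps
```

In a training loop, `add` would be called once per step (T4), `sample` once per update (T5), and `update_priorities` after the Critic has produced new TD errors for the sampled batch.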

As an optimization:

the state is the signal-to-interference-plus-noise ratio (SINR) of the legitimate users together with the SINR of the active eavesdropper;

the action is the phase shift matrix θ and the power control coefficient matrix η;

the reward is the energy efficiency of the system.

Compared with the prior art, the invention has the following advantages and beneficial effects:

The invention provides an energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system. By jointly optimizing the reflection coefficient θ of the reconfigurable intelligent surfaces and the power control coefficient η, it improves the energy efficiency of the system while maintaining secure communication. Simulations show that it markedly improves the energy efficiency of the cell-free massive MIMO system while ensuring secure communication for the eavesdropped user and reliable communication for the other legitimate users, at no additional cost, so it has strong application value and development potential.

The invention uses a deep reinforcement learning algorithm to jointly optimize the reflection coefficient θ of the reconfigurable intelligent surfaces and the power control coefficient η, maximizing the energy efficiency of the system subject to secure communication for the legitimate users.

Drawings

In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:

FIG. 1 is a schematic diagram of the structure of the RIS-assisted secure cell-free massive MIMO system according to the present invention;

FIG. 2 is a graph of energy efficiency versus AP transmit power;

FIG. 3 is a graph of energy efficiency versus the number of legitimate users;

FIG. 4 is a graph of secrecy energy efficiency versus the secrecy rate threshold.

Detailed Description

For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.

Embodiment 1 provides an energy efficiency optimization method for a RIS-assisted secure cell-free massive MIMO system. The composition of this system (hereinafter "the system") is shown in FIG. 1: it includes one central processing unit (CPU), M multi-antenna APs (which can be understood as base stations), K single-antenna legitimate users, R reconfigurable intelligent surfaces each with N reflecting units, and one single-antenna active eavesdropper, where M, K and R are positive integers.

The basic working principle of the RIS-assisted cell-free massive MIMO system is as follows. Data is mostly transmitted in time-division duplex (TDD) mode, in which uplink and downlink transmissions occupy the same frequency band. In the uplink training stage, a legitimate user sends a training sequence in a given time slot; after receiving it, the access point (AP) performs channel estimation and beam selection, and then sends the resulting channel state information (CSI) and the selected beam coefficients to the central processing unit (CPU) over the backhaul link. In the downlink transmission stage, the CPU calculates the interference vectors from the received CSI, the beam coefficients and the spatial relationship among the antennas, and sends them to all APs connected to it. After each AP receives the interference vector information from the CPU, the transmit power of the AP and the phase shift of the RIS are adjusted according to the interference vectors and the data to be transmitted, and the data is then transmitted to all connected legitimate users and the active eavesdropper. Each user receives the signals from all APs and RISs serving it, and then performs interference cancellation and signal decoding to obtain the required data.

S3.1: set the relevant constraints under which the access point sends the transmission information to the legitimate users, according to the interference vectors and the transmission data.

From the above description, the signal received by the kth legitimate user is expressed in terms of the following quantities: ρ_d is the maximum normalized transmit power of the AP; 0 ≤ η_mk ≤ 1 is the power control coefficient; the composite channel is the channel between the mth AP and the kth legitimate user; w_mk is the precoding vector with which the AP transmits data to the legitimate user; the data signal is the signal sent to legitimate user k; and the noise term is additive white Gaussian noise (AWGN) at legitimate user k.

Similarly, the signal received by the active eavesdropper has the same form, with additive white Gaussian noise at the eavesdropper.

From these received signals, the signal-to-interference-plus-noise ratios of the legitimate user and of the active eavesdropper, γ_j and γ_E, are obtained.

The secrecy rate of the eavesdropped legitimate user j is therefore:

R_sec,j = log₂(1 + γ_j) − log₂(1 + γ_E)
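The secrecy rate expression above translates directly into code. In the sketch below the function name is hypothetical and the SINR values are supplied as plain linear ratios. No clipping at zero is applied because the expression in the text has none, so a negative result means the eavesdropper sees the better channel.

```python
import math

def secrecy_rate(sinr_user, sinr_eve):
    """R_sec = log2(1 + gamma_j) - log2(1 + gamma_E), in bit/s/Hz."""
    return math.log2(1.0 + sinr_user) - math.log2(1.0 + sinr_eve)
```

With γ_j = 3 and γ_E = 1, the two terms are log₂ 4 = 2 and log₂ 2 = 1, giving a secrecy rate of 1 bit/s/Hz.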

Furthermore, the total energy consumption of the system is modeled with the following terms: P_m is the power consumed by the amplifier and the circuits at the mth AP (including the power consumption of the transceiver chain and of signal processing); P_k is the circuit power consumption of the kth user; P_r,n is the (low) power consumed by the nth reflecting unit of the rth RIS; and P_fh,m is the power consumed by the fronthaul link that connects the CPU and the mth AP and carries data between them. M is the total number of APs, K is the total number of users, N is the total number of reflecting units in a reconfigurable intelligent surface, and R is the total number of reconfigurable intelligent surfaces.

The power consumption P_m of the amplifier and the circuits at the mth AP is modeled with the following quantities: 0 < α_m ≤ 1 is the efficiency of the power amplifier, N₀ is the noise power, b is the number of antennas, P_tc,m is the power required to operate the circuit components associated with each antenna of the AP, and ρ_d is the maximum normalized transmit power.

The backhaul transfers data between an AP and the CPU (each backhaul denotes one such AP-to-CPU transfer). Its power consumption increases linearly with the spectral efficiency and can be expressed as:

P_fh,m = P_0,m + B·S_e({η_mk, θ_r,n})·P_bt,m

where P_0,m is the fixed power consumption of each backhaul, B is the system bandwidth, P_bt,m is the traffic-dependent power consumption in W/(Gbit/s), and S_e({η_mk, θ_r,n}) is the overall spectral efficiency of the system, which is obtained from the spectral efficiency of each user.

In the per-user spectral efficiency, τ_c is the length of the coherence interval and τ_p is the pilot length. Both are standard terms from pilot training and are not described further.

To address the low energy efficiency and poor security of cell-free massive MIMO systems, the invention introduces RISs and maximizes the energy efficiency of the RIS-assisted secure cell-free massive MIMO system by jointly optimizing the RIS reflection coefficient θ and the AP power control coefficient η, subject to the following constraints.

C₁: for each AP m, the backhaul link load does not exceed the maximum capacity of the backhaul link between the mth AP and the CPU (backhaul link capacity constraint);

C₂: 0 ≤ η_mk ≤ 1, the value range of the power control coefficient;

C₃: R_sec ≥ R_th, i.e. the secrecy rate of the eavesdropped legitimate user is not less than the minimum secrecy rate, which ensures that this user can communicate securely;

C₄: the total power control coefficient of a transmission does not exceed 1 (transmit power constraint);

C₅: S_ek ≥ S_ok, i.e. the spectral efficiency of each legitimate user is not lower than the minimum spectral efficiency, which guarantees the communication quality of the legitimate users;

C₆: |θ_r,n| = 1, i.e. the gain magnitude of each element of the reconfigurable intelligent surface is 1 (unit-modulus constraint on the reflection coefficient).

Here R_sec is the secrecy rate of the eavesdropped legitimate user and R_th is the secrecy rate threshold; S_ok is the minimum spectral efficiency; M is the total number of APs, K is the total number of users, and N is the total number of reflecting units in a reconfigurable intelligent surface.

Because both the objective function and the constraint functions are non-convex, the optimization problem is non-convex and difficult to solve. Rather than solving this problem directly by mathematical means, the scheme solves it with a deep reinforcement learning algorithm.

The invention solves the optimization problem with a deep deterministic policy gradient algorithm with prioritized experience replay (DDPG-PER). The algorithm comprises four neural networks (an Actor network, a Critic network, a Target Actor network and a Target Critic network) and a prioritized experience replay pool. The Actor network and the Target Actor network determine the action to take in the current state, i.e. the current policy. The Critic network and the Target Critic network evaluate the value of the current state-action pair, which is used to identify the best action in the current state. The prioritized experience replay pool stores the experiences.

The DDPG-PER algorithm operates as follows.

Specifically, the Actor network takes the current state as input and outputs the action under the current policy; the Critic network takes the current state and that action as input and outputs the value Q. The Target Actor network and the Target Critic network are the target networks of the Actor network and the Critic network, respectively, and are used to estimate the target policy and the target value. The prioritized experience replay pool assigns each experience a priority according to its importance. The priority can be computed from the TD error of the experience, i.e. the difference between the target value (the reward obtained by taking an action in the current state plus the discounted value of the next state under the current policy) and the current value estimate. Experiences with higher priority are sampled more often, which increases their influence on training.
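For reference, the TD error and the priority described above are usually written as follows in the DDPG and prioritized experience replay literature. These are the standard textbook forms, not formulas reproduced from this document. The discount factor λ, the target networks Q′ and μ′, the exponent ω and the small constant ε are notation assumed here for illustration, chosen to avoid clashing with γ (SINR) and α_m (amplifier efficiency) used elsewhere in the text.

```latex
\delta_i = r_i + \lambda\, Q'\!\big(s_{i+1},\, \mu'(s_{i+1})\big) - Q(s_i, a_i),
\qquad
p_i = |\delta_i| + \epsilon,
\qquad
P(i) = \frac{p_i^{\omega}}{\sum_{j} p_j^{\omega}}
```

Here δ_i is the TD error of experience i, p_i is its priority, and P(i) is the probability with which it is drawn from the replay pool.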

At the start of the algorithm, the prioritized experience replay pool, the parameters of the four neural networks, and the actions θ and η are initialized.

The algorithm runs for a total of E episodes, each with T training steps. Within each episode, the algorithm terminates when it converges or when the maximum allowed number of training steps is reached. In addition, the invention uses deep reinforcement learning (DRL) to obtain the optimal θ and η directly, rather than training a neural network for online processing.

The state, the actions θ and η, and the instantaneous reward are described in detail as follows:

1. State: the state at time t provides useful information about the environment and helps train the network. The invention takes the SINR of the users and the SINR of the active eavesdropper as the state vector.

2. Action: the invention takes the phase shift matrix θ and the power control coefficient matrix η as the action a_t at time t. Because a neural network can only handle real values while the RIS phase shift matrix is complex, the invention converts each reflection coefficient into an angle before passing it to the neural network.

3. Reward: the invention takes the energy efficiency of the system as the reward. The system obtains this reward only when the action output by the Actor network satisfies all constraints; otherwise it receives a penalty.
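The action encoding and the reward rule described above can be sketched as follows. This is illustrative only: it assumes an Actor output bounded in [-1, 1] (for example from a tanh layer), a hypothetical layout with the phases first and the power coefficients second, and a fixed penalty value, none of which is specified in the text.

```python
import numpy as np

def decode_action(raw, num_ris, num_elems, num_aps, num_users):
    """Map an Actor output in [-1, 1] to (theta, eta); the layout is assumed."""
    raw = np.asarray(raw, dtype=float)
    n_phase = num_ris * num_elems
    # Real angles in [-pi, pi] stand in for the complex reflection coefficients.
    angles = np.pi * raw[:n_phase].reshape(num_ris, num_elems)
    theta = np.exp(1j * angles)  # unit modulus by construction
    # Rescale the remaining entries from [-1, 1] to [0, 1].
    eta = 0.5 * (raw[n_phase:] + 1.0)
    return theta, eta.reshape(num_aps, num_users)

def reward(energy_eff, constraints_ok, penalty=1.0):
    """Energy efficiency if all constraints hold, else a negative penalty."""
    return energy_eff if constraints_ok else -penalty
```

Working with angles means every decoded reflection coefficient has unit modulus automatically, so only the remaining constraints need to be checked before the reward is assigned.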

To verify the energy efficiency advantage of the RIS-assisted cell-free massive MIMO system and the advantage of the DDPG-PER algorithm used in the invention, two alternative schemes and one alternative algorithm are used for comparison: 1. no RIS, but with optimized power control coefficients; 2. RIS, but without the power control coefficient scheme of the invention; 3. the conventional DDPG algorithm.

The advantage of the algorithm and the system performance of the invention are verified by simulation. The simulation parameters are set as follows: the number of APs M and the number of reconfigurable intelligent surfaces R total 10; number of users K = 5; number of eavesdroppers 1; number of antennas per AP b = 2; total number of reflecting units over all reconfigurable intelligent surfaces 20 to 120; AP transmit power p_d = 0 to 30 dBm; noise figure 9 dB; system bandwidth B = 20 MHz; pilot sequence length τ_p = 30; coherence interval length τ_c = 200; pilot transmit power of a legitimate user p_u = 0.1 W; pilot transmit power of the active eavesdropper p_E = 0.1 W; power amplifier efficiency α_m = 0.4; power consumption per antenna P_tc,m = 0.2 W; fixed power consumption per backhaul P_0,m = 0.825 W; traffic-dependent power consumption P_bt,m = 0.25 W/(Gbit/s); secrecy rate threshold R_th = 0.2 bit/s; minimum spectral efficiency requirement of the kth user S_ok = 0.7 bit/s/Hz. The backhaul link capacity between the mth AP and the CPU is also among the configured parameters. The channels from the APs to all legitimate users, from the APs to the eavesdropper, and from the reconfigurable intelligent surfaces to the users are modeled as Rayleigh channels, and the channels from the APs to the reconfigurable intelligent surfaces are modeled as Rician channels. The path loss at the reference distance of 1 m is 30 dB, and the path loss exponents of the AP-RIS link, the RIS-user link and the AP-user link are 2.2, 2.8 and 3.5, respectively.
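The path loss figures above can be turned into link budgets with a log-distance model. The sketch below is an assumption-laden illustration: the text gives only the reference loss and the exponents, so the log-distance form, the function names, and the thermal noise density of -174 dBm/Hz are all supplied here for illustration.

```python
import math

# Path loss exponents listed in the simulation setup.
EXPONENTS = {"AP-RIS": 2.2, "RIS-user": 2.8, "AP-user": 3.5}

def path_loss_db(distance_m, exponent, ref_loss_db=30.0, ref_distance_m=1.0):
    """Log-distance path loss: ref loss + 10 * exponent * log10(d / d0), in dB."""
    return ref_loss_db + 10.0 * exponent * math.log10(distance_m / ref_distance_m)

def noise_power_dbm(bandwidth_hz, noise_figure_db, density_dbm_hz=-174.0):
    """Noise power in dBm from an assumed thermal density, bandwidth and noise figure."""
    return density_dbm_hz + 10.0 * math.log10(bandwidth_hz) + noise_figure_db
```

Under these assumptions, an AP-RIS link of 10 m loses 30 + 22 = 52 dB, while a direct AP-user link of the same length loses 30 + 35 = 65 dB, which illustrates why the reflected path can be attractive despite involving two hops.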

FIG. 2 shows energy efficiency versus transmit power under the different schemes and algorithms. As the transmit power increases, the energy efficiency of every scheme and algorithm first rises and then levels off. It rises because a higher transmit power threshold lets the AP allocate more power to the users. Once the threshold is raised further, the transmit power has already reached its optimal value, so the energy efficiency stabilizes. In addition, at the same transmit power, the scheme and algorithm of the invention outperform the other schemes and the conventional DDPG algorithm: optimizing the RIS reflection phases gives the legitimate users a higher transmission rate, and the priority mechanism improves how well the algorithm uses important experiences.

FIG. 3 shows energy efficiency versus the number of legitimate users. For every combination of AP and RIS counts, the energy efficiency increases monotonically with the number of users, because multi-user diversity is exploited. When the total number of APs and RISs is fixed, deploying more RISs gives better performance: a RIS consists of a large number of passive elements, which reduces system power consumption, while its reconfigurability improves the spectral efficiency of the legitimate users. Furthermore, the total number of reflecting units over all RISs is held constant in FIG. 3, and for a fixed number of users the energy efficiency increases with the number of RISs, which illustrates the advantage of deploying RISs in a distributed manner.

FIG. 4 shows secrecy energy efficiency versus the secrecy rate threshold under the different schemes and algorithms. As the secrecy rate threshold increases, the energy efficiency of the system first stays constant and then decreases. Under a small secrecy rate constraint, the user throughput easily satisfies the constraint and remains unchanged. As the threshold grows, the transmit power must be increased to satisfy the constraint, which raises the system power consumption and lowers the energy efficiency. FIG. 4 also shows that the proposed scheme outperforms the other schemes in secrecy energy efficiency.

The foregoing embodiments are provided to illustrate the general principles of the invention and are not intended to limit its scope or to restrict the invention to the particular embodiments described; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. An RIS-assisted safe honeycomb-free large-scale MIMO system energy efficiency optimization method, characterized in that the RIS-assisted honeycomb-free large-scale MIMO system comprises a central processing unit (CPU), M multi-antenna APs, K single-antenna legal users, R reconfigurable intelligent surfaces with N reflecting units, and a single-antenna active eavesdropper, and the method comprises the following specific steps:

S1, in an uplink training stage, the AP performs channel estimation according to the pilot sequences sent by the legal users and the active eavesdropper, obtains the channel state information and the beam coefficients of the transmission data, and sends the channel state information and the beam coefficients to the CPU through a backhaul link;

S2, in a downlink transmission stage, the CPU calculates the interference vectors between legal users and between the legal users and the active eavesdropper according to the received channel state information, the received beam coefficients, and the spatial relation among the antennas, and sends the interference vectors to all APs and reconfigurable intelligent surfaces connected with the CPU;

S3, the AP performs power adjustment according to the interference vector transmitted by the CPU and the transmission data of the legal users, the reconfigurable intelligent surface performs phase adjustment according to the interference vector transmitted by the CPU and the transmission data of the legal users, and after the adjustment, the AP and the reconfigurable intelligent surface send the transmission data to all legal users and active eavesdroppers connected with the AP and the reconfigurable intelligent surface.
2. The method for optimizing the energy efficiency of the RIS-assisted safe non-cellular massive MIMO system according to claim 1, wherein in S3, the specific steps of performing the power adjustment and the phase adjustment according to the interference vector transmitted by the CPU and the transmission data of the legal users and the active eavesdropper are as follows:

S3.1, setting, according to the interference vector and the transmission data, the relevant constraint conditions on the AP sending the transmission information and the reconfigurable intelligent surface reflecting the transmission information to the legal users and the eavesdropper;

S3.2, establishing a total energy consumption model of the RIS-assisted safe honeycomb-free large-scale MIMO system;

S3.3, combining the relevant constraint conditions and the total energy consumption model to obtain an optimization problem;

S3.4, solving the optimization problem through a deep deterministic policy gradient (DDPG) algorithm based on prioritized experience replay to obtain the optimal phase of the reconfigurable intelligent surface and the optimal power control coefficient of the AP;

S3.5, adjusting the phase of the reconfigurable intelligent surface according to the optimal phase, and adjusting the power of the AP according to the optimal power control coefficient.
3. The method for optimizing the energy efficiency of the RIS-assisted safe non-cellular massive MIMO system according to claim 2, wherein in S3.1, the relevant constraint conditions are:

C₁:

C₂:

C₃: R_sec ≥ R_th

C₄:

C₅:

C₆:

C₁ indicates that the backhaul link capacity of the system is not greater than the maximum capacity, which is the backhaul link capacity constraint; C₂ represents the value range of the power control coefficient; C₃ indicates that the secrecy rate of the eavesdropped legal user is not less than the minimum secrecy rate, which ensures that the legal user can communicate securely; C₄ indicates that the total power control coefficient of the transmissions of the AP is not greater than 1, which is the transmit power constraint of the AP; C₅ indicates that the spectral efficiency of each legal user is not lower than the minimum spectral efficiency, which guarantees the communication quality of the legal users; C₆ indicates that the gain magnitude of each element of the reconfigurable intelligent surface is 1, which is the constraint on the modulus of the reflection coefficients of the reconfigurable intelligent surface, wherein,

S_ek represents the spectral efficiency of a legal user, τ_c is the length of the coherence interval, and τ_p is the pilot sequence length; η_mk represents the power control coefficient of the signal transmitted by the m-th AP to the k-th user; θ_r,n represents the reflection coefficient of the n-th reflecting unit on the r-th reconfigurable intelligent surface; a_m indicates that the data rate transmitted over the m-th backhaul (fronthaul) link should be a_m times the total achievable rate of the m-th AP; the bound in C₁ is the maximum capacity of the backhaul link between the m-th AP and the CPU; R_sec represents the secrecy rate of the eavesdropped user, and R_th represents the secrecy rate threshold; S_ok represents the lowest spectral efficiency, M represents the total number of APs, K represents the total number of users, and N represents the total number of reflecting elements in the reconfigurable intelligent surface.
4. A method for optimizing the energy efficiency of a RIS-assisted safe non-cellular massive MIMO system according to claim 3, wherein the total energy consumption model of the RIS-assisted safe non-cellular massive MIMO system is:

wherein P_m is the power consumed by the amplifier and the circuits at the m-th AP, including the power consumption of the transceiver chains and of the signal processing; P_k is the circuit power consumption of the k-th user; P_r,n represents the low power consumption of the n-th reflecting element in the r-th reconfigurable intelligent surface; P_fh,m is the power consumed by the backhaul link connecting the CPU and the m-th AP, which is used for transmitting data between the AP and the CPU; M represents the total number of APs, K represents the total number of users, N represents the total number of reflecting elements in the reconfigurable intelligent surface, and R represents the total number of reconfigurable intelligent surfaces.
5. The method for optimizing the energy efficiency of a RIS-assisted safe non-cellular massive MIMO system of claim 4, wherein the total throughput of the RIS-assisted safe non-cellular massive MIMO system is:

wherein B is the system bandwidth, S_ek({η_mk, θ_r,n}) is the spectral efficiency of each user, and K is the total number of users.
6. The method for optimizing the energy efficiency of the RIS-assisted safe non-cellular massive MIMO system according to claim 5, wherein the power consumption P_m generated by the amplifier and the circuits at the m-th AP is:

wherein 0 < α_m ≤ 1 is the efficiency of the power amplifier, N₀ is the noise power, b is the number of antennas of the AP, P_tc,m is the power required to operate the circuit components associated with each antenna of the AP, and ρ_d is the maximum normalized transmit power of the AP.
7. The RIS-assisted safe non-cellular massive MIMO system energy efficiency optimization method according to claim 5, wherein the power P_fh,m consumed by the backhaul link connecting the CPU and the m-th AP is:

P_fh,m = P_0,m + B·S_e({η_mk, θ_r,n})·P_bt,m;

wherein P_0,m is the fixed power consumption of each backhaul link, B is the system bandwidth, P_bt,m is the traffic-dependent power consumption in Watt/(Gbit/s), and S_e({η_mk, θ_r,n}) is the overall spectral efficiency of the system.
8. The RIS-assisted safe honeycomb-free large-scale MIMO system energy efficiency optimization method according to claim 1, wherein the optimization problem is solved by a DDPG-PER algorithm, and the flow of the DDPG-PER algorithm is specifically as follows:

T1, initializing the parameters of the Actor network and the Critic network, setting up a prioritized experience replay buffer, and defining the learning rates of the Actor and the Critic, the reward discount factor, the number of training episodes, and the number of training steps per episode;

T2, in each training episode, generating an action value by the Actor network according to the current state value;

T3, applying the action value obtained in T2 to the environment to obtain a reward value and generate the next state value;

T4, storing the experience in the prioritized experience replay buffer, wherein the experience consists of the current state value, the action value, the reward value, and the generated next state value;

T5, sampling a batch of high-priority experiences from the prioritized experience replay buffer according to the weight of each experience, and training the Actor network and the Critic network with the batch;

T6, calculating the Q value of the value function in the current state, and updating the parameters of the value function by using the Critic network;

T7, taking the action generated by the Actor network and the current state value as inputs, calculating the value-function Q value of the generated action value, and updating the Actor network parameters with this value;

T8, repeating steps T2 to T7 until the preset number of training steps is reached or a stopping condition is met;

T9, making decisions with the trained Actor network to obtain the final policy.
9. The RIS-assisted safe non-cellular massive MIMO system energy efficiency optimization method of claim 8, wherein,

the state value is: the signal-to-interference-plus-noise ratio of the legal users and the signal-to-interference-plus-noise ratio of the active eavesdropper;

the action value is: the phase shift matrix θ and the power control coefficient matrix η;

the reward value is: the energy efficiency of the system.
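
Steps T4 and T5 of the DDPG-PER flow can be sketched with a small prioritized replay buffer. The text does not specify how priorities are computed or how sampling weights are derived, so the sketch assumes the common proportional variant: each priority is the absolute TD error plus a small constant, sampling probability is proportional to priority raised to a power `alpha`, and importance-sampling weights with exponent `beta` compensate for the non-uniform sampling. The class name, parameter names, and default values are hypothetical.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (an assumed variant, not taken from the patent)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha       # how strongly priority skews sampling
        self.eps = eps           # keeps every priority strictly positive
        self.experiences = []    # tuples (state, action, reward, next_state)
        self.priorities = []
        self.next_slot = 0

    def add(self, state, action, reward, next_state):
        """T4: store an experience, giving it the current maximum priority."""
        priority = max(self.priorities, default=1.0)
        item = (state, action, reward, next_state)
        if len(self.experiences) < self.capacity:
            self.experiences.append(item)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.experiences[self.next_slot] = item
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        """T5: draw indices with probability proportional to priority**alpha."""
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        n = len(scaled)
        indices = random.choices(range(n), weights=scaled, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight in the batch.
        weights = [(n * scaled[i] / total) ** (-beta) for i in indices]
        max_w = max(weights)
        batch = [self.experiences[i] for i in indices]
        return indices, batch, [w / max_w for w in weights]

    def update_priorities(self, indices, td_errors):
        """After the Critic update: priority = |TD error| + eps."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


if __name__ == "__main__":
    random.seed(0)
    buf = PrioritizedReplayBuffer(capacity=4)
    for step in range(6):
        buf.add(state=step, action=0.0, reward=float(step), next_state=step + 1)
    buf.update_priorities([0, 1, 2, 3], [0.0, 0.0, 0.0, 5.0])
    indices, batch, weights = buf.sample(batch_size=8)
    print(indices, weights)
```

In a full training loop, the reward stored in each experience would be the system energy efficiency, and the returned weights would scale the Critic loss of each sampled experience; both the Actor and Critic networks are omitted here.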