CN114938512A - Broadband capacity optimization method and device - Google Patents


Info

Publication number
CN114938512A
Authority
CN
China
Legal status
Pending
Application number
CN202210435498.9A
Other languages
Chinese (zh)
Inventor
张海君
吴舒勍
刘向南
孙春蕾
李卫
王健全
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Application filed by University of Science and Technology Beijing USTB

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H04B7/0456 Selection of precoding matrices or codebooks, e.g. using matrices antenna weighting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a broadband capacity optimization method and device. The method comprises: replacing some of the base stations in a cell-free network with intelligent reconfigurable surfaces (RIS) to construct an RIS-assisted cell-free network system model; formulating a joint precoding problem in the RIS-assisted cell-free network system, in which the users' sum rate is maximized by jointly optimizing the active precoding at the base stations and the passive precoding at the RIS; introducing auxiliary variables, namely the RIS phase shift matrix Θ and the precoding vector matrix W, to convert the joint precoding problem into an optimization problem over Θ and W; and optimizing Θ and W in turn with a deep reinforcement learning algorithm so that both reach their optimal solutions, thereby maximizing the users' sum rate and optimizing the broadband capacity. The method helps address the cost and power consumption problems of cell-free networks.

Description

Broadband capacity optimization method and device
Technical Field
The present invention relates to the field of wireless communication technologies, and in particular to a broadband capacity optimization method and device.
Background
In recent years, intelligent reconfigurable surfaces (RIS) have attracted much attention in the industry as a promising technology. The RIS is a completely new and revolutionary technology that can intelligently reconfigure the wireless propagation environment by integrating a large number of low-cost passive reflective elements on a flat surface, thereby significantly improving the performance of wireless communication networks. In particular, different elements of the RIS can independently reflect incident signals by controlling their amplitude and/or phase, thereby synergistically enabling fine three-dimensional passive beamforming for directional signal enhancement or nulling.
The cell-free network is a new user-centric network architecture that, through cooperation among base stations, effectively mitigates the inter-cell interference common in traditional cellular networks. Improving the users' sum rate is a focus of research on RIS-assisted cell-free networks, and no effective solution exists at present.
Disclosure of Invention
The invention provides a broadband capacity optimization method and device for improving the users' sum rate and optimizing the broadband capacity in an RIS-assisted cell-free network scenario.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the present invention provides a broadband capacity optimization method, comprising:
replacing some of the base stations in a cell-free network with intelligent reconfigurable surfaces (RIS) to construct an RIS-assisted cell-free network system model, wherein RIS communication is employed between pairs of users in the system model;
formulating a joint precoding problem in the RIS-assisted cell-free network system, in which the users' sum rate is maximized by jointly optimizing the active precoding at the base stations and the passive precoding at the RIS; and introducing auxiliary variables, namely the RIS phase shift matrix Θ and the precoding vector matrix W, to convert the joint precoding problem into an optimization problem over Θ and W;
and optimizing the phase shift matrix Θ and the precoding vector matrix W in turn with a deep reinforcement learning algorithm so that both Θ and W reach their optimal solutions, thereby maximizing the users' sum rate and optimizing the broadband capacity.
Further, the RIS-assisted cell-free network system is a discrete time-slot system and is modeled as a Markov decision process.
Further, the deep reinforcement learning algorithm is the proximal policy optimization (PPO) algorithm from deep reinforcement learning (DRL).
Further, optimizing the phase shift matrix Θ and the precoding vector matrix W in turn with the deep reinforcement learning algorithm, so that both reach their optimal solutions, the users' sum rate is maximized and the broadband capacity is optimized, comprises:
step 1: initializing the wireless device information, user information and agent environment information, including the maximum transmit power P_max of the base stations, the user weights η_k, and the agent's actions and states;
step 2: repeatedly performing the following process: feeding the current state s_t into the actor-new network to obtain the action a_t, then feeding a_t into the environment to obtain the reward r_t and the next state s_{t′}, until a predetermined number of tuples {s_t, a_t, r_t} have been stored;
where the equivalent channel from base station b to user k on subcarrier p is expressed as:
Figure BDA0003612741570000021
where Figure BDA0003612741570000022, G_{b,r,p} and Figure BDA0003612741570000023 denote the frequency-domain channels on subcarrier p from base station b to user k, from base station b to RIS r, and from RIS r to user k, respectively; Figure BDA0003612741570000024 denotes the phase shift matrix of RIS r, and R denotes the number of RISs;
the current state considers a Gaussian channel; letting z_{k,p} denote additive white Gaussian noise, the received signal is:
Figure BDA0003612741570000025
where y_{b,k,p} denotes the baseband frequency-domain signal from base station b to user k on subcarrier p, w_{b,p,j} denotes the precoding vector of base station b, and s_{p,j} denotes the frequency-domain signal precoded by w_{b,p,j}; B denotes the number of base stations, and K denotes the number of users;
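Step 2 is the on-policy data-collection loop used by PPO. The following is a minimal sketch, assuming hypothetical callables `actor` (a stand-in for the actor-new network) and `env_step` (a stand-in for the RIS-assisted cell-free environment, returning the reward and the next state); the patent does not specify either interface, and the toy actor and environment in the usage lines are purely illustrative.

```python
import random
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]
Transition = Tuple[State, Action, float]

def collect_rollout(
    actor: Callable[[State], Action],                          # hypothetical actor-new network
    env_step: Callable[[State, Action], Tuple[float, State]],  # hypothetical environment: (r_t, s_t')
    s0: State,
    buffer_size: int,
) -> List[Transition]:
    """Repeat s_t -> a_t -> (r_t, s_t') until buffer_size tuples {s_t, a_t, r_t} are stored."""
    buffer: List[Transition] = []
    s_t = s0
    while len(buffer) < buffer_size:
        a_t = actor(s_t)                  # action from the current policy
        r_t, s_next = env_step(s_t, a_t)  # reward and next state from the environment
        buffer.append((s_t, a_t, r_t))
        s_t = s_next                      # continue from the next state
    return buffer

# Usage with a toy noisy policy and a toy reward (both hypothetical)
rng = random.Random(0)
toy_actor = lambda s: [x + rng.gauss(0.0, 0.1) for x in s]
toy_env = lambda s, a: (-sum(x * x for x in a), a)   # next state is simply the action
buffer = collect_rollout(toy_actor, toy_env, s0=[1.0, -1.0], buffer_size=8)
```

The buffer is then handed to the critic in step 4; PPO discards it after each update because the data must come from the current policy.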
step 3: designing the objective function for the users' sum rate;
where the signal-to-interference-plus-noise ratio (SINR) of the signal s_{p,k} of user k on subcarrier p is expressed as:
Figure BDA0003612741570000026
where Figure BDA0003612741570000027, and Ξ_{k,p} denotes the variance of the additive white Gaussian noise;
from this, the weighted sum rate of the users is derived, i.e. the objective function is:
Figure BDA0003612741570000028
where Θ = diag(Θ_1, …, Θ_R), P denotes the number of subcarriers, and η_k denotes the weight of user k;
the objective function is subject to two constraints:
Figure BDA0003612741570000031
where Figure BDA0003612741570000032 denotes the feasible set of RIS reflection coefficients, and θ_{r,n} denotes the reflection coefficient of the n-th element of RIS r;
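The SINR and objective above are given only as formula images. The sketch below assumes single-antenna users (U = 1), fully cooperative transmission in which every base station sends every user's signal, and the common weighted-sum-rate form f = Σ_k η_k Σ_p log2(1 + γ_{k,p}); these forms, and the names `sinr` and `weighted_sum_rate`, are illustrative assumptions rather than the patent's exact expressions.

```python
import math
from typing import List, Sequence

Vector = List[complex]

def inner(h: Vector, w: Vector) -> complex:
    """Product of a 1 x M row channel h and an M x 1 precoder w."""
    return sum(hi * wi for hi, wi in zip(h, w))

def sinr(h: List[List[Vector]], w: List[List[Vector]], k: int, noise_var: float) -> float:
    """SINR of user k on one subcarrier (assumes U = 1).

    h[b][k]: equivalent channel row (length M) from base station b to user k
    w[b][j]: precoder (length M) used at base station b for the signal of user j
    """
    num_bs, num_users = len(h), len(h[0])

    def gain(j: int) -> float:
        # all base stations send signal j coherently; return its received power at user k
        return abs(sum(inner(h[b][k], w[b][j]) for b in range(num_bs))) ** 2

    interference = sum(gain(j) for j in range(num_users) if j != k)
    return gain(k) / (interference + noise_var)

def weighted_sum_rate(eta: Sequence[float], gamma: Sequence[Sequence[float]]) -> float:
    """sum_k eta_k * sum_p log2(1 + gamma[k][p]); gamma is in linear scale, not dB."""
    return sum(eta_k * sum(math.log2(1.0 + g) for g in gamma_k)
               for eta_k, gamma_k in zip(eta, gamma))

# Usage: B = 1 base station, K = 2 users, M = 1 antenna
h = [[[1 + 0j], [1 + 0j]]]
w = [[[1 + 0j], [0.5 + 0j]]]
gamma_0 = sinr(h, w, k=0, noise_var=1.0)   # 1 / (0.25 + 1) = 0.8
rate = weighted_sum_rate([1.0, 0.5], [[1.0, 3.0], [7.0, 0.0]])   # 3 + 0.5 * 3 = 4.5
```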
step 4: combining all stored tuples {s_t, a_t, r_t} and feeding them into the critic network to compute the advantage function, which is calculated as:
Figure BDA0003612741570000033
where γ denotes the discount factor, s_t the current state, s_{t′} the next state, a_t the current action, and r_{t′} the objective function value; γ^{t′−t} denotes the discount factor raised to the power of the number of steps between the current time t and the later time t′; f′(Θ, W) denotes the objective function at the next time step; π(s_t, a_t) denotes the policy that generates the action in the current state; P(s_{t′} | s_t, a_t) denotes the probability of transitioning to the new state given the current state and action; and V_Φ(s_t) denotes the value function, calculated as:
Figure BDA0003612741570000034
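Read from the symbol definitions, the advantage appears to be the discounted return Σ_{t′≥t} γ^{t′−t} r_{t′} minus the critic's baseline V_Φ(s_t). A minimal sketch under that assumption follows; truncating the sum at the end of the stored batch is also an assumption, since the patent does not say how the horizon is handled.

```python
from typing import List, Sequence

def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = sum_{t' >= t} gamma^(t'-t) * r_{t'}, truncated at the end of the batch."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running   # G_t = r_t + gamma * G_{t+1}
        returns[t] = running
    return returns

def advantages(rewards: Sequence[float], values: Sequence[float], gamma: float) -> List[float]:
    """A_t = G_t - V_Phi(s_t), where values[t] is the critic's estimate V_Phi(s_t)."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

# Usage: three unit rewards, gamma = 0.5, critic predicts 1.0 everywhere
adv = advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5)   # [0.75, 0.5, 0.0]
```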
step 5: learning from the instantaneous rewards obtained through the Bellman equation, and using the PPO algorithm to alternately optimize Θ and W to obtain their optimized solutions, so as to maximize the users' sum rate.
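Step 5 does not spell out how the Bellman equation enters the learning. One common choice, assumed here, is to fit the critic V_Φ to one-step Bellman targets y_t = r_t + γ·V_Φ(s_{t+1}) with a mean-squared error; the functions `bellman_targets` and `critic_loss` are hypothetical names for illustration.

```python
from typing import List, Sequence

def bellman_targets(rewards: Sequence[float], next_values: Sequence[float], gamma: float,
                    terminal: Sequence[bool] = ()) -> List[float]:
    """One-step Bellman backup y_t = r_t + gamma * V(s_{t+1}); no bootstrap after a terminal step."""
    flags = list(terminal) or [False] * len(rewards)
    return [r + (0.0 if done else gamma * v)
            for r, v, done in zip(rewards, next_values, flags)]

def critic_loss(values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared Bellman error used to fit V_Phi."""
    return sum((v - y) ** 2 for v, y in zip(values, targets)) / len(values)

# Usage
targets = bellman_targets([1.0, 0.0], next_values=[2.0, 4.0], gamma=0.5)   # [2.0, 2.0]
loss = critic_loss([2.0, 1.0], targets)                                     # (0 + 1) / 2
```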
Further, alternately optimizing Θ and W with the PPO algorithm to obtain the optimal solutions of Θ and W comprises:
optimizing the precoding vector matrix W according to:
Figure BDA0003612741570000035
where y(W) denotes the actual value of the objective function; Figure BDA0003612741570000036 denotes the constraint function; P_W(a_t | s_t) denotes the probability of generating action a_t in state s_t before W is optimized; Figure BDA0003612741570000037 denotes the probability of generating action a_t in state s_t after W has been optimized to W_k; Figure BDA0003612741570000038 denotes the advantage function after optimization to W_k; and ε is a parameter that limits the difference between the old and new policies;
when the algorithm converges, terminating the optimization process and recording the optimized solution W_opt of W;
optimizing the phase shift matrix Θ: substituting the optimized solution W_opt into the objective function f(Θ, W), optimizing Θ, and terminating the optimization process when the algorithm converges to obtain the optimal solution Θ_opt of Θ.
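The quantities listed above (a new-to-old probability ratio, an advantage, and ε bounding the policy change) match the clipped surrogate objective of standard PPO. The sketch below implements that standard objective; how y(W) and the constraint function enter the patent's formula is not recoverable from the text, so they are omitted, and `ppo_clip_objective` is a hypothetical name.

```python
import math
from typing import Sequence

def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       adv: Sequence[float], eps: float = 0.2) -> float:
    """Mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A) with rho = p_new / p_old.

    The result is maximized; eps bounds how far the new policy can move from the old one.
    """
    total = 0.0
    for lp_new, lp_old, a in zip(logp_new, logp_old, adv):
        rho = math.exp(lp_new - lp_old)                  # probability ratio from log-probabilities
        rho_clipped = min(max(rho, 1.0 - eps), 1.0 + eps)
        total += min(rho * a, rho_clipped * a)           # pessimistic (lower) of the two terms
    return total / len(adv)

# Usage: the new policy doubles the probability of one action
gain = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])    # ratio clipped to 1.2
loss = ppo_clip_objective([math.log(2.0)], [0.0], [-1.0])   # unclipped -2.0 is kept
```

Taking the minimum means a positive advantage cannot push the ratio beyond 1 + ε, while a negative advantage is never softened, which is what keeps each update close to the old policy.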
In another aspect, the invention further provides a broadband capacity optimization device, comprising:
an RIS-assisted cell-free network system model construction module, configured to replace some of the base stations in a cell-free network with intelligent reconfigurable surfaces (RIS) to construct an RIS-assisted cell-free network system model, wherein RIS communication is employed between pairs of users in the system model;
a joint precoding problem design module, configured to formulate a joint precoding problem in the RIS-assisted cell-free network system, in which the users' sum rate is maximized by jointly optimizing the active precoding at the base stations and the passive precoding at the RIS, and to introduce auxiliary variables, namely the RIS phase shift matrix Θ and the precoding vector matrix W, to convert the joint precoding problem into an optimization problem over Θ and W;
and a deep reinforcement learning optimization module, configured to optimize the phase shift matrix Θ and the precoding vector matrix W in turn with a deep reinforcement learning algorithm so that both Θ and W reach their optimal solutions, thereby maximizing the users' sum rate and optimizing the broadband capacity.
Further, the RIS-assisted cell-free network system is a discrete time-slot system and is modeled as a Markov decision process.
Further, the deep reinforcement learning algorithm is the proximal policy optimization (PPO) algorithm from deep reinforcement learning (DRL).
Further, the deep reinforcement learning optimization module is specifically configured to:
step 1: initializing the wireless device information, user information and agent environment information, including the maximum transmit power P_max of the base stations, the user weights η_k, and the agent's actions and states;
step 2: repeatedly performing the following process: feeding the current state s_t into the actor-new network to obtain the action a_t, then feeding a_t into the environment to obtain the reward r_t and the next state s_{t′}, until a predetermined number of tuples {s_t, a_t, r_t} have been stored;
where the equivalent channel from base station b to user k on subcarrier p is expressed as:
Figure BDA0003612741570000041
where Figure BDA0003612741570000042, G_{b,r,p} and Figure BDA0003612741570000043 denote the frequency-domain channels on subcarrier p from base station b to user k, from base station b to RIS r, and from RIS r to user k, respectively; Figure BDA0003612741570000044 denotes the phase shift matrix of RIS r, and R denotes the number of RISs;
the current state considers a Gaussian channel; letting z_{k,p} denote additive white Gaussian noise, the received signal is:
Figure BDA0003612741570000045
where y_{b,k,p} denotes the baseband frequency-domain signal from base station b to user k on subcarrier p, w_{b,p,j} denotes the precoding vector of base station b, and s_{p,j} denotes the frequency-domain signal precoded by w_{b,p,j}; B denotes the number of base stations, and K denotes the number of users;
step 3: designing the objective function for the users' sum rate;
where the signal-to-interference-plus-noise ratio (SINR) of the signal s_{p,k} of user k on subcarrier p is expressed as:
Figure BDA0003612741570000051
where Figure BDA0003612741570000052, and Ξ_{k,p} denotes the variance of the additive white Gaussian noise;
from this, the weighted sum rate of the users is derived, i.e. the objective function is:
Figure BDA0003612741570000053
where Θ = diag(Θ_1, …, Θ_R), P denotes the number of subcarriers, and η_k denotes the weight of user k;
the objective function is subject to two constraints:
Figure BDA0003612741570000054
where Figure BDA0003612741570000055 denotes the feasible set of RIS reflection coefficients, and θ_{r,n} denotes the reflection coefficient of the n-th element of RIS r;
step 4: combining all stored tuples {s_t, a_t, r_t} and feeding them into the critic network to compute the advantage function, which is calculated as:
Figure BDA0003612741570000056
where γ denotes the discount factor, s_t the current state, s_{t′} the next state, a_t the current action, and r_{t′} the objective function value; γ^{t′−t} denotes the discount factor raised to the power of the number of steps between the current time t and the later time t′; f′(Θ, W) denotes the objective function at the next time step; π(s_t, a_t) denotes the policy that generates the action in the current state; P(s_{t′} | s_t, a_t) denotes the probability of transitioning to the new state given the current state and action; and V_Φ(s_t) denotes the value function, calculated as:
Figure BDA0003612741570000057
step 5: learning from the instantaneous rewards obtained through the Bellman equation, and using the PPO algorithm to alternately optimize Θ and W to obtain their optimal solutions, so as to maximize the users' sum rate.
Further, alternately optimizing Θ and W with the PPO algorithm to obtain the optimal solutions of Θ and W comprises:
optimizing the precoding vector matrix W according to:
Figure BDA0003612741570000061
where y(W) denotes the actual value of the objective function; Figure BDA0003612741570000062 denotes the constraint function; P_W(a_t | s_t) denotes the probability of generating action a_t in state s_t before W is optimized; Figure BDA0003612741570000063 denotes the probability of generating action a_t in state s_t after W has been optimized to W_k; Figure BDA0003612741570000064 denotes the advantage function after optimization to W_k; and ε is a parameter that limits the difference between the old and new policies;
when the algorithm converges, terminating the optimization process and recording the optimized solution W_opt of W;
optimizing the phase shift matrix Θ: substituting the optimized solution W_opt into the objective function f(Θ, W), then optimizing Θ, and terminating the optimization process when the algorithm converges to obtain the optimal solution Θ_opt of Θ.
In yet another aspect, the present invention also provides an electronic device comprising a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the above-described method.
In yet another aspect, the present invention also provides a computer-readable storage medium having at least one instruction stored therein, which is loaded and executed by a processor to implement the above-mentioned method.
The technical solution provided by the invention has at least the following beneficial effects:
the invention provides a broadband capacity optimization scheme for cell-free networks with the goal of maximizing the users' sum rate; it uses RIS to replace some of the base stations in the cell-free network and deploys more RISs in the system to further improve the network capacity.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an RIS-assisted cell-free network system model provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First embodiment
This embodiment provides a broadband capacity optimization method for improving the users' sum rate and optimizing the broadband capacity in an RIS-assisted cell-free network scenario. The method uses intelligent reconfigurable surface (RIS) technology to address capacity improvement under limited cost and power consumption. The RIS is applied to the cell-free network: some of the base stations in the cell-free network are replaced by low-cost, low-power RISs, and more RISs are deployed in the system to further improve the network capacity, which helps address the cost and power consumption problems of cell-free networks. In addition, the method formulates and models the joint precoding problem in the RIS-assisted cell-free network system, introduces the auxiliary variables Θ (the RIS phase shift matrix) and W (the precoding vector matrix), and converts the joint precoding problem into an optimization problem over Θ and W. The agent is trained with the proximal policy optimization (PPO) algorithm from deep reinforcement learning to obtain the corresponding actions and policies; the policy is updated on-policy, and clipping is used to limit the difference between the new and old policies. This realizes alternating learning optimization of the RIS phase shift matrix and the precoding vector matrix, which can substantially speed up the convergence of the users' sum rate and thus improve the performance of the communication system. The feasibility of the PPO-based broadband capacity optimization method for RIS-assisted cell-free networks is demonstrated by iteratively maximizing the users' sum rate.
The main idea of the method is to formulate a joint precoding problem for the cell-free network system model and jointly optimize the active precoding at the base stations and the passive precoding at the RIS, using deep reinforcement learning to alternately learn and optimize the auxiliary variables, the phase shift matrix Θ and the precoding vector matrix W, so that the users' sum rate is maximized. The system is a discrete time-slot system modeled as a Markov decision process, and PPO is used for decision optimization with the goal of maximizing the system objective function.
Specifically, the method for optimizing the broadband capacity of the embodiment includes the following steps:
S1: replacing some of the base stations in the cell-free network with intelligent reconfigurable surfaces (RIS) to construct an RIS-assisted cell-free network system model, wherein RIS communication is employed between pairs of users in the system model;
Fig. 1 shows a broadband scenario of a cell-free network comprising B base stations with M antennas each, K users with U antennas each, R RISs with N reflecting elements each, and P subcarriers, which together form the RIS-assisted cell-free network model. The system is a discrete time-slot system and is modeled as a Markov decision process.
S2: formulating the joint precoding problem in the RIS-assisted cell-free network system, in which the users' sum rate is maximized by jointly optimizing the active precoding at the base stations and the passive precoding at the RIS; and introducing auxiliary variables, namely the RIS phase shift matrix Θ and the precoding vector matrix W, to convert the joint precoding problem into an optimization problem over Θ and W;
S3: optimizing the phase shift matrix Θ and the precoding vector matrix W in turn with a deep reinforcement learning algorithm so that both reach their optimal solutions, thereby maximizing the users' sum rate and optimizing the broadband capacity.
The algorithm used in S3 is the PPO algorithm; specifically, S3 comprises the following steps:
S31: initializing the environment information of the wireless devices, the users and the RIS (the agent), including the maximum transmit power P_max of the base stations, the user weights η_k, and the agent's actions and states;
The user equipment, RIS and channel states are modeled as finite-state Markov models: the system state does not change within a time slot and switches at the next time slot according to the state transition probabilities.
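The finite-state Markov modeling in S31 can be illustrated with a small simulation. This is a minimal sketch: the patent gives neither the state space nor the transition probabilities, so the two-state "good/bad" channel and the matrix `P` below are hypothetical, and `simulate_fsmc` is an illustrative name.

```python
import random
from typing import List, Sequence

def simulate_fsmc(P: Sequence[Sequence[float]], s0: int, n_slots: int, seed: int = 0) -> List[int]:
    """Sample a finite-state Markov chain over n_slots time slots.

    The state is held fixed within a slot; at each slot boundary the next state
    is drawn from row P[current] of the transition matrix.
    """
    rng = random.Random(seed)
    states = [s0]
    for _ in range(n_slots - 1):
        row = P[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return states

# Usage: hypothetical two-state channel, 0 = "good", 1 = "bad"
P = [[0.9, 0.1],
     [0.2, 0.8]]
trace = simulate_fsmc(P, s0=0, n_slots=20, seed=1)
```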
S32: feeding the current environment information s_t into the actor-new network to obtain the action a_t, then feeding a_t into the environment to obtain the reward r_t and the next state s_{t′}; repeating these steps until a certain number of tuples {s_t, a_t, r_t} have been stored;
where the equivalent channel from base station b to user k on subcarrier p is expressed as:
Figure BDA0003612741570000081
where Figure BDA0003612741570000082, G_{b,r,p} and Figure BDA0003612741570000083 denote the frequency-domain channels on subcarrier p from base station b to user k, from base station b to RIS r, and from RIS r to user k, respectively; Θ_r denotes the phase shift matrix of RIS r, Θ_r = diag(θ_{r,1}, …, θ_{r,N}); Figure BDA0003612741570000084 denotes the feasible set of RIS reflection coefficients, Figure BDA0003612741570000085;
the current environment information considers a Gaussian channel; letting z_{k,p} denote additive white Gaussian noise with zero mean and covariance Ξ_{k,p} = σ²I_U, the received signal is expressed as:
Figure BDA0003612741570000086
where y_{b,k,p} denotes the baseband frequency-domain signal from base station b to user k on subcarrier p, w_{b,p,j} denotes the precoding vector of base station b, and s_{p,j} denotes the frequency-domain signal precoded by w_{b,p,j};
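The equivalent-channel formula itself is an image that is not reproduced here. A common form in the RIS literature, assumed for this sketch, adds each RIS's cascaded path to the direct path: H̄_{b,k,p} = H_{b,k,p} + Σ_r H_{r,k,p} Θ_r G_{b,r,p}, with shapes U×M, U×N, N×N and N×M. The helper names below are hypothetical, and plain Python lists stand in for a matrix library.

```python
import cmath
from typing import List, Sequence

Matrix = List[List[complex]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def diag(coeffs: Sequence[complex]) -> Matrix:
    """Theta_r = diag(theta_{r,1}, ..., theta_{r,N})."""
    n = len(coeffs)
    return [[coeffs[i] if i == j else 0j for j in range(n)] for i in range(n)]

def equivalent_channel(H_direct: Matrix, H_ris_user: List[Matrix],
                       thetas: List[Sequence[complex]], G_bs_ris: List[Matrix]) -> Matrix:
    """Assumed cascaded model for one (b, k, p):
    H_direct + sum_r H_ris_user[r] @ diag(thetas[r]) @ G_bs_ris[r]
    Shapes: H_direct U x M, H_ris_user[r] U x N, G_bs_ris[r] N x M.
    """
    H_eq = H_direct
    for H_rk, theta_r, G_br in zip(H_ris_user, thetas, G_bs_ris):
        H_eq = matadd(H_eq, matmul(matmul(H_rk, diag(theta_r)), G_br))
    return H_eq

# Usage: U = M = 1, one RIS (R = 1) with N = 2 elements
H_d = [[1 + 0j]]
H_rk = [[1 + 0j, 1 + 0j]]        # 1 x 2
G_br = [[1 + 0j], [1 + 0j]]      # 2 x 1
in_phase = equivalent_channel(H_d, [H_rk], [[1 + 0j, 1 + 0j]], [G_br])
cancelling = equivalent_channel(H_d, [H_rk], [[1 + 0j, cmath.exp(1j * cmath.pi)]], [G_br])
```

With both elements in phase the RIS path adds 2 to the direct path; flipping one element by π makes the two reflections cancel, which is the effect the phase shift matrix Θ is optimized to control.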
S33: formulating the precoding problem of the model to obtain the objective function for the users' sum rate;
where the SINR of the signal s_{p,k} of user k on subcarrier p is expressed as:
Figure BDA0003612741570000087
where Figure BDA0003612741570000088;
from this, the weighted sum rate of the users is derived, i.e. the objective function is:
Figure BDA0003612741570000089
where Θ = diag(Θ_1, …, Θ_R), η_k denotes the user weight, and γ_{k,p} denotes the signal-to-interference-plus-noise ratio (SINR) of the signal s_{p,k} of user k on subcarrier p.
The objective function is subject to two constraints:
1) Figure BDA00036127415700000810
2) Figure BDA00036127415700000811
where θ_{r,n} denotes the reflection coefficient of the n-th element of RIS r.
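Both constraints are formula images. A typical pair in this setting, assumed here, is a per-base-station power budget on the precoders (total power at most P_max) and a feasible set of unit-modulus reflection coefficients, possibly with discrete phases. The sketch shows projections that map raw actor outputs onto such sets; the constraint forms and the names `project_power` and `project_reflection` are assumptions, not the patent's definitions.

```python
import cmath
import math
from typing import List

def project_power(w_b: List[List[complex]], p_max: float) -> List[List[complex]]:
    """Scale all precoding vectors of one base station so that their total power is at most p_max."""
    power = sum(abs(x) ** 2 for vec in w_b for x in vec)
    if power <= p_max:
        return w_b
    scale = math.sqrt(p_max / power)
    return [[x * scale for x in vec] for vec in w_b]

def project_reflection(theta: List[complex], bits: int = 0) -> List[complex]:
    """Map each RIS coefficient onto the unit circle; if bits > 0, also quantize its phase to 2**bits levels."""
    projected = []
    for t in theta:
        phase = cmath.phase(t)              # cmath.phase(0) returns 0.0
        if bits > 0:
            step = 2 * math.pi / 2 ** bits
            phase = round(phase / step) * step
        projected.append(cmath.exp(1j * phase))
    return projected

# Usage
w_feasible = project_power([[2 + 0j], [0j]], p_max=1.0)      # power 4 -> scaled by 0.5
theta_cont = project_reflection([3 + 0j, 0.5j])               # magnitudes forced to 1
theta_2bit = project_reflection([cmath.exp(0.9j)], bits=2)    # 0.9 rad snaps to pi/2
```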
S34: combining all stored tuples {s_t, a_t, r_t} and feeding them into the critic network to compute the advantage function, which is calculated as:
Figure BDA0003612741570000091
where γ denotes the discount factor, r_{t′} the objective function value, s_t the current state, s_{t′} the next state, and a_t the current action; γ^{t′−t} denotes the discount factor raised to the power of the number of steps between the current time t and the later time t′; f′(Θ, W) denotes the objective function at the next time step; π(s_t, a_t) denotes the policy that generates the action in the current state; P(s_{t′} | s_t, a_t) denotes the probability of transitioning to the new state given the current state and action; and V_Φ(s_t) denotes the value function, calculated as:
Figure BDA0003612741570000092
S35: performing reinforcement learning with the instantaneous rewards obtained from the Bellman equation; the goal of deep reinforcement learning is to maximize the reward function, i.e., to reach the maximum users' sum rate. The PPO algorithm updates the objective function in mini-batches over multiple training steps, which avoids the difficulty of choosing the step size in policy gradient algorithms.
Since the objective function contains two variables, PPO is used for alternating optimization learning to obtain the optimized solutions W_opt and Θ_opt of the two, as follows:
firstly, a precoding vector matrix W is optimized, and the calculation formula is as follows:
Figure BDA0003612741570000093
where y(W) denotes the actual value of the objective function; [Figure BDA0003612741570000094] denotes the constraint function; P_W(a_t | s_t) denotes the probability of generating action a_t under environment information s_t before W is optimized; [Figure BDA0003612741570000095] denotes the probability of generating action a_t under environment information s_t after W is optimized to W_k; [Figure BDA0003612741570000096] denotes the advantage function after W is optimized to W_k; and ε is a parameter that limits the difference between the old and new policies.
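The W-update formula itself is not reproduced, but the symbols described above match the standard PPO clipped surrogate, in which the probability ratio P_{W_k}(a_t | s_t) / P_W(a_t | s_t) is clipped to [1 − ε, 1 + ε]. Assuming that form, a sketch of the per-batch objective is (the function name is hypothetical):

```python
import numpy as np


def ppo_clipped_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate: mean over samples of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A).

    rho = P_new(a_t | s_t) / P_old(a_t | s_t) is computed from log-probabilities for numerical stability.
    The result is to be maximized (use its negative as a loss for gradient descent).
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the new policy far from the old one, which is how PPO sidesteps the step-size tuning problem of plain policy gradient methods.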
When the algorithm tends to converge, i.e., when the reward function stays within a certain range for a long time and no longer rises, the sum rate of the users in the cell-free network has reached its maximum; the optimization process is then terminated and the optimized solution W_opt of W is recorded.
Second, the phase shift matrix Θ is optimized: the optimized solution W_opt is substituted into the objective function f(Θ, W), which becomes f(Θ, W_opt), and the above steps are repeated. When the reward function stays within a certain range for a long time and no longer rises, the optimization process is terminated and the optimal solution Θ_opt of Θ is obtained.
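The two-stage procedure (optimize W with Θ fixed, record W_opt, then optimize Θ in f(Θ, W_opt)) is a form of alternating optimization. The sketch below shows only that outer structure on a hypothetical single-antenna, single-user, single-RIS-element toy channel; the inner PPO training is replaced by an exhaustive grid search purely for illustration, and all names and values are assumptions.

```python
import numpy as np


def toy_rate(theta, w, h_d, h_r, g, noise_var):
    """log2(1 + |w|^2 |h_d + h_r e^{j theta} g|^2 / noise_var): one antenna, one user, one RIS element."""
    h_eff = h_d + h_r * np.exp(1j * theta) * g
    return float(np.log2(1.0 + abs(w) ** 2 * abs(h_eff) ** 2 / noise_var))


def grid_argmax(f, candidates):
    """Stand-in for a PPO inner loop: return the candidate with the largest objective value."""
    scores = [f(c) for c in candidates]
    return candidates[int(np.argmax(scores))]


def alternating_optimize(objective, theta0, w0, theta_grid, w_grid, rounds=1):
    """Optimize W with Theta fixed, then Theta with W_opt fixed; repeat `rounds` times."""
    theta, w = theta0, w0
    for _ in range(rounds):
        w = grid_argmax(lambda cand: objective(theta, cand), w_grid)      # W_opt for current Theta
        theta = grid_argmax(lambda cand: objective(cand, w), theta_grid)  # Theta_opt given W_opt
    return theta, w


# Hypothetical toy channel; P_max = 4, so the amplitude w ranges over [0, 2].
h_d, h_r, g, noise_var = 1.0 + 0j, 1j, 1.0 + 0j, 1.0


def objective(theta, w):
    return toy_rate(theta, w, h_d, h_r, g, noise_var)


theta_grid = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)  # 1-degree phase steps
w_grid = np.linspace(0.0, 2.0, 21)
theta_opt, w_opt = alternating_optimize(objective, 0.0, 0.0, theta_grid, w_grid)
```

With these values the W pass picks the full amplitude w = 2, and the Θ pass then aligns the reflected path with the direct path (θ = 3π/2), giving a rate of log2(17). The text describes a single W-then-Θ pass; rounds > 1 would repeat the alternation.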
In summary, this embodiment optimizes the sum rate of users in a cell-free network. It uses RIS to assist the cell-free network, formulates the joint precoding problem of the system, and maximizes the user sum rate by jointly optimizing the active precoding at the base station side and the passive precoding at the RIS side. Two auxiliary variables, the phase shift matrix Θ and the precoding vector matrix W, are introduced; following the idea of alternating optimization, the deep reinforcement learning algorithm PPO is used to optimize each auxiliary variable in turn, so that both Θ and W obtain optimal solutions and the user sum-rate performance is improved to the greatest extent.
Second embodiment
This embodiment provides a broadband capacity optimization device, which comprises the following modules:
the RIS-assisted cell-free network system model building module, configured to replace some of the base stations in a cell-free network with reconfigurable intelligent surfaces (RIS) to build an RIS-assisted cell-free network system model, wherein RIS communication is employed between user pairs in the system model;
the joint precoding problem design module, configured to formulate the joint precoding problem in the RIS-assisted cell-free network system, maximizing the user sum rate by jointly optimizing the active precoding at the base station side and the passive precoding at the RIS side, and to introduce auxiliary variables, namely the phase shift matrix Θ of the RIS and the precoding vector matrix W, thereby converting the joint precoding problem into an optimization problem over Θ and W;
and the deep reinforcement learning optimization module, configured to optimize the phase shift matrix Θ and the precoding vector matrix W separately using a deep reinforcement learning algorithm so that both Θ and W obtain optimal solutions, thereby improving the user sum-rate performance to the greatest extent and achieving broadband capacity optimization.
The broadband capacity optimization device of this embodiment corresponds to the broadband capacity optimization method of the first embodiment described above; the functions implemented by the functional modules of the device correspond one-to-one to the steps of the method of the first embodiment, and are therefore not described again here.
Third embodiment
The present embodiment provides an electronic device, which includes a processor and a memory; wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the method of the first embodiment.
The electronic device may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) and one or more memories, where at least one instruction is stored in the memory and is loaded and executed by the processor to perform the above method.
Fourth embodiment
The present embodiment provides a computer-readable storage medium, in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the method of the first embodiment. The computer readable storage medium may be, among others, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like. The instructions stored therein may be loaded by a processor in the terminal and perform the above-described method.
Furthermore, it should be noted that the present invention may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the media.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or terminal equipment comprising the element.
Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once the basic inventive concepts have been learned, numerous changes and modifications may be made without departing from the principles of the invention, which shall be deemed to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims (10)

1. A method for wideband capacity optimization, comprising:
replacing some of the base stations in a cell-free network with reconfigurable intelligent surfaces (RIS) to construct an RIS-assisted cell-free network system model, wherein RIS communication is employed between user pairs in the system model;
formulating a joint precoding problem in the RIS-assisted cell-free network system, and maximizing the user sum rate by jointly optimizing the active precoding at the base station side and the passive precoding at the RIS side; and introducing auxiliary variables, namely the phase shift matrix Θ of the RIS and the precoding vector matrix W, to convert the joint precoding problem into an optimization problem over Θ and W;
and optimizing the phase shift matrix Θ and the precoding vector matrix W separately using a deep reinforcement learning algorithm, so that both Θ and W obtain optimal solutions, the user sum-rate performance is improved to the greatest extent, and broadband capacity optimization is achieved.
2. The broadband capacity optimization method of claim 1, wherein the RIS-assisted cell-free network system is a discrete time slot system and the RIS-assisted cell-free network system model is modeled as a Markov decision model.
3. The broadband capacity optimization method of claim 1, wherein the deep reinforcement learning algorithm is the Proximal Policy Optimization (PPO) algorithm of deep reinforcement learning (DRL).
4. The broadband capacity optimization method of claim 3, wherein separately optimizing the phase shift matrix Θ and the precoding vector matrix W using the deep reinforcement learning algorithm, so that both Θ and W obtain optimal solutions, the user sum-rate performance is improved to the greatest extent, and broadband capacity optimization is achieved, comprises:
step 1, initializing wireless device information, user information and agent environment information, including: the maximum base station transmit power P_max, the user weights η_k, and the agent's actions and states;
step 2, repeatedly performing the following process: inputting the current state s_t into the actor-new network to obtain the action a_t, then inputting a_t into the environment to obtain the reward r_t and the next state s_{t'}, until a predetermined number of tuples {s_t, a_t, r_t} have been stored;
wherein the equivalent channel from base station b to user k on subcarrier p is expressed as:
Figure FDA0003612741560000011
where [Figure FDA0003612741560000012], G_{b,r,p} and [Figure FDA0003612741560000013] denote the frequency-domain channels on subcarrier p from base station b to user k, from base station b to RIS r, and from RIS r to user k, respectively; [Figure FDA0003612741560000014] denotes the phase shift matrix of RIS r, and R denotes the number of RISs;
the current state considers a Gaussian channel; letting z_{k,p} denote additive white Gaussian noise, the received signal is:
Figure FDA0003612741560000015
where y_{b,k,p} denotes the baseband frequency-domain signal from base station b to user k on subcarrier p, w_{b,p,j} denotes the precoding vector of base station b, s_{p,j} denotes the frequency-domain signal precoded by w_{b,p,j}, B denotes the number of base stations, and K denotes the number of users;
step 3, designing an objective function for the user sum rate;
wherein the signal-to-noise ratio of the signal s_{p,k} of user k on subcarrier p is expressed as:
Figure FDA0003612741560000021
where [Figure FDA0003612741560000022]; Ξ_{k,p} denotes the additive white Gaussian noise variance;
from this, the weighted sum rate of the users is derived, i.e., the objective function is:
Figure FDA0003612741560000023
where P denotes the number of subcarriers and η_k denotes the weight of user k;
the objective function is subject to two constraints:
Figure FDA0003612741560000024
where [Figure FDA0003612741560000025] denotes the feasible set of RIS reflection coefficients, and θ_{r,n} denotes the n-th reflection coefficient of RIS r;
step 4, combining all stored {s_t, a_t, r_t}, inputting them into the critic network, and calculating the advantage function:
wherein the advantage function is calculated by the following formula:
Figure FDA0003612741560000026
where γ denotes the discount factor; s_t denotes the current state and s_{t'} the next state; a_t denotes the current action; r_{t'} denotes the objective function (the reward) at time t'; γ^{t'−t} is the discount factor raised to the number of steps between the next time t' and the current time t; f'(Θ, W) denotes the objective function at the next time; t' denotes the next time and t the current time; π(s_t, a_t) denotes the policy that generates actions in the current state; P(s_{t'} | s_t, a_t) denotes the probability of reaching the new state given the current state and action; and V_φ(s_t) denotes the value function, calculated by the following formula:
Figure FDA0003612741560000027
step 5, obtaining instantaneous rewards according to the Bellman equation for learning, and using the PPO algorithm to perform alternating optimization learning of Θ and W to obtain their optimized solutions, so as to maximize the user sum-rate performance.
5. The broadband capacity optimization method of claim 4, wherein using the PPO algorithm to perform alternating optimization learning of Θ and W to obtain the optimized solutions of Θ and W comprises:
optimizing the precoding vector matrix W according to the following formula:
Figure FDA0003612741560000031
where y(W) denotes the actual value of the objective function; [Figure FDA0003612741560000032] denotes the constraint function; P_W(a_t | s_t) denotes the probability of generating action a_t under environment information s_t before W is optimized; [Figure FDA0003612741560000033] denotes the probability of generating action a_t under environment information s_t after W is optimized to W_k; [Figure FDA0003612741560000034] denotes the advantage function after W is optimized to W_k; and ε is a parameter that limits the difference between the old and new policies;
when the algorithm tends to converge, terminating the optimization process and recording the optimized solution W_opt of W;
optimizing the phase shift matrix Θ: substituting the optimized solution W_opt into the objective function f(Θ, W), then optimizing Θ, and, when the algorithm tends to converge, terminating the optimization process to obtain the optimal solution Θ_opt of Θ.
6. A broadband capacity optimizing device, comprising:
the RIS-assisted cell-free network system model building module, configured to replace some of the base stations in a cell-free network with reconfigurable intelligent surfaces (RIS) to build an RIS-assisted cell-free network system model, wherein RIS communication is employed between user pairs in the system model;
the joint precoding problem design module, configured to formulate the joint precoding problem in the RIS-assisted cell-free network system, maximizing the user sum rate by jointly optimizing the active precoding at the base station side and the passive precoding at the RIS side, and to introduce auxiliary variables, namely the phase shift matrix Θ of the RIS and the precoding vector matrix W, thereby converting the joint precoding problem into an optimization problem over Θ and W;
and the deep reinforcement learning optimization module, configured to optimize the phase shift matrix Θ and the precoding vector matrix W separately using a deep reinforcement learning algorithm, so that both Θ and W obtain optimal solutions, the user sum-rate performance is improved to the greatest extent, and broadband capacity optimization is achieved.
7. The broadband capacity optimization device of claim 6, wherein the RIS-assisted cell-free network system is a discrete time slot system and the RIS-assisted cell-free network system model is modeled as a Markov decision model.
8. The broadband capacity optimization device of claim 6, wherein the deep reinforcement learning algorithm is the Proximal Policy Optimization (PPO) algorithm of deep reinforcement learning (DRL).
9. The broadband capacity optimization device of claim 8, wherein the deep reinforcement learning optimization module is specifically configured to:
step 1, initializing wireless device information, user information and agent environment information, including: the maximum base station transmit power P_max, the user weights η_k, and the agent's actions and states;
step 2, repeatedly performing the following process: inputting the current state s_t into the actor-new network to obtain the action a_t, then inputting a_t into the environment to obtain the reward r_t and the next state s_{t'}, until a predetermined number of tuples {s_t, a_t, r_t} have been stored;
wherein the equivalent channel from base station b to user k on subcarrier p is expressed as:
Figure FDA0003612741560000041
where [Figure FDA0003612741560000042], G_{b,r,p} and [Figure FDA0003612741560000043] denote the frequency-domain channels on subcarrier p from base station b to user k, from base station b to RIS r, and from RIS r to user k, respectively; [Figure FDA0003612741560000044] denotes the phase shift matrix of RIS r, and R denotes the number of RISs;
the current state considers a Gaussian channel; letting z_{k,p} denote additive white Gaussian noise, the received signal is:
Figure FDA0003612741560000045
where y_{b,k,p} denotes the baseband frequency-domain signal from base station b to user k on subcarrier p, w_{b,p,j} denotes the precoding vector of base station b, s_{p,j} denotes the frequency-domain signal precoded by w_{b,p,j}, B denotes the number of base stations, and K denotes the number of users;
step 3, designing an objective function for the user sum rate;
wherein the signal-to-noise ratio of the signal s_{p,k} of user k on subcarrier p is expressed as:
Figure FDA0003612741560000046
where [Figure FDA0003612741560000047]; Ξ_{k,p} denotes the additive white Gaussian noise variance;
from this, the weighted sum rate of the users is derived, i.e., the objective function is:
Figure FDA0003612741560000048
where P denotes the number of subcarriers and η_k denotes the weight of user k;
the objective function is subject to two constraints:
Figure FDA0003612741560000049
where [Figure FDA00036127415600000410] denotes the feasible set of RIS reflection coefficients, and θ_{r,n} denotes the n-th reflection coefficient of RIS r;
step 4, combining all stored {s_t, a_t, r_t}, inputting them into the critic network, and calculating the advantage function:
wherein the advantage function is calculated by the following formula:
Figure FDA00036127415600000411
where γ denotes the discount factor; s_t denotes the current state and s_{t'} the next state; a_t denotes the current action; r_{t'} denotes the objective function (the reward) at time t'; γ^{t'−t} is the discount factor raised to the number of steps between the next time t' and the current time t; f'(Θ, W) denotes the objective function at the next time; t' denotes the next time and t the current time; π(s_t, a_t) denotes the policy that generates actions in the current state; P(s_{t'} | s_t, a_t) denotes the probability of reaching the new state given the current state and action; and V_φ(s_t) denotes the value function, calculated by the following formula:
Figure FDA0003612741560000051
step 5, obtaining instantaneous rewards according to the Bellman equation for learning, and using the PPO algorithm to perform alternating optimization learning of Θ and W to obtain their optimized solutions, so as to maximize the user sum-rate performance.
10. The broadband capacity optimization device of claim 9, wherein using the PPO algorithm to perform alternating optimization learning of Θ and W to obtain the optimized solutions of Θ and W comprises:
optimizing the precoding vector matrix W according to the following formula:
Figure FDA0003612741560000052
where y(W) denotes the actual value of the objective function; [Figure FDA0003612741560000053] denotes the constraint function; P_W(a_t | s_t) denotes the probability of generating action a_t under environment information s_t before W is optimized; [Figure FDA0003612741560000054] denotes the probability of generating action a_t under environment information s_t after W is optimized to W_k; [Figure FDA0003612741560000055] denotes the advantage function after W is optimized to W_k; and ε is a parameter that limits the difference between the old and new policies;
when the algorithm tends to converge, terminating the optimization process and recording the optimized solution W_opt of W;
optimizing the phase shift matrix Θ: substituting the optimized solution W_opt into the objective function f(Θ, W), then optimizing Θ, and, when the algorithm tends to converge, terminating the optimization process to obtain the optimal solution Θ_opt of Θ.
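The signal model recited in steps 2 and 3 of claims 4 and 9 is given only as formula images. The following NumPy sketch assumes the common cascaded RIS channel form, joint transmission from all base stations, an interference-plus-noise denominator in the SNR, a plain (unnormalized) sum over subcarriers, unit-modulus RIS coefficients, and a per-base-station total power budget P_max. None of these details is confirmed by the text, and all function names are hypothetical.

```python
import numpy as np


def equivalent_channel(h_direct, G_list, f_list, theta_list):
    """Assumed cascaded form h_eq = h_d^H + sum_r f_r^H diag(theta_r) G_r (one b, k, p).

    h_direct: (M,) base station b -> user k;   G_r: (N, M) base station b -> RIS r;
    f_r: (N,) RIS r -> user k;                  theta_r: (N,) reflection coefficients of RIS r.
    """
    h_eq = np.conj(np.asarray(h_direct, dtype=complex))
    for G_r, f_r, theta_r in zip(G_list, f_list, theta_list):
        # f_r^H diag(theta_r) G_r, computed without building the N x N diagonal matrix
        h_eq = h_eq + (np.conj(f_r) * theta_r) @ G_r
    return h_eq


def received_signal(h_eq_per_bs, W_per_bs, s, z):
    """y_{k,p} = sum_b h_eq[b] @ (W[b] @ s) + z_{k,p}; W[b] is (M, J) with columns w_{b,p,j}."""
    y = complex(z)
    for h_b, W_b in zip(h_eq_per_bs, W_per_bs):
        y += complex(h_b @ (W_b @ s))
    return y


def sinr(h_k, W, k, noise_var):
    """|h_k w_k|^2 / (sum_{j != k} |h_k w_j|^2 + noise_var); h_k and W stacked over all base stations."""
    gains = np.abs(np.asarray(h_k) @ np.asarray(W)) ** 2
    interference = gains.sum() - gains[k]
    return float(gains[k] / (interference + noise_var))


def weighted_sum_rate(sinr_kp, eta):
    """f(Theta, W) = sum_p sum_k eta_k * log2(1 + SINR_{k,p}); sinr_kp has shape (K, P)."""
    sinr_kp = np.asarray(sinr_kp, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return float(np.sum(eta[:, None] * np.log2(1.0 + sinr_kp)))


def project_ris(theta):
    """Map reflection coefficients onto the unit circle |theta_{r,n}| = 1 (assumed feasible set)."""
    theta = np.asarray(theta, dtype=complex)
    mag = np.abs(theta)
    safe = np.where(mag > 0, mag, 1.0)  # avoid division by zero for all-zero entries
    return np.where(mag > 0, theta / safe, 1.0 + 0j)


def project_power(W_b, p_max):
    """Scale base station b's precoders so that sum_{p,j} ||w_{b,p,j}||^2 <= p_max."""
    W_b = np.asarray(W_b)
    power = float(np.sum(np.abs(W_b) ** 2))
    return W_b if power <= p_max else W_b * np.sqrt(p_max / power)
```

The two projection helpers show one simple way an agent's raw actions could be kept inside the constraint set; the text does not state how the constraints are enforced during training.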
CN202210435498.9A 2022-04-24 2022-04-24 Broadband capacity optimization method and device Pending CN114938512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210435498.9A CN114938512A (en) 2022-04-24 2022-04-24 Broadband capacity optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210435498.9A CN114938512A (en) 2022-04-24 2022-04-24 Broadband capacity optimization method and device

Publications (1)

Publication Number Publication Date
CN114938512A true CN114938512A (en) 2022-08-23

Family

ID=82862190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210435498.9A Pending CN114938512A (en) 2022-04-24 2022-04-24 Broadband capacity optimization method and device

Country Status (1)

Country Link
CN (1) CN114938512A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024074009A1 (en) * 2023-03-10 2024-04-11 Lenovo (Beijing) Ltd. Inter-cell interference suppression under ris-assisted wireless network

Similar Documents

Publication Publication Date Title
CN109302262B (en) Communication anti-interference method based on depth determination gradient reinforcement learning
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN109617584B (en) MIMO system beam forming matrix design method based on deep learning
Ding et al. No-pain no-gain: DRL assisted optimization in energy-constrained CR-NOMA networks
CN112383922B (en) Deep reinforcement learning frequency spectrum sharing method based on prior experience replay
CN113162679A (en) DDPG algorithm-based IRS (inter-Range instrumentation System) auxiliary unmanned aerial vehicle communication joint optimization method
CN108075975B (en) Method and system for determining route transmission path in Internet of things environment
Tang et al. Decoupling or learning: Joint power splitting and allocation in MC-NOMA with SWIPT
CN111491358B (en) Adaptive modulation and power control system based on energy acquisition and optimization method
CN109274456B (en) Incomplete information intelligent anti-interference method based on reinforcement learning
CN114422363B (en) Capacity optimization method and device for unmanned aerial vehicle-mounted RIS auxiliary communication system
CN113316169B (en) UAV auxiliary communication energy efficiency optimization method and device for smart port
Sarma et al. Modeling MIMO channels using a class of complex recurrent neural network architectures
CN114885340B (en) Ultra-dense wireless network power distribution method based on deep migration learning
CN114938512A (en) Broadband capacity optimization method and device
Peng et al. Energy harvesting reconfigurable intelligent surface for UAV based on robust deep reinforcement learning
CN112330021A (en) Network coordination control method of distributed optical storage system
CN116489712A (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116131892A (en) Combined beam forming method, system, medium and terminal of heterogeneous intelligent reflecting surface system
CN113259944B (en) RIS auxiliary frequency spectrum sharing method based on deep reinforcement learning
CN109543225A (en) Control program generation method, device, storage medium and the electronic equipment of vehicle
CN115412936A (en) IRS (intelligent resource management) assisted D2D (device-to-device) system resource allocation method based on multi-agent DQN (differential Quadrature reference network)
CN114640966A (en) Task unloading method based on mobile edge calculation in Internet of vehicles
CN114219074A (en) Wireless communication network resource allocation algorithm dynamically adjusted according to requirements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination