CN110086591B - Pilot pollution suppression method in large-scale antenna system - Google Patents


Info

Publication number
CN110086591B
CN110086591B (Application CN201910399212.4A)
Authority
CN
China
Prior art keywords
pilot frequency
pilot
optimization problem
algorithm
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910399212.4A
Other languages
Chinese (zh)
Other versions
CN110086591A (en)
Inventor
朱禹涛
洪军华
连永进
胡志明
刘泽民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yingtan Taier Internet Of Things Research Center
Original Assignee
Yingtan Taier Internet Of Things Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yingtan Taier Internet Of Things Research Center filed Critical Yingtan Taier Internet Of Things Research Center
Priority to CN201910399212.4A
Publication of CN110086591A
Application granted
Publication of CN110086591B
Legal status: Active
Anticipated expiration

Classifications

    • H04L 5/0048 — Allocation of pilot signals, i.e. of signals known to the receiver (H — Electricity; H04 — Electric communication technique; H04L — Transmission of digital information, e.g. telegraphic communication; H04L 5/00 — Arrangements affording multiple use of the transmission path; H04L 5/003 — Arrangements for allocating sub-channels of the transmission path)
    • H04W 52/267 — TPC performed according to specific parameters using transmission rate or quality of service QoS, taking into account the information rate (H04W — Wireless communication networks; H04W 52/00 — Power management, e.g. TPC [Transmission Power Control], power saving or power classes; H04W 52/04 — TPC; H04W 52/18 — TPC performed according to specific parameters; H04W 52/26 — using transmission rate or quality of service QoS)
    • H04W 52/325 — Power control of control or pilot channels (H04W 52/30 — TPC using constraints in the total amount of available transmission power; H04W 52/32 — TPC of broadcast or control channels)
    • H04W 52/42 — TPC performed in particular situations in systems with time, space, frequency or polarisation diversity (H04W 52/38 — TPC performed in particular situations)
    • H04B 7/0426 — Power distribution in MIMO systems (H04B — Transmission; H04B 7/00 — Radio transmission systems, i.e. using radiation field; H04B 7/02 — Diversity systems; multi-antenna systems; H04B 7/04 — using two or more spaced independent antennas; H04B 7/0413 — MIMO systems)

Abstract

The application discloses a pilot pollution suppression method in a large-scale antenna system and relates to the technical field of wireless communication. The technical scheme mainly comprises: establishing an optimization problem model for an optimization target; decomposing the optimization problem model into a pilot allocation submodel and a power control submodel; and cyclically iterating the solving algorithms of the pilot allocation submodel and the power control submodel to obtain an approximately optimal solution of the optimization problem model and determine the pilot allocation and power control that suppress pilot pollution. By adopting the pilot pollution suppression method provided by the application, the uplink sum rate of the users of a large-scale antenna system is improved and the pilot pollution problem is effectively suppressed.

Description

Pilot pollution suppression method in large-scale antenna system
Technical Field
The present application relates to the field of wireless communication technologies, and in particular, to a pilot pollution suppression method in a large-scale antenna system.
Background
With the development of wireless communication networks and the emergence of services such as the Internet of Things, machine-to-machine communication, e-learning and electronic banking, the number of mobile users and mobile devices has grown rapidly and the demand for mobile data traffic has increased explosively. Mobile traffic demand from these emerging services is expected to grow by several orders of magnitude over the next decade, so new technologies with far higher capacity than existing wireless networks are needed. Conventional multi-antenna (MIMO) technology improves the reliability and efficiency of the system by increasing the number of antennas at the base station and the user terminals; however, the number of base-station antennas in a conventional multi-antenna system is small, and the system performance cannot meet the future demand for high data rates. A large-scale antenna system is a new communication system in which a base station (BS) equipped with an array of several hundred antennas simultaneously serves multiple user terminals (UEs) on the same time-frequency resource; each user terminal has one or more antennas, and the multi-antenna base station simultaneously transmits independent data streams to the multiple terminals.
Research shows that a large-scale antenna system can greatly improve the spectral efficiency and energy efficiency of the system by using its array gain to support spatial-multiplexing transmission for more users. Moreover, when the number of antennas at the base station tends to infinity, the multi-user multi-cell large-scale antenna system exhibits many excellent properties: the channel vectors of different users tend to be orthogonal and the interference among users in the same cell tends to zero; the effect of uncorrelated noise in the system fades away, and the small-scale fading of the channel is averaged out.
In a large-scale antenna system, accurate channel state information estimation is crucial; otherwise the communication quality of the uplink and downlink is seriously degraded. At present, channel state information acquisition falls mainly into two categories: pilot-based channel estimation and subspace-based channel estimation. A large-scale antenna system adopts the common block-fading model, in which the channel state information can be regarded as constant within one coherence time and channel estimation must be performed again once the coherence interval is exceeded. Subspace-based channel estimation has a high error rate, and the time needed for one channel estimation is far longer than the channel coherence time of the system, so channel estimation with orthogonal pilot sequences is currently the main method of acquiring channel state information. However, orthogonal pilot sequences are limited by the channel coherence time, so their number is very limited and cannot keep up with the growing number of mobile users. Users in the cells of a large-scale antenna system therefore have to send the same or non-orthogonal pilot training sequences for channel estimation, and each user suffers interference from the users that use the same or non-orthogonal pilots; this phenomenon is called pilot pollution.
Random allocation of pilots causes interference from users in other cells reusing non-orthogonal pilots and degrades the accuracy of channel estimation. Rational pilot allocation is one of the methods that effectively suppress pilot pollution. Most pilot-pollution-suppression research based on pilot allocation assumes by default that the uplink pilot transmit power of every user is the same and only considers the suppression effect of different pilot allocation schemes on pilot pollution; however, different pilot transmit power control schemes also affect the system throughput. Some studies have proposed methods for suppressing pilot pollution in large-scale antenna systems from the perspective of pilot power control, but most of them are heuristic designs and do not give an upper bound on the system performance gain achievable by pilot power control.
Disclosure of Invention
The application provides a pilot pollution suppression method in a large-scale antenna system, characterized by comprising the following steps: establishing an optimization problem model for an optimization target; decomposing the optimization problem model into a pilot allocation submodel and a power control submodel; and cyclically iterating the solving algorithms of the pilot allocation submodel and the power control submodel to obtain an approximately optimal solution of the optimization problem model and determine the optimal pilot allocation and power control for suppressing pilot pollution.
As above, for modeling the optimization objective, the optimization problem model is obtained as follows:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{iik}\bigr|^{2}}{\sum_{(j,k')\neq(i,k)}p_{jk'}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{ijk'}\bigr|^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)$$

wherein i = 1, …, L denotes the i-th cell in the large-scale antenna system and L is the total number of cells; k = 1, …, K denotes the k-th user terminal in a cell and K is the total number of user terminals in the cell; $\varphi$ is the pilot allocation pattern and $\Phi$ represents all $(K!)^{L}$ pilot allocation schemes, where s denotes the pilot set, comprising K orthogonal pilots in total; $p_{ik}$ denotes the uplink pilot transmit power of the k-th user terminal in the i-th cell, $p_{jk'}$ denotes the uplink pilot transmit power of the k'-th user terminal in the j-th cell, and $\mathbf{p}=\{p_{ik}\}_{L\times K}$ is the matrix of L rows and K columns formed by the uplink pilot transmit powers of all users; $\mathbf{h}_{iik}$ represents the signal gain of the k-th user terminal in the i-th cell and $\mathbf{h}_{iik}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk'}$ represents the channel gain from the k'-th user terminal in the j-th cell to the base station of the i-th cell and $\mathbf{h}_{ijk'}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk}=\sqrt{\beta_{ijk}}\,\mathbf{g}_{ijk}$, where $\beta_{ijk}$ is the large-scale fading factor and $\mathbf{g}_{ijk}$ is the small-scale fading factor; $\mathbb{C}$ represents the set of complex numbers and $\mathbb{C}^{M\times 1}$ represents a complex vector of dimension M×1; $c_{ik}$ denotes the pilot sequence used by the k-th user terminal in the i-th cell and $c_{jk'}$ denotes the pilot sequence used by the k'-th user terminal in the j-th cell; $\sigma^{2}$ is the variance of the Gaussian white noise.
As above, when the number of antennas in the large-scale antenna system increases, the small-scale fading factor is neglected according to the characteristics of the large-scale antenna system, and the optimization problem model of the optimization target is simplified to:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\beta_{iik}^{2}}{\sum_{j\neq i}\sum_{k'=1}^{K}p_{jk'}\,\beta_{ijk'}^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)$$
$$\text{s.t.}\;\;0<p_{ik}\le P_{\max}$$

wherein s.t. denotes that the simplified optimization problem model is constrained by $0<p_{ik}\le P_{\max}$, and $P_{\max}$ represents the maximum uplink transmit power of a user.
As above, the solving algorithm of the power control submodel adopts the successive convex approximation algorithm, and the convex approximation problem is obtained as:

$$\max_{\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\Bigl[a_{ik}\log\bigl(\gamma_{ik}(\mathbf{p})\bigr)+b_{ik}\Bigr]\qquad\text{s.t.}\;0<p_{ik}\le P_{\max}$$

wherein p is the matrix formed by the uplink pilot transmit powers of all users, $\gamma_{ik}(\mathbf{p})$ is the uplink SINR of the k-th user terminal in the i-th cell, $a_{ik}$ denotes $\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}$ and $b_{ik}$ denotes $\log(1+\gamma_{ik}^{0})-\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}\log(\gamma_{ik}^{0})$, where $\gamma_{ik}^{0}$ is the SINR at the expansion point of the current iteration.
As above, in the solving process using the successive convex approximation algorithm, the lower bound of the original optimization target is maximally tightened by an iterative method, specifically comprising the following sub-steps:
obtaining a value of the optimization problem according to the initialized user power;
cyclically solving the optimization problem, and when the solution of the optimization problem meets the set condition, outputting the power allocation result $\mathbf{p}^{(t)}$ of the t-th iteration, i.e. the optimal solution of the optimization problem, and the value of the optimization problem corresponding to the optimal solution.
As above, cyclically solving the optimization problem to output the optimal solution of the optimization problem specifically comprises: after the initial value $\Phi[0]$ of the optimization problem is calculated, t is incremented by 1; if $|\Phi[t]-\Phi[t-1]|\ge\varepsilon$, t continues to be incremented, the optimization problem is solved again to obtain the t-th power allocation result $\mathbf{p}^{(t)}$, and the comparison of $|\Phi[t]-\Phi[t-1]|$ with ε continues, until $|\Phi[t]-\Phi[t-1]|<\varepsilon$, whereupon the calculated power allocation result is output.
As above, the pilot allocation submodel solving algorithm adopts a pilot allocation algorithm based on distributed Q learning and models the pilot allocation subproblem in combination with the Q learning algorithm, specifically comprising:
Virtual agents: the L cell base stations in the large-scale antenna system are taken as virtual agents.
Actions: each agent has an action set A, the action of the i-th agent being $a_i=\{c_{i1},c_{i2},\dots,c_{iK}\}$, where $c_{ik}$ is the pilot allocated to each user in the i-th cell; the size of each agent's action set is K!, where K is the total number of user terminals in the cell.
States: the time-division-duplex multi-cell multi-user large-scale antenna system consisting of L hexagonal cells serves as the environment interacting with the agents; each agent has its own state vector, representing the user pilot allocation state in its cell.
Reward/punishment signal: the action selected by an agent acts on the environment, and the environment influences the agent's learning process through reward and punishment signals; the return function of an agent is determined as the system sum rate in the ideal state after the base stations of the large-scale antenna system select a certain pilot allocation scheme. The agents update their respective Q-value tables according to the return function; after the Q tables are updated, each agent selects its action with an ε-greedy strategy, randomly choosing an action vector from the action space with probability ε, or choosing the action according to the Q-value table with probability 1−ε.
As above, the pilot allocation submodel solving algorithm adopts a pilot allocation algorithm based on distributed Q learning, specifically comprising the following sub-steps:
initializing the pilots, the powers, the agent action sets and the Q-value tables;
traversing each agent in turn:
if the random number generated for an agent is less than the probability ε, an action is selected arbitrarily for that agent from its action set; if the random number generated for the agent is greater than or equal to the probability ε, an action $a_i$ is selected according to the Q-value table; the action vector $a_i$ is executed, each agent i is traversed, the return function is obtained according to the state and the last action, and the Q-value table is updated according to the return function.
As above, the main pilot pollution suppression algorithm with joint power control and pilot allocation is constructed from the solving algorithms of the pilot allocation submodel and the power control submodel, yielding a suboptimal joint pilot allocation and power control solution to the system sum-rate maximization problem, specifically comprising the following sub-steps:
initializing the power of each user terminal to an equal power allocation;
calculating the system sum rate according to the user terminal powers; when the difference between the (i+1)-th system sum-rate value and the i-th result is greater than or equal to the error value ε, alternately iterating the distributed-Q-learning-based pilot allocation algorithm and the successive convex approximation algorithm in turn; and when the difference between the (i+1)-th system sum-rate value and the i-th result is less than the error value ε, ending the iteration process and terminating the algorithm.
As above, alternately iterating the distributed-Q-learning-based pilot allocation algorithm and the successive convex approximation algorithm in turn specifically comprises: setting the error value ε, and while the difference between the (i+1)-th system sum-rate value and the i-th result is not less than ε, or i = 0, cyclically executing the following operations: obtaining $a^{(i)}$ from the distributed-Q-learning-based pilot allocation algorithm and $\mathbf{p}^{(i-1)}$, then obtaining $\mathbf{p}^{(i)}$ from the successive convex approximation algorithm and $a^{(i)}$, and updating $R^{(i)}$; until the difference between the (i+1)-th system sum-rate value and the i-th result is less than ε.
The beneficial effects achieved by the application are as follows: by adopting the pilot pollution suppression method provided by the application, the uplink sum rate of the users of a large-scale antenna system is improved and the pilot pollution problem is effectively suppressed.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to these drawings.
FIG. 1 is a schematic diagram of a large-scale antenna model;
fig. 2 is a flowchart of a pilot pollution suppression method according to an embodiment of the present application;
FIG. 3 is a flow chart of a distributed Q-learning based pilot allocation algorithm;
FIG. 4 is a flowchart of an iterative method employed to tighten the lower bound of the original optimization target during the solution of the convex optimization problem;
fig. 5 is a flow chart of a pilot pollution suppression algorithm for joint power control and pilot allocation.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application describes a large-scale antenna model. As shown in fig. 1, in a multi-cell multi-user large-scale antenna (massive MIMO) system, each cell comprises a base station configured with M antennas and K single-antenna user terminals (UEs). The base station is located at the center of the cell and the single-antenna user terminals are uniformly distributed within the cell. Channel reciprocity holds between the M antennas and the K single-antenna user terminals, that is, under a single excitation, when the positions of the excitation port and the response port are interchanged, the response does not change as a result of the interchange.
Because the number of pilots is limited by the channel coherence time, orthogonal pilot training sequences cannot be allocated to all users. In the embodiments of the application, users in the same cell are set to transmit mutually orthogonal pilots, and users in different cells reuse the same pilot set. On the basis of pilot allocation, in order to further suppress pilot pollution, the influence of the pilot transmit power on the performance of the large-scale antenna system is considered, with the aim of maximizing the uplink sum rate of the system users, so that the uplink transmission rate and efficiency of the large-scale antenna system are improved as much as possible.
Example one
As shown in fig. 2, the pilot pollution suppression method includes:
step 110, establishing an optimization problem model for an optimization target;
For the optimization objective modeling, the optimization problem model is obtained as follows:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{iik}\bigr|^{2}}{\sum_{(j,k')\neq(i,k)}p_{jk'}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{ijk'}\bigr|^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)\qquad(1)$$

In formula (1), i = 1, …, L denotes the i-th cell in the large-scale antenna system and L is the total number of cells; k = 1, …, K denotes the k-th user terminal in a cell and K is the total number of user terminals in the cell; $\varphi$ is the pilot allocation pattern and $\Phi$ represents all $(K!)^{L}$ pilot allocation schemes, where s denotes the pilot set, comprising K orthogonal pilots in total; $p_{ik}$ denotes the uplink pilot transmit power of the k-th user terminal in the i-th cell, $p_{jk'}$ denotes the uplink pilot transmit power of the k'-th user terminal in the j-th cell, and $\mathbf{p}=\{p_{ik}\}_{L\times K}$ is the matrix of L rows and K columns formed by the uplink pilot transmit powers of all users; $\mathbf{h}_{iik}$ represents the signal gain of the k-th user terminal in the i-th cell and $\mathbf{h}_{iik}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk'}$ represents the channel gain from the k'-th user terminal in the j-th cell to the base station of the i-th cell and $\mathbf{h}_{ijk'}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk}=\sqrt{\beta_{ijk}}\,\mathbf{g}_{ijk}$, where $\beta_{ijk}$ is the large-scale fading factor and $\mathbf{g}_{ijk}$ is the small-scale fading factor; $\mathbb{C}$ represents the set of complex numbers and $\mathbb{C}^{M\times 1}$ represents a complex vector of dimension M×1; $c_{ik}$ denotes the pilot sequence used by the k-th user terminal in the i-th cell and $c_{jk'}$ denotes the pilot sequence used by the k'-th user terminal in the j-th cell; $\sigma^{2}$ is the variance of the Gaussian white noise.
When the number of antennas in the large-scale antenna system gradually increases, the small-scale fading factor $\mathbf{g}_{ijk}$ is averaged out, becomes approximately equal to 0 and can be neglected according to the characteristics of the large-scale antenna system, so the optimization problem model of the optimization objective simplifies to:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\beta_{iik}^{2}}{\sum_{j\neq i}\sum_{k'=1}^{K}p_{jk'}\,\beta_{ijk'}^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)\qquad(2)$$
$$\text{s.t.}\;\;0<p_{ik}\le P_{\max},\;\;c_{ik}\in s$$

In formula (2), "s.t." means subject to, i.e. formula (2) is constrained by $0<p_{ik}\le P_{\max}$, where $P_{\max}$ represents the maximum uplink transmit power of a user, and $c_{ik}$ belongs to the pilot set s.
As can be seen from formula (2), under the constraints of the orthogonal pilot set and the maximum uplink transmit power, the optimization target is associated not only with the pilot allocation pattern $\varphi$ but is also influenced by the uplink transmit power p of each user. The system sum rate of the large-scale antenna system (i.e. the sum of the uplink rates of all users in the system) is therefore expressed as a function $R(\varphi,\mathbf{p})$ of the two variables, the pilot allocation pattern $\varphi$ and the power $\mathbf{p}$:

$$R(\varphi,\mathbf{p})=\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\bigl(1+\gamma_{ik}(\varphi,\mathbf{p})\bigr)\qquad(3)$$

where $\gamma_{ik}(\varphi,\mathbf{p})$ is the uplink SINR of the k-th user terminal in the i-th cell under pilot allocation $\varphi$ and power $\mathbf{p}$. In terms of the function $R(\varphi,\mathbf{p})$ of the pilot allocation pattern and the power, the optimization problem model simplifies to:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;R(\varphi,\mathbf{p})\qquad\text{s.t.}\;0<p_{ik}\le P_{\max},\;c_{ik}\in s,\;\forall(i,k)\qquad(4)$$

In formula (4), "s.t." means subject to, i.e. formula (4) is constrained by $0<p_{ik}\le P_{\max}$, where $P_{\max}$ represents the maximum uplink transmit power of a user, and "∀(i,k)" refers to any user terminal in any cell.
Referring back to fig. 2, step 120: decomposing the optimization problem model into a pilot allocation submodel and a power control submodel;
the optimization problem model obtained by modeling the optimization target is
Figure BDA0002059177910000079
And p, which belongs to the problem of combination optimization, and the optimal solution of the joint problem cannot be obtained, so that the optimization problem model is decomposed into a pilot frequency allocation sub-model and a power control sub-model, and the original optimization problem is decomposed into a pilot frequency allocation sub-problem and a power control sub-problem for processing.
Step 130: cyclically iterating the pilot allocation algorithm and the power control algorithm to obtain an approximately optimal solution of the optimization problem model and determine the pilot allocation and power control that suppress pilot pollution.
Specifically, the pilot allocation subproblem is optimized for a given user power allocation, the power control subproblem is optimized for a given pilot allocation, and the solving algorithms of the two subproblems are then iterated in a loop to obtain an approximately optimal solution of the optimization problem model, thereby obtaining a pilot allocation and power control scheme that can effectively suppress pilot pollution.
Pilot allocation submodel:
Specifically, for a given user power allocation, it is preferable to solve the pilot allocation subproblem with a pilot allocation algorithm based on distributed Q learning, namely converting the pilot allocation subproblem into a problem in which the L base stations, in parallel, jointly use a multi-agent Q learning algorithm to find the optimal pilot allocation scheme.
It should be noted that multi-agent Q learning algorithms fall into two types, centralized Q learning and distributed Q learning. For this problem, the total number of joint actions in the centralized case is the number of exhaustively enumerated allocation schemes, so the algorithm complexity is too high and the learning process may not be realizable. Therefore, in this problem, the present application preferably adopts the distributed Q learning algorithm with each base station as an agent.
In the distributed Q learning process adopted in the present application, each base station maintains its own Q-value table, and the pilot allocation subproblem is modeled in combination with the main elements of the Q learning algorithm:
Virtual agents: the L cell base stations in the large-scale antenna system are taken as virtual agents.
Actions: each agent has an action set A; the action of the i-th agent is $a_i=\{c_{i1},c_{i2},\dots,c_{iK}\}$, where $c_{ik}$ is the pilot allocated to each user in the i-th cell; the joint actions of all L agents constitute the solution space $\Phi$ in the optimization objective.
States: a TDD (Time Division Duplex) multi-cell multi-user large-scale antenna system consisting of L hexagonal cells serves as the environment interacting with the agents; each agent has its own state vector, the state vector $s_i$ of the i-th agent representing the user pilot allocation state in its cell.
Reward/punishment signal: the action selected by an agent acts on the environment, and the environment influences the agent's learning process through reward and punishment signals. That is, a base station selects a certain pilot allocation scheme which acts on the MIMO system, and the system feeds back whether this action affects it positively, thereby influencing the base station's next choice of pilot allocation scheme. Since the goal is to maximize the system uplink sum rate, the agent takes as its reward the system uplink sum rate obtained after taking a certain action in a certain state. Specifically:
At time t, agent base station i perceives that the current MIMO system environment is in pilot allocation state s and selects the corresponding action $a_i^{t}$ to allocate pilots to the users in its cell. The action affects the system uplink sum rate, the environment state changes from s to s', and a return function $R_i^{t}(s_t,a_i^{t})$ is fed back to agent base station i, where $s_t$ denotes the state of the system at time t, $a_i^{t}$ denotes the action taken by agent i at time t, and $R_i^{t}(s_t,a_i^{t})$ denotes the return function of the i-th agent at time t.
Considering the mutual influence among different agents, the return function is determined by the joint action of all agents; therefore the return function $R_i^{t}(s_t,a_i^{t})$ of the i-th agent selecting action $a_i^{t}$ in state $s_t$ at time t is determined as the system sum rate in the ideal state after base station i selects a pilot allocation scheme at time t, as follows:

$$R_i^{t}\bigl(s_t,a_i^{t}\bigr)=\sum_{l=1}^{L}\sum_{k=1}^{K}\log_{2}\bigl(1+\gamma_{lk}\bigr)\qquad(5)$$
in the Q learning process, the intelligent agent updates respective Q value tables according to the return function, when the new Q value is larger than the Q value at the previous t moment at the t +1 moment, the Q value tables are updated, otherwise, the Q value is not changed, and the Q value is calculated as shown in the following formula;
Figure BDA0002059177910000091
in the formula (6), Qi t+1(s,ai) Is the Q value at time t +1, Qi t(s,ai) Is the Q value at the time t; alpha epsilon (0, 1)]The learning rate is used for measuring the speed of Q learning convergence, when the value of alpha is small, the learning time consumption is large, otherwise, the algorithm may not converge; gamma is a discount factor representing the attenuation degree of the return function value;
Figure BDA0002059177910000092
represents the operation of which Q value is maximum at time t, s'i,a′iThe method comprises the steps that the state and action selection of the ith intelligent agent at the moment t is carried out, A is a limited set of all possible actions of the intelligent agent, namely a pilot frequency distribution scheme which can be adopted by each base station, and the limited sets of different intelligent agents are preferably set to be the same;
after Q table update, each agent performs action selection using an epsilon greedy strategy, selecting an action vector randomly in the action space with a probability epsilon, or selecting action a with a probability of 1-epsiloni
Figure BDA0002059177910000093
ε is [0, 1]The random number of (2) is generally 0.1; in the process, the agent i continuously optimizes an action selection strategy, the strategy represents the mapping relation between the environment and the action, and different environment states correspond to different action selections.
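The ε-greedy selection and the optimistic table update of formula (6) can be summarized in a short sketch; the helper names epsilon_greedy and q_update are illustrative, and the single-row Q-table view is an assumption made for brevity.

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action index for one agent: explore with probability epsilon,
    otherwise exploit the current Q values (ties broken randomly)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

def q_update(q_row, action, reward, q_next_row, alpha=0.5, gamma=0.9):
    """One update in the spirit of formula (6); the stored entry is only
    overwritten when the new estimate is larger (distributed Q learning)."""
    new_q = (1 - alpha) * q_row[action] + alpha * (reward + gamma * q_next_row.max())
    if new_q > q_row[action]:
        q_row[action] = new_q
```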
As shown in fig. 3, the pilot allocation algorithm based on distributed Q learning specifically comprises:
Step 210: initializing the pilots, the powers, the agent action sets and the Q-value tables;
Specifically, the pilot of each user terminal in each cell is defined as $c_{ik}$ and its power as $p_{ik}$; an action set is defined for each agent, and the Q-value table is initialized so that $Q_i(s_i,a_i)=0$ for each agent.
Step 220: traversing each agent in turn, and judging whether the random number generated for the agent is smaller than the probability ε; if so, randomly selecting an action for the agent from its action set, otherwise executing step 230.
Each agent is indexed by i, and a random number $\xi_i\in[0,1]$ is generated for the i-th agent; if $\xi_i<\varepsilon$, an action is selected arbitrarily for the i-th agent from its action set; if $\xi_i\ge\varepsilon$, step 230 is executed.
Step 230: selecting an action $a_i$ according to the Q-value table, and executing the action vector $a_i$;
The Q value of agent i at time t is denoted $Q^{t}(s_i,a_i)$, and the Q-value table is updated according to the calculated Q values. When the random number generated by the i-th agent satisfies $\xi_i\ge\varepsilon$, the action $a_i=\arg\max_{a_i'\in A}Q^{t}(s_i,a_i')$ is selected from the Q-value table and the action vector $a_i$ is executed.
Step 240: traversing each agent i, obtaining the return function $R_i(s_i,a_i)$ according to the state and the last action, and updating the Q-value table $Q_i^{t+1}(s_i,a_i)$ according to the return function.
By adopting this pilot allocation algorithm based on multi-agent Q learning, the algorithm complexity is greatly reduced; the mutual influence among different agents is taken into account, and the return function is determined by the joint action of all agents, i.e. it is the optimization target of the pilot allocation subproblem.
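Putting steps 210-240 together, the following is a hedged sketch of the distributed Q-learning pilot allocation loop, reusing the system_sum_rate, epsilon_greedy and q_update helpers sketched above. A single-state formulation is assumed (per-cell state bookkeeping is omitted), and all names and default parameters are illustrative rather than taken from the patent.

```python
from itertools import permutations
import numpy as np

def q_learning_pilot_allocation(beta, power, sigma2=1.0, epsilon=0.1,
                                alpha=0.5, gamma=0.9, episodes=200, seed=0):
    """Sketch of the pilot allocation algorithm of Fig. 3.

    Each of the L base stations is an agent whose action set is the K!
    permutations of the K orthogonal pilots over its own users; the reward
    of every agent is the system sum rate under the joint allocation."""
    L, K = power.shape
    actions = list(permutations(range(K)))      # step 210: K! actions per agent
    q_table = np.zeros((L, len(actions)))       # one Q row per agent
    rng = np.random.default_rng(seed)
    chosen = np.zeros(L, dtype=int)
    for _ in range(episodes):
        for i in range(L):                      # steps 220/230: pick an action
            chosen[i] = epsilon_greedy(q_table[i], epsilon, rng)
        pilots = np.array([actions[a] for a in chosen])
        reward = system_sum_rate(beta, pilots, power, sigma2)
        for i in range(L):                      # step 240: update each Q table
            q_update(q_table[i], chosen[i], reward, q_table[i], alpha, gamma)
    best = np.array([actions[int(np.argmax(q_table[i]))] for i in range(L)])
    return best                                 # (L, K) pilot index per user
```

Because the action space of each agent is only K!, rather than the $(K!)^{L}$ joint schemes a centralized learner would have to enumerate, the tables stay small even for a moderate number of cells.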
Power control submodel:
Specifically, for a given pilot allocation pattern, since the optimization target has a logarithmic form, it is preferable to transform the objective function with the Successive Convex Approximation (SCA) algorithm. For arbitrary non-negative numbers γ and $\gamma_0$, the following inequality holds:

$$\log(1+\gamma)\ \ge\ f(\gamma,a,b)=a\log(\gamma)+b\qquad(7)$$

where, for the specific value $\gamma_0$,

$$a=\frac{\gamma_0}{1+\gamma_0},\qquad b=\log(1+\gamma_0)-\frac{\gamma_0}{1+\gamma_0}\log(\gamma_0)$$

and f(γ, a, b) is a function with γ, a and b as arguments, i.e.:

$$f(\gamma,a,b)=\frac{\gamma_0}{1+\gamma_0}\log(\gamma)+\log(1+\gamma_0)-\frac{\gamma_0}{1+\gamma_0}\log(\gamma_0)\qquad(8)$$

In formula (8), the equality sign in (7) is taken when $\gamma=\gamma_0$; with a and b fixed, formula (8) is a univariate function of γ, and the inequality is proved by moving terms and taking the derivative. Let

$$f(\gamma)=\log(1+\gamma)-\frac{\gamma_0}{1+\gamma_0}\log(\gamma)-b$$

then

$$f'(\gamma)=\frac{1}{1+\gamma}-\frac{\gamma_0}{(1+\gamma_0)\,\gamma}$$

When $\gamma>\gamma_0$, f(γ) is increasing; when $\gamma<\gamma_0$, f(γ) is decreasing; and when $\gamma=\gamma_0$, f(γ) attains its minimum value 0, so the inequality holds.
Therefore,

$$\log\bigl(1+\gamma_{ik}(\mathbf{p})\bigr)\ \ge\ a_{ik}\log\bigl(\gamma_{ik}(\mathbf{p})\bigr)+b_{ik}\qquad(9)$$

In formula (9), $\gamma_{ik}(\mathbf{p})$ is the uplink SINR of the k-th user terminal in the i-th cell; $a_{ik}$ denotes $\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}$ and $b_{ik}$ denotes $\log\bigl(1+\gamma_{ik}^{0}\bigr)-\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}\log\bigl(\gamma_{ik}^{0}\bigr)$, where $\gamma_{ik}^{0}$ is the SINR at the expansion point (the power of the previous iteration).
According to the convex approximation, the optimization target is approximated with this rate-lower-bound method by the following optimization problem:

$$\max_{\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\Bigl[a_{ik}\log\bigl(\gamma_{ik}(\mathbf{p})\bigr)+b_{ik}\Bigr]\qquad\text{s.t.}\;0<p_{ik}\le P_{\max}\qquad(10)$$
the derived formula (10) is a convex optimization problem, and a standard convex optimization tool (such as a cvx tool) is used for directly solving the convex optimization problem; it should be noted that, although the optimization problem is converted into the convex optimization problem by the continuous convex approximation method, and the approximate optimization problem is solved, the approximate problem as in the formula (10) is only to maximize the lower bound of the original optimization target, so in order to further improve the accuracy of the result, the lower bound of the original optimization target is tightened as much as possible by an iterative manner in the solving process, specifically including, as shown in fig. 4:
step 310: obtaining a value of an optimization problem according to the initialized user power;
the method adopts a power control algorithm based on SCA and inputs a pilot frequency allocation scheme cikDefining a large-scale fading factor beta from each user terminal to each base stationijk(ii) a Defining the power p of each userik=PmaxWhen the initialization t is 0, the optimization problem solution at the time of initialization is defined as p(0)Remember phi 0]To solve the problem p(0)The value of the corresponding optimization problem.
Step 320: circularly solving the optimization problem, and outputting the power distribution result p of the t time when the solution of the optimization problem meets the set condition(t)I.e. the optimal solution of the optimization problem, and the value phi t of the optimization problem to which the optimal solution corresponds]。
t is added by 1 to solve the optimization problem to obtain the power distribution result p of the t time(t)And phi t](ii) a If it calculates
Figure BDA0002059177910000111
T continues to add 1 by itself, and the optimization problem is solved again to obtain the power distribution result p of the t time(t)Return to continue the comparison
Figure BDA0002059177910000112
And epsilon up to
Figure BDA0002059177910000113
And outputting the calculated power distribution result.
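Steps 310-320 can be illustrated with the following hedged sketch of the SCA power control loop. The surrogate of formula (10) is maximized numerically in the variables q = log p (a change of variables assumed here to keep the subproblem well behaved; the patent itself refers to a standard convex solver such as cvx), and the function names sca_power_control and sinr are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def sca_power_control(beta, pilots, p_max, sigma2=1.0, eps=1e-3, max_iter=50):
    """Sketch of the SCA power control of Fig. 4 for a fixed pilot allocation."""
    L, K = pilots.shape

    def sinr(power):
        out = np.empty((L, K))
        for i in range(L):
            for k in range(K):
                interf = sum(power[j, kp] * beta[i, j, kp] ** 2
                             for j in range(L) if j != i
                             for kp in range(K) if pilots[j, kp] == pilots[i, k])
                out[i, k] = power[i, k] * beta[i, i, k] ** 2 / (interf + sigma2)
        return out

    power = np.full((L, K), float(p_max))          # step 310: start from P_max
    prev = np.sum(np.log(1.0 + sinr(power)))       # Phi[0]
    for _ in range(max_iter):                      # step 320: iterate until tight
        g0 = sinr(power)
        a = g0 / (1.0 + g0)                        # a_ik of formula (9)
        b = np.log(1.0 + g0) - a * np.log(g0)      # b_ik of formula (9)

        def neg_lower_bound(q):
            g = sinr(np.exp(q).reshape(L, K))
            return -np.sum(a * np.log(g) + b)      # surrogate of formula (10)

        res = minimize(neg_lower_bound, np.log(power).ravel(),
                       bounds=[(-20.0, np.log(p_max))] * (L * K))  # 0 < p <= P_max
        power = np.exp(res.x).reshape(L, K)
        cur = np.sum(np.log(1.0 + sinr(power)))    # Phi[t]
        if abs(cur - prev) < eps:                  # |Phi[t] - Phi[t-1]| < eps
            break
        prev = cur
    return power
```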
The optimization problem has thus been decomposed into two subproblems, yielding the distributed-Q-learning-based pilot allocation algorithm and the successive convex approximation (SCA) algorithm. From these two subalgorithms the main algorithm, i.e. the pilot pollution suppression algorithm with joint power control and pilot allocation, is constructed, and a suboptimal joint pilot allocation and power control solution to the system sum-rate maximization problem is obtained, specifically comprising the following steps, as shown in fig. 5:
Step 410: initializing the power of each user terminal;
Specifically, $\mathbf{p}^{(0)}$ is initialized to an equal power allocation, with i = 0 and $R^{(0)}=0$.
Step 420: calculating the system sum rate according to the user terminal powers; when the difference between the (i+1)-th system sum-rate value and the i-th result is greater than or equal to the error value ε, alternately iterating the distributed-Q-learning-based pilot allocation algorithm and the successive convex approximation (SCA) algorithm in turn;
Pilot allocation is performed with the distributed-Q-learning-based pilot allocation algorithm, power allocation is then performed with the successive convex approximation (SCA) algorithm on the basis of that pilot allocation, pilot allocation is performed again on the basis of the new power allocation, and so on alternately. Specifically, the error value ε is set, and while the difference between the (i+1)-th system sum-rate value and the i-th result is not less than ε, or i = 0, the following operations are executed in a loop: the distributed-Q-learning-based pilot allocation algorithm and $\mathbf{p}^{(i-1)}$ yield $a^{(i)}$; then the successive convex approximation (SCA) algorithm and $a^{(i)}$ yield $\mathbf{p}^{(i)}$, and $R^{(i)}$ is updated; until the difference between the (i+1)-th system sum-rate value and the i-th result is less than ε.
Step 430: when the difference between the (i+1)-th system sum-rate value and the i-th result is less than the error value ε, the iteration process ends and the algorithm terminates.
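The main algorithm of Fig. 5 then simply alternates the two sketches above; the following is a hedged outline under the same illustrative naming.

```python
import numpy as np

def joint_pilot_power_optimization(beta, p_max, sigma2=1.0, eps=1e-3, max_iter=20):
    """Sketch of the joint pilot-allocation / power-control main loop (Fig. 5)."""
    L, _, K = beta.shape
    power = np.full((L, K), float(p_max))    # step 410: equal power allocation
    rate_prev = 0.0                          # R(0) = 0
    pilots = None
    for _ in range(max_iter):                # step 420: alternate the two algorithms
        pilots = q_learning_pilot_allocation(beta, power, sigma2)
        power = sca_power_control(beta, pilots, p_max, sigma2)
        rate = system_sum_rate(beta, pilots, power, sigma2)
        if abs(rate - rate_prev) < eps:      # step 430: stop once the gain < eps
            break
        rate_prev = rate
    return pilots, power
```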
The beneficial effect that this application realized is as follows:
(1) by adopting the pilot frequency pollution inhibition method provided by the application, the uplink and the speed of a large-scale antenna system user are improved, and the problem of pilot frequency pollution is effectively inhibited;
(2) the method comprises the steps of modeling an optimization target and splitting the target to decompose a combined optimization problem which cannot directly obtain an optimal solution, and obtain an approximate optimal scheme of the optimization target;
(3) the pilot frequency distribution subproblem is solved by using a multi-agent distributed Q learning method, the optimization problem is mapped to the Q learning process, the algorithm complexity is greatly reduced, the mutual influence among different agents is considered, the return function target is determined by the cooperative action of each agent, and the optimization target of the pilot frequency distribution subproblem is obtained.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (3)

1. A method for pilot pollution suppression in a large-scale antenna system, comprising:
establishing an optimization problem model for an optimization target;
decomposing the optimization problem model into a pilot allocation submodel and a power control submodel;
cyclically iterating the solving algorithms of the pilot allocation submodel and the power control submodel to obtain an approximately optimal solution of the optimization problem model and determine the optimal pilot allocation and power control for suppressing the pilot pollution;
wherein the solving algorithm of the power control submodel adopts a successive convex approximation algorithm, and the convex approximation problem is obtained as:

$$\max_{\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\Bigl[a_{ik}\log\bigl(\gamma_{ik}(\mathbf{p})\bigr)+b_{ik}\Bigr]$$
$$\text{s.t.}\;\;0<p_{ik}\le P_{\max}$$

wherein p is the matrix formed by the uplink pilot transmit powers of all users; i = 1, …, L denotes the i-th cell in the large-scale antenna system and L is the total number of cells; k = 1, …, K denotes the k-th user terminal in a cell and K is the total number of user terminals in the cell; $\gamma_{ik}(\mathbf{p})$ is the uplink signal-to-interference-plus-noise ratio of the k-th user terminal in the i-th cell; $a_{ik}$ denotes $\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}$ and $b_{ik}$ denotes $\log(1+\gamma_{ik}^{0})-\frac{\gamma_{ik}^{0}}{1+\gamma_{ik}^{0}}\log(\gamma_{ik}^{0})$, where $\gamma_{ik}^{0}$ is the value of $\gamma_{ik}$ at the expansion point; $p_{ik}$ denotes the uplink pilot transmit power of the k-th user terminal in the i-th cell, and $P_{\max}$ denotes the maximum uplink transmit power of a user; $p_{jk'}$ denotes the uplink pilot transmit power of the k'-th user terminal in the j-th cell; $c_{ik}$ denotes the pilot sequence used by the k-th user terminal in the i-th cell, and $c_{jk'}$ denotes the pilot sequence used by the k'-th user terminal in the j-th cell; $\sigma^{2}$ is the variance of the Gaussian white noise;
wherein, in the solving process using the successive convex approximation algorithm, the lower bound of the original optimization target is maximally tightened by an iterative method, specifically comprising the following sub-steps:
obtaining a value of the optimization problem according to the initialized user power;
cyclically solving the optimization problem, and when the solution of the optimization problem meets the set condition, outputting the power allocation result $\mathbf{p}^{(t)}$ of the t-th iteration, i.e. the optimal solution of the optimization problem, and the value of the optimization problem corresponding to the optimal solution;
wherein cyclically solving the optimization problem to output the optimal solution of the optimization problem specifically comprises: after the initial value $\Phi[0]$ of the optimization problem is calculated, t is incremented by 1; if $|\Phi[t]-\Phi[t-1]|\ge\varepsilon$, t continues to be incremented, the optimization problem is solved again to obtain the t-th power allocation result $\mathbf{p}^{(t)}$, and the comparison of $|\Phi[t]-\Phi[t-1]|$ with ε continues, until $|\Phi[t]-\Phi[t-1]|<\varepsilon$, whereupon the calculated power allocation result is output;
the pilot frequency distribution sub-model solving algorithm adopts a pilot frequency distribution algorithm based on distributed Q learning, and is combined with the Q learning algorithm to model the pilot frequency distribution sub-problem, and the method specifically comprises the following steps:
virtual agent: taking L cell base stations in a large-scale antenna system as virtual agents;
the actions are as follows: each agent has an action set A, the ith agent action
Figure FDA0003264391930000024
Wherein
Figure FDA0003264391930000025
Is the pilot allocation for each user in the ith cell, the action of each agent is K! K is the total number of user terminals in the cell;
the state is as follows: the time division duplex multi-cell multi-user large-scale antenna system consisting of L hexagonal cell cells is used as an environment for interacting with intelligent agents, and each intelligent agent has a respective state vector which represents a user pilot frequency distribution state in each cell;
reward punishment signal: the method comprises the following steps that an intelligent body selection action acts on the environment, the environment influences the learning process of the intelligent body through reward and punishment signals, and a return function of the intelligent body is determined as a system and a speed in an ideal state after a large-scale antenna system base station selects a certain pilot frequency distribution scheme; the intelligent agents update respective Q value tables according to the return function, after the Q tables are updated, each intelligent agent needs to select actions by an epsilon greedy strategy, and randomly selects action vectors in an action space according to the probability epsilon or selects actions according to the probability of 1-epsilon;
the pilot frequency distribution sub-model solving algorithm adopts a pilot frequency distribution algorithm based on distributed Q learning, and specifically comprises the following sub-steps:
initializing a pilot frequency, power, an agent action set and a Q value table;
traversing each agent in turn:
if the random number generated for each agent is less than the probability epsilon, then an action is arbitrarily selected for the agent from the agent action set; if the random number generated for each agent is greater than or equal to the probability ε, then an action a is selected based on the Q-value tableiExecuting the motion vector aiTraversing each agent i, acquiring a return function according to the state and the last action, and updating a Q value table according to the return function;
the pilot pollution suppression main algorithm combining power control and pilot distribution is constructed according to the solving algorithm of the pilot distribution submodel and the power control submodel, and a suboptimal combined pilot distribution and power control solving method for the system and rate maximization problem is obtained, and the method specifically comprises the following substeps:
initializing the power of each user terminal to be equal power distribution;
calculating the values of a system and a rate according to the power of the user terminal, and when the difference value between the value of the system and the rate of the (i + 1) th time and the result of the ith time is greater than or equal to an error value epsilon, sequentially and alternately iterating a pilot frequency distribution algorithm and a continuous convex approximation algorithm based on distributed Q learning; when the difference value between the system sum rate value of the (i + 1) th time and the result of the ith time is smaller than the error value epsilon, ending the iteration process and ending the algorithm;
the method comprises the following steps of sequentially and alternately iterating a pilot frequency distribution algorithm and a continuous convex approximation algorithm based on distributed Q learning, and specifically comprises the following steps: and setting an error value epsilon, and circularly executing the operation when the difference value between the i +1 th time system sum speed value and the i-th time result is less than epsilon or i is equal to 0: pilot allocation algorithm and p based on distributed Q learning(i-1)Obtaining a(i)Then according to the successive convex approximation algorithm and a(i)Obtaining p(i)Update R(i)(ii) a Until the difference between the value of the system sum rate of the (i + 1) th time and the result of the (i) th time is larger than or equal to epsilon.
2. The method of claim 1, wherein the optimization problem model obtained by modeling the optimization objective is:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{iik}\bigr|^{2}}{\sum_{(j,k')\neq(i,k)}p_{jk'}\,\bigl|\mathbf{h}_{iik}^{H}\mathbf{h}_{ijk'}\bigr|^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)$$

wherein $\varphi$ is the pilot allocation pattern and $\Phi$ represents all $(K!)^{L}$ pilot allocation schemes, where s denotes the pilot set, comprising K orthogonal pilots in total; $\mathbf{p}=\{p_{ik}\}_{L\times K}$ is the matrix of L rows and K columns formed by the uplink pilot transmit powers of all users; $\mathbf{h}_{iik}$ represents the signal gain of the k-th user terminal in the i-th cell and $\mathbf{h}_{iik}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk'}$ represents the channel gain from the k'-th user terminal in the j-th cell to the base station of the i-th cell and $\mathbf{h}_{ijk'}^{H}$ is its conjugate transpose; $\mathbf{h}_{ijk}=\sqrt{\beta_{ijk}}\,\mathbf{g}_{ijk}\in\mathbb{C}^{M\times 1}$, wherein $\mathbb{C}$ represents the set of complex numbers and $\mathbb{C}^{M\times 1}$ represents a complex vector of dimension M×1.
3. The method as claimed in claim 2, wherein when the number of antennas in the large-scale antenna system increases, the small-scale fading factor is neglected according to the characteristics of the large-scale antenna system, and the optimization problem model of the optimization target is simplified to:

$$\max_{\varphi\in\Phi,\;\mathbf{p}}\;\sum_{i=1}^{L}\sum_{k=1}^{K}\log_{2}\!\left(1+\frac{p_{ik}\,\beta_{iik}^{2}}{\sum_{j\neq i}\sum_{k'=1}^{K}p_{jk'}\,\beta_{ijk'}^{2}\,\bigl|c_{ik}^{H}c_{jk'}\bigr|^{2}+\sigma^{2}}\right)$$
$$\text{s.t.}\;\;0<p_{ik}\le P_{\max};\;\;c_{ik}\in s$$

wherein $\beta_{ijk}$ is the large-scale fading factor, and s.t. denotes that the simplified optimization problem model is constrained by $0<p_{ik}\le P_{\max}$.
CN201910399212.4A 2019-05-14 2019-05-14 Pilot pollution suppression method in large-scale antenna system Active CN110086591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910399212.4A CN110086591B (en) 2019-05-14 2019-05-14 Pilot pollution suppression method in large-scale antenna system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910399212.4A CN110086591B (en) 2019-05-14 2019-05-14 Pilot pollution suppression method in large-scale antenna system

Publications (2)

Publication Number Publication Date
CN110086591A CN110086591A (en) 2019-08-02
CN110086591B (en) 2021-10-22

Family

ID=67420050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910399212.4A Active CN110086591B (en) 2019-05-14 2019-05-14 Pilot pollution suppression method in large-scale antenna system

Country Status (1)

Country Link
CN (1) CN110086591B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113193945B (en) * 2021-05-07 2022-08-02 中山大学 Pilot frequency and power distribution joint optimization method, system, medium and communication equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012105278A (en) * 2002-10-25 2012-05-31 Qualcomm Inc Multi-carrier transmission using multiple symbol lengths
CN104009947A (en) * 2014-06-19 2014-08-27 华中科技大学 Pilot signal sending and channel estimation method
CN104243121A (en) * 2014-09-11 2014-12-24 西安交通大学 Pilot frequency distribution method based on sectorization in Massive MIMO system
CN104270820A (en) * 2014-08-04 2015-01-07 西安交通大学 Combined vertical beam control and power allocation method in 3D large-scale MIMO system
CN104378813A (en) * 2008-01-30 2015-02-25 高通股份有限公司 Method and apparatus for mitigating pilot pollution in a wireless network
CN109151975A (en) * 2018-07-27 2019-01-04 北京工业大学 A kind of the joint dynamic pilot and data power distribution method of the extensive mimo system of time division duplex


Also Published As

Publication number Publication date
CN110086591A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
Xia et al. A deep learning framework for optimization of MISO downlink beamforming
Cheng et al. Optimal pilot and payload power control in single-cell massive MIMO systems
CN109005551B (en) Multi-user NOMA downlink power distribution method of non-ideal channel state information
CN104601209B (en) A kind of cooperative multi-point transmission method suitable for 3D mimo systems
CN105680920B (en) A kind of multi-user multi-antenna number energy integrated communication network throughput optimization method
CN106992805A (en) Multi-antenna transmission method, base station and user terminal
Zaher et al. Learning-based downlink power allocation in cell-free massive MIMO systems
CN102111352B (en) Method, device and system for information feedback in multi-point coordinated joint transmission network
CN107947841B (en) Multi-antenna user pair scheduling method for large-scale MIMO non-orthogonal multiple access system
Dong et al. Improved joint antenna selection and user scheduling for massive MIMO systems
CN111970033A (en) Large-scale MIMO multicast power distribution method based on energy efficiency and spectrum efficiency joint optimization
CN114337976A (en) Transmission method combining AP selection and pilot frequency allocation
CN109995496B (en) Pilot frequency distribution method of large-scale antenna system
CN110086591B (en) Pilot pollution suppression method in large-scale antenna system
CN102740325B (en) Method, device for acquiring channel information and method, device for optimizing beam forming
CN106851726A (en) A kind of cross-layer resource allocation method based on minimum speed limit constraint
Li et al. IRS-Based MEC for Delay-Constrained QoS Over RF-Powered 6G Mobile Wireless Networks
CN108282788A (en) A kind of resource allocation methods of the Energy Efficient based on quasi- newton interior point method
CN105611640B (en) A kind of adjustable CoMP downlink user dispatching method of equitable degree
CN108064070B (en) User access method for large-scale MIMO multi-cell network
CN102104879A (en) Multi-cell cooperative transmission method
CN114710187A (en) Power distribution method for multi-cell large-scale MIMO intelligent communication under dynamic user number change scene
CN111010697B (en) Multi-antenna system power optimization method based on wireless energy carrying technology
CN110445519B (en) Method and device for resisting inter-group interference based on signal-to-interference-and-noise ratio constraint
Chung et al. Semidynamic cell-clustering algorithm based on reinforcement learning in cooperative transmission system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant