CN116248156A - Deep learning-based large-scale MIMO channel state information feedback and reconstruction method - Google Patents


Info

Publication number
CN116248156A
Authority
CN
China
Prior art keywords
channel matrix
network
model
channel
state information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211395637.6A
Other languages
Chinese (zh)
Inventor
杨朔
李勇
刘丽哲
王斌
李行健
朱云飞
汪畅
夏金涛
赵显超
Current Assignee
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 54 Research Institute filed Critical CETC 54 Research Institute
Priority to CN202211395637.6A
Publication of CN116248156A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0619 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal using feedback from receiving side
    • H04B7/0621 Feedback content
    • H04B7/0626 Channel coefficients, e.g. channel state information [CSI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/0413 MIMO systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/0202 Channel estimation
    • H04L25/024 Channel estimation channel estimation algorithms
    • H04L25/0242 Channel estimation channel estimation algorithms using matrix methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00 Baseband systems
    • H04L25/02 Details; arrangements for supplying electrical power along data transmission lines
    • H04L25/0202 Channel estimation
    • H04L25/024 Channel estimation channel estimation algorithms
    • H04L25/0254 Channel estimation channel estimation algorithms using neural network algorithms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Radio Transmission System (AREA)

Abstract

The invention discloses a deep learning-based large-scale MIMO channel state information feedback and reconstruction method, belonging to the technical field of communication. It comprises the following steps: building a deep neural network model; generating the sample data sets required for training, validation and testing; training the built model on the training set; saving the trained model parameters; loading the stored model parameters and testing model performance on the test set; and using the model with the best test performance for channel state information feedback and reconstruction, in which a two-dimensional discrete Fourier transform is performed at the user terminal on the space-frequency-domain channel matrix of the MIMO channel state information to obtain a sparse channel matrix, the sparse channel matrix is input to the encoder network at the user terminal and transmitted to the base station end, and the decoder network at the base station end decodes it to obtain an estimate of the original channel matrix. The invention can greatly improve the feature extraction capability and CSI reconstruction performance of the model algorithm.

Description

Deep learning-based large-scale MIMO channel state information feedback and reconstruction method
Technical Field
The invention belongs to the technical field of communication, and relates to a channel state information feedback method for wireless communication systems, in particular to a deep learning-based channel state information feedback and reconstruction method for large-scale Multiple-Input Multiple-Output (MIMO) communication systems.
Background
Massive MIMO is considered one of the most promising core technologies of 5G because of its many advantages, such as improving system capacity, spectral efficiency and user experience rate, reducing latency and inter-user interference, enhancing full-dimensional coverage, and saving energy.
However, the development and application of massive MIMO also face a number of problems. For example, the performance of massive MIMO is closely related to the quality of the channel state information (Channel State Information, CSI) available at the transmitter. For the uplink, the user side only needs to send training pilots, from which the BS can easily and accurately estimate the CSI; acquiring downlink CSI, however, is difficult, and this remains a problem to be solved in current massive MIMO technology. In time division duplex (Time Division Duplexing, TDD) mode, the base station (BS) performs channel estimation from training pilots received on the uplink and then uses reciprocity to infer the downlink CSI. In frequency division duplex (Frequency Division Duplexing, FDD) mode, however, reciprocity is weak, which makes it difficult to infer the downlink CSI by observing the uplink CSI. In a conventional MIMO system, the downlink CSI of an FDD system is first acquired by the user side through downlink pilot estimation and then fed back to the BS, with feedback overhead reduced by codebook-based or vector quantization methods; this approach is not feasible in massive MIMO, because the large number of antennas at the BS greatly increases the dimension of the CSI matrix, and the feedback amount and codebook design complexity rise significantly. Meanwhile, since reciprocity does not hold between the uplink and downlink, the downlink CSI can only be fed back through the uplink transmission link, which occupies uplink resources and wastes considerable bandwidth. Therefore, how to feed back the downlink channel state information to the base station with high accuracy and low overhead has become a problem to be solved in the development of massive MIMO communication systems in FDD mode.
Since local scatterers are shared in the actual physical propagation environment of the channel, and the multi-antenna array of the massive MIMO system gives the channels a certain spatial correlation, the user channel matrices of the massive MIMO system exhibit joint sparsity, which inspired the proposal of CSI feedback schemes based on compressed sensing (Compressive Sensing, CS). First, the CSI matrix is expressed as a sparse matrix under a certain basis; then a codeword of lower dimension is obtained through compressed sampling; finally, the original CSI matrix is reconstructed from the codeword using compressed sensing theory and related optimization algorithms. However, real channels are not perfectly sparse, and CS-based methods do not account for channel delay differences, channel estimation errors, feedback link errors, or the computational complexity incurred when improving reconstruction accuracy in practical applications, so these algorithms cannot fully preserve the channel structure information and their channel reconstruction performance is poor.
With the rapid development of artificial intelligence technology, a deep learning network model algorithm CsiNet based on a compressed sensing theory is proposed and used for channel state information feedback and reconstruction. The model algorithm belongs to a self-encoder network, and comprises a pair of encoder and decoder networks. The encoder network encodes the channel state information at the user end, and the decoder decodes and reconstructs the feedback information at the receiving end. Experiments prove that the performance of the algorithm is obviously superior to that of the traditional algorithm. However, the CsiNet network has relatively weak capability of extracting the compressed information features, the channel state information reconstruction performance of the receiving end still has a relatively large improvement space, and the reconstruction performance of the CsiNet under the condition of low compression ratio is relatively poor, which is not preferable in a large-scale MIMO system with long coherence time.
Disclosure of Invention
In view of this, the present invention provides a method for feeding back and reconstructing state information of a large-scale MIMO channel based on deep learning, which can solve the problem of poor channel reconstruction performance of CSI feedback in an FDD system, and can still maintain good channel reconstruction performance under the condition of low compression rate.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a feedback and reconstruction method of large-scale MIMO channel state information based on deep learning comprises the following steps:
step 1: building a deep neural network model CBAM-CsiNet, wherein the model comprises an encoder network, a decoder network and CBAM attention modules; the encoder network is located at the user terminal and compresses and encodes the channel matrix H into a codeword s of lower dimension, and the decoder network is located at the base station end and reconstructs an estimate Ĥ of the original channel matrix from the codeword s transmitted over the feedback link;
Step 2: generating sample data sets required for training, validation and testing under indoor and outdoor conditions, respectively, using a COST 2100 model;
step 3: training the built CBAM-CsiNet neural network model on the training set so that Ĥ is as close as possible to H; the Adam gradient optimization algorithm is selected for gradient optimization, so that the loss function is minimized and finally becomes stable;
step 4: saving the model parameters trained in step 3, including the weights, biases, strides and convolution kernel sizes;
step 5: loading the stored model parameters, and performing model performance test on the test set;
step 6: using the model with the best test performance for channel state information feedback and reconstruction: at the user terminal, a two-dimensional discrete Fourier transform is performed on the space-frequency-domain channel matrix H̃ of the MIMO channel state information to obtain a channel matrix H that is sparse in the angular delay domain; H is input to the encoder network deployed at the user terminal, compression-encoded, and transmitted to the base station end through the feedback link, and the decoder network at the base station end decodes it to obtain the original channel matrix estimate Ĥ.
Further, in step 1, the encoder network comprises a convolution layer and a fully connected layer, and the parameters of each layer of the network are randomly initialized. The input of the encoder network is the channel matrix H, which is sparse in the angular delay domain, and the output is the codeword s obtained after compression encoding, a one-dimensional vector of lower dimension than H;
the code word s is transmitted to a base station end through a feedback link, and decoding operation is carried out through a decoder network of the base station end, so that an original channel matrix estimated value is obtained;
the decoder network comprises a fully connected layer, two RefineNet+ units and a convolution layer; the parameters of each layer of the decoder network are randomly initialized; the codeword s is input into the decoder network, which outputs a reconstructed channel matrix estimate with the same dimensions as the channel matrix H;
the CBAM attention module comprises a channel attention module and a spatial attention module; given an intermediate feature map, the CBAM attention module sequentially infers attention maps along the two independent dimensions of channel and space, and then multiplies the attention maps into the input feature map for adaptive feature refinement; the channel attention module generates a channel attention feature map using the feature relations among channels, and is used to find the key input features the network needs to focus on; the spatial attention module generates a spatial attention feature map from the spatial relations of the feature space, and is used to locate the important information features to be extracted; the channel attention module and the spatial attention module are arranged in series to form the CBAM attention module;
the last convolution layer of the decoder network uses a Sigmoid activation function; all other convolution layers of the encoder and decoder networks adopt ReLU activation functions with batch normalization; and the fully connected layers adopt linear activation functions.
Further, the specific manner of step 2 is as follows:
Using the COST 2100 model, 150000 space-frequency-domain channel matrix samples are generated in a 5.3 GHz indoor picocell scenario and a 300 MHz outdoor rural scenario, and the matrix sample data is divided into a training set, a validation set and a test set. The training set comprises 100000 samples used to train the neural network model, the validation set comprises 30000 samples used to monitor model convergence during training, and the test set comprises 20000 samples used to evaluate the training effect of the neural network model.
Further, step 3 adopts the Adam gradient optimization algorithm and an end-to-end learning mode to jointly train the parameters of the encoder network and the decoder network so as to minimize the loss function. The loss function is the mean square error between the original channel matrix estimate Ĥ output by the decoder network and the real channel matrix H; the loss function L is specifically expressed as:

L = (1/T) · Σ_{i=1}^{T} ‖Ĥ_i − H_i‖₂²

where T is the total number of training-set samples, ‖·‖₂ is the Euclidean norm, and the subscript i indexes the estimated and original channel matrices.
Further, the RefineNet+ unit consists of 1 input layer, 3 convolution layers and 1 CBAM attention module; the data of the input layer, the output of the third convolution layer and the output of the CBAM attention module are added through a skip connection structure to form the output of the RefineNet+ unit.
The invention has the beneficial effects that:
(1) Compared with traditional compressed-sensing-based algorithms, the method adopts an algorithm based on the CBAM-CsiNet neural network; it does not depend on perfectly sparse channel characteristics, structurally replaces the random measurement method with an encoder network and the traditional iterative reconstruction algorithm with a decoder network, and greatly improves channel state information feedback and reconstruction performance and accuracy.
(2) Compared with existing deep-learning-based algorithms, the CBAM-CsiNet-based algorithm introduces the CBAM attention module, so its feature extraction capability is stronger, more key information is retained during decoding, information loss is smaller, performance is significantly improved, and good reconstruction performance and accuracy can be maintained even at low compression rates.
Drawings
FIG. 1 is a schematic diagram of the structure of a CBAM-CsiNet neural network model in an embodiment of the invention.
Fig. 2 is a schematic diagram of the structure of a RefineNet+ unit according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the structure of a CBAM attention module in an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a channel attention module in an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention.
Note that the numbers above the modules in fig. 1 and 2 indicate the number of feature maps generated.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
In the deep learning-based large-scale MIMO channel state information feedback and reconstruction method, a neural network model framework is built on the basis of deep learning theory. The framework comprises a pair of encoder and decoder: the encoder is deployed at the mobile user terminal and the decoder at the base station end. The neural network model is trained on channel state information data so that the encoder can compress and encode the channel state information into a low-dimensional codeword, the codeword is transmitted to the base station end through a feedback link, and the decoder at the base station end reconstructs the channel state information.
Specifically, the method comprises the following steps:
step 1: building a deep neural network model CBAM-CsiNet, wherein the model comprises an encoder network, a decoder network and CBAM attention modules; the encoder network is located at the user terminal and compresses and encodes the channel matrix H into a codeword s of lower dimension, and the decoder network is located at the base station end and reconstructs an estimate Ĥ of the original channel matrix from the codeword s transmitted over the feedback link;
Step 2: generating sample data sets required for training, validation and testing under indoor and outdoor conditions, respectively, using a COST 2100 model;
step 3: training the built CBAM-CsiNet neural network model on the training set so that Ĥ is as close as possible to H; the Adam gradient optimization algorithm is selected for gradient optimization, so that the loss function is minimized and finally becomes stable;
step 4: saving the model parameters trained in step 3, including the weights, biases, strides and convolution kernel sizes;
step 5: loading the stored model parameters, and performing model performance test on the test set;
step 6: the model with the best performance-test data index is used for channel state information feedback and reconstruction: at the user terminal, a two-dimensional discrete Fourier transform is performed on the space-frequency-domain channel matrix H̃ of the MIMO channel state information to obtain a channel matrix H that is sparse in the angular delay domain; H is input to the encoder network deployed at the user terminal, compression-encoded, and transmitted to the base station end through the feedback link, and the decoder network at the base station end decodes it to obtain the original channel matrix estimate Ĥ.
Here, the Normalized Mean Square Error (NMSE) can be adopted as the data index for the performance test.
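As a rough illustration, the NMSE index mentioned above can be computed as follows. This is a minimal numpy sketch; the function name and the sample-first array layout are assumptions for illustration, not part of the patent.

```python
import numpy as np

def nmse(h_true, h_est):
    """Normalized mean square error averaged over samples:
    mean over i of ||H_i - Hhat_i||^2 / ||H_i||^2.
    Inputs have shape (num_samples, ...)."""
    axes = tuple(range(1, h_true.ndim))
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=axes)
    power = np.sum(np.abs(h_true) ** 2, axis=axes)
    return float(np.mean(err / power))

# Perfect reconstruction gives NMSE = 0; a half-scaled estimate gives 0.25
h = np.ones((5, 32, 32))
print(nmse(h, h))        # 0.0
print(nmse(h, 0.5 * h))  # 0.25
```

In practice the NMSE is usually reported in dB via 10·log10(NMSE).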
Further, in step 1, the encoder network comprises a convolution layer and a fully connected layer, and the parameters of each layer of the network are randomly initialized. The input of the encoder network is the channel matrix H, which is sparse in the angular delay domain, and the output is the codeword s obtained after compression encoding, a one-dimensional vector of lower dimension than H;
the code word s is transmitted to a base station end through a feedback link, and decoding operation is carried out through a decoder network of the base station end, so that an original channel matrix estimated value is obtained;
the decoder network comprises a fully connected layer, two RefineNet+ units and a convolution layer; the parameters of each layer of the decoder network are randomly initialized; the codeword s is input into the decoder network, which outputs a reconstructed channel matrix estimate with the same dimensions as the channel matrix H;
the CBAM attention module comprises a channel attention module and a spatial attention module; given an intermediate feature map, the CBAM attention module sequentially infers attention maps along the two independent dimensions of channel and space, and then multiplies the attention maps into the input feature map for adaptive feature refinement; the channel attention module generates a channel attention feature map using the feature relations among channels, and is used to find the key input features the network needs to focus on; the spatial attention module generates a spatial attention feature map from the spatial relations of the feature space, and is used to locate the important information features to be extracted; the channel attention module and the spatial attention module are arranged in series to form the CBAM attention module;
the last convolution layer of the decoder network uses a Sigmoid activation function; all other convolution layers of the encoder and decoder networks adopt ReLU activation functions with batch normalization; and the fully connected layers adopt linear activation functions.
Further, the specific manner of step 2 is as follows:
Using the COST 2100 model, 150000 space-frequency-domain channel matrix samples are generated in a 5.3 GHz indoor picocell scenario and a 300 MHz outdoor rural scenario, and the matrix sample data is divided into a training set, a validation set and a test set. The training set comprises 100000 samples used to train the neural network model, the validation set comprises 30000 samples used to monitor model convergence during training, and the test set comprises 20000 samples used to evaluate the training effect of the neural network model.
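The 100000/30000/20000 partition described above can be sketched as follows; for brevity the demonstration uses a proportionally scaled-down stand-in set, and the shuffling and seed are illustrative assumptions not taken from the patent.

```python
import numpy as np

def split_dataset(samples, n_train, n_val, n_test, seed=0):
    """Shuffle channel-matrix samples and partition them into
    training, validation and test sets."""
    assert len(samples) == n_train + n_val + n_test
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    train = samples[idx[:n_train]]
    val = samples[idx[n_train:n_train + n_val]]
    test = samples[idx[n_train + n_val:]]
    return train, val, test

# Stand-in set of 1500 tiny matrices, split 1000/300/200 (100:30:20 ratio)
demo = np.zeros((1500, 4, 4))
tr, va, te = split_dataset(demo, n_train=1000, n_val=300, n_test=200)
print(tr.shape[0], va.shape[0], te.shape[0])  # 1000 300 200
```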
Further, step 3 adopts the Adam gradient optimization algorithm and an end-to-end learning mode to jointly train the parameters of the encoder network and the decoder network so as to minimize the loss function. The loss function is the mean square error between the original channel matrix estimate Ĥ output by the decoder network and the real channel matrix H; the loss function L is specifically expressed as:

L = (1/T) · Σ_{i=1}^{T} ‖Ĥ_i − H_i‖₂²

where T is the total number of training-set samples, ‖·‖₂ is the Euclidean norm, and the subscript i indexes the estimated and original channel matrices.
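The loss L above maps directly onto a few lines of numpy. A minimal sketch; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def mse_loss(h_true, h_est):
    """Training loss from the text: L = (1/T) * sum_i ||Hhat_i - H_i||_2^2,
    where T is the number of training samples (first array axis)."""
    T = h_true.shape[0]
    return float(np.sum(np.abs(h_est - h_true) ** 2) / T)

# Toy batch: each 32x32 sample with constant error 0.1 contributes
# 0.01 * 1024 = 10.24 to the per-sample squared error
H = np.zeros((4, 32, 32))
H_hat = 0.1 * np.ones((4, 32, 32))
print(mse_loss(H, H_hat))  # approximately 10.24
```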
Further, the RefineNet+ unit consists of 1 input layer, 3 convolution layers and 1 CBAM attention module; the data of the input layer, the output of the third convolution layer and the output of the CBAM attention module are added through a skip connection structure to form the output of the RefineNet+ unit.
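The three-way addition performed by the RefineNet+ skip connection can be sketched as below. Only the combination rule is illustrated: the convolution and attention outputs are replaced by constant placeholders, which are assumptions for the demonstration.

```python
import numpy as np

def refinenet_plus_output(x_in, conv3_out, cbam_out):
    """Skip-connection combination of the RefineNet+ unit: the input-layer
    data, the third convolution layer's output and the CBAM attention
    module's output are summed element-wise."""
    return x_in + conv3_out + cbam_out

x = np.ones((2, 32, 32))            # input-layer maps (real & imaginary parts)
conv = 0.5 * np.ones((2, 32, 32))   # placeholder third-convolution output
cbam = 0.25 * np.ones((2, 32, 32))  # placeholder CBAM-refined output
y = refinenet_plus_output(x, conv, cbam)
print(y.shape, float(y[0, 0, 0]))  # (2, 32, 32) 1.75
```

The additive path lets gradients flow past the convolution stack, which is the information-loss reduction the text attributes to the skip connection.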
The method can reduce the computation and feedback overhead at the transmitting and receiving ends and improve the channel reconstruction performance and accuracy of the algorithm.
The following is a more specific example:
Consider a massive MIMO communication system in which the number of transmit antennas at the downlink base station end is N_t = 32, the user receiver uses a single antenna, the number of subcarriers is N_c, and the massive MIMO system adopts OFDM signal modulation.
Based on the above conditions, the COST 2100 model was used to generate the sample data sets required for training, validation and testing under indoor and outdoor conditions, respectively. Specifically, 150000 space-frequency-domain channel matrix samples are generated in a 5.3 GHz indoor picocell scenario and a 300 MHz outdoor rural scenario, and the matrix sample data is divided into a training set, a validation set and a test set. The training set comprises 100000 samples used to train the neural network model, the validation set comprises 30000 samples used to monitor model convergence during training, and the test set comprises 20000 samples used to evaluate the training effect of the neural network model. Using an N_c × N_c discrete Fourier transform matrix F_d and an N_t × N_t (32 × 32) discrete Fourier transform matrix F_a, each space-frequency-domain channel matrix H̃ in the samples undergoes a two-dimensional discrete Fourier transform to obtain a channel matrix sparse in the angular delay domain, namely H = F_d H̃ F_aᴴ. Because the multipath arrival delays are confined to a limited time range, only the leading rows of H in the delay domain contain significant values, so the first w = 32 rows of elements are retained, truncating H to a 32 × 32 channel matrix.
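The two-dimensional DFT sparsification and row truncation just described can be sketched with numpy as follows. The subcarrier count N_c = 256 and the random toy channel are illustrative assumptions; unitary DFT matrices are used so the transform is exactly invertible.

```python
import numpy as np

Nc, Nt, w = 256, 32, 32  # Nc chosen arbitrarily for illustration

# Unitary DFT matrices F_d (Nc x Nc) and F_a (Nt x Nt)
F_d = np.fft.fft(np.eye(Nc)) / np.sqrt(Nc)
F_a = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)

# Toy space-frequency channel matrix H_tilde (Nc x Nt)
rng = np.random.default_rng(0)
H_tilde = rng.standard_normal((Nc, Nt)) + 1j * rng.standard_normal((Nc, Nt))

# Angular-delay-domain transform H = F_d @ H_tilde @ F_a^H, then keep the
# first w = 32 rows (multipath delays occupy only the leading delay bins)
H = F_d @ H_tilde @ F_a.conj().T
H_trunc = H[:w, :]
print(H_trunc.shape)  # (32, 32)
```

Because F_d and F_a are unitary, the untruncated transform loses no information: F_dᴴ H F_a recovers H̃ exactly.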
The encoder network in the CBAM-CsiNet architecture is shown in FIG. 1. The real and imaginary parts of the complex-domain channel matrix H are split into two feature maps of size 32 × 32, i.e., two real matrices of size 32 × 32. The two matrices are flattened and recombined into a 2048 × 1 vector, which is input into the second layer of the encoder, namely a fully connected layer containing M neurons; a linear activation function is adopted to output an M × 1 vector s, namely the compressed and encoded codeword to be transmitted from the user terminal to the base station end.
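A minimal numpy sketch of the encoder data flow just described. The weights are random stand-ins (in the method they are learned), M = 512 is an assumed codeword length, and the convolution layer that precedes the flattening is omitted for brevity.

```python
import numpy as np

M = 512  # assumed codeword length; compression ratio M/2048 = 1/4
rng = np.random.default_rng(0)

# Toy angular-delay channel matrix: real and imaginary parts form two
# 32x32 feature maps
H = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
feature = np.stack([H.real, H.imag])  # shape (2, 32, 32)

# Flatten into a 2048x1 vector, then apply a linear fully connected layer
# with M neurons to produce the codeword s
x = feature.reshape(-1)                           # (2048,)
W = rng.standard_normal((M, 2048)) / np.sqrt(2048)  # stand-in learned weights
b = np.zeros(M)
s = W @ x + b                                     # linear activation
print(x.shape, s.shape)  # (2048,) (512,)
```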
The decoder network in the CBAM-CsiNet architecture is shown in FIG. 1. The decoder network comprises a fully connected layer, two RefineNet+ units and a convolution layer; each RefineNet+ unit comprises an input layer, three convolution layers and a CBAM attention module, and adds the input data of the input layer to the output of the last convolution layer and the output of the CBAM attention module through a skip connection structure, as shown in FIG. 2. The first layer of the decoder is a fully connected layer containing 2048 neurons, which takes the received codeword s as input and outputs a 2048 × 1 vector through a linear activation function. This vector is input to the second layer of the decoder network, namely a RefineNet+ unit, whose first layer is the input layer; the 2048 × 1 vector from the previous layer is recombined into two real matrices of size 32 × 32, which serve as the initialization values of the real and imaginary parts of the estimated channel matrix. The second, third and fourth layers of the RefineNet+ unit are all convolution layers, using 8, 16 and 2 convolution kernels of size 3 × 3, respectively, with appropriate zero padding, ReLU activation functions and batch normalization (Batch Normalization), so that the feature map after each convolution keeps the size 32 × 32, consistent with the size of the original channel matrix H. In addition, the data of the input layer is added to the output of the third convolution layer and the output of the CBAM attention module through a skip connection structure to form the output of the whole RefineNet+ unit. The skip connection structure reduces the loss of information during back propagation through the convolution layers of the network.
The output of the first RefineNet+ unit serves as the input of the second RefineNet+ unit, whose structure is identical to the first. Its output feature map is input to the last convolution layer of the decoder, which uses a Sigmoid activation function to limit the output values to the interval [0, 1]; the output of the decoder network is therefore two real matrices of size 32 × 32, serving as the real and imaginary parts of the final reconstructed channel matrix Ĥ.
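The decoder's fully connected expansion, reshaping and final Sigmoid bounding can be sketched as follows. The weights are random stand-ins, the two RefineNet+ units between these stages are omitted, and the final convolution is replaced by an identity map; all of these are simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

M = 512
rng = np.random.default_rng(1)
s = rng.standard_normal(M)  # received codeword

# First decoder layer: linear fully connected layer with 2048 neurons,
# reshaped into two 32x32 maps (initial real/imaginary estimates)
W_fc = rng.standard_normal((2048, M)) / np.sqrt(M)
est = (W_fc @ s).reshape(2, 32, 32)

# The final Sigmoid activation bounds the reconstructed values to [0, 1]
out = sigmoid(est)
print(out.shape, bool(out.min() >= 0.0 and out.max() <= 1.0))
```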
The CBAM attention module is an efficient feed-forward convolutional neural network comprising two parts, a channel attention module and a spatial attention module, as shown in fig. 3. Given an intermediate feature map, the module sequentially infers attention maps along two independent dimensions, channel and spatial, and then multiplies the attention maps with the input feature map for adaptive feature refinement. Given an intermediate feature map F ∈ ℝ^(C×H×W) as input, the CBAM attention module infers a one-dimensional channel attention map M_c ∈ ℝ^(C×1×1) and a two-dimensional spatial attention map M_s ∈ ℝ^(1×H×W). The entire attention process can be summarized as:

F′ = M_c(F) ⊗ F

F″ = M_s(F′) ⊗ F′

where ⊗ denotes element-wise multiplication; during the multiplication, the channel attention values are broadcast (copied) along the spatial dimensions, and vice versa. F″ is the refined result finally output by the CBAM attention module.
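The two-step attention process and its broadcasting behaviour can be illustrated with a minimal NumPy sketch; the feature map size and the random stand-in attention maps are assumptions for illustration only (real attention maps come from the modules described below).

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 8, 32, 32
F = rng.standard_normal((C, H, W))        # intermediate feature map

# Random values in (0,1) stand in for the sigmoid-valued attention maps.
M_c = 1.0 / (1.0 + np.exp(-rng.standard_normal((C, 1, 1))))
M_s = 1.0 / (1.0 + np.exp(-rng.standard_normal((1, H, W))))

F1 = M_c * F      # F' = M_c(F) (*) F : channel values broadcast over space
F2 = M_s * F1     # F'' = M_s(F') (*) F' : spatial values broadcast over channels

assert F1.shape == (C, H, W) and F2.shape == (C, H, W)
```

NumPy's broadcasting performs exactly the "copy along the other dimension" step the text describes: a (C,1,1) map scales every spatial position of a channel, and a (1,H,W) map scales every channel at a spatial position.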
The channel attention module is shown in fig. 4; it uses the inter-channel relationships of the features to generate a channel attention map. Since each channel of a feature map acts as a feature detector, channel attention is mainly used to find the key input features the network needs to focus on. To compute channel attention efficiently, the spatial dimension of the input feature map must be compressed. The channel attention module uses two pooling operations, mean pooling and maximum pooling: mean pooling effectively aggregates spatial information, while maximum pooling gathers complementary cues about distinctive features, yielding more accurate channel attention. Compared with using a single pooling operation, combining the two pooling operations greatly improves the representation capability of the network.
The specific operation process of the channel attention module is as follows:
First, the spatial information of the feature map is aggregated by the mean-pooling and maximum-pooling operations to generate two different spatial context descriptors F_avg^c and F_max^c, where F_avg^c denotes the mean-pooled feature and F_max^c denotes the max-pooled feature.

The two descriptors are then fed into a shared network to generate the channel attention map M_c ∈ ℝ^(C×1×1). The shared network is a multi-layer perceptron (MLP) with one hidden layer. After the two descriptors have each passed through the shared network, the output feature vectors are merged by element-wise summation.
The calculation of channel attention can be expressed as:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)))

where σ denotes the Sigmoid function, W_0 and W_1 are the MLP weights, and W_0 and W_1 are shared for both inputs.
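A minimal NumPy sketch of this channel attention computation follows; the channel count, the reduction ratio r of the hidden layer, and the ReLU hidden activation are assumptions taken from the common CBAM formulation, not values stated in this document.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), MLP weights shared."""
    C = F.shape[0]
    f_avg = F.mean(axis=(1, 2))            # F_avg^c : mean-pooled descriptor
    f_max = F.max(axis=(1, 2))             # F_max^c : max-pooled descriptor
    mlp = lambda d: W1 @ np.maximum(W0 @ d, 0.0)   # ReLU hidden layer (assumption)
    return sigmoid(mlp(f_avg) + mlp(f_max)).reshape(C, 1, 1)

rng = np.random.default_rng(2)
C, r = 8, 4                                 # reduction ratio r is an assumption
F = rng.standard_normal((C, 32, 32))
W0 = rng.standard_normal((C // r, C)) * 0.1  # hidden layer weights, C -> C/r
W1 = rng.standard_normal((C, C // r)) * 0.1  # output layer weights, C/r -> C
Mc = channel_attention(F, W0, W1)
assert Mc.shape == (C, 1, 1)
```

Note how the same W0 and W1 are applied to both pooled descriptors before the element-wise summation, as in the formula above.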
The spatial attention module, shown in fig. 5, generates a spatial attention map mainly through the spatial relationships of the features. Unlike channel attention, spatial attention focuses on where the important information features are, and is complementary to channel attention. To compute spatial attention, the mean-pooling and maximum-pooling operations are first applied along the channel axis and their outputs are concatenated to generate an efficient feature descriptor; a convolution layer is then applied to generate the spatial attention map M_s(F) ∈ ℝ^(1×H×W), which encodes where to emphasize or suppress.
The specific implementation steps of the spatial attention module are as follows:
First, two two-dimensional maps are generated by aggregating the channel information of the feature map with the two pooling operations: F_avg^s ∈ ℝ^(1×H×W) and F_max^s ∈ ℝ^(1×H×W), which represent the mean-pooled feature and the max-pooled feature along the channel direction, respectively.
Then, the two features are concatenated and passed through a convolution layer to generate the two-dimensional spatial attention map. The spatial attention is calculated as:

M_s(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)])) = σ(f^(7×7)([F_avg^s; F_max^s]))

where σ denotes the Sigmoid activation function and f^(7×7) denotes a convolution operation with a 7×7 kernel.
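The spatial attention computation can be sketched in NumPy as follows; the naive 'same'-padded convolution loop and the random kernel are illustrative stand-ins for a trained 7×7 convolution layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F, kernel):
    """M_s(F) = sigmoid(f^{7x7}([F_avg^s ; F_max^s])), kernel shape (2, 7, 7)."""
    f_avg = F.mean(axis=0)                 # F_avg^s : mean over channels
    f_max = F.max(axis=0)                  # F_max^s : max over channels
    stacked = np.stack([f_avg, f_max])     # concatenation along channel axis
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = f_avg.shape
    out = np.empty((H, W))
    for i in range(H):                     # naive 'same'-padded sliding window
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None, :, :]        # shape (1, H, W)

rng = np.random.default_rng(3)
F = rng.standard_normal((8, 32, 32))
kernel = rng.standard_normal((2, 7, 7)) * 0.05
Ms = spatial_attention(F, kernel)
assert Ms.shape == (1, 32, 32)
```

The sliding window is written as cross-correlation, which is what deep learning frameworks compute under the name "convolution".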
The channel attention module and the space attention module form a CBAM attention module in a serial arrangement mode. The introduction of the CBAM attention module improves the feature extraction capability of the decoder network, so that the decoder network can keep as many detail features as possible when decoding operation is carried out according to the compressed code word, the information loss is reduced, the decoding capability of a base station terminal is improved, and the network channel state information reconstruction performance is improved.
The loss function of the CBAM-CsiNet neural network model is the mean square error between the channel matrix Ĥ output by the decoder network and the real channel matrix H, formulated as:

L = (1/T) · Σ_{i=1}^{T} ‖Ĥ_i − H_i‖₂²

where T is the number of samples in the training set and ‖·‖₂ is the Frobenius norm.
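A direct NumPy transcription of this loss, under the assumption that the samples are stored as a (T, 2, 32, 32) stack of real/imaginary parts:

```python
import numpy as np

def mse_loss(H_hat, H):
    """L = (1/T) * sum_i ||H_hat_i - H_i||_2^2 over the T training samples."""
    diff = H_hat - H                        # shape (T, 2, 32, 32)
    per_sample = np.sum(diff ** 2, axis=(1, 2, 3))   # squared Frobenius norm
    return np.mean(per_sample)              # average over the T samples

rng = np.random.default_rng(4)
H = rng.random((5, 2, 32, 32))
assert mse_loss(H, H) == 0.0                # perfect reconstruction -> zero loss
```

Adding a constant 0.1 to every entry gives a loss of 0.01 per entry, i.e. 0.01 × 2 × 32 × 32 = 20.48, which is a quick sanity check on the normalization (per sample, not per element).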
The built neural network model is trained with the 100000 training-set channel matrix H samples generated in step 1. The Adam optimizer is selected for gradient optimization, and the parameters of the encoder network and decoder network, mainly including weights, biases, step sizes and convolution kernel sizes, are trained jointly in an end-to-end learning mode so that the loss function is minimized and tends to be stable. The learning rate used in the Adam algorithm is 0.001; each iteration computes the gradient on 512 samples from the training set and updates the parameters according to the Adam update formulas, and the whole training set is traversed 1000 times. After training, the relevant model weight parameters are saved, and the performance of the network model can then be tested with the test set.
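The Adam parameter update applied in each iteration can be sketched as follows on a toy objective; the learning rate 0.001 is as stated above, while the moment-decay and epsilon values are the algorithm's common defaults, which this document does not state explicitly.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update (lr = 0.001 as stated; b1, b2, eps are common defaults)."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy quadratic objective ||theta||^2 standing in for the network loss.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta                        # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)

assert np.linalg.norm(theta) < np.linalg.norm(np.array([1.0, -2.0]))
```

In the actual training loop, `grad` would be the mini-batch gradient of the loss L over 512 samples, and the loop would run for 1000 passes over the training set.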
The CBAM-CsiNet network model that performs well in testing can be used for channel state information feedback in the MIMO system. The space-frequency-domain channel state information is converted into the angular-delay-domain channel matrix H, which is input into the CBAM-CsiNet architecture; the output is the model-reconstructed channel matrix Ĥ. A two-dimensional inverse discrete Fourier transform of this matrix recovers the original space-frequency-domain channel state information.
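The domain transforms at the two ends can be illustrated with NumPy's FFT routines; using plain `fft2`/`ifft2` (rather than explicit per-dimension DFT matrices) is a simplification for illustration, and the matrix size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(5)

# Space-frequency-domain channel matrix (complex-valued), size is illustrative.
H_sf = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))

# User end: 2-D DFT maps the CSI to the (sparser) angular-delay domain.
H_ad = np.fft.fft2(H_sf)

# Base station end: 2-D inverse DFT recovers the space-frequency CSI.
H_rec = np.fft.ifft2(H_ad)

assert np.allclose(H_rec, H_sf)            # the transform pair is lossless
```

The round trip is exact, so any reconstruction error comes from the compression/decompression by the encoder and decoder networks, not from the domain transforms.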
Tests show that the performance of the channel state information feedback and reconstruction method based on the CBAM-CsiNet neural network model in a massive MIMO communication system of this embodiment is clearly superior to both traditional algorithms and existing deep learning algorithms, as shown in Table 1. NMSE denotes the normalized mean square error, computed as the expected value of the squared difference between the reconstructed channel matrix Ĥ and the original channel matrix H; ρ denotes the cosine similarity between the reconstructed channel matrix Ĥ and the original channel matrix H.
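The two metrics can be computed as in the following sketch; treating ρ as a whole-matrix cosine similarity (rather than, e.g., a per-subcarrier average) is an assumption, since the document does not give the exact formula.

```python
import numpy as np

def nmse_db(H, H_hat):
    """Normalized MSE, E{||H - H_hat||^2 / ||H||^2}, reported in dB."""
    num = np.sum(np.abs(H - H_hat) ** 2, axis=(1, 2))
    den = np.sum(np.abs(H) ** 2, axis=(1, 2))
    return 10 * np.log10(np.mean(num / den))

def rho(H, H_hat):
    """Average cosine similarity between reconstructed and original matrices."""
    num = np.abs(np.sum(np.conj(H) * H_hat, axis=(1, 2)))
    den = np.linalg.norm(H, axis=(1, 2)) * np.linalg.norm(H_hat, axis=(1, 2))
    return np.mean(num / den)

rng = np.random.default_rng(6)
H = rng.standard_normal((4, 32, 32)) + 1j * rng.standard_normal((4, 32, 32))
noise = rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
H_hat = H + 0.01 * noise                   # a near-perfect reconstruction

assert rho(H, H_hat) > 0.99               # near-perfect -> rho close to 1
assert nmse_db(H, H_hat) < -20            # and strongly negative NMSE in dB
```

Lower (more negative) NMSE and ρ closer to 1 both indicate better reconstruction, which is how entries in comparison tables such as Table 1 are typically read.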
Table 1 comparison of the algorithm of this example with other algorithm performance test results
In summary, the invention addresses the shortcomings of traditional channel state information (Channel State Information, CSI) feedback and reconstruction algorithms, which depend too heavily on perfect channel sparsity, incur high feedback overhead and have weak CSI reconstruction capability, as well as the relatively weak feature extraction capability of existing deep-learning-based CSI feedback and reconstruction methods, whose feedback and reconstruction performance is especially poor at low compression ratios. By using channel matrix data as the driver, the proposed method greatly reduces the algorithm's dependence on perfect channel sparsity. Meanwhile, the model adopts offline training and online testing, shifting the computational load and cost to the offline training process and thereby greatly reducing the feedback overhead in practical applications of the algorithm. Theoretical analysis and simulation results show that the method has clear advantages over existing algorithms.

Claims (5)

1. A method for feeding back and reconstructing large-scale MIMO channel state information based on deep learning is characterized by comprising the following steps:
step 1: building a deep neural network model CBAM-CsiNet, wherein the model comprises an encoder network, a decoder network and a CBAM attention module; the encoder network is located at the user end and is used for compressing and encoding a channel matrix H into a lower-dimensional codeword s, and the decoder network is located at the base station end and is used for reconstructing an estimate Ĥ of the original channel matrix from the codeword s transmitted over the feedback link;
Step 2: generating sample data sets required for training, validation and testing under indoor and outdoor conditions, respectively, using a COST 2100 model;
step 3: training the built CBAM-CsiNet neural network model on the training set so that Ĥ is as close as possible to H, and selecting the Adam gradient optimization algorithm for gradient optimization so that the loss function is minimized and finally tends to be stable;
step 4: saving the model parameters trained in the step 3, including weight, bias, step length and convolution kernel size;
step 5: loading the stored model parameters, and performing model performance test on the test set;
step 6: using the best-performing model for channel state information feedback and reconstruction: at the user end, the space-frequency-domain channel matrix of the MIMO channel state information is subjected to a two-dimensional discrete Fourier transform to obtain the channel matrix H, which is sparse in the angular delay domain; H is input into the encoder network deployed at the user end for compression encoding and is transmitted to the base station end through the feedback link, and the decoder network at the base station end decodes it to obtain the original channel matrix estimate Ĥ.
2. The method for feeding back and reconstructing large-scale MIMO channel state information based on deep learning according to claim 1, wherein in step 1, the encoder network comprises a convolution layer and a full connection layer, parameters of each layer of the network are randomly initialized, the input of the encoder network is a channel matrix H with sparse angular delay domain, the output is a codeword s obtained after compression encoding, and the codeword is a one-dimensional vector with lower dimension than H;
the code word s is transmitted to a base station end through a feedback link, and decoding operation is carried out through a decoder network of the base station end, so that an original channel matrix estimated value is obtained;
the decoder network comprises a fully connected layer, two RefineNet+ units and a convolution layer; the parameters of each layer of the decoder network are randomly initialized, the codeword s is input into the decoder network, and a reconstructed channel matrix estimate with the same dimension as the channel matrix H is output;
the CBAM attention module comprises a channel attention module and a space attention module, an intermediate feature map is given, the CBAM attention module sequentially deduces an attention map according to two independent dimensions of a channel and a space, and then the attention map is multiplied into the input feature map to carry out self-adaptive feature refinement; the channel attention module generates a channel attention feature map by utilizing the feature relation among channels, and is used for searching key input features which need to be focused by a network; the space attention module generates a space attention feature map through the space relation of feature space, and is used for searching the position of important information features to be extracted, and the channel attention module and the space attention module form a CBAM attention module in a serial arrangement mode;
the last convolutional layer of the decoder network uses Sigmoid activation function, the other convolutional layers of the encoder network and the decoder network all adopt ReLU activation function and use batch standardization operation, and the full connection layer adopts linear activation function.
3. The method for feeding back and reconstructing large-scale MIMO channel state information based on deep learning as claimed in claim 1, wherein the specific manner of step 2 is as follows:
using COST 2100 model, generating 150000 space-frequency domain channel matrix samples in 5.3GHz indoor picocell scene and 300MHz outdoor rural scene, and dividing matrix sample data into training set, verification set and test set; the training set comprises 100000 sample data and is used for driving the neural network model to train, the verification set comprises 30000 sample data and is used for verifying the model convergence condition in the training process, and the test set comprises 20000 sample data and is used for testing the training effect of the neural network model.
4. The deep learning-based large-scale MIMO channel state information feedback and reconstruction method according to claim 1, wherein step 3 adopts the Adam gradient optimization algorithm and an end-to-end learning mode, and the encoder network and decoder network parameters are jointly trained so as to minimize the loss function; the loss function is the mean square error between the original channel matrix estimate Ĥ output by the decoder network and the real channel matrix H, and the loss function L is specifically expressed as:

L = (1/T) · Σ_{i=1}^{T} ‖Ĥ_i − H_i‖₂²

where T is the number of samples in the training set, ‖·‖₂ is the Euclidean norm, and the subscript i indexes the estimated channel matrix and the original channel matrix.
5. The deep learning-based massive MIMO channel state information feedback and reconstruction method according to claim 2, wherein the RefineNet+ unit is composed of 1 input layer, 3 convolution layers and 1 CBAM attention module, and the data of the input layer, the output of the third convolution layer and the output of the CBAM attention module are added together through a skip connection structure to form the output of the RefineNet+ unit.
CN202211395637.6A 2022-11-09 2022-11-09 Deep learning-based large-scale MIMO channel state information feedback and reconstruction method Pending CN116248156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211395637.6A CN116248156A (en) 2022-11-09 2022-11-09 Deep learning-based large-scale MIMO channel state information feedback and reconstruction method


Publications (1)

Publication Number Publication Date
CN116248156A true CN116248156A (en) 2023-06-09

Family

ID=86633749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211395637.6A Pending CN116248156A (en) 2022-11-09 2022-11-09 Deep learning-based large-scale MIMO channel state information feedback and reconstruction method

Country Status (1)

Country Link
CN (1) CN116248156A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117220745A (en) * 2023-11-08 2023-12-12 北京邮电大学 Multi-user channel state information joint compression feedback method based on deep learning
CN117220745B (en) * 2023-11-08 2024-02-09 北京邮电大学 Multi-user channel state information joint compression feedback method based on deep learning
CN117528756A (en) * 2023-11-11 2024-02-06 国网吉林省电力有限公司信息通信公司 Time-frequency synchronization method and system for wireless network
CN117856848A (en) * 2024-03-08 2024-04-09 北京航空航天大学 CSI feedback method based on automatic encoder structure
CN117856848B (en) * 2024-03-08 2024-05-28 北京航空航天大学 CSI feedback method based on automatic encoder structure
CN117914657A (en) * 2024-03-15 2024-04-19 齐鲁工业大学(山东省科学院) FFDNet-based millimeter wave large-scale MIMO system channel estimation method

Similar Documents

Publication Publication Date Title
CN109672464B (en) FCFNN-based large-scale MIMO channel state information feedback method
CN108390706B (en) Large-scale MIMO channel state information feedback method based on deep learning
CN112737985B (en) Large-scale MIMO channel joint estimation and feedback method based on deep learning
CN111630787B (en) MIMO multi-antenna signal transmission and detection technology based on deep learning
CN110350958B (en) CSI multi-time rate compression feedback method of large-scale MIMO based on neural network
CN110311718B (en) Quantization and inverse quantization method in massive MIMO channel state information feedback
CN108847876B (en) Large-scale MIMO time-varying channel state information compression feedback and reconstruction method
CN110912598B (en) Large-scale MIMO system CSI feedback method based on long-time attention mechanism
Liao et al. CSI feedback based on deep learning for massive MIMO systems
Chen et al. Deep learning-based implicit CSI feedback in massive MIMO
CN111464220B (en) Channel state information reconstruction method based on deep learning
Yang et al. Deep convolutional compression for massive MIMO CSI feedback
CN116248156A (en) Deep learning-based large-scale MIMO channel state information feedback and reconstruction method
Lu et al. Bit-level optimized neural network for multi-antenna channel quantization
CN105978674B (en) The pilot frequency optimization method of extensive mimo channel estimation under compressed sensing based FDD
WO2021203243A1 (en) Artificial intelligence-based mimo multi-antenna signal transmission and detection technique
CN113098804B (en) Channel state information feedback method based on deep learning and entropy coding
Chen et al. A novel quantization method for deep learning-based massive MIMO CSI feedback
CN101207464A (en) Generalized grasman code book constitution method and feedback method based thereon
Ravula et al. Deep autoencoder-based massive MIMO CSI feedback with quantization and entropy coding
Liang et al. Deep learning-based cooperative CSI feedback via multiple receiving antennas in massive MIMO
CN113660020A (en) Wireless communication channel information transmission method, system and decoder
Liu et al. CSI feedback based on complex neural network for massive MIMO systems
TWI669921B (en) Feedback method for use as a channel information based on deep learning
CN116155333A (en) Channel state information feedback method suitable for large-scale MIMO system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination