CN113660046A - Method for accelerating generation of large-scale wireless channel coefficients

Info

Publication number: CN113660046A
Application number: CN202110941874.7A
Authority: CN
Inventors: 张念祖; 严康宁; 蒋政波; 洪伟
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-08-17
Filing date: 2021-08-17
Publication date: 2021-11-16
Anticipated expiration: 2041-08-17
Also published as: CN113660046B

Abstract

The invention discloses a method for accelerating the generation of large-scale wireless channel coefficients, comprising the following steps: selecting a channel model according to the 3GPP TS 38.901 standard and inputting channel scale parameters; generating parameters with a central processing unit according to the selected channel model and the channel scale parameters; performing initialization and allocating host memory and device memory for the parameters according to the size of the parameters generated by the central processing unit; copying the parameters generated by the central processing unit to the graphics processor; calling a kernel function to perform accelerated computation and obtain the channel coefficients H[U × S × N × D]; and copying the resulting channel coefficients from the graphics processor back to the central processing unit. Compared with traditional wireless channel coefficient generation methods, and especially for the large-scale multiple-input multiple-output channels of fifth-generation mobile communication, the method achieves a speed-up of tens to hundreds of times that grows with the channel scale, and therefore has high engineering value.

Description

Method for accelerating generation of large-scale wireless channel coefficients

Technical Field

The invention relates to the technical field of wireless channel coefficient generation, in particular to a method for accelerating the generation of large-scale wireless channel coefficients.

Background

As the core of a mobile communication system, the wireless channel plays a crucial role in the performance of the entire system. Intensive research into the characteristics of the wireless channel is therefore necessary.

Computer-based channel simulation and emulation can accurately and efficiently reproduce a wide range of channel environments for performance verification and testing of systems and terminals. With the introduction of massive MIMO technology and increasingly refined channel models, the number of channel coefficients to be generated has grown explosively. The traditional generation method based on a central processing unit takes too long and cannot meet the requirements of current 5G large-scale multiple-input multiple-output channel simulation.

The TS 38.901 specification defined by the mobile communication standardization organization 3GPP is a channel model and test standard for fifth-generation mobile communication, applicable to mobile communication scenarios with frequencies from 0.5 GHz to 100 GHz. Communication scenarios are abstracted into 10 types according to the relative positions of the base station and the mobile station, the complexity of the surrounding scatterers, and whether a line-of-sight (LOS) path exists. In addition, 5 Clustered Delay Line (CDL) channel models are specified for simplified modeling.

Generating channel coefficients requires a large amount of parallel computation. The clock frequency of a graphics processor is generally lower than that of a central processing unit, but a graphics processor has far more arithmetic logic units available for calculation, which makes it well suited to massively parallel computation.

The Compute Unified Device Architecture (CUDA), introduced by the graphics card manufacturer NVIDIA in 2007, is a widely used parallel computing architecture based on graphics processors. Developers do not need to learn a new programming language or syntax; with some knowledge of parallel computing and sensible thread scheduling, they can greatly improve the performance of an algorithm.

Disclosure of Invention

In view of this, the present invention aims to provide a method for accelerating the generation of large-scale wireless channel coefficients that solves the technical problems described in the background. The method achieves a speed-up of tens to hundreds of times and has high engineering value. The larger the MIMO channel, the more pronounced the acceleration, with an advantage of 1 to 2 orders of magnitude in computation time.

In order to achieve the purpose, the invention adopts the following technical scheme:

A method for accelerating the generation of large-scale wireless channel coefficients comprises the following steps:

Step S1, selecting a channel model according to the 3GPP TS 38.901 standard, and inputting channel scale parameters;

Step S2, generating parameters with a central processing unit according to the channel model and the channel scale parameters determined in step S1;

Step S3, performing initialization and allocating host memory and device memory for the parameters according to the size of the parameters generated by the central processing unit in step S2;

Step S4, copying the parameters generated by the central processing unit in step S2 from the central processing unit to the graphics processor;

Step S5, calling a kernel function to perform accelerated computation and obtain the channel coefficients H[U × S × N × D];

Step S6, copying the channel coefficients obtained in step S5 from the graphics processor back to the central processing unit.

Further, the channel scale parameter includes a receiving antenna number U, a transmitting antenna number S, and a sampling point number D, where the receiving antenna number U, the transmitting antenna number S, and the sampling point number D are positive integers, and the sampling point number D is an integer multiple of 1024.

Further, in step S2, the parameters generated by the central processing unit include the normalized linear power P[N], the antenna pattern and cross-polarization ratio factor F_ALL[U × S × N × M], the transmitting-side phase factor MOV1[N × M], the receiving-side phase factor MOV2[N × M], and the speed factor MOV3[N × M], where N is the number of clusters and M is the number of rays per cluster.

Further, the host memory is a memory on a motherboard of the central processing unit, and the device memory is a memory on a board card of the graphics processing unit.

Further, in step S4, the parameters generated by the central processing unit in step S2 are copied from host memory to the device memory of the graphics processor, and the memory-to-memory copy is performed through an interface provided by the Compute Unified Device Architecture (CUDA).

Further, in step S5, the kernel function is a function that the central processing unit calls for the graphics processor to execute, launched in the format <<<grid_size, block_size>>>, where grid_size is configured as (U × S, N, D/1024) and block_size is configured as (1024, 1, 1). U × S × N × D threads are started during the computation, each thread generates one channel coefficient, and the results of all threads together form the channel coefficients H[U × S × N × D].

Further, in step S6, the obtained channel coefficients are copied from the graphics processor memory to the host memory.

The invention has the beneficial effects that:

Compared with traditional wireless channel coefficient generation methods, and especially for the large-scale multiple-input multiple-output channels of fifth-generation mobile communication, the method achieves a speed-up of tens to hundreds of times that grows with the channel scale, and therefore has high engineering value.

Drawings

FIG. 1 is a schematic diagram of the apparatus for accelerating the generation of large-scale wireless channel coefficients provided in Embodiment 1;

FIG. 2 is a graph comparing the computation times for different channel sizes in an embodiment;

FIG. 3 is a comparison of various channel scale speed-up ratios in an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

Referring to fig. 1, the present embodiment provides an apparatus for accelerating generation of large-scale wireless channel coefficients, the apparatus comprising:

an input module, which selects a channel model according to the 3GPP TS 38.901 standard and receives the channel scale parameters;

an initialization and allocation module, which generates parameters with the central processing unit according to the selected channel model and the channel scale parameters, then performs initialization and allocates host memory and device memory for the parameters according to their size, and finally copies the parameters from the central processing unit to the graphics processor;

a kernel function acceleration module, which calls a kernel function to perform accelerated computation and obtain the channel coefficients H[U × S × N × D]; and

an output module, which copies the obtained channel coefficients from the graphics processor back to the central processing unit.

Example 2

This embodiment provides a method for accelerating the generation of large-scale wireless channel coefficients, which follows steps S1 to S6 described above. The details are as follows.

Specifically, for step S1, this embodiment offers the 10 conventional channel models defined in the TS 38.901 standard and the 5 simplified channel models CDL-A, CDL-B, CDL-C, CDL-D and CDL-E. The number of clusters N and the number of rays per cluster M depend on the model.

The channel scale parameters comprise the number of receiving antennas U, the number of transmitting antennas S and the number of sampling points D, all of which are positive integers, and D is an integer multiple of 1024.

Specifically, in step S2, after the channel model and the channel scale are determined, the host generates the normalized linear power P[N], the antenna pattern and cross-polarization ratio factor F_ALL[U × S × N × M], the transmitting-side phase factor MOV1[N × M], the receiving-side phase factor MOV2[N × M], and the speed factor MOV3[N × M] as specified by the standard. All five parameters are stored as arrays and are then copied to the graphics processor for further calculation.

Specifically, in step S3, since data is transferred between the central processing unit and the graphics processor through their respective memories, host memory and device memory are allocated according to the size of each parameter array. For example, each element of the normalized linear power P[N] occupies 16 bytes, so P[N] requires 16N bytes of memory.
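The sizing rule above can be sketched in CUDA as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the element type `cuDoubleComplex` is assumed because it matches the 16 bytes per element quoted for P[N], and the helper names `bytes_for` and `alloc_pair` as well as the example scale values are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Assumption: every parameter element is a double-precision complex value,
// which matches the 16 bytes per element quoted for P[N].
typedef cuDoubleComplex elem_t;

// Bytes needed for an array holding `count` elements.
static size_t bytes_for(size_t count) { return count * sizeof(elem_t); }

// Allocate a host buffer and a device buffer of the same size.
// Returns false (and leaves both pointers null) if either allocation fails.
static bool alloc_pair(size_t count, elem_t **host, elem_t **dev) {
    *host = nullptr;
    *dev = nullptr;
    elem_t *h = (elem_t *)malloc(bytes_for(count));
    if (h == nullptr) return false;
    elem_t *d = nullptr;
    if (cudaMalloc((void **)&d, bytes_for(count)) != cudaSuccess) {
        free(h);
        return false;
    }
    *host = h;
    *dev = d;
    return true;
}

int main() {
    // Hypothetical scale: U = 2 receive antennas, S = 4 transmit antennas,
    // N = 23 clusters, M = 20 rays per cluster.
    const size_t U = 2, S = 4, N = 23, M = 20;

    elem_t *h_P, *d_P, *h_F, *d_F;
    if (!alloc_pair(N, &h_P, &d_P)) {
        fprintf(stderr, "allocation of P failed\n");
        return 1;
    }
    if (!alloc_pair(U * S * N * M, &h_F, &d_F)) {
        fprintf(stderr, "allocation of F_ALL failed\n");
        free(h_P);
        cudaFree(d_P);
        return 1;
    }

    printf("P[N]:   %zu bytes\n", bytes_for(N));
    printf("F_ALL:  %zu bytes\n", bytes_for(U * S * N * M));
    printf("MOV1-3: %zu bytes each\n", bytes_for(N * M));

    // Release both sides once the buffers are no longer needed.
    free(h_P);
    free(h_F);
    cudaFree(d_P);
    cudaFree(d_F);
    return 0;
}
```

Allocating each host buffer together with its device counterpart keeps the two sizes identical, which is what the later copy step relies on.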

Specifically, in step S4, the parameter arrays are copied from host memory to the device memory of the graphics processor, and the memory-to-memory copy is performed through an interface provided by the Compute Unified Device Architecture (CUDA). For example, in the function cudaMemcpy(void *dst, const void *src, size_t count, cudaMemcpyKind kind), dst is the start address of the destination memory, src is the start address of the source memory, count is the size of the copied memory in bytes, and kind is the direction of the copy: cudaMemcpyHostToDevice copies from the host to the device, and cudaMemcpyDeviceToHost copies from the device to the host.
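A minimal sketch of this copy, using the CUDA runtime call `cudaMemcpy` in both directions, is shown below. The array size, the dummy data and the helper `same_values` are hypothetical and serve only to check that the data survives a round trip through device memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuComplex.h>
#include <cuda_runtime.h>

// True if the two arrays hold identical complex values.
static bool same_values(const cuDoubleComplex *a, const cuDoubleComplex *b,
                        size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (cuCreal(a[i]) != cuCreal(b[i]) || cuCimag(a[i]) != cuCimag(b[i])) {
            return false;
        }
    }
    return true;
}

int main() {
    const size_t count = 23 * 20;  // hypothetical N x M parameter array
    const size_t bytes = count * sizeof(cuDoubleComplex);

    cuDoubleComplex *h_src = (cuDoubleComplex *)malloc(bytes);
    cuDoubleComplex *h_back = (cuDoubleComplex *)malloc(bytes);
    cuDoubleComplex *d_buf = nullptr;
    if (h_src == nullptr || h_back == nullptr ||
        cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < count; ++i) {
        h_src[i] = make_cuDoubleComplex((double)i, -(double)i);  // dummy data
    }

    // Host -> device: dst is device memory, src is host memory.
    cudaError_t err = cudaMemcpy(d_buf, h_src, bytes, cudaMemcpyHostToDevice);
    // Device -> host: the same call with the direction reversed.
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_back, d_buf, bytes, cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("round trip %s\n",
           same_values(h_src, h_back, count) ? "matched" : "differed");

    cudaFree(d_buf);
    free(h_src);
    free(h_back);
    return 0;
}
```

Note that `count` in `cudaMemcpy` is a byte count, not an element count, so it must be derived from the same size calculation used for allocation.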

In step S5, the kernel function is a function that the central processing unit calls for the graphics processor to execute, launched in the format <<<grid_size, block_size>>>, where grid_size is configured as (U × S, N, D/1024) and block_size is configured as (1024, 1, 1). U × S × N × D threads are started during the computation, each thread generates one channel coefficient, and the results of all threads together form the channel coefficients H[U × S × N × D].
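The launch configuration can be illustrated with the following sketch. It is not the patented kernel: the per-coefficient formula is not given in this section, so the placeholder kernel `fill_h` only stores each thread's linear index to show that the U × S × N × D threads map one-to-one onto the output array. The row-major layout in `h_index`, the helper `launch_grid` and the example scale are assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 1024u

// Assumed row-major layout of H[U*S][N][D]; the source does not fix the layout.
__host__ __device__ inline size_t h_index(size_t us, size_t n, size_t d,
                                          size_t N, size_t D) {
    return (us * N + n) * D + d;
}

// Grid shape from the text: (U*S, N, D/1024). D must be a multiple of 1024.
static dim3 launch_grid(unsigned U, unsigned S, unsigned N, unsigned D) {
    return dim3(U * S, N, D / THREADS_PER_BLOCK);
}

// Placeholder kernel: each thread owns exactly one coefficient slot and
// stores its own linear index there instead of a real channel coefficient.
__global__ void fill_h(double *H, unsigned N, unsigned D) {
    unsigned us = blockIdx.x;                            // antenna pair index
    unsigned n = blockIdx.y;                             // cluster index
    unsigned d = blockIdx.z * blockDim.x + threadIdx.x;  // sample index
    size_t idx = h_index(us, n, d, N, D);
    H[idx] = (double)idx;
}

int main() {
    const unsigned U = 2, S = 2, N = 3, D = 2048;  // hypothetical scale
    if (D % THREADS_PER_BLOCK != 0) return 1;
    const size_t total = (size_t)U * S * N * D;
    const size_t bytes = total * sizeof(double);

    double *h_H = (double *)malloc(bytes);
    double *d_H = nullptr;
    if (h_H == nullptr || cudaMalloc((void **)&d_H, bytes) != cudaSuccess) {
        return 1;
    }

    dim3 grid = launch_grid(U, S, N, D);
    dim3 block(THREADS_PER_BLOCK, 1, 1);
    fill_h<<<grid, block>>>(d_H, N, D);
    if (cudaGetLastError() != cudaSuccess) return 1;

    // The blocking copy also waits for the kernel to finish.
    if (cudaMemcpy(h_H, d_H, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        return 1;
    }

    size_t mismatches = 0;
    for (size_t i = 0; i < total; ++i) {
        if (h_H[i] != (double)i) ++mismatches;
    }
    printf("%zu threads launched, %zu mismatches\n", total, mismatches);

    cudaFree(d_H);
    free(h_H);
    return 0;
}
```

Because D is a multiple of 1024, every block is completely full and no bounds check is needed inside the kernel. On current CUDA devices the y and z grid dimensions are limited to 65535, so N and D/1024 must stay within that limit.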

In step S6, the channel coefficients obtained in step S5 are copied from the graphics processor back to the central processing unit; that is, the obtained channel coefficients are copied from the graphics processor memory to the host memory.

To verify the effectiveness and generality of the invention, several different central processing units and graphics processors were selected, and multiple tests were run to compute channel coefficients of different data volumes and measure the average computation time, as shown in FIG. 2. For a single-input single-output channel, the traditional method takes far longer than the present method to generate the same volume of channel coefficients. As the number of sampling points increases, the time required by the traditional method rises rapidly, whereas the computation time of the present method also increases but at a much lower rate. As can be seen from FIG. 3, the larger the MIMO channel, the more pronounced the acceleration, with an advantage of 1 to 2 orders of magnitude in computation time.
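The source does not state how the computation time was measured. One common way to obtain an average GPU time is with CUDA events, sketched below under that assumption. The workload `dummy_kernel`, the helper `average_ms` and the run count are hypothetical stand-ins, and whether the host-device copies should be included in the timed region is left open.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Mean of an accumulated time over a number of runs.
static float average_ms(float total_ms, int runs) { return total_ms / runs; }

// Stand-in workload; a real test would launch the channel coefficient kernel.
__global__ void dummy_kernel(float *out) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = sqrtf((float)i);
}

int main() {
    const int blocks = 256, threads = 1024, runs = 10;
    float *d_out = nullptr;
    if (cudaMalloc((void **)&d_out,
                   (size_t)blocks * threads * sizeof(float)) != cudaSuccess) {
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    float total_ms = 0.0f;
    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start, 0);
        dummy_kernel<<<blocks, threads>>>(d_out);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);  // wait until the kernel has finished
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        total_ms += ms;
    }
    printf("average GPU time over %d runs: %.4f ms\n", runs,
           average_ms(total_ms, runs));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

The first launch typically includes one-time start-up overhead, so a warm-up run before the timed loop gives a more representative average. Dividing the CPU time for the same workload by this average yields the speed-up ratio.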

Matters not described in detail in the present invention are well known to those skilled in the art. The foregoing describes the preferred embodiments of the invention in detail. It should be understood that those skilled in the art could devise numerous modifications and variations in light of the present teachings without inventive effort. Therefore, technical solutions that those skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art and in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims

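1. A method for accelerating the generation of large-scale wireless channel coefficients, characterized by comprising the following steps:

step S1, selecting a channel model according to the 3GPP TS 38.901 standard, and inputting channel scale parameters;

step S2, generating parameters with a central processing unit according to the channel model and the channel scale parameters determined in step S1;

step S3, performing initialization and allocating host memory and device memory for the parameters according to the size of the parameters generated by the central processing unit in step S2;

step S4, copying the parameters generated by the central processing unit in step S2 from the central processing unit to the graphics processor;

step S5, calling a kernel function to perform accelerated computation and obtain the channel coefficients H[U × S × N × D];

step S6, copying the channel coefficients obtained in step S5 from the graphics processor back to the central processing unit.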

2. The method of claim 1, wherein the channel scale parameters include a number of receiving antennas U, a number of transmitting antennas S, and a number of sampling points D, wherein the number of receiving antennas U, the number of transmitting antennas S, and the number of sampling points D are positive integers, and the number of sampling points D is an integer multiple of 1024.

3. The method of claim 2, wherein in step S2, the parameters generated by the central processing unit include the normalized linear power P[N], the antenna pattern and cross-polarization ratio factor F_ALL[U × S × N × M], the transmitting-side phase factor MOV1[N × M], the receiving-side phase factor MOV2[N × M], and the speed factor MOV3[N × M], where N is the number of clusters and M is the number of rays per cluster.

4. The method of claim 3, wherein the host memory is memory on the motherboard of the central processing unit (CPU), and the device memory is memory on the board card of the graphics processing unit (GPU).

5. The method of claim 4, wherein in step S4, the parameters generated by the central processing unit in step S2 are copied from host memory to the device memory of the graphics processor, and the memory-to-memory copy is performed through an interface provided by the Compute Unified Device Architecture (CUDA).

6. The method of claim 5, wherein in step S5, the kernel function is a function that the central processing unit calls for the graphics processor to execute, launched in the format <<<grid_size, block_size>>>, where grid_size is configured as (U × S, N, D/1024) and block_size is configured as (1024, 1, 1), U × S × N × D threads are started during the computation, each thread generates one channel coefficient, and the results of all threads together form the channel coefficients H[U × S × N × D].

7. The method of claim 6, wherein in step S6, the obtained channel coefficients are copied from a graphics processor memory to a host memory.