CN111600681B

CN111600681B - Downlink bit level processing method based on FPGA hardware acceleration

Info

Publication number: CN111600681B
Application number: CN202010414777.8A
Authority: CN
Inventors: 王程; 徐闻璐; 张志丽; 王卫东
Original assignee: Beijing University of Posts and Telecommunications; CETC 54 Research Institute
Current assignee: Beijing University of Posts and Telecommunications; CETC 54 Research Institute
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2022-07-01
Anticipated expiration: 2040-05-15
Also published as: CN111600681A

Abstract

The invention relates to a downlink bit-level processing method based on FPGA hardware acceleration, which can be applied to high-capacity real-time signal and protocol processing in an LTE-based virtualized gateway station for satellite mobile communication. The functions of the MAC layer and above are implemented on the CPU. On the hardware accelerator FPGA, the data to be transmitted is processed and passed along by a data processing module, a transport block CRC24A adding module, a code block segmentation parameter calculation module and a code block CRC24B adding module, all using an 8-bit parallel transmission architecture. After parallel-to-serial conversion, the data passes through a Turbo coding module and a code block interleaving module, then undergoes bit collection, selection and puncturing, and finally the code blocks are concatenated and output, which completes the bit-level data processing of the whole PDSCH. The invention reduces signal processing delay and improves both the capacity of the virtualized gateway station for large volumes of data and the real-time performance of communication transmission.

Description

Downlink bit level processing method based on FPGA hardware acceleration

Technical Field

The invention belongs to the technical field of mobile communication, and particularly relates to a technology for realizing LTE (long term evolution) physical layer bit-level data processing by utilizing an FPGA (field programmable gate array), which can be applied to high-capacity real-time signal and protocol processing of a satellite mobile communication virtualization gateway station based on LTE.

Background

With the evolution of satellite mobile communication systems, high-capacity data transmission places higher demands on the communication, computing and storage resources of satellite gateway stations. In a satellite mobile communication system, the MAC (media access control) layer must handle resource scheduling, multiplexing, demultiplexing, HARQ (hybrid automatic repeat request), random access and similar tasks, which imposes a heavy processing burden on it. Therefore, provided that the timing of data interaction between the base station and the user remains correct, the load on the upper layers can be relieved by improving the data processing efficiency of the physical layer channels and reducing the physical layer signal processing time. An FPGA (field programmable gate array) is a semiconductor device that can be programmed to meet application or functional requirements. It has been widely applied in heterogeneous acceleration and has shown better computing performance than a general-purpose CPU (central processing unit). Using an FPGA as the core device for the data processing of the physical downlink shared channel (PDSCH) can effectively improve the data processing performance of the system. Compared with a DSP (digital signal processor), an FPGA offers better timing control, so it is better suited to, and more efficient for, PDSCH data processing. The bit-level part of the PDSCH involves a large amount of computation, including transport block CRC (cyclic redundancy check) attachment, code block segmentation and code block CRC attachment, Turbo coding and rate matching, and its real-time requirements are very strict. In addition, with respect to the optimal use of on-chip logic and storage resources in FPGA design, a parallel processing architecture and ping-pong read/write buffering can greatly increase the data stream processing speed, leave more time for upper-layer scheduling, and thereby achieve hardware acceleration.

At present, PDSCH bit-level FPGA designs that use serial data transmission reduce the difficulty of module algorithm design but increase the delay of the whole mobile communication system. In addition, the FPGA code for PDSCH bit-level processing must handle complex timing logic, and problems of data disorder and data loss make the design difficult.

Disclosure of Invention

To meet the requirements of high-capacity real-time signal and protocol processing in a satellite mobile communication system, the invention provides an implementation scheme for a PDSCH bit-level data processing module based on FPGA hardware acceleration. By combining parallel and serial data processing with a suitable RAM (random access memory) design, the scheme improves the efficiency of the mobile communication system and ensures the correctness of its timing.

The invention provides a downlink bit level processing method based on FPGA hardware acceleration, which combines a general processing platform CPU and a hardware accelerator FPGA, realizes the functions of an MAC layer and above in the CPU, and realizes the bit level signal processing function of a satellite mobile communication system physical layer on the FPGA. The functions of the method of the invention respectively realized on the general processing platform CPU and the hardware accelerator FPGA are as follows.

The functions of an upper network layer including an MAC layer are realized on a CPU of a general processing platform, and data interaction is carried out with a hardware accelerator realized by an FPGA through a high-speed exchange interface.

A clock generation module, a data processing module, a transport block CRC24A adding module, a code block segmentation parameter calculation module, a code block CRC24B adding module, a parallel-to-serial conversion module, a Turbo coding module, a code block interleaving module, a bit collection and puncturing module and a code block concatenation module are arranged on the hardware accelerator FPGA. The clock generation module generates the clock signals. The data processing module receives the data transmitted by the CPU and converts it into a transport block in bit form; after receiving the clock signal from the clock generation module, it sends a control signal and passes the transport block to the transport block CRC24A adding module over an 8-bit parallel transmission architecture. The transport block CRC24A adding module appends a CRC check code to the transport block data, sends a control signal to the code block segmentation parameter calculation module so that it calculates the number of code blocks, and passes the data with the appended check code to the code block CRC24B adding module over the 8-bit parallel transmission architecture. The code block CRC24B adding module appends a CRC check code to the received data code block by code block, according to the code block size and number calculated by the code block segmentation parameter calculation module, and then passes the data with the appended check codes to the parallel-to-serial conversion module over the 8-bit parallel transmission architecture. The parallel-to-serial conversion module, after receiving the clock signal from the clock generation module, converts the received parallel data into serial data and sends a control signal and the serial data to the Turbo coding module. The Turbo coding module encodes the data and outputs three data streams and control signals to the code block interleaving module. The code block interleaving module stores the three received data streams block by block in RAM, interleaves the code block data, and then outputs the three interleaved data streams and control signals to the bit collection and puncturing module. The bit collection and puncturing module performs bit collection and puncturing on the three data streams and outputs serial data and control signals to the code block concatenation module. The code block concatenation module concatenates and outputs the code block data, which completes the downlink bit-level data processing.

Compared with the prior art, the PDSCH bit-level processing method based on FPGA hardware acceleration has the following advantages and positive effects. To meet the requirements of high-capacity real-time signal and protocol processing in a satellite mobile communication system, a general processing platform and a hardware accelerator are combined, so that the virtualized gateway station supports dynamic resource allocation and transmission parameter configuration, can process transmission signals at high speed in real time, and meets the delay requirement of the LTE-based satellite mobile communication system. The invention implements parallel data transmission and parallel-to-serial conversion on the FPGA, which reduces the data transmission delay. The invention also takes the timing logic between the modules on the FPGA into account and uses control signals to ensure that the internal timing of the modules strictly conforms to the relevant 3GPP protocol specifications. This achieves efficient and accurate processing of PDSCH bit-level data and thereby improves the capacity for processing large volumes of data and the real-time performance of communication transmission.

Drawings

Fig. 1 is a block diagram of the downlink PDSCH bit-level design based on FPGA hardware acceleration according to the invention;

Fig. 2 is a diagram of the FPGA design of the code block interleaving module of the invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

An existing general-purpose processor platform performs the processing of the whole link on a CPU and takes considerable time when faced with complex calculations. The invention implements the downlink bit-level data processing on an FPGA, which reduces the signal processing delay and improves both the capacity of the virtualized gateway station for large volumes of data and the real-time performance of communication transmission.

As shown in Fig. 1, the downlink PDSCH bit-level processing method based on FPGA hardware acceleration is implemented by combining a general processing platform CPU with a hardware accelerator FPGA. The general processing platform CPU mainly handles functions such as selecting the communication transmission scheme and configuring the algorithm set and parameter set, for example the transmission scheme parameter configuration, windowing/filtering scheme parameter configuration, variable parameter set configuration and algorithm configuration shown in Fig. 1. The functions of the MAC layer and above are implemented on the CPU, which exchanges data with the FPGA through a high-speed exchange interface. The FPGA mainly performs the bit-level signal processing of the physical layer of the satellite mobile communication system.

In the FPGA, the clock generation module generates the clock signals. The data is then processed by the data processing module, the transport block CRC24A adding module, the code block segmentation parameter calculation module and the code block CRC24B adding module, all of which use an 8-bit parallel transmission architecture. After parallel-to-serial conversion, Turbo coding and code block interleaving, bit collection and puncturing are performed, and finally the code blocks are concatenated and output, which completes the bit-level data processing of the whole PDSCH.

As shown in Fig. 1, the functional blocks in the FPGA are a clock generation module, a data processing module, a transport block CRC24A adding module, a code block segmentation parameter calculation module, a code block CRC24B adding module, a parallel-to-serial conversion module, a Turbo coding module, a code block interleaving module, a bit collection and puncturing module, and a code block concatenation module.

The clock generation module generates the clock signal clk_clc, which is sent to the data processing module, and the clock signal clk_turbo, which is sent to the parallel-to-serial conversion module.

The data processing module receives the upper-layer data transmitted by the CPU and converts it into bit form; the data to be transmitted is called a transport block. After receiving the clock signal clk_clc, the data processing module sends a control signal and passes the transport block to the transport block CRC24A adding module over an 8-bit parallel transmission architecture. The control signals passed between modules are the start and end signals of the transmitted data. Because each module introduces a processing delay and also changes the data length, every module must pass control signals to the next module after it has processed the data.

The transport block CRC24A adding module appends CRC24A (a 24-bit cyclic redundancy check code) to the transport block data, and then sends a control signal to the code block segmentation parameter calculation module so that it calculates the number of code blocks. The transport block CRC24A adding module passes the control signal and the data with the appended check code to the code block CRC24B adding module over the 8-bit parallel transmission architecture. CRC24A and CRC24B are both 24-bit CRCs but are computed with different generator polynomials.

The code block segmentation parameter calculation module performs segmentation calculation on the data added with the CRC24A, performs code block segmentation if the data length is greater than 6144 bits, and transmits the calculated code block size and code block number to the code block CRC24B addition module.
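The calculation performed by this module can be sketched in Python. This is only an illustration that assumes the module follows the LTE code block segmentation procedure of 3GPP TS 36.212, in which a single code block receives no CRC24B; the patent does not list the formulas, and the function name `segment_params` and the returned field names are hypothetical.

```python
import math

Z = 6144  # maximum code block size in bits, the threshold named in the text

# Turbo interleaver block sizes permitted in LTE (assumed, per TS 36.212).
K_TABLE = (list(range(40, 513, 8)) + list(range(528, 1025, 16))
           + list(range(1056, 2049, 32)) + list(range(2112, 6145, 64)))


def segment_params(b):
    """Segmentation parameters for b input bits (transport block plus CRC24A)."""
    if b <= Z:
        crc_len, c = 0, 1              # one code block, no per-block CRC
    else:
        crc_len = 24                   # each code block carries a CRC24B
        c = math.ceil(b / (Z - crc_len))
    b_total = b + c * crc_len          # bits after adding the per-block CRCs
    # Larger block size: smallest table entry that fits all bits in c blocks.
    k_plus = min(k for k in K_TABLE if c * k >= b_total)
    if c == 1:
        k_minus, c_minus = 0, 0
    else:
        # Smaller block size: the table entry just below k_plus.
        k_minus = max(k for k in K_TABLE if k < k_plus)
        c_minus = (c * k_plus - b_total) // (k_plus - k_minus)
    c_plus = c - c_minus
    filler = c_plus * k_plus + c_minus * k_minus - b_total
    return {"C": c, "K_plus": k_plus, "K_minus": k_minus,
            "C_plus": c_plus, "C_minus": c_minus, "filler": filler}
```

For example, `segment_params(6144)` yields a single code block of 6144 bits, while `segment_params(6145)` crosses the threshold and yields two code blocks of different sizes plus a few filler bits.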

The code block CRC24B adding module adds CRC24B (24-bit cyclic redundancy check code) to each code block data, and then transmits the data added with the check code again to the parallel-serial conversion module by adopting an 8-bit parallel transmission architecture.

The parallel-to-serial conversion module converts the received parallel data into serial data and, after receiving the clock signal, sends a control signal and passes the serial data to the Turbo coding module. The clk_turbo clock signal in Fig. 1 controls the data processing rate of the Turbo coding module.

The Turbo coding module performs Turbo coding operation on the serial data and outputs three paths of data and control signals to the code block interleaving module. The control signal sent by the Turbo coding module is a start signal and an end signal of each path of data transmission of the three paths of data.

The code block interleaving module stores the three data streams from the Turbo coding module block by block in RAM, interleaves the code block data, and then outputs the three interleaved data streams and control signals to the bit collection and puncturing module.

The bit collection and puncturing module performs bit collection and bit puncturing on the three input data streams and outputs a control signal and serial data to the code block concatenation module.

The code block concatenation module concatenates and outputs the data of the code blocks, which completes the implementation of the bit-level part of the downlink shared channel.

In the invention, the functions of the transmission block CRC24A adding module, the code block CRC24B adding module and the code block segmentation parameter calculating module are realized on the FPGA, and some improvements are made in the function realization.

If the CRC were computed serially, 1 bit at a time, following the conventional CRC attachment algorithm of the 3GPP protocol, and output only after the calculation finished, the delay of the module would be very large. The FPGA's support for parallel computation is therefore used to perform the CRC attachment 8 bits at a time, which greatly reduces the transmission and computation time. In addition, an output byte counter is set in the transport block CRC24A adding module. When the counter reaches (TBS/8) + 3, the original transport block data has been output and output of the calculated 3-byte check code can begin. TBS denotes the transport block size. To ensure that no data is lost, the transport block CRC24A adding module starts outputting data 5 clock cycles after the data transmission start signal i_din_start, and the output data, denoted [7:0] o_data_out, is also transmitted 8 bits in parallel. At the moment data output begins, the control signal o_dout_start is pulled high for 1 cycle to indicate the start of output, and this signal is connected to the input port of the next module as its control signal. These control signals are particularly important: they start and stop each module and thereby guarantee the timing correctness of the system.
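The difference between 1-bit serial and 8-bit parallel CRC calculation can be illustrated in software. The sketch below assumes the CRC24A generator polynomial of 3GPP TS 36.212 (0x864CFB without the leading term), a zero initial register and MSB-first bit order. The lookup table is a software stand-in for the XOR network an FPGA would use, and all function names are hypothetical rather than taken from the patent.

```python
CRC24A_POLY = 0x864CFB  # CRC24A generator polynomial, x^24 term omitted (assumed per TS 36.212)
MASK24 = 0xFFFFFF


def crc24_bitwise(data, poly=CRC24A_POLY):
    """Serial reference: one register update per input bit."""
    reg = 0
    for byte in data:
        for i in range(7, -1, -1):
            feedback = ((reg >> 23) & 1) ^ ((byte >> i) & 1)
            reg = (reg << 1) & MASK24
            if feedback:
                reg ^= poly
    return reg


def make_table(poly=CRC24A_POLY):
    """Precompute the effect of 8 serial steps for every possible byte."""
    table = []
    for byte in range(256):
        reg = byte << 16
        for _ in range(8):
            reg = ((reg << 1) ^ poly if reg & 0x800000 else reg << 1) & MASK24
        table.append(reg)
    return table


def crc24_bytewise(data, table):
    """8-bit parallel form: one register update per input byte."""
    reg = 0
    for byte in data:
        reg = ((reg << 8) & MASK24) ^ table[((reg >> 16) ^ byte) & 0xFF]
    return reg


def attach_crc(data, table):
    """Append the 3 check bytes after the payload bytes."""
    return bytes(data) + crc24_bytewise(data, table).to_bytes(3, "big")
```

Both functions give the same check value for any byte string, but the byte-wise form needs one eighth as many update steps, which mirrors the reduction in clock cycles that the 8-bit parallel architecture aims for. Swapping in the CRC24B polynomial (0x800063 under the same assumption) gives the per-code-block variant.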

Because the FPGA supports parallel processing, code block segmentation and code block CRC attachment run concurrently with the transport block CRC calculation. The code block segmentation parameter calculation module starts calculating the segmentation parameters under the control of a signal from the preceding module (the transport block CRC24A adding module) and passes the result to the code block CRC24B adding module. Under the control of these parameters, the code block CRC24B adding module attaches CRC24B to the data output by the transport block CRC24A adding module and stores the result in the RAM of the next stage for the channel coding operation. Each time a code block's worth of data has been output, three further clock cycles are needed to output that code block's cyclic redundancy check code, so the output data cannot stay synchronized with the input data. To prevent data loss, the code block CRC24B adding module therefore first stores the input data in RAM and then reads it back according to the code block size and the number of code blocks to calculate the check codes.

Once the code block segmentation calculation has started and the CRC attachment for one code block is complete, the oda_start signal (a control signal sent by the code block CRC24B adding module to the parallel-to-serial conversion module) starts the parallel-to-serial conversion. In the parallel-to-serial conversion module, data is input in parallel under a 20 MHz clock, 8 bits per clock cycle, and output serially under a 160 MHz clock, 1 bit per clock cycle. The output clock frequency must be 8 times the input clock frequency so that no data is lost. The parallel-to-serial conversion module feeds the data serially into the DL_encoder_3gpplte IP core provided by the Vivado development software, which performs the Turbo coding and produces three output streams, rsc1_systematic, rsc1_parity0 and rsc2_parity0. These are passed serially to the next module, the code block interleaving module.
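The rate relation behind the two clocks can be checked with a small Python sketch. The MSB-first bit order and the names `serialize`, `PARALLEL_CLK_HZ` and `SERIAL_CLK_HZ` are assumptions made for illustration; the patent does not state the bit order.

```python
PARALLEL_CLK_HZ = 20_000_000   # 8 bits enter per cycle
SERIAL_CLK_HZ = 160_000_000    # 1 bit leaves per cycle
BUS_WIDTH = 8


def serialize(parallel_words):
    """Yield the bits of each 8-bit word, most significant bit first (assumed order)."""
    for word in parallel_words:
        for i in range(BUS_WIDTH - 1, -1, -1):
            yield (word >> i) & 1


def rates_balanced():
    """True when the input and output bit rates are equal, so no bits are dropped."""
    return PARALLEL_CLK_HZ * BUS_WIDTH == SERIAL_CLK_HZ * 1
```

With these values both sides carry 160 Mbit/s, which is the factor-of-8 condition stated above. A slower serial clock would let input bytes arrive faster than they could be shifted out.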

The invention also improves the code block interleaving module and the bit collection and puncturing module in order to process transmission signals at high speed in real time and meet the delay requirement of the LTE-based satellite mobile communication system.

The code block interleaving module consists of two interleavers and three data memories, as shown in Fig. 2. The three data memories serve the D0, D1 and D2 data paths, and each contains two dual-port RAMs. The two interleavers are a sub-block interleaver for the first two paths and a sub-block interleaver for the third path. The former interleaves the sub-blocks of the first two data paths, and the latter interleaves the sub-blocks of the third data path. The three Turbo-coded data streams are stored in the three data memories, and within each data memory the two dual-port RAMs are used for ping-pong storage, which prevents data from being overwritten before it has been read. Once the first code block of the three streams has been stored, the data can be read from RAM in interleaved address order for the interleaving operation. The read order of the interleaved addresses is obtained through interleaving calculation by the first-two-path sub-block interleaver D0D1_INTER_CAL and the third-path sub-block interleaver D2_INTER_CAL respectively. The first two data paths use the addresses produced by the first-two-path sub-block interleaver algorithm, and the third data path uses the addresses produced by the third-path sub-block interleaver algorithm. The resulting interleaved addresses addr_rd_interleaver_d0d1 and addr_rd_interleaver_d2 are passed to the three data memories for data reading. The first-two-path sub-block interleaver returns its interleaved address to the data memories of the first two paths for the RAM read. After the first code block has been written into the first RAM (RAM1), the second code block is written into the second RAM (RAM2) while RAM1 is read; the data is output in interleaved address order, and the control signal dout_TB_start is pulled high for one cycle as output begins. The third-path sub-block interleaver likewise returns its interleaved address to the data memory of the third path for the RAM read, outputs the data in interleaved address order, and pulls the data output control signal high for one cycle as output begins.
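The two address sequences can be sketched in Python under the assumption that the interleavers implement the LTE sub-block interleaver of 3GPP TS 36.212 (32 columns, bit-reversed column permutation, and an extra offset of one for the third stream). The function name `interleave_addresses` is hypothetical, `None` stands for a dummy position with no RAM content, and the ping-pong switching between RAM1 and RAM2 is not modelled.

```python
COLS = 32
# Inter-column permutation: the 5-bit bit reversal of the column index
# (0, 16, 8, 24, ...), as assumed from TS 36.212.
P = [int(format(c, "05b")[::-1], 2) for c in range(COLS)]


def interleave_addresses(d):
    """Read addresses for one code block with d bits per stream.

    Returns (rows, addr_d0d1, addr_d2). Each address list has rows * 32
    entries; None marks a dummy (padding) position.
    """
    rows = -(-d // COLS)               # ceiling division: rows of the matrix
    k_pi = rows * COLS                 # matrix size including padding
    n_dummy = k_pi - d                 # dummy entries padded at the front

    def to_ram(index):
        # Map a padded matrix index to a RAM address, or None for padding.
        return index - n_dummy if index >= n_dummy else None

    # Streams 0 and 1: read the permuted matrix column by column.
    addr_d0d1 = [to_ram(P[k // rows] + COLS * (k % rows)) for k in range(k_pi)]
    # Stream 2: same pattern shifted by one position, modulo the matrix size.
    addr_d2 = [to_ram((P[k // rows] + COLS * (k % rows) + 1) % k_pi)
               for k in range(k_pi)]
    return rows, addr_d0d1, addr_d2
```

Because streams D0 and D1 share one address sequence, a single address generator can serve both memories, which is consistent with the module having only two interleavers for three data paths.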

In Fig. 2, the code block interleaving module receives a code block length signal Kr, a code block number signal CB and a clock signal clk, receives three input data streams from the Turbo coding module, and outputs three interleaved data streams ram_dout_D0[1:0], ram_dout_D1[1:0] and ram_dout_D2[1:0]. In Fig. 2, dout_TB_start is the output data start flag, dout_TB_end is the output data end flag, dout_valid marks the time during which the output data is valid, and RTC[12:0] is a parameter of the interleaving calculation that represents the number of rows in the interleaving matrix.

In the bit collection and puncturing module, bit collection is performed after the code block interleaving is finished. Bit collection gathers the bits output by sub-block interleaving, that is, it implements the function of a circular buffer. The module stores the three generated data streams in a virtual circular buffer according to the bit collection rule. In the FPGA implementation, the RAM IP core has only two ports and therefore cannot serve three simultaneous accesses to the on-chip memory, so a custom three-port RAM was designed and written. It has three ports, A, B and C, each of which can read and write the storage, so that the three interleaved data streams can be written into the memory at the same time. In addition, the amount of collected data is large and its storage locations are fixed, but the data cannot be processed in real time as it arrives, so the bit collection and puncturing module uses two three-port RAMs operated in ping-pong fashion for buffering.
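The circular buffer behaviour can be sketched in Python, assuming the LTE rate-matching rule of 3GPP TS 36.212: the first stream is stored in order, the second and third are stored alternately, and output bits are read cyclically from a start offset while dummy positions are skipped. The function name, the parameter `rv` (redundancy version) and the use of `None` for dummy positions are illustrative assumptions, and the three-port RAM and ping-pong buffering are not modelled.

```python
import math


def collect_and_select(v0, v1, v2, e, rv=0):
    """Return e output bits from three interleaved streams of equal length.

    v0, v1, v2 are the interleaver outputs (length a multiple of 32);
    None entries are dummy positions that are never transmitted.
    """
    rows = len(v0) // 32
    # Bit collection: stream 0 first, then streams 1 and 2 interlaced.
    w = list(v0) + [bit for pair in zip(v1, v2) for bit in pair]
    n_cb = len(w)                      # full buffer assumed (no soft-buffer limit)
    if all(bit is None for bit in w):
        raise ValueError("circular buffer holds no transmittable bits")
    # Start offset in the circular buffer for the given redundancy version.
    k0 = rows * (2 * math.ceil(n_cb / (8 * rows)) * rv + 2)
    out = []
    j = 0
    while len(out) < e:
        bit = w[(k0 + j) % n_cb]       # wrap around the end of the buffer
        if bit is not None:            # skip dummy positions
            out.append(bit)
        j += 1
    return out
```

When `e` is smaller than the number of stored bits, the bits that are never read are effectively punctured. When `e` is larger, the read pointer wraps around and bits are repeated.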

Claims

1. A downlink bit-level processing method based on FPGA hardware acceleration is realized based on a mode of combining a general processing platform and a hardware accelerator, and is characterized in that the method comprises the following steps:

(1) the method comprises the steps that the functions of an upper network layer including an MAC layer are realized on a CPU of a general processing platform, and data interaction is carried out with a hardware accelerator realized by an FPGA through a high-speed exchange interface; the configuration function set on the general processing platform comprises transmission system scheme parameter configuration, windowing/filtering scheme parameter configuration, variable parameter set and algorithm configuration;

(2) a clock generation module, a data processing module, a transmission block CRC24A adding module, a code block segmentation parameter calculation module, a code block CRC24B adding module, a parallel-serial conversion module, a Turbo coding module, a code block interleaving module, a bit collection and punching module and a code block cascading module are arranged on a hardware accelerator FPGA; wherein, a clock signal is generated by a clock generating module; the data processing module receives data transmitted by the CPU and converts the data into a transmission block in a bit form, and after receiving a clock signal transmitted by the clock generation module, the data processing module transmits a control signal and transmits the transmission block to the transmission block CRC24A adding module by adopting an 8-bit parallel transmission architecture; the transmission block CRC24A adding module adds CRC check codes to transmission block data, sends a control signal to the code block segmentation parameter calculating module to calculate the number of code blocks, and transmits the data added with the check codes to the code block CRC24B adding module by adopting an 8-bit parallel transmission architecture; the code block CRC24B adding module adds CRC check codes to the received data by taking the code block as a unit according to the size and the number of the code blocks calculated by the code block segmentation parameter calculating module, and then transmits the data added with the check codes to the parallel-serial conversion module by adopting an 8-bit parallel transmission architecture; the parallel-serial conversion module converts the received parallel data into serial data, and then sends a control signal and the serial data to the Turbo coding module after receiving a clock signal sent by the clock generation module; in the parallel-serial conversion module, data is input in parallel under 20M clock, 8 bits of data enter simultaneously in one clock 
period, and then are output in series under 160M clock, and 1bit is output in one clock period; the Turbo coding module codes and outputs three paths of data and control signals to the code block interleaving module; the code block interleaving module respectively stores the received three paths of data in the RAM in a blocking mode, performs interleaving operation on the data code blocks, and then outputs three paths of interleaved data and control signals to the bit collecting and punching module; the bit collection and punching module carries out bit collection and punching operations on the three paths of data and outputs serial data and control signals to the code block cascade module; the code block cascade module cascades and outputs the data of each code block to finish the downlink bit-level data processing;

the code block interleaving module consists of two interleavers and three data memories; the three data memories are two dual-port RAMs, and the two interleavers are a first two-path sub-block interleaver and a third-path sub-block interleaver respectively; three paths of data after Turbo coding are respectively stored in three data memories, and two dual-port RAMs are called in each data memory to carry out ping-pong storage operation on the data; the first two paths of sub-block interleavers perform interleaving operation on the sub-blocks of the first two paths of data, and the third path of sub-block interleavers perform interleaving operation on the sub-blocks of the third path of data; in the code block interleaving module, after the first code block of the three paths of data is stored, the next interleaving operation is carried out by taking the data from the RAM according to the interleaving address; the reading sequence of the interleaving addresses is respectively obtained by the first two paths of sub-block interleavers and the third path of sub-block interleavers through interleaving calculation; the first two-path sub-block interleaver returns the interleaving address to the first two-path data storage module to read the RAM, when the first code block is written into the first data storage RAM1, the second code block is written into the second data storage RAM2, and simultaneously the written RAM1 is read, the data is sequentially output according to the interleaving address, and the period of the control signal is increased while the data is output; the third path of sub-block interleaver transmits the interleaving address back to the third path of data storage module for reading RAM operation, and sequentially outputs data according to the interleaving address, and raises the cycle of the data output control signal when outputting the data;

wherein, the control signal refers to a start signal and an end signal of transmission data; FPGA represents a field programmable gate array, CPU represents a central processing unit, MAC layer represents a media access control layer, CRC represents cyclic redundancy check, and RAM represents a random access memory;

the bit collection and punching module adopts two three-port RAMs to cache data through ping-pong operation; and each three-port RAM is provided with three ports, read-write access is carried out on the stored data through each port, and the three paths of interleaved data output by the code block interleaving module are simultaneously cached according to a bit collection rule through the three ports.

2. The method of claim 1, wherein the transmission block CRC24A adding module performs 8-bit parallel CRC adding operation on the transmission block data transmitted by an 8-bit parallel transmission architecture; a counter for outputting bytes is set in the transport block CRC24A adding module, when the counter size reaches (TBS/8) +3, which indicates that the transport block data has been output, the output of the calculated 3-byte check code is started, where TBS represents the size of the transport block.