WO2021214788A1 - Method and system for efficient low latency rate-matching and bit-interleaving for 5g nr - Google Patents

Method and system for efficient low latency rate-matching and bit-interleaving for 5g nr Download PDF

Info

Publication number
WO2021214788A1
WO2021214788A1 PCT/IN2021/050391 IN2021050391W WO2021214788A1 WO 2021214788 A1 WO2021214788 A1 WO 2021214788A1 IN 2021050391 W IN2021050391 W IN 2021050391W WO 2021214788 A1 WO2021214788 A1 WO 2021214788A1
Authority
WO
WIPO (PCT)
Prior art keywords
parallel
bit
pointers
rate
bits
Prior art date
Application number
PCT/IN2021/050391
Other languages
French (fr)
Inventor
Khitish Chandra Behera
Original Assignee
Khitish Chandra Behera
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Khitish Chandra Behera filed Critical Khitish Chandra Behera
Publication of WO2021214788A1 publication Critical patent/WO2021214788A1/en

Links

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/03 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
    • H03M13/05 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits
    • H03M13/11 Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using block codes, i.e. a predetermined number of check bits joined to a predetermined number of information bits using multiple parity bits
    • H03M13/1102 Codes on graphs and decoding on graphs, e.g. low-density parity check [LDPC] codes
    • H03M13/1148 Structural properties of the code parity-check or generator matrix
    • H03M13/116 Quasi-cyclic LDPC [QC-LDPC] codes, i.e. the parity-check matrix being composed of permutation or circulant sub-matrices
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/27 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes using interleaving techniques
    • H03M13/2703 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes using interleaving techniques the interleaver involving at least two directions
    • H03M13/2707 Simple row-column interleaver, i.e. pure block interleaving
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/63 Joint error correction and other techniques
    • H03M13/635 Error control coding in combination with rate matching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0041 Arrangements at the transmitter end
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045 Arrangements at the receiver end
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0057 Block codes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0067 Rate matching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0067 Rate matching
    • H04L1/0068 Rate matching by puncturing
    • H04L1/0069 Puncturing patterns
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0071 Use of interleaving

Definitions

  • the present invention relates to 5G cellular communication standard, and more specifically to a method and system for efficient low latency rate-matching and bit-interleaving for 5G NR.
  • 5G New Radio also known as NR, is a cellular communication standard that has two very important features: low-latency and multi-gigabit per second throughput.
  • the peak data rate requirement is 20 Gbps for downlink (DL) and 10 Gbps for uplink (UL).
  • 5G NR is characterized by scalable latency with a slot duration of 0.5 millisecond for 30 kHz sub-carrier spacing (SCS), and the slot duration scales with SCS such that it is 0.25 millisecond for 60 kHz SCS and 0.125 millisecond for 120 kHz SCS.
  • SCS sub-carrier spacing
  • the transmitter chain of the communication device physical layer has to process a high volume of data, on the order of a few million bits per Transport Block, to meet the multi-gigabit throughput requirement.
  • the major constraint for the transmitter is to meet the latency requirement (0.5 ms slot for 30 kHz SCS, 0.25 ms slot for 60 kHz SCS, and 0.125 ms slot for 120 kHz SCS) while processing the high volume of data bits.
  • each code-block can be a maximum of 8448 bits; additional bits known as parity bits are added for forward-error-correction.
  • the code-rate (CR) is defined as the ratio of the number of information bits to the total number of encoded bits, and is selected based on channel conditions.
  • LDPC Low-Density Parity Check Codes
  • rate-matching buffer of length Ncb.
  • Er and RV (Redundant Version)
  • 5G NR LDPC codes have 2 base-graphs BG1 and BG2, where each of base-graphs is characterized by lifting size (Z).
  • the supported lifting sizes, referenced from the 5G NR technical specifications, are given in Table 1.
  • BG1 has 22Z information bits and 46Z parity bits.
  • BG2 has 10Z information bits and 42Z parity bits.
  • the first two columns of both base graphs are not transmitted, and are referred to as built-in puncture region that corresponds to 2Z information bits.
  • RV indices RV0, RV1, RV2 and RV3 that are defined in 5G NR.
  • Each RV index specifies a specific starting location within the circular buffer for base graph 1 and base graph 2, as defined in the 5G NR technical specification and given in Table 2.
  • Ncb = Nldpc, where Nldpc is 66Z for BG1 and 50Z for BG2.
  • Ncb is calculated as per the equation below, as defined in the 5G NR technical specification.
  • RLBRM = 2/3 and TBSLBRM is a function of the number of layers, modulation order and the number of physical resource blocks. TBSLBRM is calculated in higher layers as described in the 5G NR technical specification.
  • the output of the rate-matching buffer goes through row-column permutation function in bit-interleaving process.
  • the rate-matched bits from the rate-matching buffer are written in row-first order into another buffer and read in column-first order. While copying the bits from the rate-matching buffer, the filler bits are skipped and do not enter the row-column buffer.
  • 5G NR specifies removing 2Z information bits known as built-in puncture bits, where Z can be a maximum of 384 bits; removing up to 768 bits from the beginning of the encoder stream costs additional cycles due to shift operations.
  • the filler bits or zero-padding bits also need to be removed before the row-column re-ordering.
  • zero-padded bits are removed by shift operations within the circular buffer. The number of shift operations depend on the number of filler bits, and costs additional cycles.
  • the buffer-to-buffer copy operations and shift operations cost large compute cycles for every LDPC encoded-block.
  • ultra-low latency and high throughput pose a challenge for the hardware to process a plurality of encoded blocks through the rate-matching and bit-interleaving functions within sub millisecond duration.
  • ASIC Application Specific Integrated Chip
  • FPGA Field Programmable Gate Array
  • the ASIC solutions target to meet the low-latency by multiplying the hardware count, at the cost of increasing the hardware resource count. But in an FPGA solution, which comes with a limited resource count and a restricted, fixed clock frequency in the range of 400~500 MHz, meeting the latency requirement is a big challenge.
  • the companies typically implement digital baseband hardware for 5G base-stations known as gNBs using customized ASIC or using programmable FPGAs. Their solution may not be cost effective, since to meet ultra-low-latency in sub millisecond range and high throughput (> 1 Gbps) requirements in the transmit signal chain in physical downlink data channel, high-degree of parallel processing with multiple instances of hardware components might be used.
  • a rate-matching and bit-interleaving module for a down link transmit chain for data payload of a 5G New Radio (NR) that includes a rate-matching buffer configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder, an M-parallel pointer generation module configured to initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index of the set of encoded bits in the rate-matching buffer, and a decoding and bits re-arranging module
  • a decoding and bits re-arranging module for a down link transmit chain for data payload of a 5G New Radio (NR) that includes a rate-matching buffer configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder
  • the decoding and bits rearranging module is configured to decode, based on the modulation order, to generate M output bits, in each clock cycle.
  • the M-parallel pointer generation module is further configured to increment a bit address in each parallel pointer by M/Qm, at the end of each clock cycle, till each rate matching bit is decoded and outputted by the decoding and bits re-arranging module.
  • a rate-matching and bit interleaving method for a down link transmit chain for data payload of a 5G New Radio (NR) that includes storing a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer, initializing M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index of the set of encoded bits in the rate-matching buffer, converting M bit addresses in the M parallel pointers to generate M word addresses, based on a word-width of the rate matching buffer; reading M words corresponding to M word addresses from the rate matching buffer; decoding the M words to generate M bits; and incrementing the bit address in each parallel pointer by M/Qm, at the end of each clock cycle, till each rate matching bit is decoded and outputted.
  • LDPC Low-density-parity-check code
  • a non-transitory computer readable medium configured to store a program causing a computer to perform rate-matching and bit-interleaving for a down link transmit chain for data payload of a 5G New Radio (NR).
  • NR 5G New Radio
  • the program is configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer, initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer, convert the M bit addresses in the M parallel pointers to M word addresses, based on a word-width of the rate matching buffer, read M words corresponding to the M word addresses from the rate matching buffer, decode the M words to generate M bits, at the end of each clock cycle, and increment the bit address in each parallel pointer by M/Qm, at the end of each clock cycle, till each rate matching bit is decoded and outputted.
  • LDPC Low-density-parity-check code
  • Various embodiments of the present disclosure provide an efficient M-parallel look-ahead pointers generation process, to read M-interleaved bits directly from rate-matching buffer, avoiding row-column permutation operation, and thus the need of a separate buffer, where M is programmed for a target latency.
  • the process is independent of the clock frequency and underlying hardware.
  • by look-ahead computing a set of parallel pointers, it is possible to read multiple data from the rate-matching buffer and thus avoid the need for another interleaving buffer for row-column re-ordering.
  • by look-ahead computation, a plurality of pointers is updated on the fly, which enables reading a plurality of data every clock cycle from the rate matching buffer. Also, the concept of parallel pointer generation and look-ahead pointer computation can be deployed to avoid buffer copy operations in other applications within the wireless domain.
  • the resulting latency of rate-matching and bit-interleaving functions in the downlink transmit chain for data payload in 5G NR is scaled by a programmable factor M.
  • M the latency of rate-matching and bit-interleaving functions in the downlink transmit chain for data payload in 5G NR
  • the disclosure addresses the latency aspects while processing the large Transport Blocks corresponding to the maximum downlink (DL) throughput.
  • FIG. 1A illustrates overall processing of data payload in down-link transmit chain of 5G New Radio (NR);
  • NR New Radio
  • FIG. 1B illustrates LDPC encoder output processing by rate-matching and bit-interleaving functions, in accordance with an embodiment of the present disclosure
  • FIG. 2 illustrates an exemplary rate-matching buffer, in accordance with an embodiment of the present disclosure
  • FIG. 8 illustrates an exemplary initialization of 8 parallel pointers ptr[0], ptr[1], ptr[2], ..., ptr[7] based on various input parameters, in accordance with an embodiment of the present disclosure
  • FIG. 9 illustrates generation of exemplary 8 output bits through the 8 parallel pointers (shown in FIG. 8) from the rate matching buffer, in accordance with an embodiment of the present disclosure
  • FIG. 10 is a flowchart of reading and decoding data from the rate matching buffer, in accordance with an embodiment of the present disclosure.
  • FIG. 1B illustrates a system 100 for processing LDPC encoder output through rate-matching and bit-interleaving for a 5G New Radio (NR), in accordance with an embodiment of the present disclosure.
  • NR 5G New Radio
  • the system 100 includes a Low-density-parity-check code (LDPC) encoder 102, and a rate-matching and bit-interleaving module 104.
  • the rate-matching and bit-interleaving module 104 includes an M-parallel pointer generation module 106, a rate-matching buffer 108, and a decoding and bits re-arranging module 110.
  • the LDPC encoder 102 is a well-known encoder that outputs multiple encoded bits per clock cycle, and writes the encoded bits into a circular buffer such as the rate-matching buffer 108.
  • the LDPC encoder 102 may be implemented using an Application Specific Integrated Circuit (ASIC) and/or a Field Programmable Gate Array (FPGA).
  • ASIC Application Specific Integrated Circuit
  • FPGA Field Programmable Gate Array
  • the rate-matching buffer 108 stores a built-in puncture region 200 including punctured information bits of length 2Z, and a set of encoded bits Nldpc.
  • a maximum value of Z is 384 bits and thus, the set of punctured information bits include 768 bits.
  • a start read index 202 of the set of encoded bits Nldpc is referred to as RVid.
  • There are 4 possible indices defined for RV: RV0, RV1, RV2 and RV3.
  • the start index RVid is pre-calculated in an initial cycle so that it points to an address 2Z within the rate-matching circular buffer.
  • if the fixed read location given by the higher layer is a nonzero RV index such as RV1, RV2 or RV3, the start index RVid is pre-incremented by an amount 2Z.
  • the set of encoded bits Nldpc include a filler region 204 that includes a predefined number of zero padded bits (filler bits).
  • the filler region 204 has a start index nFiller_st and an end index nFiller_end.
  • the purpose of rate-matching is to select a specific length and set of bits given by Er and RV (Redundant Version), known as the total size of rate-matching bits and the start-indices respectively.
  • the M-parallel pointer generation module 106 initializes M parallel pointers ptr[0], ptr[1], ..., ptr[M-1] in an initial clock cycle, i.e., a first clock cycle, for reading encoded data from M bit addresses of the rate-matching buffer 108, based on the lifting size Z, the start index 202, the start and end indices of the filler region 204, a number Er of rate-matching bits, and a modulation order Qm in the down link transmit chain.
  • the modulation order Qm may take values 1, 2, 4, 6 or 8 based on modulation types such as Binary phase shift keying (BPSK), Quadrature phase shift keying (QPSK), 16 Quadrature Amplitude Modulation (16-QAM), 64-QAM and 256-QAM respectively.
  • the M parallel pointers are generated on the fly based on the output data-rate/throughput and latency requirements across different modulation orders and operating clock frequencies. The initialization of pointers can be parameterized and generalized across different output data bit-widths (M) and modulation types.
  • the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, and wherein an initial set of the Qm sets starts from the bit address at the start index RVid, and a next set starts from an offset address of Er/Qm from the starting bit address of the corresponding previous set.
  • the higher layer parameters such as the start index RVid, the start filler index nFiller_st, and the end filler index nFiller_end, are pre-incremented by 2Z before assigning bit addresses to the M parallel pointers.
  • the start index RVid has an address 2Z, instead of 0, when RV0 is given by higher layers.
  • the start index RVid may have addresses other than 2Z.
  • the M-parallel pointer generation module 106 assigns to a parallel pointer an offset address from the filler region 204, when a bit address in the parallel pointer falls in the filler region 204 during initialization of the parallel pointer.
  • the M-parallel pointer generation module 106 assigns to a parallel pointer an offset address from the built-in puncture region, when a wraparound condition occurs, i.e., when the bit address in the parallel pointer falls in the built-in puncture region 200, during initialization of the parallel pointer.
  • the initialization of the M parallel pointers is done based on the below pseudo-code.
  • the algorithm is generalized assuming that the LDPC encoder 102 does not puncture the 2Z bits and does not remove the filler bits.
  • in BPSK modulation, 8 sequential pointers are initialized with the address pointed to by RVid.
  • the M-parallel pointer generation module 106 generates eight sequential pointers ptr[0], ptr[1], ..., ptr[7] starting from bit address 2Z at the start index 202.
  • first and second sets of pointers are initialized with 4 sequential pointers in each set. The first and second sets start from interleaved-offset addresses 0 and Er/Qm from the start index 202, respectively.
  • the first set of pointers includes four sequential pointers ptr[0], ptr[1], ptr[2], and ptr[3] initialized with addresses 2Z, 2Z+1, 2Z+2 and 2Z+3 respectively.
  • the second set of pointers includes four sequential pointers ptr[4], ptr[5], ptr[6], and ptr[7] initialized with addresses Er/2+2Z, Er/2+1+2Z, nFiller_end+1 and nFiller_end+2 respectively.
  • ptr[6] and ptr[7] fall in the filler region 204 during initialization; therefore, ptr[6] and ptr[7] are incremented to skip the filler region 204 and to have bit addresses that are offset from the filler region 204.
  • modulation order Qm = 4
  • first, second, third and fourth sets of pointers are initialized with 2 sequential pointers in each set.
  • the first, second, third and fourth sets start from interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm) from the start index 202, respectively.
  • the first set of pointers includes two sequential pointers ptr[0] and ptr[1], initialized with addresses 2Z and 2Z+1
  • the second set of pointers includes two sequential pointers ptr[2] and ptr[3] initialized with addresses (Er/4)+2Z
  • the third set of pointers includes two sequential pointers ptr[4] and ptr[5] initialized with addresses 2*(Er/4)+2Z and 2*(Er/4)+1+2Z
  • the fourth set of pointers ptr[6] and ptr[7] are initialized with addresses 3*(Er/4)+2Z and 3*(Er/4)+1+2Z respectively.
  • ptr[3] falls in the filler region 204 during initialization; therefore, ptr[3] is incremented to skip the filler region 204 and to have a bit address that is offset from the filler region 204, using look-ahead computing of the offsets from the filler region 204.
  • modulation order Qm = 6
  • modulation type 64-QAM
  • six parallel pointers are initialized with interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm), 3*(Er/Qm), 4*(Er/Qm), and 5*(Er/Qm) from the start index 202, respectively. Since M = 8 is not a multiple of 6, only 6 pointers are initialized and used.
  • the first pointer ptr[0] is initialized with address 2Z
  • the second pointer ptr[1] is initialized with address (Er/6)+2Z
  • the third pointer ptr[2] is initialized with address 2*(Er/6)+2Z
  • the fourth pointer ptr[3] is initialized with address 3*(Er/6)+2Z
  • the fifth pointer ptr[4] is initialized with address 4*(Er/6)+2Z
  • the sixth pointer ptr[5] is initialized with address 5*(Er/6)+2Z.
  • eight parallel pointers are initialized with interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm), 3*(Er/Qm), 4*(Er/Qm), 5*(Er/Qm), 6*(Er/Qm) and 7*(Er/Qm) from the start index 202, respectively.
  • the first pointer ptr[0] is initialized at address 2Z
  • the second pointer ptr[1] is initialized with address (Er/8)+2Z
  • the third pointer ptr[2] is initialized with address 2*(Er/8)+2Z
  • the fourth pointer ptr[3] is initialized with address 3*(Er/8)+2Z
  • the fifth pointer ptr[4] is initialized with address 4*(Er/8)+2Z
  • the sixth pointer ptr[5] is initialized with address 5*(Er/8)+2Z
  • the seventh pointer ptr[6] is initialized with address 6*(Er/8)+2Z
  • the eighth pointer ptr[7] is initialized with address 7*(Er/8)+2Z.
  • the value of M is chosen such that it is a least common multiple of the Qm values (1, 2, 4, 6 or 8) and divides Er.
  • the M parallel pointers are eventually used to read M parallel bits from the rate matching buffer 108, and the M bits should therefore constitute an integer number of symbols, based on the modulation order Qm.
  • M the modulation order
  • 24 parallel pointers are generated, equivalently 24 bits read from the buffer would constitute 24 symbols in the BPSK modulation, 12 symbols in the QPSK modulation, 6 symbols in the 16-QAM modulation, 4 symbols in the 64-QAM modulation, and 3 symbols in the 256-QAM modulation modes.
  • the decoding and bits re-arranging module 110 reads encoded data from the rate matching buffer 108 every clock cycle, through the M parallel pointers, such that the pointers ptr[0], ptr[1], ptr[2], ..., ptr[M-1] read M words memout[0], memout[1], memout[2], ..., memout[M-1] from the corresponding M bit address locations of the rate matching buffer 108.
  • the decoding and bits re-arranging module 110 further decodes the data words memout[0], memout[1], memout[2], ..., memout[M-1] based on the modulation order Qm, to generate M output bits bit[0], bit[1], bit[2], ..., bit[M-1] respectively.
  • the decoding and bits re-arranging module 110 is configured to: convert the M bit addresses in the M parallel pointers to generate M word addresses, based on a word-width of the rate matching buffer 108, read M words memout[0], memout[1], memout[2], ..., memout[M-1] in parallel, corresponding to the M word addresses from the rate matching buffer 108, decode the M words to generate M bits, and rearrange the M bits based on the modulation order, to generate the M output bits bit[0], bit[1], bit[2], ..., bit[M-1] respectively (a code sketch of this read-and-decode step is provided after this list).
  • the rate-matching buffer word width (W) value is a power of two
  • the decoding of each M word is performed using a modulus operation. Since a modulus operation using a power of two reduces in hardware to a simple bit-selection operation, the decoding function can be applied as soon as the encoded data is read from the rate matching buffer 108.
  • the re-arrange operation is a regrouping operation based on the modulation order.
  • the M-parallel pointer generation module 106 is further configured to increment a bit address in each parallel pointer by M/Qm, at the end of each clock cycle, till a total of Er rate matching bits are decoded and outputted by the decoding and bits re-arranging module 110. After the initialization phase, the sequential pointers in each set are incremented on the fly. It may be noted that the pointer initialization and update/increment are done in separate compute cycles in the hardware. Thus, a total of 3 steps are applied in sequence: pointer initialization, on-the-fly increment, and decoding and bit rearrangement, for reading and decoding data from the rate matching buffer 108
  • the sequential pointers in each set corresponding to modulation order are checked to find if they fall in the filler region or wrap around to the built-in puncture region.
  • the increment value for sequential pointers to skip filler region and the puncture region, is found by look ahead computation.
  • the M-parallel pointer generation module 106 assigns to a parallel pointer, an offset address from the filler region 204, when a bit address in the parallel pointer falls in the filler region 204, while incrementing the parallel pointer.
  • the M-parallel pointer generation module 106 further assigns to a parallel pointer, an offset address from the built-in puncture region 200, when a bit address in the parallel pointer falls in the built-in puncture region 200, while incrementing the parallel pointer.
  • // BPSK/QPSK/16-QAM/64-QAM/256-QAM; Ncb = Nldpc + 2Z // the rate-matching circular buffer also stores the puncture bits
  • the second pointer ptr[1] matches the filler region start index; therefore, it is incremented to one location after the filler region end index, i.e., nFiller_end+1.
  • the pointers ptr[0] to ptr[7] point to bit addresses 768, 2969, 4768, 6768, 8768, 10768, 12768, and 14768 respectively, to retrieve encoded data.
  • FIG. 9 illustrates generation of exemplary 8 output bits through the 8 parallel pointers (shown in FIG. 8) from the rate matching buffer 108.
  • the LDPC encoder 102 generates 64-bits every clock cycle, hence the rate-matching buffer 108 is assumed to have a word width of 64-bits.
  • Each parallel pointer ptr[0], ptr[1], ptr[2], ..., ptr[7] is provided to the rate matching buffer 108 through a shift register
  • the pointer addresses 768, 2969, 4768, 6768, 8768, 10768, 12768, and 14768 are divided by 64 to generate word addresses 12, 46, 74, 105, 137, 168, 199 and 230 to read the corresponding words from the rate matching buffer 108.
  • the divide function can be implemented by applying a right-shift operation by a decimal value 6 to each of the M parallel pointers ptr[0], ptr[1], ptr[2], ..., ptr[7], considering the word width is 64, as shown in FIG. 9.
  • the rate matching buffer 108 outputs a 64-bit word at each word address of the rate matching buffer 108.
  • the number of such words is equal to M, i.e., 8.
  • the latency for each code-block is scaled by a factor 1/M. Avoiding another buffer for row-column permutation for interleaving, and associated buffer operation, the proposed disclosure leads to an efficient algorithm with a scalable latency implementation.
  • the novel features of the proposed disclosure include avoiding memory operations and instead generating and updating a plurality of pointers on the fly, which results in a direct latency reduction by a factor based on the number of parallel pointers generated.
  • the proposed disclosure saves hardware resource count and also power by allowing the design to work at a slower clock frequency. For example, it is not required to run the rate-matching buffer at a GHz-range frequency; the latency requirements can still be met by running at 400 MHz.
  • the present disclosure is generalized to address the scenario when LDPC encoder 102 does not remove built-in puncture bits of length 2Z and filler bits; by storing built-in puncture bits and filler bits in the rate-matching buffer 108; however, it also allows to address the scenario when the rate matching buffer 108 does not include the built-in puncture region, and filler region.
  • the present disclosure is generalized for the parallelism factor M; higher M values scale the processing time by a factor of 1/M, but also increase the computational logic required for the look-ahead computation to find the increment value while updating the pointers on the fly. If the LDPC encoder hardware removes the built-in puncture 2Z bits and the filler bits from the encoding stream, the computations are simplified and the look-ahead logic to find the offset from the filler region is not required, thus enabling higher M values.
  • the M parallel pointers require M words to be read in parallel from the rate- matching buffer 108. If the physical memory cell cannot support multiple read ports, redundant memory instances will be required.
  • FIG. 10 is a flowchart of reading and decoding data from the rate matching buffer 108, in accordance with an embodiment of the present disclosure.
  • FIG. 10 has been explained with reference to FIG. 1B and FIG. 2.
  • a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder are stored in the rate-matching buffer 108.
  • the rate-matching buffer 108 also stores a built-in puncture region 200 including punctured information bits of size 2Z, and a set of encoded bits Nldpc
  • the Z is a lifting size of the LDPC encoder 102.
  • a maximum value of Z is 384 bits and thus, the set of punctured information bits include 768 bits.
  • the set of encoded bits Nldpc include a filler region 204 that includes a predefined number of zero padded bits (filler bits).
  • the filler region 204 has a start index nFiller_st and an end index nFiller_end.
  • M parallel pointers corresponding to M bit addresses of the rate-matching buffer 108 are initialized in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer 108.
  • the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, and wherein an initial set of the Qm sets starts from the bit address at the start index, and a next set starts from an offset address of Er/Qm from the starting bit address of the corresponding previous set.
  • the higher layer parameters such as the start index RVid, the start filler index nFiller_st and the end filler index nFiller_end, are pre-incremented by 2Z before assigning bit addresses to the M parallel pointers.
  • the start index RVid has an address 2Z, if the fixed start read location within the rate-matching circular buffer is given as RV0 by the higher layer.
  • the start index RVid is pre-incremented by 2Z.
  • each parallel pointer is provided to the rate matching buffer 108, through a shift register.
  • Each shift register divides each parallel pointer by a predefined number, to generate a corresponding word address of the rate matching buffer 108.
  • At step 1008, M words corresponding to the M word addresses are read from the rate matching buffer 108. Each word has the word width of the rate matching buffer 108.
  • the M words are decoded to generate M bits.
  • the decode and bits re-arranging module 110 applies a modulus operation on each pointer value to decode an output bit from the corresponding word.
  • the bit address in each parallel pointer is incremented by M/Qm, at the end of each clock cycle, till a total of Er rate matching bits are decoded and outputted.
  • each parallel pointer is incremented on the fly after the initialization.
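The per-cycle read, decode and pointer-update path summarised in the items above can be sketched in C as follows. This is an illustrative sketch only: W = 64 and M = 8 mirror the worked example, the LSB-first bit ordering inside a word is an assumption, and all names are illustrative rather than taken from the application.

#include <stdint.h>

#define M 8     /* parallel pointers / output bits per clock cycle  */
#define W 64    /* rate-matching buffer word width, a power of two  */

/* One clock cycle: convert the M bit addresses to word addresses, fetch the
 * words and select one bit from each. buf[] models the rate-matching buffer
 * as an array of W-bit words. */
static void read_and_decode(const uint64_t *buf, const uint32_t ptr[M], uint8_t bit[M])
{
    for (int i = 0; i < M; i++) {
        uint32_t word_addr = ptr[i] >> 6;       /* divide by W = 64                   */
        uint32_t bit_sel   = ptr[i] & (W - 1);  /* modulus reduces to a bit selection */
        bit[i] = (uint8_t)((buf[word_addr] >> bit_sel) & 1u);
    }
}

/* After each cycle every pointer advances by M/Qm; the look-ahead skips over
 * the filler and built-in puncture regions described above are omitted here. */
static void advance_pointers(uint32_t ptr[M], uint32_t qm)
{
    for (int i = 0; i < M; i++)
        ptr[i] += M / qm;
}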

Abstract

A rate-matching and interleaving module for a down link transmit chain for data payload of a 5G New Radio (NR) that includes a rate-matching buffer for storing a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder, an M-parallel pointer generation module for initializing M parallel pointers for reading data from corresponding M bit addresses of the rate-matching buffer, based on a number Er of rate matching bits, a modulation order Qm, and a start index in the rate-matching buffer, a decoding and bits re-arranging module for decoding encoded data, based on the modulation order, to generate M output bits, in each clock cycle. The M-parallel pointer generation module further increments a bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded by the decoding and bits re-arranging module.

Description

Method and system for efficient low latency rate-matching and bit-interleaving for 5G NR
FIELD OF THE INVENTION
[001] The present invention relates to the 5G cellular communication standard, and more specifically to a method and system for efficient low latency rate-matching and bit-interleaving for 5G NR.
BACKGROUND
[002] 5G New Radio, also known as NR, is a cellular communication standard that has two very important features: low latency and multi-gigabit-per-second throughput. The peak data rate requirement is 20 Gbps for downlink (DL) and 10 Gbps for uplink (UL). 5G NR is characterized by scalable latency with a slot duration of 0.5 millisecond for 30 kHz sub-carrier spacing (SCS); the slot duration scales with SCS such that it is 0.25 millisecond for 60 kHz SCS and 0.125 millisecond for 120 kHz SCS. The transmitter chain of the communication device physical layer has to process a high volume of data, on the order of a few million bits per Transport Block, to meet the multi-gigabit throughput requirement. The major constraint for the transmitter is to meet the latency requirement (0.5 ms slot for 30 kHz SCS, 0.25 ms slot for 60 kHz SCS, and 0.125 ms slot for 120 kHz SCS) while processing this high volume of data bits.
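The slot-duration figures quoted above follow a simple inverse relation to the sub-carrier spacing. The short C sketch below is illustrative only (the function name is not from the specification) and reproduces the 0.5/0.25/0.125 ms values:

#include <stdio.h>

/* Slot duration versus sub-carrier spacing: a 15 kHz SCS slot is 1 ms,
 * and doubling the SCS halves the slot duration. */
static double slot_duration_ms(unsigned scs_khz)
{
    return 15.0 / (double)scs_khz;  /* 30 kHz -> 0.5 ms, 60 kHz -> 0.25 ms, 120 kHz -> 0.125 ms */
}

int main(void)
{
    unsigned scs[3] = { 30, 60, 120 };
    for (int i = 0; i < 3; i++)
        printf("SCS %3u kHz : slot %.3f ms\n", scs[i], slot_duration_ms(scs[i]));
    return 0;
}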
[003] The channel encoding function in the transmitter chain, as shown in FIG. 1A, is applied separately on each code-block (CB), where each code-block can be a maximum of 8448 bits, and adds additional bits known as parity bits for forward-error-correction. The code-rate (CR) is defined as the ratio of the number of information bits to the total number of encoded bits, and is selected based on channel conditions. The LDPC (Low-Density Parity Check) code, which is used for data channel encoding in 5G NR, can output a total of 25344 bits for the base code rate (CR) = 1/3 for a code-block of 8448 bits.
[004] Before applying the rate-matching and bit-interleaving functions, the encoded bits are stored in a circular buffer known as the rate-matching buffer of length Ncb. The purpose of rate-matching is to select a specific length and set of bits given by Er and RV (Redundant Version), known as the total size of rate-matching bits and the start-indices respectively. Further, 5G NR LDPC codes have 2 base-graphs, BG1 and BG2, where each base-graph is characterized by a lifting size (Z). The supported lifting sizes, referenced from the 5G NR technical specifications, are given in Table 1. BG1 has 22Z information bits and 46Z parity bits. BG2 has 10Z information bits and 42Z parity bits. The first two columns of both base graphs are not transmitted, and are referred to as the built-in puncture region that corresponds to 2Z information bits.
Table 1: Lifting sizes supported by 5G LDPC Codes
(Table 1 is reproduced as an image in the published application and is not included in this text.)
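The base-graph dimensions quoted in paragraph [004] can be summarised in a small C sketch (illustrative only; the structure and function names are not from the specification). With BG1 and the largest lifting size Z = 384 it reproduces the 8448 information bits and the 25344 encoded bits (base code rate 1/3) mentioned in paragraph [003]:

#include <stdio.h>

struct ldpc_dims {
    unsigned k;       /* information bits: 22Z (BG1) or 10Z (BG2)                       */
    unsigned n_ldpc;  /* Nldpc: 66Z (BG1) or 50Z (BG2), i.e. the codeword after the     */
                      /* first 2Z punctured columns are excluded                        */
    unsigned punct;   /* built-in puncture region: the first 2Z information bits        */
};

static struct ldpc_dims ldpc_dimensions(int bg, unsigned z)
{
    struct ldpc_dims d;
    d.k      = (bg == 1) ? 22 * z : 10 * z;
    d.n_ldpc = (bg == 1) ? 66 * z : 50 * z;
    d.punct  = 2 * z;
    return d;
}

int main(void)
{
    struct ldpc_dims d = ldpc_dimensions(1, 384);  /* BG1, Z = 384 */
    printf("K = %u, Nldpc = %u, puncture = %u bits\n", d.k, d.n_ldpc, d.punct);
    return 0;
}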
[005] To support HARQ functionality, there are 4 RV indices, RV0, RV1, RV2 and RV3, defined in 5G NR. Each RV index specifies a specific starting location within the circular buffer for base graph 1 and base graph 2, as defined in the 5G NR technical specification and given in Table 2.
Table 2: Starting positions of redundancy versions
(Table 2 is reproduced as an image in the published application and is not included in this text.)
[006] The encoded bits from the LDPC encoder are written into a circular buffer of length Ncb. To reduce the complexity of the buffer, a limited buffer rate matching (LBRM) is specified in 5G NR. If LBRM is disabled, Ncb = Nldpc, where Nldpc is 66Z for BG1 and 50Z for BG2. If LBRM is enabled, Ncb is calculated as per the equation below, as defined in the 5G NR technical specification:
Ncb = min(Nref, Nldpc), where Nref = floor(TBSLBRM / (C x RLBRM)),
where C is the number of code blocks of the transport block, RLBRM = 2/3 and TBSLBRM is a function of the number of layers, the modulation order and the number of physical resource blocks. TBSLBRM is calculated in higher layers as described in the 5G NR technical specification.
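For illustration, the LBRM calculation above can be written as the following C sketch (the function and parameter names are illustrative, not from the specification):

#include <stdint.h>

/* Ncb for one code block: Nldpc when LBRM is disabled, otherwise
 * min(Nref, Nldpc) with Nref = floor(TBSLBRM / (C * RLBRM)) and RLBRM = 2/3,
 * i.e. Nref = floor(3 * TBSLBRM / (2 * C)). */
static uint32_t rate_matching_buffer_len(int lbrm_enabled, uint32_t n_ldpc,
                                         uint32_t tbs_lbrm, uint32_t num_code_blocks)
{
    if (!lbrm_enabled)
        return n_ldpc;
    uint64_t n_ref = (3ull * tbs_lbrm) / (2ull * num_code_blocks);
    return (n_ref < n_ldpc) ? (uint32_t)n_ref : n_ldpc;
}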
[007] The output of the rate-matching buffer goes through a row-column permutation function in the bit-interleaving process. The rate-matched bits from the rate-matching buffer are written in row-first order into another buffer and read in column-first order. While copying the bits from the rate-matching buffer, the filler bits are skipped and do not enter the row-column buffer.
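For comparison, a conventional implementation of this step can be sketched as below. This is only an assumed baseline consistent with the row-first write and column-first read described above (a Qm-row matrix, with filler handling omitted); it represents the buffer-to-buffer copy that the present disclosure avoids:

#include <stdint.h>

/* e[] holds Er rate-matched bits, f[] receives the interleaved bits.
 * Writing row-first into a Qm x (Er/Qm) matrix and reading column-first
 * is equivalent to the index mapping below. Er is assumed to be a multiple
 * of Qm. The copy costs on the order of Er cycles per code block on
 * hardware that moves one bit per cycle. */
static void row_column_interleave(const uint8_t *e, uint8_t *f,
                                  uint32_t er, uint32_t qm)
{
    uint32_t cols = er / qm;
    for (uint32_t i = 0; i < qm; i++)        /* row-first write ...   */
        for (uint32_t j = 0; j < cols; j++)  /* ... column-first read */
            f[i + j * qm] = e[i * cols + j];
}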
[008] The copy operation from the rate-matching circular buffer to the row-column buffer consumes a large number of compute cycles. For example, when the total number of rate-matching bits to be interleaved is 16000, a typical implementation may cost 16000 cycles for writing into another row-column buffer; in hardware with a limited clock frequency, such as an FPGA with a clock of about 400 MHz, this takes 40 us (= 16000 x 2.5 ns) for each code-block. The transmit chain can then at most transmit about 12 code-blocks in a 0.5 ms slot, with very little margin left for other modules to be processed in the signal chain. This approach may not meet the actual requirement to process a larger number (> 100) of CBs for multi-gigabit (> 1 Gbps) throughput.
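The figures above, together with the 1/M latency scaling described later in this disclosure, can be checked with a short calculation (M = 8 is only an example value chosen here for illustration):

#include <stdio.h>

int main(void)
{
    const double clk_period_ns = 2.5;   /* 400 MHz clock              */
    const unsigned er          = 16000; /* rate-matching bits per CB  */
    const unsigned m           = 8;     /* parallel pointers, example */

    double copy_us     = er * clk_period_ns / 1000.0;        /* 40 us: one bit per cycle */
    double parallel_us = (er / m) * clk_period_ns / 1000.0;  /*  5 us: M bits per cycle  */
    printf("buffer copy: %.1f us, M-parallel read: %.1f us\n", copy_us, parallel_us);
    return 0;
}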
[009] Further, 5G NR specifies removing 2Z information bits known as built-in puncture bits, where Z can be a maximum of 384; removing up to 768 bits from the beginning of the encoder stream therefore costs additional cycles due to shift operations. The filler bits or zero-padding bits also need to be removed before the row-column re-ordering. In a typical hardware implementation, zero-padded bits are removed by shift operations within the circular buffer. The number of shift operations depends on the number of filler bits, and costs additional cycles. [0010] In radio hardware that is limited by its operating clock frequency, the buffer-to-buffer copy operations and shift operations cost a large number of compute cycles for every LDPC encoded block. In 5G NR, ultra-low latency and high throughput pose a challenge for the hardware to process a plurality of encoded blocks through the rate-matching and bit-interleaving functions within a sub-millisecond duration.
[0011] Existing hardware implementations using Application Specific Integrated Circuit (ASIC) chips or Field Programmable Gate Array (FPGA) prototypes address the latency issues in the transmitter signal chain by applying a high degree of parallelism and multiplying the hardware resource usage. For example, an ASIC solution, which benefits from silicon technology node miniaturization, can operate at a clock in the range of GHz frequencies, thus cutting down the latency by a direct scaling factor of about 2.5 compared to hardware operating at a 400 MHz clock frequency.
[0012] The ASIC solutions target the low-latency requirement by multiplying the hardware count, at the cost of an increased hardware resource count. But in an FPGA solution, which comes with a limited resource count and a restricted, fixed clock frequency in the range of 400~500 MHz, meeting the latency requirement is a big challenge. Companies typically implement digital baseband hardware for 5G base-stations, known as gNBs, using customized ASICs or programmable FPGAs. Such solutions may not be cost effective, since meeting the ultra-low-latency (sub-millisecond) and high-throughput (> 1 Gbps) requirements in the transmit signal chain of the physical downlink data channel may require a high degree of parallel processing with multiple instances of hardware components.
[0013] In view of the above, there is a need for a method and system that addresses the latency challenges by avoiding the buffer-to-buffer copy operations and also avoiding any shift operations within the rate-matching buffer.
SUMMARY OF THE INVENTION
[0014] This summary is provided to introduce a selection of concepts, in a simple manner, which are further described in the detailed description of the disclosure. This summary is neither intended to identify the key or essential inventive concept of the subject matter, nor to determine the scope of the disclosure. [0015] In an aspect of the present disclosure, there is provided a rate-matching and bit-interleaving module for a down link transmit chain for data payload of a 5G New Radio (NR) that includes a rate-matching buffer configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder, an M-parallel pointer generation module configured to initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index of the set of encoded bits in the rate-matching buffer, and a decoding and bits re-arranging module. The decoding and bits re-arranging module is configured to decode the encoded data, based on the modulation order, to generate M output bits in each clock cycle. The M-parallel pointer generation module is further configured to increment a bit address in each parallel pointer by M/Qm, at the end of each clock cycle, till each rate matching bit is decoded and outputted by the decoding and bits re-arranging module.
[0016] In another aspect of the present disclosure, there is provided a rate-matching and bit interleaving method for a down link transmit chain for data payload of a 5G New Radio (NR) that includes storing a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer, initializing M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index of the set of encoded bits in the rate-matching buffer, converting M bit addresses in the M parallel pointers to generate M word addresses, based on a word-width of the rate matching buffer; reading M words corresponding to M word addresses from the rate matching buffer; decoding the M words to generate M bits; and incrementing bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded and outputted.
[0017] In yet another aspect of the present disclosure, there is provided a non-transitory computer readable medium configured to store a program causing a computer to perform rate-matching and bit-interleaving for a down link transmit chain for data payload of a 5G New Radio (NR). The program is configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer, initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer, convert the M bit addresses in the M parallel pointers to M word addresses, based on a word-width of the rate matching buffer, read M words corresponding to the M word addresses from the rate matching buffer, decode the M words to generate M bits, at the end of each clock cycle, and increment bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded and outputted.
[0018] Various embodiments of the present disclosure provide an efficient M-parallel look-ahead pointer generation process, to read M interleaved bits directly from the rate-matching buffer, avoiding the row-column permutation operation, and thus the need for a separate buffer, where M is programmed for a target latency. The process is independent of the clock frequency and underlying hardware. In the proposed disclosure, there are no buffer-to-buffer copy or shift operations required. By look-ahead computing a set of parallel pointers, it is possible to read multiple data from the rate-matching buffer and thus avoid the need for another interleaving buffer for row-column re-ordering. By look-ahead computation, a plurality of pointers is updated on the fly, which enables reading a plurality of data every clock cycle from the rate matching buffer. Also, the concept of parallel pointer generation and look-ahead pointer computation can be deployed to avoid buffer copy operations in other applications within the wireless domain.
[0019] Further, the resulting latency of rate-matching and bit-interleaving functions in the downlink transmit chain for data payload in 5G NR is scaled by a programmable factor M. By scaling the latency by a programmable factor, downlink transmitter latency requirement for 5G NR, across a plurality of clock frequencies can be met. The disclosure addresses the latency aspects while processing the large Transport Blocks corresponding to the maximum downlink (DL) throughput.
[0020] Further benefits, goals and features of the present disclosure will be described by the following specification of the attached figures, in which components of the disclosure are exemplarily illustrated. Components of the devices and method according to the disclosures, which match at least essentially with respect to their function, can be marked with the same reference sign, wherein such components do not have to be marked or described in all figures.
[0021] The disclosure is just exemplarily described with respect to the attached figures in the following.
BRIEF DESCRIPTION OF DRAWINGS [0022] The disclosure will be described and explained with additional specificity and detail with the accompanying figures in which:
[0023] FIG. 1A illustrates overall processing of data payload in down-link transmit chain of 5G New Radio (NR);
[0024] FIG. 1B illustrates LDPC encoder output processing by rate-matching and bit-interleaving functions, in accordance with an embodiment of the present disclosure;
[0025] FIG. 2 illustrates an exemplary rate-matching buffer, in accordance with an embodiment of the present disclosure;
[0026] FIG. 3 illustrates initialization of M parallel pointers for the rate-matching buffer, when the modulation order Qm=1, modulation type is BPSK and M=8, in accordance with an embodiment of the present disclosure;
[0027] FIG. 4 illustrates pointer initialization for the rate-matching buffer, when modulation order Qm=2, modulation type is QPSK and M=8, in accordance with an embodiment of the present disclosure;
[0028] FIG. 5 illustrates pointer initialization for rate-matching buffer, when modulation order Qm=4, modulation type is 16-QAM and M=8, in accordance with an embodiment of the present disclosure;
[0029] FIG. 6 illustrates pointer initialization for the rate-matching buffer, when modulation order Qm=6, modulation type is 64-QAM and M=8, in accordance with an embodiment of the present disclosure;
[0030] FIG. 7 illustrates pointer initialization for the rate-matching buffer, when modulation order Qm=8, modulation type is 256-QAM and M=8, in accordance with an embodiment of the present disclosure;
[0031] FIG. 8 illustrates an exemplary initialization of 8 parallel pointers ptr[0], ptr[1], ptr[2], ..., ptr[7] based on various input parameters, in accordance with an embodiment of the present disclosure;
[0032] FIG. 9 illustrates generation of exemplary 8 output bits through the 8 parallel pointers (shown in FIG. 8) from the rate matching buffer, in accordance with an embodiment of the present disclosure; and [0033] FIG. 10 is a flowchart of reading and decoding data from the rate matching buffer, in accordance with an embodiment of the present disclosure.
[0034] Furthermore, the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.
DETAILED DESCRIPTION OF DISCLOSURE
[0035] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art are to be construed as being within the scope of the present disclosure. [0036] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0037] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other, sub-systems, elements, structures, components, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but not necessarily do, all refer to the same embodiment.
[0038] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting. [0039] Embodiments of the present disclosure will be described below in detail with reference to the accompanying figures.
[0040] FIG. 1B illustrates a system 100 for processing LDPC encoder output through rate-matching and bit-interleaving for a 5G New Radio (NR), in accordance with an embodiment of the present disclosure.
[0041] The system 100 includes a Low-density-parity-check code (LDPC) encoder 102, and a rate-matching and bit-interleaving module 104. The rate-matching and bit-interleaving module 104 includes an M-parallel pointer generation module 106, a rate-matching buffer 108, and a decoding and bits re-arranging module 110.
[0042] The LDPC encoder 102 is a well-known encoder that outputs multiple encoded bits per clock cycle, and writes the encoded bits into a circular buffer such as the rate-matching buffer 108. A code-rate (CR) is defined as the ratio of the number of information bits to the total number of encoded bits, and is selected based on channel conditions. In an example, the LDPC encoder 102 outputs a total of 25344 bits for the base Code Rate (CR) = 1/3 for a maximum-length code-block of 8448 bits. The LDPC encoder 102 may be implemented using an Application Specific Integrated Circuit (ASIC) and/or a Field Programmable Gate Array (FPGA).
[0043] Referring to FIG. 1B and FIG. 2, the rate-matching buffer 108 stores a built-in puncture region 200 including punctured information bits of length 2Z, and a set of encoded bits Nldpc. In an example, a maximum value of Z is 384 bits and thus, the set of punctured information bits includes 768 bits. Further, a start read index 202 of the set of encoded bits Nldpc is referred to as RVid. The 4 possible indices defined for RV are RV0, RV1, RV2 and RV3. If the start read location within the rate-matching circular buffer 108 is given as RV0 by the higher layer, the start index RVid is pre-calculated in an initial cycle so that it points to an address 2Z within the rate-matching circular buffer. If the fixed read location given by the higher layer is a nonzero RV index such as RV1, RV2 or RV3, the start index RVid is pre-incremented by an amount 2Z. Furthermore, the set of encoded bits Nldpc includes a filler region 204 that includes a predefined number of zero-padded bits (filler bits). The filler region 204 has a start index nFiller_st and an end index nFiller_end. The purpose of rate-matching is to select a specific length and set of bits given by Er and RV (Redundant Version), known as the total size of rate-matching bits and the start-indices respectively.
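A minimal C sketch of this index bookkeeping is given below, assuming (as in the generalized algorithm later in this description) that the buffer also stores the 2Z built-in puncture bits and the filler bits; the structure and function names are illustrative:

#include <stdint.h>

struct rm_buf_ctx {
    uint32_t n_ldpc;      /* 66Z (BG1) or 50Z (BG2)                     */
    uint32_t n_cb;        /* Nldpc + 2Z when the puncture bits are kept */
    uint32_t rv_id;       /* start read index, already offset by 2Z     */
    uint32_t filler_st;   /* filler start index, already offset by 2Z   */
    uint32_t filler_end;  /* filler end index, already offset by 2Z     */
};

static void rm_ctx_init(struct rm_buf_ctx *c, uint32_t z, uint32_t n_ldpc,
                        uint32_t rv_start, uint32_t f_st, uint32_t f_end)
{
    c->n_ldpc     = n_ldpc;
    c->n_cb       = n_ldpc + 2 * z;
    c->rv_id      = rv_start + 2 * z;  /* RV0 maps to address 2Z, not 0 */
    c->filler_st  = f_st + 2 * z;
    c->filler_end = f_end + 2 * z;
}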
[0044] The M-parallel pointer generation module 106 initializes M parallel pointers ptr[0], ptr[1], ..., ptr[M-1] in an initial clock cycle, i.e., a first clock cycle, for reading encoded data from M bit addresses of the rate-matching buffer 108, based on the lifting size Z, the start index 202, the start and end indices of the filler region 204, a number Er of rate-matching bits, and a modulation order Qm in the down link transmit chain. The modulation order Qm may take values 1, 2, 4, 6 or 8 based on modulation types such as Binary phase shift keying (BPSK), Quadrature phase shift keying (QPSK), 16 Quadrature Amplitude Modulation (16-QAM), 64-QAM and 256-QAM respectively. The M parallel pointers are generated on the fly based on the output data-rate/throughput and latency requirements across different modulation orders and operating clock frequencies. The initialization of pointers can be parameterized and generalized across different output data bit-widths (M) and modulation types.
[0045] In an embodiment of the present disclosure, the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, wherein an initial set of the Qm sets starts from the bit address at the start index RVid, and each next set starts from an offset address of Er/Qm from the starting bit address of the corresponding previous set. Also, during the initialization process, the higher-layer parameters such as the start index RVid, the filler start index nFiller_st, and the filler end index nFiller_end are pre-incremented by 2Z before bit addresses are assigned to the M parallel pointers. In one example, the start index RVid has the address 2Z, instead of 0, when RV0 is given by the higher layers. However, it would be apparent to one of ordinary skill in the art that the start index RVid may have addresses other than 2Z.
[0046] In another embodiment of the present disclosure, the M-parallel pointer generation module 106 assigns to a parallel pointer an offset address from the filler region 204, when a bit address in the parallel pointer falls in the filler region 204 while initializing the parallel pointer.
[0047] In yet another embodiment of the present disclosure, the M-parallel pointer generation module 106 assigns to a parallel pointer an offset address from the built-in puncture region, when a wrap-around condition occurs, i.e., when the bit address in the parallel pointer falls in the built-in puncture region 200, while initializing the parallel pointer.
[0048] In the context of the present disclosure, the initialization of the M parallel pointers is done based on the pseudo-code below. The algorithm is generalized assuming that the LDPC encoder 102 does not puncture the 2Z bits and does not remove the filler bits.
Algorithm 1: M-parallel pointer initialization

M = Number of parallel pointers
RVid = RVid + 2Z
Ncb = Nldpc + 2Z                  // rate-matching buffer also stores the puncture bits
nFiller_st = nFiller_st + 2Z
nFiller_end = nFiller_end + 2Z
Qm = 1 or 2 or 4 or 6 or 8        // based on modulation type:
                                  // BPSK/QPSK/16-QAM/64-QAM/256-QAM
N = (M/Qm)                        // N is the number of sequential addresses
for i = 0 to Qm-1
    for j = 0 to N-1
        incr = 0
        for k = 0 to j
            if (nFiller_st ≤ (RVid + i*(Er/Qm) + k) < nFiller_end)
                incr++
            end if
        end for
        if ((RVid + i*(Er/Qm) + j) > Ncb)
            ptr[i*N + j] = RVid + i*(Er/Qm) + j - Nldpc
        else if (incr ≠ 0)
            ptr[i*N + j] = nFiller_end + incr
        else
            ptr[i*N + j] = RVid + i*(Er/Qm) + j
        end if
    end for
end for
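For illustration only, a compact C rendering of Algorithm 1 is given below. It is a sketch, not part of the disclosure: the function name init_parallel_pointers and its argument list are assumptions made for this example, and the start and filler indices are the raw higher-layer values, with the 2Z pre-increment performed inside the function, mirroring the pseudo-code above.

#include <stdint.h>

void init_parallel_pointers(uint32_t ptr[], uint32_t M, uint32_t Qm, uint32_t Er,
                            uint32_t Z, uint32_t RVid, uint32_t Nldpc,
                            uint32_t nFiller_st, uint32_t nFiller_end)
{
    uint32_t Ncb = Nldpc + 2 * Z;        /* buffer also stores the 2Z puncture bits */
    uint32_t rv  = RVid + 2 * Z;         /* pre-increment by 2Z, as in Algorithm 1 */
    uint32_t fst = nFiller_st + 2 * Z;
    uint32_t fen = nFiller_end + 2 * Z;
    uint32_t N   = M / Qm;               /* sequential pointers per set */

    for (uint32_t i = 0; i < Qm; i++) {
        uint32_t base = rv + i * (Er / Qm);              /* interleaved offset of set i */
        for (uint32_t j = 0; j < N; j++) {
            uint32_t incr = 0;
            for (uint32_t k = 0; k <= j; k++)            /* look-ahead over the filler region */
                if (base + k >= fst && base + k < fen)
                    incr++;
            if (base + j > Ncb)                          /* wrap-around of the circular buffer */
                ptr[i * N + j] = base + j - Nldpc;
            else if (incr != 0)                          /* skip the filler region */
                ptr[i * N + j] = fen + incr;
            else
                ptr[i * N + j] = base + j;
        }
    }
}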
[0049] FIG. 3 illustrates initialization of the M parallel pointers for the rate-matching buffer when the modulation order Qm = 1, the modulation type is BPSK and M = 8. In BPSK modulation, eight sequential pointers are initialized starting from the address pointed to by RVid. The M-parallel pointer generation module 106 generates eight sequential pointers ptr[0], ptr[1] … ptr[7] starting from bit address 2Z at the start index 202. Thus, the eight sequential pointers ptr[0], ptr[1] … ptr[7] are initialized with addresses 2Z, 2Z+1, 2Z+2 … 2Z+7, respectively.
[0050] FIG. 4 illustrates pointer initialization for the rate-matching buffer 108 when the modulation order Qm = 2, the modulation type is QPSK and M = 8. In QPSK modulation, first and second sets of pointers are initialized with four sequential pointers in each set. The first and second sets start from interleaved-offset addresses 0 and Er/Qm from the start index 202, respectively.
[0051] As illustrated, the first set of pointers includes four sequential pointers ptr[0], ptr[1], ptr[2], and ptr[3] initialized with addresses 2Z, 2Z+1, 2Z+2 and 2Z+3, respectively. The second set of pointers includes four sequential pointers ptr[4], ptr[5], ptr[6], and ptr[7] initialized with addresses Er/2+2Z, Er/2+1+2Z, nFiller_end+1 and nFiller_end+2, respectively. The pointers ptr[6] and ptr[7] fall in the filler region 204 during initialization; therefore, ptr[6] and ptr[7] are incremented to skip the filler region 204 and to have bit addresses that are offset from the filler region 204.
[0052] FIG. 5 illustrates pointer initialization for the rate-matching buffer 108 when the modulation order Qm = 4, the modulation type is 16-QAM and M = 8. In 16-QAM modulation, first, second, third and fourth sets of pointers are initialized with two sequential pointers in each set. The first, second, third and fourth sets start from interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm) from the start index 202, respectively.
[0053] As illustrated, the first set of pointers includes two sequential pointers ptr[0] and ptr[1] initialized with addresses 2Z and 2Z+1, the second set of pointers includes two sequential pointers ptr[2] and ptr[3] initialized with addresses (Er/4)+2Z and nFiller_end+1, the third set of pointers includes two sequential pointers ptr[4] and ptr[5] initialized with addresses 2*(Er/4)+2Z and 2*(Er/4)+1+2Z, and the fourth set of pointers ptr[6] and ptr[7] is initialized with addresses 3*(Er/4)+2Z and 3*(Er/4)+1+2Z, respectively. The pointer ptr[3] falls in the filler region 204 during initialization; therefore, ptr[3] is incremented to skip the filler region 204 and to have a bit address that is offset from the filler region 204, using look-ahead computation of the offsets from the filler region 204.
[0054] FIG. 6 illustrates pointer initialization for the rate-matching buffer 108 when the modulation order Qm = 6, the modulation type is 64-QAM and M = 8. In 64-QAM modulation, six parallel pointers are initialized with interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm), 3*(Er/Qm), 4*(Er/Qm) and 5*(Er/Qm) from the start index 202, respectively. Since 6 does not divide 8, i.e., M is not a multiple of Qm, only six pointers are initialized and used.
[0055] As illustrated, the first pointer ptr[0] is initialized with address 2Z, the second pointer ptr[1] is initialized with address (Er/6)+2Z, the third pointer ptr[2] is initialized with address 2*(Er/6)+2Z, the fourth pointer ptr[3] is initialized with address 3*(Er/6)+2Z, the fifth pointer ptr[4] is initialized with address 4*(Er/6)+2Z, and the sixth pointer ptr[5] is initialized with address 5*(Er/6)+2Z.
[0056] FIG. 7 illustrates pointer initialization for the rate-matching buffer 108 when the modulation order Qm = 8, the modulation type is 256-QAM and M = 8. In 256-QAM modulation, eight parallel pointers are initialized with interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm), 3*(Er/Qm), 4*(Er/Qm), 5*(Er/Qm), 6*(Er/Qm) and 7*(Er/Qm) from the start index 202, respectively.
[0057] As illustrated, the first pointer ptr[0] is initialized with address 2Z, the second pointer ptr[1] is initialized with address (Er/8)+2Z, the third pointer ptr[2] is initialized with address 2*(Er/8)+2Z, the fourth pointer ptr[3] is initialized with address 3*(Er/8)+2Z, the fifth pointer ptr[4] is initialized with address 4*(Er/8)+2Z, the sixth pointer ptr[5] is initialized with address 5*(Er/8)+2Z, the seventh pointer ptr[6] is initialized with address 6*(Er/8)+2Z, and the eighth pointer ptr[7] is initialized with address 7*(Er/8)+2Z.
[0058] It would be apparent to one of ordinary skill in the art that M represents the output data width per clock cycle and is assumed to be 8 for illustration purposes. However, the disclosure is not limited to M = 8; the approach is generalized and a generalized algorithm is proposed.
[0059] Also, the value of M is preferably chosen such that it is the least common multiple of the possible Qm values (1, 2, 4, 6 and 8), i.e., 24, and divides Er. When M is instead chosen as a multiple of only the highest Qm value, some Qm values may not divide M; for example, when M = 8 and Qm = 6 for 64-QAM modulation, since 6 does not divide 8, only 6 of the 8 pointers are used, whereas in BPSK, QPSK, 16-QAM and 256-QAM modulations all 8 pointers are used. The M parallel pointers are eventually used to read M parallel bits from the rate-matching buffer 108, and the M bits should therefore constitute an integer number of symbols, based on the modulation order Qm. For example, when M = 24, 24 parallel pointers are generated; equivalently, the 24 bits read from the buffer constitute 24 symbols in BPSK, 12 symbols in QPSK, 6 symbols in 16-QAM, 4 symbols in 64-QAM, and 3 symbols in 256-QAM.
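As a brief illustrative check, not taken from the disclosure, the number of symbols produced per read cycle is simply M/Qm:

/* Illustrative only: symbols emitted per read of M parallel bits.
 * With M = 24: BPSK -> 24, QPSK -> 12, 16-QAM -> 6, 64-QAM -> 4, 256-QAM -> 3. */
static inline unsigned symbols_per_read(unsigned M, unsigned Qm)
{
    return M / Qm;   /* assumes Qm divides M, as discussed above */
}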
[0060] Referring back to FIG. 1B, the decoding and bits re-arranging module 110 reads encoded data from the rate-matching buffer 108 every clock cycle through the M parallel pointers, such that the pointers ptr[0], ptr[1], ptr[2] … ptr[M-1] read M words memout[0], memout[1], memout[2] … memout[M-1] from the corresponding M bit address locations of the rate-matching buffer 108. The decoding and bits re-arranging module 110 further decodes the data words memout[0], memout[1], memout[2] … memout[M-1] based on the modulation order Qm, to generate M output bits bit[0], bit[1], bit[2] … bit[M-1], respectively.
[0061] In an embodiment of the present disclosure, the decoding and bits re-arranging module 110 is configured to: convert the M bit addresses in the M parallel pointers to M word addresses, based on a word-width of the rate-matching buffer 108; read M words memout[0], memout[1], memout[2] … memout[M-1] in parallel, corresponding to the M word addresses, from the rate-matching buffer 108; decode the M words to generate M bits; and rearrange the M bits based on the modulation order, to generate the M output bits bit[0], bit[1], bit[2] … bit[M-1], respectively.
[0062] It is assumed that the rate-matching buffer word width (W) is a power of two, and the decoding of each of the M words is performed using a modulus operation. Since a modulus operation by a power of two reduces in hardware to a simple bit-selection operation, the decoding function can be applied as soon as the encoded data is read from the rate-matching buffer 108. The rearrange operation is a regrouping operation based on the modulation order.
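A minimal sketch of this power-of-two reduction, assuming a 64-bit word width, is shown below; the helper names are illustrative only and not part of the disclosure.

/* Illustrative only: for a power-of-two word width W (here W = 64), the modulus
 * and division used in the decoding step reduce to a mask and a shift. */
static inline unsigned bit_select(unsigned p)  { return p & 63u; }   /* p % 64 */
static inline unsigned word_select(unsigned p) { return p >> 6;  }   /* p / 64 */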
[0063] In the context of the present disclosure, the decoding and rearranging of the M parallel bits for the different modulation types, using the M parallel pointers, is performed based on the pseudo-code below.
Algorithm 2: M-parallel bit decoding and re-arranging
M = Number of parallel pointers
W = rate-match memory word-width  // a power of 2
Qm = 1 or 2 or 4 or 6 or 8        // based on modulation type:
                                  // BPSK/QPSK/16-QAM/64-QAM/256-QAM
N = (M/Qm)                        // N is the number of sequential addresses
case (Qm)
    BPSK:
        bit[0]   = rate-match-memory[ptr[0] % W]
        bit[1]   = rate-match-memory[ptr[1] % W]
        bit[2]   = rate-match-memory[ptr[2] % W]
        ...
        bit[M-1] = rate-match-memory[ptr[M-1] % W]
    QPSK:
        bit[0]   = rate-match-memory[ptr[0] % W]
        bit[1]   = rate-match-memory[ptr[1*N] % W]
        bit[2]   = rate-match-memory[ptr[1 + 0*N] % W]
        bit[3]   = rate-match-memory[ptr[1 + 1*N] % W]
        ...
        bit[M-1] = rate-match-memory[ptr[M-1] % W]
    16-QAM:
        bit[0]   = rate-match-memory[ptr[0] % W]
        bit[1]   = rate-match-memory[ptr[1*N] % W]
        bit[2]   = rate-match-memory[ptr[2*N] % W]
        bit[3]   = rate-match-memory[ptr[3*N] % W]
        bit[4]   = rate-match-memory[ptr[1 + 0*N] % W]
        bit[5]   = rate-match-memory[ptr[1 + 1*N] % W]
        bit[6]   = rate-match-memory[ptr[1 + 2*N] % W]
        ...
        bit[M-1] = rate-match-memory[ptr[M-1] % W]
    64-QAM:
        bit[0]   = rate-match-memory[ptr[0] % W]
        bit[1]   = rate-match-memory[ptr[1*N] % W]
        bit[2]   = rate-match-memory[ptr[2*N] % W]
        bit[3]   = rate-match-memory[ptr[3*N] % W]
        bit[4]   = rate-match-memory[ptr[4*N] % W]
        bit[5]   = rate-match-memory[ptr[5*N] % W]
        bit[6]   = rate-match-memory[ptr[1 + 0*N] % W]
        bit[7]   = rate-match-memory[ptr[1 + 1*N] % W]
        ...
        bit[M-1] = rate-match-memory[ptr[M-1] % W]
    256-QAM:
        bit[0]   = rate-match-memory[ptr[0] % W]
        bit[1]   = rate-match-memory[ptr[1*N] % W]
        bit[2]   = rate-match-memory[ptr[2*N] % W]
        bit[3]   = rate-match-memory[ptr[3*N] % W]
        bit[4]   = rate-match-memory[ptr[4*N] % W]
        bit[5]   = rate-match-memory[ptr[5*N] % W]
        bit[6]   = rate-match-memory[ptr[6*N] % W]
        bit[7]   = rate-match-memory[ptr[7*N] % W]
        bit[8]   = rate-match-memory[ptr[1 + 0*N] % W]
        bit[9]   = rate-match-memory[ptr[1 + 1*N] % W]
        ...
        bit[M-1] = rate-match-memory[ptr[M-1] % W]
end case
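The case table above follows a single pattern: output bit s*Qm + i is taken from pointer ptr[s + i*N]. A generalized C sketch of that pattern is given below for illustration only, assuming Qm divides M; the helper mem_read_word(), which returns the W-bit word at a given word address of the rate-matching buffer, is an assumption for this sketch and is not defined in the disclosure.

#include <stdint.h>

extern uint64_t mem_read_word(uint32_t word_addr);   /* assumed memory-access helper */

void decode_and_rearrange(const uint32_t ptr[], uint8_t bit_out[],
                          uint32_t M, uint32_t Qm, uint32_t W /* power of 2, <= 64 */)
{
    uint32_t N = M / Qm;                         /* sequential pointers per set */
    for (uint32_t s = 0; s < N; s++) {           /* s-th symbol produced this cycle */
        for (uint32_t i = 0; i < Qm; i++) {      /* i-th bit of that symbol */
            uint32_t p    = ptr[s + i * N];      /* interleaved pointer selection */
            uint32_t word = p / W;               /* word address (a right shift when W = 2^n) */
            uint32_t bsel = p % W;               /* bit select inside the word (a mask) */
            bit_out[s * Qm + i] = (uint8_t)((mem_read_word(word) >> bsel) & 1u);
        }
    }
}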
[0064] The M-parallel pointer generation module 106 is further configured to increment the bit address in each parallel pointer by M/Qm at the end of each clock cycle, until the total Er rate-matching bits are decoded and output by the decoding and bits re-arranging module 110. After the initialization phase, the sequential pointers in each set are incremented on the fly. It may be noted that the pointer initialization and the update/increment are done in separate compute cycles in the hardware. Thus, three steps are applied in sequence for reading and decoding data from the rate-matching buffer 108: pointer initialization, on-the-fly increment, and decoding and bit rearrangement.
[0065] In an embodiment of the present disclosure, the sequential pointers in each set corresponding to the modulation order are checked to determine whether they fall in the filler region or wrap around to the built-in puncture region. The increment value for the sequential pointers to skip the filler region and the puncture region is found by look-ahead computation. The M-parallel pointer generation module 106 assigns to a parallel pointer an offset address from the filler region 204, when a bit address in the parallel pointer falls in the filler region 204 while incrementing the parallel pointer. The M-parallel pointer generation module 106 further assigns to a parallel pointer an offset address from the built-in puncture region 200, when a bit address in the parallel pointer falls in the built-in puncture region 200 while incrementing the parallel pointer.
[0066] In the context of the present disclosure, the incrementing of the M parallel pointers across different modulation orders and output data-widths is performed based on the pseudo-code below. The algorithm is generalized assuming that the LDPC encoder does not puncture the 2Z bits and does not remove the filler bits.
Algorithm 3: M-parallel pointer increment
M = Number of parallel pointers
Qm = 1 or 2 or 4 or 6 or 8        // based on modulation type:
                                  // BPSK/QPSK/16-QAM/64-QAM/256-QAM
Ncb = Nldpc + 2Z                  // rate-matching circular buffer also stores the puncture bits
N = (M/Qm)                        // N is the number of sequential addresses
for i = 0 to Qm-1
    for j = 1 to N
        incr = 0
        for k = 1 to j
            if (nFiller_st ≤ (ptr[i*N+N-1] + k) < nFiller_end)
                incr++
            end if
        end for
        if ((ptr[i*N+N-1] + j) > Ncb)
            ptr[i*N+j-1] = ptr[i*N+N-1] + j - Nldpc
        else if (incr ≠ 0)
            ptr[i*N+j-1] = nFiller_end + incr
        else
            ptr[i*N+j-1] = ptr[i*N+N-1] + j
        end if
    end for
end for
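For illustration only, a compact C rendering of Algorithm 3 is given below. The function name is an assumption for this sketch, and nFiller_st and nFiller_end are assumed to be the values already pre-incremented by 2Z during initialization; every pointer of a set advances from the last pointer of that set in the previous cycle, as in the pseudo-code above.

#include <stdint.h>

void increment_parallel_pointers(uint32_t ptr[], uint32_t M, uint32_t Qm,
                                 uint32_t Nldpc, uint32_t Z,
                                 uint32_t nFiller_st, uint32_t nFiller_end)
{
    uint32_t Ncb = Nldpc + 2 * Z;                /* buffer length including the 2Z puncture bits */
    uint32_t N   = M / Qm;                       /* sequential pointers per set */
    for (uint32_t i = 0; i < Qm; i++) {
        uint32_t last = ptr[i * N + N - 1];      /* last pointer of the set from the previous cycle */
        for (uint32_t j = 1; j <= N; j++) {
            uint32_t incr = 0;
            for (uint32_t k = 1; k <= j; k++)    /* look-ahead over the filler region */
                if (last + k >= nFiller_st && last + k < nFiller_end)
                    incr++;
            if (last + j > Ncb)                  /* wrap-around to the start of the buffer */
                ptr[i * N + j - 1] = last + j - Nldpc;
            else if (incr != 0)                  /* skip the filler region */
                ptr[i * N + j - 1] = nFiller_end + incr;
            else
                ptr[i * N + j - 1] = last + j;
        }
    }
}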
[0067] FIG. 8 illustrates an exemplary initialization of 8 parallel pointers ptr[0], ptr[1], ptr[2] … ptr[7] based on the input parameters: modulation type 256-QAM, RVid = RV0, Z = 384, Er = 16000, nFiller_st = 2000, and nFiller_end = 2200. As shown, the second pointer ptr[1] matches the filler region start index; therefore, it is incremented to the location one past the filler region end index, i.e., nFiller_end+1. Thus, the parallel pointers ptr[0], ptr[1], ptr[2] … ptr[7] point to bit addresses 768, 2969, 4768, 6768, 8768, 10768, 12768, and 14768, respectively, to retrieve the encoded data.
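As a quick illustrative check, not taken from the disclosure, the FIG. 8 addresses can be reproduced with a few lines of C under the stated parameters (with the 2Z pre-increment applied to the start and filler indices):

#include <stdio.h>

int main(void)
{
    /* 256-QAM, M = 8, Qm = 8, RV0, Z = 384, Er = 16000,
     * nFiller_st = 2000, nFiller_end = 2200 (indices before the 2Z pre-increment). */
    const unsigned Z = 384, Er = 16000, Qm = 8;
    const unsigned rv  = 0 + 2 * Z;                 /* RV0 -> start address 2Z = 768 */
    const unsigned fst = 2000 + 2 * Z, fen = 2200 + 2 * Z;

    for (unsigned i = 0; i < Qm; i++) {             /* one pointer per set, since M/Qm = 1 */
        unsigned p = rv + i * (Er / Qm);
        if (p >= fst && p < fen)                    /* skip the filler region */
            p = fen + 1;
        printf("%u ", p);                           /* prints 768 2969 4768 ... 14768 */
    }
    printf("\n");
    return 0;
}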
[0068] FIG. 9 illustrates generation of exemplary 8 output bits through the 8 parallel pointers (shown in FIG. 8) from the rate-matching buffer 108. The rate-matching buffer 108 has a word width of 64 bits and stores a total of 26112 bits (a built-in puncture region of 2Z = 768 bits and Nldpc = 25344 bits); it therefore has a depth of 408 (= 26112/64), implying that the rate-matching buffer 108 stores 408 words of 64 bits each. Herein, it is assumed that the LDPC encoder 102 generates 64 bits every clock cycle; hence the rate-matching buffer 108 is assumed to have a word width of 64 bits.
[0069] Each parallel pointer ptr[0], ptr[1], ptr[2] … ptr[7] is provided to the rate-matching buffer 108 through a shift register 902. Each shift register 902 divides the parallel pointer by 2^6 = 64 to generate the corresponding word address of the rate-matching buffer 108. Thus, the pointer addresses 768, 2969, 4768, 6768, 8768, 10768, 12768, and 14768 are divided by 64 to generate word addresses 12, 46, 74, 105, 137, 168, 199 and 230 to read the corresponding words from the rate-matching buffer 108. The divide function can be implemented by applying a right-shift operation by 6 bit positions to each of the M parallel pointers ptr[0], ptr[1], ptr[2] … ptr[7], considering that the word width is 64, as shown in FIG. 9.
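The small, self-contained C program below is illustrative only; it reproduces the word addresses of FIG. 9 from the bit addresses of FIG. 8 using the right shift by 6 (word width W = 64) described above.

#include <stdio.h>

int main(void)
{
    const unsigned bit_addr[8] = {768, 2969, 4768, 6768, 8768, 10768, 12768, 14768};
    for (int i = 0; i < 8; i++)
        printf("ptr[%d]: bit addr %5u -> word addr %3u, bit select %2u\n",
               i, bit_addr[i], bit_addr[i] >> 6, bit_addr[i] & 63u);
    return 0;
}
/* Expected word addresses: 12, 46, 74, 105, 137, 168, 199, 230 (as in the text). */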
[0070] The rate-matching buffer 108 outputs a 64-bit word at each of the word addresses; the number of such words is equal to M, i.e., 8. The decoding and bits re-arranging module 110 applies a modulus operation to each pointer value to find the bit location within the corresponding 64-bit word. Since the output parallel bit-width (M = 8) is the same as the modulation order (Qm = 8) in the 256-QAM case, the bit-rearrangement function directly drives the 8-bit output.
[0071] By generating M parallel pointers, the latency for each code-block is scaled by a factor of 1/M. By avoiding another buffer for row-column permutation for interleaving, and the associated buffer operations, the proposed disclosure leads to an efficient algorithm with a scalable-latency implementation. A novel feature of the proposed disclosure is that, by avoiding memory operations and instead generating and updating a plurality of pointers on the fly, the latency is directly reduced by a factor based on the number of parallel pointers generated. As an example, for reading Er = 16000 bits from the rate-matching buffer 108, the total read time with a clock frequency of 400 MHz (period = 2.5 ns) scales to 5 µs (= (16000/8) x 2.5 ns). The total read latency would scale to 2.5 µs if 16 parallel pointers were used instead of 8 parallel pointers.
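As a small worked sketch, illustrative only and not part of the disclosure, the scaling follows directly from reading M bits per clock cycle:

/* Illustrative only: read latency in nanoseconds for Er bits, M bits per cycle.
 * Er = 16000, M = 8, 2.5 ns period -> 2000 cycles = 5000 ns (5 us);
 * M = 16 under the same conditions -> 1000 cycles = 2500 ns (2.5 us). */
static inline double read_latency_ns(unsigned Er, unsigned M, double clk_period_ns)
{
    return (double)(Er / M) * clk_period_ns;
}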
[0072] By reducing the latency, the proposed disclosure saves hardware resources and also power, by allowing the design to work at a slower clock frequency. For example, it is not required to run the rate-matching buffer at a multi-GHz frequency; the latency requirements can still be met by running at 400 MHz.
[0073] The present disclosure is generalized to address the scenario in which the LDPC encoder 102 does not remove the built-in puncture bits of length 2Z and the filler bits, by storing the built-in puncture bits and the filler bits in the rate-matching buffer 108. However, it also addresses the scenario in which the rate-matching buffer 108 does not include the built-in puncture region and the filler region.
[0074] The present disclosure is generalized for the parallelism factor M; higher M values scale the processing time by a factor of 1/M, but also increase the computational logic required for the look-ahead computation that finds the increment value while updating the pointers on the fly. If the LDPC encoder hardware removes the built-in puncture 2Z bits and the filler bits from the encoded stream, the computations are simplified and the look-ahead logic to find the offset from the filler region is not required, thereby enabling higher M values. The M parallel pointers require M words to be read in parallel from the rate-matching buffer 108. If the physical memory cell cannot support multiple read ports, redundant memory instances will be required.
[0075] FIG. 10 is a flowchart of reading and decoding data from the rate-matching buffer 108, in accordance with an embodiment of the present disclosure. FIG. 10 has been explained with reference to FIG. 1B and FIG. 2.
[0076] At step 1002, a set of encoded bits output by a Low-density-parity-check code (LDPC) encoder is stored in the rate-matching buffer 108. The rate-matching buffer 108 also stores a built-in puncture region 200 including punctured information bits of size 2Z, in addition to the set of encoded bits Nldpc, where Z is the lifting size of the LDPC encoder 102. In an example, a maximum value of Z is 384 bits and thus the set of punctured information bits includes 768 bits. Furthermore, the set of encoded bits Nldpc includes a filler region 204 that includes a predefined number of zero-padded bits (filler bits). The filler region 204 has a start index nFiller_st and an end index nFiller_end.
[0077] At step 1004, M parallel pointers corresponding to M bit addresses of the rate-matching buffer 108 are initialized in an initial clock cycle, based on a number Er of rate-matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer 108. In an embodiment of the present disclosure, the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, wherein an initial set of the Qm sets starts from the bit address at the start index, and each next set starts from an offset address of Er/Qm from the starting bit address of the corresponding previous set. Also, during the initialization process, the higher-layer parameters such as the start index RVid, the filler start index nFiller_st and the filler end index nFiller_end are pre-incremented by 2Z before bit addresses are assigned to the M parallel pointers. Thus, the start index RVid has the address 2Z if the fixed start read location within the rate-matching circular buffer is given as RV0 by the higher layer. In the case of a non-zero RV index RV1, RV2 or RV3, the start index RVid is pre-incremented by 2Z.
[0078] At step 1006, the M bit addresses in the M parallel pointers are converted to M word addresses, based on the word-width of the rate-matching buffer. In an embodiment of the present disclosure, each parallel pointer is provided to the rate-matching buffer 108 through a shift register. Each shift register divides the parallel pointer by a predefined number to generate the corresponding word address of the rate-matching buffer 108.
[0079] At step 1008, M words corresponding to the M word addresses are read from the rate-matching buffer 108, each word having the word width of the rate-matching buffer 108.
[0080] At step 1010, the M words are decoded to generate M bits. In an embodiment of the present disclosure, the decoding and bits re-arranging module 110 applies a modulus operation to each pointer value to decode an output bit from the corresponding word.
[0081] At step 1012, the bit address in each parallel pointer is incremented by M/Qm at the end of each clock cycle, until the total Er rate-matching bits are decoded and output. In an embodiment of the present disclosure, each parallel pointer is incremented on the fly after the initialization.
[0082] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, order of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts need to be necessarily performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Claims

What is claimed is:
1. A rate-matching and bit-interleaving module for a down link transmit chain for data payload of a 5G New Radio (NR), comprising: a rate-matching buffer configured to store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder; an M-parallel pointer generation module configured to: initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer; and a decoding and bits re-arranging module configured to decode the encoded data, and rearrange the encoded data, based on the modulation order, to generate M output bits, in each clock cycle, wherein the M-parallel pointer generation module is further configured to increment a bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded and outputted by the decoding and bits re-arranging module.
2. The rate matching and bit-interleaving module as claimed in claim 1, wherein the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, and wherein an initial set of the Qm sets start from the memory address at the start index RVid, and a next set start from an offset address of Er/Qm from a starting memory address of corresponding previous set.
3. The rate matching and bit-interleaving module as claimed in claim 2, wherein the M parallel pointers include one set of eight sequential pointers starting from bit address at the start index, when the modulation order is 1, modulation type is Binary Phase Shift Keying (BPSK) modulation, and M=8; the M parallel pointers include first and second set of pointers, each including four sequential pointers, when the modulation order is 2, modulation type is Quadrature Phase Shift Keying (QPSK) modulation, and M=8, wherein the first and second set of pointers start from interleaved-offset addresses at 0 and Er/Qm from the start index RVid, respectively; the M parallel pointers include first, second, third and fourth set of pointers, each including two sequential pointers, when the modulation order is 4, modulation type is 16 Quadrature Amplitude Modulation (QAM) modulation and M=8, wherein the first, second, third and fourth sets start from interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm) from the start index RVid, respectively; the M parallel pointers include six parallel pointers starting from interleaved offset addresses at 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm), 4*(Er/Qm), and 5*(Er/Qm) from the start index RVid, respectively, when the modulation order is 6, modulation type is 64 Quadrature Amplitude Modulation (QAM) modulation, and M=8; and the M parallel pointers include eight parallel pointers starting from interleaved offset addresses at 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm), 4*(Er/Qm), 5*(Er/Qm), 6*(Er/Qm) and 7*(Er/Qm) from the start index RVid, respectively, when the modulation order is 8, modulation type is 256 Quadrature Amplitude Modulation (QAM) modulation, and M=8.
4. The rate matching and bit-interleaving module as claimed in claim 1, wherein the decoding and bits re-arranging module is configured to: convert M bit addresses in M parallel pointers to generate M word addresses, based on a word-width of the rate matching buffer; read M words corresponding to M word addresses from the rate matching buffer, wherein the M words constitute encoded data; decode M words to generate M bits; and rearrange the M bits based on the modulation order, to generate the M output bits.
5. The rate matching and bit-interleaving module as claimed in claim 1, wherein the rate matching buffer is configured to further include a built-in puncture region storing a predefined number of punctured information bits before the set of encoded bits, and wherein the set of encoded bits includes a set of filler bits defined by a filler region with start and end indices.
6. The rate matching and bit-interleaving module as claimed in claim 5, wherein the M- parallel pointer generation module is further configured to increment the start index, the filler start index and the filler end index by the predefined number, before assigning bit addresses to the M parallel pointers.
7. The rate matching and bit-interleaving module as claimed in claim 6, wherein the M- parallel pointer generation module is further configured to assign to a parallel pointer of the M parallel pointers, an offset address from the filler region, when a bit address in the parallel pointer falls in the filler region during at least one of: initializing and incrementing the parallel pointer.
8. The rate matching and bit-interleaving module as claimed in claim 6, wherein the M- parallel pointer generation module is further configured to assign to a parallel pointer, an offset address from the built-in puncture region, when a bit address in the parallel pointer of the M parallel pointers falls in the built-in puncture region, during at least one of: initializing and incrementing the parallel pointer.
9. A rate-matching and bit-interleaving method for a down link transmit chain for data payload of a 5G New Radio (NR), comprising: storing a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer; initializing M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer; converting the M bit addresses in the M parallel pointers to M word addresses, based on a word-width of the rate matching buffer; reading M words corresponding to the M word addresses from the rate matching buffer; decoding the M words to generate M bits, at the end of each clock cycle; and incrementing bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded and outputted.
10. The rate matching and bit-interleaving method as claimed in claim 9, wherein the M parallel pointers include Qm sets of parallel pointers, each set including M/Qm sequential pointers, and wherein an initial set of the Qm sets start from the memory address at the start index RVid, and a next set start from an offset address of Er/Qm from a starting memory address of corresponding previous set.
11. The rate matching and bit-interleaving method as claimed in claim 10, wherein the M parallel pointers include one set of eight sequential pointers starting from bit address at the start index, when the modulation order is 1, modulation type is Binary Phase Shift Keying (BPSK) modulation, and M=8; the M parallel pointers include first and second set of pointers, each including four sequential pointers, when the modulation order is 2, modulation type is Quadrature Phase Shift Keying (QPSK) modulation, and M=8, wherein the first and second set of pointers start from interleaved-offset addresses at 0 and Er/Qm from the start index RVid, respectively; the M parallel pointers include first, second, third and fourth set of pointers, each including two sequential pointers, when the modulation order is 4, modulation type is 16 Quadrature Amplitude Modulation (QAM) modulation and M=8, wherein the first, second, third and fourth sets start from interleaved-offset addresses 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm) from the start index RVid, respectively; the M parallel pointers include six parallel pointers starting from interleaved offset addresses at 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm), 4*(Er/Qm), and 5*(Er/Qm) from the start index RVid, respectively, when the modulation order is 6, modulation type is 64 Quadrature Amplitude Modulation (QAM) modulation, and M=8; and the M parallel pointers include eight parallel pointers starting from interleaved offset addresses at 0, Er/Qm, 2*(Er/Qm) and 3*(Er/Qm), 4*(Er/Qm), 5*(Er/Qm), 6*(Er/Qm) and 7*(Er/Qm) from the start index RVid, respectively, when the modulation order is 8, modulation type is 256 Quadrature Amplitude Modulation (QAM) modulation, and M=8.
12. The rate matching and bit-interleaving method as claimed in claim 9 further comprising: converting M bit addresses in M parallel pointers to generate M word addresses, based on a word-width of the rate matching buffer; reading M words corresponding to M word addresses from the rate matching buffer, wherein the M words constitute encoded data; decoding M words to generate M bits; and rearranging the M bits based on the modulation order, to generate the M output bits.
13. The rate matching and bit-interleaving method as claimed in claim 9, wherein the rate matching buffer is configured to further include a built-in puncture region storing a predefined number of punctured information bits before the set of encoded bits, and wherein the set of encoded bits includes a set of filler bits defined by a filler region with start and end indices.
14. The rate matching and bit-interleaving method as claimed in claim 13 further comprising: incrementing the start index, the filler start index and the filler end index by the predefined number, before assigning bit addresses to the M parallel pointers.
15. The rate matching and bit-interleaving method as claimed in claim 14 further comprising: assigning to a parallel pointer of the M parallel pointers, an offset address from the filler region, when a bit address in the parallel pointer falls in the filler region during at least one of: initializing and incrementing the parallel pointer.
16. The rate matching and bit-interleaving method as claimed in claim 14 further comprising: assigning to a parallel pointer, an offset address from the built-in puncture region, when a bit address in the parallel pointer of the M parallel pointers falls in the built-in puncture region, during at least one of: initializing and incrementing the parallel pointer.
17. A non-transitory computer readable medium configured to store a program causing a computer to perform rate-matching and bit-interleaving for a down link transmit chain for data payload of a 5G New Radio (NR), said program configured to: store a set of encoded bits outputted by a Low-density-parity-check code (LDPC) encoder in a rate-matching buffer; initialize M parallel pointers for reading encoded data from corresponding M bit addresses of the rate-matching buffer, in an initial clock cycle, based on a number Er of rate matching bits, a modulation order Qm in the down link transmit chain, and a start index RVid of the set of encoded bits in the rate-matching buffer; convert the M bit addresses in the M parallel pointers to M word addresses, based on a word-width of the rate matching buffer; read M words corresponding to the M word addresses from the rate matching buffer; decode the M words to generate M bits, at the end of each clock cycle; and increment bit address in each parallel pointer by M/Qm, at end of each clock cycle, till each rate matching bit is decoded and outputted.
PCT/IN2021/050391 2020-04-21 2021-04-21 Method and system for efficient low latency rate-matching and bit-interleaving for 5g nr WO2021214788A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202041012322 2020-04-21
IN202041012322 2020-04-21

Publications (1)

Publication Number Publication Date
WO2021214788A1 (en)

Family

ID=78270394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050391 WO2021214788A1 (en) 2020-04-21 2021-04-21 Method and system for efficient low latency rate-matching and bit-interleaving for 5g nr

Country Status (1)

Country Link
WO (1) WO2021214788A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019047230A1 (en) * 2017-09-11 2019-03-14 Zte Corporation Methods and apparatus for processing ldpc coded data
WO2020068956A1 (en) * 2018-09-25 2020-04-02 Qualcomm Incorporated Rate matching for a downlink transmission with multiple transmission configurations


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21792697

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21792697

Country of ref document: EP

Kind code of ref document: A1