Disclosure of Invention
The invention aims to provide a polar code encoding method and an encoder that support multiple code rates and multiple code lengths in a high-performance common mode. The method and the encoder target the 5G standard, greatly reduce redundant resources, and ensure that the constructed encoding scheme achieves high performance, thereby meeting the reliability requirements and performance targets of communication transmission.
The invention provides a polar code encoder supporting a multi-code-rate, multi-code-length, high-performance common mode, which mainly comprises a multi-core pipeline processing structure and an encoding core. The encoding core comprises a CRC encoding module, a FIFO, a first-level matrix encoding module, a second-level matrix encoding module and a butterfly encoding module.
The multi-core pipeline processing structure coordinates the input and output timing of the encoding cores, and comprises:
The Priority Judger, a priority determination switch that decides, according to priority, which core processes the input data arriving at a given time.
The FSM Jump, an output state machine that processes the output requests of the cores in sequence; while any core is in an input or output state, the state machine suspends both the processing of output requests and its state jumps.
The CRC encoding module appends a CRC check to the information bits so as to support CA-SCL decoding. The module is implemented with a direct division circuit, and the initial values of the register CRC_REG are all 1, as shown in FIG. 1.
The FIFO buffers the information bits after the CRC is appended and accommodates large bursts of input information bits.
The first-level matrix encoding module and the second-level matrix encoding module are shown in FIG. 2. The first-level matrix encoding module implements the operation of the first generator matrix, and the second-level matrix encoding module implements the operation of the second generator matrix. The left side of FIG. 2 is the first-level matrix encoding module, comprising:
The Reliability Order, a read-only memory that stores the reliability sequences corresponding to the supported code lengths and outputs the reliability sequence selected by the input code length N.
The Bram, which buffers the input codeword for reading. Data read from the FIFO are stored in the Bram at the addresses given by the reliability sequence, and the remaining positions are filled with zeros, so that an input codeword of length N is formed.
FIRST ADDR LOOP, an address cycle state machine that generates the address for reading data from the Bram and the address for reading a row vector of the generator matrix.
The G vector generator, a row vector generator for the first generator matrix. It generates the row vector of a given row of the first generator matrix, computing the element in the i-th row and j-th column with the formula G(i, j) = &(i | (~j)), read here with | as bitwise OR, ~ as bitwise complement and the leading & as an AND over all bits, so that G(i, j) = 1 exactly when every bit set in j is also set in i.
The first register REG1, a data processing register with an initial value of 0. It takes part in the operation of the first XOR module and stores the result; after the operation of each code block is completed, its data are output and the register is cleared.
The MUX, a 2-to-1 selector that decides, according to the data bit read from the Bram, whether the matrix row vector is XORed with the data in the first register REG1.
The right side of FIG. 2 is the second-level matrix encoding module, comprising:
The second register REG2, a data processing register with a bit width of N_min and an initial value of 0. It is partitioned into blocks that take part in the operation of the second XOR module and store the result; after all code block operations in the register are finished, its data are output and the register is cleared.
The G element generator, an element generator for the second generator matrix. It generates the matrix element at a given row and column of the second generator matrix, computing the element in the i-th row and j-th column with the formula G(i, j) = &(i | (~j)), read with | as bitwise OR, ~ as bitwise complement and the leading & as an AND over all bits.
ADDR LOOP, an address cycle state machine that controls the operation on the corresponding code blocks in the second register REG2 and generates the row and column addresses of the second generator matrix elements.
The MUX, a 2-to-1 selector that decides, according to the element read from the G element generator, whether the input data are XORed with the data of the corresponding block in the second register REG2.
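The element formula used by both generators can be checked in software. The sketch below is illustrative only: it reads the formula as an OR of i with the complement of j followed by an AND over all bits (an interpretation of the notation), and compares the result with the Kronecker power of the 2 × 2 polar kernel. The function names are hypothetical and do not come from the design.

```python
def g_element(i: int, j: int, n_bits: int) -> int:
    """G(i, j) = &(i | ~j): 1 when every bit set in j is also set in i."""
    mask = (1 << n_bits) - 1
    # Python integers are unbounded, so mask the complement to n_bits.
    return 1 if ((i | ~j) & mask) == mask else 0


def g_row(i: int, size: int) -> list[int]:
    """Row i of the size x size generator matrix (size is a power of two)."""
    n_bits = size.bit_length() - 1
    return [g_element(i, j, n_bits) for j in range(size)]


def kronecker_power(n_bits: int) -> list[list[int]]:
    """Reference matrix: n-fold Kronecker power of the kernel [[1, 0], [1, 1]]."""
    g = [[1]]
    for _ in range(n_bits):
        size = len(g)
        top = [row + [0] * size for row in g]   # [G, 0]
        bottom = [row + row for row in g]       # [G, G]
        g = top + bottom
    return g
```

Generating rows on demand in this way avoids storing the matrix, which is the role the text assigns to the G vector generator and the G element generator.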
The butterfly encoding module comprises:
The Bram2 computer, a data cache RAM that stores the input from the previous part and is used to perform each layer of butterfly operations.
ADDR JUMP, a layer jump state machine that controls the change, from layer to layer, of the two addresses from which data are read in the Bram2 computer.
ADDR ADD-SELF, an address self-increment state machine that controls the change of the addresses within a layer.
The Bram2Cache, the final data cache RAM. After the operation is completed, the results in the Bram2 computer are read one by one and cached in the Bram2Cache, so that the next round of operation can start.
The invention provides a high-performance polar encoding method for 5G-standard discontinuous communication, with the following specific process. First, each input codeword is extended by the CRC generation module, according to its code length N, into an information sequence containing the CRC, and is then written into a first-in first-out queue (FIFO). Without considering puncturing, the butterfly diagram shows that polarizing a codeword of length N requires n = log_2 N layers of butterfly operations, and the operation up to the i-th layer can be regarded as 2^(n-i) results of i-layer butterfly operations. Given the maximum code length N_max and the minimum code length N_min among the supported code lengths and code rates, one or more butterfly polarization processes with input code length N_min are carried out first, and the subsequent butterfly operations are determined by the actual code length N. The polar encoding method is therefore divided into three parts: the first two parts implement the polarization for input code length N_min through a matrix structure, and the third part implements the subsequent butterfly operations:
A first part:
S1, reading the information codeword from the FIFO according to the reliability sequence stored in the ROM in advance, and storing it in the RAM, padded to a codeword of length N;
S2, dividing the N-long codeword into a plurality of code blocks and, when processing each code block, reading the bits of the code block from the RAM one by one while reading the corresponding row vectors of the matrix from the generation module of the first generator matrix;
S3, according to whether the bit read from the RAM is 1, selecting whether to XOR the read matrix row vector bitwise with the result stored in the first register REG1, whose initial value is all zeros, and storing the obtained value back in the first register REG1;
S4, after the operation of each code block is finished, outputting the value in the first register REG1 to the second part and clearing it to 0, and then starting the polar encoding of the next code block.
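Steps S2 to S4 of the first part can be modelled in a few lines. This is a behavioural sketch rather than the hardware: rows are packed into Python integers with bit j standing for column j, the generator element is taken as 1 when the set bits of j are contained in those of i, and the names `g_row_bits` and `first_stage` are hypothetical.

```python
def g_row_bits(i: int, n1: int) -> int:
    """Row i of the n1 x n1 generator matrix, packed so that bit j is column j."""
    row = 0
    for j in range(n1):
        if (i & j) == j:          # element is 1 when bits of j are contained in i
            row |= 1 << j
    return row


def first_stage(codeword: list[int], n1: int) -> list[int]:
    """Process an N-long codeword block by block, one bit per 'clock'."""
    outputs = []
    for start in range(0, len(codeword), n1):
        reg1 = 0                              # REG1 starts at all zeros
        for i in range(n1):
            if codeword[start + i]:           # MUX: XOR the row only when the bit is 1
                reg1 ^= g_row_bits(i, n1)
        outputs.append(reg1)                  # output, then clear for the next block
    return outputs
```

For instance, `first_stage([1, 1, 0, 0, 0, 0, 1, 0], 4)` yields one packed word per 4-bit block.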
A second part: the results of the first part are XORed according to the elements of the second generator matrix.
Regarding each code block output by the first part as a node of the butterfly diagram of polar encoding, the subsequent butterfly polarization process can be regarded as the corresponding multiplication by the second generator matrix.
S1, storing the input from the first part in a buffer area;
S2, reading elements one by one from the corresponding row of the second generator matrix;
S3, according to whether the element read from the matrix row is 1, selecting whether to XOR the data in the buffer area bitwise with the result in the corresponding area of the second register REG2, and storing the obtained value back in that area of the second register REG2;
S4, after each traversal of the second generator matrix and the associated operations are finished, outputting the value in the second register REG2 to the third part and clearing it, and then starting the polar encoding of the next code group.
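That the two matrix stages together reproduce a single polar transform of length N_min can be checked numerically. The following sketch assumes the generator element is 1 when the set bits of j are contained in those of i, and that region j of REG2 holds output bits j*N_1 to j*N_1 + N_1 - 1 (0-based). The function names are illustrative, and the first-part output is computed here by direct multiplication for brevity.

```python
def g(i: int, j: int) -> int:
    """Generator matrix element: 1 when every set bit of j is also set in i."""
    return 1 if (i & j) == j else 0


def polar_direct(u: list[int]) -> list[int]:
    """Reference: multiply the bit vector u by the full generator matrix over GF(2)."""
    n = len(u)
    return [sum(u[i] & g(i, j) for i in range(n)) % 2 for j in range(n)]


def two_stage(u: list[int], n1: int, n2: int) -> list[int]:
    """Second-part model: fold n2 first-part outputs into an n1*n2-bit REG2."""
    assert len(u) == n1 * n2
    reg2 = [[0] * n1 for _ in range(n2)]             # REG2 split into n2 regions
    for i in range(n2):                              # one first-part output per row i
        block = polar_direct(u[i * n1:(i + 1) * n1])
        for j in range(n2):                          # read row i element by element
            if g(i, j):                              # MUX: XOR only when element is 1
                reg2[j] = [a ^ b for a, b in zip(reg2[j], block)]
    return [bit for region in reg2 for bit in region]
```

With N_1 = 16 and N_2 = 8, `two_stage(u, 16, 8)` and `polar_direct(u)` agree for any 128-bit input `u`, which is the equivalence the matrix split relies on.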
A third part: the subsequent butterfly polarization is carried out according to the actual code length N.
S1, each input from the second part has code length N_min; according to the actual code length N, if N = N_min, one such input is stored in the RAM, and if N > N_min, a plurality of such inputs are stored in the RAM;
S2, after all inputs are stored, selecting the layer at which the operation starts according to the actual code length N;
S3, reading the data in the RAM in pairs according to the rule of the layer, XORing them and storing the result back; after all operations of a layer are finished, performing the operation of the next layer, until all layers have been processed;
S4, when the core is in neither the output state nor the output request state, reading all results and storing them in another RAM, raising the output request to a high level, and outputting the data after the output request is acknowledged.
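The third part can be illustrated by treating each N_min-bit word as one butterfly unit. The sketch below is a software model under assumptions: units are paired at strides 1, 2, 4 and so on, the XOR result is written back to the lower address, and the order in which strides are visited is chosen for convenience and need not match the layer order of the hardware. The helper names are hypothetical.

```python
def g(i: int, j: int) -> int:
    """Generator matrix element: 1 when every set bit of j is also set in i."""
    return 1 if (i & j) == j else 0


def polar_direct(u: list[int]) -> list[int]:
    """Reference polar transform by direct matrix multiplication over GF(2)."""
    n = len(u)
    return [sum(u[i] & g(i, j) for i in range(n)) % 2 for j in range(n)]


def butterfly_units(units: list[list[int]]) -> list[list[int]]:
    """Serial XOR butterflies across equal-width units, one pair per step."""
    m = len(units)                      # number of units, a power of two
    stride = 1
    while stride < m:
        for a in range(m):
            if (a & stride) == 0:       # a is the lower address of a pair
                units[a] = [x ^ y for x, y in zip(units[a], units[a + stride])]
        stride *= 2
    return units


def encode(u: list[int], n_min: int) -> list[int]:
    """Transform each n_min-bit block first, then finish with unit butterflies."""
    units = [polar_direct(u[k:k + n_min]) for k in range(0, len(u), n_min)]
    return [bit for unit in butterfly_units(units) for bit in unit]
```

Each step touches only two words of width N_min, which reflects the serial, resource-saving operation described for the third part.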
A fourth part: to achieve high throughput, the invention provides a multi-core pipeline processing structure. Each core consists internally of a CRC encoding module, a FIFO, a first-level polar encoding module, a second-level polar encoding module and a butterfly encoding module. Sequential input across the cores is achieved by assigning priorities to the cores. The output requests of the cores are processed in priority order by a state machine; a core whose request has been processed first issues its idle signal first and receives input data first, which forms a virtuous cycle and effectively avoids the input and output disorder caused by the differing processing times of codewords with multiple code lengths and code rates.
Preferably, the CRC generation module uses a unified CRC generation structure; for input codewords of different N, the required CRC is obtained merely by adjusting which part of the structure is used and the parameters in the structure. The CRCs of different lengths corresponding to different N thus share one generation structure, which saves space.
Preferably, for polar codes with a code length not greater than 1024, the reliability sequence in the first part is consistent with the 5G standard; for polar codes with a code length greater than 1024, the reliability sequence obtained by corresponding Monte Carlo simulation is used. In the actual implementation, the reliability sequences for code lengths 128, 256 and 512 are extracted from the reliability sequence for code length 1024, which eliminates the redundancy of traversing reliability sequences of different code lengths and shortens the configuration time.
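Extracting a shorter reliability sequence from a longer one amounts to keeping the entries that are valid addresses for the shorter length while preserving their order. The sketch below shows this with a made-up 16-entry sequence; `TOY_MASTER_16` is a hypothetical illustration and is not the 5G table.

```python
# Hypothetical 16-entry reliability order (NOT the 5G sequence), used only
# to demonstrate the extraction; order preservation is all that matters here.
TOY_MASTER_16 = [0, 1, 2, 4, 8, 3, 5, 6, 9, 10, 12, 7, 11, 13, 14, 15]


def sub_sequence(master: list[int], n: int) -> list[int]:
    """Keep the entries below n, in their original relative order."""
    return [index for index in master if index < n]
```

Here `sub_sequence(TOY_MASTER_16, 8)` gives an ordering of the addresses 0 to 7, so only the longest sequence needs to be stored.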
As an optimization, in the polarization process of input code length N_min implemented by matrices in the first part and the second part, the matrix is split once, and the single-stage generator matrix is replaced by a two-stage generator matrix. The dimensions of the two stages are N_1 and N_2, which satisfy N_1 × N_2 = N_min and N_1 ≥ N_2. This reduces the resources for generating or storing the generator matrix. The first-level matrix module needs N_1 clocks to generate one output, and the second-level matrix module needs N_2 clocks to complete the operation on one input; under the condition N_1 ≥ N_2, the operation times of the two stages overlap almost completely, so no encoding time is wasted.
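The effect of the split can be put into numbers. The sketch below uses the count of matrix entries as a rough proxy for generation or storage cost, which is an assumption made here for illustration, and derives the per-input idle time of the second stage from the clock counts stated above. The function name is hypothetical.

```python
def split_summary(n1: int, n2: int) -> dict[str, int]:
    """Compare a single N_min x N_min matrix with the N_1 / N_2 split."""
    assert n1 >= n2, "the text requires N_1 >= N_2"
    n_min = n1 * n2
    return {
        "n_min": n_min,
        "single_matrix_entries": n_min * n_min,      # one-stage matrix
        "two_stage_entries": n1 * n1 + n2 * n2,      # both split matrices
        "stage1_clocks_per_output": n1,              # one bit and one row per clock
        "stage2_clocks_per_input": n2,               # one element per clock
        "stage2_idle_clocks_per_input": n1 - n2,     # wait until the next input
    }
```

For N_1 = 16 and N_2 = 8 the entry count drops from 16384 to 320, and the second stage always finishes an input before the next one arrives.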
As an optimization, in the subsequent butterfly polarization carried out in the third part according to the actual code length N, the butterfly diagram is multiplexed: the subsequent butterfly structure for code length N_max is constructed in layers, and the layer at which the operation is entered is selected according to the actual code length N, after which the subsequent operations are performed. Meanwhile, the operations of each layer are carried out serially, and each step performs only one XOR of two data words of bit width N_min. The large butterfly structure for code length N_max is thereby converted into a multi-step single XOR operation, which greatly saves resources.
As an optimization, core priority is introduced in the multi-core pipeline processing structure: when several cores are idle, data are input to and processed by the idle core with the highest priority. As shown in FIG. 6 and FIG. 7, after a core finishes processing its data, it does not output immediately but issues an output request and waits for it to be processed. A state machine processes the output requests of the cores in sequence according to priority; after a core's output request is processed, the core issues an idle signal and outputs its data, so that new data streams are again input in priority order and processed. This forms a virtuous cycle, and the outputs follow the input order even when the inputs have different code lengths and code rates.
The polar code encoding method and the encoder supporting a multi-code-rate, multi-code-length, high-performance common mode according to the invention have the following advantages:
1. The encoder consumes very few FPGA hardware resources; the synthesis results obtained with the Xilinx Vivado 2020.2 synthesis tool for the xcku035-ffva1156-2-i chip are shown in Table 1.
2. The polar code encoder has a high throughput; when a codeword with code length N and code rate R is processed, its information bit throughput and latency under a 180 MHz clock are shown in Table 1. Table 1 shows the resource consumption and encoding latency of the encoder of the invention implemented on the FPGA.
TABLE 1
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings.
Let the input code length of the polar code encoder be N, with maximum value N_max and minimum value N_min, and let the number of information bits be K. The length of each sub-polar code block after division is N_1, and N_1 divides N. The reliability sequence information of the polar code is represented by an index table of depth N and bit width log_2 N; for example, an entry at index k with value j indicates that, after the input positions are ranked by reliability, the k-th position in the ranking is j.
First, the input codeword is extended by the CRC generation module of FIG. 1, according to the CRC length and generator polynomial in Table 2, to the length K_1 required at the input of the encoding module, and the K_1-bit output is written into a FIFO for buffering. Table 2 shows the functions of the interface signals of the polar code encoder designed by the invention.
TABLE 2
The second step: first-level matrix encoding, on the left side of FIG. 2:
The reliability sequence is traversed, and data are read from the FIFO in sequence and stored in the Bram at the addresses given by the reliability sequence. After all information bits are stored, the remaining addresses are filled with zeros, so that the input codeword is padded to a codeword of length N.
After the above steps are completed, the N-long input codeword is divided into N/N_1 code groups of code length N_1. For each code group, bits 1 to N_1 are read from the Bram one by one, and the row vector at the same position is read from the G vector generator. According to whether the bit read from the Bram is 1, the first register REG1, whose initial value is all zeros, is either XORed with the row vector or keeps its value, and the result is stored back in the first register REG1. After one code group has been fully processed, the data in the first register REG1 are output and the register is cleared, and the operation on the next code group begins. The output data have a bit width of N_1, and the interval between two consecutive outputs is N_1 clk.
The third step: second-level matrix encoding, on the right side of FIG. 2:
Every N_2 inputs from the second step form one code block for second-level matrix encoding, where N_min = N_1*N_2. For each input, the elements of the corresponding row of the second generator matrix are read one by one; if the element in the j-th column is 1, bits [N_1*j-N_1 : N_1*j-1] of the second register REG2 are XORed with this input, and the result is stored back in bits [N_1*j-N_1 : N_1*j-1] of the second register REG2. Each input requires N_2 clocks, and after N_1*N_2 clocks one traversal of the generator matrix is completed and an output of bit width N_min = N_1*N_2 is produced. After each output, the second register REG2 is cleared, and the operation and traversal for the next code group restart. In total, N/N_min outputs of bit width N_min are generated.
The fourth step: all inputs from the third step are buffered in the Bram2 computer. After all inputs are buffered, each data word in the RAM is regarded as one unit of the butterfly diagram, and the layer at which the operation is entered is selected according to the actual N. Different layers have different inner-layer jump and self-increment rules. As shown in FIG. 4 and Table 3, the layers are processed in order from left to right. If N = N_max, the operation starts from layer 1: the data at two adjacent addresses are XORed and the result is stored at the first of the two addresses; when the lowest bit of the second address is 1, an inner-layer address jump occurs and both addresses are increased by 2; when the second address reaches N_max/N_min, the process jumps to layer 2, and so on. If N = N_max/2, the operation starts from layer 2, and so on. After all operations are finished, if the current core is neither requesting output nor outputting, the data in the Bram2 computer are cached one by one into the Bram2Cache and the output request signal dout_request is driven high; once the dout_permit signal from the top module goes high, dout_request is pulled low and the data in the Bram2Cache are output serially.
The fifth step: the top-level file handles the input and output of each core according to priority. Taking the 4 encoding cores shown in FIG. 5, FIG. 6 and FIG. 7 as an example, the core priority from high to low is Core0, Core1, Core2, Core3; when several cores are idle, the input goes to the idle core with the highest priority. After a core finishes processing its data, it does not output immediately but issues an output request and waits for it to be processed. A state machine processes the output requests of the cores in sequence from Core0 to Core3 according to priority; after a core's output request is processed, the core issues an idle signal and outputs its data, so that new data streams are again input in priority order and processed. This forms a virtuous cycle, and the outputs follow the input order even when the inputs have different code lengths and code rates. Input to the wrong core is also avoided. This error would arise when two cores process data streams whose N differ greatly: the higher-priority core finishes processing and outputting its short data stream while the lower-priority core is still receiving its data stream, and because the higher-priority core is now idle, the data stream being input to the lower-priority core would be interrupted and redirected to the higher-priority core. To prevent this, while the data input enable din_en is high, the state machine suspends request processing and state jumps; only after the input is finished can the core currently requesting output be allowed to output. Table 3 describes the address calling and jump logic of each layer in the butterfly encoding module designed by the invention.
TABLE 3
An example implementation with N_min = 128 and N_max = 2048 follows. Set N_1 = 16 and N_2 = 8, and take a codeword to be encoded with N = 1024 and K = 512 as the serial input. The encoder then operates as follows:
The first step: since N = 1024, the CRC length in the CRC encoding module is first configured as 16 with generator polynomial Gx = 17'b10001000000100001, and the CRC of the serially input codeword to be encoded is then generated by the division circuit of the CRC encoding module. During the division, the CRC encoding module passes the input bits directly to its output in sequence; after all bits to be encoded have been input, the division is finished, and the result in CRC_REG is shifted out in sequence. All outputs of the CRC encoding module are stored in the FIFO.
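A bit-serial division of this kind can be modelled as a 16-bit shift register with feedback taps. The sketch below is one common software formulation, not necessarily the exact circuit of FIG. 1: it assumes the generator polynomial 17'b10001000000100001 (taps 0x1021), a register preset to all ones, most-significant-bit-first processing, and a CRC shifted out most significant bit first. The function names are hypothetical.

```python
CRC_LEN = 16
POLY_TAPS = 0x1021  # low 16 bits of 17'b10001000000100001 (x^16 + x^12 + x^5 + 1)


def crc_division(bits: list[int]) -> list[int]:
    """Bit-serial polynomial division with the register preset to all ones."""
    reg = (1 << CRC_LEN) - 1                       # CRC_REG initial value: all 1
    for bit in bits:
        feedback = ((reg >> (CRC_LEN - 1)) & 1) ^ bit
        reg = (reg << 1) & ((1 << CRC_LEN) - 1)    # shift by one position
        if feedback:
            reg ^= POLY_TAPS                       # apply the polynomial taps
    # Shift the remainder out, most significant bit first (assumed order).
    return [(reg >> k) & 1 for k in range(CRC_LEN - 1, -1, -1)]


def crc_attach(bits: list[int]) -> list[int]:
    """Pass the input bits through unchanged, then append the CRC bits."""
    return list(bits) + crc_division(bits)
```

With these parameters the model coincides with the widely used CRC-16/CCITT-FALSE variant, which gives a convenient known-answer check.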
The second step is that: after CRC coding is completed, a first-level matrix coding module selects a corresponding reliability sequence according to N to traverse, an address is obtained from high reliability to low reliability, if the traversed address is smaller than N, an information bit code word of one bit is read from FIFO and stored in Bram according to the traversed address, after all information bits are read, the reliability sequence is traversed continuously, according to the obtained address, 0 is stored at the corresponding position of Bram, and after N is 1024 clocks are traversed, K information bits and a CRC coding result are stored in Bram to be complemented into N is 1024 long codes to be coded. After completion, the entire coding core enters a non-idle state.
The third step: the 1024-long input codeword in the Bram is divided into 64 code blocks, which are processed in sequence. The bits of each code block are read one by one, together with row vectors from the G vector generator, which produces the row vectors of the 16 × 16 first generator matrix of the polar code according to the input address. First, the bits at addresses 0 to 15, which form the first code block in the Bram, are read in sequence. The bit at address 0, the first bit of the code block, is read together with the first row vector of the first generator matrix; according to whether the bit read from the Bram is 1, the first register REG1, whose initial value is all zeros, is either XORed with the row vector or keeps its value, and the result is stored back in the first register REG1. Next, the bit at address 1 of the Bram and the row vector of the second row of the first generator matrix are read and the operation continues, until all 16 bits of the code block and all 16 row vectors of the first generator matrix have been read and processed. The data in the first register REG1 are then output to the second-level matrix encoding module, and the register is cleared for the operation on the next code block. The output data bit width is 16, and the interval between two consecutive outputs is 16 clk.
The fourth step: the second-level matrix encoding module processes the inputs from the first-level encoding module in sequence, reads matrix elements from the G element generator, and XORs each input with the corresponding part of the second register REG2 as selected by the matrix elements. Every 8 inputs from the first-level encoding module form a group, and the result is output after the group has been processed. The G element generator produces the elements of the 8 × 8 second generator matrix of the polar code according to the input address. The second register REG2 is a register of bit width 128, divided into 8 parts of bit width 16, each of which takes part in the operation. When the first input from the first-level encoding module is received, the 8 elements of the first row of the second generator matrix are read in sequence. The element in the first row and first column is read first; according to whether it is 1, the input is or is not XORed with the first part of the second register REG2, [0:15], and the result is stored back in the first part, [0:15]. The element in the first row and second column is then read; according to whether it is 1, the input is or is not XORed with the second part of the second register REG2, [16:31], and the result is stored back in the second part, [16:31]. These steps are repeated until all elements of the first row have been read and processed, which takes 8 clk. Since the interval between inputs from the first-level module is 16 clk, the second-level matrix encoding module then waits until the next input arrives from the first-level matrix encoding module, whereupon it reads the elements of the second row of the matrix and continues the operation.
After 8 inputs have been received and processed, the traversal of the matrix is complete; the data in the second register REG2 are output to the butterfly encoding module and the register is cleared, and the operation on the next group of 8 inputs begins. The output data bit width is 128, and 8 groups of data are output in total.
The fifth step: the butterfly encoding module stores all inputs from the second-level matrix encoding module in the Bram2 computer and starts the operation after all inputs are stored. Since N = 1024, the operation starts from the second layer: data are read from addresses addr1 = 0 and addr2 = 4, XORed bitwise, and the result is stored back at addr1 = 0. Both addresses are then increased by 1, data are read from addr1 = 1 and addr2 = 5, and the above steps are repeated. When the last three bits of the binary representation of addr2 are all 1, that is, addr2 = 7, an inner-layer jump of the second layer occurs: addr1 and addr2 are both increased by 5, and reading and operation continue until addr2 = 15, at which point the operation of the second layer is finished. The process then jumps to the third layer and continues according to the logic of Table 3 until all layers have been processed. If the core is neither in the output request state nor in the output state for the previous frame, the data in the Bram2 computer are read in sequence and stored in the Bram2Cache, and the core enters the output request state. When the output request is acknowledged, the data in the Bram2Cache are read in sequence and shifted out, and the encoding core enters the idle state.
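The address walk in this step follows a regular pattern: both addresses advance by 1 inside a group and jump past the partner group when the low bits of addr2 are all 1. The sketch below generalizes that pattern to an arbitrary power-of-two distance between the two addresses. The generalization, the parameter names `stride` and `depth`, and the function name are assumptions made for illustration, since Table 3 is not reproduced here.

```python
def layer_address_pairs(stride: int, depth: int) -> list[tuple[int, int]]:
    """List the (addr1, addr2) pairs visited within one butterfly layer."""
    pairs = []
    addr1, addr2 = 0, stride
    low_mask = 2 * stride - 1            # low bits of addr2 that trigger the jump
    while addr2 < depth:
        pairs.append((addr1, addr2))     # XOR the two words, store back at addr1
        if (addr2 & low_mask) == low_mask:
            step = stride + 1            # inner-layer jump over the partner group
        else:
            step = 1                     # ordinary self-increment
        addr1 += step
        addr2 += step
    return pairs
```

With `stride=4` and `depth=16` the walk starts at (0, 4), jumps by 5 after (3, 7), and ends at (11, 15), matching the addresses quoted above; with `stride=1` both addresses advance by 2 after every pair.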
The sixth step: for encoding cores operating as in the first to fifth steps, the top-level file coordinates the timing of 4 identical encoding cores simultaneously. The 4 encoding cores are first numbered from high to low priority as the first core, the second core, and so on. When the first core is idle, data are input only to the first core; when the first core is not idle and the second core is idle, data are input only to the second core; and so on. After a core finishes its operation, it sends an output request signal to the top-level file, which checks whether any encoding core is in the data input state or the data output state. If no encoding core is in either state, the output requests are processed in order from the first core to the fourth core. In the initial state the request of the first core is handled first; the output request of the second core can be processed only after the output request of the first core has been processed and all of its data have been output, and so on, until the output request of the fourth core has been processed and all of its data have been output, after which the output request of the first core can be processed again. By cycling in this way, the output requests of the 4 cores are processed and their data are output.
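The two arbitration rules of this step can be captured in a small behavioural model. The sketch below makes two assumptions that the text does not state outright: the output state machine waits at the current core until that core raises a request, and the grant and the state advance are modelled as one event, with `io_busy` standing for "some core is inputting or outputting". All names are hypothetical.

```python
from typing import Optional, Sequence


def select_input_core(idle: Sequence[bool]) -> Optional[int]:
    """Priority judger: return the highest-priority idle core (index 0 is highest)."""
    for core, is_idle in enumerate(idle):
        if is_idle:
            return core
    return None                           # no idle core, input must wait


class OutputFsm:
    """Output state machine that serves the cores in a fixed cyclic order."""

    def __init__(self, num_cores: int = 4) -> None:
        self.num_cores = num_cores
        self.state = 0                    # initial state handles the first core

    def step(self, requests: Sequence[bool], io_busy: bool) -> Optional[int]:
        """Grant the current core if it requests and no core is in input or output."""
        if io_busy or not requests[self.state]:
            return None                   # processing and state jump are suspended
        granted = self.state
        self.state = (self.state + 1) % self.num_cores
        return granted
```

A testbench could call `select_input_core` whenever a new frame arrives and `OutputFsm.step` once per clock, treating a returned index as the permit for that core.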