CN104617959B

CN104617959B - LDPC encoding and decoding methods based on a general-purpose processor

Info

Publication number: CN104617959B
Application number: CN201510026526.1A
Authority: CN
Inventors: 牛凯; 贺志强; 张竟意
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2015-01-20
Filing date: 2015-01-20
Publication date: 2017-09-05
Anticipated expiration: 2035-01-20
Also published as: CN104617959A

Abstract

This application discloses an LDPC encoding method that determines the vectors p₁ and p₂ and obtains the encoded result vector. When p₁ and p₂ are determined, the multiplication of any matrix by any vector proceeds as follows: each row of the matrix is handled by its own thread, which multiplies that row by the vector, and the products of all rows together form the result vector. Multiplying one row by the vector comprises: for each element j of the i-th row, determining the corresponding starting position in the vector; shifting the Z−A_i,j data items that begin at that starting position to the left using single-instruction multiple-data (SIMD) operations; moving the first A_i,j data items of the same block behind the shifted data, which gives the vector shift result for element j; and finally adding the vector shift results of all elements. By combining multithreading with SIMD processing, the method raises encoding speed on a general-purpose processor.

Description

LDPC encoding and decoding methods based on a general-purpose processor

Technical field

This application relates to LDPC encoding and decoding technology, and in particular to LDPC encoding and decoding methods based on a general-purpose processor.

Background

An LDPC code is a linear block code with a large code length. Its check matrix is correspondingly large and contains very few non-zero elements, that is, very few "1"s, which is why the code is called low-density.

Implementing the IEEE 802.11n WLAN transmission protocol requires LDPC encoding and decoding. According to the protocol, an LDPC PPDU (PLCP protocol data unit) is generated as follows (see Fig. 1):

(1) Compute the shortening bits

(1a) Compute the number of available bits N_avbits:

N_pld = length × 8 + 16,

N_avbits = N_CBPS × m_STBC × ⌈N_pld / (N_CBPS × R × m_STBC)⌉,

where the flag m_STBC is 2 if STBC (space-time block code) precoding is used and 1 otherwise; N_CBPS is the number of coded bits per symbol; length is the number of bytes in the PSDU (PLCP service data unit), i.e. the number of bytes of information bits; N_pld is the total number of bits in the PSDU and the SERVICE field; and R is the code rate.

(1b) Compute the number of LDPC codewords N_CW and the code length L_LDPC

When N_avbits ≤ 648: N_CW = 1; L_LDPC = 1296 if N_avbits ≥ N_pld + 912 × (1−R), otherwise L_LDPC = 648.
When 648 < N_avbits ≤ 1296: N_CW = 1; L_LDPC = 1944 if N_avbits ≥ N_pld + 1464 × (1−R), otherwise L_LDPC = 1296.
When 1296 < N_avbits ≤ 1944: N_CW = 1 and L_LDPC = 1944.
When 1944 < N_avbits ≤ 2592: N_CW = 2; L_LDPC = 1944 if N_avbits ≥ N_pld + 2916 × (1−R), otherwise L_LDPC = 1296.
When N_avbits > 2592: N_CW = ⌈N_pld / (1944 × R)⌉ and L_LDPC = 1944.

(1c) Compute the number of shortening bits N_shrt. The shortening bits are padded after the information bits before LDPC encoding:

N_shrt = max(0, (N_CW × L_LDPC × R) − N_pld)

When N_shrt = 0, no zero padding is needed. When N_shrt > 0, the shortening bits are distributed evenly over all N_CW codewords, so each codeword is assigned ⌊N_shrt / N_CW⌋ shortening bits. If N_shrt mod N_CW ≠ 0, where mod denotes the remainder of N_shrt divided by N_CW, the first codeword carries one more shortening bit than the other codewords.

(2) Perform LDPC encoding to obtain the parity bits.

(3) Discard the shortening bits.

(4) Compute the number of punctured bits and discard the punctured bits. The number of bits punctured after LDPC encoding, N_punc, is computed as:

N_punc = max(0, (N_CW × L_LDPC) − N_avbits − N_shrt)

If ((N_punc > 0.1 × N_CW × L_LDPC × (1−R)) and (N_shrt < 1.2 × N_punc × R / (1−R))) or (N_punc > 0.3 × N_CW × L_LDPC × (1−R)), N_avbits is increased and N_punc is recomputed as follows:

N'_avbits = N_avbits + N_CBPS × m_STBC, N_punc = max(0, (N_CW × L_LDPC) − N'_avbits − N_shrt)

The punctured bits are distributed evenly over all N_CW codewords, so each codeword is assigned ⌊N_punc / N_CW⌋ punctured bits. If N_punc mod N_CW ≠ 0, where mod denotes the remainder of N_punc divided by N_CW, the first codeword has one more punctured bit than the other codewords.

(5) Compute the repetition bits. The number of repeated bits N_rep is computed as:

N_rep = max(0, N'_avbits − N_CW × L_LDPC × (1−R) − N_pld)

The repeated bits are distributed evenly over all N_CW codewords, so each codeword is assigned ⌊N_rep / N_CW⌋ repeated bits. If N_rep mod N_CW ≠ 0, where mod denotes the remainder of N_rep divided by N_CW, the first codeword has one more repeated bit than the other codewords. The repeated bits are selected in order, starting from the first information bit, until the required length is reached; they are copied from the codeword after the shortening bits have been removed. The selected repeated bits are appended in order after the parity bits. When puncturing is required, the parity bits need not be repeated, and vice versa.
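The parameter calculation in steps (1) to (5) can be sketched in Python as follows. This is only an illustrative sketch: the function name `ldpc_ppdu_params` and the dictionary keys are hypothetical, and the N_avbits formula and the two-part puncturing condition are taken from the IEEE 802.11n specification as an assumption about what the patent's formula images contain. Exact fractions are used so that the threshold comparisons do not suffer from rounding.

```python
import math
from fractions import Fraction


def ldpc_ppdu_params(length, n_cbps, rate, m_stbc=1):
    """Compute the LDPC PPDU parameters for a PSDU of `length` bytes.

    `rate` is the code rate R, e.g. "1/2"; `n_cbps` is the number of coded
    bits per symbol; `m_stbc` is 2 with STBC and 1 otherwise.
    """
    R = Fraction(rate)
    n_pld = length * 8 + 16
    # Assumed from IEEE 802.11n: available bits, rounded up to whole symbols.
    n_avbits = n_cbps * m_stbc * math.ceil(Fraction(n_pld) / (n_cbps * R * m_stbc))

    # Step (1b): number of codewords and code length.
    if n_avbits <= 648:
        n_cw = 1
        l_ldpc = 1296 if n_avbits >= n_pld + 912 * (1 - R) else 648
    elif n_avbits <= 1296:
        n_cw = 1
        l_ldpc = 1944 if n_avbits >= n_pld + 1464 * (1 - R) else 1296
    elif n_avbits <= 1944:
        n_cw, l_ldpc = 1, 1944
    elif n_avbits <= 2592:
        n_cw = 2
        l_ldpc = 1944 if n_avbits >= n_pld + 2916 * (1 - R) else 1296
    else:
        n_cw, l_ldpc = math.ceil(Fraction(n_pld) / (1944 * R)), 1944

    # Step (1c): shortening bits.
    n_shrt = max(0, n_cw * l_ldpc * R - n_pld)
    # Step (4): punctured bits, with the conditional increase of N_avbits.
    n_punc = max(0, n_cw * l_ldpc - n_avbits - n_shrt)
    parity = n_cw * l_ldpc * (1 - R)
    if ((n_punc > Fraction(1, 10) * parity
         and n_shrt < Fraction(12, 10) * n_punc * R / (1 - R))
            or n_punc > Fraction(3, 10) * parity):
        n_avbits += n_cbps * m_stbc
        n_punc = max(0, n_cw * l_ldpc - n_avbits - n_shrt)
    # Step (5): repeated bits.
    n_rep = max(0, n_avbits - parity - n_pld)

    return {"N_pld": n_pld, "N_avbits": n_avbits, "N_CW": n_cw,
            "L_LDPC": l_ldpc, "N_shrt": int(n_shrt),
            "N_punc": int(n_punc), "N_rep": int(n_rep)}
```

For example, a 100-byte PSDU with N_CBPS = 52 and R = 1/2 gives one codeword of length 1944 with 156 shortening bits and 124 punctured bits.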

In LDPC PPDU generation, the LDPC encoding step is the most important. The codeword vector output by the LDPC encoder is written c = (S, p₁, p₂), where S is the information vector and p₁ and p₂ are the parity vectors of the codeword. Because the check matrix H of an LDPC code is large, the computation during encoding is cumbersome. The check matrices given in the protocol show that, for every code rate R, the number of columns is a multiple of 24 and the number of rows is a multiple of 24 × (1−R). Based on this property, H is partitioned into blocks as

H = [A B T; F D E],

that is, into the six sub-matrices A, B, D, E, T and F. The sub-matrices B, D, E and T have a special structure: B = (1 - - … 0 - …)^T, D = (1), E = (- … - 0), and T is likewise regular, whereas A and F have an irregular structure (see the 802.11n protocol). Because H is large, it is represented compactly, with one element standing for one sub-matrix of the actual check matrix. In the representation of H and of the blocks A, B, D, E, T and F, "-" denotes an all-zero sub-matrix, "0" denotes the identity matrix, and a constant C denotes the identity matrix cyclically shifted right by C positions. Each sub-matrix has dimension Z × Z, where Z is determined in advance from the code length. This representation greatly reduces the size of the check matrix and of each block.
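To illustrate the compact representation, the following Python sketch expands a compact matrix into the full binary matrix. The function names `expand_block` and `expand`, and the use of `None` for "-", are assumptions made for this example and do not come from the patent.

```python
def expand_block(entry, Z):
    """Return the Z x Z binary sub-matrix encoded by one compact element."""
    if entry is None:  # "-": all-zero sub-matrix
        return [[0] * Z for _ in range(Z)]
    # 0 is the identity; a constant C is the identity cyclically shifted
    # right by C columns, so row r has its single 1 in column (r + C) mod Z.
    return [[1 if col == (r + entry) % Z else 0 for col in range(Z)]
            for r in range(Z)]


def expand(compact, Z):
    """Expand a compact matrix (list of rows of entries) into a dense 0/1 matrix."""
    dense = []
    for row in compact:
        blocks = [expand_block(entry, Z) for entry in row]
        for r in range(Z):
            # Concatenate row r of every block in this block-row.
            dense.append([bit for blk in blocks for bit in blk[r]])
    return dense
```

With Z = 3, the compact row `[1, None]` expands to three rows of six bits: a right-shifted identity followed by a zero block.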

Given the check matrix H and the information vector S, the codeword vector c is determined as follows. The check equation H c^T = 0^T splits into the two equations

A S^T + B p₁^T + T p₂^T = 0, F S^T + D p₁^T + E p₂^T = 0,

which after optimization give

p₁^T = E T^-1 A S^T + F S^T, p₂^T = T^-1 (A S^T + B p₁^T).

Once p₁ and p₂ are obtained, the codeword vector is c = (S, p₁, p₂).

The encoder that corresponds to the above LDPC encoding process is shown in Fig. 2. It contains four kinds of functional modules: encoding matrix generators, matrix multipliers, matrix adders and an LDPC codeword synthesizer.

The encoder has six encoding matrix generators. The input of each is one matrix, namely one of the six sub-matrices A, B, D, E, T and F, and its output is the matrix after processing by the generator. Its function is to transform the input matrix into compressed storage, in which only the non-zero elements are stored.

The encoder has six matrix multipliers, each with two inputs and one output. The two inputs are two of the following: the information vector S, a matrix processed by an encoding matrix generator, and a result produced by another multiplier or by an adder. The output is the result vector of multiplying the two inputs. Its function is to multiply the two inputs and output the result.

The encoder has two matrix adders. The inputs of each are two matrices output by matrix multipliers in the encoder, and the output is their sum. Its function is to add the two input matrices and output the result.

The encoder has one LDPC codeword synthesizer. Its inputs are three vectors, namely the information vector S and the parity vectors p₁ and p₂, and its output is the codeword vector c. Its function is to combine S, p₁ and p₂ into the codeword vector c = (S, p₁, p₂) and to output c.

The above describes the existing LDPC encoding method and the corresponding encoder. At the receiver, the received LDPC codeword must also be decoded to recover the information vector. The main steps of the existing LDPC decoding technique are as follows:

(1) The M check nodes are divided into M_b layers, each containing T check nodes. Decoding is then executed layer by layer. In the first layer, the check-node and variable-node information is computed; after the first layer has been decoded, the second layer is initialized with the variable-node information obtained from the first layer, and so on.

(2) Initialization: the variable-node information values are initialized with the LLRs (log-likelihood ratios), and all check-node information is set to 0. The decoding algorithm runs I iterations, and each iteration proceeds row by row. In the min-sum algorithm, n ∈ N_m denotes the columns of the base check matrix H_b for which [H_b]_m,n ≠ '-'.

(3) Minimization: the variable-node vector q_n is cyclically shifted right (shift count S(m, n) = [H_b]_m,n), the check-node information is subtracted from it, and the result is stored in the vector t_n. Owing to the value-reuse property of OMS (offset min-sum), only the minimum and the second minimum of the vector elements need to be computed.

(4) Minimum selection: for n ∈ N_m, the values of q_n and of the check-node information are computed and updated.

To implement the above decoding method, an existing decoder (see Fig. 3) consists of four parts: a decoder initialization unit, a minimum and second-minimum selection unit, a data truncation unit and a cyclic shift unit.

The input of the decoder initialization unit is an LDPC check matrix, and its output is the check matrix after processing by the unit. Its function is to convert the storage of the check matrix to meet the decoder's input requirements and to output the processed check matrix.

The minimum and second-minimum selection unit takes two matrices as input: the check matrix processed by the decoder initialization unit, and the LDPC codeword matrix c received after transmission over the wireless channel. Its output is a matrix. Its function is to compute the minimum and the second minimum of the differences between the cyclically right-shifted variable nodes and the check nodes, and to supply the result matrix to the data truncation unit and the cyclic shift unit.

The data truncation unit takes as input the matrix output by the minimum and second-minimum selection unit and outputs a matrix. Its function is to truncate the data so that the check-node information does not overflow, and to supply the result matrix to the cyclic shift unit.

The cyclic shift unit takes two matrices as input: the matrix output by the data truncation unit and the matrix output by the minimum and second-minimum selection unit. Its output is a matrix. Its function is to compute the variable-node matrix by bitwise modulo-2 addition of the minimum matrix and the check-node matrix, and to output the variable-node matrix.

As described above, the theory of LDPC encoding and decoding is mature. However, because an LDPC code is a linear block code with a large code length and a large check matrix, the algorithmic complexity is high, and conventional LDPC encoding and decoding approaches cannot readily meet the throughput requirements of IEEE 802.11n systems, which substantially limits system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA (field-programmable gate array) chips and DSP (digital signal processor) chips. Although these approaches can meet the processing and latency requirements of modern high-speed WLAN protocols, programming FPGAs and dedicated DSPs is complex, rich programming environments and debugging tools are lacking, and applicability is limited.

Summary of the invention

This application provides LDPC encoding and decoding methods based on a general-purpose processor, which implement LDPC encoding and decoding efficiently on a general-purpose processor.

To achieve this object, the application adopts the following technical solution:

An LDPC encoding method based on a general-purpose processor, comprising: obtaining the signal vector S to be encoded by signal acquisition or reception, determining the check matrix H and its blocks A, B, D, E, F and T, and storing them; determining the vectors p₁ and p₂ according to p₁^T = E T^-1 A S^T + F S^T and p₂^T = T^-1 (A S^T + B p₁^T), and obtaining the LDPC encoded result vector c = (S, p₁, p₂); wherein the multiplication of any matrix by any vector performed while determining p₁ and p₂ comprises:

handling each row of the matrix as one thread, which multiplies that row by the vector, and combining the products of all rows into the result vector;

wherein multiplying one row of the matrix by the vector comprises: determining, for each element j of the current i-th row, the corresponding vector starting position = starting position of the vector + A_i,j + (j−1) × Z; shifting the Z−A_i,j data items that begin at that starting position to the left using single-instruction multiple-data (SIMD) operations; moving the first A_i,j data items of the same block behind the shifted data, which gives the vector shift result for element j; and adding the vector shift results of all elements to obtain the product of the row and the vector;

in the SIMD processing, the Z−A_i,j data items that begin at the starting position are divided into ⌊(Z−A_i,j)/W⌋ segments of length W, the shift-left operation is applied to these segments in parallel, and the remaining (Z−A_i,j) mod W data items are then shifted left;

Z is the size of the sub-matrix represented by one element of the check matrix.
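The row-by-vector multiplication described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: list slices of length W stand in for SIMD moves, element indices are 0-based, `None` stands for "-", and the addition of shift results is assumed to be modulo-2 (XOR). The names `shifted_block` and `row_times_vector` are hypothetical.

```python
def shifted_block(x, j, a, Z, W):
    """Vector shift result for the j-th element (0-based) of a row, shift a."""
    base = j * Z          # start of the length-Z block of x for this element
    start = base + a      # starting position: vector start + A_i,j + j*Z
    n = Z - a             # number of items moved to the front
    full, _rest = divmod(n, W)
    out = []
    for s in range(full):
        # Each slice of W items stands in for one SIMD move.
        out.extend(x[start + s * W:start + (s + 1) * W])
    out.extend(x[start + full * W:start + n])  # remaining (Z - a) mod W items
    out.extend(x[base:base + a])               # first a items go to the tail
    return out


def row_times_vector(row, x, Z, W=4):
    """Multiply one compact row by the bit vector x, adding modulo 2."""
    acc = [0] * Z
    for j, a in enumerate(row):
        if a is None:  # "-" contributes a zero vector
            continue
        acc = [p ^ q for p, q in zip(acc, shifted_block(x, j, a, Z, W))]
    return acc
```

For Z = 4 and x = [1,0,0,0, 0,1,1,0], the row `[1, 2]` rotates the first block left by 1 and the second by 2, and the modulo-2 sum of the two results is [1,0,0,0].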

Preferably, when the matrix is T^-1, the multiplication of each row of T^-1 by the corresponding vector multiplies only those elements of T^-1 whose value is 0 by the corresponding vector, which gives the vector shift results for those elements; the vector shift results of all other elements are set to the zero vector; the vector shift results of all elements are then added to obtain the product of the row and the vector.

Preferably, after the shift-left operation has been applied to the segments of length W, the first Z data items are taken as the valid data.

Preferably, adding the vector shift results of all elements comprises: dividing the vector shift result of each element into ⌊(Z−A_i,j)/W⌋ segments of length W, adding these segments in parallel using SIMD, and then adding the remaining (Z−A_i,j) mod W data items.

Preferably, the matrices A, B, D, E, F and T^-1 are stored as linear lookup tables.

An LDPC decoding method based on a general-purpose processor, comprising: receiving the encoded LDPC codeword signal c and determining the check matrix H; computing the variable-node vector q over multiple iterations and taking it as the decoding result, wherein in each iteration the temporary variable vector t is computed from the current variable-node vector q and check-node vector r, the check-node vector r is updated from t, and q is then updated from r and t; in the first iteration, the codeword signal c is used as the variable-node vector q and the check-node vector r is set to 0; wherein,

in each iteration, when t, r and q are computed, each row of the check matrix is processed and updated as one thread, which yields the sub-vectors of t, q and r that correspond to that row, with indices running from […] to […]; here i is the row index of the check matrix. When the sub-vectors of t, r and q for the i-th row are computed, each non-"-" element H_i,j of that row yields the sub-vectors of t, q and r associated with H_i,j, with indices running from […] to […]; these are then concatenated in order to give the sub-vectors for the row. When i = 1, let […].

The sub-vector of t corresponding to H_i,j is computed as follows. Determine the vector starting position Z*(n−1)+H_i,n corresponding to H_i,j, and copy, using SIMD, the data of length […] or 6 that begin at this starting position in the sub-vector of q for row i to the beginning of the sub-vector of t for H_i,j. When H_i,n ≠ 0, H_i,n ≠ '-' and (Z−H_i,n) mod W ≠ 0, determine the values […] of the elements in the row of matrix M_{LdpcAssemble1} that corresponds to check-matrix element H_i,j, and copy the elements with indices […] of the current sub-vector of q for H_i,j, one by one, to the current position of the sub-vector of t for H_i,j. Then determine the second vector starting position M_LdpcOffset2 for each element H_i,j, and copy, using SIMD, the data of length […] that begin at this second starting position to the current position of the sub-vector of t for H_i,j. Finally, take the first Z items of the sub-vector of t for H_i,j and use their absolute values as the valid sub-vector of t for H_i,j.

When H_i,n ≠ 0, H_i,n ≠ '-' and (Z−H_i,n) mod W ≠ 0, […]. When H_i,n = 0 or H_i,n = '-' or (Z−H_i,n) mod W = 0, (M_LdpcOffset2)_i,j = Z*(n−1). K is the amount of data the general-purpose processor can process at once, and k is the size of the basic unit handled by SIMD. When the code length L_LDPC = 648, LdpcRemain = 11; when L_LDPC = 1296, LdpcRemain = 6; when L_LDPC = 1944, LdpcRemain = 1. j is the index of each non-"-" element of row i among all non-"-" elements of that row, and n is the column index in the check matrix of the j-th non-"-" element of row i.

Preferably, the sub-vector of the check-node vector r that corresponds to each row of the check matrix is computed as follows:

The sub-vector of t that corresponds to the row of the check matrix is written as a matrix T_v with V_{LdpcRowLength}(v) rows and […] columns, where each row of T_v is the sub-vector of t that corresponds to one element H_i,j; padding is applied when there are too few columns.

Extreme-value assignment is performed on the matrix T_v to obtain the sub-vector matrix M_v of the extreme-value variable vector m.

The sub-vector matrix S_v of the intermediate variable vector s is computed from the matrix T_v.

From the elements of M_v and S_v that have the same index, the element of an intermediate matrix R_v' at that index is determined as follows. If an element of S_v is less than 0, the complement of that element is taken and added to the element, and the sum is the value of the element of R_v' with the same index. If an element of S_v equals 0, the element is added to 0, and the sum is the value of the element of R_v' with the same index. If an element of S_v is greater than 0, the element of M_v with the same index is added to the element, and the sum is the value of the element of R_v' with the same index. The comparison and addition operations are carried out using SIMD.

Using SIMD, the elements of R_v' and T_v that have the same index are subtracted, and the result is the element with the same index in the sub-vector matrix R_v of the check-node vector r. The first Z elements of each row of R_v are read out in row-major order to form the sub-vector of the check-node vector r.

Preferably, the extreme-value assignment comprises:

determining, using SIMD, the minimum and the second minimum of each column of the matrix T_v, together with the row index of the minimum; correcting the minimum and the second minimum by subtracting a preset correction value β, where a corrected value that is less than 0 is set to 0 and is otherwise left unchanged;

constructing, from the current minimum, second minimum and row index of the minimum of each column of T_v, the column with the same index in the sub-vector matrix M_v of the extreme-value variable vector m, wherein, in any column of M_v, the element whose row index equals that of the current minimum is set to the determined minimum and the remaining elements are set to the second minimum.
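The first part of this step, finding the minimum and second minimum of a column and applying the offset β, can be sketched in Python as follows. The sketch processes a single column with scalar code, whereas the method compares W columns at a time with SIMD; the function names are hypothetical, and the construction of M_v is not shown.

```python
def min_and_second_min(column):
    """Return (minimum, second minimum, row index of the minimum) of a column."""
    m1 = m2 = float("inf")
    idx = -1
    for i, v in enumerate(column):
        if v < m1:
            # New minimum: the old minimum becomes the second minimum.
            m1, m2, idx = v, m1, i
        elif v < m2:
            m2 = v
    return m1, m2, idx


def corrected_minima(column, beta):
    """Subtract the correction value beta and clamp negative results to 0."""
    m1, m2, idx = min_and_second_min(column)
    return max(m1 - beta, 0), max(m2 - beta, 0), idx
```

For the column [5, 2, 7, 3] and β = 1, the corrected minimum is 1, the corrected second minimum is 2, and the minimum sits in row 1.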

Preferably, determining the minimum, the second minimum and the corresponding row index of each column using SIMD comprises:

dividing the elements of each row of the matrix T_v into […] sub-blocks, each containing W basic units; when the elements of any two rows of T_v are compared, W basic units are compared at once using SIMD.

Preferably, computing the sub-vector matrix S_v of the intermediate variable vector s comprises:

for each column of the matrix T_v, XORing all elements of the column, then XORing the result with the element in row i' and ORing it with 0x7f; the result of the OR operation is the element in row i' of the column with the same index in the intermediate vector matrix S_v. Here the elements of each row of T_v are divided into […] sub-blocks, each containing W basic units, and the XOR and OR operations on W basic units are performed at once using SIMD.
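The XOR and OR trick for one column can be sketched in Python as follows. The sketch assumes that each element is a signed 8-bit integer in two's complement, which the 0x7f mask suggests but the text does not state; the function name `sign_column` is hypothetical. Under that assumption, the XOR of all elements carries the product of all signs in its top bit, XORing with one element removes that element's own sign, and ORing with 0x7f leaves 0x7f (positive) or 0xff (negative).

```python
def sign_column(column):
    """For each row, encode the sign product of the OTHER rows as 127 or -1."""
    total = 0
    for v in column:
        total ^= v & 0xFF  # XOR of all elements as unsigned bytes
    result = []
    for v in column:
        b = (total ^ (v & 0xFF)) | 0x7F  # keep only the sign bit, fill the rest
        result.append(b - 256 if b >= 128 else b)  # back to signed 8-bit
    return result
```

For the column [3, -2, 5], rows 0 and 2 see one negative value among the other rows and get -1, while row 1 sees only positive values and gets 127.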

Preferably, computing the sub-vector of the variable-node vector q that corresponds to H_i,j comprises:

determining the vector starting position Z*(n−1)+H_i,n corresponding to H_i,j; adding, using SIMD, the sub-vector of t for H_i,j and the sub-vector of r for H_i,j; and copying, using SIMD, the data of length […] or 5 that begin at this starting position in the result vector to the beginning of the sub-vector of q for H_i,j; when H_i,n ≠ 0, H_i,n ≠ '-' and (Z−H_i,n) mod W ≠ 0, determining the values […] of the elements in the row of matrix M_{LdpcAssemble1} that corresponds to check-matrix element H_i,j, and copying the elements with indices […] of the current sub-vector of q for H_i,j, one by one, to the current position of the sub-vector of q for H_i,j;

determining the second vector starting position M_LdpcOffset2 for each element H_i,j, and copying, using SIMD, the data of length 0 or […] that begin at this second starting position to the current position of the sub-vector of q for H_i,j;

padding according to the padding count indicated by LdpcRemain, using the values of the elements in the row of M_{LdpcAssemble1} that corresponds to check-matrix element H_i,j.

Preferably, the following are computed in advance and stored: the vector starting position Z*(n−1)+H_i,n and the second vector starting position M_LdpcOffset2 for each element H_i,j, the matrix M_{LdpcAssemble1}, the vector V_{LdpcRowLength} formed by the number of non-"-" elements in each row of the check matrix, and LdpcRemain.

As the above technical solution shows, the LDPC encoding and decoding methods of this application raise encoding and decoding speed by means of SIMD instructions, multithreading and precomputed, stored tables.

Brief description of the drawings

Fig. 1 is a schematic diagram of the LDPC PPDU generation process;

Fig. 2 is a schematic diagram of the encoder used in the LDPC encoding process;

Fig. 3 is a schematic diagram of an existing LDPC decoder;

Fig. 4 is the overall flowchart of the encoding method of this application;

Fig. 5 is a schematic diagram of the computation of the parity vector p₁ in the LDPC encoding of this application;

Fig. 6 is a schematic diagram of the computation of the parity vector p₂ in the LDPC encoding of this application;

Fig. 7 is a structural diagram of optimized multiplier 1;

Fig. 8 is a structural diagram of optimized multiplier 2;

Fig. 9 is a structural diagram of the optimized adder;

Fig. 10 is a schematic diagram of multiplying one element of matrix A by the transpose of vector S;

Fig. 11 is a schematic diagram of the processing of step 5 in the LDPC encoding method of this application;

Fig. 12 is a schematic diagram of the processing of step 5 in the LDPC decoding method of this application;

Fig. 13 is a detailed flowchart of the LDPC decoding method of this application;

Fig. 14 is a flowchart of the extreme-value assignment performed for one sub-block in the LDPC decoding method of this application;

Fig. 15 is a structural diagram of one extreme-value assignment operation in the LDPC decoding method;

Fig. 16 is a flowchart of computing the intermediate variable vector for one sub-block in the LDPC decoding method;

Fig. 17 is a structural diagram of one intermediate vector computation in the LDPC decoding method.

Detailed description

To make the purpose, technical means and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings.

This application provides an LDPC encoding method and an LDPC decoding method suited to implementation on a general-purpose processor. Both methods are described in detail below.

Following the LDPC PPDU generation method of the IEEE 802.11n protocol, the codeword vector after encoding is written c = (S, p₁, p₂), where S is the information vector and p₁ and p₂ are the parity vectors of the codeword, and the check matrix H is split into six parts, namely the sub-matrices A, B, D, E, F and T. Before encoding, the code length L_LDPC, the code rate R and the information vector S must be supplied externally. Based on the characteristics of the general-purpose processor (GPP) chip architecture, this application optimizes the LDPC encoding method as follows:

1. The encoding method is optimized with SIMD (single instruction, multiple data) operations. The basic idea is to process multiple data items within one CPU clock cycle and thereby obtain the effect of parallel processing, instead of the usual mode in which each clock cycle performs only one data-processing operation. This depends on the amount of data the processor in use can process at once: assuming this amount is K bits and the basic unit handled by SIMD is k bits, one operation can process W = K/k data units.

2. The storage of the check-matrix information is optimized with linear lookup tables. The check matrix H is split into six parts, namely the sub-matrices A, B, D, E, F and T, and six linear lookup tables are generated from these six sub-matrices, which reduces the computational complexity.

3. Multithreading is used, with the number of rows of the check matrix H as the number of threads; that is, processing one row of H is one thread, and multiple threads execute at the same time, which raises the overall processing performance of the system.

Fig. 4 is the overall flowchart of the encoding method of this application. The algorithmic principle of the encoding method is the same as that of existing LDPC encoding methods; the difference lies in how the encoding method is implemented on a general-purpose processor. The detailed flow is as follows:

1. According to the 802.11n protocol, different code lengths L_LDPC and different code rates R correspond to different check matrices H. First, the check matrix H that corresponds to the code length L_LDPC and the code rate R is selected, and the following parameters are initialized:

1.1 Generate the six sub-matrices A, B, D, E, F and T^-1, where (·)^-1 denotes the matrix inverse, and store them in turn in linear lookup tables, marking the location of the required data by offsets. This trades memory for computational complexity and raises the data-processing speed of the LDPC encoding method.

1.2 Determine the size Z of the sub-matrix represented by each element of the selected check matrix H. When the code length L_LDPC = 648, Z = 27; when L_LDPC = 1296, Z = 54; when L_LDPC = 1944, Z = 81.

2. Based on the characteristics of the GPP chip, the encoding method is optimized with multithreading, with the number of rows of the check matrix H as the number of threads. One row of data in H is handled by one thread, which performs steps 3 and 4. Multiple threads execute at the same time, that is, steps 3 and 4 are carried out for multiple rows simultaneously, which raises the overall processing performance of the system. Steps 3 and 4 below describe the flow of a single thread.

3. Compute the codeword check vector p₁. The multiplications and additions involved are optimized with SIMD operations; the operation is illustrated in Fig. 5, and the detailed flow is as follows:

3.1 Multiply matrix A by the transpose of vector S. The result is a vector whose length equals the number of rows of matrix A.

3.2 Multiply matrix T^-1 by the transpose of the result vector of step 3.1. The result is a vector whose length equals the number of rows of matrix T^-1.

3.3 Multiply matrix E by the transpose of the result vector of step 3.2. The result is a vector whose length equals the number of rows of matrix E.

3.4 Multiply matrix F by the transpose of vector S. The result is a vector whose length equals the number of rows of matrix F.

3.5 Add the result vector of step 3.3 to the result vector of step 3.4. The result is the codeword check vector p₁.

4. Compute the codeword check vector p₂. The multiplications and additions involved are optimized with SIMD operations; the operation is illustrated in Fig. 6, and the detailed flow is as follows:

4.1 Multiply matrix B by the transpose of vector p₁. The result is a vector whose length equals the number of rows of matrix B.

4.2 Add the result vector of step 3.1 to the result vector of step 4.1. The result is a vector.

4.3 Multiply matrix T^-1 by the transpose of the result vector of step 4.2. The result is the codeword check vector p₂.

5. Assemble the LDPC codeword vector c:

Store the vectors obtained in the order S, p₁, p₂ to produce the LDPC codeword vector c=(S,p₁,p₂).
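Steps 3 to 5 can be traced on a small example. The following is only a sketch under stated assumptions: it uses dense 0/1 matrices over GF(2) instead of the shift-based submatrices of 802.11n, all matrices are hypothetical toy values, and the function names (`mat_vec_gf2`, `add_gf2`, `encode`) are introduced here for illustration. The full matrix H = [A B T; F D E] is built only to check that the assembled codeword has a zero syndrome; that block layout is an assumption of the sketch.

```python
def mat_vec_gf2(M, v):
    """Multiply a dense 0/1 matrix by a 0/1 vector over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]


def add_gf2(u, v):
    """Add two 0/1 vectors over GF(2); addition is XOR."""
    return [a ^ b for a, b in zip(u, v)]


def encode(s, A, B, E, F, T_inv):
    """Follow steps 3 to 5: compute p1, p2 and assemble c = (s, p1, p2)."""
    As = mat_vec_gf2(A, s)                       # step 3.1
    TAs = mat_vec_gf2(T_inv, As)                 # step 3.2
    ETAs = mat_vec_gf2(E, TAs)                   # step 3.3
    Fs = mat_vec_gf2(F, s)                       # step 3.4
    p1 = add_gf2(ETAs, Fs)                       # step 3.5
    Bp1 = mat_vec_gf2(B, p1)                     # step 4.1
    p2 = mat_vec_gf2(T_inv, add_gf2(As, Bp1))    # steps 4.2 and 4.3
    return s + p1 + p2                           # step 5


# Hypothetical toy submatrices (not taken from 802.11n).
A = [[1, 0], [1, 1]]
B = [[1], [0]]
T = [[1, 0], [1, 1]]
T_INV = [[1, 0], [1, 1]]    # this T is its own inverse over GF(2)
F = [[1, 1]]
D = [[0]]
E = [[0, 1]]

# Assumed layout H = [A B T; F D E], used only to verify the codeword.
H = [a + b + t for a, b, t in zip(A, B, T)] + \
    [f + d + e for f, d, e in zip(F, D, E)]

codeword = encode([1, 1], A, B, E, F, T_INV)
syndrome = mat_vec_gf2(H, codeword)
```

For these toy values the codeword is `[1, 1, 1, 0, 0]` and the syndrome is all zeros. The toy matrices were chosen so that E·T^-1·B + D is the identity over GF(2), which is what lets step 3.5 yield p₁ directly.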

The encoding method described above involves two optimized multipliers and one optimized adder, referred to as optimized multiplier 1, optimized multiplier 2 and the optimized adder. Given the architecture of the GPP chip, the parts of optimized multiplier 1, optimized multiplier 2 and the optimized adder that operate in parallel can be optimized with SIMD operations. Each is described in turn below.

Optimized multiplier 1 has two inputs and one output; its schematic is shown in Fig. 7. The matrix-vector multiplications in steps 3.1, 3.3, 3.4 and 4.1 of the optimized encoding method use optimized multiplier 1. Taking step 3.1 as an example, the inputs of optimized multiplier 1 are matrix A and the transpose of vector S, and the flow is as follows:

1. Check whether the last row of matrix A has been reached. If so, the operation is complete; if not, go to step 2.

2. Multiply the Z*Z submatrix represented by the first element of matrix A by the transpose of vector S. This submatrix is the Z*Z identity matrix cyclically shifted left A_1,1 times (A_1,1 denotes the element in the first row and first column of matrix A), so multiplying it by the transpose of vector S is equivalent to cyclically shifting vector S A_1,1 times. This operation can be SIMD-optimized, see Fig. 10. The procedure is as follows:

2.1 Compute […], the data length of part 2=(Z-A_1,1) mod W (the remainder of Z-A_1,1 divided by W), and the start position of the required data=start position of information vector s+A_1,1.

2.2 Starting at the start position of the required data, copy data of length […] as the start data of the intermediate data;

2.3 Shift-copy the remaining (Z-A_1,1) mod W data items, and copy the "remainder" and "padding" parts shown in Fig. 10 to the output. To suit SIMD operations, the input vector has length Z, while the output vector has size […]; only the first Z elements of the output vector are valid data. The result is stored in the result vector register.

3. Check whether the last column of matrix A has been reached. If so, return to step 1; if not, go to step 4.

4. Multiply the Z*Z submatrix represented by the next element of matrix A by the transpose of vector S, following the same procedure as steps 2.1, 2.2 and 2.3, except that the product is not stored in the result vector register. Then perform step 5.

5. Add the result of step 4 to the contents of the result vector register. Adding two binary numbers is equivalent to XORing them, so the operation can be SIMD-optimized: one operation yields W results. See Fig. 11, where (a₁,a₂,…,a_W) is the result of step 4, (b₁,b₂,…,b_W) is the contents of the result vector register, the rectangle denotes the XOR operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁⊕b₁,a₂⊕b₂,…,a_W⊕b_W). The result is stored in the result vector register; return to step 3.
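The flow of optimized multiplier 1, together with the one-thread-per-row scheme, can be sketched as follows. This is an illustration under assumptions rather than the implementation of the application: each matrix entry is a shift value or `None` for "-", the product of a shifted identity block with a vector block is taken to be a left rotation of that block, and the names `rotate_left`, `block_row_times_vector`, `matrix_times_vector` and `dense_reference` are hypothetical. Python threads only show the structure here; they do not give the parallel speed-up of native threads with SIMD registers.

```python
from concurrent.futures import ThreadPoolExecutor


def rotate_left(block, shift):
    """Cyclically rotate a list left by `shift` positions."""
    shift %= len(block)
    return block[shift:] + block[:shift]


def block_row_times_vector(row, s, Z):
    """Multiply one block row of the matrix by s over GF(2).

    `row` holds one entry per block column: a shift value, or None for "-".
    """
    acc = [0] * Z                                    # result vector register
    for col, shift in enumerate(row):
        if shift is None:                            # "-" block contributes nothing
            continue
        block = s[col * Z:(col + 1) * Z]
        shifted = rotate_left(block, shift)          # steps 2 and 4: cyclic shift
        acc = [a ^ b for a, b in zip(acc, shifted)]  # step 5: addition is XOR
    return acc


def matrix_times_vector(base, s, Z):
    """Process every block row in its own worker and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(base)) as pool:
        parts = list(pool.map(lambda row: block_row_times_vector(row, s, Z), base))
    return [bit for part in parts for bit in part]


def dense_reference(base, s, Z):
    """Expand each shift into a Z*Z permutation block and multiply explicitly."""
    out = []
    for row in base:
        for r in range(Z):
            bit = 0
            for col, shift in enumerate(row):
                if shift is not None:
                    # Assumed convention: row r has its 1 in column (r + shift) mod Z.
                    bit ^= s[col * Z + (r + shift) % Z]
            out.append(bit)
    return out


Z = 4
base = [[1, None, 0], [3, 2, None]]       # hypothetical 2x3 matrix of shifts
s = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
result = matrix_times_vector(base, s, Z)
```

Comparing `result` with `dense_reference(base, s, Z)` confirms that the rotation-plus-XOR form gives the same vector as the explicit block multiplication under the assumed shift convention.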

Optimized multiplier 2 has two inputs and one output; its schematic is shown in Fig. 8. The matrix-vector multiplications in steps 3.2 and 4.3 of the optimized encoding method use optimized multiplier 2. Optimized multiplier 2 is a special case of optimized multiplier 1: one of its inputs is matrix T^-1, and for every check matrix H, matrix […]; only the elements in the lower triangle of matrix T^-1 are valid values. Taking step 3.2 as an example, the inputs of optimized multiplier 2 are matrix T^-1 and the transpose of the result vector of step 3.1, and the flow is as follows:

2. From […] it can be seen that the upper-triangular elements of matrix T^-1 are 0,

The product of the Z*Z submatrix represented by an element "0" and the transpose of the result vector of step 3.1 in Fig. 4 is that vector itself. Therefore each element "0" in every row of matrix T^-1 is multiplied by the transpose of the result vector of step 3.1 in Fig. 4, the results are added as in step 5 of optimized multiplier 1, and the flow returns to step 1.

The optimized adder has two inputs and one output, and both inputs are vectors; its schematic is shown in Fig. 12. The vector additions in steps 3.5 and 4.2 of the optimized encoding method use the optimized adder. Taking step 3.5 as an example, the inputs of the optimized adder are the result vector of step 3.3 and the result vector of step 3.4. Because the inputs are binary numbers, adding them is equivalent to XORing them, so the operation can be SIMD-optimized: one operation yields W results. See Fig. 11, where (a₁,a₂,…,a_W) is the result vector of step 3.3, (b₁,b₂,…,b_W) is the result vector of step 3.4, the rectangle denotes the XOR operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁⊕b₁,a₂⊕b₂,…,a_W⊕b_W).
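The idea that one XOR produces W sums at once can be imitated by packing bits into machine words. The sketch below is an assumption-laden illustration: a Python integer stands in for a SIMD register, W=16 is an arbitrary choice, and `pack_bits`, `unpack_bits` and `xor_add` are hypothetical helper names.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, first bit in the lowest position."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def unpack_bits(word, n):
    """Inverse of pack_bits for an n-bit word."""
    return [(word >> i) & 1 for i in range(n)]


def xor_add(u, v, W=16):
    """Add two equal-length bit vectors over GF(2), up to W positions per XOR."""
    out = []
    for start in range(0, len(u), W):
        chunk_u = u[start:start + W]
        chunk_v = v[start:start + W]
        word = pack_bits(chunk_u) ^ pack_bits(chunk_v)   # one XOR, W results
        out.extend(unpack_bits(word, len(chunk_u)))
    return out


u = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
v = [0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
total = xor_add(u, v)
```

The result matches the element-by-element XOR of `u` and `v`; the packed form simply performs fewer operations for the same output.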

The above is the detailed flow of the LDPC encoding method of this application. This application also optimizes the decoding method of existing decoders; the detailed flowchart of the optimized decoding method is shown in Fig. 13. Before decoding, the following must be supplied externally: the code length L_LDPC, the code rate R, the encoded codeword vector c, the maximum number of iterations I and the correction offset β. The LDPC decoding method is optimized for the GPP chip architecture as follows:

1. The method is optimized with SIMD operations, whose basic idea is to process multiple data items within one CPU clock cycle to obtain the effect of parallel processing, instead of the usual mode in which each clock cycle performs only one data operation. This depends on how much data the general-purpose processor in use can process at once: if that size is K bits and the basic unit of SIMD processing has size k, then the amount of data one operation can process is […]

2. Part of the flow of the optimized decoding method uses multithreading, with the number of rows of the check matrix H as the thread count: each row of H is handled by one thread, and the threads run at the same time, which improves the overall processing performance of the system.
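As a worked illustration of the SIMD width, the sketch below assumes that the number of elements handled per operation is W=K/k, which is consistent with the statement that one operation yields W results but is an assumption here. The 128-bit register and 8-bit element sizes are likewise hypothetical, and `lanes_per_operation` and `operations_needed` are names introduced for the example.

```python
def lanes_per_operation(register_bits, element_bits):
    """Elements handled by one SIMD operation, assuming W = K / k."""
    if register_bits % element_bits:
        raise ValueError("element size must divide the register size")
    return register_bits // element_bits


def operations_needed(length, W):
    """SIMD operations needed to cover `length` elements, rounding up."""
    return -(-length // W)


W = lanes_per_operation(128, 8)      # assumed: 128-bit registers, 8-bit elements
ops = {Z: operations_needed(Z, W) for Z in (27, 54, 81)}
```

Under these assumptions W is 16, and a subvector of length Z=27, 54 or 81 takes 2, 4 or 6 operations, with the last operation only partly filled. That partial fill is why padding appears in the steps that follow.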

The detailed flow of the decoding method of this application is described below. The general framework of the decoding method is the same as that of existing decoding methods and comprises: receiving the encoded LDPC codeword signal c and determining the check matrix H; and computing the variable-node vector q over several iterations as the decoding result. In each iteration, the temporary variable vector t is computed from the current variable-node vector q and the check-node vector r as […], the check-node vector r is updated from the temporary variable vector t, and the variable-node vector q is then updated from the check-node vector r and the temporary variable vector t as […]. The decoding method provided by this application differs from the prior art in how it is implemented on a general-purpose processor. The operating steps are as follows:

1. Under the 802.11n protocol, each combination of code length L_LDPC and code rate R corresponds to a different check matrix H. First, the check matrix H corresponding to the given code length L_LDPC and code rate R is retrieved, and the parameters below are initialized and stored one after another in a linear lookup table. Offsets mark where the required data is located, so memory is traded for computational complexity and the data processing speed of the LDPC decoding method improves:

1.1 Z, the size of the submatrix represented by each element of the selected check matrix H; a variable. When the code length L_LDPC=648, Z=27; when L_LDPC=1296, Z=54; when L_LDPC=1944, Z=81.

1.2 LdpcRowNum, the number of rows of the selected check matrix H; a variable. When the code rate R=1/2, LdpcRowNum=12; when R=2/3, LdpcRowNum=8; when R=3/4, LdpcRowNum=6; when R=5/6, LdpcRowNum=4.

1.3 V_{LdpcRowLength}, the number of non-"-" elements in each row of the selected check matrix H; a vector of size 1*LdpcRowNum.

1.4 LdpcBufferNum, the number of registers needed to store the data of each submatrix of the selected check matrix H; a variable. It is computed as […]

1.5 LdpcRemain, one of the variables needed in step 9; a variable. When the code length L_LDPC=648, LdpcRemain=11; when L_LDPC=1296, LdpcRemain=6; when L_LDPC=1944, LdpcRemain=1.

1.6 LdpcRoundNum, one of the variables needed in step 9; a variable. When the code length L_LDPC=648, LdpcRoundNum=27; when L_LDPC=1296, LdpcRoundNum=22; when L_LDPC=1944, LdpcRoundNum=17.

1.7 V_{LdpcRowBuffer}, the number of registers needed to store the non-"-" data of each row of the selected check matrix H; a vector of size 1*LdpcRowNum. It is computed as V_{LdpcRowBuffer}(v)=V_{LdpcRowLength}(v)*LdpcBufferNum (where V_{LdpcRowBuffer}(v) denotes the v-th element of vector V_{LdpcRowBuffer} and v is the row index of the selected check matrix H; likewise below).

1.8 M_LdpcOffset1, one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix (where max(V_{LdpcRowLength}(v)) denotes the largest element of vector V_{LdpcRowLength}; likewise below). It is computed as (M_LdpcOffset1)_i,j=Z*(n-1)+H_i,n, where n is the column index in the selected check matrix H; j and n are related in that the j-th non-"-" element of row i of the selected check matrix H is located at row i, column n of H; likewise below.

1.9 M_LdpcRound1, one of the loop counts computed from the selected check matrix H, used in step 4; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as follows: when H_i,n≠0 and H_i,n≠'-', […]; when H_i,n=0 and H_i,n≠'-', (M_LdpcRound1)_i,j=6.

1.10 M_{LdpcAssemble1}, one of the padding-offset flags computed from the selected check matrix H, used in step 4; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as follows: when H_i,n=0 and H_i,n≠'-', (M_{LdpcAssemble1})_i,j=0; when H_i,n≠0, H_i,n≠'-' and (Z-H_i,n) mod W=0, (M_{LdpcAssemble1})_i,j=0; when H_i,n≠0, H_i,n≠'-' and (Z-H_i,n) mod W≠0, (M_{LdpcAssemble1})_i,j=1.

1.11 M_{LdpcAssembleTable1}, the cyclic padding offsets computed from the selected check matrix H, used in steps 4 and 9; a […] matrix (where […] denotes the sum of all elements of vector V_{LdpcRowLength}), […]

1.12 M_LdpcOffset2, one of the cyclic offsets computed from the selected check matrix H, used in steps 4 and 9; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as follows: when (M_{LdpcAssemble1})_i,j=0, (M_LdpcOffset2)_i,j=Z*(n-1)+[W-Z+H_i,n+(M_LdpcRound1)_i,j*W^]; when (M_{LdpcAssemble1})_i,j=1, (M_LdpcOffset2)_i,j=Z*(n-1).

1.13 M_LdpcRound2, one of the loop counts computed from the selected check matrix H, used in step 4; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as […]

1.14 M_LdpcRound3, one of the loop counts computed from the selected check matrix H, used in step 9; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as follows: when H_i,n≠0 and H_i,n≠'-', […]; when H_i,n=0 and H_i,n≠'-', (M_LdpcRound3)_i,j=5.

1.15 M_{LdpcAssemble2}, one of the padding-offset flags computed from the selected check matrix H, used in step 9; the same as M_{LdpcAssemble1}.

1.16 M_LdpcRound4, one of the loop counts computed from the selected check matrix H, used in step 9; an LdpcRowNum*max(V_{LdpcRowLength}(v)) matrix. It is computed as follows: when i=0, (M_LdpcRound4)_i,j=0; when i≠0, […]
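Two of the lookup tables above have fully stated formulas, M_LdpcOffset1 and M_{LdpcAssemble1}, and can be built as in the following sketch. The base matrix is a hypothetical toy example, `None` stands for "-", and the rows of the tables are left ragged rather than padded to the longest row. W=16 is an assumption: it is not stated in this passage, but the listed LdpcRemain values 11, 6 and 1 equal Z mod 16, and the listed LdpcRoundNum values 27, 22 and 17 equal 16 + Z mod 16, for Z=27, 54 and 81.

```python
W = 16  # assumed SIMD width, not stated in the text

Z_BY_CODE_LENGTH = {648: 27, 1296: 54, 1944: 81}   # step 1.1


def build_tables(base, Z, W):
    """Build M_LdpcOffset1 and M_LdpcAssemble1 from a matrix of shift values.

    Entries of `base` are shift values, or None for "-".
    """
    offset1, assemble1 = [], []
    for row in base:
        off_row, asm_row = [], []
        for n, h in enumerate(row, start=1):        # n is the 1-based column index
            if h is None:
                continue                            # "-" elements have no entry
            off_row.append(Z * (n - 1) + h)         # (M_LdpcOffset1)_i,j
            # Flag is 1 only when h != 0 and (Z - h) is not a multiple of W.
            asm_row.append(1 if h != 0 and (Z - h) % W != 0 else 0)
        offset1.append(off_row)
        assemble1.append(asm_row)
    return offset1, assemble1


base = [[0, None, 11, 5], [None, 22, 0, None]]      # hypothetical, for Z = 27
offset1, assemble1 = build_tables(base, Z=27, W=W)

# Observation only: these reproduce the listed LdpcRemain and LdpcRoundNum values.
remain = {L: Z % W for L, Z in Z_BY_CODE_LENGTH.items()}
round_num = {L: W + Z % W for L, Z in Z_BY_CODE_LENGTH.items()}
```

For the toy matrix, `offset1` is `[[0, 65, 86], [49, 54]]` and `assemble1` is `[[0, 0, 1], [1, 0]]`. Precomputing such tables once per check matrix is what allows the per-iteration steps to use plain lookups.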

2. Check whether the maximum number of iterations I has been reached. If not, go to step 3; if so, decoding ends.

3. In keeping with the characteristics of the GPP chip, the decoding method is optimized with multithreading. The number of rows of the check matrix H is the thread count: each row of H is one thread that performs steps 4 to 9, and multiple threads run at the same time, i.e. several instances of steps 4 to 9 are processed concurrently, which improves the overall processing performance of the system. Steps 4 to 9 below describe the flow of a single thread.

4. Compute the temporary variable vector t, a vector of length […]. The temporary variable vector contains one subvector for each row of the check matrix, with indices from […] to […]; that subvector in turn contains one subvector for each non-"-" element H_i,j. (Only the non-"-" elements H_i,j of the check matrix have corresponding subvectors; "-" elements of the check matrix have no corresponding subvector in the temporary variable vector, the check-node vector r or the variable-node vector q.) Specifically, the subvector t' corresponding to H_i,j is computed as […]; that is, the value of subvector t' of the temporary variable vector t is the subvector q' of the variable-node vector q corresponding to element H_i,j, cyclically shifted by the value of the element in row i, column j of the check matrix H, minus the subvector r' of the check-node vector r corresponding to element H_i,j. In the first iteration, the variable-node vector q is the LDPC-encoded codeword vector c and the check-node vector r is in its initial state, so subvector t' is computed as […]; that is, subvector t' of the temporary variable vector t is subvector q' of the variable-node vector q cyclically shifted by the value of the element in row i, column j of the check matrix H. To suit SIMD operations, the input subvectors q' and r' have length Z, while the output subvector t' has size […]; only the first Z elements of the output subvector are valid data. The computation of the temporary variable vector t loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i of the selected check matrix H yields the elements at positions […] to […] of the temporary variable vector t. The calculation steps are as follows:

4.1 Look up in matrix M_LdpcOffset1 the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the start position of the required data in the variable-node vector q: start position of the required data=start position of subvector q' of the variable-node vector q+the corresponding cyclic offset.

4.2 Look up in matrix M_LdpcRound1 the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the (M_LdpcRound1)_i,j*W data items following the start position of the required data to the current position of subvector t' of the temporary variable vector t. The current position here means the first position in the subvector to which no data has yet been copied.

4.3 Use matrix M_{LdpcAssemble1} to decide whether padding is needed. If (M_{LdpcAssemble1})_i,j=1, perform padding according to the offsets indicated in matrix M_{LdpcAssembleTable1}; if (M_{LdpcAssemble1})_i,j=0, no padding is needed. The padding operation comprises: determining the value […] of each element of the row of matrix M_{LdpcAssemble1} corresponding to check matrix element H_i,j, and copying, in order, the elements with index […] in the subvector of the current vector q corresponding to element H_i,j to the current position of the subvector of the temporary variable vector t corresponding to H_i,j; where […]

4.4 Look up in matrix M_LdpcOffset2 the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the start position of the required data in the variable-node vector q: start position of the required data=start position of subvector q' of the variable-node vector q+the corresponding cyclic offset.

4.5 Look up in matrix M_LdpcRound2 the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the (M_LdpcRound2)_i,j*W data items following the start position of the required data to the current position of subvector t' of the temporary variable vector t.

4.6 If this is not the first iteration, compute subvector t' of the temporary variable vector t=subvector t' of the temporary variable vector t-subvector r' of the check-node vector r. All elements of subvectors t' and r' are divided into groups of W elements, so the operation can be SIMD-optimized: one operation yields W elements of the temporary variable vector t. See Fig. 11, where (a₁,a₂,…,a_W) is one group of elements of the temporary variable vector t, (b₁,b₂,…,b_W) is one group of elements of the check-node vector r, the rectangle denotes the subtraction operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁-b₁,a₂-b₂,…,a_W-b_W). If this is the first iteration, go to step 5.
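Leaving aside the register-level copying and padding of steps 4.1 to 4.5, the net effect of step 4 on one subvector can be sketched as below. The sketch assumes the cyclic shift is a left rotation and works on plain lists of length Z; `rotate_left` and `temp_subvector` are hypothetical names, and the numeric values are made up.

```python
def rotate_left(block, shift):
    """Cyclically rotate a list left by `shift` positions."""
    shift %= len(block)
    return block[shift:] + block[:shift]


def temp_subvector(q_sub, r_sub, shift, first_iteration):
    """Return t': the shifted q' on the first iteration, else shifted q' minus r'."""
    shifted = rotate_left(q_sub, shift)
    if first_iteration:
        return shifted                                  # r is still in its initial state
    return [a - b for a, b in zip(shifted, r_sub)]      # step 4.6, element by element


q_sub = [3, -1, 4, -2]
r_sub = [1, 1, -1, 0]
t_first = temp_subvector(q_sub, r_sub, shift=1, first_iteration=True)
t_later = temp_subvector(q_sub, r_sub, shift=1, first_iteration=False)
```

Here `t_first` is `[-1, 4, -2, 3]` and `t_later` is `[-2, 3, -1, 3]`. In the method itself the subtraction is done W elements at a time rather than one by one.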

5. Compute the absolute value of every element of subvector t' of the temporary variable vector t, and denote the result the temporary variable vector after taking absolute values, |t|. Subvector t' is divided into groups of W elements, so the operation can be SIMD-optimized: one operation yields W elements of subvector t' of the temporary variable vector t. See Fig. 12, where (c₁,c₂,…,c_W) is one group of elements of subvector t' of the temporary variable vector t, the rectangle denotes the absolute-value operator and (y₁,y₂,…,y_W) is the temporary variable vector after taking absolute values, |t|, i.e. (y₁,y₂,…,y_W)=(|c₁|,|c₂|,…,|c_W|). The result serves as the updated subvector t' of the temporary variable vector t.

6. Perform the extreme-value distribution operation and store the result in the matrix M of the extreme-value variable vector m. The process loops over the rows of the selected check matrix H: each pass operates on V_{LdpcRowLength}(v)*W*LdpcBufferNum elements of the temporary variable vector t and stores a result of length V_{LdpcRowLength}(v)*W*LdpcBufferNum in the extreme-value variable vector m. One extreme-value distribution operation is illustrated in Fig. 15: the subvector t' of the temporary variable vector t corresponding to each row of the check matrix is written as a matrix T_v with V_{LdpcRowLength}(v) rows and […] columns, where each row of matrix T_v is the subvector of t' corresponding to an element H_i,j, padded when there are too few columns. The extreme-value distribution operation finds the minimum and second minimum of each column of matrix T_v, distributes them, and stores the result in the subvector matrix M_v of the extreme-value variable vector m. Each row of matrix T_v has LdpcBufferNum sub-blocks, and each sub-block contains W basic units. The basic units in each column are compared to find the minimum and second minimum, and the row number of the minimum is recorded as the index value. The extreme-value variable vector m is then filled according to the index value: where the index value differs from the row number in m, the minimum found is written into m; where the index value equals the row number in m, the second minimum found is written into m. Taking the minimum and second minimum of one sub-block as an example, the flowchart is shown in Fig. 14 and the steps are as follows:

6.1 Compare the corresponding basic units of the first sub-block of the first row and of the second row of matrix T_v. Store the row number of the smaller value in the index value, record the smaller value as the minimum and the larger value as the second minimum. The operation can be SIMD-optimized with two operations: one takes the maximum, giving the larger of the two, and one takes the minimum, giving the smaller of the two. See Fig. 11, where (a₁,a₂,…,a_W) is the first sub-block of the first row, (b₁,b₂,…,b_W) is the first sub-block of the second row, the rectangle denotes the maximum operator or the minimum operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(max(a₁,b₁),max(a₂,b₂),…,max(a_W,b_W)) or (y₁,y₂,…,y_W)=(min(a₁,b₁),min(a₂,b₂),…,min(a_W,b_W)).

6.2 Check whether the maximum loop count V_{LdpcRowLength}(v) has been reached. If not, go to step 6.3; if so, go to step 6.6.

6.3 Take the maximum of the first sub-block of the next row of matrix T_v and the currently recorded minimum. The operation can be SIMD-optimized as in step 6.1.

6.4 Take the minimum of the result of step 6.3 and the currently recorded second minimum. The operation can be SIMD-optimized as in step 6.1; record the result as the second minimum.

6.5 Take the minimum of the result of step 6.3 and the currently recorded minimum. The operation can be SIMD-optimized as in step 6.1; record the result as the minimum and the row number of this minimum as the index value, then return to step 6.2.

6.6 Subtract the correction value β from the currently recorded minimum and second minimum. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the currently recorded minimum or second minimum, (b₁,b₂,…,b_W) is the correction value β, which can be written (β,β,…,β), the rectangle denotes the subtraction operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁-β,a₂-β,…,a_W-β). Record the result as the minimum or second minimum.

6.7 Correct the currently recorded minimum and second minimum: when the currently recorded minimum or second minimum is less than zero, set it to zero; otherwise leave it unchanged. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the currently recorded minimum or second minimum, (b₁,b₂,…,b_W) is the zero value, which can be written (0,0,…,0), the rectangle denotes the correction operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(max(a₁,0),max(a₂,0),…,max(a_W,0)). Record the result as the minimum or second minimum.

6.8 Fill the extreme-value variable vector m according to the index value: where the index value differs from the row number of the subvector matrix M_v of m, write the currently recorded minimum at the corresponding position of matrix M_v; where the index value equals the row number of M_v, write the currently recorded second minimum at the corresponding position of M_v. The elements of matrix M_v are then read out in row-major order to form the subvector of the extreme-value variable vector m.
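The extreme-value distribution of step 6 can be sketched column by column as below. This is an illustration, not the register-level procedure: it assumes T_v holds non-negative magnitudes with at least two rows, uses the common running update in which the new minimum is the smaller of the incoming row value and the recorded minimum, and resolves ties in favour of the earlier row. The name `min_distribution` and the sample values are hypothetical.

```python
def min_distribution(T_v, beta):
    """Return M_v: per column, every row gets the corrected minimum, except the
    row that supplied the minimum, which gets the corrected second minimum."""
    n_cols = len(T_v[0])
    min1 = [float("inf")] * n_cols      # recorded minimum per column
    min2 = [float("inf")] * n_cols      # recorded second minimum per column
    index = [-1] * n_cols               # row number of the recorded minimum
    for r, row in enumerate(T_v):
        for c, x in enumerate(row):
            larger = max(x, min1[c])            # maximum step (6.3)
            min2[c] = min(larger, min2[c])      # second-minimum update (6.4)
            if x < min1[c]:
                index[c] = r                    # remember where the minimum came from
            min1[c] = min(x, min1[c])           # minimum update (assumed form of 6.5)
    # Steps 6.6 and 6.7: subtract beta, then clamp negative values to zero.
    min1 = [max(v - beta, 0) for v in min1]
    min2 = [max(v - beta, 0) for v in min2]
    # Step 6.8: fill M_v according to the index value.
    return [[min2[c] if index[c] == r else min1[c] for c in range(n_cols)]
            for r in range(len(T_v))]


T_v = [[5, 2, 7],
       [3, 4, 1],
       [6, 2, 9]]
M_v = min_distribution(T_v, beta=1)
```

For this input `M_v` is `[[2, 1, 0], [4, 1, 6], [2, 1, 0]]`. Each max and min in the inner loop corresponds to one SIMD operation over W columns in the method described above.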

7. Compute the intermediate variable vector s. The process loops over the rows of the selected check matrix H: each pass operates on V_{LdpcRowLength}(v)*W*LdpcBufferNum elements of the temporary variable vector t and stores a result of length V_{LdpcRowLength}(v)*W*LdpcBufferNum in the intermediate variable vector s. The computation is illustrated in Fig. 17: the temporary variable vector t is divided into V_{LdpcRowLength}(v) rows, each row has LdpcBufferNum sub-blocks, and each sub-block contains W basic units. Taking the computation of the intermediate variable vector s for one sub-block as an example, the flowchart is shown in Fig. 16 and the procedure is as follows:

7.1 XOR the first sub-block of the first row of the temporary variable vector t with the first sub-block of the second row. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the first sub-block of the first row, (b₁,b₂,…,b_W) is the first sub-block of the second row, the rectangle denotes the XOR operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁⊕b₁,a₂⊕b₂,…,a_W⊕b_W).

7.2 Check whether the maximum loop count V_{LdpcRowLength}(v) has been reached. If not, go to step 7.3; if so, go to step 7.4, starting again from the first row.

7.3 XOR the first sub-block of the next row of the temporary variable vector t with the result of step 7.1. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the first sub-block of the next row, (b₁,b₂,…,b_W) is the result of step 7.1, the rectangle denotes the XOR operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁⊕b₁,a₂⊕b₂,…,a_W⊕b_W). Return to step 7.2.

7.4 Check whether the maximum loop count V_{LdpcRowLength}(v) has been reached. If not, go to step 7.5; if so, go to step 8.

7.5 XOR the first sub-block of the current row of the temporary variable vector t with the result of step 7.3. The operation can be SIMD-optimized.

7.6 OR the result of step 7.5 with […]. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the result of step 7.5, (b₁,b₂,…,b_W) represents […], the rectangle denotes the OR operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁|b₁,a₂|b₂,…,a_W|b_W). Store the result in the first sub-block of the intermediate variable vector s and return to step 7.4.
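The XOR accumulation of steps 7.1 to 7.5 can be read as a sign computation: XORing the sign bits of all rows and then XORing a row's own sign bit back out leaves the combined sign of the other rows. The sketch below illustrates only that reading and covers only the XOR steps. It assumes the signed values of t are available, works on sign bits extracted from plain integers rather than on raw register contents, and uses the hypothetical name `sign_bits_of_others`.

```python
def sign_bits_of_others(T_rows):
    """For each row and column, XOR the sign bits of all the other rows.

    T_rows is a list of equal-length rows of signed integers.
    A sign bit of 1 means negative.
    """
    signs = [[1 if x < 0 else 0 for x in row] for row in T_rows]
    total = [0] * len(signs[0])
    for row in signs:                       # steps 7.1 to 7.3: XOR over all rows
        total = [a ^ b for a, b in zip(total, row)]
    # Step 7.5: XOR each row's own sign bit back out of the running total.
    return [[t ^ b for t, b in zip(total, row)] for row in signs]


T_rows = [[3, -1, 2],
          [-4, -2, 5],
          [1, 6, -7]]
others = sign_bits_of_others(T_rows)
```

For this input `others` is `[[1, 1, 1], [0, 1, 1], [1, 0, 0]]`; for example, the first entry is 1 because the other two values in that column, -4 and 1, have a negative product. Computing the total once and removing one row at a time avoids recomputing the XOR for every row.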

8. Compute the check-node vector r, a vector of length […]. The check-node vector contains one subvector for each row of the check matrix, with indices from […] to […]; that subvector in turn contains one subvector for each non-"-" element H_i,j. The computation of the check-node vector r loops over the rows of the selected check matrix H: each pass operates on V_{LdpcRowLength}(v)*W*LdpcBufferNum elements of the temporary variable vector t and stores a result of length V_{LdpcRowLength}(v)*W*LdpcBufferNum in the check-node vector r. Within the subvector for each row of the check matrix, the computation proceeds one non-"-" element H_i,j at a time and yields the elements of the vector with indices from […] to […]. Taking the computation of one subvector r' of the check-node vector r corresponding to a non-"-" element H_i,j as an example, the procedure is as follows:

8.1 Check whether the maximum loop count V_{LdpcRowLength}(v) has been reached. If not, go to step 8.2; if so, go to step 9.

8.2 Compare the subvector of the extreme-value variable vector m corresponding to H_i,j with the subvector of the intermediate variable vector s corresponding to H_i,j: where the intermediate variable vector s is less than zero, the result is the complement (negation) of the value of the extreme-value variable vector m; where s is equal to zero, the result is zero; where s is greater than zero, the result is the value of m. The operation can be SIMD-optimized, see Fig. 12, where (a₁,a₂,…,a_W) is the extreme-value variable vector m, (b₁,b₂,…,b_W) is the intermediate variable vector s, the rectangle denotes the comparison operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. […]

8.3 Add the result of step 8.2 to the intermediate variable vector s. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the result of step 8.2, (b₁,b₂,…,b_W) is the intermediate variable vector s, the rectangle denotes the addition operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁+b₁,a₂+b₂,…,a_W+b_W).

8.4 Subtract subvector t' of the temporary variable vector t from the result of step 8.3. The operation can be SIMD-optimized, see Fig. 11, where (a₁,a₂,…,a_W) is the result of step 8.3, (b₁,b₂,…,b_W) is the temporary variable vector t, the rectangle denotes the subtraction operator and (y₁,y₂,…,y_W) is the result of the operation, i.e. (y₁,y₂,…,y_W)=(a₁-b₁,a₂-b₂,…,a_W-b_W). Store the result in the check-node vector r and return to step 8.1.
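Steps 8.2 to 8.4 can be written out element by element as in the sketch below, which transcribes the operations as listed and makes no claim about the data layout in registers. The names `apply_sign` and `check_node_subvector` and the sample values are hypothetical. The per-element rule of step 8.2 (negate, zero or keep, depending on the sign of the second operand) matches the behaviour of the SSSE3 intrinsic `_mm_sign_epi8`; whether the application uses that instruction is not stated.

```python
def apply_sign(m, s):
    """Step 8.2: negate m where s < 0, zero it where s == 0, keep it where s > 0."""
    return [-a if b < 0 else (0 if b == 0 else a) for a, b in zip(m, s)]


def check_node_subvector(m_sub, s_sub, t_sub):
    """Apply steps 8.2 to 8.4 in order, element by element, as listed."""
    step_8_2 = apply_sign(m_sub, s_sub)
    step_8_3 = [a + b for a, b in zip(step_8_2, s_sub)]     # step 8.3: add s
    return [a - b for a, b in zip(step_8_3, t_sub)]         # step 8.4: subtract t'


m_sub = [2, 1, 0, 4]
s_sub = [-1, 0, 1, 1]
t_sub = [1, 1, 1, -1]
signed = apply_sign(m_sub, s_sub)
r_sub = check_node_subvector(m_sub, s_sub, t_sub)
```

For these made-up values `signed` is `[-2, 0, 0, 4]` and `r_sub` is `[-4, -1, 0, 6]`.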

9. Calculate the variable node vector q, a vector of length [formula missing]. The variable node vector contains one subvector for each row of the check matrix, with indices running from [formula missing] to [formula missing]; each such subvector in turn contains one subvector q' for each non-"-" element H_i,j. The subvector q' is calculated as [formula missing]; that is, q' is the sum of the subvector t' of the temporary variable vector t and the subvector r' of the check-node vector r, cyclically shifted according to the value of the element in row i, column j of the check matrix H. To suit SIMD operation, the input subvectors t' and r' have length Z, while the output subvector q' has size [formula missing]; only the first Z elements of the output subvector are valid data. The calculation of q loops over the non-"-" elements of each row of the selected check matrix H; for example, the j-th non-"-" element of row i of the selected check matrix H yields the elements of q at positions [formula missing] to [formula missing]. The calculation steps are as follows:

9.1 Look up in the M_LdpcOffset1 matrix the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = initial position of the subvector q' of the variable node vector q + the corresponding cyclic offset.

9.2 Add the subvector t' of the temporary variable vector t to the subvector r' of the check-node vector r. This operation can be optimized with SIMD, see Figure 11, where (a₁,a₂,…,a_W) represents the temporary variable vector t, (b₁,b₂,…,b_W) represents the check-node vector r, the rectangles represent the adders, and (y₁,y₂,…,y_W) represents the result, i.e. (y₁,y₂,…,y_W)=(a₁+b₁,a₂+b₂,…,a_W+b_W).

9.3 Look up in the M_LdpcRound3 matrix the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the (M_LdpcRound1)_i,j * W data items that follow the initial position of the step 9.2 result to the required start position found in step 9.1.

9.4 Use the M_{LdpcAssemble2} matrix to decide whether padding is needed. If (M_{LdpcAssemble2})_i,j=1, pad according to the offset indicated in the M_{LdpcAssembleTable1} matrix; if (M_{LdpcAssemble2})_i,j=0, no padding is needed.

9.5 Look up in the M_LdpcOffset2 matrix the cyclic offset corresponding to the j-th non-"-" element of row i of the selected check matrix H, and locate the required start position in the variable node vector q: required start position = initial position of the variable node vector q + the corresponding cyclic offset.

9.6 Look up in the M_LdpcRound4 matrix the loop count corresponding to the j-th non-"-" element of row i of the selected check matrix H, and copy the (M_LdpcRound4)_i,j * W data items that follow the initial position of the step 9.2 result to the required start position found in step 9.5.

9.7 Using the number of padding elements indicated by LdpcRemain and the offsets indicated in the M_{LdpcAssembleTable1} matrix, pad the remaining elements of the variable node vector q, then return to step 2.
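The core of steps 9.2 to 9.6, adding t' and r' and then realizing the cyclic shift as two block copies into a padded output buffer, can be sketched as follows. This is a simplified sketch under stated assumptions, not the patented procedure itself: the function name is hypothetical, the shift is assumed to be a left rotation, the output length is assumed to be Z rounded up to a multiple of W, and the padding lanes are simply left at zero instead of being filled from the M_{LdpcAssembleTable1} offsets.

```python
def variable_node_subvector(t_sub, r_sub, shift, W):
    """Return q' = rotate_left(t' + r', shift), padded to a multiple of W lanes."""
    Z = len(t_sub)
    summed = [a + b for a, b in zip(t_sub, r_sub)]   # step 9.2: lane-wise add

    padded_len = -(-Z // W) * W     # assumption: round Z up to a multiple of W
    out = [0] * padded_len          # assumption: padding lanes hold zeros

    first = Z - shift
    out[:first] = summed[shift:]    # first block copy (compare steps 9.1, 9.3)
    out[first:Z] = summed[:shift]   # second block copy (compare steps 9.5, 9.6)
    return out


# Made-up sample data: Z = 6, W = 4, cyclic offset 2.
q_sub = variable_node_subvector([1, 2, 3, 4, 5, 6], [10] * 6, shift=2, W=4)
valid = q_sub[:6]   # only the first Z elements are valid data
```

Splitting the rotation into two contiguous copies is what allows each copy to proceed W elements at a time; the precomputed offset and loop-count matrices play the role of `shift`, `first` and the copy lengths here.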

The above are the LDPC encoding and decoding methods of the present application.

The theory of LDPC encoding and decoding is fairly mature. However, because an LDPC code is a linear block code with a large code length n, its check matrix H is also large and the algorithmic complexity is high. Conventional LDPC encoding and decoding approaches do not readily meet the throughput requirements of IEEE 802.11n systems, which significantly affects system performance. In existing high-speed wireless access systems, LDPC codes are mostly implemented on FPGA and DSP chips. Although these approaches can meet the processing and latency requirements of modern high-speed wireless LAN protocols, programming FPGAs and specialized DSPs is complex and lacks rich programming environments and debugging tools, so their applicability is limited. With a GPP chip, by contrast, developers can work on an ordinary computer, in a familiar architecture and environment, with a rich set of tools such as C/C++ environments. The innovation of this patent is to optimize, on a GPP chip and according to its characteristics, the encoding and decoding methods for the LDPC codes of high-speed wireless LAN systems while keeping the original encoder and decoder. Because the IEEE 802.11n LDPC codes are irregular, the number of non-negative entries in each row of the check matrix prototype is not necessarily the same, so the flexibility of a GPP chip is a considerable advantage over conventional LDPC implementation methods. In addition, given the rapid development of CPUs (Central Processing Units), the data-processing capability of GPP chips will keep improving.

First, SIMD instructions are used to process data in parallel. The SIMD instruction set used in this patent is known as SSE (Streaming SIMD Extensions) on Intel CPUs. Its basic idea is to process multiple data items within one CPU clock cycle to obtain parallel processing, rather than the usual mode of one data-processing operation per clock cycle. CPUs of the Nehalem architecture have a processing width of 128 bits and CPUs of the Sandy Bridge architecture a width of 256 bits; for 8-bit fixed-point numbers, the former can process 16 data items in one instruction cycle and the latter 32, giving a theoretical parallelism of 16 and 32 respectively. Practical programming and simulation results show, however, that the ideal speed-up is often not reached in a running system. One reason is that a program does not consist solely of data operations; it also contains many conditional statements, which cannot be executed in parallel. Another is that, with 128-bit SIMD instructions, the submatrix size of the check matrices of the IEEE 802.11n standard is not a multiple of 16, so the parallelism when processing the last group of data of each submatrix is less than 16.
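The lane arithmetic behind this paragraph can be worked through in a few lines. The helper name is hypothetical; the submatrix sizes Z = 27, 54 and 81 are taken from the IEEE 802.11n LDPC definition (code length divided by 24 block columns) rather than from this passage.

```python
def simd_groups(Z, register_bits=128, element_bits=8):
    """Return (lanes W, number of full W-wide groups, leftover elements) for size Z."""
    W = register_bits // element_bits
    full, rem = divmod(Z, W)
    return W, full, rem


# IEEE 802.11n code lengths; each check matrix has 24 block columns.
leftover = {n: simd_groups(n // 24)[2] for n in (648, 1296, 1944)}
```

With 128-bit registers and 8-bit data the leftovers are 11, 6 and 1 elements, so the last group of every submatrix uses fewer than 16 lanes. These are the same values that claim 6 lists for LdpcRemain.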

Second, look-up tables are used: several parameters that are known in advance are initialized once, and offsets mark the exact position of the data required, trading memory for computational complexity and increasing the data-processing speed of the LDPC encoding and decoding methods. In the LDPC encoder, the block matrices needed for encoding under each code rate and code length can be calculated in advance and stored in look-up tables (LUTs); the tables only need to be read when the program starts, with no repeated calculation.
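A minimal sketch of the look-up-table idea, assuming a toy prototype matrix: the start position Z*(n-1)+H_i,n used in the claims is computed once per non-"-" element and only read afterwards. The prototype `PROTO`, the function name and the table layout are hypothetical, and `None` stands for a "-" entry.

```python
def build_offset_table(proto, Z):
    """For every row, list Z*(n-1) + H[i][n] over its non-'-' entries (n is 1-based)."""
    table = []
    for row in proto:
        # enumerate() is 0-based, so its index already equals n - 1.
        table.append([Z * n0 + h for n0, h in enumerate(row) if h is not None])
    return table


# Toy 2 x 3 prototype, not one of the IEEE 802.11n matrices.
PROTO = [[0, None, 2],
         [None, 1, 0]]

# Built once at start-up, one table per submatrix size; later code only indexes into it.
LUT = {Z: build_offset_table(PROTO, Z) for Z in (27, 54, 81)}
```

Inside the decoding loop an access such as `LUT[27][1][0]` then replaces a multiplication and an addition per element, at the cost of keeping the tables in memory.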

Finally, multithreading is used: more than one thread executes at the same time, which improves the overall processing performance of the system. In the optimized LDPC encoding and decoding methods, operations that work on one row of check matrix data at a time are optimized with multithreading, with the number of threads equal to the number of rows of the check matrix.
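The row-per-thread structure can be sketched with Python's standard thread pool, using the row-by-vector product of claim 1 as the per-row task. All names and the toy data are hypothetical. Two assumptions are made: the data are bits, so adding the shift results is done with XOR, and `None` marks an entry that contributes nothing. In CPython the threads illustrate the structure only and do not give a real speed-up for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_vector(row, vec, Z):
    """Multiply one block row by vec: rotate each Z-block left by A[i][j], then XOR."""
    acc = [0] * Z
    for j, a in enumerate(row):
        if a is None:
            continue
        block = vec[j * Z:(j + 1) * Z]
        shifted = block[a:] + block[:a]   # Z - a items first, then the leading a items
        acc = [x ^ y for x, y in zip(acc, shifted)]
    return acc


def matrix_times_vector(proto, vec, Z):
    """Run one task per block row and concatenate the row results in order."""
    with ThreadPoolExecutor(max_workers=len(proto)) as pool:
        rows = pool.map(lambda row: row_times_vector(row, vec, Z), proto)
        return [bit for row in rows for bit in row]


# Toy example: 2 block rows, 2 block columns, Z = 3.
result = matrix_times_vector([[1, None], [0, 2]], [1, 0, 0, 0, 1, 1], Z=3)
```

Because each row writes only its own slice of the result, the tasks need no locking; `pool.map` returns the row results in submission order, so concatenation is straightforward.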

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An LDPC encoding method based on a general-purpose processor, comprising: obtaining a signal vector S to be encoded by signal acquisition or reception; determining a check matrix H and its block matrices A, B, D, E, F and T, and storing them; determining vectors p₁ and p₂ according to [formula missing], and obtaining the LDPC encoding result vector c=(S, p₁,p₂); characterized in that the multiplication of any matrix by any vector performed when determining the vectors p₁ and p₂ comprises:

taking each row of said matrix as one thread, multiplying the corresponding row of the matrix by said vector, and combining the products of all rows into a result vector;

wherein multiplying each row of said matrix by said vector comprises: determining, for each element j of the current i-th row of the matrix, the corresponding vector start position = start position of said vector + A_i,j + (j-1) * Z; shifting the Z-A_i,j data items of said vector that begin at the start position to the left by means of single instruction, multiple data (SIMD), and moving the leading A_i,j data items, which precede the start position, to after the left-shifted data, to obtain the vector shift result corresponding to element j; and adding the vector shift results corresponding to all elements to give the product of said row and said vector;

in said SIMD processing, the Z-A_i,j data items beginning at the start position are divided into [formula missing] segments of length W, the shift-left operation is performed on the [formula missing] segments in parallel, and the shift-left operation is then performed on the remaining (Z-A_i,j) mod W data items;

Z is the size of the submatrix represented by one element of the check matrix.

2. The method according to claim 1, characterized in that, when said matrix is T^-1, in multiplying each row of T^-1 by the corresponding vector, only the elements of T^-1 whose value is 0 are multiplied by the corresponding vector to obtain the vector shift results corresponding to those elements, and the vector shift results corresponding to the remaining elements are set to the zero vector; the vector shift results corresponding to all elements are then added to give the product of said row and said vector.

3. The method according to claim 1 or 2, characterized in that, after the shift-left operation is performed on W segments of data simultaneously, the first Z data items are taken as valid data.

4. The method according to claim 1 or 2, characterized in that adding the vector shift results corresponding to all elements comprises: dividing the vector shift result corresponding to each element into [formula missing] segments of length W, adding the [formula missing] segments in parallel by means of SIMD, and then adding the remaining (Z-A_i,j) mod W data items.

5. The method according to claim 1 or 2, characterized in that the matrices A, B, D, E, F and T^-1 are stored in a linear look-up table.

6. An LDPC decoding method based on a general-purpose processor, comprising: receiving an encoded LDPC codeword signal c and determining a check matrix H; computing a variable node vector q by multiple iterations as the decoding result, wherein in each iteration a temporary variable vector [formula missing] is computed from the current variable node vector q and check-node vector r, the check-node vector r is updated according to the temporary variable vector t, and the variable node vector q is then updated according to the check-node vector r and the temporary variable vector t as [formula missing]; in the first iteration, the codeword signal c is used as the variable node vector q and the check-node vector r is set to 0; characterized in that,

in each iteration, when computing the temporary variable vector t, the check-node vector r and the variable node vector q, each row of the check matrix is computed and updated as one thread, giving the subvectors of t, q and r corresponding to each row, with indices from [formula missing]; wherein i is the row index of the check matrix; when computing the subvectors of t, r and q for the i-th row of the check matrix, the subvectors of t, q and r corresponding to each non-"-" element H_i,j of that row, with indices from [formula missing], are computed and then concatenated in order to obtain the subvector corresponding to the row; when i=1, let [formula missing]; V_{LdpcRowLength}(v) is the number of non-"-" elements in each row of the check matrix;

the subvector of the temporary variable vector t corresponding to H_i,j is computed as follows: determine the vector start position Z*(n-1)+H_i,n corresponding to H_i,j, and copy, by means of SIMD, the [formula missing] or 6 data items beginning at that start position in the subvector of vector q for row i to the beginning of the subvector of t corresponding to H_i,j; when H_i,n≠0, H_i,n≠'-' and (Z-H_i,n) mod W≠0, determine the value [formula missing] of each element in the row of matrix M_{LdpcAssemble1} corresponding to the check matrix element H_i,j, and copy the elements with indices [formula missing] of the current subvector of q corresponding to H_i,j, in order, to the current position of the subvector of t corresponding to H_i,j; then determine the second vector start position M_LdpcOffset2 corresponding to each element H_i,j, and copy, by means of SIMD, the [formula missing] data items beginning at the second vector start position to the current position of the subvector of t corresponding to H_i,j; take the first Z elements of the subvector of t corresponding to H_i,j and take their absolute values as the valid subvector of t corresponding to H_i,j; wherein M_{LdpcAssemble1} is one of the padding-offset flags calculated from the check matrix, and M_LdpcOffset2 is one of the cyclic offsets calculated from the check matrix;

when H_i,n≠0, H_i,n≠'-' and (Z-H_i,n) mod W≠0, [formula missing]; when H_i,n=0 or H_i,n='-' or (Z-H_i,n) mod W=0, (M_LdpcOffset2)_i,j=Z*(n-1); K is the amount of data the general-purpose processor can process at one time, and k is the size of the basic unit of SIMD processing; when the code length L_LDPC=648, LdpcRemain=11; when the code length L_LDPC=1296, LdpcRemain=6; when the code length L_LDPC=1944, LdpcRemain=1; j is the index of each non-"-" element of row i among all non-"-" elements of that row, and n is the column index in the check matrix of the j-th non-"-" element of row i.

7. The method according to claim 6, characterized in that the subvector of the check-node vector r corresponding to each row of the check matrix is computed as follows:

writing the subvector of the temporary variable vector t corresponding to each row of the check matrix as a matrix T_v with V_{LdpcRowLength}(v) rows and [formula missing] columns, wherein each row of the matrix T_v is the subvector of the temporary variable vector t corresponding to an element H_i,j, with padding applied where the number of columns is insufficient;

determining the element value at the corresponding index of an intermediate matrix R_v' from the elements with the same index in the matrix M_v and the matrix S_v; wherein, if an element of matrix S_v is less than 0, the complement of the element with the same index in matrix M_v is taken and added to said element, and the sum is used as the value of the element of matrix R_v' with the same index; if an element of matrix S_v is equal to 0, said element is added to 0, and the sum is used as the value of the element of matrix R_v' with the same index; if an element of matrix S_v is greater than 0, the element with the same index in matrix M_v is added to said element, and the sum is used as the value of the element of matrix R_v' with the same index; the comparison of the elements of matrix S_v with 0 and the additions are performed by means of SIMD;

subtracting, by means of SIMD, the elements of matrix T_v from the elements of matrix R_v' with the same index, and using the results as the element values at the same index of the matrix R_v of subvectors of the check-node vector r; and reading out the first Z elements of each row of matrix R_v in row-major order to form the subvector of the check-node vector r.

8. The method according to claim 7, characterized in that said extremum assignment comprises:

determining, by means of SIMD, the minimum and second-smallest values of each column of the matrix T_v and the row index corresponding to the minimum; correcting the minimum and second-smallest values obtained by subtracting a preset correction value β; when a corrected minimum or second-smallest value is less than 0, setting it to 0, and otherwise leaving it unchanged;

constructing, from the current minimum, the second-smallest value and the row index of the minimum of each column of the matrix T_v, the column with the same index of the matrix M_v of subvectors of the extremum variable vector m, wherein, in any column of M_v, the element whose row index matches that of the current minimum is set to the determined minimum, and the remaining elements are set to the second-smallest value.

9. The method according to claim 8, characterized in that determining the minimum and second-smallest values of each column and the corresponding row index by means of SIMD comprises:

dividing the elements of each row of the matrix T_v into [formula missing] sub-blocks, each sub-block containing W basic units; when comparing the elements of any two rows of the matrix T_v, W basic units are compared at once by means of SIMD.

10. The method according to claim 7, characterized in that calculating the matrix S_v of subvectors of the intermediate variable vector s comprises:

for each column of the matrix T_v, XORing all elements of the column, XORing the result with the element in row i', ORing the outcome with 0x7f, and using the result of the OR operation as the element in row i' of the column with the same index of the intermediate vector matrix S_v; wherein the elements of each row of the matrix T_v are divided into [formula missing] sub-blocks, each sub-block containing W basic units, the XOR and OR operations on W basic units are performed at once by means of SIMD, and row i' denotes the current row.

11. The method according to claim 6, characterized in that computing the subvector of the variable node vector q corresponding to H_i,j comprises:

determining the vector start position Z*(n-1)+H_i,n corresponding to H_i,j; adding, by means of SIMD, the subvector of the temporary variable vector t corresponding to H_i,j to the subvector of the check-node vector r corresponding to H_i,j, and copying, by means of SIMD, the [formula missing] or 5 data items of the result vector beginning at said start position to the beginning of the subvector of the variable node vector q corresponding to H_i,j; when H_i,n≠0, H_i,n≠'-' and (Z-H_i,n) mod W≠0, determining the value [formula missing] of each element in the row of matrix M_{LdpcAssemble1} corresponding to the check matrix element H_i,j, and copying the elements with indices [formula missing] of the current subvector of q corresponding to H_i,j, in order, to the current position of the subvector of the variable node vector q corresponding to H_i,j;

determining the second vector start position M_LdpcOffset2 corresponding to each element H_i,j, and copying, by means of SIMD, the 0 or [formula missing] data items beginning at the second vector start position to the current position of the subvector of the variable node vector q corresponding to H_i,j;

padding, by the number of padding elements indicated by LdpcRemain, according to the values of the elements in the row of M_{LdpcAssemble1} corresponding to the check matrix element H_i,j.

12. The method according to any one of claims 6 to 11, characterized in that the following are precomputed and stored: the vector start position Z*(n-1)+H_i,n and the second vector start position M_LdpcOffset2 corresponding to each element H_i,j, the matrix M_{LdpcAssemble1}, the vector V_{LdpcRowLength} formed by the numbers of non-"-" elements in the rows of the check matrix, and LdpcRemain.