CN103746771A

CN103746771A - Data format conversion method of channel coding and decoding based on GPP and SIMD technologies

Info

Publication number: CN103746771A
Application number: CN201310729424.7A
Authority: CN
Inventors: 牛凯; 丁忆南; 贺志强
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2013-12-26
Filing date: 2013-12-26
Publication date: 2014-04-23
Anticipated expiration: 2033-12-26
Also published as: CN103746771B

Abstract

The invention relates to a data format conversion method for channel coding and decoding based on GPP and SIMD technologies. Before coding, an input data stream A_0 A_1 A_2 ... A_{n-1} with a word length of n bytes is first packed into s SIMD-format data items of length M, so that its data format is parallelized and suited to SIMD instructions; a parallel "mapping" operation is executed, then a parallel "AND" operation, and finally a parallel "select smaller value" operation, generating an output data stream B_0 B_1 B_2 ... B_{8n-1} with a word length of 8n bytes. After decoding, an input data stream C_0 C_1 C_2 ... C_{8n-1} with a word length of 8n bytes is first packed into 8s SIMD-format data items of length M; a parallel "equality test" operation is executed, then a parallel "highest-bit combination" operation, generating an output data stream D_0 D_1 D_2 ... D_{n-1} with a word length of n bytes. By using SIMD parallel instructions, the method greatly accelerates data format conversion while preserving transmission performance and the correctness of coding and decoding. The method is also low in cost, portable, easy to debug and easy to upgrade.

Description

A data format conversion method for channel coding and decoding based on GPP and SIMD technologies

Technical field

The present invention relates to a data format conversion method for channel coding and decoding based on general-purpose processor (GPP) and single instruction, multiple data (SIMD) technologies, and belongs to the technical field of coding and decoding in communications.

Background technology

Channel coding techniques such as Turbo codes, low-density parity-check (LDPC) codes and convolutional codes have, owing to their superior error-correcting performance, been widely adopted over the past ten years in high-speed wireless communication standards (3G and 4G systems). For example, Long Term Evolution (LTE) and LTE-Advanced (LTE-A) systems use Turbo codes and convolutional codes, and 802.11 systems use LDPC codes and convolutional codes.

Encoding and decoding for such channels has high time complexity and consumes a large amount of computing time, yet emerging wireless communication standards target large data volumes. Traditional communication implementations are mostly based on hardware processing platforms, which have several problems: high cost, a limited range of application, tedious debugging, long development cycles and inconvenient program upgrades.

Over the past five years, software-defined radio based on GPP platforms has gradually matured. While it overcomes the above shortcomings of hardware platforms, software radio is bottlenecked by computation speed. Reducing the computational complexity of channel coding and decoding, and with it the latency, has become the main way to break the transmission-rate bottleneck of the communication system.

On a GPP platform, data is transferred and stored with the byte as the basic unit, so the byte is usually also the smallest unit of computation. In a communication system, however, information is stored and processed bit by bit: one bit represents one unit of information, which is called the bit format. At present, the most efficient channel coding and decoding algorithms on GPP platforms all use the byte as the smallest unit of computation, that is, one byte represents one bit of information, which is called the byte format. Channel coding must therefore include a data format conversion function that converts a data stream from bit format to byte format. Conversion between bit format and byte format is thus an unavoidable function at the front end of channel coding and at the back end of channel decoding.

The purpose of the data format conversion for channel coding is to convert an input data stream A_0 A_1 A_2 ... A_{n-1} in bit format, with a word length of n bytes, into an output data stream B_0 B_1 B_2 ... B_{8n-1} in byte format, with a word length of 8n bytes. Every A_g (0≤g≤n-1) and every B_h (0≤h≤8n-1) is one byte, i.e. 8 bits long; the smaller its index, the earlier its conversion is completed. Here A_g = (a_{8g} a_{8g+1} a_{8g+2} a_{8g+3} a_{8g+4} a_{8g+5} a_{8g+6} a_{8g+7}), where every element a is one bit and the smaller the index of a, the closer it lies to the low-order end of its byte. B_h = (a_h 0000000), where a_h is the h-th bit of the input data stream A and occupies the lowest bit of byte B_h.

The data format conversion for channel decoding is the inverse of the above: an input data stream C_0 C_1 C_2 ... C_{8n-1} in byte format, with a word length of 8n bytes, is converted into an output data stream D_0 D_1 D_2 ... D_{n-1} in bit format, with a word length of n bytes. Every C_l (0≤l≤8n-1) and every D_e (0≤e≤n-1) is one byte, i.e. 8 bits long; the smaller its index, the earlier its conversion is completed. Here C_l = (d_l 0000000), where d_l is the l-th bit of the output data stream D and occupies the lowest bit of byte C_l; D_e = (d_{8e} d_{8e+1} d_{8e+2} d_{8e+3} d_{8e+4} d_{8e+5} d_{8e+6} d_{8e+7}), where every element d is one bit and the smaller the index of d, the closer it lies to the low-order end of its byte.

The conventional method of data format conversion for channel coding on a GPP architecture is based on "shift" and "AND" operations. Taking byte A_0 = (a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7) as an example, converting it into the 8 bytes B_0 B_1 B_2 B_3 B_4 B_5 B_6 B_7 requires looping over the following operation 8 times: B_f = (A_0 << f) & 1, where f is the byte index and 0≤f≤7. For example, when f=4, B_4 = (A_0 << 4) & 1 = (a_4 a_5 a_6 a_7 0000) & (10000000) = (a_4 0000000). Because bytes are written here with the lowest bit first, "<<" moves bits toward the low-order end of the byte. Converting the input data stream A_0 A_1 A_2 ... A_{n-1} into the output data stream B_0 B_1 B_2 ... B_{8n-1} therefore takes n iterations, each converting one input byte: the g-th iteration (0≤g≤n-1) converts A_g into B_{8g} B_{8g+1} B_{8g+2} B_{8g+3} B_{8g+4} B_{8g+5} B_{8g+6} B_{8g+7}. Each such iteration contains 8 sub-iterations, each converting one bit: the f-th sub-iteration (0≤f≤7) performs B_{8g+f} = (A_g << f) & 1.
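The nested loop described above can be modelled as follows. This is a minimal Python sketch, not code from the patent; the function name `unpack_bits_scalar` is a hypothetical choice. Python integers are written highest bit first, so the shift toward the low-order end that the text writes as "<<" appears here as `>>`.

```python
def unpack_bits_scalar(data: bytes) -> bytes:
    """Conventional bit-to-byte conversion: one output byte (0 or 1) per input bit."""
    out = bytearray()
    for byte in data:          # outer loop: one iteration per input byte A_g
        for f in range(8):     # inner loop: one iteration per bit of A_g
            # Move bit f to the lowest position, then keep only that bit.
            out.append((byte >> f) & 1)
    return bytes(out)


# 0b00010011 has bits 0, 1 and 4 set.
print(list(unpack_bits_scalar(bytes([0b00010011]))))
```

Each of the 8n output bytes costs one shift and one AND on a single byte, which is the inefficiency the method sets out to remove.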

The conventional method of data format conversion for channel decoding on a GPP architecture is based on "shift" and exclusive-OR (XOR) operations. Taking the 8 bytes C_0 C_1 C_2 C_3 C_4 C_5 C_6 C_7 as an example, where C_q = (d_q 0000000), converting them into the single byte D_0 proceeds as follows: first set D_0 = 0 = (00000000), then loop over the following operation 8 times: D_0 = D_0 ^ (C_q >> q), where q is the byte index and 0≤q≤7. For example, when q=4, D_0 = D_0 ^ (C_4 >> 4) = (d_0 d_1 d_2 d_3 0000) ^ (0000 d_4 000) = (d_0 d_1 d_2 d_3 d_4 000). Converting the input data stream C_0 C_1 C_2 ... C_{8n-1} into the output data stream D_0 D_1 D_2 ... D_{n-1} takes n iterations, each converting 8 input bytes: the e-th iteration (0≤e≤n-1) converts C_{8e} C_{8e+1} C_{8e+2} C_{8e+3} C_{8e+4} C_{8e+5} C_{8e+6} C_{8e+7} into D_e. The e-th iteration first sets D_e = 0 and then runs 8 sub-iterations, each converting one bit: the q-th sub-iteration (0≤q≤7) performs D_e = D_e ^ (C_{8e+q} >> q).
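The inverse loop can be sketched in the same way. This is an illustrative Python model under the same assumptions as the text (every input byte is 0 or 1); `pack_bits_scalar` is a hypothetical name, and the extra `& 1` mask is a defensive addition that the text does not describe. The shift toward the high-order end, written ">>" in the text's lowest-bit-first notation, is `<<` on Python integers.

```python
def pack_bits_scalar(bits: bytes) -> bytes:
    """Conventional byte-to-bit conversion: 8 input bytes (0 or 1) per output byte."""
    if len(bits) % 8:
        raise ValueError("input length must be a multiple of 8")
    out = bytearray()
    for e in range(len(bits) // 8):   # outer loop: one iteration per output byte D_e
        d = 0                         # D_e starts at zero
        for q in range(8):            # inner loop: one iteration per bit
            # Place bit q of D_e; XOR is safe because the target bit is still 0.
            d ^= (bits[8 * e + q] & 1) << q
        out.append(d)
    return bytes(out)


print(pack_bits_scalar(bytes([1, 1, 0, 0, 1, 0, 0, 0])).hex())
```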

The shortcoming of both conventional methods is that each operation processes only one byte, so the "shift", "AND" and "XOR" operations are inefficient. How to raise the efficiency of coding and decoding and solve the processing-speed problem has therefore become a focus of attention for practitioners in the field.

Single instruction, multiple data (SIMD) is a technique in which one controller controls multiple processing elements that simultaneously perform the same operation on each element of a group of data (also called a "data vector"), thereby achieving spatial parallelism. In microprocessors, SIMD means one controller driving multiple parallel processing elements, as in Intel's MMX and SSE and AMD's 3DNow!.

Summary of the invention

In view of this, the object of the present invention is to provide a data format conversion method for channel coding and decoding based on general-purpose processor (GPP) and single instruction, multiple data (SIMD) technologies. While preserving transmission performance and the correctness of coding and decoding, the method redesigns the conversion algorithm to suit SIMD and uses SIMD parallel instructions to greatly accelerate conversion. Because the invention is implemented on GPP chips, it is low in cost, portable, simple to debug and easy to upgrade.

To achieve the above object, the invention provides a data format conversion method before channel coding based on GPP and SIMD technologies, characterized in that: the input data stream A_0 A_1 A_2 ... A_{n-1}, with a word length of n bytes, is first packed into s SIMD-format data items of length M, parallelizing its data format so that SIMD instructions can be applied, and a parallel "mapping" operation is performed on it: each byte of the input data stream is replicated into 8 bytes, yielding the first intermediate data stream E_0 E_1 ... E_{8n-1}; a parallel "AND" operation is then performed on the first intermediate data stream E_0 E_1 ... E_{8n-1} to extract each bit, yielding the second intermediate data stream F_0 F_1 ... F_{8n-1}; finally, a parallel "select smaller value" operation is performed on the second intermediate data stream F_0 F_1 ... F_{8n-1}, moving each bit to the lowest bit of its byte and generating the output data stream B_0 B_1 B_2 ... B_{8n-1} with a word length of 8n bytes; here the byte length n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.

The method comprises the following operating steps:

(1) Execute the parallel "mapping" SIMD instruction to replicate the data:

The "mapping" SIMD instruction is applied to the s SIMD packed data items in turn. The X-input packed data of the instruction is A_{tM+0}, A_{tM+1}, ..., A_{tM+M-1}, and each X input takes part in 8 inner-loop iterations; here t is the index of the outer loop over the "mapping" SIMD instruction, with range [0, s-1], and u is the index of the inner loop, with range [0, 7]. The Y-input packed data of the "mapping" SIMD instruction in the u-th inner iteration of the t-th outer iteration is:

Y = \left\{ \underbrace{\tfrac{M}{8}u, \ldots, \tfrac{M}{8}u}_{8}, \underbrace{\tfrac{M}{8}u+1, \ldots, \tfrac{M}{8}u+1}_{8}, \ldots, \underbrace{\tfrac{M}{8}u+\tfrac{M}{8}-1, \ldots, \tfrac{M}{8}u+\tfrac{M}{8}-1}_{8} \right\},

After this "mapping" SIMD instruction completes, the first intermediate data stream Z obtained is:

E_{8tM+uM+0}, E_{8tM+uM+1}, ..., E_{8tM+uM+M-1};

(2) Execute the parallel "AND" SIMD instruction to extract the bits:

The parallel "AND" SIMD instruction is applied to the 8s packed data items in turn. The X-input packed data of the "AND" SIMD instruction is the first intermediate data stream Z: E_{rM+0}, E_{rM+1}, ..., E_{rM+M-1}, and the Y-input packed data of the "AND" SIMD instruction is:

{1,2,4,8,16,32,64,128,1,2,4,8,16,32,64,128, ..., 1,2,4,8,16,32,64,128}; here r is the index of the "AND" SIMD instruction, with range [0, 8s-1]. After this "AND" SIMD instruction completes, the second intermediate data stream Z obtained is: F_{rM+0}, F_{rM+1}, ..., F_{rM+M-1};

(3) Execute the parallel "select smaller value" SIMD instruction to move each significant bit to the lowest bit of its byte:

The "select smaller value" SIMD instruction is applied to the 8s SIMD packed data items in turn. The X-input packed data of the instruction is F_{jM+0}, F_{jM+1}, ..., F_{jM+M-1}, and the Y-input packed data is set to {1,1,...,1}; here j is the index of the "select smaller value" SIMD instruction, with range [0, 8s-1]. After this "select smaller value" SIMD instruction completes, the final output data stream Z obtained is: B_{jM+0}, B_{jM+1}, ..., B_{jM+M-1}.
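Steps (1) to (3) can be modelled lane by lane to check the index arithmetic. The following is a minimal Python sketch, not the patented implementation: lists of integers stand in for SIMD registers, the helper names (`shuffle`, `vand`, `vmin`, `encode_front_end`) and the default M = 16 are assumptions, and the three steps are fused per vector rather than run as three passes over the whole stream, which gives the same output.

```python
def shuffle(x, y):
    """'Mapping': lane i of the result is x[y[i]]."""
    return [x[idx] for idx in y]


def vand(x, y):
    """Lane-wise bitwise AND."""
    return [a & b for a, b in zip(x, y)]


def vmin(x, y):
    """Lane-wise unsigned minimum."""
    return [min(a, b) for a, b in zip(x, y)]


def encode_front_end(data: bytes, m: int = 16) -> bytes:
    """Bit format -> byte format using the mapping / AND / min sequence."""
    if m % 8 or len(data) % m:
        raise ValueError("m must be a multiple of 8 and divide len(data)")
    mask = [1 << (i % 8) for i in range(m)]   # {1,2,4,...,128} repeated
    ones = [1] * m
    out = bytearray()
    for t in range(len(data) // m):           # outer loop over packed inputs
        x = list(data[t * m:(t + 1) * m])
        for u in range(8):                    # inner loop: 8 mappings per input
            # Y vector: each index (M/8)u + j repeated 8 times, j = 0..M/8-1.
            y = [(m // 8) * u + i // 8 for i in range(m)]
            e = shuffle(x, y)                 # step (1): replicate bytes
            f = vand(e, mask)                 # step (2): isolate one bit per lane
            out.extend(vmin(f, ones))         # step (3): clamp to 0 or 1
    return bytes(out)


print(list(encode_front_end(bytes([0x13] * 16))[:8]))
```

For M = 16 the Y vector reduces to eight copies of 2u followed by eight copies of 2u+1.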

To achieve the above object, the invention also provides a data format conversion method after channel decoding based on GPP and SIMD technologies, characterized in that: the input data stream C_0 C_1 C_2 ... C_{8n-1}, with a word length of 8n bytes, is packed into 8s SIMD-format data items of length M, parallelizing its data format so that SIMD instructions can be applied, and a parallel "equality test" operation is performed on it, moving each bit to the highest bit of its byte and yielding the intermediate data stream G_0 G_1 ... G_{8n-1}; a parallel "highest-bit combination" operation is then performed on the intermediate data stream G_0 G_1 ... G_{8n-1}, generating the output data stream D_0 D_1 D_2 ... D_{n-1} with a word length of n bytes; here n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.

The method comprises the following operating steps:

(1) Use the parallel "equality test" SIMD instruction to move each significant bit to the highest bit of its byte:

The "equality test" SIMD instruction is applied to the 8s SIMD packed data items in turn. The X-input packed data of the instruction is C_{kM+0}, C_{kM+1}, ..., C_{kM+M-1}, and the Y-input packed data is {1,1,...,1}; here k is the index of the "equality test" SIMD instruction, with range [0, 8s-1]. After this "equality test" SIMD instruction completes, the intermediate data stream Z obtained is: G_{kM+0}, G_{kM+1}, ..., G_{kM+M-1};

(2) Use the parallel "highest-bit combination" SIMD instruction to merge the highest bits of every 8 consecutive bytes into 1 byte:

The "highest-bit combination" SIMD instruction is applied to the s SIMD packed data items in turn. The X-input packed data of the instruction is G_{8wM+0}, G_{8wM+1}, ..., G_{8wM+8M-1}; here w is the index of the "highest-bit combination" SIMD instruction, with range [0, s-1]. After this "highest-bit combination" SIMD instruction completes, the final output data stream Z obtained is: D_{wM+0}, D_{wM+1}, ..., D_{wM+M-1}.
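The two decoding steps can be checked with the same kind of lane model. This Python sketch is illustrative only: `cmpeq`, `msb_combine` and `decode_back_end` are hypothetical names, lists stand in for SIMD registers, and M = 16 is an assumed register width in bytes.

```python
def cmpeq(x, y):
    """'Equality test': 255 where lanes are equal, else 0."""
    return [255 if a == b else 0 for a, b in zip(x, y)]


def msb_combine(x):
    """'Highest-bit combination': pack the top bit of every 8 lanes into one byte."""
    out = []
    for i in range(0, len(x), 8):
        z = 0
        for k in range(8):
            z |= ((x[i + k] >> 7) & 1) << k   # lane 8i+k supplies bit k
        out.append(z)
    return out


def decode_back_end(bits: bytes, m: int = 16) -> bytes:
    """Byte format -> bit format using the equality-test / combine sequence."""
    if len(bits) % (8 * m):
        raise ValueError("len(bits) must be a multiple of 8*m")
    ones = [1] * m
    out = bytearray()
    for w in range(len(bits) // (8 * m)):     # one combine per 8 packed inputs
        g = []
        for k in range(8):                    # step (1) on 8 consecutive vectors
            start = (8 * w + k) * m
            g.extend(cmpeq(list(bits[start:start + m]), ones))
        out.extend(msb_combine(g))            # step (2): 8M lanes -> M bytes
    return bytes(out)


bits = bytes((b >> k) & 1 for b in range(16) for k in range(8))
print(decode_back_end(bits) == bytes(range(16)))
```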

The key innovation of the method is that it makes full use of the multi-core, multiprocessor features of GPP chips to achieve high-speed, general-purpose optimized processing for channel coding and decoding. Compared with the traditional method based on shift operations, processing speed is greatly increased. The data conversion before channel coding applies a mapping operation to the input data stream and packs it into the SIMD data format, parallelizing the data format so that SIMD instructions and SIMD algorithms can be applied and the degree of parallelism is raised. The advantage of the data format conversion after channel decoding is that memory accesses are saved and the conversion flow is simplified.

Another innovation of the invention is the use, on a GPP chip, of SIMD technology to process data in parallel and raise computing speed. Every SIMD instruction can operate in parallel on two groups (or on only one of them) of packed data each containing M data elements, (X_0, X_1, ..., X_{M-1}) and (Y_0, Y_1, ..., Y_{M-1}), so that every pair of data elements X_i, Y_i (0≤i≤M-1) undergoes the same operation simultaneously (in the method of the invention, one of: mapping, AND, select smaller value, equality test, highest-bit combination). The M results are in turn packed as data elements into one group of data (Z_0, Z_1, ..., Z_{M-1}). Using SIMD instructions can therefore markedly raise the speed of data operations on a GPP chip.

In summary, the invention has good prospects for popularization and application.

Brief description of the drawings

Fig. 1 is a flow chart of the operating steps of the data format conversion method before channel coding according to the invention.

Fig. 2 is a schematic diagram of the operation of the "mapping" SIMD instruction.

Fig. 3 is a schematic diagram of the operation of the "AND" SIMD instruction.

Fig. 4 is a schematic diagram of the operation of the "select smaller value" SIMD instruction.

Fig. 5 is a flow chart of the operating steps of the data format conversion method after channel decoding according to the invention.

Fig. 6 is a schematic diagram of the operation of the "equality test" SIMD instruction.

Fig. 7 is a schematic diagram of the operation of the "highest-bit combination" SIMD instruction.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

When each instruction is executed, the SIMD technology used in the method performs, in parallel on two groups of SIMD packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, each containing M elements, various operations including the equality test and the highest-bit combination (an SIMD instruction may also process only one group of packed data). Every pair of data elements X_i and Y_i undergoes the same operation simultaneously, where i is the index of the data within the SIMD packed data, with range [0, M-1]. The M results are in turn packed as data elements into one group of SIMD-format packed data Z_0, Z_1, ..., Z_{M-1}. The length M of the SIMD packed data depends on the bit length P of the packed data and the bit length Q of the data element type, and is computed as

M = P / Q,

where the bit length of the packed data is P = 64 × 2^p and p is a natural number; when the data element type is a byte, Q = 8, and when the data element type is a word, Q = 16.
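The relation between the register width P, the element width Q and the lane count M can be tabulated with a few lines of Python. This is a small illustrative sketch; the function name `lane_count` is hypothetical, it takes Q = 8 for byte elements (consistent with the 128-bit embodiment, which holds 16 bytes), and it assumes that p = 0 is permitted.

```python
def lane_count(p: int, q_bits: int) -> int:
    """Number of elements M in one packed item of P = 64 * 2**p bits."""
    if p < 0:
        raise ValueError("p must be a non-negative integer")
    if q_bits not in (8, 16):
        raise ValueError("q_bits must be 8 (byte) or 16 (word)")
    p_bits = 64 * 2 ** p          # bit length P of the packed data
    return p_bits // q_bits       # M = P / Q


for p in range(3):
    print(64 * 2 ** p, "bits:", lane_count(p, 8), "bytes or", lane_count(p, 16), "words")
```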

The data format conversion method before channel coding based on GPP and SIMD technologies according to the invention is as follows: the input data stream A_0 A_1 A_2 ... A_{n-1}, with a word length of n bytes, is first packed into s SIMD-format data items of length M, parallelizing its data format so that SIMD instructions can be applied, and a parallel "mapping" operation is performed on it: each byte of the input data stream is replicated into 8 bytes, yielding the first intermediate data stream E_0 E_1 ... E_{8n-1}; a parallel "AND" operation is then performed on the first intermediate data stream E_0 E_1 ... E_{8n-1} to extract each bit, yielding the second intermediate data stream F_0 F_1 ... F_{8n-1}; finally, a parallel "select smaller value" operation is performed on the second intermediate data stream F_0 F_1 ... F_{8n-1}, moving each bit to the lowest bit of its byte and generating the output data stream B_0 B_1 B_2 ... B_{8n-1} with a word length of 8n bytes; here the byte length n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.

Referring to Fig. 1, the specific operating steps of the above method are as follows:

Step 1: execute the parallel "mapping" SIMD instruction to replicate the data. The Y-input packed data of the "mapping" SIMD instruction in the u-th inner iteration of the t-th outer iteration is:

Y = \left\{ \underbrace{\tfrac{M}{8}u, \ldots, \tfrac{M}{8}u}_{8}, \underbrace{\tfrac{M}{8}u+1, \ldots, \tfrac{M}{8}u+1}_{8}, \ldots, \underbrace{\tfrac{M}{8}u+\tfrac{M}{8}-1, \ldots, \tfrac{M}{8}u+\tfrac{M}{8}-1}_{8} \right\},

and the first intermediate data stream Z obtained after the instruction completes is:

E_{8tM+uM+0}, E_{8tM+uM+1}, ..., E_{8tM+uM+M-1}.

Taking 128-bit SIMD packed data as an embodiment: the X-input packed data of the "mapping" SIMD instruction is the 16 consecutive bytes A_0 A_1 A_2 A_3 A_4 A_5 A_6 A_7 A_8 A_9 A_10 A_11 A_12 A_13 A_14 A_15 of stream A. The instruction is executed 8 times, and the Y-input packed data of each "mapping" SIMD instruction is:

Y = {2u, 2u, 2u, 2u, 2u, 2u, 2u, 2u, 2u+1, 2u+1, 2u+1, 2u+1, 2u+1, 2u+1, 2u+1, 2u+1}, where 0≤u≤7. The Z output of each "mapping" SIMD instruction is the following data of the intermediate stream E:

E_{16u} E_{16u+1} E_{16u+2} E_{16u+3} E_{16u+4} E_{16u+5} E_{16u+6} E_{16u+7} E_{16u+8} E_{16u+9} E_{16u+10} E_{16u+11} E_{16u+12} E_{16u+13} E_{16u+14} E_{16u+15}, where the first 8 bytes all equal A_{2u} and the last 8 bytes all equal A_{2u+1}. This completes the operation of replicating each original byte into 8 bytes stored contiguously.

Step 2: execute the parallel "AND" SIMD instruction to extract the bits:

In the embodiment, the X-input packed data of the "AND" SIMD instruction is the stream E data generated in Step 1, the Y-input packed data is Y = {1,2,4,8,16,32,64,128,1,2,4,8,16,32,64,128, ...}, and the Z output of the "AND" SIMD instruction is the intermediate stream F data.

Step 3: execute the parallel "select smaller value" SIMD instruction to move each significant bit to the lowest bit of its byte:

In the embodiment, the X-input packed data of the "select smaller value" SIMD instruction is the stream F data generated in Step 2, the Y-input packed data is Y = {1,1,1,1, ...}, and the Z output of the "select smaller value" SIMD instruction is the final stream B data.

Referring to Fig. 2, the operation of the "mapping" SIMD instruction is as follows: for the two input groups of SIMD packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, the "mapping" SIMD instruction is executed in parallel on all M pairs of data, and the output SIMD packed data is Z_0, Z_1, ..., Z_{M-1}. The i-th element of the output is the value found in X_0, X_1, ..., X_{M-1} using Y_i as the index:

Z_i = X_{Y_i},

where i is the index of the data within the SIMD packed data, with range [0, M-1].
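The index lookup performed by the "mapping" instruction can be illustrated as follows. This is a minimal Python sketch with lists standing in for a 128-bit register of 16 bytes; `simd_map` is a hypothetical helper name. As an observation that does not come from the text, a byte-shuffle instruction such as SSSE3 PSHUFB behaves in a similar way when all indices are below 16.

```python
def simd_map(x, y):
    """Lane i of the result is the element of x selected by index y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of lanes")
    return [x[idx] for idx in y]


x = list(range(100, 116))                # stand-in values for A_0 .. A_15
u = 3                                    # inner-loop index, 0 <= u <= 7
y = [2 * u] * 8 + [2 * u + 1] * 8        # Y vector of the 128-bit embodiment
z = simd_map(x, y)
print(z)                                 # eight copies of x[6], then eight of x[7]
```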

Referring to Fig. 3, the operation of the "AND" SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, the "AND" SIMD instruction is executed in parallel on all M pairs of data, and the output SIMD packed data is Z_0, Z_1, ..., Z_{M-1}, where Z_i = X_i & Y_i and i is the index of the data within the SIMD packed data, with range [0, M-1].

Referring to Fig. 4, the operation of the "select smaller value" SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, the "select smaller value" SIMD instruction is executed in parallel on all M pairs of data, and the output SIMD packed data is Z_0, Z_1, ..., Z_{M-1}, where Z_i = min(X_i, Y_i) and i is the index of the data within the SIMD packed data, with range [0, M-1].
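The combined effect of the "AND" and "select smaller value" operations of Figs. 3 and 4 on eight copies of one byte can be traced with a short Python sketch. The helper names `simd_and` and `simd_min` are hypothetical and lists stand in for registers. The minimum must treat lanes as unsigned bytes, since the value 128 would be negative as a signed byte; on SSE2 hardware the unsigned byte minimum is PMINUB, which is an observation rather than something the text states.

```python
def simd_and(x, y):
    """Lane-wise bitwise AND: Z_i = X_i & Y_i."""
    return [a & b for a, b in zip(x, y)]


def simd_min(x, y):
    """Lane-wise unsigned minimum: Z_i = min(X_i, Y_i)."""
    return [min(a, b) for a, b in zip(x, y)]


e = [0b10110010] * 8                         # one input byte replicated 8 times
mask = [1, 2, 4, 8, 16, 32, 64, 128]         # selects bit i in lane i
f = simd_and(e, mask)                        # non-zero where the bit is set
b = simd_min(f, [1] * 8)                     # clamps every non-zero lane to 1
print(f)
print(b)
```

Replacing a per-lane variable shift with a minimum works because each lane after the AND is either zero or a single power of two.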

The data format conversion method after channel decoding based on GPP and SIMD technologies according to the invention is as follows: the input data stream C_0 C_1 C_2 ... C_{8n-1}, with a word length of 8n bytes, is packed into 8s SIMD-format data items of length M, parallelizing its data format so that SIMD instructions can be applied, and a parallel "equality test" operation is performed on it, moving each bit to the highest bit of its byte and yielding the intermediate data stream G_0 G_1 ... G_{8n-1}; a parallel "highest-bit combination" operation is then performed on the intermediate data stream G_0 G_1 ... G_{8n-1}, generating the output data stream D_0 D_1 D_2 ... D_{n-1} with a word length of n bytes; here n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.

Referring to Fig. 5, the specific operating steps of the data format conversion method after channel decoding are as follows:

Step 1: use the parallel "equality test" SIMD instruction to move each significant bit to the highest bit of its byte:

The "equality test" SIMD instruction is applied to the 8s SIMD packed data items in turn. The X-input packed data of the instruction is C_{kM+0}, C_{kM+1}, ..., C_{kM+M-1}, and the Y-input packed data is {1,1,...,1}; here k is the index of the "equality test" SIMD instruction, with range [0, 8s-1]. After this "equality test" SIMD instruction completes, the intermediate data stream Z output is: G_{kM+0}, G_{kM+1}, ..., G_{kM+M-1}.

Step 2: use the parallel "highest-bit combination" SIMD instruction to merge the highest bits of every 8 consecutive bytes into 1 byte:

Taking 128-bit SIMD packing as an embodiment, the X-input packed data of the "highest-bit combination" SIMD instruction is the stream G data G_0 G_1 G_2 G_3 G_4 G_5 G_6 G_7 G_8 G_9 G_10 G_11 G_12 G_13 G_14 G_15. After the "highest-bit combination" SIMD instruction completes, the Z output obtained is the stream D data:

D_0 D_1 = (d_0 d_1 d_2 d_3 d_4 d_5 d_6 d_7)(d_8 d_9 d_10 d_11 d_12 d_13 d_14 d_15), where d_o is the highest bit of byte G_o and 0≤o≤15.

Referring to Fig. 6, the operation of the "equality test" SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, the "equality test" SIMD instruction is executed in parallel on all M pairs of data, and the output SIMD packed data is Z_0, Z_1, ..., Z_{M-1}, where Z_i = (X_i == Y_i) ? 255 : 0. The expression (X_i == Y_i) ? 255 : 0 on the right-hand side is the conditional operator of programming languages and means: if X_i == Y_i holds, i.e. X_i equals Y_i, then Z_i is assigned 255; if X_i == Y_i does not hold, i.e. X_i differs from Y_i, then Z_i is assigned 0. Here i is the index of the data within the SIMD packed data, with range [0, M-1].
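The conditional above is easy to model per lane. The following Python sketch is illustrative only; `simd_cmpeq` is a hypothetical name and lists stand in for registers. Comparing each byte-format input (0 or 1) against 1 yields 255 for a one bit, so the information ends up in the highest bit of every lane.

```python
def simd_cmpeq(x, y):
    """Lane-wise equality test: Z_i = 255 if X_i == Y_i else 0."""
    return [255 if a == b else 0 for a, b in zip(x, y)]


c = [1, 0, 0, 1, 1, 1, 0, 0]        # byte-format input, one bit per byte
g = simd_cmpeq(c, [1] * 8)          # Y input is all ones
print(g)
print([v >> 7 for v in g])          # highest bit of each lane reproduces c
```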

Referring to Fig. 7, the operation of the "highest-bit combination" SIMD instruction is as follows: for the input packed data X_0, X_1, ..., X_{8M-1}, the highest-bit combination is performed in parallel M times, and the output packed data is Z_0, Z_1, ..., Z_{M-1}, where

Z_i = ((X_{8i} & 0x80) << 7) ^ ((X_{8i+1} & 0x80) << 6) ^ ((X_{8i+2} & 0x80) << 5) ^ ((X_{8i+3} & 0x80) << 4) ^ ((X_{8i+4} & 0x80) << 3) ^ ((X_{8i+5} & 0x80) << 2) ^ ((X_{8i+6} & 0x80) << 1) ^ (X_{8i+7} & 0x80).

In the formula, i is the index of the data within the SIMD packed data, with range [0, M-1]; as in the worked example for the conventional coding method, "<<" denotes a shift toward the low-order end of the byte.
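The combination formula can be checked with a lane model. This Python sketch is an illustration, not the patented implementation; `simd_msb_combine` is a hypothetical name. Python integers are written highest bit first, so moving the top bit of lane 8i+k to bit position k is expressed as a right shift by 7 followed by a left shift by k. As an observation not taken from the text, the SSE2 instruction PMOVMSKB gathers the top bits of 16 byte lanes in a comparable way.

```python
def simd_msb_combine(x):
    """Pack the highest bit of each group of 8 lanes into one output byte."""
    if len(x) % 8:
        raise ValueError("number of lanes must be a multiple of 8")
    out = []
    for i in range(len(x) // 8):
        z = 0
        for k in range(8):
            top = (x[8 * i + k] & 0x80) >> 7   # highest bit of lane 8i+k
            z ^= top << k                      # becomes bit k of Z_i
        out.append(z)
    return out


g = [255, 0, 0, 255, 255, 255, 0, 0,          # bits 0, 3, 4, 5 set
     0, 0, 0, 0, 0, 0, 0, 255]                # bit 7 set
print(simd_msb_combine(g))
```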

The invention has been tested in repeated trials. The parameters of the simulation experiments are as follows:

On a Core™ i7-3610QM CPU with a clock frequency of 2.3 GHz, the conventional shift algorithm and the method of the invention were compared on the time needed for the data conversion before channel coding and the data conversion after channel decoding when processing a total of 655360 bits of information. For the data conversion before channel coding, the conventional method took 299 nanoseconds in total and the method of the invention took 31 nanoseconds. For the data conversion after channel decoding, the conventional method took 317 nanoseconds and the method of the invention took 73 nanoseconds. Compared with the conventional shift algorithm, the processing speed of the method of the invention is significantly improved.

In summary, the embodiment verifies the good performance of this data conversion method; the experiments were successful and the object of the invention was achieved.

The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims

1. A data format conversion method before channel coding based on general purpose processor (GPP) and single instruction multiple data (SIMD) technology, characterized in that: an input data stream A_0 A_1 A_2 ... A_{n-1} with a word length of n bytes is first packed into s SIMD-format packed data items of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "mapping" operation is performed on it: each byte of the input data stream is copied into 8 bytes, yielding the first intermediate data stream E_0 E_1 ... E_{8n-1}; a parallel AND operation is then performed on the first intermediate data stream E_0 E_1 ... E_{8n-1} to extract each bit of information, yielding the second intermediate data stream F_0 F_1 ... F_{8n-1}; finally, a parallel "choosing smaller value" operation is performed on the second intermediate data stream F_0 F_1 ... F_{8n-1}, moving each bit of information to the lowest bit of the byte in which it resides and generating the output data stream B_0 B_1 B_2 ... B_{8n-1} with a word length of 8n bytes; wherein the byte length n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data.

2. The method according to claim 1, characterized in that the method comprises the following operating steps:

\begin{matrix} {\frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \frac{M}{8} u + 1, \\ . . ., \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1, \frac{M}{8} u + \frac{M}{8} - 1} \end{matrix},

E_{8tM+uM+0}, E_{8tM+uM+1}, ..., E_{8tM+uM+M-1};
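The index pattern above can be illustrated with a minimal sketch. It assumes that the listed sequence is the index vector Y used by the u-th of eight "mapping" passes (u = 0, ..., 7) over one M-byte packed vector, and that each pass produces the M elements E_{8tM+uM+0}, ..., E_{8tM+uM+M-1}. The function names `mapping_indices` and `map_one_vector` are hypothetical, and the table look-up is written as plain Python indexing rather than a real SIMD shuffle instruction.

```python
def mapping_indices(M, u):
    """Index vector for pass u: (M/8)*u repeated 8 times, (M/8)*u + 1 repeated 8 times, ..."""
    base = (M // 8) * u
    return [base + i // 8 for i in range(M)]


def map_one_vector(X):
    """Expand one packed vector of M bytes into 8*M bytes, each byte copied 8 times."""
    M = len(X)
    assert M % 8 == 0, "M must be a multiple of 8"
    E = []
    for u in range(8):
        Y = mapping_indices(M, u)
        # Element i of the output of pass u is X[Y[i]].
        E.extend(X[y] for y in Y)
    return E


X = list(range(100, 116))   # one packed vector with M = 16 (arbitrary test values)
E = map_one_vector(X)
```

With this indexing, byte k of the packed vector ends up in the eight consecutive positions 8k, ..., 8k+7 of the expanded stream, which agrees with "each byte is copied into 8 bytes" in claim 1.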

3. The method according to claim 1, characterized in that: when executing each instruction, the SIMD technology performs, in parallel, operations including mapping, AND and choosing the smaller value on two groups of SIMD packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, each containing M elements; every pair of data elements X_i and Y_i undergoes the same operation simultaneously, where i is the index of the data element within the SIMD packed data and its range is [0, M-1]; the M results obtained are then packed as data elements into one group of SIMD-format packed data Z_0, Z_1, ..., Z_{M-1}; wherein the length M of the SIMD packed data depends on the bit length P of the packed data and the bit length Q occupied by the data element type, and is calculated as

M = P / Q

where the bit length of the packed data is P = 64 × 2^p, p being a natural number; when the data element type is byte, Q = 8, and when the data element type is word, Q = 16.
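As a worked illustration, assuming the formula is the ratio M = P / Q of the packed-data bit length to the element bit length, and choosing values of p that correspond to common 128-bit and 256-bit register widths (the specific values of p are illustrative, not taken from the claim):

```latex
\begin{aligned}
p = 1,\ Q = 8:  &\quad P = 64 \times 2^{1} = 128, \quad M = P / Q = 16 \\
p = 2,\ Q = 8:  &\quad P = 64 \times 2^{2} = 256, \quad M = P / Q = 32 \\
p = 1,\ Q = 16: &\quad P = 64 \times 2^{1} = 128, \quad M = P / Q = 8
\end{aligned}
```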

4. The method according to claim 3, characterized in that: when executing each instruction, the SIMD technology may also process only one group of packed data according to the method.

5. The method according to claim 3, characterized in that:

The operation of the "mapping" SIMD instruction is as follows: for the two input groups of SIMD packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, after the "mapping" SIMD instruction has been carried out in parallel on all M pairs of data, the output SIMD packed data are Z_0, Z_1, ..., Z_{M-1}; wherein the i-th element of the output SIMD packed data is the value found in X_0, X_1, ..., X_{M-1} by using Y_i as the subscript, that is,

Z_i = X_{Y_i}

where i is the index of the data element within the SIMD packed data, and its range is [0, M-1];

The operation of the AND SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, after the AND SIMD instruction has been carried out in parallel on all M pairs of data, the output SIMD packed data are Z_0, Z_1, ..., Z_{M-1}; wherein Z_i = X_i & Y_i, i is the index of the data element within the SIMD packed data, and its range is [0, M-1];

The operation of the "choosing smaller value" SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, after the "choosing smaller value" SIMD instruction has been carried out in parallel on all M pairs of data, the output SIMD packed data are Z_0, Z_1, ..., Z_{M-1}; wherein Z_i = min(X_i, Y_i), i is the index of the data element within the SIMD packed data, and its range is [0, M-1].
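The three lane-wise operations of this claim can be sketched in scalar Python and chained on a single packed vector. This is an illustrative model, not the patented implementation: the function names, the choice M = 16, the bit masks 1 << (i mod 8) (least significant bit first) and the all-ones comparison vector are assumptions, and min is taken over unsigned byte values.

```python
def simd_map(X, Y):
    """Lane-wise look-up: Z[i] = X[Y[i]]."""
    return [X[y] for y in Y]


def simd_and(X, Y):
    """Lane-wise bitwise AND: Z[i] = X[i] & Y[i]."""
    return [x & y for x, y in zip(X, Y)]


def simd_min(X, Y):
    """Lane-wise minimum over unsigned bytes: Z[i] = min(X[i], Y[i])."""
    return [min(x, y) for x, y in zip(X, Y)]


M = 16
X = [0xA5, 0x3C] + [0] * (M - 2)          # packed input; only the first two bytes are used below
Y = [i // 8 for i in range(M)]            # index vector: byte 0 eight times, byte 1 eight times
MASK = [1 << (i % 8) for i in range(M)]   # assumed bit order: least significant bit first
ONES = [1] * M

E = simd_map(X, Y)      # each source byte copied into 8 lanes
F = simd_and(E, MASK)   # each lane keeps one bit, still at its original position
B = simd_min(F, ONES)   # a non-zero lane becomes 1, a zero lane stays 0
```

The minimum against a vector of ones is what moves each extracted bit to the lowest bit position without any per-lane shift: a lane holding 0x80 or 0x04 is clamped to 1, while a lane holding 0 is unchanged.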

6. A data format conversion method after channel decoding based on GPP and SIMD, characterized in that: an input data stream C_0 C_1 C_2 ... C_{8n-1} with a word length of 8n bytes is packed into 8s SIMD-format packed data items of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "judging whether equal" operation is performed on it, moving each bit of information to the highest bit of the byte in which it resides and thereby yielding the intermediate data stream G_0 G_1 ... G_{8n-1}; a parallel "choosing highest-order bit combination" operation is then performed on this intermediate data stream G_0 G_1 ... G_{8n-1}, generating the output data stream D_0 D_1 D_2 ... D_{n-1} with a word length of n bytes; wherein n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data.

7. The method according to claim 6, characterized in that the method comprises the following operating steps:

8. The method according to claim 6, characterized in that: when executing each instruction, the SIMD technology performs, in parallel, operations including judging whether equal and choosing the highest-order bit combination on two groups of SIMD packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, each containing M elements; every pair of data elements X_i and Y_i undergoes the same operation simultaneously, where i is the index of the data element within the SIMD packed data and its range is [0, M-1]; the M results obtained are then packed as data elements into one group of SIMD-format packed data Z_0, Z_1, ..., Z_{M-1}; wherein the length M of the SIMD packed data depends on the bit length P of the packed data and the bit length Q occupied by the data element type, and is calculated as M = P / Q, where the bit length of the packed data is P = 64 × 2^p, and Q = 8 when the data element type is byte.

9. The method according to claim 8, characterized in that: when executing each instruction, the SIMD technology may also process only one group of packed data according to the method.

10. The method according to claim 8, characterized in that:

The operation of the "judging whether equal" SIMD instruction is as follows: for the two input groups of packed data X_0, X_1, ..., X_{M-1} and Y_0, Y_1, ..., Y_{M-1}, after the "judging whether equal" SIMD instruction has been carried out in parallel on all M pairs of data, the output SIMD packed data are Z_0, Z_1, ..., Z_{M-1}; wherein Z_i = (X_i == Y_i) ? 255 : 0. The expression (X_i == Y_i) ? 255 : 0 on the right-hand side of the "=" sign is the conditional operator of programming languages and means: if X_i == Y_i holds, that is, X_i and Y_i are equal, Z_i is assigned 255; if X_i == Y_i does not hold, that is, X_i and Y_i are not equal, Z_i is assigned 0. In the formula, i is the index of the data element within the SIMD packed data, and its range is [0, M-1];

The operation of the "choosing highest-order bit combination" SIMD instruction is as follows: for the input packed data X_0, X_1, ..., X_{8M-1}, after M highest-bit combination operations have been completed in parallel, the output packed data are Z_0, Z_1, ..., Z_{M-1}, where

Z_i = ((X_{8i} & 0x80) << 7) ^ ((X_{8i+1} & 0x80) << 6) ^ ((X_{8i+2} & 0x80) << 5) ^ ((X_{8i+3} & 0x80) << 4) ^ ((X_{8i+4} & 0x80) << 3) ^ ((X_{8i+5} & 0x80) << 2) ^ ((X_{8i+6} & 0x80) << 1) ^ (X_{8i+7} & 0x80)

In the formula, i is the index of the data element within the SIMD packed data, and its range is [0, M-1].
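The two post-decoding operations of this claim can be chained in a short scalar sketch that turns 8n one-bit bytes back into n packed bytes. Several points are assumptions rather than statements of the claim: the function names are hypothetical, the equality test is assumed to compare each input byte against the constant 1, and the shifts in the combination step are assumed to be right shifts (the formula as printed shows left shifts, which would overflow a byte), so that input byte 8i+j supplies bit j of output byte i.

```python
def simd_cmpeq(X, Y):
    """Lane-wise equality test: Z[i] = 255 if X[i] == Y[i] else 0."""
    return [255 if x == y else 0 for x, y in zip(X, Y)]


def bits_to_bytes(C):
    """Convert a stream of 0/1 bytes (length 8n) into n packed bytes."""
    assert len(C) % 8 == 0, "input length must be a multiple of 8"
    # Step 1: equality test against 1 puts each bit into the highest bit of its byte.
    G = simd_cmpeq(C, [1] * len(C))
    # Step 2: combine the highest bits of every 8 bytes into one byte (assumed right shifts).
    D = []
    for i in range(len(G) // 8):
        d = 0
        for j in range(8):
            d ^= (G[8 * i + j] & 0x80) >> (7 - j)
        D.append(d)
    return D


decoded = bits_to_bytes([1, 0, 1, 0, 0, 1, 0, 1,
                         0, 0, 1, 1, 1, 1, 0, 0])
```

Under these assumptions the bit order is least significant bit first, so the sixteen bits in the example pack into the two bytes 0xA5 and 0x3C.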