CN103746771A - Data format conversion method for channel coding and decoding based on GPP and SIMD technologies
- Publication number: CN103746771A
- Application number: CN201310729424.7A
- Authority: CN (China)
- Prior art keywords
- data
- simd
- encapsulation
- instruction
- byte
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Advance Control (AREA)
Abstract
The invention relates to a data format conversion method for channel coding and decoding based on general-purpose processor (GPP) and single-instruction multiple-data (SIMD) technologies. Before coding, an input data stream $A_0A_1A_2\ldots A_{n-1}$ with a word length of n bytes is first packed into s SIMD packed vectors of length M, so that its data format is parallelized and suitable for SIMD instructions; a parallel "map" operation is executed, then a parallel "AND" operation, and finally a parallel "select-minimum" operation, generating an output data stream $B_0B_1B_2\ldots B_{8n-1}$ with a word length of 8n bytes. After decoding, an input data stream $C_0C_1C_2\ldots C_{8n-1}$ with a word length of 8n bytes is first packed into 8s SIMD packed vectors of length M; a parallel "compare-equal" operation is executed, then a parallel "highest-bit combination" operation, generating an output data stream $D_0D_1D_2\ldots D_{n-1}$ with a word length of n bytes. By using SIMD parallel instructions, the method greatly accelerates data format conversion while preserving transmission performance and the correctness of coding and decoding. The method is also low in cost, portable, easy to debug and simple to upgrade.
Description
Technical field
The present invention relates to a data format conversion method for channel coding and decoding based on general-purpose processor (GPP) and single-instruction multiple-data (SIMD) technologies, and belongs to the technical field of coding and decoding for communications.
Background art
Channel codes such as Turbo codes, low-density parity-check (LDPC) codes and convolutional codes offer excellent error-correcting performance and, over the past ten years, have been widely adopted in high-speed wireless communication standards (3G and 4G systems). For example, Long Term Evolution / Long Term Evolution Advanced (LTE/LTE-A) systems use Turbo codes and convolutional codes, and 802.11 systems use LDPC codes and convolutional codes.
Coding and decoding for such channel codes has high time complexity and consumes a large amount of computing time, yet emerging wireless communication standards are aimed at ever larger data volumes. Traditional implementations are mostly based on hardware processing platforms, which have several problems: high cost, a limited range of applicability, tedious debugging, long development cycles and inconvenient program upgrades.
Over the past five years, software radio based on GPP platforms has gradually matured. While it overcomes the above shortcomings of hardware platforms, software radio has a bottleneck in computing speed. Reducing the computational complexity introduced by channel coding and decoding, and thereby the latency, has become the main way to break the transmission-rate bottleneck of communication systems.
On a GPP platform, data are transferred and stored with the byte as the basic unit, so most computations also use the byte as the smallest unit. In a communication system, however, information is stored and processed bit by bit: one bit represents one unit of information, which is referred to as the bit format. At present, the most efficient channel coding and decoding algorithms on GPP platforms all use the byte as the smallest computing unit, i.e. one byte represents one bit of information, which is referred to as the byte format. Channel coding therefore requires a data format conversion function that converts a data stream from the bit format to the byte format. Conversion between the bit format and the byte format is thus an unavoidable function at the front end of channel coding and at the back end of channel decoding.
The purpose of the data format conversion for channel coding is to convert an input data stream $A_0A_1A_2\ldots A_{n-1}$ in the bit format, with a word length of n bytes, into an output data stream $B_0B_1B_2\ldots B_{8n-1}$ in the byte format, with a word length of 8n bytes. Every $A_g$ ($0\le g\le n-1$) and every $B_h$ ($0\le h\le 8n-1$) is one byte, i.e. 8 bits long, and a smaller index means that its conversion is completed earlier. Here $A_g=(a_{8g}a_{8g+1}a_{8g+2}a_{8g+3}a_{8g+4}a_{8g+5}a_{8g+6}a_{8g+7})$, where every element a is one bit and a smaller index of a means a position closer to the low-order end of its byte (bytes are thus written lowest bit first); and $B_h=(a_h0000000)$, where $a_h$ is bit h of the input data stream A and occupies the lowest bit of byte $B_h$.
The purpose of the data format conversion for channel decoding is the inverse of the above conversion for channel coding: to convert an input data stream $C_0C_1C_2\ldots C_{8n-1}$ in the byte format, with a word length of 8n bytes, into an output data stream $D_0D_1D_2\ldots D_{n-1}$ in the bit format, with a word length of n bytes. Every $C_l$ ($0\le l\le 8n-1$) and every $D_e$ ($0\le e\le n-1$) is one byte, i.e. 8 bits long, and a smaller index means that its conversion is completed earlier. Here $C_l=(d_l0000000)$, where $d_l$ is bit l of the output data stream D and occupies the lowest bit of byte $C_l$; and $D_e=(d_{8e}d_{8e+1}d_{8e+2}d_{8e+3}d_{8e+4}d_{8e+5}d_{8e+6}d_{8e+7})$, where every element d is one bit and a smaller index of d means a position closer to the low-order end of its byte.
The conventional method of data format conversion for channel coding on a GPP architecture is based on shift and "AND" operations. Taking byte $A_0=(a_0a_1a_2a_3a_4a_5a_6a_7)$ as an example, converting it into the 8 bytes $B_0B_1B_2B_3B_4B_5B_6B_7$ requires a loop of 8 operations, each of the form $B_f=(A_0\ll f)\ \&\ 1$, where f is the byte index and $0\le f\le 7$. For f=4, for example, $B_4=(A_0\ll 4)\ \&\ 1=(a_4a_5a_6a_70000)\ \&\ (10000000)=(a_40000000)$. Because bytes are written lowest bit first in this notation, the shift $\ll f$ moves bits towards the low-order end, and the constant 1 is written $(10000000)$; on integers written in the usual highest-bit-first form, the same operation is a right shift by f followed by a mask with 1. Converting the input data stream $A_0A_1A_2\ldots A_{n-1}$ into the output data stream $B_0B_1B_2\ldots B_{8n-1}$ therefore requires n iterations of the above loop. Each iteration converts one byte of the input data stream: iteration g ($0\le g\le n-1$) converts $A_g$ into $B_{8g}B_{8g+1}B_{8g+2}B_{8g+3}B_{8g+4}B_{8g+5}B_{8g+6}B_{8g+7}$. Each iteration g in turn contains 8 sub-iterations, each converting one bit: sub-iteration f ($0\le f\le 7$) performs $B_{8g+f}=(A_g\ll f)\ \&\ 1$.
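The nested loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unpack_bits_scalar` is hypothetical, and the shift is written as a right shift because Python integers use the usual highest-bit-first convention, whereas the text writes bytes lowest bit first.

```python
def unpack_bits_scalar(a):
    """Bit format -> byte format, one bit per inner-loop pass (conventional method)."""
    out = bytearray(8 * len(a))
    for g, byte in enumerate(a):      # outer loop: one input byte A_g per pass
        for f in range(8):            # inner loop: one bit per pass
            # The text's (A_g << f) & 1 is written lowest-bit-first; on Python
            # integers that is a right shift by f followed by a mask with 1.
            out[8 * g + f] = (byte >> f) & 1
    return bytes(out)
```

For an input of n bytes the inner statement runs 8n times and handles a single byte each time, which is the inefficiency the method sets out to remove.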
The conventional method of data format conversion for channel decoding on a GPP architecture is based on shift and XOR operations (XOR is written $\oplus$ below). Taking the 8 bytes $C_0C_1C_2C_3C_4C_5C_6C_7$ as an example, with $C_q=(d_q0000000)$, converting them into the single byte $D_0$ proceeds as follows: first set $D_0=0=(00000000)$, then loop over 8 operations, each of the form $D_0=D_0\oplus(C_q\gg q)$, where q is the byte index and $0\le q\le 7$. For q=4, for example, $D_0=D_0\oplus(C_4\gg 4)=(d_0d_1d_2d_30000)\oplus(0000d_4000)=(d_0d_1d_2d_3d_4000)$. Because bytes are written lowest bit first in this notation, the shift $\gg q$ moves the bit towards the high-order end; on integers written in the usual highest-bit-first form, it is a left shift by q. Converting the input data stream $C_0C_1C_2\ldots C_{8n-1}$ into the output data stream $D_0D_1D_2\ldots D_{n-1}$ requires n iterations of the above loop. Each iteration converts 8 bytes of the input data stream: iteration e ($0\le e\le n-1$) converts $C_{8e}C_{8e+1}C_{8e+2}C_{8e+3}C_{8e+4}C_{8e+5}C_{8e+6}C_{8e+7}$ into $D_e$. Iteration e first sets $D_e=0$ and then executes 8 sub-iterations, each converting one bit: sub-iteration q ($0\le q\le 7$) performs $D_e=D_e\oplus(C_{8e+q}\gg q)$.
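The inverse loop can be sketched in the same way. The function name `pack_bits_scalar` and the input checks are assumptions added for illustration; the shift is written as a left shift because Python integers are highest-bit-first, whereas the text writes bytes lowest bit first.

```python
def pack_bits_scalar(c):
    """Byte format -> bit format, one bit per inner-loop pass (conventional method)."""
    if len(c) % 8 != 0:
        raise ValueError("input length must be a multiple of 8")
    if any(v > 1 for v in c):
        raise ValueError("every input byte must be 0 or 1")
    out = bytearray(len(c) // 8)
    for e in range(len(out)):         # outer loop: one output byte D_e per pass
        d = 0                         # D_e = 0
        for q in range(8):            # inner loop: one bit per pass
            # The text's D_e ^ (C >> q) is written lowest-bit-first; on Python
            # integers the shift towards the high-order end is a left shift.
            d ^= c[8 * e + q] << q
        out[e] = d
    return bytes(out)
```

Since every input byte is 0 or 1 and each shifted bit lands in a distinct position, XOR here has the same effect as OR.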
The drawback of both conventional methods is that each operation works on only one byte, so the shift, "AND" and XOR operations are inefficient. How to improve the efficiency of coding and decoding and solve the processing-speed problem has therefore become a focus of attention for practitioners in the field.
Single instruction, multiple data (SIMD) is a technique in which one controller drives multiple processing elements that perform the same operation simultaneously on each element of a group of data (also called a "data vector"), thereby achieving spatial parallelism. In a microprocessor, SIMD means that one controller drives multiple parallel processing elements; examples are Intel's MMX and SSE and AMD's 3DNow! technology.
Summary of the invention
In view of this, the object of the invention is to provide a data format conversion method for channel coding and decoding based on general-purpose processor (GPP) and single-instruction multiple-data (SIMD) technologies. While preserving transmission performance and the correctness of coding and decoding, the method redesigns the conversion algorithm to suit SIMD and uses SIMD parallel instructions to greatly accelerate the conversion. Because the invention is implemented on GPP chips, it is low in cost, portable, simple to debug and easy to upgrade.
To achieve the above object, the invention provides a data format conversion method before channel coding based on GPP and SIMD, characterized in that: the input data stream $A_0A_1A_2\ldots A_{n-1}$, with a word length of n bytes, is first packed into s SIMD packed vectors of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "map" operation is applied to it: each byte of the input data stream is replicated into 8 bytes, giving the first intermediate data stream $E_0E_1\ldots E_{8n-1}$. A parallel "AND" operation is then applied to the first intermediate data stream $E_0E_1\ldots E_{8n-1}$ to extract each bit of information, giving the second intermediate data stream $F_0F_1\ldots F_{8n-1}$. Finally, a parallel "select-minimum" operation is applied to the second intermediate data stream $F_0F_1\ldots F_{8n-1}$ to move each bit of information to the lowest bit of its byte, generating the output data stream $B_0B_1B_2\ldots B_{8n-1}$ with a word length of 8n bytes. Here the byte length is $n=M\times s$, where the natural numbers M and s are the length and the number of SIMD packed vectors, respectively.
The method comprises the following operating steps:
(1) Execute the parallel "map" SIMD instruction to replicate the data:
The "map" SIMD instruction is applied in turn to the s SIMD packed vectors. The packed vector at the X input of the "map" SIMD instruction is $A_{tM+0},A_{tM+1},\ldots,A_{tM+M-1}$, and each X input takes part in 8 inner-loop iterations. Here t is the index of the outer-loop iteration of the "map" SIMD instruction, with range [0, s-1], and u is the index of the inner-loop iteration, with range [0, 7]. The packed vector produced at the Z output of the "map" SIMD instruction in inner iteration u of outer iteration t is:
$E_{8tM+uM+0},E_{8tM+uM+1},\ldots,E_{8tM+uM+M-1}$;
(2) Execute the parallel "AND" SIMD instruction to extract the bits:
The parallel "AND" SIMD instruction is applied in turn to the 8s packed vectors. The packed vector at the X input of the "AND" SIMD instruction is taken from the first intermediate data stream: $E_{rM+0},E_{rM+1},\ldots,E_{rM+M-1}$. The packed vector at the Y input of the "AND" SIMD instruction is:
{1,2,4,8,16,32,64,128,1,2,4,8,16,32,64,128,...,1,2,4,8,16,32,64,128}. Here r is the index of the execution of the "AND" SIMD instruction, with range [0, 8s-1]. After the "AND" SIMD instruction completes, the Z output is the second intermediate data stream: $F_{rM+0},F_{rM+1},\ldots,F_{rM+M-1}$;
(3) Execute the parallel "select-minimum" SIMD instruction to move each significant bit to the lowest bit of its byte:
The "select-minimum" SIMD instruction is applied in turn to the 8s SIMD packed vectors. The packed vector at the X input of the "select-minimum" SIMD instruction is $F_{jM+0},F_{jM+1},\ldots,F_{jM+M-1}$, and the packed vector at its Y input is set to {1,1,...,1}. Here j is the index of the execution of the "select-minimum" SIMD instruction, with range [0, 8s-1]. After the "select-minimum" SIMD instruction completes, the Z output is the final output data stream: $B_{jM+0},B_{jM+1},\ldots,B_{jM+M-1}$.
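Steps (1) to (3) can be emulated lane by lane in plain Python to check the index arithmetic. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `encode_side_convert` and the parameter `m` (standing for M) are hypothetical, the lane index `u * (m // 8) + i // 8` used for the "map" step is a generalisation of the 128-bit example given in the embodiment, and the three steps are fused per vector rather than run as three separate passes over the stream.

```python
def encode_side_convert(a, m=16):
    """Bit format -> byte format using lane-wise map, AND and select-minimum."""
    if m % 8 != 0 or len(a) % m != 0:
        raise ValueError("m must be a multiple of 8 and len(a) a multiple of m")
    s = len(a) // m                                  # number of input vectors
    and_mask = [1 << (i % 8) for i in range(m)]      # 1, 2, 4, ..., 128 repeated
    ones = [1] * m
    out = bytearray()
    for t in range(s):                               # outer loop over input vectors
        x = a[t * m:(t + 1) * m]
        for u in range(8):                           # inner loop: 8 outputs per input
            # Step (1), "map": lane i copies input byte u*m/8 + i//8, so each
            # input byte is replicated into 8 consecutive lanes.
            e = [x[u * (m // 8) + i // 8] for i in range(m)]
            # Step (2), "AND": keep one distinct bit in each of the 8 copies.
            f = [e[i] & and_mask[i] for i in range(m)]
            # Step (3), "select-minimum" against 1: any non-zero lane becomes 1.
            b = [min(f[i], ones[i]) for i in range(m)]
            out.extend(b)
    return bytes(out)
```

The select-minimum step is what removes the per-bit shift: after the AND, lane values are 0 or a single power of two, and taking the unsigned minimum with 1 maps every non-zero value to 1.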
To achieve the above object, the invention also provides a data format conversion method after channel decoding based on GPP and SIMD, characterized in that: the input data stream $C_0C_1C_2\ldots C_{8n-1}$, with a word length of 8n bytes, is packed into 8s SIMD packed vectors of length M, so that its data format is parallelized and suitable for the SIMD instruction structure, and a parallel "compare-equal" operation is applied to it to move each bit of information to the highest bit of its byte, giving the intermediate data stream $G_0G_1\ldots G_{8n-1}$. A parallel "highest-bit combination" operation is then applied to the intermediate data stream $G_0G_1\ldots G_{8n-1}$, generating the output data stream $D_0D_1D_2\ldots D_{n-1}$ with a word length of n bytes. Here $n=M\times s$, where the natural numbers M and s are the length and the number of SIMD packed vectors, respectively.
The method comprises the following operating steps:
(1) Use the parallel "compare-equal" SIMD instruction to move each significant bit to the highest bit of its byte:
The "compare-equal" SIMD instruction is applied in turn to the 8s SIMD packed vectors. The packed vector at the X input of the "compare-equal" SIMD instruction is $C_{kM+0},C_{kM+1},\ldots,C_{kM+M-1}$, and the packed vector at its Y input is {1,1,...,1}. Here k is the index of the execution of the "compare-equal" SIMD instruction, with range [0, 8s-1]. After the "compare-equal" SIMD instruction completes, the Z output is the intermediate data stream: $G_{kM+0},G_{kM+1},\ldots,G_{kM+M-1}$;
(2) Use the parallel "highest-bit combination" SIMD instruction to merge the highest bits of 8 consecutive bytes into 1 byte:
The "highest-bit combination" SIMD instruction is applied in turn s times. The packed data at the X input of the "highest-bit combination" SIMD instruction are $G_{8wM+0},G_{8wM+1},\ldots,G_{8wM+8M-1}$. Here w is the index of the execution of the "highest-bit combination" SIMD instruction, with range [0, s-1]. After the "highest-bit combination" SIMD instruction completes, the Z output is the final output data stream: $D_{wM+0},D_{wM+1},\ldots,D_{wM+M-1}$.
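The two decoding-side steps can be emulated lane by lane in the same spirit. This is a sketch under stated assumptions: the function name `decode_side_convert` and the parameter `m` (standing for M) are hypothetical, and plain Python lists stand in for SIMD registers.

```python
def decode_side_convert(c, m=16):
    """Byte format -> bit format using compare-equal and highest-bit combination."""
    if m % 8 != 0 or len(c) % (8 * m) != 0:
        raise ValueError("m must be a multiple of 8 and len(c) a multiple of 8*m")
    # Step (1), "compare-equal" against {1,...,1}: a lane holding 1 becomes 255,
    # any other lane becomes 0, so the bit now sits in the highest bit position.
    g = bytearray()
    for k in range(len(c) // m):                  # 8s vectors of m lanes
        x = c[k * m:(k + 1) * m]
        g.extend(255 if v == 1 else 0 for v in x)
    # Step (2), "highest-bit combination": the highest bits of 8 consecutive
    # bytes are merged into one byte, earliest byte in the lowest bit.
    d = bytearray()
    for w in range(len(g) // (8 * m)):            # s groups of 8*m bytes
        x = g[8 * w * m:8 * (w + 1) * m]
        for i in range(m):
            z = 0
            for q in range(8):
                z |= (x[8 * i + q] >> 7) << q
            d.append(z)
    return bytes(d)
```

The output byte never needs to be read back and updated through memory as in the conventional XOR loop, which matches the saving in memory accesses that the text attributes to this method.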
The key innovation of the method is that it makes full use of the multi-core, multi-processor features of GPP chips to achieve high-speed, general-purpose optimization of channel coding and decoding. Compared with the traditional shift-based method, the processing speed of the invention is greatly increased. The data conversion before channel coding applies a map operation to the input data stream and packs it into the SIMD data format, so that its data format is parallelized and suitable for SIMD instructions and SIMD algorithms, which raises the degree of parallelism. The advantage of the data format conversion after channel decoding is that it saves memory-access operations and simplifies the conversion flow.
Another innovation of the invention is the use of SIMD technology on a GPP chip, so that computations are processed in parallel and run faster. Each SIMD instruction operates in parallel on two groups of packed data (or on only one of them), each containing M data elements, $(X_0,X_1,\ldots,X_{M-1})$ and $(Y_0,Y_1,\ldots,Y_{M-1})$, so that every pair of data elements $X_i,Y_i$ ($0\le i\le M-1$) undergoes the same operation simultaneously (in this method, one of map, AND, select-minimum, compare-equal and highest-bit combination). The M results are in turn packed as data elements into one group of data $(Z_0,Z_1,\ldots,Z_{M-1})$. Using SIMD instructions can therefore markedly increase the computing speed of data on a GPP chip.
In summary, the invention has good prospects for wide application.
Brief description of the drawings
Fig. 1 is a flow chart of the operating steps of the data format conversion method before channel coding according to the invention.
Fig. 2 is a schematic diagram of the operation of the "map" SIMD instruction.
Fig. 3 is a schematic diagram of the operation of the "AND" SIMD instruction.
Fig. 4 is a schematic diagram of the operation of the "select-minimum" SIMD instruction.
Fig. 5 is a flow chart of the operating steps of the data format conversion method after channel decoding according to the invention.
Fig. 6 is a schematic diagram of the operation of the "compare-equal" SIMD instruction.
Fig. 7 is a schematic diagram of the operation of the "highest-bit combination" SIMD instruction.
Detailed description of the embodiments
To make the object, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Single instruction, multiple data (SIMD) is a technique in which one controller drives multiple processing elements that perform the same operation simultaneously on each element of a group of data (also called a "data vector"), thereby achieving spatial parallelism. In a microprocessor, SIMD means that one controller drives multiple parallel processing elements; examples are Intel's MMX and SSE and AMD's 3DNow! technology.
The SIMD technology adopting in the inventive method when carrying out every instruction, the SIMD encapsulation of data X to two groups of each self-contained M elements concurrently
0, X
1... X
m-1and Y
0, Y
1... Y
m-1execution comprises the various operations (while carrying out every SIMD instruction, also can only process one group of encapsulation of data) that judge whether to equate and choose highest order combination.And now, every couple of data element X
iand Y
icarry out same operation, wherein, i is the data sequence number in SIMD encapsulation of data simultaneously, and its span is [0, M-1]; Using a resulting M result of calculation as data element, be encapsulated in again one group of SIMD form encapsulation of data Z
0, Z
1... Z
m-1in; Wherein, bit length P=64 * 2 of encapsulation of data
p; When data element type is byte, corresponding Q=8, wherein, the length M of SIMD encapsulation of data depends on bit length P and the shared bit length Q of data element type of encapsulation of data, its computing formula is
wherein, bit length P=64 * 2 of encapsulation of data
p, p is natural number; When data element type is word, corresponding Q=8, when data element type is byte, corresponding Q=16.
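As a quick check of the relation between the vector width, the element width and the number of lanes, the arithmetic can be written out as follows. The function name `lanes_per_vector` is hypothetical, and the relation M = P/Q is the one implied by the 128-bit, 16-byte embodiment.

```python
def lanes_per_vector(p, q_bits):
    """Number of elements M in a packed vector of P = 64 * 2**p bits."""
    vector_bits = 64 * 2 ** p
    return vector_bits // q_bits

# 128-bit vectors (p = 1) hold 16 byte elements or 8 word elements.
m_bytes = lanes_per_vector(1, 8)
m_words = lanes_per_vector(1, 16)
```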
The data format conversion method before channel coding based on GPP and SIMD technologies according to the invention is as follows. The input data stream $A_0A_1A_2\ldots A_{n-1}$, with a word length of n bytes, is first packed into s SIMD packed vectors of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "map" operation is applied to it: each byte of the input data stream is replicated into 8 bytes, giving the first intermediate data stream $E_0E_1\ldots E_{8n-1}$. A parallel "AND" operation is then applied to the first intermediate data stream $E_0E_1\ldots E_{8n-1}$ to extract each bit of information, giving the second intermediate data stream $F_0F_1\ldots F_{8n-1}$. Finally, a parallel "select-minimum" operation is applied to the second intermediate data stream $F_0F_1\ldots F_{8n-1}$ to move each bit of information to the lowest bit of its byte, generating the output data stream $B_0B_1B_2\ldots B_{8n-1}$ with a word length of 8n bytes. Here the byte length is $n=M\times s$, where the natural numbers M and s are the length and the number of SIMD packed vectors, respectively.
Referring to Fig. 1, the specific operating steps of the above method are as follows.
The "map" SIMD instruction is applied in turn to the s SIMD packed vectors. The packed vector at the X input of the "map" SIMD instruction is $A_{tM+0},A_{tM+1},\ldots,A_{tM+M-1}$, and each X input takes part in 8 inner-loop iterations. Here t is the index of the outer-loop iteration of the "map" SIMD instruction, with range [0, s-1], and u is the index of the inner-loop iteration, with range [0, 7]. The packed vector produced at the Z output of the "map" SIMD instruction in inner iteration u of outer iteration t is:
$E_{8tM+uM+0},E_{8tM+uM+1},\ldots,E_{8tM+uM+M-1}$.
Taking 128-bit SIMD packed vectors as an embodiment: the packed vector at the X input of the "map" SIMD instruction is the 16 consecutive bytes of stream A, $A_0A_1A_2A_3A_4A_5A_6A_7A_8A_9A_{10}A_{11}A_{12}A_{13}A_{14}A_{15}$. Eight iterations are executed, and the packed vector at the Y input of the "map" SIMD instruction in iteration u is:
Y={2u,2u,2u,2u,2u,2u,2u,2u,2u+1,2u+1,2u+1,2u+1,2u+1,2u+1,2u+1,2u+1}, where $0\le u\le 7$. The Z output of each "map" SIMD instruction is the intermediate stream E data:
$E_{16u}E_{16u+1}E_{16u+2}E_{16u+3}E_{16u+4}E_{16u+5}E_{16u+6}E_{16u+7}E_{16u+8}E_{16u+9}E_{16u+10}E_{16u+11}E_{16u+12}E_{16u+13}E_{16u+14}E_{16u+15}$, in which the first 8 bytes all equal $A_{2u}$ and the last 8 bytes all equal $A_{2u+1}$. This completes the operation of replicating each original byte into 8 bytes stored contiguously.
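The index vectors of the 128-bit example can be generated and applied with a lane-wise emulation of the "map" instruction. This is a sketch under stated assumptions: the names `map_indices` and `simd_map` are hypothetical, and the source does not name a concrete machine instruction. On x86, a byte shuffle such as PSHUFB (the `_mm_shuffle_epi8` intrinsic) behaves this way for indices 0 to 15, which is offered here only as a point of comparison.

```python
def map_indices(u):
    """Y input for inner iteration u when M = 16 (128-bit vectors)."""
    return [2 * u] * 8 + [2 * u + 1] * 8

def simd_map(x, y):
    """Lane-wise emulation of the "map" instruction: Z_i = X[Y_i]."""
    return [x[idx] for idx in y]

a = list(range(100, 116))                    # stand-in values for A_0 ... A_15
e = [simd_map(a, map_indices(u)) for u in range(8)]
# e[u] holds E_16u ... E_16u+15: eight copies of A_2u, then eight of A_2u+1.
```

Concatenating `e[0]` to `e[7]` gives 128 bytes in which every input byte appears 8 times in a row, ready for the "AND" step.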
The parallel "AND" SIMD instruction is applied in turn to the 8s packed vectors. The packed vector at the X input of the "AND" SIMD instruction is taken from the first intermediate data stream: $E_{rM+0},E_{rM+1},\ldots,E_{rM+M-1}$. The packed vector at the Y input of the "AND" SIMD instruction is:
{1,2,4,8,16,32,64,128,1,2,4,8,16,32,64,128,...,1,2,4,8,16,32,64,128}. Here r is the index of the execution of the "AND" SIMD instruction, with range [0, 8s-1]. After the "AND" SIMD instruction completes, the Z output is the second intermediate data stream: $F_{rM+0},F_{rM+1},\ldots,F_{rM+M-1}$.
In the embodiment, the packed vector at the X input of the "AND" SIMD instruction is the stream E data generated in step 1, the packed vector at the Y input is Y={1,2,4,8,16,32,64,128,1,2,4,8,16,32,64,128,...}, and the Z output of the "AND" SIMD instruction is the intermediate stream F data.
The "select-minimum" SIMD instruction is applied in turn to the 8s SIMD packed vectors. The packed vector at the X input of the "select-minimum" SIMD instruction is $F_{jM+0},F_{jM+1},\ldots,F_{jM+M-1}$, and the packed vector at its Y input is set to {1,1,...,1}. Here j is the index of the execution of the "select-minimum" SIMD instruction, with range [0, 8s-1]. After the "select-minimum" SIMD instruction completes, the Z output is the final output data stream: $B_{jM+0},B_{jM+1},\ldots,B_{jM+M-1}$.
In the embodiment, the packed vector at the X input of the "select-minimum" SIMD instruction is the stream F data generated in step 2, the packed vector at the Y input is Y={1,1,1,1,...}, and the Z output of the "select-minimum" SIMD instruction is the final stream B data.
Referring to Fig. 2, the operation of the "map" SIMD instruction is as follows. For the two input SIMD packed vectors $X_0,X_1,\ldots,X_{M-1}$ and $Y_0,Y_1,\ldots,Y_{M-1}$, the "map" SIMD instruction is executed in parallel on all M elements, and the output SIMD packed vector is $Z_0,Z_1,\ldots,Z_{M-1}$. Element i of the output packed vector is the value found in $X_0,X_1,\ldots,X_{M-1}$ using $Y_i$ as the index, i.e. $Z_i=X_{Y_i}$, where i is the element index within the SIMD packed vector, with range [0, M-1].
Referring to Fig. 3, the operation of the "AND" SIMD instruction is as follows. For the two input packed vectors $X_0,X_1,\ldots,X_{M-1}$ and $Y_0,Y_1,\ldots,Y_{M-1}$, the "AND" SIMD instruction is executed in parallel on all M pairs of elements, and the output SIMD packed vector is $Z_0,Z_1,\ldots,Z_{M-1}$, where $Z_i=X_i\ \&\ Y_i$ and i is the element index within the SIMD packed vector, with range [0, M-1].
Referring to Fig. 4, the operation of the "select-minimum" SIMD instruction is as follows. For the two input packed vectors $X_0,X_1,\ldots,X_{M-1}$ and $Y_0,Y_1,\ldots,Y_{M-1}$, the "select-minimum" SIMD instruction is executed in parallel on all M pairs of elements, and the output SIMD packed vector is $Z_0,Z_1,\ldots,Z_{M-1}$, where $Z_i=\min(X_i,Y_i)$ and i is the element index within the SIMD packed vector, with range [0, M-1].
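The "AND" and "select-minimum" instructions of Figs. 3 and 4 can be emulated lane by lane and applied to one 16-byte vector of replicated data. This is a sketch under stated assumptions: the names `simd_and` and `simd_min` and the sample values are illustrative, and the minimum is assumed to be an unsigned byte comparison (comparable to `_mm_min_epu8` on x86), since the lane value 128 would be negative if treated as a signed byte.

```python
def simd_and(x, y):
    """Lane-wise AND: Z_i = X_i & Y_i."""
    return [a & b for a, b in zip(x, y)]

def simd_min(x, y):
    """Lane-wise unsigned minimum: Z_i = min(X_i, Y_i)."""
    return [min(a, b) for a, b in zip(x, y)]

e = [0xA5] * 8 + [0x0F] * 8                   # two input bytes, each replicated 8x
mask = [1, 2, 4, 8, 16, 32, 64, 128] * 2      # Y input of the "AND" instruction
f = simd_and(e, mask)                         # one isolated bit per lane
b = simd_min(f, [1] * 16)                     # every non-zero lane becomes 1
```

Here `f` is `[1, 0, 4, 0, 0, 32, 0, 128, 1, 2, 4, 8, 0, 0, 0, 0]` and `b` is `[1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]`, i.e. the bits of 0xA5 and 0x0F listed lowest bit first.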
The data format conversion method after channel decoding based on GPP and SIMD technologies according to the invention is as follows. The input data stream $C_0C_1C_2\ldots C_{8n-1}$, with a word length of 8n bytes, is packed into 8s SIMD packed vectors of length M, so that its data format is parallelized and suitable for the SIMD instruction structure, and a parallel "compare-equal" operation is applied to it to move each bit of information to the highest bit of its byte, giving the intermediate data stream $G_0G_1\ldots G_{8n-1}$. A parallel "highest-bit combination" operation is then applied to the intermediate data stream $G_0G_1\ldots G_{8n-1}$, generating the output data stream $D_0D_1D_2\ldots D_{n-1}$ with a word length of n bytes. Here $n=M\times s$, where the natural numbers M and s are the length and the number of SIMD packed vectors, respectively.
Referring to Fig. 5, the specific operating steps of the data format conversion method after channel decoding are as follows.
The "compare-equal" SIMD instruction is applied in turn to the 8s SIMD packed vectors. The packed vector at the X input of the "compare-equal" SIMD instruction is $C_{kM+0},C_{kM+1},\ldots,C_{kM+M-1}$, and the packed vector at its Y input is {1,1,...,1}. Here k is the index of the execution of the "compare-equal" SIMD instruction, with range [0, 8s-1]. After the "compare-equal" SIMD instruction completes, the Z output is the intermediate data stream: $G_{kM+0},G_{kM+1},\ldots,G_{kM+M-1}$.
The "highest-bit combination" SIMD instruction is applied in turn s times. The packed data at the X input of the "highest-bit combination" SIMD instruction are $G_{8wM+0},G_{8wM+1},\ldots,G_{8wM+8M-1}$. Here w is the index of the execution of the "highest-bit combination" SIMD instruction, with range [0, s-1]. After the "highest-bit combination" SIMD instruction completes, the Z output is the final output data stream: $D_{wM+0},D_{wM+1},\ldots,D_{wM+M-1}$.
Taking 128-bit SIMD packed vectors as an embodiment: the packed vector at the X input of the "highest-bit combination" SIMD instruction is the stream G data $G_0G_1G_2G_3G_4G_5G_6G_7G_8G_9G_{10}G_{11}G_{12}G_{13}G_{14}G_{15}$, and after the "highest-bit combination" SIMD instruction completes, the Z output is the stream D data:
$D_0D_1=(d_0d_1d_2d_3d_4d_5d_6d_7)(d_8d_9d_{10}d_{11}d_{12}d_{13}d_{14}d_{15})$, where $d_o$ is the highest bit of byte $G_o$, $0\le o\le 15$.
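The 128-bit case can be checked with a short emulation that gathers the highest bit of each of the 16 bytes into two output bytes. This is a sketch under stated assumptions: the function name `highest_bit_combine_16` is hypothetical, and the source does not name a machine instruction. On x86, PMOVMSKB (the `_mm_movemask_epi8` intrinsic) produces such a 16-bit mask, which is mentioned only as a comparable operation.

```python
def highest_bit_combine_16(g):
    """Combine the highest bits of G_0..G_15 into the two bytes D_0 D_1."""
    if len(g) != 16:
        raise ValueError("expected exactly 16 bytes")
    mask = 0
    for o, byte in enumerate(g):
        mask |= (byte >> 7) << o      # d_o goes to bit o, lowest bit first
    return bytes([mask & 0xFF, mask >> 8])

g = [255, 255, 0, 0, 255, 0, 0, 0,    # bits of D_0, lowest bit first
     0, 0, 0, 0, 0, 0, 0, 255]        # bits of D_1, lowest bit first
d = highest_bit_combine_16(g)         # bytes 0x13 and 0x80
```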
Referring to Fig. 6, the operation of the "compare-equal" SIMD instruction is as follows. For the two input packed vectors $X_0,X_1,\ldots,X_{M-1}$ and $Y_0,Y_1,\ldots,Y_{M-1}$, the "compare-equal" SIMD instruction is executed in parallel on all M pairs of elements, and the output SIMD packed vector is $Z_0,Z_1,\ldots,Z_{M-1}$, where $Z_i=(X_i==Y_i)\ ?\ 255:0$. The expression on the right-hand side is the conditional operator of programming languages: if $X_i==Y_i$ holds, i.e. $X_i$ equals $Y_i$, then $Z_i$ is assigned 255; if $X_i==Y_i$ does not hold, i.e. $X_i$ and $Y_i$ are unequal, then $Z_i$ is assigned 0. In the formula, i is the element index within the SIMD packed vector, with range [0, M-1].
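Referring to Fig. 7, the operation of the "highest-bit combination" SIMD instruction is as follows. For the input packed data $X_0,X_1,\ldots,X_{8M-1}$, M highest-bit combination operations are completed in parallel, and the output packed vector is $Z_0,Z_1,\ldots,Z_{M-1}$, where
$Z_i=((X_{8i}\ \&\ \mathrm{0x80})\ll 7)\oplus((X_{8i+1}\ \&\ \mathrm{0x80})\ll 6)\oplus((X_{8i+2}\ \&\ \mathrm{0x80})\ll 5)\oplus((X_{8i+3}\ \&\ \mathrm{0x80})\ll 4)\oplus((X_{8i+4}\ \&\ \mathrm{0x80})\ll 3)\oplus((X_{8i+5}\ \&\ \mathrm{0x80})\ll 2)\oplus((X_{8i+6}\ \&\ \mathrm{0x80})\ll 1)\oplus(X_{8i+7}\ \&\ \mathrm{0x80})$
In the formula, $\oplus$ denotes XOR and i is the element index within the SIMD packed vector, with range [0, M-1]. As elsewhere in this description, bytes are written lowest bit first, so the shift $\ll$ moves the highest bit selected by 0x80 towards the low-order end; on integers written in the usual highest-bit-first form, each $\ll$ here corresponds to a right shift by the same amount, which places the highest bit of $X_{8i}$ in the lowest bit of $Z_i$.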
The present invention has carried out repeatedly implementing test, and the parameter of emulation experiment is:
core
tMon the cpu chip that i7-3610QM, dominant frequency are 2.3GHz, carry out the comparison of traditional shifting algorithm and the inventive method, coprocessing data transaction and the needed time of the data transaction after channel decoding before the chnnel coding of 655360 bit informations: the data transaction before chnnel coding, tradition look-up method amounts to 299 nanoseconds consuming time, the inventive method 31 nanoseconds consuming time.Data transaction after channel decoding, traditional look-up method 317 nanoseconds consuming time, the inventive method 73 nanoseconds consuming time.Speed, the inventive method is compared with traditional shifting algorithm, and processing speed has had and significantly improves.
In short, the embodiments of the present invention have verified the good performance of this data conversion method; the experimental results are successful and the object of the invention has been achieved.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A method for converting the data format before channel coding, based on a general-purpose processor (GPP) and single-instruction multiple-data (SIMD) technology, characterized in that: an input data stream $A_0 A_1 A_2 \ldots A_{n-1}$ with a word length of n bytes is first packed into s SIMD-format data items of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "mapping" operation is performed on it: each byte of the input data stream is copied into 8 bytes, yielding a first intermediate data stream $E_0 E_1 \ldots E_{8n-1}$; a parallel AND operation is then performed on the first intermediate data stream $E_0 E_1 \ldots E_{8n-1}$ to extract each bit of information, yielding a second intermediate data stream $F_0 F_1 \ldots F_{8n-1}$; finally, a parallel "select smaller value" operation is performed on the second intermediate data stream $F_0 F_1 \ldots F_{8n-1}$, moving each bit of information to the lowest bit of the byte containing it and generating an output data stream $B_0 B_1 B_2 \ldots B_{8n-1}$ with a word length of 8n bytes; wherein the byte length n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.
2. The method according to claim 1, characterized in that the method comprises the following operating steps:
(1) Execute the parallel "mapping" SIMD instruction to complete the data copy operation:
The "mapping" SIMD instruction is applied to the s SIMD packed data items in turn. The SIMD packed data at the X input of the "mapping" SIMD instruction is $A_{tM+0}, A_{tM+1}, \ldots, A_{tM+M-1}$, and each X input takes part in the inner loop 8 times; wherein t is the iteration index of the outer loop that executes the "mapping" SIMD instruction, with range [0, s-1], and u is the iteration index of the inner loop that executes the "mapping" SIMD instruction, with range [0, 7]. The SIMD packed data for the Y input of the "mapping" SIMD instruction in the u-th inner-loop iteration of the t-th outer-loop iteration is:
$E_{8tM+uM+0}, E_{8tM+uM+1}, \ldots, E_{8tM+uM+M-1}$;
(2) Execute the parallel AND SIMD instruction to complete the bit extraction operation:
The parallel AND SIMD instruction is applied to the 8s packed data items in turn. The SIMD packed data at the X input of the AND SIMD instruction is the first intermediate data stream $E_{rM+0}, E_{rM+1}, \ldots, E_{rM+M-1}$, and the SIMD packed data at the Y input of the AND SIMD instruction is:
{1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128, ..., 1, 2, 4, 8, 16, 32, 64, 128}; wherein r is the iteration index of the AND SIMD instruction, with range [0, 8s-1]. After the AND SIMD instruction completes, the resulting output Z, the second intermediate data stream, is $F_{rM+0}, F_{rM+1}, \ldots, F_{rM+M-1}$;
(3) Execute the parallel "select smaller value" SIMD instruction to move each significant bit to the lowest bit of the byte containing it:
The "select smaller value" SIMD instruction is applied to the 8s SIMD packed data items in turn. The SIMD packed data at the X input of the "select smaller value" SIMD instruction is $F_{jM+0}, F_{jM+1}, \ldots, F_{jM+M-1}$, and the SIMD packed data at the Y input is set to {1, 1, ..., 1}; wherein j is the iteration index of the "select smaller value" SIMD instruction, with range [0, 8s-1]. After the "select smaller value" SIMD instruction completes, the resulting output Z, the final output data stream, is $B_{jM+0}, B_{jM+1}, \ldots, B_{jM+M-1}$.
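The three steps above can be sketched in scalar Python, with lists standing in for SIMD packed data. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `convert_before_coding`, the default M = 16, and in particular the index vector fed to the mapping step (output lane k copies input lane (uM + k) // 8) are assumptions, chosen so that each input byte is copied 8 times as the mapping step requires.

```python
def convert_before_coding(a, m=16):
    """Expand n bytes into 8n bytes, one information bit per output byte."""
    if m % 8 != 0 or len(a) % m != 0:
        raise ValueError("this sketch needs M divisible by 8 and n = M * s")
    s = len(a) // m
    mask = [1 << (k % 8) for k in range(m)]  # 1, 2, 4, ..., 128 repeated
    ones = [1] * m
    b = []
    for t in range(s):  # outer loop over the s packed inputs
        x = a[t * m:(t + 1) * m]
        for u in range(8):  # inner loop, 8 passes per packed input
            # Assumed index vector: lane k copies input lane (u*M + k) // 8.
            y = [(u * m + k) // 8 for k in range(m)]
            e = [x[idx] for idx in y]                  # step (1): mapping
            f = [ei & mi for ei, mi in zip(e, mask)]   # step (2): AND
            b.extend(min(fi, oi) for fi, oi in zip(f, ones))  # step (3): min
    return b


a = [0b00000101] + [0] * 14 + [0xFF]
b = convert_before_coding(a)
```

In this example the first input byte 0x05 expands to the eight bytes 1, 0, 1, 0, 0, 0, 0, 0 (lowest bit first), and the last byte 0xFF expands to eight ones. Taking the minimum with 1 is what turns a masked value such as 4 or 128 into a plain 0 or 1.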
3. The method according to claim 1, characterized in that: when executing each instruction, the SIMD technology performs, in parallel on two groups of SIMD packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, each containing M elements, operations including mapping, AND, and select smaller value; every pair of data elements $X_i$ and $Y_i$ undergoes the same operation simultaneously, wherein i is the index of a data element within the SIMD packed data, with range [0, M-1]; the M results obtained are packed again, as data elements, into one group of SIMD-format packed data $Z_0, Z_1, \ldots, Z_{M-1}$; wherein the length M of the SIMD packed data depends on the bit length P of the packed data and the bit length Q occupied by the data element type, and is computed as $M = P / Q$, where the bit length of the packed data is $P = 64 \times 2^p$ and p is a natural number; when the data element type is byte, Q = 8, and when the data element type is word, Q = 16.
4. The method according to claim 3, characterized in that: when executing each instruction, the SIMD technology may also process only a single group of packed data according to the method.
5. The method according to claim 3, characterized in that:
The operation of the "mapping" SIMD instruction is: for the two input groups of SIMD packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, the "mapping" SIMD instruction is performed in parallel on all M pairs of data, and the output SIMD packed data is $Z_0, Z_1, \ldots, Z_{M-1}$; wherein the i-th element of the output SIMD packed data is the value found in $X_0, X_1, \ldots, X_{M-1}$ using $Y_i$ as the subscript, that is, $Z_i = X_{Y_i}$; wherein i is the index of a data element within the SIMD packed data, with range [0, M-1];
The operation of the AND SIMD instruction is: for the two input groups of packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, the AND SIMD instruction is performed in parallel on all M pairs of data, and the output SIMD packed data is $Z_0, Z_1, \ldots, Z_{M-1}$; wherein $Z_i = X_i \,\&\, Y_i$; wherein i is the index of a data element within the SIMD packed data, with range [0, M-1];
The operation of the "select smaller value" SIMD instruction is: for the two input groups of packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, the "select smaller value" SIMD instruction is performed in parallel on all M pairs of data, and the output SIMD packed data is $Z_0, Z_1, \ldots, Z_{M-1}$; wherein $Z_i = \min(X_i, Y_i)$; wherein i is the index of a data element within the SIMD packed data, with range [0, M-1].
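The three lane-wise operations defined above can be modelled in Python as follows. The function names are hypothetical, and lists of small integers stand in for SIMD packed data, so this is only a sketch of the semantics.

```python
def simd_map(x, y):
    """Mapping: output lane i takes the element of x selected by index y[i]."""
    return [x[yi] for yi in y]


def simd_and(x, y):
    """Lane-wise bitwise AND."""
    return [xi & yi for xi, yi in zip(x, y)]


def simd_min(x, y):
    """Lane-wise minimum of two unsigned values."""
    return [min(xi, yi) for xi, yi in zip(x, y)]


x = [0xA5, 0x3C, 0x00, 0xFF]
mapped = simd_map(x, [0, 0, 3, 1])
masked = simd_and(x, [0x01, 0x04, 0x80, 0x80])
lowest = simd_min(masked, [1, 1, 1, 1])
```

On x86 processors these operations roughly correspond to a byte shuffle (`pshufb`), a bitwise AND (`pand`), and an unsigned byte minimum (`pminub`), although the hardware shuffle has extra rules for out-of-range indices that this model ignores.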
6. A method for converting the data format after channel decoding, based on GPP and SIMD, characterized in that: an input data stream $C_0 C_1 C_2 \ldots C_{8n-1}$ with a word length of 8n bytes is packed into 8s SIMD-format data items of length M, so that its data format is parallelized and suitable for SIMD instructions, and a parallel "equality test" operation is performed on it, moving each bit of information to the highest bit of the byte containing it and thereby yielding an intermediate data stream $G_0 G_1 \ldots G_{8n-1}$; a parallel "select highest-bit combination" operation is then performed on the intermediate data stream $G_0 G_1 \ldots G_{8n-1}$, generating an output data stream $D_0 D_1 D_2 \ldots D_{n-1}$ with a word length of n bytes; wherein n = M × s, and the natural numbers M and s are respectively the length and the number of the SIMD packed data items.
7. The method according to claim 6, characterized in that the method comprises the following operating steps:
(1) Use the parallel "equality test" SIMD instruction to move each significant bit to the highest bit of the byte containing it:
The "equality test" SIMD instruction is applied to the 8s SIMD packed data items in turn. The packed data at the X input of the "equality test" SIMD instruction is $C_{kM+0}, C_{kM+1}, \ldots, C_{kM+M-1}$, and the packed data at the Y input is {1, 1, ..., 1}; wherein k is the iteration index of the "equality test" SIMD instruction, with range [0, 8s-1]. After the "equality test" SIMD instruction completes, the resulting output Z, the intermediate data stream, is $G_{kM+0}, G_{kM+1}, \ldots, G_{kM+M-1}$;
(2) Use the parallel "select highest-bit combination" SIMD instruction to merge the highest bits of each group of 8 consecutive bytes into 1 byte:
The "select highest-bit combination" SIMD instruction is applied s times in turn. The packed data at the X input of the "select highest-bit combination" SIMD instruction is $G_{8wM+0}, G_{8wM+1}, \ldots, G_{8wM+8M-1}$; wherein w is the iteration index of the "select highest-bit combination" SIMD instruction, with range [0, s-1]. After the "select highest-bit combination" SIMD instruction completes, the resulting output Z, the final output data stream, is $D_{wM+0}, D_{wM+1}, \ldots, D_{wM+M-1}$.
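The two steps above can be sketched in scalar Python, again with lists standing in for SIMD packed data. The function name `convert_after_decoding`, the default M = 16, and the bit order (byte 8i+u of the intermediate stream supplies bit u of output byte i, mirroring the 1, 2, 4, ..., 128 mask used before encoding) are assumptions of this sketch rather than details confirmed by the claims.

```python
def convert_after_decoding(c, m=16):
    """Collapse 8n bytes holding one bit each into n packed bytes."""
    if len(c) % (8 * m) != 0:
        raise ValueError("this sketch needs an input length of 8 * M * s bytes")
    ones = [1] * m
    g = []
    for k in range(len(c) // m):  # step (1): 8s equality tests
        x = c[k * m:(k + 1) * m]
        g.extend(255 if xi == oi else 0 for xi, oi in zip(x, ones))
    d = []
    for w in range(len(g) // (8 * m)):  # step (2): s combinations
        x = g[8 * w * m:8 * (w + 1) * m]
        for i in range(m):
            z = 0
            for u in range(8):
                # Assumed bit order: byte 8*i + u supplies bit u of the result.
                z |= ((x[8 * i + u] & 0x80) >> 7) << u
            d.append(z)
    return d


c = [1, 0, 1, 0, 0, 0, 0, 0] + [0] * 112 + [1] * 8
d = convert_after_decoding(c)
```

With this bit order the sketch undoes the expansion performed before channel coding: the eight bytes 1, 0, 1, 0, 0, 0, 0, 0 collapse back into the single byte 0x05, and eight ones collapse into 0xFF.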
8. The method according to claim 6, characterized in that: when executing each instruction, the SIMD technology performs, in parallel on two groups of SIMD packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, each containing M elements, operations including the equality test and the selection of the highest-bit combination; every pair of data elements $X_i$ and $Y_i$ undergoes the same operation simultaneously, wherein i is the index of a data element within the SIMD packed data, with range [0, M-1]; the M results obtained are packed again, as data elements, into one group of SIMD-format packed data $Z_0, Z_1, \ldots, Z_{M-1}$; wherein the length M of the SIMD packed data depends on the bit length P of the packed data and the bit length Q occupied by the data element type, and is computed as $M = P / Q$, where the bit length of the packed data is $P = 64 \times 2^p$ and p is a natural number; when the data element type is byte, Q = 8, and when the data element type is word, Q = 16.
9. The method according to claim 8, characterized in that: when executing each instruction, the SIMD technology may also process only a single group of packed data according to the method.
10. The method according to claim 8, characterized in that:
The operation of the "equality test" SIMD instruction is: for the two input groups of packed data $X_0, X_1, \ldots, X_{M-1}$ and $Y_0, Y_1, \ldots, Y_{M-1}$, the "equality test" SIMD instruction is performed in parallel on all M pairs of data, and the output SIMD packed data is $Z_0, Z_1, \ldots, Z_{M-1}$; wherein $Z_i = (X_i == Y_i)\ ?\ 255 : 0$. The expression to the right of the "=" sign uses the conditional operator of programming languages: if $X_i == Y_i$ holds, that is, $X_i$ and $Y_i$ are equal, $Z_i$ is assigned 255; if $X_i == Y_i$ does not hold, that is, $X_i$ and $Y_i$ are unequal, $Z_i$ is assigned 0; wherein i is the index of a data element within the SIMD packed data, with range [0, M-1];
The operation of the "select highest-bit combination" SIMD instruction is: for the input packed data $X_0, X_1, \ldots, X_{8M-1}$, M highest-bit combination operations are performed in parallel, and the output packed data is $Z_0, Z_1, \ldots, Z_{M-1}$; wherein

$$
\begin{aligned}
Z_i ={} & ((X_{8i} \,\&\, \mathtt{0x80}) \ll 7) \oplus ((X_{8i+1} \,\&\, \mathtt{0x80}) \ll 6) \oplus ((X_{8i+2} \,\&\, \mathtt{0x80}) \ll 5) \oplus ((X_{8i+3} \,\&\, \mathtt{0x80}) \ll 4) \\
& \oplus ((X_{8i+4} \,\&\, \mathtt{0x80}) \ll 3) \oplus ((X_{8i+5} \,\&\, \mathtt{0x80}) \ll 2) \oplus ((X_{8i+6} \,\&\, \mathtt{0x80}) \ll 1) \oplus (X_{8i+7} \,\&\, \mathtt{0x80})
\end{aligned}
$$

In the formula, i is the index of a data element within the SIMD packed data, and its range is [0, M-1].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310729424.7A CN103746771B (en) | 2013-12-26 | 2013-12-26 | Data format conversion method of channel coding and decoding based on GPP and SIMD technologies |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310729424.7A CN103746771B (en) | 2013-12-26 | 2013-12-26 | Data format conversion method of channel coding and decoding based on GPP and SIMD technologies |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103746771A true CN103746771A (en) | 2014-04-23 |
CN103746771B CN103746771B (en) | 2017-04-12 |
Family
ID=50503765
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310729424.7A Active CN103746771B (en) | 2013-12-26 | 2013-12-26 | Data format conversion method of channel coding and decoding based on GPP and SIMD technologies |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103746771B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108780394A (en) * | 2015-12-29 | 2018-11-09 | 英特尔公司 | Hardware device and method for transform coding format |
CN114581281A (en) * | 2020-11-30 | 2022-06-03 | 北京君正集成电路股份有限公司 | Optimization method based on first layer 4bit convolution calculation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8570393B2 (en) * | 2007-11-30 | 2013-10-29 | Cognex Corporation | System and method for processing image data relative to a focus of attention within the overall image |
RU2011115796A (en) * | 2011-04-22 | 2012-10-27 | ЭлЭсАй Корпорейшн (US) | DEVICE (OPTIONS) AND METHOD FOR APPROXIMATION WITH DOUBLE ACCURACY OPERATIONS WITH SINGLE ACCURACY |
CN103294621B (en) * | 2013-05-08 | 2016-04-06 | 中国人民解放军国防科学技术大学 | Supported data presses the vectorial access method of mould restructuring |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108780394A (en) * | 2015-12-29 | 2018-11-09 | 英特尔公司 | Hardware device and method for transform coding format |
CN108780394B (en) * | 2015-12-29 | 2023-07-18 | 英特尔公司 | Hardware apparatus and method for converting encoding format |
CN114581281A (en) * | 2020-11-30 | 2022-06-03 | 北京君正集成电路股份有限公司 | Optimization method based on first layer 4bit convolution calculation |
Also Published As
Publication number | Publication date |
---|---|
CN103746771B (en) | 2017-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113762490B (en) | Matrix multiplication acceleration using sparse matrix with column folding and squeezing | |
DE102018005181B4 (en) | PROCESSOR FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH PERFORMANCE, ACCURACY AND ENERGY REDUCTION CHARACTERISTICS | |
US20230333855A1 (en) | Multi-variate strided read operations for accessing matrix operands | |
CN117931121A (en) | Computer processor for higher precision computation using hybrid precision decomposition of operations | |
CN111512292A (en) | Apparatus, method and system for unstructured data flow in a configurable spatial accelerator | |
DE102018126150A1 (en) | DEVICE, METHOD AND SYSTEMS FOR MULTICAST IN A CONFIGURABLE ROOM ACCELERATOR | |
US11029958B1 (en) | Apparatuses, methods, and systems for configurable operand size operations in an operation configurable spatial accelerator | |
CN104025033B (en) | The SIMD variable displacements manipulated using control and circulation | |
CN105612509A (en) | Methods, apparatus, instructions and logic to provide vector sub-byte decompression functionality | |
CN107992330A (en) | Processor, method, processing system and the machine readable media for carrying out vectorization are circulated to condition | |
CN105264779A (en) | Data compression and decompression using simd instructions | |
CN105975251B (en) | A kind of DES algorithm wheel iteration systems and alternative manner based on coarseness reconstruction structure | |
CN118132147A (en) | System for executing instructions that fast convert slices and use slices as one-dimensional vectors | |
CN110023903B (en) | Binary vector factorization | |
CN107924307A (en) | Register and data element rearrangement processor, method, system and instruction are dispersed to by index | |
EP3623940A2 (en) | Systems and methods for performing horizontal tile operations | |
CN114443559A (en) | Reconfigurable operator unit, processor, calculation method, device, equipment and medium | |
TW201732568A (en) | Systems, apparatuses, and methods for lane-based strided gather | |
US20190004997A1 (en) | Binary Multiplier for Binary Vector Factorization | |
CN111767512A (en) | Discrete cosine transform/inverse discrete cosine transform DCT/IDCT system and method | |
CN103746771A (en) | Data format conversion method of channel coding and decoding based on GPP and SIMD technologies | |
CN109328333B (en) | System, apparatus and method for cumulative product | |
TWI544408B (en) | Apparatus and method for sliding window data gather | |
CN109672524A (en) | SM3 algorithm wheel iteration system and alternative manner based on coarseness reconstruction structure | |
EP3929734A1 (en) | Loading and storing matrix data with datatype conversion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||