CN102109978A - Data rearrangement method and rearrangement device

Info

Publication number: CN102109978A
Application number: CN2011100475859A
Authority: CN
Inventors: 孙瑞琛
Original assignee: Individual
Current assignee: Individual
Priority date: 2011-02-28
Filing date: 2011-02-28
Publication date: 2011-06-29
Also published as: CN102163140A

Abstract

A data rearrangement device, as a component of a processor, executes data rearrangement operations in a vector data processing unit according to the operating flow of a VLIW processor. The device comprises: an instruction storage unit for storing the instruction codes to be executed; a data storage unit for storing the data that the instruction codes read during execution; an instruction reading and distribution unit for reading instruction codes from the instruction storage unit and sending them to all execution units; a vector data processing unit for executing the vector processing instructions among the instruction codes; a vector register group including vector registers VR0, VR1, ..., VRn for holding data from the data storage unit, from and to which the operation units read and write data; and a vector rearrangement sequence register group including vector rearrangement sequence registers VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn for configuring the units that take part in the vector rearrangement operation, so that some or all of the vector operation units can be selected to participate.

Description

Data rearrangement method and rearrangement device

Technical field

The present invention relates to the field of microprocessors and computers, specifically to parallel processing in object-oriented processors, and in particular to a method and a device for rearranging parallel data.

Background art

As processor technology develops, processor speed keeps rising and performance keeps improving, while users demand more and more of computers and processors. Relying only on faster electronic components is far from enough to meet this demand, so improving the parallel processing capability of computers and increasing the number of effectively processed instructions becomes especially important.

In the prior art, Chinese patent CN 200480018443.8 describes a data rearrangement method and apparatus, as follows:

The first and second operands are SIMD registers, although the register of the second operand may also be replaced by a memory location. The first operand contains the source data to be rearranged, and the register of the first operand is also the destination register. Besides changing the positions of the data, embodiments of that invention also include the ability to set selected bytes to zero.

The second operand contains a set of rearrangement control mask bytes that specify the rearrangement pattern. The number of bits used to select a source data element is log2 of the number of data elements in the source operand. For example, a 64-bit register is divided into 8 bytes, so three bits are needed to index the data elements of the source operand; the index [2:0] in the following code denotes these 3 bits. If the most significant bit (bit 7) of a rearrangement control byte is set, constant zero is written to the result byte. Otherwise, if the least significant three bits of byte I of the second operand contain the integer J, the rearrangement instruction copies byte J of the first source register to byte position I of the destination register. Below is exemplary pseudocode for one embodiment of a packed byte rearrangement operation on 64-bit operands:
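The pseudocode referred to above is not reproduced in this text. As a stand-in, the following Python sketch models the behavior exactly as the paragraph describes it; it is not the pseudocode of CN 200480018443.8. The function name `shuffle_bytes_64`, the byte-string register model, and the sample values are assumptions made for illustration.

```python
def shuffle_bytes_64(src: bytes, mask: bytes) -> bytes:
    """Packed byte rearrangement on a 64-bit operand (8 bytes).

    src  -- first operand: the 8 source bytes (hypothetical model of a register)
    mask -- second operand: 8 rearrangement control bytes
    """
    if len(src) != 8 or len(mask) != 8:
        raise ValueError("both operands must be exactly 8 bytes")
    result = bytearray(8)
    for i, control in enumerate(mask):
        if control & 0x80:
            # Bit 7 of the control byte is set: write constant zero.
            result[i] = 0
        else:
            # Bits [2:0] hold J: copy source byte J to result byte I.
            result[i] = src[control & 0x07]
    return bytes(result)


# Sample values chosen only for illustration.
src = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])
mask = bytes([7, 6, 5, 4, 0x80, 2, 1, 0])
shuffled = shuffle_bytes_64(src, mask)
```

Here the mask reverses the byte order and zeroes byte position 4, which shows both behaviors the paragraph names: position change and zeroing.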

The technical limitations of the above patent CN 200480018443.8 are as follows:

First, in communication data processing, particularly in FFT computation, sometimes only part of the data in two source registers needs to be exchanged, and the source data registers are operated on again in the very next cycle. Achieving this with Intel's rearrangement method requires many additional redundant instructions.

Second, Intel's patent supports rearranging a single data source, whereas communication data processing requires rearranging two or three sources.

Third, Intel's patent does not support preserving the old data. Intel's first source operand register is also the destination register, and only one vector source operand is rearranged. Although this saves one register, it makes processing that must both keep the old data and rearrange it more difficult.

The present invention effectively solves the above problems. With the rearrangement instruction of the present invention, the number of instructions needed for computations such as the FFT drops sharply, which improves processing speed. The invention can also rearrange two sources, three sources, or even more, and can keep the old data while also rearranging it. In addition to zeroing rearranged data, the invention adds the ability to set rearranged data to one and to negative one.

Summary of the invention

To solve the above problems, the present invention proposes a data rearrangement method and a rearrangement device.

The data rearrangement device of the present invention is a component of a processor. It executes according to the operating flow of a VLIW (very long instruction word) processor, and the content of the invention is embodied in the specific way the rearrangement is carried out.

The data rearrangement device of the present invention comprises:

an instruction storage unit, for storing the instruction codes to be executed;

a data storage unit, for storing the data that the instruction codes read during execution;

an instruction reading and distribution unit, which reads instruction codes from the instruction storage unit and sends them to all execution units;

a vector data processing unit, which executes the vector processing instructions among the instruction codes;

a vector register group, comprising vector registers VR0, VR1, ..., VRn for holding data from the data storage unit, from and to which the operation units read and write data;

a vector rearrangement sequence register group, comprising vector rearrangement sequence registers VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn, used to configure the units that take part in the vector rearrangement operation, so that some or all of the vector operation units are selected to participate.

The data rearrangement is carried out in the vector data processing unit according to the operating flow of the VLIW processor.

The instruction reading and distribution unit reads a rearrangement instruction from the instruction storage unit and distributes it to the vector data processing unit, which performs the rearrangement operation according to that instruction. Following the rearrangement instruction, the vector data processing unit reads the data to be rearranged from the data storage unit into source registers of the vector register group; the source registers are specified by the rearrangement instruction. The vector data processing unit then rearranges the data elements held in the vector register group according to the instruction and the contents of the vector rearrangement sequence registers, and places the result in destination registers of the vector register group; the destination registers are likewise specified by the rearrangement instruction. Finally, the rearranged result held in the destination registers is stored in the data storage unit.

Data rearrangement according to the present invention falls into two broad classes by function. The first class is exchange rearrangement of data between two source registers, in which the two source registers taking part are also the two destination registers. The method comprises the following steps:

Step 1: the instructions required for the rearrangement, either a single instruction or a group of instructions, are read from the instruction storage unit and sent to the vector data processing unit.

Step 2: according to a rearrangement instruction, the data to be exchanged are loaded from the data storage unit into two registers of the vector register group. These two registers serve as both the source registers and the destination registers of the rearrangement.

Step 3: the two source registers are exchanged as indicated in Figure 3. The rearranged result is stored back in the same two registers, as shown in Figure 4.

Step 4: the destination registers VR0, VR1, ..., VRn are written back to the data memory.

The second class rearranges the data of one source, two sources, three sources, or more. Its main function is to place the data of one, two, three, or more sources in arbitrary positions as the application requires, with the result put into one destination register. The method comprises the following steps:

Step 2: according to one rearrangement instruction or a group of rearrangement instructions, the data to be rearranged are loaded from the data storage unit into one, two, or more registers of the vector register group. These registers serve as the source registers of the rearrangement.

Step 3: the vector rearrangement sequence registers are loaded, that is, the rearrangement order values are loaded into VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn. The number of vector rearrangement sequence registers equals the number of destination registers. According to the contents of the vector rearrangement sequence register VR_CFG, data are written from the source vector registers VR0, VR1, ..., VRn into the destination vector register VRm; at the same time, the content at the corresponding position of VR_CFG determines whether the content of the destination register VRm is set to 0, set to -1, or keeps its initial value.

Step 4: the destination registers VR0, VR1, ..., VRn are written back to the data memory.

In step 3, the data of the source registers are placed into the destination register according to the contents of the rearrangement sequence register. The destination register may be one or more vector registers; the rearrangement order for each destination register is determined by its own rearrangement sequence register.

A data rearrangement method, characterized by comprising the following steps:

Step 1: in a register VR_CFG of the vector rearrangement sequence register group, if the most significant bit of the data at position I is 1, the data at position I of the corresponding destination register VRm is set to 0;

Step 2: in a register VR_CFG of the vector rearrangement sequence register group, if the second most significant bit of the data is 1, the data at position I of the corresponding destination register VRm is set to 1;

Step 3: in a register VR_CFG of the vector rearrangement sequence register group, if the third most significant bit of the data is 1, the data at position I of the corresponding destination register VRm is set to -1;

Step 4: in a register VR_CFG of the vector rearrangement sequence register group, if the data is less than 24 and its value is J, the rearrangement instruction copies the J-th data element of the source registers to position I of the destination register;

Step 5: in a register VR_CFG of the vector rearrangement sequence register group, in all cases other than the above, the data at position I of the corresponding destination register VRm remains unchanged.
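The five cases above can be sketched in Python as a decode loop over one VR_CFG register. This is an illustration under stated assumptions, not the patent's implementation: the function name `rearrange`, the list-based register model, the sample values, and the top-down priority applied when more than one flag bit is set are all assumptions. The control words are taken to be 16 bits wide, as in the embodiment.

```python
LANES = 8  # eight 16-bit elements per 128-bit register, as in the embodiment


def rearrange(sources, vr_cfg, dest):
    """Apply one VR_CFG control register to one destination register.

    sources -- list of source registers, each a list of LANES values
    vr_cfg  -- LANES 16-bit control words, one per destination position I
    dest    -- initial content of the destination register
    """
    # Concatenate the sources so that position J runs over 0..23 for three sources.
    flat = [value for register in sources for value in register]
    result = list(dest)
    for i, word in enumerate(vr_cfg):
        if word & 0x8000:        # most significant bit set: write 0
            result[i] = 0
        elif word & 0x4000:      # second most significant bit set: write 1
            result[i] = 1
        elif word & 0x2000:      # third most significant bit set: write -1
            result[i] = -1
        elif word < len(flat):   # J < 24 with three sources: copy element J
            result[i] = flat[word]
        # Any other value: position I keeps its initial content.
    return result


# Hypothetical register contents, chosen so each lane is easy to recognize.
vr0 = list(range(100, 108))
vr1 = list(range(200, 208))
vr2 = list(range(300, 308))
vr3_initial = list(range(900, 908))
cfg = [0x0001, 0x000C, 0x0017, 0x8000, 0x4000, 0x2000, 0x0100, 0x0000]
vr3 = rearrange([vr0, vr1, vr2], cfg, vr3_initial)
```

In this sample, positions 0 to 2 copy from VR0, VR1, and VR2 respectively, positions 3 to 5 receive the constants 0, 1, and -1, position 6 keeps its initial value because 0x0100 matches none of the first four cases, and position 7 copies element 0.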

Step 1: in the registers VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm of the vector rearrangement sequence register group, if the most significant bit of the data is 1, the data at position I of the corresponding destination register VRa, VRb, VRc, ..., VRm is set to 0;

Step 2: in the registers VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm of the vector rearrangement sequence register group, if the second most significant bit of the data is 1, the data at position I of the corresponding destination register VRa, VRb, VRc, ..., VRm is set to 1;

Step 3: in the registers VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm of the vector rearrangement sequence register group, if the third most significant bit of the data is 1, the data at position I of the corresponding destination register VRa, VRb, VRc, ..., VRm is set to -1;

Step 4: in the registers VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm of the vector rearrangement sequence register group, if the data is less than 24 and its value is J, the rearrangement instruction copies the J-th data element of the source registers to position I of the destination register VRa, VRb, VRc, ..., VRm;

Step 5: in the registers VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm of the vector rearrangement sequence register group, in all cases other than the above, the data at position I of the corresponding destination register VRa, VRb, VRc, ..., VRm remains unchanged.

The present invention needs only one instruction cycle to exchange part of the data between two source registers. This removes the redundant instructions that some algorithms would otherwise require and increases the computing speed of the processor.

The present invention supports rearranging multiple source data registers, such as one source, two sources, or three sources.

The present invention supports preserving part of the data in the destination register while the other positions receive rearranged data as usual; the data in the source registers remain unchanged.

In addition to zeroing rearranged data, the present invention adds the ability to set rearranged data to one and to negative one.

Description of drawings

The present invention is easier to understand with reference to the following drawings:

Figure 1 is an overall flowchart of the data rearrangement method of the present invention;

Figure 2 is a structural block diagram of the data rearrangement device of the present invention;

Figure 3 is a schematic diagram of the data exchange between two sources;

Figure 4 shows the data content of the two source registers after the exchange;

Figure 5 is the flow graph of an 8-point FFT;

Figure 6 shows the intermediate result values after the first-level butterfly operation;

Figure 7 is an example of exchange rearrangement between two sources;

Figure 8 is a schematic diagram of data rearrangement with three sources.

Embodiment

The data rearrangement device of the present invention comprises an instruction storage unit, a data storage unit, an instruction reading and distribution unit, a vector data processing unit, a vector register group, and a vector rearrangement sequence register group. The data rearrangement is carried out in the vector data processing unit according to the operating flow of the VLIW processor.

In the course of a data rearrangement, the rearrangement instruction is first read from the instruction storage unit into the instruction reading and distribution unit, which distributes it to the vector data processing unit, where the rearrangement operation is carried out. Following the rearrangement instruction, the vector data processing unit first reads the data to be rearranged from the data storage unit into source registers of the vector register group; the source registers are specified by the rearrangement instruction. Next, the configured order is loaded into the vector rearrangement sequence registers VR_CFG0, VR_CFG1, ..., VR_CFGn (whether this configuration step is needed depends on the rearrangement function, so it is optional). The vector data processing unit then rearranges the data elements held in the vector register group according to the instruction and the contents of the vector rearrangement sequence registers, and places the result in destination registers of the vector register group; the destination registers are specified by the rearrangement instruction. Finally, the rearranged result held in the destination registers is stored in the corresponding space of the data storage unit. This completes one data rearrangement operation.

The data rearrangement method of the present invention falls into two broad classes by function: exchange rearrangement of data between two source registers, and rearrangement of data from one source, two sources, three sources, or more.

For exchange rearrangement of data between two source registers, the following steps are carried out:

Step 3: the two source registers are exchanged as indicated in Figure 3. The rearranged result is stored back in the same two registers, as shown in Figure 4.

For rearrangement of data from one source, two sources, three sources, or more, the following steps are carried out:

First, the instructions required for the rearrangement (a single instruction or a group of instructions) are read from the instruction storage unit and sent to the vector data processing unit.

Second, according to one rearrangement instruction or a group of rearrangement instructions, the data to be rearranged are loaded from the data storage unit into one, two, or more registers of the vector register group. These registers serve as the source registers of the rearrangement.

Third, the vector rearrangement sequence registers are loaded, that is, the rearrangement order values are loaded into VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn. The number of vector rearrangement sequence registers equals the number of destination registers. According to the contents of the vector rearrangement sequence register VR_CFG, data are written from the source vector registers VR0, VR1, ..., VRn into the destination vector register VRm; at the same time, the content at the corresponding position of VR_CFG determines whether the content of the destination register VRm is set to 0, set to -1, or keeps its initial value.

Finally, the destination registers VR0, VR1, ..., VRn are written back to the data storage unit.

The two classes of function realized by the data rearrangement method of the present invention, namely exchange rearrangement of data between two source registers and rearrangement of data from one source, two sources, three sources, or more, are described in detail below with reference to the drawings.

For the rearrangement of two source registers, take 128-bit vector registers as an example, as shown in Figure 3 (the invention is not limited to 128 bits; 256 bits, 512 bits, and so on are also possible). First, the instruction reading and distribution unit reads the rearrangement instruction from the instruction storage unit and passes it to the vector execution unit. Following the instruction, the vector execution unit reads eight 16-bit data elements X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7) from the data storage unit and loads them into vector register VR0 of the vector register group, where VR0 is a 128-bit register. It then loads another eight 16-bit data elements Y(0), Y(1), Y(2), Y(3), Y(4), Y(5), Y(6), Y(7) into the 128-bit vector register VR1. These 16 data elements are the data to be rearranged, and VR0 and VR1 are the source registers of the rearrangement instruction. The two-source-register rearrangement is then carried out; the content of the vector rearrangement sequence register at this point is as shown in the figure. The exchange proceeds as shown in Figure 3, that is, the parts indicated by the arrows are exchanged: Y(0) with X(1), Y(2) with X(3), Y(4) with X(5), and Y(6) with X(7). After the exchange, VR0 holds X(0), Y(0), X(2), Y(2), X(4), Y(4), X(6), Y(6), and VR1 holds X(1), Y(1), X(3), Y(3), X(5), Y(5), X(7), Y(7), as shown in Figure 4.
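The exchange just described can be modeled in a few lines of Python. This is a sketch that assumes registers are plain lists of eight lanes; the function name `exchange_odd_even` and the string labels are hypothetical and serve only to make the lane movement visible.

```python
def exchange_odd_even(vr0, vr1):
    """Swap the odd lanes of vr0 with the even lanes of vr1.

    Models the pairing described for Figure 3: Y(0) with X(1),
    Y(2) with X(3), and so on. Returns the two new register contents.
    """
    if len(vr0) != len(vr1) or len(vr0) % 2:
        raise ValueError("registers must have the same even number of lanes")
    new0, new1 = list(vr0), list(vr1)
    for k in range(0, len(vr0), 2):
        new0[k + 1] = vr1[k]   # lane k of VR1 moves into lane k+1 of VR0
        new1[k] = vr0[k + 1]   # lane k+1 of VR0 moves into lane k of VR1
    return new0, new1


# Label each lane so the movement can be read off directly.
vr0 = [f"X({n})" for n in range(8)]
vr1 = [f"Y({n})" for n in range(8)]
vr0_after, vr1_after = exchange_odd_even(vr0, vr1)
```

Because both results are computed from the original contents before either register is overwritten, the model matches an exchange in which the two source registers are also the two destination registers.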

A worked example illustrates the advantage of the invention. Take an 8-point FFT and consider the parallel processing of data in the FFT computation. The flow graph of the 8-point FFT is shown in Figure 5; the eight loaded data points X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7) can be rearranged. As the flow graph shows, each butterfly operation consists of one complex multiplication and two complex additions. In the first-level butterfly operation the data are divided into two groups and placed in two vector registers: X(0), X(2), X(1), X(3) go into vector register VR0, and X(4), X(6), X(5), X(7) go into vector register VR1. As Figure 6 shows, X(4), X(6), X(5), X(7) must be multiplied by the twiddle factor Wn0 and then added to and subtracted from X(0), X(2), X(1), X(3). Because Wn0 equals 1, X(0), X(2), X(1), X(3) can be added to and subtracted from X(4), X(6), X(5), X(7) directly, which gives the intermediate result after the first-level butterfly operation, namely VR0+VR1 and VR0-VR1. The result of VR0+VR1 is stored in VR2 and the result of VR0-VR1 in VR3, as shown in Figure 6. The value of VR2 corresponds to X1(0), X1(2), X1(4), X1(6) after the first-level butterfly operation, and the value of VR3 corresponds to X1(1), X1(3), X1(5), X1(7). The FFT flow graph shows that in the second-level butterfly operation X1(2), X1(3), X1(6), X1(7) are multiplied by the twiddle factors Wn0, Wn2, Wn0, Wn2 respectively. To exploit the parallelism of the DSP, X1(2), X1(3), X1(6), X1(7) should be placed in one vector operation unit, so that a single vector multiply instruction computes X1(2)*Wn0, X1(3)*Wn2, X1(6)*Wn0, X1(7)*Wn2; the products are then added to and subtracted from X1(0), X1(1), X1(4), X1(5). Therefore X1(2), X1(3), X1(6), X1(7) are placed in one vector register and X1(0), X1(1), X1(4), X1(5) in another. This is achieved by an exchange rearrangement of the data in VR2 and VR3. The result is shown in Figure 7.
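The first butterfly level and the exchange that follows it can be traced numerically. The sketch below is an assumption-laden illustration: it uses four lanes per register as in the paragraph above, real integer inputs 0 to 7 as hypothetical samples, and a hypothetical helper `exchange_odd_even` that swaps the odd lanes of the first register with the even lanes of the second, which is the pairing described for Figure 3.

```python
def exchange_odd_even(a, b):
    """Swap the odd lanes of a with the even lanes of b; return both results."""
    new_a, new_b = list(a), list(b)
    for k in range(0, len(a), 2):
        new_a[k + 1], new_b[k] = b[k], a[k + 1]
    return new_a, new_b


x = list(range(8))  # hypothetical input samples X(0)..X(7)

# First level: the twiddle factor is 1, so only add and subtract remain.
vr0 = [x[0], x[2], x[1], x[3]]
vr1 = [x[4], x[6], x[5], x[7]]
vr2 = [a + b for a, b in zip(vr0, vr1)]  # X1(0), X1(2), X1(4), X1(6)
vr3 = [a - b for a, b in zip(vr0, vr1)]  # X1(1), X1(3), X1(5), X1(7)

# One exchange groups the operands of the second level.
vr2, vr3 = exchange_odd_even(vr2, vr3)
# vr2 now holds X1(0), X1(1), X1(4), X1(5)
# vr3 now holds X1(2), X1(3), X1(6), X1(7), ready for the twiddle multiply
```

After the exchange, a single lane-wise multiply of `vr3` by the twiddle factors followed by an add and a subtract against `vr2` would cover the whole second level, which is the grouping the paragraph argues for.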

As described above, this method completes the exchange rearrangement of part of the data in two source registers in one clock cycle. It removes the redundant instructions that some algorithms would otherwise require and increases the computing speed of the processor.

Besides exchange rearrangement of data between two source registers, the data rearrangement method of the present invention also supports rearrangement of one source, two sources, three sources, or more. Its main function is to place the data of one, two, three, or more sources in arbitrary positions as the application requires, with the result put into one destination register. It is implemented as follows:

First, in a register VR_CFG of the vector rearrangement sequence register group, if the most significant bit of the 16-bit data (position I) is 1, the data at position I of the corresponding VR3 is set to 0.

Second, in a register VR_CFG of the vector rearrangement sequence register group, if the second most significant bit of the 16-bit data (position I) is 1, the data at position I of the corresponding VR3 is set to 1.

Third, in a register VR_CFG of the vector rearrangement sequence register group, if the third most significant bit of the 16-bit data (position I) is 1, the data at position I of the corresponding VR3 is set to -1.

Fourth, in a register VR_CFG of the vector rearrangement sequence register group, if the 16-bit data (position I) is less than 24 and its value is J, the rearrangement instruction copies the J-th 16-bit data element of the source registers to position I of the destination register.

Fifth, in a register VR_CFG of the vector rearrangement sequence register group, in all cases other than the above, the data at position I of the corresponding VR3 remains unchanged.

In this design, as shown in Figure 8, all vector registers are 128 bits wide. Although the vector registers in this example are 128 bits, the invention is not limited to 128 bits; 256 bits, 512 bits, and so on are also possible. The 128 bits are divided into eight 16-bit data elements. VR_CFG is likewise divided into eight 16-bit data elements, which specify the rearrangement pattern. The three source registers VR0, VR1, VR2 hold the data to be rearranged, and one destination register VR3 stores the rearranged data.

VR0, VR1, and VR2 together hold 24 16-bit data elements, whose positions are numbered 0-23 in decimal, or 0-17 in hexadecimal. The positions in VR0, from left to right, are 0-7; the positions in VR1, from left to right, are 8-F; the positions in VR2, from left to right, are 10-17.
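The position numbering above amounts to a division by the lane count. The following Python sketch makes that mapping explicit; the function name `locate` and the tuple it returns are assumptions for illustration, not part of the patent.

```python
LANES = 8                              # eight 16-bit elements per 128-bit register
SOURCE_REGISTERS = ("VR0", "VR1", "VR2")


def locate(position):
    """Map a source position (0..23, i.e. 0x00..0x17) to (register name, lane)."""
    if not 0 <= position < LANES * len(SOURCE_REGISTERS):
        raise ValueError("position must be in the range 0..23")
    register_index, lane = divmod(position, LANES)
    return SOURCE_REGISTERS[register_index], lane


first = locate(0x1)    # lane 1 of VR0
second = locate(0xC)   # lane 4 of VR1
last = locate(0x17)    # lane 7 of VR2
```

With VR0 holding X(0) to X(7) and VR1 holding Y(0) to Y(7), position 1 selects X(1) and position C selects Y(4).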

Load the vector rearrangement sequence register VR_CFG; from left to right, the eight 16-bit data elements of VR_CFG are 1C F 3F001178005500F.

The vector rearrangement sequence register drives the following operations:

The first 16-bit value in VR_CFG is 1, so the data X(1) at the corresponding position of VR0 is placed in the first 16-bit element of the destination register. The second 16-bit value in VR_CFG is C, so the data Y(4) at the corresponding position of VR1 is placed in the second 16-bit element of the destination register. The third and fourth values in VR_CFG are handled in the same way. When the fifth 16-bit value in VR_CFG is F001, the most significant bit is one, so the data at the corresponding position of the destination register is set to 0. When the seventh 16-bit value in VR_CFG is 8001, the second most significant bit is one, so the data at the corresponding position of the destination register is set to 1. When the eighth 16-bit value in VR_CFG is 500f, the third most significant bit is one, so the data at the corresponding position of the destination register is set to -1.

Claims

1. A data rearrangement device which, as a component of a processor, executes data rearrangement operations in a vector data processing unit according to the operating flow of a VLIW processor, characterized in that the data rearrangement device comprises:

a vector register group, comprising vector registers VR0, VR1, ..., VRn for holding data from the data storage unit, from and to which the operation units read and write data;

2. The data rearrangement device according to claim 1, used for exchange rearrangement of data between two source registers, or for rearrangement of data from one source, two sources, three sources, or more.

3. The data rearrangement device according to claim 1 or 2, wherein:

the instruction reading and distribution unit reads a rearrangement instruction from the instruction storage unit and distributes it to the vector data processing unit, and the vector data processing unit performs the rearrangement operation according to the rearrangement instruction;

the vector data processing unit, according to the rearrangement instruction, reads the data to be rearranged from the data storage unit into source registers of the vector register group, the source registers being specified by the rearrangement instruction;

the vector data processing unit, according to the instruction and the contents of the vector rearrangement sequence registers, rearranges the data elements held in the vector register group and places the result in destination registers of the vector register group, the destination registers being specified by the rearrangement instruction;

the rearranged result held in the destination registers is stored in the data storage unit.

4. A data rearrangement method which completes the exchange rearrangement of part of the data in two source registers in one instruction cycle, characterized by comprising the following steps:

Step 1: the rearrangement instructions required for the rearrangement, either a single instruction or a group of instructions, are read from the instruction storage unit and sent to the vector data processing unit;

Step 2: according to the rearrangement instruction, the data to be rearranged are loaded from the data storage unit into two registers of the vector register group, these two registers serving as both the two source registers and the two destination registers of the rearrangement;

Step 3: the two source registers are exchanged, and the rearranged result is stored back in the same two registers;

5. A data rearrangement method, characterized by comprising the following steps:

Step 1: the rearrangement instructions required for the rearrangement, either a single instruction or a group of instructions, are read from the instruction storage unit and sent to the vector data processing unit.

Step 2: according to the rearrangement instruction, the data to be rearranged are loaded from the data storage unit into one, two, or more registers of the vector register group, these registers serving as the source registers of the rearrangement.

Step 3: the vector rearrangement sequence registers are loaded, that is, the rearrangement order values are loaded into VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm; the number of vector rearrangement sequence registers equals the number of destination registers; according to the contents of the vector rearrangement sequence registers VR_CFG, data are written from the source vector registers VR0, VR1, ..., VRn into the destination vector registers VRa, VRb, VRc, ..., VRm; at the same time, the content at the corresponding position of VR_CFGa, VR_CFGb, VR_CFGc, ..., VR_CFGm determines whether the content of the destination registers VRa, VRb, VRc, ..., VRm is set to 0, set to -1, or keeps its initial value;

Step 4: the destination registers VRa, VRb, VRc, ..., VRm are written back to the data memory.

6. The data rearrangement method according to claim 5, wherein in step 3 the data of the source registers are placed into the destination register according to the contents of the rearrangement sequence register.

7. The data rearrangement method according to claim 6, wherein the destination register may be one or more vector registers, and the rearrangement order for each destination register is determined by its own rearrangement sequence register.

8. A data rearrangement method, characterized by comprising the following steps:

9. A data rearrangement method, characterized by comprising the following steps: