CN202133997U - Data rearrangement device - Google Patents

Publication number
CN202133997U
Authority
CN
China
Prior art keywords
data
vector
instruction
register
rearrangement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2011201569187U
Other languages
Chinese (zh)
Inventor
孙瑞琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU ZHONGKE CORES ELECTRONIC TECHNOLOGY CO LTD
Original Assignee
JIANGSU ZHONGKE CORES ELECTRONIC TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JIANGSU ZHONGKE CORES ELECTRONIC TECHNOLOGY CO LTD
Priority to CN2011201569187U
Application granted
Publication of CN202133997U
Anticipated expiration
Expired - Fee Related

Abstract

Provided is a data rearrangement device, which is a component of a processor. Following the operating flow of a VLIW processor, the data rearrangement operation is executed in a vector data processing unit. The data rearrangement device comprises: an instruction storage unit for storing the instruction code to be executed; a data storage unit for storing the data required for executing the instruction code; an instruction read and distribution unit, which reads the instruction code from the instruction storage unit and sends it to all the execution units; a vector data processing unit for executing the vector processing instructions in the instruction code; a vector register group, which includes vector registers VR0 to VRn and is used for storing data from the data storage unit, with the operation units reading and writing data in the appropriate registers; and a vector rearrangement sequence register group, which includes vector rearrangement sequence registers VR_CFG0 to VR_CFGn and is used for configuring the units participating in the vector rearrangement operation, selecting some or all of the vector operation units to participate in the operation.

Description

A data rearrangement device
Technical field
The present invention relates to the field of microprocessors and computers, specifically to processor-oriented parallel processing, and in particular to a device for rearranging parallel data.
Background art
As processor technology develops, processor speed keeps rising and performance keeps improving, while users demand ever more of computers and processors. Relying on faster electronic components alone is far from enough to meet this demand, so improving the parallel processing capability of the computer and making instruction processing more effective become especially important.
Among existing technical solutions, Chinese patent CN 200480018443.8 describes a data rearrangement device, as detailed below:
The first and second operands are SIMD registers, although a memory location may be used in place of the register for the second operand. The first operand contains the source data to be rearranged, and the register of the first operand is also the destination register. Besides changing the positions of bytes, embodiments of that invention also include the ability to set selected bytes to zero.
The second operand contains a set of rearrangement control mask bytes that specify the rearrangement pattern. The number of bits used to select a source data element is log2 of the number of data elements in the source operand. For example, a 64-bit register is divided into 8 bytes, so three bits are needed to index a data element in the source operand. The index [2:0] in the code below denotes these three bits. If the most significant bit (bit 7) of a rearrangement control byte is set, the constant zero is written to the result byte. Otherwise, if the least significant three bits of byte I of the second operand contain the integer J, the rearrangement instruction copies byte J of the first source register to byte position I of the destination register. The following is exemplary pseudocode for one embodiment of the packed byte rearrangement operation on 64-bit operands:
Figure BSA00000497447900011
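The behavior described for the prior-art instruction can be sketched in Python as follows. This is a minimal illustration of the rule stated above (bit 7 of a control byte forces zero; otherwise the low three bits select a source byte), not the pseudocode of the cited patent. The function name packed_byte_shuffle_64 and the list-of-bytes representation are assumptions made for illustration.

```python
from typing import List


def packed_byte_shuffle_64(src: List[int], ctrl: List[int]) -> List[int]:
    """Rearrange the 8 bytes of a 64-bit operand (hypothetical helper).

    src  -- the 8 source bytes (first operand, which is also the destination)
    ctrl -- the 8 control mask bytes (second operand)
    """
    if len(src) != 8 or len(ctrl) != 8:
        raise ValueError("a 64-bit operand holds exactly 8 bytes")
    result = []
    for i in range(8):
        if ctrl[i] & 0x80:
            # Bit 7 of the control byte is set: write constant zero.
            result.append(0)
        else:
            # Low three bits [2:0] hold J: copy source byte J to position I.
            result.append(src[ctrl[i] & 0x07])
    return result


# Sample values chosen for illustration only.
source = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
control = [7, 6, 5, 4, 0x80, 2, 1, 0]
shuffled = packed_byte_shuffle_64(source, control)
```

Because the result replaces the first operand, the original source bytes are lost unless they are copied to another register first.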
The technical limitations of the above patent CN 200480018443.8 are as follows:
First, in communication data processing, particularly in FFT computation, it is sometimes necessary to exchange only part of the data in two source registers, with the source data registers operated on again in the very next cycle. To achieve this, Intel's rearrangement device has to add many redundant instructions.
Second, Intel's patented technique supports rearranging one data source, whereas communication data processing needs to rearrange two or three sources.
Third, Intel's patented technique does not support preserving the original data. Intel's first source operand is both the source register and the destination register, and only one vector source operand is rearranged. Although this saves one register, it makes processing that must both keep the original data and rearrange it considerably more difficult.
The present invention effectively solves the above problems. With the rearrangement instruction of the present invention, the number of instructions needed for computations such as the FFT drops sharply, which improves processing speed. The present invention can also rearrange two sources, three sources, or even more sources, and it can keep the original data while rearranging it. In addition, besides setting rearranged data to zero, the present invention adds the ability to set rearranged data to 1 and to -1.
Summary of the invention
To solve the above problems, the present invention proposes a data rearrangement device.
The data rearrangement device of the present invention is a component of a processor. It operates according to the operating flow of a VLIW (very long instruction word) processor, and the content of the present invention is embodied in the specific process by which the rearrangement is carried out.
The data rearrangement device of the present invention comprises:
An instruction storage unit, for storing the instruction code to be executed;
A data storage unit, for storing the data required for executing the instruction code;
An instruction read and distribution unit, which reads instruction code from the instruction storage unit and sends it to all execution units;
A vector data processing unit, for executing the vector processing instructions in the instruction code;
A vector register group, comprising vector registers VR0, VR1, ..., VRn, for holding data from the data storage unit; the operation units read and write data in the appropriate registers of the group;
A vector rearrangement sequence register group, comprising vector rearrangement sequence registers VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn, for configuring the units that participate in the vector rearrangement operation, selecting some or all of the vector operation units to participate.
The data rearrangement is completed in the vector data processing unit according to the operating flow of the VLIW processor.
The instruction read and distribution unit reads the rearrangement instruction from the instruction storage unit and distributes it to the vector data processing unit, which performs the rearrangement operation according to that instruction. According to the rearrangement instruction, the vector data processing unit reads the data to be rearranged from the data storage unit into the source registers of the vector register group; which registers serve as source registers is determined by the rearrangement instruction. The vector data processing unit then rearranges the data elements held in the vector register group according to the instruction and the contents of the vector rearrangement sequence registers, and places the result in a destination register of the vector register group; which register serves as the destination register is determined by the rearrangement instruction. Finally, the rearranged result in the destination register is stored in the data storage unit.
By function, data rearrangement according to the present invention falls into two broad classes. The first is the interchange rearrangement of data in two source registers; here the two source registers taking part in the interchange are also the two destination registers. The second is the rearrangement of data from one source, two sources, three sources, or more. Its main function is to place the data of one, two, three, or more sources at arbitrary positions as required, with the result written to one destination register.
The present invention needs only one instruction cycle to complete the interchange rearrangement of part of the data in two source registers. This removes the redundant instructions that some algorithms would otherwise require and improves the computing speed of the processor.
The present invention supports the rearrangement of multiple source data registers, such as one source, two sources, three sources, and so on.
The present invention supports keeping part of the data in the destination register while the other parts receive rearranged data as usual; the data in the source registers remain unchanged.
Besides setting rearranged data to zero, the present invention also adds the ability to set rearranged data to 1 and to -1.
Description of the drawings
The present invention is easier to understand with reference to the following drawings:
Fig. 1 is a flowchart of the processing performed by the data rearrangement device of the present invention;
Fig. 2 is a structural block diagram of the data rearrangement device of the present invention;
Fig. 3 is a schematic diagram of data interchange between two sources;
Fig. 4 shows the data contents after the two source registers are interchanged;
Fig. 5 is the flow graph of an 8-point FFT;
Fig. 6 shows the intermediate result values after the first-stage butterfly computation;
Fig. 7 is an example of interchange rearrangement of data in two sources;
Fig. 8 is a schematic diagram of data rearrangement with three sources.
Embodiments
The data rearrangement device of the present invention comprises: an instruction storage unit, a data storage unit, an instruction read and distribution unit, a vector data processing unit, a vector register group, and a vector rearrangement sequence register group. The data rearrangement is completed in the vector data processing unit according to the operating flow of the VLIW processor.
To complete one data rearrangement, the rearrangement instruction is first read from the instruction storage unit into the instruction read and distribution unit, which then distributes it to the vector data processing unit; the vector data processing unit performs the rearrangement operation. According to the rearrangement instruction, the vector data processing unit first reads the data to be rearranged from the data storage unit into the source registers of the vector register group; the source registers are specified by the rearrangement instruction. Next, the configured order is loaded into the vector rearrangement sequence register group VR_CFG0, VR_CFG1, ..., VR_CFGn (whether this configuration step is needed depends on the rearrangement function, so it is optional). The vector data processing unit then rearranges the data elements held in the vector register group according to the instruction and the contents of the vector rearrangement sequence registers, and places the result in a destination register of the vector register group; the destination register is specified by the rearrangement instruction. Finally, the rearranged result in the destination register is stored to the corresponding space in the data storage unit. This completes one data rearrangement operation.
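The four stages of one rearrangement (load, optional configuration, rearrange, store) can be modeled in software roughly as below. This is only a sketch under stated assumptions: the dictionary-based memory and register file, the function names run_rearrangement and select_by_cfg, and the 4-lane vectors are all hypothetical and chosen for brevity; the device itself performs these stages in hardware.

```python
from typing import Callable, Dict, List, Optional

Vector = List[int]


def run_rearrangement(
    data_memory: Dict[str, Vector],
    source_loads: Dict[str, str],
    cfg_values: Optional[Vector],
    operation: Callable[[Dict[str, Vector]], Vector],
    dest_reg: str,
    store_addr: str,
) -> Vector:
    """Model one rearrangement pass (hypothetical software model)."""
    registers: Dict[str, Vector] = {}
    # 1. Load the data to be rearranged into the source registers.
    for reg_name, addr in source_loads.items():
        registers[reg_name] = list(data_memory[addr])
    # 2. Optionally load the rearrangement order into VR_CFG0.
    if cfg_values is not None:
        registers["VR_CFG0"] = list(cfg_values)
    # 3. Rearrange and place the result in the destination register.
    registers[dest_reg] = operation(registers)
    # 4. Store the destination register back to the data storage unit.
    data_memory[store_addr] = list(registers[dest_reg])
    return data_memory[store_addr]


def select_by_cfg(regs: Dict[str, Vector]) -> Vector:
    # Illustrative operation: lane I of the result takes lane VR_CFG0[I] of VR0.
    return [regs["VR0"][j] for j in regs["VR_CFG0"]]


memory = {"in": [10, 11, 12, 13], "out": []}
stored = run_rearrangement(
    memory, {"VR0": "in"}, [3, 2, 1, 0], select_by_cfg, "VR1", "out"
)
```

Writing the result to a separate destination register, as here, leaves the source data in memory and in VR0 untouched.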
By function, the data rearrangement device of the present invention supports two classes of operation: interchange rearrangement of data in two source registers, and rearrangement of data from multiple sources such as one source, two sources, or three sources.
For the interchange rearrangement of data in two source registers, the following steps are performed:
Step 1: the instruction needed for the rearrangement, which may be one instruction or a group of instructions, is read from the instruction storage unit and sent to the vector data processing unit.
Step 2: according to a rearrangement instruction, the data to be interchanged are loaded from the data storage unit into two registers of the vector register group. These two registers serve as the source registers of the rearrangement and are also the destination registers.
Step 3: the two source registers are interchanged as indicated in Fig. 3. The rearranged result is still stored in these two registers; the result after the interchange is shown in Fig. 4.
Step 4: the destination registers VR0, VR1, ..., VRn are written back to the data storage unit.
For the rearrangement of data from multiple sources such as one source, two sources, or three sources, the following steps are performed:
First, the instruction needed for the rearrangement (one instruction or a group of instructions) is read from the instruction storage unit and sent to the vector data processing unit.
Second, according to one rearrangement instruction or a group of rearrangement instructions, the data to be rearranged are loaded from the data storage unit into one, two, or more registers of the vector register group. These registers serve as the source registers of the rearrangement.
Third, the vector rearrangement sequence registers are loaded: the rearrangement order values are loaded into the vector rearrangement sequence registers VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn. The number of vector rearrangement sequence registers equals the number of destination registers. According to the contents of the vector rearrangement sequence register VR_CFG, data are written from the source vector registers VR0, VR1, ..., VRn into the destination vector register VRm; at the same time, the contents of the corresponding position of VR_CFG determine whether that position of the destination register VRm is set to 0 or -1, or keeps its original value.
Finally, the destination registers VR0, VR1, ..., VRn are written back to the data storage unit.
The two classes of function realized by the data rearrangement device of the present invention, namely the interchange rearrangement of data in two source registers and the rearrangement of data from multiple sources such as one source, two sources, or three sources, are described in detail below with reference to the drawings.
For the rearrangement of data in two source registers, as shown in Fig. 3, take 128-bit vector registers as an example (the invention is not limited to 128 bits; 256 bits, 512 bits, and so on are also possible). First, the instruction read and distribution unit reads the rearrangement instruction from the instruction storage unit and passes it to the vector execution unit. According to the instruction, the vector execution unit reads eight 16-bit data X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7) from the data storage unit and loads them into vector register VR0 of the vector register group, where VR0 is a 128-bit register. Another eight 16-bit data Y(0), Y(1), Y(2), Y(3), Y(4), Y(5), Y(6), Y(7) are then loaded into the 128-bit vector register VR1. These 16 data are the data source to be rearranged, and VR0 and VR1 are the source registers of the rearrangement instruction. The two-source-register rearrangement is then performed; at this point the contents of the vector rearrangement sequence register are as shown in the figure. The interchange is carried out as shown in Fig. 3, that is, the parts indicated by the arrows are exchanged: Y(0) with X(1), Y(2) with X(3), Y(4) with X(5), and Y(6) with X(7). After the interchange, the data in VR0 are X(0), Y(0), X(2), Y(2), X(4), Y(4), X(6), Y(6), and the data in VR1 are X(1), Y(1), X(3), Y(3), X(5), Y(5), X(7), Y(7), as shown in Fig. 4.
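The interchange pattern in this example can be written out as a short Python model. It is a sketch only: the function name interchange is hypothetical, the fixed pattern (odd lanes of the first register swapped with even lanes of the second) is inferred from the before and after contents given above, and the loop merely models what the device is described as doing in one instruction.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def interchange(vr0: List[T], vr1: List[T]) -> Tuple[List[T], List[T]]:
    """Swap lane 2k+1 of the first register with lane 2k of the second."""
    if len(vr0) != len(vr1) or len(vr0) % 2 != 0:
        raise ValueError("registers must have the same even number of lanes")
    a, b = list(vr0), list(vr1)  # copies, so the inputs stay unchanged
    for k in range(0, len(a), 2):
        a[k + 1], b[k] = b[k], a[k + 1]
    return a, b


# Lanes are labeled with strings so the movement of each element is visible.
vr0 = [f"X({i})" for i in range(8)]
vr1 = [f"Y({i})" for i in range(8)]
new_vr0, new_vr1 = interchange(vr0, vr1)
```

After the call, new_vr0 and new_vr1 hold the contents listed for VR0 and VR1 after the interchange.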
The advantage of the present invention is illustrated below with a practical example, the 8-point FFT. In the FFT computation, with parallel operation on the data in mind, the flow graph of the 8-point FFT is shown in Fig. 5; the eight loaded data points X(0), X(1), X(2), X(3), X(4), X(5), X(6), X(7) can be rearranged. The flow graph of the 8-point FFT shows that each butterfly computation consists of one complex multiplication and two complex additions. In the first-stage butterfly computation the data are divided into two groups and stored in two vector registers: X(0), X(2), X(1), X(3) are placed in vector register VR0, and X(4), X(6), X(5), X(7) in vector register VR1. As Fig. 6 shows, X(4), X(6), X(5), X(7) must be multiplied by the twiddle factor Wn0 and then added to and subtracted from X(0), X(2), X(1), X(3). Because Wn0 equals 1, the addition and subtraction of X(0), X(2), X(1), X(3) with X(4), X(6), X(5), X(7) can be done directly to obtain the intermediate result of the first-stage butterfly computation, namely VR0+VR1 and VR0-VR1. The result of VR0+VR1 is stored in VR2 and the result of VR0-VR1 in VR3; the contents of VR2 and VR3 are shown in Fig. 6. The values in VR2 correspond to X1(0), X1(2), X1(4), X1(6) after the first-stage butterfly computation, and the values in VR3 correspond to X1(1), X1(3), X1(5), X1(7). For the second-stage butterfly computation, the FFT flow graph shows that X1(2), X1(3), X1(6), X1(7) must be multiplied by the twiddle factors Wn0, Wn2, Wn0, Wn2 respectively. For parallel operation on a DSP, X1(2), X1(3), X1(6), X1(7) need to be placed in one vector operation unit, so that a single vector multiplication instruction completes X1(2)*Wn0, X1(3)*Wn2, X1(6)*Wn0, X1(7)*Wn2; the products are then added to and subtracted from X1(0), X1(1), X1(4), X1(5). Therefore X1(2), X1(3), X1(6), X1(7) are placed in one vector register and X1(0), X1(1), X1(4), X1(5) in another. This is achieved by an interchange rearrangement of the data in VR2 and VR3. The result is shown in Fig. 7.
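The first-stage butterfly and the interchange that prepares the second stage can be checked numerically with a small Python sketch. The helper names first_stage and interchange, the 4-lane complex vectors, and the sample input values are assumptions for illustration; the interchange pattern (lane 2k+1 of the first register swapped with lane 2k of the second) is the one inferred from the two-source example.

```python
from typing import List, Tuple

Vec = List[complex]


def interchange(vr_a: Vec, vr_b: Vec) -> Tuple[Vec, Vec]:
    """Swap lane 2k+1 of vr_a with lane 2k of vr_b (assumed fixed pattern)."""
    a, b = list(vr_a), list(vr_b)
    for k in range(0, len(a), 2):
        a[k + 1], b[k] = b[k], a[k + 1]
    return a, b


def first_stage(x: Vec) -> Tuple[Vec, Vec]:
    """First-stage butterflies with twiddle factor Wn0 = 1."""
    vr0 = [x[0], x[2], x[1], x[3]]
    vr1 = [x[4], x[6], x[5], x[7]]
    vr2 = [p + q for p, q in zip(vr0, vr1)]  # X1(0), X1(2), X1(4), X1(6)
    vr3 = [p - q for p, q in zip(vr0, vr1)]  # X1(1), X1(3), X1(5), X1(7)
    return vr2, vr3


# Sample input X(0)..X(7), chosen only to make the lanes easy to follow.
x = [complex(n, 0) for n in range(8)]
vr2, vr3 = first_stage(x)

# After the interchange:
#   vr2_swapped holds X1(0), X1(1), X1(4), X1(5)
#   vr3_swapped holds X1(2), X1(3), X1(6), X1(7), ready for the twiddle multiply
vr2_swapped, vr3_swapped = interchange(vr2, vr3)
```

With the operands grouped this way, one vector multiplication can apply Wn0, Wn2, Wn0, Wn2 to vr3_swapped, and the products can then be added to and subtracted from vr2_swapped lane by lane.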
As described above, the device completes the interchange rearrangement of part of the data in two source registers in one clock cycle. This removes the redundant instructions that some algorithms would otherwise require and improves the computing speed of the processor.
Besides the interchange rearrangement of data in two source registers, the data rearrangement device of the present invention also supports rearrangement of one source, two sources, three sources, or more. Its main function is to place the data of one, two, three, or more sources at arbitrary positions as required, with the result written to one destination register. The process is as follows:
First, in a VR_CFG register of the vector rearrangement sequence register group, if the most significant bit of the 16-bit datum at position I is 1, the datum at position I of the corresponding VR3 is set to 0.
Second, in a VR_CFG register of the vector rearrangement sequence register group, if the second most significant bit of the 16-bit datum at position I is 1, the datum at position I of the corresponding VR3 is set to 1.
Third, in a VR_CFG register of the vector rearrangement sequence register group, if the third most significant bit of the 16-bit datum at position I is 1, the datum at position I of the corresponding VR3 is set to -1.
Fourth, in a VR_CFG register of the vector rearrangement sequence register group, if the 16-bit datum at position I is less than 24 and its value is J, the rearrangement instruction copies the J-th 16-bit datum of the source registers to position I of the destination register.
Fifth, in a VR_CFG register of the vector rearrangement sequence register group, in all cases other than the above, the datum at position I of the corresponding VR3 remains unchanged.
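The five rules above can be sketched as a lane-by-lane decoder in Python. Several points are assumptions rather than statements from the source: the masks 0x8000, 0x4000 and 0x2000 are one reading of "most significant", "second most significant" and "third most significant" bit; the rules are tested in the order listed; and the names rearrange, ZERO_FLAG, ONE_FLAG and MINUS_ONE_FLAG as well as the sample data are hypothetical.

```python
from typing import List

ZERO_FLAG = 0x8000       # assumed mask: most significant bit
ONE_FLAG = 0x4000        # assumed mask: second most significant bit
MINUS_ONE_FLAG = 0x2000  # assumed mask: third most significant bit


def rearrange(sources: List[List[int]], cfg: List[int], dest: List[int]) -> List[int]:
    """Apply one VR_CFG register to the concatenated source registers."""
    flat = [value for reg in sources for value in reg]  # positions 0..23 for 3x8 lanes
    out = list(dest)
    for i, word in enumerate(cfg):
        if word & ZERO_FLAG:
            out[i] = 0
        elif word & ONE_FLAG:
            out[i] = 1
        elif word & MINUS_ONE_FLAG:
            out[i] = -1
        elif word < len(flat):
            out[i] = flat[word]  # copy source element J to position I
        # otherwise: position I of the destination keeps its previous value
    return out


# Illustrative data: three 8-lane source registers and a destination full of 9s.
vr0 = [100 + i for i in range(8)]
vr1 = [200 + i for i in range(8)]
vr2 = [300 + i for i in range(8)]
cfg = [0x0001, 0x000C, 0x000F, 0x0003, 0x8000, 0x0017, 0x4000, 0x2000]
vr3 = rearrange([vr0, vr1, vr2], cfg, [9] * 8)
kept = rearrange([vr0, vr1, vr2], [0x0020] * 8, [9] * 8)
```

The control word 0x0020 has no flag bit set and is not below 24, so every lane of kept retains its old value; the source registers are never modified.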
In this design, as shown in Fig. 8, all vector registers are 128 bits wide. Although the vector registers in this example are 128 bits, the invention is not limited to 128 bits; 256 bits, 512 bits, and so on are also possible. The 128 bits are divided into eight 16-bit data. VR_CFG is likewise divided into eight 16-bit data that specify the rearrangement pattern. The three source registers VR0, VR1, VR2 hold the data to be rearranged, and the one destination register VR3 stores the rearranged data.
VR0, VR1, VR2 together hold 24 16-bit data, whose positions are numbered 0-23 in decimal, or 0-17 in hexadecimal. From left to right, the positions in VR0 are 0-7, those in VR1 are 8-F, and those in VR2 are 10-17.
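The position numbering can be made concrete with a small helper that maps a position 0-23 to a source register and a lane within it. This is an illustrative sketch; the function name locate and the constants are hypothetical.

```python
from typing import Tuple

LANES_PER_REGISTER = 8                   # eight 16-bit data per 128-bit register
SOURCE_REGISTERS = ("VR0", "VR1", "VR2")


def locate(position: int) -> Tuple[str, int]:
    """Map a flat position (0..23) to (register name, lane counted from the left)."""
    if not 0 <= position < LANES_PER_REGISTER * len(SOURCE_REGISTERS):
        raise ValueError("position must be in the range 0..23")
    reg_index, lane = divmod(position, LANES_PER_REGISTER)
    return SOURCE_REGISTERS[reg_index], lane


first = locate(0x0)    # leftmost element of VR0
middle = locate(0xC)   # position C lies in VR1
last = locate(0x17)    # position 17 (hex) is the rightmost element of VR2
```

Position C, for example, resolves to lane 4 of VR1.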
The vector rearrangement sequence register VR_CFG is loaded; from left to right, its eight 16-bit data are 1, C, F, 3, F001, 17, 8005, 500F.
The vector rearrangement sequence register is then applied as follows:
The first 16-bit datum in VR_CFG is 1, so the datum X(1) at the corresponding position of VR0 is placed in the first 16-bit slot of the destination register. The second 16-bit datum in VR_CFG is C, so the datum Y(4) at the corresponding position of VR1 is placed in the second 16-bit slot of the destination register. The third and fourth data in VR_CFG are handled in the same way. The fifth 16-bit datum in VR_CFG is F001, whose most significant bit is 1, so the corresponding position of the destination register is set to 0. The seventh 16-bit datum in VR_CFG is 8001, whose second most significant bit is 1, so the corresponding position of the destination register is set to 1. The eighth 16-bit datum in VR_CFG is 500F, whose third most significant bit is 1, so the corresponding position of the destination register is set to -1.
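The first four lanes of this example, which are plain index selections, can be traced in Python as follows. The sketch assumes the first four control words are 1, C, F and 3, that the elements of VR0 and VR1 are named X(i) and Y(i) as in the text, and that the elements of VR2 are named Z(i), which is a hypothetical label. The flag-type control words are left out of the trace.

```python
# Label each lane of the three source registers so the selection is visible.
vr0 = [f"X({i})" for i in range(8)]   # positions 0-7
vr1 = [f"Y({i})" for i in range(8)]   # positions 8-F
vr2 = [f"Z({i})" for i in range(8)]   # positions 10-17; "Z" is a hypothetical name
flat = vr0 + vr1 + vr2

# First four control words of VR_CFG, read left to right (hexadecimal).
index_words = [0x1, 0xC, 0xF, 0x3]

# Lane I of the destination receives the source element at position index_words[I].
selected = [flat[word] for word in index_words]
```

The first two entries of selected match the elements named in the text, X(1) and Y(4); the third and fourth follow by the same rule.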

Claims (1)

1. A data rearrangement device which, as a component of a processor, performs data rearrangement operations in a vector data processing unit according to the operating flow of a VLIW processor, characterized in that the data rearrangement device comprises:
An instruction storage unit, for storing the instruction code to be executed;
A data storage unit, for storing the data required for executing the instruction code;
An instruction read and distribution unit, which reads instruction code from the instruction storage unit and sends it to all execution units;
A vector data processing unit, for executing the vector processing instructions in the instruction code;
A vector register group, comprising vector registers VR0, VR1, ..., VRn, for holding data from said data storage unit; the operation units read and write data in the appropriate registers of the group;
A vector rearrangement sequence register group, comprising vector rearrangement sequence registers VR_CFG0, VR_CFG1, VR_CFG2, ..., VR_CFGn, for configuring the units that participate in the vector rearrangement operation, selecting some or all of the vector operation units to participate.
CN2011201569187U 2011-02-28 2011-05-17 Data rearrangement device Expired - Fee Related CN202133997U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011201569187U CN202133997U (en) 2011-02-28 2011-05-17 Data rearrangement device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201120049997.1 2011-02-28
CN201120049997 2011-02-28
CN2011201569187U CN202133997U (en) 2011-02-28 2011-05-17 Data rearrangement device

Publications (1)

Publication Number Publication Date
CN202133997U true CN202133997U (en) 2012-02-01

Family

ID=45522760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011201569187U Expired - Fee Related CN202133997U (en) 2011-02-28 2011-05-17 Data rearrangement device

Country Status (1)

Country Link
CN (1) CN202133997U (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315716A (en) * 2016-04-26 2017-11-03 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing Outer Product of Vectors computing
CN107315563A (en) * 2016-04-26 2017-11-03 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing vectorial comparison operation
CN107315716B (en) * 2016-04-26 2020-08-07 中科寒武纪科技股份有限公司 Device and method for executing vector outer product operation
CN107315563B (en) * 2016-04-26 2020-08-07 中科寒武纪科技股份有限公司 Apparatus and method for performing vector compare operations
CN111352894A (en) * 2018-12-20 2020-06-30 深圳市中兴微电子技术有限公司 Single-instruction multi-core system, instruction processing method and storage medium

Similar Documents

Publication Publication Date Title
RU2427895C2 (en) Multiprocessor architecture optimised for flows
US9830156B2 (en) Temporal SIMT execution optimization through elimination of redundant operations
CN101320360B (en) Message queuing system for parallel integrated circuit and related operation method
US10007527B2 (en) Uniform load processing for parallel thread sub-sets
CN101014933B (en) Simulating multiported memories using lower port count memories
CN100480997C (en) System and method for selecting multiple threads for substantially concurrent processing
CN104050033A (en) System and method for hardware scheduling of indexed barriers
CN102163140A (en) Data rearrangement method and rearrangement device thereof
CN103197916A (en) Methods and apparatus for source operand collector caching
CN103262058A (en) Mechanism for conflict detection by using SIMD
CN104050032A (en) System and method for hardware scheduling of conditional barriers and impatient barriers
CN108351786A (en) Data are ranked up in instruction set architecture and merge ranked data
CN103226463A (en) Methods and apparatus for scheduling instructions using pre-decode data
CN109997109B (en) Stream engine with extraction advance hysteresis
CN103365631A (en) Dynamic bank mode addressing for memory access
TWI764997B (en) Graphics processing unit and method to process a graphics application
CN103870309A (en) Register allocation for clustered multi-level register files
JP6551751B2 (en) Multiprocessor device
CN109992559A (en) Remote atomic operation in more socket systems
CN202133997U (en) Data rearrangement device
CN100489830C (en) 64 bit stream processor chip system structure oriented to scientific computing
JP2017045151A (en) Arithmetic processing device and control method of arithmetic processing device
CN111459543B (en) Method for managing register file unit
CN103294449A (en) Pre-scheduled replays of divergent operations
TW202109286A (en) System and architecture of pure functional neural network accelerator

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120201

Termination date: 20200517