CN100517212C - Synchronous periodical orthogonal data converter - Google Patents

Publication number
CN100517212C
Authority
CN
China
Prior art keywords
vector
group
components
component
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2004100786966A
Other languages
Chinese (zh)
Other versions
CN1591316A (en)
Inventor
博里斯·普罗科潘科
蒂莫尔·帕尔塔切夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/666,083 (US7284113B2)
Application filed by Via Technologies Inc
Publication of CN1591316A
Application granted
Publication of CN100517212C

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

An orthogonal data converter for converting the components of a sequential vector component flow to a parallel vector component flow. The data converter has an input rotator configured to rotate corresponding vector components of the sequential vector component flow by a prescribed amount, and a bank of register files configured to store the rotated vector components. The converter also has an output rotator configured to rotate the position of the vector components read from the bank of register files by a prescribed amount. A controller of the converter is operative to control the addressing of the bank of register files and the rotating of the vector components.

Description

Synchronous periodical orthogonal data converter and data transfer device
Technical field
The present invention relates to the conversion of data formats, and in particular to a system and method for reorganizing data during vector processing so as to convert a sequential (vertical) vector component flow into a parallel (full vector, or horizontal) vector component flow.
Background
Graphics data can be represented in a vector format whose components are either geometry components (X, Y, Z and W) or pixel value components (R, G, B and A). A geometry engine processes the components of such vectors. Fig. 1 is a block diagram of a typical geometry engine processing graphics vectors. A graphics vector 10 is input to an input buffer 12, which stores it in a conventional memory arrangement. The graphics vector has components Xi, Yi, Zi and Wi. The input buffer 12 outputs the graphics vector to a vector arithmetic logic unit (ALU) 14, which performs operations on the graphics vector 10. The vector ALU 14 outputs a processed graphics vector 18 in the same vector format as the input graphics vector 10; specifically, the processed graphics vector 18 has components Xout, Yout, Zout and Wout. The vector ALU 14 processes vector components as a time-parallel (full vector, or horizontal) vector component flow: the components X, Y, Z and W of a vector are processed by the vector ALU 14 at the same time, so that each output vector Xout, Yout, Zout, Wout has the same format as the input graphics vector 10.
Currently, scalar graphics processors are used to process graphics vectors as a vertical vector component flow. Fig. 2 shows a single instruction, multiple data (SIMD) processing unit that uses scalar ALUs to process graphics vectors. The graphics vectors 10 are input to an input buffer 20, which is a four-bank orthogonal access memory of a kind known in the art. The input buffer 20 rearranges the graphics vectors 10 so that like components are grouped together. Specifically, the output of the input buffer 20 is a vector of like components in a vertical vector format. As shown in Fig. 2, the input buffer 20 outputs a component vector 22 whose components are all of the same kind; for example, a component vector 22 contains only X components or only Y components.
The input buffer 20 outputs the component vectors 22 as a time-sequential (vertical) vector component flow to a scalar processor 24, which operates on each component of a component vector 22 independently. The scalar processor 24 comprises four scalar ALUs 26a-26d; its operation is described in detail in U.S. patent application No. 10/354,795.
The scalar processor 24 outputs scalar result vectors 30 containing the processed vector components. Because the scalar processor 24 operates on a time-sequential (vertical) vector component flow, the scalar result vectors 30 are likewise in vertical (time-sequential) format. The scalar result vectors 30 therefore do not have the same vector format as the graphics vectors 10 and must be converted to the time-parallel (full vector, or horizontal) format.
Summary of the invention
An object of the present invention is to provide an output orthogonal converter that rearranges the components output by a scalar processor into a specified format. The output orthogonal converter thus converts scalar result vectors into a parallel vector component flow.
A further object of the invention is to provide an output orthogonal converter that rearranges vector components into a parallel vector component flow after they have been processed by a scalar processor. The invention also provides a method for synchronously converting a vertical vector component flow into parallel vectors.
In accordance with these objects, the invention provides an orthogonal data converter for converting the components of a sequential vector component flow into the components of a parallel vector component flow. The data converter comprises: an input rotator that rotates each set of corresponding components of the vectors by an amount that varies with the time slot of the set being rotated; a plurality of buffers coupled to the input rotator to receive the rotated sets of corresponding components, one buffer storing each rotated set; an output rotator coupled to the buffers to receive vector components stored in the buffers and rotate them by an amount that varies with the time slot of those vector components; and a controller that controls the addressing of the buffers and the rotation of the vector components as the corresponding components of each vector are stored in the buffers. The controller writes the vector components to the buffers in a predetermined order and, at the same time, reads vector components from the buffers in a particular order so as to produce the parallel vector component flow.
In a preferred embodiment of the invention, the buffers comprise a plurality of component buffers that store the vector components. Where each vector has x components, the buffers are organized as x banks of component buffers, and in general each bank has x component buffers. The buffers can be written and read in the same clock cycle. The controller alternates between horizontal write-and-read operations and vertical write-and-read operations. The output rotator rotates the vector components to a position opposite to that produced by the input rotator.
The invention further provides a method for converting a plurality of vectors from a time-sequential format to a time-parallel format. In the time-sequential format the vectors comprise sets of corresponding components, each set occupying one time slot; in the time-parallel format each vector occupies one time slot. First, each set of corresponding components is rotated by an amount that varies with the time slot of the set, and each rotated set is written to a buffer of a plurality of buffers. Next, the buffers are read to obtain the stored vector components, which are rotated by an amount that varies with the time slot of the vector. Reads from and writes to the buffers are performed in the same cycle. In one embodiment, the buffers are read and written horizontally for n clock cycles and then read and written vertically for the next n clock cycles; the method thus alternates between horizontal and vertical write-and-read operations every n clock cycles.
Brief description of the drawings
Fig. 1 is a block diagram of a typical geometry engine processing graphics vectors.
Fig. 2 shows a single instruction, multiple data processing unit that uses scalar ALUs to process graphics vectors.
Fig. 3 is a block diagram of an orthogonal converter according to the invention.
Fig. 4 is a block diagram of the input rotator 34.
Fig. 5 is a block diagram of the output rotator 38.
Fig. 6 is a block diagram of the controller 36, which generates the rotator control bits A0 and A1 and the address bits AB0-AB3.
Fig. 7 shows a data converter for multi-component vectors.
Fig. 8 is a timing diagram of a four-component orthogonal conversion using the orthogonal converter 32.
Reference numerals
10: graphics vector; 12: input buffer; 14: vector arithmetic logic unit; 18: processed graphics vector; 20: input buffer; 22: component vector; 24: scalar processor; 26a-26d: scalar ALUs; 30: scalar result vector; 32: output orthogonal converter; 34: input rotator; 36: controller; 38: output rotator; 40a-40d: buffer banks; 44a-44d, 48a-48d: first-stage multiplexers; 46a-46d, 50a-50d: second-stage multiplexers.
Detailed description
Reference is now made to the drawings, which illustrate preferred embodiments of the invention and do not limit its scope. Fig. 3 is a block diagram of an orthogonal converter according to the invention. The scalar result vectors 30 produced by the scalar processor 24 of Fig. 2 are supplied to an input rotator 34. As noted above, the scalar result vectors 30 form a time-sequential vector component flow in which corresponding components occupy the same time slot. For example, scalar result vector 30a contains the X components X0-X3 and scalar result vector 30b contains the Y components Y0-Y3. Although the vectors in this embodiment have four components (X, Y, Z and W), like the graphics vectors of the prior art, the invention is not limited to this and also covers vectors with more or fewer components.
The input rotator 34 rotates the components of each scalar result vector 30 by a predetermined number of positions. That number is set by the controller 36, which sends an input rotation control signal to the input rotator 34. After rotation, the components are written into the component buffers of buffer banks B0-B3. Each buffer bank Bx has component buffers Bx.0 to Bx.3 that store components of the scalar result vectors 30. The controller 36 sends address signals AB0-AB3 to the buffer banks B0-B3 to select the component buffers B0.0-B3.3 that are to be read or written; through the address lines AB0-AB3 it controls both the writing of vector components into the banks and the reading of vector components out of them. A component buffer B0.0-B3.3 can be read and written in the same clock cycle.
The vector components supplied by the component buffers B0.0-B3.3 are received by the output rotator 38, which rotates them by a predetermined number of positions. The buffer banks B0-B3 are read in such a way that the components emerge in full vector format, so the processed vectors 18 output by the output rotator 38 are in time-parallel format. For example, the output rotator 38 outputs a first processed vector 18a having components X1, Y1, Z1 and W1. The processed vectors 18 output by the output rotator 38 thus form a time-parallel vector component flow.
Fig. 4 is a block diagram of the input rotator 34. The input rotator 34 comprises first-stage multiplexers 44a-44d connected to second-stage multiplexers 46a-46d. The components of the scalar result vectors 30 are applied to the inputs of the first-stage multiplexers 44a-44d, so that inputs a, b, c and d receive the vector components X0, X1, X2, X3; then Y0, Y1, Y2, Y3; then Z0, Z1, Z2, Z3; and so on. The second-stage multiplexers 46a-46d output the rotated components to the component buffers B0.0-B3.3: output A of multiplexer 46a is connected to buffer bank B0, output B of multiplexer 46b to bank B1, output C of multiplexer 46c to bank B2, and output D of multiplexer 46d to bank B3. The address lines AB0-AB3 from the controller 36 select which component buffer B0.0-B3.3 in each bank B0-B3 is written. The rotator control bits A0 and A1 control the multiplexers 44a-44d and 46a-46d so that the vector components are steered to the proper outputs, that is, rotated by the proper amount. The first-stage multiplexers 44a-44d are controlled by rotation control bit A1 and the second-stage multiplexers 46a-46d by rotation control bit A0. In this way any vector component can be routed to any of the second-stage multiplexers 46a-46d.
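The two-stage arrangement can be sketched in Python as follows. This is only an illustrative model: the function name rotate_two_stage is hypothetical, and the binary weighting (the A1 stage shifting by two positions, the A0 stage by one) is an assumption that is consistent with the rotations described for Fig. 8 but is not stated explicitly in the text.

```python
def rotate_two_stage(inputs, a1, a0):
    """Rotate four components to the right by 2*a1 + a0 positions.

    Assumed weighting: the first stage (44a-44d, bit A1) either passes the
    data through or shifts it by two; the second stage (46a-46d, bit A0)
    either passes it through or shifts it by one.
    """
    n = len(inputs)
    # First stage: each output picks the input two places to its left if A1 = 1.
    stage1 = [inputs[(i - 2 * a1) % n] for i in range(n)]
    # Second stage: each output picks the value one place to its left if A0 = 1.
    return [stage1[(i - a0) % n] for i in range(n)]


if __name__ == "__main__":
    print(rotate_two_stage(["Y0", "Y1", "Y2", "Y3"], a1=0, a0=1))
```

With both bits clear the data passes through unchanged, which corresponds to the unrotated X components of the first time slot.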
Fig. 5 is a block diagram of the output rotator 38. The output rotator 38 is similar to the input rotator 34 and uses the same rotator control bits A0 and A1 to control the rotation of the vector components. It has first-stage multiplexers 48a-48d whose inputs are connected to the buffer banks B0-B3: input a of multiplexer 48a is connected to bank B0, input b of multiplexer 48b to bank B1, input c of multiplexer 48c to bank B2, and input d of multiplexer 48d to bank B3. The address lines AB0-AB3 from the controller 36 select which component buffer B0.0-B3.3 in each bank B0-B3 is accessed. Rotator control bit A1 selects which input of each first-stage multiplexer 48a-48d is passed to its output, and each first-stage output feeds a corresponding input of the second-stage multiplexers 50a-50d. Rotator control bit A0 selects which input of each second-stage multiplexer 50a-50d is passed to its output. By choosing the appropriate combination of A0 and A1, the first-stage multiplexers 48a-48d and second-stage multiplexers 50a-50d rotate the buffered vector components, and the second-stage multiplexers 50a-50d produce the parallel (full vector, or horizontal) vector component flow.
Fig. 6 is a block diagram of the controller 36, which generates the rotator control bits A0 and A1 and the address bits AB0-AB3. The controller 36 has an up counter 52 and a down counter 53. The up counter 52 increments on each instruction cycle and the down counter 53 decrements on each instruction cycle. The up counter 52 has three outputs, 0, 1 and 2. Outputs 0 and 1 of the up counter 52 are the rotator control bits A0 and A1. Output 2 of the up counter 52 supplies a select signal H/L to four multiplexers 61, 60, 62 and 64. The down counter 53 has two outputs, 0 and 1, which are fed to adders 54, 56 and 58 and to multiplexer 61. The adders 54, 56 and 58 add the constants 1, 2 and 3, respectively, to the count of the down counter 53 and supply the sums to inputs of multiplexers 60, 62 and 64, respectively. The outputs of the multiplexers provide the addresses for buffer banks B0-B3. During instruction cycles 1-4, the select signal H/L selects the up-counter inputs of multiplexers 61, 60, 62 and 64, producing the addresses shown in Fig. 8 for those cycles. During instruction cycles 5-8, the select signal H/L selects the down-counter input of multiplexer 61 and the adder outputs supplied to multiplexers 60, 62 and 64, producing the addresses shown in Fig. 8 for those cycles. During cycles 9-12, the select signal H/L again selects the up-counter outputs for the address lines AB0-AB3.
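The behaviour of the two counters can be modelled with a short Python sketch. It assumes that both counters start at zero in instruction cycle 1 and that the adders wrap modulo 4 (two-bit addresses); neither detail is stated in the text, and the function name controller_outputs is hypothetical.

```python
def controller_outputs(cycle):
    """Return (A1, A0, H/L select, [AB0, AB1, AB2, AB3]) for a 1-based cycle.

    Assumptions: both counters are zero in cycle 1; adder results wrap
    modulo 4 because each bank has four component buffers.
    """
    t = cycle - 1
    up = t % 8          # three-bit up counter 52
    down = (-t) % 4     # two-bit down counter 53
    a0 = up & 1         # output 0 of the up counter
    a1 = (up >> 1) & 1  # output 1 of the up counter
    select = (up >> 2) & 1  # output 2 of the up counter: 0 = horizontal, 1 = vertical
    if select:
        # Bank 0 takes the down count; banks 1-3 take down count + 1, 2, 3.
        addresses = [(down + k) % 4 for k in range(4)]
    else:
        # Every bank takes the low two bits of the up count.
        addresses = [up & 3] * 4
    return a1, a0, select, addresses


if __name__ == "__main__":
    for cycle in range(1, 13):
        print(cycle, controller_outputs(cycle))
```

Under these assumptions the model reproduces the buffer selections described for cycles 5 to 8: for example, cycle 6 addresses B0.3, B1.0, B2.1 and B3.2.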
In each cycle the controller 36 generates the rotation control bits A0 and A1 and the addresses on lines AB0-AB3 so that vector components are written to and read from the component buffers in the proper order. To rearrange the component data into the proper format, components are first written into the component buffers B0.0-B3.3 "horizontally". Then, while new components are being written into the component buffers B0.0-B3.3, the previously written components are read out "vertically". Next, while further new data is written, the components are read out of the buffers B0.0-B3.3 "horizontally". This alternation repeats continuously to convert the components.
Fig. 8 is a timing diagram of a four-component orthogonal conversion using the orthogonal converter 32. The timing diagram shows how the component buffers B0.0-B3.3 are addressed by the address lines AB0-AB3. It also shows the vector components input to and output from the buffer banks B0-B3, together with the rotation amounts applied to the input and output vectors.
During the first through fourth cycles (cycles 1-4), vector components are written into the component buffers B0.0-B3.3 "horizontally". Specifically, during instruction cycle 1 the corresponding components X0, X1, X2 and X3 of the first time slot are written into component buffers B0.0, B1.0, B2.0 and B3.0, respectively. During instruction cycle 2, the corresponding components Y0, Y1, Y2 and Y3 of the second time slot are rotated clockwise by one position by the input rotator 34 (giving Y3, Y0, Y1, Y2) and written into component buffers B0.1, B1.1, B2.1 and B3.1, respectively. Similarly, during instruction cycle 3 the components Z0, Z1, Z2 and Z3 of the third time slot are rotated clockwise by two positions (Z2, Z3, Z0, Z1) and written into B0.2, B1.2, B2.2 and B3.2, respectively. During instruction cycle 4, the components W0, W1, W2 and W3 of the fourth time slot are rotated clockwise by three positions (W1, W2, W3, W0) and written into B0.3, B1.3, B2.3 and B3.3, respectively. During cycles 1-4 the component buffers B0.0-B3.3 are only written, not read. The controller 36 generates the rotation control bits A0 and A1 and the addresses AB0-AB3 needed to write the vector components in the proper order.
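The contents of the buffers after these four cycles can be reproduced with a few lines of Python. The names fill_banks and rotate_right are hypothetical; banks[x][y] stands for component buffer Bx.y.

```python
def rotate_right(values, k):
    """Rotate a list clockwise (to the right) by k positions."""
    k %= len(values)
    return values[-k:] + values[:-k] if k else list(values)


def fill_banks():
    """Model cycles 1-4: token j is rotated by j and written at address j."""
    banks = [[None] * 4 for _ in range(4)]  # banks[x][y] models buffer Bx.y
    for j, name in enumerate("XYZW"):
        token = [f"{name}{k}" for k in range(4)]  # e.g. Y0, Y1, Y2, Y3
        rotated = rotate_right(token, j)
        for x in range(4):
            banks[x][j] = rotated[x]
    return banks


if __name__ == "__main__":
    for x, bank in enumerate(fill_banks()):
        print(f"B{x}:", bank)
```

After the fill, the components of vector 0 lie on the diagonal B0.0, B1.1, B2.2, B3.3, each in a different bank, which is what allows a whole vector to be read in a single cycle.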
Instruction cycles 5 through 8 involve both reading vector components from the component buffers B0.0-B3.3 and writing vector components into them. During instruction cycle 5, the address lines AB0-AB3 address the component buffers indicated in Fig. 8. Once a component buffer is addressed, the component previously written to it is read out, and as that component is read out a new vector component is written in its place. Thus, during instruction cycle 5, vector components X0, Y0, Z0 and W0 are read from buffers B0.0, B1.1, B2.2 and B3.3, respectively, and vector components X4, X5, X6 and X7 are written into B0.0, B1.1, B2.2 and B3.3, respectively. As shown in Fig. 8, no rotation of the input or output vector is needed during instruction cycle 5. In effect the buffer layout undergoes a "45-degree clockwise rotation": the diagonal buffers B0.0, B1.1, B2.2 and B3.3 that have just been read become the new first horizontal set of buffers for writing. In cycle 6, buffers B1.0, B2.1, B3.2 and B0.3 become the new second horizontal set for writing, because they are the buffers read out in cycle 6. In cycle 7, buffers B2.0, B3.1, B0.2 and B1.3 become the new third horizontal set for writing, because they are the buffers read out in cycle 7. Finally, in cycle 8, buffers B3.0, B0.1, B1.2 and B2.3 become the new fourth horizontal set for writing, because they are the buffers read out in cycle 8. During instruction cycles 5-8, the component buffers are thus read and written "vertically" according to the addressing shown in Fig. 8.
From the ninth cycle onward, the component buffers B0.0-B3.3 are again read and written horizontally. During instruction cycles 9-12, the addressing of the component buffers B0.0-B3.3 and the rotation of the input and output vectors are the same as in instruction cycles 1-4. Thus, during instruction cycle 9 the output vector is X4, Y4, Z4, W4, whose components were written during instruction cycles 5-8. Also during instruction cycle 9, the X components X8, X9, X10 and X11 are written into their respective component buffers. During instruction cycles 9-12, then, vector components are written and read "horizontally".
To keep converting vector components into a parallel vector component flow, the process alternates between writing and reading vector components "vertically" and writing and reading them "horizontally". After instruction cycle 12, the addressing and rotation of instruction cycle 5 occur again, and so on. The addressing and rotation pattern of instruction cycles 5-8 recurs whenever components are written and read "vertically", and the pattern of instruction cycles 9-12 recurs whenever they are written and read "horizontally". The process continues until all vector components have been converted.
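The complete alternation can be checked with a small cycle-by-cycle simulation in Python. It is a behavioural sketch only, not a description of the hardware: the function names are hypothetical, each buffer is read before it is overwritten within a cycle, and the rotation directions (input rotated right, output rotated left by the same amount) are taken from the four-component walkthrough above.

```python
N = 4


def rotate_right(values, k):
    k %= len(values)
    return values[-k:] + values[:-k] if k else list(values)


def rotate_left(values, k):
    k %= len(values)
    return values[k:] + values[:k]


def convert(tokens):
    """Feed one sequential token per cycle; return one output row per cycle.

    The first N output rows contain None because nothing has been stored yet.
    """
    banks = [[None] * N for _ in range(N)]  # banks[x][y] models buffer Bx.y
    outputs = []
    for t, token in enumerate(tokens):
        j = t % N                      # rotation amount (A1, A0)
        vertical = (t // N) % 2 == 1   # H/L select: alternate every N cycles
        addr = [(x - j) % N if vertical else j for x in range(N)]
        # Read first, then overwrite the same buffers in the same cycle.
        read = [banks[x][addr[x]] for x in range(N)]
        outputs.append(rotate_left(read, j))
        rotated = rotate_right(token, j)
        for x in range(N):
            banks[x][addr[x]] = rotated[x]
    return outputs


def make_tokens(blocks):
    """Build X0-X3, Y0-Y3, Z0-Z3, W0-W3, X4-X7, ... as lists of labels."""
    return [[f"{c}{N * b + k}" for k in range(N)]
            for b in range(blocks) for c in "XYZW"]


if __name__ == "__main__":
    for cycle, row in enumerate(convert(make_tokens(3)), start=1):
        print(cycle, row)
```

In this model the output of cycle 5 is X0, Y0, Z0, W0 and the output of cycle 9 is X4, Y4, Z4, W4, matching the description of Fig. 8.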
The description above concerns four-component vectors. The method of the invention is, however, applicable to vectors with any number of components, as shown in Fig. 7. The data stream can be expressed as follows:
$$X_i = \{X_{i,0}, X_{i,1}, \ldots, X_{i,n-1}\} \tag{1}$$
where n is the width of each token in the data stream and i is the index of the token in the data stream.
Starting at some token i, the output $Y_{i+j}$ is then produced:
$$Y_{i+j} = \{Y_{i+j,0}, Y_{i+j,1}, \ldots, Y_{i+j,n-1}\} = \{X_{i,j}, X_{i+1,j}, \ldots, X_{i+n-1,j}\} \tag{2}$$
where j < n.
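Equation (2) amounts to transposing each block of n consecutive tokens. As a plain reference model of the expected result (not of the hardware), it can be written in Python as below; the function name orthogonal_reference is hypothetical.

```python
def orthogonal_reference(tokens, i, n):
    """Return Y[i+j] for j = 0..n-1 as defined by equation (2).

    tokens[i][k] is element k of input token X_i; output j collects
    element j from each of the n tokens X_i .. X_{i+n-1}.
    """
    return [[tokens[i + k][j] for k in range(n)] for j in range(n)]


if __name__ == "__main__":
    stream = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(orthogonal_reference(stream, 0, 3))
```

A hardware model of the converter can be compared against this function block by block.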
To realize the output $Y_{i+j}$, an orthogonal memory structure with n banks, each of height n, is used, as shown in Fig. 7. The input data X is indexed by component, as is the output data Y. The memory input data I is indexed by bank number, and the memory output data O is indexed by component. The read and write addresses, R and W respectively, are likewise indexed by component.
In the first phase, at each clock j data is written into each memory bank i as follows:
$$W_{i,j} = j \tag{3}$$
$$I_{i,j} = X_{((i+j) \bmod n),\,j} \tag{4}$$
where "mod" denotes the remainder after division. The term ((i+j) mod n) rotates the values clockwise whenever j > 0, by an amount that depends on the value of i: if i = 0, no rotation is performed; if i = 1, the values are rotated clockwise once; if i = 2, twice; and if i = 3, three times.
At the same time, the previously written data is read out from the same locations:
$$R_{i,j} = j \tag{5}$$
$$Y_{i,j} = O_{((i+j) \bmod n),\,j} \tag{6}$$
where the term ((i+j) mod n) again rotates the values clockwise whenever j > 0, by an amount that depends on the value of i.
This is the "horizontal" read-write phase. After n clocks, all the previous data has been read out and new data has been written. The process then switches from the "horizontal" read-write phase to the "vertical" read-write phase: data that was written "horizontally" is read out "vertically".
At each clock j, data is read from each bank i of the n banks as follows:
$$R_{i,j} = (i+j) \bmod n \tag{7}$$
$$Y_{i,j} = O_{((i+j) \bmod n),\,j} \tag{8}$$
At the same time, new data is written "vertically" into the same locations, so that it can be read "horizontally" in the next phase:
$$W_{i,j} = (i+j) \bmod n \tag{9}$$
$$I_{i,j} = X_{((i+j) \bmod n),\,j} \tag{10}$$
The process keeps alternating between "horizontal" and "vertical" for each block of n data columns. The number of idle cycles in the output stream equals the number of idle cycles in the input stream, and the total latency is n cycles.
As the illustration of Fig. 7 shows, when data is rearranged according to the method of the invention, reading or writing the elements j/* accesses the same address in every bank, whereas reading or writing the elements */i accesses the data "diagonally", so that the address increases or decreases from one bank to the next. Fig. 7 also shows that data is rotated according to the access address during the write phase and rotated in the reverse direction during the read phase.
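The reason the rotation is needed can be illustrated with a small Python check: with the skewed placement, both a same-address access and a diagonal access touch every bank exactly once, so n components can always be read or written in one cycle. The placement rule used here (component j of vector k in bank (k + j) mod n at address j) is taken from the four-component walkthrough; its sign convention is an assumption and may differ from the indexing of equations (3) to (10). The function names are hypothetical.

```python
def bank_and_address(component, vector, n):
    """Skewed placement: component j of vector k sits in bank (k + j) mod n
    at address j (convention assumed from the four-component example)."""
    return (vector + component) % n, component


def conflict_free(n):
    """Check that every access pattern uses each of the n banks exactly once."""
    for fixed in range(n):
        # Same-address access: one component of all n vectors.
        row_banks = {bank_and_address(fixed, k, n)[0] for k in range(n)}
        # Diagonal access: all n components of one vector.
        diag_banks = {bank_and_address(j, fixed, n)[0] for j in range(n)}
        if len(row_banks) != n or len(diag_banks) != n:
            return False
    return True


if __name__ == "__main__":
    print(all(conflict_free(n) for n in range(1, 9)))
```

Without the skew, all components of one vector would fall in the same bank and could not be read together in one cycle.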
Although the invention has been disclosed above by way of a preferred embodiment, that embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.

Claims (24)

1. A data converter for converting a group of vectors from a time-sequential format to a time-parallel format, wherein in the time-sequential format the vectors comprise a plurality of sets of corresponding components, each set occupying one time slot, and in the time-parallel format each vector occupies one time slot, the data converter comprising:
an input rotator that rotates each set of corresponding components of the vectors by a first amount, the first amount varying with the time slot of the set being rotated;
a bank of buffers coupled to the input rotator to receive the rotated sets of corresponding components, with one buffer storing each rotated set;
an output rotator coupled to the buffers to receive the corresponding components of a vector and rotate them by a second amount, the second amount corresponding to the time slot of that vector; and
a controller that controls the addressing of the bank of buffers as the sets of corresponding components of each vector are stored in the buffers, and that gathers the corresponding components of each vector for the subsequent output rotation.
2. The data converter as claimed in claim 1, wherein each vector has n groups of related components, indexed 0 to n-1, so that there are groups of related components 0 through n-1; and the input rotator rotates the groups of related components of the vectors by the first amount, the first amount being equal to the index of the group of related components.
3. The data converter as claimed in claim 1, wherein the number of vectors is n, the vectors being indexed 0 to n-1; and the output rotator rotates the groups of related components of the vectors by the second amount, the second amount being equal to the index of the vector.
4. The data converter as claimed in claim 1, wherein the bank of buffers includes a buffer for storing the groups of related components.
5. The data converter as claimed in claim 4, wherein each vector has n components, and each buffer has n component buffers.
6. The data converter as claimed in claim 5, wherein the bank of buffers has n buffers.
7. The data converter as claimed in claim 1, wherein the bank of buffers writes and reads the groups of related components in the same clock cycle.
8. The data converter as claimed in claim 1, wherein the controller performs, in the bank of buffers, horizontal write and read operations alternating with vertical write and read operations.
9. The data converter as claimed in claim 8, wherein the vector has n components, and the controller horizontally writes n groups of related components and horizontally reads n vectors.
10. The data converter as claimed in claim 9, wherein, after the controller horizontally writes n groups of related components and horizontally reads n vectors, the controller vertically writes n groups of related components and vertically reads n vectors.
11. The data converter as claimed in claim 1, wherein the output rotator rotates the vector components to a position that is opposite relative to that of the input rotator.
12. A method for converting a group of vectors from a time-serial format to a time-parallel format, wherein, in the time-serial format, the vectors comprise a plurality of groups of related components, each group of related components occupying the same time slot, and, in the time-parallel format, each vector occupies one time slot, the method comprising:
For each group of related components, rotating the group of related components by a first amount, the first amount corresponding to the time slot of the group of related components, and writing each rotated group of related components to an available component buffer in a bank of buffers; and
For each vector in the group, reading selected buffers in the bank of buffers to collect the components of the vector, and rotating the collected components by a second amount, the second amount corresponding to the time slot of the vector components.
13. The method as claimed in claim 12, wherein, if the components are written horizontally into the bank of buffers, the components are read horizontally from the bank of buffers.
14. The method as claimed in claim 12, wherein, if the components are written vertically into the bank of buffers, the components are read vertically from the bank of buffers.
15. The method as claimed in claim 12, wherein, when one of the groups of related components is written, the components of a vector are read out in the same clock cycle.
16. The method as claimed in claim 12, wherein each vector has n components, n groups of related components are written horizontally in n clock cycles, and the vectors are read horizontally in the same n clock cycles.
17. The method as claimed in claim 16, wherein, in another n clock cycles following the n clock cycles, n groups of related components are written vertically and a plurality of vectors are read vertically.
18. A data converter for converting a group of vectors from a time-serial format to a time-parallel format, wherein, in the time-serial format, the vectors comprise a plurality of groups of related components, each group of related components occupying the same time slot, and, in the time-parallel format, each vector occupies one time slot, the data converter comprising:
Input rotating means for rotating each group of related components of the vectors by a first predetermined amount, the first predetermined amount corresponding to the time slot of a particular group of related components;
Storage means, coupled to the input rotating means, for storing a rotated group of related components;
Output rotating means, coupled to the storage means, for receiving the components of a vector from the storage means and rotating the components by a second predetermined amount, the second predetermined amount corresponding to the time slot of the particular group of related components; and
Control means, coupled to the input rotating means, the storage means, and the output rotating means, for controlling the operation of the input rotating means, the storage means, and the output rotating means.
19. The data converter as claimed in claim 18, wherein:
The input rotating means is an input rotator that rotates each group of related components of all the vectors by the first predetermined amount, the first predetermined amount corresponding to the time slot of a group of related components;
The storage means comprises a bank of buffers, with one buffer for storing each rotated group of related components; and
The output rotating means is an output rotator that receives and rotates the components of a vector by the second predetermined amount, the second predetermined amount corresponding to the time slot of the vector.
20. The data converter as claimed in claim 19, wherein the storage means writes and reads the vector components in the same clock cycle.
21. The data converter as claimed in claim 20, wherein, during a predetermined number of clock cycles, the storage means horizontally writes the related components and then also horizontally reads a plurality of vectors.
22. The data converter as claimed in claim 21, wherein, during another predetermined number of clock cycles, the storage means vertically writes the related components and then also vertically reads a plurality of vectors.
23. The data converter as claimed in claim 18, wherein the control means controls the writing of the vector components to, and their reading from, the storage means, and controls the input rotating means and the output rotating means to rotate the vector components.
24. The data converter as claimed in claim 18, wherein the output rotating means rotates the vector components in a direction opposite to the direction in which the input rotating means rotates a group of related components.
CNB2004100786966A 2003-09-19 2004-09-17 Synchronous periodical orthogonal data converter Active CN100517212C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/666,083 2003-09-19
US10/666,083 US7284113B2 (en) 2003-01-29 2003-09-19 Synchronous periodical orthogonal data converter

Publications (2)

Publication Number Publication Date
CN1591316A CN1591316A (en) 2005-03-09
CN100517212C true CN100517212C (en) 2009-07-22

Family

ID=34619749

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100786966A Active CN100517212C (en) 2003-09-19 2004-09-17 Synchronous periodical orthogonal data converter

Country Status (2)

Country Link
CN (1) CN100517212C (en)
TW (1) TWI263934B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8659611B2 (en) * 2010-03-17 2014-02-25 Qualcomm Mems Technologies, Inc. System and method for frame buffer storage and retrieval in alternating orientations
CN106775592B (en) * 2011-12-23 2019-03-12 英特尔公司 Processor, the method for computing system, machine readable media and computer system

Also Published As

Publication number Publication date
TW200512644A (en) 2005-04-01
CN1591316A (en) 2005-03-09
TWI263934B (en) 2006-10-11

Similar Documents

Publication Publication Date Title
US5812147A (en) Instruction methods for performing data formatting while moving data between memory and a vector register file
EP0053457B1 (en) Data processing apparatus
US5421019A (en) Parallel data processor
US6002880A (en) VLIW processor with less instruction issue slots than functional units
JP2021508125A (en) Matrix multiplier
CN104838357B (en) Vectorization method, system and processor
US7761694B2 (en) Execution unit for performing shuffle and other operations
US3270324A (en) Means of address distribution
TWI731904B (en) Systems, apparatuses, and methods for lane-based strided gather
US20040215927A1 (en) Method for manipulating data in a group of processing elements
CN100517212C (en) Synchronous periodical orthogonal data converter
EP1314099B1 (en) Method and apparatus for connecting a massively parallel processor array to a memory array in a bit serial manner
TW201545057A (en) Apparatus and method for selecting elements of a vector computation
CN116050492A (en) Expansion unit
US7263543B2 (en) Method for manipulating data in a group of processing elements to transpose the data using a memory stack
AU604358B2 (en) Prefetching queue control system
CN111291320A (en) Double-precision floating-point complex matrix operation optimization method based on HXDDSP chip
JP2547219B2 (en) Vector data access control apparatus and method
JPS6122830B2 (en)
US7930518B2 (en) Method for manipulating data in a group of processing elements to perform a reflection of the data
JPS6083153A (en) Data memory
GB2393277A (en) Generating the reflection of data in a plurality of processing elements
US5027300A (en) Two level multiplexer circuit shifter apparatus
CN103544131A (en) Vector processing architecture capable of dynamic configuration
SU1647594A1 (en) Programmable controller

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant