CN1294345A - Shift rotation type hardware controller for supporting polycyclic software flow - Google Patents

Info

Publication number
CN1294345A
Authority
CN
China
Prior art keywords
register
generation module
core
select
validity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 00133535
Other languages
Chinese (zh)
Other versions
CN1108559C (en)
Inventor
容红波
汤志忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN 00133535 priority Critical patent/CN1108559C/en
Publication of CN1294345A publication Critical patent/CN1294345A/en
Application granted granted Critical
Publication of CN1108559C publication Critical patent/CN1108559C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

A shift-rotation hardware controller supporting nested-loop software pipelining has a start-stop control and clock generation module that sends a clock signal to a register group. The register contents are updated, stabilize, and are then output to a next-kernel generation module. After the new kernel is generated, the select generation modules produce validity selects and register selects and send them to a mapping network, which outputs the physical-operation validity selects and register selects. The instruction generation module outputs all physical instructions corresponding to the current kernel. In this way the kernel-only code generated by nested-loop software pipelining can be converted into an executable instruction sequence.

Description

A shift-rotation hardware controller supporting nested-loop software pipelining
The present invention relates to a shift-rotation hardware controller supporting nested-loop software pipelining. It belongs to the field of very long instruction word (VLIW) processors within computer technology.
Loops are the part of a program that is richest in parallelism and that takes the most execution time. Software pipelining is a major compilation technique for exploiting instruction-level parallelism in loops. In this field, loop parallelization is a focus of invention and research because of its importance, but because of its difficulty the work has largely been confined to single loops.
U.S. Patent No. US5958048, titled "Architectural support for software pipelining of nested loops", proposes a processor architecture that includes support for nested-loop software pipelining. Its principle is to provide a set of loop parameter and status registers together with a multi-way branch control structure that controls loop execution, so that the epilogue (drain) phase of one inner loop can be overlapped with the prologue (fill) phase of another inner loop. The compiler therefore only has to generate the code that sets up the loop parameter and status registers, and does not have to generate the code that controls execution. In this way the compiler may software-pipeline nested loops more effectively.
The above U.S. patent is representative of most current nested-loop software pipelining techniques. Their basic idea is the same: first software-pipeline the innermost loop; then treat the pipelined innermost loop as an indivisible atomic operation, so that its enclosing loop can be regarded as a single loop and software-pipelined by the same method (which amounts to overlapping the epilogue and prologue of two adjacent innermost loops). This process can be extended all the way to the outermost loop. The parallelism exploited depends mainly on the innermost loop. The above U.S. patent provides execution support for the code produced by this process.
However, the greatest parallelism is not necessarily found in the innermost loop. It may be found at any loop level, depending on the loop itself. When the innermost loop has very little parallelism, continuing to software-pipeline the outer loops by the above process cannot increase parallelism significantly. Parallelism is inherent in a loop and cannot be changed by the device of the above U.S. patent, so that device cannot exploit the loop's full potential. This is an essential defect of the design approach.
In addition, the above U.S. patent applies special control to both the prologue and the epilogue, and requires the loop to satisfy two conditions: (1) memory read operations must be located in the first stage of the logical loop body; (2) operations with side effects (for example memory writes, loop interruption, etc.) must be located in the last stage of the logical loop body.
The object of the invention is to design a shift-rotation hardware controller supporting nested-loop software pipelining that overcomes the above shortcomings of the prior art and automatically converts the kernel-only code generated by nested-loop software pipelining into an executable instruction sequence.
The shift-rotation hardware controller designed by the invention comprises a start-stop control and clock generation module, a register group, a next-kernel generation module, a shift-rotation control module, a logical-column validity-select generation module, a logical-column register-select generation module, a mapping network, and an instruction generation module. After receiving the start signal, the start-stop control and clock generation module sends clock signal 1 to the register group. When clock signal 1 enters the register group, the register group is updated and stabilizes, and outputs the register contents to the next-kernel generation module, the logical-column validity-select generation module, the logical-column register-select generation module, and the instruction generation module respectively. The next-kernel generation module then produces the new kernel that is to be set when the next clock signal 1 arrives and outputs it to the register group. The logical-column validity-select generation module and the logical-column register-select generation module generate the validity selects and register selects of all logical columns and send them to the mapping network, whose outputs are the physical-operation validity selects and register selects. At the same time, the instruction generation module outputs all physical instructions corresponding to the current kernel.
In one embodiment of the invention, the start-stop control and clock generation module outputs a second clock signal, clock signal 2, to the mapping network and the instruction generation module. Each such signal makes the instruction generation module output one physical instruction contained in the current kernel, and makes the mapping network output the physical-operation validity selects and register selects of all physical operations in that instruction.
The invention can also take another structural form. The difference is that the input of the instruction generation module comes not from the register group but from the next-kernel generation module: the new kernel produced by that module is the input.
In the device described above, the register group comprises: m first-dimension index registers H_1, H_2, ..., H_m; m main-logical-column select-bit registers C_1, C_2, ..., C_m; m current-kernel logical-column select-bit registers S_1, S_2, ..., S_m; n-1 pairs of index registers, namely a pair of second-dimension index registers i_2 and i_2', a pair of third-dimension index registers i_3 and i_3', ..., and a pair of n-th-dimension index registers i_n and i_n'; and a current-kernel register K. Here m is the maximum number of operations allowed by the instruction format, and n is the maximum nesting depth of the loops. The output of each first-dimension index register H_x is connected to bus 1 through a tri-state gate; if x = 1, the input of H_x is selected from bus 1 and the value of H_x minus 1; if x ≠ 1, the input of H_x is selected from bus 1 and H_{x-1}. The output of each main-logical-column select-bit register C_x is connected to bus 2 through a tri-state gate; if x = 1, the input of C_x is selected from bus 2 and an invalid flag; if x ≠ 1, the input of C_x is selected from bus 2 and C_{x-1}. The output of each current-kernel logical-column select-bit register S_x is connected to bus 3 through a tri-state gate; if x = 1, the input of S_x is selected from bus 3 and a valid flag; if x ≠ 1, the input of S_x is selected from bus 3 and S_{x-1}. The input of each index register i_x is selected from the upper bound of the x-th-dimension index and the value of i_x minus 1. The input of each index register i_x' is selected from i_x and 0.
The shift-rotation hardware controller designed by the invention has the following beneficial effects.
(1) The invention applies to a large class of multiply nested (including singly nested) loops that have no "<"-type dependence, where a "<"-type dependence is one in which, among components 2 to n of the dependence distance vector, the first non-zero component is less than 0. For such loops, the parallelism supported by the invention is in general not less than that of the prior art described above. Unlike the prior art, the basic idea of the invention is: "select the loop level that has the greatest parallelism for software pipelining (i.e., parallelization), and serialize the other levels". The compiler generates the kernel code of the nested loop according to this idea and hands it to the device of the invention, which dynamically produces an instruction sequence for that kernel code. In theory, the parallelism of this parallelized code is at least that of the code produced by the prior art, so the parallelism supported by the invention is at least that of the prior art. When the innermost loop has very little parallelism and another level has more, the parallelism supported by the invention is significantly greater than that of the prior art.
(2) The invention needs no prologue or epilogue control. The prologue, the epilogue, and the kernel are controlled uniformly, without distinction, which simplifies control. The prologue and epilogue arise and finish naturally during shifting: the prologue starts with the first shift, when some logical columns are not selected because their current-kernel logical-column select bits are invalid; the epilogue starts with the shift that follows the last rotation, when some logical columns are not selected because their first-dimension indices are less than 0.
(3) The invention places no requirement on the specific position of operations within the loop.
A further advantage of the invention is its low implementation cost. It keeps the operations fixed and shifts or rotates the indices, so the input of the mapping network is just the register select and the validity select of each logical column (only two wires per column). If instead the indices were kept fixed and the operations were shifted or rotated, the input of the mapping network would have to include at least the validity select of each logical column plus all the wires of an operation in that column (a total that is necessarily far greater than 2).
Description of the drawings:
Fig. 1 is a block diagram of the device of the invention.
Fig. 2 is an embodiment of the device of the invention.
Fig. 3 is a block diagram of another structural form of the invention.
Fig. 4 is a schematic of the control circuit for registers H, C, and S in the shift-rotation control module of the device.
Fig. 5 is an example of the input and output of the invention.
In all figures, single-bit signals are drawn with thin lines and multi-bit signals with thick lines.
The content of the invention is described in detail below, and its working principle and embodiments are described with reference to the drawings.
Refer now to Fig. 1. In register group 101, the first-dimension index registers H_1, H_2, ..., H_m are collectively called H, the main-logical-column select-bit registers C_1, C_2, ..., C_m are collectively called C, and the current-kernel logical-column select-bit registers S_1, S_2, ..., S_m are collectively called S.
The outputs of C, H, and K are coupled directly to the inputs of the next-kernel generation module 102. This module outputs the next kernel K', which is coupled to the inputs of the current-kernel register K and of the shift-rotation control module 103. If C shows that the main logical column is not the last column of the current kernel, then K' = K. Otherwise, K' denotes the kernel of the level-x loop for which both of the following hold: for the level-x loop, the main logical column is not the last column of that loop's kernel or H_x ≠ 0; and for every loop nested inside it, say the level-y loop, the main logical column is the last column of that loop's kernel and H_y = 0.
The shift-rotation control module 103 controls the updating of the register group. First, it controls the input selection and the output tri-state gates of H, C, and S. Specifically:
1. When K' indicates the outermost loop kernel, for each group H_x, C_x, S_x the tri-state gates from their outputs to bus 1, bus 2, and bus 3 are put in the high-impedance state; H_x - 1 (if x = 1) or H_{x-1} (if x ≠ 1) is selected as the input of H_x; the invalid flag (if x = 1) or C_{x-1} (if x ≠ 1) is selected as the input of C_x; and the valid flag (if x = 1) or S_{x-1} (if x ≠ 1) is selected as the input of S_x.
2. Otherwise, if the first and last logical columns of K' are columns x and y of the logical kernel respectively, the tri-state gates from the outputs of H_y, C_y, S_y to bus 1, bus 2, and bus 3 are opened (allowing the outputs onto the buses) and all other tri-state gates are put in the high-impedance state. For each group H_z, C_z, S_z (x < z ≤ y), H_{z-1}, C_{z-1}, S_{z-1} are selected as the respective inputs. For H_x, C_x, S_x, bus 1, bus 2, and bus 3 are selected as the respective inputs. Every other group H_u, C_u, S_u (u > y or u < x) holds its value.
The shift-rotation control module 103 also controls the input selection of the other registers. Specifically:
1. When K' indicates the outermost loop kernel, 0 is selected as the input of i_2', i_3', ..., i_n', and N_2, N_3, ..., N_n are selected as the inputs of i_2, i_3, ..., i_n.
2. Otherwise, i_2, i_3, ..., i_n are selected as the inputs of i_2', i_3', ..., i_n'. If K' indicates the kernel of the level-x loop, then i_2, i_3, ..., i_{x-1} hold their values, i_x - 1 is selected as the input of i_x, and N_{x+1}, N_{x+2}, ..., N_n are selected as the inputs of i_{x+1}, i_{x+2}, ..., i_n.
In addition, the input of the current-kernel register K is always K'.
The logical-column validity-select generation module 104 takes H and S as input, produces one validity-select signal for each logical column, and outputs the signals to the mapping network 106. For example, for the x-th logical column it produces a validity-select signal E_x: if S_x is valid and H_x ≥ 0, then E_x is valid; otherwise E_x is invalid.
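The validity rule above can be written as a one-line function. This is a minimal sketch, not the patent's circuit: the function name `validity_selects` and the list-based representation of H and S (index 0 holding column 1) are assumptions made for illustration.

```python
from typing import List


def validity_selects(H: List[int], S: List[bool]) -> List[bool]:
    """Return E, where E[x] is valid iff S[x] is valid and H[x] >= 0.

    H holds the first-dimension index of each logical column and S holds the
    current-kernel select bit of each logical column (hypothetical encoding:
    list position 0 corresponds to logical column 1).
    """
    return [s and h >= 0 for h, s in zip(H, S)]


if __name__ == "__main__":
    # Column 2 is masked because its index is negative (drain phase);
    # column 3 is masked because its select bit is invalid (fill phase).
    print(validity_selects([2, -1, 0], [True, True, False]))
```

A column is therefore masked in two situations, which is how the fill and drain phases arise without separate control.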
The logical-column register-select generation module 105 takes C as input, produces one register-select signal for each logical column, and outputs the signals to the mapping network 106. For example, for the x-th logical column it produces a register-select signal R_x: if C shows that the main logical column is the y-th logical column and x > y, then R_x takes one particular value (for example, high level), meaning that the indices of dimensions 2 to n for all operations in this logical column come from i_2', i_3', ..., i_n'; otherwise R_x takes another particular value (for example, low level), meaning that the indices of dimensions 2 to n for all operations in this logical column come from i_2, i_3, ..., i_n.
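A small sketch of the register-select rule follows. It assumes that exactly one bit of C is valid (the main logical column), which the text implies but does not state outright; the function name and the boolean encoding (True for "use the primed registers i'") are illustrative choices.

```python
from typing import List


def register_selects(C: List[bool]) -> List[bool]:
    """Return R, where R[x] is True iff column x lies after the main column.

    True stands for the "high" value (indices of dimensions 2..n are read
    from i_2', ..., i_n'); False stands for the "low" value (read from
    i_2, ..., i_n). Assumes C is one-hot.
    """
    y = C.index(True)  # 0-based position of the main logical column
    return [x > y for x in range(len(C))]


if __name__ == "__main__":
    # Main logical column is column 2, so columns 3 and 4 use i'.
    print(register_selects([False, True, False, False]))
```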
The mapping network 106 expresses the mapping from a logical kernel to the physical kernel. It maps the validity select and register select of each row and column of the logical kernel onto the corresponding row and column of the physical kernel, and its outputs are the physical-operation validity selects and register selects. For example, if the operation in row x, column y of the logical kernel is located in row x, column z of the physical kernel, then its validity select E_y and register select R_y are mapped to the z-th operation of the instruction in row x of the physical kernel and become the validity select and register select of that operation. That is, if E_y is invalid the operation is not executed; otherwise the operation is executed, and the value of R_y decides whether it refers to i_2', i_3', ..., i_n' or to i_2, i_3, ..., i_n. There are many ways to implement the mapping network 106, all familiar to those of ordinary skill in the art. Most commonly, a crossbar network can be used.
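The mapping described above amounts to a lookup from each physical operation slot to the logical column it came from. The sketch below models that lookup in software; the `placement` dictionary, its key format, and the function name are assumptions for illustration, not a description of the crossbar hardware.

```python
from typing import Dict, List, Tuple


def map_selects(
    E: List[bool],
    R: List[bool],
    placement: Dict[Tuple[int, int], int],
) -> Dict[Tuple[int, int], Tuple[bool, bool]]:
    """Route per-column selects to physical operation slots.

    placement[(row, z)] = y means that slot z of the physical instruction in
    `row` holds the operation from logical column y (all 1-based). The result
    gives (validity select, register select) for every physical slot.
    """
    return {slot: (E[y - 1], R[y - 1]) for slot, y in placement.items()}


if __name__ == "__main__":
    E = [True, False]
    R = [False, True]
    # Row 1: slot 1 holds the column-2 operation, slot 2 the column-1 operation.
    print(map_selects(E, R, {(1, 1): 2, (1, 2): 1}))
```

Only two bits per logical column enter the mapping, which matches the low-cost argument made earlier in the description.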
The instruction generation module 107 takes the current kernel K as input and outputs all physical instructions corresponding to K. It too can be implemented in several ways: the physical kernel can be stored in a number of registers, and all physical instructions corresponding to K can then be output either all at once or one instruction at a time. This is also easy for those of ordinary skill in the art.
The start-stop control and clock generation module 108 takes a start signal as input and outputs clock signal 1 to the register group 101.
Overall, the input of the device is a start signal, and its outputs are the instructions, the physical-operation validity selects and register selects, and the indices of dimensions 1 to n. The outputs can be sent directly to the functional units for execution or placed in an instruction buffer, depending on the needs of the processor.
The working principle of the device is as follows. After the start signal arrives, the start-stop control and clock generation module 108 generates clock signal 1. When clock signal 1 is fed to the register group 101, the register group is updated. Specifically, the control that the shift-rotation control module 103 exerts over the register inputs and outputs has the following effect:
1. When K' indicates the outermost loop kernel, H, C, and S are shifted. That is, H_2, H_3, ..., H_m take the former values of H_1, H_2, ..., H_{m-1} respectively, and H_1 is decremented by 1; C_2, C_3, ..., C_m take the former values of C_1, C_2, ..., C_{m-1} respectively, and C_1 is set invalid; S_2, S_3, ..., S_m take the former values of S_1, S_2, ..., S_{m-1} respectively, and S_1 is set valid. The only exception is that if K at this moment does not indicate the outermost loop kernel, C is not shifted but is forced to indicate the first logical column of the innermost loop kernel. That is, each C_x is set valid if column x of the logical kernel is the first logical column of the innermost loop kernel, and invalid otherwise.
2. Otherwise, the part of H, C, and S that corresponds to K' is rotated. That is, if the first and last logical columns of K' are columns x and y of the logical kernel respectively, then H_y, H_{y-1}, ..., H_{x+1} take the former values of H_{y-1}, H_{y-2}, ..., H_x respectively, and H_x takes the former value of H_y; C_y, C_{y-1}, ..., C_{x+1} take the former values of C_{y-1}, C_{y-2}, ..., C_x respectively, and C_x takes the former value of C_y; S_y, S_{y-1}, ..., S_{x+1} take the former values of S_{y-1}, S_{y-2}, ..., S_x respectively, and S_x takes the former value of S_y. All other registers H_u, C_u, S_u (u > y or u < x) are unchanged.
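The shift and rotate updates of H, C, and S described in the two cases above can be sketched as follows. This is a software model under stated assumptions, not the register-transfer circuit: the `ColumnRegs` class and the function names are hypothetical, list position 0 holds column 1, and the exception in which C is forced to the first column of the innermost kernel is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnRegs:
    H: List[int]   # first-dimension index of each logical column
    C: List[bool]  # main-logical-column select bits
    S: List[bool]  # current-kernel logical-column select bits


def shift(r: ColumnRegs) -> None:
    """Case 1: K' is the outermost kernel. Every column moves one place up."""
    r.H = [r.H[0] - 1] + r.H[:-1]   # H_1 is decremented, H_x <- old H_{x-1}
    r.C = [False] + r.C[:-1]        # C_1 <- invalid flag
    r.S = [True] + r.S[:-1]         # S_1 <- valid flag


def rotate(r: ColumnRegs, x: int, y: int) -> None:
    """Case 2: rotate columns x..y (1-based, inclusive); others are held."""
    lo, hi = x - 1, y
    for regs in (r.H, r.C, r.S):
        # Column x takes the old value of column y (via the bus);
        # columns x+1..y take the old values of columns x..y-1.
        regs[lo:hi] = [regs[hi - 1]] + regs[lo:hi - 1]


if __name__ == "__main__":
    r = ColumnRegs(H=[3, 0, 0], C=[True, False, False], S=[True, False, False])
    shift(r)
    print(r)
    rotate(r, 2, 3)
    print(r)
```

In the example, the shift brings a new outer iteration into column 1 while the old one moves to column 2; the rotate then cycles only columns 2 and 3 and leaves column 1 untouched.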
For the other registers, the input selection made by the shift-rotation control module 103 causes K to take the value of K', and:
1. When K' indicates the outermost loop kernel, i_2', i_3', ..., i_n' are cleared to 0, and i_2, i_3, ..., i_n become N_2, N_3, ..., N_n respectively.
2. Otherwise, i_2', i_3', ..., i_n' take the former values of i_2, i_3, ..., i_n respectively. If K' indicates the kernel of the level-x loop, then i_2, i_3, ..., i_{x-1} are unchanged, i_x is decremented by 1, and i_{x+1}, i_{x+2}, ..., i_n become N_{x+1}, N_{x+2}, ..., N_n respectively.
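The update of the index register pairs can be modeled in the same way. The sketch assumes that loop level 1 is the outermost loop and that level x corresponds to dimension x; the dictionaries keyed by dimension and the function name `update_indices` are illustrative choices, not part of the source.

```python
from typing import Dict


def update_indices(
    i: Dict[int, int],
    i_prime: Dict[int, int],
    N: Dict[int, int],
    level: int,
) -> None:
    """Update i_d and i_d' (d = 2..n) in place for a next kernel of `level`.

    level == 1 means K' is the outermost loop kernel (assumed numbering).
    N[d] is the upper bound of the index of dimension d.
    """
    if level == 1:
        for d in i:
            i_prime[d] = 0
            i[d] = N[d]
        return
    for d in i:
        i_prime[d] = i[d]          # primed registers keep the former values
    i[level] -= 1                  # i_x is decremented
    for d in i:
        if d > level:
            i[d] = N[d]            # deeper dimensions are reloaded
    # dimensions 2..level-1 are left unchanged


if __name__ == "__main__":
    N = {2: 2, 3: 4}
    i = {2: 2, 3: 4}
    i_prime = {2: 0, 3: 0}
    update_indices(i, i_prime, N, level=3)
    print(i, i_prime)
    update_indices(i, i_prime, N, level=2)
    print(i, i_prime)
```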
After the register group 101 has completed this update, the next-kernel generation module 102 produces the new kernel that is to be set when the next clock signal 1 arrives. The logical-column validity-select generation module 104 and the logical-column register-select generation module 105 generate the validity selects and register selects of all logical columns and send them to the mapping network 106, whose outputs are the physical-operation validity selects and register selects. At the same time, the instruction generation module 107 outputs all physical instructions corresponding to the current kernel.
After this work is finished, if C shows that the main logical column is not the last column of the outermost loop kernel, or if H_1 ≠ 0, the start-stop control and clock generation module 108 generates the next clock signal 1; otherwise it does not generate the next clock signal 1 (the device stops working).
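The stop condition can be stated compactly as a predicate. This is a minimal sketch assuming that C is one-hot and that the position of the last column of the outermost loop kernel is known as a number; the function and parameter names are hypothetical.

```python
from typing import List


def should_continue(C: List[bool], last_outer_column: int, H1: int) -> bool:
    """Return True if another clock signal 1 should be generated.

    The device keeps running while the main logical column is not the last
    column of the outermost loop kernel, or while H_1 is non-zero.
    """
    main_column = C.index(True) + 1  # 1-based; assumes exactly one valid bit
    return main_column != last_outer_column or H1 != 0


if __name__ == "__main__":
    print(should_continue([False, False, True], last_outer_column=3, H1=0))
    print(should_continue([False, False, True], last_outer_column=3, H1=2))
```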
Fig. 2 shows one embodiment of the invention. Its composition is identical to that of Fig. 1 but more specific: the instruction generation module 107 outputs one instruction at a time. For this purpose one signal is added: clock signal 2, output from the start-stop control and clock generation module 108 to the mapping network 106 and the instruction generation module 107. In addition, K is output to the start-stop control and clock generation module 108.
The start-stop control and clock generation module 108 sends clock signal 1 to the register group 101. After the register group has been updated and its output is stable, the module sends x clock signals 2 (assuming the current kernel K contains x instructions) to the mapping network 106 and the instruction generation module 107. Each clock signal 2 makes the instruction generation module 107 output one physical instruction contained in the current kernel K, and makes the mapping network 106 output the physical-operation validity selects and register selects of all physical operations in that instruction.
After this work is finished, if C shows that the main logical column is not the last column of the outermost loop kernel, or if H_1 ≠ 0, the above process is repeated; otherwise no further clock signal 1 or clock signal 2 is generated (the device stops working).
Fig. 3 is a block diagram of another structural form of the invention. It is essentially the same as Fig. 2, except that the input of the instruction generation module 107 is not the current kernel K but the next kernel K'. As a result, register-group updating on the one hand and instruction generation and select-signal mapping on the other no longer proceed serially as in Fig. 2 but overlap in parallel, so efficiency is higher. The principle is as follows:
1. Before the start signal is sent, the register group 101 is initialized so that C_1 and S_1 are valid, K indicates the outermost loop kernel, H_1 = N_1, and i_2, i_3, ..., i_n are N_2, N_3, ..., N_n respectively.
2. After the start signal is sent, the start-stop control and clock generation module 108 does not send clock signal 1 but sends x clock signals 2 (assuming the outermost loop kernel contains x instructions) to the mapping network 106 and the instruction generation module 107. Each clock signal 2 makes the instruction generation module 107 output one physical instruction contained in the outermost loop kernel (regardless of what the input K' is), and makes the mapping network 106 output the physical-operation validity selects and register selects of all physical operations in that instruction.
3. If C shows that the main logical column is the last column of the outermost loop kernel and H_1 = 0, the start-stop control and clock generation module 108 generates no further clock signal 1 or clock signal 2 (the device stops working). Otherwise it sends one clock signal 1 and at the same time begins sending x clock signals 2 (assuming K' contains x instructions). The register group is then updated while the instruction generation module 107 simultaneously generates all corresponding physical instructions for K' and the mapping network 106 outputs the physical-operation validity selects and register selects of all physical operations in those instructions. In other words, while the register group 101 is still updating and has not yet produced stable output, the instruction generation module 107 is already generating instructions for the new current kernel K', which has not yet been latched, and the mapping network 106 has already generated select signals for it. This overlap in time lets the functional units obtain the instructions and select signals they need without waiting.
4. Step 3 is repeated until the device stops.
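The initialization in step 1 can be sketched as a constructor for the register group. The source fixes only C_1, S_1, K, H_1, and i_2..i_n; the values given here to the remaining registers (0 for the other H and for every i', invalid for the other C and S bits) and the encoding of K as a loop level with 1 for the outermost loop are assumptions made purely for illustration, as are all names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisterGroup:
    H: List[int]              # first-dimension index per logical column
    C: List[bool]             # main-logical-column select bits
    S: List[bool]             # current-kernel logical-column select bits
    i: Dict[int, int]         # i_d for dimensions d = 2..n
    i_prime: Dict[int, int]   # i_d' for dimensions d = 2..n
    K: int                    # current kernel, encoded as a loop level


def initial_registers(m: int, N: List[int]) -> RegisterGroup:
    """Build the state of step 1. N[0] is N_1, N[1] is N_2, and so on."""
    n = len(N)
    return RegisterGroup(
        H=[N[0]] + [0] * (m - 1),          # H_1 = N_1; others assumed 0
        C=[True] + [False] * (m - 1),      # C_1 valid; others assumed invalid
        S=[True] + [False] * (m - 1),      # S_1 valid; others assumed invalid
        i={d: N[d - 1] for d in range(2, n + 1)},
        i_prime={d: 0 for d in range(2, n + 1)},  # assumed 0
        K=1,                               # outermost loop kernel
    )


if __name__ == "__main__":
    # Three logical columns and a doubly nested loop with N_1 = 6, N_2 = 2.
    print(initial_registers(3, [6, 2]))
```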
Refer now to Fig. 4. Fig. 4 is a schematic of the control circuit for registers H, C, and S in the shift-rotation control module 103 shown in Fig. 1, Fig. 2, and Fig. 3. H, C, and S are connected as described above. U_x is the set of output tri-state gate control signals, enable signals, and input select signals for H, C, and S when K' is the level-x loop. Signal F indicates the first logical column of the innermost loop kernel. The rotation-end judging module 219 produces a rotation-end signal when K' indicates the outermost loop kernel and K does not; this signal serves as a further input select for C.
The control circuit shown in Fig. 4 works as follows. When K' is the level-x loop, the n-to-1 selector 201, under the control of K', selects U_x as the output tri-state gate control signals, enable signals, and input select signals of H, C, and S. For example, the output tri-state gate control signal 202, the enable signal 203, and the input select signal 204 of H_m, C_m, S_m are obtained from U_x. The input select signal 204 controls which input the input select module 213 chooses: the first group 205 (from bus 1, bus 2, bus 3) or the second group 220 (from H_{m-1}, C_{m-1}, S_{m-1}). In the figure, inputs of the same group are joined by a circle. The other registers are handled in exactly the same way. When clock signal 1 arrives, each register uses these control signals to decide whether it may output to the bus, whether it keeps its content or updates it to an input value, and, if it updates, which group of inputs it selects.
The only special case is this: if the rotation-end judging module 219 produces the rotation-end signal, that signal forces the input select modules to select F as the input of C (for example, it forces the input select module 213 to select bit m of F, 224, as the input of C_m), so that when clock signal 1 arrives C takes the value of F.
Fig. 5 is an example illustrating the input and output of the invention. Fig. 5a is a doubly nested loop with three physical operations O_1, O_2, O_3; the execution latency of each operation is assumed to be 1. The data dependence graph is shown in Fig. 5b, where the label on each edge is the dependence distance vector.
Because the first-dimension component of every dependence distance vector is 0, there is no dependence between different outer loop iterations, while within the same outer iteration the dependence cycle O_1→O_2→O_3→O_1 forces the inner loop to execute serially. Therefore the kernel code that a compiler can generate for the above U.S. patent using current techniques is as shown in Fig. 5c, and the kernel code generated for the device of the invention by the method "select the loop level that has the greatest parallelism for software pipelining (i.e., parallelization), and serialize the other levels" is as shown in Fig. 5d. In both kernel codes, the outer-loop and inner-loop kernels are identical.
Suppose N_1 = 6 and N_2 = 2. Feeding the kernel code of Fig. 5c to the device described in the above U.S. patent produces the execution result shown in Fig. 5e. The indices of each operation are shown to its right. As can be seen, execution is completely serial and the degree of parallelism is 1.
Feeding the kernel code of Fig. 5d to the device of the invention produces the execution result shown in Fig. 5f. As can be seen, three outer loop iterations are always executing at the same time during execution, the degree of parallelism is 3, and the execution time is only 1/3 of that of the device described in the above U.S. patent.
Although the invention has been described in detail above with reference to the drawings, these detailed descriptions do not limit the scope of the invention as claimed in the appended claims.

Claims (4)

1. A shift-rotation hardware controller supporting nested-loop software pipelining, characterized in that the device comprises a start-stop control and clock generation module, a register group, a next-kernel generation module, a shift-rotation control module, a logical-column validity-select generation module, a logical-column register-select generation module, a mapping network, and an instruction generation module; wherein the start-stop control and clock generation module, after receiving the start signal, sends clock signal 1 to the register group; after clock signal 1 enters the register group, the register group is updated under the control of the shift-rotation control module and stabilizes, and outputs the register contents to the next-kernel generation module, the logical-column validity-select generation module, the logical-column register-select generation module, and the instruction generation module respectively; the next-kernel generation module then produces the new kernel that is to be set when the next clock signal 1 arrives and outputs it to the register group and the shift-rotation control module; the logical-column validity-select generation module and the logical-column register-select generation module generate the validity selects and register selects of all logical columns respectively and send them to the mapping network, whose outputs are the physical-operation validity selects and register selects; at the same time, the instruction generation module outputs all physical instructions corresponding to the current kernel.
2. The device according to claim 1, characterized in that a second clock signal, clock signal 2, is output from said start-stop control and clock generation module to said mapping network and said instruction generation module; this signal makes said instruction generation module output a physical instruction contained in the current kernel, and makes said mapping network output the physical-operation validity selects and register selects of all physical operations in that instruction.
3. A shift-rotation hardware controller supporting nested-loop software pipelining, characterized in that the device comprises a start-stop control and clock generation module, a register group, a next-kernel generation module, a shift-rotation control module, a logical-column validity-select generation module, a logical-column register-select generation module, a mapping network, and an instruction generation module; wherein the start-stop control and clock generation module, after receiving the start signal, sends clock signal 1 to the register group; after clock signal 1 enters the register group, the register group is updated under the control of the shift-rotation control module and stabilizes, and outputs the register contents to the next-kernel generation module, the logical-column validity-select generation module, and the logical-column register-select generation module respectively; the next-kernel generation module then produces the new kernel that is to be set when the next clock signal 1 arrives and outputs it to the register group, the shift-rotation control module, and the instruction generation module respectively; the logical-column validity-select generation module and the logical-column register-select generation module generate the validity selects and register selects of all logical columns respectively and send them to the mapping network, whose outputs are the physical-operation validity selects and register selects; while the register group is being updated, the instruction generation module takes the next kernel output by the next-kernel generation module as input and produces all physical instructions corresponding to that kernel.
4. The device according to claim 1 or 3, characterized in that the register group comprises: m one-dimensional index registers H_1, H_2, ..., H_m; m main logical-column selection bit registers C_1, C_2, ..., C_m; m current-kernel logical-column selection bit registers S_1, S_2, ..., S_m; n-1 pairs of index registers, namely a pair of 2nd-dimension index registers i_2 and i_2', a pair of 3rd-dimension index registers i_3 and i_3', ..., and a pair of n-th-dimension index registers i_n and i_n'; and a current kernel register K; wherein m is the maximum number of operations defined by the instruction format and n is the maximum nesting depth of the multiple (nested) loop;
the output of each one-dimensional index register H_x is connected to bus 1 through a tri-state gate; if x = 1, the input of H_x is selected from bus 1 and the value of H_x minus 1; if x ≠ 1, the input of H_x is selected from bus 1 and H_(x-1);
the output of each main logical-column selection bit register C_x is connected to bus 2 through a tri-state gate; if x = 1, the input of C_x is selected from bus 2 and an invalid flag; if x ≠ 1, the input of C_x is selected from bus 2 and C_(x-1);
the output of each current-kernel logical-column selection bit register S_x is connected to bus 3 through a tri-state gate; if x = 1, the input of S_x is selected from bus 3 and a valid flag; if x ≠ 1, the input of S_x is selected from bus 3 and S_(x-1);
the input of each index register i_x is selected from the upper bound of the x-th-dimension index and the value of i_x minus 1; and the input of each index register i_x' is selected from i_x and 0.
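The input selections recited in claim 4 can be illustrated with a small behavioural model. The following is only a sketch under stated assumptions, not the patented circuit: the class name `RegisterGroup`, its method names, the 0/1 encoding of the invalid and valid flags, and the idea that every register in a chain shifts on the same clock edge are all hypothetical. The claim specifies only which sources each register input can select from; which source is chosen at a given time is decided by the shift-rotation control module and is not modelled here. The current kernel register K is omitted.

```python
INVALID, VALID = 0, 1  # assumed encodings; the claim only names the two flags


class RegisterGroup:
    """Behavioural sketch of the H, C, S chains and the i_x / i_x' pairs."""

    def __init__(self, m, n):
        self.H = [0] * m            # H_1..H_m (list index 0 holds H_1)
        self.C = [INVALID] * m      # C_1..C_m
        self.S = [INVALID] * m      # S_1..S_m
        self.i = {x: 0 for x in range(2, n + 1)}        # i_2..i_n
        self.i_prime = {x: 0 for x in range(2, n + 1)}  # i_2'..i_n'

    def shift(self):
        """Every chain register latches its non-bus input (assumed simultaneous)."""
        self.H = [self.H[0] - 1] + self.H[:-1]  # H_1 <- H_1 - 1, H_x <- H_(x-1)
        self.C = [INVALID] + self.C[:-1]        # C_1 <- invalid flag
        self.S = [VALID] + self.S[:-1]          # S_1 <- valid flag

    def load_from_bus(self, chain, src, dst):
        """Register src drives the bus via its tri-state gate; dst latches the bus."""
        regs = getattr(self, chain)
        regs[dst - 1] = regs[src - 1]

    def step_index(self, x, upper_bound, reload):
        """i_x selects either the x-th dimension upper bound or i_x - 1."""
        self.i[x] = upper_bound if reload else self.i[x] - 1

    def latch_prime(self, x, clear):
        """i_x' selects either i_x or 0."""
        self.i_prime[x] = 0 if clear else self.i[x]
```

For example, starting from `H = [5, 7, 9]` with m = 3, one call to `shift()` leaves `H = [4, 5, 7]` and `S = [1, 0, 0]`; a subsequent `load_from_bus("H", src=3, dst=1)` copies H_3 into H_1 over bus 1, which is one way a rotation could be composed from the bus path.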
CN 00133535 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow Expired - Fee Related CN1108559C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 00133535 CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 00133535 CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Publications (2)

Publication Number Publication Date
CN1294345A true CN1294345A (en) 2001-05-09
CN1108559C CN1108559C (en) 2003-05-14

Family

ID=4595789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 00133535 Expired - Fee Related CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Country Status (1)

Country Link
CN (1) CN1108559C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7631305B2 (en) 2003-09-19 2009-12-08 University Of Delaware Methods and products for processing loop nests

Also Published As

Publication number Publication date
CN1108559C (en) 2003-05-14

Similar Documents

Publication Publication Date Title
US8977997B2 (en) Hardware simulation controller, system and method for functional verification
US9032377B2 (en) Efficient parallel computation of dependency problems
US10509876B2 (en) Simulation using parallel processors
KR101275698B1 (en) Data processing method and device
CN100342325C (en) Method and apparatus for register file port reduction in a multithreaded processor
CN1230740C (en) Digital signal processing apparatus
US20110067016A1 (en) Efficient parallel computation on dependency problems
KR100346515B1 (en) Temporary pipeline register file for a superpipe lined superscalar processor
KR20160046331A (en) High-performance processor system and method based on a common unit
CN1105138A (en) Register architecture for a super scalar computer
CN1434380A (en) Image processing device and method, and compiling program for said device
CN113326066B (en) Quantum control microarchitecture, quantum control processor and instruction execution method
KR100955433B1 (en) Cache memory having pipeline structure and method for controlling the same
CN1511280A (en) System and method for multiple store buffer for warding in system with limited memory model
JPH0743733B2 (en) Logical simulation method
US20120096292A1 (en) Method, system and apparatus for multi-level processing
JP2000305781A (en) Vliw system processor, code compressing device, code compressing method and medium for recording code compression program
US20150032995A1 (en) Processors operable to allow flexible instruction alignment
CN1596396A (en) Vliw architecture with power down instruction
JPH07104784B2 (en) Digital data processor
Lim Improving parallelism and data locality with affine partitioning
CN1108559C (en) Shift rotation type hardware controller for supporting polycyclic software flow
US5526496A (en) Method and apparatus for priority arbitration among devices in a computer system
CN1650258A (en) Automatic task distribution in scalable processors
CN1947092A (en) Methods and apparatus for multi-processor pipeline parallelism

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
BB1A Publication of application
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee