CN1108559C - Shift rotation type hardware controller for supporting polycyclic software flow - Google Patents


Info

Publication number
CN1108559C
Authority
CN
China
Prior art keywords
register
core
generation module
current core
selection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 00133535
Other languages
Chinese (zh)
Other versions
CN1294345A (en)
Inventor
容红波
汤志忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN 00133535
Publication of CN1294345A
Application granted
Publication of CN1108559C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The present invention relates to a shift rotation type hardware controller supporting polycyclic software flow, that is, software pipelining of nested loops. A start/stop control and clock generation module sends clock signal 1 to a register group; the register group updates, stabilizes, and outputs its contents to a next-core generation module and to the other modules. The next-core generation module then generates the new core. The select generation modules generate the validity selects and register selects of the logical columns and send them to a mapping network, which outputs the operation validity selects and register selects. An instruction generation module outputs all physical instructions corresponding to the current core. The controller thereby automatically converts the core code produced by nested-loop software pipelining into an executable instruction sequence.

Description

A shift rotation type hardware controller supporting polycyclic software flow
Technical field
The present invention relates to a shift rotation type hardware controller supporting polycyclic software flow, that is, software pipelining of nested loops. It belongs to the field of very long instruction word (VLIW) processors within computer technology.
Background art
Loops are the part of a program that is richest in parallelism and consumes the most execution time. Software pipelining is a principal compilation technique for exploiting instruction-level parallelism in loops. Because of its importance, loop parallelization has become a focus of invention and research in this field; because of its difficulty, the work has largely been confined to single loops.
The United States patent entitled "Architectural support for software pipelining of nested loops", No. US5958048, proposes a processor architecture that includes support for nested-loop software pipelining. Its principle is to provide a set of loop parameter and status registers together with a multi-way branch control structure that governs loop execution, so that the drain phase of one inner loop can overlap the fill phase of another inner loop. The compiler therefore needs to generate only the code that sets the loop parameter and status registers, not the code that controls execution. In this way the compiler can software-pipeline nested loops more effectively.
The above US patent is representative of most current nested-loop software pipelining techniques. Their basic idea is the same: first software-pipeline the innermost loop; then treat the new innermost loop as an indivisible atomic operation, so that its enclosing loop can be regarded as a single loop and software-pipelined in the same way (which amounts to overlapping the drain and fill portions of two adjacent innermost loops). This process can be extended out to the outermost loop. The parallelism exploited depends mainly on the innermost loop. The above US patent provides execution support for the code that this process produces.
However, the greatest parallelism does not necessarily lie in the innermost loop. It may lie at any loop level, depending on the loop itself. When the innermost loop has little parallelism, continuing to software-pipeline the outer loops by the above process cannot increase parallelism significantly. Parallelism is inherent in the loop and cannot be changed by the device of the above US patent, so that device cannot fully exploit the available potential. This is an essential defect in the design approach.
In addition, the above US patent applies special control to both fill and drain, and requires loops to satisfy two conditions: (1) memory read operations must be in the first stage of the logical loop body; (2) operations with side effects (for example memory writes, loop interruption) must be in the last stage of the logical loop body.
Summary of the invention
The object of the invention is to design a shift rotation type hardware controller supporting polycyclic software flow that overcomes the above shortcomings of the prior art and automatically converts the single core code generated by nested-loop software pipelining into an executable instruction sequence.
The shift rotation type hardware controller supporting polycyclic software flow designed by the invention comprises:
A register group, comprising one current core register, a plurality of main logical column select bit registers, a plurality of current-core logical column select bit registers, a plurality of first-dimension subscript registers and a plurality of multi-dimension subscript registers, which store respectively the current core, the main logical column, the logical columns of the current core, and the first-dimension and multi-dimension subscript data;
A start/stop control and clock generation module, which, after receiving the start signal, sends to the register group the first clock signal that controls shifting and rotation of the register group;
A shift rotation control module, which, after the first clock signal enters the register group, causes the data stored in the register group to be updated; it outputs the data in the first-dimension subscript registers, the main logical column select bit registers and the current core register to the next-core generation module, outputs the data in the first-dimension subscript registers and the current-core logical column select bit registers to the logical column validity select generation module, outputs the data of the main logical column select bit registers to the logical column register select generation module, and outputs the data in the current core register to the instruction generation module;
A next-core generation module, which, after the first clock signal has caused the register group to update, uses the outputs of the first-dimension subscript registers, the main logical column select bit registers and the current core register to produce the new core that is to be set when the next first clock signal arrives, and outputs it to the register group and the shift rotation control module;
A logical column validity select generation module and a logical column register select generation module, which generate the validity selects and register selects of all logical columns and send them to a mapping network, which outputs the physical operation validity selects and register selects;
An instruction generation module, which produces all physical instructions corresponding to the current core according to the output of the current core register.
In the above device, the start/stop control and clock generation module receives the start signal and, after it has sent the first clock signal to the register group and the register group has updated, outputs a second clock signal to the mapping network and the instruction generation module according to the output of the current core register. This signal causes the instruction generation module to output a physical instruction contained in the current core, and causes the mapping network to output the physical operation validity selects and register selects for all physical operations in that instruction.
Another form of the shift rotation type hardware controller supporting polycyclic software flow designed by the invention comprises:
A register group, comprising one current core register, a plurality of main logical column select bit registers, a plurality of current-core logical column select bit registers, a plurality of first-dimension subscript registers and a plurality of multi-dimension subscript registers, which store respectively the current core, the main logical column, the logical columns of the current core, and the first-dimension and multi-dimension subscript data;
A start/stop control and clock generation module, which, after receiving the start signal, sends the first clock signal to the register group and, after the first clock signal has caused the register group to update, sends the second clock signal to the mapping network and the instruction generation module according to the output of the current core register in the register group;
A shift rotation control module, which, after the first clock signal enters the register group, controls the update of the register group; it outputs the data in the first-dimension subscript registers, the main logical column select bit registers and the current core register to the next-core generation module, outputs the data in the first-dimension subscript registers and the current-core logical column select bit registers to the logical column validity select generation module, outputs the data of the main logical column select bit registers to the logical column register select generation module, and outputs the data in the current core register to the start/stop control and clock generation module;
A next-core generation module, which produces the new core that is to be set when the next first clock signal arrives, and outputs it to the register group, the shift rotation control module and the instruction generation module;
A logical column validity select generation module and a logical column register select generation module, which generate the validity selects and register selects of all logical columns and send them to the mapping network, which outputs the physical operation validity selects and register selects;
An instruction generation module, which outputs all physical instructions corresponding to the current core.
The shift rotation type hardware controller supporting polycyclic software flow designed by the invention has the following beneficial effects.
(1) For the large class of nested (including single) loops to which the invention applies, namely those without "<"-type dependences (a "<"-type dependence is one whose dependence distance vector has, among its components for dimensions 2 to n, a first non-zero component that is less than 0), the parallelism supported by the invention is in general not less than that of the above prior art. Unlike the prior art, the basic idea of the invention is: "select the loop level with the greatest parallelism for software pipelining (that is, parallelization) and serialize the other levels". The compiler generates the core code of the nested loop according to this idea and passes it to the device of the invention, which dynamically produces an instruction sequence for this core code. In theory, the parallelism of this parallelized code is at least that of the code produced by the above prior art, so the parallelism supported by the invention is at least that of the prior art. When the innermost loop has little parallelism and other levels have much, the parallelism supported by the invention is significantly greater than that of the prior art.
(2) The invention needs no fill or drain control. Fill, drain and core are controlled uniformly, without distinction, which simplifies control. Fill and drain arise and complete naturally during shifting: fill begins with the first shift, when some logical columns are not selected because the corresponding current-core logical column select bit is invalid; drain begins with the shift after the last rotation, when some logical columns are not selected because the corresponding first-dimension subscript is less than 0.
(3) The invention places no requirement on the position of operations within the loop.
A further advantage of the invention is its low implementation cost. It keeps the operations fixed and shifts or rotates the subscripts, so the input of the mapping network is only the register select and validity select of each logical column (just two wires). If instead the subscripts were kept fixed and the operations were shifted or rotated, the input of the mapping network would include at least the validity select of each logical column plus all the wires of one operation in that column (a total necessarily far greater than 2).
Description of drawings
Fig. 1 is a structural block diagram of the device of the invention.
Fig. 2 is an embodiment of the device of the invention.
Fig. 3 is another structural block diagram of the invention.
Fig. 4 is a schematic of the control circuit for registers H, C and S in the shift rotation control module of the device of the invention.
Fig. 5 is an example of the input and output of the invention.
In all figures, single-bit signals are drawn with thin lines and multi-bit signals with thick lines.
Embodiments
The invention is described in detail below, and its operating principle and embodiments are explained with reference to the drawings.
Refer now to Fig. 1. In register group 101, the first-dimension subscript registers H_1, H_2, ..., H_m are collectively called H; the main logical column select bit registers C_1, C_2, ..., C_m are collectively called C; the current-core logical column select bit registers S_1, S_2, ..., S_m are collectively called S.
The outputs of C, H and K are coupled directly to the inputs of the next-core generation module 102, which outputs the next core K'; K' is coupled to the input of the current core register K and to the shift rotation control module 103. If C shows that the main logical column is not the last column of the current core, then K' = K. Otherwise, if for the loop at level x either the main logical column is not the last column of that loop's core or H_x ≠ 0, and for every loop inside it, say the loop at level y, the main logical column is the last column of that loop's core and H_y = 0, then K' denotes the core of the level-x loop.
The shift rotation control module 103 controls the update of the register group. First, it controls the input selection and the output tri-state gates of H, C and S. Specifically:
1. When K' indicates the outermost loop core, for each group H_x, C_x, S_x, the tri-state gates from their outputs to bus 1, bus 2 and bus 3 are placed in the high-impedance state; H_x − 1 (if x = 1) or H_{x−1} (if x ≠ 1) is selected as the input of H_x; the invalid flag (if x = 1) or C_{x−1} (if x ≠ 1) is selected as the input of C_x; the valid flag (if x = 1) or S_{x−1} (if x ≠ 1) is selected as the input of S_x.
2. Otherwise, if the first and last logical columns of K' are columns x and y of the logical core, the tri-state gates from the outputs of H_y, C_y, S_y to bus 1, bus 2 and bus 3 are opened (allowing the outputs onto the buses) and all other tri-state gates are placed in the high-impedance state; for each group H_z, C_z, S_z (x < z ≤ y), H_{z−1}, C_{z−1}, S_{z−1} are selected as the respective inputs; for H_x, C_x, S_x, bus 1, bus 2 and bus 3 are selected as the respective inputs; every other group H_u, C_u, S_u (u > y or u < x) is held.
The shift rotation control module 103 also controls the input selection of the other registers. Specifically:
1. When K' indicates the outermost loop core, 0 is selected as the input of i_2', i_3', ..., i_n', and N_2, N_3, ..., N_n are selected as the inputs of i_2, i_3, ..., i_n.
2. Otherwise, i_2, i_3, ..., i_n are selected as the inputs of i_2', i_3', ..., i_n'; if K' indicates the core of the level-x loop, i_2, i_3, ..., i_{x−1} are held, i_x − 1 is selected as the input of i_x, and N_{x+1}, N_{x+2}, ..., N_n are selected as the inputs of i_{x+1}, i_{x+2}, ..., i_n.
In addition, the input of the current core register K is always K'.
The logical column validity select generation module 104 takes H and S as input, produces one validity select signal for each logical column, and outputs it to the mapping network 106. For example, for logical column x it produces a validity select signal E_x: if S_x is valid and H_x ≥ 0, E_x is made valid; otherwise it is invalid.
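The rule for E_x can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware of module 104; the function name `validity_selects` and the use of Python lists for S and H are assumptions made for the sketch.

```python
def validity_selects(S, H):
    """Model of the rule E_x = (S_x is valid) and (H_x >= 0).

    S: list of booleans, one current-core logical column select bit per column.
    H: list of integers, one first-dimension subscript per column.
    Both names and the list representation are hypothetical.
    """
    return [s and h >= 0 for s, h in zip(S, H)]


# Column 0 is selected with a non-negative subscript, column 1 has a
# negative subscript, column 2 is not selected by S.
E = validity_selects([True, True, False], [2, -1, 0])
```

In this example only the first column yields a valid E, matching the fill and drain behaviour described earlier: columns drop out either because S is invalid or because the first-dimension subscript has gone below 0.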
The logical column register select generation module 105 takes C as input, produces one register select signal for each logical column, and outputs it to the mapping network 106. For example, for logical column x it produces a register select signal R_x: if C shows that the main logical column is logical column y and x > y, R_x is given one particular value (for example, high level), indicating that the subscripts for dimensions 2 to n of all operations in this logical column come from i_2', i_3', ..., i_n'; otherwise R_x is given another particular value (for example, low level), indicating that the subscripts for dimensions 2 to n of all operations in this logical column come from i_2, i_3, ..., i_n.
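A minimal Python sketch of the R_x rule follows. It assumes that exactly one bit of C is valid (marking the main logical column) and represents "high level" as `True`; the function name `register_selects` is hypothetical.

```python
def register_selects(C):
    """Model of the rule R_x = (x > y), where y is the main logical column.

    C: list of booleans with exactly one True entry (an assumption of this
    sketch) marking the main logical column.
    Returns a list where True means "use the primed subscripts i_2'..i_n'"
    and False means "use i_2..i_n".
    """
    y = C.index(True)  # position of the main logical column
    return [x > y for x in range(len(C))]


# Main logical column is the second of four columns.
R = register_selects([False, True, False, False])
```

The list positions here are 0-based while the text numbers columns from 1; the comparison x > y is unaffected by that choice.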
The mapping network 106 expresses the mapping from a logical core to a physical core: it maps the validity selects and register selects of the rows and columns of the logical core onto the corresponding rows and columns of the physical core, and outputs them as physical operation validity selects and register selects. For example, if the operation in row x, column y of the logical core is located in row x, column z of the physical core, its validity select E_y and register select R_y are mapped to operation z of instruction x of the physical core and become the validity select and register select of that operation. That is, if E_y is invalid the operation is not executed; otherwise the operation is executed, and the value of R_y decides whether it refers to i_2', i_3', ..., i_n' or to i_2, i_3, ..., i_n. The mapping network 106 can be implemented in many ways that are familiar to engineers of ordinary skill in the art. Most commonly, a crossbar network can be used.
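The mapping just described can be modelled as a lookup table. The sketch below assumes a hypothetical `placement` dictionary that records, for each operation, its (row, logical column) position and the physical column it occupies; it is not a model of any particular network implementation such as a crossbar.

```python
def map_selects(placement, E, R):
    """Route per-column selects to the physical operations.

    placement: {(row, logical_col): physical_col}, a hypothetical table
               describing where each logical operation sits physically.
    E, R: per-logical-column validity selects and register selects.
    Returns {(row, physical_col): (validity_select, register_select)}.
    """
    return {
        (row, phys): (E[col], R[col])
        for (row, col), phys in placement.items()
    }


placement = {(0, 0): 2, (0, 1): 0, (1, 1): 1}
selects = map_selects(placement, E=[True, False], R=[False, True])
```

Each physical operation receives exactly two signals, which is the low wiring cost the description attributes to shifting subscripts rather than operations.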
The instruction generation module 107 takes the current core K as input and outputs all physical instructions corresponding to K. It too can be implemented in several ways: the physical cores can be stored in a number of registers, and all physical instructions corresponding to K can then be output at once or one instruction at a time. This is also easily accomplished by engineers of ordinary skill in the art.
The start/stop control and clock generation module 108 takes a start signal as input and outputs clock signal 1 to the register group 101.
In general, the input of the device is a start signal, and its outputs are the instructions, the physical operation validity selects and register selects, and the subscripts for dimensions 1 to n. The outputs can be sent directly to the functional units for execution or placed in an instruction buffer, depending on the needs of the processor.
The operating principle of the device of the invention is as follows. After the start signal arrives, the start/stop control and clock generation module 108 generates clock signal 1. After clock signal 1 is sent to the register group 101, the register group updates. Specifically, the control that the shift rotation control module 103 exercises over the register inputs and outputs has the following effect:
1. When K' indicates the outermost loop core, H, C and S shift. That is, H_2, H_3, ..., H_m take the former values of H_1, H_2, ..., H_{m−1}, and H_1 is decremented by 1; C_2, C_3, ..., C_m take the former values of C_1, C_2, ..., C_{m−1}, and C_1 is set invalid; S_2, S_3, ..., S_m take the former values of S_1, S_2, ..., S_{m−1}, and S_1 is set valid. The only exception is that if K at this moment does not indicate the outermost loop core, C is not shifted but is forced to indicate the first logical column of the innermost loop core. That is, each C_x is set valid if column x of the logical core is the first logical column of the innermost loop core, and invalid otherwise.
2. Otherwise, the part of H, C and S corresponding to K' rotates. That is, if the first and last logical columns of K' are columns x and y of the logical core, then H_y, H_{y−1}, ..., H_{x+1} take the former values of H_{y−1}, H_{y−2}, ..., H_x, and H_x takes the former value of H_y; C_y, C_{y−1}, ..., C_{x+1} take the former values of C_{y−1}, C_{y−2}, ..., C_x, and C_x takes the former value of C_y; S_y, S_{y−1}, ..., S_{x+1} take the former values of S_{y−1}, S_{y−2}, ..., S_x, and S_x takes the former value of S_y. All other registers H_u, C_u, S_u (u > y or u < x) are unchanged.
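The shift and rotate updates above can be modelled on plain Python lists. This is a sketch under assumptions: the function names `shift` and `rotate` are hypothetical, positions are 0-based, and the exception that forces C to the first column of the innermost loop core is deliberately left out.

```python
def shift(H, C, S):
    """Case 1 (K' indicates the outermost loop core): shift by one column.

    Column 1 receives H_1 - 1, an invalid C bit and a valid S bit; every
    other column takes the former value of its left neighbour.
    The forced reload of C described as the exception is not modelled.
    """
    new_H = [H[0] - 1] + H[:-1]
    new_C = [False] + C[:-1]
    new_S = [True] + S[:-1]
    return new_H, new_C, new_S


def rotate(regs, x, y):
    """Case 2: rotate columns x..y (0-based, inclusive) by one position.

    Column x takes the former value of column y; columns x+1..y take the
    former values of columns x..y-1; columns outside x..y are unchanged.
    """
    return regs[:x] + [regs[y]] + regs[x:y] + regs[y + 1:]


H1, C1, S1 = shift([3, 2, 1], [False, True, False], [True, True, False])
rotated = rotate([10, 20, 30, 40, 50], 1, 3)
```

The same `rotate` helper would be applied to H, C and S with the same x and y, since the three register rows move together.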
For the other registers, the input selection made by the shift rotation control module 103 causes K to take the value of K', and:
1. When K' indicates the outermost loop core, i_2', i_3', ..., i_n' are cleared to 0, and i_2, i_3, ..., i_n become N_2, N_3, ..., N_n respectively.
2. Otherwise, i_2', i_3', ..., i_n' take the former values of i_2, i_3, ..., i_n respectively; if K' indicates the core of the level-x loop, then i_2, i_3, ..., i_{x−1} are unchanged, i_x is decremented by 1, and i_{x+1}, i_{x+2}, ..., i_n become N_{x+1}, N_{x+2}, ..., N_n respectively.
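The two cases for the multi-dimension subscript registers can be written as one small function. The sketch assumes dictionaries keyed by dimension number (2 to n) and a hypothetical `level` argument giving the loop level whose core K' indicates, with level 1 meaning the outermost loop.

```python
def update_subscripts(i, N, level):
    """Return (new_i, new_i_prime) after one clock signal 1.

    i: {dimension: current subscript} for dimensions 2..n.
    N: {dimension: reload value N_d} for the same dimensions.
    level: loop level of the core indicated by K' (1 = outermost);
           this parameter is an assumption of the sketch.
    """
    if level == 1:
        # Case 1: primed subscripts cleared, unprimed ones reloaded from N.
        return dict(N), {d: 0 for d in i}
    new_i = {}
    for d in i:
        if d < level:
            new_i[d] = i[d]        # outer dimensions are held
        elif d == level:
            new_i[d] = i[d] - 1    # the rotating level counts down
        else:
            new_i[d] = N[d]        # inner dimensions are reloaded
    # Case 2: primed subscripts take the former unprimed values.
    return new_i, dict(i)


new_i, new_i_prime = update_subscripts({2: 1, 3: 0}, {2: 2, 3: 4}, level=2)
```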
After the register group 101 has completed this update, the next-core generation module 102 produces the new core that is to be set when the next clock signal 1 arrives; the logical column validity select generation module 104 and the logical column register select generation module 105 generate the validity selects and register selects of all logical columns and send them to the mapping network 106, which outputs the physical operation validity selects and register selects; at the same time, the instruction generation module 107 outputs all physical instructions corresponding to the current core.
After this work is completed, if C shows that the main logical column is not the last column of the outermost loop core, or H_1 ≠ 0, the start/stop control and clock generation module 108 produces the next clock signal 1; otherwise it produces no further clock signal 1 and the device stops.
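The stop condition is a single boolean expression. The sketch below assumes a hypothetical flag that says whether the main logical column is the last column of the outermost loop core; how that flag is derived from C and the core layout is not modelled.

```python
def next_clock_needed(main_col_is_last_of_outermost, H1):
    """Return True if module 108 should issue another clock signal 1.

    The device keeps running while the main logical column is not the last
    column of the outermost loop core, or while H_1 is non-zero.
    Both parameter names are hypothetical.
    """
    return (not main_col_is_last_of_outermost) or H1 != 0


running = next_clock_needed(True, 2)
stopped = next_clock_needed(True, 0)
```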
Fig. 2 shows one embodiment of the invention. Its composition is the same as that of Fig. 1, but more specific: the instruction generation module 107 outputs one instruction at a time. For this purpose a signal is added: clock signal 2, output from the start/stop control and clock generation module 108 to the mapping network 106 and the instruction generation module 107. In addition, K is output to the start/stop control and clock generation module 108.
The start/stop control and clock generation module 108 sends clock signal 1 to the register group 101. After the register group has updated and its outputs are stable, the module sends x clock signals 2 (assuming the current core K contains x instructions) to the mapping network 106 and the instruction generation module 107. Each clock signal 2 causes the instruction generation module 107 to output one physical instruction contained in the current core K, and causes the mapping network 106 to output the physical operation validity selects and register selects for all physical operations in that instruction.
After this work is completed, if C shows that the main logical column is not the last column of the outermost loop core, or H_1 ≠ 0, the above process is repeated; otherwise no further clock signal 1 or clock signal 2 is produced and the device stops.
Fig. 3 is a block diagram of another form of the invention. It is essentially the same as Fig. 2, except that the input of the instruction generation module 107 is not the current core K but the next core K'. As a result, the register group update on the one hand and instruction generation and select signal mapping on the other are no longer carried out serially as in Fig. 2 but overlap in parallel, so efficiency is higher. The principle is as follows:
1. Before the start signal is sent, the register group 101 is initialized so that C_1 and S_1 are valid, K indicates the outermost loop core, H_1 = N_1, and i_2, i_3, ..., i_n are N_2, N_3, ..., N_n respectively.
2. After the start signal is sent, the start/stop control and clock generation module 108 does not send clock signal 1 but sends x clock signals 2 (assuming the outermost loop core contains x instructions) to the mapping network 106 and the instruction generation module 107. Each clock signal 2 causes the instruction generation module 107 to output one physical instruction contained in the outermost loop core (whatever the input K' is), and causes the mapping network 106 to output the physical operation validity selects and register selects for all physical operations in that instruction.
3. If C shows that the main logical column is the last column of the outermost loop core and H_1 = 0, the start/stop control and clock generation module 108 generates no further clock signal 1 or clock signal 2 and the device stops. Otherwise it sends one clock signal 1 and at the same time begins sending x clock signals 2 (assuming K' contains x instructions). The register group then updates while the instruction generation module 107 generates all corresponding physical instructions for K' and the mapping network 106 outputs the physical operation validity selects and register selects for all physical operations in those instructions. In other words, while the register group 101 is still updating and its outputs have not yet stabilized, the instruction generation module 107 is already generating instructions for the new current core K', which has not yet taken effect, and the mapping network 106 is already generating select signals for it. This overlap in time lets the functional units obtain the instructions and select signals they need without waiting.
4. Step 3 is repeated until the device stops.
Refer now to Fig. 4. Fig. 4 is a schematic of the control circuit for registers H, C and S in the shift rotation control module 103 shown in Fig. 1, Fig. 2 and Fig. 3. The connections among H, C and S are as described above. U_x denotes the output tri-state gate control signals, enable signals and input select signals of H, C and S that apply when K' is the level-x loop. Signal F indicates the first logical column of the innermost loop core. The rotation-end judgment module 219 produces a rotation-end signal when K' indicates the outermost loop core and K does not; this signal serves as a further input select for C.
The control circuit shown in Fig. 4 operates as follows. When K' is the level-x loop, the n-to-1 selector 201, under the control of K', selects U_x as the output tri-state gate control signals, enable signals and input select signals of H, C and S. For example, the output tri-state gate control signal 202, enable signal 203 and input select signal 204 of H_m, C_m, S_m are obtained from U_x. The input select signal 204 controls which input the input selection module 213 chooses: the first group 205 (from bus 1, bus 2, bus 3) or the second group 220 (from H_{m−1}, C_{m−1}, S_{m−1}). In the figure, inputs of the same group are joined by a circle. The other registers are handled in exactly the same way. When clock signal 1 arrives, each register uses these control signals to decide whether it may drive the bus, whether it holds its content or updates it to an input value, and, if it updates, which group of inputs it selects.
The only special case is this: if the rotation-end judgment module 219 produces the rotation-end signal, that signal forces the input selection modules to select F as the input of C (for example, it forces the input selection module 213 to select bit m 224 of F as the input of C_m), so that when clock signal 1 arrives, C takes the value of F.
Fig. 5 is an example illustrating the input and output of the invention. Fig. 5a is a doubly nested loop with three physical operations O_1, O_2, O_3; the execution latency of each operation is assumed to be 1. The data dependence graph is shown in Fig. 5b, where the label on each edge is the dependence distance vector.
Because the first-dimension component of every dependence distance vector is 0, there is no dependence between different outer loop iterations, whereas within one outer iteration the dependence cycle O_1 → O_2 → O_3 → O_1 forces the inner loop to execute serially. Consequently, the core code that a compiler using current techniques can generate for the above US patent is shown in Fig. 5c, and the core code generated for the device of the invention by the method "select the loop level with the greatest parallelism for software pipelining (that is, parallelization) and serialize the other levels" is shown in Fig. 5d. In both core codes, the outer-loop and inner-loop cores are the same.
Assume N_1 = 6 and N_2 = 2. Feeding the core code of Fig. 5c into the device described in the above US patent produces the execution result shown in Fig. 5e. The subscripts of each operation are shown to its right. As can be seen, execution is completely serial and the degree of parallelism is 1.
Feeding the core code of Fig. 5d into the device of the invention produces the execution result shown in Fig. 5f. As can be seen, three outer loop iterations are always executing at the same time, the degree of parallelism is 3, and the execution time is only 1/3 of that of the device described in the above US patent.
Although the invention has been described in detail above with reference to the drawings, this detailed description does not limit the scope of the invention as claimed in the appended claims.

Claims (3)

1. A shift rotation type hardware controller for supporting polycyclic software flow, comprising a register group consisting of: a current core register, a plurality of main logical-column selection bit registers, a plurality of current-core logical-column selection bit registers, a plurality of one-dimensional subscript registers, and a plurality of multidimensional subscript registers;
characterized in that the registers respectively store the current core, the main logical columns, the logical columns of the current core, and the one-dimensional and multidimensional subscript data; the device further comprising:
a start/stop control and clock generation module, for sending to the register group, after receiving the start-pipelining signal, a first clock signal that controls the shift rotation of the register group;
a shift rotation control module, for updating the data stored in the register group after the first clock signal enters the register group, and for outputting the data in the one-dimensional subscript registers, the main logical-column selection bit registers and the current core register to the next-core generation module, the data in the one-dimensional subscript registers and the current-core logical-column selection bit registers to the logical-column validity selection generation module, the data in the main logical-column selection bit registers to the logical-column register selection generation module, and the data in the current core register to the instruction generation module;
a next-core generation module, for producing, after the first clock signal has caused the register group to update and according to the outputs of the one-dimensional subscript registers, the main logical-column selection bit registers and the current core register, the new core to be set when the next first clock signal arrives, and for outputting it to the register group and the shift rotation control module;
a logical-column validity selection generation module and a logical-column register selection generation module, for generating the validity selections and register selections of all logical columns and sending them to a mapping network, which outputs the physical operation validity selections and register selections;
an instruction generation module, for producing, according to the output of the current core register, all physical instructions corresponding to the current core.
2. The device according to claim 1, characterized in that the start/stop control and clock generation module is used for, after receiving the start-pipelining signal, sending the first clock signal to the register group and, once the register group has been updated, outputting a second clock signal to the mapping network and the instruction generation module according to the output of the current core register; this signal causes the instruction generation module to output the physical instructions contained in the current core, and causes the mapping network to output the physical operation validity selections and register selections corresponding to all physical operations in those instructions.
3. A shift rotation type hardware controller for supporting polycyclic software flow, comprising a register group consisting of: a current core register, a plurality of main logical-column selection bit registers, a plurality of current-core logical-column selection bit registers, a plurality of one-dimensional subscript registers, and a plurality of multidimensional subscript registers;
characterized in that the registers respectively store the current core, the main logical columns, the logical columns of the current core, and the one-dimensional and multidimensional subscript data; the device further comprising:
a start/stop control and clock generation module, for sending a first clock signal to the register group after receiving the start-pipelining signal and, after the first clock signal has caused the register group to update, sending a second clock signal to the mapping network and the instruction generation module according to the output of the current core register in the register group;
a shift rotation control module, for controlling the update of the register group after the first clock signal enters the register group, and for outputting the data in the one-dimensional subscript registers, the main logical-column selection bit registers and the current core register to the next-core generation module, the data in the one-dimensional subscript registers and the current-core logical-column selection bit registers to the logical-column validity selection generation module, the data in the main logical-column selection bit registers to the logical-column register selection generation module, and the data in the current core register to the start/stop control and clock generation module;
a next-core generation module, for producing the new core to be set when the next first clock signal arrives, and for outputting it to the register group, the shift rotation control module and the instruction generation module respectively;
a logical-column validity selection generation module and a logical-column register selection generation module, for generating the validity selections and register selections of all logical columns and sending them to the mapping network, which outputs the physical operation validity selections and register selections;
an instruction generation module, for outputting all physical instructions corresponding to the current core.
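The register-to-module routing recited for the shift rotation control module in claims 1 and 3 can be summarized as a small lookup table. The sketch below only restates the wiring described in the claims; all identifiers (`CLAIM_1_ROUTING`, `consumers`, the register and module labels) are hypothetical names chosen for illustration and do not appear in the patent.

```python
# Hypothetical labels for the registers and modules named in the claims.
# Each entry maps a module to the registers whose data is routed to it.
CLAIM_1_ROUTING = {
    "next_core_generator": {"one_dim_subscript", "main_column_select", "current_core"},
    "column_validity_select_generator": {"one_dim_subscript", "current_core_column_select"},
    "column_register_select_generator": {"main_column_select"},
    "instruction_generator": {"current_core"},
}

# In claim 3 the current core register is routed to the start/stop control
# and clock generation module instead of the instruction generation module.
CLAIM_3_ROUTING = {
    module: registers
    for module, registers in CLAIM_1_ROUTING.items()
    if module != "instruction_generator"
}
CLAIM_3_ROUTING["start_stop_and_clock_generator"] = {"current_core"}


def consumers(routing, register):
    """Return the modules that receive the given register's data, sorted by name."""
    return sorted(module for module, regs in routing.items() if register in regs)


print(consumers(CLAIM_1_ROUTING, "current_core"))
print(consumers(CLAIM_3_ROUTING, "current_core"))
```

Comparing the two tables makes the difference between the independent claims visible: only the destination of the current core register changes, while the other three routes are the same.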
CN 00133535 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow Expired - Fee Related CN1108559C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 00133535 CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 00133535 CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Publications (2)

Publication Number Publication Date
CN1294345A CN1294345A (en) 2001-05-09
CN1108559C CN1108559C (en) 2003-05-14

Family

ID=4595789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 00133535 Expired - Fee Related CN1108559C (en) 2000-11-07 2000-11-07 Shift rotation type hardware controller for supporting polycyclic software flow

Country Status (1)

Country Link
CN (1) CN1108559C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005029318A2 (en) 2003-09-19 2005-03-31 University Of Delaware Methods and products for processing loop nests

Also Published As

Publication number Publication date
CN1294345A (en) 2001-05-09

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
BB1A Publication of application
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee