WO2011125174A1 - Dynamic reconfiguration processor and operating method thereof - Google Patents
- Publication number: WO2011125174A1 (PCT/JP2010/056227)
- Authority: WO, WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/04—Generating or distributing clock signals or signals derived directly therefrom
- G06F1/06—Clock generators producing several clock signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06F15/7885—Runtime interface, e.g. data exchange, runtime control
- G06F15/7892—Reconfigurable logic embedded in CPU, e.g. reconfigurable unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3869—Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
Definitions
- The present invention relates to a dynamic reconfiguration processor that executes each instruction by carrying out a series of steps for that instruction, and to an operating method thereof.
- An arithmetic processing device has been proposed that includes a rewritable memory (RAM) storing arithmetic unit configuration information and a dedicated arithmetic unit that forms a predetermined dedicated arithmetic unit based on the configuration information in that memory. This dedicated arithmetic unit is composed of an FPGA (Field Programmable Gate Array).
- In a RISC (Reduced Instruction Set Computer) processor, processing is performed in a cycle of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB).
- The execute step uses an arithmetic unit that is provided in advance, as CPU hardware, for each instruction.
- Pipeline processing and similar techniques are used to increase speed.
- However, although an arithmetic unit (hardware) must be provided in the CPU in advance for each instruction, in practice only one arithmetic unit operates at a time and all the other arithmetic units are idle.
- In the device described above, the dedicated arithmetic unit can be configured on the FPGA, so the number of basic arithmetic units that must be provided can be reduced, which increases operating speed and allows the apparatus to be miniaturized.
- An object of the present invention is to provide a dynamic reconfiguration processor, and an operating method thereof, in which the process of dynamically configuring an arithmetic unit according to an instruction and the process of performing the operation with the configured arithmetic unit can be completed without delay.
- According to one aspect of the present invention, there is provided a dynamic reconfiguration processor that executes each instruction by carrying out a series of steps for that instruction, the processor comprising: a dynamic configuration arithmetic unit that dynamically configures an arithmetic unit corresponding to the instruction; and a clock generation circuit that generates a main clock and a sub-clock different from the main clock. In each step of the series, except the instruction execution step that executes the instruction using the dynamic configuration arithmetic unit, the start timing is defined based on the main clock.
- The instruction execution step includes an arithmetic unit generation sub-step, in which the dynamic configuration arithmetic unit dynamically configures an arithmetic unit corresponding to the instruction, and an arithmetic sub-step, in which the arithmetic unit configured in the arithmetic unit generation sub-step performs the operation corresponding to the instruction.
- The start timings of the arithmetic unit generation sub-step and the arithmetic sub-step are defined based on the sub-clock.
- The sub-clock is generated so that the arithmetic unit generation sub-step and the arithmetic sub-step are both completed before the start timing of the step that immediately follows the instruction execution step.
- According to another aspect, there is provided an operating method of a processor that performs a fetch step of fetching an instruction, a decode step of decoding the fetched instruction, an execute step, and a data cache step.
- The execute step includes an arithmetic unit generation sub-step of dynamically configuring an arithmetic unit corresponding to the instruction, and an arithmetic sub-step of performing the operation corresponding to the instruction with the arithmetic unit configured in the arithmetic unit generation sub-step.
- The method comprises: executing the fetch step at a first timing defined by the main clock; executing the decode step at a second timing defined by the main clock; instead of at a third timing defined by the main clock, executing the arithmetic unit generation sub-step at a first timing defined by a sub-clock different from the main clock, and executing the arithmetic sub-step at a second timing defined by the sub-clock; and executing the data cache step at a fourth timing defined by the main clock.
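The timing condition in this method (both sub-steps finishing before the data cache step starts on the main clock) reduces to simple arithmetic. The following is a minimal Python sketch under stated assumptions: a fixed main-clock period, IF starting at time 0, and hypothetical durations for the generation and arithmetic sub-steps. It computes the latest sub-clock edges that still meet the condition; how the clock generation circuit 12 actually produces the sub-clock is not modeled, and all function names are illustrative.

```python
STAGES = ("IF", "ID", "EX", "DC", "WB")

def main_clock_edges(period_ns):
    """Start time of each step when every step begins on a main-clock edge (IF at 0)."""
    return {stage: i * period_ns for i, stage in enumerate(STAGES)}

def latest_sub_clock_edges(period_ns, gen_ns, calc_ns):
    """Latest sub-clock edges (generation start, arithmetic start) such that both
    sub-steps are finished no later than the main-clock edge that starts DC."""
    dc_start = main_clock_edges(period_ns)["DC"]
    calc_start = dc_start - calc_ns
    gen_start = calc_start - gen_ns
    return gen_start, calc_start

def meets_deadline(period_ns, gen_start, calc_start, gen_ns, calc_ns):
    """True if generation ends before arithmetic starts and arithmetic ends by DC start."""
    dc_start = main_clock_edges(period_ns)["DC"]
    return gen_start + gen_ns <= calc_start and calc_start + calc_ns <= dc_start

# Hypothetical numbers: 10 ns main clock, 6 ns to configure the unit, 3 ns to compute.
gen_start, calc_start = latest_sub_clock_edges(10, 6, 3)
print(gen_start, calc_start)                            # 21 27
print(meets_deadline(10, gen_start, calc_start, 6, 3))  # True
print(meets_deadline(10, 20, 26, 6, 5))                 # False: arithmetic would end at 31 ns
```

If `gen_ns + calc_ns` exceeds one main-clock period, the returned generation edge falls earlier than the main-clock EX edge (20 ns in this example); the sketch only reports that, it does not describe how the patent's clock circuit handles such a case.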
- According to the present invention, a dynamic reconfiguration processor and an operating method thereof are obtained in which the process of dynamically configuring an arithmetic unit according to an instruction and the process of performing the operation with the configured arithmetic unit can be completed without delay.
- FIG. 10 is a diagram illustrating another example (second delay prevention method) of the configuration of the clock generation circuit 12; a further figure shows the principle of the delay prevention function implement
- FIG. 1 is a diagram schematically showing the configuration of a dynamic reconfiguration processor 1 according to an embodiment (Embodiment 1) of the present invention.
- The dynamic reconfiguration processor 1 includes a CPU 10 and a clock generation circuit 12.
- The clock generation circuit 12 generates two clocks, CLK1 and CLK2, required for the operation of the CPU 10.
- The clock CLK1 is the main clock.
- The clock CLK2 is a special clock generated to prevent a delay described later.
- The configuration of the clock generation circuit 12 and the function of the clock CLK2 will be described in detail later.
- In the description preceding FIG. 19, the term "clock" refers to the main clock; from FIG. 19 onward, the two clocks CLK1 and CLK2 are described separately.
- The CPU 10 includes a minimum set computing unit 11 constituting an instruction execution unit (mainly an arithmetic circuit).
- Besides the arithmetic circuit, the CPU 10 may include general components (not shown) such as an instruction decoder control circuit, an instruction cache, a register file, and a data cache.
- A memory (ROM, RAM, etc.) is connected to the CPU 10.
- The minimum set computing unit 11 contains the minimum set of gates (or elements) from which an arithmetic unit corresponding to any instruction of the entire instruction set can be generated.
- The entire instruction set may be the set of all instructions used by the software installed in the dynamic reconfiguration processor 1 or, for versatility, a larger instruction set that also includes other instructions. "Can be generated" means that generation is possible in principle, regardless of whether a given arithmetic unit is ever actually generated.
- FIG. 2 is a table showing an example of how the minimum set computing unit 11 is set up.
- In this example, the minimum set computing unit 11 is an FPGA (Field Programmable Gate Array) containing the minimum gates capable of generating the arithmetic units corresponding to the entire instruction set.
- Specifically, the minimum set computing unit 11 contains the minimum number of gates counted in units of gate-level gates for so-called FPGA synthesis.
- The FPGA synthesis gates comprise gates used for ASIC (application specific integrated circuit) logic synthesis, such as NAND, NOR, and NOT, together with composite gates built from them, such as AND (a combination of NAND and NOT) and OR (a combination of NOR and NOT).
- FIG. 2 lists the arithmetic unit corresponding to each instruction in the entire instruction set.
- Arithmetic unit C1, which executes a 16-bit addition instruction without carry, requires, for example, 30 two-input AND gates, 20 OR gates, 40 NOT gates, 4 MUX gates, 17 DFFs (D flip-flops), and so on.
- Arithmetic units C2, ..., Cn (where n is the number of arithmetic units, one for each instruction of the entire instruction set) are the arithmetic units corresponding to the remaining instructions, that is, all instructions other than the addition instruction handled by C1. Note that the numbers in the table are merely illustrative and not accurate.
- For each gate type, the minimum set computing unit 11 is provided with the largest number of that gate required by any single one of the arithmetic units C1, ..., Cn. AND gates and OR gates are prepared in this way. For NOT gates, the numbers required to configure C1, ..., Cn are 40, 30, ..., 20 in this example, so the maximum, 40 NOT gates, are prepared. For XOR gates, the required numbers are 0, 4, ..., 0, so 4 XOR gates are prepared. For MUX gates, the required numbers are 4, 8, ..., 5, so 8 MUX gates are prepared. For DFFs, the required numbers are 17, 8, ..., 16, so 17 DFFs are prepared, and so on.
- In this way, the minimum set computing unit 11 is provided with the minimum number of gates needed to generate any one of the arithmetic units C1, ..., Cn.
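The sizing rule above takes, for each gate type, the maximum over all arithmetic units rather than the sum. A minimal Python sketch follows, using only the illustrative NOT, XOR, MUX, and DFF counts quoted for C1, C2, and Cn (the AND/OR counts for C2 and Cn are not given, so they are left out). The dictionary layout and function names are hypothetical.

```python
# Illustrative per-unit gate counts quoted in the FIG. 2 description (not accurate figures).
REQUIREMENTS = {
    "C1": {"NOT": 40, "XOR": 0, "MUX": 4, "DFF": 17},
    "C2": {"NOT": 30, "XOR": 4, "MUX": 8, "DFF": 8},
    "Cn": {"NOT": 20, "XOR": 0, "MUX": 5, "DFF": 16},
}

def minimum_set(requirements):
    """For each gate type, keep the largest count needed by any single arithmetic unit."""
    pool = {}
    for gates in requirements.values():
        for gate, count in gates.items():
            pool[gate] = max(pool.get(gate, 0), count)
    return pool

def dedicated_total(requirements):
    """Gates needed if every arithmetic unit were built as separate fixed hardware."""
    return sum(sum(gates.values()) for gates in requirements.values())

pool = minimum_set(REQUIREMENTS)
print(pool)                           # {'NOT': 40, 'XOR': 4, 'MUX': 8, 'DFF': 17}
print(sum(pool.values()))             # 69
print(dedicated_total(REQUIREMENTS))  # 152
```

Only three of the n units are listed, so the totals ignore the units between C2 and Cn; the point is that the shared pool grows with the per-type maximum, while fixed per-instruction hardware grows with the sum.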
- FIG. 3 is a table showing another example of how the minimum set computing unit 11 is set up.
- In this example, the minimum set computing unit 11 contains the minimum number of gates counted in units finer than the gate-level gates for FPGA synthesis.
- Specifically, it contains the minimum number of gates counted in units of gate-level gates for so-called ASIC logic synthesis, that is, in units of NAND, NOR, and NOT gates.
- As in FIG. 2, FIG. 3 lists the arithmetic unit corresponding to each instruction in the entire instruction set, and the table is read in the same way as that of FIG. 2.
- The table gives the required number of NAND, NOR, and NOT gates for each of the arithmetic units C1, ..., Cn corresponding to the entire instruction set. Note that the numbers in the table are merely illustrative and not accurate.
- FIG. 4 is a table showing yet another example of how the minimum set computing unit 11 is set up. Here too, the numbers in the table are merely illustrative and not accurate.
- In this example, the minimum set computing unit 11 contains the minimum number of elements counted in units finer than the gate-level gates for ASIC logic synthesis.
- Specifically, it contains the minimum number of elements at the level of PchMOSFETs and NchMOSFETs (MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor), that is, the minimum PchMOSFETs and NchMOSFETs needed to generate any one of the arithmetic units C1, ..., Cn.
- The granularity of the example of FIG. 3 is finer than that of FIG. 2, and the granularity of the example of FIG. 4 is finer than that of FIG. 3.
- The finer the granularity, the less hardware is wasted.
- On the other hand, the finer the granularity, the longer it takes to configure an arithmetic unit from the minimum set computing unit 11, as described later.
- The minimum set computing unit 11 configured in this way can form any one of the arithmetic units C1, ..., Cn corresponding to the entire instruction set. That is, it configures an arbitrary arithmetic unit among C1, ..., Cn by connecting its gates (or elements) according to the corresponding connection information.
- The connection information may be prepared in advance for each of the arithmetic units C1, ..., Cn (that is, for each instruction in the entire instruction set) and stored in the memory. The connection information is defined in terms of the minimum unit of the minimum set computing unit 11, as follows.
- When the minimum set computing unit 11 is built with FPGA synthesis gates as the minimum unit, as in FIG. 2, connection information in FPGA synthesis gate units, that is, information indicating how the AND, OR, and other gates are connected to one another, is generated and stored.
- When the minimum set computing unit 11 is built with ASIC logic synthesis gates as the minimum unit, as in FIG. 3, connection information indicating how the NAND, NOR, and NOT gates are connected to one another is generated and stored.
- When the minimum set computing unit 11 is built with PchMOSFET and NchMOSFET elements as the minimum unit, as in FIG. 4, connection information in element units, that is, information indicating how the source and drain of each PchMOSFET and the source and drain of each NchMOSFET are connected, is generated and stored.
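To make per-instruction connection information concrete, here is a minimal Python sketch at the ASIC-gate granularity of FIG. 3. The netlist format, the class name `MinimumSetUnit`, the gate pool sizes, and the half-adder example are all hypothetical; a real fabric would set physical switches rather than evaluate gates in software.

```python
from collections import Counter

# Gate behaviour on 0/1 values.
GATE_FUNCS = {
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
    "NOT": lambda x: 1 - x,
}

# Hypothetical connection information for a half adder, in topological order:
# (output net, gate type, input nets).
HALF_ADDER = [
    ("n1", "NAND", ("a", "b")),
    ("n2", "NAND", ("a", "n1")),
    ("n3", "NAND", ("b", "n1")),
    ("sum", "NAND", ("n2", "n3")),
    ("carry", "NOT", ("n1",)),
]

class MinimumSetUnit:
    """A fixed pool of gates that is rewired from connection information."""

    def __init__(self, pool):
        self.pool = Counter(pool)
        self.netlist = None

    def configure(self, connection_info):
        """Check that the pool has enough gates of each type, then adopt the wiring."""
        needed = Counter(gate for _, gate, _ in connection_info)
        for gate, count in needed.items():
            if count > self.pool[gate]:
                raise ValueError(f"pool has {self.pool[gate]} {gate} gates, need {count}")
        self.netlist = connection_info

    def evaluate(self, **inputs):
        """Propagate input values through the configured gates; return every net value."""
        if self.netlist is None:
            raise RuntimeError("unit is not configured")
        nets = dict(inputs)
        for out, gate, ins in self.netlist:
            nets[out] = GATE_FUNCS[gate](*(nets[i] for i in ins))
        return nets

unit = MinimumSetUnit({"NAND": 6, "NOR": 2, "NOT": 3})
unit.configure(HALF_ADDER)
for a in (0, 1):
    for b in (0, 1):
        nets = unit.evaluate(a=a, b=b)
        print(a, b, nets["sum"], nets["carry"])
```

Configuring a different instruction would mean calling `configure` with another netlist drawn from the same pool, which is the reuse the minimum set computing unit 11 is meant to allow.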
- FIG. 5 is a diagram showing an example of a time series in which a single thread (no pipeline) is realized by a single minimum set computing unit 11 according to the present embodiment.
- FIG. 6 is a diagram showing how the arithmetic unit configured by the minimum set computing unit 11 changes over time in the example of FIG. 5.
- In this example, processing is executed in a cycle of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB).
- Fetch (IF): an instruction is fetched from the instruction cache.
- Decode (ID): the fetched instruction is decoded and the register operands are fetched.
- Execute (EX): the instruction (an operation or the like) is executed based on the decoding result and the fetched register values. For a load/store instruction, the execution address is calculated; for a branch instruction, the branch destination address is calculated. In the present embodiment, the execute step also includes, in addition to these operations, the generation of an arithmetic unit by the minimum set computing unit 11, as described later.
- Data cache (DC): for a load instruction, the operand is fetched from the data cache; for a store instruction, the data is written to the data cache.
- Write back (WB): the result calculated in the execute step, or the operand fetched in the data cache step, is stored in a register.
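The stage descriptions above can be mirrored by a toy, non-pipelined Python interpreter. It is a sketch under stated assumptions: the tuple instruction format, the register names, and `UNIT_LIBRARY` (a stand-in for connection information that maps each opcode to the function the configured unit computes) are all hypothetical, and branch instructions are omitted.

```python
import operator

# Hypothetical stand-in for connection information: the function each configured unit computes.
UNIT_LIBRARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(program, regs, memory):
    """Run each instruction through IF, ID, EX, DC and WB, one instruction at a time."""
    for instr in program:                   # IF: fetch the next instruction
        op, a, b, *rest = instr             # ID: decode the instruction fields
        if op in UNIT_LIBRARY:              # (op, rd, rs1, rs2): rd <- rs1 op rs2
            x, y = regs[b], regs[rest[0]]   # ID: fetch the register operands
            unit = UNIT_LIBRARY[op]         # EX, generation sub-step: configure the unit
            value = unit(x, y)              # EX, arithmetic sub-step
            regs[a] = value                 # WB (DC does nothing for an ALU operation)
        elif op == "LD":                    # (LD, rd, rs): rd <- memory[rs]
            addr = regs[b]                  # EX: the base register value is the address
            value = memory[addr]            # DC: fetch the operand from the data cache
            regs[a] = value                 # WB: store the operand in a register
        elif op == "ST":                    # (ST, rt, rs): memory[rs] <- rt
            addr = regs[b]                  # EX: the base register value is the address
            memory[addr] = regs[a]          # DC: write the data to the data cache
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, memory

program = [
    ("ADD", "r3", "r1", "r2"),  # instruction 1: r3 <- 2 + 3
    ("MUL", "r4", "r3", "r2"),  # instruction 2: r4 <- 5 * 3
    ("ST", "r4", "r0"),         # memory[0] <- 15
    ("LD", "r5", "r0"),         # r5 <- memory[0]
]
regs, memory = run(program, {"r0": 0, "r1": 2, "r2": 3}, {})
print(regs["r5"], memory)  # 15 {0: 15}
```

Splitting EX into a lookup (standing in for unit generation) and a call (the operation) mirrors the two sub-steps; nothing here models the time either one takes.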
- Assume that instruction 1 is an ADD (addition) instruction and instruction 2 is a MUL (multiplication) instruction.
- When instruction 1 is fetched and decoded, an arithmetic unit (adder) corresponding to instruction 1 (addition) is configured by the minimum set computing unit 11 (see the adder following instruction 1 in FIG. 6).
- The operation is then executed by the adder configured by the minimum set computing unit 11 (that is, instruction 1 is executed).
- The connection of the adder by the minimum set computing unit 11 and the operation by the configured adder are arranged to be completed by the time the DC clock for instruction 1 is generated (t4) (details of this arrangement will be described later).
- Next, instruction 2 is fetched, and when instruction 2 is decoded (that is, when instruction 2 is identified), an arithmetic unit (multiplier) corresponding to instruction 2 (multiplication) is configured by the minimum set computing unit 11 (see the multiplier following instruction 2 in FIG. 6). The operation is then executed by the multiplier configured by the minimum set computing unit 11 (that is, instruction 2 is executed).
- The connection of the multiplier by the minimum set computing unit 11 and the operation by the configured multiplier are arranged to be completed by the time the DC clock for instruction 2 is generated (t9) (details of this arrangement will be described later).
- When instruction 2 has been executed, the operation result is stored in the register, and processing of instruction 2 is complete.
- The connections of the minimum set computing unit 11 may be cleared (reset) each time processing of an instruction is completed, or may be changed for each instruction by overwriting. In this way, single-thread processing is executed by the single minimum set computing unit 11 according to the present embodiment.
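The two reconfiguration policies just mentioned (clear after each instruction, or overwrite for the next instruction) can be compared with a toy count of link operations. This Python sketch is purely illustrative: connection information is modeled as sets of hypothetical `(from_pin, to_pin)` links, and the cost of a policy is assumed to be the number of links made or broken. Which policy is cheaper on real hardware depends on the fabric and is not stated in the source.

```python
# Hypothetical connection information: the set of links each instruction's unit needs.
CONNECTIONS = {
    "ADD": {("g0.out", "g1.a"), ("g1.out", "g2.a"), ("g2.out", "ff0.d")},
    "MUL": {("g0.out", "g1.a"), ("g1.out", "g3.b"), ("g3.out", "ff0.d")},
}

def count_link_changes(program, mode):
    """Count links made or broken while running `program` under a reconfiguration policy."""
    links, changes = set(), 0
    for op in program:
        target = CONNECTIONS[op]
        if mode == "overwrite":
            changes += len(links ^ target)  # rewire only the links that differ
            links = set(target)             # the unit stays wired until the next instruction
        elif mode == "clear":
            changes += len(target)          # wire the unit from a cleared state
            changes += len(target)          # reset every link once the instruction completes
            links = set()
        else:
            raise ValueError(f"unknown mode: {mode}")
    return changes

program = ["ADD", "MUL"]  # instructions 1 and 2 of the FIG. 5 example
print(count_link_changes(program, "clear"), count_link_changes(program, "overwrite"))  # 12 7
```

Overwriting benefits when consecutive units share links (here only one of three); clearing leaves the fabric in a known empty state between instructions.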
- FIG. 7 is a diagram showing an example of a time series in which multi-threading (a two-stage pipeline) is realized by two minimum set computing units 11 according to the present embodiment (denoted 11A and 11B here to distinguish them).
- FIG. 8 is a diagram showing how the arithmetic units configured by the minimum set computing units 11A and 11B change over time in the example of FIG. 7.
- FIG. 7 also marks the times at which the data cache (DC) clock is generated for instructions 1 and 2.
- As in FIG. 5, processing is executed in a cycle of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB).
- Assume that instruction 1 is an ADD (addition) instruction and instruction 2 is a MUL (multiplication) instruction.
- An arithmetic unit (adder) corresponding to instruction 1 (addition) is configured by the minimum set computing unit 11A (see the adder following instruction 1 in FIG. 8). The operation is then executed by the adder configured by the minimum set computing unit 11A (that is, instruction 1 is executed).
- The connection of the adder by the minimum set computing unit 11A and the operation by the configured adder are arranged to be completed by the time the DC clock for instruction 1 is generated (t4) (details of this arrangement will be described later).
- An arithmetic unit (multiplier) corresponding to instruction 2 (multiplication) is configured by the minimum set computing unit 11B (see the multiplier following instruction 2 in FIG. 8). The operation is then executed by the multiplier configured by the minimum set computing unit 11B (that is, instruction 2 is executed).
- The connection of the multiplier by the minimum set computing unit 11B and the operation by the configured multiplier are arranged to be completed by the time the DC clock for instruction 2 is generated (t5) (details of this arrangement will be described later).
- When instruction 2 has been executed, the operation result is stored in the register, and processing of instruction 2 is complete. In this way, multi-thread (two-stage pipeline) processing is executed by the minimum set computing units 11A and 11B according to the present embodiment.
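The DC time points quoted for FIG. 5 (t4 and t9) and FIG. 7 (t4 and t5) follow from one step per main-clock period if, as an assumption consistent with those numbers, instruction 1 is fetched at t1 and each later instruction is fetched either after the previous one has finished (no pipeline) or one cycle later (pipeline). A small Python sketch; the function name and the `issue_interval` parameter are hypothetical.

```python
STAGES = ("IF", "ID", "EX", "DC", "WB")

def timeline(num_instructions, issue_interval):
    """Return {instruction: {stage: i}}, where the stage starts at main-clock time point t_i.

    Instruction 1 is fetched at t1; each later instruction is fetched
    `issue_interval` main-clock periods after the previous one.
    """
    table = {}
    for k in range(1, num_instructions + 1):
        first = 1 + (k - 1) * issue_interval
        table[k] = {stage: first + i for i, stage in enumerate(STAGES)}
    return table

no_pipeline = timeline(2, issue_interval=len(STAGES))  # FIG. 5 style
pipelined = timeline(2, issue_interval=1)              # FIG. 7 style
print(no_pipeline[1]["DC"], no_pipeline[2]["DC"])  # 4 9
print(pipelined[1]["DC"], pipelined[2]["DC"])      # 4 5
```

In both cases the generation and arithmetic sub-steps of instruction k have to fit before `table[k]["DC"]`, which is the completion condition the text refers to.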
- The number of stages of the multi-thread pipeline is not limited to the two stages described above and may be any number of three or more.
- Minimum set computing units 11 may be provided in a number matching the number of pipeline stages, but, as described later with reference to FIG. 9, the number of minimum set computing units 11 is preferably the minimum necessary.
- FIG. 9 is a diagram showing an example of a time series in which multi-threading (a five-stage pipeline) is realized by two minimum set computing units 11 according to the present embodiment (denoted 11A and 11B here to distinguish them).
- Assume that instruction 1 is an ADD (addition) instruction, instruction 2 is a MUL (multiplication) instruction, instruction 3 is a SUB (subtraction) instruction, instruction 4 is an ADD instruction, and instruction 5 is a MUL instruction.
- As in FIG. 7, the adder for instruction 1 is configured by the minimum set computing unit 11A, and the connection of the adder and the operation by the configured adder are arranged to be completed by the time the DC clock for instruction 1 is generated (t4) (details of this arrangement will be described later).
- The arithmetic unit (multiplier) corresponding to instruction 2 is configured by the minimum set computing unit 11B, and the operation is executed by the configured multiplier (that is, instruction 2 is executed). The connection of the multiplier by the minimum set computing unit 11B and the operation by the configured multiplier are arranged to be completed by the time the DC clock for instruction 2 is generated (t5) (details of this arrangement will be described later). When instruction 2 has been executed, the operation result is stored in the register, and processing of instruction 2 is complete.
- For instruction 3, the minimum set computing unit 11A that was used for instruction 1 is used to configure a subtracter. This is possible because execution (EX) of instruction 1 is completed before decoding (ID) of instruction 3 is completed, so the minimum set computing unit 11A has been released and is available.
- For instruction 4, the arithmetic unit (adder) is configured by the minimum set computing unit 11B, and the operation is executed by the configured adder (that is, instruction 4 is executed).
- The connection of the adder by the minimum set computing unit 11B and the operation by the configured adder are arranged to be completed by the time the DC clock for instruction 4 is generated (details of this arrangement will be described later).
- Here, the minimum set computing unit 11B that was used for instruction 2 is used to configure the adder. This is possible because execution (EX) of instruction 2 is completed before decoding (ID) of instruction 4 is completed, so the minimum set computing unit 11B has been released and is available.
- For instruction 5, an arithmetic unit is configured using the minimum set computing unit 11A that was used for instructions 1 and 3, and the operation is executed.
- In this way, for a multi-threaded five-stage pipeline, the two minimum set computing units 11A and 11B are used alternately in instruction order, which reduces hardware resources while preventing pipeline stalls caused by a shortage of arithmetic units.
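The alternating reuse of 11A and 11B can be checked with a small allocation sketch. It assumes, hypothetically, one instruction fetched per main-clock cycle and that a minimum set computing unit is busy from the start of its instruction's ID step until the start of its DC step. That window is consistent with the release condition described for instructions 3 and 4 but is not a timing stated in the source, and the function names are illustrative.

```python
def busy_window(k):
    """Assumed busy window [start, end) of the unit serving instruction k (1-based).

    With one instruction fetched per cycle, instruction k has ID at t_{k+1},
    EX at t_{k+2} and DC at t_{k+3}; the unit is taken as busy from ID to DC.
    """
    return k + 1, k + 3

def assign_units(num_instructions, units):
    """Give each instruction, in order, the next unit (round-robin) that is already free."""
    free_at = {u: 0 for u in units}
    plan = []
    for k in range(1, num_instructions + 1):
        start, end = busy_window(k)
        for i in range(len(units)):
            unit = units[(k - 1 + i) % len(units)]
            if free_at[unit] <= start:
                free_at[unit] = end
                plan.append(unit)
                break
        else:
            raise RuntimeError(f"pipeline stall: no free unit for instruction {k}")
    return plan

print(assign_units(5, ["11A", "11B"]))  # ['11A', '11B', '11A', '11B', '11A']
```

With a single unit, `assign_units(5, ["11A"])` raises at instruction 2, which is the kind of stall the two-unit arrangement avoids under this assumed busy window.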
- FIG. 10 is a diagram showing an example of a time series in which superscalar (parallel) execution is realized by two minimum set computing units 11 according to the present embodiment (denoted 11A and 11B here to distinguish them).
- FIG. 11 is a diagram showing how the arithmetic units configured by the minimum set computing units 11A and 11B change over time in the example of FIG. 10.
- As before, processing is executed in a cycle of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB).
- Assume that instruction 1 and instruction 2 are both ADD (addition) instructions.
- An arithmetic unit (adder) corresponding to instruction 1 (addition) is configured by the minimum set computing unit 11A (see the adder following instruction 1 in FIG. 11).
- An arithmetic unit (adder) corresponding to instruction 2 (addition) is configured by the minimum set computing unit 11B (see the adder following instruction 2 in FIG. 11).
- The operations are then executed by the adders configured by the minimum set computing units 11A and 11B (that is, instructions 1 and 2 are executed simultaneously).
- The connection of the adders by the minimum set computing units 11A and 11B and the operations by the configured adders are arranged to be completed by the time the DC clock for instructions 1 and 2 is generated (t4) (details of this arrangement will be described later).
- The degree of parallelism is not limited to two as described above and may be three or more.
- In that case, minimum set computing units 11 are provided in a number corresponding to the degree of parallelism, which prevents pipeline stalls caused by a shortage of arithmetic units.
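How many minimum set computing units a schedule needs can be phrased as the largest number of busy windows that overlap in time. This Python sketch uses a standard sweep over window start and end events. The windows are hypothetical inputs (assumed to run from ID to DC, in main-clock time points), not timings given in the source.

```python
def units_needed(windows):
    """Largest number of half-open busy windows [start, end) open at the same time.

    For intervals this equals the fewest units that give every window its own unit,
    assuming a unit released at time t can be reconfigured for a window starting at t.
    """
    events = []
    for start, end in windows:
        events.append((start, +1))
        events.append((end, -1))
    events.sort()  # at equal times, (t, -1) sorts before (t, +1): release before reuse
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak

superscalar = [(2, 4), (2, 4)]                      # FIG. 10 style: two ADDs issued together
five_stage = [(k + 1, k + 3) for k in range(1, 6)]  # FIG. 9 style: one instruction per cycle
print(units_needed(superscalar), units_needed(five_stage))  # 2 2
```

Issuing three instructions together with identical windows would give 3, matching the rule that the number of units follows the degree of parallelism.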
- FIG. 12 is a diagram schematically showing the configuration of a dynamic reconfiguration processor 2 according to another embodiment (Embodiment 2) of the present invention.
- In addition to the CPU 10 and the clock generation circuit 12, the dynamic reconfiguration processor 2 of this embodiment includes a failure repair gate 20.
- The configuration and operation of the CPU 10, in particular of the minimum set computing unit 11, may be the same as in Embodiment 1.
- The failure repair gate 20 is used in place of a failed gate when some of the gates of the minimum set computing unit 11 fail. That is, when a gate of the minimum set computing unit 11 fails, operation can be continued by stopping the failed gate and switching its connections over to the failure repair gate 20. Methods commonly used in failure repair technology may be used to detect a gate failure and to stop the failed gate.
- The failure repair gate 20 consists of fewer gates than the total number of gates in the minimum set computing unit 11, and is made up of unit gates corresponding to the minimum unit of the minimum set computing unit 11.
- When the minimum set computing unit 11 is built with FPGA synthesis gates as the minimum unit, as in FIG. 2, the failure repair gate 20 includes gates in FPGA synthesis gate units.
- When the minimum set computing unit 11 is built with ASIC logic synthesis gates as the minimum unit, as in FIG. 3, the failure repair gate 20 includes gates in ASIC logic synthesis gate units.
- When the minimum set computing unit 11 is built with PchMOSFET and NchMOSFET elements as the minimum unit, as in FIG. 4, failure repair elements, including PchMOSFETs and NchMOSFETs, may be provided in element units in place of the failure repair gate 20.
- the failure repair gate 20 need not cover all the gates constituting the minimum set computing unit 11. It may consist only of predetermined gates among them, for example frequently used gates. Alternatively, when the minimum set computing unit 11 is built with gates as the minimum unit, as in the example shown in FIG. 2 or FIG. 3, the failure repair gate 20 may include one gate of each type contained in the minimum set computing unit 11.
- because the failure repair gate 20 or failure repair element is provided at the gate level or element level, rather than as a spare computing unit for each whole computing unit, fewer gates or elements need to be reserved for failure repair, and the repair configuration fits in a small area.
- the failure repair gate 20 is shown separately from the minimum set computing unit 11 for ease of explanation. It may, however, be configured integrally with the minimum set computing unit 11, that is, incorporated into it.
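The gate-level repair scheme can be illustrated with a minimal Python sketch. The sketch makes three assumptions. First, the minimum set computing unit is modelled as a pool of typed gates (NAND, NOR, NOT) addressed by id. Second, a failed gate's role is rebound to a spare of the same type, as in the one-spare-per-type variant. Third, netlist references are resolved through a remapping table. `GatePool`, `mark_failed`, and `resolve` are hypothetical names and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class GatePool:
    """Hypothetical gate-level pool with per-type spare (failure repair) gates."""

    active: dict[str, str]          # gate id -> gate type, e.g. {"g0": "NAND"}
    spares: dict[str, list[str]]    # gate type -> ids of unused spare gates
    failed: set[str] = field(default_factory=set)
    remap: dict[str, str] = field(default_factory=dict)  # failed id -> replacement id

    def mark_failed(self, gate_id: str) -> str:
        """Stop a failed gate and reconnect its role to a spare of the same type.

        Returns the id of the spare now in use. Raises RuntimeError, leaving
        the pool unchanged, if no spare of that type remains.
        """
        gate_type = self.active[gate_id]
        if not self.spares.get(gate_type):
            raise RuntimeError(f"no spare {gate_type} gate left for {gate_id}")
        del self.active[gate_id]
        self.failed.add(gate_id)
        spare_id = self.spares[gate_type].pop()
        self.active[spare_id] = gate_type
        self.remap[gate_id] = spare_id
        return spare_id

    def resolve(self, gate_id: str) -> str:
        """Follow remappings so a netlist reference always reaches a live gate."""
        while gate_id in self.remap:
            gate_id = self.remap[gate_id]
        return gate_id
```

Suppose there is one spare per type and NOR gate `g1` fails. `mark_failed("g1")` then rebinds its connections to the spare NOR gate, and `resolve("g1")` returns that spare's id. Because spares are shared per gate type, the reserve grows with the number of gate types rather than with the number of computing units.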
- FIG. 13 is a diagram schematically showing the configuration of the dynamic reconfiguration processor 3 according to another embodiment (third embodiment) of the present invention.
- the dynamic reconfiguration processor 3 of this embodiment includes a CPU (calculator) 22 in addition to the CPU 10 and the clock generation circuit 12.
- the configuration and operation of the CPU 10, in particular of the minimum set computing unit 11, may be the same as in the first embodiment.
- the CPU 22 may be a general-purpose CPU and includes a plurality of arithmetic units (non-reconfigurable arithmetic units) as hardware. The CPU 22 may also be integrated with the CPU 10. That is, the arithmetic units (non-reconfigurable arithmetic units) of the CPU 22 may be incorporated in the CPU 10 separately from the minimum set computing unit 11 of the CPU 10. In this case, hardware that can be shared (hardware other than the arithmetic units, such as the instruction decoder and control circuitry) may be combined into one.
- FIGS. 14, 15, and 16 show operation examples of the CPU 22 (single thread, multi-thread, and superscalar). These correspond respectively to the operation examples of the minimum set computing unit 11 shown in FIGS. 5, 7, and 10.
- each operation of the CPU 22 may follow a general scheme, as shown in FIGS. 14, 15, and 16.
- FIG. 15 shows only specific types of arithmetic units in the CPU 22; in practice, other types of arithmetic units are also included.
- the CPU 22 shown in FIG. 16 includes more arithmetic units than the CPU 22 shown in FIGS. 14 and 15, for superscalar (parallel) execution. Since the parallel number is 2, the CPU 22 shown in FIG. 16 may be equipped with exactly twice the arithmetic units of the CPU 22 shown in FIGS. 14 and 15.
- the dynamic reconfiguration processor 3 selectively uses the minimum set computing unit 11 or the CPU 22 depending on the instruction.
- any scheme may be used to decide which of them executes a given instruction.
- for example, frequently used instructions may be executed by the arithmetic units in the CPU 22, and only infrequent instructions by arithmetic units dynamically configured by the minimum set computing unit 11. The minimum set computing unit 11 then saves area while the CPU 22 keeps computation fast.
- because high-frequency instructions are few in number, the area reduction effect is not greatly impaired.
- the classification between high-frequency instructions and low-frequency instructions may be a relative criterion, and may be determined in consideration of the demand for high-speed computation and the demand for area reduction.
- the frequency of each instruction may be determined by analyzing the instructions of the application for which the dynamic reconfiguration processor 3 is mainly used. Designing the architecture together with compiler technology in this way allows cost and speed to be balanced.
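The frequency-based split can be sketched as a profiling pass over an instruction trace. The sketch makes two assumptions. First, a trace of opcode mnemonics from the target application is available. Second, "high frequency" means the smallest group of most-frequent opcodes that covers a chosen fraction of executed instructions. The patent leaves the criterion open, so `partition_by_frequency` and its `coverage` threshold are hypothetical.

```python
from collections import Counter


def partition_by_frequency(trace: list[str], coverage: float = 0.9) -> tuple[set[str], set[str]]:
    """Split opcodes into (fixed-unit opcodes, reconfigurable-unit opcodes).

    The most frequent opcodes are assigned to fixed (non-reconfigurable)
    units until they account for at least `coverage` of all executed
    instructions. The remaining, rarer opcodes are left to the minimum
    set computing unit.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    counts = Counter(trace)
    total = sum(counts.values())
    fixed: set[str] = set()
    covered = 0
    for opcode, n in counts.most_common():
        if covered >= coverage * total:
            break  # enough executed instructions already run on fixed units
        fixed.add(opcode)
        covered += n
    return fixed, set(counts) - fixed
```

Raising `coverage` moves more opcodes onto fixed units, which favors speed. Lowering it leaves more opcodes to the minimum set computing unit, which favors area. This mirrors the relative criterion described above.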
- the minimum set computing unit 11 may also be used on a temporary basis. That is, while the CPU 22 performs normal processing, the CPU 22 may receive an instruction group in a form it cannot execute. In that case, the minimum set computing unit 11 may dynamically configure a computing unit for the instruction that the CPU 22 cannot execute. That computing unit then executes the instruction in place of the arithmetic units of the CPU 22.
- for example, suppose that three addition instructions (instructions 1, 2, and 3) are issued simultaneously. If the CPU 22 has only two adders, the pipeline would normally stall for instruction 3 and a wait state would occur.
- in this case, an adder is generated using the minimum set computing unit 11, which avoids the stall.
- that is, instructions 1 and 2 are executed by the two adders of the CPU 22, and instruction 3 is executed by an adder configured by the minimum set computing unit 11. In the example shown in FIG. 18 as well, configuring the adder with the minimum set computing unit 11 and performing the calculation with it are both completed by the data cache (DC) clock (t4). Details are described later.
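The overflow rule can be sketched as a per-cycle dispatcher. Instructions are first bound to the CPU 22's fixed units of the matching type. Only the excess, or an operation the CPU 22 has no unit for, is routed to a computing unit configured on the minimum set computing unit 11. `dispatch`, the unit-count table, and the `reconfig_slots` limit are hypothetical illustrations, not an interface described in the patent.

```python
def dispatch(issued: list[tuple[int, str]], fixed_units: dict[str, int],
             reconfig_slots: int = 1) -> dict[int, str]:
    """Assign simultaneously issued instructions to execution resources.

    issued: (instruction id, operation) pairs issued in the same cycle.
    fixed_units: number of non-reconfigurable units per operation in CPU 22.
    reconfig_slots: how many computing units the minimum set unit can
    configure at once (hypothetical parameter).
    Returns instruction id -> "fixed:<op>#k", "reconfig#k", or "stall".
    """
    used: dict[str, int] = {}
    reconfig_used = 0
    plan: dict[int, str] = {}
    for inst_id, op in issued:
        k = used.get(op, 0)
        if k < fixed_units.get(op, 0):
            used[op] = k + 1                      # a fixed unit of this type is free
            plan[inst_id] = f"fixed:{op}#{k}"
        elif reconfig_used < reconfig_slots:
            plan[inst_id] = f"reconfig#{reconfig_used}"  # overflow to minimum set unit
            reconfig_used += 1
        else:
            plan[inst_id] = "stall"               # no resource left this cycle
    return plan
```

For three simultaneous additions and two fixed adders, `dispatch([(1, "add"), (2, "add"), (3, "add")], {"add": 2})` places instructions 1 and 2 on the fixed adders and instruction 3 on `reconfig#0`, matching the FIG. 18 example.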
- FIG. 19 is a diagram illustrating an example of the configuration of the clock generation circuit 12 (first delay prevention method).
- the clock generation circuit 12 includes a transmission (oscillation) circuit 13, a first multiplication circuit 15, and a second multiplication circuit 17.
- an externally provided transmitter (oscillator) 14 is connected to the transmission circuit 13.
- the transmitter 14 may instead be provided inside the dynamic reconfiguration processors 1, 2, and 3.
- the output of the transmission circuit 13 is connected to the first multiplication circuit 15.
- the output of the first multiplication circuit 15 is connected to the second multiplication circuit 17.
- in the dynamic reconfiguration processors 1 and 2, the output of the first multiplication circuit 15 is connected to the CPU 10.
- in the dynamic reconfiguration processor 3, the output of the first multiplication circuit 15 is connected to the CPU 10 and the CPU 22.
- the first multiplication circuit 15 is typically a PLL (Phase Locked Loop). It multiplies the frequency forg (internal clock frequency) of the clock source signal oscillated by the transmission circuit 13.
- $f_{\mathrm{PLL1}} = d \times f_{\mathrm{org}}$
- here, $f_{\mathrm{PLL1}}$ represents the frequency of the clock CLK1 output from the first multiplication circuit 15, and $d$ is a constant.
- the first multiplication circuit 15 may be omitted when the frequency is low. In general, however, when the frequency is several tens of MHz or higher, the frequency oscillated by the transmission circuit 13 must be multiplied before use.
- the output of the first multiplication circuit 15 is input to the CPU 10 (or the CPU 10 and the CPU 22) and functions as a clock CLK1 that is a main clock.
- the second multiplication circuit 17 generates a clock CLK2 that is synchronized with clock CLK1 and has twice its frequency.
- the clock CLK2 is input to the CPU 10.
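Start from $f_{\mathrm{PLL1}} = d \cdot f_{\mathrm{org}}$ and apply the doubling that produces CLK2. This gives the clock frequencies and periods used in the first delay prevention method, where $T$ denotes a clock period:

$$
f_{\mathrm{CLK1}} = f_{\mathrm{PLL1}} = d \cdot f_{\mathrm{org}}, \qquad
f_{\mathrm{CLK2}} = 2\, f_{\mathrm{CLK1}} = 2d \cdot f_{\mathrm{org}}, \qquad
T_{\mathrm{CLK2}} = \tfrac{1}{2}\, T_{\mathrm{CLK1}}
$$

Two CLK2 periods therefore fit into one CLK1 period: one for computing unit generation and one for calculation.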
- FIG. 20 is a diagram showing the principle of the delay prevention function (first delay prevention method) realized by the clock generation circuit 12 shown in FIG. 19.
- in FIG. 20, one cycle of processing (fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB)) is shown in time series along with the waveform of clock CLK1.
- t1 to t7 denote clock numbers, counting the IF clock of instruction 1 as the first.
- FIG. 20 also shows the timings of the computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and of the arithmetic processing (calculation) by the computing unit it generates.
- an arrow indicates the timing at which decode (ID) has finished interpreting the instruction.
- execute (EX) by the minimum set computing unit 11 consists of two processes and therefore requires two clocks. The first is generation (connection) of the computing unit by the minimum set computing unit 11. The second is calculation by the generated computing unit, for example an adder. FIG. 21 shows the comparison case: if two clocks of CLK1 are allotted to execute (EX), the data cache (DC) and write back (WB) processes are each delayed by one clock of CLK1.
- therefore, in the first delay prevention method, both EX sub-steps run on clock CLK2, which is clock CLK1 multiplied by two. These sub-steps are the computing unit generation processing (connection based on connection information) by the minimum set computing unit 11 and the arithmetic processing by the computing unit it generates.
- FIG. 20 relates to the operation of the CPU 10 in the dynamic reconfiguration processors 1, 2, and 3 according to the first, second, and third embodiments.
- the operation of the CPU 22 in the dynamic reconfiguration processor 3 according to the third embodiment may be conventional. That is, in the CPU 22 of the dynamic reconfiguration processor 3, each of the fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB) processes is performed based on clock CLK1 as usual.
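A small timing model makes the first delay prevention method concrete. The model makes three assumptions. First, execute (EX) occupies the CLK1 period that starts at edge t3. Second, the sub-clock is CLK1 multiplied by `multiple`, with aligned edges. Third, each sub-step must fit in one sub-clock period. The function name and the durations passed to it are hypothetical inputs, not figures from the patent.

```python
def first_method_schedule(t_clk1: float, gen_time: float, calc_time: float,
                          multiple: int = 2) -> dict:
    """Place the two EX sub-steps on a sub-clock that is `multiple` x CLK1.

    Times are relative to the CLK1 edge at which EX starts (t3 = 0). The
    computing unit generation sub-step starts on the first sub-clock edge and
    the calculation sub-step on the next; data cache (DC) starts one CLK1
    period later (t4).
    """
    if multiple < 1:
        raise ValueError("multiple must be a positive integer")
    t_sub = t_clk1 / multiple
    if gen_time > t_sub or calc_time > t_sub:
        raise ValueError("a sub-step does not fit in one sub-clock period")
    calc_start = t_sub                   # second sub-clock edge
    calc_end = calc_start + calc_time
    return {"gen_start": 0.0, "calc_start": calc_start, "calc_end": calc_end,
            "dc_start": t_clk1, "on_time": calc_end <= t_clk1}
```

Consider a 10 ns CLK1 with `multiple=2`, a 4 ns generation step, and a 5 ns calculation step. Calculation starts at 5 ns and ends by t4. With `multiple=1`, calculation could only start at t4, which reproduces the one-clock delay of the comparison case.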
- FIG. 22 is a diagram showing another example of the configuration of the clock generation circuit 12 (second delay prevention method).
- the clock generation circuit 12 shown in FIG. 22 differs from the example shown in FIG. 19 mainly in that a phase adjustment circuit 18 is provided instead of the second multiplication circuit 17.
- Other configurations may be similar.
- the phase adjustment circuit 18 generates a clock CLK2 in which the phase of the clock CLK1 that is the output of the first multiplication circuit 15 is shifted by a predetermined phase amount.
- the predetermined phase amount is set based on ΔT, the longest (worst-case) actual processing time of the decode (ID) process.
- the predetermined phase amount may be chosen within the phase range corresponding to a time no shorter than the longest decode (ID) time ΔT (see FIG. 23) and shorter than one period of clock CLK1.
- preferably, the predetermined phase amount is set to the phase corresponding to the longest decode (ID) time ΔT. This lets the computing unit generation processing by the minimum set computing unit 11 start as early as possible.
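The admissible phase shift follows from the constraint above. The shift must correspond to a delay of at least ΔT and less than one CLK1 period, and the preferred choice is ΔT itself. The sketch below converts that delay into degrees of CLK1. The function name and the degree convention are illustrative assumptions.

```python
from typing import Optional


def phase_shift_degrees(t_clk1: float, worst_decode: float,
                        delay: Optional[float] = None) -> float:
    """Return the CLK2 phase shift, in degrees of CLK1, for a given delay.

    worst_decode is the worst-case decode time (Delta T). If no delay is
    given, the preferred value delay = Delta T is used so that computing
    unit generation can start as early as possible.
    """
    if not 0 < worst_decode < t_clk1:
        raise ValueError("Delta T must be positive and shorter than one CLK1 period")
    if delay is None:
        delay = worst_decode
    if not worst_decode <= delay < t_clk1:
        raise ValueError("delay must satisfy Delta T <= delay < T_CLK1")
    return 360.0 * delay / t_clk1
```

For example, take a 10 ns CLK1 period and a worst-case decode time of 2.5 ns. The preferred shift is then 90 degrees, and any delay from 2.5 ns up to just under 10 ns is admissible.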
- FIG. 23 is a diagram showing the principle of the delay prevention function (second delay prevention method) realized by the clock generation circuit 12 shown in FIG. 22.
- in FIG. 23, one cycle of processing (fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB)) is shown in time series together with the waveform of clock CLK1.
- t1 to t7 denote clock numbers, counting the IF clock of instruction 1 as the first.
- FIG. 23 also shows, together with the waveform of clock CLK2, the timings of the computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and of the arithmetic processing (calculation) by the computing unit it generates.
- FIG. 23 also shows the longest time (actual processing time) required for each of the fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB) processes.
- an arrow indicates the latest timing at which decode (ID) finishes interpreting the instruction.
- in the second delay prevention method, both EX sub-steps run on clock CLK2, which is clock CLK1 shifted in phase. These sub-steps are the computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and the arithmetic processing (calculation) by the computing unit it generates.
- the arithmetic processing (calculation) by the arithmetic unit generated by the minimum set arithmetic unit 11 is started at the next rising edge of the clock CLK2.
- FIG. 23 relates to the operation of the CPU 10 in the dynamic reconfiguration processors 1, 2, and 3 according to the first, second, and third embodiments.
- the operation of the CPU 22 in the dynamic reconfiguration processor 3 according to the third embodiment may be normal. That is, in the CPU 22 in the dynamic reconfiguration processor 3, each process of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB) is performed on the clock CLK1 as usual. Based on. The same applies to the description of FIGS.
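Under the second method, the EX sub-steps gain time by starting inside the decode slot. The sketch below assumes a timing that FIG. 23 suggests but does not fix numerically. Decode starts at CLK1 edge t2 = 0, and CLK2 is CLK1 delayed by ΔT. Computing unit generation starts on the CLK2 edge at ΔT, and calculation starts on the next CLK2 edge at T + ΔT. Data cache starts at t4 = 2T. The function name and durations are hypothetical.

```python
def second_method_schedule(t_clk1: float, worst_decode: float,
                           gen_time: float, calc_time: float) -> dict:
    """Check the phase-shifted sub-clock schedule against the DC start.

    Times are relative to the CLK1 edge at which decode (ID) starts.
    CLK2 is CLK1 delayed by worst_decode, so its edges fall at
    worst_decode + k * t_clk1.
    """
    gen_start = worst_decode               # first CLK2 edge: decode is finished
    calc_start = worst_decode + t_clk1     # next CLK2 edge
    dc_start = 2 * t_clk1                  # t4, two CLK1 periods after ID starts
    return {
        "gen_start": gen_start,
        "gen_fits": gen_start + gen_time <= calc_start,
        "calc_start": calc_start,
        "calc_end": calc_start + calc_time,
        "on_time": calc_start + calc_time <= dc_start,
    }
```

Under these assumptions, generation may take up to a full CLK1 period. Calculation, however, must finish within T − ΔT. This trade-off is why the phase amount is kept as close to ΔT as possible.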
- in some cases, neither the first nor the second delay prevention method alone can prevent the delay. The delay can then be prevented by combining the two methods and/or by multiplying by 3 or more in the first delay prevention method.
- when the first and second delay prevention methods are combined, both EX sub-steps run on a clock obtained by shifting the phase of clock CLK1 and multiplying it. These sub-steps are the computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and the arithmetic processing (calculation) by the computing unit it generates. Both are then completed by the start of the data cache (DC).
- three or more clocks may also be used. For example, two clocks with different phase shifts relative to clock CLK1 may be generated. The computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and the arithmetic processing (calculation) by the computing unit it generates may then each run on its own clock.
- in the examples above, execute (EX) by the minimum set computing unit 11 is separated into two processes: the computing unit generation processing (computing unit generation) by the minimum set computing unit 11 and the arithmetic processing (calculation) by the generated computing unit. However, it may be decomposed into three or more processes.
- for example, the computing unit generation processing may be further divided into two processes: reading the connection information for the instruction, and generating the computing unit with the minimum set computing unit 11 from that information.
- in this case as well, execute (EX) can be completed by the start of the data cache (DC) by using a three-phase clock or a multiplied clock.
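The combined method and the multi-step decomposition can be checked with one generic routine. Given the sub-clock edges and the duration of each sub-step, the routine starts each sub-step on the first edge at or after the end of the previous one. Example sub-steps are reading connection information, generating the computing unit, and calculating. The routine then compares the finish time with the DC start. This greedy edge-by-edge model and all names and parameters are illustrative assumptions, not the patent's circuit.

```python
import math
from typing import Optional


def subclock_edges(period: float, phase: float, start: float, end: float) -> list[float]:
    """Rising edges phase + k * period lying in [start, end)."""
    k = math.ceil((start - phase) / period)
    edges = []
    t = phase + k * period
    while t < end:
        edges.append(t)
        k += 1
        t = phase + k * period
    return edges


def schedule_substeps(durations: list[float], edges: list[float],
                      earliest: float, deadline: float) -> Optional[list[tuple[float, float]]]:
    """Greedily place each sub-step on the next usable sub-clock edge.

    Returns (start, end) per sub-step, or None if a sub-step finds no edge
    to start on or the last one ends after `deadline` (the DC start).
    """
    ready = earliest
    slots = []
    for d in durations:
        start = next((e for e in edges if e >= ready), None)
        if start is None:
            return None
        slots.append((start, start + d))
        ready = start + d                  # next sub-step waits for this one
    return slots if ready <= deadline else None
```

As an example, take CLK1 = 8 ns with decode starting at 0 and finishing by 1 ns. A sub-clock doubled and shifted by 1 ns then has edges at 1, 5, 9, and 13 ns before the DC start at 16 ns. Three sub-steps of 2, 3, and 3 ns fit as (1, 3), (5, 8), and (9, 12).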
- clocks CLK1 and CLK2 need not have a constant period. Any clocks may be used as long as they trigger each process at timings that avoid the delay described above.
- for example, the frequency of clock CLK1 itself may be varied by a frequency spreader.
- processing need not follow exactly the cycle of fetch (IF), decode (ID), execute (EX), data cache (DC), and write back (WB); in particular, the process immediately after execute (EX) is arbitrary. The data cache (DC) and write back (WB) may be any processes that write the result of execute (EX) into memory or a register file. The data cache (DC) stage may also be called memory access (MA or MEM); the name is arbitrary.
- in the embodiments above, the dynamic configuration computing unit is the minimum set computing unit 11. This unit includes the minimum number of gates or elements capable of generating computing units for all instruction sets.
- however, a dynamic configuration computing unit with more gates or elements than the minimum set computing unit 11 may be used (see FIG. 12). So may one with fewer gates or elements than the minimum set computing unit 11.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
10 CPU
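11 Minimum set computing unit
12 Clock generation circuit
13 Transmission (oscillation) circuit
14 Transmitter (oscillator)
15 First multiplication circuit
17 Second multiplication circuit
18 Phase adjustment circuit
20 Failure repair gate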
22 CPU
Claims (13)
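- 1. A dynamic reconfiguration processor that executes each instruction by performing a series of steps for each instruction, comprising:
a dynamic configuration computing unit that dynamically configures a computing unit corresponding to an instruction; and
a clock generation circuit that generates a main clock and a sub-clock different from the main clock,
wherein the start timing of each step in the series of steps, except an instruction execution step of executing an instruction using the dynamic configuration computing unit, is defined based on the main clock,
the instruction execution step of executing an instruction using the dynamic configuration computing unit includes a computing unit generation sub-step and a calculation sub-step. In the computing unit generation sub-step, the dynamic configuration computing unit dynamically configures a computing unit corresponding to the instruction. In the calculation sub-step, the computing unit configured in the computing unit generation sub-step performs a calculation corresponding to the instruction,
the start timings of the computing unit generation sub-step and the calculation sub-step are defined based on the sub-clock, and
the sub-clock is generated such that the computing unit generation sub-step and the calculation sub-step are completed before the start timing of the step immediately following the instruction execution step.
- 2. The dynamic reconfiguration processor according to claim 1, wherein the start timing of the step immediately following the instruction execution step is defined as two clocks of the main clock after the start timing of the step immediately preceding the instruction execution step.
- 3. The dynamic reconfiguration processor according to claim 1, wherein the sub-clock is a clock obtained by multiplying the main clock, a clock obtained by shifting the phase of the main clock, or a clock obtained by shifting the phase of the main clock and multiplying it.
- 4. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit includes the minimum gates or elements capable of generating all computing units that can be generated in the computing unit generation sub-step.
- 5. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit consists of a minimum set computing unit including the minimum gates or elements capable of generating all computing units that can be generated in the computing unit generation sub-step,
and the processor operates in a single thread using the minimum set computing unit.
- 6. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit includes a plurality of minimum set computing units, each including the minimum gates or elements capable of generating all computing units that can be generated in the computing unit generation sub-step,
and parallel processing or pipeline processing is performed using the minimum set computing units.
- 7. The dynamic reconfiguration processor according to claim 1, further comprising a non-reconfigurable computing unit,
wherein the dynamic configuration computing unit and the non-reconfigurable computing unit are used selectively according to the instruction,
and the start timing of an instruction execution step of executing an instruction using the non-reconfigurable computing unit is defined based on the main clock.
- 8. The dynamic reconfiguration processor according to claim 7, wherein the non-reconfigurable computing unit is used for predetermined instructions that occur with relatively high frequency, and the dynamic configuration computing unit is used for instructions that occur with relatively low frequency.
- 9. The dynamic reconfiguration processor according to claim 7, wherein the same instruction may be issued simultaneously more times than there are non-reconfigurable computing units corresponding to that instruction. In that case, the non-reconfigurable computing units handle as many of those instructions as there are such units, and the dynamic configuration computing unit handles the excess instructions.
- 10. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit consists of a minimum set computing unit including the minimum gates or elements capable of generating all computing units that can be generated in the computing unit generation sub-step,
the processor further comprising a failure repair gate or failure repair element used when a failure occurs in a gate or element of the minimum set computing unit.
- 11. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit consists of a minimum set computing unit. This unit includes, in NAND, NOR, and NOT gate units, the minimum gates capable of generating all computing units that can be generated in the computing unit generation sub-step,
and the computing unit generation sub-step dynamically configures the computing unit corresponding to the instruction by making connections in units of the NAND, NOR, and NOT gates.
- 12. The dynamic reconfiguration processor according to claim 1, wherein the dynamic configuration computing unit consists of a minimum set computing unit. This unit includes, in element units at the PchMOSFET and NchMOSFET level, the minimum elements capable of generating all computing units that can be generated in the computing unit generation sub-step,
and the computing unit generation sub-step dynamically configures the computing unit corresponding to the instruction by making connections in element units at the PchMOSFET and NchMOSFET level.
- 13. A method of operating a processor, the processor performing a fetch step of fetching an instruction, a decode step of decoding the fetched instruction, an execute step, and a data cache step,
wherein the execute step includes a computing unit generation sub-step and a calculation sub-step. The computing unit generation sub-step dynamically configures a computing unit corresponding to the instruction. The calculation sub-step performs a calculation corresponding to the instruction using the computing unit configured in the computing unit generation sub-step,
the method comprising:
executing the fetch step at a first timing defined by a main clock;
executing the decode step at a second timing defined by the main clock;
instead of at a third timing defined by the main clock, executing the computing unit generation sub-step at a first timing defined by a sub-clock different from the main clock, and executing the calculation sub-step at a second timing defined by the sub-clock; and
executing the data cache step at a fourth timing defined by the main clock.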
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE112010005459T DE112010005459T5 (de) | 2010-04-06 | 2010-04-06 | Dynamically reconfigurable processor and method of operating the same |
JP2012509223A JPWO2011125174A1 (ja) | 2010-04-06 | 2010-04-06 | Dynamic reconfiguration processor and operating method thereof |
US13/635,307 US20130013902A1 (en) | 2010-04-06 | 2010-04-06 | Dynamically reconfigurable processor and method of operating the same |
PCT/JP2010/056227 WO2011125174A1 (ja) | 2010-04-06 | 2010-04-06 | Dynamic reconfiguration processor and operating method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/056227 WO2011125174A1 (ja) | 2010-04-06 | 2010-04-06 | Dynamic reconfiguration processor and operating method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011125174A1 (ja) | 2011-10-13 |
Family
ID=44762161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/056227 WO2011125174A1 (ja) | Dynamic reconfiguration processor and operating method thereof | 2010-04-06 | 2010-04-06 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20130013902A1 (ja) |
JP (1) | JPWO2011125174A1 (ja) |
DE (1) | DE112010005459T5 (ja) |
WO (1) | WO2011125174A1 (ja) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9778312B1 (en) * | 2013-10-16 | 2017-10-03 | Altera Corporation | Integrated circuit calibration system using general purpose processors |
GB2526018B (en) | 2013-10-31 | 2018-11-14 | Silicon Tailor Ltd | Multistage switch |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06195149A (ja) * | 1992-10-23 | 1994-07-15 | Matsushita Electric Ind Co Ltd | Integrated circuit |
JPH08202549A (ja) * | 1995-01-30 | 1996-08-09 | Mitsubishi Electric Corp | Data processing device |
JPH1185507A (ja) * | 1997-09-05 | 1999-03-30 | Mitsubishi Electric Corp | Central processing unit and microcomputer system |
JP2004005739A (ja) * | 1999-08-30 | 2004-01-08 | Ip Flex Kk | Control method for data processing device |
JP2006178653A (ja) * | 2004-12-21 | 2006-07-06 | Ip Flex Kk | Data processing system and control method thereof |
JP2008539485A (ja) * | 2005-04-28 | 2008-11-13 | The University Court of the University of Edinburgh | Reconfigurable instruction cell array |
JP2009140353A (ja) * | 2007-12-07 | 2009-06-25 | Toshiba Corp | Reconfigurable integrated circuit and self-repair system using the same |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07175631A (ja) | 1993-12-16 | 1995-07-14 | Dainippon Printing Co Ltd | Arithmetic processing device |
-
2010
- 2010-04-06 WO PCT/JP2010/056227 patent/WO2011125174A1/ja active Application Filing
- 2010-04-06 US US13/635,307 patent/US20130013902A1/en not_active Abandoned
- 2010-04-06 JP JP2012509223A patent/JPWO2011125174A1/ja active Pending
- 2010-04-06 DE DE112010005459T patent/DE112010005459T5/de not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
US20130013902A1 (en) | 2013-01-10 |
DE112010005459T5 (de) | 2013-01-31 |
JPWO2011125174A1 (ja) | 2013-07-08 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10849418; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2012509223; Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 13635307; Country of ref document: US |
WWE | WIPO information: entry into national phase | Ref document number: 1120100054592; Country of ref document: DE; Ref document number: 112010005459; Country of ref document: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 10849418; Country of ref document: EP; Kind code of ref document: A1 |