WO2011105408A1 - SIMD processor - Google Patents

Info

Publication number
WO2011105408A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
pes
simd
simd processor
data
Prior art date
Application number
PCT/JP2011/053935
Other languages
English (en)
Japanese (ja)
Inventor
昭倫 京
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2012501811A (patent JP5708634B2)
Publication of WO2011105408A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors; single instruction multiple data [SIMD] multiprocessors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3887: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]

Definitions

  • the present invention relates to a SIMD (Single Instruction Multiple Data) processor.
  • SIMD Single Instruction Multiple Data
  • FIG. 9 is a diagram showing the configuration of the SIMD processor described in Non-Patent Document 1.
  • the SIMD processor includes a plurality of computing elements (PE: Processing Element) 110 and a control processor (CP: Control Processor) 130 that issues the same instruction to the plurality of PEs 110.
  • PE Processing Element
  • CP Control Processor
  • high computing performance can be realized based on inexpensive hardware.
  • the data arranged at a predetermined position on the global memory (Global Memory) 140 managed by the CP 130 is sequentially read into the local memory (Local Memory) 120 on the PE 110 side in the order of addresses. Thereafter, all the PEs 110 perform calculations on the data in their local memory 120 at the same time in accordance with instructions issued from the CP 130.
  • Patent Document 1 describes an image processor that can switch between a SIMD type and a systolic array type configuration.
  • as in a dynamically reconfigurable processor, a method of connecting a large number of memory blocks and a large number of PEs with abundant wiring (for example, crossbars) can be considered. However, this method requires adding a large number of wiring circuits to the SIMD processor, so the hardware for realizing the SIMD processor becomes expensive.
  • An object of the present invention is to provide a SIMD processor that solves such problems.
  • the SIMD processor is a SIMD (Single Instruction Multiple Data) processor comprising a control processor (CP: Control Processor) and a plurality of computing elements (PE: Processing Element), wherein: the plurality of PEs perform a SIMD operation of executing a single instruction issued from the CP; the CP performs an instruction / data distribution operation of distributing different instructions, or different instructions and data, to each of the plurality of PEs; and each of the plurality of PEs performs a systolic operation of executing the instruction sent from the CP in the instruction / data distribution operation once the source operands of that instruction are ready.
  • according to the SIMD processor of the present invention, a program can be processed at high speed by using the arithmetic elements of the SIMD processor without changing the data arrangement on the global memory.
  • the SIMD processor according to the first aspect is provided.
  • a SIMD processor is provided in which the CP sequentially and exclusively uses the instruction issue paths to all PEs in order to distribute instructions to each PE in the instruction / data distribution operation.
  • a SIMD processor is provided in which the plurality of PEs transfer the operation results of executed operation instructions to other PEs, and the operation results transferred from other PEs serve as source operands for the instructions sent in the instruction / data distribution operation.
  • a SIMD processor is provided in which each of the plurality of PEs uses the data sent from the CP as a source operand for the instruction sent in the instruction / data distribution operation.
  • a SIMD processor is provided that further includes a global access arbitration unit (Global Access Arbiter) that arbitrates global memory access by each of the plurality of PEs and guarantees exclusive access to the global memory.
  • a SIMD processor is provided in which each of the plurality of PEs includes a register for storing a flag for keeping the operation stopped during the instruction / data distribution operation.
  • a SIMD processor is provided in which each of the plurality of PEs includes a register that stores a flag for switching the operation between the SIMD operation and the systolic operation.
  • a SIMD processor is provided in which each of the plurality of PEs includes a selector that selects whether or not to store an instruction issued from the CP in its own instruction buffer.
  • the SIMD processor performs a SIMD operation in which the entire PE array executes a single instruction issued from the control processor (CP).
  • it also performs an instruction / data distribution operation in which the CP uses the instruction issue path to transmit different instruction codes and data sequentially to each PE.
  • it further performs a systolic operation in which each PE in the PE array executes the instruction it received from the CP in the instruction / data distribution operation, not the instruction broadcast from the CP every cycle, and in which the instruction can specify that its execution result be written to a register resource of another designated PE.
  • in the systolic operation, each PE starts executing its instruction after the operands of the instruction have been written by data sent from other PEs or from the CP. The execution result is sent to another PE or, in the case of a memory access instruction, the global memory is accessed.
  • in the SIMD processor of the present invention, different instructions can be issued to each PE by the instruction / data distribution operation. Accordingly, it is possible to shift the timing at which the PEs execute memory access instructions, or to allow only specific PEs to execute a memory access instruction. As a result, even if the data arrangement on the memory is left as it is and the memory space accessible by the PEs is expanded to the global memory space, the frequency with which many PEs compete for the global memory space, which is a single hardware resource, can be reduced, and the arithmetic resources of the processor can be used more effectively.
  • according to the SIMD processor of the present invention, processing can therefore be sped up using the arithmetic resources of the SIMD processor without changing the data arrangement of the program on the memory (first effect).
  • the CP performs an instruction / data distribution operation on the PE array, so that, when the processing for one iteration is expressed as a data flow graph, the arithmetic instruction corresponding to each operation node and its related constant data can be distributed to each PE, and each operation node in the data flow graph can be assigned to a PE. Further, to start execution in the systolic operation mode on the PE array, the CP repeatedly issues activation data to one or more PEs, thereby starting different iterations one after another so that the PE array processes them.
  • the processing of the portion of a program's loops that has no data dependency between iterations (parallel loop portion) can thus be sped up without changing the data arrangement on the global memory (second effect).
  • the circuit resources of the instruction issue path to all PEs provided in the conventional SIMD processor, and the data wiring circuit resources from each PE to the CP and from the CP to each PE, can be used as they are. Further, as the connection for accessing the global memory space from each PE, the wiring circuit resources for exchanging scalar data between the CP and the PE array provided in a conventional SIMD processor can be used as they are. Furthermore, as the connection between PEs whose operation nodes on the data flow graph exchange data with each other, the wiring resources between adjacent PEs provided in the conventional SIMD processor can be used as they are.
  • the first effect and the second effect can be obtained merely by adding a small amount of circuitry to the conventional SIMD processor (third effect).
  • FIG. 10 is a diagram showing a configuration of one PE 110 in a PE array included in a conventional SIMD processor.
  • the PE 110 includes an instruction buffer (instb) 111 for storing instructions issued from the CP 130 (FIG. 9), general purpose registers (General Purpose Registers) r0 to r7, an arithmetic unit (ALU: Arithmetic Logic Unit) 112, an entrance / exit to the connection network between PEs (Left / Right Inter PE Connection), and a local memory 120 for each PE (Local Memory). All the PEs 110 simultaneously execute a single instruction issued from the CP 130 every cycle.
  • instruction buffer instb
  • general purpose registers General Purpose Registers
  • ALU Arithmetic Logic Unit
  • FIG. 1 shows the configuration of one PE 10 in the PE array included in the SIMD processor according to this embodiment.
  • the PE 10 includes an instruction buffer (instb) 11, general purpose registers (General Purpose Registers) r0 to r15, an arithmetic unit (ALU: Arithmetic Logic Unit) 12, an entrance / exit to the connection network between PEs (Left / Right Inter PE Connection), and a local memory 20 for each PE.
  • the PE 10 further includes registers stop and mode and a selector sel.
  • the components of FIG. 1 that are added relative to FIG. 10 (thick line portions), that is, the registers stop and mode, the selector sel, and the registers cm, sx, and sy, are described below.
  • the register stop is a control register for keeping the operation of the PE 10 stopped during the instruction / data distribution operation.
  • the register mode is a 1-bit operation mode selection register for switching between a systolic operation and a conventional SIMD operation.
  • the selector sel is a selector that selects whether or not an instruction issued from the CP is stored in the instruction buffer (instb) 11.
  • the registers cm, sx, and sy are a group of general-purpose registers (waiting registers) with a data waiting function: during a systolic operation, a predetermined counter register is decremented each time a write to one of these registers occurs.
  • FIG. 2 is a diagram schematically showing the configuration of the control processor (CP) 30 included in the SIMD processor according to the present embodiment.
  • like the CP 130 in the conventional SIMD processor, the CP 30 has a data path for performing its own operations and is connected to a global memory 40 via an instruction / data cache (Instruction / Data Cache) 31 and a memory access arbitration unit (arbiter) 33.
  • the CP 30 reads from the global memory 40, and issues, the instructions to be executed in its own data path and the instructions to be broadcast to the entire PE array. It also reads from the global memory 40, or writes to the global memory 40, the operation data on the CP 30 and the data exchanged with the local memory 20 of the PE 10.
  • FIG. 3 is a diagram illustrating an example of an instruction format in the SIMD processor according to the present embodiment.
  • unlike the CP 130 in the conventional SIMD processor, the CP 30 in the SIMD processor of the present embodiment uses an instruction set with the instruction format shown in FIG. 3 in order to send instructions and / or data to a specific PE and to have a specific PE or a plurality of designated PEs execute them.
  • the instruction format of the CP 30 differs according to the bit pattern of the “header” part of the instruction.
  • when the bit pattern of the “header” part is “10”, the instruction A is to be distributed to a specific PE 10, and the instruction that follows it is always an instruction B whose “header” bit pattern is “11”.
  • this pair specifies the operation of writing the instruction A to the instruction buffer (instb) 11 of the PE whose PE number is indicated by the “Target PEID” part of the instruction B, and of writing the “data” part of the instruction B to the register of that PE indicated by the “Target reg ID” part of the instruction B.
  • the value of the “data” part is stored in the register indicated by the “Target reg ID” part on the PE whose PE number is indicated by the “Target PEID” part of the instruction.
  • the instruction specifies the operation of writing the value of the “data” part to the register cm of each PE 10 whose instruction stored in the instruction buffer (instb) 11 has cm as a source operand.
  • FIG. 4 shows the overall configuration of the PE array in the SIMD processor according to this embodiment.
  • the bold line portion indicates a global access arbitration unit (Global Access Arbiter) 50, a circuit element that the present invention newly adds to the conventional SIMD processor in addition to the individual PEs 10.
  • global access arbitration unit Global Access Arbiter
  • the global access arbitration unit 50 is a module that manages the local memory 20 blocks of all the PEs so that they can be used together as a multi-bank cache memory, and that arbitrates memory accesses when they are generated simultaneously from a large number of PEs 10 in the systolic operation mode.
  • the following implementation can be considered: when there are memory access requests from two or more PEs simultaneously, the operation of the entire PE array is temporarily stopped, the memory access requests of each PE are answered one by one, and then the operation of the PE array is resumed. This implementation has low performance but also low hardware implementation cost.
  • an implementation with the highest performance but high hardware implementation cost is also conceivable, in which the local memory blocks of all PEs are connected by a crossbar mechanism so that a large number of memory access requests are answered with the shortest possible delay. Any implementation may be adopted as long as the maximum memory access delay can be determined statically when arbitrary program code is executed in a superimposed manner.
  • the CP 30 simultaneously reads two instructions: the instruction at the address indicated by the value of the program counter (PC: Program Counter) 35 and the instruction at the next address. How the count value of the PC 35 is incremented or decremented every cycle, and how the read instructions are processed, are determined as follows according to the value of the “header” part of the first instruction read, instruction A.
  • PC Program Counter
  • an instruction having a “header” value of “10” includes a “counter” portion. It is assumed that a value equal to or greater than the number of “waiting registers” among the source operands of the instruction is set in the “counter” portion.
  • a write operation P is performed on the register sx of the PE whose PE number is specified by the “Target PEID” portion of the instruction A.
  • the value “1” is set in the “counter” portion of the instruction C in the instruction buffer (instb) 11 of that PE, and the register sx is the only “waiting register” among the source operands of the instruction C.
  • the value 1 is set again in the “counter” part of the instruction C.
  • in that case the “counter” portion is set not to 1 but to 0, the result of the decrement, so the instruction C is executed again in succession on the PE.
  • when the CP 30, in the systolic operation, executes an instruction A in which the waiting register cm is specified in the “Target reg ID” part, a write operation P to cm occurs on every PE whose instruction in the instruction buffer (instb) 11 has cm as a source operand.
  • when the value “2” is set in the “counter” part of the instruction C in the instruction buffer (instb) 11 of the PE, and cm and sy are both source operands of the instruction C, the PE does not start executing the instruction C merely because of the write operation P to cm caused by the execution of the instruction A.
  • once a further write operation Q occurs, the PE starts executing the instruction C.
  • the value 2 is copied from the “header” part of the instruction C and set again in the “counter” part of the instruction C.
  • if write operations occur simultaneously on the waiting registers sx and cm of the PE, 0, the result of decrementing this value twice, is set in the “counter” part instead. The instruction C is therefore executed again in succession on the PE.
  • the CP 30 can instruct the PE to execute the instruction stored in the instruction buffer (instb) 11 by issuing the “systolic operation” instruction. If the instruction stored in the instruction buffer (instb) 11 designates the “waiting register” of another PE as the write destination of the execution result, the execution of instructions propagates between the PEs. Further, since the CP 30 can perform a write operation on the registers cm of a large number of PEs, it can shift a large number of PEs from the “waiting” state to the “execution” state simultaneously. In this manner, the CP 30 can set off a systolic chain of instruction executions on the PE array by issuing a “systolic operation” instruction.
  • FIG. 5A shows pseudo code corresponding to the loop portion of the process of mapping to the PE array in this embodiment.
  • the pseudo code reads data from the array A, adds the variable a, and writes the result to the array B, for a total of 8 iterations over elements 0 to 7 of the array A. This is the program code to be executed in this configuration.
  • FIG. 5B shows the instructions to be executed by each PE when the processing of FIG. 5A is mapped to PE0, PE1, PE2, PE4, and PE10 in the PE array of the SIMD processor of this embodiment.
  • add shown in FIG. 5B means an addition instruction, and has two source register number designations (A and cm) and one destination register number designation (1sx).
  • a single letter (A, B, a) represents a constant (A is the absolute address of array A in this case).
  • when the instruction stored in the instruction buffer (instb) 11 is executed, a constant specified as an operand is assumed to be stored in the register with register number 0 (i.e., r0), and the PE reads the value from the register r0.
  • the destination register number is specified by a combination of a PE number and a register name. For example, 1sx means that the operation result is stored in the sx register of the PE with PE number 1.
  • in order for the CP 30 to store the instruction “add A, cm, 1sx” in the instruction buffer (instb) 11 of PE0, it is sufficient to prepare the following pair of instructions and have the CP 30 execute them. In the first instruction, the “header” part is 10, the “opcode” part is the bit string representing the add instruction, “1st operand reg ID” is 0, “2nd operand reg ID” is 0xd, “Destination reg ID” is 0xe, the “PEID” part is 1 (because the PE number of the operation result storage destination is 1), and the “counter” part is 1. In the second instruction, the “header” part is 11, the “data” part is the absolute address of array A, the “Target reg ID” part is 0, and the “Target PEID” part is 0 (because the PE number of the owner of the instb that stores the add instruction is 0).
  • gld and gst shown in FIG. 5B are a load instruction and a store instruction for the global memory, respectively.
  • the load instruction has the load target address as the first source operand without having the second source operand, and has the designation of the destination register number of the storage destination of the loaded data.
  • the store instruction has a store target address as the first source operand, a register number storing the write data as the second source operand, and does not have a destination register number designation (indicated as NULL in FIG. 5B).
  • 1sx, 2sx, 4sx, 4sy, and NULL in the destination field of each instruction indicate, respectively, sx of PE1, sx of PE2, sx of PE4, sy of PE4, and no destination.
  • FIG. 5 (c) shows, as an example, a time chart of the operation from when the instruction code of FIG. 5 (b) is distributed to the PE array by the CP 30 until the operation ends.
  • the vertical axis represents time (unit: cycle)
  • the horizontal axis represents the operation of the CP 30 and the operation on the PE side.
  • the operation on the PE side is displayed separately for each iteration.
  • the operation status of the CP 30 and PE in each cycle is shown.
  • INSTB_BC (PE0), written at the top of the column showing the operation of the CP 30, indicates that in that cycle the CP 30 read an instruction whose “header” part is “10” (and the subsequent instruction whose “header” part is “11”) and distributed it to PE0.
  • GO (1, cm) indicates that in that cycle an instruction whose “header” part is “11” and whose “data” part is “1” was read, and that 1, the value of the “data” part, was written to the register cm of each PE among PE0, PE1, PE2, PE4, and PE10 whose instruction in the instruction buffer (instb) 11 includes the register cm as a source operand.
  • CLD and CST indicate, respectively, the load operation and the store operation on the global memory (via the cache memory or the like) that result from arbitration by the global access arbitration unit 50 after the gld and gst instructions are issued on the PE side.
  • PEx represents a cycle in which a PE with a PE number x executes an instruction.
  • PEx / y indicates that PEx and PEy executed an instruction in the same cycle.
  • * or + is added to the end of PEx that executed the gld or gst instruction.
  • a dotted arrow indicates a flow from when the gld instruction is executed on PE1 until load data is sent to PE2.
  • a black rectangle in FIG. 5C represents a load data waiting cycle, and “-” represents a transfer cycle between PEs.
  • the filled black circle indicates that the instruction is executed on the corresponding PE (horizontal axis).
  • An arrow PEx → PEy indicates that data transfer has occurred between PEx and PEy.
  • the load access delay for the global memory 40 is three cycles. Therefore, the arrow from PE1 to PE2 extends over 3 cycles.
  • no crossing occurs between a plurality of arrows in the same direction. This indicates that there is no collision regarding data transfer using the coupling line between PEs over a total of 21 cycles in which the PE array performs systolic operation.
  • an arrow in the left → right (or left ← right) direction from PEx to PEy in cycle P indicates that, in that cycle, data transfer is performed using the connection line between adjacent PEs in the PEx → PEy (or PEx ← PEy) direction.
  • the brightness of the arrows and filled circles is changed in order to make it easy to distinguish between individual iterations.
  • FIGS. 5C and 6 show diagrams assuming that the delay of the load access by the PE for the global memory is 3 cycles.
  • when the delay is smaller than 3, for example 2, the instruction assigned to PE10 may be assigned to PE9 as shown in FIG.
  • when the load access delay is larger than 3, for example 4, the instruction assigned to PE10 may be assigned to PE11 as shown in FIG. 7B.
  • FIG. 8 shows pseudo code when the program code of FIG. 5A is sequentially executed on the CP 30.
  • CADD represents an add instruction.
  • CLD and CST represent a memory load instruction and a memory store instruction, respectively. These are all instructions whose “header” part is 00.
  • the performance improvement is small when the number of iterations is small. However, if the number of iterations is 1000, the 5 cycles required to distribute the instructions to the PE array can be ignored. Further, in this embodiment, each iteration can be executed with a throughput of one cycle. On the other hand, referring to FIG. 8, when the same processing is executed on the CP 30, each iteration takes 6 cycles. Therefore, the SIMD processor of the present invention provides a performance improvement of about 6 times.
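
As a rough illustration of the waiting-register mechanism and the FIG. 5 mapping described above, the following Python sketch models each PE as one buffered instruction plus a counter that is decremented on every write to cm, sx, or sy and re-armed when the instruction fires. It is not the patented hardware: the class names, the event queue, the memory layout, and the exact operand assignment for PE2, PE4, and PE10 are assumptions (FIG. 5B is not reproduced here), and timing is ignored, so each iteration is drained before the next GO(i, cm) is issued.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

WAITING = {"cm", "sx", "sy"}  # waiting registers named in the text


@dataclass
class Instr:
    op: str                          # "add", "gld" (global load), "gst" (global store)
    src: Tuple[str, ...]             # source registers; constants live in r0
    dest: Optional[Tuple[int, str]]  # (PE number, register), or None for NULL
    counter: int                     # waiting-register writes needed before firing


class PE:
    def __init__(self, instr: Instr, r0: int = 0):
        self.instr = instr
        self.regs = {"r0": r0, "cm": 0, "sx": 0, "sy": 0}
        self.count = instr.counter

    def write(self, reg: str, value: int) -> bool:
        """Store value; return True when the buffered instruction may fire."""
        self.regs[reg] = value
        if reg in WAITING:
            self.count -= 1
        if self.count == 0:
            self.count = self.instr.counter  # re-arm for the next iteration
            return True
        return False


def run(pes, mem, iterations):
    """Untimed, event-driven run: one GO(i, cm) per iteration."""
    for i in range(iterations):
        # GO(i, cm): the CP writes i to cm of every PE whose instruction reads cm.
        events = deque((n, "cm", i) for n, pe in sorted(pes.items())
                       if "cm" in pe.instr.src)
        while events:
            n, reg, value = events.popleft()
            pe = pes[n]
            if not pe.write(reg, value):
                continue  # still waiting for another operand
            ins = pe.instr
            args = [pe.regs[s] for s in ins.src]
            if ins.op == "add":
                result = args[0] + args[1]
            elif ins.op == "gld":
                result = mem[args[0]]
            else:  # gst: first operand is the address, second the data
                mem[args[0]] = args[1]
                continue
            events.append((ins.dest[0], ins.dest[1], result))
    return mem


A_BASE, B_BASE, a = 0, 16, 5          # illustrative addresses and constant
mem = list(range(10, 18)) + [0] * 24  # A[0..7] = 10..17, everything else 0
pes = {  # assumed operand assignment, one data-flow node per PE
    0:  PE(Instr("add", ("r0", "cm"), (1, "sx"), 1), r0=A_BASE),  # address of A[i]
    1:  PE(Instr("gld", ("sx",),      (2, "sx"), 1)),             # load A[i]
    2:  PE(Instr("add", ("sx", "r0"), (4, "sx"), 1), r0=a),       # A[i] + a
    10: PE(Instr("add", ("r0", "cm"), (4, "sy"), 1), r0=B_BASE),  # address of B[i]
    4:  PE(Instr("gst", ("sy", "sx"), None,      2)),             # store to B[i]
}
run(pes, mem, 8)
print(mem[B_BASE:B_BASE + 8])  # [15, 16, 17, 18, 19, 20, 21, 22]
```

Running the sketch stores A[i] + a into B[i] for i = 0 to 7. Real hardware overlaps iterations and relies on statically known delays to keep operands from different iterations apart, which this untimed model does not attempt.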

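The closing performance estimate can be checked with a few lines of arithmetic. The sketch below assumes the figures quoted in the text (5 cycles to distribute the instructions, 1 cycle per iteration in steady state on the PE array, 6 cycles per iteration on the CP) and, as a simplifying assumption of this example, ignores pipeline fill latency; the function names are hypothetical.

```python
def cp_cycles(iterations: int, cycles_per_iteration: int = 6) -> int:
    """Cycles when the loop runs sequentially on the CP."""
    return iterations * cycles_per_iteration


def systolic_cycles(iterations: int, distribution_cycles: int = 5) -> int:
    """Cycles on the PE array: instruction distribution, then 1 iteration per cycle."""
    return distribution_cycles + iterations


def speedup(iterations: int) -> float:
    return cp_cycles(iterations) / systolic_cycles(iterations)


for n in (8, 1000):
    print(n, round(speedup(n), 2))  # 8 -> 3.69, 1000 -> 5.97
```

With 1000 iterations the ratio is already close to the limit of 6, which matches the "about 6 times" figure; with only 8 iterations the distribution overhead is a large share of the total, and the real gain is smaller still once pipeline fill is counted.
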
Abstract

The invention relates to a single instruction, multiple data (SIMD) processor that allows a program to be processed at high speed using the processing elements of the SIMD processor without changing the data arrangement in a global memory. The SIMD processor comprises a control processor (CP) and a plurality of processing elements (PEs). The plurality of PEs perform a SIMD operation to execute a single instruction issued by the CP; a specific PE among the plurality of PEs performs an instruction/data distribution operation to receive the instruction and data issued by the CP; and, for the instruction distributed to each PE by the CP in the instruction/data distribution operation, each of the PEs performs a systolic operation that executes it after the source operands of the instruction have been collected.
PCT/JP2011/053935 2010-02-24 2011-02-23 SIMD processor WO2011105408A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012501811A JP5708634B2 (ja) 2010-02-24 2011-02-23 SIMD processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-038976 2010-02-24
JP2010038976 2010-02-24

Publications (1)

Publication Number Publication Date
WO2011105408A1 (fr) 2011-09-01

Family

ID=44506813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/053935 WO2011105408A1 (fr) 2010-02-24 2011-02-23 SIMD processor

Country Status (2)

Country Link
JP (1) JP5708634B2 (fr)
WO (1) WO2011105408A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63501530A (ja) * 1985-09-17 1988-06-09 ザ・ジョンズ・ホプキンス・ユニバ−シティ Memory-linked wavefront array processor
US4967340A (en) * 1985-06-12 1990-10-30 E-Systems, Inc. Adaptive processing system having an array of individually configurable processing components
JPH0635878A (ja) * 1992-05-22 1994-02-10 Internatl Business Mach Corp <Ibm> Controller for a single-instruction multiple-data / multiple-instruction multiple-data processor array
US5659780A (en) * 1994-02-24 1997-08-19 Wu; Chen-Mie Pipelined SIMD-systolic array processor and methods thereof
JP2002175283A (ja) * 2000-12-05 2002-06-21 Matsushita Electric Ind Co Ltd Systolic array type arithmetic unit
JP2008034953A (ja) * 2006-07-26 2008-02-14 Kobe Univ Image processor

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0266300B1 (fr) * 1986-10-29 1994-12-07 United Technologies Corporation Modular multiprocessor architecture in an N-dimensional lattice
JPH0668053A (ja) * 1992-08-20 1994-03-11 Toshiba Corp Parallel computer
CA2129882A1 (fr) * 1993-08-12 1995-02-13 Soheil Shams Dynamically reconfigurable communication network between SIMD multiprocessors and apparatus using this network
JP2657903B2 (ja) * 1994-11-29 1997-09-30 乾彌 呉 Pipelined and systolic single-instruction multiple-data-stream array processor and method thereof
US5680597A (en) * 1995-01-26 1997-10-21 International Business Machines Corporation System with flexible local control for modifying same instruction partially in different processor of a SIMD computer system to execute dissimilar sequences of instructions
JP3987784B2 (ja) * 2002-10-30 2007-10-10 Necエレクトロニクス株式会社 Array-type processor
JP4477959B2 (ja) * 2004-07-26 2010-06-09 独立行政法人理化学研究所 Arithmetic processing device for broadcast-type parallel processing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021157771A (ja) * 2018-10-18 2021-10-07 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
JP2021166034A (ja) * 2018-10-18 2021-10-14 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
JP7074832B2 (ja) 2018-10-18 2022-05-24 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
JP7074833B2 (ja) 2018-10-18 2022-05-24 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device

Also Published As

Publication number Publication date
JP5708634B2 (ja) 2015-04-30
JPWO2011105408A1 (ja) 2013-06-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11747368

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012501811

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11747368

Country of ref document: EP

Kind code of ref document: A1