JP6551751B2 - Multiprocessor device - Google Patents
Multiprocessor device
- Publication number
- JP6551751B2 (application number JP2016542545A)
- Authority
- JP
- Japan
- Prior art keywords
- instruction
- processing
- processor
- register
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F15/80: Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8053: Vector processors
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F9/3012: Organisation of register space, e.g. banked or distributed register file
- G06F9/32: Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/34: Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
- G06F9/3824: Operand accessing
- G06F9/3851: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/3887: Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
- G06F9/38873: Iterative single instructions for multiple data lanes [SIMD]
- G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Advance Control (AREA)
- Image Processing (AREA)
- Multi Processors (AREA)
- Complex Calculations (AREA)
- Executing Machine-Instructions (AREA)
Description
for (x=0; x<320; x++) {
    R2 = C0 * x + C1 * y + C2;
    R3 = C3 * x + C4 * y + C5;
    R0 = mem[R3][R2];
    mem[y][x] = R0;
}
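The loop above reads each output pixel from a source position given by an affine function of the output coordinates. A minimal Python sketch of that gather pattern follows. It assumes separate `src` and `dst` arrays, integer coefficients, and source coordinates that stay in range; the function name `affine_row` and the sample data are hypothetical and are not taken from the patent.

```python
def affine_row(src, dst, y, coeffs):
    """Fill row y of dst by sampling src at affine-transformed coordinates."""
    c0, c1, c2, c3, c4, c5 = coeffs
    for x in range(len(dst[y])):
        r2 = c0 * x + c1 * y + c2   # source column, R2 in the listing
        r3 = c3 * x + c4 * y + c5   # source row, R3 in the listing
        dst[y][x] = src[r3][r2]     # R0 = mem[R3][R2]; mem[y][x] = R0


# Hypothetical 3x3 image; these coefficients swap x and y (a transpose).
src = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
dst = [[0] * 3 for _ in range(3)]
for y in range(3):
    affine_row(src, dst, y, (0, 1, 0, 1, 0, 0))
```

Python silently wraps negative indices, so a real implementation would need an explicit bounds check or clamp; the sketch leaves that out to stay close to the listing.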
for (x=0; x<320; x+=8)
    for (i=0; i<8; i++)
        R2 = C0 * (x+i) + C1 * y + C2;
for (x=0; x<320; x+=8)
    for (i=0; i<8; i++)
        R3 = C3 * (x+i) + C4 * y + C5;
for (x=0; x<320; x+=8)
    for (i=0; i<8; i++)
        R0 = mem[R3][R2];
for (x=0; x<320; x+=8)
    for (i=0; i<8; i++)
        mem[y][x] = R0;
}
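The listing above runs one instruction across the whole row, eight elements at a time, before moving to the next instruction. The Python sketch below compares that instruction-by-instruction order with the ordinary pixel-by-pixel order and checks that both give the same row. It assumes one register slot per pixel (standing in for the register memory), a row width that is a multiple of eight, and in-range source coordinates; all function and variable names are hypothetical.

```python
LANES = 8  # inner trip count used in the listing (i = 0..7)


def row_per_pixel(src, y, coeffs, width):
    """Reference order: all four steps are finished for one pixel at a time."""
    c0, c1, c2, c3, c4, c5 = coeffs
    out = [0] * width
    for x in range(width):
        r2 = c0 * x + c1 * y + c2
        r3 = c3 * x + c4 * y + c5
        out[x] = src[r3][r2]
    return out


def row_instruction_major(src, y, coeffs, width):
    """Listing order: each step sweeps the whole row before the next begins."""
    c0, c1, c2, c3, c4, c5 = coeffs
    # One slot per pixel for each register, so no step overwrites another lane.
    r0, r2, r3 = [0] * width, [0] * width, [0] * width
    out = [0] * width
    for x in range(0, width, LANES):          # step 1: source column
        for i in range(LANES):
            r2[x + i] = c0 * (x + i) + c1 * y + c2
    for x in range(0, width, LANES):          # step 2: source row
        for i in range(LANES):
            r3[x + i] = c3 * (x + i) + c4 * y + c5
    for x in range(0, width, LANES):          # step 3: gather from memory
        for i in range(LANES):
            r0[x + i] = src[r3[x + i]][r2[x + i]]
    for x in range(0, width, LANES):          # step 4: store the result
        for i in range(LANES):
            out[x + i] = r0[x + i]
    return out
```

Because each register has a separate slot for every pixel, reordering the work this way does not change the result; it only changes when each value is produced.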
for (x=0; x<320*3; x+=8)
for (x=0; x<1920/2; x+=8)
case 00:
    R0 = R1 + R2;
    break;
case 01:
    R0 = R1 + R2;
    break;
case 10:
    R3 = R4 / R1;
    break;
case 11:
    R0 = R1 + R2;
    R3 = R4 / R1;
    break;
}
Judge = 0x3333; if (!Judge[F3210]) R3 = R4 / R1;
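The `Judge` line can be read as a 16-entry truth table indexed by a 4-bit flag value. The Python sketch below follows that reading. It assumes that `F3210` is the concatenation of flags F3..F0 with F0 as the least significant bit, and that `Judge[k]` selects bit k of the constant; neither assumption is stated in this extract, and the helper names are hypothetical.

```python
def judge_bit(judge, flags):
    """Return bit number `flags` (0..15) of the 16-bit table `judge`."""
    return (judge >> (flags & 0xF)) & 1


def conditional_divide(judge, flags, regs):
    """Apply R3 = R4 / R1 only when the table entry for `flags` is 0."""
    if not judge_bit(judge, flags):
        regs["R3"] = regs["R4"] / regs["R1"]
    return regs


# Flag values for which 0x3333 lets the division run.
enabled = [f for f in range(16) if not judge_bit(0x3333, f)]
```

Under these assumptions, 0x3333 has zero bits exactly where F1 is 1, so the division runs for flag patterns ending in 10 or 11. That matches the case listing above, where `R3 = R4 / R1` appears only in cases 10 and 11.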
for (x=0; x<64; x++) {
    0: R4 = 1/16 * x - 2; R3 = F3210 = 0;
    1: R5 = 1/32 * y - 1; R0=R1=0;
    2: R2 = R0 * R0 - R1 * R1 + R4; R8 = sqrt(R1 * R1 - 4); R3 += 1;
       Judge = 0xaaaa; if (!&Judge[F3210] & (Loop < 64)) goto 2;
       Form = 0x3333; F0 |= Form[CCcor];
    3: R1 = (R0 * R1 + R5) * 2; R9 = sqrt(R2 * R2 - 4); R0 = R2;
       Form = 0x3333; F0 |= Form[CCcor];
    4: mem[x][y] = R3;
}
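The listing above has the shape of an escape-time iteration: a point is derived from `x` and `y`, a value is updated repeatedly, a counter (`R3`) is incremented, and the loop is capped at 64 passes. The Python sketch below shows the textbook form of such an iteration, using the same coordinate mapping. It is an interpretation, not a translation: it uses the standard update `2 * zr * zi + ci` and a squared-magnitude test, whereas the listing uses its own expression for `R1` and a flag mechanism (`Form`, `CCcor`) that this extract does not explain. All names are hypothetical.

```python
MAX_ITER = 64  # same cap as the (Loop < 64) test in the listing


def escape_count(cr, ci, max_iter=MAX_ITER):
    """Count iterations of z <- z*z + c until |z| exceeds 2 or the cap is hit."""
    zr = zi = 0.0
    count = 0
    while count < max_iter:
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        count += 1
        if zr * zr + zi * zi > 4.0:
            break
    return count


def render_row(y, width=64):
    """Iteration counts for one row, with c = (x/16 - 2, y/32 - 1)."""
    return [escape_count(x / 16 - 2, y / 32 - 1) for x in range(width)]
```

Each pixel needs a different number of passes, which is why the per-lane flags and conditional jump in the listing matter: lanes that have finished must stop updating while others continue.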
101 Memory aggregation device
102 External memory
103 Multiplexer
105 Overall control device
106 Register memory
107 Processor
Claims (5)
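- 1. A multiprocessor device comprising a plurality of processors, the device comprising:
an external memory;
a memory aggregation device that aggregates memory accesses of the plurality of processors;
register memories whose number is the product of the number of registers managed by each processor and a maximum processing count, the maximum processing count being the largest number of processes that the multiprocessor device as a whole can handle for a single instruction;
a multiplexer that accesses the register memory, in accordance with a given instruction, in response to register accesses by the processors; and
an overall control device that extracts parameters from an instruction and supplies them to the processors and the multiplexer to control them, that causes the processors to process sequentially, with the same instruction and while changing the addressing of the register memory, a given processing count, which is the number of processes requested of the multiprocessor device as a whole for that instruction, and that, once the given processing count has been completed, switches to the next instruction and repeats the processing for the given processing count.
- 2. The multiprocessor device according to claim 1, wherein the overall control device divides the processing into several parts for execution when the given processing count exceeds the maximum processing count, and combines several for execution when the given processing count is less than the maximum processing count.
- 3. The multiprocessor device according to claim 1, wherein, when switching instructions and executing a new instruction, the overall control device makes processing of the new instruction wait until the processing performed under the pre-switch instruction in the same processing order as that of the new instruction has finished, if it has not finished; or, when the register write position of the processing performed in the same processing order under the pre-switch instruction equals the register read position of the new instruction, makes processing of the new instruction wait until that processing has finished; or makes processing of the new instruction wait until processing that was performed under the pre-switch instruction in the same processing order as that of the new instruction, and at least a pre-specified number of instructions earlier, has finished, if it has not finished.
- 4. The multiprocessor device according to claim 1, wherein the overall control device extracts, from a given instruction, a relative shift amount concerning the processing order of the processors, supplies the shift amount to the multiplexer, and, if the shift amount is not an integer multiple of the number of processors, instructs that the register memory be addressed twice at the start only; and
the multiplexer extracts data by shifting, according to the shift amount, the data obtained by addressing the register memory together with the data obtained by past addressing, and supplies the extracted data to the plurality of processors.
- 5. The multiprocessor device according to claim 1, wherein each processor generates a flag indicating a branch condition from a given instruction and its individual operation result, combines it, in accordance with the instruction, with a plurality of branch flags stored in the register memory, and stores the result in the register memory as a new branch flag; and
each processor determines, from a given instruction and the plurality of branch flags stored in its register memory, whether to write the operation result to the register memory or whether to move to a specified instruction.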
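Claim 4 describes supplying the processors with data shifted relative to the processing order, with the register memory addressed twice only at the start when the shift is not a multiple of the processor count. The Python sketch below illustrates one way such a scheme could behave. It assumes the register memory holds rows with one element per processor and that a shifted window is cut from the previous row joined to the current row; the shift direction, the row layout, and the function name `read_shifted` are assumptions, not details taken from the claims.

```python
def read_shifted(rows, shift):
    """Return shifted rows plus the number of register-memory reads used.

    rows  : list of equal-length lists, one element per processor
    shift : element offset into the flattened data
    """
    p = len(rows[0])                  # number of processors (lanes per row)
    addr, offset = divmod(shift, p)
    out, reads = [], 0
    if offset == 0:
        # Aligned shift: every output row costs exactly one read.
        for a in range(addr, len(rows)):
            out.append(list(rows[a]))
            reads += 1
        return out, reads
    prev = rows[addr]                 # extra read, needed only at the start
    reads += 1
    for a in range(addr + 1, len(rows)):
        cur = rows[a]
        reads += 1
        # Window spans the tail of the past row and the head of the new one.
        out.append((prev + cur)[offset:offset + p])
        prev = cur                    # kept as the "past" data for next time
    return out, reads
```

With three rows of four elements and a shift of 1, this yields two shifted rows from three reads: after the initial double read, each further output needs only one new read because the previous row is reused.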
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014164137 | 2014-08-12 | ||
JP2014164137 | 2014-08-12 | ||
PCT/JP2015/072246 WO2016024508A1 (ja) | 2014-08-12 | 2015-08-05 | Multiprocessor device |
Publications (2)
Publication Number | Publication Date |
---|---|
JPWO2016024508A1 (ja) | 2017-06-01 |
JP6551751B2 (ja) | 2019-07-31 |
Family
ID=55304138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2016542545A (Active, JP6551751B2 (ja)) | Multiprocessor device | 2014-08-12 | 2015-08-05 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10754818B2 (ja) |
JP (1) | JP6551751B2 (ja) |
WO (1) | WO2016024508A1 (ja) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019208566A1 (ja) * | 2018-04-24 | 2019-10-31 | ArchiTek株式会社 | Processor device |
JP7476676B2 (ja) * | 2020-06-04 | 2024-05-01 | 富士通株式会社 | Arithmetic processing device |
EP4268177A1 (en) * | 2020-12-23 | 2023-11-01 | Imsys AB | A method and system for rearranging and distributing data of an incoming image for processing by multiple processing clusters |
US20220207148A1 (en) * | 2020-12-26 | 2022-06-30 | Intel Corporation | Hardening branch hardware against speculation vulnerabilities |
CN114553700B (zh) * | 2022-02-24 | 2024-06-28 | 树根互联股份有限公司 | Device grouping method and apparatus, computer device, and storage medium |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH077385B2 (ja) * | 1983-12-23 | 1995-01-30 | 株式会社日立製作所 | Data processing device |
US5790879A (en) * | 1994-06-15 | 1998-08-04 | Wu; Chen-Mie | Pipelined-systolic single-instruction stream multiple-data stream (SIMD) array processing with broadcasting control, and method of operating same |
US5513366A (en) * | 1994-09-28 | 1996-04-30 | International Business Machines Corporation | Method and system for dynamically reconfiguring a register file in a vector processor |
US7100026B2 (en) * | 2001-05-30 | 2006-08-29 | The Massachusetts Institute Of Technology | System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values |
JP3971535B2 (ja) * | 1999-09-10 | 2007-09-05 | 株式会社リコー | SIMD processor |
US6892361B2 (en) * | 2001-07-06 | 2005-05-10 | International Business Machines Corporation | Task composition method for computer applications |
US8041929B2 (en) * | 2006-06-16 | 2011-10-18 | Cisco Technology, Inc. | Techniques for hardware-assisted multi-threaded processing |
JP4801605B2 | 2007-02-28 | 2011-10-26 | 株式会社リコー | SIMD microprocessor |
US7627744B2 (en) | 2007-05-10 | 2009-12-01 | Nvidia Corporation | External memory accessing DMA request scheduling in IC of parallel processing engines according to completion notification queue occupancy level |
JP5049802B2 (ja) * | 2008-01-22 | 2012-10-17 | 株式会社リコー | Image processing device |
US20100115233A1 (en) * | 2008-10-31 | 2010-05-06 | Convey Computer | Dynamically-selectable vector register partitioning |
US8542732B1 (en) * | 2008-12-23 | 2013-09-24 | Elemental Technologies, Inc. | Video encoder using GPU |
US8112551B2 (en) * | 2009-05-07 | 2012-02-07 | Cypress Semiconductor Corporation | Addressing scheme to allow flexible mapping of functions in a programmable logic array |
JP6081300B2 (ja) * | 2013-06-18 | 2017-02-15 | 株式会社東芝 | Information processing device and program |
-
2015
- 2015-08-05 US US15/317,183 patent/US10754818B2/en active Active
- 2015-08-05 WO PCT/JP2015/072246 patent/WO2016024508A1/ja active Application Filing
- 2015-08-05 JP JP2016542545A patent/JP6551751B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
JPWO2016024508A1 (ja) | 2017-06-01 |
US10754818B2 (en) | 2020-08-25 |
WO2016024508A1 (ja) | 2016-02-18 |
US20170116153A1 (en) | 2017-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9830156B2 (en) | Temporal SIMT execution optimization through elimination of redundant operations | |
JP6551751B2 (ja) | Multiprocessor device | |
RU2427895C2 (ru) | Thread-optimized multiprocessor architecture | |
US12020067B2 (en) | Scheduling tasks using targeted pipelines | |
CN103197916A (zh) | Method and apparatus for a source operand collector cache | |
US10268519B2 (en) | Scheduling method and processing device for thread groups execution in a computing system | |
US11500677B2 (en) | Synchronizing scheduling tasks with atomic ALU | |
JP6493088B2 (ja) | Arithmetic processing device and control method for an arithmetic processing device | |
US20240160472A1 (en) | Scheduling tasks using work fullness counter | |
US20180365009A1 (en) | Scheduling tasks | |
US6785743B1 (en) | Template data transfer coprocessor | |
US9477628B2 (en) | Collective communications apparatus and method for parallel systems | |
US20130166887A1 (en) | Data processing apparatus and data processing method | |
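JP4444305B2 (ja) | Semiconductor device | |
JP2023509813A (ja) | SIMT instruction processing method and apparatus | |
JP2015106325A (ja) | Vector register renaming control system, vector processor, and vector register renaming control method | |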
Raju et al. | Performance enhancement of CUDA applications by overlapping data transfer and Kernel execution | |
KR102644951B1 (ko) | Arithmetic logic unit register sequencing | |
US20240220315A1 (en) | Dynamic control of work scheduling | |
US20230084298A1 (en) | Processing Device Using Variable Stride Pattern | |
JP4703735B2 (ja) | Compiler, code generation method, and code generation program | |
Ou et al. | Efficient Statistical Computing on Multicore and MultiGPU Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A821 Effective date: 20161209 |
 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 Effective date: 20180327 |
 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 Effective date: 20180411 |
 | TRDD | Decision of grant or rejection written | |
 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 Effective date: 20190606 |
 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 Effective date: 20190619 |
 | R150 | Certificate of patent or registration of utility model | Ref document number: 6551751 Country of ref document: JP Free format text: JAPANESE INTERMEDIATE CODE: R150 |
 | R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
 | R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |