CN103777924B - Processor architecture and method for simplifying programming single instruction, multiple data within a register
- Publication number
- CN103777924B (application CN201310503908.XA)
- Authority
- CN
- China
- Prior art keywords
- state
- processing
- predicate
- lane
- register
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30018—Bit or string instructions
-
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30065—Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
-
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30072—Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
-
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
- G06F9/30109—Register structure having multiple operands in a single register
-
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
Claims (20)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261717534P | 2012-10-23 | 2012-10-23 | |
US61/717,534 | 2012-10-23 | ||
US13/738,858 US9557993B2 (en) | 2012-10-23 | 2013-01-10 | Processor architecture and method for simplifying programming single instruction, multiple data within a register |
US13/738,858 | 2013-01-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103777924A (zh) | 2014-05-07 |
CN103777924B (zh) | 2018-01-26 |
Family
ID=49328398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310503908.XA Active CN103777924B (zh) | 2012-10-23 | 2013-10-23 | Processor architecture and method for simplifying programming single instruction, multiple data within a register |
Country Status (3)
Country | Link |
---|---|
US (1) | US9557993B2 (zh) |
EP (1) | EP2725484A1 (zh) |
CN (1) | CN103777924B (zh) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9477482B2 (en) * | 2013-09-26 | 2016-10-25 | Nvidia Corporation | System, method, and computer program product for implementing multi-cycle register file bypass |
US9519479B2 (en) * | 2013-11-18 | 2016-12-13 | Globalfoundries Inc. | Techniques for increasing vector processing utilization and efficiency through vector lane predication prediction |
EP3001307B1 (en) * | 2014-09-25 | 2019-11-13 | Intel Corporation | Bit shuffle processors, methods, systems, and instructions |
US9928076B2 (en) * | 2014-09-26 | 2018-03-27 | Intel Corporation | Method and apparatus for unstructured control flow for SIMD execution engine |
GB2540941B (en) | 2015-07-31 | 2017-11-15 | Advanced Risc Mach Ltd | Data processing |
EP3125108A1 (en) * | 2015-07-31 | 2017-02-01 | ARM Limited | Vector processing using loops of dynamic vector length |
EP3125109B1 (en) * | 2015-07-31 | 2019-02-20 | ARM Limited | Vector length querying instruction |
GB2545248B (en) * | 2015-12-10 | 2018-04-04 | Advanced Risc Mach Ltd | Data processing |
US10115175B2 (en) * | 2016-02-19 | 2018-10-30 | Qualcomm Incorporated | Uniform predicates in shaders for graphics processing units |
GB2548604B (en) | 2016-03-23 | 2018-03-21 | Advanced Risc Mach Ltd | Branch instruction |
GB2548602B (en) * | 2016-03-23 | 2019-10-23 | Advanced Risc Mach Ltd | Program loop control |
GB2548603B (en) * | 2016-03-23 | 2018-09-26 | Advanced Risc Mach Ltd | Program loop control |
AR108326A1 (es) | 2016-04-27 | 2018-08-08 | Samumed Llc | Isoquinolin-3-yl carboxamides and preparation and use thereof |
AR108325A1 (es) | 2016-04-27 | 2018-08-08 | Samumed Llc | Isoquinolin-3-yl carboxamides and preparation and use thereof |
CN112214244A (zh) * | 2016-08-05 | 2021-01-12 | 中科寒武纪科技股份有限公司 | Computing device and operating method thereof |
JP2018124877A (ja) * | 2017-02-02 | 2018-08-09 | 富士通株式会社 | Code generation device, code generation method, and code generation program |
US11868804B1 (en) | 2019-11-18 | 2024-01-09 | Groq, Inc. | Processor instruction dispatch configuration |
US11243880B1 (en) | 2017-09-15 | 2022-02-08 | Groq, Inc. | Processor architecture |
US11114138B2 (en) | 2017-09-15 | 2021-09-07 | Groq, Inc. | Data structures with multiple read ports |
US11360934B1 (en) * | 2017-09-15 | 2022-06-14 | Groq, Inc. | Tensor streaming processor architecture |
US11170307B1 (en) | 2017-09-21 | 2021-11-09 | Groq, Inc. | Predictive model compiler for generating a statically scheduled binary with known resource constraints |
US11709681B2 (en) * | 2017-12-11 | 2023-07-25 | Advanced Micro Devices, Inc. | Differential pipeline delays in a coprocessor |
WO2019136454A1 (en) * | 2018-01-08 | 2019-07-11 | Atlazo, Inc. | Compact arithmetic accelerator for data processing devices, systems and methods |
US11488002B2 (en) | 2018-02-15 | 2022-11-01 | Atlazo, Inc. | Binary neural network accelerator engine methods and systems |
US11789734B2 (en) * | 2018-08-30 | 2023-10-17 | Advanced Micro Devices, Inc. | Padded vectorization with compile time known masks |
US11537687B2 (en) | 2018-11-19 | 2022-12-27 | Groq, Inc. | Spatial locality transform of matrices |
US11029960B2 (en) * | 2018-12-07 | 2021-06-08 | Intel Corporation | Apparatus and method for widened SIMD execution within a constrained register file |
US11216281B2 (en) * | 2019-05-14 | 2022-01-04 | International Business Machines Corporation | Facilitating data processing using SIMD reduction operations across SIMD lanes |
US11392316B2 (en) * | 2019-05-24 | 2022-07-19 | Texas Instruments Incorporated | System and method for predication handling |
WO2021035006A1 (en) | 2019-08-20 | 2021-02-25 | Northrop Grumman Systems Corporation | Simd controller and simd predication scheme |
US11269651B2 (en) * | 2019-09-10 | 2022-03-08 | International Business Machines Corporation | Reusing adjacent SIMD unit for fast wide result generation |
FR3100907B1 (fr) * | 2019-09-16 | 2022-12-09 | St Microelectronics Grenoble 2 | Program test |
CN114930351A (zh) | 2019-11-26 | 2022-08-19 | 格罗克公司 | Loading operands and outputting results from a multi-dimensional array using only a single side |
CN111158757B (zh) * | 2019-12-31 | 2021-11-30 | 中昊芯英(杭州)科技有限公司 | Parallel access device and method, and chip |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4763251A (en) * | 1986-01-17 | 1988-08-09 | International Business Machines Corporation | Merge and copy bit block transfer implementation |
US6115808A (en) * | 1998-12-30 | 2000-09-05 | Intel Corporation | Method and apparatus for performing predicate hazard detection |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2273377A (en) | 1992-12-11 | 1994-06-15 | Hughes Aircraft Co | Multiple masks for array processors |
US7039906B1 (en) | 2000-09-29 | 2006-05-02 | International Business Machines Corporation | Compiler for enabling multiple signed independent data elements per register |
US7017032B2 (en) * | 2001-06-11 | 2006-03-21 | Broadcom Corporation | Setting execution conditions |
US7127593B2 (en) | 2001-06-11 | 2006-10-24 | Broadcom Corporation | Conditional execution with multiple destination stores |
US6986025B2 (en) | 2001-06-11 | 2006-01-10 | Broadcom Corporation | Conditional execution per lane |
US7600102B2 (en) * | 2004-06-14 | 2009-10-06 | Broadcom Corporation | Condition bits for controlling branch processing |
US20080016320A1 (en) | 2006-06-27 | 2008-01-17 | Amitabh Menon | Vector Predicates for Sub-Word Parallel Operations |
US7676647B2 (en) | 2006-08-18 | 2010-03-09 | Qualcomm Incorporated | System and method of processing data using scalar/vector instructions |
US8260002B2 (en) | 2008-09-26 | 2012-09-04 | Axis Ab | Video analytics system, computer program product, and associated methodology for efficiently using SIMD operations |
US8401327B2 (en) | 2008-09-26 | 2013-03-19 | Axis Ab | Apparatus, computer program product and associated methodology for video analytics |
GB2470782B (en) * | 2009-06-05 | 2014-10-22 | Advanced Risc Mach Ltd | A data processing apparatus and method for handling vector instructions |
US8478946B2 (en) * | 2009-09-08 | 2013-07-02 | Advanced Micro Devices, Inc. | Method and system for local data sharing |
US8726252B2 (en) * | 2011-01-28 | 2014-05-13 | International Business Machines Corporation | Management of conditional branches within a data parallel system |
US20140189296A1 (en) * | 2011-12-14 | 2014-07-03 | Elmoustapha Ould-Ahmed-Vall | System, apparatus and method for loop remainder mask instruction |
US9860224B2 (en) | 2011-12-15 | 2018-01-02 | Intel Corporation | Systems and methods for secured entry of user authentication data |
US9588766B2 (en) * | 2012-09-28 | 2017-03-07 | Intel Corporation | Accelerated interlane vector reduction instructions |
2013
- 2013-01-10: US application US13/738,858, patent US9557993B2 (en), status: Active
- 2013-10-09: EP application EP13187965.2A, patent EP2725484A1 (en), status: Ceased
- 2013-10-23: CN application CN201310503908.XA, patent CN103777924B (zh), status: Active
Also Published As
Publication number | Publication date |
---|---|
EP2725484A1 (en) | 2014-04-30 |
CN103777924A (zh) | 2014-05-07 |
US9557993B2 (en) | 2017-01-31 |
US20140115301A1 (en) | 2014-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103777924B (zh) | Processor architecture and method for simplifying programming single instruction, multiple data within a register | |
Waidyasooriya et al. | OpenCL-based FPGA-platform for stencil computation and its optimization methodology | |
Demmel et al. | Parallel reproducible summation | |
Severance et al. | Embedded supercomputing in FPGAs with the VectorBlox MXP matrix processor | |
Kakay et al. | Speedup of FEM micromagnetic simulations with graphical processing units | |
DE102018005169A1 (de) | Processors and methods with configurable network-based dataflow operator circuits | |
EP2725485A2 (en) | Memory interconnect network architecture for vector processor | |
Samsi et al. | MATLAB for signal processing on multiprocessors and multicores | |
Waidyasooriya et al. | Highly-parallel FPGA accelerator for simulated quantum annealing | |
Nugteren et al. | Algorithmic species: A classification of affine loop nests for parallel programming | |
Bernaschi et al. | A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units | |
Zapletal et al. | Parallel and vectorized implementation of analytic evaluation of boundary integral operators | |
Lukarski | Parallel sparse linear algebra for multi-core and many-core platforms: Parallel solvers and preconditioners | |
C. Penha et al. | ADD: Accelerator Design and Deploy‐A tool for FPGA high‐performance dataflow computing | |
Armstrong et al. | Parallel processing of spatial statistics | |
Jin et al. | Evaluating floating-point intensive applications on opencl fpga platforms: A case study on the simplemoc kernel | |
Wyrzykowski et al. | Model-driven adaptation of double-precision matrix multiplication to the cell processor architecture | |
Jin et al. | Evaluating LULESH kernels on opencl FPGA | |
Jost et al. | An efficient multi-algorithms sparse linear solver for GPUs | |
CN103777922B (zh) | Predicate counter | |
Lemke et al. | An object-oriented approach for parallel self adaptive mesh refinement on block structured grids | |
Boku et al. | Mixed precision solver scalable to 16000 mpi processes for lattice quantum chromodynamics simulations on the oakforest-pacs system | |
Arabas et al. | PARADE: a massively parallel differential evolution template for EASEA | |
Labaki et al. | The BEM on general purpose graphics processing units (GPGPU): a study on three distinct implementations | |
Jakobs et al. | Performance and energy consumption of a Gram–Schmidt process for vector orthogonalization on a processor integrated GPU |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
ASS | Succession or assignment of patent right | Owner name: ANALOG DEVICES, INC.; Free format text: FORMER OWNER: ANALOG DEVICES TECHNOLOGY COMPANY; Effective date: 20150105 |
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right | Effective date of registration: 20150105; Address after: Bermuda (UK) Hamilton; Applicant after: ANALOG DEVICES GLOBAL; Address before: Bermuda (UK) Hamilton; Applicant before: Analog Devices Global |
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: Limerick; Patentee after: Analog Devices Global Unlimited Co.; Address before: Bermuda (UK) Hamilton; Patentee before: Analog Devices Global |
TR01 | Transfer of patent right | Effective date of registration: 20210728; Address after: Limerick; Patentee after: ANALOG DEVICES INTERNATIONAL UNLIMITED Co.; Address before: Limerick; Patentee before: Analog Devices Global Unlimited Co. |