CN103777924A - Processor architecture and method for simplifying programming single instruction, multiple data within a register - Google Patents
- Publication number
- CN103777924A (application CN201310503908.XA)
- Authority
- CN
- China
- Prior art keywords
- state
- lane
- register
- instruction
- predicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30018—Bit or string instructions
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06F9/3005—Arrangements for executing specific machine instructions to perform operations for flow control
- G06F9/30065—Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
- G06F9/30072—Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
- G06F9/30109—Register structure having multiple operands in a single register
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Claims (20)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261717534P | 2012-10-23 | 2012-10-23 | |
US61/717,534 | 2012-10-23 | ||
US13/738,858 | 2013-01-10 | ||
US13/738,858 US9557993B2 (en) | 2012-10-23 | 2013-01-10 | Processor architecture and method for simplifying programming single instruction, multiple data within a register |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103777924A (zh) | 2014-05-07 |
CN103777924B (zh) | 2018-01-26 |
Family
ID=49328398
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310503908.XA Active CN103777924B (zh) | 2012-10-23 | 2013-10-23 | Processor architecture and method for simplifying programming single instruction, multiple data within a register |
Country Status (3)
Country | Link |
---|---|
US (1) | US9557993B2 (zh) |
EP (1) | EP2725484A1 (zh) |
CN (1) | CN103777924B (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
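CN111158757A (zh) * | 2019-12-31 | 2020-05-15 | 深圳芯英科技有限公司 | Parallel access device and method, and chip |
CN112470139A (zh) * | 2018-01-08 | 2021-03-09 | 阿特拉佐有限公司 | Compact arithmetic accelerator for data processing devices, systems and methods |
CN112506586A (zh) * | 2019-09-16 | 2021-03-16 | 意法半导体(格勒诺布尔2)公司 | Programmable electronic device and method of operating the same |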
US11392316B2 (en) * | 2019-05-24 | 2022-07-19 | Texas Instruments Incorporated | System and method for predication handling |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9477482B2 (en) * | 2013-09-26 | 2016-10-25 | Nvidia Corporation | System, method, and computer program product for implementing multi-cycle register file bypass |
US9519479B2 (en) * | 2013-11-18 | 2016-12-13 | Globalfoundries Inc. | Techniques for increasing vector processing utilization and efficiency through vector lane predication prediction |
EP3001307B1 (en) * | 2014-09-25 | 2019-11-13 | Intel Corporation | Bit shuffle processors, methods, systems, and instructions |
US9928076B2 (en) | 2014-09-26 | 2018-03-27 | Intel Corporation | Method and apparatus for unstructured control flow for SIMD execution engine |
GB2540941B (en) * | 2015-07-31 | 2017-11-15 | Advanced Risc Mach Ltd | Data processing |
EP3125109B1 (en) * | 2015-07-31 | 2019-02-20 | ARM Limited | Vector length querying instruction |
EP3125108A1 (en) * | 2015-07-31 | 2017-02-01 | ARM Limited | Vector processing using loops of dynamic vector length |
GB2545248B (en) * | 2015-12-10 | 2018-04-04 | Advanced Risc Mach Ltd | Data processing |
US10115175B2 (en) * | 2016-02-19 | 2018-10-30 | Qualcomm Incorporated | Uniform predicates in shaders for graphics processing units |
GB2548602B (en) * | 2016-03-23 | 2019-10-23 | Advanced Risc Mach Ltd | Program loop control |
GB2548603B (en) * | 2016-03-23 | 2018-09-26 | Advanced Risc Mach Ltd | Program loop control |
GB2548604B (en) | 2016-03-23 | 2018-03-21 | Advanced Risc Mach Ltd | Branch instruction |
AR108325A1 (es) * | 2016-04-27 | 2018-08-08 | Samumed Llc | Isoquinolin-3-yl carboxamides and preparation and use thereof |
AR108326A1 (es) | 2016-04-27 | 2018-08-08 | Samumed Llc | Isoquinolin-3-yl carboxamides and preparation and use thereof |
CN112214244A (zh) * | 2016-08-05 | 2021-01-12 | 中科寒武纪科技股份有限公司 | Computing device and method of operating the same |
JP2018124877A (ja) * | 2017-02-02 | 2018-08-09 | 富士通株式会社 | Code generation device, code generation method, and code generation program |
US11243880B1 (en) | 2017-09-15 | 2022-02-08 | Groq, Inc. | Processor architecture |
US11360934B1 (en) | 2017-09-15 | 2022-06-14 | Groq, Inc. | Tensor streaming processor architecture |
US11114138B2 (en) | 2017-09-15 | 2021-09-07 | Groq, Inc. | Data structures with multiple read ports |
US11868804B1 (en) | 2019-11-18 | 2024-01-09 | Groq, Inc. | Processor instruction dispatch configuration |
US11170307B1 (en) | 2017-09-21 | 2021-11-09 | Groq, Inc. | Predictive model compiler for generating a statically scheduled binary with known resource constraints |
US11709681B2 (en) * | 2017-12-11 | 2023-07-25 | Advanced Micro Devices, Inc. | Differential pipeline delays in a coprocessor |
US11488002B2 (en) | 2018-02-15 | 2022-11-01 | Atlazo, Inc. | Binary neural network accelerator engine methods and systems |
US11789734B2 (en) * | 2018-08-30 | 2023-10-17 | Advanced Micro Devices, Inc. | Padded vectorization with compile time known masks |
US11204976B2 (en) | 2018-11-19 | 2021-12-21 | Groq, Inc. | Expanded kernel generation |
US11029960B2 (en) * | 2018-12-07 | 2021-06-08 | Intel Corporation | Apparatus and method for widened SIMD execution within a constrained register file |
US11216281B2 (en) * | 2019-05-14 | 2022-01-04 | International Business Machines Corporation | Facilitating data processing using SIMD reduction operations across SIMD lanes |
WO2021035006A1 (en) | 2019-08-20 | 2021-02-25 | Northrop Grumman Systems Corporation | Simd controller and simd predication scheme |
US11269651B2 (en) * | 2019-09-10 | 2022-03-08 | International Business Machines Corporation | Reusing adjacent SIMD unit for fast wide result generation |
WO2021108559A1 (en) | 2019-11-26 | 2021-06-03 | Groq, Inc. | Loading operands and outputting results from a multi-dimensional array using only a single side |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4763251A (en) * | 1986-01-17 | 1988-08-09 | International Business Machines Corporation | Merge and copy bit block transfer implementation |
US6115808A (en) * | 1998-12-30 | 2000-09-05 | Intel Corporation | Method and apparatus for performing predicate hazard detection |
US20020199086A1 (en) * | 2001-06-11 | 2002-12-26 | Broadcom Corporation | Setting execution conditions |
US20050278514A1 (en) * | 2004-06-14 | 2005-12-15 | Broadcom Corporation | Condition bits for controlling branch processing |
US20080016320A1 (en) * | 2006-06-27 | 2008-01-17 | Amitabh Menon | Vector Predicates for Sub-Word Parallel Operations |
US20100312988A1 (en) * | 2009-06-05 | 2010-12-09 | Arm Limited | Data processing apparatus and method for handling vector instructions |
US20110066813A1 (en) * | 2009-09-08 | 2011-03-17 | Advanced Micro Devices, Inc. | Method And System For Local Data Sharing |
US20120198425A1 (en) * | 2011-01-28 | 2012-08-02 | International Business Machines Corporation | Management of conditional branches within a data parallel system |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2273377A (en) | 1992-12-11 | 1994-06-15 | Hughes Aircraft Co | Multiple masks for array processors |
US7039906B1 (en) | 2000-09-29 | 2006-05-02 | International Business Machines Corporation | Compiler for enabling multiple signed independent data elements per register |
US7127593B2 (en) | 2001-06-11 | 2006-10-24 | Broadcom Corporation | Conditional execution with multiple destination stores |
US6986025B2 (en) | 2001-06-11 | 2006-01-10 | Broadcom Corporation | Conditional execution per lane |
US7676647B2 (en) | 2006-08-18 | 2010-03-09 | Qualcomm Incorporated | System and method of processing data using scalar/vector instructions |
US8401327B2 (en) | 2008-09-26 | 2013-03-19 | Axis Ab | Apparatus, computer program product and associated methodology for video analytics |
US8260002B2 (en) | 2008-09-26 | 2012-09-04 | Axis Ab | Video analytics system, computer program product, and associated methodology for efficiently using SIMD operations |
WO2013089707A1 (en) * | 2011-12-14 | 2013-06-20 | Intel Corporation | System, apparatus and method for loop remainder mask instruction |
WO2013089717A1 (en) | 2011-12-15 | 2013-06-20 | Intel Corporation | Systems and methods for secured entry of user authentication data |
US9588766B2 (en) * | 2012-09-28 | 2017-03-07 | Intel Corporation | Accelerated interlane vector reduction instructions |
2013
- 2013-01-10 US: application US13/738,858, patent US9557993B2 (en), active
- 2013-10-09 EP: application EP13187965.2A, patent EP2725484A1 (en), not active (ceased)
- 2013-10-23 CN: application CN201310503908.XA, patent CN103777924B (zh), active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112470139A (zh) * | 2018-01-08 | 2021-03-09 | 阿特拉佐有限公司 | Compact arithmetic accelerator for data processing devices, systems and methods |
CN112470139B (zh) * | 2018-01-08 | 2022-04-08 | 阿特拉佐有限公司 | Compact arithmetic accelerator for data processing devices, systems and methods |
US11392316B2 (en) * | 2019-05-24 | 2022-07-19 | Texas Instruments Incorporated | System and method for predication handling |
CN112506586A (zh) * | 2019-09-16 | 2021-03-16 | 意法半导体(格勒诺布尔2)公司 | Programmable electronic device and method of operating the same |
CN111158757A (zh) * | 2019-12-31 | 2020-05-15 | 深圳芯英科技有限公司 | Parallel access device and method, and chip |
CN111158757B (zh) * | 2019-12-31 | 2021-11-30 | 中昊芯英(杭州)科技有限公司 | Parallel access device and method, and chip |
Also Published As
Publication number | Publication date |
---|---|
CN103777924B (zh) | 2018-01-26 |
EP2725484A1 (en) | 2014-04-30 |
US9557993B2 (en) | 2017-01-31 |
US20140115301A1 (en) | 2014-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103777924A (zh) | | Processor architecture and method for simplifying programming single instruction, multiple data within a register |
US9201828B2 (en) | | Memory interconnect network architecture for vector processor |
Kaeli et al. | | Heterogeneous computing with OpenCL 2.0 |
Waidyasooriya et al. | | Design of FPGA-based computing systems with OpenCL |
CN102640131B (zh) | | Unanimous branch instructions in a parallel thread processor |
DE102018005181A1 (de) | | Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features |
US20190250924A1 (en) | | Efficient work execution in a parallel computing system |
DE102018006735A1 (de) | | Processors and methods for configurable clock gating in a spatial array |
JP5559297B2 (ja) | | Encoding hardware end-loop information into instructions |
TWI490783B (zh) | | Techniques for scalar function vectorization including vectorization annotations and vectorized function signature matching |
TWI733798B (zh) | | Apparatus and method for managing address collisions when performing vector operations |
TWI603262B (zh) | | Packed finite impulse response (FIR) filter processors, methods, systems, and instructions |
CN103777923A (zh) | | DMA vector buffer |
US11789734B2 (en) | | Padded vectorization with compile time known masks |
US11947962B2 (en) | | Replicate partition instruction |
US20110078418A1 (en) | | Support for Non-Local Returns in Parallel Thread SIMD Engine |
KR20180126520A (ko) | | Vector predication instruction |
CN104133748A (zh) | | Method and system for combining corresponding halfword units from multiple register units within a microprocessor |
CN107851016B (zh) | | Vector arithmetic instructions |
C. Penha et al. | | ADD: Accelerator Design and Deploy‐A tool for FPGA high‐performance dataflow computing |
JPH07244589A (ja) | | Computer system and method for solving predicates and Boolean expressions |
Abdelhamid et al. | | A scalable many-core overlay architecture on an HBM2-enabled multi-die FPGA |
CN103777922B (zh) | | Predicate counter |
CN110073332A (zh) | | Vector generating instruction |
Kindratenko et al. | | Accelerating scientific applications with reconfigurable computing: getting started |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
ASS | Succession or assignment of patent right | Owner name: ANALOG DEVICES, INC.; Free format text: FORMER OWNER: ANALOG DEVICES TECHNOLOGY COMPANY; Effective date: 20150105 |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 20150105; Address after: Bermuda (UK) Hamilton; Applicant after: ANALOG DEVICES GLOBAL; Address before: Bermuda (UK) Hamilton; Applicant before: Analog Devices Global |
GR01 | Patent grant | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | |
CP03 | Change of name, title or address | Address after: Limerick; Patentee after: Analog Devices Global Unlimited Co.; Address before: Bermuda (UK) Hamilton; Patentee before: Analog Devices Global |
TR01 | Transfer of patent right | Effective date of registration: 20210728; Address after: Limerick; Patentee after: ANALOG DEVICES INTERNATIONAL UNLIMITED Co.; Address before: Limerick; Patentee before: Analog Devices Global Unlimited Co. |
TR01 | Transfer of patent right | |