CN105453028B - 向量积累方法及设备 - Google Patents

向量积累方法及设备 Download PDF

Info

Publication number
CN105453028B
CN105453028B CN201480043504.XA CN201480043504A CN105453028B CN 105453028 B CN105453028 B CN 105453028B CN 201480043504 A CN201480043504 A CN 201480043504A CN 105453028 B CN105453028 B CN 105453028B
Authority
CN
China
Prior art keywords
output
vector
elements
input
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480043504.XA
Other languages
English (en)
Chinese (zh)
Other versions
CN105453028A (zh
Inventor
阿贾伊·阿南特·英格尔
马克·默里·霍夫曼
迪帕克·马修
曾贸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN105453028A publication Critical patent/CN105453028A/zh
Application granted granted Critical
Publication of CN105453028B publication Critical patent/CN105453028B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30038Instructions to perform operations on packed data, e.g. vector, tile or matrix operations using a mask
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3893Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F9/3895Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F9/3897Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Advance Control (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
CN201480043504.XA 2013-08-14 2014-08-04 向量积累方法及设备 Active CN105453028B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/967,191 2013-08-14
US13/967,191 US20150052330A1 (en) 2013-08-14 2013-08-14 Vector arithmetic reduction
PCT/US2014/049604 WO2015023465A1 (en) 2013-08-14 2014-08-04 Vector accumulation method and apparatus

Publications (2)

Publication Number Publication Date
CN105453028A CN105453028A (zh) 2016-03-30
CN105453028B true CN105453028B (zh) 2019-04-09

Family

ID=51492424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480043504.XA Active CN105453028B (zh) 2013-08-14 2014-08-04 向量积累方法及设备

Country Status (6)

Country Link
US (1) US20150052330A1 (enExample)
EP (1) EP3033670B1 (enExample)
JP (1) JP2016530631A (enExample)
CN (1) CN105453028B (enExample)
TW (1) TWI507982B (enExample)
WO (1) WO2015023465A1 (enExample)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9678715B2 (en) 2014-10-30 2017-06-13 Arm Limited Multi-element comparison and multi-element addition
US20160179530A1 (en) * 2014-12-23 2016-06-23 Elmoustapha Ould-Ahmed-Vall Instruction and logic to perform a vector saturated doubleword/quadword add
US10296342B2 (en) 2016-07-02 2019-05-21 Intel Corporation Systems, apparatuses, and methods for cumulative summation
US10466967B2 (en) 2016-07-29 2019-11-05 Qualcomm Incorporated System and method for piecewise linear approximation
US10108581B1 (en) 2017-04-03 2018-10-23 Google Llc Vector reduction processor
US10331445B2 (en) * 2017-05-24 2019-06-25 Microsoft Technology Licensing, Llc Multifunction vector processor circuits
GB2574817B (en) * 2018-06-18 2021-01-06 Advanced Risc Mach Ltd Data processing systems
US11294670B2 (en) * 2019-03-27 2022-04-05 Intel Corporation Method and apparatus for performing reduction operations on a plurality of associated data element values
CN110807521B (zh) * 2019-10-29 2022-06-24 中昊芯英(杭州)科技有限公司 支持向量运算的处理装置、芯片、电子设备和方法
GB2601466A (en) * 2020-02-10 2022-06-08 Xmos Ltd Rotating accumulator
US20240004647A1 (en) * 2022-07-01 2024-01-04 Andes Technology Corporation Vector processor with vector and element reduction method
US20250208878A1 (en) * 2023-12-20 2025-06-26 Advanced Micro Devices, Inc. Accumulation apertures

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5845112A (en) * 1997-03-06 1998-12-01 Samsung Electronics Co., Ltd. Method for performing dead-zone quantization in a single processor instruction
US20080016321A1 (en) * 2006-07-11 2008-01-17 Pennock James D Interleaved hardware multithreading processor architecture
CN101398753A (zh) * 2007-09-27 2009-04-01 辉达公司 用于执行扫描运算的系统、方法及计算机程序产品
CN101436121A (zh) * 2007-11-15 2009-05-20 辉达公司 用于使用并行处理器架构执行扫描操作的方法及设备
US20100049950A1 (en) * 2008-08-15 2010-02-25 Apple Inc. Running-sum instructions for processing vectors

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4996661A (en) * 1988-10-05 1991-02-26 United Technologies Corporation Single chip complex floating point numeric processor
US5542074A (en) * 1992-10-22 1996-07-30 Maspar Computer Corporation Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount
US5717947A (en) * 1993-03-31 1998-02-10 Motorola, Inc. Data processing system and method thereof
US6058473A (en) * 1993-11-30 2000-05-02 Texas Instruments Incorporated Memory store from a register pair conditional upon a selected status bit
US5727229A (en) * 1996-02-05 1998-03-10 Motorola, Inc. Method and apparatus for moving data in a parallel processor
US6542918B1 (en) * 1996-06-21 2003-04-01 Ramot At Tel Aviv University Ltd. Prefix sums and an application thereof
US5864703A (en) * 1997-10-09 1999-01-26 Mips Technologies, Inc. Method for providing extended precision in SIMD vector arithmetic operations
US6418529B1 (en) * 1998-03-31 2002-07-09 Intel Corporation Apparatus and method for performing intra-add operation
US7395302B2 (en) * 1998-03-31 2008-07-01 Intel Corporation Method and apparatus for performing horizontal addition and subtraction
US6295597B1 (en) * 1998-08-11 2001-09-25 Cray, Inc. Apparatus and method for improved vector processing to support extended-length integer arithmetic
US6192384B1 (en) * 1998-09-14 2001-02-20 The Board Of Trustees Of The Leland Stanford Junior University System and method for performing compound vector operations
US6324638B1 (en) * 1999-03-31 2001-11-27 International Business Machines Corporation Processor having vector processing capability and method for executing a vector instruction in a processor
US7624138B2 (en) * 2001-10-29 2009-11-24 Intel Corporation Method and apparatus for efficient integer transform
US6920545B2 (en) * 2002-01-17 2005-07-19 Raytheon Company Reconfigurable processor with alternately interconnected arithmetic and memory nodes of crossbar switched cluster
US7376812B1 (en) * 2002-05-13 2008-05-20 Tensilica, Inc. Vector co-processor for configurable and extensible processor architecture
US7159099B2 (en) * 2002-06-28 2007-01-02 Motorola, Inc. Streaming vector processor with reconfigurable interconnection switch
US7051186B2 (en) * 2002-08-29 2006-05-23 International Business Machines Corporation Selective bypassing of a multi-port register file
TWI221562B (en) * 2002-12-12 2004-10-01 Chung Shan Inst Of Science C6x_VSP-C6x vector signal processor
US7293056B2 (en) * 2002-12-18 2007-11-06 Intel Corporation Variable width, at least six-way addition/accumulation instructions
US20040193847A1 (en) * 2003-03-31 2004-09-30 Lee Ruby B. Intra-register subword-add instructions
EP1623307B1 (en) * 2003-05-09 2015-07-01 QUALCOMM Incorporated Processor reduction unit for accumulation of multiple operands with or without saturation
TW200504592A (en) * 2003-07-24 2005-02-01 Ind Tech Res Inst Reconfigurable apparatus with high hardware efficiency
US7797363B2 (en) * 2004-04-07 2010-09-14 Sandbridge Technologies, Inc. Processor having parallel vector multiply and reduce operations with sequential semantics
DE102006027181B4 (de) * 2006-06-12 2010-10-14 Universität Augsburg Prozessor mit internem Raster von Ausführungseinheiten
US7725518B1 (en) * 2007-08-08 2010-05-25 Nvidia Corporation Work-efficient parallel prefix sum algorithm for graphics processing units
US7895419B2 (en) * 2008-01-11 2011-02-22 International Business Machines Corporation Rotate then operate on selected bits facility and instructions therefore
US8856492B2 (en) * 2008-05-30 2014-10-07 Nxp B.V. Method for vector processing
US9176735B2 (en) * 2008-11-28 2015-11-03 Intel Corporation Digital signal processor having instruction set with one or more non-linear complex functions
US8595467B2 (en) * 2009-12-29 2013-11-26 International Business Machines Corporation Floating point collect and operate
US8667042B2 (en) * 2010-09-24 2014-03-04 Intel Corporation Functional unit for vector integer multiply add instruction
US8868885B2 (en) * 2010-11-18 2014-10-21 Ceva D.S.P. Ltd. On-the-fly permutation of vector elements for executing successive elemental instructions
WO2012134532A1 (en) * 2011-04-01 2012-10-04 Intel Corporation Vector friendly instruction format and execution thereof
US9760372B2 (en) * 2011-09-01 2017-09-12 Hewlett Packard Enterprise Development Lp Parallel processing in plural processors with result register each performing associative operation on respective column data
US9411583B2 (en) * 2011-12-22 2016-08-09 Intel Corporation Vector instruction for presenting complex conjugates of respective complex numbers
CN104081337B (zh) * 2011-12-23 2017-11-07 英特尔公司 用于响应于单个指令来执行横向部分求和的系统、装置和方法
WO2013095631A1 (en) * 2011-12-23 2013-06-27 Intel Corporation Systems, apparatuses, and methods for performing a butterfly horizontal and cross add or substract in response to a single instruction
US9823924B2 (en) * 2013-01-23 2017-11-21 International Business Machines Corporation Vector element rotate and insert under mask instruction
JP6079433B2 (ja) * 2013-05-23 2017-02-15 富士通株式会社 移動平均処理プログラム、及びプロセッサ

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5845112A (en) * 1997-03-06 1998-12-01 Samsung Electronics Co., Ltd. Method for performing dead-zone quantization in a single processor instruction
US20080016321A1 (en) * 2006-07-11 2008-01-17 Pennock James D Interleaved hardware multithreading processor architecture
CN101398753A (zh) * 2007-09-27 2009-04-01 辉达公司 用于执行扫描运算的系统、方法及计算机程序产品
CN101436121A (zh) * 2007-11-15 2009-05-20 辉达公司 用于使用并行处理器架构执行扫描操作的方法及设备
US20100049950A1 (en) * 2008-08-15 2010-02-25 Apple Inc. Running-sum instructions for processing vectors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Parallel prefix adders";Kostas Vitoroulis;《http://users.encs.concordia.ca/~asim/COEN_6501/Lecture_Notes/Parallel%20prefix%20adders%20presentation.ppt》;20061231;演示幻灯片第1-35页 *

Also Published As

Publication number Publication date
CN105453028A (zh) 2016-03-30
TWI507982B (zh) 2015-11-11
JP2016530631A (ja) 2016-09-29
EP3033670B1 (en) 2019-11-06
WO2015023465A1 (en) 2015-02-19
US20150052330A1 (en) 2015-02-19
TW201519090A (zh) 2015-05-16
EP3033670A1 (en) 2016-06-22

Similar Documents

Publication Publication Date Title
CN105453028B (zh) 向量积累方法及设备
EP3026549B1 (en) Systems and methods of data extraction in a vector processor
CN105027109B (zh) 具有用于提供多模向量处理的可编程数据路径配置的向量处理引擎、以及相关向量处理器、系统和方法
TWI601066B (zh) 具有用於提供多模基-2x蝶形向量處理電路的可程式設計資料路徑的向量處理引擎以及相關的向量處理器、系統和方法
CN105009075B (zh) 具有水平置换的向量间接元素垂直寻址模式
US7962718B2 (en) Methods for performing extended table lookups using SIMD vector permutation instructions that support out-of-range index values
US8713285B2 (en) Address generation unit for accessing a multi-dimensional data structure in a desired pattern
US11372804B2 (en) System and method of loading and replication of sub-vector values
EP2909713B1 (en) Selective coupling of an address line to an element bank of a vector register file
CN100437547C (zh) 具有级联simd结构的数字信号处理器及其信号处理方法
CN112639839A (zh) 神经网络的运算装置及其控制方法
CN101061460B (zh) 用于混移运算的微处理器设备和方法
CN107851013A (zh) 元素大小增加指令
CN108319559A (zh) 用于控制矢量内存存取的数据处理装置及方法
US8843730B2 (en) Executing instruction packet with multiple instructions with same destination by performing logical operation on results of instructions and storing the result to the destination
US20100145993A1 (en) Address Generation Unit Using End Point Patterns to Scan Multi-Dimensional Data Structures
JP6687803B2 (ja) 区分線形近似のためのシステムおよび方法
CN107229446A (zh) 一种音频数据处理器
CN106610817A (zh) 用于采取vliw处理器中的相同执行数据包中的常数扩展槽指定或扩展常数位数的方法
JP2013521577A (ja) 階層型の超長命令パケットを処理するシステムおよび方法
US7441099B2 (en) Configurable SIMD processor instruction specifying index to LUT storing information for different operation and memory location for each processing unit
US20180081803A1 (en) Data storage at contiguous memory addresses
US8046569B2 (en) Processing element having dual control stores to minimize branch latency
WO2021228392A1 (en) Device and method for data processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant