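CN105027109B - Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods - Google Patents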

Info

Publication number
CN105027109B
Authority
CN
China
Prior art keywords
vector
vector processing
data path
input
processing block
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480012332.XA
Other languages
English (en)
Chinese (zh)
Other versions
CN105027109A (zh)
Inventor
R. Khan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-03-13
Filing date
2014-03-07
Publication date
2019-03-08
Application filed by Qualcomm Inc
Publication of CN105027109A
Application granted
Publication of CN105027109B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053 Vector processors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3893 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
    • G06F9/3895 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
    • G06F9/3897 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Advance Control (AREA)
  • Complex Calculations (AREA)
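CN201480012332.XA 2013-03-13 2014-03-07 Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods Active CN105027109B (zh)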

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/798,641 US9495154B2 (en) 2013-03-13 2013-03-13 Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods
US13/798,641 2013-03-13
PCT/US2014/022162 WO2014164367A1 (en) 2013-03-13 2014-03-07 Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods

Publications (2)

Publication Number Publication Date
CN105027109A (zh) 2015-11-04
CN105027109B (zh) 2019-03-08

Family

ID=50442637

Family Applications (1)

Application Number Title Priority Date Filing Date
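CN201480012332.XA Active CN105027109B (zh) 2013-03-13 2014-03-07 Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods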

Country Status (7)

Country Link
US (1) US9495154B2 (en)
EP (1) EP2972968B1 (en)
JP (1) JP6243000B2 (en)
KR (1) KR101735742B1 (en)
CN (1) CN105027109B (en)
BR (1) BR112015022852A2 (en)
WO (1) WO2014164367A1 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9275014B2 (en) 2013-03-13 2016-03-01 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
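JP6102528B2 (ja) * 2013-06-03 2017-03-29 Fujitsu Ltd Signal processing apparatus and signal processing method
CN105431819A (zh) 2013-09-06 2016-03-23 Huawei Technologies Co., Ltd. Method and apparatus for eliminating metastability in an asynchronous processor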
US9684509B2 (en) 2013-11-15 2017-06-20 Qualcomm Incorporated Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
US9792118B2 (en) 2013-11-15 2017-10-17 Qualcomm Incorporated Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US9619227B2 (en) 2013-11-15 2017-04-11 Qualcomm Incorporated Vector processing engines (VPEs) employing tapped-delay line(s) for providing precision correlation / covariance vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US9977676B2 (en) 2013-11-15 2018-05-22 Qualcomm Incorporated Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory, and related vector processor systems and methods
US9880845B2 (en) 2013-11-15 2018-01-30 Qualcomm Incorporated Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods
US11544214B2 (en) * 2015-02-02 2023-01-03 Optimum Semiconductor Technologies, Inc. Monolithic vector processor configured to operate on variable length vectors using a vector length register
US10162632B1 (en) * 2016-05-27 2018-12-25 Cadence Design Systems, Inc. System and method for a low-power processing architecture
FI3812900T3 (fi) * 2016-12-31 2024-02-09 Intel Corp Systems, methods, and apparatuses for heterogeneous computing
US20180217838A1 (en) * 2017-02-01 2018-08-02 Futurewei Technologies, Inc. Ultra lean vector processor
US11277455B2 (en) 2018-06-07 2022-03-15 Mellanox Technologies, Ltd. Streaming system
US20200106828A1 (en) * 2018-10-02 2020-04-02 Mellanox Technologies, Ltd. Parallel Computation Network Device
US10831507B2 (en) 2018-11-21 2020-11-10 SambaNova Systems, Inc. Configuration load of a reconfigurable data processor
US11188497B2 (en) 2018-11-21 2021-11-30 SambaNova Systems, Inc. Configuration unload of a reconfigurable data processor
US11625393B2 (en) 2019-02-19 2023-04-11 Mellanox Technologies, Ltd. High performance computing system
EP3699770B1 (en) 2019-02-25 2025-05-21 Mellanox Technologies, Ltd. Collective communication system and methods
US11455368B2 (en) 2019-10-02 2022-09-27 Flex Logix Technologies, Inc. MAC processing pipeline having conversion circuitry, and methods of operating same
US12015428B2 (en) 2019-11-05 2024-06-18 Flex Logix Technologies, Inc. MAC processing pipeline using filter weights having enhanced dynamic range, and methods of operating same
US11750699B2 (en) 2020-01-15 2023-09-05 Mellanox Technologies, Ltd. Small message aggregation
US11252027B2 (en) 2020-01-23 2022-02-15 Mellanox Technologies, Ltd. Network element supporting flexible data reduction operations
US12282748B1 (en) 2020-05-26 2025-04-22 Analog Devices, Inc. Coarse floating point accumulator circuit, and MAC processing pipelines including same
US11876885B2 (en) 2020-07-02 2024-01-16 Mellanox Technologies, Ltd. Clock queue with arming and/or self-arming features
GB2597708B (en) * 2020-07-30 2022-11-02 Advanced Risc Mach Ltd Vector processing
WO2022039914A1 (en) * 2020-08-20 2022-02-24 Flex Logix Technologies, Inc. Configurable mac pipelines for finite-impulse-response filtering, and methods of operating same
US11556378B2 (en) 2020-12-14 2023-01-17 Mellanox Technologies, Ltd. Offloading execution of a multi-task parameter-dependent operation to a network device
US12455723B2 (en) 2021-02-02 2025-10-28 Analog Devices, Inc. MAC processing pipeline having activation circuitry, and methods of operating same
US12461713B2 (en) 2021-03-03 2025-11-04 Analog Devices, Inc. MAC processing pipelines, circuitry to configure same, and methods of operating same
US11327771B1 (en) 2021-07-16 2022-05-10 SambaNova Systems, Inc. Defect repair circuits for a reconfigurable data processor
US11556494B1 (en) 2021-07-16 2023-01-17 SambaNova Systems, Inc. Defect repair for a reconfigurable data processor for homogeneous subarrays
US11409540B1 (en) 2021-07-16 2022-08-09 SambaNova Systems, Inc. Routing circuits for defect repair for a reconfigurable data processor
US12309070B2 (en) 2022-04-07 2025-05-20 Nvidia Corporation In-network message aggregation for efficient small message transport
US11922237B1 (en) 2022-09-12 2024-03-05 Mellanox Technologies, Ltd. Single-step collective operations
US12242853B1 (en) * 2022-09-30 2025-03-04 Amazon Technologies, Inc. Configurable vector compute engine
US12271732B1 (en) 2022-09-30 2025-04-08 Amazon Technologies, Inc. Configuration of a deep vector engine using an opcode table, control table, and datapath table
US12489657B2 (en) 2023-08-17 2025-12-02 Mellanox Technologies, Ltd. In-network compute operation spreading

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5499375A (en) * 1993-06-03 1996-03-12 Texas Instruments Incorporated Feedback register configuration for a synchronous vector processor employing delayed and non-delayed algorithms
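CN1413326A (zh) * 1999-10-25 2003-04-23 Intel Corporation Method and apparatus for saturated multiplication and accumulation in an application-specific signal processor
CN1666187A (zh) * 2002-06-28 2005-09-07 Motorola, Inc. Reconfigurable streaming vector processor
CN101359284A (zh) * 2006-02-06 2009-02-04 VIA Technologies, Inc. Multiply-accumulate unit for processing multiple different data formats, and method thereof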

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0681236B1 (en) 1994-05-05 2000-11-22 Conexant Systems, Inc. Space vector data path
US5805875A (en) 1996-09-13 1998-09-08 International Computer Science Institute Vector processing system with multi-operation, run-time configurable pipelines
US6006245A (en) 1996-12-20 1999-12-21 Compaq Computer Corporation Enhanced fast fourier transform technique on vector processor with operand routing and slot-selectable operation
WO1999045462A1 (de) 1998-03-03 1999-09-10 Siemens Aktiengesellschaft Data path for signal processing processors
JP3940542B2 (ja) 2000-03-13 2007-07-04 Renesas Technology Corp. Data processor and data processing system
JP2003016051A (ja) 2001-06-29 2003-01-17 Nec Corp Complex vector arithmetic processor
US6986021B2 (en) 2001-11-30 2006-01-10 Quick Silver Technology, Inc. Apparatus, method, system and executable module for configuration and operation of adaptive integrated circuitry having fixed, application specific computational elements
US6922771B2 (en) * 2002-04-24 2005-07-26 Portalplayer, Inc. Vector floating point unit
AU2003286131A1 (en) 2002-08-07 2004-03-19 Pact Xpp Technologies Ag Method and device for processing data
US20040193837A1 (en) 2003-03-31 2004-09-30 Patrick Devaney CPU datapaths and local memory that executes either vector or superscalar instructions
US7272751B2 (en) 2004-01-15 2007-09-18 International Business Machines Corporation Error detection during processor idle cycles
KR100985110B1 (ko) 2004-01-28 2010-10-05 Samsung Electronics Co., Ltd. 4:2 CSA cell with a simple structure and 4:2 carry-save addition method
JP4477959B2 (ja) * 2004-07-26 2010-06-09 RIKEN Arithmetic processing apparatus for broadcast-type parallel processing
US7299342B2 (en) 2005-05-24 2007-11-20 Coresonic Ab Complex vector executing clustered SIMD micro-architecture DSP with accelerator coupled complex ALU paths each further including short multiplier/accumulator using two's complement
US20070106718A1 (en) 2005-11-04 2007-05-10 Shum Hoi L Fast fourier transform on a single-instruction-stream, multiple-data-stream processor
US7519646B2 (en) 2006-10-26 2009-04-14 Intel Corporation Reconfigurable SIMD vector processing system
US8051123B1 (en) 2006-12-15 2011-11-01 Nvidia Corporation Multipurpose functional unit with double-precision and filtering operations
DE102007014808A1 (de) 2007-03-28 2008-10-02 Texas Instruments Deutschland Gmbh Multiplier and multiplier-and-adder unit
JP5116499B2 (ja) * 2008-01-31 2013-01-09 Sanyo Electric Co., Ltd. Arithmetic processing circuit
US8320478B2 (en) 2008-12-19 2012-11-27 Entropic Communications, Inc. System and method for generating a signal with a random low peak to average power ratio waveform for an orthogonal frequency division multiplexing system
US20110072236A1 (en) 2009-09-20 2011-03-24 Mimar Tibet Method for efficient and parallel color space conversion in a programmable processor
CN102768654A (zh) 2011-05-05 2012-11-07 ZTE Corporation Device with FFT radix-2 butterfly operation processing capability, and method for implementing the operation
DE102011108576A1 (de) 2011-07-27 2013-01-31 Texas Instruments Deutschland Gmbh Self-timed multiplier unit
US20130339649A1 (en) * 2012-06-15 2013-12-19 Intel Corporation Single instruction multiple data (simd) reconfigurable vector register file and permutation unit
US9275014B2 (en) 2013-03-13 2016-03-01 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
US20140280407A1 (en) 2013-03-13 2014-09-18 Qualcomm Incorporated Vector processing carry-save accumulators employing redundant carry-save format to reduce carry propagation, and related vector processors, systems, and methods
US9684509B2 (en) 2013-11-15 2017-06-20 Qualcomm Incorporated Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
US9880845B2 (en) 2013-11-15 2018-01-30 Qualcomm Incorporated Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations, and related vector processor systems and methods
US9792118B2 (en) 2013-11-15 2017-10-17 Qualcomm Incorporated Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US20150143076A1 (en) 2013-11-15 2015-05-21 Qualcomm Incorporated VECTOR PROCESSING ENGINES (VPEs) EMPLOYING DESPREADING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT DESPREADING OF SPREAD-SPECTRUM SEQUENCES, AND RELATED VECTOR PROCESSING INSTRUCTIONS, SYSTEMS, AND METHODS
US9619227B2 (en) 2013-11-15 2017-04-11 Qualcomm Incorporated Vector processing engines (VPEs) employing tapped-delay line(s) for providing precision correlation / covariance vector processing operations with reduced sample re-fetching and power consumption, and related vector processor systems and methods
US9977676B2 (en) 2013-11-15 2018-05-22 Qualcomm Incorporated Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory, and related vector processor systems and methods

Also Published As

Publication number Publication date
US20140281370A1 (en) 2014-09-18
JP6243000B2 (ja) 2017-12-06
KR20150132258A (ko) 2015-11-25
CN105027109A (zh) 2015-11-04
EP2972968B1 (en) 2021-11-24
KR101735742B1 (ko) 2017-05-15
EP2972968A1 (en) 2016-01-20
WO2014164367A1 (en) 2014-10-09
JP2016517570A (ja) 2016-06-16
US9495154B2 (en) 2016-11-15
BR112015022852A2 (pt) 2017-07-18

Similar Documents

Publication Publication Date Title
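CN105027109B (zh) Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods
CN104969215B (zh) Vector processing engines having programmable data paths for providing butterfly vector processing circuits, and related vector processors, systems, and methods
JP6373991B2 (ja) Vector processing engines employing tapped-delay lines for filter vector processing operations, and related vector processing systems and methods
JP6339197B2 (ja) Vector processing engines having merging circuitry between execution units and vector data memory, and related methods
CN105765523B (zh) Vector processing engines employing reordering circuitry in data flow paths between vector data memory and execution units, and related methods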
US7568086B2 (en) Cache for instruction set architecture using indexes to achieve compression
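JP2016537724A (ja) Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format conversion of input vector data to execution units for vector processing operations, and related vector processing systems and methods
JP2016537723A (ja) Vector processing engines employing tapped-delay lines for filter vector processing operations, and related vector processing systems and methods
JP2016537725A (ja) Vector processing engines employing despreading circuitry in data flow paths between execution units and vector data memory, and related methods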
US20140280407A1 (en) Vector processing carry-save accumulators employing redundant carry-save format to reduce carry propagation, and related vector processors, systems, and methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant