TWI276972B - Efficient multiplication of small matrices using SIMD registers

Info

Publication number
TWI276972B
Authority
TW
Taiwan
Prior art keywords
matrix
diagonal
multiplier
row
multiplicand
Prior art date
Application number
TW092131106A
Other languages
English (en)
Chinese (zh)
Other versions
TW200413947A (en)
Inventor
William Macy Jr
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of TW200413947A
Application granted
Publication of TWI276972B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
TW092131106A 2002-12-20 2003-11-06 Efficient multiplication of small matrices using SIMD registers TWI276972B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/327,445 US20040122887A1 (en) 2002-12-20 2002-12-20 Efficient multiplication of small matrices using SIMD registers

Publications (2)

Publication Number Publication Date
TW200413947A (en) 2004-08-01
TWI276972B (en) 2007-03-21

Family

ID=32594254

Family Applications (1)

Application Number Title Priority Date Filing Date
TW092131106A TWI276972B (en) 2002-12-20 2003-11-06 Efficient multiplication of small matrices using SIMD registers

Country Status (8)

Country Link
US (1) US20040122887A1 (de)
CN (1) CN1774709A (de)
AU (1) AU2003291170A1 (de)
DE (1) DE10393918T5 (de)
GB (1) GB2410108B (de)
HK (1) HK1074504A1 (de)
TW (1) TWI276972B (de)
WO (1) WO2004061705A2 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8671268B2 (en) 2005-05-05 2014-03-11 Icera, Inc. Apparatus and method for configurable processing

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050071405A1 (en) * 2003-09-29 2005-03-31 International Business Machines Corporation Method and structure for producing high performance linear algebra routines using level 3 prefetching for kernel routines
EP2011018B1 (en) 2006-04-12 2016-07-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US7844352B2 (en) * 2006-10-20 2010-11-30 Lehigh University Iterative matrix processor based implementation of real-time model predictive control
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi-threaded architecture supporting various context switch modes and virtualization schemes
WO2008126041A1 (en) * 2007-04-16 2008-10-23 Nxp B.V. Method of storing data, method of loading data and signal processor
US8533251B2 (en) 2008-05-23 2013-09-10 International Business Machines Corporation Optimized corner turns for local storage and bandwidth reduction
US8250130B2 (en) * 2008-05-30 2012-08-21 International Business Machines Corporation Reducing bandwidth requirements for matrix multiplication
WO2012037491A2 (en) 2010-09-17 2012-03-22 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN108376097B (en) 2011-03-25 2022-04-15 英特尔公司 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
TWI533129B (en) 2011-03-25 2016-05-11 軟體機器公司 Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
TWI666551B (en) 2011-05-20 2019-07-21 美商英特爾股份有限公司 Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
EP2710480B1 (en) 2011-05-20 2018-06-20 Intel Corporation An interconnect structure to support the execution of instruction sequences by a plurality of engines
CN102446160B (en) * 2011-09-06 2015-02-18 中国人民解放军国防科学技术大学 Matrix multiplication implementation method for double-precision SIMD components
CN104040491B (en) 2011-11-22 2018-06-12 英特尔公司 A microprocessor accelerated code optimizer
IN2014CN03678A (de) 2011-11-22 2015-09-25 Soft Machines Inc
US9960917B2 (en) * 2011-12-22 2018-05-01 Intel Corporation Matrix multiply accumulate instruction
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
WO2014151018A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing multithreaded instructions grouped onto blocks
WO2014151043A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9384168B2 (en) 2013-06-11 2016-07-05 Analog Devices Global Vector matrix product accelerator for microprocessor integration
US9426434B1 (en) * 2014-04-21 2016-08-23 Ambarella, Inc. Two-dimensional transformation with minimum buffering
US20170046153A1 (en) * 2015-08-14 2017-02-16 Qualcomm Incorporated Simd multiply and horizontal reduce operations
US9870341B2 (en) * 2016-03-18 2018-01-16 Qualcomm Incorporated Memory reduction method for fixed point matrix multiply
CN109074845B (en) * 2016-03-23 2023-07-14 Gsi 科技公司 In-memory matrix multiplication and its usage in neural networks
CN107315574B (en) * 2016-04-26 2021-01-01 安徽寒武纪信息科技有限公司 Apparatus and method for executing matrix multiplication operation
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
US10275243B2 (en) 2016-07-02 2019-04-30 Intel Corporation Interruptible and restartable matrix multiplication instructions, processors, methods, and systems
JP6786948B2 (en) * 2016-08-12 2020-11-18 富士通株式会社 Arithmetic processing device and control method of arithmetic processing device
US20180113840A1 (en) * 2016-10-25 2018-04-26 Wisconsin Alumni Research Foundation Matrix Processor with Localized Memory
US10528321B2 (en) * 2016-12-07 2020-01-07 Microsoft Technology Licensing, Llc Block floating point for neural network implementations
US10489480B2 (en) * 2017-01-22 2019-11-26 Gsi Technology Inc. Sparse matrix multiplication in associative memory device
US10817587B2 (en) * 2017-02-28 2020-10-27 Texas Instruments Incorporated Reconfigurable matrix multiplier system and method
BR112019022916A2 (en) 2017-05-17 2020-05-26 Google Llc Low latency matrix multiply unit
GB2563878B (en) 2017-06-28 2019-11-20 Advanced Risc Mach Ltd Register-based matrix multiplication
US10534838B2 (en) * 2017-09-29 2020-01-14 Intel Corporation Bit matrix multiplication
US10346163B2 (en) * 2017-11-01 2019-07-09 Apple Inc. Matrix computation engine
CN109871236A (en) * 2017-12-01 2019-06-11 超威半导体公司 Stream processor with low power parallel matrix multiply pipeline
US11093580B2 (en) * 2018-10-31 2021-08-17 Advanced Micro Devices, Inc. Matrix multiplier with submatrix sequencing
KR20200082617A (en) * 2018-12-31 2020-07-08 삼성전자주식회사 Calculation method using memory device and memory device performing the same
US10872038B1 (en) * 2019-09-30 2020-12-22 Facebook, Inc. Memory organization for matrix processing
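CN110780849B (en) * 2019-10-29 2021-11-30 中昊芯英(杭州)科技有限公司 Matrix processing method, apparatus, device and computer-readable storage medium
CN113536220A (en) * 2020-04-21 2021-10-22 中科寒武纪科技股份有限公司 Operation method, processor and related products
CN112433760B (en) * 2020-11-27 2022-09-23 海光信息技术股份有限公司 Data sorting method and data sorting circuit
CN114090956B (en) * 2021-11-18 2024-05-10 深圳市比昂芯科技有限公司 Matrix data processing method, apparatus, device and storage medium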

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5170370A (en) * 1989-11-17 1992-12-08 Cray Research, Inc. Vector bit-matrix multiply functional unit
US6115812A (en) * 1998-04-01 2000-09-05 Intel Corporation Method and apparatus for efficient vertical SIMD computations
JP2003242133A (en) * 2002-02-19 2003-08-29 Matsushita Electric Ind Co Ltd Matrix arithmetic device
US20040047466A1 (en) * 2002-09-06 2004-03-11 Joel Feldman Advanced encryption standard hardware accelerator and method

Also Published As

Publication number Publication date
GB2410108A (en) 2005-07-20
TW200413947A (en) 2004-08-01
AU2003291170A1 (en) 2004-07-29
HK1074504A1 (en) 2005-11-11
GB2410108B (en) 2006-09-13
WO2004061705A3 (en) 2005-08-11
GB0508682D0 (en) 2005-06-08
US20040122887A1 (en) 2004-06-24
WO2004061705A2 (en) 2004-07-22
DE10393918T5 (de) 2006-03-16
CN1774709A (zh) 2006-05-17

Similar Documents

Publication Publication Date Title
TWI276972B (en) Efficient multiplication of small matrices using SIMD registers
US8984043B2 (en) Multiplying and adding matrices
JP4064989B2 (en) Apparatus for performing multiply-add operations on packed data
JP4750157B2 (en) Method and apparatus for parallel shift right merge of data
US6385634B1 (en) Method for performing multiply-add operations on packed data
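EP1302848B1 (en) A microprocessor having a multiply operation
JP7454377B2 (en) Widening arithmetic in a data processing apparatus
CN114391135A (en) Methods for performing processing-in-memory operations on serially allocated data, and related memory devices and systems
JP2020533691A (en) Efficient direct convolution using SIMD instructions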
US20020010730A1 (en) Accelerated montgomery exponentiation using plural multipliers
US20030084082A1 (en) Apparatus and method for efficient filtering and convolution of content data
TWI325571B (en) Systems and methods of indexed load and store operations in a dual-mode computer processor
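JP2001527673A (en) Improved apparatus and method for modular multiplication and exponentiation based on Montgomery multiplication
JP2009169935A (en) System, method, and computer program product for performing a scan operation on a sequence of single-bit values using a parallel processor architecture
BR9612911B1 (en) Apparatus and method for performing multiply-add operations on packed data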
US20120131308A1 (en) System, device, and method for on-the-fly permutations of vector memories for executing intra-vector operations
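JP4349265B2 (en) Processor
JP2020508512A (en) Multiply-accumulation in a data processing apparatus
TWI780116B (en) Element-by-element vector operations for a data processing apparatus, method, computer-readable storage medium and virtual machine
WO2014101632A1 (en) Data processing method based on Montgomery modular multiplication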
TWI243332B (en) Registers for 2-D matrix processing
JP2011141823A (en) Data processing device and parallel arithmetic device
Shahbahrami et al. Matrix register file and extended subwords: two techniques for embedded media processors
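JP7020555B2 (en) Information processing device, information processing method, and program
TWI773783B (en) Apparatus, method, integrated circuit, computer program, and computer-readable storage medium for register-based complex number processing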

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees