TWI276972B - Efficient multiplication of small matrices using SIMD registers - Google Patents
- Publication number: TWI276972B
- Application number: TW092131106A
- Authority: TW (Taiwan)
- Prior art keywords: matrix, diagonal, multiplier, row, multiplicand
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Complex Calculations (AREA)
- Executing Machine-Instructions (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/327,445 US20040122887A1 (en) | 2002-12-20 | 2002-12-20 | Efficient multiplication of small matrices using SIMD registers |
Publications (2)
Publication Number | Publication Date |
---|---|
TW200413947A (en) | 2004-08-01 |
TWI276972B (en) | 2007-03-21 |
Family
ID=32594254
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW092131106A TWI276972B (en) | 2002-12-20 | 2003-11-06 | Efficient multiplication of small matrices using SIMD registers |
Country Status (8)
Country | Link |
---|---|
US (1) | US20040122887A1 (de) |
CN (1) | CN1774709A (de) |
AU (1) | AU2003291170A1 (de) |
DE (1) | DE10393918T5 (de) |
GB (1) | GB2410108B (de) |
HK (1) | HK1074504A1 (de) |
TW (1) | TWI276972B (de) |
WO (1) | WO2004061705A2 (de) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8671268B2 (en) | 2005-05-05 | 2014-03-11 | Icera, Inc. | Apparatus and method for configurable processing |
Families Citing this family (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050071405A1 (en) * | 2003-09-29 | 2005-03-31 | International Business Machines Corporation | Method and structure for producing high performance linear algebra routines using level 3 prefetching for kernel routines |
EP2011018B1 (de) | 2006-04-12 | 2016-07-13 | Soft Machines, Inc. | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US7844352B2 (en) * | 2006-10-20 | 2010-11-30 | Lehigh University | Iterative matrix processor based implementation of real-time model predictive control |
EP2527972A3 (de) | 2006-11-14 | 2014-08-06 | Soft Machines, Inc. | Apparatus and method for processing complex instruction formats in a multi-threaded architecture supporting various context switch modes and virtualization schemes |
WO2008126041A1 (en) * | 2007-04-16 | 2008-10-23 | Nxp B.V. | Method of storing data, method of loading data and signal processor |
US8533251B2 (en) | 2008-05-23 | 2013-09-10 | International Business Machines Corporation | Optimized corner turns for local storage and bandwidth reduction |
US8250130B2 (en) * | 2008-05-30 | 2012-08-21 | International Business Machines Corporation | Reducing bandwidth requirements for matrix multiplication |
WO2012037491A2 (en) | 2010-09-17 | 2012-03-22 | Soft Machines, Inc. | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
EP2689326B1 (de) | 2011-03-25 | 2022-11-16 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN108376097B (zh) | 2011-03-25 | 2022-04-15 | 英特尔公司 | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
TWI533129B (zh) | 2011-03-25 | 2016-05-11 | 軟體機器公司 | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
TWI666551B (zh) | 2011-05-20 | 2019-07-21 | 美商英特爾股份有限公司 | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
EP2710480B1 (de) | 2011-05-20 | 2018-06-20 | Intel Corporation | Interconnect structure to support the execution of instruction sequences by a plurality of engines |
CN102446160B (zh) * | 2011-09-06 | 2015-02-18 | 中国人民解放军国防科学技术大学 | Matrix multiplication implementation method for double-precision SIMD components |
CN104040491B (zh) | 2011-11-22 | 2018-06-12 | 英特尔公司 | Microprocessor accelerated code optimizer |
IN2014CN03678A (de) | 2011-11-22 | 2015-09-25 | Soft Machines Inc | |
US9960917B2 (en) * | 2011-12-22 | 2018-05-01 | Intel Corporation | Matrix multiply accumulate instruction |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
WO2014151018A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for executing multithreaded instructions grouped onto blocks |
WO2014151043A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9384168B2 (en) | 2013-06-11 | 2016-07-05 | Analog Devices Global | Vector matrix product accelerator for microprocessor integration |
US9426434B1 (en) * | 2014-04-21 | 2016-08-23 | Ambarella, Inc. | Two-dimensional transformation with minimum buffering |
US20170046153A1 (en) * | 2015-08-14 | 2017-02-16 | Qualcomm Incorporated | Simd multiply and horizontal reduce operations |
US9870341B2 (en) * | 2016-03-18 | 2018-01-16 | Qualcomm Incorporated | Memory reduction method for fixed point matrix multiply |
CN109074845B (zh) * | 2016-03-23 | 2023-07-14 | Gsi 科技公司 | In-memory matrix multiplication and its use in neural networks |
CN107315574B (zh) * | 2016-04-26 | 2021-01-01 | 安徽寒武纪信息科技有限公司 | Apparatus and method for performing matrix multiplication operations |
US20170344876A1 (en) * | 2016-05-31 | 2017-11-30 | Samsung Electronics Co., Ltd. | Efficient sparse parallel winograd-based convolution scheme |
US10275243B2 (en) | 2016-07-02 | 2019-04-30 | Intel Corporation | Interruptible and restartable matrix multiplication instructions, processors, methods, and systems |
JP6786948B2 (ja) * | 2016-08-12 | 2020-11-18 | 富士通株式会社 | Arithmetic processing device and control method for arithmetic processing device |
US20180113840A1 (en) * | 2016-10-25 | 2018-04-26 | Wisconsin Alumni Research Foundation | Matrix Processor with Localized Memory |
US10528321B2 (en) * | 2016-12-07 | 2020-01-07 | Microsoft Technology Licensing, Llc | Block floating point for neural network implementations |
US10489480B2 (en) * | 2017-01-22 | 2019-11-26 | Gsi Technology Inc. | Sparse matrix multiplication in associative memory device |
US10817587B2 (en) * | 2017-02-28 | 2020-10-27 | Texas Instruments Incorporated | Reconfigurable matrix multiplier system and method |
BR112019022916A2 (pt) | 2017-05-17 | 2020-05-26 | Google Llc | Low latency matrix multiply unit |
GB2563878B (en) | 2017-06-28 | 2019-11-20 | Advanced Risc Mach Ltd | Register-based matrix multiplication |
US10534838B2 (en) * | 2017-09-29 | 2020-01-14 | Intel Corporation | Bit matrix multiplication |
US10346163B2 (en) * | 2017-11-01 | 2019-07-09 | Apple Inc. | Matrix computation engine |
CN109871236A (zh) * | 2017-12-01 | 2019-06-11 | 超威半导体公司 | Stream processor with low-power parallel matrix multiply pipeline |
US11093580B2 (en) * | 2018-10-31 | 2021-08-17 | Advanced Micro Devices, Inc. | Matrix multiplier with submatrix sequencing |
KR20200082617A (ko) * | 2018-12-31 | 2020-07-08 | 삼성전자주식회사 | Calculation method using a memory device and memory device performing the same |
US10872038B1 (en) * | 2019-09-30 | 2020-12-22 | Facebook, Inc. | Memory organization for matrix processing |
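CN110780849B (zh) * | 2019-10-29 | 2021-11-30 | 中昊芯英(杭州)科技有限公司 | Matrix processing method, apparatus, device, and computer-readable storage medium |
CN113536220A (zh) * | 2020-04-21 | 2021-10-22 | 中科寒武纪科技股份有限公司 | Operation method, processor, and related products |
CN112433760B (zh) * | 2020-11-27 | 2022-09-23 | 海光信息技术股份有限公司 | Data sorting method and data sorting circuit |
CN114090956B (zh) * | 2021-11-18 | 2024-05-10 | 深圳市比昂芯科技有限公司 | Matrix data processing method, apparatus, device, and storage medium |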
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5170370A (en) * | 1989-11-17 | 1992-12-08 | Cray Research, Inc. | Vector bit-matrix multiply functional unit |
US6115812A (en) * | 1998-04-01 | 2000-09-05 | Intel Corporation | Method and apparatus for efficient vertical SIMD computations |
JP2003242133A (ja) * | 2002-02-19 | 2003-08-29 | Matsushita Electric Ind Co Ltd | Matrix arithmetic device |
US20040047466A1 (en) * | 2002-09-06 | 2004-03-11 | Joel Feldman | Advanced encryption standard hardware accelerator and method |
- 2002-12-20: US application US10/327,445, published as US20040122887A1; status: not active (Abandoned)
- 2003-11-06: TW application TW092131106A, published as TWI276972B; status: not active (IP Right Cessation)
- 2003-11-21: DE application DE10393918T, published as DE10393918T5; status: not active (Ceased)
- 2003-11-21: WO application PCT/US2003/037564, published as WO2004061705A2; status: not active (Application Discontinuation)
- 2003-11-21: GB application GB0508682A, published as GB2410108B; status: not active (Expired - Fee Related)
- 2003-11-21: AU application AU2003291170A, published as AU2003291170A1; status: not active (Abandoned)
- 2003-11-21: CN application CNA2003801070957A, published as CN1774709A; status: active (Pending)
- 2005-07-23: HK application HK05106291A, published as HK1074504A1; status: not active (IP Right Cessation)
Also Published As
Publication number | Publication date |
---|---|
GB2410108A (en) | 2005-07-20 |
TW200413947A (en) | 2004-08-01 |
AU2003291170A1 (en) | 2004-07-29 |
HK1074504A1 (en) | 2005-11-11 |
GB2410108B (en) | 2006-09-13 |
WO2004061705A3 (en) | 2005-08-11 |
GB0508682D0 (en) | 2005-06-08 |
US20040122887A1 (en) | 2004-06-24 |
WO2004061705A2 (en) | 2004-07-22 |
DE10393918T5 (de) | 2006-03-16 |
CN1774709A (zh) | 2006-05-17 |
Similar Documents
Publication | Title |
---|---|
TWI276972B (en) | Efficient multiplication of small matrices using SIMD registers |
US8984043B2 (en) | Multiplying and adding matrices |
JP4064989B2 (ja) | Device for performing multiply-add operations on packed data |
JP4750157B2 (ja) | Method and apparatus for parallel shift right merge of data |
US6385634B1 (en) | Method for performing multiply-add operations on packed data |
EP1302848B1 (de) | A microprocessor having a multiply operation |
JP7454377B2 (ja) | Widening arithmetic in a data processing apparatus |
CN114391135A (zh) | Method for performing in-memory processing operations on contiguously allocated data, and related memory devices and systems |
JP2020533691A (ja) | Efficient direct convolution using SIMD instructions |
US20020010730A1 (en) | Accelerated montgomery exponentiation using plural multipliers |
US20030084082A1 (en) | Apparatus and method for efficient filtering and convolution of content data |
TWI325571B (en) | Systems and methods of indexed load and store operations in a dual-mode computer processor |
JP2001527673A (ja) | Improved apparatus and method for modular multiplication and exponentiation based on Montgomery multiplication |
JP2009169935A (ja) | System, method, and computer program product for performing a scan operation on a sequence of single-bit values using a parallel processor architecture |
BR9612911B1 (pt) | Apparatus and method for performing multiply-add operations on packed data |
US20120131308A1 (en) | System, device, and method for on-the-fly permutations of vector memories for executing intra-vector operations |
JP4349265B2 (ja) | Processor |
JP2020508512A (ja) | Multiply-accumulation in a data processing apparatus |
TWI780116B (zh) | Vector-by-element operations for a data processing apparatus, method, computer-readable storage medium, and virtual machine |
WO2014101632A1 (zh) | Data processing method based on Montgomery modular multiplication |
TWI243332B (en) | Registers for 2-D matrix processing |
JP2011141823A (ja) | Data processing device and parallel arithmetic device |
Shahbahrami et al. | Matrix register file and extended subwords: two techniques for embedded media processors |
JP7020555B2 (ja) | Information processing apparatus, information processing method, and program |
TWI773783B (zh) | Apparatus, method, integrated circuit, computer program, and computer-readable storage medium for register-based complex number processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |