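CN102053948B - Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture - Google Patents
Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture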
- Publication number: CN102053948B (also listed: CN2010105375212A, CN201010537521A)
- Authority: CN (China)
- Prior art keywords: matrix, simd, row, format, transposed
- Prior art date
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/78—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
- G06F7/785—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using a RAM
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3828—Multigauge devices, i.e. capable of handling packed numbers without unpacking them
Landscapes
- Physics & Mathematics
- Engineering & Computer Science
- Mathematical Physics
- General Physics & Mathematics
- Theoretical Computer Science
- Computational Mathematics
- Mathematical Optimization
- Pure & Applied Mathematics
- Mathematical Analysis
- Data Mining & Analysis
- General Engineering & Computer Science
- Algebra
- Databases & Information Systems
- Software Systems
- Discrete Mathematics
- Complex Calculations
- Image Processing
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/612,037 US8539201B2 (en) | 2009-11-04 | 2009-11-04 | Transposing array data on SIMD multi-core processor architectures |
| US12/612,037 | 2009-11-04 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102053948A (zh) | 2011-05-11 |
| CN102053948B (zh) | 2013-09-11 |
Family ID: 43926625
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2010105375212A (Expired - Fee Related; CN102053948B, zh) | Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture | 2009-11-04 | 2010-11-04 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8539201B2 |
| JP (1) | JP5689282B2 |
| KR (1) | KR20110079495A |
| CN (1) | CN102053948B |
Families Citing this family (73)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008129900A1 (ja) * | 2007-04-12 | 2008-10-30 | NEC Corporation | Array-processor-type data processing device |
| US8484276B2 (en) * | 2009-03-18 | 2013-07-09 | International Business Machines Corporation | Processing array data on SIMD multi-core processor architectures |
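| JP5760532B2 (ja) * | 2011-03-14 | 2015-08-12 | 株式会社リコー | Processor device and operation method thereof |
| JP6078923B2 (ja) * | 2011-10-14 | 2017-02-15 | パナソニックIpマネジメント株式会社 | Transposition operation device, integrated circuit thereof, and transposition processing method |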
| SG10201604445RA (en) * | 2011-12-01 | 2016-07-28 | Univ Singapore | Polymorphic heterogeneous multi-core architecture |
| CN102521209B (zh) * | 2011-12-12 | 2015-03-11 | 浪潮电子信息产业股份有限公司 | Design method for a parallel multiprocessor computer |
| US9898283B2 (en) * | 2011-12-22 | 2018-02-20 | Intel Corporation | Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset |
| EP2798475A4 (en) * | 2011-12-30 | 2016-07-13 | Intel Corp | TRANSPOSED INSTRUCTION |
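| KR101893796B1 (ko) | 2012-08-16 | 2018-10-04 | 삼성전자주식회사 | Method and apparatus for dynamic data configuration |
| CN102929724B (zh) * | 2012-11-06 | 2016-04-13 | 无锡江南计算技术研究所 | Multi-level memory access method and discrete memory access method based on a heterogeneous many-core processor |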
| US9406015B2 (en) | 2013-12-27 | 2016-08-02 | International Business Machines Corporation | Transform for a neurosynaptic core circuit |
| US9412063B2 (en) | 2013-12-27 | 2016-08-09 | International Business Machines Corporation | Transform architecture for multiple neurosynaptic core circuits |
| TWI570573B (zh) | 2014-07-08 | 2017-02-11 | 財團法人工業技術研究院 | Matrix transposition circuit |
| SE539721C2 (en) * | 2014-07-09 | 2017-11-07 | | Device and method for performing a Fourier transform on a three dimensional data set |
| KR102452945B1 (ko) * | 2015-08-27 | 2022-10-11 | 삼성전자주식회사 | Method and apparatus for performing a Fourier transform |
| US10635909B2 (en) * | 2015-12-30 | 2020-04-28 | Texas Instruments Incorporated | Vehicle control with efficient iterative triangulation |
| US10095445B2 (en) * | 2016-03-29 | 2018-10-09 | Western Digital Technologies, Inc. | Systems and methods for offloading processing from a host to storage processing units using an interconnect network |
| US10275243B2 (en) | 2016-07-02 | 2019-04-30 | Intel Corporation | Interruptible and restartable matrix multiplication instructions, processors, methods, and systems |
| KR102526754B1 (ko) | 2016-07-13 | 2023-04-27 | 삼성전자주식회사 | Method and apparatus for processing three-dimensional images |
| KR102654862B1 (ko) * | 2016-08-31 | 2024-04-05 | 삼성전자주식회사 | Method and apparatus for processing images |
| EP4160449A1 (en) * | 2016-12-30 | 2023-04-05 | Intel Corporation | Deep learning hardware |
| US10169296B2 (en) | 2016-12-30 | 2019-01-01 | Intel Corporation | Distributed matrix multiplication for neural networks |
| US11748625B2 (en) | 2016-12-30 | 2023-09-05 | Intel Corporation | Distributed convolution for neural networks |
| WO2018174928A1 (en) | 2017-03-20 | 2018-09-27 | Intel Corporation | Systems, methods, and apparatuses for zeroing a matrix |
| CN107168683B (zh) * | 2017-05-05 | 2020-06-09 | 中国科学院软件研究所 | High-performance implementation method for GEMM dense matrix multiplication on the Sunway (申威) 26010 many-core CPU |
| WO2019009870A1 (en) | 2017-07-01 | 2019-01-10 | Intel Corporation | SAVE BACKGROUND TO VARIABLE BACKUP STATUS SIZE |
| KR102494412B1 (ko) * | 2017-11-28 | 2023-02-03 | 삼성전자 주식회사 | Electronic device that performs frequency transformation of image data using SIMD operations, and operating method of the electronic device |
| US11669326B2 (en) | 2017-12-29 | 2023-06-06 | Intel Corporation | Systems, methods, and apparatuses for dot product operations |
| US11023235B2 (en) | 2017-12-29 | 2021-06-01 | Intel Corporation | Systems and methods to zero a tile register pair |
| US11093247B2 (en) | 2017-12-29 | 2021-08-17 | Intel Corporation | Systems and methods to load a tile register pair |
| US11816483B2 (en) | 2017-12-29 | 2023-11-14 | Intel Corporation | Systems, methods, and apparatuses for matrix operations |
| US11809869B2 (en) | 2017-12-29 | 2023-11-07 | Intel Corporation | Systems and methods to store a tile register pair to memory |
| US11789729B2 (en) | 2017-12-29 | 2023-10-17 | Intel Corporation | Systems and methods for computing dot products of nibbles in two tile operands |
| US10664287B2 (en) | 2018-03-30 | 2020-05-26 | Intel Corporation | Systems and methods for implementing chained tile operations |
| US11093579B2 (en) | 2018-09-05 | 2021-08-17 | Intel Corporation | FP16-S7E8 mixed precision for deep learning and other algorithms |
| US11579883B2 (en) | 2018-09-14 | 2023-02-14 | Intel Corporation | Systems and methods for performing horizontal tile operations |
| US10970076B2 (en) | 2018-09-14 | 2021-04-06 | Intel Corporation | Systems and methods for performing instructions specifying ternary tile logic operations |
| US10990396B2 (en) | 2018-09-27 | 2021-04-27 | Intel Corporation | Systems for performing instructions to quickly convert and use tiles as 1D vectors |
| US10866786B2 (en) | 2018-09-27 | 2020-12-15 | Intel Corporation | Systems and methods for performing instructions to transpose rectangular tiles |
| US10719323B2 (en) | 2018-09-27 | 2020-07-21 | Intel Corporation | Systems and methods for performing matrix compress and decompress instructions |
| US10929143B2 (en) | 2018-09-28 | 2021-02-23 | Intel Corporation | Method and apparatus for efficient matrix alignment in a systolic array |
| US10963256B2 (en) | 2018-09-28 | 2021-03-30 | Intel Corporation | Systems and methods for performing instructions to transform matrices into row-interleaved format |
| US10896043B2 (en) | 2018-09-28 | 2021-01-19 | Intel Corporation | Systems for performing instructions for fast element unpacking into 2-dimensional registers |
| US10963246B2 (en) | 2018-11-09 | 2021-03-30 | Intel Corporation | Systems and methods for performing 16-bit floating-point matrix dot product instructions |
| CN111338974B (zh) * | 2018-12-19 | 2025-05-16 | 超威半导体公司 | Tiling algorithm for a matrix math instruction set |
| US10929503B2 (en) | 2018-12-21 | 2021-02-23 | Intel Corporation | Apparatus and method for a masked multiply instruction to support neural network pruning operations |
| US11294671B2 (en) | 2018-12-26 | 2022-04-05 | Intel Corporation | Systems and methods for performing duplicate detection instructions on 2D data |
| US11886875B2 (en) | 2018-12-26 | 2024-01-30 | Intel Corporation | Systems and methods for performing nibble-sized operations on matrix elements |
| US20200210517A1 (en) | 2018-12-27 | 2020-07-02 | Intel Corporation | Systems and methods to accelerate multiplication of sparse matrices |
| US10942985B2 (en) | 2018-12-29 | 2021-03-09 | Intel Corporation | Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions |
| US10922077B2 (en) | 2018-12-29 | 2021-02-16 | Intel Corporation | Apparatuses, methods, and systems for stencil configuration and computation instructions |
| US10853559B2 (en) | 2019-03-27 | 2020-12-01 | Charter Communications Operating, Llc | Symmetric text replacement |
| US11016731B2 (en) | 2019-03-29 | 2021-05-25 | Intel Corporation | Using Fuzzy-Jbit location of floating-point multiply-accumulate results |
| US11269630B2 (en) | 2019-03-29 | 2022-03-08 | Intel Corporation | Interleaved pipeline of floating-point adders |
| US11175891B2 (en) | 2019-03-30 | 2021-11-16 | Intel Corporation | Systems and methods to perform floating-point addition with selected rounding |
| US10990397B2 (en) | 2019-03-30 | 2021-04-27 | Intel Corporation | Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator |
| US11403097B2 (en) | 2019-06-26 | 2022-08-02 | Intel Corporation | Systems and methods to skip inconsequential matrix operations |
| US11334647B2 (en) | 2019-06-29 | 2022-05-17 | Intel Corporation | Apparatuses, methods, and systems for enhanced matrix multiplier architecture |
| US11714875B2 (en) | 2019-12-28 | 2023-08-01 | Intel Corporation | Apparatuses, methods, and systems for instructions of a matrix operations accelerator |
| CN111444134A (zh) * | 2020-03-24 | 2020-07-24 | 山东大学 | Method and system for accelerating and optimizing parallel PME in molecular dynamics simulation software |
| US11593454B2 (en) * | 2020-06-02 | 2023-02-28 | Intel Corporation | Matrix operation optimization mechanism |
| US11972230B2 (en) | 2020-06-27 | 2024-04-30 | Intel Corporation | Matrix transpose and multiply |
| US12112167B2 (en) | 2020-06-27 | 2024-10-08 | Intel Corporation | Matrix data scatter and gather between rows and irregularly spaced memory locations |
| US11941395B2 (en) | 2020-09-26 | 2024-03-26 | Intel Corporation | Apparatuses, methods, and systems for instructions for 16-bit floating-point matrix dot product instructions |
| CN112433760B (zh) * | 2020-11-27 | 2022-09-23 | 海光信息技术股份有限公司 | Data sorting method and data sorting circuit |
| US12474928B2 (en) | 2020-12-22 | 2025-11-18 | Intel Corporation | Processors, methods, systems, and instructions to select and store data elements from strided data element positions in a first dimension from three source two-dimensional arrays in a result two-dimensional array |
| US12001385B2 (en) | 2020-12-24 | 2024-06-04 | Intel Corporation | Apparatuses, methods, and systems for instructions for loading a tile of a matrix operations accelerator |
| US12001887B2 (en) | 2020-12-24 | 2024-06-04 | Intel Corporation | Apparatuses, methods, and systems for instructions for aligning tiles of a matrix operations accelerator |
| KR102909878B1 (ko) | 2021-06-01 | 2026-01-08 | 에스케이하이닉스 주식회사 | Memory device, semiconductor system, and data processing system |
| KR102527829B1 (ko) * | 2021-08-19 | 2023-04-28 | 한국기술교육대학교 산학협력단 | Matrix-transpose-based 2D-FFT computation device using a CPU and a GPU, and data computation method using the same |
| US20240020129A1 (en) * | 2022-07-14 | 2024-01-18 | Nxp Usa, Inc. | Self-Ordering Fast Fourier Transform For Single Instruction Multiple Data Engines |
| EP4671986A1 (en) * | 2023-02-22 | 2025-12-31 | Denso Corporation | ARITHMETIC PROCESSING DEVICE |
| WO2025136620A1 (en) * | 2023-12-20 | 2025-06-26 | Sony Interactive Entertainment Inc. | Speeding up memory access |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1619526A (zh) * | 2003-11-18 | 2005-05-25 | 国际商业机器公司 | Processor and method for processing matrix data |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7197625B1 (en) * | 1997-10-09 | 2007-03-27 | Mips Technologies, Inc. | Alignment and ordering of vector elements for single instruction multiple data processing |
| US6243730B1 (en) * | 1999-05-04 | 2001-06-05 | Sony Electronics, Inc. | Methods and systems for performing short integer chen IDCT algorithm with fused multiply/add |
| US6625721B1 (en) * | 1999-07-26 | 2003-09-23 | Intel Corporation | Registers for 2-D matrix processing |
| US20030084081A1 (en) * | 2001-10-27 | 2003-05-01 | Bedros Hanounik | Method and apparatus for transposing a two dimensional array |
| US6963341B1 (en) * | 2002-06-03 | 2005-11-08 | Tibet MIMAR | Fast and flexible scan conversion and matrix transpose in a SIMD processor |
| US20070106718A1 (en) * | 2005-11-04 | 2007-05-10 | Shum Hoi L | Fast fourier transform on a single-instruction-stream, multiple-data-stream processor |
| US7937567B1 (en) * | 2006-11-01 | 2011-05-03 | Nvidia Corporation | Methods for scalably exploiting parallelism in a parallel processing system |
| US7979672B2 (en) * | 2008-07-25 | 2011-07-12 | International Business Machines Corporation | Multi-core processors for 3D array transposition by logically retrieving in-place physically transposed sub-array data |
| US8484276B2 (en) * | 2009-03-18 | 2013-07-09 | International Business Machines Corporation | Processing array data on SIMD multi-core processor architectures |
2009
- 2009-11-04: US application US12/612,037, patent US8539201B2 (en); not active, Expired - Fee Related
2010
- 2010-10-29: JP application JP2010243281A, patent JP5689282B2 (ja); not active, Expired - Fee Related
- 2010-11-02: KR application KR1020100108204A, patent KR20110079495A (ko); not active, Ceased
- 2010-11-04: CN application CN2010105375212A, patent CN102053948B (zh); not active, Expired - Fee Related
Non-Patent Citations (4)
| Title |
|---|
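| Discrete Fourier Transform on Multicore; Franz Franchetti et al.; IEEE Signal Processing Magazine; 2009-11-30; vol. 26, no. 6; pp. 90-102 * |
| Franz Franchetti et al. Discrete Fourier Transform on Multicore. IEEE Signal Processing Magazine. 2009, vol. 26, no. 6, pp. 90-102. |
| Parallel Fourier analysis technique for images based on LS MPP (1): principle, analysis, and design of the algorithm; Li Junshan (李俊山) et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 2004-07-21; vol. 25, no. 7; pp. 1303-1306 * |
| Li Junshan (李俊山) et al. Parallel Fourier analysis technique for images based on LS MPP (1): principle, analysis, and design of the algorithm. Journal of Chinese Computer Systems (《小型微型计算机系统》). 2004, vol. 25, no. 7, |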
Also Published As
| Publication number | Publication date |
|---|---|
| CN102053948A (zh) | 2011-05-11 |
| JP2011100452A (ja) | 2011-05-19 |
| US20110107060A1 (en) | 2011-05-05 |
| JP5689282B2 (ja) | 2015-03-25 |
| KR20110079495A (ko) | 2011-07-07 |
| US8539201B2 (en) | 2013-09-17 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
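| CN102053948B (zh) | Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture | |
| JP3639323B2 (ja) | Method and computer for solving simultaneous linear equations on a distributed-memory parallel computer | |
| CN109964203B (zh) | Sorting for data-parallel computing devices | |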
| US7979672B2 (en) | Multi-core processors for 3D array transposition by logically retrieving in-place physically transposed sub-array data | |
| Savage | Extending the Hong-Kung model to memory hierarchies | |
| JP3675537B2 (ja) | Distributed-memory parallel computer that performs a fast Fourier transform, and method therefor | |
| US20220382829A1 (en) | Sparse matrix multiplication in hardware | |
| US20040039765A1 (en) | Fourier transform apparatus | |
| Baillie et al. | Cluster identification algorithms for spin models—Sequential and parallel | |
| CN110727911A (zh) | Matrix operation method and apparatus, storage medium, and terminal | |
| US8566267B1 (en) | Method, apparatus, and article of manufacture for solving linear optimization problems | |
| Liu | Parallel and scalable sparse basic linear algebra subprograms | |
| Al Badawi et al. | Faster number theoretic transform on graphics processors for ring learning with errors based cryptography | |
| Nakano | Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models | |
| US10339460B1 (en) | Method and apparatus for autonomous synchronous computing | |
| Hasan et al. | Gpu accelerated tensor computation of hadamard product for machine learning applications | |
| US8407172B1 (en) | Method, apparatus, and article of manufacture for performing a pivot-in-place operation for a linear programming problem | |
| Chen et al. | GPU-MEME: Using graphics hardware to accelerate motif finding in DNA sequences | |
| Afshani et al. | Sorting and permuting without bank conflicts on GPUs | |
| Ploskas et al. | A computational comparison of scaling techniques for linear optimization problems on a graphical processing unit | |
| Arefin et al. | Computing large-scale distance matrices on GPU | |
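| JP5608932B2 (ja) | Addressing device for parallel processors | |
| CN115424114A (zh) | Image processing method and apparatus, and training method and apparatus for an image processing model | |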
| Yin et al. | TensorNTT: Architecture-Aware Optimizations for Number-Theoretic Transform on Tensor Core Unit | |
| CN121210159B (zh) | Data processing method, apparatus, computer device, readable storage medium, and program product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130911; Termination date: 20181104 |