CN102053948B - Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture - Google Patents

Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture

Info

Publication number
CN102053948B
Authority
CN
China
Prior art keywords
matrix
simd
row
format
transposed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010105375212A
Other languages
English (en)
Chinese (zh)
Other versions
CN102053948A (zh)
Inventor
Jeffrey S. McAllister (杰弗里·S·麦克阿利斯特)
Mark A. Bransford (马克·A·布兰斯福德)
Timothy J. Mullins (蒂莫西·J·马林斯)
Nelson Ramirez (尼尔森·拉米莱兹)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN102053948A
Application granted
Publication of CN102053948B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/76 Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
    • G06F7/78 Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
    • G06F7/785 Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor having a sequence of storage locations each being individually accessible for both enqueue and dequeue operations, e.g. using a RAM
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804 Details
    • G06F2207/3808 Details concerning the type of numbers or the way they are handled
    • G06F2207/3828 Multigauge devices, i.e. capable of handling packed numbers without unpacking them

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Discrete Mathematics (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)
CN2010105375212A 2009-11-04 2010-11-04 Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture Expired - Fee Related CN102053948B (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/612,037 US8539201B2 (en) 2009-11-04 2009-11-04 Transposing array data on SIMD multi-core processor architectures
US12/612,037 2009-11-04

Publications (2)

Publication Number Publication Date
CN102053948A (zh) 2011-05-11
CN102053948B (zh) 2013-09-11

Family

ID=43926625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105375212A Expired - Fee Related CN102053948B (zh) 2009-11-04 2010-11-04 Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture

Country Status (4)

Country Link
US (1) US8539201B2
JP (1) JP5689282B2
KR (1) KR20110079495A
CN (1) CN102053948B

Families Citing this family (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008129900A1 (ja) * 2007-04-12 2008-10-30 Nec Corporation Array processor type data processing device
US8484276B2 (en) * 2009-03-18 2013-07-09 International Business Machines Corporation Processing array data on SIMD multi-core processor architectures
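JP5760532B2 (ja) * 2011-03-14 2015-08-12 Ricoh Co., Ltd. Processor device and operation method thereof
JP6078923B2 (ja) * 2011-10-14 2017-02-15 Panasonic Intellectual Property Management Co., Ltd. Transposition operation device, integrated circuit thereof, and transposition processing method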
SG10201604445RA (en) * 2011-12-01 2016-07-28 Univ Singapore Polymorphic heterogeneous multi-core architecture
CN102521209B (zh) * 2011-12-12 2015-03-11 Inspur Electronic Information Industry Co., Ltd. Design method for a parallel multiprocessor computer
US9898283B2 (en) * 2011-12-22 2018-02-20 Intel Corporation Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset
EP2798475A4 (en) * 2011-12-30 2016-07-13 Intel Corp TRANSPOSED INSTRUCTION
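KR101893796B1 (ko) 2012-08-16 2018-10-04 Samsung Electronics Co., Ltd. Method and apparatus for dynamic data configuration
CN102929724B (zh) * 2012-11-06 2016-04-13 Wuxi Jiangnan Institute of Computing Technology Multi-level memory access method and discrete memory access method based on a heterogeneous many-core processor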
US9406015B2 (en) 2013-12-27 2016-08-02 International Business Machines Corporation Transform for a neurosynaptic core circuit
US9412063B2 (en) 2013-12-27 2016-08-09 International Business Machines Corporation Transform architecture for multiple neurosynaptic core circuits
TWI570573B (zh) 2014-07-08 2017-02-11 Industrial Technology Research Institute Matrix transposing circuit
SE539721C2 (en) * 2014-07-09 2017-11-07 Device and method for performing a Fourier transform on a three dimensional data set
KR102452945B1 (ko) * 2015-08-27 2022-10-11 Samsung Electronics Co., Ltd. Method and apparatus for performing a Fourier transform
US10635909B2 (en) * 2015-12-30 2020-04-28 Texas Instruments Incorporated Vehicle control with efficient iterative triangulation
US10095445B2 (en) * 2016-03-29 2018-10-09 Western Digital Technologies, Inc. Systems and methods for offloading processing from a host to storage processing units using an interconnect network
US10275243B2 (en) 2016-07-02 2019-04-30 Intel Corporation Interruptible and restartable matrix multiplication instructions, processors, methods, and systems
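KR102526754B1 (ko) 2016-07-13 2023-04-27 Samsung Electronics Co., Ltd. Method and apparatus for processing a three-dimensional image
KR102654862B1 (ko) * 2016-08-31 2024-04-05 Samsung Electronics Co., Ltd. Image processing method and apparatus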
EP4160449A1 (en) * 2016-12-30 2023-04-05 Intel Corporation Deep learning hardware
US10169296B2 (en) 2016-12-30 2019-01-01 Intel Corporation Distributed matrix multiplication for neural networks
US11748625B2 (en) 2016-12-30 2023-09-05 Intel Corporation Distributed convolution for neural networks
WO2018174928A1 (en) 2017-03-20 2018-09-27 Intel Corporation Systems, methods, and apparatuses for zeroing a matrix
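CN107168683B (zh) * 2017-05-05 2020-06-09 Institute of Software, Chinese Academy of Sciences High-performance implementation method for GEMM dense matrix multiplication on the Shenwei (Sunway) 26010 many-core CPU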
WO2019009870A1 (en) 2017-07-01 2019-01-10 Intel Corporation SAVE BACKGROUND TO VARIABLE BACKUP STATUS SIZE
KR102494412B1 (ko) * 2017-11-28 2023-02-03 Samsung Electronics Co., Ltd. Electronic device for performing frequency transform of image data using SIMD operations, and method of operating the electronic device
US11669326B2 (en) 2017-12-29 2023-06-06 Intel Corporation Systems, methods, and apparatuses for dot product operations
US11023235B2 (en) 2017-12-29 2021-06-01 Intel Corporation Systems and methods to zero a tile register pair
US11093247B2 (en) 2017-12-29 2021-08-17 Intel Corporation Systems and methods to load a tile register pair
US11816483B2 (en) 2017-12-29 2023-11-14 Intel Corporation Systems, methods, and apparatuses for matrix operations
US11809869B2 (en) 2017-12-29 2023-11-07 Intel Corporation Systems and methods to store a tile register pair to memory
US11789729B2 (en) 2017-12-29 2023-10-17 Intel Corporation Systems and methods for computing dot products of nibbles in two tile operands
US10664287B2 (en) 2018-03-30 2020-05-26 Intel Corporation Systems and methods for implementing chained tile operations
US11093579B2 (en) 2018-09-05 2021-08-17 Intel Corporation FP16-S7E8 mixed precision for deep learning and other algorithms
US11579883B2 (en) 2018-09-14 2023-02-14 Intel Corporation Systems and methods for performing horizontal tile operations
US10970076B2 (en) 2018-09-14 2021-04-06 Intel Corporation Systems and methods for performing instructions specifying ternary tile logic operations
US10990396B2 (en) 2018-09-27 2021-04-27 Intel Corporation Systems for performing instructions to quickly convert and use tiles as 1D vectors
US10866786B2 (en) 2018-09-27 2020-12-15 Intel Corporation Systems and methods for performing instructions to transpose rectangular tiles
US10719323B2 (en) 2018-09-27 2020-07-21 Intel Corporation Systems and methods for performing matrix compress and decompress instructions
US10929143B2 (en) 2018-09-28 2021-02-23 Intel Corporation Method and apparatus for efficient matrix alignment in a systolic array
US10963256B2 (en) 2018-09-28 2021-03-30 Intel Corporation Systems and methods for performing instructions to transform matrices into row-interleaved format
US10896043B2 (en) 2018-09-28 2021-01-19 Intel Corporation Systems for performing instructions for fast element unpacking into 2-dimensional registers
US10963246B2 (en) 2018-11-09 2021-03-30 Intel Corporation Systems and methods for performing 16-bit floating-point matrix dot product instructions
CN111338974B (zh) * 2018-12-19 2025-05-16 Advanced Micro Devices, Inc. Tiling algorithm for a matrix math instruction set
US10929503B2 (en) 2018-12-21 2021-02-23 Intel Corporation Apparatus and method for a masked multiply instruction to support neural network pruning operations
US11294671B2 (en) 2018-12-26 2022-04-05 Intel Corporation Systems and methods for performing duplicate detection instructions on 2D data
US11886875B2 (en) 2018-12-26 2024-01-30 Intel Corporation Systems and methods for performing nibble-sized operations on matrix elements
US20200210517A1 (en) 2018-12-27 2020-07-02 Intel Corporation Systems and methods to accelerate multiplication of sparse matrices
US10942985B2 (en) 2018-12-29 2021-03-09 Intel Corporation Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions
US10922077B2 (en) 2018-12-29 2021-02-16 Intel Corporation Apparatuses, methods, and systems for stencil configuration and computation instructions
US10853559B2 (en) 2019-03-27 2020-12-01 Charter Communications Operating, Llc Symmetric text replacement
US11016731B2 (en) 2019-03-29 2021-05-25 Intel Corporation Using Fuzzy-Jbit location of floating-point multiply-accumulate results
US11269630B2 (en) 2019-03-29 2022-03-08 Intel Corporation Interleaved pipeline of floating-point adders
US11175891B2 (en) 2019-03-30 2021-11-16 Intel Corporation Systems and methods to perform floating-point addition with selected rounding
US10990397B2 (en) 2019-03-30 2021-04-27 Intel Corporation Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator
US11403097B2 (en) 2019-06-26 2022-08-02 Intel Corporation Systems and methods to skip inconsequential matrix operations
US11334647B2 (en) 2019-06-29 2022-05-17 Intel Corporation Apparatuses, methods, and systems for enhanced matrix multiplier architecture
US11714875B2 (en) 2019-12-28 2023-08-01 Intel Corporation Apparatuses, methods, and systems for instructions of a matrix operations accelerator
CN111444134A (zh) * 2020-03-24 2020-07-24 Shandong University Acceleration and optimization method and system for parallel PME in molecular dynamics simulation software
US11593454B2 (en) * 2020-06-02 2023-02-28 Intel Corporation Matrix operation optimization mechanism
US11972230B2 (en) 2020-06-27 2024-04-30 Intel Corporation Matrix transpose and multiply
US12112167B2 (en) 2020-06-27 2024-10-08 Intel Corporation Matrix data scatter and gather between rows and irregularly spaced memory locations
US11941395B2 (en) 2020-09-26 2024-03-26 Intel Corporation Apparatuses, methods, and systems for instructions for 16-bit floating-point matrix dot product instructions
CN112433760B (zh) * 2020-11-27 2022-09-23 Hygon Information Technology Co., Ltd. Data sorting method and data sorting circuit
US12474928B2 (en) 2020-12-22 2025-11-18 Intel Corporation Processors, methods, systems, and instructions to select and store data elements from strided data element positions in a first dimension from three source two-dimensional arrays in a result two-dimensional array
US12001385B2 (en) 2020-12-24 2024-06-04 Intel Corporation Apparatuses, methods, and systems for instructions for loading a tile of a matrix operations accelerator
US12001887B2 (en) 2020-12-24 2024-06-04 Intel Corporation Apparatuses, methods, and systems for instructions for aligning tiles of a matrix operations accelerator
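KR102909878B1 (ko) 2021-06-01 2026-01-08 SK hynix Inc. Memory device, semiconductor system, and data processing system
KR102527829B1 (ko) * 2021-08-19 2023-04-28 Korea University of Technology and Education Industry-University Cooperation Foundation Matrix-transpose-based 2D-FFT computation device using a CPU and a GPU, and data computation method using the same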
US20240020129A1 (en) * 2022-07-14 2024-01-18 Nxp Usa, Inc. Self-Ordering Fast Fourier Transform For Single Instruction Multiple Data Engines
EP4671986A1 (en) * 2023-02-22 2025-12-31 Denso Corporation ARITHMETIC PROCESSING DEVICE
WO2025136620A1 (en) * 2023-12-20 2025-06-26 Sony Interactive Entertainment Inc. Speeding up memory access

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1619526A (zh) * 2003-11-18 2005-05-25 International Business Machines Corp Processor and method for processing matrix data

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7197625B1 (en) * 1997-10-09 2007-03-27 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US6243730B1 (en) * 1999-05-04 2001-06-05 Sony Electronics, Inc. Methods and systems for performing short integer chen IDCT algorithm with fused multiply/add
US6625721B1 (en) * 1999-07-26 2003-09-23 Intel Corporation Registers for 2-D matrix processing
US20030084081A1 (en) * 2001-10-27 2003-05-01 Bedros Hanounik Method and apparatus for transposing a two dimensional array
US6963341B1 (en) * 2002-06-03 2005-11-08 Tibet MIMAR Fast and flexible scan conversion and matrix transpose in a SIMD processor
US20070106718A1 (en) * 2005-11-04 2007-05-10 Shum Hoi L Fast fourier transform on a single-instruction-stream, multiple-data-stream processor
US7937567B1 (en) * 2006-11-01 2011-05-03 Nvidia Corporation Methods for scalably exploiting parallelism in a parallel processing system
US7979672B2 (en) * 2008-07-25 2011-07-12 International Business Machines Corporation Multi-core processors for 3D array transposition by logically retrieving in-place physically transposed sub-array data
US8484276B2 (en) * 2009-03-18 2013-07-09 International Business Machines Corporation Processing array data on SIMD multi-core processor architectures

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
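Discrete Fourier Transform on Multicore; Franz Franchetti et al.; IEEE Signal Processing Magazine; 2009-11-30; vol. 26, no. 6; pp. 90-102 *
Franz Franchetti et al. Discrete Fourier Transform on Multicore. IEEE Signal Processing Magazine. 2009, vol. 26, no. 6, pp. 90-102.
Image parallel Fourier analysis technique based on LS MPP (1): principle, analysis, and design of the algorithm; Li Junshan (李俊山) et al.; 《小型微型计算机系统》; 2004-07-21; vol. 25, no. 7; pp. 1303-1306 *
Li Junshan (李俊山) et al. Image parallel Fourier analysis technique based on LS MPP (1): principle, analysis, and design of the algorithm. 《小型微型计算机系统》. 2004, vol. 25, no. 7,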

Also Published As

Publication number Publication date
CN102053948A (zh) 2011-05-11
JP2011100452A (ja) 2011-05-19
US20110107060A1 (en) 2011-05-05
JP5689282B2 (ja) 2015-03-25
KR20110079495A (ko) 2011-07-07
US8539201B2 (en) 2013-09-17

Similar Documents

Publication Publication Date Title
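CN102053948B (zh) Method and system for transposing a matrix on a single-instruction multiple-data (SIMD) multi-core processor architecture
JP3639323B2 (ja) Method and computer for solving simultaneous linear equations on a distributed-memory parallel computer
CN109964203B (zh) Sorting for data-parallel computing devices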
US7979672B2 (en) Multi-core processors for 3D array transposition by logically retrieving in-place physically transposed sub-array data
Savage Extending the Hong-Kung model to memory hierarchies
JP3675537B2 (ja) Distributed-memory parallel computer performing fast Fourier transform and method thereof
US20220382829A1 (en) Sparse matrix multiplication in hardware
US20040039765A1 (en) Fourier transform apparatus
Baillie et al. Cluster identification algorithms for spin models—Sequential and parallel
CN110727911A (zh) Matrix operation method and apparatus, storage medium, and terminal
US8566267B1 (en) Method, apparatus, and article of manufacture for solving linear optimization problems
Liu Parallel and scalable sparse basic linear algebra subprograms
Al Badawi et al. Faster number theoretic transform on graphics processors for ring learning with errors based cryptography
Nakano Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models
US10339460B1 (en) Method and apparatus for autonomous synchronous computing
Hasan et al. Gpu accelerated tensor computation of hadamard product for machine learning applications
US8407172B1 (en) Method, apparatus, and article of manufacture for performing a pivot-in-place operation for a linear programming problem
Chen et al. GPU-MEME: Using graphics hardware to accelerate motif finding in DNA sequences
Afshani et al. Sorting and permuting without bank conflicts on GPUs
Ploskas et al. A computational comparison of scaling techniques for linear optimization problems on a graphical processing unit
Arefin et al. Computing large-scale distance matrices on GPU
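JP5608932B2 (ja) Addressing device for parallel processors
CN115424114A (zh) Image processing method and apparatus, and training method and apparatus for an image processing model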
Yin et al. TensorNTT: Architecture-Aware Optimizations for Number-Theoretic Transform on Tensor Core Unit
CN121210159B (zh) Data processing method and apparatus, computer device, readable storage medium, and program product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130911

Termination date: 20181104