KR102118836B1 - Shuffler circuit for lane shuffle in SIMD architecture - Google Patents

Shuffler circuit for lane shuffle in SIMD architecture

Info

Publication number
KR102118836B1
KR102118836B1 KR1020197000601A KR20197000601A KR102118836B1 KR 102118836 B1 KR102118836 B1 KR 102118836B1 KR 1020197000601 A KR1020197000601 A KR 1020197000601A KR 20197000601 A KR20197000601 A KR 20197000601A KR 102118836 B1 KR102118836 B1 KR 102118836B1
Authority
KR
South Korea
Prior art keywords
data
processing
processing lanes
lanes
lane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
KR1020197000601A
Other languages
English (en)
Korean (ko)
Other versions
KR20190028426A (ko)
Inventor
Liang Han
Xiangdong Jin
Lin Chen
Yun Du
Alexei Vladimirovich Bourd
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Incorporated
Publication of KR20190028426A
Application granted
Publication of KR102118836B1
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4009 Coupling between buses with data restructuring
    • G06F13/4013 Coupling between buses with data restructuring with data re-ordering, e.g. Endian conversion
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053 Vector processors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)
  • Image Processing (AREA)
  • Executing Machine-Instructions (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Synchronisation In Digital Transmission Systems (AREA)
KR1020197000601A 2016-07-13 2017-05-19 Shuffler circuit for lane shuffle in SIMD architecture Active KR102118836B1 (ko)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/209,057 2016-07-13
US15/209,057 US10592468B2 (en) 2016-07-13 2016-07-13 Shuffler circuit for lane shuffle in SIMD architecture
PCT/US2017/033663 WO2018013219A1 (en) 2016-07-13 2017-05-19 Shuffler circuit for lane shuffle in simd architecture

Publications (2)

Publication Number Publication Date
KR20190028426A (ko) 2019-03-18
KR102118836B1 (ko) 2020-06-03

Family

ID=58779363

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020197000601A Active KR102118836B1 (ko) 2016-07-13 2017-05-19 Shuffler circuit for lane shuffle in SIMD architecture

Country Status (6)

Country Link
US (1) US10592468B2 (en)
EP (1) EP3485385B1 (en)
JP (1) JP2019521445A (en)
KR (1) KR102118836B1 (en)
CN (1) CN109478175B (en)
WO (1) WO2018013219A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10957095B2 (en) * 2018-08-06 2021-03-23 Intel Corporation Programmable ray tracing with hardware acceleration on a graphics processor
US10963300B2 (en) * 2018-12-06 2021-03-30 Raytheon Company Accelerating dataflow signal processing applications across heterogeneous CPU/GPU systems
US11397624B2 (en) * 2019-01-22 2022-07-26 Arm Limited Execution of cross-lane operations in data processing systems
US11294672B2 (en) * 2019-08-22 2022-04-05 Apple Inc. Routing circuitry for permutation of single-instruction multiple-data operands
US11256518B2 (en) 2019-10-09 2022-02-22 Apple Inc. Datapath circuitry for math operations using SIMD pipelines
US20210349717A1 (en) * 2020-05-05 2021-11-11 Intel Corporation Compaction of diverged lanes for efficient use of alus
US20220197649A1 (en) * 2020-12-22 2022-06-23 Advanced Micro Devices, Inc. General purpose register hierarchy system and method
US11360897B1 (en) * 2021-04-15 2022-06-14 Qualcomm Incorporated Adaptive memory access management
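CN115793958A (zh) * 2021-09-10 2023-03-14 Tencent Technology (Shenzhen) Co., Ltd. Shuffle data processing method, related apparatus, device, and storage medium
CN115061731B (zh) * 2022-06-23 2023-05-23 Moore Threads Intelligent Technology (Beijing) Co., Ltd. Shuffle circuit and method, and chip and integrated circuit device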

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054877A1 (en) 2001-10-29 2004-03-18 Macy William W. Method and apparatus for shuffling data
US20130339664A1 (en) 2011-12-23 2013-12-19 Elmoustapha Ould-Ahmed-Vall Instruction execution unit that broadcasts data values at different levels of granularity
US20140059323A1 (en) 2012-08-23 2014-02-27 Qualcomm Incorporated Systems and methods of data extraction in a vector processor
US20140208067A1 (en) 2013-01-23 2014-07-24 International Business Machines Corporation Vector element rotate and insert under mask instruction

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2078912A1 (en) * 1992-01-07 1993-07-08 Robert Edward Cypher Hierarchical interconnection networks for parallel processing
US7343389B2 (en) 2002-05-02 2008-03-11 Intel Corporation Apparatus and method for SIMD modular multiplication
US9557994B2 (en) * 2004-07-13 2017-01-31 Arm Limited Data processing apparatus and method for performing N-way interleaving and de-interleaving operations where N is an odd plural number
US7761694B2 (en) * 2006-06-30 2010-07-20 Intel Corporation Execution unit for performing shuffle and other operations
GB2444744B (en) 2006-12-12 2011-05-25 Advanced Risc Mach Ltd Apparatus and method for performing re-arrangement operations on data
US8078836B2 (en) 2007-12-30 2011-12-13 Intel Corporation Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits
US9436469B2 (en) * 2011-12-15 2016-09-06 Intel Corporation Methods to optimize a program loop via vector instructions using a shuffle table and a mask store table
US9218182B2 (en) 2012-06-29 2015-12-22 Intel Corporation Systems, apparatuses, and methods for performing a shuffle and operation (shuffle-op)
US20140149480A1 (en) 2012-11-28 2014-05-29 Nvidia Corporation System, method, and computer program product for transposing a matrix
US9405539B2 (en) 2013-07-31 2016-08-02 Intel Corporation Providing vector sub-byte decompression functionality

Also Published As

Publication number Publication date
US10592468B2 (en) 2020-03-17
KR20190028426A (ko) 2019-03-18
WO2018013219A1 (en) 2018-01-18
BR112019000120A8 (pt) 2023-01-31
BR112019000120A2 (pt) 2019-04-09
CN109478175B (zh) 2022-07-12
CN109478175A (zh) 2019-03-15
US20180018299A1 (en) 2018-01-18
EP3485385B1 (en) 2020-04-22
JP2019521445A (ja) 2019-07-25
EP3485385A1 (en) 2019-05-22

Similar Documents

Publication Publication Date Title
KR102118836B1 (ko) Shuffler circuit for lane shuffle in SIMD architecture
US8984043B2 (en) Multiplying and adding matrices
CA2693344C (en) Scheme for varying packing and linking in graphics systems
US9513908B2 (en) Streaming memory transpose operations
KR20100122493A (ko) Processor
US20140181466A1 (en) Processors having fully-connected interconnects shared by vector conflict instructions and permute instructions
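CN107533460B (zh) Packed finite impulse response (FIR) filter processors, methods, systems, and instructions
CN102279818A (zh) Vector data memory-access control method supporting limited sharing, and vector memory
JP7507304B2 (ja) Clearing of register data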
US20080244238A1 (en) Stream processing accelerator
US9632783B2 (en) Operand conflict resolution for reduced port general purpose register
US9350584B2 (en) Element selection unit and a method therein
US9569210B2 (en) Apparatus and method of execution unit for calculating multiple rounds of a skein hashing algorithm
CN104011617B (zh) Reconfigurable device for repositioning data within a data word
US12395187B2 (en) Computer architecture with data decompression support for neural network computing
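JP5659772B2 (ja) Arithmetic processing device
KR101863483B1 (ko) Utilization of pipeline registers as intermediate storage
BR112019000120B1 (pt) Shuffler circuit for lane shuffle in SIMD architecture
CN111831338B (zh) Per-lane dynamic indexing in temporary registers
CN117150192A (zh) Sparse matrix-vector multiplication acceleration device with high bandwidth utilization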
US20160070505A1 (en) Efficient loading and storing of data

Legal Events

Date Code Title Description
PA0105 International application

St.27 status event code: A-0-1-A10-A15-nap-PA0105

PG1501 Laying open of application

St.27 status event code: A-1-1-Q10-Q12-nap-PG1501

A201 Request for examination
E13-X000 Pre-grant limitation requested

St.27 status event code: A-2-3-E10-E13-lim-X000

P11-X000 Amendment of application requested

St.27 status event code: A-2-2-P10-P11-nap-X000

P13-X000 Application amended

St.27 status event code: A-2-2-P10-P13-nap-X000

PA0201 Request for examination

St.27 status event code: A-1-2-D10-D11-exm-PA0201

PA0302 Request for accelerated examination

St.27 status event code: A-1-2-D10-D17-exm-PA0302

St.27 status event code: A-1-2-D10-D16-exm-PA0302

E701 Decision to grant or registration of patent right
PE0701 Decision of registration

St.27 status event code: A-1-2-D10-D22-exm-PE0701

GRNT Written decision to grant
PR0701 Registration of establishment

St.27 status event code: A-2-4-F10-F11-exm-PR0701

PR1002 Payment of registration fee

St.27 status event code: A-2-2-U10-U12-oth-PR1002

Fee payment year number: 1

PG1601 Publication of registration

St.27 status event code: A-4-4-Q10-Q13-nap-PG1601

PR1001 Payment of annual fee

St.27 status event code: A-4-4-U10-U11-oth-PR1001

Fee payment year number: 4

PR1001 Payment of annual fee

St.27 status event code: A-4-4-U10-U11-oth-PR1001

Fee payment year number: 5

P22-X000 Classification modified

St.27 status event code: A-4-4-P10-P22-nap-X000