CN109478175B - 在simd架构中用于通道混洗的混洗器电路 - Google Patents

在simd架构中用于通道混洗的混洗器电路 Download PDF

Info

Publication number
CN109478175B
CN109478175B CN201780042845.9A CN201780042845A CN109478175B CN 109478175 B CN109478175 B CN 109478175B CN 201780042845 A CN201780042845 A CN 201780042845A CN 109478175 B CN109478175 B CN 109478175B
Authority
CN
China
Prior art keywords
processing
data
processing lanes
lanes
lane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780042845.9A
Other languages
English (en)
Chinese (zh)
Other versions
CN109478175A (zh
Inventor
韩亮
金向东
陈林
杜云
阿列克谢·弗拉狄米罗维奇·布尔德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN109478175A publication Critical patent/CN109478175A/zh
Application granted granted Critical
Publication of CN109478175B publication Critical patent/CN109478175B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/40Bus structure
    • G06F13/4004Coupling between buses
    • G06F13/4009Coupling between buses with data restructuring
    • G06F13/4013Coupling between buses with data restructuring with data re-ordering, e.g. Endian conversion
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/80Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8053Vector processors
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)
  • Image Processing (AREA)
  • Synchronisation In Digital Transmission Systems (AREA)
  • Time-Division Multiplex Systems (AREA)
CN201780042845.9A 2016-07-13 2017-05-19 在simd架构中用于通道混洗的混洗器电路 Active CN109478175B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/209,057 US10592468B2 (en) 2016-07-13 2016-07-13 Shuffler circuit for lane shuffle in SIMD architecture
US15/209,057 2016-07-13
PCT/US2017/033663 WO2018013219A1 (en) 2016-07-13 2017-05-19 Shuffler circuit for lane shuffle in simd architecture

Publications (2)

Publication Number Publication Date
CN109478175A CN109478175A (zh) 2019-03-15
CN109478175B true CN109478175B (zh) 2022-07-12

Family

ID=58779363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780042845.9A Active CN109478175B (zh) 2016-07-13 2017-05-19 在simd架构中用于通道混洗的混洗器电路

Country Status (6)

Country Link
US (1) US10592468B2 (enExample)
EP (1) EP3485385B1 (enExample)
JP (1) JP2019521445A (enExample)
KR (1) KR102118836B1 (enExample)
CN (1) CN109478175B (enExample)
WO (1) WO2018013219A1 (enExample)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10957095B2 (en) * 2018-08-06 2021-03-23 Intel Corporation Programmable ray tracing with hardware acceleration on a graphics processor
US10963300B2 (en) * 2018-12-06 2021-03-30 Raytheon Company Accelerating dataflow signal processing applications across heterogeneous CPU/GPU systems
US11397624B2 (en) * 2019-01-22 2022-07-26 Arm Limited Execution of cross-lane operations in data processing systems
US11294672B2 (en) * 2019-08-22 2022-04-05 Apple Inc. Routing circuitry for permutation of single-instruction multiple-data operands
US11256518B2 (en) 2019-10-09 2022-02-22 Apple Inc. Datapath circuitry for math operations using SIMD pipelines
US20210349717A1 (en) * 2020-05-05 2021-11-11 Intel Corporation Compaction of diverged lanes for efficient use of alus
US20220197649A1 (en) * 2020-12-22 2022-06-23 Advanced Micro Devices, Inc. General purpose register hierarchy system and method
US11360897B1 (en) * 2021-04-15 2022-06-14 Qualcomm Incorporated Adaptive memory access management
CN115793958A (zh) * 2021-09-10 2023-03-14 腾讯科技(深圳)有限公司 一种混洗数据的处理方法、相关装置、设备以及存储介质
CN115061731B (zh) * 2022-06-23 2023-05-23 摩尔线程智能科技(北京)有限责任公司 混洗电路和方法、以及芯片和集成电路装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2078912A1 (en) * 1992-01-07 1993-07-08 Robert Edward Cypher Hierarchical interconnection networks for parallel processing
US20040054877A1 (en) 2001-10-29 2004-03-18 Macy William W. Method and apparatus for shuffling data
US7343389B2 (en) 2002-05-02 2008-03-11 Intel Corporation Apparatus and method for SIMD modular multiplication
US9557994B2 (en) * 2004-07-13 2017-01-31 Arm Limited Data processing apparatus and method for performing N-way interleaving and de-interleaving operations where N is an odd plural number
US7761694B2 (en) * 2006-06-30 2010-07-20 Intel Corporation Execution unit for performing shuffle and other operations
GB2444744B (en) 2006-12-12 2011-05-25 Advanced Risc Mach Ltd Apparatus and method for performing re-arrangement operations on data
US8078836B2 (en) 2007-12-30 2011-12-13 Intel Corporation Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits
US9436469B2 (en) * 2011-12-15 2016-09-06 Intel Corporation Methods to optimize a program loop via vector instructions using a shuffle table and a mask store table
CN104185837B (zh) * 2011-12-23 2017-10-13 英特尔公司 在不同的粒度等级下广播数据值的指令执行单元
US9218182B2 (en) 2012-06-29 2015-12-22 Intel Corporation Systems, apparatuses, and methods for performing a shuffle and operation (shuffle-op)
US9342479B2 (en) 2012-08-23 2016-05-17 Qualcomm Incorporated Systems and methods of data extraction in a vector processor
US20140149480A1 (en) 2012-11-28 2014-05-29 Nvidia Corporation System, method, and computer program product for transposing a matrix
US9823924B2 (en) * 2013-01-23 2017-11-21 International Business Machines Corporation Vector element rotate and insert under mask instruction
US9405539B2 (en) 2013-07-31 2016-08-02 Intel Corporation Providing vector sub-byte decompression functionality

Also Published As

Publication number Publication date
EP3485385B1 (en) 2020-04-22
CN109478175A (zh) 2019-03-15
WO2018013219A1 (en) 2018-01-18
KR102118836B1 (ko) 2020-06-03
EP3485385A1 (en) 2019-05-22
US20180018299A1 (en) 2018-01-18
BR112019000120A2 (pt) 2019-04-09
US10592468B2 (en) 2020-03-17
BR112019000120A8 (pt) 2023-01-31
JP2019521445A (ja) 2019-07-25
KR20190028426A (ko) 2019-03-18

Similar Documents

Publication Publication Date Title
CN109478175B (zh) 在simd架构中用于通道混洗的混洗器电路
CN101055644B (zh) 绘图处理装置及其处理指令、数据和逻辑单元操作的方法
US8984043B2 (en) Multiplying and adding matrices
US6816961B2 (en) Processing architecture having field swapping capability
US8094157B1 (en) Performing an occurence count of radices
US10970081B2 (en) Stream processor with decoupled crossbar for cross lane operations
US12229215B2 (en) Performing matrix multiplication in a streaming processor
US20110072249A1 (en) Unanimous branch instructions in a parallel thread processor
CN100480997C (zh) 选择可实质同时处理的多重线程的系统与方法
US8572355B2 (en) Support for non-local returns in parallel thread SIMD engine
US20140181466A1 (en) Processors having fully-connected interconnects shared by vector conflict instructions and permute instructions
CN102279818A (zh) 支持有限共享的向量数据访存控制方法及向量存储器
JP7507304B2 (ja) レジスタデータの消去
KR20140131284A (ko) 스트리밍 메모리의 치환 동작
US9207936B2 (en) Graphic processor unit and method of operating the same
US20100325386A1 (en) Parallel operation device allowing efficient parallel operational processing
CN106716346A (zh) 对端口减少的通用寄存器的操作数冲突解决
CN101371248B (zh) 可配置的单指令多数据单元
US7035983B1 (en) System and method for facilitating communication across an asynchronous clock boundary
JP5659772B2 (ja) 演算処理装置
US9747104B2 (en) Utilizing pipeline registers as intermediate storage
US20040128475A1 (en) Widely accessible processor register file and method for use
BR112019000120B1 (pt) Circuito de embaralhamento para embaralhar faixa em arquitetura simd
KR102855717B1 (ko) 동적 웨이브 페어링
US20250378617A1 (en) Shuffle accelerator for graphics processing unit

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant