CN119556989A - Efficient direct convolution using SIMD instructions - Google Patents

Efficient direct convolution using SIMD instructions

Info

Publication number
CN119556989A
Authority
CN
China
Prior art keywords
vector
source
source vector
output
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311376759.5A
Other languages
English (en)
Chinese (zh)
Inventor
J. R. Diamond
A. P. Patel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Publication of CN119556989A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032 Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Advance Control (AREA)
  • Complex Calculations (AREA)
CN202311376759.5A 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions Pending CN119556989A (zh)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201762556274P 2017-09-08 2017-09-08
US62/556,274 2017-09-08
US15/941,975 US11803377B2 (en) 2017-09-08 2018-03-30 Efficient direct convolution using SIMD instructions
US15/941,975 2018-03-30
CN201880066852.7A CN111213125B (zh) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions
PCT/US2018/049666 WO2019051027A1 (en) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201880066852.7A Division CN111213125B (zh) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions

Publications (1)

Publication Number Publication Date
CN119556989A (zh) 2025-03-04

Family

ID=65631104

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202311376759.5A Pending CN119556989A (zh) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions
CN201880066852.7A Active CN111213125B (zh) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201880066852.7A Active CN111213125B (zh) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions

Country Status (5)

Country Link
US (2) US11803377B2 (en)
EP (1) EP3676700B1 (en)
JP (2) JP7335231B2 (en)
CN (2) CN119556989A (en)
WO (1) WO2019051027A1 (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747844B2 (en) * 2017-12-12 2020-08-18 Tesla, Inc. Systems and methods for converting a matrix input to a vectorized input for a matrix processor
US10565285B2 (en) * 2017-12-18 2020-02-18 International Business Machines Corporation Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations
US12099912B2 (en) 2018-06-22 2024-09-24 Samsung Electronics Co., Ltd. Neural processor
CN111813447B (zh) * 2019-04-12 2022-11-08 杭州中天微系统有限公司 Processing method and processing device for a data splicing instruction
US11671111B2 (en) 2019-04-17 2023-06-06 Samsung Electronics Co., Ltd. Hardware channel-parallel data compression/decompression
US11211944B2 (en) 2019-04-17 2021-12-28 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
US12182577B2 (en) 2019-05-01 2024-12-31 Samsung Electronics Co., Ltd. Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles
US11880760B2 (en) 2019-05-01 2024-01-23 Samsung Electronics Co., Ltd. Mixed-precision NPU tile with depth-wise convolution
US20210049474A1 (en) * 2019-08-13 2021-02-18 Samsung Electronics Co., Ltd. Neural network method and apparatus
US11726950B2 (en) * 2019-09-28 2023-08-15 Intel Corporation Compute near memory convolution accelerator
US11475283B2 (en) * 2019-10-24 2022-10-18 Apple Inc. Multi dimensional convolution in neural network processor
US12112141B2 (en) 2019-12-12 2024-10-08 Samsung Electronics Co., Ltd. Accelerating 2D convolutional layer mapping on a dot product architecture
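CN111178505B (zh) * 2019-12-23 2023-04-07 福建星网视易信息系统有限公司 Acceleration method for convolutional neural networks and computer-readable storage medium
CN111797985B (zh) * 2020-07-22 2022-11-22 哈尔滨工业大学 GPU-based memory access optimization method for convolution operations
KR102860334B1 (ko) * 2020-08-14 2025-09-16 Samsung Electronics Co., Ltd. Method and apparatus for processing convolution operations based on redundancy reduction
CN112633505B (zh) * 2020-12-24 2022-05-27 苏州浪潮智能科技有限公司 RISC-V-based artificial intelligence inference method and system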
US12182570B2 (en) 2021-06-25 2024-12-31 Intel Corporation Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control
CN114443143B (zh) * 2022-01-30 2025-01-07 上海阵量智能科技有限公司 Instruction processing method, apparatus, chip, electronic device, and storage medium
US12443412B2 (en) 2022-01-30 2025-10-14 Simplex Micro, Inc. Method and apparatus for a scalable microprocessor with time counter
US12190116B2 (en) 2022-04-05 2025-01-07 Simplex Micro, Inc. Microprocessor with time count based instruction execution and replay
US12169716B2 (en) 2022-04-20 2024-12-17 Simplex Micro, Inc. Microprocessor with a time counter for statically dispatching extended instructions
US12141580B2 (en) 2022-04-20 2024-11-12 Simplex Micro, Inc. Microprocessor with non-cacheable memory load prediction
US12288065B2 (en) 2022-04-29 2025-04-29 Simplex Micro, Inc. Microprocessor with odd and even register sets
US12147812B2 (en) 2022-07-13 2024-11-19 Simplex Micro, Inc. Out-of-order execution of loop instructions in a microprocessor
US12282772B2 (en) 2022-07-13 2025-04-22 Simplex Micro, Inc. Vector processor with vector data buffer
US12124849B2 (en) * 2022-07-13 2024-10-22 Simplex Micro, Inc. Vector processor with extended vector registers
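CN117313803B (zh) * 2023-11-28 2024-02-02 进迭时空(杭州)科技有限公司 Sliding-window 2D convolution computation method based on the RISC-V vector processor architecture
CN119536744B (zh) * 2025-01-23 2025-06-17 山东浪潮科学研究院有限公司 Automatic code vectorization optimization method, device, and medium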

Family Cites Families (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6489868A (en) 1987-09-30 1989-04-05 Sony Corp Video signal processing circuit
JPH03189868A (ja) * 1989-12-20 1991-08-19 Akira Iwata Data processor
US5734874A (en) * 1994-04-29 1998-03-31 Sun Microsystems, Inc. Central processing unit with integrated graphics functions
DE69519449T2 (de) * 1994-05-05 2001-06-21 Conexant Systems, Inc. Space vector data path
US7085795B2 (en) 2001-10-29 2006-08-01 Intel Corporation Apparatus and method for efficient filtering and convolution of content data
US6061711A (en) * 1996-08-19 2000-05-09 Samsung Electronics, Inc. Efficient context saving and restoring in a multi-tasking computing system environment
US5801975A (en) * 1996-12-02 1998-09-01 Compaq Computer Corporation And Advanced Micro Devices, Inc. Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles
US5909572A (en) * 1996-12-02 1999-06-01 Compaq Computer Corp. System and method for conditionally moving an operand from a source register to a destination register
US5933650A (en) * 1997-10-09 1999-08-03 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US6115812A (en) * 1998-04-01 2000-09-05 Intel Corporation Method and apparatus for efficient vertical SIMD computations
ATE557342T1 (de) * 1998-08-24 2012-05-15 Microunity Systems Eng Processor and method for matrix multiplication with a wide operand
US7685212B2 (en) 2001-10-29 2010-03-23 Intel Corporation Fast full search motion estimation with SIMD merge instruction
US7725521B2 (en) * 2001-10-29 2010-05-25 Intel Corporation Method and apparatus for computing matrix transformations
US6954841B2 (en) * 2002-06-26 2005-10-11 International Business Machines Corporation Viterbi decoding for SIMD vector processors with indirect vector element access
GB2395306B (en) * 2002-11-15 2006-02-15 Imagination Tech Ltd A configurable processor architecture
US7409415B2 (en) * 2002-12-20 2008-08-05 Texas Instruments Incorporated Processor system with efficient shift operations including EXTRACT operation
US7689641B2 (en) 2003-06-30 2010-03-30 Intel Corporation SIMD integer multiply high with round and shift
GB2409065B (en) * 2003-12-09 2006-10-25 Advanced Risc Mach Ltd Multiplexing operations in SIMD processing
GB2409063B (en) 2003-12-09 2006-07-12 Advanced Risc Mach Ltd Vector by scalar operations
US7328230B2 (en) * 2004-03-26 2008-02-05 Intel Corporation SIMD four-data element average instruction
US7315937B2 (en) * 2004-10-01 2008-01-01 Mips Technologies, Inc. Microprocessor instructions for efficient bit stream extractions
US7933405B2 (en) * 2005-04-08 2011-04-26 Icera Inc. Data access and permute unit
US7623732B1 (en) 2005-04-26 2009-11-24 Mercury Computer Systems, Inc. Method and apparatus for digital image filtering with discrete filter kernels using graphics hardware
US7529918B2 (en) * 2006-07-21 2009-05-05 Broadcom Corporation System and method for efficiently performing bit-field extraction and bit-field combination operations in a processor
US20080071851A1 (en) * 2006-09-20 2008-03-20 Ronen Zohar Instruction and logic for performing a dot-product operation
US8255884B2 (en) 2008-06-06 2012-08-28 International Business Machines Corporation Optimized scalar promotion with load and splat SIMD instructions
US20100180100A1 (en) 2009-01-13 2010-07-15 Mavrix Technology, Inc. Matrix microprocessor and method of operation
CN101923534B (zh) 2009-06-10 2012-02-01 新奥特(北京)视频技术有限公司 Method for convolving video and audio signals with a symmetric convolution kernel using the SSE instruction set
US8732437B2 (en) * 2010-01-26 2014-05-20 Oracle America, Inc. Low-overhead misalignment and reformatting support for SIMD
US9363068B2 (en) 2010-08-03 2016-06-07 Intel Corporation Vector processor having instruction set with sliding window non-linear convolutional function
US20120185670A1 (en) * 2011-01-14 2012-07-19 Toll Bret L Scalar integer instructions capable of execution with three registers
US20120254589A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian System, apparatus, and method for aligning registers
KR20140092852A (ko) 2011-10-27 2014-07-24 LSI Corporation Vector processor having an instruction set with a vector convolution function for FIR filtering
CN102495721A (zh) * 2011-12-02 2012-06-13 南京大学 SIMD vector processor supporting FFT acceleration
US9946540B2 (en) * 2011-12-23 2018-04-17 Intel Corporation Apparatus and method of improved permute instructions with multiple granularities
EP2798454A4 (en) 2011-12-30 2016-08-17 Intel Corp VARIABLE SIMD DISPLACEMENT AND ROTATION USING A CONTROL PANELULATION
US9275014B2 (en) * 2013-03-13 2016-03-01 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
US9477999B2 (en) * 2013-09-20 2016-10-25 The Board Of Trustees Of The Leland Stanford Junior University Low power programmable image processor
US9684509B2 (en) * 2013-11-15 2017-06-20 Qualcomm Incorporated Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
US9442731B2 (en) * 2014-03-13 2016-09-13 Intel Corporation Packed two source inter-element shift merge processors, methods, systems, and instructions
US10402196B2 (en) 2015-05-11 2019-09-03 Ceva D.S.P. Ltd. Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients
US9582726B2 (en) * 2015-06-24 2017-02-28 Qualcomm Incorporated Systems and methods for image processing in a deep convolution network
US10459731B2 (en) * 2015-07-20 2019-10-29 Qualcomm Incorporated Sliding window operation
GB2540939B (en) * 2015-07-31 2019-01-23 Advanced Risc Mach Ltd An apparatus and method for performing a splice operation
US20170357894A1 (en) * 2016-06-10 2017-12-14 Apple Inc. Data packing for convolution of artificial neural networks
US10282204B2 (en) * 2016-07-02 2019-05-07 Intel Corporation Systems, apparatuses, and methods for strided load
CN106940815B (zh) * 2017-02-13 2020-07-28 西安交通大学 Programmable convolutional neural network coprocessor IP core
CN106991473A (zh) * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 SIMD-based average pooling parallel processing method for vector processors
US10824938B2 (en) * 2017-04-24 2020-11-03 Intel Corporation Specialized fixed function hardware for efficient convolution
JP6958027B2 (ja) * 2017-07-03 2021-11-02 富士通株式会社 Arithmetic processing device and control method for arithmetic processing device

Also Published As

Publication number Publication date
WO2019051027A1 (en) 2019-03-14
US20190079764A1 (en) 2019-03-14
US20240012644A1 (en) 2024-01-11
JP7652507B2 (ja) 2025-03-27
US11803377B2 (en) 2023-10-31
EP3676700B1 (en) 2022-12-28
CN111213125A (zh) 2020-05-29
JP2023160833A (ja) 2023-11-02
JP7335231B2 (ja) 2023-08-29
CN111213125B (zh) 2023-11-07
JP2020533691A (ja) 2020-11-19
EP3676700A1 (en) 2020-07-08

Similar Documents

Publication Publication Date Title
CN111213125B (zh) Efficient direct convolution using SIMD instructions
CN107408037B (zh) Monolithic vector processor configured to operate on variable-length vectors
US9164763B2 (en) Single instruction group information processing apparatus for dynamically performing transient processing associated with a repeat instruction
US8200948B2 (en) Apparatus and method for performing re-arrangement operations on data
US20050193050A1 (en) Matrix multiplication in a vector processing system
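CN111656367A (zh) System and architecture of a neural network accelerator
CN111381880A (zh) Load-store instruction
TWI603262B (zh) Packed finite impulse response (FIR) filter processors, methods, systems, and instructions
JP7385009B2 (ja) Compression assist instructions
CN112434256B (zh) Matrix multiplier and processor
CN111433741A (zh) Vector add-with-carry instruction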
US9965275B2 (en) Element size increasing instruction
CN112506468B (zh) RISC-V general-purpose processor supporting high-throughput multi-precision multiplication
US20140351566A1 (en) Moving average processing in processor and processor
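CN114090954A (zh) Integer matrix multiplication kernel optimization method based on FT-2000+
CN119836622A (zh) Multiple outer product instruction
CN112434255A (zh) Vector-matrix operation and data processing method, multiplier, and processor chip
CN121219678A (zh) Indexed vector permutation, vector comparison, and/or population count operations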
US11669489B2 (en) Sparse systolic array design
GB2523805A (en) Data processing apparatus and method for performing vector scan operation
US20250258648A1 (en) Apparatus and method with in-register computing
US12493577B2 (en) Digital signal processor (DSP) and electronic device using the same
HK40047526A (en) Matrix multiplier and processor
HK40047526B (en) Matrix multiplier and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination