JP2020533691A5 - Google Patents

Publication number
JP2020533691A5
Authority
JP
Japan
Prior art keywords
vector
lane
data
vectors
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2020513910A
Other languages
English (en)
Japanese (ja)
Other versions
JP7335231B2 (ja)
JP2020533691A (ja)
Filing date
Publication date
Priority claimed from US15/941,975 (US11803377B2)
Application filed
Publication of JP2020533691A
Publication of JP2020533691A5
Priority to JP2023132932A (JP7652507B2)
Application granted
Publication of JP7335231B2
Legal status: Active (current)
Anticipated expiration

JP2020513910A 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions Active JP7335231B2 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023132932A JP7652507B2 (ja) 2017-09-08 2023-08-17 Efficient direct convolution using SIMD instructions

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201762556274P 2017-09-08 2017-09-08
US62/556,274 2017-09-08
US15/941,975 US11803377B2 (en) 2017-09-08 2018-03-30 Efficient direct convolution using SIMD instructions
US15/941,975 2018-03-30
PCT/US2018/049666 WO2019051027A1 (en) 2017-09-08 2018-09-06 EFFICIENT DIRECT CONVOLUTION USING SIMD INSTRUCTIONS

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP2023132932A Division JP7652507B2 (ja) 2017-09-08 2023-08-17 Efficient direct convolution using SIMD instructions

Publications (3)

Publication Number Publication Date
JP2020533691A JP2020533691A (ja) 2020-11-19
JP2020533691A5 JP2020533691A5 2021-10-14
JP7335231B2 JP7335231B2 (ja) 2023-08-29

Family

ID=65631104

Family Applications (2)

Application Number Title Priority Date Filing Date
JP2020513910A Active JP7335231B2 (ja) 2017-09-08 2018-09-06 Efficient direct convolution using SIMD instructions
JP2023132932A Active JP7652507B2 (ja) 2017-09-08 2023-08-17 Efficient direct convolution using SIMD instructions

Family Applications After (1)

Application Number Title Priority Date Filing Date
JP2023132932A Active JP7652507B2 (ja) 2017-09-08 2023-08-17 Efficient direct convolution using SIMD instructions

Country Status (5)

Country Link
US (2) US11803377B2
EP (1) EP3676700B1
JP (2) JP7335231B2
CN (2) CN111213125B
WO (1) WO2019051027A1

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10747844B2 (en) * 2017-12-12 2020-08-18 Tesla, Inc. Systems and methods for converting a matrix input to a vectorized input for a matrix processor
US10565285B2 (en) * 2017-12-18 2020-02-18 International Business Machines Corporation Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations
US12099912B2 (en) 2018-06-22 2024-09-24 Samsung Electronics Co., Ltd. Neural processor
CN111813447B (zh) * 2019-04-12 2022-11-08 杭州中天微系统有限公司 Processing method and processing device for a data splicing instruction
US11671111B2 (en) 2019-04-17 2023-06-06 Samsung Electronics Co., Ltd. Hardware channel-parallel data compression/decompression
US11211944B2 (en) 2019-04-17 2021-12-28 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
US11880760B2 (en) 2019-05-01 2024-01-23 Samsung Electronics Co., Ltd. Mixed-precision NPU tile with depth-wise convolution
US12182577B2 (en) 2019-05-01 2024-12-31 Samsung Electronics Co., Ltd. Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles
US20210049474A1 (en) * 2019-08-13 2021-02-18 Samsung Electronics Co., Ltd. Neural network method and apparatus
US11726950B2 (en) * 2019-09-28 2023-08-15 Intel Corporation Compute near memory convolution accelerator
US11475283B2 (en) * 2019-10-24 2022-10-18 Apple Inc. Multi dimensional convolution in neural network processor
US12112141B2 (en) 2019-12-12 2024-10-08 Samsung Electronics Co., Ltd. Accelerating 2D convolutional layer mapping on a dot product architecture
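CN111178505B (zh) * 2019-12-23 2023-04-07 福建星网视易信息系统有限公司 Acceleration method for a convolutional neural network and computer-readable storage medium
CN111797985B (zh) * 2020-07-22 2022-11-22 哈尔滨工业大学 GPU-based memory access optimization method for convolution operations
KR102860334B1 (ko) * 2020-08-14 2025-09-16 삼성전자주식회사 Method and apparatus for processing convolution operations based on redundancy reduction
CN112633505B (zh) * 2020-12-24 2022-05-27 苏州浪潮智能科技有限公司 RISC-V-based artificial intelligence inference method and system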
US12182570B2 (en) 2021-06-25 2024-12-31 Intel Corporation Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control
US12443412B2 (en) 2022-01-30 2025-10-14 Simplex Micro, Inc. Method and apparatus for a scalable microprocessor with time counter
CN114443143B (zh) * 2022-01-30 2025-01-07 上海阵量智能科技有限公司 Instruction processing method, apparatus, chip, electronic device, and storage medium
US12190116B2 (en) 2022-04-05 2025-01-07 Simplex Micro, Inc. Microprocessor with time count based instruction execution and replay
US12169716B2 (en) 2022-04-20 2024-12-17 Simplex Micro, Inc. Microprocessor with a time counter for statically dispatching extended instructions
US12141580B2 (en) 2022-04-20 2024-11-12 Simplex Micro, Inc. Microprocessor with non-cacheable memory load prediction
US12288065B2 (en) 2022-04-29 2025-04-29 Simplex Micro, Inc. Microprocessor with odd and even register sets
US12124849B2 (en) * 2022-07-13 2024-10-22 Simplex Micro, Inc. Vector processor with extended vector registers
US12147812B2 (en) 2022-07-13 2024-11-19 Simplex Micro, Inc. Out-of-order execution of loop instructions in a microprocessor
US12282772B2 (en) 2022-07-13 2025-04-22 Simplex Micro, Inc. Vector processor with vector data buffer
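CN117313803B (zh) * 2023-11-28 2024-02-02 进迭时空(杭州)科技有限公司 Sliding-window 2D convolution computation method based on a RISC-V vector processor architecture
CN119536744B (zh) * 2025-01-23 2025-06-17 山东浪潮科学研究院有限公司 Automatic code vectorization optimization method, device, and medium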

Family Cites Families (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6489868A (en) 1987-09-30 1989-04-05 Sony Corp Video signal processing circuit
JPH03189868A (ja) * 1989-12-20 1991-08-19 Akira Iwata Data processing processor
US5734874A (en) * 1994-04-29 1998-03-31 Sun Microsystems, Inc. Central processing unit with integrated graphics functions
EP0681236B1 (en) * 1994-05-05 2000-11-22 Conexant Systems, Inc. Space vector data path
US7085795B2 (en) 2001-10-29 2006-08-01 Intel Corporation Apparatus and method for efficient filtering and convolution of content data
US6061711A (en) * 1996-08-19 2000-05-09 Samsung Electronics, Inc. Efficient context saving and restoring in a multi-tasking computing system environment
US5801975A (en) * 1996-12-02 1998-09-01 Compaq Computer Corporation And Advanced Micro Devices, Inc. Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles
US5909572A (en) * 1996-12-02 1999-06-01 Compaq Computer Corp. System and method for conditionally moving an operand from a source register to a destination register
US5933650A (en) * 1997-10-09 1999-08-03 Mips Technologies, Inc. Alignment and ordering of vector elements for single instruction multiple data processing
US6115812A (en) * 1998-04-01 2000-09-05 Intel Corporation Method and apparatus for efficient vertical SIMD computations
EP2241968B1 (en) * 1998-08-24 2012-06-27 MicroUnity Systems Engineering, Inc. System with wide operand architecture, and method
US7725521B2 (en) * 2001-10-29 2010-05-25 Intel Corporation Method and apparatus for computing matrix transformations
US7685212B2 (en) 2001-10-29 2010-03-23 Intel Corporation Fast full search motion estimation with SIMD merge instruction
US6954841B2 (en) * 2002-06-26 2005-10-11 International Business Machines Corporation Viterbi decoding for SIMD vector processors with indirect vector element access
GB2395306B (en) * 2002-11-15 2006-02-15 Imagination Tech Ltd A configurable processor architecture
US7409415B2 (en) * 2002-12-20 2008-08-05 Texas Instruments Incorporated Processor system with efficient shift operations including EXTRACT operation
US7689641B2 (en) 2003-06-30 2010-03-30 Intel Corporation SIMD integer multiply high with round and shift
GB2409063B (en) 2003-12-09 2006-07-12 Advanced Risc Mach Ltd Vector by scalar operations
GB2409065B (en) * 2003-12-09 2006-10-25 Advanced Risc Mach Ltd Multiplexing operations in SIMD processing
US7328230B2 (en) * 2004-03-26 2008-02-05 Intel Corporation SIMD four-data element average instruction
US7315937B2 (en) * 2004-10-01 2008-01-01 Mips Technologies, Inc. Microprocessor instructions for efficient bit stream extractions
US7933405B2 (en) * 2005-04-08 2011-04-26 Icera Inc. Data access and permute unit
US7623732B1 (en) 2005-04-26 2009-11-24 Mercury Computer Systems, Inc. Method and apparatus for digital image filtering with discrete filter kernels using graphics hardware
US7529918B2 (en) * 2006-07-21 2009-05-05 Broadcom Corporation System and method for efficiently performing bit-field extraction and bit-field combination operations in a processor
US20080071851A1 (en) * 2006-09-20 2008-03-20 Ronen Zohar Instruction and logic for performing a dot-product operation
US8255884B2 (en) 2008-06-06 2012-08-28 International Business Machines Corporation Optimized scalar promotion with load and splat SIMD instructions
US20100180100A1 (en) 2009-01-13 2010-07-15 Mavrix Technology, Inc. Matrix microprocessor and method of operation
CN101923534B (zh) 2009-06-10 2012-02-01 新奥特(北京)视频技术有限公司 Method for convolving symmetric convolution kernels of video and audio signals using the SSE instruction set
US8732437B2 (en) * 2010-01-26 2014-05-20 Oracle America, Inc. Low-overhead misalignment and reformatting support for SIMD
US9363068B2 (en) 2010-08-03 2016-06-07 Intel Corporation Vector processor having instruction set with sliding window non-linear convolutional function
US20120185670A1 (en) * 2011-01-14 2012-07-19 Toll Bret L Scalar integer instructions capable of execution with three registers
US20120254589A1 (en) * 2011-04-01 2012-10-04 Jesus Corbal San Adrian System, apparatus, and method for aligning registers
KR102207599B1 (ko) 2011-10-27 2021-01-26 인텔 코포레이션 Block-based crest factor reduction
CN102495721A (zh) * 2011-12-02 2012-06-13 南京大学 SIMD vector processor supporting FFT acceleration
US9946540B2 (en) * 2011-12-23 2018-04-17 Intel Corporation Apparatus and method of improved permute instructions with multiple granularities
EP2798454A4 (en) 2011-12-30 2016-08-17 Intel Corp VARIABLE SIMD DISPLACEMENT AND ROTATION USING A CONTROL PANELULATION
US9275014B2 (en) * 2013-03-13 2016-03-01 Qualcomm Incorporated Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods
US9477999B2 (en) * 2013-09-20 2016-10-25 The Board Of Trustees Of The Leland Stanford Junior University Low power programmable image processor
US9684509B2 (en) * 2013-11-15 2017-06-20 Qualcomm Incorporated Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods
US9442731B2 (en) * 2014-03-13 2016-09-13 Intel Corporation Packed two source inter-element shift merge processors, methods, systems, and instructions
US10402196B2 (en) * 2015-05-11 2019-09-03 Ceva D.S.P. Ltd. Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients
US9582726B2 (en) * 2015-06-24 2017-02-28 Qualcomm Incorporated Systems and methods for image processing in a deep convolution network
US10459731B2 (en) * 2015-07-20 2019-10-29 Qualcomm Incorporated Sliding window operation
GB2540939B (en) * 2015-07-31 2019-01-23 Advanced Risc Mach Ltd An apparatus and method for performing a splice operation
US20170357894A1 (en) * 2016-06-10 2017-12-14 Apple Inc. Data packing for convolution of artificial neural networks
US10282204B2 (en) * 2016-07-02 2019-05-07 Intel Corporation Systems, apparatuses, and methods for strided load
CN106940815B (zh) * 2017-02-13 2020-07-28 西安交通大学 Programmable convolutional neural network coprocessor IP core
CN106991473A (zh) * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 SIMD-based parallel processing method for average pooling on vector processors
US10824938B2 (en) * 2017-04-24 2020-11-03 Intel Corporation Specialized fixed function hardware for efficient convolution
JP6958027B2 (ja) * 2017-07-03 2021-11-02 富士通株式会社 Arithmetic processing device and control method for arithmetic processing device

Similar Documents

Publication Publication Date Title
JP2020533691A5
US12205018B2 (en) Transposing neural network matrices in hardware
US12277499B2 (en) Vector computation unit in a neural network processor
US20220138577A1 (en) Batch Processing In A Neural Network Processor
US11816532B2 (en) Performing kernel striding in hardware
US11574195B2 (en) Operation method
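CN114239797B (zh) Method and hardware circuit for performing average pooling in hardware
KR102331978B1 (ko) Device and method for executing forward operation of an artificial neural network
CN109324827B (zh) Apparatus, method, and system for processing instructions for accessing data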
US11573765B2 (en) Fused convolution and batch normalization for neural networks
CN113868592B (zh) Method and system for implementing convolution computation based on G2D
CN117492838A (zh) Accessing prologue and epilogue data
HK40078354A (en) Performing kernel striding in hardware
HK40043994A (en) Transposing neural network matrices in hardware
HK1254699B (en) Performing kernel striding in hardware
HK40043994B (en) Transposing neural network matrices in hardware