JP2020533691A5 - Efficient direct convolution using SIMD instructions - Google Patents
- Publication number
- JP2020533691A5 (application JP2020513910A)
- Authority
- JP
- Japan
- Prior art keywords
- vector
- lane
- data
- vectors
- generate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
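| Concept ID | Term | Domain | Query match | Sections | Count |
|---|---|---|---|---|---|
| 239000013598 | vector | Substances | 0.000 | claims, description | 213 |
| 238000000605 | extraction | Methods | 0.000 | claims, description | 10 |
| 238000000034 | method | Methods | 0.000 | claims, description | 8 |
| 238000013527 | convolutional neural network | Methods | 0.000 | claims | 2 |
| 238000010586 | diagram | Methods | 0.000 | description | 1 |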
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762556274P | 2017-09-08 | 2017-09-08 | |
| US62/556,274 | 2017-09-08 | | |
| US15/941,975 US11803377B2 (en) | 2017-09-08 | 2018-03-30 | Efficient direct convolution using SIMD instructions |
| US15/941,975 | 2018-03-30 | | |
| PCT/US2018/049666 WO2019051027A1 (en) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A Division JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2020533691A (ja) | 2020-11-19 |
| JP2020533691A5 | 2021-10-14 |
| JP7335231B2 (ja) | 2023-08-29 |
Family
ID=65631104
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020513910A Active JP7335231B2 (ja) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
| JP2023132932A Active JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A Active JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11803377B2 |
| EP (1) | EP3676700B1 |
| JP (2) | JP7335231B2 |
| CN (2) | CN111213125B |
| WO (1) | WO2019051027A1 |
Families Citing this family (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10747844B2 (en) * | 2017-12-12 | 2020-08-18 | Tesla, Inc. | Systems and methods for converting a matrix input to a vectorized input for a matrix processor |
| US10565285B2 (en) * | 2017-12-18 | 2020-02-18 | International Business Machines Corporation | Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations |
| US12099912B2 (en) | 2018-06-22 | 2024-09-24 | Samsung Electronics Co., Ltd. | Neural processor |
| CN111813447B (zh) * | 2019-04-12 | 2022-11-08 | 杭州中天微系统有限公司 | Processing method and processing device for a data splicing instruction |
| US11671111B2 (en) | 2019-04-17 | 2023-06-06 | Samsung Electronics Co., Ltd. | Hardware channel-parallel data compression/decompression |
| US11211944B2 (en) | 2019-04-17 | 2021-12-28 | Samsung Electronics Co., Ltd. | Mixed-precision compression with random access |
| US11880760B2 (en) | 2019-05-01 | 2024-01-23 | Samsung Electronics Co., Ltd. | Mixed-precision NPU tile with depth-wise convolution |
| US12182577B2 (en) | 2019-05-01 | 2024-12-31 | Samsung Electronics Co., Ltd. | Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles |
| US20210049474A1 (en) * | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | Neural network method and apparatus |
| US11726950B2 (en) * | 2019-09-28 | 2023-08-15 | Intel Corporation | Compute near memory convolution accelerator |
| US11475283B2 (en) * | 2019-10-24 | 2022-10-18 | Apple Inc. | Multi dimensional convolution in neural network processor |
| US12112141B2 (en) | 2019-12-12 | 2024-10-08 | Samsung Electronics Co., Ltd. | Accelerating 2D convolutional layer mapping on a dot product architecture |
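| CN111178505B (zh) * | 2019-12-23 | 2023-04-07 | 福建星网视易信息系统有限公司 | Acceleration method for a convolutional neural network and computer-readable storage medium |
| CN111797985B (zh) * | 2020-07-22 | 2022-11-22 | 哈尔滨工业大学 | GPU-based memory access optimization method for convolution operations |
| KR102860334B1 (ko) * | 2020-08-14 | 2025-09-16 | 삼성전자주식회사 | Method and apparatus for processing convolution operations based on redundancy reduction |
| CN112633505B (zh) * | 2020-12-24 | 2022-05-27 | 苏州浪潮智能科技有限公司 | RISC-V-based artificial intelligence inference method and system |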
| US12182570B2 (en) | 2021-06-25 | 2024-12-31 | Intel Corporation | Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control |
| US12443412B2 (en) | 2022-01-30 | 2025-10-14 | Simplex Micro, Inc. | Method and apparatus for a scalable microprocessor with time counter |
| CN114443143B (zh) * | 2022-01-30 | 2025-01-07 | 上海阵量智能科技有限公司 | Instruction processing method, apparatus, chip, electronic device, and storage medium |
| US12190116B2 (en) | 2022-04-05 | 2025-01-07 | Simplex Micro, Inc. | Microprocessor with time count based instruction execution and replay |
| US12169716B2 (en) | 2022-04-20 | 2024-12-17 | Simplex Micro, Inc. | Microprocessor with a time counter for statically dispatching extended instructions |
| US12141580B2 (en) | 2022-04-20 | 2024-11-12 | Simplex Micro, Inc. | Microprocessor with non-cacheable memory load prediction |
| US12288065B2 (en) | 2022-04-29 | 2025-04-29 | Simplex Micro, Inc. | Microprocessor with odd and even register sets |
| US12124849B2 (en) * | 2022-07-13 | 2024-10-22 | Simplex Micro, Inc. | Vector processor with extended vector registers |
| US12147812B2 (en) | 2022-07-13 | 2024-11-19 | Simplex Micro, Inc. | Out-of-order execution of loop instructions in a microprocessor |
| US12282772B2 (en) | 2022-07-13 | 2025-04-22 | Simplex Micro, Inc. | Vector processor with vector data buffer |
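| CN117313803B (zh) * | 2023-11-28 | 2024-02-02 | 进迭时空(杭州)科技有限公司 | Sliding-window 2D convolution computation method based on a RISC-V vector processor architecture |
| CN119536744B (zh) * | 2025-01-23 | 2025-06-17 | 山东浪潮科学研究院有限公司 | Automatic code vectorization optimization method, device, and medium |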
Family Cites Families (50)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6489868A (en) | 1987-09-30 | 1989-04-05 | Sony Corp | Video signal processing circuit |
| JPH03189868A (ja) * | 1989-12-20 | 1991-08-19 | Akira Iwata | Data processing processor |
| US5734874A (en) * | 1994-04-29 | 1998-03-31 | Sun Microsystems, Inc. | Central processing unit with integrated graphics functions |
| EP0681236B1 (en) * | 1994-05-05 | 2000-11-22 | Conexant Systems, Inc. | Space vector data path |
| US7085795B2 (en) | 2001-10-29 | 2006-08-01 | Intel Corporation | Apparatus and method for efficient filtering and convolution of content data |
| US6061711A (en) * | 1996-08-19 | 2000-05-09 | Samsung Electronics, Inc. | Efficient context saving and restoring in a multi-tasking computing system environment |
| US5801975A (en) * | 1996-12-02 | 1998-09-01 | Compaq Computer Corporation And Advanced Micro Devices, Inc. | Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles |
| US5909572A (en) * | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
| US5933650A (en) * | 1997-10-09 | 1999-08-03 | Mips Technologies, Inc. | Alignment and ordering of vector elements for single instruction multiple data processing |
| US6115812A (en) * | 1998-04-01 | 2000-09-05 | Intel Corporation | Method and apparatus for efficient vertical SIMD computations |
| EP2241968B1 (en) * | 1998-08-24 | 2012-06-27 | MicroUnity Systems Engineering, Inc. | System with wide operand architecture, and method |
| US7725521B2 (en) * | 2001-10-29 | 2010-05-25 | Intel Corporation | Method and apparatus for computing matrix transformations |
| US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction |
| US6954841B2 (en) * | 2002-06-26 | 2005-10-11 | International Business Machines Corporation | Viterbi decoding for SIMD vector processors with indirect vector element access |
| GB2395306B (en) * | 2002-11-15 | 2006-02-15 | Imagination Tech Ltd | A configurable processor architecture |
| US7409415B2 (en) * | 2002-12-20 | 2008-08-05 | Texas Instruments Incorporated | Processor system with efficient shift operations including EXTRACT operation |
| US7689641B2 (en) | 2003-06-30 | 2010-03-30 | Intel Corporation | SIMD integer multiply high with round and shift |
| GB2409063B (en) | 2003-12-09 | 2006-07-12 | Advanced Risc Mach Ltd | Vector by scalar operations |
| GB2409065B (en) * | 2003-12-09 | 2006-10-25 | Advanced Risc Mach Ltd | Multiplexing operations in SIMD processing |
| US7328230B2 (en) * | 2004-03-26 | 2008-02-05 | Intel Corporation | SIMD four-data element average instruction |
| US7315937B2 (en) * | 2004-10-01 | 2008-01-01 | Mips Technologies, Inc. | Microprocessor instructions for efficient bit stream extractions |
| US7933405B2 (en) * | 2005-04-08 | 2011-04-26 | Icera Inc. | Data access and permute unit |
| US7623732B1 (en) | 2005-04-26 | 2009-11-24 | Mercury Computer Systems, Inc. | Method and apparatus for digital image filtering with discrete filter kernels using graphics hardware |
| US7529918B2 (en) * | 2006-07-21 | 2009-05-05 | Broadcom Corporation | System and method for efficiently performing bit-field extraction and bit-field combination operations in a processor |
| US20080071851A1 (en) * | 2006-09-20 | 2008-03-20 | Ronen Zohar | Instruction and logic for performing a dot-product operation |
| US8255884B2 (en) | 2008-06-06 | 2012-08-28 | International Business Machines Corporation | Optimized scalar promotion with load and splat SIMD instructions |
| US20100180100A1 (en) | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
| CN101923534B (zh) | 2009-06-10 | 2012-02-01 | 新奥特(北京)视频技术有限公司 | Method for convolving video and audio signals with a symmetric convolution kernel using the SSE instruction set |
| US8732437B2 (en) * | 2010-01-26 | 2014-05-20 | Oracle America, Inc. | Low-overhead misalignment and reformatting support for SIMD |
| US9363068B2 (en) | 2010-08-03 | 2016-06-07 | Intel Corporation | Vector processor having instruction set with sliding window non-linear convolutional function |
| US20120185670A1 (en) * | 2011-01-14 | 2012-07-19 | Toll Bret L | Scalar integer instructions capable of execution with three registers |
| US20120254589A1 (en) * | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | System, apparatus, and method for aligning registers |
| KR102207599B1 (ko) | 2011-10-27 | 2021-01-26 | 인텔 코포레이션 | Block-based crest factor reduction |
| CN102495721A (zh) * | 2011-12-02 | 2012-06-13 | 南京大学 | SIMD vector processor supporting FFT acceleration |
| US9946540B2 (en) * | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities |
| EP2798454A4 (en) | 2011-12-30 | 2016-08-17 | Intel Corp | SIMD variable shift and rotate using control manipulation |
| US9275014B2 (en) * | 2013-03-13 | 2016-03-01 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods |
| US9477999B2 (en) * | 2013-09-20 | 2016-10-25 | The Board Of Trustees Of The Leland Stanford Junior University | Low power programmable image processor |
| US9684509B2 (en) * | 2013-11-15 | 2017-06-20 | Qualcomm Incorporated | Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods |
| US9442731B2 (en) * | 2014-03-13 | 2016-09-13 | Intel Corporation | Packed two source inter-element shift merge processors, methods, systems, and instructions |
| US10402196B2 (en) * | 2015-05-11 | 2019-09-03 | Ceva D.S.P. Ltd. | Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients |
| US9582726B2 (en) * | 2015-06-24 | 2017-02-28 | Qualcomm Incorporated | Systems and methods for image processing in a deep convolution network |
| US10459731B2 (en) * | 2015-07-20 | 2019-10-29 | Qualcomm Incorporated | Sliding window operation |
| GB2540939B (en) * | 2015-07-31 | 2019-01-23 | Advanced Risc Mach Ltd | An apparatus and method for performing a splice operation |
| US20170357894A1 (en) * | 2016-06-10 | 2017-12-14 | Apple Inc. | Data packing for convolution of artificial neural networks |
| US10282204B2 (en) * | 2016-07-02 | 2019-05-07 | Intel Corporation | Systems, apparatuses, and methods for strided load |
| CN106940815B (zh) * | 2017-02-13 | 2020-07-28 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
| CN106991473A (zh) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | SIMD-based average pooling parallel processing method for vector processors |
| US10824938B2 (en) * | 2017-04-24 | 2020-11-03 | Intel Corporation | Specialized fixed function hardware for efficient convolution |
| JP6958027B2 (ja) * | 2017-07-03 | 2021-11-02 | 富士通株式会社 | Arithmetic processing device and control method for arithmetic processing device |
- 2018
  - 2018-03-30: US application US15/941,975, patent US11803377B2 (en), status: Active
  - 2018-09-06: EP application EP18779130.6A, patent EP3676700B1 (en), status: Active
  - 2018-09-06: WO application PCT/US2018/049666, publication WO2019051027A1 (en), status: Ceased
  - 2018-09-06: JP application JP2020513910A, patent JP7335231B2 (ja), status: Active
  - 2018-09-06: CN application CN201880066852.7A, patent CN111213125B (zh), status: Active
  - 2018-09-06: CN application CN202311376759.5A, publication CN119556989A (zh), status: Pending
- 2023
  - 2023-08-17: JP application JP2023132932A, patent JP7652507B2 (ja), status: Active
  - 2023-09-22: US application US18/472,482, publication US20240012644A1 (en), status: Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP2020533691A5 | | |
| US12205018B2 (en) | | Transposing neural network matrices in hardware |
| US12277499B2 (en) | | Vector computation unit in a neural network processor |
| US20220138577A1 (en) | | Batch Processing In A Neural Network Processor |
| US11816532B2 (en) | | Performing kernel striding in hardware |
| US11574195B2 (en) | | Operation method |
| CN114239797B (zh) | | Method and hardware circuit for performing average pooling in hardware |
| KR102331978B1 (ko) | | Apparatus and method for executing forward operations of an artificial neural network |
| CN109324827B (zh) | | Apparatus, method, and system for processing instructions for accessing data |
| US11573765B2 (en) | | Fused convolution and batch normalization for neural networks |
| CN113868592B (zh) | | Method and system for implementing convolution computation based on G2D |
| CN117492838A (zh) | | Accessing prologue and epilogue data |
| HK40078354A (en) | | Performing kernel striding in hardware |
| HK40043994A (en) | | Transposing neural network matrices in hardware |
| HK1254699B (en) | | Performing kernel striding in hardware |
| HK40043994B (en) | | Transposing neural network matrices in hardware |