CN111213125B - Efficient direct convolution using SIMD instructions - Google Patents
Efficient direct convolution using SIMD instructions
- Publication number
- CN111213125B (application CN201880066852.7A)
- Authority
- CN
- China
- Prior art keywords
- vector
- vectors
- data
- instruction
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Neurology (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Advance Control (AREA)
- Complex Calculations (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311376759.5A CN119556989A (zh) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762556274P | 2017-09-08 | 2017-09-08 | |
| US62/556,274 | 2017-09-08 | ||
| US15/941,975 US11803377B2 (en) | 2017-09-08 | 2018-03-30 | Efficient direct convolution using SIMD instructions |
| US15/941,975 | 2018-03-30 | ||
| PCT/US2018/049666 WO2019051027A1 (en) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311376759.5A Division CN119556989A (zh) | Efficient direct convolution using SIMD instructions | 2017-09-08 | 2018-09-06 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111213125A (zh) | 2020-05-29 |
| CN111213125B (zh) | 2023-11-07 |
Family
ID=65631104
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201880066852.7A Active CN111213125B (zh) | Efficient direct convolution using SIMD instructions | 2017-09-08 | 2018-09-06 |
| CN202311376759.5A Pending CN119556989A (zh) | Efficient direct convolution using SIMD instructions | 2017-09-08 | 2018-09-06 |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311376759.5A Pending CN119556989A (zh) | Efficient direct convolution using SIMD instructions | 2017-09-08 | 2018-09-06 |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11803377B2 (en) |
| EP (1) | EP3676700B1 (en) |
| JP (2) | JP7335231B2 (en) |
| CN (2) | CN111213125B (en) |
| WO (1) | WO2019051027A1 (en) |
Families Citing this family (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10747844B2 (en) * | 2017-12-12 | 2020-08-18 | Tesla, Inc. | Systems and methods for converting a matrix input to a vectorized input for a matrix processor |
| US10565285B2 (en) * | 2017-12-18 | 2020-02-18 | International Business Machines Corporation | Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations |
| US12099912B2 (en) | 2018-06-22 | 2024-09-24 | Samsung Electronics Co., Ltd. | Neural processor |
| CN111813447B (zh) * | 2019-04-12 | 2022-11-08 | 杭州中天微系统有限公司 | Processing method and processing apparatus for a data splicing instruction |
| US11671111B2 (en) | 2019-04-17 | 2023-06-06 | Samsung Electronics Co., Ltd. | Hardware channel-parallel data compression/decompression |
| US11211944B2 (en) | 2019-04-17 | 2021-12-28 | Samsung Electronics Co., Ltd. | Mixed-precision compression with random access |
| US11880760B2 (en) | 2019-05-01 | 2024-01-23 | Samsung Electronics Co., Ltd. | Mixed-precision NPU tile with depth-wise convolution |
| US12182577B2 (en) | 2019-05-01 | 2024-12-31 | Samsung Electronics Co., Ltd. | Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles |
| US20210049474A1 (en) * | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | Neural network method and apparatus |
| US11726950B2 (en) * | 2019-09-28 | 2023-08-15 | Intel Corporation | Compute near memory convolution accelerator |
| US11475283B2 (en) * | 2019-10-24 | 2022-10-18 | Apple Inc. | Multi dimensional convolution in neural network processor |
| US12112141B2 (en) | 2019-12-12 | 2024-10-08 | Samsung Electronics Co., Ltd. | Accelerating 2D convolutional layer mapping on a dot product architecture |
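| CN111178505B (zh) * | 2019-12-23 | 2023-04-07 | 福建星网视易信息系统有限公司 | Acceleration method for convolutional neural networks and computer-readable storage medium |
| CN111797985B (zh) * | 2020-07-22 | 2022-11-22 | 哈尔滨工业大学 | GPU-based memory access optimization method for convolution operations |
| KR102860334B1 (ko) * | 2020-08-14 | 2025-09-16 | 삼성전자주식회사 | Method and apparatus for processing convolution operations based on redundancy reduction |
| CN112633505B (zh) * | 2020-12-24 | 2022-05-27 | 苏州浪潮智能科技有限公司 | RISC-V-based artificial intelligence inference method and system |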
| US12182570B2 (en) | 2021-06-25 | 2024-12-31 | Intel Corporation | Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control |
| US12443412B2 (en) | 2022-01-30 | 2025-10-14 | Simplex Micro, Inc. | Method and apparatus for a scalable microprocessor with time counter |
| CN114443143B (zh) * | 2022-01-30 | 2025-01-07 | 上海阵量智能科技有限公司 | Instruction processing method, apparatus, chip, electronic device, and storage medium |
| US12190116B2 (en) | 2022-04-05 | 2025-01-07 | Simplex Micro, Inc. | Microprocessor with time count based instruction execution and replay |
| US12169716B2 (en) | 2022-04-20 | 2024-12-17 | Simplex Micro, Inc. | Microprocessor with a time counter for statically dispatching extended instructions |
| US12141580B2 (en) | 2022-04-20 | 2024-11-12 | Simplex Micro, Inc. | Microprocessor with non-cacheable memory load prediction |
| US12288065B2 (en) | 2022-04-29 | 2025-04-29 | Simplex Micro, Inc. | Microprocessor with odd and even register sets |
| US12124849B2 (en) * | 2022-07-13 | 2024-10-22 | Simplex Micro, Inc. | Vector processor with extended vector registers |
| US12147812B2 (en) | 2022-07-13 | 2024-11-19 | Simplex Micro, Inc. | Out-of-order execution of loop instructions in a microprocessor |
| US12282772B2 (en) | 2022-07-13 | 2025-04-22 | Simplex Micro, Inc. | Vector processor with vector data buffer |
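| CN117313803B (zh) * | 2023-11-28 | 2024-02-02 | 进迭时空(杭州)科技有限公司 | Sliding-window 2D convolution computation method based on a RISC-V vector processor architecture |
| CN119536744B (zh) * | 2025-01-23 | 2025-06-17 | 山东浪潮科学研究院有限公司 | Automatic code vectorization optimization method, device, and medium |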
Citations (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH03189868A (ja) * | 1989-12-20 | 1991-08-19 | Akira Iwata | Data processor |
| EP0681236A1 (en) * | 1994-05-05 | 1995-11-08 | Rockwell International Corporation | Space vector data path |
| CN1175731A (zh) * | 1996-08-19 | 1998-03-11 | 三星电子株式会社 | Apparatus and method for efficient context saving and restoring in a multitasking computing system environment |
| US5801975A (en) * | 1996-12-02 | 1998-09-01 | Compaq Computer Corporation And Advanced Micro Devices, Inc. | Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles |
| US5909572A (en) * | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
| US5933650A (en) * | 1997-10-09 | 1999-08-03 | Mips Technologies, Inc. | Alignment and ordering of vector elements for single instruction multiple data processing |
| GB0226732D0 (en) * | 2002-11-15 | 2002-12-24 | Imagination Tech Ltd | A configurable processor architecture |
| CN1522401A (zh) * | 2001-10-29 | 2004-08-18 | 英特尔公司 | Method and apparatus for parallel shift right merge of data |
| CN1577257A (zh) * | 2003-06-30 | 2005-02-09 | 英特尔公司 | SIMD integer multiply high with round and shift |
| CN101923534A (zh) * | 2009-06-10 | 2010-12-22 | 新奥特(北京)视频技术有限公司 | Method for convolving video and audio signals with a symmetric convolution kernel using the SSE instruction set |
| CN102495721A (zh) * | 2011-12-02 | 2012-06-13 | 南京大学 | SIMD vector processor supporting FFT acceleration |
| CN104025033A (zh) * | 2011-12-30 | 2014-09-03 | 英特尔公司 | SIMD variable shift and rotate using control manipulation |
| CN104969215A (zh) * | 2013-03-13 | 2015-10-07 | 高通股份有限公司 | Vector processing engine having a programmable data path for providing multi-mode radix-2^X butterfly vector processing circuits, and related vector processors, systems, and methods |
| CN105723333A (zh) * | 2013-11-15 | 2016-06-29 | 高通股份有限公司 | Vector processing engine with merging circuitry between execution units and vector data memory, and related methods |
| EP3093757A2 (en) * | 2015-05-11 | 2016-11-16 | Ceva D.S.P. Ltd. | Multi-dimensional sliding window operation for a vector processor |
| CN106940815A (zh) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
| CN106991473A (zh) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | SIMD-based average pooling parallel processing method for vector processors |
Family Cites Families (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6489868A (en) | 1987-09-30 | 1989-04-05 | Sony Corp | Video signal processing circuit |
| US5734874A (en) * | 1994-04-29 | 1998-03-31 | Sun Microsystems, Inc. | Central processing unit with integrated graphics functions |
| US7085795B2 (en) | 2001-10-29 | 2006-08-01 | Intel Corporation | Apparatus and method for efficient filtering and convolution of content data |
| US6115812A (en) * | 1998-04-01 | 2000-09-05 | Intel Corporation | Method and apparatus for efficient vertical SIMD computations |
| EP2241968B1 (en) * | 1998-08-24 | 2012-06-27 | MicroUnity Systems Engineering, Inc. | System with wide operand architecture, and method |
| US7725521B2 (en) * | 2001-10-29 | 2010-05-25 | Intel Corporation | Method and apparatus for computing matrix transformations |
| US6954841B2 (en) * | 2002-06-26 | 2005-10-11 | International Business Machines Corporation | Viterbi decoding for SIMD vector processors with indirect vector element access |
| US7409415B2 (en) * | 2002-12-20 | 2008-08-05 | Texas Instruments Incorporated | Processor system with efficient shift operations including EXTRACT operation |
| GB2409063B (en) | 2003-12-09 | 2006-07-12 | Advanced Risc Mach Ltd | Vector by scalar operations |
| GB2409065B (en) * | 2003-12-09 | 2006-10-25 | Advanced Risc Mach Ltd | Multiplexing operations in SIMD processing |
| US7328230B2 (en) * | 2004-03-26 | 2008-02-05 | Intel Corporation | SIMD four-data element average instruction |
| US7315937B2 (en) * | 2004-10-01 | 2008-01-01 | Mips Technologies, Inc. | Microprocessor instructions for efficient bit stream extractions |
| US7933405B2 (en) * | 2005-04-08 | 2011-04-26 | Icera Inc. | Data access and permute unit |
| US7623732B1 (en) | 2005-04-26 | 2009-11-24 | Mercury Computer Systems, Inc. | Method and apparatus for digital image filtering with discrete filter kernels using graphics hardware |
| US7529918B2 (en) * | 2006-07-21 | 2009-05-05 | Broadcom Corporation | System and method for efficiently performing bit-field extraction and bit-field combination operations in a processor |
| US20080071851A1 (en) * | 2006-09-20 | 2008-03-20 | Ronen Zohar | Instruction and logic for performing a dot-product operation |
| US8255884B2 (en) | 2008-06-06 | 2012-08-28 | International Business Machines Corporation | Optimized scalar promotion with load and splat SIMD instructions |
| US20100180100A1 (en) | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
| US8732437B2 (en) * | 2010-01-26 | 2014-05-20 | Oracle America, Inc. | Low-overhead misalignment and reformatting support for SIMD |
| US9363068B2 (en) | 2010-08-03 | 2016-06-07 | Intel Corporation | Vector processor having instruction set with sliding window non-linear convolutional function |
| US20120185670A1 (en) * | 2011-01-14 | 2012-07-19 | Toll Bret L | Scalar integer instructions capable of execution with three registers |
| US20120254589A1 (en) * | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | System, apparatus, and method for aligning registers |
| KR102207599B1 (ko) | 2011-10-27 | 2021-01-26 | 인텔 코포레이션 | Block-based crest factor reduction |
| US9946540B2 (en) * | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities |
| US9477999B2 (en) * | 2013-09-20 | 2016-10-25 | The Board Of Trustees Of The Leland Stanford Junior University | Low power programmable image processor |
| US9442731B2 (en) * | 2014-03-13 | 2016-09-13 | Intel Corporation | Packed two source inter-element shift merge processors, methods, systems, and instructions |
| US9582726B2 (en) * | 2015-06-24 | 2017-02-28 | Qualcomm Incorporated | Systems and methods for image processing in a deep convolution network |
| US10459731B2 (en) * | 2015-07-20 | 2019-10-29 | Qualcomm Incorporated | Sliding window operation |
| GB2540939B (en) * | 2015-07-31 | 2019-01-23 | Advanced Risc Mach Ltd | An apparatus and method for performing a splice operation |
| US20170357894A1 (en) * | 2016-06-10 | 2017-12-14 | Apple Inc. | Data packing for convolution of artificial neural networks |
| US10282204B2 (en) * | 2016-07-02 | 2019-05-07 | Intel Corporation | Systems, apparatuses, and methods for strided load |
| US10824938B2 (en) * | 2017-04-24 | 2020-11-03 | Intel Corporation | Specialized fixed function hardware for efficient convolution |
| JP6958027B2 (ja) * | 2017-07-03 | 2021-11-02 | 富士通株式会社 | Arithmetic processing device and control method for arithmetic processing device |
-
2018
- 2018-03-30 US US15/941,975 patent/US11803377B2/en active Active
- 2018-09-06 EP EP18779130.6A patent/EP3676700B1/en active Active
- 2018-09-06 WO PCT/US2018/049666 patent/WO2019051027A1/en not_active Ceased
- 2018-09-06 JP JP2020513910A patent/JP7335231B2/ja active Active
- 2018-09-06 CN CN201880066852.7A patent/CN111213125B/zh active Active
- 2018-09-06 CN CN202311376759.5A patent/CN119556989A/zh active Pending
-
2023
- 2023-08-17 JP JP2023132932A patent/JP7652507B2/ja active Active
- 2023-09-22 US US18/472,482 patent/US20240012644A1/en active Pending
Non-Patent Citations (3)
| Title |
|---|
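| Research on the MCC-SIMD data-parallel convolution computation method; 张发存, 赵晓红, 王忠, 沈绪榜; 计算机工程 (No. 09); 34-36 * |
| Research on the application of SIMD technology in digital image processing (in English); 辛明瑞, 高德远, 佟凤辉; 微电子学与计算机 (No. 11); 164-168 * |
| Research on an image convolution processor architecture based on SIMD technology; 佟凤辉, 樊晓桠, 王党辉, 辛明瑞; 微电子学与计算机 (No. 03); 13-16+20 * |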
Also Published As
| Publication number | Publication date |
|---|---|
| EP3676700B1 (en) | 2022-12-28 |
| CN119556989A (zh) | 2025-03-04 |
| JP7652507B2 (ja) | 2025-03-27 |
| JP7335231B2 (ja) | 2023-08-29 |
| US20190079764A1 (en) | 2019-03-14 |
| WO2019051027A1 (en) | 2019-03-14 |
| US11803377B2 (en) | 2023-10-31 |
| CN111213125A (zh) | 2020-05-29 |
| US20240012644A1 (en) | 2024-01-11 |
| JP2023160833A (ja) | 2023-11-02 |
| JP2020533691A (ja) | 2020-11-19 |
| EP3676700A1 (en) | 2020-07-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111213125B (zh) | | Efficient direct convolution using SIMD instructions |
| CN107408037B (zh) | | Monolithic vector processor configured to operate on variable-length vectors |
| TWI528276B (zh) | | Techniques for executing a multiply-multiply-accumulate instruction |
| US11630997B2 (en) | | Method and apparatus with bit-serial data processing of a neural network |
| CN112069459A (zh) | | Accelerator for sparse-dense matrix multiplication |
| CN104603746B (zh) | | Vector move instruction controlled by read and write masks |
| CN114341802B (zh) | | Methods for performing processing-in-memory operations, and related memory devices and systems |
| CN107533460B (zh) | | Packed finite impulse response (FIR) filter processors, methods, systems, and instructions |
| JP7385009B2 (ja) | | Compression assist instructions |
| US20200265106A1 (en) | | Two-dimensional multi-layer convolution for deep learning |
| US9436465B2 (en) | | Moving average processing in processor and processor |
| KR20230109791A (ko) | | Packed data alignment plus compute instructions, processors, methods, and systems |
| US20240111530A1 (en) | | Matrix multiplication unit with flexible precision operations |
| CN110235099A (zh) | | Apparatus and method for processing input operand values |
| CN114090954A (zh) | | Integer matrix multiplication kernel optimization method based on FT-2000+ |
| CN112434255A (zh) | | Vector-matrix operation and data processing method, multiplier, and processor chip |
| WO2024251385A1 (en) | | Indexed vector permutation, vector comparison, and/or population count operations |
| GB2523805A (en) | | Data processing apparatus and method for performing vector scan operation |
| US12493577B2 (en) | | Digital signal processor (DSP) and electronic device using the same |
| US20250258648A1 (en) | | Apparatus and method with in-register computing |
| WO2024250758A1 (zh) | | Complex-number data processing method and related device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |