JP7335231B2 - Efficient direct convolution using SIMD instructions - Google Patents
Efficient direct convolution using SIMD instructions
- Publication number
- JP7335231B2 (application JP2020513910A)
- Authority
- JP
- Japan
- Prior art keywords
- vector
- lanes
- vectors
- data
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Neurology (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Advance Control (AREA)
- Complex Calculations (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762556274P | 2017-09-08 | 2017-09-08 | |
| US62/556,274 | 2017-09-08 | ||
| US15/941,975 | 2018-03-30 | ||
| US15/941,975 US11803377B2 (en) | 2017-09-08 | 2018-03-30 | Efficient direct convolution using SIMD instructions |
| PCT/US2018/049666 WO2019051027A1 (en) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A Division JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2020533691A (ja) | 2020-11-19 |
| JP2020533691A5 | 2021-10-14 |
| JP7335231B2 (ja) | 2023-08-29 |
Family
ID=65631104
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020513910A Active JP7335231B2 (ja) | 2017-09-08 | 2018-09-06 | Efficient direct convolution using SIMD instructions |
| JP2023132932A Active JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023132932A Active JP7652507B2 (ja) | 2017-09-08 | 2023-08-17 | Efficient direct convolution using SIMD instructions |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US11803377B2 |
| EP (1) | EP3676700B1 |
| JP (2) | JP7335231B2 |
| CN (2) | CN119556989A |
| WO (1) | WO2019051027A1 |
Families Citing this family (33)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10747844B2 (en) * | 2017-12-12 | 2020-08-18 | Tesla, Inc. | Systems and methods for converting a matrix input to a vectorized input for a matrix processor |
| US10565285B2 (en) * | 2017-12-18 | 2020-02-18 | International Business Machines Corporation | Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations |
| US12099912B2 (en) | 2018-06-22 | 2024-09-24 | Samsung Electronics Co., Ltd. | Neural processor |
| CN111813447B (zh) * | 2019-04-12 | 2022-11-08 | 杭州中天微系统有限公司 | Processing method and processing apparatus for a data splicing instruction |
| US11671111B2 (en) | 2019-04-17 | 2023-06-06 | Samsung Electronics Co., Ltd. | Hardware channel-parallel data compression/decompression |
| US11211944B2 (en) | 2019-04-17 | 2021-12-28 | Samsung Electronics Co., Ltd. | Mixed-precision compression with random access |
| US12182577B2 (en) | 2019-05-01 | 2024-12-31 | Samsung Electronics Co., Ltd. | Neural-processing unit tile for shuffling queued nibbles for multiplication with non-zero weight nibbles |
| US11880760B2 (en) | 2019-05-01 | 2024-01-23 | Samsung Electronics Co., Ltd. | Mixed-precision NPU tile with depth-wise convolution |
| US20210049474A1 (en) * | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | Neural network method and apparatus |
| US11726950B2 (en) * | 2019-09-28 | 2023-08-15 | Intel Corporation | Compute near memory convolution accelerator |
| US11475283B2 (en) * | 2019-10-24 | 2022-10-18 | Apple Inc. | Multi dimensional convolution in neural network processor |
| US12112141B2 (en) | 2019-12-12 | 2024-10-08 | Samsung Electronics Co., Ltd. | Accelerating 2D convolutional layer mapping on a dot product architecture |
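| CN111178505B (zh) * | 2019-12-23 | 2023-04-07 | 福建星网视易信息系统有限公司 | Acceleration method for convolutional neural networks and computer-readable storage medium |
| CN111797985B (zh) * | 2020-07-22 | 2022-11-22 | 哈尔滨工业大学 | GPU-based memory access optimization method for convolution operations |
| KR102860334B1 (ko) * | 2020-08-14 | 2025-09-16 | 삼성전자주식회사 | Method and apparatus for processing convolution operations based on redundancy reduction |
| CN112633505B (zh) * | 2020-12-24 | 2022-05-27 | 苏州浪潮智能科技有限公司 | RISC-V-based artificial intelligence inference method and system |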
| US12182570B2 (en) | 2021-06-25 | 2024-12-31 | Intel Corporation | Apparatuses, methods, and systems for a packed data convolution instruction with shift control and width control |
| CN114443143B (zh) * | 2022-01-30 | 2025-01-07 | 上海阵量智能科技有限公司 | Instruction processing method, apparatus, chip, electronic device, and storage medium |
| US12443412B2 (en) | 2022-01-30 | 2025-10-14 | Simplex Micro, Inc. | Method and apparatus for a scalable microprocessor with time counter |
| US12190116B2 (en) | 2022-04-05 | 2025-01-07 | Simplex Micro, Inc. | Microprocessor with time count based instruction execution and replay |
| US12141580B2 (en) | 2022-04-20 | 2024-11-12 | Simplex Micro, Inc. | Microprocessor with non-cacheable memory load prediction |
| US12169716B2 (en) | 2022-04-20 | 2024-12-17 | Simplex Micro, Inc. | Microprocessor with a time counter for statically dispatching extended instructions |
| US12288065B2 (en) | 2022-04-29 | 2025-04-29 | Simplex Micro, Inc. | Microprocessor with odd and even register sets |
| CN115167920A (zh) * | 2022-07-06 | 2022-10-11 | 龙芯中科(合肥)技术有限公司 | Data processing method, apparatus, electronic device, and medium |
| US12147812B2 (en) | 2022-07-13 | 2024-11-19 | Simplex Micro, Inc. | Out-of-order execution of loop instructions in a microprocessor |
| US12124849B2 (en) * | 2022-07-13 | 2024-10-22 | Simplex Micro, Inc. | Vector processor with extended vector registers |
| US12282772B2 (en) | 2022-07-13 | 2025-04-22 | Simplex Micro, Inc. | Vector processor with vector data buffer |
| US12541369B2 (en) | 2022-07-13 | 2026-02-03 | Simplex Micro, Inc. | Executing phantom loops in a microprocessor |
| US12566610B2 (en) | 2023-03-14 | 2026-03-03 | Simplex Micro, Inc. | Microprocessor with apparatus and method for replaying load instructions |
| US12566609B2 (en) | 2023-03-14 | 2026-03-03 | Simplex Micro, Inc. | Microprocessor with apparatus and method for handling of instructions with long throughput |
| US12566613B2 (en) | 2023-11-13 | 2026-03-03 | Simplex Micro, Inc. | Microprocessor with speculative and in-order register sets |
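| CN117313803B (zh) * | 2023-11-28 | 2024-02-02 | 进迭时空(杭州)科技有限公司 | Sliding-window 2D convolution computation method based on the RISC-V vector processor architecture |
| CN119536744B (zh) * | 2025-01-23 | 2025-06-17 | 山东浪潮科学研究院有限公司 | Automatic code vectorization optimization method, device, and medium |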
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150086134A1 (en) | 2013-09-20 | 2015-03-26 | The Board Of Trustees Of The Leland Stanford Junior University | Low power programmable image processor |
| US20150261534A1 (en) | 2014-03-13 | 2015-09-17 | Intel Corporation | Packed two source inter-element shift merge processors, methods, systems, and instructions |
| US20170024218A1 (en) | 2015-07-20 | 2017-01-26 | Qualcomm Incorporated | Sliding window operation |
Family Cites Families (47)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6489868A (en) | 1987-09-30 | 1989-04-05 | Sony Corp | Video signal processing circuit |
| JPH03189868A (ja) * | 1989-12-20 | 1991-08-19 | Akira Iwata | Data processing processor |
| US5734874A (en) * | 1994-04-29 | 1998-03-31 | Sun Microsystems, Inc. | Central processing unit with integrated graphics functions |
| EP0681236B1 (en) * | 1994-05-05 | 2000-11-22 | Conexant Systems, Inc. | Space vector data path |
| US7085795B2 (en) | 2001-10-29 | 2006-08-01 | Intel Corporation | Apparatus and method for efficient filtering and convolution of content data |
| US6061711A (en) * | 1996-08-19 | 2000-05-09 | Samsung Electronics, Inc. | Efficient context saving and restoring in a multi-tasking computing system environment |
| US5909572A (en) * | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
| US5801975A (en) * | 1996-12-02 | 1998-09-01 | Compaq Computer Corporation And Advanced Micro Devices, Inc. | Computer modified to perform inverse discrete cosine transform operations on a one-dimensional matrix of numbers within a minimal number of instruction cycles |
| US5933650A (en) * | 1997-10-09 | 1999-08-03 | Mips Technologies, Inc. | Alignment and ordering of vector elements for single instruction multiple data processing |
| US6115812A (en) * | 1998-04-01 | 2000-09-05 | Intel Corporation | Method and apparatus for efficient vertical SIMD computations |
| ATE557342T1 (de) * | 1998-08-24 | 2012-05-15 | Microunity Systems Eng | Processor and method for matrix multiplication with a wide operand |
| US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction |
| US7725521B2 (en) * | 2001-10-29 | 2010-05-25 | Intel Corporation | Method and apparatus for computing matrix transformations |
| US6954841B2 (en) * | 2002-06-26 | 2005-10-11 | International Business Machines Corporation | Viterbi decoding for SIMD vector processors with indirect vector element access |
| GB2395306B (en) * | 2002-11-15 | 2006-02-15 | Imagination Tech Ltd | A configurable processor architecture |
| US7409415B2 (en) * | 2002-12-20 | 2008-08-05 | Texas Instruments Incorporated | Processor system with efficient shift operations including EXTRACT operation |
| US7689641B2 (en) | 2003-06-30 | 2010-03-30 | Intel Corporation | SIMD integer multiply high with round and shift |
| GB2409063B (en) | 2003-12-09 | 2006-07-12 | Advanced Risc Mach Ltd | Vector by scalar operations |
| GB2409065B (en) * | 2003-12-09 | 2006-10-25 | Advanced Risc Mach Ltd | Multiplexing operations in SIMD processing |
| US7328230B2 (en) * | 2004-03-26 | 2008-02-05 | Intel Corporation | SIMD four-data element average instruction |
| US7315937B2 (en) * | 2004-10-01 | 2008-01-01 | Mips Technologies, Inc. | Microprocessor instructions for efficient bit stream extractions |
| US7933405B2 (en) * | 2005-04-08 | 2011-04-26 | Icera Inc. | Data access and permute unit |
| US7623732B1 (en) | 2005-04-26 | 2009-11-24 | Mercury Computer Systems, Inc. | Method and apparatus for digital image filtering with discrete filter kernels using graphics hardware |
| US7529918B2 (en) * | 2006-07-21 | 2009-05-05 | Broadcom Corporation | System and method for efficiently performing bit-field extraction and bit-field combination operations in a processor |
| US20080071851A1 (en) * | 2006-09-20 | 2008-03-20 | Ronen Zohar | Instruction and logic for performing a dot-product operation |
| US8255884B2 (en) | 2008-06-06 | 2012-08-28 | International Business Machines Corporation | Optimized scalar promotion with load and splat SIMD instructions |
| US20100180100A1 (en) | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
| CN101923534B (zh) | 2009-06-10 | 2012-02-01 | 新奥特(北京)视频技术有限公司 | Method for convolving video and audio signals with a symmetric convolution kernel using the SSE instruction set |
| US8732437B2 (en) * | 2010-01-26 | 2014-05-20 | Oracle America, Inc. | Low-overhead misalignment and reformatting support for SIMD |
| US20120185670A1 (en) * | 2011-01-14 | 2012-07-19 | Toll Bret L | Scalar integer instructions capable of execution with three registers |
| US20120254589A1 (en) * | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | System, apparatus, and method for aligning registers |
| KR102063140B1 (ko) | 2011-10-27 | 2020-02-11 | 인텔 코포레이션 | Block-based crest factor reduction |
| CN102495721A (zh) * | 2011-12-02 | 2012-06-13 | 南京大学 | SIMD vector processor supporting FFT acceleration |
| US9946540B2 (en) * | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities |
| CN104025033B (zh) * | 2011-12-30 | 2017-11-21 | 英特尔公司 | SIMD variable shift and rotate using control manipulation |
| US9275014B2 (en) * | 2013-03-13 | 2016-03-01 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods |
| US9813223B2 (en) | 2013-04-17 | 2017-11-07 | Intel Corporation | Non-linear modeling of a physical system using direct optimization of look-up table values |
| US9684509B2 (en) * | 2013-11-15 | 2017-06-20 | Qualcomm Incorporated | Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory, and related vector processing instructions, systems, and methods |
| US10402196B2 (en) * | 2015-05-11 | 2019-09-03 | Ceva D.S.P. Ltd. | Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients |
| US9582726B2 (en) * | 2015-06-24 | 2017-02-28 | Qualcomm Incorporated | Systems and methods for image processing in a deep convolution network |
| GB2540939B (en) * | 2015-07-31 | 2019-01-23 | Advanced Risc Mach Ltd | An apparatus and method for performing a splice operation |
| US20170357894A1 (en) * | 2016-06-10 | 2017-12-14 | Apple Inc. | Data packing for convolution of artificial neural networks |
| US10282204B2 (en) * | 2016-07-02 | 2019-05-07 | Intel Corporation | Systems, apparatuses, and methods for strided load |
| CN106940815B (zh) * | 2017-02-13 | 2020-07-28 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
| CN106991473A (zh) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | SIMD-based parallel processing method for average pooling on vector processors |
| US10824938B2 (en) * | 2017-04-24 | 2020-11-03 | Intel Corporation | Specialized fixed function hardware for efficient convolution |
| JP6958027B2 (ja) * | 2017-07-03 | 2021-11-02 | 富士通株式会社 | Arithmetic processing device and control method for arithmetic processing device |
-
2018
- 2018-03-30 US US15/941,975 patent/US11803377B2/en active Active
- 2018-09-06 JP JP2020513910A patent/JP7335231B2/ja active Active
- 2018-09-06 CN CN202311376759.5A patent/CN119556989A/zh active Pending
- 2018-09-06 CN CN201880066852.7A patent/CN111213125B/zh active Active
- 2018-09-06 WO PCT/US2018/049666 patent/WO2019051027A1/en not_active Ceased
- 2018-09-06 EP EP18779130.6A patent/EP3676700B1/en active Active
-
2023
- 2023-08-17 JP JP2023132932A patent/JP7652507B2/ja active Active
- 2023-09-22 US US18/472,482 patent/US20240012644A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20190079764A1 (en) | 2019-03-14 |
| US20240012644A1 (en) | 2024-01-11 |
| JP2023160833A (ja) | 2023-11-02 |
| CN111213125A (zh) | 2020-05-29 |
| EP3676700B1 (en) | 2022-12-28 |
| CN111213125B (zh) | 2023-11-07 |
| WO2019051027A1 (en) | 2019-03-14 |
| EP3676700A1 (en) | 2020-07-08 |
| JP7652507B2 (ja) | 2025-03-27 |
| US11803377B2 (en) | 2023-10-31 |
| CN119556989A (zh) | 2025-03-04 |
| JP2020533691A (ja) | 2020-11-19 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP7335231B2 (ja) | Efficient direct convolution using SIMD instructions | |
| JP7728831B2 (ja) | Accelerated mathematical engine | |
| US11175920B2 (en) | Efficient work execution in a parallel computing system | |
| US10140251B2 (en) | Processor and method for executing matrix multiplication operation on processor | |
| CN111381880A (zh) | Load-store instruction | |
| US20200356837A1 (en) | Fast deep learning fully-connected inference | |
| TWI603262B (zh) | Packed finite impulse response (FIR) filter processors, methods, systems, and instructions | |
| US11341210B2 (en) | Two-dimensional multi-layer convolution for deep learning | |
| US9436465B2 (en) | Moving average processing in processor and processor | |
| KR20220051006A (ko) | Method for performing processing-in-memory (PIM) operations, and related memory devices and systems | |
| US20240111530A1 (en) | Matrix multiplication unit with flexible precision operations | |
| US20200356836A1 (en) | Fast deep learning fully-connected column-major implementation | |
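| CN112434255A (zh) | Vector-matrix operation and data processing method, multiplier, and processor chip | |
| CN114090954A (zh) | FT-2000+-based integer matrix multiplication kernel optimization method | |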
| GB2523805A (en) | Data processing apparatus and method for performing vector scan operation | |
| EP4423601A1 (en) | Performing a floating-point multiply-add operation in a computer implemented environment | |
| US20250258648A1 (en) | Apparatus and method with in-register computing | |
| US12493577B2 (en) | Digital signal processor (DSP) and electronic device using the same | |
| Moradifar et al. | Performance improvement of multimedia Kernels using data-and thread-level parallelism on CPU platform | |
| Moradifar et al. | Performance Improvement of Multimedia | |
| Zhang et al. | Vectorizable Design and Implementation of Matrix Multiplication on Vector Processor | |
| Sreedhar et al. | Matrix-matrix multiplication on a large register file architecture with indirection | |
Legal Events
| Code | Title | Description |
|---|---|---|
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20210906 |
| A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20210906 |
| A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20220930 |
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20221025 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20230124 |
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20230404 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20230704 |
| TRDD | Decision of grant or rejection written | |
| A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20230718 |
| A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20230817 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 7335231; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |