CN114329329B - Sparse matrix multiplication in hardware - Google Patents
Sparse matrix multiplication in hardware
Info
- Publication number
- CN114329329B (application CN202111665133.7A)
- Authority
- CN
- China
- Prior art keywords
- sparse
- input
- vector
- matrix
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8046—Systolic arrays
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/487—Multiplying; Dividing
- G06F7/4876—Multiplying
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/501—Half or full adders, i.e. basic adder cells for one denomination
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/78—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data for changing the order of data flow, e.g. matrix transposition or LIFO buffers; Overflow or underflow handling therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Algebra (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Nonlinear Science (AREA)
- Computer Hardware Design (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Complex Calculations (AREA)
- Neurology (AREA)
- Advance Control (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/329,259 US12189710B2 (en) | 2021-05-25 | 2021-05-25 | Sparse matrix multiplication in hardware |
| US17/329,259 | 2021-05-25 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114329329A (zh) | 2022-04-12 |
| CN114329329B (zh) | 2026-02-17 |
Family
ID=80222142
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111665133.7A (Active) CN114329329B (zh) | 2021-05-25 | 2021-12-30 | Sparse matrix multiplication in hardware |
Country Status (5)
| Country | Link |
|---|---|
| US (2) | US12189710B2 |
| EP (1) | EP4095719A1 |
| JP (2) | JP7401513B2 |
| KR (2) | KR102601034B1 |
| CN (1) | CN114329329B |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220012304A1 (en) * | 2020-07-07 | 2022-01-13 | Sudarshan Kumar | Fast matrix multiplication |
| US11940907B2 (en) * | 2021-06-25 | 2024-03-26 | Intel Corporation | Methods and apparatus for sparse tensor storage for neural network accelerators |
| US20220012012A1 (en) * | 2021-09-24 | 2022-01-13 | Martin Langhammer | Systems and Methods for Sparsity Operations in a Specialized Processing Block |
| US20230267169A1 (en) * | 2022-02-24 | 2023-08-24 | Xilinx, Inc. | Sparse matrix dense vector multliplication circuitry |
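| CN115470450B (zh) * | 2022-08-30 | 2025-09-05 | Wuxi Jiangnan Institute of Computing Technology (无锡江南计算技术研究所) | Matrix multiplication device and low-overhead anomaly locating method thereof |
| WO2024108584A1 (zh) * | 2022-11-25 | 2024-05-30 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Sparse operator processing method and apparatus |
| KR20240081961A (ko) * | 2022-12-01 | 2024-06-10 | Samsung Electronics Co., Ltd. (삼성전자주식회사) | Electronic device for converting the compressed storage format of a sparse matrix and operating method thereof |
| KR102745798B1 (ko) * | 2023-12-26 | 2024-12-23 | Rebellions Inc. (리벨리온 주식회사) | Data operation method and data operation apparatus supporting the same |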
| US20260111173A1 (en) * | 2024-10-17 | 2026-04-23 | Edgecortix Inc. | On-chip non-zero value unpacking and distribution |
| CN119602947B (zh) * | 2024-11-14 | 2025-10-03 | Xi'an Jiaotong University (西安交通大学) | Binary polynomial multiplier for the BIKE cryptographic algorithm and encryption method |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9697176B2 (en) | 2014-11-14 | 2017-07-04 | Advanced Micro Devices, Inc. | Efficient sparse matrix-vector multiplication on parallel processors |
| US9760538B2 (en) | 2014-12-22 | 2017-09-12 | Palo Alto Research Center Incorporated | Computer-implemented system and method for efficient sparse matrix representation and processing |
| US10528321B2 (en) | 2016-12-07 | 2020-01-07 | Microsoft Technology Licensing, Llc | Block floating point for neural network implementations |
| US10489063B2 (en) | 2016-12-19 | 2019-11-26 | Intel Corporation | Memory-to-memory instructions to accelerate sparse-matrix by dense-vector and sparse-vector by dense-vector multiplication |
| US11216722B2 (en) | 2016-12-31 | 2022-01-04 | Intel Corporation | Hardware accelerator template and design framework for implementing recurrent neural networks |
| US10180928B2 (en) | 2016-12-31 | 2019-01-15 | Intel Corporation | Heterogeneous hardware accelerator architecture for processing sparse matrix data with skewed non-zero distributions |
| WO2018134740A2 (en) * | 2017-01-22 | 2018-07-26 | Gsi Technology Inc. | Sparse matrix multiplication in associative memory device |
| US10572568B2 (en) * | 2018-03-28 | 2020-02-25 | Intel Corporation | Accelerator for sparse-dense matrix multiplication |
| US10726096B2 (en) | 2018-10-12 | 2020-07-28 | Hewlett Packard Enterprise Development Lp | Sparse matrix vector multiplication with a matrix vector multiplication unit |
| KR102838677B1 (ko) * | 2019-03-15 | 2025-07-25 | Intel Corporation (인텔 코포레이션) | Sparse optimizations for a matrix accelerator architecture |
| US11188618B2 (en) * | 2019-09-05 | 2021-11-30 | Intel Corporation | Sparse matrix multiplication acceleration mechanism |
| US11663746B2 (en) * | 2019-11-15 | 2023-05-30 | Intel Corporation | Systolic arithmetic on sparse data |
| US12141229B2 (en) * | 2021-05-19 | 2024-11-12 | Nvidia Corporation | Techniques for accelerating matrix multiplication computations using hierarchical representations of sparse matrices |
2021
- 2021-05-25: US application US17/329,259, published as US12189710B2 (Active)
- 2021-12-21: JP application JP2021207147A, published as JP7401513B2 (Active)
- 2021-12-30: CN application CN202111665133.7A, published as CN114329329B (Active)

2022
- 2022-02-07: EP application EP22155308.4A, published as EP4095719A1 (Pending)
- 2022-02-09: KR application KR1020220016772A, published as KR102601034B1 (Active)

2023
- 2023-11-06: KR application KR1020230151637A, published as KR20230155417A (Pending)
- 2023-12-07: JP application JP2023206881A, published as JP7793585B2 (Active)

2024
- 2024-11-12: US application US18/944,274, published as US20250068694A1 (Pending)
Non-Patent Citations (1)
| Title |
|---|
| HyGCN: A GCN Accelerator with Hybrid Architecture; Yan Mingyu et al.; 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA); 2020-04-13; pp. 15-29 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20220159257A (ko) | 2022-12-02 |
| US20220382829A1 (en) | 2022-12-01 |
| KR102601034B1 (ko) | 2023-11-09 |
| US12189710B2 (en) | 2025-01-07 |
| JP7793585B2 (ja) | 2026-01-05 |
| EP4095719A1 (en) | 2022-11-30 |
| US20250068694A1 (en) | 2025-02-27 |
| JP2024028901A (ja) | 2024-03-05 |
| KR20230155417A (ko) | 2023-11-10 |
| CN114329329A (zh) | 2022-04-12 |
| JP7401513B2 (ja) | 2023-12-19 |
| JP2022181161A (ja) | 2022-12-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN114329329B (zh) | Sparse matrix multiplication in hardware | |
| Lu et al. | SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs | |
| KR102511911B1 (ko) | GEMM dataflow accelerator semiconductor circuit | |
| US10726096B2 (en) | Sparse matrix vector multiplication with a matrix vector multiplication unit | |
| CN109657782B (zh) | Operation method, device, and related product | |
| EP3179415B1 (en) | Systems and methods for a multi-core optimized recurrent neural network | |
| US20250173398A1 (en) | Matrix computing method and apparatus | |
| CN108170639A (zh) | Tensor CP decomposition implementation method based on a distributed environment | |
| KR20190107766A (ko) | Computing device and method | |
| Hunter et al. | Two sparsities are better than one: unlocking the performance benefits of sparse–sparse networks | |
| Conte et al. | GPU-acceleration of waveform relaxation methods for large differential systems | |
| US20240086719A1 (en) | Sparse encoding and decoding at mixture-of-experts layer | |
| KR102869716B1 (ko) | Power reduction for machine learning acceleration | |
| CN112446007A (zh) | Matrix operation method, operation device, and processor | |
| Zhuzhunashvili et al. | Preconditioned spectral clustering for stochastic block partition streaming graph challenge (preliminary version at arxiv.) | |
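| CN118069315A (zh) | Method and apparatus for solving sparse triangular matrices based on a distributed platform | |
| JP2010122850A (ja) | Matrix equation calculation device and matrix equation calculation method | |
| HK40064573A (zh) | Sparse matrix multiplication in hardware | |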
| US20240160906A1 (en) | Collective communication phases at mixture-of-experts layer | |
| Wang et al. | A parallel sparse approximate inverse preconditioning algorithm based on MPI and CUDA | |
| CN115576895B (zh) | 计算装置、计算方法及计算机可读存储介质 | |
| Venieris et al. | Towards heterogeneous solvers for large-scale linear systems | |
| Al Na'mneh et al. | An efficient bit reversal permutation algorithm | |
| Sergiyenko et al. | Genetic Programming of Discrete Cosine Transform Processors | |
| Tomii et al. | A Hardware Solver for Simultaneous Linear Equations with Multistage Interconnection Network | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40064573; Country of ref document: HK |
| | GR01 | Patent grant | |