CN112712173A - Method and system for acquiring sparse operation data based on a MAC (multiply-accumulate) multiply-add array - Google Patents

Method and system for acquiring sparse operation data based on a MAC (multiply-accumulate) multiply-add array

Info

Publication number
CN112712173A
Authority
CN
China
Prior art keywords: unit, array, column, matrix, columns
Prior art date
Legal status
Granted
Application number
CN202011640074.3A
Other languages
Chinese (zh)
Other versions
CN112712173B (en)
Inventor
吴小鹏
唐士斌
欧阳鹏
Current Assignee
Beijing Qingwei Intelligent Technology Co ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co ltd
Priority to CN202011640074.3A
Publication of CN112712173A
Application granted
Publication of CN112712173B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a method for acquiring sparse operation data based on a MAC multiply-add array, which comprises the following steps: taking O rows and I columns as one dividing unit in the row and column directions of the sparse weight matrix to be calculated; reading one or more cell blocks along the column direction of the sparse weight matrix to be calculated; generating a plurality of working modes; and integrating the cells into a calculation array with M rows and N columns. When matrix multiplication is realized through the MAC multiply-add array, the converted calculation array is used as the multiplier term, and the effective weight value cells in the row direction of the calculation array can be calculated as characteristic weight values. By dividing the sparse weight matrix to be calculated, the divided units fit the operation structure of the MAC multiply-add array, so that a plurality of non-zero weights are processed in parallel with a small amount of resources, replacing the traditional sparsification flow. The invention also provides a system for acquiring sparse operation data based on the MAC multiply-add array.

Description

Method and system for acquiring sparse operation data based on a MAC (multiply-accumulate) multiply-add array
Technical Field
The invention relates to the field of reconfigurable processors, in particular to neural network hardware acceleration, convolution, fully connected layers, regular sparsification and matrix multiplication. The invention particularly relates to a method and a system for acquiring sparse operation data based on a MAC multiply-add array.
Background
A neural network accelerator accelerates the operation of a neural network algorithm through hardware such as a chip and plays an important role in neural network computation. Traditional neural network operations are generally performed in software on a GPU or CPU, with low speed and low energy efficiency; meanwhile, a customized ASIC or FPGA serves only a single application scenario and can perform only convolution or fully connected operations. Neural network workloads are operation-intensive, and data bandwidth and on-chip weight storage can become the bottleneck of existing neural network accelerators. Sparsification compresses the weight data, relieving the data-bandwidth and on-chip weight-storage problem and improving operation efficiency (or reducing the actual amount of computation).
At present, general (unstructured) sparsification is very unfriendly to hardware, so the hardware structure is optimized around regular sparsification to support sparsity better. Matrix multiplication is a common operation in many algorithms, but its computation pattern differs from that of neural network (NN) operators, so it is usually served only by a customized ASIC; in a special mode, however, the MAC array can support both neural network computation and matrix multiplication.
Disclosure of Invention
The invention aims to provide a method for acquiring sparse operation data based on a MAC multiply-add array, which divides the sparse weight matrix to be calculated so that the divided units fit the operation structure of the MAC multiply-add array, thereby processing a plurality of non-zero weights in parallel with a small amount of resources, in contrast to the traditional sparsification flow.
The invention also aims to provide a system for acquiring sparse operation data based on the MAC multiply-add array, which can enable the divided units to meet the operation structure of the MAC multiply-add array by dividing the sparse weight matrix to be calculated, thereby accelerating the operation speed of the system, reducing the implementation cost and reducing the complexity of hardware implementation.
In a first aspect of the present invention, a method for acquiring sparse operation data based on a MAC multiply-add array is provided, where the MAC multiply-add array is an O-row, I-column matrix. The MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, each input channel corresponding to one computing unit, and O output channels are arranged along the row direction of the MAC multiply-add array, each output channel corresponding to one computing unit.
The method for acquiring the sparse operation data based on the MAC multiply-add array comprises the following steps:
step S101, taking the row and column directions of the sparse weight matrix to be calculated as a dividing unit by taking O rows and I columns, and dividing the sparse weight matrix into a plurality of cell blocks along the column direction according to the division of the sparse weight matrix. Each cell block includes a plurality of cells having a significant weight value.
Step S102, reading one or more cell blocks along the column direction of the sparse weight matrix to be calculated. If the number of cells with effective weight values in the one or more cell blocks is equal to I × O/2 and the number of cells with effective weight values in each column of the one or more cell blocks is not more than I, a plurality of working modes corresponding to the one or more cell blocks are generated.
Step S103, reading one or more cell blocks along the column direction of the sparse weight matrix according to the plurality of working modes. The cells with effective weight values in the one or more cell blocks are integrated into a calculation array with M rows and N columns. The effective weight value cells in the calculation array are arranged sequentially along the column direction. The M rows correspond to the O rows. The N columns correspond to the I columns.
And step S104, when matrix multiplication is realized through the MAC multiply-add array, the converted calculation array is used as the multiplier term, and the effective weight value cells in the row direction of the calculation array can be calculated as characteristic weight values.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, step S102 includes:
1 unit block is read as a first division unit in the column direction of the thinning-out weight matrix to be calculated. If the number of cells of the first partition unit with significant weight values is equal to I × O/2 and the number of cells of the first partition unit with significant weight values is below I in each column of the first partition unit, a first operation mode is generated. Or
And reading 2 unit blocks in the column direction of the sparse weight matrix to be calculated as a second division unit. And if the number of the units of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the units of the effective weight values in each column of the second dividing unit is below I, generating a second working mode. Or
And reading 4 unit blocks in the column direction of the sparse weight matrix to be calculated as a third division unit. And if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode. Or
And reading 8 unit blocks in the column direction of the sparse weight matrix to be calculated as a fourth division unit. And if the number of the cells of the effective weight value in the fourth dividing cell is equal to I multiplied by O/2 and the number of the cells of the effective weight value in each column of the fourth dividing cell is below I, generating a fourth working mode.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the step of integrating the units of effective weight values in one or more unit blocks into a calculation array with M rows and N columns in step S103 includes:
One or more cell blocks are read. The cells in each column are read sequentially in row order; whenever the currently read cell carries an effective weight value, it is placed immediately after the previously read effective weight value cell, so that the effective weight value cells are arranged consecutively along the column direction in row order.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, step S103 includes:
and reading the first division unit along the column direction of the sparse weight matrix according to a first working mode. And integrating the effective weight value units of the first division unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the second division unit along the column direction of the thinning weight matrix according to a second working mode. And integrating the effective weight value units of the second dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the third division unit along the column direction of the sparse weight matrix according to a third working mode. And integrating the effective weight value units of the third dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode. And integrating the effective weight value units of the fourth dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, after step S104, the method further includes:
and step S105, taking the calculation array as a characteristic input value to realize convolution or full-connection layer calculation in the neural network model of deep learning.
The MAC multiply-add array is an 8 row 8 column matrix, a 16 row 16 column matrix, a 32 row 32 column matrix, a 64 row 64 column matrix, a 16 row 32 column matrix, or a 32 row 64 column matrix. The MAC multiply-add array includes 8 × 8, 16 × 16, 32 × 32, 64 × 64, or 16 × 32, 32 × 64 computing units. The M rows and N columns of computational arrays are 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns of computational arrays.
In a second aspect, the present invention provides a system for acquiring sparse operation data based on a MAC multiply-add array, which is an O-row, I-column matrix. The MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, each input channel corresponding to one computing unit, and O output channels are arranged along the row direction of the MAC multiply-add array, each output channel corresponding to one computing unit.
The system for acquiring the sparse operation data based on the MAC multiply-add array comprises: a dividing unit, a generating work mode unit, an integrating unit and a calculating unit, wherein:
and the dividing unit is configured to divide the row and column directions of the sparse weight matrix to be calculated into a plurality of cell blocks along the column direction of the sparse weight matrix by taking O rows and I columns as one dividing unit. Each cell block includes a plurality of cells having a significant weight value.
A generating working mode unit, configured to read one or more cell blocks along the column direction of the sparse weight matrix to be calculated. If the number of cells with effective weight values in the one or more cell blocks is equal to I × O/2 and the number of cells with effective weight values in each column of the one or more cell blocks is not more than I, a plurality of working modes corresponding to the one or more cell blocks are generated.
An integration unit, configured to read one or more cell blocks along the column direction of the sparse weight matrix according to the plurality of working modes. The cells with effective weight values in the one or more cell blocks are integrated into a calculation array with M rows and N columns. The effective weight value cells in the calculation array are arranged sequentially along the column direction. The M rows correspond to the O rows. The N columns correspond to the I columns.
And a calculating unit, configured to use the converted calculation array as the multiplier term when matrix multiplication is realized through the MAC multiply-add array; the effective weight value cells in the row direction of the calculation array can be calculated as characteristic weight values.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the generating working mode unit is further configured to:
1 unit block is read as a first division unit in the column direction of the thinning-out weight matrix to be calculated. If the number of cells of the first partition unit with significant weight values is equal to I × O/2 and the number of cells of the first partition unit with significant weight values is below I in each column of the first partition unit, a first operation mode is generated. Or
And reading 2 unit blocks in the column direction of the sparse weight matrix to be calculated as a second division unit. And if the number of the units of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the units of the effective weight values in each column of the second dividing unit is below I, generating a second working mode. Or
And reading 4 unit blocks in the column direction of the sparse weight matrix to be calculated as a third division unit. And if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode. Or
And reading 8 unit blocks in the column direction of the sparse weight matrix to be calculated as a fourth division unit. And if the number of the cells of the effective weight value in the fourth dividing cell is equal to I multiplied by O/2 and the number of the cells of the effective weight value in each column of the fourth dividing cell is below I, generating a fourth working mode.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the integration unit, when integrating the cells with effective weight values in one or more cell blocks into a calculation array with M rows and N columns, is configured to:
One or more cell blocks are read. The cells in each column are read sequentially in row order; whenever the currently read cell carries an effective weight value, it is placed immediately after the previously read effective weight value cell, so that the effective weight value cells are arranged consecutively along the column direction in row order.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the integrating unit is configured to:
and reading the first division unit along the column direction of the sparse weight matrix according to a first working mode. And integrating the effective weight value units of the first division unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the second division unit along the column direction of the thinning weight matrix according to a second working mode. And integrating the effective weight value units of the second dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the third division unit along the column direction of the sparse weight matrix according to a third working mode. And integrating the effective weight value units of the third dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode. And integrating the effective weight value units of the fourth dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the calculating unit further includes:
and a convolution calculation unit configured to implement convolution or full-connected layer calculation in the deep-learning neural network model with the calculation array as the characteristic input value.
The MAC multiply-add array is an 8 row 8 column matrix, a 16 row 16 column matrix, a 32 row 32 column matrix, a 64 row 64 column matrix, a 16 row 32 column matrix, or a 32 row 64 column matrix. The MAC multiply-add array includes 8 × 8, 16 × 16, 32 × 32, 64 × 64, or 16 × 32, 32 × 64 computing units. The M rows and N columns of computational arrays are 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns of computational arrays.
The following will further describe characteristics, technical features, advantages and implementation manners of the method and system for acquiring sparse operation data based on the MAC multiply-add array in a clearly understandable manner with reference to the accompanying drawings.
Drawings
Fig. 1 is a flowchart for explaining a method of acquiring sparse operation data based on a MAC multiply-add array according to an embodiment of the present invention.
Fig. 2 is a schematic diagram for explaining the composition of a MAC-based multiply-add array according to an embodiment of the present invention.
Fig. 3 is a diagram for explaining the effective weight values in the cells when the first division unit corresponds to the first operation mode according to an embodiment of the present invention.
Fig. 4 is a diagram for explaining the effective weight values in the unit when the second division unit corresponds to the second operation mode in an embodiment of the present invention.
Fig. 5 is a diagram for explaining the effective weight values in the unit when the third division unit corresponds to the third operation mode in an embodiment of the present invention.
Fig. 6 is a schematic diagram for explaining integration in the method for acquiring the sparsification operation data based on the MAC multiply-add array according to an embodiment of the present invention.
Fig. 7 is a schematic diagram for explaining a combination of a system for acquiring thinning-out operation data based on a MAC multiply-add array according to an embodiment of the present invention.
FIG. 8 is a diagram for illustrating the convolution module supporting the matrix multiplication process in one embodiment of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings, in which the same reference numerals indicate the same or structurally similar but functionally identical elements.
"exemplary" means "serving as an example, instance, or illustration" herein, and any illustration, embodiment, or steps described as "exemplary" herein should not be construed as a preferred or advantageous alternative. For the sake of simplicity, the drawings only schematically show the parts relevant to the present exemplary embodiment, and they do not represent the actual structure and the true scale of the product.
A first aspect of the invention provides a method for acquiring sparse operation data based on a MAC multiply-add array, where the MAC multiply-add array is an O-row, I-column matrix; as shown in figure 2, the MAC multiply-add array is an 8-row, 8-column matrix. The MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, each input channel corresponding to one computing unit, and O output channels are arranged along the row direction of the MAC multiply-add array, each output channel corresponding to one computing unit. The MAC multiply-add array is an operation array in hardware.
As shown in fig. 1, the method for acquiring the sparse operation data based on the MAC multiply-add array includes:
step S101, a plurality of cell blocks are obtained according to the sparse weight matrix to be calculated.
In this step, O rows and I columns are taken as one dividing unit in the row and column directions of the sparse weight matrix to be calculated, and the sparse weight matrix is divided into a plurality of cell blocks along the column direction according to the dividing unit. Each cell block includes a plurality of cells with effective weight values.
For example: the dividing unit is 8 rows and 8 columns, and the thinning weight matrix is divided into a plurality of unit blocks along the column direction according to the 8 rows and 8 columns dividing unit.
Step S102, generating a plurality of working modes.
In this step, one or more cell blocks are read along the column direction of the sparse weight matrix to be calculated. If the number of cells with effective weight values in the one or more cell blocks is equal to I × O/2 and the number of cells with effective weight values in each column of the one or more cell blocks is not more than I, a plurality of working modes corresponding to the one or more cell blocks are generated.
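A hedged sketch of the qualification test just described (the function name and the NumPy formulation are assumptions, not the patent's wording): a group of cell blocks generates a working mode when it contains exactly I × O/2 effective (non-zero) weights and no column contains more than I of them.

```python
import numpy as np

def qualifies_for_mode(region, o=8, i=8):
    """region: one or more O x I cell blocks stacked along the column direction."""
    nonzero_total = np.count_nonzero(region)
    per_column = np.count_nonzero(region, axis=0)   # effective weights in each column
    return nonzero_total == i * o // 2 and per_column.max() <= i
```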
Step S103, integrating the calculation array.
In this step, one or more cell blocks are read along the column direction of the sparse weight matrix according to the plurality of working modes. The cells with effective weight values in the one or more cell blocks are integrated into a calculation array with M rows and N columns. The effective weight value cells in the calculation array are arranged sequentially along the column direction. The M rows correspond to the O rows. The N columns correspond to the I columns. For example: the M-row, N-column array is an 8-row, 8-column calculation array.
And step S104, realizing matrix multiplication calculation through the MAC multiplication and addition array.
In this step, when matrix multiplication is realized by the MAC multiply-add array, the converted calculation array is used as the multiplier term, and the effective weight value cells in the row direction of the calculation array can be calculated as characteristic weight values.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, step S102 includes:
1 unit block is read as a first division unit in the column direction of the thinning-out weight matrix to be calculated. If the number of cells of the first partition unit with significant weight values is equal to I × O/2 and the number of cells of the first partition unit with significant weight values is below I in each column of the first partition unit, a first operation mode is generated.
For example: as shown in fig. 3, the number of cells with effective weight values in the first division unit equals 32, i.e. 8 × 8/2. In column 0 the effective weights are the cells indicated by boxes 0, 1, 2, 3 and 4. In column 1 the effective weights are the cells indicated by boxes 0, 1, 2, 3, 4, 5 and 6. In column 2 the effective weights are the cells indicated by boxes 0, 1 and 2.
In column 3 the effective weights are the cells indicated by boxes 0, 1, 2 and 3. In column 4 the effective weights are the cells indicated by boxes 0, 1 and 2. In column 5 the effective weights are the cells indicated by boxes 0, 1 and 2. In column 6 the effective weights are the cells indicated by boxes 0, 1, 2, 3, 4 and 5. In column 7 the effective weight is the cell indicated by box 0.
An effective weight is a cell whose weight value is non-zero. The largest per-column count of effective weight value cells is 7, in column 1, so no column of the first division unit exceeds 8, and the first working mode is generated.
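The per-column counts quoted above for fig. 3 can be checked directly; the short snippet below only reproduces that arithmetic (5 + 7 + 3 + 4 + 3 + 3 + 6 + 1 effective weights, none of the columns exceeding 8):

```python
per_column = [5, 7, 3, 4, 3, 3, 6, 1]     # effective weights per column in fig. 3
assert sum(per_column) == 8 * 8 // 2      # equals I x O / 2 = 32
assert max(per_column) <= 8               # no column exceeds I = 8, so the first mode applies
```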
In the other case, 2 unit blocks are read in the column direction of the thinning-out weight matrix to be calculated as the second division unit. And if the number of the units of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the units of the effective weight values in each column of the second dividing unit is below I, generating a second working mode.
For example: as shown in fig. 4, 2 unit blocks, namely unit block 11 and unit block 12, are read along the column direction of the sparse weight matrix to be calculated as the second division unit. The number of cells with effective weight values in unit block 11 and unit block 12 together equals 32, i.e. 8 × 8/2. The largest per-column count of effective weight values is 7, in column 1, so each column of the second division unit has no more than 8, and the second working mode is generated.
And reading 4 unit blocks in the column direction of the sparse weight matrix to be calculated as a third division unit. And if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode.
For example: as shown in fig. 5, 4 unit blocks, namely unit block 21, unit block 22, unit block 23 and unit block 24, are read along the column direction of the sparse weight matrix to be calculated as the third division unit. The number of cells with effective weight values in unit blocks 21, 22, 23 and 24 together equals 32, i.e. 8 × 8/2. The largest per-column count of effective weight values is 7, in column 1, so each column of the third division unit has no more than 8, and the third working mode is generated.
And reading 8 unit blocks in the column direction of the sparse weight matrix to be calculated as a fourth division unit. And if the number of the cells of the effective weight value in the fourth dividing cell is equal to I multiplied by O/2 and the number of the cells of the effective weight value in each column of the fourth dividing cell is below I, generating a fourth working mode. The implementation of the fourth working mode refers to the first, second and third working modes, and is not described in detail.
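Putting the four cases together, mode selection can be pictured as trying 1, 2, 4 and 8 stacked blocks in turn. This is a sketch under the assumption that the blocks of one column strip are simply concatenated along the column direction; `qualifies_for_mode` is the illustrative helper from above, not a name used by the patent:

```python
def select_working_mode(column_strip, o=8, i=8):
    """column_strip: the cell blocks of one strip stacked along the column direction."""
    for mode, n_blocks in enumerate((1, 2, 4, 8), start=1):   # first to fourth working mode
        height = n_blocks * o
        if column_strip.shape[0] >= height and qualifies_for_mode(column_strip[:height, :], o, i):
            return mode
    return None   # no regular-sparsity working mode applies to this strip
```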
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the step of integrating the units of effective weight values in one or more unit blocks into a calculation array with M rows and N columns in step S103 includes:
One or more cell blocks are read. The cells in each column are read sequentially in row order; whenever the currently read cell carries an effective weight value, it is placed immediately after the previously read effective weight value cell, so that the effective weight value cells are arranged consecutively along the column direction in row order.
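One way to read this integration rule (an assumed software analogue of the hardware behaviour, with an extra bookkeeping array that the text only implies): within each column, the effective weight cells are packed consecutively, in their original row order, into the M × N calculation array.

```python
import numpy as np

def compact_columns(region, m=8, n=8):
    """Pack the effective weight cells of each column into an M x N calculation array.

    region is assumed to have at most n columns and at most m effective weights per column.
    """
    packed = np.zeros((m, n), dtype=region.dtype)
    row_index = np.full((m, n), -1, dtype=int)     # original row of each packed cell (assumption)
    for col in range(region.shape[1]):
        rows = np.flatnonzero(region[:, col])      # rows holding effective weights, in row order
        packed[:len(rows), col] = region[rows, col]
        row_index[:len(rows), col] = rows          # kept so matching feature values can be fetched
    return packed, row_index
```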
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, step S103 includes:
and reading the first division unit along the column direction of the sparse weight matrix according to a first working mode. And integrating the effective weight value units of the first division unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the second division unit along the column direction of the thinning weight matrix according to a second working mode. And integrating the effective weight value units of the second dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. As shown in fig. 6, in the second operating mode, the second partitioning unit is integrated into the 8 rows and 8 columns of the computational array. Or
And reading the third division unit along the column direction of the sparse weight matrix according to a third working mode. And integrating the effective weight value units of the third dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode. And integrating the effective weight value units of the fourth dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction.
In another embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, after step S104, the method further includes:
and step S105, taking the calculation array as a characteristic input value to realize convolution or full-connection layer calculation in the neural network model of deep learning.
The MAC multiply-add array is an 8 row 8 column matrix, a 16 row 16 column matrix, a 32 row 32 column matrix, a 64 row 64 column matrix, a 16 row 32 column matrix, or a 32 row 64 column matrix. The MAC multiply-add array includes 8 × 8, 16 × 16, 32 × 32, 64 × 64, or 16 × 32, 32 × 64 computing units. The M rows and N columns of computational arrays are 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns of computational arrays.
In a second aspect, as shown in fig. 7, the present invention provides a system for acquiring sparse operation data based on a MAC multiply-add array, which is an O-row, I-column matrix. The MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, each input channel corresponding to one computing unit, and O output channels are arranged along the row direction of the MAC multiply-add array, each output channel corresponding to one computing unit.
The system for acquiring the sparse operation data based on the MAC multiply-add array comprises: a dividing unit 101, a generating operation mode unit 201, an integrating unit 301 and a calculating unit 401, wherein:
the dividing unit 101 is configured to divide the sparse weight matrix into a plurality of cell blocks along the column direction of the sparse weight matrix, wherein the row and column direction of the sparse weight matrix to be calculated is O row I column. Each cell block includes a plurality of cells having a significant weight value.
The generating working mode unit 201 is configured to read one or more cell blocks along the column direction of the sparse weight matrix to be calculated. If the number of cells with effective weight values in the one or more cell blocks is equal to I × O/2 and the number of cells with effective weight values in each column of the one or more cell blocks is not more than I, a plurality of working modes corresponding to the one or more cell blocks are generated.
The integration unit 301 is configured to read one or more cell blocks along the column direction of the sparse weight matrix according to the plurality of working modes. The cells with effective weight values in the one or more cell blocks are integrated into a calculation array with M rows and N columns. The effective weight value cells in the calculation array are arranged sequentially along the column direction. The M rows correspond to the O rows. The N columns correspond to the I columns.
The calculating unit 401 is configured to use the converted calculation array as the multiplier term when matrix multiplication is realized through the MAC multiply-add array, and to calculate the effective weight value cells in the row direction of the calculation array as characteristic weight values.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the generating working mode unit 201 is further configured to:
1 unit block is read as a first division unit in the column direction of the thinning-out weight matrix to be calculated. If the number of cells of the first partition unit with significant weight values is equal to I × O/2 and the number of cells of the first partition unit with significant weight values is below I in each column of the first partition unit, a first operation mode is generated. Or
And reading 2 unit blocks in the column direction of the sparse weight matrix to be calculated as a second division unit. And if the number of the units of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the units of the effective weight values in each column of the second dividing unit is below I, generating a second working mode. Or
And reading 4 unit blocks in the column direction of the sparse weight matrix to be calculated as a third division unit. And if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode. Or
And reading 8 unit blocks in the column direction of the sparse weight matrix to be calculated as a fourth division unit. And if the number of the cells of the effective weight value in the fourth dividing cell is equal to I multiplied by O/2 and the number of the cells of the effective weight value in each column of the fourth dividing cell is below I, generating a fourth working mode.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the integration unit 301, when integrating the cells with effective weight values in one or more cell blocks into a calculation array with M rows and N columns, is configured to:
One or more cell blocks are read. The cells in each column are read sequentially in row order; whenever the currently read cell carries an effective weight value, it is placed immediately after the previously read effective weight value cell, so that the effective weight value cells are arranged consecutively along the column direction in row order.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the integrating unit 301 is configured to:
and reading the first division unit along the column direction of the sparse weight matrix according to a first working mode. And integrating the effective weight value units of the first division unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the second division unit along the column direction of the thinning weight matrix according to a second working mode. And integrating the effective weight value units of the second dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the third division unit along the column direction of the sparse weight matrix according to a third working mode. And integrating the effective weight value units of the third dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction. Or
And reading the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode. And integrating the effective weight value units of the fourth dividing unit into a calculation array with M rows and N columns. Effective weight value units in the calculation array are sequentially arranged along the column direction.
In another embodiment of the system for acquiring sparse operation data based on the MAC multiply-add array according to the present invention, the calculating unit 401 further includes:
and a convolution calculation unit configured to implement convolution or full-connected layer calculation in the deep-learning neural network model with the calculation array as the characteristic input value.
The MAC multiply-add array is an 8 row 8 column matrix, a 16 row 16 column matrix, a 32 row 32 column matrix, a 64 row 64 column matrix, a 16 row 32 column matrix, or a 32 row 64 column matrix. The MAC multiply-add array includes 8 × 8, 16 × 16, 32 × 32, 64 × 64, or 16 × 32, 32 × 64 computing units. The M rows and N columns of computational arrays are 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns of computational arrays.
The following describes a preferred embodiment of the method for acquiring sparse operation data based on the MAC multiply-add array according to the present invention.
The invention relates to a hardware implementation of hardware-friendly regular sparsification. The data selection scheme also enables the neural network accelerator to support matrix multiplication.
Preferably, the method for acquiring the sparse operation data based on the MAC multiply-add array includes:
the first step, array arrangement mode: the array arrangement is 8 by 8 arrays, 8 channels are input, and 8 channels are output. As shown in fig. 2.
Second step, regular sparsification: the working modes are divided into 4 levels: 50% sparsity (G1) as shown in fig. 3, 25% sparsity (G2) as shown in fig. 4, 12.5% sparsity (G4) as shown in fig. 5, and so on; the levels may be chosen according to demand and resource allocation.
As shown in fig. 3, fig. 4 and fig. 5, the weights of each MAC array form one group; in G1 mode each block comprises 1 group, in G2 mode each block comprises 2 groups, in G4 mode each block comprises 4 groups, and so on. Each block occupies one line of space in the memory.
The limiting conditions are as follows: first, each column contains no more than 8 non-zero weights, a limit imposed by the output channel resources of the MAC array; second, the number of non-zero weights per line equals 32.
The processing flow is shown in fig. 6, taking G2 as an example, where the left side shows the arrangement of the original non-zero weights and the right side shows the arrangement of the sorted non-zero weights.
Third step, the MAC array supports a matrix multiplication mode: as shown in fig. 8, the convolution module supports matrix multiplication in two steps. First, step S10 transposes the right matrix, where C is the depth-direction data and W is the width-direction data; Co and Ci are the data after the transposition. Then, in step S20, a convolution with a 1 × 1 kernel and stride 1 is performed as a fully connected operation, yielding the result of the matrix multiplication.
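The equivalence exploited in fig. 8 can be checked numerically. The snippet below is only a sanity check of the idea, with arbitrary matrix sizes and names, not a description of the hardware path: a product A(M × K) · B(K × N) equals a 1 × 1, stride-1 convolution in which the K values of each row of A act as input channels (Ci) and each column of the transposed right matrix acts as one output-channel (Co) kernel.

```python
import numpy as np

M, K, N = 4, 8, 5
A = np.random.randn(M, K)              # left matrix: M positions, K depth (Ci) values each
B = np.random.randn(K, N)              # right matrix
kernels = B.T                          # after transposition: N output channels (Co), each a 1x1 kernel over Ci

conv_like = np.stack([A @ kernels[co] for co in range(N)], axis=1)  # per-Co dot products at every position
assert np.allclose(conv_like, A @ B)   # identical to the ordinary matrix product
```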
The invention has the beneficial effects that:
on one hand, the sparse processing flow is friendly to hardware, and a small amount of resources can be utilized to process a plurality of non-zero weights in parallel; in the traditional sparsification process, a non-zero weight needs to be found, the processing speed is low, or a large amount of hardware is needed to be processed in parallel, the cost is high, and the hardware implementation is complex.
On the other hand, the convolution MAC array can support matrix multiplication through a simple data rearrangement.
It should be understood that although the present description is described in terms of various embodiments, not every embodiment includes only a single embodiment, and such description is for clarity purposes only, and those skilled in the art will recognize that the embodiments described herein as a whole may be suitably combined to form other embodiments as will be appreciated by those skilled in the art.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.

Claims (10)

1. The method for acquiring sparse operation data based on a MAC multiply-add array is characterized in that the MAC multiply-add array is an O-row, I-column matrix; the MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, and each input channel corresponds to one computing unit; O output channels are arranged along the row direction of the MAC multiply-add array, and each output channel corresponds to one computing unit;
the method for acquiring the sparse operation data based on the MAC multiply-add array comprises the following steps:
step S101, taking the row and column directions of a sparsification weight matrix to be calculated and the O row and the I column as a dividing unit, dividing the sparsification weight matrix into a plurality of cell blocks along the column direction according to the dividing unit; each cell block comprises a plurality of cells with effective weight values;
step S102, reading one or more unit blocks along the column direction of the sparse weight matrix to be calculated; generating a plurality of operation modes corresponding to the one or more unit blocks if the number of cells of the effective weight value in the one or more unit blocks is equal to I × O/2 and the number of cells of the effective weight value in each column of the one or more unit blocks is below I;
step S103, reading one or more cell blocks along the column direction of the sparse weight matrix according to the plurality of working modes; integrating the cells of significant weight values in one or more cell blocks into a computational array of M rows and N columns; effective weight value units in the calculation array are sequentially arranged along the column direction; the M rows correspond to the O rows; the N columns correspond to the I columns;
and step S104, when the MAC multiplication and addition array is used for realizing matrix multiplication calculation, the calculation array is used as a multiplier item after being converted, and the effective weight value unit in the row direction of the calculation matrix can be used as a characteristic weight value for calculation.
2. The acquisition method according to claim 1, the step S102 comprising:
reading 1 unit block along the column direction of the sparse weight matrix to be calculated as a first dividing unit; if the number of the effective weight values in the first division unit is equal to I multiplied by O/2 and the number of the effective weight values in each row of the first division unit is below I, generating a first working mode; or
Reading 2 unit blocks along the column direction of the sparse weight matrix to be calculated as a second division unit; if the number of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the second dividing unit is below I, generating a second working mode; or
Reading 4 unit blocks along the column direction of the sparse weight matrix to be calculated as a third division unit; if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode; or
Reading 8 unit blocks along the column direction of the sparse weight matrix to be calculated as a fourth dividing unit; and if the number of the effective weight values in the fourth dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the fourth dividing unit is below I, generating a fourth working mode.
3. The method of claim 2, wherein the step of integrating the cells of significant weight values in one or more cell blocks into a computational array of M rows and N columns in step S103 comprises:
reading one or more cell blocks; and sequentially reading the units according to the row sorting in each column, and if the current reading unit is an effective weight value unit, sequentially arranging the current effective weight value unit and the last effective weight value unit along the row sorting and the column direction.
4. The acquisition method according to claim 2 or 3, the step S103 comprising:
reading the first division unit along the column direction of the sparse weight matrix according to the first working mode; integrating the effective weight value units of the first division unit into a calculation array with M rows and N columns; effective weight value units in the calculation array are sequentially arranged along the column direction; or
Reading the second division unit along the column direction of the sparse weight matrix according to the second working mode; integrating the effective weight value units of the second dividing unit into a calculation array with M rows and N columns; effective weight value units in the calculation array are sequentially arranged along the column direction; or
Reading the third division unit along the column direction of the sparse weight matrix according to the third working mode; integrating the effective weight value units of the third dividing unit into a calculation array with M rows and N columns; effective weight value units in the calculation array are sequentially arranged along the column direction; or
Reading the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode; integrating the effective weight value units of the fourth dividing unit into a calculation array with M rows and N columns; the effective weight value units in the calculation array are sequentially arranged along the column direction.
5. The acquisition method according to claim 1, further comprising after the step S104:
step S105, the calculation array is used as a characteristic input value to realize convolution or full-connection layer calculation in a neural network model of deep learning;
the MAC multiplication and addition array is an 8-row 8-column matrix, a 16-row 16-column matrix, a 32-row 32-column matrix, a 64-row 64-column matrix, a 16-row 32-column matrix or a 32-row 64-column matrix; the MAC multiply-add array comprises 8 × 8, 16 × 16, 32 × 32, 64 × 64 or 16 × 32, 32 × 64 computing units; the M rows and N columns of computational arrays are 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns or 32 rows and 64 columns of computational arrays.
6. The system for acquiring sparse operation data based on a MAC multiply-add array, wherein the MAC multiply-add array is an O-row, I-column matrix; the MAC multiply-add array comprises I × O computing units; I input channels are arranged along the column direction of the MAC multiply-add array, and each input channel corresponds to one computing unit; O output channels are arranged along the row direction of the MAC multiply-add array, and each output channel corresponds to one computing unit;
the system for acquiring the sparse operation data based on the MAC multiply-add array comprises: a dividing unit, a generating work mode unit, an integrating unit and a calculating unit, wherein:
the dividing unit is configured to take the row and column directions of the sparse weight matrix to be calculated and the O row and the I column as one dividing unit, and divide the sparse weight matrix into a plurality of cell blocks along the column direction according to the dividing unit; each cell block comprises a plurality of cells with effective weight values;
the generation working mode unit is configured to read one or more unit blocks along the column direction of the sparse weight matrix to be calculated; generating a plurality of operation modes corresponding to the one or more unit blocks if the number of cells of the effective weight value in the one or more unit blocks is equal to I × O/2 and the number of cells of the effective weight value in each column of the one or more unit blocks is below I;
the integration unit is configured to read one or more unit blocks along a column direction of the thinning weight matrix according to the plurality of operation modes; integrating the cells of significant weight values in one or more cell blocks into a computational array of M rows and N columns; effective weight value units in the calculation array are sequentially arranged along the column direction; the M rows correspond to the O rows; the N columns correspond to the I columns;
the calculation unit is configured to, when the MAC multiply-add array is used to perform matrix multiply calculation, convert the calculation array to be a multiplier item, and the effective weight value unit in the calculation matrix row direction can be calculated as a feature weight value.
7. The acquisition system of claim 6, the generate operating mode unit further configured to:
reading 1 unit block along the column direction of the sparse weight matrix to be calculated as a first dividing unit; if the number of the effective weight values in the first division unit is equal to I multiplied by O/2 and the number of the effective weight values in each row of the first division unit is below I, generating a first working mode; or
Reading 2 unit blocks along the column direction of the sparse weight matrix to be calculated as a second division unit; if the number of the effective weight values in the second dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the second dividing unit is below I, generating a second working mode; or
Reading 4 unit blocks along the column direction of the sparse weight matrix to be calculated as a third division unit; if the number of the effective weight values in the third dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the third dividing unit is below I, generating a third working mode; or
Reading 8 unit blocks along the column direction of the sparse weight matrix to be calculated as a fourth dividing unit; and if the number of the effective weight values in the fourth dividing unit is equal to I multiplied by O/2 and the number of the effective weight values in each column of the fourth dividing unit is below I, generating a fourth working mode.
8. The acquisition system of claim 7, wherein the integration of the effective-weight cells in one or more unit blocks into the calculation array of M rows and N columns is configured to:
read the one or more unit blocks; read the cells in each column sequentially according to their row order; and, if the currently read cell is an effective-weight cell, arrange the current effective-weight cell immediately after the previous effective-weight cell, following the row order and the column direction.
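The packing rule of claim 8 — walk each column in row order and place every effective weight immediately after the previous one — can be sketched as follows. The column-major fill of the M × N calculation array is an assumption consistent with "arranged sequentially along the column direction"; the real hardware mapping and index bookkeeping may differ, and the helper name is illustrative.

    import numpy as np

    def integrate_effective_cells(blocks: list, M: int, N: int) -> np.ndarray:
        """Pack the effective (non-zero) weights of the selected unit blocks
        into an M x N calculation array, column by column, in the order read."""
        merged = np.concatenate(blocks, axis=1)
        packed = np.zeros((M, N), dtype=merged.dtype)
        k = 0                                  # index of the next free slot, counted down the columns
        for col in merged.T:                   # read columns along the column direction
            for value in col:                  # within a column, read cells in row order
                if value != 0:
                    if k >= M * N:
                        raise ValueError("more effective weights than the calculation array can hold")
                    packed[k % M, k // M] = value   # place right after the previous effective cell
                    k += 1
        return packed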
9. The acquisition system according to claim 7 or 8, wherein the integration unit is configured to:
read the first division unit along the column direction of the sparse weight matrix according to the first working mode, and integrate the effective-weight cells of the first division unit into a calculation array of M rows and N columns, the effective-weight cells in the calculation array being arranged sequentially along the column direction; or
read the second division unit along the column direction of the sparse weight matrix according to the second working mode, and integrate the effective-weight cells of the second division unit into a calculation array of M rows and N columns, the effective-weight cells in the calculation array being arranged sequentially along the column direction; or
read the third division unit along the column direction of the sparse weight matrix according to the third working mode, and integrate the effective-weight cells of the third division unit into a calculation array of M rows and N columns, the effective-weight cells in the calculation array being arranged sequentially along the column direction; or
read the fourth division unit along the column direction of the sparse weight matrix according to the fourth working mode, and integrate the effective-weight cells of the fourth division unit into a calculation array of M rows and N columns, the effective-weight cells in the calculation array being arranged sequentially along the column direction.
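Reading claims 6 through 9 together, the sketches above would be chained as below. The shapes, the random test matrix, and the helper names are all illustrative assumptions; with random data the strict "equal to I × O/2" condition will usually fail, which the example handles by checking for None.

    import numpy as np

    O, I = 8, 8
    rng = np.random.default_rng(0)
    # A deliberately sparse O-row weight matrix spanning eight unit blocks.
    weights = rng.integers(1, 8, size=(O, I * 8)) * (rng.random((O, I * 8)) < 0.2)

    blocks = divide_into_unit_blocks(weights, O, I)
    selection = select_working_mode(blocks, I, O)
    if selection is not None:
        mode, group = selection
        calc_array = integrate_effective_cells(blocks[:group], M=O, N=I)
        # calc_array would then be handed to the MAC multiply-add array as the multiplier operand.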
10. The acquisition system of claim 6, wherein the calculation unit further comprises:
a convolution calculation unit configured to perform convolution or fully-connected layer calculation in a deep-learning neural network model with the calculation array as a feature input value;
wherein the MAC multiply-add array is a matrix of 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns; the MAC multiply-add array comprises 8 × 8, 16 × 16, 32 × 32, 64 × 64, 16 × 32 or 32 × 64 computing units; and the calculation array of M rows and N columns is a calculation array of 8 rows and 8 columns, 16 rows and 16 columns, 32 rows and 32 columns, 64 rows and 64 columns, 16 rows and 32 columns, or 32 rows and 64 columns.
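For quick reference, the array sizes enumerated in claim 10 can be written out as (rows, columns) pairs; the constant name and the helper below are illustrative only, not part of the claimed system.

    # (O rows, I columns) pairs listed in claim 10 for the MAC multiply-add array.
    SUPPORTED_MAC_SHAPES = {(8, 8), (16, 16), (32, 32), (64, 64), (16, 32), (32, 64)}

    def is_supported_mac_shape(O: int, I: int) -> bool:
        """True when the requested array size is one of the enumerated options."""
        return (O, I) in SUPPORTED_MAC_SHAPES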
CN202011640074.3A 2020-12-31 2020-12-31 Method and system for acquiring sparse operation data based on MAC multiply-add array Active CN112712173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011640074.3A CN112712173B (en) 2020-12-31 2020-12-31 Method and system for acquiring sparse operation data based on MAC multiply-add array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011640074.3A CN112712173B (en) 2020-12-31 2020-12-31 Method and system for acquiring sparse operation data based on MAC multiply-add array

Publications (2)

Publication Number Publication Date
CN112712173A true CN112712173A (en) 2021-04-27
CN112712173B CN112712173B (en) 2024-06-07

Family

ID=75547981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011640074.3A Active CN112712173B (en) 2020-12-31 2020-12-31 Method and system for acquiring sparse operation data based on MAC multiply-add array

Country Status (1)

Country Link
CN (1) CN112712173B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101378303A (en) * 2007-08-31 2009-03-04 华为技术有限公司 Method and apparatus for generating and processing retransmission low-density parity check code
US20180210862A1 (en) * 2017-01-22 2018-07-26 Gsi Technology Inc. Sparse matrix multiplication in associative memory device
CN109992742A (en) * 2017-12-29 2019-07-09 华为技术有限公司 A kind of signal processing method and device
CN209514618U (en) * 2019-02-26 2019-10-18 北京知存科技有限公司 Dynamic bias simulates vector-matrix multiplication operation circuit
CN110110851A (en) * 2019-04-30 2019-08-09 南京大学 A kind of the FPGA accelerator and its accelerated method of LSTM neural network
CN110766157A (en) * 2019-10-21 2020-02-07 中国人民解放军国防科技大学 Multi-sample neural network forward propagation vectorization implementation method
CN111626414A (en) * 2020-07-30 2020-09-04 电子科技大学 Dynamic multi-precision neural network acceleration unit

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581676A (en) * 2022-03-01 2022-06-03 北京百度网讯科技有限公司 Characteristic image processing method and device and storage medium
CN114581676B (en) * 2022-03-01 2023-09-26 北京百度网讯科技有限公司 Processing method, device and storage medium for feature image

Also Published As

Publication number Publication date
CN112712173B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN109447241B (en) Dynamic reconfigurable convolutional neural network accelerator architecture for field of Internet of things
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
CN111445012A (en) FPGA-based packet convolution hardware accelerator and method thereof
CN110222818B (en) Multi-bank row-column interleaving read-write method for convolutional neural network data storage
CN112668708B (en) Convolution operation device for improving data utilization rate
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN113283587B (en) Winograd convolution operation acceleration method and acceleration module
WO2018027706A1 (en) Fft processor and algorithm
CN112434801A (en) Convolution operation acceleration method for carrying out weight splitting according to bit precision
CN111340198A (en) Neural network accelerator with highly-multiplexed data based on FPGA (field programmable Gate array)
CN112712173A (en) Method and system for acquiring sparse operation data based on MAC (media Access control) multiply-add array
US7912891B2 (en) High speed low power fixed-point multiplier and method thereof
CN109446478B (en) Complex covariance matrix calculation system based on iteration and reconfigurable mode
CN111610963B (en) Chip structure and multiply-add calculation engine thereof
CN111626410B (en) Sparse convolutional neural network accelerator and calculation method
CN113239591A (en) DCU cluster-oriented large-scale finite element grid parallel partitioning method and device
CN113743046B (en) Integrated layout structure for memory and calculation and integrated layout structure for data splitting and memory and calculation
CN111667052A (en) Standard and nonstandard volume consistency transformation method for special neural network accelerator
CN116882455A (en) Pointwise convolution computing device and method
CN113448624B (en) Data access method, device, system and AI accelerator
CN113656656B (en) Efficient neighbor retrieval method and system for wide-grading discrete element particle system
US20240220203A1 (en) Streaming-based compute unit and method, and artificial intelligence chip
CN112612447B (en) Matrix calculator and full-connection layer calculating method based on same
CN220983883U (en) Matrix computing device, chiplet apparatus and artificial intelligence accelerator device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant