CN112799635B - Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation - Google Patents

Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Info

Publication number
CN112799635B
CN112799635B CN202110170947.7A CN202110170947A CN112799635B CN 112799635 B CN112799635 B CN 112799635B CN 202110170947 A CN202110170947 A CN 202110170947A CN 112799635 B CN112799635 B CN 112799635B
Authority
CN
China
Prior art keywords
matrix
sparse matrix
row
kxn
zero
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110170947.7A
Other languages
Chinese (zh)
Other versions
CN112799635A (en)
Inventor
周晓辉
袁博
华诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computing Chip Shenzhen Information Technology Co ltd
Original Assignee
Computing Chip Shenzhen Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computing Chip Shenzhen Information Technology Co ltd filed Critical Computing Chip Shenzhen Information Technology Co ltd
Priority to CN202110170947.7A priority Critical patent/CN112799635B/en
Publication of CN112799635A publication Critical patent/CN112799635A/en
Application granted granted Critical
Publication of CN112799635B publication Critical patent/CN112799635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443Sum of products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention belongs to the field of computer application equipment, and particularly relates to a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation. The method comprises the following steps: a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN); b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout; c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci; d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN). In step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not; every other element of a row is retained only if its value is non-zero, and is discarded if it is zero.

Description

Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation
Technical Field
The invention belongs to the field of computer application equipment, and particularly relates to a novel method for solving inner products of a dense matrix and a sparse matrix by means of outer product accumulation.
Background
Matrix inner product multiplication underlies algorithms in numerous fields such as numerical computation, machine learning and engineering simulation, and the inner product of a dense matrix A (MxK) with a sparse matrix B (KxN) is a common matrix inner product scenario.
The product of a dense matrix A (MxK) and a sparse matrix B (KxN) is characterized by the presence of a certain number of zero elements in the sparse matrix B (KxN). These zero elements need not participate in the calculation for the result to remain correct, and if they are simply not stored, memory utilization improves.
However, skipping zero elements of the sparse matrix B (KxN) during computation and storage leads to discontinuous memory accesses, which is unfavorable for the cache hierarchy of modern processor architectures. The computational efficiency of an inner product involving a sparse matrix is therefore lower than that of a product of two dense matrices.
Here the inner product of a dense matrix A (4x4) and a sparse matrix B (4x3) is taken as an example; B (4x3) is a general sparse matrix, meaning that its zero elements are distributed with no regularity. FIG. 1 shows the first step of solving the inner product of the dense matrix A (4x4) and the sparse matrix B (4x3) by summation of outer products. In this step, the first column a1 (4x1) of the dense matrix A (4x4) and the first row b1 (1x3) of the sparse matrix B (4x3) are taken out and their outer product gives the result c1; likewise, the second column of A and the second row b2 of B give the result c2. The dense matrix A (4x4) has four columns and the sparse matrix B (4x3) has four rows, so the column-row outer products in one-to-one correspondence are c1 (4x3), c2 (4x3), c3 (4x3) and c4 (4x3). Adding the corresponding elements of the four outer product results c1, c2, c3 and c4 yields the inner product result. In ordinary matrix inner product multiplication, the value at position (i, j) of the target matrix is obtained as the inner product of the i-th row of the dense matrix A (4x4) with the j-th column of the sparse matrix B (4x3). To complete the whole inner product, all four columns of the dense matrix A (4x4) must be read and combined with the four rows of the sparse matrix B (4x3).
FIG. 2 illustrates the outer product computation logic of the first column a1 (4x1) of the dense matrix A (4x4) and the first row b1 (1x3) of the sparse matrix B (4x3). The outer product of the column vector a1 (4x1) and the row vector b1 (1x3) forms a matrix c1 of 4 rows and 3 columns; the computation is equivalent to multiplying each element of a1 by the row vector b1. The computation of the i-th column ai (4x1) of the dense matrix A (4x4) with the i-th row bi (1x3) of the sparse matrix B (4x3) follows the same logic as the computation of c1. When a zero element appears at some position of a row of the sparse matrix B (4x3), its effect on the result is that a zero vector is produced in the row direction at that position.
FIG. 3 illustrates the four matrix additions of the outer product results c1, c2, c3 and c4 described above, which follow the usual rule of adding corresponding elements. This completes the description of solving the inner product of the dense matrix A (4x4) and the sparse matrix B (4x3) by outer product accumulation. It can be seen that in this process no special treatment is given to the row-direction zero vectors produced when a column of the dense matrix A (4x4) is multiplied by a zero element in a row of the sparse matrix B (4x3). If such a row-direction zero vector were not computed and did not take part in the subsequent accumulation, the correctness of the inner product result would not be affected. In the existing outer product accumulation, however, the zero elements of the sparse matrix do participate in the outer product computation, which not only lowers the efficiency of the outer product computation but also requires writing the resulting zero values to memory afterwards, reducing the effective memory write bandwidth.
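For orientation, the baseline procedure described above can be summarized by the following short Python/NumPy sketch. It is an illustration only and is not taken from the patent; the matrix values are invented, with the zero pattern of the first two rows of B chosen to match the example used later in the description.

    import numpy as np

    # Dense matrix A (4x4); values invented for illustration.
    A = np.array([[1., 2., 0., 3.],
                  [4., 5., 6., 0.],
                  [0., 7., 8., 9.],
                  [1., 0., 2., 3.]])
    # Sparse matrix B (4x3), stored densely here with its zeros included.
    B = np.array([[0., 2., 0.],
                  [3., 4., 0.],
                  [0., 0., 5.],
                  [6., 0., 7.]])

    # Inner product by accumulation of outer products: one rank-1 update per
    # column ai of A and row bi of B, as in FIG. 1 - FIG. 3.
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[1]):
        C += np.outer(A[:, i], B[i, :])   # ci = ai (4x1) x bi (1x3), then accumulate

    assert np.allclose(C, A @ B)          # matches the ordinary row-by-column product

In this baseline every zero of B still generates multiplications and memory writes, which is exactly the inefficiency discussed above.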
Disclosure of Invention
Aiming at the above technical problems in the inner product multiplication of the dense matrix A (4x4) and the sparse matrix B (4x3), the invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation; the method is simple, effectively avoids invalid operations, and saves memory accesses.
In order to achieve the above object, the technical solution adopted by the present invention is as follows: the present invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation, comprising the following steps:
a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN);
b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout;
c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci;
d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN); in step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not, while every other element of a row is retained if its value is non-zero and discarded if it is zero.
Preferably, the method further comprises a step e: judge whether the last row of the sparse matrix B (KxN) has been processed; if so, the calculation is finished; if not, increase i by 1 and repeat steps c to e.
Preferably, in step b, the row-wise compressed sparse matrix B (KxN) comprises a total length after matrix compression, a column index array and a compressed element value array, wherein the total length after compression is the total number of elements retained from the sparse matrix B (KxN); the column index array records, for the elements retained in each row, their column index values under the original two-dimensional layout; and the compressed element value array is filled sequentially, in column order, with the compressed element values.
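A minimal sketch of the row-wise compression of step b is given below in Python/NumPy. The function name compress_rowwise is an assumption introduced here for illustration and is not terminology from the patent.

    import numpy as np

    def compress_rowwise(B):
        """Row-wise compression of step b: the first element of every row is
        kept unconditionally (even if it is zero); the remaining elements of
        the row are kept only if they are non-zero. Column indices start at 0."""
        col_idx, values = [], []
        for row in B:
            for j, v in enumerate(row):
                if j == 0 or v != 0:
                    col_idx.append(j)     # column index under the original 2-D layout
                    values.append(v)      # compressed element value, in column order
        total_len = len(values)           # total length after matrix compression
        return total_len, np.array(col_idx), np.array(values)

The three returned components correspond to the total length after matrix compression, the column index array and the compressed element value array described above.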
Compared with the prior art, the invention has the following advantages and positive effects:
1. The invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation. The sparse matrix B (KxN) is compressed row-wise, retaining the first element of each row together with all other non-zero elements; this makes the positions of the compressed elements easy to locate while removing a large number of zero elements. Invalid numerical computation is thereby avoided, memory accesses and invalid operations are saved, computational efficiency is improved, and fast operation is guaranteed for techniques based on numerical computation, machine learning and engineering simulation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1, example of inner product multiplication solved by outer product accumulation of dense matrix A (4 x 4) and sparse matrix B (4 x 3)
FIG. 2, example of performing outer product computation of the first column of dense matrix A (4 x 4) and the first row of sparse matrix B (4 x 3)
FIG. 3, example of accumulation of outer product results of dense matrix A (4 x 4) and sparse matrix B (4 x 3)
FIG. 4, conversion process of the sparse matrix B (4 x 3) to the reduced row-wise compressed storage format
FIG. 5, example of the column index array logic of the reduced row-wise compression of the sparse matrix B (4 x 3)
FIG. 6, example of the column-row correspondence in the outer product operation between the dense matrix A (4 x 4) and the one-dimensional array formed by the reduced row-wise compression of the sparse matrix B (4 x 3)
FIG. 7, example of the process of accumulating the outer products of the columns of the dense matrix A (4 x 4) with the rows of the reduced row-wise compressed sparse matrix B (4 x 3), stored as a one-dimensional array
FIG. 8 is a flow chart of the accumulation of the outer products of the dense matrix A (MxK) and the reduced row-wise compressed sparse matrix B (KxN).
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be further described with reference to the accompanying drawings and examples. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and thus the present invention is not limited to the specific embodiments of the present disclosure.
This embodiment aims to solve the problems of the large amount of computation and the repeated computation on zero elements that arise when the inner product of the sparse matrix B (KxN) and the dense matrix A (MxK) is solved by outer product accumulation. The example of solving the inner product by outer product accumulation of the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) provided in this embodiment is the same as in the background art; the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) are used below to describe the technical solution of this embodiment, so that the technique provided by the invention can be better understood.
First, the reading of the original data of the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) is completed; their specific values are shown in FIG. 1.
FIG. 4 illustrates the conversion process 400 of the sparse matrix B (4 x 3) into the SCSR storage scheme. Process 400 involves the two-dimensional layout 401 of the sparse matrix B (4 x 3) and the one-dimensional layout 402 of the reduced row-wise compressed sparse matrix B (4 x 3) in memory. In 401, non-zero elements are denoted b_ij and zero elements are represented by a black square "■". The column indices and row indices may start from 0 or from 1 without affecting the implementation of the method.
In 402, the total length after matrix compression 403, the column index array 404 and the compressed element value array 405 form all of the components of the row-wise compressed storage of the sparse matrix B (4 x 3); the memory addresses 406 are shown only to aid the explanation and are not needed in an actual implementation of the algorithm. The dashed arrows in 400 indicate the positions in 402 to which the elements of 401 are mapped after reduced row-wise compressed storage.
The compression rule of the method for the sparse matrix is that the first element of each row is retained whether its value is zero or not, while every non-starting element of a row is judged by its value: if the value is zero the element is discarded, and if it is non-zero the element is retained. Compared with a general row compression method, the method of the invention determines the range of the memory region holding the compressed elements of a row simply by checking whether a value in the column index array 404 is zero. Because the first element of each row is always retained, regardless of whether it is zero, and the remaining elements of each row are processed normally with zero elements discarded, the position of every compressed element can be located using only the column index array, and the data structure storing the compression result remains simple, so the elements are conveniently located.
The total length after matrix compression 403 is the total number of elements retained after compressing the sparse matrix B (4 x 3). It ensures that no out-of-bounds access occurs when the memory space of the column index array 404 and of the compressed element value array 405 is accessed.
The column index array 404 records, for the elements retained in each row, their column index values under the original two-dimensional layout 401. The compressed element value array 405 stores the compressed element values of the sparse matrix B (4 x 3) sequentially at consecutive memory addresses. The value stored at the N-th address offset addrN of the column index array 404 indicates the column number, under the two-dimensional layout of the sparse matrix B (4 x 3), of the element value stored at the same address offset addrN of the compressed element value array 405.
Although the element in the first row and first column of the two-dimensional layout 401 of the sparse matrix B (4 x 3) in FIG. 4 is zero, it is still retained in the compressed storage step and placed at the first position of the compressed element value array 405, and its column index value 0 is recorded at the first position of the column index array 404; this mapping is indicated by arrow 410. Arrow 411 indicates the mapping of the second, non-zero element in the first row of the sparse matrix B (4 x 3). The third element of the first row is zero; it is discarded and not mapped into the column index array 404 or the compressed element value array 405. The first element of the second row of the sparse matrix B (4 x 3) is non-zero and is therefore retained in the column index array 404 and the compressed element value array 405, as indicated by arrow 412.
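As an illustration of this mapping, applying the compress_rowwise sketch given earlier to a hypothetical B (4 x 3) whose first two rows follow the zero pattern just described (the last two rows are invented for illustration) yields the three components of the compressed storage:

    import numpy as np

    B = np.array([[0., 2., 0.],    # row 1: zero start kept, non-zero kept, zero dropped
                  [3., 4., 0.],    # row 2: non-zero start kept, non-zero kept, zero dropped
                  [0., 0., 5.],    # rows 3-4: values invented for illustration
                  [6., 0., 7.]])

    total_len, col_idx, values = compress_rowwise(B)
    print(total_len)   # 8  -> total length after matrix compression
    print(col_idx)     # [0 1 0 1 0 2 0 2]  -> column index array
    print(values)      # [0. 2. 3. 4. 0. 5. 6. 7.]  -> compressed element value array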
In the column index array 504 of FIG. 5, the value stored at the first address addr0 is rvalue0, whose actual value in this example is 0; the value stored at the second address addr1 is rvalue1, whose actual value is 1; and the value stored at the third address addr2 is rvalue2, whose actual value is 0. Since the actual value of rvalue2 is 0, and within each row one and only one stored element has a column index of zero, the position represented by rvalue2 is the start of a new row.
The range between two adjacent 0 values of rvalue in the column index array 504 constitutes the data range of one compressed row: the N-th occurrence of the value 0 indicates that the column indices that follow it are the column index data of the N-th row.
For the sparse matrix B (4 x 3) of this example, the compressed element range of the first row, which originally contained zero elements, runs from the position of rvalue0 up to but not including the position of rvalue2; the compressed element range of the second row, which also originally contained zero elements, runs from the position of rvalue2 up to but not including the position of rvalue4.
For the last row, row K, of the sparse matrix B (KxN), the compressed element range runs from the memory address offset of the K-th occurrence of the value 0 in the column index array 504 up to, but not including, the total length after matrix compression 503.
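The row-boundary rule described above (a stored column index of 0 can only mark the start of a row) can be expressed as the following sketch; the helper name row_ranges is an assumption introduced here for illustration.

    import numpy as np

    def row_ranges(col_idx, total_len):
        """Return half-open address ranges [start, end) of each compressed row,
        using only the column index array: the K-th zero marks the start of
        row K, and the last row ends at the total length after compression."""
        starts = np.flatnonzero(col_idx == 0)
        ends = np.append(starts[1:], total_len)
        return list(zip(starts, ends))

    # For the example above: row_ranges(col_idx, total_len) -> [(0, 2), (2, 4), (4, 6), (6, 8)]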
FIG. 6 illustrates the correspondence used in the outer product operation between the i-th column of the dense matrix A (4 x 4) and the i-th row of the reduced row-wise compressed sparse matrix B (4 x 3). Arrow 610 indicates the extent, within the one-dimensional compressed element array 605 of the compressed sparse matrix B (4 x 3), of the first row with which the first column 606 of the dense matrix A (4 x 4) performs its outer product. In the same way, arrow 611 indicates the second row of the compressed sparse matrix B (4 x 3) corresponding to the second column 607 of the dense matrix A (4 x 4), and arrows 612 and 613 indicate the third and fourth rows corresponding to the third and fourth columns 608 and 609, respectively.
FIG. 7 shows the outer product computation 722 of the first column 706 of the dense matrix A (4 x 4) with the first row 710 of the row-wise compressed sparse matrix B (4 x 3), which generates the outer product result 714; the result 714 contains the partial results 712 and 713 produced by multiplying the first column 706 with the stored elements of the compressed first row 710. Although the first element of the first row 710 of the row-wise compressed sparse matrix B (4 x 3) is zero, the outer product operation is still performed on it in the method of the invention.
The outer product result 714 is equivalent to c1 in FIG. 3. Mapping the result 714 into the two-dimensional row-column layout of c1 (4 x 3) forms the intermediate inner product result 718; in 718, memory regions whose zero values were generated by the outer product computation 722 are marked "0", while zero-valued memory regions not generated by the outer product computation are marked with a black block "■".
Similarly, the second column 707 of the dense matrix A (4 x 4) and the second row 711 of the row-wise compressed sparse matrix B (4 x 3) undergo the outer product computation 722 to generate the outer product result 717, which is equivalent to c2 in FIG. 3. The result 717 is mapped into the two-dimensional row-column layout of c2 (4 x 3), forming the intermediate inner product result 719. The intermediate results 718 and 719 undergo a matrix addition 720, in which corresponding elements are added, to form the intermediate inner product result 721, which is equivalent to c1 (4 x 3) + c2 (4 x 3). In FIG. 6, 608 and 612 perform an outer product to generate c3 (4 x 3), and 609 and 613 perform an outer product to generate c4 (4 x 3). Adding c3 (4 x 3) and c4 (4 x 3) to the intermediate result 721 forms the complete inner product result C (4 x 3) = A (4 x 4) B (4 x 3) = c1 (4 x 3) + c2 (4 x 3) + c3 (4 x 3) + c4 (4 x 3).
In the compressed rows 710 and 711, after the reduced row-wise compression of the sparse matrix B (4 x 3), the third element (a zero) of the first row and the third element (a zero) of the second row have been discarded. Compared with the uncompressed sparse matrix B (4 x 3) of FIG. 2, this removes both the memory storage and the outer product computation for those zero elements, which reduces the proportion of invalid operations and increases the proportion of valid computation.
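One outer-product step with a compressed row can be sketched as follows (illustrative helper with assumed names, not the patent's reference implementation): only the stored (column index, value) pairs of row i of B are multiplied by column i of A and written into C, so the zero elements discarded during compression generate neither multiplications nor memory writes.

    import numpy as np

    def outer_accumulate(C, a_col, b_cols, b_vals):
        """Accumulate the outer product of a dense column and one compressed row:
        C[:, j] += a_col * v for every stored pair (j, v) of the row."""
        for j, v in zip(b_cols, b_vals):
            C[:, j] += a_col * v          # rank-1 update restricted to stored columns
        return C

Note that a retained first element that happens to be zero is still multiplied, exactly as stated for row 710 above.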
FIG. 8 illustrates the data flow 800 involved in the invention. First, block 801 completes the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN). Block 802 performs the reduced row-wise data compression of the sparse matrix B (KxN); the specific row-wise compression process is illustrated in FIGS. 4-5. Next the outer product computation is performed: block 803 reads the i-th row of the reduced row-wise compressed sparse matrix B (KxN) and block 804 reads the i-th column of the dense matrix A (MxK), and their outer product generates ci; the column-row outer product computation is illustrated in FIG. 7. Block 806 then completes the outer product accumulation, which repeatedly accumulates the current result into the same block of memory by adding it to the value left there by the previous computation. Finally it is judged whether the last row of the sparse matrix B (KxN) has been processed. If so, the computation is finished; if not, block 808 increments i by 1 and control returns to block 803, so that blocks 803-807 are performed in turn.
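The complete data flow 800 can therefore be sketched end to end as below, combining the illustrative helpers compress_rowwise, row_ranges and outer_accumulate defined earlier. All names are assumptions for illustration, not the patent's reference implementation.

    import numpy as np

    def dense_times_rowwise_compressed(A, B):
        """Blocks 801-808: read A and B, compress B row-wise, then loop over i,
        multiplying column i of A with compressed row i of B and accumulating."""
        M, K = A.shape
        _, N = B.shape
        total_len, col_idx, values = compress_rowwise(B)        # blocks 801-802
        C = np.zeros((M, N))
        for i, (s, e) in enumerate(row_ranges(col_idx, total_len)):
            # blocks 803-806: read compressed row i of B and column i of A,
            # compute their outer product and accumulate it into C
            C = outer_accumulate(C, A[:, i], col_idx[s:e], values[s:e])
        return C                                                 # C = A (MxK) x B (KxN)

    # For the example matrices above, the result equals the baseline computation:
    # np.allclose(dense_times_rowwise_compressed(A, B), A @ B) -> True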
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in other forms. Any person skilled in the art may, without departing from the technical spirit of the present invention, modify or change the disclosed technical content into equivalent embodiments; any simple modification or equivalent change made to the above embodiments in accordance with the technical spirit of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (3)

1. A novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation, characterized by comprising the following steps:
a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN);
b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout;
c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci;
d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN); in step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not, while every other element of a row is retained if its value is non-zero and discarded if it is zero.
2. The novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation according to claim 1, further comprising a step e: judge whether the last row of the sparse matrix B (KxN) has been processed; if so, the calculation is finished; if not, increase i by 1 and repeat steps c to e.
3. The novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation according to claim 2, wherein in step b the row-wise compressed sparse matrix B (KxN) comprises a total length after matrix compression, a column index array and a compressed element value array, wherein the total length after compression is the total number of elements retained from the sparse matrix B (KxN); the column index array is used to record, for the elements retained in each row, their column index values under the original two-dimensional layout; and the compressed element value array is used to sequentially fill in the compressed element values in column order.
CN202110170947.7A 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation Active CN112799635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110170947.7A CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110170947.7A CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Publications (2)

Publication Number Publication Date
CN112799635A CN112799635A (en) 2021-05-14
CN112799635B true CN112799635B (en) 2022-11-15

Family

ID=75814748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110170947.7A Active CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Country Status (1)

Country Link
CN (1) CN112799635B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005115497A (en) * 2003-10-03 2005-04-28 Nec Corp Computing machine and method for repetitively finding solution to simultaneous linear equation
CN104112044A (en) * 2014-07-04 2014-10-22 同济大学 Efficient analyzing method for superfine line structure object electromagnetic property
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN111191784A (en) * 2018-11-14 2020-05-22 辉达公司 Transposed sparse matrix multiplied by dense matrix for neural network training
CN111859277A (en) * 2020-07-23 2020-10-30 中国人民解放军国防科技大学 Sparse matrix vector multiplication vectorization implementation method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011040A1 (en) * 2008-07-11 2010-01-14 James Vannucci Device and method for solving a system of equations characterized by a coefficient matrix comprising a Toeplitz structure
WO2018022821A1 (en) * 2016-07-29 2018-02-01 Arizona Board Of Regents On Behalf Of Arizona State University Memory compression in a deep neural network
US10528864B2 (en) * 2016-08-11 2020-01-07 Nvidia Corporation Sparse convolutional neural network accelerator
US10452744B2 (en) * 2017-03-27 2019-10-22 Oracle International Corporation Memory management for sparse matrix multiplication
CN107679618B (en) * 2017-07-28 2021-06-11 赛灵思电子科技(北京)有限公司 Static strategy fixed-point training method and device
US11478603B2 (en) * 2017-12-31 2022-10-25 Neuroenhancement Lab, LLC Method and apparatus for neuroenhancement to enhance emotional response
US10572568B2 (en) * 2018-03-28 2020-02-25 Intel Corporation Accelerator for sparse-dense matrix multiplication
US10644721B2 (en) * 2018-06-11 2020-05-05 Tenstorrent Inc. Processing core data compression and storage system
CN109032670B (en) * 2018-08-08 2021-10-19 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector copy instruction
CN109726314B (en) * 2019-01-03 2020-09-01 中国人民解放军国防科技大学 Bitmap-based sparse matrix compression storage method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005115497A (en) * 2003-10-03 2005-04-28 Nec Corp Computing machine and method for repetitively finding solution to simultaneous linear equation
CN104112044A (en) * 2014-07-04 2014-10-22 同济大学 Efficient analyzing method for superfine line structure object electromagnetic property
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN111191784A (en) * 2018-11-14 2020-05-22 辉达公司 Transposed sparse matrix multiplied by dense matrix for neural network training
CN111859277A (en) * 2020-07-23 2020-10-30 中国人民解放军国防科技大学 Sparse matrix vector multiplication vectorization implementation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Preconditioned conjugate gradient method for sparse linear systems based on GPU; Zhang Jianfei et al.; Journal of Computer Applications (《计算机应用》); 2013-03-01 (No. 03); full text *

Also Published As

Publication number Publication date
CN112799635A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111247527B (en) Method and device for determining characteristic images in convolutional neural network model
US20110107060A1 (en) Transposing array data on simd multi-core processor architectures
CN112732222B (en) Sparse matrix accelerated calculation method, device, equipment and medium
CN110580324B (en) Image matrix operation method and device, computer equipment and storage medium
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN112835552A (en) Method for solving inner product of sparse matrix and dense matrix by outer product accumulation
CN112633470B (en) Method, system, device and medium for optimizing neural network convolution residual structure
CN112286864A (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN113762493A (en) Neural network model compression method and device, acceleration unit and computing system
CN112799635B (en) Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation
JP3802783B2 (en) Collision prevention memory device and address calculation and data routing method using the same
US7143072B2 (en) Method and a system for calculating the values of the neurons of a neural network
CN104572588B (en) Matrix inversion process method and apparatus
CN113485750A (en) Data processing method and data processing device
CN110737870B (en) Method and device for merging schuler matrixes on GPU
CN111667052A (en) Standard and nonstandard volume consistency transformation method for special neural network accelerator
CN111723906A (en) Accelerated calculation method and system of recurrent neural network and related device
CN116451755A (en) Acceleration method and device of graph convolution neural network and electronic equipment
CN108184127A (en) A kind of configurable more dimension D CT mapping hardware multiplexing architectures
US11361052B2 (en) Method of formatting a weight matrix, an accelerator using the formatted weight matrix, and a system including the accelerator
CN111783977B (en) Neural network training process intermediate value storage compression method and device based on regional gradient update
CN113094648B (en) Method for solving triangular matrix and matrix inner product by outer product accumulation
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
JP2022020464A (en) Neural network processing apparatus, neural network processing method, and computer program
CN112508187A (en) Machine learning model compression method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20210903

Address after: 518000 floor 71, Ping An financial center, 5033 Yitian Road, Fu'an community, Futian street, Futian District, Shenzhen, Guangdong Province

Applicant after: Computing chip (Shenzhen) Information Technology Co.,Ltd.

Address before: Room 911, block a, Qilu Software Park, 1766 Xinluo street, high tech Industrial Development Zone, Jinan City, Shandong Province

Applicant before: China Mdt InfoTech Ltd.

GR01 Patent grant
GR01 Patent grant