WO2023272917A1 - Sparse matrix storage and computation system and method - Google Patents

Publication number
WO2023272917A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
sub
storage
sparse matrix
storage array
Prior art date
Application number
PCT/CN2021/115335
Other languages
French (fr)
Chinese (zh)
Inventor
李祎
杨岭
缪向水
Original Assignee
华中科技大学
Priority date
Filing date
Publication date
Application filed by 华中科技大学
Publication of WO2023272917A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C8/00 Arrangements for selecting an address in a digital store
    • G11C8/10 Decoders
    • G11C8/06 Address interface arrangements, e.g. address buffers
    • G11C8/08 Word line control circuits, e.g. drivers, boosters, pull-up circuits, pull-down circuits, precharging circuits, for word lines
    • G11C8/16 Multiple access memory array, e.g. addressing one storage element via at least two independent addressing line groups

Definitions

  • The invention belongs to the field of microelectronic devices and, more specifically, relates to a sparse matrix storage and computation system and method.
  • Sparse matrices are common in scientific and engineering computation, but because zero elements make up most of such a matrix and contribute nothing to matrix computation, sparse matrices are stored and processed inefficiently.
  • The purpose of the present invention is to provide a sparse matrix storage and computation system and method that address a problem in existing sparse matrix storage and matrix-vector multiplication: zero elements cannot be removed, and they not only waste storage space but also introduce computation errors and add unnecessary energy consumption and computation latency. As a result, sparse matrix storage and matrix-vector multiplication require a large storage space and have low computational efficiency.
  • The present invention provides a sparse matrix storage and computation system comprising a first storage array, a second storage array, a first peripheral circuit, a second peripheral circuit, a main processor, an on-chip cache and a block storage scheduling unit, which are connected to one another;
  • The first storage array is used to store the coordinate index table of the non-zero elements of the sparse matrix;
  • The second storage array is used to store the elements of the sparse matrix and also serves as the in-situ computing core for the sparse matrix multiplication operation;
  • The on-chip cache is used to load the index table of the sparse matrix when a sparse matrix multiplication operation is performed, and to transmit the address decoding information and the gating-switch position selection in the index table to the first peripheral circuit and the second peripheral circuit respectively; it also stores the intermediate operation results and, once all computation tasks are complete, returns all intermediate operation results to the main processor;
  • The block storage scheduling unit is used to partition the sparse matrix into several sub-matrices, store each sub-matrix in the second storage array according to different compression formats, establish an index table corresponding to the remaining sub-matrices, and store that index table in the first storage array;
  • The first peripheral circuit is used to read and write the index table in the first storage array according to the decoded address it receives, and to transmit the sparse matrix index table that it reads to the on-chip cache;
  • The second peripheral circuit is used to convert the vector into a voltage signal and to open the corresponding switches according to the gating-switch position selection; the voltage signal is applied through the opened switches to the bit lines or word lines corresponding to a sub-matrix of the sparse matrix, and the intermediate operation results are read out through the word lines or bit lines and stored in the on-chip cache;
  • The main processor is used to analyze the type of the sparse matrix, receive the intermediate operation results, and pass the received vector to the second peripheral circuit.
  • Sub-matrices are stored according to different compression formats as follows:
  • All-zero sub-matrices are discarded; in each remaining sub-matrix, the all-zero rows or columns at its leading and trailing ends are removed, so that only rows or columns containing non-zero elements are stored.
  • When a sub-matrix is compressed, the compressed-row storage format is called directly: the non-zero elements are shifted left so that all elements are compressed into the same row for storage.
  • The first peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and an address decoder;
  • The second peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and a gating switch.
  • The first storage array and the second storage array have a crossbar structure, a transistor-memristor cascade structure, or a single-transistor multi-memristor cascade structure.
  • The memory devices in the first storage array and the second storage array are memristors, resistive memory, phase-change memory, spin-transfer torque magnetic random-access memory (STT-MRAM), NOR Flash devices or NAND Flash devices.
  • The present invention also provides a sparse matrix storage and computation method, comprising the following steps:
  • The type of the sparse matrix is identified; the sparse matrix is then partitioned and stored according to different compression formats, and an index table corresponding to each sub-matrix is established;
  • When a sparse matrix-vector multiplication is performed, the vector is converted into an electrical signal;
  • Taking each sub-matrix as a unit, the addresses in the index table corresponding to each sub-matrix are decoded in turn, the electrical signal is loaded into the sub-matrix, the multiply-accumulate operation between the current sub-matrix and the vector is completed, and the current intermediate operation result is stored.
  • Sub-matrices are stored according to different compression formats as follows:
  • All-zero sub-matrices are discarded; in each remaining sub-matrix, the all-zero rows or columns at its leading and trailing ends are removed, so that only rows or columns containing non-zero elements are stored.
  • The sub-matrix supports calling the compressed-row storage format directly: the non-zero elements are shifted left so that all elements are compressed into the same row for storage.
  • The storage array in the sparse matrix storage and computation system provided by the present invention comprises two parts, a first storage array and a second storage array. The first storage array stores the coordinate index table of the non-zero elements of the sparse matrix; the second storage array stores the elements of the sparse matrix and also serves as the in-situ computing core for the sparse matrix multiplication operation. This storage scheme can effectively improve the storage efficiency of sparse matrix-vector multiplication in in-memory computing and ensures the reliability of the computation.
  • The block storage scheduling unit partitions the sparse matrix into several sub-matrices, removes the zero elements in the sub-matrices, stores each sub-matrix in the second storage array according to different compression formats, and establishes the index table corresponding to the sparse matrix, which is stored in the first storage array. A sparse matrix contains many zero elements, which not only waste storage space but also add unnecessary energy consumption and computation latency. By deleting the zeros in the sparse matrix, the block storage scheduling unit improves storage efficiency while retaining the parallelism of matrix-vector multiplication in in-memory computing. The improvement in compression efficiency is particularly pronounced for diagonal and triangular matrices.
  • Fig. 1 is a schematic structural diagram of the sparse matrix storage and computation system provided by an embodiment of the present invention;
  • Fig. 2 is a schematic diagram of the storage and operation format of the diagonal sparse matrix provided by Embodiment 1 of the present invention;
  • Fig. 3 is a schematic diagram of the storage and operation format of the triangular sparse matrix provided by Embodiment 2 of the present invention;
  • Fig. 4 is a schematic diagram of the storage and operation format of the random sparse matrix provided by Embodiment 3 of the present invention.
  • As shown in Fig. 1, the present invention provides a sparse matrix storage and computation system comprising a first storage array 3-1, a second storage array 3-3, a first peripheral circuit 3-2, a second peripheral circuit 3-4, a main processor 1, an on-chip cache 4 and a block storage scheduling unit 2, which are connected to one another;
  • The first storage array 3-1 is used to store the coordinate index table of the non-zero elements of the sparse matrix;
  • The second storage array 3-3 is used to store the elements of the sparse matrix and also serves as the in-situ computing core for the sparse matrix multiplication operation;
  • The on-chip cache 4 is used to load the index table of the sparse matrix when a sparse matrix multiplication operation is performed, and to transmit the address decoding information and the gating-switch position selection in the index table to the first peripheral circuit 3-2 and the second peripheral circuit 3-4 respectively; it also stores the intermediate operation results and, once all computation tasks are complete, returns all intermediate operation results to the main processor;
  • The block storage scheduling unit 2 is used to partition the sparse matrix into several sub-matrices, store each sub-matrix in the second storage array according to different compression formats, establish an index table corresponding to the remaining sub-matrices, and store that index table in the first storage array;
  • The first peripheral circuit 3-2 is used to read and write the index table in the first storage array according to the decoded address it receives, and to transmit the sparse matrix index table that it reads to the on-chip cache;
  • The second peripheral circuit 3-4 is used to convert the vector into a voltage signal and to open the corresponding switches according to the gating-switch position selection; the voltage signal is applied through the opened switches to the bit lines or word lines corresponding to a sub-matrix of the sparse matrix, and the intermediate operation results are read out through the word lines or bit lines and stored in the on-chip cache;
  • The main processor 1 is used to analyze the type of the sparse matrix, receive the intermediate operation results, and pass the received vector to the second peripheral circuit.
  • The first storage array 3-1 and the second storage array 3-3 have a crossbar structure, a transistor-memristor cascade structure, or a single-transistor multi-memristor cascade structure.
  • The memory devices in the first storage array 3-1 and the second storage array 3-3 are memristors, resistive memory, phase-change memory, spin-transfer torque magnetic random-access memory (STT-MRAM), NOR Flash devices or NAND Flash devices.
  • The resulting sub-matrix supports calling the compressed-row storage format directly: the non-zero elements are shifted left so that all elements are compressed into the same row for storage.
  • The first sub-matrix 7-1 and the second sub-matrix 7-2 are stored in the second storage array 3-3, and the corresponding indexes are established and stored in the first storage array. In this embodiment the index is as follows: the first sub-matrix 7-1 occupies columns 1 to n/2+1;
  • the second sub-matrix 7-2 occupies columns n/2 to n, and the column information is stored in the second storage array 3-3;
  • The vector is sent from the main processor to the second peripheral circuit 3-4, where it is converted into a voltage signal;
  • According to the address information, the switches corresponding to the first sub-matrix 7-1 in the second peripheral circuit 3-4 are opened; the first part 9-1 of the vector voltage signal enters the second storage array, the first matrix-vector multiplication is completed, and the resulting part 10-1 of the intermediate result vector Y is stored in on-chip cache 4;
  • The part 10-1 and the other part 10-2 of the intermediate result vector Y are returned to the main processor together, which completes one round of sparse matrix-vector multiplication.
  • The sparse matrix can also be divided into finer blocks, for example 4 blocks (8-1, 8-2, 8-3 and 8-4), with the vector divided into 9-3, 9-4, 9-5 and 9-6; four operations are then performed, but fewer zero elements are stored.
  • The first sub-matrix 12-1 and the second sub-matrix 12-2 are stored in the second storage array 3-3, and the corresponding indexes are established and stored in the first storage array. In this embodiment the index is as follows: the first sub-matrix 12-1 occupies columns 1 to n/2;
  • the second sub-matrix 12-2 occupies columns 1 to n, and the column information is stored in the second storage array 3-3;
  • The vector is sent from the main processor to the second peripheral circuit 3-4, where it is converted into a voltage signal;
  • The addresses corresponding to the first sub-matrix 12-1, that is, the addresses of columns 1 to n/2, are first read from the on-chip cache and written to the block storage scheduling unit 2;
  • According to the address information, the switches corresponding to the first sub-matrix 12-1 in the second peripheral circuit 3-4 are opened; the first part 9-1 of the vector voltage signal enters the second storage array, the first matrix-vector multiplication is completed, and the resulting part 10-1 of the intermediate result vector Y is stored in on-chip cache 4;
  • The address corresponding to the second sub-matrix 12-2, that is, the address of columns 1 to n/2, is sent to the second peripheral circuit 3-4; the switches in the second peripheral circuit 3-4 connect to the second sub-matrix 12-2, and the other part 9-2 of the vector voltage signal enters the second storage array to complete the second matrix-vector multiplication; the resulting other part 10-2 of the intermediate result vector Y is stored in on-chip cache 4;
  • The sparse matrix can also be divided into finer blocks, for example 4 blocks (13-1, 13-2, 13-3 and 13-4); four operations are then performed, but fewer zero elements are stored.
  • The vector is sent from the main processor to the second peripheral circuit, which converts it into a voltage signal;
  • The header of each index-table entry is the row number, and the column numbers of that row's elements are stored as linked-list elements. During computation, the linked lists of the index table are loaded in turn and converted into addresses of the sparse matrix 15-1; the corresponding switches are turned on, the vector multiplication for that row is performed, and the result of each operation is stored in on-chip cache 4. Once a complete matrix-vector multiplication has finished, the result is returned to the main processor.
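
The row-keyed index table described above for the random sparse matrix (Embodiment 3) can be sketched in Python as follows. This is only an illustration under stated assumptions: Python lists stand in for the linked lists, a plain list stands in for on-chip cache 4, and the names `build_row_index` and `row_wise_spmv` are hypothetical, not taken from the patent.

```python
def build_row_index(matrix):
    """Index table keyed by row number; each entry lists that row's non-zero columns."""
    index, values = {}, {}
    for i, row in enumerate(matrix):
        cols = [j for j, a in enumerate(row) if a != 0]
        if cols:                       # rows without non-zeros get no entry
            index[i] = cols
            values[i] = [row[j] for j in cols]
    return index, values


def row_wise_spmv(index, values, x, n_rows):
    """Process one index entry (one row) at a time and collect the results."""
    y = [0] * n_rows                   # stands in for on-chip cache 4
    for i, cols in index.items():
        # Only the inputs named in this row's column list are "switched on".
        y[i] = sum(a * x[j] for a, j in zip(values[i], cols))
    return y


A = [[0, 3, 0, 0],
     [0, 0, 0, 0],
     [4, 0, 0, 5],
     [0, 0, 6, 0]]
index, values = build_row_index(A)
y = row_wise_spmv(index, values, [1, 2, 3, 4], len(A))
```

Here only the six stored entries (three values plus their column numbers per populated row) are touched, and the all-zero row 1 never appears in the index table.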

Landscapes

  • Engineering & Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Complex Calculations (AREA)

Abstract

A sparse matrix storage and computation system and method, which belong to the field of microelectronic devices. The system comprises: a first storage array, which is used for storing a coordinate index table of non-zero elements of a sparse matrix; a second storage array, which is used for storing elements of the sparse matrix, and also serves as an in-situ computing core for a multiplication operation of the sparse matrix; a partitioning and storage scheduling unit, which is used for partitioning the sparse matrix into several sub-matrices, storing the sub-matrices in the second storage array according to different compression formats, and establishing an index table corresponding to the sparse matrix; and a second peripheral circuit, which is used for converting a vector into a voltage signal, applying the voltage signal to a bit line or a word line corresponding to each sub-matrix of the sparse matrix, and completing a multiplication operation of the sparse matrix and the vector.

Description

A sparse matrix storage and computation system and method
【Technical Field】
The invention belongs to the field of microelectronic devices and, more specifically, relates to a sparse matrix storage and computation system and method.
【Background Art】
Sparse matrices are common in scientific and engineering computation. Because zero elements make up most of such a matrix and contribute nothing to matrix computation, sparse matrices are stored and processed relatively inefficiently.
Storing sparse matrices and performing matrix-vector multiplication on them has long been a major challenge in computing and microelectronics, particularly in in-memory computing. Because in-memory computing performs computation in situ with a high degree of parallelism, it imposes very strict alignment requirements on where matrix elements are stored. In the fully parallel case, zero elements therefore cannot be removed unless the sparse matrix is mathematically transformed. Moreover, in in-memory computing a zero element is usually not held in memory as a literal 0; it is generally written to a device as a high-resistance state. Different devices have different resistance states for a stored 0, and no semiconductor memory has a conductance of exactly 0. Zero elements therefore not only waste storage space but also introduce computation errors and add unnecessary energy consumption and computation latency. At present, no patent or publication defines storage and operation formats for sparse matrices that are specific to in-memory computing architectures.
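
As a rough numerical illustration of why stored zeros introduce errors, the following Python sketch compares an ideal crossbar with one in which every zero is held as a small non-zero conductance. The values `G_ON` and `G_OFF`, the linear mapping from matrix value to conductance, and the function name `crossbar_mvm` are assumptions made for this example; they are not taken from the patent.

```python
# Hypothetical, normalised conductances: not values from the patent.
G_ON = 1.0    # conductance representing a matrix value of 1
G_OFF = 0.01  # high-resistance state used to "store" a 0 (never exactly 0)


def crossbar_mvm(matrix, voltages, g_off):
    """Column currents of a crossbar: I_j = sum_i G_ij * V_i."""
    n_cols = len(matrix[0])
    currents = [0.0] * n_cols
    for row, v in zip(matrix, voltages):
        for j, a in enumerate(row):
            g = a * G_ON if a != 0 else g_off  # a stored zero still conducts g_off
            currents[j] += g * v
    return currents


A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
v = [1.0, 1.0, 1.0, 1.0]

ideal = crossbar_mvm(A, v, g_off=0.0)    # zeros contribute nothing
real = crossbar_mvm(A, v, g_off=G_OFF)   # zeros leak a little current
errors = [r - i for r, i in zip(real, ideal)]
```

In this toy case each output picks up an extra current of about 3 × `G_OFF`, one contribution per stored zero in its column, so the error grows with the number of zeros that are physically stored.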
【Summary of the Invention】
In view of the defects of the prior art, the purpose of the present invention is to provide a sparse matrix storage and computation system and method. It aims to solve the problem that zero elements cannot be removed in existing sparse matrix storage and matrix-vector multiplication, and that these zero elements not only waste storage space but also introduce computation errors and add unnecessary energy consumption and computation latency, so that sparse matrix storage and matrix-vector multiplication require a large storage space and have low computational efficiency.
To achieve the above object, the present invention provides a sparse matrix storage and computation system comprising a first storage array, a second storage array, a first peripheral circuit, a second peripheral circuit, a main processor, an on-chip cache and a block storage scheduling unit, which are connected to one another;
The first storage array is used to store the coordinate index table of the non-zero elements of the sparse matrix; the second storage array is used to store the elements of the sparse matrix and also serves as the in-situ computing core for the sparse matrix multiplication operation;
The on-chip cache is used to load the index table of the sparse matrix when a sparse matrix multiplication operation is performed, and to transmit the address decoding information and the gating-switch position selection in the index table to the first peripheral circuit and the second peripheral circuit respectively; it also stores the intermediate operation results and, once all computation tasks are complete, returns all intermediate operation results to the main processor;
The block storage scheduling unit is used to partition the sparse matrix into several sub-matrices, store each sub-matrix in the second storage array according to different compression formats, establish an index table corresponding to the remaining sub-matrices, and store that index table in the first storage array;
The first peripheral circuit is used to read and write the index table in the first storage array according to the decoded address it receives, and to transmit the sparse matrix index table that it reads to the on-chip cache;
The second peripheral circuit is used to convert the vector into a voltage signal and to open the corresponding switches according to the gating-switch position selection; the voltage signal is applied through the opened switches to the bit lines or word lines corresponding to a sub-matrix of the sparse matrix, and the intermediate operation results are read out through the word lines or bit lines and stored in the on-chip cache;
The main processor is used to analyze the type of the sparse matrix, receive the intermediate operation results, and pass the received vector to the second peripheral circuit.
Preferably, sub-matrices are stored according to different compression formats as follows:
All-zero sub-matrices are discarded; in each remaining sub-matrix, the all-zero rows or columns at its leading and trailing ends are removed, so that only rows or columns containing non-zero elements are stored.
Preferably, when a sub-matrix is compressed, the compressed-row storage format is called directly: the non-zero elements are shifted left so that all elements are compressed into the same row for storage.
Preferably, the first peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and an address decoder;
The second peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and a gating switch.
Preferably, the first storage array and the second storage array have a crossbar structure, a transistor-memristor cascade structure, or a single-transistor multi-memristor cascade structure.
Preferably, the memory devices in the first storage array and the second storage array are memristors, resistive memory, phase-change memory, spin-transfer torque magnetic random-access memory (STT-MRAM), NOR Flash devices or NAND Flash devices.
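
The two compression steps described above (trimming all-zero leading and trailing rows or columns, then shifting non-zero elements left) might look like this in Python. This is one possible reading of the text rather than the patent's own implementation; the helper names `trim_block` and `left_shift_rows` and the choice to record column positions alongside the shifted values are assumptions.

```python
def trim_block(block):
    """Drop leading/trailing all-zero rows and columns of one sub-matrix.

    Returns (row_offset, col_offset, trimmed_block), or None if the
    sub-matrix is all zeros and can be discarded entirely.
    """
    rows = [i for i, row in enumerate(block) if any(row)]
    if not rows:
        return None
    cols = [j for j in range(len(block[0])) if any(row[j] for row in block)]
    r0, r1 = rows[0], rows[-1]
    c0, c1 = cols[0], cols[-1]
    trimmed = [row[c0:c1 + 1] for row in block[r0:r1 + 1]]
    return r0, c0, trimmed


def left_shift_rows(block):
    """Pack each row's non-zero values to the left, remembering their columns."""
    values, columns = [], []
    for row in block:
        nz = [(j, a) for j, a in enumerate(row) if a != 0]
        columns.append([j for j, _ in nz])
        values.append([a for _, a in nz])
    return values, columns


block = [[0, 0, 0, 0],
         [0, 5, 0, 7],
         [0, 0, 6, 0],
         [0, 0, 0, 0]]
r0, c0, trimmed = trim_block(block)           # offsets go into the index table
values, columns = left_shift_rows(trimmed)    # packed values plus their columns
```

The offsets `r0` and `c0` and the per-row column lists play the role of the index information that would be kept in the first storage array, while `values` is what would be written to the second storage array.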
In another aspect, the present invention provides a sparse matrix storage and computation method, comprising the following steps:
The type of the sparse matrix is identified; the sparse matrix is then partitioned and stored according to different compression formats, and an index table corresponding to each sub-matrix is established;
When a sparse matrix-vector multiplication is performed, the vector is converted into an electrical signal;
Taking each sub-matrix as a unit, the addresses in the index table corresponding to each sub-matrix are decoded in turn, the electrical signal is loaded into the sub-matrix, the multiply-accumulate operation between the current sub-matrix and the vector is completed, and the current intermediate operation result is stored.
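
The steps above can be sketched in software as a block-wise matrix-vector product. This is a minimal digital stand-in for the analog process, under several assumptions: the matrix is split into equal row blocks, each index entry holds a row offset and a column window, and the names `partition_rows` and `spmv` are hypothetical.

```python
def partition_rows(matrix, n_blocks):
    """Split a matrix into row blocks and build a per-block index table.

    Each index entry records the first row the block covers and the column
    window [col0, col1) that still contains non-zero elements.
    """
    n = len(matrix)
    step = (n + n_blocks - 1) // n_blocks
    blocks, index = [], []
    for r0 in range(0, n, step):
        rows = matrix[r0:r0 + step]
        nz_cols = [j for j in range(len(rows[0])) if any(r[j] for r in rows)]
        if not nz_cols:            # all-zero sub-matrix: store nothing
            continue
        c0, c1 = nz_cols[0], nz_cols[-1] + 1
        blocks.append([r[c0:c1] for r in rows])
        index.append({"row0": r0, "col0": c0, "col1": c1})
    return blocks, index


def spmv(blocks, index, x, n_rows):
    """y = A @ x computed one stored sub-matrix at a time."""
    y = [0] * n_rows               # stands in for the on-chip cache
    for block, entry in zip(blocks, index):
        x_part = x[entry["col0"]:entry["col1"]]   # only these inputs are gated in
        for i, row in enumerate(block):
            y[entry["row0"] + i] = sum(a * v for a, v in zip(row, x_part))
    return y


A = [[2, 1, 0, 0],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 2]]
x = [1, 2, 3, 4]
blocks, index = partition_rows(A, 2)
y = spmv(blocks, index, x, len(A))
```

For this 4 × 4 tridiagonal example the two stored blocks cover columns 0 to 2 and 1 to 3, so each block needs only three of the four input values, and the partial results are combined into the full output vector after both blocks have been processed.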
Preferably, sub-matrices are stored according to different compression formats as follows:
All-zero sub-matrices are discarded; in each remaining sub-matrix, the all-zero rows or columns at its leading and trailing ends are removed, so that only rows or columns containing non-zero elements are stored.
Preferably, the sub-matrix supports calling the compressed-row storage format directly: the non-zero elements are shifted left so that all elements are compressed into the same row for storage.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
The storage array in the sparse matrix storage and computation system provided by the present invention comprises two parts, a first storage array and a second storage array. The first storage array stores the coordinate index table of the non-zero elements of the sparse matrix; the second storage array stores the elements of the sparse matrix and also serves as the in-situ computing core for the sparse matrix multiplication operation. This storage scheme can effectively improve the storage efficiency of sparse matrix-vector multiplication in in-memory computing and ensures the reliability of the computation.
In the present invention, the block storage scheduling unit partitions the sparse matrix into several sub-matrices, removes the zero elements in the sub-matrices, stores each sub-matrix in the second storage array according to different compression formats, and establishes the index table corresponding to the sparse matrix, which is stored in the first storage array. A sparse matrix contains many zero elements, which not only waste storage space but also add unnecessary energy consumption and computation latency. By deleting the zeros in the sparse matrix, the block storage scheduling unit improves storage efficiency while retaining the parallelism of matrix-vector multiplication in in-memory computing. The improvement in compression efficiency is particularly pronounced for diagonal and triangular matrices.
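
To give a feel for the storage saving on diagonal-type matrices, the following Python sketch counts how many array cells are needed when a tridiagonal matrix is split into equal row blocks and each block keeps only its non-zero column window. The tridiagonal shape, the equal block sizes and the function name `stored_cells_tridiagonal` are assumptions chosen for the illustration, not figures from the patent.

```python
def stored_cells_tridiagonal(n, n_blocks):
    """Cells needed for an n x n tridiagonal matrix split into equal row blocks.

    Each block keeps only the column window that can hold non-zeros:
    one column left of its first row to one column right of its last row.
    """
    assert n % n_blocks == 0, "sketch assumes equal-sized blocks"
    rows = n // n_blocks
    total = 0
    for b in range(n_blocks):
        r0, r1 = b * rows, (b + 1) * rows - 1
        c0, c1 = max(0, r0 - 1), min(n - 1, r1 + 1)
        total += rows * (c1 - c0 + 1)
    return total


for k in (1, 2, 4):
    print(k, stored_cells_tridiagonal(8, k))
# prints:
# 1 64
# 2 40
# 4 28
```

For an 8 × 8 tridiagonal matrix with 22 non-zero elements, two blocks cut the stored cells from 64 to 40 and four blocks cut them to 28, at the cost of four sequential operations instead of two.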
【Brief Description of the Drawings】
Fig. 1 is a schematic structural diagram of the sparse matrix storage and computation system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the storage and operation format of the diagonal sparse matrix provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the storage and operation format of the triangular sparse matrix provided by Embodiment 2 of the present invention;
Fig. 4 is a schematic diagram of the storage and operation format of the random sparse matrix provided by Embodiment 3 of the present invention.
【具体实施方式】【detailed description】
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.
In one aspect, as shown in Fig. 1, the present invention provides a sparse matrix storage and computation system comprising a first storage array 3-1, a second storage array 3-3, a first peripheral circuit 3-2, a second peripheral circuit 3-4, a main processor 1, an on-chip cache 4 and a block storage scheduling unit 2, which are connected to one another in pairs;
The first storage array 3-1 is used to store the coordinate index table of the non-zero elements of the sparse matrix; the second storage array 3-3 is used to store the elements of the sparse matrix and also serves as the in-situ computing core for sparse matrix multiplication;
The on-chip cache 4 is used to load the index table of the sparse matrix when a sparse matrix multiplication is executed, and to transmit the address decoding information and the gate switch position selection from the index table to the first peripheral circuit 3-2 and the second peripheral circuit 3-4, respectively; it also stores intermediate operation results and, once all computing tasks have finished, returns all intermediate operation results to the main processor;
The block storage scheduling unit 2 is used to partition the sparse matrix into several sub-matrices and then store each sub-matrix in the second storage array according to different compression formats; it also establishes the index table corresponding to the remaining sub-matrices and stores it in the first storage array;
The first peripheral circuit 3-2 is used to read and write the index table in the first storage array according to the received address decoding information, and to transmit the index table of the sparse matrix thus read to the on-chip cache;
The second peripheral circuit 3-4 is used to convert the vector into voltage signals and to turn on the corresponding switches according to the gate switch position selection; the voltage signals are applied through the turned-on switches to the bit lines or word lines corresponding to a sub-matrix of the sparse matrix, and the intermediate operation results are read out through the word lines or bit lines and stored in the on-chip cache;
The main processor 1 is used to analyze the type of the sparse matrix, to receive the intermediate operation results, and to pass the received vector to the second peripheral circuit.
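As an illustrative software sketch only (not part of the disclosed hardware), the role of the second peripheral circuit and the second storage array can be modelled as an ideal crossbar read in which only the columns whose gate switches are turned on receive a voltage. The function name `crossbar_mvm`, the row/column orientation and the ideal-device behaviour (no wire resistance, no converter quantization) are assumptions made for this sketch.

```python
def crossbar_mvm(conductance, voltages, open_switches):
    """Ideal crossbar read under the assumptions stated above.

    conductance   : list of rows, conductance[i][j] is the stored element
    voltages      : one input voltage per column
    open_switches : column indices (0-based) whose gate switch is turned on

    Each output is the summed current of one row: sum_j G[i][j] * V[j],
    restricted to the driven columns.
    """
    return [sum(row[j] * voltages[j] for j in open_switches)
            for row in conductance]


G = [[1.0, 2.0, 0.0],
     [0.0, 3.0, 4.0]]
V = [0.5, 1.0, 2.0]

# Drive only columns 0 and 1, as the index table would select for one sub-matrix.
print(crossbar_mvm(G, V, open_switches=[0, 1]))
```

Columns that are not selected contribute no current, which is why the index table only needs to record which switch positions belong to each stored sub-matrix.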
Preferably, the sub-matrices are stored according to different compression formats as follows:
All-zero sub-matrices are discarded, and for each remaining sub-matrix the all-zero rows or columns at its front and rear ends are removed, so that only the rows or columns containing non-zero elements are stored.
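The trimming rule above can be sketched in software as follows. This is a minimal sketch under stated assumptions: the helper name `trim_zero_columns` is hypothetical, indices are 0-based, and only columns are trimmed (rows could be handled the same way on the transposed block).

```python
def trim_zero_columns(block):
    """Drop leading and trailing all-zero columns of one sub-matrix.

    Returns None for an all-zero sub-matrix (it is not stored at all),
    otherwise (first_col, last_col, trimmed_block) with 0-based,
    inclusive column bounds that would go into the index table.
    """
    n_cols = len(block[0])
    nonzero_cols = [j for j in range(n_cols)
                    if any(row[j] != 0 for row in block)]
    if not nonzero_cols:
        return None
    first, last = nonzero_cols[0], nonzero_cols[-1]
    return first, last, [row[first:last + 1] for row in block]


block = [[0, 0, 5, 0, 7, 0],
         [0, 0, 0, 6, 0, 0]]
print(trim_zero_columns(block))
```

Zeros that lie between the first and last occupied column are kept, so the stored block stays rectangular and column-aligned.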
Preferably, when a sub-matrix is compressed, the compressed row storage format is invoked directly and the non-zero elements are shifted to the left, so that all elements are compacted within the same row for storage.
Preferably, the first peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and an address decoder;
The second peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and gate switches.
Preferably, the first storage array 3-1 and the second storage array 3-3 have a crossbar structure, a transistor-memristor cascade structure, or a single-transistor multi-memristor cascade structure.
Preferably, the memory devices in the first storage array 3-1 and the second storage array 3-3 are memristors, resistive random access memory, phase change memory, spin-transfer torque magnetic random access memory, NOR Flash devices, or NAND Flash devices.
In another aspect, the present invention provides a sparse matrix storage and computation method, comprising the following steps:
The type of the sparse matrix is identified; the sparse matrix is then partitioned, stored according to different compression formats, and an index table corresponding to each sub-matrix is established;
When a sparse matrix-vector multiplication is executed, the vector is converted into electrical signals;
Taking each sub-matrix as a unit, the electrical signals are loaded into the sub-matrices one after another according to the address decoding information in the index table of each sub-matrix, the multiply-accumulate operation between the current sub-matrix and the vector is completed, and the current intermediate operation result is stored.
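The method steps above can be mimicked in software as a rough sketch. Everything named here is an assumption for illustration: `build_blocks` stands in for the block storage scheduling unit, `blocked_spmv` for the per-sub-matrix multiply-accumulate, the partition is by equal row blocks, and the index table holds `(first_row, first_col, last_col)` tuples with 0-based indices.

```python
def build_blocks(matrix, n_blocks):
    """Partition into row blocks, skip all-zero blocks, trim zero columns.

    Returns (index_table, stored_blocks); index_table[k] is
    (first_row, first_col, last_col) for stored_blocks[k].
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows_per_block = -(-n_rows // n_blocks)  # ceiling division
    index_table, stored_blocks = [], []
    for r0 in range(0, n_rows, rows_per_block):
        rows = matrix[r0:r0 + rows_per_block]
        cols = [j for j in range(n_cols) if any(row[j] != 0 for row in rows)]
        if not cols:
            continue  # all-zero sub-matrix: nothing stored, no index entry
        c0, c1 = cols[0], cols[-1]
        index_table.append((r0, c0, c1))
        stored_blocks.append([row[c0:c1 + 1] for row in rows])
    return index_table, stored_blocks


def blocked_spmv(index_table, stored_blocks, x, n_rows):
    """Multiply one stored sub-matrix at a time with its slice of x."""
    y = [0] * n_rows
    for (r0, c0, c1), block in zip(index_table, stored_blocks):
        segment = x[c0:c1 + 1]  # the part of the vector this block needs
        for i, row in enumerate(block):
            y[r0 + i] = sum(a * b for a, b in zip(row, segment))
    return y


A = [[2, 1, 0, 0],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 2]]
table, blocks = build_blocks(A, n_blocks=2)
print(table)
print(blocked_spmv(table, blocks, [1, 2, 3, 4], n_rows=4))
```

Within each sub-matrix all rows are still evaluated against the same vector segment, which is the software analogue of keeping the parallel in-memory multiplication per block.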
Preferably, the sub-matrices are stored according to different compression formats as follows:
All-zero sub-matrices are discarded, and for each remaining sub-matrix the all-zero rows or columns at its front and rear ends are removed, so that only the rows or columns containing non-zero elements are stored.
Preferably, the obtained sub-matrices support directly invoking the compressed row storage format, in which the non-zero elements are shifted to the left so that all elements are compacted within the same row for storage.
Embodiment 1
As shown in Fig. 2, when the sparse matrix to be processed is an n×n diagonal matrix 6, the blocking parameters are first determined according to actual requirements. Assuming the computation is split into two blocks, the blocking algorithm 7 for diagonal matrices is called to divide the matrix into an upper and a lower sub-matrix;
The all-zero columns are removed and the columns containing non-zero elements are kept, giving the first sub-matrix 7-1 and the second sub-matrix 7-2;
The first sub-matrix 7-1 and the second sub-matrix 7-2 are stored in the second storage array 3-3, and the corresponding index is established and stored in the first storage array. Specifically, in this embodiment the first sub-matrix 7-1 occupies columns 1 to n/2+1 and the second sub-matrix 7-2 occupies columns n/2 to n; the column information is stored in the second storage array 3-3;
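The column ranges quoted above are what one obtains for a tridiagonal matrix. The following check assumes that the diagonal matrix 6 is tridiagonal (main diagonal plus one diagonal on each side), which the text does not state explicitly; `tridiagonal` and `column_range` are hypothetical helper names and the indices are 1-based to match the embodiment.

```python
def tridiagonal(n):
    """n x n pattern with ones on the main diagonal and its two neighbours."""
    return [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]


def column_range(matrix, row_start, row_end):
    """1-based inclusive range of columns that hold a non-zero element
    somewhere in rows row_start..row_end (also 1-based, inclusive)."""
    rows = matrix[row_start - 1:row_end]
    cols = [j + 1 for j in range(len(matrix[0]))
            if any(row[j] != 0 for row in rows)]
    return cols[0], cols[-1]


n = 8
A = tridiagonal(n)
upper = column_range(A, 1, n // 2)       # first sub-matrix (upper half)
lower = column_range(A, n // 2 + 1, n)   # second sub-matrix (lower half)
print(upper, lower)
```

For n = 8 the upper half spans columns 1 to 5 (that is, 1 to n/2+1) and the lower half spans columns 4 to 8 (n/2 to n), in agreement with the index described in this embodiment.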
When the multiplication of this sparse matrix with a vector is to be executed, the vector is sent from the main processor to the second peripheral circuit 3-4, which converts it into voltage signals;
The index table is loaded from the first storage array into the on-chip cache 4;
In the first cycle, the address corresponding to the first sub-matrix 7-1, i.e., the address of columns 1 to n/2+1, is first transferred from the on-chip cache to the block storage scheduling unit 2;
According to the address information, the switches in the second peripheral circuit 3-4 corresponding to the first sub-matrix 7-1 are turned on; the first part 9-1 of the vector's voltage signals enters the second storage array, completing the first matrix-vector multiplication and yielding one part 10-1 of the intermediate result vector Y, which is stored in the on-chip cache 4;
The second matrix-vector multiplication is then performed. Because the index table has already been loaded into the on-chip cache 4, the address corresponding to the second sub-matrix 7-2, i.e., the address of columns n/2 to n, is sent to the second peripheral circuit 3-4, and the switches in the second peripheral circuit 3-4 are connected to the second sub-matrix 7-2; the other part 9-2 of the vector's voltage signals enters the second storage array, completing the second matrix-vector multiplication and yielding the other part 10-2 of the intermediate result vector Y, which is stored in the on-chip cache 4;
The part 10-1 and the other part 10-2 of the intermediate result vector Y are returned together to the main processor, which completes one round of sparse matrix-vector multiplication.
In the same manner, the sparse matrix can be partitioned more finely, for example into 4 blocks (8-1, 8-2, 8-3 and 8-4), with the vector divided into 9-3, 9-4, 9-5 and 9-6; four operations are then executed, but fewer zero elements are stored.
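The trade-off between the number of operations and the number of stored zeros can be quantified with a small calculation. The closed form below is a sketch that assumes a tridiagonal matrix, equal row blocks and n divisible by the number of blocks k; the function name is hypothetical. With b = n/k rows per block, an interior block spans b+2 columns and the two edge blocks span one column less each, so the total number of stored cells is b(k(b+2) - 2).

```python
def tridiagonal_stored_cells(n, k):
    """Cells occupied in the compute array when an n x n tridiagonal
    matrix is split into k equal row blocks and each block keeps only
    the columns between its first and last non-zero column."""
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    b = n // k  # rows per block
    return b * (k * (b + 2) - 2)


n = 8
for k in (1, 2, 4, 8):
    print(k, tridiagonal_stored_cells(n, k))
```

For n = 8 the count falls from 64 cells (unpartitioned) to 40 with two blocks and 28 with four blocks, approaching the 22 non-zero elements as the blocks shrink, at the cost of one multiplication cycle per block.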
Embodiment 2
As shown in Fig. 3, when the sparse matrix to be processed is an n×n triangular matrix 11, the blocking parameters are first determined according to actual requirements. Assuming the computation is split into two blocks, the blocking algorithm 12 for diagonal matrices is called to divide the matrix into an upper and a lower sub-matrix;
The all-zero columns are removed and the columns containing non-zero elements are kept, giving the first sub-matrix 12-1 and the second sub-matrix 12-2;
The first sub-matrix 12-1 and the second sub-matrix 12-2 are stored in the second storage array 3-3, and the corresponding index is established and stored in the first storage array. Specifically, in this embodiment the first sub-matrix 12-1 occupies columns 1 to n/2 and the second sub-matrix 12-2 occupies columns 1 to n; the column information is stored in the second storage array 3-3;
When the multiplication of this sparse matrix with a vector is to be executed, the vector is sent from the main processor to the second peripheral circuit 3-4, which converts it into voltage signals;
The index table is loaded from the first storage array into the on-chip cache 4;
In the first cycle, the address corresponding to the first sub-matrix 12-1, i.e., the address of columns 1 to n/2, is first transferred from the on-chip cache to the block storage scheduling unit 2;
According to the address information, the switches in the second peripheral circuit 3-4 corresponding to the first sub-matrix 12-1 are turned on; the first part 9-1 of the vector's voltage signals enters the second storage array, completing the first matrix-vector multiplication and yielding one part 10-1 of the intermediate result vector Y, which is stored in the on-chip cache 4;
The second matrix-vector multiplication is then performed. Because the index table has already been loaded into the on-chip cache 4, the address corresponding to the second sub-matrix 12-2, i.e., the address of columns 1 to n, is sent to the second peripheral circuit 3-4, and the switches in the second peripheral circuit 3-4 are connected to the second sub-matrix 12-2; the other part 9-2 of the vector's voltage signals enters the second storage array, completing the second matrix-vector multiplication and yielding the other part 10-2 of the intermediate result vector Y, which is stored in the on-chip cache 4;
In the same manner, the sparse matrix can be partitioned more finely, for example into 4 blocks (13-1, 13-2, 13-3 and 13-4); four operations are then executed, but fewer zero elements are stored.
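The block layout of this embodiment can be reproduced with a short sketch. It assumes a lower triangular matrix (consistent with the first sub-matrix occupying columns 1 to n/2 and the second occupying columns 1 to n), equal row blocks and n divisible by the number of blocks; the helper names are hypothetical and the indices are 1-based.

```python
def lower_triangular_blocks(n, k):
    """Index entries (row_start, row_end, col_start, col_end), 1-based and
    inclusive, for an n x n lower triangular matrix split into k equal
    row blocks with trailing all-zero columns removed."""
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    b = n // k  # rows per block
    return [(m * b + 1, (m + 1) * b, 1, (m + 1) * b) for m in range(k)]


def stored_cells(blocks):
    """Total number of array cells occupied by the listed blocks."""
    return sum((r1 - r0 + 1) * (c1 - c0 + 1) for r0, r1, c0, c1 in blocks)


n = 8
two = lower_triangular_blocks(n, 2)
four = lower_triangular_blocks(n, 4)
print(two, stored_cells(two))
print(four, stored_cells(four))
```

For n = 8, two blocks occupy 48 cells and four blocks occupy 40 cells, against 64 cells for the unpartitioned matrix and 36 non-zero elements, which illustrates why finer blocking stores fewer zeros.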
Embodiment 3
As shown in Fig. 4, when the matrix to be processed is an n×n random sparse matrix 15, the conventional compressed row storage format is applied first, gathering all non-zero elements of each row at the head of that row, as shown in 15-1;
An index table 16 is established and stored in the storage area of the first storage array;
When a matrix-vector multiplication is to be executed, the vector is sent from the main processor to the second peripheral circuit, which converts the vector into voltage signals;
The index table is loaded from the storage area into the on-chip cache 4. Because the elements of each row are no longer column-aligned, the computation must proceed row by row. The header of the index table is the row number, and the column numbers of that row's elements are stored as linked-list elements. During computation, one linked list of the index table is loaded at a time and converted into addresses of the sparse matrix 15-1, the corresponding switches are turned on, and the vector multiplication for that row is performed. Each operation result is stored in the on-chip cache 4; when one complete matrix-vector multiplication has finished, the result is returned to the main processor.
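The row-by-row scheme of this embodiment can be sketched in software as follows. The names `compress_rows` and `rowwise_spmv` are hypothetical, indices are 0-based, and a dictionary of Python lists keyed by row number stands in for the linked lists of the index table.

```python
def compress_rows(matrix):
    """Shift the non-zero elements of every row to the left.

    Returns (values, index_table): values[i] holds the non-zero elements
    of row i in order, and index_table[i] holds their original column
    numbers (the per-row list described for index table 16).
    """
    values, index_table = [], {}
    for i, row in enumerate(matrix):
        cols = [j for j, a in enumerate(row) if a != 0]
        index_table[i] = cols
        values.append([row[j] for j in cols])
    return values, index_table


def rowwise_spmv(values, index_table, x):
    """One multiply-accumulate per row, gathering x by stored column number."""
    return [sum(a * x[j] for a, j in zip(row_values, index_table[i]))
            for i, row_values in enumerate(values)]


A = [[0, 3, 0, 0],
     [1, 0, 0, 2],
     [0, 0, 0, 0],
     [0, 0, 4, 0]]
values, index_table = compress_rows(A)
print(values, index_table)
print(rowwise_spmv(values, index_table, [1, 2, 3, 4]))
```

Because the left shift destroys column alignment between rows, each row needs its own column list, which is why this format trades parallelism across rows for the densest possible storage.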
In summary, the present invention has the following advantages:
The storage array of the sparse matrix storage and computation system provided by the present invention comprises two parts: a first storage array and a second storage array. The first storage array stores the coordinate index table of the non-zero elements of the sparse matrix; the second storage array stores the elements of the sparse matrix and also serves as the in-situ computing core for sparse matrix multiplication. This storage scheme effectively improves the storage efficiency of sparse matrix-vector multiplication in in-memory computing while ensuring computational reliability.
In the present invention, the block storage scheduling unit partitions the sparse matrix into several sub-matrices, removes the zero elements from the sub-matrices, stores each sub-matrix in the second storage array according to different compression formats, and establishes the index table corresponding to the sparse matrix, which is stored in the first storage array. A sparse matrix contains many zero elements, which not only waste storage space but also add unnecessary energy consumption and computation latency. By deleting the zeros from the sparse matrix, the block storage scheduling unit improves storage efficiency while retaining the parallelism with which in-memory computing performs matrix-vector multiplication; the gain in compression efficiency is particularly pronounced for diagonal and triangular matrices.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

  1. A sparse matrix storage and computation system, characterized by comprising: a first storage array, a second storage array, a first peripheral circuit, a second peripheral circuit, a main processor, an on-chip cache and a block storage scheduling unit, which are connected to one another in pairs;
    the first storage array is used to store the coordinate index table of the non-zero elements of the sparse matrix; the second storage array is used to store the elements of the sparse matrix and also serves as the in-situ computing core for sparse matrix multiplication;
    the on-chip cache is used to load the index table of the sparse matrix when a sparse matrix multiplication is executed, and to transmit the address decoding information and the gate switch position selection from the index table to the first peripheral circuit and the second peripheral circuit, respectively; it also stores intermediate operation results and, once all computing tasks have finished, returns all intermediate operation results to the main processor;
    the block storage scheduling unit is used to partition the sparse matrix into several sub-matrices and then store each sub-matrix in the second storage array according to different compression formats; it also establishes the index table corresponding to the remaining sub-matrices and stores it in the first storage array;
    the first peripheral circuit is used to read and write the index table in the first storage array according to the received address decoding information, and to transmit the index table of the sparse matrix thus read to the on-chip cache;
    the second peripheral circuit is used to convert the vector into voltage signals and to turn on the corresponding switches according to the gate switch position selection; the voltage signals are applied through the turned-on switches to the bit lines or word lines corresponding to a sub-matrix of the sparse matrix, and the intermediate operation results are read out through the word lines or bit lines and stored in the on-chip cache;
    the main processor is used to analyze the type of the sparse matrix, to receive the intermediate operation results, and to pass the received vector to the second peripheral circuit.
  2. The sparse matrix storage and computation system according to claim 1, characterized in that the sub-matrices are stored according to different compression formats as follows:
    all-zero sub-matrices are discarded, and for each remaining sub-matrix the all-zero rows or columns at its front and rear ends are removed, so that only the rows or columns containing non-zero elements are stored.
  3. The sparse matrix storage and computation system according to claim 1 or 2, characterized in that, when a sub-matrix is compressed, the compressed row storage format is invoked directly and the non-zero elements are shifted to the left, so that all elements are compacted within the same row for storage.
  4. The sparse matrix storage and computation system according to claim 1, characterized in that the first storage array and the second storage array have a crossbar structure, a transistor-memristor cascade structure, or a single-transistor multi-memristor cascade structure.
  5. The sparse matrix storage and computation system according to claim 1 or 4, characterized in that the memory devices in the first storage array and the second storage array are memristors, resistive random access memory, phase change memory, spin-transfer torque magnetic random access memory, NOR Flash devices, or NAND Flash devices.
  6. The sparse matrix storage and computation system according to claim 5, characterized in that the first peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and an address decoder;
    the second peripheral circuit includes a read-write circuit, a drive circuit, a digital-to-analog converter, an analog-to-digital converter and gate switches.
  7. A sparse matrix storage and computation method, characterized by comprising the following steps:
    identifying the type of the sparse matrix, partitioning the sparse matrix, storing it according to different compression formats, and establishing an index table corresponding to each sub-matrix;
    when a sparse matrix-vector multiplication is executed, converting the vector into electrical signals;
    taking each sub-matrix as a unit, loading the electrical signals into the sub-matrices one after another according to the address decoding information in the index table of each sub-matrix, completing the multiply-accumulate operation between the current sub-matrix and the vector, and storing the current intermediate operation result.
  8. The sparse matrix storage and computation method according to claim 7, characterized in that the sub-matrices are stored according to different compression formats as follows:
    all-zero sub-matrices are discarded, and for each remaining sub-matrix the all-zero rows or columns at its front and rear ends are removed, so that only the rows or columns containing non-zero elements are stored.
  9. The sparse matrix storage and computation method according to claim 7 or 8, characterized in that the sub-matrices support directly invoking the compressed row storage format, in which the non-zero elements are shifted to the left so that all elements are compacted within the same row for storage.
PCT/CN2021/115335 2021-06-28 2021-08-30 Sparse matrix storage and computation system and method WO2023272917A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110717321.3A CN113506589B (en) 2021-06-28 2021-06-28 Sparse matrix storage system and method
CN202110717321.3 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023272917A1 true WO2023272917A1 (en) 2023-01-05

Family

ID=78011073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115335 WO2023272917A1 (en) 2021-06-28 2021-08-30 Sparse matrix storage and computation system and method

Country Status (2)

Country Link
CN (1) CN113506589B (en)
WO (1) WO2023272917A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304509A (en) * 2021-12-20 2023-06-23 华为技术有限公司 Matrix calculation method, chip and related equipment
CN116070685B (en) * 2023-03-27 2023-07-21 南京大学 Memory computing unit, memory computing array and memory computing chip

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090282207A1 (en) * 2008-05-06 2009-11-12 L-3 Communications Integrated Systems, L.P. System & method for storing a sparse matrix
CN102141976A (en) * 2011-01-10 2011-08-03 中国科学院软件研究所 Method for storing diagonal data of sparse matrix and SpMV (Sparse Matrix Vector) realization method based on method
CN109740116A (en) * 2019-01-08 2019-05-10 郑州云海信息技术有限公司 A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate
CN112507284A (en) * 2020-12-18 2021-03-16 清华大学 Method and device for realizing sparse matrix multiplication on reconfigurable processor array

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436438B (en) * 2011-12-13 2015-03-04 华中科技大学 Sparse matrix data storage method based on ground power unit (GPU)
US10275479B2 (en) * 2014-02-27 2019-04-30 Sas Institute Inc. Sparse matrix storage in a database
WO2018134740A2 (en) * 2017-01-22 2018-07-26 Gsi Technology Inc. Sparse matrix multiplication in associative memory device
CN110674462B (en) * 2019-12-04 2020-06-02 深圳芯英科技有限公司 Matrix operation device, method, processor and computer readable storage medium
CN111694544B (en) * 2020-06-02 2022-03-15 杭州知存智能科技有限公司 Multi-bit multiplexing multiply-add operation device, neural network operation system, and electronic apparatus
CN112182495B (en) * 2020-09-14 2024-04-19 华中科技大学 Binary domain matrix operation circuit based on memristor

Also Published As

Publication number Publication date
CN113506589B (en) 2022-04-26
CN113506589A (en) 2021-10-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21947854; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21947854; Country of ref document: EP; Kind code of ref document: A1)