CN115936101A - Sparse data compression device and method for neural network tensor processor - Google Patents

Sparse data compression device and method for neural network tensor processor

Info

Publication number
CN115936101A
Authority
CN
China
Prior art keywords: sparse, data, mapping table, writing, compression
Prior art date
Legal status
Pending
Application number
CN202211618646.7A
Other languages
Chinese (zh)
Inventor
汤梦饶
罗闳訚
周志新
何日辉
尤培坤
Current Assignee
Xiamen Yipu Intelligent Technology Co., Ltd.
Original Assignee
Xiamen Yipu Intelligent Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Xiamen Yipu Intelligent Technology Co., Ltd.
Priority to CN202211618646.7A
Publication of CN115936101A
Legal status: Pending (current)

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the field of neural network tensor processors, and in particular to a sparse data compression device and method for a neural network tensor processor. The method comprises the following steps: reading the configuration data of the compression operation and configuring each module of the sparse data compression device; reading the sparse data according to the configuration information; judging and marking sparse feature points; deleting the sparse feature point data to generate sparse compressed data and a sparse mapping table; writing out the sparse compressed data; and writing out the sparse mapping table. By traversing the sparse data, judging and marking sparse feature points, deleting the sparse feature points, and constructing a sparse mapping table, the method efficiently deletes the 0 data of the sparse data and rearranges the remaining data, thereby efficiently compressing the sparse data, with the advantages of low compression complexity and high efficiency.

Description

Sparse data compression device and method for neural network tensor processor
Technical Field
The invention relates to the field of neural network tensor processors, and in particular to a sparse data compression device and method for a neural network tensor processor.
Background
Neural network algorithms compute on dense data. Dense data refers to data that has fixed length, width, and height dimensions and occupies a fixed amount of memory. The number of operations on dense data is likewise fixed; for example, multiplying two dense tensors of fixed size always requires a fixed number of multiplications. Because the input data, parameter data, intermediate temporary data, and output data in a neural network algorithm all have fixed sizes and occupy fixed memory, the neural network algorithm computes on dense data.
However, in practice the computation of a neural network produces many 0 values, especially when the activation function is of a type likely to produce 0 (e.g., ReLU sets all negative numbers to 0). Since 0 times any number is 0, multiplications involving 0 data can be skipped entirely, saving computational power and reducing computation time.
When tensor data contains many 0s, for example when the number of 0 values exceeds the number of non-0 values, the tensor data is generally called sparse data.
Spiking neural network algorithms naturally exhibit data sparsity. A spiking neural network processes event data, i.e., data consisting of a time, coordinates, and a polarity, which may come directly from an event sensor. For example, one sample from an event sensor consists of (Δt, x, y, p), where Δt is the time value, x and y are coordinates in the sensor frame, and p is the polarity of the illumination change at the corresponding pixel (e.g., p = 1 for an illumination increase, p = -1 for a decrease). At any given moment Δt, only a limited number of pixels in the sensor image undergo an illumination change, so the amount of data sharing the same Δt is limited (often very small); the input data of a spiking neural network algorithm therefore has a sparse character.
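As a concrete illustration of such an event record, the following minimal Python sketch models one sample; the class and field names are illustrative only and are not part of the invention:

    from typing import NamedTuple

    class Event(NamedTuple):
        """One event-sensor sample (Δt, x, y, p)."""
        dt: int  # Δt: time value of the event
        x: int   # x coordinate in the sensor frame
        y: int   # y coordinate in the sensor frame
        p: int   # polarity: +1 = illumination increase, -1 = decrease

    # At any single Δt only a few pixels fire, so a dense (H, W) frame
    # assembled from these events is mostly 0, i.e. sparse.
    events = [Event(dt=3, x=10, y=4, p=1), Event(dt=3, x=11, y=4, p=-1)]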
Conventional neural network tensor processors store and compute on dense data. In a conventional tensor processor, sparse data must therefore be stored as dense data, that is, treated as dense data containing many 0s, and computed as dense data, with the 0 values still participating in the calculation.
To a conventional neural network tensor processor, sparse data is thus indistinguishable from dense data: both occupy the same memory and require the same computational operations. Although the many 0 values in sparse data carry no useful information, they still participate in storage and computation, wasting storage and computational resources. This waste is particularly significant in spiking neural network computation.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention provides a sparse data compression device and method for a neural network tensor processor, in which the useless 0 data of the sparse data is deleted and the sparse data is stored in a 0-free compressed form.
The specific scheme is as follows:
A sparse data compression device for a neural network tensor processor comprises a sparse data RDMA (read DMA), a configuration unit, a compression unit, a sparse compressed data WDMA (write DMA), and a sparse mapping table WDMA, together with the sparse data, configuration data, sparse compressed data, and sparse mapping table on which they operate; the sparse data compression device is used for converting sparse data into sparse compressed data;
the configuration unit is used for reading the configuration data and configuring the start address and size of the sparse data into the sparse data RDMA, the start address and size of the sparse compressed data into the sparse compressed data WDMA, and the start address and size of the sparse mapping table into the sparse mapping table WDMA;
the sparse data RDMA is used for reading the sparse data according to the start address and size of the sparse data;
the compression unit is used for executing the sparse data compression operation and generating the sparse compressed data and the sparse mapping table;
the sparse compressed data WDMA is used for responding to sparse compressed data write requests and writing out the sparse compressed data according to the start address and size of the sparse compressed data;
and the sparse mapping table WDMA is used for responding to sparse mapping table write requests and writing out the sparse mapping table according to the start address and size of the sparse mapping table.
Further, the sparse data refers to tensor data stored in an n-degree parallel storage scheme (C/n, H, W, n), with a data bit width of 8 or 16 bits. In the original storage scheme (C, H, W) of the tensor data, W (width) is dimension 0, H (height) is dimension 1, and C (channel) is dimension 2;
the conversion to the n-degree parallel storage scheme (C/n, H, W, n) is as follows: from the original storage scheme (C, H, W) of the tensor data, take n consecutive values along the C direction and store them at consecutive physical addresses as the new dimension 0, whose length is fixed at n; W becomes dimension 1, length unchanged; H becomes dimension 2, length unchanged; C/n becomes dimension 3, where C/n denotes the length C divided by n, rounded down.
Further, n is an integer multiple of 8.
Further, the sparse data possesses at least one sparse feature point. A sparse feature point is defined as follows: in the sparse data, if the n dimension-0 values pointed to by the dimension-1, 2, 3 coordinates (Z, Y, X) are all 0, the coordinate (Z, Y, X) is called a sparse feature point of the sparse data.
Further, the sparse data has N sparse feature points, where N is less than or equal to (C/n) × H × W.
Further, the sparse compressed data is the sparse data with the n dimension-0 values pointed to by every sparse feature point deleted. The sparse mapping table is tensor data of size (C/n, H, W) with a data bit width of 1 bit. The sparse mapping table corresponds to the sparse data: each 1-bit entry of the sparse mapping table indicates whether the n dimension-0 values of the corresponding group of the sparse data are all 0.
A sparse data compression method for a neural network tensor processor, applied to a sparse data compression device for a neural network tensor processor as described above, comprising:
reading the configuration data of the compression operation and configuring each module of the sparse data compression device; the configuration data comprises the start address and size of the sparse data, the start address and size of the sparse compressed data, and the start address and size of the sparse mapping table;
reading the sparse data according to the configuration information: for sparse data of size (C/n, H, W, n), read n dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3;
judging and marking sparse feature points: for each group of n dimension-0 values read, judge whether all n values are 0; if so, mark the dimension-1, 2, 3 coordinate (Z, Y, X) corresponding to the n values as a sparse feature point;
deleting the sparse feature point data to generate sparse compressed data and a sparse mapping table;
writing out the sparse compressed data: responding to sparse compressed data write requests, writing the n values to the corresponding addresses;
writing out the sparse mapping table: responding to sparse mapping table write requests, writing the sparse mapping table data to the corresponding addresses.
Further, the sparse data is read according to the configuration information as follows: for sparse data of size (C/n, H, W, n), read n dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3. Sparse feature points are judged and marked as follows: for each group of n dimension-0 values read, judge whether all n values are 0; if so, mark the dimension-1, 2, 3 coordinate (Z, Y, X) corresponding to the n values as a sparse feature point.
Further, the sparse feature point data is deleted and the sparse compressed data and sparse mapping table are generated as follows: for the n values marked as a sparse feature point, skip the write-out stage, sending no sparse compressed data write request for them and sending only a sparse mapping table write request with value 0; for data not marked as a sparse feature point, send a sparse compressed data write request for the n values and a sparse mapping table write request with value 1.
Further, the sparse compressed data is written out as follows: write n values at a time, writing data from the start address at sequentially increasing addresses.
Further, the sparse mapping table is written out as follows: write one 1-bit value at a time, writing the sparse mapping table from the start address at sequentially increasing addresses.
The invention realizes the following technical effects:
the sparse compression data provided by the invention expresses the condition that all the 16 data with 0 dimension have 0 as the sparse characteristic points which can be compressed and deleted. The sparse feature points widely exist in the neural network calculation process, particularly in the impulse neural network calculation process. Sparse information of sparse data can be greatly mined through the sparse feature points.
By traversing the sparse data, judging and marking sparse feature points, deleting the sparse feature points, and constructing a sparse mapping table, the method efficiently deletes the 0 data of the sparse data and rearranges the remaining data, thereby efficiently compressing the sparse data, with low compression complexity and high efficiency.
Drawings
FIG. 1 is a functional block diagram of a sparse data compression apparatus of the present invention;
FIG. 2 is a schematic diagram of the 16-degree parallel storage scheme (C/16, H, W, 16) of the present invention;
FIG. 3 is an example of sparse compressed data and its corresponding sparse mapping table of the present invention;
FIG. 4 is a flow chart of the sparse data compression method of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures. Elements in the figures are not drawn to scale and like reference numerals are generally used to indicate like elements.
The invention will now be further described with reference to the drawings and the detailed description.
The invention provides a sparse data compression device and method for a neural network tensor processor. As shown in fig. 1, the sparse data compression device comprises a sparse data RDMA, a configuration unit, a compression unit, a sparse compressed data WDMA, and a sparse mapping table WDMA, together with the sparse data, configuration data, sparse compressed data, and sparse mapping table on which they operate. The sparse data compression device converts sparse data into sparse compressed data.
The sparse data refers to tensor data stored in a 16-degree parallel storage scheme (C/16, H, W, 16); the data bit width is usually 8 or 16 bits. The sparse data may be obtained by converting tensor data stored in the original storage scheme (C, H, W), in which W (width) is dimension 0, H (height) is dimension 1, and C (channel) is dimension 2.
The conversion to the 16-degree parallel storage scheme (C/16, H, W, 16) is as follows: from the original storage scheme (C, H, W), take 16 consecutive values along the C direction and store them at consecutive physical addresses as the new dimension 0, whose length is fixed at 16; W becomes dimension 1, length unchanged; H becomes dimension 2, length unchanged; C/16 becomes dimension 3, where C/16 denotes the length C divided by 16, rounded down. An example of the conversion is shown in fig. 2.
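A minimal Python sketch of this layout conversion is given below (using NumPy; the function name is illustrative, and dropping the tail when C is not a multiple of 16 is an assumption consistent with the rounding-down above):

    import numpy as np

    def to_parallel_layout(tensor_chw: np.ndarray, n: int = 16) -> np.ndarray:
        """Convert a (C, H, W) tensor to the (C/n, H, W, n) parallel layout:
        n consecutive values along C become the innermost, physically
        contiguous dimension 0."""
        C, H, W = tensor_chw.shape
        groups = C // n                       # C/n, rounded down
        t = tensor_chw[:groups * n].reshape(groups, n, H, W)
        # Move the n channel values innermost so they occupy consecutive addresses.
        return np.ascontiguousarray(t.transpose(0, 2, 3, 1))  # (C/n, H, W, n)

    x = to_parallel_layout(np.arange(32 * 4 * 4).reshape(32, 4, 4))
    assert x.shape == (2, 4, 4, 16)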
The sparse data possesses at least one sparse feature point. A sparse feature point is defined as follows: in the sparse data, if the 16 dimension-0 values pointed to by the dimension-1, 2, 3 coordinates (Z, Y, X) are all 0, the coordinate (Z, Y, X) is called a sparse feature point of the sparse data. The sparse data may have N sparse feature points, where N is less than or equal to (C/16) × H × W.
The sparse compressed data is the sparse data with the 16 dimension-0 values pointed to by every sparse feature point deleted. To compensate for the information removed by this compression (some data is deleted), a sparse mapping table stores the complete original layout of the sparse data. The sparse mapping table is tensor data of size (C/16, H, W) with a data bit width of 1 bit, and it corresponds to the sparse data as follows: each 1-bit entry indicates whether the 16 dimension-0 values of the corresponding group of the sparse data are all 0. Specifically, the 1-bit value at the dimension-0, 1, 2 coordinate (Z, Y, X) of a sparse mapping table of size (C/16, H, W) indicates whether the 16 dimension-0 values at the corresponding dimension-1, 2, 3 coordinate (Z, Y, X) of the sparse data of size (C/16, H, W, 16) are all 0. When a 1-bit entry of the sparse mapping table is 0, all 16 corresponding values of the sparse data are 0; when it is 1, at least one of the 16 corresponding values is non-0. An example of sparse compressed data and its corresponding sparse mapping table is shown in fig. 3.
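This correspondence can be modeled in a few lines of NumPy. The sketch below is an illustrative software analogy, assuming groups are kept in traversal order (dimension 1 fastest); it is not the hardware implementation itself:

    import numpy as np

    def compress(sparse: np.ndarray):
        """sparse: (C/16, H, W, 16). Returns (compressed, mapping), where
        compressed keeps only the 16-wide groups containing a non-zero value,
        in traversal order, and mapping is the 1-bit-per-group table."""
        mapping = np.any(sparse != 0, axis=-1)    # 0 marks a sparse feature point
        compressed = sparse[mapping]              # all-0 groups are deleted
        return compressed, mapping.astype(np.uint8)

    x = np.zeros((2, 2, 2, 16), dtype=np.int8)
    x[0, 0, 1] = 1                                # exactly one non-zero group
    data, table = compress(x)
    assert data.shape == (1, 16)
    assert list(table.ravel()) == [0, 1, 0, 0, 0, 0, 0, 0]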
The sparse compressed data is obtained by converting the sparse data with the sparse data compression device.
The sparse data handled by the sparse data compression device may be feature data or parameter data of a neural network, and the resulting sparse compressed data is correspondingly feature data or parameter data of the neural network.
In the sparse data compression device, the configuration unit reads the configuration data and configures the start address and size of the sparse data into the sparse data RDMA, the start address and size of the sparse compressed data into the sparse compressed data WDMA, and the start address and size of the sparse mapping table into the sparse mapping table WDMA. The size of the sparse data is expressed as (C/16, H, W, 16), the size of the sparse compressed data is expressed as (C/16, H, W, 16), and the size of the sparse mapping table is expressed as (C/16, H, W).
In the sparse data compression device, the sparse data RDMA reads the sparse data according to the start address and size of the sparse data. The sparse data RDMA reads 16 dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3.
In the sparse data compression device, the compression unit performs the sparse data compression operation and generates the sparse compressed data and the sparse mapping table. The compression unit acquires the sparse data sequentially, reading all 16 dimension-0 values of a group each time and judging whether they are all 0. If at least one of the 16 values is non-0, it sends a sparse compressed data write request for the 16 values and a sparse mapping table write request with value 1. If all 16 values are 0, it sends no sparse compressed data write request for them (that is, the all-0 data is skipped) and sends only a sparse mapping table write request with value 0.
In the sparse data compression device, the sparse compressed data WDMA responds to sparse compressed data write requests and writes out the sparse compressed data according to the start address and size of the sparse compressed data. For each valid write request from the compression unit, the sparse compressed data WDMA writes 16 values at a time, starting from the start address and writing at sequentially increasing addresses.
In the sparse data compression device, the sparse mapping table WDMA responds to sparse mapping table write requests and writes out the sparse mapping table according to the start address and size of the sparse mapping table. For each valid write request from the compression unit, the sparse mapping table WDMA writes one 1-bit value at a time, starting from the start address and writing at sequentially increasing addresses.
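The division of labor between the compression unit and the two WDMAs can be pictured as a stream of write requests. The following sketch is a software analogy only; the request encoding is invented for illustration:

    def compression_unit(groups):
        """Model of the compression unit: consume 16-wide groups in traversal
        order and emit write requests for the two WDMAs."""
        for group in groups:
            if any(v != 0 for v in group):
                yield ("data", group)   # request to the sparse compressed data WDMA
                yield ("map", 1)        # request to the sparse mapping table WDMA
            else:
                yield ("map", 0)        # all-0 group: map bit only, data skipped

    # Each WDMA consumes its own requests at sequentially increasing addresses.
    requests = list(compression_unit([[0] * 16, [7] + [0] * 15]))
    assert requests == [("map", 0), ("data", [7] + [0] * 15), ("map", 1)]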
The compression of the sparse data is lossless: the sparse compressed data together with the sparse mapping table completely expresses the sparse data before compression. The sparse compressed data therefore retains the complete information of the original sparse data and can be used directly in neural network computation, reducing both the memory bandwidth requirement and the computational resource requirement.
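The lossless round trip can be checked with a decompression counterpart to the compress sketch above (again an illustrative model, reusing compress and x from the earlier example):

    import numpy as np

    def decompress(compressed: np.ndarray, mapping: np.ndarray, n: int = 16):
        """Rebuild the (C/n, H, W, n) sparse data: re-insert the kept groups
        where the mapping bit is 1 and leave all-0 groups elsewhere."""
        out = np.zeros(mapping.shape + (n,), dtype=compressed.dtype)
        out[mapping.astype(bool)] = compressed    # groups return in traversal order
        return out

    restored = decompress(*compress(x))
    assert np.array_equal(restored, x)            # the round trip recovers the input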
The sparse data compression method is shown in fig. 4:
(1) Configuration. Read the configuration data of the compression operation, which mainly comprises the start address and size of the sparse data, the start address and size of the sparse compressed data, and the start address and size of the sparse mapping table, and configure each module of the sparse data compression device.
(2) Read the sparse data. Read the sparse data according to the configuration information. For sparse data of size (C/16, H, W, 16), the reading method is: read 16 dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3.
(3) Judge and mark sparse feature points. For each group of 16 dimension-0 values read, judge whether all 16 values are 0. If so, mark the dimension-1, 2, 3 coordinate (Z, Y, X) corresponding to the 16 values as a sparse feature point.
(4) Delete the sparse feature point data and generate the sparse compressed data and the sparse mapping table. The 16 values marked as a sparse feature point are deleted as follows: skip their write-out stage, sending no sparse compressed data write request for them and sending only a sparse mapping table write request with value 0. For data not marked as a sparse feature point, send a sparse compressed data write request for the 16 values and a sparse mapping table write request with value 1.
(5) Write out the sparse compressed data. Respond to the sparse compressed data write requests and write the 16 values to the corresponding addresses: 16 values are written at a time, starting from the start address and writing at sequentially increasing addresses.
(6) Write out the sparse mapping table. Respond to the sparse mapping table write requests and write the sparse mapping table data to the corresponding addresses: one 1-bit value is written at a time, starting from the start address and writing at sequentially increasing addresses.
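Because the mapping table entries are 1-bit values written at sequentially increasing addresses, a byte-addressed software model must pack eight entries per byte. The method does not specify the bit order within a byte, so the little-endian choice below is an assumption:

    import numpy as np

    def pack_mapping_table(mapping: np.ndarray) -> bytes:
        """Pack the 1-bit mapping-table entries, in traversal (address) order,
        eight per byte."""
        flat = mapping.astype(np.uint8).ravel()   # dimension-1-fastest order
        return np.packbits(flat, bitorder="little").tobytes()

    # For the earlier 8-entry table [0, 1, 0, 0, 0, 0, 0, 0] this yields b"\x02".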
In the present embodiment, the sparse data refers to tensor data in the 16-degree parallel storage scheme (C/16, H, W, 16). In particular applications, to accommodate tensor processors of different sizes and processing capabilities, the sparse data may be defined more broadly as tensor data in an n-degree parallel storage scheme (C/n, H, W, n), where n is an integer multiple of 8. While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (11)

1. A sparse data compression device for a neural network tensor processor, characterized by comprising a sparse data RDMA, a configuration unit, a compression unit, a sparse compressed data WDMA, and a sparse mapping table WDMA, together with sparse data, configuration data, sparse compressed data, and a sparse mapping table; the sparse data compression device is used for converting sparse data into sparse compressed data;
the configuration unit is used for reading the configuration data and configuring the start address and size of the sparse data into the sparse data RDMA, the start address and size of the sparse compressed data into the sparse compressed data WDMA, and the start address and size of the sparse mapping table into the sparse mapping table WDMA;
the sparse data RDMA is used for reading the sparse data according to the start address and size of the sparse data;
the compression unit is used for executing the sparse data compression operation and generating the sparse compressed data and the sparse mapping table;
the sparse compressed data WDMA is used for responding to sparse compressed data write requests and writing out the sparse compressed data according to the start address and size of the sparse compressed data;
and the sparse mapping table WDMA is used for responding to sparse mapping table write requests and writing out the sparse mapping table according to the start address and size of the sparse mapping table.
2. The sparse data compression device for a neural network tensor processor of claim 1, wherein the sparse data refers to tensor data stored in an n-degree parallel storage scheme (C/n, H, W, n), with a data bit width of 8 or 16 bits; in the original storage scheme (C, H, W) of the tensor data, W (width) is dimension 0, H (height) is dimension 1, and C (channel) is dimension 2;
the conversion to the n-degree parallel storage scheme (C/n, H, W, n) is as follows: from the original storage scheme (C, H, W) of the tensor data, take n consecutive values along the C direction and store them at consecutive physical addresses as the new dimension 0, whose length is fixed at n; W becomes dimension 1, length unchanged; H becomes dimension 2, length unchanged; C/n becomes dimension 3, where C/n denotes the length C divided by n, rounded down.
3. The sparse data compression device for a neural network tensor processor of claim 1, wherein n is an integer multiple of 8.
4. The sparse data compression device for a neural network tensor processor of claim 2, wherein the sparse data possesses at least one sparse feature point; a sparse feature point is defined as follows: in the sparse data, if the n dimension-0 values pointed to by the dimension-1, 2, 3 coordinates (Z, Y, X) are all 0, the coordinate (Z, Y, X) is called a sparse feature point of the sparse data.
5. The sparse data compression device for a neural network tensor processor of claim 4, wherein the sparse data has N sparse feature points, where N is less than or equal to (C/n) × H × W.
6. The sparse data compression device for a neural network tensor processor of claim 1, wherein the sparse compressed data refers to the sparse data with the n dimension-0 values pointed to by every sparse feature point deleted; the sparse mapping table is tensor data of size (C/n, H, W) with a data bit width of 1 bit; the sparse mapping table corresponds to the sparse data: each 1-bit entry of the sparse mapping table indicates whether the n dimension-0 values of the corresponding group of the sparse data are all 0.
7. A sparse data compression method for a neural network tensor processor, applied to the sparse data compression device for a neural network tensor processor as recited in any one of claims 2 to 6, comprising:
reading the configuration data of the compression operation and configuring each module of the sparse data compression device; the configuration data comprises the start address and size of the sparse data, the start address and size of the sparse compressed data, and the start address and size of the sparse mapping table;
reading the sparse data according to the configuration information: for sparse data of size (C/n, H, W, n), read n dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3;
judging and marking sparse feature points: for each group of n dimension-0 values read, judge whether all n values are 0; if so, mark the dimension-1, 2, 3 coordinate (Z, Y, X) corresponding to the n values as a sparse feature point;
deleting the sparse feature point data to generate sparse compressed data and a sparse mapping table;
writing out the sparse compressed data: responding to sparse compressed data write requests, writing the n values to the corresponding addresses;
writing out the sparse mapping table: responding to sparse mapping table write requests, writing the sparse mapping table data to the corresponding addresses.
8. The sparse data compression method for a neural network tensor processor of claim 7, wherein the sparse data is read according to the configuration information as follows: for sparse data of size (C/n, H, W, n), read n dimension-0 values at a time, reading sequentially from the start address by traversing dimension 1, then dimension 2, then dimension 3; and the sparse feature points are judged and marked as follows: for each group of n dimension-0 values read, judge whether all n values are 0; if so, mark the dimension-1, 2, 3 coordinate (Z, Y, X) corresponding to the n values as a sparse feature point.
9. The sparse data compression method for a neural network tensor processor of claim 7, wherein the sparse feature point data is deleted and the sparse compressed data and sparse mapping table are generated as follows: for the n values marked as a sparse feature point, skip the write-out stage, sending no sparse compressed data write request for them and sending only a sparse mapping table write request with value 0; for data not marked as a sparse feature point, send a sparse compressed data write request for the n values and a sparse mapping table write request with value 1.
10. The sparse data compression method for a neural network tensor processor of claim 7, wherein the sparse compressed data is written out as follows: write n values at a time, writing data from the start address at sequentially increasing addresses.
11. The sparse data compression method for a neural network tensor processor of claim 7, wherein the sparse mapping table is written out as follows: write one 1-bit value at a time, writing the sparse mapping table from the start address at sequentially increasing addresses.
CN202211618646.7A, filed 2022-12-15: Sparse data compression device and method for neural network tensor processor (CN115936101A, pending)

Priority Applications (1)

Application Number: CN202211618646.7A
Priority Date: 2022-12-15
Filing Date: 2022-12-15
Title: Sparse data compression device and method for neural network tensor processor


Publications (1)

Publication Number: CN115936101A
Publication Date: 2023-04-07

Family

ID=86555379

Family Applications (1)

Application Number: CN202211618646.7A
Title: Sparse data compression device and method for neural network tensor processor
Priority Date: 2022-12-15
Filing Date: 2022-12-15
Status: Pending

Country Status (1)

Country: CN
Publication: CN115936101A


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination