CN114077718A - Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium - Google Patents
- Publication number
- CN114077718A (CN202010807002.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- matrix
- circuit
- reading
- control signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Complex Calculations (AREA)
Abstract
Embodiments of the present disclosure provide a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit configured to read a plurality of first data in a first matrix according to a read address of the first matrix and to generate a second data read control signal according to the plurality of first data; a second data reading circuit configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and a calculation circuit configured to calculate third data according to the plurality of first data and the plurality of second data. Because the read control signal for the second data is generated from the first data that have been read, the circuit addresses the prior-art problem that calculation time is wasted on data that do not need to be calculated.
Description
Technical Field
The present disclosure relates to the field of processors, and in particular, to a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. The defining characteristics of this era are that people obtain ever more kinds of data, in ever larger quantities, and demand ever higher data processing speeds. Chips are the cornerstone of data processing and fundamentally determine people's ability to process data. In terms of application fields, chips follow two main routes: one is the general-purpose chip route, such as the CPU, which offers great flexibility but lower effective computing power when processing domain-specific algorithms; the other is the special-purpose chip route, such as the TPU, which delivers higher effective computing power in certain specific fields but has poor or even no processing capability in more flexible and more general fields. Because the data of the intelligent era are diverse and enormous in quantity, a chip is required both to be extremely flexible, so that it can process rapidly changing algorithms from different fields, and to have extremely high processing capability, so that it can quickly process extremely large and sharply growing volumes of data.
In neural network calculation, convolution accounts for most of the total operation amount, and convolution can be converted into matrix multiplication. Increasing the speed of matrix multiplication therefore improves throughput in neural network tasks, reduces latency, and raises the effective computing power of the chip.
FIG. 1a is a schematic diagram of a matrix multiplication computation in a neural network. As shown in FIG. 1a, M1 is a data matrix, M2 is a parameter matrix, and M is an output matrix. A multiply-add calculation on one row of data in M1 and one column of parameters in M2 yields one element of M.
FIG. 1b is a diagram of the hardware used for matrix calculation. As shown in FIG. 1b, in order to increase the speed of matrix calculation and improve the efficiency of data utilization, matrix calculation is often implemented with an array of computing units. Taking an array of M×N (M > 1, N > 1) computing units as an example, the array allows data to be fully reused: one element of M1 can be shared by the N computing units in the same row at the same time, and one element of M2 can be shared by the M computing units in the same column at the same time. Each calculation step can thus process one column of elements of M1 and one row of elements of M2.
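The column-times-row scheme described above can be read as accumulating a sum of outer products. The following is a minimal Python sketch of that reading, not the hardware itself; the function name `matmul_outer_product` and the sample matrices are assumptions made for illustration.

```python
def matmul_outer_product(m1, m2):
    """Multiply m1 (rows x inner) by m2 (inner x cols) one column/row pair at a time."""
    rows, inner, cols = len(m1), len(m2), len(m2[0])
    out = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        column = [m1[i][k] for i in range(rows)]  # one column of M1
        row = m2[k]                               # one row of M2
        # Unit (i, j) of the array multiplies column[i] by row[j] and accumulates.
        for i in range(rows):
            for j in range(cols):
                out[i][j] += column[i] * row[j]
    return out


result = matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

Each pass of the outer loop corresponds to one step of the computing unit array, so a column of M1 that is entirely zero contributes nothing to the result.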
The above scheme has the following disadvantage: the data matrices and/or parameter matrices in many neural networks are sparse, that is, they contain a large number of zeros. These zero elements are nevertheless computed as normal elements, which wastes calculation time, lowers the effective computing power of the chip, and increases power consumption.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the technical problem in the prior art that calculation time is wasted on data that do not need to be calculated, the embodiments of the present disclosure provide the following technical solutions:
in a first aspect, an embodiment of the present disclosure provides a matrix calculation circuit, including:
a first data reading circuit for reading a plurality of first data in the first matrix according to a read address of the first matrix; generating a second data read control signal according to the plurality of first data;
the second data reading circuit is used for reading a plurality of second data in the second matrix according to the second data reading control signal and the reading address of the second matrix;
and the calculation circuit is used for calculating third data according to the plurality of first data and the plurality of second data.
Further, the first data reading circuit being configured to generate a second data reading control signal according to the plurality of first data includes:
the first data reading circuit is configured to generate the second data reading control signal according to whether each of the plurality of first data is zero.
Further, the generating the second data reading control signal according to whether each of the plurality of first data is zero specifically includes:
generating the second data read control signal instructing accumulation of the read address of the second matrix in response to a value of each of the plurality of first data being zero; or,
generating the second data read control signal instructing the second data read circuit to read the plurality of second data according to the read address of the second matrix in response to a value of at least one of the plurality of first data being not zero.
Further, the first data reading circuit includes:
a first numerical value comparison circuit for judging whether a value of each of the plurality of first data is zero;
and the first control circuit is used for generating the second data reading control signal according to the judgment result of the first numerical comparison circuit.
Further, the second data reading circuit further includes:
a second numerical value comparison circuit for judging whether a value of each of the plurality of second data is zero;
and the second control circuit is used for generating a calculation control signal according to the judgment result of the second numerical comparison circuit.
Further, the second control circuit is configured to:
generating the calculation control signal to instruct the calculation circuit to perform a calculation operation in response to at least one of the plurality of second data being non-zero.
Further, the first control circuit includes:
the first control signal generating circuit is used for generating a first control signal according to the judgment result of the first numerical value comparison circuit;
and the first read address generating circuit is used for accumulating the read addresses of the first matrix to obtain a new read address of the first matrix.
Further, the plurality of first data are a column of first data in the first matrix; the plurality of second data are a row of second data in the second matrix.
Further, the computation circuit includes:
a computing unit array, wherein the computing unit array comprises a plurality of computing units;
each row of computing units in the computing unit array respectively receives the plurality of first data;
each column of computing units in the computing unit array respectively receives the plurality of second data.
In a second aspect, an embodiment of the present disclosure provides a matrix calculation method, including:
reading a plurality of first data in a first matrix according to the reading address of the first matrix;
generating a second data read control signal according to the plurality of first data;
reading a plurality of second data in the second matrix according to the second data reading control signal and the reading address of the second matrix;
and calculating third data according to the plurality of first data and the plurality of second data.
In a third aspect, an embodiment of the present disclosure provides a chip including the matrix calculation circuit described in any one of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions, such that the one or more processors, when executing the instructions, implement the matrix calculation method of the second aspect.
In a fifth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the matrix calculation method of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the matrix calculation method of the second aspect.
In a seventh aspect, an embodiment of the present disclosure provides a computing device, including one or more chips described in the third aspect.
Embodiments of the present disclosure provide a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit configured to read a plurality of first data in a first matrix according to a read address of the first matrix and to generate a second data read control signal according to the plurality of first data; a second data reading circuit configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and a calculation circuit configured to calculate third data according to the plurality of first data and the plurality of second data. Because the read control signal for the second data is generated from the first data that have been read, the circuit addresses the prior-art problem that calculation time is wasted on data that do not need to be calculated.
The foregoing is a summary of the present disclosure, and for the purposes of promoting a clear understanding of the technical means of the present disclosure, the present disclosure may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIGS. 1a and 1b are schematic diagrams of the prior art;
FIG. 2 is a schematic structural diagram of a matrix calculation circuit provided in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a storage format of a first matrix according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a first data reading circuit according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a second data reading circuit according to an embodiment of the present disclosure;
FIGS. 6a-6e are schematic diagrams of an application example of an embodiment of the present disclosure;
FIG. 7 is a flowchart of a matrix calculation method according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
FIG. 2 is a schematic diagram of a matrix calculation circuit provided in an embodiment of the present disclosure. The matrix calculation circuit (EU) 200 provided in the present embodiment includes:
a first data reading circuit (LD_M1) 201 for reading a plurality of first data in the first matrix according to a read address of the first matrix, and for generating a second data read control signal according to the plurality of first data;
a second data reading circuit (LD_M2) 202 for reading a plurality of second data in the second matrix according to the second data reading control signal and a reading address of the second matrix;
and a calculating circuit 203 for calculating third data according to the plurality of first data and the plurality of second data.
Illustratively, the read address of the first matrix is a storage head address of the first matrix, and the read address of the second matrix is a storage head address of the second matrix. The storage head address of the first matrix and the storage head address of the second matrix are obtained through an instruction decoding circuit ID (instruction decoder), and the instruction decoding circuit is used for decoding a matrix calculation instruction to obtain the storage head address of the first matrix, the storage head address of the second matrix, the size of the first matrix and the size of the second matrix and other parameters.
Illustratively, the matrix calculation instruction includes an instruction type, a storage head address of a first matrix participating in calculation, a storage head address of a second matrix, and size parameters of the first matrix and the second matrix. In one embodiment, the instruction type is a multiplication instruction of a matrix, the first matrix is a data matrix in the neural network convolution calculation, and the second matrix is a parameter matrix in the neural network convolution calculation; the first matrix and/or the second matrix are/is a sparse matrix, and a large number of elements in the sparse matrix are 0. It is understood that the memory head address of the matrix and the size parameter of the matrix (such as the number of rows and columns of the matrix) in the matrix calculation instruction may be represented in the form of register addresses, and the instruction decoding circuit acquires corresponding data from the corresponding register addresses.
In the embodiment of the present disclosure, the first data reading circuit 201 receives the head address of the first matrix decoded by the instruction decoding circuit, generates a read address of the first matrix from that head address, and reads a plurality of first data in the first matrix at one time according to the read address. Illustratively, the plurality of first data are one column of the first matrix, and the first matrix is stored column first and then row, that is, column by column. For the first matrix M1 shown in FIG. 3, a11-a41 are stored first, then a12-a42, and so on until the entire first matrix is stored; a11-a44 are logically contiguous in storage. Let the storage head address of the first matrix be PD and the number of rows of the first matrix be L1. Each time, the first data reading circuit reads the first data from the current read address; after one column of the first matrix has been read, L1 is automatically added to the read address, that is, PD = PD + L1, which gives the next read address of the first matrix. This continues until the number of reads reaches the number of columns of the first matrix. For M1 in FIG. 3, PD = PD + 4.
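The column-by-column storage and the address update by L1 can be sketched in Python as follows. This is an illustration under assumptions: memory is modelled as a flat list, the head address is taken to be 0, and the helper names `store_column_major` and `read_columns` as well as the numeric values standing in for a11-a44 are hypothetical.

```python
def store_column_major(matrix):
    """Flatten a matrix column by column (a11, a21, ..., then a12, a22, ...)."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]


def read_columns(memory, head_address, num_rows, num_cols):
    """Yield (read_address, column) pairs, advancing the address by L1 each time."""
    pd = head_address
    for _ in range(num_cols):          # stop after one read per column
        yield pd, memory[pd:pd + num_rows]
        pd = pd + num_rows             # PD = PD + L1


# A 4x4 stand-in for M1; the entry 'ij' plays the role of a_ij.
m1 = [[11, 12, 13, 14],
      [21, 22, 23, 24],
      [31, 32, 33, 34],
      [41, 42, 43, 44]]
memory = store_column_major(m1)
reads = list(read_columns(memory, head_address=0, num_rows=4, num_cols=4))
print(reads[0], reads[1])
```

With four rows the read address advances by 4 per read, matching the update given for M1 in the text.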
In the embodiment of the present disclosure, after the first data reading circuit 201 reads the plurality of first data, it generates a second data reading control signal according to the plurality of first data. In matrix calculation, a matrix involved in the calculation may be sparse, that is, many of the data in the first matrix are 0, in which case many calculations can be skipped. For example, in matrix multiplication, if a datum of the first matrix is 0, the product is 0 regardless of the value of the corresponding second datum of the second matrix, so the calculation need not be performed. Therefore, optionally, the first data reading circuit being configured to generate the second data reading control signal according to the plurality of first data includes: the first data reading circuit is configured to generate the second data reading control signal according to whether each of the plurality of first data is zero.
Specifically, in response to a value of each of the plurality of first data being zero, generating the second data read control signal instructing accumulation of the read address of the second matrix; or, in response to a value of at least one of the plurality of first data being not zero, generating the second data read control signal instructing the second data read circuit to read the plurality of second data according to a current read address of the second matrix.
In this embodiment, if the value of each of the plurality of first data is 0, the plurality of second data corresponding to them need not be read; the second data reading control signal is then generated to instruct accumulation of the read address of the second matrix, skipping the current read of the plurality of second data. If the value of at least one of the plurality of first data is not zero, the calculation must be performed; the second data reading control signal is then generated to instruct the second data reading circuit to read the plurality of second data corresponding to the plurality of first data according to the current read address of the second matrix.
In the embodiment of the present disclosure, the second data reading circuit 202 receives the head address of the second matrix decoded by the instruction decoding circuit, and generates a read address of the second matrix according to that head address and the second data reading control signal generated by the first data reading circuit 201, in order to read a plurality of second data in the second matrix. Optionally, the second data reading circuit reads one row of the second matrix each time, and the second matrix is stored row first and then column, that is, row by row. Let the storage head address of the second matrix be PW and the number of columns of the second matrix be L2. Each time, the second data reading circuit reads the second data from the current read address; after one row of the second matrix has been read, L2 is automatically added to the read address, that is, PW = PW + L2, which gives the next read address of the second matrix. This continues until the number of reads reaches the number of rows of the second matrix.
Specifically, the second data reading circuit determines the actual read address of the second matrix according to the second data reading control signal and the current read address of the second matrix. If the second data reading control signal instructs accumulation of the read address of the second matrix, the new read address is obtained as PW = PW + L2 without reading, and the circuit then waits for a new second data reading control signal. If the second data reading control signal instructs reading at the current read address of the second matrix, a plurality of second data are read at PW, the next read address is then calculated as PW = PW + L2, and the circuit waits for a new second data reading control signal.
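The skip-or-read behaviour of the second data reading circuit can be sketched in Python as below. It is a simplified model rather than the circuit: the control signal is represented as an integer `c1` (1 meaning "accumulate the address only", 0 meaning "read at the current address"), memory is a flat row-major list, and the function names and sample values are assumptions.

```python
def make_c1(first_data):
    """Control signal: 1 if every first datum is zero, otherwise 0."""
    return 1 if all(value == 0 for value in first_data) else 0


def second_read_step(memory, pw, l2, c1):
    """Return (second_data or None, next read address) for one control signal."""
    if c1 == 1:
        return None, pw + l2            # skip the read, PW = PW + L2
    return memory[pw:pw + l2], pw + l2  # read one row, then PW = PW + L2


# Second matrix with 2 rows and L2 = 3 columns, stored row by row.
memory_m2 = [1, 2, 3, 4, 5, 6]
pw = 0

row_a, pw = second_read_step(memory_m2, pw, 3, make_c1([0, 0]))  # all-zero column
row_b, pw = second_read_step(memory_m2, pw, 3, make_c1([7, 0]))  # one non-zero datum
print(row_a, row_b, pw)
```

In both branches the address advances by L2, so the rows of the second matrix stay aligned with the columns of the first matrix whether or not a read takes place.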
After the plurality of first data and the plurality of second data are obtained, the calculation circuit performs the calculation indicated by the matrix calculation instruction on them to obtain third data, where the third data are the data, or partial data, of the output matrix.
Optionally, the computing circuit includes: a computing unit array, wherein the computing unit array comprises a plurality of computing units; each row of computing units in the computing unit array respectively receives the plurality of first data; each column of computing units in the computing unit array respectively receives the plurality of second data. Specifically, as shown in FIG. 2, the computing circuit includes a computing unit array comprising a plurality of computing units PU. Each row of computing units receives one of the plurality of first data. Taking M1 in FIG. 3 as an example, when the column of first data is a11-a41, the first row of computing units receives a11, the second row receives a21, the third row receives a31, and the fourth row receives a41. Similarly, each column of computing units receives one of the plurality of second data: the computing units in the same column receive the same second datum, and the computing units in different columns receive different second data. This is described in detail later and is not repeated here.
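The way operands are distributed over the computing unit array can be illustrated with a short Python sketch. It only shows which pair of operands each unit would see, under the reading that row i of the array shares the i-th first datum and column j shares the j-th second datum; the function name `operands_per_unit` and the labels b11-b14 for a row of the second matrix are assumptions for illustration.

```python
def operands_per_unit(first_data, second_data):
    """grid[i][j] is the (first datum, second datum) pair seen by unit (i, j)."""
    return [[(a, b) for b in second_data] for a in first_data]


first_column = ["a11", "a21", "a31", "a41"]   # one column of the first matrix
second_row = ["b11", "b12", "b13", "b14"]     # one row of the second matrix (hypothetical labels)
grid = operands_per_unit(first_column, second_row)

for row in grid:
    print(row)
```

Every unit in the first row sees a11 and every unit in the first column sees b11, which is the data reuse the array is meant to provide.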
As shown in fig. 4, in order to implement the function of the first data reading circuit, optionally, the first data reading circuit further includes:
a first numerical value comparison circuit 401 (Z_D) for determining whether a value of each of the plurality of first data is zero;
the first control circuit 402 (Ctrl1) is configured to generate the second data reading control signal according to the determination result of the first value comparison circuit.
As shown in FIG. 4, the first numerical value comparison circuit 401 includes a plurality of data comparison buffer circuits Z_Det. After a plurality of first data are read from the memory or storage area M1, each first datum is buffered in one Z_Det circuit, which determines whether that first datum is 0, yielding the determination results Z_0 to Z_{M-1}. Illustratively, the Z_Det circuit includes an OR gate circuit and a NOT gate circuit. The OR gate circuit has X inputs, where X is the number of bits of the first data, and each input corresponds to one bit of the first data. For example, if X is 4, the OR gate circuit has 4 inputs corresponding to Data[3:0] of the first data. The bits of the first data are ORed together by the OR gate circuit, and the determination result is then obtained through the NOT gate circuit. For example, when the first data is 1110, the bitwise OR gives 1 and the NOT gate circuit gives 0, so the first data is determined not to be 0; when the first data is 0000, the bitwise OR gives 0 and the NOT gate circuit gives 1, so the first data is determined to be 0. It is to be understood that Z_Det may omit the NOT gate circuit, in which case an OR gate output of 1 means that the first data is not 0, and an OR gate output of 0 means that the first data is 0.
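The OR-then-NOT zero detection can be modelled bit by bit in Python. This is a behavioural sketch, assuming unsigned data of a fixed bit width; the function name `z_det` and the `width` parameter are illustrative, not taken from the disclosure.

```python
def z_det(value, width=4):
    """Return 1 if all `width` bits of value are 0, otherwise 0."""
    or_out = 0
    for k in range(width):
        or_out |= (value >> k) & 1  # X-input OR gate, one input per bit
    return or_out ^ 1               # NOT gate inverts the OR output


print(z_det(0b1110), z_det(0b0000))
```

The two calls reproduce the 1110 and 0000 cases from the text: the first is judged non-zero (result 0) and the second zero (result 1).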
As shown in FIG. 4, the determination results Z_0 to Z_{M-1} of the plurality of data comparison buffer circuits Z_Det are input to the first control circuit 402, which generates the second data reading control signal C1 according to the determination results of the first numerical value comparison circuit. Illustratively, the first control circuit includes a first read address generation circuit AG1 and a first control signal generation circuit CL1. The first control signal generation circuit CL1 receives the determination results Z_0 to Z_{M-1} and generates the second data reading control signal C1. Illustratively, CL1 includes a multi-input AND gate circuit, each input of which receives one of the determination results Z_0 to Z_{M-1}: when Z_0 to Z_{M-1} are all 1, C1 is 1; when Z_0 to Z_{M-1} are not all 1, C1 is 0. The second data reading circuit 202 receives the second data reading control signal C1. If C1 is 1, which indicates that the fetched first data are all 0, the second data reading circuit 202 accumulates the read address of the second matrix and does not read the second data corresponding to the plurality of first data. If C1 is 0, which indicates that the fetched first data are not all 0, the second data reading circuit 202 reads the plurality of second data corresponding to the plurality of first data according to the current read address of the second matrix.
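The multi-input AND gate in CL1 can be sketched in Python as a reduction over the per-datum zero flags. The function name `cl1_and_gate` is an assumption, and each flag is taken to be 1 when the corresponding first datum is zero.

```python
def cl1_and_gate(z_flags):
    """C1 is 1 only when every zero flag is 1, i.e. all first data are zero."""
    c1 = 1
    for z in z_flags:
        c1 &= z
    return c1


all_zero_column = [1, 1, 1, 1]   # flags for a column whose data are all 0
mixed_column = [1, 0, 1, 1]      # the second datum is non-zero
print(cl1_and_gate(all_zero_column), cl1_and_gate(mixed_column))
```

A single non-zero first datum is enough to pull C1 to 0, in which case the second data must be read.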
As shown in fig. 5, in order to implement the function of the second data reading circuit, optionally, the second data reading circuit further includes:
a second numerical value comparison circuit 501 (DB) for judging whether a value of each of the plurality of second data is zero;
and a second control circuit 502 (Ctrl2) for generating a calculation control signal according to the judgment result of the second numerical value comparison circuit.
Optionally, the structure of the second numerical comparison circuit 501 is the same as that of the first numerical comparison circuit 401, and the second numerical comparison circuit also includes a plurality of data comparison buffer circuits, and the step of determining whether the value of each of the plurality of second data is zero is the same as the step of determining whether the value of each of the plurality of first data is zero by the first numerical comparison circuit, which is not described herein again.
When the determination result of the second numerical value comparison circuit is that every second data is 0, the second control circuit 502 generates a calculation control signal C_Start of 0, so that the calculation circuit does not perform the calculation; when the determination result of the second numerical value comparison circuit is that at least one of the plurality of second data is not 0, the second control circuit 502 generates a calculation control signal C_Start of 1, so that the calculation circuit performs the calculation according to the plurality of first data and the plurality of second data to obtain third data.
Optionally, the second control circuit 502 further includes a second read address generating circuit AG2 and a second control signal generating circuit CL2. The second control signal generating circuit CL2 is configured to generate a control signal for the second read address generating circuit AG2 according to the second data reading control signal, and to generate the calculation control signal according to the determination result of the second numerical value comparison circuit; the second read address generating circuit AG2 is configured to generate the read address of the second matrix according to the control signal from CL2. Illustratively, after obtaining the head read address of the second data sent by the decoding circuit, the second read address generating circuit AG2 determines, according to the C1 signal, whether the read address of the second matrix needs to be accumulated. If the C1 signal is 1, AG2 accumulates the read address of the second matrix; if the C1 signal is 0, AG2 does not accumulate the read address of the second matrix, and the second data reading circuit LD_M2 reads a plurality of second data from the memory or storage area M2.
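One step of the second data reading circuit can be modelled as follows. This is a hedged sketch: the function name `second_read_step` and its return convention are hypothetical, and the sketch assumes, following the fig. 6c walk-through, that the read address is also accumulated after a row has been read. Returning C_Start = 0 on the skip path is likewise an assumption of this model, made because the calculation is skipped in that case.

```python
from typing import List, Optional, Sequence, Tuple


def second_read_step(
    read_addr: int, c1: int, m2_rows: Sequence[Sequence[int]]
) -> Tuple[int, Optional[List[int]], int]:
    """Return (next read address, second data or None, C_Start)."""
    if c1 == 1:
        # First data were all zero: advance the address without reading M2.
        return read_addr + 1, None, 0
    # C1 == 0: read the row at the current address of the second matrix.
    second_data = list(m2_rows[read_addr])
    # C_Start is 1 when at least one second datum is non-zero.
    c_start = 0 if all(value == 0 for value in second_data) else 1
    # Assumption: the address is accumulated once the row has been read.
    return read_addr + 1, second_data, c_start


# The middle row [9, 9] is a placeholder; it is never read in this example.
rows = [[1, 2], [9, 9], [0, 0]]
step1 = second_read_step(0, 0, rows)
step2 = second_read_step(1, 1, rows)
step3 = second_read_step(2, 0, rows)
```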
Figs. 6a to 6e show an example of the calculation process of the matrix calculation circuit in the above embodiment. Fig. 6a shows the matrix multiplication to be performed by the matrix calculation circuit, where M1 is the first matrix, M2 is the second matrix, and M is the third matrix obtained by multiplying M1 by M2.
Fig. 6b is an overall schematic diagram of the matrix multiplication performed using the matrix calculation circuit. The matrix calculation circuit comprises a 2 × 2 calculation unit array; the two first data read out by LD_M1 are input into the respective rows of the calculation unit array, the two second data read out by LD_M2 are input into the respective columns of the calculation unit array, and the third matrix M is obtained through three clock cycles of calculation.
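The three-cycle calculation can be read as a sum of outer products, one per clock cycle. The symbols a_k, b_k and K below are notation introduced here for illustration and are not taken from the source; a_k denotes the k-th column of M1 (the first data fetched in cycle k), b_k^T the k-th row of M2 (the second data fetched in cycle k), and K the number of columns of M1, which is 3 in this example.

```latex
M = M_1 M_2 = \sum_{k=0}^{K-1} \mathbf{a}_k \mathbf{b}_k^{\mathsf{T}},
\qquad
M[i][j] = \sum_{k=0}^{K-1} M_1[i][k] \, M_2[k][j]
```

Under this reading, the calculation unit in row i and column j contributes the product of the i-th first datum and the j-th second datum in each cycle, and the term for cycle k vanishes whenever a_k or b_k is entirely zero, which is why such a cycle can be skipped without changing M.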
Fig. 6c shows the process of the first fetch and the first calculation. LD_M1 takes out the first data 1 and 0 in the first column of the first matrix and buffers them into Z_Det; Z_Det outputs the determination result 01 to Ctrl1, and Ctrl1 performs an AND operation on the bits of 01 to obtain C1 = 0. Because C1 is 0, LD_M2 reads the second data 1 and 2 in the first row of the second matrix according to the current read address of the second matrix and buffers them into DB. DB determines that the second data are not all 0, so the calculation control signal C_Start = 1 is generated and the calculation unit array performs the calculation: the inputs of PU_{0,0} are 1 and 1, and the calculated product is 1; the inputs of PU_{0,1} are 1 and 2, and the calculated product is 2; the inputs of PU_{1,0} are 0 and 1, and the calculated product is 0; the inputs of PU_{1,1} are 0 and 2, and the calculated product is 0. The result M_temp of the first calculation is thereby obtained. After reading the column of first data, LD_M1 accumulates the read address of the first matrix; after reading the row of second data, LD_M2 accumulates the read address of the second matrix.
Fig. 6d shows the process of the second fetch and the second calculation. LD_M1 takes out the first data 0 and 0 in the second column of the first matrix and buffers them into Z_Det; Z_Det outputs the determination result 11 to Ctrl1, and Ctrl1 performs an AND operation on the bits of 11 to obtain C1 = 1. Because C1 is 1, LD_M2 accumulates the read address of the second matrix without reading the second row. In the second calculation, since both first data of the first matrix are 0, the calculation unit array skips the calculation, and M_temp does not change.
Fig. 6e shows the process of the third fetch and the third calculation. LD_M1 takes out the first data 0 and 2 in the third column of the first matrix and buffers them into Z_Det; Z_Det outputs the determination result 10 to Ctrl1, and Ctrl1 performs an AND operation on the bits of 10 to obtain C1 = 0. Because C1 is 0, LD_M2 reads the second data 0 and 0 in the third row of the second matrix according to the current read address of the second matrix and buffers them into DB. DB determines that the second data are all 0, so the calculation control signal C_Start = 0 is generated and the calculation unit array skips the calculation; M_temp remains unchanged.
Whether the matrix calculation is finished can be determined from the size parameters of the first matrix, such as its number of rows and columns; once the calculation is finished, the final third matrix M is obtained.
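The three fetch-and-calculate cycles of figs. 6c to 6e can be reproduced with a short software model. This is a sketch under stated assumptions, not the circuit's actual implementation: the function name `sparse_matmul_trace` is hypothetical, and the second row of M2 is not given in the text, so the values [7, 7] are placeholders. Because the second column of M1 is all zero, the result does not depend on that row.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def sparse_matmul_trace(m1: Matrix, m2: Matrix) -> Tuple[Matrix, List[str]]:
    """Multiply m1 by m2 column-by-row, skipping all-zero columns or rows."""
    rows, inner, cols = len(m1), len(m2), len(m2[0])
    m_temp = [[0] * cols for _ in range(rows)]
    trace: List[str] = []
    for k in range(inner):
        first = [m1[i][k] for i in range(rows)]  # k-th column of M1
        c1 = int(all(value == 0 for value in first))
        if c1 == 1:
            trace.append("skip: first data all zero")
            continue
        second = list(m2[k])  # k-th row of M2, read only when C1 is 0
        c_start = int(any(value != 0 for value in second))
        if c_start == 0:
            trace.append("skip: second data all zero")
            continue
        for i in range(rows):
            for j in range(cols):
                m_temp[i][j] += first[i] * second[j]  # unit PU(i, j)
        trace.append("compute")
    return m_temp, trace


M1 = [[1, 0, 0], [0, 0, 2]]
M2 = [[1, 2], [7, 7], [0, 0]]  # middle row is a hypothetical placeholder
result, trace = sparse_matmul_trace(M1, M2)
```

With these inputs the model computes in the first cycle only, and `result` equals the ordinary product of M1 and M2.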
According to the matrix calculation circuit, whether the plurality of first data in the first matrix are all 0 can be judged before calculation, and if the plurality of first data in the first matrix are all 0, the calculation can be skipped, so that the calculation resources are saved; when the plurality of first data in the first matrix are not all 0, further judging whether the plurality of second data taken out are all 0, if all 0, not calculating to save calculation resources. The calculation circuit performs the matrix calculation only when the plurality of first data are not all 0 and the plurality of second data are not all 0, and the calculation resources can be greatly saved when the first matrix and/or the second matrix are sparse.
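The saving described above can be quantified for a given pair of matrices by counting how many column-by-row steps would actually be computed. The helper `count_steps` below is a hypothetical illustration, not part of the disclosed circuit, and the middle row of M2 is again a placeholder.

```python
from typing import List, Tuple


def count_steps(m1: List[List[int]], m2: List[List[int]]) -> Tuple[int, int, int]:
    """Return (executed, skipped by first data, skipped by second data)."""
    executed = skipped_by_first = skipped_by_second = 0
    for k in range(len(m2)):
        column = [row[k] for row in m1]
        if all(value == 0 for value in column):
            skipped_by_first += 1  # the row of M2 is not even read
        elif all(value == 0 for value in m2[k]):
            skipped_by_second += 1  # the row is read, but nothing is computed
        else:
            executed += 1
    return executed, skipped_by_first, skipped_by_second


# Example matrices from figs. 6c to 6e; the row [7, 7] is a placeholder.
counts = count_steps([[1, 0, 0], [0, 0, 2]], [[1, 2], [7, 7], [0, 0]])
```

In this example only one of the three steps is executed; a step skipped because of the first data also avoids the read of the second matrix.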
Fig. 7 is a flowchart of a matrix calculation method according to an embodiment of the present disclosure. As shown in fig. 7, the method includes the steps of:
step S701, reading a plurality of first data in a first matrix according to the reading address of the first matrix;
step S702, generating a second data reading control signal according to the plurality of first data;
step S703, reading a plurality of second data in the second matrix according to the second data reading control signal and the reading address of the second matrix;
step S704, calculating third data according to the plurality of first data and the plurality of second data.
Further, the generating a second data reading control signal according to the plurality of first data includes:
generating the second data read control signal according to whether each of the plurality of first data is zero.
Further, the generating the second data reading control signal according to whether each of the plurality of first data is zero specifically includes:
generating the second data read control signal indicating that the read address of the second matrix is to be accumulated, in response to a value of each of the plurality of first data being zero; or,
generating the second data read control signal instructing the second data read circuit to read the plurality of second data according to the read address of the second matrix in response to a value of at least one of the plurality of first data being not zero.
Further, the generating a second data read control signal according to the plurality of first data comprises:
judging whether the value of each of the plurality of first data is zero;
and generating the second data reading control signal according to the judgment result.
Further, the matrix calculation method further includes:
judging whether the value of each of the plurality of second data is zero;
and generating a calculation control signal according to the judgment result.
Further, the generating a calculation control signal according to the determination result includes:
in response to at least one of the plurality of second data being non-zero, a calculation control signal is generated to instruct the calculation circuit to perform a calculation operation.
Further, the matrix calculation method further includes:
generating a first control signal according to the judgment result;
and accumulating the read addresses of the first matrix to obtain a new read address of the first matrix.
Although the steps in the above method embodiments are described in the above order, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure need not be performed in that order and may also be performed in other orders, such as in reverse, in parallel, or interleaved. Moreover, those skilled in the art may add other steps on the basis of the above steps. These obvious modifications or equivalents also fall within the protection scope of the present disclosure and are not described herein again.
The embodiment of the disclosure also provides a chip, which comprises at least one matrix calculation circuit in the above embodiments.
An embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions, such that, when the instructions are executed, the processors perform the matrix calculation method of any of the embodiments.
The disclosed embodiments also provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the matrix calculation method described in any one of the foregoing embodiments.
The embodiment of the present disclosure further provides a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the matrix calculation method of any of the preceding embodiments.
The embodiment of the present disclosure further provides a computing device, which includes the chip in any one of the embodiments.
The flowchart and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Claims (10)
1. A matrix computation circuit, comprising:
a first data reading circuit for reading a plurality of first data in the first matrix according to a read address of the first matrix; generating a second data read control signal according to the plurality of first data;
the second data reading circuit is used for reading a plurality of second data in the second matrix according to the second data reading control signal and the reading address of the second matrix;
and the calculation circuit is used for calculating third data according to the plurality of first data and the plurality of second data.
2. The matrix calculation circuit of claim 1 wherein the first data read circuit is to generate a second data read control signal from the plurality of first data, comprising:
the first data reading circuit is configured to generate the second data reading control signal according to whether each of the plurality of data is zero.
3. The matrix calculation circuit according to claim 2, wherein the generating the second data reading control signal according to whether each of the plurality of data is zero comprises:
generating the second data read control signal indicating a read address to accumulate the second matrix in response to a value of each of the plurality of first data being zero; or,
generating the second data read control signal instructing the second data read circuit to read the plurality of second data according to the read address of the second matrix in response to a value of at least one of the plurality of first data being not zero.
4. The matrix calculation circuit of claim 1 wherein the first data reading circuit comprises:
a first numerical value comparison circuit for judging whether a value of each of the plurality of first data is zero;
and the first control circuit is used for generating the second data reading control signal according to the judgment result of the first numerical comparison circuit.
5. The matrix calculation circuit according to any one of claims 1 to 4, wherein the second data reading circuit further comprises:
a second numerical value comparison circuit for judging whether a value of each of the plurality of second data is zero;
and the second control circuit is used for generating a calculation control signal according to the judgment result of the second numerical comparison circuit.
6. The matrix computation circuit of claim 5, wherein the second control circuit is to:
generating the calculation control signal to instruct the calculation circuit to perform a calculation operation in response to at least one of the plurality of second data being non-zero.
7. The matrix computation circuit according to any one of claims 4 to 6, wherein the first control circuit comprises:
the first control signal generating circuit is used for generating a first control signal according to the judgment result of the first numerical value comparison circuit;
and the first read address generating circuit is used for accumulating the read addresses of the first matrix to obtain a new read address of the first matrix.
8. The matrix computation circuit of any of claims 1-7, wherein:
the plurality of first data are a column of first data in the first matrix;
the plurality of second data are a row of second data in the second matrix.
9. The matrix computation circuit of claim 1, wherein the computation circuit comprises:
a computing unit array, wherein the computing unit array comprises a plurality of computing units;
each row of computing units in the computing unit array respectively receives the plurality of first data;
each column of computing units in the computing unit array respectively receives the plurality of second data.
10. A matrix calculation method, comprising:
reading a plurality of first data in a first matrix according to the reading address of the first matrix;
generating a second data read control signal according to the plurality of first data;
reading a plurality of second data in the second matrix according to the second data reading control signal and the reading address of the second matrix;
and calculating to obtain third data according to the plurality of first data and the plurality of second data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010807002.7A CN114077718A (en) | 2020-08-12 | 2020-08-12 | Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114077718A true CN114077718A (en) | 2022-02-22 |
Family
ID=80280051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010807002.7A Pending CN114077718A (en) | 2020-08-12 | 2020-08-12 | Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Country or region after: China Address after: Room 201, No. 6 Fengtong Heng Street, Huangpu District, Guangzhou City, Guangdong Province Applicant after: Guangzhou Ximu Semiconductor Technology Co.,Ltd. Address before: Building 202-24, No. 6, Courtyard 1, Gaolizhang Road, Haidian District, Beijing Applicant before: Beijing SIMM Computing Technology Co.,Ltd. Country or region before: China |