CN212112470U - Matrix multiplication circuit - Google Patents


Info

Publication number
CN212112470U
CN212112470U CN202020657493.7U CN202020657493U CN212112470U CN 212112470 U CN212112470 U CN 212112470U CN 202020657493 U CN202020657493 U CN 202020657493U CN 212112470 U CN212112470 U CN 212112470U
Authority
CN
China
Prior art keywords
data
memory
multiplier
adder
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202020657493.7U
Other languages
Chinese (zh)
Inventor
江宏武
于振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Complex Calculations (AREA)

Abstract

The utility model discloses a matrix multiplication circuit, comprising: a first memory for storing first array data, the first array data being divided into a plurality of first sub-data matrices of size Nα × β; a second memory for storing second array data, the second array data being divided into a plurality of second sub-data matrices of size β × γ; and a multiplier-adder array comprising a plurality of multiply-add calculation units arranged in an α × β matrix. A first input terminal of each multiply-add calculation unit is connected to the first memory through a corresponding random access memory, which synchronously reads and stores N same-column first data from the first sub-data matrix and inputs them one after another to the corresponding multiply-add calculation unit over N clock cycles. A second input terminal of each multiply-add calculation unit is connected to the second memory so as to read second data from the second sub-data matrix during the N clock cycles. Matrix multiplication is thereby performed on the first data and the second data, and calculation efficiency is improved.

Description

Matrix multiplication circuit
Technical Field
The utility model relates to the technical field of digital circuits, and in particular to a matrix multiplication circuit.
Background
The matrix is a common tool in advanced algebra and is widely used in applied mathematics disciplines such as statistical analysis. It plays an irreplaceable role in fields such as science, engineering, machinery, economics and finance. Among matrix operations, General Matrix Multiplication (GEMM) is one of the most important operators and is commonly found in the software support packages of various heterogeneous computing devices.
The functions of upper-layer software depend on the support of the underlying hardware, so implementing a calculation circuit suited to the GEMM operator has become one of the important tasks in designing a high-performance computing chip. In engineering practice, issues such as timing, power consumption and bandwidth generally have to be considered when optimizing the design. Existing GEMM calculation circuits are not fully optimized in the design of the multiplier input port network or in process scheduling, and suffer from problems such as low calculation efficiency.
Summary of the Utility Model
The utility model provides a matrix multiplication circuit that can improve calculation efficiency when performing matrix multiplication.
To solve the above technical problem, the utility model adopts a first technical solution: a matrix multiplication circuit is provided, comprising: a first memory for storing first array data, wherein the first array data is divided into a plurality of first sub-data matrices of size Nα × β; a second memory for storing second array data, wherein the second array data is divided into a plurality of second sub-data matrices of size β × γ; and a multiplier-adder array comprising a plurality of multiply-add calculation units arranged in an α × β matrix, wherein a first input terminal of each multiply-add calculation unit is connected to the first memory through a corresponding random access memory, which synchronously reads and stores N same-column first data from the first sub-data matrix and inputs them one after another to the corresponding multiply-add calculation unit over N clock cycles, and a second input terminal of each multiply-add calculation unit is connected to the second memory so as to read second data from the second sub-data matrix during the N clock cycles, so that matrix multiplication is performed on the first data and the second data.
Each multiplication and addition computing unit comprises a multiplier and an adder, wherein a first input end of the multiplier is used as the first input end of the multiplication and addition computing unit so as to sequentially and respectively receive the N same-column first data transmitted by the random access memory in the N clock cycles; a second input terminal of the multiplier is used as the second input terminal of the multiply-add calculation unit to read the second data from the second sub-data array within the N clock cycles; the first input end of the adder is connected with the output end of the multiplier to receive the output result of the multiplier, the output end of the adder is used as the output end of the corresponding multiply-add computing unit, and the second input end of the adder is connected with the output end of another multiply-add computing unit adjacent to the same row, so that the outputs of the multiply-add computing units in the same row are output after being cascaded together.
Wherein the matrix multiplication circuit further comprises: a plurality of ring memories, including a first ring memory and a second ring memory, wherein, when the first ring memory is connected to the multiplier-adder array to store the result of the matrix multiplication of the first data and the second data, the second ring memory outputs to an external device the result of the matrix multiplication from the previous calculation task that is stored in the second ring memory.
Wherein the matrix multiplication circuit further comprises: an accumulation circuit connected between the multiplier-adder array and the ring memories to accumulate the calculation results of same-row first data across different first sub-data matrices and same-column second data across different second sub-data matrices.
Wherein the matrix multiplication circuit further comprises: an output circuit connected to the second ring memory and configured to output the result of the matrix multiplication of the first data and the second data from the previous calculation task that is stored in the second ring memory.
Wherein the matrix multiplication circuit further comprises: a switching circuit configured to switch the connections between the plurality of ring memories, the accumulation circuit and the output circuit, so that some of the ring memories are connected to the accumulation circuit and the remaining ring memories are connected to the output circuit.
Wherein N is equal to 2.
Wherein the matrix multiplication circuit further comprises: a pi phase shifter connected between the second memory and the second input terminals of the multiplier-adder array, so that each multiply-add calculation unit reads from the random access memory at the same frequency as it reads from the second memory.
The accumulation circuit comprises a plurality of accumulation units, each connected to the output terminal of one row of multiply-add calculation units.
Each ring memory comprises a plurality of ring memory units, each connected to one accumulation unit.
The beneficial effects of the utility model are as follows. Unlike the prior art, in the matrix multiplication circuit provided by the utility model, the multiplier-adder array is connected to the first memory through corresponding random access memories, which read and store N same-column first data from the first sub-data matrix stored in the first memory and input them one after another to the corresponding multiply-add calculation units over N clock cycles. The multiplier-adder array is also connected to the second memory and reads the second data of the second sub-data matrix stored there during the N clock cycles, so that matrix multiplication is performed on the first data and the second data. Because the random access memory stores the data to be calculated in advance, calculation efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained without inventive work, wherein:
Fig. 1 is a schematic structural diagram of a first embodiment of the matrix multiplication circuit of the present invention;
Fig. 2 is a schematic structural diagram of a second embodiment of the matrix multiplication circuit of the present invention;
Fig. 3a is a timing waveform diagram of the multiplication of the first data and the second data when the pi phase shifter 13 is not enabled in the matrix multiplication circuit of the present invention;
Fig. 3b is a timing waveform diagram of the multiplication of the first data and the second data when the pi phase shifter 13 is enabled in the matrix multiplication circuit of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
When performing a matrix operation, a conventional GEMM calculation circuit receives the second array data stored in the second memory only after all of the first array data has been input from the first memory to the multiplier-adder array, and only then starts the matrix calculation of the first array data and the second array data. Until all of the first array data has been input to the multiplier-adder array, the calculations of the multiplier-adder array are invalid, which adds waiting time, wastes clock cycles and reduces efficiency. The matrix multiplication circuit provided in the present application can solve this problem, and is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a schematic structural diagram of a matrix multiplication circuit according to a first embodiment of the present invention. In this embodiment, the matrix multiplication circuit includes: a first memory 11, a second memory 12, a random access memory 14, and a multiplier-adder array 15.
The first memory 11 is used for storing first array data, denoted $A_{ik}$ in one embodiment, i.e. the size of the first array data is i × k. The first array data $A_{ik}$ is divided into a plurality of first sub-data matrices; in one embodiment, each first sub-data matrix has size Nα × β. The second memory 12 is used for storing second array data, denoted $B_{kj}$ in one embodiment, i.e. the size of the second array data is k × j. The second array data $B_{kj}$ is divided into a plurality of second sub-data matrices; in one embodiment, each second sub-data matrix has size β × γ.
The multiplier-adder array 15 includes a plurality of multiply-add calculation units 151 arranged in an α × β matrix, corresponding to the size of the first sub-data matrix. The first input terminal U of each multiply-add calculation unit 151 is connected to the first memory 11 through a corresponding random access memory 14, and the second input terminal V of each multiply-add calculation unit 151 is connected to the second memory 12. Each random access memory 14 synchronously reads and stores N same-column first data from the first sub-data matrix stored in the first memory 11 and inputs them one after another to the first input terminal U of the corresponding multiply-add calculation unit 151 over N clock cycles; during those N clock cycles, each multiply-add calculation unit 151 reads second data from the second sub-data matrix stored in the second memory 12 through the second input terminal V. Once a multiply-add calculation unit 151 has obtained the first data and the second data through the first input terminal U and the second input terminal V, respectively, it performs the matrix multiplication calculation on them.
Specifically, in this embodiment, the multiplier-adder array 15 performs matrix calculation on the first array data $A_{ik}$ and the second array data $B_{kj}$. In one embodiment, the matrix calculation on the first array data $A_{ik}$ and the second array data $B_{kj}$ may be performed according to the following equation (1):
[Equation (1): image in the original publication, not reproduced in this text]
where $C_{ij}$ denotes the product of the first array data $A_{ik}$ and the second array data $B_{kj}$, i denotes a row of the first array data, j denotes a column of the second array data, t denotes the t-th row of the first array data, r denotes the r-th row of the second array data, and k is a natural number.
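For reference, the textbook definition of a matrix product can be written with the index names used above, as in the sketch below. This is the standard formula, not necessarily the exact form of equation (1), which also involves the indices t and r and cannot be recovered from the text. The symbols K (the shared dimension) and q (an index over column blocks of width β) are assumptions introduced here for illustration only.

```latex
% Standard matrix product (K = shared dimension, an assumed symbol)
C_{ij} = \sum_{k=1}^{K} A_{ik} B_{kj}

% The same sum grouped into blocks of width \beta (q is an assumed block index);
% each inner sum corresponds to one pair of sub-data matrices.
C_{ij} = \sum_{q} \; \sum_{k=(q-1)\beta+1}^{q\beta} A_{ik} B_{kj}
```

Read this way, each inner sum is what one row of multiply-add units produces for one pair of sub-matrices, and the outer sum is the accumulation across sub-matrices.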
In the present embodiment, the random access memory 14 is connected between the first memory 11 and the multiplier-adder array 15, and specifically, one random access memory 14 is connected between the first memory 11 and each of the multiplier-adder calculation units 151 in the multiplier-adder array 15. In the prior art, when performing matrix operation by the GEMM calculation circuit, the second input terminal V of the multiply-add calculation unit 151 receives the second sub-data matrix from the second memory 12 after all data in the first sub-data matrix stored in the first memory 11 is input to the first input terminal U of the multiply-add calculation unit 151, and all calculations in the multiply-add array are invalid before the data in the first sub-data matrix is input to the first input terminal U, which reduces the calculation efficiency. In this embodiment, N same-column first data are synchronously read and stored from the first sub-data matrix stored in the first memory 11 through the random access memory 14, and are sequentially and respectively input to the corresponding multiply-add calculating units 151 in N clock cycles. While the first N same-column first data is subjected to the matrix multiplication operation, the random access memory 14 reads another N same-column first data from the first memory 11. Therefore, the data in the first sub-data matrix do not need to be sent completely, the waiting time is reduced, and the calculation efficiency of matrix multiplication calculation is greatly improved.
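The data flow described above can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions, not the patented circuit: the function name `multiply_tile`, the row-to-unit mapping (unit row r serves tile rows r, r + α, r + 2α, ...) and the choice to hold one column of the second sub-matrix for N consecutive cycles are all hypothetical.

```python
# Behavioral sketch (not RTL). All names and the row mapping are hypothetical.

def multiply_tile(a_tile, b_tile, alpha, n):
    """Multiply an (n*alpha x beta) tile by a (beta x gamma) tile.

    Models an alpha x beta array of multiply-add units. Unit (r, c) owns a
    small RAM holding n values taken from column c of the first tile.
    Returns the product and the number of modeled clock cycles.
    """
    beta = len(b_tile)
    gamma = len(b_tile[0])
    assert len(a_tile) == n * alpha
    assert all(len(row) == beta for row in a_tile)

    # Preload: ram[r][c] holds the n same-column values for unit (r, c).
    # Assumption: unit row r serves tile rows r, r + alpha, r + 2*alpha, ...
    ram = [[[a_tile[r + s * alpha][c] for s in range(n)]
            for c in range(beta)] for r in range(alpha)]

    out = [[0] * gamma for _ in range(n * alpha)]
    cycles = 0
    for col in range(gamma):        # one column of the second tile is held for n cycles
        for s in range(n):          # each RAM presents one stored value per cycle
            for r in range(alpha):  # unit rows run in parallel in hardware
                out[r + s * alpha][col] = sum(
                    ram[r][c][s] * b_tile[c][col] for c in range(beta))
            cycles += 1
    return out, cycles
```

In this model a tile pair takes N × γ cycles, and each unit row produces N × γ results per tile pair.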
Specifically, as shown in Fig. 1, each multiply-add calculation unit 151 includes a multiplier 152 and an adder 153. The first input terminal U of the multiply-add calculation unit 151 is the first input terminal U of the multiplier 152, and the second input terminal V of the multiply-add calculation unit 151 is the second input terminal V of the multiplier 152. Specifically, the multiplier 152 receives the N same-column first data from the random access memory 14 one after another through the first input terminal U over N clock cycles, and reads the second data from the second sub-data matrix through the second input terminal V during the N clock cycles, so that the first data and the second data are multiplied in the multiplier 152 to obtain the multiplication result.
The first input H of the adder 153 is connected to the output of the multiplier 152 to receive the result of the multiplication of the first data and the second data output from the multiplier 152. An output terminal S of the adder 153 serves as an output terminal of the multiply-add computing unit 151, and a second input terminal W of the adder 153 is connected to an output terminal S of another multiply-add computing unit 151 adjacent to the same row, so that outputs of the multiply-add computing units 151 in the same row are cascaded together and then output. Specifically, the second input terminal of the first multiply-add computing unit 151 (i.e., the second input terminal W of the adder 153) located on the same row is grounded. After the first data and the second data are multiplied to obtain the multiplication result, the multiplication result is input to the adder 153, added to the output result of the previous multiply-add calculation unit 151 adjacent to the same row in the adder 153, and output to the adder 153 in the next multiply-add calculation unit 151 adjacent to the current multiply-add calculation unit 151 through the output terminal S.
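The row cascade described above amounts to a running sum of products. A minimal Python sketch follows; the function name `row_cascade` is hypothetical, and the grounded input W of the first unit is modeled as the value 0.

```python
def row_cascade(first_inputs, second_inputs):
    """Return the output S of the last multiply-add unit in one row."""
    carry = 0  # input W of the first unit is grounded, so it contributes 0
    for u, v in zip(first_inputs, second_inputs):
        product = u * v          # multiplier: input U times input V
        carry = product + carry  # adder: H (product) plus W (previous unit's S)
    return carry
```

The value returned for a row is the dot product of that row's first data and the second data presented to it in the same cycle.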
In this embodiment, the matrix multiplication circuit further includes an accumulation circuit 16 and a ring memory 17. The accumulation circuit 16 is connected between the multiplier-adder array 15 and the ring memory 17 to accumulate the calculation results of same-row first data across different first sub-data matrices and same-column second data across different second sub-data matrices. Specifically, the first memory 11 stores a plurality of first sub-data matrices of size Nα × β. After one first sub-data matrix and one second sub-data matrix have been calculated in the multiplier-adder array 15, the calculation result is input to the accumulation circuit 16, which transmits it to the ring memory 17 for storage. When the first pair of sub-data matrices is calculated, the data previously held in the ring memory 17 is invalid. After the next first sub-data matrix and second sub-data matrix have been calculated, the calculation result is transmitted to the accumulation circuit 16, which receives the previous calculation result from the ring memory 17, adds it to the current calculation result, and transmits the sum to the ring memory 17 for storage. This continues until the last first sub-data matrix and second sub-data matrix have been calculated, whereupon the ring memory 17 outputs the final calculation result, which is the result of the matrix multiplication. It should be noted that, after the accumulation circuit 16 receives the previous calculation result from the ring memory 17, the data currently stored in the ring memory 17 is invalidated, i.e. cleared.
Specifically, the accumulation circuit 16 includes a plurality of accumulation units 161, and each accumulation unit 161 is connected to the output terminal S of one row of multiply-add calculation units 151. The ring memory 17 includes a plurality of ring memory units 171, and each ring memory unit 171 is connected to one accumulation unit 161. As shown in Fig. 1, each accumulation unit 161 has two input terminals and one output terminal. One input terminal is connected to the output terminal S of the multiply-add calculation units 151 in the same row to receive the calculation result of the current first sub-data matrix and second sub-data matrix from the multiplier-adder array. The other input terminal is connected to a ring memory unit 171 of the ring memory 17 to receive the calculation result of the previous first sub-data matrix and second sub-data matrix stored in that ring memory unit 171. The output terminal is connected to a ring memory unit 171 of the ring memory 17. Each accumulation unit 161 adds the calculation result of the current first sub-data matrix and second sub-data matrix to that of the previous first sub-data matrix and second sub-data matrix, and transmits the sum through its output terminal to the ring memory unit 171 for storage. When the last first sub-data matrix and second sub-data matrix have been calculated, the calculation result is output to an external device. In one embodiment, the external device may be a display device for displaying the calculation result; in another embodiment, it may be a voice device for reading out the calculation result. This is not specifically limited here.
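The pairing of one accumulation unit with one ring memory unit can be sketched as a small Python class. This is an illustrative model under stated assumptions, not the circuit itself: the class name `RingAccumulator`, the per-slot valid flag, and clearing on `drain` (rather than on every read, as the text describes) are all hypothetical simplifications.

```python
class RingAccumulator:
    """One accumulation unit paired with one ring memory unit (hypothetical model)."""

    def __init__(self, depth):
        self.slots = [0] * depth
        self.valid = [False] * depth  # stale contents are ignored until overwritten
        self.pos = 0

    def accumulate(self, value):
        # Read the previous partial result; invalid (stale) data counts as zero.
        previous = self.slots[self.pos] if self.valid[self.pos] else 0
        self.slots[self.pos] = previous + value
        self.valid[self.pos] = True
        self.pos = (self.pos + 1) % len(self.slots)  # wrap around like a ring

    def drain(self):
        """Return the stored results and mark every slot invalid again."""
        results = list(self.slots)
        self.valid = [False] * len(self.slots)
        self.pos = 0
        return results
```

Feeding the partial results of each sub-matrix pair in the same order makes slot p hold the running sum for output position p; after `drain`, leftover values are treated as invalid for the next task.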
The first memory 11 and the second memory 12 may be sequential memories. A sequential memory is accessed in a fixed order, and its access time depends on the physical location of the storage unit. In another embodiment, the first memory 11 and the second memory 12 may be random access memories, or other types of memory; this is not limited here. The contents of any memory cell of the random access memory 14 can be accessed at random, regardless of the physical location of the memory cell. In one embodiment, the ring memory 17, the first memory 11, the second memory 12 and the random access memory 14 may be the same type of memory and differ only in how data is read and written. For example, data must be read from the first memory 11, the second memory 12 and the random access memory 14 in a predetermined order, whereas the ring memory 17 must be written after it has been read.
In one embodiment, N × γ is an integer that does not exceed the depth of the ring memory 17. N may be an integer greater than 1; in one embodiment, N is equal to 2.
In one embodiment, if the first sub-data matrix or the second sub-data matrix is smaller than the specified size, it can be extended by zero padding to match the specified size.
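Zero padding to the specified sub-matrix size can be sketched as follows. The helper name `pad_to_multiple` is hypothetical; it rounds each dimension up to the next multiple of the given sizes (for example Nα rows and β columns for the first array) and fills the new entries with zeros, which leaves the matrix product unchanged in the original positions.

```python
def pad_to_multiple(matrix, row_multiple, col_multiple):
    """Zero-pad a list-of-lists matrix so both dimensions are multiples of the given sizes."""
    rows, cols = len(matrix), len(matrix[0])
    padded_rows = -(-rows // row_multiple) * row_multiple  # round up to a multiple
    padded_cols = -(-cols // col_multiple) * col_multiple
    return [[matrix[r][c] if r < rows and c < cols else 0
             for c in range(padded_cols)]
            for r in range(padded_rows)]
```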
Further, in this embodiment, the matrix multiplication circuit also includes a pi phase shifter 13. The pi phase shifter 13 is connected between the second memory 12 and the second input terminal V of the multiplier-adder array 15, so that each multiply-add calculation unit 151 reads data from the random access memory 14 at the same frequency as it reads data from the second memory 12. Specifically, the pi phase shifter 13 shifts the clock with which the multiplier-adder array 15 reads the second data of the second sub-data matrix by 180 degrees relative to the read clock of the random access memory 14, so that the read frequency of the multiply-add calculation unit 151 from the random access memory 14 is the same as its read frequency from the second memory 12.
In one embodiment, N may be set to 2 to keep the area occupied by the random access memory 14 from becoming too large. When N is equal to 2, the multiplier-adder array 15 reads the N same-column first data of the first sub-data matrix, supplied from the first memory 11 through the random access memory 14, at twice the rate at which it reads the second data of the second sub-data matrix from the second memory 12. In other words, the read clock frequency of the random access memory 14 is twice that of the second memory 12, which can cause high power consumption during data transfer. With the pi phase shifter 13 placed between the second memory 12 and the second input terminal V of the multiplier-adder array 15, the read clock frequency of the random access memory 14 can be halved, which eases the wiring of the random access memory 14 and of the system as a whole and reduces the power consumed during data transfer.
To ensure that the calculation result remains correct after the read clock frequency of the random access memory 14 is halved, once the pi phase shifter 13 is enabled, that is, once the data at the second input terminal V is phase-shifted by 180 degrees, the storage locations of the 3rd and 4th data in each group can be swapped, where each group contains 4 data. After the swap, the data is the same as when the pi phase shifter 13 is not enabled. See Figs. 3a and 3b: Fig. 3a shows the result of multiplying the first data and the second data in the multiplier 152 when the pi phase shifter 13 is not enabled, and Fig. 3b shows the result when the pi phase shifter 13 is enabled. As shown in Fig. 3b, with the storage locations of the 3rd and 4th data in each group swapped, the result is the same as in Fig. 3a, which ensures the accuracy of the calculation result after the read clock frequency of the random access memory 14 is halved.
Fig. 2 is a schematic structural diagram of a second embodiment of the matrix multiplication circuit of the present invention. It differs from the first embodiment in that this embodiment further includes a switching circuit 18 and an output circuit 19, and the ring memory 17 in this embodiment includes a plurality of ring memories.
As shown in Fig. 2, the switching circuit 18 is connected to the accumulation circuit 16 and the ring memory 17. The ring memory 17 includes a plurality of ring memories; in one embodiment, it includes a first ring memory 172 and a second ring memory 173. When the accumulation circuit 16 is connected to the first ring memory 172, the output circuit 19 is connected to the second ring memory 173; when the accumulation circuit 16 is connected to the second ring memory 173, the output circuit 19 is connected to the first ring memory 172.
Specifically, as shown in Fig. 2, the first ring memory 172 is connected to the multiplier-adder array 15 through the accumulation circuit 16 and stores the result of the matrix multiplication of the first data and the second data in the current calculation task. The second ring memory 173 is connected to the output circuit 19 and outputs the result of the matrix multiplication of the first data and the second data from the previous calculation task. For example, suppose two matrix calculations, A × B and C × D, are required. After A × B has been calculated and stored in the first ring memory 172, the first ring memory 172 is disconnected from the multiplier-adder array 15 and connected to the output circuit 19, through which the calculation result is output. At the same time as the first ring memory 172 is disconnected from the multiplier-adder array 15, the second ring memory 173 is connected to the multiplier-adder array 15 through the accumulation circuit 16 to store the data generated during the calculation of C × D.
Further, the switching circuit 18 switches the connections between the first ring memory 172 and the second ring memory 173 on one side and the accumulation circuit 16 and the output circuit 19 on the other, so that either the first ring memory 172 is connected to the accumulation circuit 16 and the second ring memory 173 to the output circuit 19, or the second ring memory 173 is connected to the accumulation circuit 16 and the first ring memory 172 to the output circuit 19.
By providing a plurality of ring memories and controlling whether each is connected to the accumulation circuit or to the output circuit, the matrix multiplication circuit of this embodiment can start the next calculation task without waiting once the current task is finished, which improves calculation efficiency.
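The role swap between two ring memories is a form of double buffering (often called ping-pong buffering). The Python sketch below illustrates the idea only; the class name `PingPongRingMemories`, the use of plain lists for the memories, and clearing the newly attached memory at each switch are assumptions, not details taken from the patent.

```python
class PingPongRingMemories:
    """Two ring memories that alternate between compute and output roles (hypothetical model)."""

    def __init__(self):
        self.memories = [[], []]  # two ring memories, modeled as plain lists
        self.compute_side = 0     # index currently wired to the accumulation circuit

    def store(self, value):
        """Write a result into the memory attached to the accumulation circuit."""
        self.memories[self.compute_side].append(value)

    def switch(self):
        """Swap roles at a task boundary; the newly attached compute side starts empty."""
        self.compute_side ^= 1
        self.memories[self.compute_side] = []

    def read_output(self):
        """Read the memory currently wired to the output circuit."""
        return list(self.memories[self.compute_side ^ 1])
```

In use, the results of one task are stored, `switch` is called at the task boundary, and the next task's results are stored while `read_output` still returns the finished task's results.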
The matrix multiplication circuit provided by the utility model places a random access memory between the multiplier-adder array and the first memory, so that matrix calculation can begin without waiting for all the data of the first sub-data matrix to be input to the first input terminals of the multiply-add calculation units. This saves waiting time during calculation and improves calculation efficiency. By providing the pi phase shifter 13 between the second memory and the multiplier-adder array, the clock frequency at which the multiplier-adder array reads the random access memory 14 is halved, and the read frequency of the multiply-add calculation unit 151 from the random access memory 14 becomes the same as its read frequency from the second memory 12. This eases system wiring and reduces the power consumed during data transfer. By providing a plurality of ring memories and controlling whether each is connected to the accumulation circuit or to the output circuit, the next calculation task can start without waiting once the current task is finished, which further improves calculation efficiency.
The above are only embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process derived from the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (10)

1. A matrix multiplication circuit, comprising:
a first memory for storing a first array of data, wherein the first array of data is defined as a plurality of first sub-data matrices of N α × β;
a second memory for storing a second array data, wherein the second array data is defined as a plurality of second sub data matrices of β × γ;
and a multiplier-adder array comprising a plurality of multiply-add calculation units arranged in an α × β matrix, wherein a first input terminal of each multiply-add calculation unit is connected to the first memory through a corresponding random access memory so as to synchronously read and store N same-column first data from the first sub-data matrix and input the N same-column first data one after another to the corresponding multiply-add calculation unit over N clock cycles, and a second input terminal of each multiply-add calculation unit is connected to the second memory so as to read second data from the second sub-data matrix during the N clock cycles, so that matrix multiplication is performed on the first data and the second data.
2. The matrix multiplication circuit according to claim 1, wherein each of the multiplier-adder calculation units comprises a multiplier and an adder, wherein a first input terminal of the multiplier serves as the first input terminal of the multiplier-adder calculation unit to sequentially receive, over the N clock cycles, the N same-column first data transmitted from the random access memory; a second input terminal of the multiplier serves as the second input terminal of the multiplier-adder calculation unit to read the second data from the second sub-data matrix within the N clock cycles;
a first input terminal of the adder is connected to the output terminal of the multiplier to receive the output result of the multiplier, the output terminal of the adder serves as the output terminal of the corresponding multiplier-adder calculation unit, and a second input terminal of the adder is connected to the output terminal of another, adjacent multiplier-adder calculation unit in the same row, so that the outputs of the multiplier-adder calculation units in the same row are cascaded together and then output.
3. The matrix multiplication circuit according to claim 2, further comprising:
and when the first ring memory is connected to the multiplier-adder array to store a calculation result of the matrix multiplication of the first data and the second data, the second ring memory outputs to an external device the calculation result, stored in the second ring memory, of the matrix multiplication of the first data and the second data in the previous calculation task.
4. The matrix multiplication circuit of claim 3, further comprising:
an accumulation circuit connected between the multiplier-adder array and the ring memories to accumulate the calculation results of the same-row first data among different first sub-data matrices and the same-column second data among different second sub-data matrices.
5. The matrix multiplication circuit of claim 4, further comprising:
an output circuit connected to the second ring memory and configured to output the calculation result, stored in the second ring memory, of the matrix multiplication of the first data and the second data in the previous calculation task.
6. The matrix multiplication circuit of claim 5, further comprising:
a switching circuit for switching among the plurality of ring memories, the accumulation circuit and the output circuit, so that some of the ring memories are connected to the accumulation circuit and the remaining ring memories are connected to the output circuit.
7. The matrix multiplication circuit of claim 1 wherein N is equal to 2.
8. The matrix multiplication circuit of claim 7, further comprising:
a π phase shifter connected between the second memory and the second input terminal of the multiplier-adder array, so that each of the multiplier-adder calculation units reads from the random access memory at the same frequency as it reads from the second memory.
9. The matrix multiplication circuit of claim 4, wherein the accumulation circuit comprises a plurality of accumulation units, each of the accumulation units being connected to the output terminal of the multiplier-adder calculation units of a corresponding row.
10. The matrix multiplication circuit according to claim 9, wherein each of the ring memories comprises a plurality of ring memory cells, and each of the ring memory cells is connected to one of the accumulation units.
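The data flow recited in claims 1, 2 and 4 can be illustrated with a software model. The following is a hedged sketch only, not the claimed hardware. It assumes that a first sub-data matrix has N·α rows and β columns, that the buffered row `cycle * alpha + i` is fed to array row `i` in clock cycle `cycle`, and that all matrix dimensions are exact multiples of the tile sizes. The function names `mac_row`, `multiply_tile` and `matmul_tiled` are hypothetical.

```python
def mac_row(a_values, b_values):
    """One row of multiplier-adder units: each unit multiplies its pair of
    inputs and adds the partial sum cascaded from its neighbour (claim 2)."""
    partial = 0
    for a, b in zip(a_values, b_values):
        partial = a * b + partial
    return partial


def multiply_tile(a_sub, b_sub, alpha, n):
    """Multiply an (n*alpha) x beta tile by a beta x gamma tile.
    The loop over `cycle` stands in for the N clock cycles in which the
    buffered first data are fed to the array one group at a time."""
    beta, gamma = len(b_sub), len(b_sub[0])
    out = [[0] * gamma for _ in range(n * alpha)]
    for j in range(gamma):
        b_col = [b_sub[k][j] for k in range(beta)]
        for cycle in range(n):
            for i in range(alpha):  # the alpha rows run in parallel in hardware
                row = cycle * alpha + i  # assumed row-to-cycle mapping
                out[row][j] = mac_row(a_sub[row], b_col)
    return out


def matmul_tiled(a, b, alpha, beta, gamma, n):
    """Tile a and b, multiply tile by tile, and accumulate partial results
    across tiles that share the inner dimension (role of claim 4)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, n * alpha):
        for c0 in range(0, cols, gamma):
            for k0 in range(0, inner, beta):
                a_sub = [r[k0:k0 + beta] for r in a[r0:r0 + n * alpha]]
                b_sub = [r[c0:c0 + gamma] for r in b[k0:k0 + beta]]
                part = multiply_tile(a_sub, b_sub, alpha, n)
                for i, prow in enumerate(part):
                    for j, v in enumerate(prow):
                        c[r0 + i][c0 + j] += v  # accumulation step
    return c


a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0, 2], [0, 1, 3]]
result = matmul_tiled(a, b, alpha=2, beta=1, gamma=3, n=2)
```

With `beta=1` the inner dimension is split into two tiles, so the example exercises the accumulation across different sub-data matrices as well as the per-row cascade.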

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202020657493.7U CN212112470U (en) 2020-04-24 2020-04-24 Matrix multiplication circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202020657493.7U CN212112470U (en) 2020-04-24 2020-04-24 Matrix multiplication circuit

Publications (1)

Publication Number Publication Date
CN212112470U true CN212112470U (en) 2020-12-08

Family

ID=73612583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202020657493.7U Active CN212112470U (en) 2020-04-24 2020-04-24 Matrix multiplication circuit

Country Status (1)

Country Link
CN (1) CN212112470U (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581595A (en) * 2020-04-24 2020-08-25 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit
CN112801276A (en) * 2021-02-08 2021-05-14 清华大学 Data processing method, processor and electronic equipment
CN113838497A (en) * 2021-09-23 2021-12-24 南京后摩智能科技有限公司 Simplified integrated circuit for data reading

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581595A (en) * 2020-04-24 2020-08-25 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit
CN111581595B (en) * 2020-04-24 2024-02-13 科大讯飞股份有限公司 Matrix multiplication calculation method and calculation circuit
CN112801276A (en) * 2021-02-08 2021-05-14 清华大学 Data processing method, processor and electronic equipment
CN113838497A (en) * 2021-09-23 2021-12-24 南京后摩智能科技有限公司 Simplified integrated circuit for data reading


Legal Events

Date Code Title Description
GR01 Patent grant