CN117786293A - Matrix device and method of operating the same - Google Patents


Info

Publication number
CN117786293A
Authority
CN
China
Prior art keywords
matrix
elements
string
memory
native
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211274537.8A
Other languages
Chinese (zh)
Inventor
郭皇志
阮郁善
陈建文
骆子仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chuangxin Wisdom Co ltd
Original Assignee
Chuangxin Wisdom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chuangxin Wisdom Co ltd filed Critical Chuangxin Wisdom Co ltd
Publication of CN117786293A publication Critical patent/CN117786293A/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention provides a matrix device and a method of operating the same. A transpose circuit receives a first element string representing a native matrix from a matrix source, wherein all elements of the native matrix are arranged in the first element string in one of a "row-major" and a "column-major" manner. The transpose circuit transposes the first element string into a second element string, where the second element string is equivalent to an element string in which all elements of the native matrix are arranged in the other of the "row-major" and "column-major" manners. A memory is coupled to the transpose circuit to receive the second element string.

Description

Matrix device and method of operating the same
Technical Field
The invention relates to a matrix device and a method of operating the same.
Background
Matrix multiplication is a fundamental operation in computer systems. After an arithmetic circuit completes a previous matrix operation, the elements of the result matrix are written sequentially into a dynamic random access memory (DRAM) in the order in which the previous matrix operation generated them. For example, the matrix may be stored in the DRAM in "column-major" or "row-major" order. However, the placement order of the elements of the previous matrix operation in the DRAM may be unfavorable for fetching by the next matrix operation. For example, the result matrix of the previous matrix operation may be stored in the DRAM in the column-major manner while the operand (operand) matrix of the next matrix operation is consumed in the row-major manner. For the next matrix operation, the elements of the operand matrix are then scattered across discrete addresses of the DRAM.
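The address arithmetic behind this problem can be sketched in a few lines (an illustrative model, not part of the patent; the function names are ours): in row-major storage, element (r, c) of a matrix with n_cols columns sits at offset r*n_cols + c, while in column-major storage it sits at offset c*n_rows + r, so elements that are adjacent in one order are separated in the other.

```python
# Illustrative sketch: flattened address offsets of a matrix element
# under the two storage orders discussed above.

def row_major_index(r, c, n_cols):
    # Offset of element (r, c) when the matrix is stored row-major.
    return r * n_cols + c

def col_major_index(r, c, n_rows):
    # Offset of element (r, c) when the matrix is stored column-major.
    return c * n_rows + r

# For a 2x2 matrix, the two elements of row 0 are adjacent in row-major
# storage (offsets 0 and 1) but discrete in column-major storage
# (offsets 0 and 2) -- the situation the background describes.
print(row_major_index(0, 0, 2), row_major_index(0, 1, 2))  # 0 1
print(col_major_index(0, 0, 2), col_major_index(0, 1, 2))  # 0 2
```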
When the elements fetched in one batch for the next matrix operation lie at consecutive addresses in the DRAM, the arithmetic circuit can read them from the DRAM all at once using a single burst read command. When the elements fetched for the next matrix operation lie at discrete addresses in the DRAM, the arithmetic circuit must issue multiple read commands to read the elements from the DRAM over multiple accesses. In general, the number of DRAM reads is proportional to power consumption. How to store the matrix generated by the previous matrix operation in the DRAM so that the next matrix operation can fetch it efficiently is therefore an important issue. If the number of DRAM accesses made while fetching the matrix can be reduced, the performance of matrix operations is effectively improved and circuit power consumption is effectively reduced.
It should be noted that the content of the "Background" section is intended to aid understanding of the invention. Some (or all) of the content disclosed in the "Background" section may not be known to persons skilled in the art. The disclosure in the "Background" section is not an admission of what was known to persons of ordinary skill in the art before the filing of the present application.
Disclosure of Invention
The invention provides a matrix device and a method of operating the same that improve performance.
The invention provides a matrix device including a transpose circuit and a memory. The transpose circuit receives a first element string representing a native matrix from a matrix source and transposes the first element string into a second element string, wherein all elements of the native matrix are arranged in the first element string in one of a "row-major" and a "column-major" manner, and the second element string is equivalent to an element string in which all elements of the native matrix are arranged in the other of the "row-major" and "column-major" manners. The memory is coupled to the transpose circuit to receive the second element string.
In an embodiment of the invention, the method of operating the matrix device includes: receiving, by a transpose circuit of the matrix device, a first element string representing a native matrix from a matrix source; transposing, by the transpose circuit, the first element string into a second element string, wherein all elements of the native matrix are arranged in the first element string in one of a "row-major" and a "column-major" manner, and the second element string is equivalent to an element string in which all elements of the native matrix are arranged in the other manner; and receiving, by a memory of the matrix device, the second element string.
Based on the above, the transpose circuit of the embodiments of the invention can, through transposition, make the element arrangement in the memory match the access characteristics of the computation. The performance of the matrix device can therefore be effectively improved.
In order to make the above features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a schematic circuit block diagram of a matrix device according to an embodiment of the present invention.
Fig. 2 is a schematic circuit block diagram of a matrix device according to another embodiment of the invention.
FIG. 3 is a schematic diagram illustrating the element storage locations in the memory when the transpose circuit does not transpose.
Fig. 4 is a schematic diagram illustrating the element storage locations in the memory when the transpose circuit 210 transposes.
FIG. 5 is a schematic diagram showing the storage of elements in a static random access memory.
Fig. 6 is a flow chart of a method of operating a matrix device according to an embodiment of the invention.
Description of the reference numerals
100, 200: matrix device
110, 210: transpose circuit
120, 220, 240: memory
230: matrix multiplication circuit
A0, A1, A2, A3, B0, B1, B2, B3, C0, C1: addresses
ES1, ES2, ES3: element strings
S601, S602, S603: steps
X00, X01, X10, X11, Y00, Y01, Y10, Y11: elements
Detailed Description
Reference will now be made in detail to the exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
The term "coupled" as used throughout this specification (including the claims) may refer to any direct or indirect connection. For example, if a first device is coupled (or connected) to a second device, the connection may be direct, or indirect through other devices and connections. The terms "first", "second", and the like in the description and claims are used to name elements or to distinguish between different embodiments or ranges, not to limit the number or order of elements. In addition, wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts. Components/elements/steps that use the same reference numerals or the same terminology in different embodiments may be cross-referenced.
Fig. 1 is a schematic circuit block diagram of a matrix device 100 according to an embodiment of the invention. The matrix device 100 shown in fig. 1 includes a transpose circuit 110 and a memory 120. According to various design requirements, in some embodiments the transpose circuit 110 may be implemented as a hardware circuit. In other embodiments, the transpose circuit 110 may be implemented as firmware, software, or a combination of the two. In still other embodiments, the transpose circuit 110 may be implemented as a combination of two or more of hardware, firmware, and software.
In hardware, the transpose circuit 110 may be implemented as logic circuits on an integrated circuit. For example, the relevant functions of the transpose circuit 110 may be implemented in various logic blocks, modules, and circuits in one or more controllers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and/or other processing units. The relevant functions of the matrix device, the transpose circuit, and/or the memory described above may be implemented as hardware circuits, such as various logic blocks, modules, and circuits in an integrated circuit, using a hardware description language (e.g., Verilog HDL or VHDL) or another suitable programming language.
In software and/or firmware, the relevant functions of the transpose circuit 110 may be implemented as programming code. For example, the transpose circuit 110 is implemented using a general-purpose programming language (e.g., C, C++, or assembly language) or another suitable programming language. The programming code may be recorded/stored on a non-transitory computer readable medium. In some embodiments, the non-transitory computer readable medium includes, for example, a semiconductor memory and/or a storage device. The semiconductor memory includes a memory card, a read only memory (ROM), a flash memory, a programmable logic circuit, or other semiconductor memory. The storage device includes a tape, a disk, a hard disk drive (HDD), a solid-state drive (SSD), or other storage device. An electronic device, such as a central processing unit (CPU), a controller, a microcontroller, or a microprocessor, can read and execute the programming code from the non-transitory computer readable medium to realize the relevant functions of the transpose circuit 110.
The transpose circuit 110 may receive an element string ES1 representing a native matrix from a matrix source (not shown in fig. 1). This embodiment does not limit the matrix source. For example, in some embodiments the matrix source may include a storage device, a network, a matrix multiplication circuit, or another source that provides an operand (operand) matrix. In some embodiments, the matrix multiplication circuit may include a multiply-accumulate (MAC) array.
The transpose circuit 110 may transpose the element string ES1 into the element string ES2, where all elements of the native matrix are arranged in the element string ES1 in one of the "row-major" and "column-major" manners, and the element string ES2 is equivalent to an element string in which all elements of the native matrix are arranged in the other of the two manners. For example, assume the content of the native matrix A is as shown in equation 1 below:

A = | X00  X01 |
    | X10  X11 |    (equation 1)

The content of the element string ES1, in which the native matrix A is arranged in the "row-major" manner, is {X00, X01, X10, X11}. After transposition by the transpose circuit 110, the native matrix A becomes the element string ES2 arranged in the "column-major" manner, and the content of the element string ES2 is {X00, X10, X01, X11}.
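The reordering performed here can be sketched in a few lines of Python (an illustrative model of the data movement only, not the circuit itself; the function name is ours):

```python
# Sketch of the transpose described above: reorder the flat "row-major"
# element string of an n_rows x n_cols matrix into the equivalent
# "column-major" element string.

def transpose_element_string(es, n_rows, n_cols):
    # es[r * n_cols + c] is element (r, c); emit the elements column by column.
    return [es[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]

ES1 = ["X00", "X01", "X10", "X11"]        # row-major string of matrix A
ES2 = transpose_element_string(ES1, 2, 2)
print(ES2)  # ['X00', 'X10', 'X01', 'X11']
```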
The memory 120 is coupled to the transpose circuit 110. The transpose circuit 110 transposes the element string ES1 of the native matrix to obtain the element string ES2 and transmits the element string ES2 to the memory 120. The memory 120 may be any type of memory according to the actual design. For example, in some embodiments the memory 120 may be a static random access memory (SRAM), a dynamic random access memory (DRAM), a magnetoresistive random access memory (MRAM), a flash memory, or another memory. The memory 120 receives and stores the element string ES2 as an operand (operand) matrix for the next matrix operation.
For example, fig. 2 is a schematic circuit block diagram of a matrix device 200 according to another embodiment of the invention. The matrix device 200 shown in fig. 2 includes a transpose circuit 210, a memory 220, a matrix multiplication circuit 230, and a memory 240. The matrix device 200, the transpose circuit 210, and the memory 220 of fig. 2 can be described with reference to the matrix device 100, the transpose circuit 110, and the memory 120 of fig. 1, and so forth, and thus are not described in detail herein. The matrix device 200 shown in fig. 2 may be used as one of many embodiments of the matrix device 100 shown in fig. 1, and thus the matrix device 100, the transpose circuit 110, and the memory 120 shown in fig. 1 may be described with reference to the matrix device 200, the transpose circuit 210, and the memory 220 shown in fig. 2.
The matrix multiplication circuit 230 is coupled to the transpose circuit 210, the memory 220, and the memory 240. The matrix multiplication circuit 230 may perform a front-layer computation of a neural network computation to generate the native matrix. The matrix multiplication circuit 230 may serve as the matrix source and provide the element string ES1 of the native matrix to the transpose circuit 210. The transpose circuit 210 may transpose the element string ES1 into the element string ES2. The memory 220 is coupled to the transpose circuit 210 to receive and store the element string ES2. The matrix multiplication circuit 230 may read the element string ES3 (matrix A) from the memory 240 as a weight matrix and read the element string ES2 (matrix B) from the memory 220 as an input matrix for the next-layer computation of the neural network computation. Generally, the weight matrix is a pre-trained parameter.
For example, assume the memory 220 includes a dynamic random access memory (DRAM). Based on the transpose operation of the transpose circuit 210, all elements of the same column of the native matrix (the result of the front-layer computation) may be stored at multiple consecutive addresses of the memory 220. The memory 220 can provide all elements of the same column of the native matrix to the matrix multiplication circuit 230 in burst mode, so that the matrix multiplication circuit 230 performs the next-layer computation of the neural network computation.
This embodiment does not limit the matrix operations of the matrix multiplication circuit 230. In some applications, the matrix operations may include matrix addition, matrix multiplication, multiply-accumulate (MAC) operations, and/or other matrix operations. For example, assume the content of the native matrix A is as shown in equation 1 above, and the content of the native matrix B is as shown in equation 2 below. Multiplying the two 2x2 matrices A and B yields the matrix Z shown in equation 3 below:

B = | Y00  Y01 |
    | Y10  Y11 |    (equation 2)

Z = A x B = | X00·Y00+X01·Y10  X00·Y01+X01·Y11 |
            | X10·Y00+X11·Y10  X10·Y01+X11·Y11 |    (equation 3)
The matrix multiplication performed by the matrix multiplication circuit 230 may include four steps. Step one: the matrix multiplication circuit 230 may fetch the elements [X00, X01] of matrix A from the memory 240, fetch the elements [Y00, Y10] of matrix B from the memory 220, and compute X00·Y00+X01·Y10. Step two: the matrix multiplication circuit 230 may retain the elements [X00, X01] of matrix A, fetch the elements [Y01, Y11] of matrix B from the memory 220, and compute X00·Y01+X01·Y11. Step three: the matrix multiplication circuit 230 may fetch the elements [X10, X11] of matrix A from the memory 240, fetch the elements [Y00, Y10] of matrix B from the memory 220, and compute X10·Y00+X11·Y10. Step four: the matrix multiplication circuit 230 may retain the elements [X10, X11] of matrix A, fetch the elements [Y01, Y11] of matrix B from the memory 220, and compute X10·Y01+X11·Y11. At this point, the matrix multiplication circuit 230 has obtained the matrix Z shown in equation 3.
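The four steps can be checked with ordinary numbers standing in for the memory fetches (a verification sketch only; the values are arbitrary):

```python
# Verification sketch of the four-step 2x2 multiplication described above,
# with arbitrary numbers in place of the fetched elements.
A = [[1, 2], [3, 4]]   # X00, X01 / X10, X11
B = [[5, 6], [7, 8]]   # Y00, Y01 / Y10, Y11

Z = [[0, 0], [0, 0]]
Z[0][0] = A[0][0] * B[0][0] + A[0][1] * B[1][0]  # step one
Z[0][1] = A[0][0] * B[0][1] + A[0][1] * B[1][1]  # step two
Z[1][0] = A[1][0] * B[0][0] + A[1][1] * B[1][0]  # step three
Z[1][1] = A[1][0] * B[0][1] + A[1][1] * B[1][1]  # step four
print(Z)  # [[19, 22], [43, 50]]
```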
The four-step matrix multiplication described in the preceding paragraph performs six reads on the memory 220. If the computation exploits data reuse, the matrix multiplication can be simplified from four steps to two optimization steps. Optimization step one: the matrix multiplication circuit 230 may fetch the elements [X00, X10] of matrix A from the memory 240, fetch the elements [Y00, Y01] of matrix B from the memory 220, and compute X00·Y00, X00·Y01, X10·Y00, and X10·Y01. Optimization step two: the matrix multiplication circuit 230 may fetch the elements [X01, X11] of matrix A from the memory 240, fetch the elements [Y10, Y11] of matrix B from the memory 220, and compute X01·Y10, X01·Y11, X11·Y10, and X11·Y11. At this point, the matrix multiplication circuit 230 can use the products X00·Y00, X00·Y01, X10·Y00, X10·Y01, X01·Y10, X01·Y11, X11·Y10, and X11·Y11 of optimization steps one and two to obtain the matrix Z shown in equation 3.
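The two optimization steps amount to accumulating one rank-1 (outer-product) update per fetched column of A and row of B — that reformulation is our reading of the steps, not the patent's wording. A sketch with arbitrary stand-in numbers:

```python
# Sketch (our reformulation) of the two optimization steps: each step
# fetches one column of A and one row of B, forms the four partial
# products, and accumulates them into Z.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Z = [[0, 0], [0, 0]]

for k in range(2):                  # k = 0: optimization step one; k = 1: step two
    col_a = [A[0][k], A[1][k]]      # e.g. [X00, X10] for k = 0
    row_b = [B[k][0], B[k][1]]      # e.g. [Y00, Y01] for k = 0
    for i in range(2):
        for j in range(2):
            Z[i][j] += col_a[i] * row_b[j]

print(Z)  # [[19, 22], [43, 50]] -- same result with fewer element fetches
```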
As a comparison with fig. 4, fig. 3 shows the element storage locations in the memories 220 and 240 when the transpose circuit 210 does not transpose (i.e., the element string ES2 is identical to the element string ES1). It is assumed here that matrix A is stored in the memory 240 in the column-major manner and that all elements of matrix B are likewise arranged in the column-major manner in the element string ES1. That is, matrix B is stored in the memory 220 in the column-major manner. In optimization step one, the matrix multiplication circuit 230 can fetch the elements [X00, X10] of matrix A from the consecutive addresses A0 and A1 of the memory 240 in burst mode. Because the elements [Y00, Y01] of matrix B are located at discrete addresses B0 and B2 of the memory 220, they cannot be fetched in a single burst, so the matrix multiplication circuit 230 fetches element [Y00] and element [Y01] from the memory 220 in two reads. In optimization step two, the matrix multiplication circuit 230 can fetch the elements [X01, X11] of matrix A from the consecutive addresses A2 and A3 of the memory 240 in burst mode. Because the elements [Y10, Y11] of matrix B are located at discrete addresses B1 and B3 of the memory 220, they cannot be fetched in a single burst, so the matrix multiplication circuit 230 fetches element [Y10] and element [Y11] from the memory 220 in two reads.
Fig. 4 is a schematic diagram illustrating the element storage locations in the memories 220 and 240 when the transpose circuit 210 transposes. It is assumed here that matrix A is stored in the memory 240 in the column-major manner and that all elements of matrix B are likewise arranged in the column-major manner in the element string ES1. Based on the transpose operation of the transpose circuit 210, the element string ES2 is equivalent to an element string in which all elements of the native matrix B are arranged in the row-major manner. The element string ES2 is stored sequentially and consecutively in the memory 220. That is, matrix B is stored in the memory 220 in the row-major manner, as shown in fig. 4. In optimization step one, the matrix multiplication circuit 230 can fetch the elements [X00, X10] of matrix A from the consecutive addresses A0 and A1 of the memory 240 in burst mode, and fetch the elements [Y00, Y01] of matrix B from the consecutive addresses B0 and B1 of the memory 220 in burst mode. In optimization step two, the matrix multiplication circuit 230 can fetch the elements [X01, X11] of matrix A from the consecutive addresses A2 and A3 of the memory 240 in burst mode, and fetch the elements [Y10, Y11] of matrix B from the consecutive addresses B2 and B3 of the memory 220 in burst mode.
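The benefit of the transpose can be made concrete with a rough read-transaction model (our assumption, not a claim of the patent: one burst covers each maximal run of consecutive addresses):

```python
# Rough model: count read transactions needed to fetch a list of
# addresses, where each maximal run of consecutive addresses costs
# one (burst) read.

def read_transactions(addresses):
    reads = 0
    prev = None
    for a in addresses:
        if prev is None or a != prev + 1:
            reads += 1          # start of a new burst
        prev = a
    return reads

# Without transpose (B stored column-major): one optimization step
# fetches [Y00, Y01] at discrete addresses B0 and B2.
print(read_transactions([0, 2]))  # 2
# With transpose (B stored row-major): the same step fetches [Y00, Y01]
# at consecutive addresses B0 and B1.
print(read_transactions([0, 1]))  # 1
```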
FIG. 5 is a schematic diagram showing the storage of elements in a static random access memory (SRAM). In the embodiment shown in fig. 5, the memory 220 may be an SRAM whose depth is 2 (two addresses) and whose data width is 2 (two elements). It is assumed here that all elements of matrix B are arranged in the column-major manner in the element string ES1. Based on the transpose operation of the transpose circuit 210, all elements of matrix B are arranged in the row-major manner in the element string ES2. That is, matrix B is stored in the memory 220 (SRAM) in the row-major manner, as shown in fig. 5. In optimization step one, the matrix multiplication circuit 230 may fetch the elements [X00, X10] of matrix A from consecutive addresses of the memory 240 (e.g., a DRAM) in burst mode, and fetch the elements [Y00, Y01] of matrix B from the address C0 of the memory 220 (SRAM). In optimization step two, the matrix multiplication circuit 230 may fetch the elements [X01, X11] of matrix A from consecutive addresses of the memory 240 (DRAM) in burst mode, and fetch the elements [Y10, Y11] of matrix B from the address C1 of the memory 220 (SRAM).
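A toy model of this SRAM organization (depth 2, data width 2; the layout is assumed from fig. 5) shows why each optimization step needs only one access to fetch a row of B:

```python
# Toy model of the SRAM in fig. 5: two addresses, each holding a word
# of two elements (row-major layout of matrix B after the transpose).
sram = {
    0: ["Y00", "Y01"],  # address C0
    1: ["Y10", "Y11"],  # address C1
}

def read_word(addr):
    # A single SRAM access returns the whole word at `addr`.
    return sram[addr]

print(read_word(0))  # one access covers optimization step one's row of B
print(read_word(1))  # one access covers optimization step two's row of B
```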
Fig. 6 is a flow chart of a method of operating a matrix device according to an embodiment of the invention. Please refer to fig. 1 and fig. 6. In step S601, the transpose circuit 110 of the matrix device 100 receives an element string ES1 (first element string) representing a native matrix from a matrix source, wherein all elements of the native matrix are arranged in the element string ES1 in one of the "row-major" and "column-major" manners. In step S602, the transpose circuit 110 transposes the element string ES1 into an element string ES2 (second element string), wherein the element string ES2 is equivalent to an element string in which all elements of the native matrix are arranged in the other of the "row-major" and "column-major" manners. In step S603, the memory 120 of the matrix device 100 receives and stores the element string ES2 as an operand matrix for the next matrix operation.
In summary, the transpose circuit of the above embodiments can, through transposition, make the element arrangement in the memory conform to the access characteristics of the computation. The matrix device can therefore reduce the energy and time consumed by memory accesses and reads, effectively improving the performance of the matrix device.
Although the invention has been described with reference to the above embodiments, the invention is not limited thereto; modifications and variations may be made without departing from the spirit and scope of the invention.

Claims (14)

1. A matrix device, comprising:
a transpose circuit to receive a first element string representing a native matrix from a matrix source and transpose the first element string into a second element string, wherein all elements of the native matrix are arranged in the first element string in one of a row-major manner and a column-major manner, and the second element string is equivalent to an element string in which all elements of the native matrix are arranged in the other of the row-major manner and the column-major manner; and
a memory coupled to the transpose circuit to receive the second element string.
2. The matrix device of claim 1, wherein the matrix source comprises a storage device, a network, or a matrix multiplication circuit.
3. The matrix device of claim 2 wherein the matrix multiplication circuit comprises a multiply accumulator array.
4. The matrix device of claim 1, further comprising:
a matrix multiplication circuit coupled to the transpose circuit and the memory, wherein the matrix multiplication circuit performs a front-level computation of a neural network computation to generate the native matrix, the matrix multiplication circuit acts as the matrix source to provide the first element string of the native matrix to the transpose circuit, and the matrix multiplication circuit reads the second element string from the memory to perform a next-level computation of the neural network computation.
5. The matrix device of claim 4, wherein the memory comprises a dynamic random access memory that provides all elements of a column of the native matrix in burst mode to the matrix multiplication circuit for the next level of computation of the neural network computation.
6. The matrix device of claim 5, wherein all elements of a column of the native matrix are stored at a plurality of consecutive addresses of the memory.
7. The matrix device of claim 1, wherein all elements of the native matrix are arranged in the column-major manner in the first element string, the second element string is equivalent to the element string in which all elements of the native matrix are arranged in the row-major manner, and the second element string is stored sequentially and consecutively in the memory.
8. A method of operating a matrix device, comprising:
receiving, by a transpose circuit of the matrix apparatus, a first string of elements representing a native matrix from a matrix source;
transposing, by the transpose circuit, the first element string into a second element string, wherein all elements of the native matrix are arranged in the first element string in one of a row-major manner and a column-major manner, and the second element string is equivalent to the element string in which all elements of the native matrix are arranged in the other of the row-major manner and the column-major manner; and
receiving, by a memory of the matrix device, the second element string.
9. The method of claim 8, wherein the matrix source comprises a storage device, a network, or a matrix multiplication circuit.
10. The method of operation of claim 9 wherein the matrix multiplication circuit comprises a multiply accumulator array.
11. The method of operation of claim 8, further comprising:
performing a front-layer computation of a neural network computation by a matrix multiplication circuit of the matrix device to generate the native matrix, wherein the matrix multiplication circuit acts as the matrix source to provide the first string of elements of the native matrix to the transpose circuit; and
and reading the second element string from the memory by the matrix multiplication circuit to perform the next-layer calculation of the neural network calculation.
12. The method of operation of claim 11, wherein the memory comprises dynamic random access memory, the method of operation further comprising:
providing all elements of a column of the native matrix to the matrix multiplication circuit in burst mode by the memory for the next level of computation of the neural network computation.
13. The method of claim 12, wherein all elements of a column of the native matrix are stored at a plurality of consecutive addresses of the memory.
14. The method of claim 8, wherein all elements of the native matrix are arranged in the column-major manner in the first element string, the second element string is equivalent to the element string in which all elements of the native matrix are arranged in the row-major manner, and the second element string is stored sequentially and consecutively in the memory.
CN202211274537.8A 2022-09-20 2022-10-18 Matrix device and method of operating the same Pending CN117786293A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW111135607A TWI808000B (en) 2022-09-20 2022-09-20 Matrix device and operation method thereof
TW111135607 2022-09-20

Publications (1)

Publication Number Publication Date
CN117786293A true CN117786293A (en) 2024-03-29

Family

ID=88149144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211274537.8A Pending CN117786293A (en) 2022-09-20 2022-10-18 Matrix device and method of operating the same

Country Status (3)

Country Link
US (1) US20240111827A1 (en)
CN (1) CN117786293A (en)
TW (1) TWI808000B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI570573B (en) * 2014-07-08 2017-02-11 財團法人工業技術研究院 Circuit for matrix transpose
US10909447B2 (en) * 2017-03-09 2021-02-02 Google Llc Transposing neural network matrices in hardware
TWI769810B (en) * 2017-05-17 2022-07-01 美商谷歌有限責任公司 Special purpose neural network training chip
US10768899B2 (en) * 2019-01-29 2020-09-08 SambaNova Systems, Inc. Matrix normal/transpose read and a reconfigurable data processor including same

Also Published As

Publication number Publication date
TWI808000B (en) 2023-07-01
US20240111827A1 (en) 2024-04-04

Similar Documents

Publication Publication Date Title
CN109992743B (en) Matrix multiplier
US9639458B2 (en) Reducing memory accesses for enhanced in-memory parallel operations
CN114391135A (en) Method for performing in-memory processing operations on contiguously allocated data, and related memory device and system
US20100106692A1 (en) Circuit for compressing data and a processor employing same
US9110778B2 (en) Address generation in an active memory device
CN111915001B (en) Convolution calculation engine, artificial intelligent chip and data processing method
US11675624B2 (en) Inference engine circuit architecture
CN101689105A (en) A processor exploiting trivial arithmetic operations
KR20220051006A (en) Method of performing PIM (PROCESSING-IN-MEMORY) operation, and related memory device and system
US9146696B2 (en) Multi-granularity parallel storage system and storage
CN112416433A (en) Data processing device, data processing method and related product
US9171593B2 (en) Multi-granularity parallel storage system
US10942889B2 (en) Bit string accumulation in memory array periphery
CN117786293A (en) Matrix device and method of operating the same
US10942890B2 (en) Bit string accumulation in memory array periphery
US11487699B2 (en) Processing of universal number bit strings accumulated in memory array periphery
US20220108203A1 (en) Machine learning hardware accelerator
EP3066583A1 (en) Fft device and method for performing a fast fourier transform
US11226740B2 (en) Selectively performing inline compression based on data entropy
US9582473B1 (en) Instruction set to enable efficient implementation of fixed point fast fourier transform (FFT) algorithms
TW202414245A (en) Matrix device and operation method thereof
CN112766471A (en) Arithmetic device and related product
US11275562B2 (en) Bit string accumulation
US20230177106A1 (en) Computational circuit with hierarchical accumulator
US11669489B2 (en) Sparse systolic array design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination