CN114168897A - Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number: CN114168897A
Application number: CN202010956493.1A
Authority: CN (China)
Prior art keywords: data, matrix, circuit, bitmap, reading
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: Inventor not disclosed
Current assignee: Beijing Simm Computing Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing Simm Computing Technology Co ltd
Application filed by: Beijing Simm Computing Technology Co ltd
Priority: CN202010956493.1A
Publication: CN114168897A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 — Complex mathematical operations
    • G06F 17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

Embodiments of the present disclosure disclose a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit for reading and caching first data in a first matrix and bitmap data in a bitmap matrix, and outputting at least one of the first data together with the position information indicated by the bitmap data corresponding to that first data; a second data reading circuit for reading and caching second data in a second matrix, and outputting at least one of the second data according to the position information; and a calculation circuit for performing a calculation on the first data and the second data to obtain third data. By controlling the output of a plurality of second data through the position information read for a plurality of first data, the matrix calculation circuit solves two technical problems of the prior art: only a single datum could be calculated at a time, and access-address calculation was complex.

Description

Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of processors, and in particular, to a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. The intelligent era is characterized by ever more data being acquired, ever larger volumes of data, and ever higher demands on the speed of processing that data. Chips are the cornerstone of data processing and fundamentally determine the ability to process data. In terms of application field, chips follow two main routes: one is the general-purpose route, such as the CPU (Central Processing Unit), which provides great flexibility but lower effective computing power on domain-specific algorithms; the other is the special-purpose route, such as the TPU (Tensor Processing Unit), which delivers higher effective computing power in specific fields but handles flexible, general-purpose workloads poorly or not at all. Because the data of the intelligent era is diverse and enormous in quantity, chips must be extremely flexible, able to handle algorithms in different fields that change day by day, and must have extremely high processing capability, able to rapidly process extremely large and sharply increasing volumes of data.
In neural network calculation, convolution accounts for most of the total operation count, and convolution can be converted into matrix multiplication; therefore, increasing the speed of matrix multiplication raises throughput in neural network tasks, reduces latency, and improves the effective computing power of a chip.
The matrices formed by the data in many neural networks (including both parameter data and input data) are sparse matrices, that is, matrices with a large number of 0-valued elements. To reduce the storage capacity and bandwidth occupied by data in neural network calculation, a sparse matrix is compressed for storage; to improve the speed of matrix operations, sparse matrix operations are optimized.
FIG. 1a is a schematic diagram of a matrix multiplication computation in a neural network. As shown in FIG. 1a, M1 is a data matrix, M2 is a parameter matrix, and M is the output matrix. Each datum in a row of M1 is multiplied by the corresponding parameter in a column of M2, and the products are accumulated to obtain one datum in M. In FIG. 1a, either of the matrices M1 and M2, or both, may be sparse.
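The row-by-column multiply-accumulate of FIG. 1a can be sketched as follows (a minimal illustration of the arithmetic only, not the disclosed circuit):

```python
def matmul(M1, M2):
    """Naive matrix multiply: M[i][j] = sum over k of M1[i][k] * M2[k][j]."""
    rows, inner, cols = len(M1), len(M2), len(M2[0])
    M = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                M[i][j] += M1[i][k] * M2[k][j]
    return M
```

Every 0 element in M1 or M2 still costs a multiply here, which is exactly the inefficiency sparse-matrix optimization targets.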
Fig. 1b is a schematic diagram of matrix compression. To store a sparse matrix, a general compression method can be employed: only elements other than 0 are stored. Along with the value of each non-0 element, its position in the matrix is stored, i.e., the relative coordinates X and Y of the element, where X is the row index and Y is the column index. In this method, a datum and its coordinates are stored together as one data structure, and that data structure is the unit of storage. As shown in fig. 1b, taking an MxN matrix as an example, the left MxN matrix is compressed into the right compressed matrix, and each data structure in the compressed matrix represents one non-0 datum of the left matrix together with its coordinates in the matrix.
In a sparse matrix, most element values are 0, and the 0 elements need not be stored, so this compression method can effectively reduce the storage capacity of the matrix. Fig. 1c is a schematic diagram of an example of compressing a matrix with the above method. In a 16x16 sparse matrix where only a, b, c, and d are non-0 elements, after compressed storage only the values and coordinates of those elements need to be stored, saving storage space.
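The coordinate-based compression described above (each non-0 value stored with its X and Y coordinates) can be sketched like this; the function name and tuple layout are illustrative assumptions, not the patent's own format:

```python
def compress_coo(matrix):
    """Store only the non-zero elements, each as (row X, column Y, value)."""
    return [(x, y, matrix[x][y])
            for x in range(len(matrix))
            for y in range(len(matrix[0]))
            if matrix[x][y] != 0]
```

This is essentially the classic coordinate (COO) sparse format; the drawback the patent points out is that turning each stored (X, Y) pair back into a memory access address during the multiply is costly.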
When performing the matrix operation M1xM2, the compressed matrix is the matrix actually accessed. However, this technical solution has the following disadvantages: 1. during matrix operation, data utilization is low, and usually each independent operation unit can calculate only a single datum at a time; 2. computing access addresses from the data coordinates of the compressed matrix is complex, which affects performance.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the above technical problems in the prior art, the embodiment of the present disclosure provides the following technical solutions:
in a first aspect, an embodiment of the present disclosure provides a matrix calculation circuit, including:
the first data reading circuit is used for reading and caching first data in the first matrix and bitmap data in the bitmap matrix; outputting at least one of the first data and position information indicated by bitmap data corresponding to the first data; the first matrix is a matrix formed by non-0 data in a data matrix, and the bitmap data in the bitmap matrix and the data in the data matrix are in one-to-one correspondence in position;
the second data reading circuit is used for reading and caching second data in the second matrix; outputting at least one of the second data according to the position information;
and the calculation circuit is used for performing calculation on the first data and the second data to obtain third data.
Further, the first data reading circuit further includes:
the device comprises a first data cache circuit, a bitmap matrix cache circuit, a first data sorting circuit and a first control circuit;
the first control circuit is used for generating a first data reading address according to a first address of the first matrix; generating a position information reading address according to the first address of the bitmap matrix;
the first data cache circuit is used for caching a plurality of first data read out according to the first data reading address;
the bitmap matrix cache circuit is used for caching the bitmap data read out according to the position information reading address;
and the first data sorting circuit is used for reordering the first data column by column according to the position correspondence indicated by the bitmap data, wherein the reordering result is that data in the same row of the data matrix remain in the same row and data in different rows remain in different rows; if two adjacent first data fall in the same row during sorting, the other rows of the previous column are padded with 0.
Further, the bitmap matrix buffer circuit is further configured to:
transmitting position information indicated by the bitmap data corresponding to the first data to the second data reading circuit.
Further, the second data reading circuit further includes:
a second data buffer circuit and a second control circuit;
the second control circuit is used for generating a second data reading address according to the first address of the second matrix;
the second data buffer circuit is used for buffering second data read out according to the second data reading address.
Further, the second data reading circuit further includes:
and a switch circuit for controlling output of the second data in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data.
Further, the switch circuit is configured to control a plurality of second data outputs in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data, and includes:
the switch circuit controls to output at least one row of the second data corresponding to the column information in the second data buffer circuit according to the column information of the position information indicated by the bitmap data corresponding to the first data.
Further, the computation circuit includes:
a computing unit array, wherein the computing unit array comprises a plurality of computing units;
a row of the computing units in the computing unit array receives a row of the second data;
a column of compute units in the array of compute units receives a column of first data in the first data.
Further, the calculating circuit is configured to calculate third data according to the first data and the second data, and includes:
the calculation circuit receives the reordered column of first data output by the first data sorting circuit; receiving at least one row of second data corresponding to the column of first data output by the switch circuit; and calculating to obtain third data according to the column of first data and the at least one row of second data.
In a second aspect, an embodiment of the present disclosure provides a matrix calculation method, including:
reading and caching first data in a first matrix and bitmap data in a bitmap matrix, wherein the first matrix is a matrix formed by non-0 data in a data matrix, and the bitmap data in the bitmap matrix and the data in the data matrix are in one-to-one correspondence in position;
outputting at least one of the first data and position information indicated by bitmap data corresponding to the first data;
reading and caching second data in the second matrix;
outputting at least one of the second data according to the position information;
and performing calculation on the first data and the second data to obtain third data.
Further, the reading and buffering the first data in the first matrix and the bitmap data in the bitmap matrix includes:
generating a first data reading address according to the first address of the first matrix;
generating a position information reading address according to the first address of the bitmap matrix;
caching a plurality of first data read out according to the first data reading address;
caching the bitmap data read out according to the position information reading address;
and reordering the first data column by column according to the position correspondence indicated by the bitmap data, wherein the reordering result is that data in the same row of the data matrix remain in the same row and data in different rows remain in different rows; if two adjacent first data fall in the same row during sorting, the other rows of the previous column are padded with 0.
Further, the method further comprises:
transmitting position information indicated by the bitmap data corresponding to the first data to the second data reading circuit.
Further, the reading and buffering the second data in the second matrix includes:
generating a second data reading address according to the first address of the second matrix;
and caching the second data read according to the second data reading address.
Further, the outputting at least one piece of the second data according to the position information includes:
controlling the output of the second data in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data.
Further, the controlling of the output of a plurality of second data in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data includes:
and controlling to output at least one row of the second data corresponding to the column information in the second data buffer circuit according to the column information of the position information indicated by the bitmap data corresponding to the first data.
Further, the performing the calculation on the first data and the second data to obtain third data includes:
receiving the reordered column of first data; receiving at least one row of second data corresponding to the column of first data; and calculating to obtain third data according to the column of first data and the at least one row of second data.
In a third aspect, an embodiment of the present disclosure further provides a processing core, where the processing core includes at least one matrix calculation circuit in the first aspect, a decoding unit, and a storage device.
In a fourth aspect, an embodiment of the present disclosure further provides a chip, where the chip includes at least one processing core in the third aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions such that, when the instructions are executed, the processors implement the matrix calculation method of any of the preceding first aspects.
In a sixth aspect, the present disclosure provides a non-transitory computer-readable storage medium, which stores computer instructions for causing a computer to execute the matrix calculation method according to any one of the foregoing first aspects.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the matrix calculation method of any of the preceding first aspects.
In an eighth aspect, the embodiments of the present disclosure provide a computing device, which includes one or more chips described in the fourth aspect.
Embodiments of the present disclosure disclose a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit for reading and caching first data in a first matrix and bitmap data in a bitmap matrix, and outputting at least one of the first data together with the position information indicated by the bitmap data corresponding to that first data; a second data reading circuit for reading and caching second data in a second matrix, and outputting at least one of the second data according to the position information; and a calculation circuit for performing a calculation on the first data and the second data to obtain third data. By controlling the output of a plurality of second data through the position information read for a plurality of first data, the matrix calculation circuit solves two technical problems of the prior art: only a single datum could be calculated at a time, and access-address calculation was complex.
The foregoing is a summary of the present disclosure, and for the purposes of promoting a clear understanding of the technical means of the present disclosure, the present disclosure may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
FIGS. 1a-1c are schematic diagrams of the prior art;
fig. 2 is a schematic structural diagram of a matrix calculation circuit provided in an embodiment of the present disclosure;
FIGS. 3a-3b are schematic diagrams of the generation of the first matrix and the bitmap matrix;
fig. 4 is a schematic structural diagram of a first data reading circuit according to an embodiment of the disclosure;
FIGS. 5a-5b are schematic diagrams of an example of reordering of a first data read circuit according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a second data reading circuit according to an embodiment of the disclosure;
FIGS. 7a-7e are schematic diagrams of an example application of an embodiment of the present disclosure;
fig. 8 is a flowchart of a matrix calculation method provided in the embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 2 is a schematic diagram of a matrix calculation circuit provided in an embodiment of the present disclosure. The matrix calculation circuit (EU) 200 provided in this embodiment includes:
a first data reading circuit (LD_M1) 201 for reading and buffering first data in the first matrix and bitmap data in the bitmap matrix; outputting at least one of the first data and the position information indicated by the bitmap data corresponding to the first data; the first matrix is a matrix formed by the non-0 data in a data matrix, and the bitmap data in the bitmap matrix correspond one-to-one in position with the data in the data matrix;
a second data reading circuit (LD_M2) 202 for reading and buffering second data in a second matrix; outputting at least one of the second data according to the position information;
a calculation circuit 203 for performing a calculation on the first data and the second data to obtain third data.
Illustratively, the first data reading circuit reads and buffers first data in the first matrix according to a read address of the first data, and the read address of the first data is generated according to a storage head address of the first matrix; and the second data reading circuit reads and buffers the second data in the second matrix according to the reading address of the second data, and the reading address of the second data is generated according to the storage head address of the second matrix. The storage head address of the first matrix and the storage head address of the second matrix are obtained through an instruction decoding circuit ID (instruction decoder), and the instruction decoding circuit is used for decoding a matrix calculation instruction to obtain the storage head address of the first matrix, the storage head address of the second matrix, the storage head address of the bitmap matrix, the size of the first matrix and the size of the second matrix.
Illustratively, the matrix calculation instruction includes an instruction type, a storage head address of the first matrix, a storage head address of the second matrix, a storage head address of the bitmap matrix, and parameters such as a size of the first matrix, a size of the second matrix, and a size of the bitmap matrix. In one embodiment, the instruction type is a multiplication instruction of a matrix, the first matrix is a compression matrix of a data matrix in the neural network convolution calculation, and the second matrix is a parameter matrix in the neural network convolution calculation; wherein the data matrix and/or the second matrix is a sparse matrix having a large number of elements with values of 0. It is understood that the memory head address of the matrix and the size parameter of the matrix (such as the number of rows and columns of the matrix) in the matrix calculation instruction may be represented in the form of register addresses, and the instruction decoding circuit acquires corresponding data from the corresponding register addresses.
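The fields decoded from a matrix calculation instruction, as enumerated above, might be modeled as the following record. The field names and types are illustrative assumptions only; the patent does not fix an instruction encoding, and in practice the address and size fields may instead hold register addresses from which the decoder fetches the actual values:

```python
from dataclasses import dataclass

@dataclass
class MatrixInstruction:
    # Hypothetical decoded form of a matrix calculation instruction.
    opcode: int          # instruction type, e.g. matrix multiplication
    m1_base: int         # storage head address of the first (compressed) matrix
    m2_base: int         # storage head address of the second matrix
    bitmap_base: int     # storage head address of the bitmap matrix
    m1_shape: tuple      # (rows, columns) of the first matrix
    m2_shape: tuple      # (rows, columns) of the second matrix
```

The instruction decoding circuit (ID) would populate such a record and hand the head addresses to the first and second data reading circuits.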
In the embodiment of the present disclosure, the first data reading circuit 201 receives the first address of the first matrix decoded by the instruction decoding circuit, and generates a reading address of the first data according to the first address; optionally, a plurality of first data in the first matrix are read at one time according to the read address of the first data; the first data reading circuit 201 receives the first address of the bitmap matrix decoded by the instruction decoding circuit, and generates a reading address of bitmap data according to the first address; optionally, the bitmap data in the bitmap matrix is read at one time according to the read address of the bitmap data.
For example, the maximum number of first data read at a time is preset to be the first data corresponding to K columns of the data matrix; the first data reading circuit generates the read address of the first data according to the head address of the first matrix and K, and reads and caches, at one time, the first data corresponding to those K columns from the first matrix. Similarly, the first data reading circuit generates the position information read address according to the head address of the bitmap matrix and K, and reads and caches, at one time, the bitmap data corresponding to the K columns from the bitmap matrix. After obtaining the first data and the bitmap data, the first data reading circuit further outputs at least one of the first data together with the position information indicated by the bitmap data corresponding to that first data.
FIG. 3a is a diagram of the data matrix M1_O, the first matrix M1, and the bitmap matrix M1_map according to an embodiment of the disclosure. As shown in fig. 3a, the data matrix M1_O is a sparse M x K matrix, and M1 is the compressed matrix of M1_O, which stores only the non-0 data in M1_O; the data may be stored in column-first or row-first order, and the following description assumes column-first order. The bitmap matrix M1_map has the same size as M1_O, i.e., the same number of rows and columns; however, each datum in the bitmap matrix occupies only 1 bit, and each datum in the bitmap matrix corresponds one-to-one in position with a datum in M1_O: if a datum in M1_O is 0, the corresponding datum in M1_map is 0, and if it is non-0, the corresponding datum in M1_map is 1. The bitmap matrix and the first matrix may be generated before the matrix operation is performed, and the details are not repeated here.
FIG. 3b is an example of the data matrix M1_O, the first matrix M1, and the bitmap matrix M1_map in the embodiment of the disclosure. As shown in fig. 3b, the data matrix M1_O is a 2 x 4 sparse matrix, and the compressed matrix M1 contains only the non-0 data of M1_O, stored in column-first order: the first datum in M1 is the non-0 datum 1 in the first column, the second is the non-0 datum 2 in the second column, the third is the non-0 datum 3 in the third column, and the fourth is the non-0 datum 4 in the fourth column. The bitmap matrix M1_map has the same size as M1_O, but each datum is only 1 bit, represented as 0 or 1; the data in M1_map corresponding to data 1, 2, 3, and 4 in M1_O are 1, and all other data are 0. Originally, the data matrix M1_O needs 1 byte (8 bits) per datum, so the 2 x 4 matrix needs 64 bits; expressed as M1 and M1_map, the 4 data in M1 need 4 x 8 = 32 bits, and each of the 8 entries in M1_map occupies only 1 bit (8 bits in total), for a total of only 40 bits. In practice, data matrices are much larger than this example, so a significant saving in storage space can be achieved.
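The construction of M1 and M1_map from M1_O, together with the storage arithmetic of the 2 x 4 example, can be checked with a short sketch (the helper name `compress_bitmap` and the concrete M1_O below are illustrative choices consistent with the figure description, not taken from the patent drawings):

```python
def compress_bitmap(M1_O):
    """Build the column-first compressed matrix M1 (non-0 data only) and the
    1-bit-per-entry bitmap M1_map marking the non-0 positions of M1_O."""
    rows, cols = len(M1_O), len(M1_O[0])
    M1 = [M1_O[r][c] for c in range(cols) for r in range(rows) if M1_O[r][c] != 0]
    M1_map = [[1 if M1_O[r][c] != 0 else 0 for c in range(cols)] for r in range(rows)]
    return M1, M1_map

# A plausible 2 x 4 instance matching the text: one non-0 datum per column.
M1_O = [[1, 0, 0, 4],
        [0, 2, 3, 0]]
M1, M1_map = compress_bitmap(M1_O)

bits_uncompressed = 2 * 4 * 8          # 64 bits: 1 byte per dense datum
bits_compressed = len(M1) * 8 + 2 * 4  # 32 data bits + 8 bitmap bits = 40 bits
```

Unlike coordinate compression, the bitmap costs a fixed 1 bit per matrix position, so the saving grows with sparsity while position lookup stays a simple bit test.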
In the embodiment of the present disclosure, the second data reading circuit 202 receives the first address of the second matrix decoded by the instruction decoding circuit, generates the read address of the second data according to that first address, and reads a plurality of second data in the second matrix at one time according to that read address. For example, the maximum number of second data read at one time is preset to K rows; if the second matrix is not a compressed matrix, these are K rows of the second matrix itself. The second data reading circuit generates the read address of the second data according to the first address of the second matrix and K, and reads and buffers K rows of second data from the second matrix at one time. It then controls the output of the second data according to the received position information of the first data, so as to output all or part of the second data.
In the embodiment of the present disclosure, the calculation circuit receives the first data transmitted from the first data reading circuit and the second data transmitted from the second data reading circuit, and calculates to obtain third data, where the third data is one or more.
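As a functional model of the data flow just described (a behavioral sketch only, not the hardware): each non-0 first datum sits at a bitmap position (row r, column c); the column index c selects row c of the second matrix, and the product accumulates into row r of the output, so no per-element address arithmetic on stored coordinates is needed:

```python
def bitmap_matmul(M1, M1_map, M2):
    """Multiply the compressed matrix by M2. M1 holds the non-0 data of the
    data matrix in column-first order; M1_map marks their positions. For a
    datum at bitmap position (r, c), value * M2[c] accumulates into out[r]."""
    rows, cols = len(M1_map), len(M1_map[0])
    out = [[0] * len(M2[0]) for _ in range(rows)]
    it = iter(M1)
    for c in range(cols):          # column-first traversal matches M1's storage order
        for r in range(rows):
            if M1_map[r][c]:
                v = next(it)       # next non-0 first datum
                for j in range(len(M2[0])):
                    out[r][j] += v * M2[c][j]
    return out
```

Note that 0 elements of the data matrix contribute no multiplies at all, which is the data-utilization gain over the single-datum prior art.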
As shown in fig. 4, in order to implement the function of the first data reading circuit, optionally, the first data reading circuit further includes:
a first data buffer circuit 401, a bitmap matrix buffer circuit 402, a first data sorting circuit 403, and a first control circuit 404;
the first control circuit 404 is configured to generate a first data read address according to a first address of the first matrix; generating a position information reading address according to the first address of the bitmap matrix;
the first data buffer circuit 401 is configured to buffer a plurality of first data read according to the first data read address;
the bitmap matrix buffer circuit 402 is configured to buffer the bitmap data read out according to the position information read address;
the first data sorting circuit 403 is configured to reorder, according to the bitmap data, the first data in rows in a one-to-one position-to-one correspondence manner, where the result of reordering is that data in the same row in the data matrix is still in the same row, and data in different rows are still not in the same row, and if two adjacent first data are in the same row in the process of sorting, 0 is supplemented to other rows in the previous column.
Optionally, the first control circuit 404 receives the first address of the bitmap matrix decoded by the instruction decoding circuit and a preset parameter K. Optionally, the first control circuit includes a first read control circuit CL1 and a first address generating circuit AG1; CL1 receives the first address of the bitmap matrix decoded by the instruction decoding circuit and the preset parameter K, and controls AG1 to generate a position information read address Addr0, so that the first data reading circuit can read, at one time according to Addr0, K columns of bitmap data in the bitmap matrix indicating the positions of data in the data matrix.
Optionally, the first control circuit 404 receives the first address of the first matrix decoded by the instruction decoding circuit and the number of non-0 bitmap data in the K columns of bitmap data. Optionally, the first control circuit includes the first read control circuit CL1 and the first address generating circuit AG1; CL1 receives the first address of the first matrix and this number of non-0 bitmap data, and controls AG1 to generate a first data read address Addr1, so that the first data reading circuit can read, at one time according to Addr1, the first data in the first matrix corresponding to the K columns of bitmap data. Optionally, the first control circuit 404 may instead read the first data while the K columns of bitmap data are being read, in which case the first data are read in a preset quantity. To prevent the number of first data read from being smaller than the number of non-0 data in the K columns of bitmap data, M × K first data may be read at a time, where M is the number of rows of the data matrix, so that M × K is never smaller than the number of non-0 data in the K columns of bitmap data. In this case, CL1 receives the first address of the first matrix decoded by the instruction decoding circuit and the set parameter M × K, and controls AG1 to generate the first data read address Addr1, so that the first data reading circuit can read M × K first data in the first matrix at one time according to Addr1.
Optionally, the first data buffer circuit 401 further includes a first memory or a first storage area DB11 for buffering first data, which is buffered in the DB11 after being read out from the first matrix.
Optionally, the bitmap matrix buffer circuit 402 further includes a second memory or a second storage area DB10 for buffering bitmap data; the bitmap data is buffered in DB10 after being read out from the bitmap matrix.
Optionally, the first data sorting circuit 403 further includes a reordered-first-data buffer circuit DRDB, which is used to cache the reordered first data. Optionally, the reordering is performed in column-first order, that is, traversing the bitmap columns with the X coordinate from small to large and, within each column, the Y coordinate from small to large, so that first data in the same row of the data matrix remain in the same row and first data in different rows remain in different rows; the reordered first data are buffered in the DRDB, and where some rows of some columns lack data after sorting, those positions are filled with 0. FIG. 5a is a schematic diagram of an example of reordering. As shown in FIG. 5a, the data matrix M1_O is a sparse matrix, the first matrix is the compressed matrix M1 of the data matrix, M1 stores the non-0 data 1, 2, 3 and 4 of M1_O in column-first order, and M1_map is the bitmap matrix of the data matrix. The first data reading circuit reads out 4 columns of the bitmap matrix at one time and stores them in DB10; since there are 4 "1"s in DB10 in total, the first data reading circuit reads the 4 first data 1, 2, 3 and 4 in the first matrix at one time.
The 4 acquired first data are then reordered. The non-0 data in the bitmap data are traversed in column-first order according to their positions: the first 1 is in row 0, so the first datum 1 corresponding to it is placed at position (0,0) in the DRDB; the second 1 is in row 1, so the first datum 2 corresponding to it is placed at (1,0); the third 1 is in row 0, so the first datum 3 corresponding to it is placed at (0,1); and the fourth 1 is in row 1, so the first datum 4 corresponding to it is placed at (1,1). Thus the 1 and 3 that are in the same row of the data matrix are still in the same row after reordering, the 2 and 4 that are in the same row are still in the same row after reordering, and 1 and 3 remain in different rows from 2 and 4.
FIG. 5b is a schematic diagram of another example of reordering. As shown in fig. 5b, during the sorting, column 0 is processed first: row 0 of column 0 of M1_map is 1, so the 1 in M1 is stored at position (0,0) in the DRDB; row 1 of column 0 of M1_map is 0, so position (1,0) of the DRDB temporarily stores no data. Column 1 of M1_map is all 0, so it is skipped and not sorted. Row 0 of column 2 of M1_map is 1, so the 2 in M1 is stored in the next position (0,1) of row 0 in the DRDB; because row 0 now stores the two first data 1 and 2 consecutively, row 1 of column 0 is padded with 0. Row 1 of column 2 of M1_map is 0, so position (1,1) of the DRDB temporarily stores no data. Row 0 of column 3 of M1_map is 1, so the 3 in M1 is stored at position (0,2) in the DRDB; because row 0 now stores the two same-row first data 2 and 3 consecutively, row 1 of column 1 is padded with 0. Row 1 of column 3 of M1_map is 1, so the 4 in M1 is stored at position (1,2) in the DRDB, and at this point the reordered first data in the DRDB are obtained.
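The reordering rule walked through in Figs. 5a and 5b can be modeled as a small software sketch. This is an illustrative reading of the rule, not the patented hardware; the function name `reorder` and the concrete bitmap layouts in the comments are assumptions reconstructed from the figure descriptions.

```python
def reorder(bitmap, values):
    """Software model of the first data sorting circuit: 'values' holds the
    non-0 data of the data matrix in column-first order; the result keeps
    same-row data in the same output row and pads missing slots with 0."""
    rows, cols = len(bitmap), len(bitmap[0])
    it = iter(values)
    drdb = [[] for _ in range(rows)]     # reordered first data, one list per row
    coords = [[] for _ in range(rows)]   # data-matrix column of each entry (None = pad)
    for c in range(cols):
        active = [r for r in range(rows) if bitmap[r][c]]
        if not active:
            continue                     # all-zero bitmap column: skip, no sort
        j = max(len(drdb[r]) for r in active)   # output column shared by this data column
        for r in active:
            while len(drdb[r]) < j:      # rows that fell behind get 0-padding
                drdb[r].append(0)
                coords[r].append(None)
            drdb[r].append(next(it))
            coords[r].append(c)
    width = max(len(row) for row in drdb)
    for r in range(rows):                # equalize trailing row lengths
        drdb[r] += [0] * (width - len(drdb[r]))
        coords[r] += [None] * (width - len(coords[r]))
    return drdb, coords

# Fig. 5a (assumed bitmap): reorder([[1,0,1,0],[0,1,0,1]], [1,2,3,4])
#   yields DRDB rows [[1, 3], [2, 4]]
# Fig. 5b (assumed bitmap): reorder([[1,0,1,1],[0,0,0,1]], [1,2,3,4])
#   yields DRDB rows [[1, 2, 3], [0, 0, 4]]
```

The returned column coordinates are what the bitmap matrix buffer circuit forwards to the second data reading circuit as position information.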
After the reordering, the first data reading circuit outputs the bitmap data DO0 and the first data DO1, where DO1 is some or all of the plurality of first data and DO0 is the bitmap data corresponding to DO1. Optionally, the bitmap matrix buffer circuit is further configured to transmit the position information indicated by the bitmap data corresponding to the first data to the second data reading circuit. Optionally, the position information indicated by the bitmap data is the column coordinate of the bitmap data in the bitmap matrix.
As shown in fig. 6, in order to implement the function of the second data reading circuit, optionally, the second data reading circuit further includes:
a second data buffer circuit 601 and a second control circuit 602;
the second control circuit 602 is configured to generate a second data read address according to a first address of the second matrix;
the second data buffer circuit 601 is configured to buffer the second data read according to the second data read address.
Optionally, the second control circuit 602 receives the first address of the second matrix decoded by the instruction decoding circuit, a preset parameter K, and a size parameter of the second matrix; for example, the second matrix includes N rows of second data. Optionally, the second control circuit includes a second read control circuit CL2 and a second address generating circuit AG2, where CL2 receives the first address of the second matrix decoded by the instruction decoding circuit, the preset parameter K, the size parameter of the second matrix, and the like, and controls AG2 to generate a second data read address Addr2, so that the second data reading circuit can read K rows of second data in the second matrix at one time according to Addr2.
Optionally, the second data buffer circuit 601 includes a second data memory or a second data storage area, the size of which is the size of K rows of second data, and the read second data is buffered in the second data memory or the second data storage area row by row according to the position of the second data in the second matrix.
The second data reading circuit outputs all or part of the read second data according to the position information indicated by the bitmap data. Optionally, the second data reading circuit further includes:
a switch circuit 603 for controlling output of the second data in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data.
Optionally, the switch circuit 603 includes a switch control circuit SC and a switch array SW, where the switch control circuit is configured to receive the position information indicated by the bitmap data and generate a switch signal for the switch array; after receiving the switch signal, the switch array SW turns on the switches indicated by that signal so as to output the corresponding second data.
Optionally, the position information includes column information, which represents the column coordinate, in the data matrix, of the datum corresponding to the first data. According to the column information in the position information indicated by the bitmap data corresponding to the first data, the switch circuit controls the second data buffer circuit to output at least one row of second data corresponding to that column information. Specifically, after receiving the position information, the switch control circuit SC extracts the column information, generates the row switch information corresponding to that column information, and turns on the switch circuit, thereby outputting the second data corresponding to the row switch information; the output second data are one or more rows of the second data buffered in the second data reading circuit.
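As a rough software analogue of this switch behavior, the sketch below opens one buffered row of the second matrix per distinct column coordinate received. The helper name `select_rows` is hypothetical, not part of the disclosure.

```python
def select_rows(m2_buffer, col_coords):
    """Model of the switch circuit SC/SW: for each column coordinate carried
    by the first data, 'open the switch' for the matching buffered row of
    the second matrix; rows with no matching coordinate stay closed."""
    return {c: m2_buffer[c] for c in col_coords if c is not None}

m2 = [[1, 2], [1, 2], [1, 2], [1, 2]]   # K = 4 buffered rows of the second matrix
out = select_rows(m2, [0, 1])           # coordinates 0 and 1 select rows 0 and 1
# out == {0: [1, 2], 1: [1, 2]}
```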
As shown in fig. 2, the calculation circuit 203 includes:
a computing unit array PUA including a plurality of computing units PU(1,1), PU(1,2), ..., PU(M,N);
A row of the computing units in the computing unit array receives a row of the second data;
a column of compute units in the array of compute units receives a column of first data in the first data.
Optionally, the calculating circuit 203 receives a reordered column of first data output by the first data sorting circuit; receiving at least one row of second data corresponding to the column of first data output by the switch circuit; and calculating to obtain third data according to the column of first data and the at least one row of second data.
Specifically, each first datum in the reordered column output by the first data sorting circuit is output to one row of computing units in the calculation circuit. If the column of first data includes two first data, the first datum of row 0 is output to every computing unit in row 0 and the first datum of row 1 is output to every computing unit in row 1. The switch circuit outputs one or more rows of second data corresponding to the column of first data output by the first data sorting circuit: if the column of first data includes one first datum, the switch circuit selectively outputs one row of second data; in general, the number of rows of second data output by the switch circuit is determined by the column information of the first data output by the first data sorting circuit, with one row of second data output per distinct column coordinate. In this way every computing unit participating in the calculation obtains two data inputs, a first datum and a second datum. Each unit executes the calculation operation specified by the type of the calculation instruction, the result for the first datum and the second datum being a third datum, so the plurality of computing units obtain and output a plurality of third data. This process is repeated, each computing unit accumulating its results, until all the first data and second data have been read, which yields the output matrix; the value of each element in the output matrix is the accumulated result of the computing unit that produced it.
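One pass of this broadcast-and-accumulate scheme can be sketched as follows. `pu_array_step` is a hypothetical software model of one read-compute cycle of the PU array, under the assumption that each first datum carries the column coordinate of its source column in the data matrix.

```python
def pu_array_step(acc, first_col, coords, m2):
    """One cycle of the M x N computing-unit array: the first datum of DRDB
    row r is broadcast to every PU in PU row r, the second-matrix row picked
    by its column coordinate is spread across that PU row, and every PU
    multiply-accumulates into its own register acc[r][n]."""
    for r, (a, c) in enumerate(zip(first_col, coords)):
        if c is None:                    # 0-padded slot: PU row r idles this cycle
            continue
        for n, b in enumerate(m2[c]):    # m2[c]: buffered second-matrix row c
            acc[r][n] += a * b
    return acc
```

Cycling this step over successive DRDB columns, with each PU accumulating, reproduces the output matrix once all first and second data have been consumed.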
Figs. 7a to 7e are an example of the calculation process of the matrix calculation circuit in the above embodiment. As shown in FIG. 7a, the matrix multiplication to be performed by the matrix calculation circuit takes M1_O as the data matrix and M2 as the second matrix; M is the output matrix obtained by multiplying M1_O by M2.
M1_O is stored in compressed form: as shown in fig. 7b, M1_O is compressed to generate the first matrix M1, which is stored. Let K be 4; that is, during the calculation, 4 columns of bitmap data of the bitmap matrix M1_map are read each time, the first data in M1 corresponding to 4 columns of the data matrix M1_O are read each time, and 4 rows of second data of the second matrix are read each time, so in this example all the data in M1 and M2 are read and buffered at once. As shown in fig. 7b, the first data reading circuit of the matrix calculation circuit reads the first data of the entire first matrix M1 into the first data buffer circuit at one time, and reads the entire bitmap matrix into the bitmap data buffer circuit at one time; the reordering by the first data sorting circuit yields the storage order in the DRDB shown in fig. 7b.
Fig. 7c is an overall schematic diagram of matrix calculation using the matrix calculation circuit. The bitmap data of M1_map are read in units of K = 4 columns, together with the first data in M1 corresponding to those 4 columns of M1_O; since the data matrix M1_O has 4 columns in total, the whole M1_map and the whole M1 are read and buffered into the first data reading circuit LD_M1 at one time. The data are reordered after being read, and the reordered first data are stored in the DRDB of LD_M1. The second data of M2 are read in units of K = 4 rows and buffered in the second data reading circuit LD_M2; since M2 has 4 rows in total in this example, the whole M2 is read and buffered into LD_M2 at one time. The 4 computing units of the computing array then produce the 2 × 2 output matrix M, each element of M corresponding to the accumulated value of the output data of one computing unit.
Fig. 7d is a schematic diagram of the first calculation. The calculation circuit obtains the first column of first data from the DRDB of LD_M1; this column comprises the 1 in row 0 and the 2 in row 1. The 1 in row 0 is input to the row-0 calculation units PU(0,0) and PU(0,1); the 2 in row 1 is input to the row-1 calculation units PU(1,0) and PU(1,1). LD_M1 sends the column coordinates 0 and 1 of the bitmap data (buffered in DB10) corresponding to this first column of first data to LD_M2, and according to these coordinates the switch circuit of LD_M2 selects and outputs the second data of rows 0 and 1 buffered in LD_M2. The row-0 second data, 1 and 2, are input to the row-0 calculation units: second datum 1 to PU(0,0) and second datum 2 to PU(0,1). The row-1 second data, 1 and 2, are input to the row-1 calculation units: second datum 1 to PU(1,0) and second datum 2 to PU(1,1). Each calculation unit then independently performs a multiply-accumulate calculation, giving the results 1 for PU(0,0), 2 for PU(0,1), 2 for PU(1,0) and 4 for PU(1,1). Since not all the first data and second data have been calculated yet, the resulting third data are the intermediate data M_temp.
Fig. 7e is a schematic diagram of the second calculation. The calculation circuit obtains the second column of first data from the DRDB of LD_M1; this column comprises the 3 in row 0 and the 4 in row 1. The 3 in row 0 is input to the row-0 calculation units PU(0,0) and PU(0,1); the 4 in row 1 is input to the row-1 calculation units PU(1,0) and PU(1,1). LD_M1 sends the column coordinates 2 and 3 of the bitmap data (buffered in DB10) corresponding to this second column of first data to LD_M2, and according to these coordinates the switch circuit of LD_M2 selects and outputs the second data of rows 2 and 3 buffered in LD_M2. The row-2 second data, 1 and 2, are input to the row-0 calculation units: second datum 1 to PU(0,0) and second datum 2 to PU(0,1). The row-3 second data, 1 and 2, are input to the row-1 calculation units: second datum 1 to PU(1,0) and second datum 2 to PU(1,1). Each calculation unit then independently performs a multiply-accumulate calculation, giving the accumulated results 4 for PU(0,0), 8 for PU(0,1), 6 for PU(1,0) and 12 for PU(1,1). Since all the first data and second data have now been calculated, the third data obtained are the values of the elements in the output matrix M.
As can be seen from the calculation process of the above example, when the matrix calculation circuit of the present disclosure performs matrix multiplication, only two calculation passes are needed to complete the multiplication of a 2 × 4 matrix by a 4 × 2 matrix, which greatly increases the calculation speed and saves calculation time.
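The whole example of Figs. 7a-7e can be replayed as a short script. The concrete matrices below are reconstructed from the figure descriptions and are assumptions to that extent; the script checks that two accumulation passes over the compressed data reproduce the ordinary product of M1_O and M2.

```python
# Reconstructed operands (assumed from the Fig. 7 walkthrough)
m1_o = [[1, 0, 3, 0], [0, 2, 0, 4]]           # sparse data matrix, 2 x 4
m2 = [[1, 2], [1, 2], [1, 2], [1, 2]]         # second matrix, 4 x 2
m1 = [v for c in range(4) for r in range(2) if (v := m1_o[r][c])]  # column-first compress

# Reference result via ordinary matrix multiplication
ref = [[sum(m1_o[i][k] * m2[k][j] for k in range(4)) for j in range(2)]
       for i in range(2)]

# Replay of the two passes: DRDB columns (1, 2) then (3, 4), with
# column coordinates (0, 1) then (2, 3), as in Figs. 7d and 7e
acc = [[0, 0], [0, 0]]
for vals, coords in zip([(1, 2), (3, 4)], [(0, 1), (2, 3)]):
    for r, (a, c) in enumerate(zip(vals, coords)):
        for n in range(2):
            acc[r][n] += a * m2[c][n]     # multiply-accumulate in PU(r, n)

assert m1 == [1, 2, 3, 4]
assert acc == ref == [[4, 8], [6, 12]]    # matches the results quoted for Fig. 7e
```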
By the technical scheme, the compressed sparse matrix is directly calculated, so that the storage space is effectively saved, and the data bandwidth is saved; by using the computing unit array, all the computing units synchronously process data, the data utilization rate is greatly improved, and a plurality of computing units can share the same data; the compressed sparse matrix is directly calculated, and calculation of some 0 elements is skipped, so that the operation speed is increased, and the effective calculation capacity of the chip is improved.
Fig. 8 is a flowchart of a matrix calculation method provided in the embodiment of the present disclosure. As shown in fig. 8, the method includes the steps of:
step S801, reading and caching first data in a first matrix and bitmap data in a bitmap matrix, wherein the first matrix is a matrix formed by the non-0 data in a data matrix, and the bitmap data in the bitmap matrix are in one-to-one positional correspondence with the data in the data matrix;
step S802, outputting at least one first data and position information indicated by bitmap data corresponding to the first data;
step S803, reading and caching the second data in the second matrix;
step S804, outputting at least one second data according to the position information;
step S805, performing calculation on the first data and the second data to obtain third data.
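Steps S801 to S805 can be condensed into a behavioral sketch. The function name `matrix_calc` and the streaming (rather than buffered) traversal are illustrative assumptions, not the claimed circuit.

```python
def matrix_calc(bitmap, m1_vals, m2):
    """Behavioral model of steps S801-S805: S801/S802 walk the bitmap
    column-first and emit each non-0 first datum with its position; S803/S804
    select the second-matrix row addressed by that position; S805
    multiply-accumulates third data into the output matrix."""
    rows, n = len(bitmap), len(m2[0])
    out = [[0] * n for _ in range(rows)]
    ptr = 0                                    # next unread value of the first matrix
    for c in range(len(bitmap[0])):            # all-zero columns contribute nothing
        for r in range(rows):
            if bitmap[r][c]:
                a = m1_vals[ptr]; ptr += 1     # S801/S802: first datum + position c
                for j in range(n):             # S803/S804/S805: selected row, MAC
                    out[r][j] += a * m2[c][j]
    return out
```

On the data of the Fig. 7 example this returns the same output matrix as the two-pass hardware walkthrough.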
Further, the reading and buffering the first data in the first matrix and the bitmap data in the bitmap matrix includes:
generating a first data reading address according to the first address of the first matrix;
generating a position information reading address according to the first address of the bitmap matrix;
caching a plurality of first data read out according to the first data reading address;
caching the bitmap data read out according to the position information reading address;
and reordering the first data by row according to the bitmap data, keeping the one-to-one positional correspondence, where the result of the reordering is that data in the same row of the data matrix are still in the same row and data in different rows are still not in the same row; if two adjacent first data fall in the same row during sorting, 0 is supplemented to the other rows in the previous column.
Further, the method further comprises:
transmitting position information indicated by the bitmap data corresponding to the first data to the second data reading circuit.
Further, the reading and buffering the second data in the second matrix includes:
generating a second data reading address according to the first address of the second matrix;
and caching the second data read according to the second data reading address.
Further, the outputting at least one piece of the second data according to the position information includes:
output of second data in the second data buffer circuit is controlled in accordance with position information indicated by the bitmap data corresponding to the first data.
Further, the controlling of the plurality of second data outputs in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data includes:
and controlling to output at least one row of the second data corresponding to the column information in the second data buffer circuit according to the column information of the position information indicated by the bitmap data corresponding to the first data.
Further, the performing the calculation on the first data and the second data to obtain third data includes:
receiving the reordered column of first data; receiving at least one row of second data corresponding to the column of first data; and calculating to obtain third data according to the column of first data and the at least one row of second data.
In the above, although the steps in the above method embodiments are described in the above sequence, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure are not necessarily performed in the above sequence, and may also be performed in other sequences such as reverse, parallel, and cross, and further, on the basis of the above steps, other steps may also be added by those skilled in the art, and these obvious modifications or equivalents should also be included in the protection scope of the present disclosure, and are not described herein again.
The embodiments of the present disclosure further provide a processing core, where the processing core includes at least one matrix calculation circuit in any of the above embodiments, a decoding unit, and a storage device.
The embodiment of the present disclosure further provides a chip, where the chip includes at least one processing core in any one of the above embodiments.
An embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions, such that the one or more processors, when executing the instructions, perform the matrix calculation method of any of the embodiments.
The disclosed embodiments also provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the matrix calculation method described in any one of the foregoing embodiments.
The embodiment of the present disclosure further provides a computer program product, wherein: comprising computer instructions which, when executed by a computing device, may perform the matrix calculation method of any of the preceding embodiments.
The embodiment of the present disclosure further provides a computing device, which includes the chip in any one of the embodiments.
The flowchart and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Claims (10)

1. A matrix computation circuit, comprising:
the first data reading circuit is used for reading and caching first data in the first matrix and bitmap data in the bitmap matrix; outputting at least one of the first data and position information indicated by bitmap data corresponding to the first data; the first matrix is a matrix formed by non-0 data in a data matrix, and the bitmap data in the bitmap matrix and the data in the data matrix are in one-to-one correspondence in position;
the second data reading circuit is used for reading and caching second data in the second matrix; outputting at least one of the second data according to the location information;
and the calculation circuit is used for performing calculation on the first data and the second data to obtain third data.
2. The matrix calculation circuit of claim 1 wherein the first data reading circuit further comprises:
the device comprises a first data cache circuit, a bitmap matrix cache circuit, a first data sorting circuit and a first control circuit;
the first control circuit is used for generating a first data reading address according to a first address of the first matrix; generating a position information reading address according to the first address of the bitmap matrix;
the first data cache circuit is used for caching a plurality of first data read out according to the first data reading address;
the bitmap matrix cache circuit is used for caching the bitmap data read out according to the position information reading address;
and the first data sorting circuit is used for re-sorting the first data according to the bitmap data in a column-by-column manner in a position one-to-one correspondence manner, wherein the re-sorting result indicates that the data in the same row in the data matrix is still in the same row and the data in different rows are still not in the same row, and if two adjacent first data are in the same row in the sorting process, 0 is supplemented to other rows in the previous column.
3. The matrix computation circuitry of claim 2, wherein the bitmap matrix buffer circuitry is further to:
transmitting position information indicated by the bitmap data corresponding to the first data to the second data reading circuit.
4. The matrix computation circuit of any of claims 1-3, wherein the second data reading circuit further comprises:
a second data buffer circuit and a second control circuit;
the second control circuit is used for generating a second data reading address according to the first address of the second matrix;
the second data buffer circuit is used for buffering second data read out according to the second data reading address.
5. The matrix calculation circuit of claim 4, wherein the second data reading circuit further comprises:
and a switch circuit for controlling output of the second data in the second data buffer circuit according to the position information indicated by the bitmap data corresponding to the first data.
6. The matrix calculation circuit according to claim 5, wherein the switch circuit for controlling a plurality of second data outputs in the second data buffer circuit in accordance with the position information indicated by the bitmap data corresponding to the first data comprises:
the switch circuit controls to output at least one row of the second data corresponding to the column information in the second data buffer circuit according to the column information of the position information indicated by the bitmap data corresponding to the first data.
7. The matrix computation circuit of any of claims 1-6, wherein the computation circuit comprises:
a computing unit array, wherein the computing unit array comprises a plurality of computing units;
a row of the computing units in the computing unit array receives a row of the second data;
a column of compute units in the array of compute units receives a column of first data in the first data.
8. The matrix computation circuit of claim 4, wherein the computation circuit to compute third data from the first data and the second data comprises:
the calculation circuit receives the reordered column of first data output by the first data sorting circuit; receiving at least one row of second data corresponding to the column of first data output by the switch circuit; and calculating to obtain third data according to the column of first data and the at least one row of second data.
9. A matrix calculation method, comprising:
reading and caching first data in a first matrix and bitmap data in a bitmap matrix, wherein the first matrix is a matrix formed by non-0 data in a data matrix, and the bitmap data in the bitmap matrix and the data in the data matrix are in one-to-one correspondence in position;
outputting at least one of the first data and position information indicated by bitmap data corresponding to the first data;
reading and caching second data in the second matrix;
outputting at least one of the second data according to the position information;
and computing third data from the first data and the second data.
10. A processing core comprising the matrix calculation circuit of any of claims 1-8.
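The method of claim 9 can be illustrated with a minimal software sketch (the function and variable names below are ours, not the patent's): the data matrix is stored as a compacted first matrix holding only its non-zero values, plus a bitmap whose bits mark where those values sit; for each set bit in column k, the matching row k of the second matrix is selected (the role of the claimed switch circuit) and the product is accumulated into the third, result matrix, so zero entries of the data matrix are skipped entirely.

```python
def sparse_matmul_bitmap(first_data, bitmap, second):
    """Bitmap-driven sparse matrix multiply, C = A @ B.

    first_data: per-row lists of the non-zero values of A, in row-major order;
    bitmap:     rows of 0/1 flags with the same shape as A;
    second:     the dense second matrix B, as a list of rows.
    """
    n_rows = len(bitmap)
    n_cols_b = len(second[0])
    third = [[0.0] * n_cols_b for _ in range(n_rows)]
    for i, (vals, flags) in enumerate(zip(first_data, bitmap)):
        it = iter(vals)                    # non-zeros of row i, in order
        for k, flag in enumerate(flags):   # k = column index in A
            if flag:                       # position info from the bitmap
                a = next(it)               # the matching non-zero of A
                row_b = second[k]          # "switch circuit": select row k of B
                for j in range(n_cols_b):
                    third[i][j] += a * row_b[j]
    return third

# Example: A = [[2, 0, 3], [0, 4, 0]] stored as non-zeros plus a bitmap.
A_vals = [[2.0, 3.0], [4.0]]
A_bits = [[1, 0, 1], [0, 1, 0]]
B = [[1, 1], [1, 2], [2, 1]]
print(sparse_matmul_bitmap(A_vals, A_bits, B))  # [[8.0, 5.0], [4.0, 8.0]]
```

Only rows of B whose bitmap column holds a set bit are ever read, which is the data-reuse point of buffering the second data and gating its output on the bitmap's position information.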
CN202010956493.1A 2020-09-11 2020-09-11 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium Pending CN114168897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010956493.1A CN114168897A (en) 2020-09-11 2020-09-11 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114168897A true CN114168897A (en) 2022-03-11

Family

ID=80476242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010956493.1A Pending CN114168897A (en) 2020-09-11 2020-09-11 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114168897A (en)

Similar Documents

Publication Publication Date Title
CN108133270B (en) Convolutional neural network acceleration method and device
CN111199273B (en) Convolution calculation method, device, equipment and storage medium
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
US10169295B2 (en) Convolution operation device and method
CN111768458A (en) Sparse image processing method based on convolutional neural network
CN112668708A (en) Convolution operation device for improving data utilization rate
CN109726798B (en) Data processing method and device
CN113485750B (en) Data processing method and data processing device
CN110796229B (en) Device and method for realizing convolution operation
CN114780215A (en) Task scheduling method, device, equipment and storage medium
CN114168897A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN115828044B (en) Dual sparsity matrix multiplication circuit, method and device based on neural network
CN114168895A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN114168894A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN114168896A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN108737833B (en) Data buffering method, system, computer device and storage medium
CN101399978A (en) Reference frame data reading method in hardware decoder and apparatus thereof
US20160233850A1 (en) Parallel filtering method and corresponding apparatus
CN114077718A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN113836481B (en) Matrix computing circuit, method, electronic device, and computer-readable storage medium
CN114139107B (en) Pooling device and pooling method
CN114282158A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN113961871A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN112579971B (en) Matrix operation circuit, matrix operation device and matrix operation method
CN118296084B (en) Data processing apparatus, instruction synchronization method, electronic apparatus, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 201, No. 6 Fengtong Heng Street, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Ximu Semiconductor Technology Co.,Ltd.

Address before: Building 202-24, No. 6, Courtyard 1, Gaolizhang Road, Haidian District, Beijing

Applicant before: Beijing SIMM Computing Technology Co.,Ltd.

Country or region before: China