CN116048456A - Matrix multiplier, method of matrix multiplication, and computing device - Google Patents

Matrix multiplier, method of matrix multiplication, and computing device

Info

Publication number
CN116048456A
Authority
CN
China
Prior art keywords
target data
data
circuit
result
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310344718.1A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd
Priority to CN202310344718.1A
Publication of CN116048456A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

A matrix multiplier, a method of matrix multiplication, and a computing device. The matrix multiplier comprises a comparison circuit and a first operation circuit. The comparison circuit is used for determining whether first target data and/or second target data are data in a first set. The first operation circuit is used for outputting a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set. The first set comprises 0 and ±2^n, where n is an integer. The first result comprises 0 or third data, wherein the third data is obtained by shifting the first target data or the second target data according to n, or by shifting and inverting the first target data or the second target data according to n. The matrix multiplier can save power consumption.

Description

Matrix multiplier, method of matrix multiplication, and computing device
Technical Field
The present application relates to the field of chip design, and more particularly, to a matrix multiplier, a method of matrix multiplication, and a computing device.
Background
Matrix multiplication (MM) is one of the important mathematical operations in modern artificial intelligence technologies such as neural networks and machine learning. In one example, the operation of matrix multiplication may be performed by a matrix multiplier.
In the related art, because matrix multiplication involves many multiplication and addition operations, the matrix multiplier includes a conventional multiplication circuit that requires a plurality of adders, shifters, and multipliers. Because of these adders, shifters, and multipliers, the conventional multiplication circuit consumes considerable power when performing matrix multiplication operations.
Therefore, how to reduce the power consumption of the matrix multiplier is a technical problem to be solved.
Disclosure of Invention
The present application provides a matrix multiplier, a method of matrix multiplication, and a computing device, wherein the matrix multiplier can save power consumption.
In a first aspect, there is provided a matrix multiplier comprising: a comparison circuit and a first operation circuit.
A comparison circuit for determining whether first target data and/or second target data are data in a first set, wherein the first target data are data in a first matrix and the second target data are data in a second matrix, the first set comprising: 0 and ±2^n, where n is an integer;
a first operation circuit, configured to output a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, where the first result includes: 0 or third data, wherein the third data is obtained by shifting the first target data or the second target data according to n, or is obtained by shifting and inverting the first target data or the second target data according to n.
When multiplying the data in two matrices, the matrix multiplier can directly output the corresponding special result whenever the data being multiplied includes special data (e.g., 0 or ±2^n). In this way, the higher power consumption caused by performing conventional multiplication operations can be reduced. Moreover, because multiplying the data in two matrices involves a large number of operations, the power savings are all the more considerable.
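To illustrate why the savings can accumulate over a whole matrix product, the following minimal Python sketch counts how many scalar multiplications in a small product involve an operand from the first set. It assumes integer data (so only n ≥ 0 occurs); the helper names `is_special` and `count_fast_path` and the sample matrices are hypothetical and are not taken from the application.

```python
def is_special(x: int) -> bool:
    """True if x is 0 or +/-2**n for some integer n >= 0 (integer data only)."""
    if x == 0:
        return True
    m = abs(x)
    return m & (m - 1) == 0  # exactly one bit set means a power of two


def count_fast_path(a, b):
    """Count scalar products in a @ b where at least one operand is special.

    Returns (fast, total): products that could skip the conventional
    multiplier, and the total number of scalar products.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    fast = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                if is_special(a[i][k]) or is_special(b[k][j]):
                    fast += 1
    return fast, rows * cols * inner


A = [[0, 3], [4, 5]]
B = [[7, -2], [6, 9]]
fast, total = count_fast_path(A, B)
print(f"{fast} of {total} scalar products avoid the conventional multiplier")
```

How large this fraction is in practice depends entirely on the data; the sketch only shows how it could be measured.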
With reference to the first aspect, in certain implementation manners of the first aspect, the first operation circuit is specifically configured to output 0 as the first result according to the first target data being 0.
With reference to the first aspect, in certain implementation manners of the first aspect, the first operation circuit is specifically configured to output, according to the first target data being 2^n, the third data as the first result, where the third data is obtained by shifting the second target data left or right by |n| bits.
With reference to the first aspect, in certain implementation manners of the first aspect, the first operation circuit is specifically configured to: according to the first target data being 2^n where n is a positive integer, output the third data as the first result, the third data being obtained by shifting the second target data left by n bits; or according to the first target data being 2^n where n is a negative integer, output the third data as the first result, the third data being obtained by shifting the second target data right by |n| bits.
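The choice of shift direction from the sign of n can be sketched as follows. This is a minimal illustration that assumes the second target data is held as a raw fixed-point integer; the function name `shift_by_power` is hypothetical.

```python
def shift_by_power(value: int, n: int) -> int:
    """Return value * 2**n using only a shift.

    value is assumed to be a raw fixed-point integer; n is the exponent
    of the other operand, which is assumed to equal 2**n.
    """
    if n >= 0:
        return value << n   # n > 0: left shift by n bits; n == 0: unchanged
    return value >> (-n)    # n < 0: arithmetic right shift by |n| bits


print(shift_by_power(5, 3))    # 5 * 8
print(shift_by_power(40, -3))  # 40 / 8
```

Note that a right shift discards the bits shifted out, so for n < 0 the result is exact only if the data format keeps enough fractional bits; on plain integers it behaves like floor division.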
With reference to the first aspect, in certain implementation manners of the first aspect, the first operation circuit is specifically configured to output, according to the first target data being -2^n, the third data as the first result, where the third data is obtained by shifting the second target data left or right by |n| bits and inverting the result.
With reference to the first aspect, in certain implementation manners of the first aspect, the comparison circuit is further configured to determine a first operation code according to the first target data and/or the second target data being data in the first set, where the value of the first operation code indicates that the first target data and/or the second target data is 0 or ±2^n; the first operation circuit is specifically configured to determine that the first result is 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
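One way to picture the first operation code is sketched below. The application does not specify how the code is encoded, so the three-valued `Op` enumeration, the accompanying exponent n, and the function names are all assumptions made for illustration; integer data with n ≥ 0 is assumed for brevity.

```python
from enum import Enum


class Op(Enum):
    """Hypothetical operation-code values; the source does not fix an encoding."""
    ZERO = 0
    POS_POW2 = 1
    NEG_POW2 = 2


def encode(x: int):
    """Comparison step: return (Op, n) if x is 0 or +/-2**n (n >= 0), else None."""
    if x == 0:
        return Op.ZERO, 0
    m = abs(x)
    if m & (m - 1):          # more than one bit set: not a power of two
        return None
    n = m.bit_length() - 1   # position of the single set bit
    return (Op.POS_POW2 if x > 0 else Op.NEG_POW2), n


def first_operation(op: Op, n: int, other: int) -> int:
    """First operation step: derive the product from the code and the other operand."""
    if op is Op.ZERO:
        return 0
    shifted = other << n
    return shifted if op is Op.POS_POW2 else -shifted


op, n = encode(-4)
print(first_operation(op, n, 5))  # same value as -4 * 5
```

Passing a small code plus n to the first operation circuit means that circuit never needs to re-inspect the special operand itself.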
With reference to the first aspect, in certain implementations of the first aspect, the matrix multiplier further includes at least one first register, the at least one first register being connected to the comparison circuit and the first operation circuit, respectively, the at least one first register being configured to obtain the first target data and the second target data from the comparison circuit; the at least one first register is further configured to output the first target data and the second target data to the first operation circuit.
With reference to the first aspect, in certain implementation manners of the first aspect, the at least one first register is further configured to obtain the first operation code from the comparison circuit and output the first operation code to the first operation circuit.
With reference to the first aspect, in certain implementations of the first aspect, the matrix multiplier further includes at least one second register and a second operation circuit, the at least one second register being connected to the second operation circuit, the at least one second register being configured to output the acquired first target data and the second target data to the second operation circuit if the comparison circuit determines that neither the first target data nor the second target data is data in the first set; the second operation circuit is used for performing a conventional multiplication operation on the received first target data and the second target data and outputting a second result of multiplying the first target data and the second target data.
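For contrast with the shift-only path, the sketch below shows a textbook shift-and-add multiplication and counts the additions it performs. The application does not describe the internals of the conventional multiplication circuit beyond saying that it contains adders, shifters, and multipliers, so this is only an assumed stand-in; `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(a: int, b: int):
    """Multiply two integers by shift-and-add.

    Returns (product, additions), where additions is the number of
    partial products that had to be accumulated.
    """
    sign = -1 if (a < 0) != (b < 0) else 1
    x, y = abs(a), abs(b)
    product = 0
    additions = 0
    shift = 0
    while y:
        if y & 1:                    # this bit of the multiplier is set
            product += x << shift    # add the shifted multiplicand
            additions += 1
        y >>= 1
        shift += 1
    return sign * product, additions


print(shift_add_multiply(13, 11))  # general operand: several additions
print(shift_add_multiply(-6, 8))   # power-of-two operand: a single partial product
```

When the multiplier operand is a power of two, only one partial product is ever formed, which is the observation the special-case path exploits.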
With reference to the first aspect, in certain implementations of the first aspect, the matrix multiplier further includes a data selector MUX, the MUX being connected to the second operation circuit and the first operation circuit, respectively, and configured to take the first result or the second result as an output of the matrix multiplier.
With reference to the first aspect, in certain implementation manners of the first aspect, the comparison circuit is further configured to output an enable signal with a value of 1 to the at least one first register if the comparison circuit determines that the first target data and/or the second target data are data in the first set; the at least one first register is specifically configured to output the first target data and the second target data to the first operation circuit according to the enable signal with the value of 1.
With reference to the first aspect, in certain implementation manners of the first aspect, the matrix multiplier further includes an inverting circuit, where the inverting circuit is connected to the comparison circuit and the at least one second register, and the comparison circuit is further configured to output an enable signal with a value of 0 to the at least one first register if the comparison circuit determines that neither the first target data nor the second target data is data in the first set; the inverting circuit is used for inverting the enable signal with the value of 0 output by the comparison circuit to obtain an enable signal with the value of 1, and outputting the enable signal with the value of 1 to the at least one second register; the at least one second register is specifically configured to output the acquired first target data and second target data to the second operation circuit based on the enable signal with the value of 1.
With reference to the first aspect, in certain implementations of the first aspect, the inverting circuit is an inverter.
With reference to the first aspect, in certain implementations of the first aspect, the MUX is specifically configured to: after receiving an enabling signal with a value of 1, taking the first result as the output of the matrix multiplier; or after receiving the enabling signal with the value of 0, taking the second result as the output of the matrix multiplier.
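The interplay of the enable signal, the two register paths, and the MUX can be modeled behaviorally as below. This is a software sketch under stated assumptions, not a description of the actual circuit: data is assumed to be integer (n ≥ 0), registers are modeled as plain gating conditions, and all function names are hypothetical.

```python
def power_of_two_exponent(x: int):
    """Return n if |x| == 2**n with n >= 0, else None."""
    m = abs(x)
    if m != 0 and m & (m - 1) == 0:
        return m.bit_length() - 1
    return None


def compare(a: int, b: int) -> int:
    """Comparison circuit: enable = 1 if either operand is 0 or +/-2**n."""
    for x in (a, b):
        if x == 0 or power_of_two_exponent(x) is not None:
            return 1
    return 0


def first_operation(a: int, b: int) -> int:
    """Special-case path: output 0, or shift (and negate) the other operand."""
    if a == 0 or b == 0:
        return 0
    special, other = (a, b) if power_of_two_exponent(a) is not None else (b, a)
    shifted = other << power_of_two_exponent(special)
    return shifted if special > 0 else -shifted


def second_operation(a: int, b: int) -> int:
    """Conventional path, modeled here by the built-in multiply."""
    return a * b


def multiply(a: int, b: int):
    enable = compare(a, b)
    inverted = enable ^ 1  # inverter feeding the second register
    # Each path receives data only when its register is enabled.
    first = first_operation(a, b) if enable == 1 else None
    second = second_operation(a, b) if inverted == 1 else None
    result = first if enable == 1 else second  # MUX selects on the enable signal
    return result, enable


for pair in [(8, 3), (3, -4), (0, 7), (3, 5)]:
    print(pair, multiply(*pair))
```

Because only one of the two paths receives operands for any given pair, the idle path does no work in this model, which mirrors the power-saving intent described above.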
In a second aspect, there is provided a method of matrix multiplication applied to a matrix multiplier for performing a matrix multiplication operation on a first matrix and a second matrix, the method comprising: the comparison circuit determines whether first target data and/or second target data are data in a first set, wherein the first target data are data in the first matrix, the second target data are data in the second matrix, and the first set comprises: 0 and ±2^n, where n is an integer; the first operation circuit outputs a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, wherein the first result comprises: 0 or third data, wherein the third data is obtained by shifting the first target data or the second target data according to n, or is obtained by shifting and inverting the first target data or the second target data according to n.
With reference to the second aspect, in some implementations of the second aspect, the first operation circuit outputs 0 as the first result according to the first target data being 0.
With reference to the second aspect, in some implementations of the second aspect, the first operation circuit outputs, according to the first target data being 2^n, the third data as the first result, where the third data is obtained by shifting the second target data left or right by |n| bits.
With reference to the second aspect, in some implementations of the second aspect, the first operation circuit outputs, according to the first target data being 2^n where n is a positive integer, the third data as the first result, the third data being obtained by shifting the second target data left by n bits; or the first operation circuit outputs, according to the first target data being 2^n where n is a negative integer, the third data as the first result, the third data being obtained by shifting the second target data right by |n| bits.
With reference to the second aspect, in some implementations of the second aspect, the first operation circuit outputs, according to the first target data being -2^n, the third data as the first result, where the third data is obtained by shifting the second target data left or right by |n| bits and inverting the result.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the comparison circuit determines a first operation code according to the first target data and/or the second target data being data in the first set, wherein the value of the first operation code indicates that the first target data and/or the second target data is 0 or ±2^n; the first operation circuit determines the first result to be 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: at least one first register acquires the first target data and the second target data from the comparison circuit, and the at least one first register is respectively connected with the comparison circuit and the first operation circuit; the at least one first register outputs the first target data and the second target data to the first operation circuit.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the at least one first register obtains the first operation code from the comparison circuit and outputs the first operation code to the first operation circuit.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: at least one second register outputting the acquired first target data and second target data to a second operation circuit, the at least one second register being connected to the second operation circuit, in case the comparison circuit determines that neither the first target data nor the second target data is data in the first set; the second operation circuit performs a normal multiplication operation on the received first target data and the second target data, and outputs a second result of multiplying the first target data and the second target data.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the data selector MUX takes the first result or the second result as the output of the matrix multiplier, and the MUX is respectively connected with the second operation circuit and the first operation circuit.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the comparison circuit outputs an enabling signal with a value of 1 to the at least one first register when the comparison circuit determines that the first target data and/or the second target data are data in the first set; the at least one first register outputs the first target data and the second target data to the first operation circuit according to the enabling signal with the value of 1.
With reference to the second aspect, in certain implementations of the second aspect, the method further includes: the comparison circuit outputs an enable signal with a value of 0 to the at least one first register when the comparison circuit determines that the first target data and the second target data are not data in the first set; the inverting circuit performs inverting operation on the enable signal with the value of 0 output by the comparison circuit to obtain an enable signal with the value of 1, and outputs the enable signal with the value of 1 to the at least one second register, wherein the inverting circuit is respectively connected with the comparison circuit and the at least one second register; the at least one second register outputs the acquired first target data and second target data to the second operation circuit based on the enable signal with the value of 1.
With reference to the second aspect, in some implementations of the second aspect, the inverting circuit is an inverter.
With reference to the second aspect, in some implementations of the second aspect, the MUX takes the first result as an output of the matrix multiplier after receiving the enable signal with a value of 1; or the MUX takes the second result as the output of the matrix multiplier after receiving the enabling signal with the value of 0.
In a third aspect, a computing device is provided that includes at least one processor and at least one memory, and optionally, an input-output interface. Wherein the at least one processor is configured to control the input-output interface to send and receive information, the at least one memory is configured to store a computer program, and the at least one processor is configured to invoke and run the computer program from the at least one memory, so that the computing device performs the method of the second aspect or any of the possible implementations of the second aspect.
In the alternative, the at least one processor may be a general purpose processor, and may be implemented in hardware or in software. When implemented in hardware, the at least one processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the at least one processor may be a general-purpose processor implemented by reading software code stored in at least one memory, which may be integrated in the at least one processor, may be external to the at least one processor, and may exist separately.
In a fourth aspect, a chip is provided, the chip comprising a matrix multiplier as in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, a chip is provided that fetches instructions and executes the instructions to implement the method of the second aspect and any implementation of the second aspect described above.
Optionally, as an implementation manner, the chip includes a processor and a data interface, where the processor reads instructions stored on a memory through the data interface, and performs the method of the second aspect or any implementation manner of the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory; when executed, the instructions cause the processor to perform the method of the second aspect or any implementation manner of the second aspect.
In a sixth aspect, there is provided a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method as in the second aspect and any implementation of the second aspect above.
In a seventh aspect, a computer readable storage medium is provided, comprising computer program instructions which, when executed by a computer, perform a method as in the second aspect and any implementation of the second aspect described above.
By way of example, these computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard disk drive.
Alternatively, as an implementation manner, the storage medium may be a nonvolatile storage medium.
Drawings
Fig. 1 is a schematic block diagram of a matrix multiplier 100 provided in an embodiment of the present application.
Fig. 2 is a schematic block diagram of another matrix multiplier 200 provided in an embodiment of the present application.
Fig. 3 is a schematic block diagram of a matrix multiplication method provided in an embodiment of the present application.
Fig. 4 is a schematic block diagram of a matrix multiplication apparatus 400 provided in an embodiment of the present application.
Fig. 5 is a schematic architecture diagram of a computing device 1500 provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
The present application will present various aspects, embodiments, or features about a system comprising a plurality of devices, components, modules, etc. It is to be understood and appreciated that the various systems may include additional devices, components, modules, etc. and/or may not include all of the devices, components, modules etc. discussed in connection with the figures. Furthermore, combinations of these schemes may also be used.
In addition, in the embodiments of the present application, words such as "exemplary," "for example," and the like are used to indicate an example, instance, or illustration. Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the term use of an example is intended to present concepts in a concrete fashion.
In the embodiments of the present application, "corresponding" and "correspond" may sometimes be used interchangeably; it should be noted that, when the distinction is not emphasized, the intended meaning is the same.
The service scenarios described in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit those technical solutions. A person of ordinary skill in the art will appreciate that, as network architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A alone exists, both A and B exist, or B alone exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, and c may each be single or plural.
Matrix multiplication (MM) is one of the important mathematical operations in modern artificial intelligence technologies such as neural networks and machine learning. In one example, the operation of matrix multiplication may be performed by a matrix multiplier. In the related art, because matrix multiplication involves many multiplication and addition operations, the matrix multiplier includes a conventional multiplication circuit that requires a plurality of adders, shifters, and multipliers. The conventional multiplication circuit consumes considerable power when performing matrix multiplication operations.
In view of this, the embodiments of the present application provide a matrix multiplier, which can save power consumption of the matrix multiplier when performing matrix multiplication.
A detailed description of a matrix multiplier according to an embodiment of the present application will be provided with reference to fig. 1.
Fig. 1 is a schematic block diagram of a matrix multiplier 100 provided in an embodiment of the present application. As shown in fig. 1, the matrix multiplier 100 may include a comparison circuit 110 and a first operation circuit 120. The functions of the comparison circuit 110 and the first operation circuit 120 are described in detail below.
It should be understood that, for convenience of description, the matrix multiplier 100 is described below as performing matrix multiplication on a first matrix and a second matrix.
The comparison circuit 110 is used for obtaining first target data in the first matrix and second target data in the second matrix, and determining whether at least one of the first target data and the second target data is data in a first set, wherein the data in the first set may include, but is not limited to: 0 and ±2^n, where n is an integer. That is, n may be a positive integer, 0, or a negative integer, which is not specifically limited in the embodiments of the present application.
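Because n may be negative, the membership test also has to cover fractional values such as 0.25. The following Python sketch shows one way to perform the test on floating-point data, using `math.frexp` to split a value into mantissa and exponent. The application does not say which number format the data uses, so the floating-point representation, the function name `classify`, and the returned labels are assumptions.

```python
import math


def classify(x: float):
    """Classify x against the first set {0, +/-2**n}.

    Returns ("zero", None), ("pos", n), ("neg", n), or ("other", None).
    """
    if x == 0:
        return "zero", None
    # abs(x) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(x))
    if mantissa == 0.5:              # exactly a power of two
        return ("pos" if x > 0 else "neg"), exponent - 1
    return "other", None


for value in (8.0, -0.25, 1.0, 0.0, 3.0):
    print(value, classify(value))
```

In a binary floating-point format this check amounts to testing that the fraction bits are all zero, after which n can be read from the exponent field.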
That is, in one example, the comparison circuit 110 may determine whether the first target data is data in the first set. For another example, the comparison circuit 110 may determine whether the second target data is data in the first set. For another example, the comparison circuit 110 may determine whether the first target data and the second target data are both data in the first set.
It should be noted that, in the embodiment of the present application, the order of determining whether the first target data and/or the second target data are the data in the first set by the comparison circuit 110 is not specifically limited, and the first target data may be determined first, or the second target data may be determined first, or the first target data and the second target data may be determined simultaneously.
The first operation circuit 120 is configured to output a first result obtained by multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, where the first result includes: 0 or third data. That is, the first operation circuit 120 may directly output the first result of multiplying the first target data and the second target data as 0 or the third data in the case where the comparison circuit 110 determines that at least one of the first target data and the second target data is the data in the first set.
It should be appreciated that the third data may be obtained in various ways, which are not specifically limited in the embodiments of the present application and depend on the values of the first target data and/or the second target data. In one possible implementation, the third data output by the first operation circuit 120 is obtained by shifting the first target data left by n bits or by |n| bits (the absolute value of n). In another possible implementation, the third data is obtained by shifting the first target data left by n bits or by |n| bits and inverting the result. In another possible implementation, the third data is obtained by shifting the first target data right by n bits or by |n| bits. In another possible implementation, the third data is obtained by shifting the first target data right by n bits or by |n| bits and inverting the result. In another possible implementation, the third data is obtained by shifting the second target data left by n bits or by |n| bits. In another possible implementation, the third data is obtained by shifting the second target data left by n bits or by |n| bits and inverting the result. In another possible implementation, the third data is obtained by shifting the second target data right by n bits or by |n| bits. In another possible implementation, the third data is obtained by shifting the second target data right by n bits or by |n| bits and inverting the result.
The first result of multiplying the first target data and the second target data output by the first operation circuit 120 is exemplified below in conjunction with different examples.
In one example, assuming that at least one of the first target data and the second target data is 0, the first operation circuit 120 may directly output the first result of multiplying the first target data and the second target data as 0.
In another example, assume that the first target data is 2^n. The first operation circuit 120 may shift the second target data left or right by |n| bits to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. For example, assuming that n is a positive integer, the first operation circuit 120 may shift the second target data left by n bits to obtain the third data. As another example, assuming that n is a negative integer, the first operation circuit 120 may shift the second target data right by the absolute value of n (|n|) bits to obtain the third data. For another example, assuming that n is 0, the first operation circuit 120 may shift the second target data by 0 bits to obtain the third data, that is, the output third data is the second target data itself.
In another example, assume that the first target data is -2^n. The first operation circuit 120 may shift the second target data left or right by |n| bits and invert it to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. For example, assuming that n is a positive integer, the first operation circuit 120 may shift the second target data left by n bits and invert it to obtain the third data. As another example, assuming that n is a negative integer, the first operation circuit 120 may shift the second target data right by the absolute value of n (|n|) bits and invert it to obtain the third data. For another example, assuming that n is 0, the first operation circuit 120 may shift the second target data by 0 bits and invert it to obtain the third data, that is, the output third data is the data obtained by inverting the second target data.
In another example, assume that the second target data is 2^n. The first operation circuit 120 may shift the first target data left or right by |n| bits to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. For example, assuming that n is a positive integer, the first operation circuit 120 may shift the first target data left by n bits to obtain the third data. As another example, assuming that n is a negative integer, the first operation circuit 120 may shift the first target data right by the absolute value of n (|n|) bits to obtain the third data. For another example, assuming that n is 0, the first operation circuit 120 may shift the first target data by 0 bits to obtain the third data, that is, the output third data is the first target data itself.
In another example, assume that the second target data is -2^n. The first operation circuit 120 may shift the first target data left or right by |n| bits and invert it to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. For example, assuming that n is a positive integer, the first operation circuit 120 may shift the first target data left by n bits and invert it to obtain the third data. As another example, assuming that n is a negative integer, the first operation circuit 120 may shift the first target data right by the absolute value of n (|n|) bits and invert it to obtain the third data. For another example, assuming that n is 0, the first operation circuit 120 may shift the first target data by 0 bits and invert it to obtain the third data, that is, the output third data is the data obtained by inverting the first target data.
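The shift-based cases above can be illustrated with a minimal Python sketch. It is not the circuit described in this application. The function name `shift_product` and its parameters are hypothetical, the operands are assumed to be plain integers, and "inverting" is assumed to mean arithmetic negation. For a negative n, the right shift discards low-order bits, so the sketch is exact only when the other operand is a multiple of 2^|n|.

```python
def shift_product(other: int, n: int, negative: bool = False) -> int:
    """Return other * (+/-)2**n using only a shift and an optional negation.

    Assumptions (not from the source): integer operands, and "inverting"
    is taken to mean arithmetic negation. For n < 0 the right shift drops
    low-order bits, so the result is exact only when other is a multiple
    of 2**|n|.
    """
    # Positive or zero n: shift left by n bits. Negative n: shift right by |n| bits.
    shifted = other << n if n >= 0 else other >> -n
    # A negative power of two additionally flips the sign of the shifted value.
    return -shifted if negative else shifted


print(shift_product(3, 1))         # other = 3, special operand = 2**1
print(shift_product(3, 2, True))   # other = 3, special operand = -(2**2)
print(shift_product(12, -2))       # other = 12, special operand = 2**-2
print(shift_product(7, 0, True))   # other = 7, special operand = -(2**0)
```

The case in which either operand is 0 needs no shift at all, so it is left out of this sketch.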
Optionally, in some embodiments, the comparison circuit 110 is further configured to determine, according to the first target data and/or the second target data being data in the first set, an operation code (opcode) indicating that the first target data and/or the second target data is 0 or ±2^n. In this way, the first operation circuit 120 may determine the first result of multiplying the first target data and the second target data directly according to the value of the operation code (opcode) corresponding to the first target data and/or the second target data.
It should be noted that the comparison circuit 110 may determine and output a single opcode according to whether the first target data and/or the second target data is 0 or ±2^n, or may determine and output separate opcodes corresponding to the first target data and to the second target data, which is not specifically limited in the embodiments of the present application.
By way of example, several possible implementations of the opcode are described in detail below, taking the case in which the comparison circuit 110 outputs one opcode corresponding to the first target data and one opcode corresponding to the second target data.
In one example, the value of the opcode corresponding to the first target data is 0, which may be used to indicate that the first target data is 0. In this way, the first operation circuit 120 may determine that the first result obtained by multiplying the first target data and the second target data is 0 according to the value of the opcode corresponding to the first target data being 0.
For another example, the value of the opcode corresponding to the second target data is 0, which may be used to indicate that the second target data is 0. In this way, the first operation circuit 120 may determine that the first result obtained by multiplying the first target data and the second target data is 0 according to the value of the opcode corresponding to the second target data being 0.
In another example, the value of the opcode corresponding to the first target data is 2^n, which may be used to indicate that the first target data is 2^n. Thus, according to the value of the opcode corresponding to the first target data being 2^n and the value of n, the first operation circuit 120 may shift the second target data left or right by |n| bits to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. The process by which the first operation circuit 120 determines the third data according to the second target data and the value of n is described above and is not repeated here.
In another example, the value of the opcode corresponding to the first target data is -2^n, which may be used to indicate that the first target data is -2^n. Thus, according to the value of the opcode corresponding to the first target data being -2^n and the value of n, the first operation circuit 120 may shift the second target data left or right by |n| bits and invert it to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. The process by which the first operation circuit 120 determines the third data according to the second target data and the value of n is described above and is not repeated here.
In another example, the value of the opcode corresponding to the second target data is 2^n, which may be used to indicate that the second target data is 2^n. Thus, according to the value of the opcode corresponding to the second target data being 2^n and the value of n, the first operation circuit 120 may shift the first target data left or right by |n| bits to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. The process by which the first operation circuit 120 determines the third data according to the first target data and the value of n is described above and is not repeated here.
In another example, the value of the opcode corresponding to the second target data is -2^n, which may be used to indicate that the second target data is -2^n. Thus, according to the value of the opcode corresponding to the second target data being -2^n and the value of n, the first operation circuit 120 may shift the first target data left or right by |n| bits and invert it to obtain the third data, and take the third data as the first result of multiplying the first target data and the second target data. The process by which the first operation circuit 120 determines the third data according to the first target data and the value of n is described above and is not repeated here.
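As an illustration of how an opcode could carry the comparison result, the following Python sketch classifies an operand and then computes the first result from the opcode alone. The opcode layout (a kind tag plus the exponent n), the names `classify` and `first_result`, and the restriction to integer operands (so that n >= 0) are all assumptions made for this sketch. The application does not specify an encoding.

```python
from typing import Optional, Tuple

# Hypothetical opcode layout (not specified by the source):
#   ("ZERO", 0)       the operand is 0
#   ("POS_POW2", n)   the operand is  2**n
#   ("NEG_POW2", n)   the operand is -(2**n)
Opcode = Tuple[str, int]


def classify(value: int) -> Optional[Opcode]:
    """Return an opcode if value is in the first set, otherwise None.

    Assumption: integer operands, so only n >= 0 can occur here.
    """
    if value == 0:
        return ("ZERO", 0)
    magnitude = abs(value)
    # A power of two has exactly one bit set, so clearing its lowest set bit gives 0.
    if magnitude & (magnitude - 1) == 0:
        n = magnitude.bit_length() - 1
        return ("POS_POW2" if value > 0 else "NEG_POW2", n)
    return None


def first_result(opcode: Opcode, other: int) -> int:
    """Compute the product from one operand's opcode and the other operand."""
    kind, n = opcode
    if kind == "ZERO":
        return 0
    shifted = other << n
    # "Inverting" is modeled as arithmetic negation (an assumption).
    return shifted if kind == "POS_POW2" else -shifted


for value in (0, 2, -8, 5):
    print(value, classify(value))
```

In this sketch the operation stage never inspects the special operand itself, only its opcode, which mirrors the statement that the first result can be determined directly from the opcode value.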
Optionally, in some embodiments, the matrix multiplier 100 may further include at least one first register, which is connected to the comparing circuit 110 and the first operating circuit 120, respectively. Wherein the at least one first register is configured to obtain the first target data and the second target data from the comparison circuit 110, and output the first target data and the second target data to the first operation circuit 120. In a possible implementation, the input of at least one first register may store the first target data and the second target data after receiving them from the comparison circuit 110, and the first target data and the second target data are transferred from the output to the first operation circuit 120 after receiving the clock control signal (clk) from the clock control unit at its clock pulse input.
Optionally, in some embodiments, the at least one first register may also receive an opcode from the comparison circuit 110 and transmit it to the first operation circuit 120. In a possible implementation, the at least one first register transmits the opcode from the output to the first operating circuit 120 after its clock pulse input receives a clock control signal (clk) from the clock control unit.
It should be noted that the first target data and the second target data may be stored in the same first register, or may also be stored in different first registers, which is not specifically limited in the embodiment of the present application.
It should be further noted that the opcode may be stored in the same first register as the first target data and/or the second target data, or may be stored in a different first register, which is not specifically limited in the embodiment of the present application.
Optionally, in some embodiments, the matrix multiplier 100 may further include at least one second register and a second operation circuit, where the at least one second register is connected to the second operation circuit. Wherein the at least one second register is configured to output the acquired first target data and second target data to the second operation circuit, in a case where the comparison circuit 110 determines that neither the first target data nor the second target data is data in the first set; the second operation circuit is used for carrying out conventional multiplication operation on the received first target data and second target data and outputting a second result of multiplying the first target data and the second target data.
It should be appreciated that in the above embodiment, in the case where the first target data and/or the second target data are data in the first set, the first operation circuit 120 may be used to output a first result obtained by multiplying the first target data and the second target data, and the first result may be used as a final output result of the matrix multiplier 100, so that power consumption of the matrix multiplier 100 may be saved. If neither the first target data nor the second target data is the data in the first set, the multiplication operation of the first target data and the second target data is performed using the second operation circuit to obtain a second result, and the second result is used as a final output result of the matrix multiplier 100.
In a possible implementation, the first result may be selected by a data selector (MUX) as a result of the final output of the matrix multiplier 100, or the second result may be selected as a result of the final output of the matrix multiplier 100.
In the embodiment of the present application, it may be determined by the enable signal whether to use the second operation circuit to determine the result of multiplying the first target data and the second target data or to use the first operation circuit 120 to determine the result of multiplying the first target data and the second target data.
For example, the input end of the at least one first register is the first target data, the second target data, the corresponding opcode and the first enable signal, and the input end of the at least one second register is the first target data, the second target data and the second enable signal, wherein the value of the second enable signal is opposite to the value of the first enable signal. In one example, assuming that the first target data and/or the second target data are data in the first set, the comparison circuit 110 may output a first enable signal having a value of 1 in this case. The at least one first register outputs the first target data, the second target data and the corresponding opcode acquired from the input end from the output end according to the received first enabling signal with the value of 1. The at least one second register does not output the first target data and the second target data acquired from the input end according to the received second enabling signal with the value of 0, namely, after the at least one second register receives the enabling signal with the value of 0, the data of the output end of the at least one second register is kept unchanged, and the data of the output end of the at least one second register is also the output data of the last clock period and does not change along with the change of the data of the input end. As another example, assuming that neither the first target data nor the second target data are data in the first set, the comparison circuit 110 outputs a first enable signal having a value of 0 in this case. 
The at least one first register does not output the first target data, the second target data and the corresponding opcode acquired from the input end according to the received first enabling signal with the value of 0, namely, after the at least one first register receives the enabling signal with the value of 0, the data at the output end of the at least one first register is kept unchanged, and the data at the output end of the at least one first register is also the output data of the last clock period and does not change along with the change of the data at the input end. The at least one second register outputs the first target data and the second target data acquired from the input end from the output end according to the received second enabling signal with the value of 1.
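The hold-or-update behavior of the two register groups under complementary enable signals can be modeled with a short behavioral sketch in Python. The class `EnabledRegister`, its `clock` method, and the initial value `(0, 0)` are hypothetical names and choices for illustration only. The sketch models one rising clock edge per call and is not a hardware description.

```python
class EnabledRegister:
    """Behavioral model of an edge-triggered register with an enable input."""

    def __init__(self, initial=None):
        self.q = initial  # value currently held on the output

    def clock(self, d, enable: int):
        """Model one rising clock edge: capture d only when enable is 1."""
        if enable == 1:
            self.q = d
        # With enable == 0 the output keeps the value of the previous cycle.
        return self.q


first_reg = EnabledRegister(initial=(0, 0))    # feeds the first operation circuit
second_reg = EnabledRegister(initial=(0, 0))   # feeds the second operation circuit

# Each entry: (operands, first enable signal as the comparison stage would set it).
for operands, first_enable in [((2, 3), 1), ((5, 3), 0)]:
    second_enable = 1 - first_enable  # complement of the first enable signal
    first_reg.clock(operands, first_enable)
    second_reg.clock(operands, second_enable)
    print(first_reg.q, second_reg.q)
```

Because a disabled register keeps its previous output, the circuit behind it sees no input change in that cycle, which is the property the power saving relies on.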
Optionally, in some embodiments, the inversion between the values of the first enable signal and the second enable signal may be implemented by an inverting circuit. In one example, the inverting circuit is a NOT gate.
Optionally, in some embodiments, the MUX may further select the first result as the final output of the matrix multiplier 100, or select the second result as the final output of the matrix multiplier 100, according to the first enable signal. Specifically, in one possible implementation, the value of the first enable signal output by the comparison circuit 110 is 1, and the MUX may use the first result as the final output result of the matrix multiplier 100 according to the value of the first enable signal. In another possible implementation, the value of the first enable signal output by the comparison circuit 110 is 0, and the MUX may use the second result as the final output result of the matrix multiplier 100 according to the value of the first enable signal.
When multiplying the data in two matrices, the matrix multiplier described above can directly output the corresponding special result whenever the multiplication involves special data (e.g., 0 or 2^n). In this way, the higher power consumption incurred by performing conventional multiplication operations can be reduced. In addition, since multiplying the data in two matrices involves a large number of operations, the benefit of the power saving is considerable.
Another matrix multiplier 200 provided in an embodiment of the present application is described in detail below with reference to fig. 2. It should be understood that the example of fig. 2 is merely to aid one skilled in the art in understanding the present embodiments, and is not intended to limit the present embodiments to the specific values or the specific scenarios illustrated in fig. 2. Various equivalent modifications and variations will be apparent to those skilled in the art from the following example given in fig. 2, and such modifications and variations are intended to be within the scope of the embodiments of the present application.
Fig. 2 is a schematic block diagram of another matrix multiplier 200 provided in an embodiment of the present application. As shown in fig. 2, the matrix multiplier 200 may include: comparator 210, register 1, register 2, register 3, register 4, first operational circuitry 220, second operational circuitry 230, MUX 240, NOT gate 250.
It should be noted that, in the embodiment of the present application, the number of the first registers and the second registers is not specifically limited, and for convenience of description, two first registers (register 3, register 4) and two second registers (register 1, register 2) are exemplified in fig. 2.
Referring to fig. 2, an operand A (corresponding to the first target data above) and an operand B (corresponding to the second target data above) are input to a comparator 210, and the comparator 210 determines whether at least one of the operand A and the operand B is data in a first set according to the above method and outputs a corresponding enable signal, wherein the data in the first set may include, but is not limited to: 0 and ±2^n, where n is an integer.
In the following, each portion included in the matrix multiplier 200 will be described in detail taking the value of the operand a as 2 and the value of the operand B as 3 as an example.
The comparator 210 determines that the value of operand A is data in the first set (2^n with n equal to 1), and outputs operand A, the opcode corresponding to operand A (the opcode value is 1, indicating that operand A is 2^1), operand B, and an enable signal having a value of 1.
The not gate 250 performs an inverting operation on the enable signal having a value of 1 output from the comparator 210, obtains an enable signal having a value of 0, and outputs the enable signal having a value of 0 to the register 1 and the register 2.
In the register 1, the data received by the input terminal (D1) includes an operand a, the enable signal received by the enable terminal (EN 1) is an enable signal with a value of 0 output by the not gate 250, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 1). When the rising edge of the clock signal at clk 1 (i.e., low-to-high) arrives, the output terminal (Q1) of the register 1 will not output the operand a received at the input terminal (D1) because the value of the enable signal is 0.
In the register 2, the data received by the input terminal (D2) includes an operand B, the enable signal received by the enable terminal (EN 2) is an enable signal with a value of 0 output by the not gate 250, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 2). When the rising edge of the clock signal at clk 2 (i.e., low-to-high) arrives, the output terminal (Q2) of the register 2 will not output the operand B received at the input terminal (D2) because the value of the enable signal is 0.
In the register 3, the data received by the input terminal (D3) includes an operand A and the opcode corresponding to operand A (the opcode value is 1, indicating that operand A is 2^1), the enable signal received by the enable terminal (EN 3) has a value of 1, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 3). When the rising edge of the clock signal at clk 3 (i.e., low-to-high) arrives, the register 3 outputs operand A at the input terminal (D3) and the opcode corresponding to operand A from the output terminal (Q3), because the value of the enable signal is 1.
The data received by the input terminal (D4) of the register 4 comprises an operand B, the value of the enable signal received by the enable terminal (EN 4) is 1, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 4). Upon arrival of the rising edge (i.e., low to high) of the clock signal at clk 4, register 4 outputs operand B at input (D4) from output (Q4) due to the value of the enable signal being 1.
The first operation circuit 220 is configured to acquire the data output from the registers 3 and 4. Specifically, the data obtained at the input of the first operation circuit 220 includes operand A, the opcode corresponding to operand A (the opcode value is 1, indicating that operand A is 2^1), and operand B. The first operation circuit 220 determines from the opcode value of 1 that operand A is 2^1. Thus, the first operation circuit 220 may shift operand B left by 1 bit and take the shifted result as the output result (first result) at the output of the first operation circuit 220. With operand B equal to 3, the first result is 6.
The second operation circuit 230 is configured to acquire the data output from the register 1 and the register 2. Since the outputs of the register 1 and the register 2 do not output operand A and operand B, but instead hold the data output in the previous clock cycle, the second operation circuit 230 does not perform the multiplication of operand A and operand B. The multiplication output result (second result) at the output of the second operation circuit 230 therefore remains the calculation result of the previous clock cycle.
The data received by the input end of the MUX 240 includes a first result and a second result, and the enable signal received by the enable end (EN 5) is an enable signal with a value of 1. The MUX 240 takes the first result output by the first operation circuit 220 as the final output result of the matrix multiplier 200 according to the enable signal having a value of 1.
As another example, each portion included in the matrix multiplier 200 will be described in detail below with the value of operand a being 5 and the value of operand B being 3 as an example.
The comparator 210 determines that neither the value of operand A nor the value of operand B is data in the first set (the data in the first set includes 0 and ±2^n, where n is an integer), and outputs an enable signal having a value of 0, operand A, and operand B.
The not gate 250 performs an inverting operation on the enable signal having a value of 0 output from the comparator 210, obtains the enable signal having a value of 1, and outputs the enable signal having a value of 1 to the register 1 and the register 2.
In register 1, the data received at the input terminal (D1) includes an operand a, the enable signal received at the enable terminal (EN 1) is an enable signal with a value of 1 output by the not gate 250, and the clock signal (clk) sent by the clock control unit is received at the clock pulse input terminal (clk 1). When the rising edge of the clock signal at clk 1 (i.e., low-to-high) arrives, the register 1 outputs the operand a at the input terminal (D1) from the output terminal (Q1) due to the value of the enable signal being 1.
In the register 2, the data received by the input terminal (D2) includes an operand B, the enable signal received by the enable terminal (EN 2) is an enable signal with a value of 1 output by the not gate 250, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 2). Upon arrival of the rising edge (i.e., low to high) of the clock signal at clk 2, register 2 outputs operand B at input (D2) from output (Q2) due to the value of the enable signal being 1.
In the register 3, the data received by the input terminal (D3) includes an operand a, the enable signal received by the enable terminal (EN 3) has a value of 0, and the clock signal (clk) sent by the clock control unit is received by the clock input terminal (clk 3). When the rising edge of the clock signal at clk 3 (i.e., low-to-high) arrives, the output terminal (Q3) of the register 3 will not output the operand a received at the input terminal (D3) because the value of the enable signal is 0.
In the register 4, the data received by the input terminal (D4) includes an operand B, the enable signal received by the enable terminal (EN 4) has a value of 0, and the clock signal (clk) sent by the clock control unit is received by the clock pulse input terminal (clk 4). When the rising edge of the clock signal at clk 4 (i.e., low-to-high) arrives, the output terminal (Q4) of the register 4 will not output the operand B received at the input terminal (D4) because the value of the enable signal is 0.
The second operation circuit 230 is configured to obtain the operand a and the operand B output from the register 1 and the register 2, respectively, and perform a conventional multiplication operation on the operand a and the operand B, and the obtained result is used as a multiplication output result (second result) at the output end of the second operation circuit 230.
The first operation circuit 220 is configured to acquire data output from the registers 3 and 4. Since the outputs of the register 3 and the register 4 do not output the operand a and the operand B, the output result (first result) of the output of the first operation circuit 220 is also the result of the last clock cycle.
The data received by the input end of the MUX 240 includes a first result and a second result, and the enable signal received by the enable end (EN 5) is an enable signal with a value of 0. The MUX 240 takes the second result output by the second operation circuit 230 as the final output result of the matrix multiplier 200 according to the enable signal having the value of 0.
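The two worked examples (operand A equal to 2 or 5, with operand B equal to 3) can be traced with a small behavioral model in Python. This is only a sketch under stated assumptions: the class `Fig2Model`, the helper `in_first_set`, the initial register contents `(0, 0)`, integer operands with n >= 0, and "inverting" modeled as negation are all choices made for illustration and are not taken from the application.

```python
def in_first_set(v: int) -> bool:
    """True for 0 or +/-2**n (integer operands assumed, so n >= 0)."""
    m = abs(v)
    return m & (m - 1) == 0  # also evaluates to True when m == 0


class Fig2Model:
    """One call to step() models one clock cycle of the datapath in fig. 2."""

    def __init__(self):
        self.fast_regs = (0, 0)  # registers 3 and 4 (assumed initial contents)
        self.slow_regs = (0, 0)  # registers 1 and 2 (assumed initial contents)

    @staticmethod
    def _first_operation(a: int, b: int) -> int:
        special, other = (a, b) if in_first_set(a) else (b, a)
        if special == 0:
            return 0
        shifted = other << (abs(special).bit_length() - 1)
        return shifted if special > 0 else -shifted

    def step(self, a: int, b: int):
        enable = 1 if (in_first_set(a) or in_first_set(b)) else 0
        # Only the register pair whose enable is 1 captures the new operands.
        if enable:
            self.fast_regs = (a, b)
        else:
            self.slow_regs = (a, b)
        first = self._first_operation(*self.fast_regs)   # stale when enable == 0
        second = self.slow_regs[0] * self.slow_regs[1]   # stale when enable == 1
        # The MUX picks the first result when enable is 1, else the second result.
        return enable, (first if enable else second)


model = Fig2Model()
print(model.step(2, 3))  # enable is 1: shift path
print(model.step(5, 3))  # enable is 0: conventional multiplication path
```

In the second cycle the first operation circuit still sees the operands latched in the first cycle, so its output is stale and is ignored by the MUX.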
A method for matrix multiplication according to an embodiment of the present application is described in detail below with reference to fig. 3.
Fig. 3 is a schematic block diagram of a matrix multiplication method provided in an embodiment of the present application. The method is applied to a matrix multiplier for performing a matrix multiplication operation on a first matrix and a second matrix. As shown in FIG. 3, the method may include steps 310-320, with steps 310-320 being described in detail below, respectively.
Step 310: the comparison circuit determines whether first target data and/or second target data are data in a first set, wherein the first target data are data in the first matrix, the second target data are data in the second matrix, and the first set comprises: 0 and ±2^n, where n is an integer.
Step 320: the first operation circuit outputs a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, wherein the first result comprises: 0 or third data.
The third data is obtained by shifting the first target data or the second target data by |n| bits, or by shifting the first target data or the second target data by |n| bits and inverting it.
Optionally, the first operation circuit outputs the first result as 0 according to the first target data as 0.
Optionally, according to the first target data being 2^n, the first operation circuit outputs the third data as the first result, the third data being obtained by shifting the second target data left or right by |n| bits.
Optionally, according to the first target data being 2^n and n being a positive integer, the first operation circuit outputs the third data as the first result, the third data being obtained by shifting the second target data left by n bits; or, according to the first target data being 2^n and n being a negative integer, the first operation circuit outputs the third data as the first result, the third data being obtained by shifting the second target data right by |n| bits.
Optionally, according to the first target data being -2^n, the first operation circuit outputs the third data as the first result, the third data being obtained by shifting the second target data left or right by |n| bits and inverting it.
Optionally, the method further comprises: the comparison circuit determines a first operation code according to the first target data and/or the second target data being data in the first set, wherein the value of the first operation code indicates that the first target data and/or the second target data is 0 or ±2^n; the first operation circuit determines the first result to be 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
Optionally, the method further comprises: at least one first register acquires the first target data and the second target data from the comparison circuit, and the at least one first register is respectively connected with the comparison circuit and the first operation circuit; the at least one first register outputs the first target data and the second target data to the first operation circuit.
Optionally, the method further comprises: the at least one first register obtains the first operation code from the comparison circuit and outputs the first operation code to the first operation circuit.
Optionally, the method further comprises: at least one second register outputting the acquired first target data and second target data to a second operation circuit, the at least one second register being connected to the second operation circuit, in case the comparison circuit determines that neither the first target data nor the second target data is data in the first set; the second operation circuit performs a normal multiplication operation on the received first target data and the second target data, and outputs a second result of multiplying the first target data and the second target data.
Optionally, the method further comprises: the data selector MUX takes the first result or the second result as the output of the matrix multiplier, and the MUX is respectively connected with the second operation circuit and the first operation circuit.
Optionally, the method further comprises: the comparison circuit outputs an enabling signal with a value of 1 to the at least one first register under the condition that the first target data and/or the second target data are/is determined to be the data in the first set; the at least one first register outputs the first target data and the second target data to the first operation circuit according to the enabling signal with the value of 1.
Optionally, the method further comprises: the comparison circuit outputs an enable signal with a value of 0 to the at least one first register in the case that it is determined that neither the first target data nor the second target data is data in the first set; the inverting circuit performs inverting operation on the enable signal with the value of 0 output by the comparison circuit to obtain an enable signal with the value of 1, and outputs the enable signal with the value of 1 to the at least one second register, wherein the inverting circuit is respectively connected with the comparison circuit and the at least one second register; the at least one second register outputs the acquired first target data and second target data to the second operation circuit based on the enable signal with the value of 1.
Optionally, the inverting circuit is an inverter.
Optionally, the MUX takes the first result as the output of the matrix multiplier after receiving the enable signal with the value of 1; or the MUX takes the second result as the output of the matrix multiplier after receiving the enabling signal with the value of 0.
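The routing described in the optional steps above can be illustrated with a minimal software sketch. It is a behavioral model only, not the circuit of the application: it assumes integer operands and a non-negative n, and every function name is hypothetical.

```python
def in_first_set(x: int) -> bool:
    """True if x is 0 or +/-2**n with n >= 0 (integer-only assumption)."""
    m = abs(x)
    return (m & (m - 1)) == 0  # 0 and powers of two have at most one set bit


def comparison_circuit(a: int, b: int) -> int:
    """Enable signal: 1 if either operand is in the first set, else 0."""
    return 1 if (in_first_set(a) or in_first_set(b)) else 0


def inverting_circuit(signal: int) -> int:
    """Model of a NOT gate on a one-bit signal."""
    return signal ^ 1


def first_operation_circuit(a: int, b: int) -> int:
    """Zero, shift, or shift-and-negate; no multiplication is used."""
    special, other = (a, b) if in_first_set(a) else (b, a)
    if special == 0:
        return 0
    n = abs(special).bit_length() - 1  # special == +/-2**n
    shifted = other << n
    return -shifted if special < 0 else shifted


def second_operation_circuit(a: int, b: int) -> int:
    """Conventional multiplication."""
    return a * b


def mux(enable: int, first_result, second_result):
    """Select the first result when enable is 1, otherwise the second."""
    return first_result if enable == 1 else second_result


def multiply(a: int, b: int) -> int:
    enable = comparison_circuit(a, b)
    # The first registers forward the operands only when enable == 1.
    first_result = first_operation_circuit(a, b) if enable == 1 else None
    # The second registers see the inverted enable signal.
    second_result = (
        second_operation_circuit(a, b) if inverting_circuit(enable) == 1 else None
    )
    return mux(enable, first_result, second_result)
```

In this model only one of the two paths is exercised for any operand pair, which mirrors the role of the enable signal and its inverted copy.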
It will be appreciated that the description of the method embodiment shown in fig. 3 corresponds to the description of the matrix multiplier shown in fig. 1 or fig. 2, and that therefore, reference is made to the previous embodiments of matrix multipliers for parts that are not described in detail.
The method provided by the embodiment of the present application is described above in detail with reference to fig. 3, and the embodiment of the apparatus of the present application will be described below in detail with reference to fig. 4 to 5. It is to be understood that the description of the method embodiments corresponds to the description of the device embodiments, and that parts not described in detail can therefore be seen in the preceding method embodiments.
Fig. 4 is a schematic block diagram of a matrix multiplication apparatus 400 provided in an embodiment of the present application. The apparatus 400 may be implemented in software, hardware, or a combination of both. The apparatus 400 may implement the method flow shown in fig. 3, and includes a comparing module 410, a determining module 420, and an output module 430. The comparing module 410 is configured to determine whether first target data and/or second target data are data in a first set, where the first target data are data in the first matrix, the second target data are data in the second matrix, and the first set includes 0 and $\pm 2^{n}$, n being an integer. The determining module 420 is configured to determine, according to the first target data and/or the second target data being data in the first set, a first result of multiplying the first target data and the second target data, where the first result includes 0 or third data, the third data being obtained by shifting the first target data or the second target data according to n, or by shifting and inverting the first target data or the second target data according to n. The output module 430 is configured to output the first result.
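As an illustration of the check performed by the comparing module 410, the following sketch classifies an integer against the first set. It assumes integer data and therefore only non-negative n; the function name and the string labels are hypothetical and do not come from the application.

```python
from typing import Optional, Tuple


def classify(x: int) -> Tuple[str, Optional[int]]:
    """Classify x against the set {0, +2**n, -2**n}, with n >= 0 here."""
    if x == 0:
        return ("zero", None)
    m = abs(x)
    if (m & (m - 1)) == 0:  # exactly one bit set, so m == 2**n
        n = m.bit_length() - 1
        return ("pow2" if x > 0 else "neg_pow2", n)
    return ("other", None)
```

For example, `classify(8)` yields `("pow2", 3)` and `classify(6)` yields `("other", None)`; the second case is the one that would fall through to conventional multiplication.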
Optionally, the determining module 420 is specifically configured such that: the first operation circuit determines that the first result is 0 according to the first target data being 0.
Optionally, the determining module 420 is specifically configured such that: the first operation circuit determines, according to the first target data being $2^{n}$, that the first result is the third data, the third data being obtained by shifting the second target data left or right by |n| bits.
Optionally, the determining module 420 is specifically configured such that: the first operation circuit determines, according to the first target data being $2^{n}$ and n being a positive integer, that the first result is the third data, the third data being obtained by shifting the second target data left by n bits; or the first operation circuit determines, according to the first target data being $2^{n}$ and n being a negative integer, that the first result is the third data, the third data being obtained by shifting the second target data right by |n| bits.
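The left/right shift rule can be sketched as follows, assuming the second target data is a plain integer. Under that assumption a right shift discards fractional bits (Python rounds toward negative infinity), so the result equals the exact product only when the operand is divisible by 2 to the power |n|; a fixed-point or floating-point format would behave differently. The function name is hypothetical.

```python
def multiply_by_pow2(b: int, n: int) -> int:
    """Return b * 2**n using shifts only; n may be negative."""
    if n >= 0:
        return b << n   # left shift by n bits
    return b >> (-n)    # right shift by |n| bits (floors the result)
```

For instance, `multiply_by_pow2(5, 3)` gives 40 and `multiply_by_pow2(40, -3)` gives 5, while `multiply_by_pow2(5, -1)` gives 2 rather than 2.5 because of the integer assumption.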
Optionally, the determining module 420 is specifically configured such that: the first operation circuit determines, according to the first target data being $-2^{n}$, that the first result is the third data, the third data being obtained by shifting the second target data left or right by |n| bits and inverting.
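One way to picture the shift-and-invert case is the sketch below. It assumes that "inverting" means taking the arithmetic negative, that data is held in 8-bit two's complement, that n is non-negative, and that the result fits in the chosen width. None of these choices is stated in this section, and all names are hypothetical.

```python
WIDTH = 8                      # assumed register width
MASK = (1 << WIDTH) - 1


def to_signed(u: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's-complement integer."""
    return u - (1 << WIDTH) if u & (1 << (WIDTH - 1)) else u


def negate_twos_complement(u: int) -> int:
    """Two's-complement negation: invert every bit, then add 1."""
    return ((~u) + 1) & MASK


def multiply_by_neg_pow2(b: int, n: int) -> int:
    """Return b * -(2**n) for n >= 0 using a shift and a negation."""
    shifted = (b << n) & MASK  # left shift, truncated to WIDTH bits
    return to_signed(negate_twos_complement(shifted))
```

With these assumptions `multiply_by_neg_pow2(3, 2)` gives -12, the same value as 3 multiplied by -4, without using a multiplier.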
Optionally, the determining module 420 is further configured such that: the comparison circuit determines a first operation code according to the first target data and/or the second target data being data in the first set, wherein the value of the first operation code indicates that the first target data and/or the second target data is 0 or $\pm 2^{n}$; the first operation circuit determines the first result to be 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
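The role of the first operation code can be sketched as below. The encoding (0, 1, 2) and all names are hypothetical, since this section does not specify how the operation code is encoded. The sketch assumes integer operands and non-negative n, and treats inverting as negation.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Hypothetical encoding of the first operation code."""
    ZERO = 0
    POS_POW2 = 1
    NEG_POW2 = 2


def compare(a: int):
    """Return (opcode, n) if a is in the first set, otherwise None."""
    if a == 0:
        return OpCode.ZERO, 0
    m = abs(a)
    if m & (m - 1):            # more than one bit set: not a power of two
        return None
    n = m.bit_length() - 1
    return (OpCode.POS_POW2 if a > 0 else OpCode.NEG_POW2), n


def first_operation(opcode: OpCode, n: int, b: int) -> int:
    """Produce the first result from the opcode and the other operand."""
    if opcode == OpCode.ZERO:
        return 0
    shifted = b << n
    return shifted if opcode == OpCode.POS_POW2 else -shifted
```

Here the comparison step decides once which case applies, and the operation step only has to decode the operation code, for example `first_operation(*compare(-16), 3)` evaluates to -48.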
Optionally, the apparatus 400 further comprises an acquisition module, configured such that at least one first register acquires the first target data and the second target data from the comparison circuit, the at least one first register being connected to the comparison circuit and the first operation circuit respectively; the output module 430 is further configured such that the at least one first register outputs the first target data and the second target data to the first operation circuit.
Optionally, the acquisition module is further configured such that the at least one first register acquires the first operation code from the comparison circuit; the output module 430 is further configured such that the at least one first register outputs the first operation code to the first operation circuit.
Optionally, the apparatus 400 further comprises a multiplication module. The output module 430 is further configured such that, in a case where the comparison circuit determines that neither the first target data nor the second target data is data in the first set, at least one second register outputs the acquired first target data and second target data to a second operation circuit, the at least one second register being connected to the second operation circuit; the multiplication module is configured such that the second operation circuit performs a conventional multiplication operation on the received first target data and second target data to obtain a second result of multiplying the first target data and the second target data; the output module 430 is further configured such that the second operation circuit outputs the second result.
Optionally, the output module 430 is further configured such that a data selector MUX takes the first result or the second result as the output of the matrix multiplier, where the MUX is connected to the second operation circuit and the first operation circuit respectively.
Optionally, the output module 430 is further configured to output an enable signal with a value of 1 to the at least one first register when the comparison circuit determines that the first target data and/or the second target data are data in the first set; the at least one first register outputs the first target data and the second target data to the first operation circuit according to the enabling signal with the value of 1.
Optionally, the apparatus 400 further comprises an inverting module. The output module 430 is further configured such that the comparison circuit outputs an enable signal with a value of 0 to the at least one first register when the comparison circuit determines that neither the first target data nor the second target data is data in the first set; the inverting module is configured such that the inverting circuit inverts the enable signal with the value of 0 output by the comparison circuit to obtain an enable signal with a value of 1, the inverting circuit being connected to the comparison circuit and the at least one second register respectively; the output module 430 is further configured to output the enable signal with the value of 1 to the at least one second register, and such that the at least one second register outputs the acquired first target data and second target data to the second operation circuit based on the enable signal with the value of 1.
Optionally, the inverting circuit is an inverter.
Optionally, the output module 430 is further configured to, after receiving the enable signal with the value of 1, take the first result as an output of the matrix multiplier; or the MUX takes the second result as the output of the matrix multiplier after receiving the enabling signal with the value of 0.
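To show how the per-element selection fits into a full matrix product, the following sketch multiplies two small integer matrices and counts how many element products avoided conventional multiplication. It is an illustrative software model under the assumption of integer data and non-negative n; the function names are hypothetical.

```python
def is_special(x: int) -> bool:
    """True if x is 0 or +/-2**n with n >= 0."""
    m = abs(x)
    return (m & (m - 1)) == 0


def product(a: int, b: int):
    """Return (a * b, used_fast_path)."""
    if is_special(a):
        special, other = a, b
    elif is_special(b):
        special, other = b, a
    else:
        return a * b, False            # conventional multiplication
    if special == 0:
        return 0, True
    n = abs(special).bit_length() - 1
    shifted = other << n
    return (-shifted if special < 0 else shifted), True


def matmul(A, B):
    """Multiply matrices A (r x k) and B (k x c); also count fast products."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    fast = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                value, used_fast = product(A[i][k], B[k][j])
                C[i][j] += value
                fast += used_fast
    return C, fast
```

For `A = [[0, 2], [3, -4]]` and `B = [[5, 6], [7, 1]]`, the call `matmul(A, B)` returns `[[14, 2], [-13, 14]]` with 6 of the 8 element products taking the shift path.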
The apparatus 400 herein may be embodied in the form of functional modules. The term "module" herein may be implemented in software and/or hardware, and is not specifically limited thereto.
The modules of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should be noted that when the apparatus provided in the above embodiment executes the method, the division into the above functional modules is used only as an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. For example, each of the comparing module 410, the determining module 420, and the output module 430 may be used to perform any step of any of the methods described above. The steps that each of these modules is responsible for may be designated as required, and all functions of the apparatus are implemented by the comparing module 410, the determining module 420, and the output module 430 respectively performing different steps of the method.
In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the foregoing method embodiments, which are not repeated herein.
The methods provided by embodiments of the present application may be performed by a computing device, which may also be referred to as a computer system. The computing device includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on top of the operating system layer. The hardware layer includes hardware such as a processing unit, a memory, and a memory control unit, and the functions and structures of the hardware are described in detail later. The operating system is any one or more computer operating systems that implement business processing through processes, for example, a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a Windows operating system. The application layer comprises application programs such as a browser, an address book, word processing software, and instant messaging software. Alternatively, the computer system may be a handheld device such as a smart phone or a terminal device such as a personal computer, which is not particularly limited in the present application, so long as the method provided in the embodiments of the present application can be performed. The execution subject of the method provided in the embodiments of the present application may be a computing device, or may be a functional module in the computing device that can call a program and execute the program.
A computing device provided in an embodiment of the present application is described in detail below in conjunction with fig. 5.
Fig. 5 is a schematic architecture diagram of a computing device 1500 provided in an embodiment of the present application. The computing device 1500 may be a server or a computer or other computing device. The computing device 1500 shown in fig. 5 includes: at least one processor 1510 and a memory 1520.
It should be understood that the present application is not limited to the number of processors, memories in computing device 1500.
The processor 1510 executes instructions in the memory 1520, causing the computing apparatus 1500 to implement the methods provided herein. Alternatively, processor 1510 executes instructions in memory 1520, causing computing device 1500 to implement the functional modules provided herein, thereby implementing the methods provided herein.
Optionally, computing device 1500 also includes a communication interface 1530. The communication interface 1530 enables communication between the computing device 1500 and other devices or communication networks using a transceiver module such as, but not limited to, a network interface card, transceiver, or the like.
Computing device 1500 also includes a system bus 1540, wherein processor 1510, memory 1520, and communication interface 1530 are each coupled to system bus 1540. The processor 1510 can access the memory 1520 through the system bus 1540; for example, the processor 1510 can read and write data or execute code in the memory 1520 through the system bus 1540. The system bus 1540 is a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, or the like. The system bus 1540 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean that there is only one bus or only one type of bus.
In one possible implementation, the main functions of the processor 1510 are to interpret instructions (or code) of a computer program and to process data in computer software. The instructions of the computer program and the data in the computer software can be stored in the memory 1520 or in the cache 1516.
Alternatively, the processor 1510 may be an integrated circuit chip having signal processing capabilities. By way of example, and not limitation, the processor 1510 is a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general purpose processor is a microprocessor or the like. For example, the processor 1510 is a central processing unit (CPU).
Optionally, each processor 1510 includes at least one processing unit 1512 and a memory control unit 1514.
Optionally, the processing unit 1512, also known as a core or kernel, is the most important component of the processor. The processing unit 1512 is manufactured from monocrystalline silicon by a particular production process, and all of the processor's computation, command reception, command storage, and data processing are performed by the core. The processing units each run program instructions independently, and use parallel computing capability to increase the running speed of programs. Each processing unit has a fixed logic structure; for example, a processing unit includes logic units such as a first-level cache, a second-level cache, an execution unit, an instruction-level unit, and a bus interface.
For example, the memory control unit 1514 is used to control data interaction between the memory 1520 and the processing unit 1512. Specifically, the memory control unit 1514 receives a memory access request from the processing unit 1512 and controls access to the memory based on the memory access request. By way of example, and not limitation, the memory control unit is a memory management unit (MMU) or the like.
For example, each memory control unit 1514 addresses the memory 1520 over the system bus. An arbiter (not shown in fig. 5) is configured in the system bus and is responsible for handling and coordinating competing accesses by the multiple processing units 1512.
In one implementation example, the processing unit 1512 and the memory control unit 1514 are communicatively coupled via connection lines internal to the chip, such as address lines, to enable communication between the processing unit 1512 and the memory control unit 1514.
Optionally, each processor 1510 also includes a cache 1516, which is a buffer for data exchange. When the processing unit 1512 needs to read data, it first looks for the required data in the cache; if the data is found, it is used directly, and if not, it is fetched from the memory. Since the cache is much faster than the memory, the cache helps the processing unit 1512 run faster.
Memory 1520 can provide runtime space for processes in computing device 1500; for example, memory 1520 holds the computer programs (specifically, the program code) used to generate the processes. After the processor runs a computer program to generate a process, the processor allocates a corresponding storage space for the process in memory 1520. The storage space further includes a text segment, an initialized data segment, an uninitialized data segment, a stack segment, a heap segment, and the like. The memory 1520 stores data generated during the running of the process, for example, intermediate data or process data, in the storage space corresponding to the process.
Optionally, the memory 1520 is also referred to as internal memory, and is used to temporarily store operation data of the processor 1510 and data exchanged with an external memory such as a hard disk. While the computer is running, the processor 1510 loads the data to be operated on into the memory for operation, and after the operation is completed, the processing unit 1512 sends out the result.
By way of example, and not limitation, memory 1520 is volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory is a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory is random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory 1520 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
The above-listed structure of the computing device 1500 is merely an exemplary illustration, and the present application is not limited thereto, and the computing device 1500 of the embodiments of the present application includes various hardware in computer systems in the prior art, for example, the computing device 1500 includes other memories besides the memory 1520, for example, a disk memory, and the like. Those skilled in the art will appreciate that computing device 1500 may also include other components necessary to achieve proper operation. Also, those skilled in the art will appreciate that the computing device 1500 described above may also include hardware devices that implement other additional functions, as desired. Furthermore, those skilled in the art will appreciate that the computing device 1500 described above may also include only the necessary components to implement embodiments of the present application, and not necessarily all of the components shown in FIG. 5.
In this embodiment, a chip including the matrix multiplier is also provided.
In this embodiment, a computer program product comprising instructions is also provided. The computer program product may be a software or program product that comprises instructions and is capable of running on a computing device or being stored in any available medium. When run on a computing device, the computer program product causes the computing device to perform the methods provided above, or to perform the functions of the apparatus provided above.
In this embodiment, a computer-readable storage medium is also provided, where the computer-readable storage medium may be any available medium that can be stored by a computing device or a data storage device such as a data center that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc. The computer readable storage medium includes instructions that, when executed on a computing device, cause the computing device to perform the methods provided above.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (32)

1. A matrix multiplier, the matrix multiplier comprising:
a comparison circuit for determining whether first target data and/or second target data are data in a first set, wherein the first target data are data in a first matrix and the second target data are data in a second matrix, the first set comprising: 0 and $\pm 2^{n}$, where n is an integer;
a first operation circuit, configured to output a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, where the first result includes: 0 or third data, wherein the third data is obtained by shifting the first target data or the second target data according to n, or the third data is obtained by shifting and inverting the first target data or the second target data according to n.
2. The matrix multiplier of claim 1, wherein,
the first operation circuit is specifically configured to output the first result as 0 according to the first target data being 0.
3. The matrix multiplier of claim 1, wherein,
the first operation circuit is specifically configured to output, according to the first target data being $2^{n}$, the first result as the third data, the third data being obtained by shifting the second target data left or right by |n| bits.
4. A matrix multiplier as claimed in claim 3, characterized in that the first operation circuit is specifically configured to:
according to the first target data being $2^{n}$ and n being a positive integer, output the first result as the third data, the third data being obtained by shifting the second target data left by n bits; or
according to the first target data being $2^{n}$ and n being a negative integer, output the first result as the third data, the third data being obtained by shifting the second target data right by |n| bits.
5. The matrix multiplier of claim 1, wherein,
the first operation circuit is specifically configured to output, according to the first target data being $-2^{n}$, the first result as the third data, the third data being obtained by shifting the second target data left or right by |n| bits and inverting.
6. A matrix multiplier as claimed in any one of claims 1 to 5, characterised in that,
the comparison circuit is further configured to determine a first operation code according to the first target data and/or the second target data being data in the first set, where a value of the first operation code indicates that the first target data and/or the second target data is 0 or $\pm 2^{n}$;
The first operation circuit is specifically configured to determine that the first result is 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
7. The matrix multiplier according to any one of claims 1 to 5, further comprising at least one first register, said at least one first register being connected to said comparison circuit, said first operation circuit respectively,
the at least one first register is used for acquiring the first target data and the second target data from the comparison circuit;
the at least one first register is further configured to output the first target data and the second target data to the first operation circuit.
8. The matrix multiplier of claim 7, wherein,
the at least one first register is further configured to obtain the first operation code from the comparison circuit, and output the first operation code to the first operation circuit.
9. The matrix multiplier according to any one of claims 1 to 5, further comprising at least one second register and a second operation circuit, the at least one second register being connected to the second operation circuit,
the at least one second register is configured to output the acquired first target data and second target data to the second operation circuit, in a case where the comparison circuit determines that neither the first target data nor the second target data is data in the first set;
the second operation circuit is used for performing conventional multiplication operation on the received first target data and the second target data and outputting a second result obtained by multiplying the first target data and the second target data.
10. The matrix multiplier of claim 9, further comprising a data selector MUX, wherein the MUX is connected to the second operation circuit and the first operation circuit, respectively,
the MUX is used for taking the first result or the second result as the output of the matrix multiplier.
11. The matrix multiplier of claim 7, wherein,
The comparison circuit is further configured to output an enable signal with a value of 1 to the at least one first register when the comparison circuit determines that the first target data and/or the second target data are data in the first set;
the at least one first register is specifically configured to output the first target data and the second target data to the first operation circuit according to the enable signal with the value of 1.
12. The matrix multiplier of claim 9, further comprising an inverting circuit coupled to the comparison circuit and the at least one second register, respectively,
the comparison circuit is further configured to output an enable signal with a value of 0 to the at least one first register when the comparison circuit determines that neither the first target data nor the second target data is data in the first set;
the inverting circuit is configured to perform inverting operation on the enable signal with the value of 0 output by the comparing circuit, obtain an enable signal with the value of 1, and output the enable signal with the value of 1 to the at least one second register;
The at least one second register is specifically configured to output the acquired first target data and second target data to the second operation circuit based on the enable signal with a value of 1.
13. The matrix multiplier of claim 12, wherein the inverting circuit is a NOT gate.
14. The matrix multiplier of claim 10, wherein the MUX is specifically configured to:
after receiving an enable signal with a value of 1, take the first result as the output of the matrix multiplier; or
after receiving an enable signal with a value of 0, take the second result as the output of the matrix multiplier.
15. A method of matrix multiplication, the method being applied to a matrix multiplier for performing a matrix multiplication operation on a first matrix and a second matrix, the method comprising:
the comparison circuit determines whether first target data and/or second target data are data in a first set, wherein the first target data are data in the first matrix and the second target data are data in the second matrix, the first set comprising: 0 and $\pm 2^{n}$, where n is an integer;
the first operation circuit outputs a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set, wherein the first result comprises: 0 or third data, wherein the third data is obtained by shifting the first target data or the second target data according to n, or the third data is obtained by shifting and inverting the first target data or the second target data according to n.
16. The method of claim 15, wherein the first operation circuit outputting a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set comprises:
the first operation circuit outputs 0 as the first result according to the first target data being 0.
17. The method of claim 15, wherein the first operation circuit outputting a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set comprises:
the first operation circuit outputs the third data as the first result according to the first target data being 2^n, wherein the third data is obtained by shifting the second target data left or right by |n| bits.
18. The method of claim 17, wherein the first operation circuit outputting the third data as the first result according to the first target data being 2^n comprises:
the first operation circuit outputs the third data as the first result according to the first target data being 2^n with n being a positive integer, wherein the third data is obtained by shifting the second target data left by n bits; or
the first operation circuit outputs the third data as the first result according to the first target data being 2^n with n being a negative integer, wherein the third data is obtained by shifting the second target data right by |n| bits.
19. The method of claim 15, wherein the first operation circuit outputting a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set comprises:
the first operation circuit outputs the third data as the first result according to the first target data being -2^n, wherein the third data is obtained by shifting the second target data left or right by |n| bits and inverting the result.
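The cases in claims 16 to 19 can be sketched together in a few lines. This is an illustration under stated assumptions rather than the claimed hardware: `shift_product` is a hypothetical name, the second target data is assumed to be an integer (for example a fixed-point value), "inverting" is interpreted as sign negation, and a right shift discards low-order bits, so the result for negative n is exact only when those bits are zero.

```python
import math


def shift_product(x: float, y: int) -> int:
    """Multiply y by x, where x is 0 or +/-2**n, using only shifts and negation."""
    if x == 0:
        return 0                      # claim 16: a zero operand gives a zero result
    mantissa, exponent = math.frexp(x)
    if abs(mantissa) != 0.5:
        raise ValueError("x is not in the first set")
    n = exponent - 1                  # |x| == 2**n
    # Claim 18: shift left for non-negative n, shift right by |n| for negative n.
    shifted = y << n if n >= 0 else y >> -n
    # Claim 19: negate the shifted value when x is -2**n.
    return -shifted if x < 0 else shifted
```

As a usage example, `shift_product(8, 5)` gives 40 via a left shift by 3, and `shift_product(0.25, 64)` gives 16 via a right shift by 2.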
20. The method according to any one of claims 15 to 19, further comprising:
the comparison circuit determines a first operation code according to the first target data and/or the second target data being data in the first set, wherein the value of the first operation code indicates that the first target data and/or the second target data is 0 or ±2^n;
the first operation circuit outputting a first result of multiplying the first target data and the second target data according to the first target data and/or the second target data being data in the first set comprises:
the first operation circuit determines that the first result is 0 or the third data according to the value of the first operation code and the first target data and/or the second target data.
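One way to picture the operation code of claim 20 is as a small tag produced by the comparison step and consumed by the shift step. The claims do not specify an encoding, so everything below is hypothetical: the `Op` values, the `(op, n, other)` tuple layout, and the function names are assumptions, and the operands are assumed to be integers so that n is never negative.

```python
from enum import Enum
from typing import Optional, Tuple


class Op(Enum):
    """Hypothetical operation codes; the real encoding is not given in the claims."""
    ZERO = 0
    POS_POW2 = 1
    NEG_POW2 = 2


def power_of_two_exponent(v: int) -> Optional[int]:
    """Return n if |v| == 2**n, otherwise None."""
    m = abs(v)
    if m != 0 and m & (m - 1) == 0:
        return m.bit_length() - 1
    return None


def compare(a: int, b: int) -> Optional[Tuple[Op, int, int]]:
    """Return (op, n, other operand), or None if neither operand is in the first set."""
    for special, other in ((a, b), (b, a)):
        if special == 0:
            return (Op.ZERO, 0, other)
        n = power_of_two_exponent(special)
        if n is not None:
            return (Op.POS_POW2 if special > 0 else Op.NEG_POW2, n, other)
    return None


def first_operation(op: Op, n: int, other: int) -> int:
    """Produce the first result from the operation code and the remaining operand."""
    if op is Op.ZERO:
        return 0
    shifted = other << n
    return shifted if op is Op.POS_POW2 else -shifted
```

With this encoding, `compare(3, -8)` yields `(Op.NEG_POW2, 3, 3)`, and passing that tuple to `first_operation` yields -24.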
21. The method according to any one of claims 15 to 19, further comprising:
at least one first register acquires the first target data and the second target data from the comparison circuit, wherein the at least one first register is connected to the comparison circuit and to the first operation circuit;
the at least one first register outputs the first target data and the second target data to the first operation circuit.
22. The method of claim 21, wherein the method further comprises:
the at least one first register obtains the first operation code from the comparison circuit and outputs the first operation code to the first operation circuit.
23. The method according to any one of claims 15 to 19, further comprising:
at least one second register outputs the acquired first target data and second target data to a second operation circuit in a case where the comparison circuit determines that neither the first target data nor the second target data is data in the first set, wherein the at least one second register is connected to the second operation circuit;
the second operation circuit performs a conventional multiplication operation on the received first target data and the second target data, and outputs a second result of multiplying the first target data and the second target data.
24. The method of claim 23, wherein the method further comprises:
a data selector (MUX) takes the first result or the second result as the output of the matrix multiplier, wherein the MUX is connected to the second operation circuit and to the first operation circuit.
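Claims 23 and 24 combine the shift path with a conventional multiplier and select between the two results. The sketch below applies that selection to every element product of a small integer matrix multiplication. It is an assumption-laden illustration: the names `is_in_first_set`, `shift_multiply`, `multiply` and `matmul` are hypothetical, operands are integers (so n is non-negative), and Python's `*` stands in for the second operation circuit.

```python
from typing import List

Matrix = List[List[int]]


def is_in_first_set(v: int) -> bool:
    """True for 0 and for +/-2**n with n >= 0 (integer operands assumed)."""
    m = abs(v)
    return m & (m - 1) == 0


def shift_multiply(special: int, other: int) -> int:
    """First operation circuit: product via shift and optional negation."""
    if special == 0:
        return 0
    n = abs(special).bit_length() - 1
    shifted = other << n
    return shifted if special > 0 else -shifted


def multiply(a: int, b: int) -> int:
    """Select the shift path when possible, else the conventional multiplier."""
    if is_in_first_set(a):
        return shift_multiply(a, b)
    if is_in_first_set(b):
        return shift_multiply(b, a)
    return a * b  # stands in for the second operation circuit


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain triple-loop matrix product built on the selecting multiply."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [
        [sum(multiply(A[i][k], B[k][j]) for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]
```

For example, `matmul([[0, 2], [-4, 3]], [[5, 6], [7, -8]])` evaluates to `[[14, -16], [1, -48]]`, and only the product 3 * 7 takes the conventional path.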
25. The method of claim 21, wherein the method further comprises:
the comparison circuit outputs an enable signal with a value of 1 to the at least one first register in a case where the first target data and/or the second target data are determined to be data in the first set;
wherein the at least one first register outputting the first target data and the second target data to the first operation circuit comprises:
the at least one first register outputs the first target data and the second target data to the first operation circuit according to the enable signal with the value of 1.
26. The method of claim 23, wherein the method further comprises:
the comparison circuit outputs an enable signal with a value of 0 to the at least one first register when determining that neither the first target data nor the second target data is data in the first set;
an inverting circuit inverts the enable signal with the value of 0 output by the comparison circuit to obtain an enable signal with a value of 1, and outputs the enable signal with the value of 1 to the at least one second register, wherein the inverting circuit is connected to the comparison circuit and to the at least one second register;
wherein the at least one second register outputting the acquired first target data and second target data to a second operation circuit, in a case where the comparison circuit determines that neither the first target data nor the second target data is data in the first set, comprises:
the at least one second register outputs the acquired first target data and second target data to the second operation circuit based on the enable signal having the value of 1.
27. The method of claim 26, wherein the inverting circuit is a NOT gate.
28. The method of claim 24, wherein the data selector MUX taking the first result or the second result as an output of the matrix multiplier comprises:
the MUX takes the first result as the output of the matrix multiplier after receiving the enable signal with the value of 1; or
the MUX takes the second result as the output of the matrix multiplier after receiving the enable signal with the value of 0.
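The enable-signal gating described in claims 25 to 28 can be modeled as a single step through the datapath. The following is a behavioral sketch only: the `Signals` record, the `step` function, and the convention that a gated-off path reports `None` are assumptions made for illustration, and integer operands are assumed.

```python
from typing import NamedTuple, Optional


class Signals(NamedTuple):
    enable: int                   # comparison circuit output
    inverted: int                 # NOT gate output
    first_result: Optional[int]   # shift path; None when its register is gated off
    second_result: Optional[int]  # conventional path; None when gated off
    output: int                   # value selected by the MUX


def in_first_set(v: int) -> bool:
    m = abs(v)
    return m & (m - 1) == 0       # true for 0 and powers of two


def shift_multiply(special: int, other: int) -> int:
    if special == 0:
        return 0
    shifted = other << (abs(special).bit_length() - 1)
    return shifted if special > 0 else -shifted


def step(a: int, b: int) -> Signals:
    enable = 1 if in_first_set(a) or in_first_set(b) else 0
    inverted = enable ^ 1         # NOT gate on the enable signal
    first_result = None
    second_result = None
    if enable == 1:               # first register forwards the operands
        special, other = (a, b) if in_first_set(a) else (b, a)
        first_result = shift_multiply(special, other)
    if inverted == 1:             # second register forwards the operands
        second_result = a * b     # conventional multiplication
    # MUX: enable 1 selects the first result, enable 0 the second result.
    output = first_result if enable == 1 else second_result
    return Signals(enable, inverted, first_result, second_result, output)
```

Calling `step(8, 5)` activates only the shift path and outputs 40, while `step(3, 5)` activates only the conventional multiplier and outputs 15.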
29. A chip comprising a matrix multiplier as claimed in any one of claims 1 to 14.
30. A computing device, comprising:
at least one processor; and
at least one memory coupled to the at least one processor;
wherein the at least one memory stores instructions, and the at least one processor executes the instructions stored in the at least one memory to cause the computing device to perform the method of any one of claims 15 to 28.
31. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 15 to 28.
32. A computer readable storage medium comprising computer program instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 15 to 28.
CN202310344718.1A 2023-04-03 2023-04-03 Matrix multiplier, method of matrix multiplication, and computing device Pending CN116048456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310344718.1A CN116048456A (en) 2023-04-03 2023-04-03 Matrix multiplier, method of matrix multiplication, and computing device

Publications (1)

Publication Number Publication Date
CN116048456A true CN116048456A (en) 2023-05-02

Family

ID=86113666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310344718.1A Pending CN116048456A (en) 2023-04-03 2023-04-03 Matrix multiplier, method of matrix multiplication, and computing device

Country Status (1)

Country Link
CN (1) CN116048456A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284827A (en) * 2017-07-19 2019-01-29 阿里巴巴集团控股有限公司 Neural computing method, equipment, processor and computer readable storage medium
CN112148251A (en) * 2019-06-26 2020-12-29 英特尔公司 System and method for skipping meaningless matrix operations
CN113591031A (en) * 2021-09-30 2021-11-02 沐曦科技(北京)有限公司 Low-power-consumption matrix operation method and device
US20220197595A1 (en) * 2020-12-21 2022-06-23 Intel Corporation Efficient multiply and accumulate instruction when an operand is equal to or near a power of two

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination