CN115344826A - Computing device, operating method, and machine-readable storage medium - Google Patents

Computing device, operating method, and machine-readable storage medium

Info

Publication number
CN115344826A
CN115344826A (application CN202210979891.4A)
Authority
CN
China
Prior art keywords
data type
operand
bit
metadata
floating point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210979891.4A
Other languages
Chinese (zh)
Inventor
Not disclosed (inventor not announced)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202210979891.4A priority Critical patent/CN115344826A/en
Publication of CN115344826A publication Critical patent/CN115344826A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/52 Multiplying; Dividing
    • G06F 7/523 Multiplying only

Abstract

The invention provides a computing device, an operating method, and a machine-readable storage medium. In an embodiment according to the invention, the operating method comprises: checking data type information carried by a current instruction, wherein the data type information indicates the data type of a target operand corresponding to the current instruction; when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata; and executing the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.

Description

Computing device, operating method and machine-readable storage medium
Technical Field
The present invention relates to an instruction set, and more particularly, to a computing device, an operating method, and a machine-readable storage medium.
Background
Generally, when writing a computer program, the author knows the data type of each operand, so the author can write instructions carrying "fixed (specified) data type" information into the program and then compile the program with a compiler. For example, a computer program may include a load instruction carrying the data type information "32-bit floating point number" (a fixed data type) to load an operand of the data type "32-bit floating point number" from memory to an arithmetic core (e.g., a tensor core). Alternatively, the program may include a Matrix Multiply and Accumulate (MMA) instruction carrying the data type information "32-bit floating point number" (a fixed data type) to cause an arithmetic core to perform a matrix multiplication on two "32-bit floating point number" operands that have already been loaded. However, there are cases where the data type of an operand cannot be determined at compile time (the data type is unknown). For example, the data type of the calculation result of a hidden layer of a Convolutional Neural Network (CNN) program may not be known until the calculation is actually performed; it is determined dynamically at run time. Current instruction sets do not support such "indeterminate data types".
Disclosure of Invention
The present invention provides a computing device, an operating method thereof, and a machine-readable storage medium that support an adaptive data type. The adaptive data type means that the data type is unknown at compile time.
In an embodiment according to the invention, the operating method comprises: checking data type information carried by a current instruction, wherein the data type information indicates the data type of a target operand corresponding to the current instruction; when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtaining the actual data type of the target operand; and executing the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.
In an embodiment according to the invention, the machine-readable storage medium is configured to store non-transitory machine-readable instructions. The non-transitory machine-readable instructions, when executed by a computer, implement the operating method of the computing device.
In an embodiment according to the present invention, the computing device includes a memory and an arithmetic core. The memory is configured to store a target operand. The arithmetic core is coupled to the memory. The arithmetic core checks data type information carried by a current instruction, wherein the data type information indicates the data type of the target operand corresponding to the current instruction. When the data type information indicates that the data type of the target operand is an adaptive data type, the arithmetic core reads metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the arithmetic core executes the current instruction to process the target operand.
Based on the above, the arithmetic core may check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. The fixed data type means that the data type of the target operand is known at compile time. The adaptive data type means that the data type of the target operand is unknown at compile time and is determined dynamically when the program is executed. When the program is executed, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before executing the current instruction, the arithmetic core checks the data type of the target operand of the current instruction. When the data type of the target operand is the adaptive data type, the arithmetic core may obtain the actual data type of the target operand from the metadata corresponding to the target operand, or directly obtain the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the arithmetic core can correctly execute the current instruction to process the target operand.
Drawings
Fig. 1 is a schematic block diagram of a computing device according to an embodiment of the invention.
Fig. 2 is a flow chart illustrating a method of operating a computing device according to an embodiment of the invention.
FIG. 3 is a block diagram of an exemplary circuit of an arithmetic core according to an embodiment of the present invention.
FIG. 4 is a block diagram of an operation core according to another embodiment of the present invention.
Description of the reference numerals
100: computing device
110: memory device
120: operation core
121: arithmetic circuit
122, 126: Operand buffer
123: conversion unit
124: loading unit
125: status register
Dconv: operands
Dm: metadata
Dorig: calculation results
S210 to S250: step (ii) of
ST: statistical results
Detailed Description
Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
The term "coupled" as used throughout this specification, including the claims, may refer to any direct or indirect means of connection. For example, if a first device is coupled (or connected) to a second device, the first device may be directly connected to the second device, or the first device may be indirectly connected to the second device through other devices or another means of connection. The terms "first," "second," and the like, as used throughout this specification, including the claims, are used to name elements and do not limit the number of elements or their order. Components, parts, or steps that use the same reference numerals or the same terms in different embodiments may refer to one another's related descriptions.
Fig. 1 is a schematic block diagram of a computing device 100 according to an embodiment of the invention. The computing device 100 shown in fig. 1 includes a memory 110 and an operation core 120. The memory 110 is used for storing operands. This embodiment does not limit the specific data structure of the operands. For example, in neural network applications, the operands may be vectors, tensors, or other data. Depending on the actual design, an operand may be a matrix or any one of a plurality of blocks into which a matrix is divided. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32 × 32, 64 × 64, or other sizes.
The operation core 120 is coupled to the memory 110. In various embodiments, the operation core 120 includes a tensor core, a general matrix multiplication (GEMM) core, an arithmetic logic unit (ALU), and/or other operation units. According to different design requirements, in some embodiments the operation core 120 may be implemented as a hardware circuit. In other embodiments, the operation core 120 may be implemented in firmware, software (i.e., a program), or a combination of the two. In still other embodiments, the operation core 120 may be implemented as a combination of two or more of hardware, firmware, and software.
In terms of hardware, the operation core 120 may be implemented as logic circuits on an integrated circuit. For example, the related functions of the operation core 120 may be implemented in various logic blocks, modules, and circuits of one or more controllers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and/or other processing units. The related functions of the operation core 120 may be implemented as hardware circuits, such as various logic blocks, modules, and circuits in an integrated circuit, using a hardware description language (e.g., Verilog HDL or VHDL) or another suitable programming language.
In terms of software and/or firmware, the related functions of the operation core 120 may be implemented as program code. For example, the operation core 120 may be implemented using a general-purpose programming language (e.g., C++ or assembly language) or another suitable programming language. The program code may be recorded/stored in a non-transitory machine-readable storage medium. In some embodiments, the machine-readable storage medium includes, for example, a semiconductor memory and/or a storage device. The semiconductor memory includes a memory card, a read-only memory (ROM), a flash memory, a programmable logic circuit, or other semiconductor memories. The storage device includes a tape, a disk, a hard disk drive (HDD), a solid-state drive (SSD), or other storage devices. An electronic device (e.g., a computer, a central processing unit (CPU), a controller, a microcontroller, or a microprocessor) may read and execute the program code from the machine-readable storage medium to implement the functions of the operation core 120. Alternatively, the program code may be provided to the electronic device via any transmission medium, such as a communication network or broadcast waves, for example the Internet, a wired communication network, a wireless communication network, or other communication media.
FIG. 2 is a flow chart illustrating an operating method of a computing device according to an embodiment of the invention. In some embodiments, the operating method shown in fig. 2 may be implemented in firmware or software (i.e., a program). For example, operations associated with the operating method shown in fig. 2 may be implemented as non-transitory machine-readable instructions (program code or a program), which may be stored in a machine-readable storage medium. The non-transitory machine-readable instructions, when executed by a computer, implement the operating method of the computing device shown in fig. 2. In other embodiments, the operating method shown in fig. 2 may be implemented in hardware, for example by the computing device 100 shown in fig. 1.
The arithmetic core 120 may fetch an instruction (hereinafter referred to as the current instruction) from the memory 110. For example, the arithmetic core 120 may fetch a load instruction, a Matrix Multiply and Accumulate (MMA) instruction, or another instruction from the memory 110. Generally, assuming that the current instruction is for processing one or more operands, the current instruction carries data type information for the one or more operands. The data type information indicates the data type of the target operand corresponding to the current instruction. The data type information may indicate any "fixed (specified) data type", provided that the data type is known at compile time. For example, depending on the actual application scenario, the fixed data type may be a 4-bit signed integer s4, an 8-bit signed integer s8, an 8-bit unsigned integer u8, an 8-bit floating point number f8, an 8-bit brain floating point number bf8, a 16-bit signed integer s16, a standard 16-bit floating point number f16, a 16-bit brain floating point number bf16, a standard 32-bit floating point number f32, a 32-bit fast floating point number ff32, or a floating point number with more than 32 bits. In addition, depending on the application, the operands may be scalars, vectors, matrices, tensors, or other operands. For example, in a neural network application, an operand may be any one of a plurality of blocks into which a matrix is split, based on the actual design. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32 × 32, 64 × 64, or other sizes.
In some cases, the data type of the target operand corresponding to the current instruction cannot be determined at compile time (the data type is temporarily unknown). For example, the data type of the calculation result of a hidden layer of a Convolutional Neural Network (CNN) program may not be known until the calculation is actually performed. The instruction set of this embodiment supports such an "indeterminate data type", namely the adaptive data type. The adaptive data type indicates that the data type of the target operand is unknown (undetermined) at compile time and is decided dynamically at actual execution time.
Please refer to fig. 1 and fig. 2. In step S210, the arithmetic core 120 may check the data type information carried by the current instruction. When the data type information indicates that the data types of all target operands of the current instruction are fixed data types ("no" in step S220), the arithmetic core 120 may execute the current instruction to process the target operands based on the fixed data types (step S230). Depending on the actual design, in some embodiments step S230 may follow conventional practice, and is therefore not described herein.
The specific content of the data type information may be determined according to the actual design. By way of example and not limitation, the data type information may include a 4-bit code. When the 4-bit code (data type information) is a first value (e.g., 0), it indicates that the fixed data type is the 4-bit signed integer s4. When the data type information is a second value (e.g., 1), it indicates that the fixed data type is the 8-bit signed integer s8. When the data type information is a third value (e.g., 2), it indicates that the fixed data type is the 8-bit unsigned integer u8. When the data type information is a fourth value (e.g., 3), it indicates that the fixed data type is the standard 16-bit floating point number f16. When the data type information is a fifth value (e.g., 4), it indicates that the fixed data type is the standard 32-bit floating point number f32. When the data type information is a sixth value (e.g., 5), it indicates that the fixed data type is the 16-bit brain floating point number bf16. When the data type information is a seventh value (e.g., 9), it indicates that the fixed data type is the 8-bit floating point number f8 with a 4-bit exponent. When the data type information is an eighth value (e.g., 10), it indicates that the fixed data type is the 8-bit brain floating point number bf8 with a 5-bit exponent. When the data type information is a ninth value (e.g., 15), it indicates that the data type of the target operand is the adaptive data type.
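As an informal illustration only (not part of the claimed design), the example encoding above could be decoded roughly as in the following C++ sketch; the enum and function names are hypothetical, and the numeric values simply mirror the examples given in the preceding paragraph.
    #include <cstdint>

    // Hypothetical names; values follow the example encoding above:
    // 0=s4, 1=s8, 2=u8, 3=f16, 4=f32, 5=bf16, 9=f8 (4-bit exponent),
    // 10=bf8 (5-bit exponent), 15=adaptive data type.
    enum class DataTypeCode : uint8_t {
        S4 = 0, S8 = 1, U8 = 2, F16 = 3, F32 = 4, BF16 = 5,
        F8E4 = 9, BF8E5 = 10, Adaptive = 15
    };

    // Returns true when the 4-bit data type field of an instruction marks the
    // target operand as adaptive, i.e. its actual type must come from metadata.
    bool isAdaptiveType(uint8_t dataTypeField) {
        return static_cast<DataTypeCode>(dataTypeField & 0xF) == DataTypeCode::Adaptive;
    }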
When the data type information indicates that the data type of any target operand of the current instruction is the adaptive data type ("yes" in step S220), the arithmetic core 120 may read metadata corresponding to the target operand to learn the actual data type of the target operand from the metadata, or directly obtain the actual data type of the target operand (step S240). The specific content of the metadata may be determined according to the actual design. For example, and without limitation, the metadata may include an actual data type field for recording the actual data type of the target operand corresponding to the metadata. As one of many examples, the actual data type recorded in the actual data type field may be an 8-bit floating point number having a first structure, an 8-bit floating point number having a second structure, or a 16-bit floating point number having a third structure. For example, assume that the actual data type field includes a 2-bit code. When the code is 0, it indicates that the actual data type of the target operand is an 8-bit floating point number with a "1-bit sign, 5-bit exponent, and 2-bit mantissa" (the first structure), where the sign bit indicates whether the value is positive or negative. When the code is 1, it indicates that the actual data type of the target operand is an 8-bit floating point number with a "1-bit sign, 4-bit exponent, and 3-bit mantissa" (the second structure). When the code is 2, it indicates that the actual data type of the target operand is a 16-bit floating point number with a "1-bit sign, 5-bit exponent, and 10-bit mantissa" (the third structure).
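Again purely as an illustrative sketch under the example encoding above (the struct and function names are hypothetical), the 2-bit actual data type code could be mapped to a bit layout as follows.
    #include <cstdint>

    // Width of each field of the floating point layout recorded in the metadata.
    struct FloatLayout {
        uint8_t signBits;
        uint8_t exponentBits;
        uint8_t mantissaBits;
    };

    // Maps the 2-bit actual data type code to the example structures above:
    // 0 -> 8-bit float (1,5,2), 1 -> 8-bit float (1,4,3), 2 -> 16-bit float (1,5,10).
    FloatLayout decodeActualType(uint8_t code) {
        switch (code & 0x3) {
            case 0:  return {1, 5, 2};
            case 1:  return {1, 4, 3};
            case 2:  return {1, 5, 10};
            default: return {0, 0, 0};  // value 3 is not used in this example
        }
    }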
In some embodiments, the metadata may also include a scale factor field for recording a shift amount (offset) applied to the exponent of the target operand. The arithmetic core 120 may convert long-format data into short-format data according to the range of exponent values of the elements in a target operand (e.g., a block). For example, assuming that the exponent values of all elements in a block (target operand) lie in the range 10 to 20, the arithmetic core 120 may shift this range from "10 to 20" down to "0 to 10" and record the shift amount "-10" in the scale factor field of the metadata. Therefore, when the arithmetic core 120 executes the current instruction, the arithmetic core 120 may restore the exponent values of all elements in the target operand from the range "0 to 10" back to "10 to 20" according to the shift amount "-10" recorded in the scale factor field of the metadata.
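A minimal sketch of this exponent shifting, assuming the example numbers above (the field and function names are hypothetical, not taken from the patent):
    #include <cstdint>

    // Hypothetical metadata fragment: the scale factor is the shift that was
    // added to every exponent when the block was stored (e.g. -10 in the text).
    struct BlockMetadata {
        int8_t scaleFactor;
    };

    // Applied when the block is written out: exponents 10..20 with a scale
    // factor of -10 become 0..10 and fit a short format.
    int compressExponent(int originalExponent, const BlockMetadata& md) {
        return originalExponent + md.scaleFactor;
    }

    // Applied when the instruction is executed: stored exponents 0..10 are
    // restored to 10..20 by undoing the recorded shift.
    int restoreExponent(int storedExponent, const BlockMetadata& md) {
        return storedExponent - md.scaleFactor;
    }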
Based on the actual data type of the target operand recorded in the metadata, the arithmetic core 120 may execute the current instruction to process the target operand (step S250). For example, assume that the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the arithmetic core 120 may read the metadata corresponding to the target operand from the memory 110 to obtain the actual data type of the target operand from the metadata (step S240), and then record the metadata and the actual data type in a register inside the arithmetic core 120, such as a status register or another register. Then, based on the actual data type of the target operand recorded in the metadata, the arithmetic core 120 may execute the load instruction (the current instruction) to load the target operand from the memory 110 into the arithmetic core 120. As another example, assume that the current instruction is a Matrix Multiply and Accumulate (MMA) instruction. When the data type information carried by the MMA instruction indicates that the data type of the target operand is the adaptive data type, the arithmetic core 120 may directly obtain the actual data type of the target operand from its internal register (step S240).
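The overall decision of steps S210 to S250 could be sketched as below; every type and helper in this fragment is a hypothetical stand-in used only to make the control flow concrete, not the patent's implementation.
    #include <cstdint>
    #include <unordered_map>

    struct Metadata { uint8_t actualDataType; int8_t scaleFactor; };
    struct Instruction { uint8_t dataTypeField; uint64_t operandAddress; };

    constexpr uint8_t kAdaptive = 15;  // example encoding of the adaptive data type

    struct Core {
        Metadata statusRegister{};  // keeps metadata of loaded operands for later instructions
        void execute(const Instruction&, uint8_t actualType) { /* process the operand */ }
    };

    void handleInstruction(const Instruction& inst,
                           const std::unordered_map<uint64_t, Metadata>& metadataInMemory,
                           Core& core) {
        // S210/S220: check the data type information carried by the instruction.
        if (inst.dataTypeField != kAdaptive) {
            core.execute(inst, inst.dataTypeField);          // S230: fixed data type
            return;
        }
        // S240: adaptive data type; read the operand's metadata to learn its actual type.
        const Metadata md = metadataInMemory.at(inst.operandAddress);
        core.statusRegister = md;                            // keep it for following instructions
        core.execute(inst, md.actualDataType);               // S250: process with the actual type
    }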
Fig. 3 is a block diagram of the operation core 120 according to an embodiment of the invention. The operation core 120 shown in fig. 3 includes an arithmetic circuit 121, an operand buffer 122, and a conversion unit 123. After completing the calculation of the previous layer, the arithmetic circuit 121 generates a calculation result and stores the calculation result Dorig in the operand buffer 122. Depending on the actual design, in various embodiments the operand buffer 122 may be configured inside the arithmetic circuit 121, outside the arithmetic circuit 121, in a reduction buffer, or in thread local registers. In addition, the arithmetic circuit 121 may collect statistics on the numerical features of the calculation result Dorig to generate a statistical result ST for the conversion unit 123.
The operand buffer 122 may provide the calculation result Dorig to the conversion unit 123. Based on the statistical result ST, the conversion unit 123 converts the calculation result Dorig into an operand Dconv (a target operand) having a data type suitable for the next layer of calculation, together with metadata Dm corresponding to the operand Dconv. Because the conversion unit 123 determines the data type of the operand Dconv dynamically at actual execution time, the conversion unit 123 records the actual data type of the operand Dconv in the metadata Dm corresponding to the operand Dconv, and then stores the operand Dconv and the metadata Dm in the memory 110.
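As a loose illustration of what such a conversion policy might look like (the exponent scan, the thresholds, and all names below are assumptions made for the sketch, not the patent's method), the conversion unit could derive the metadata from the statistical result roughly as follows.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    struct Metadata {
        uint8_t actualTypeCode;  // 2-bit code: 0 = (1,5,2), 1 = (1,4,3), 2 = (1,5,10)
        int8_t scaleFactor;      // shift applied to every exponent of the block
    };

    // Scans the previous layer's result (Dorig), measures the exponent range
    // (one possible form of the statistical result ST), shifts the exponents
    // toward zero, and picks an 8-bit layout whose exponent field covers the spread.
    Metadata chooseFormat(const std::vector<float>& blockDorig) {
        Metadata md{};
        if (blockDorig.empty()) return md;
        int minExp = 1 << 20, maxExp = -(1 << 20);
        for (float v : blockDorig) {
            int e = 0;
            std::frexp(v, &e);
            minExp = std::min(minExp, e);
            maxExp = std::max(maxExp, e);
        }
        md.scaleFactor = static_cast<int8_t>(-minExp);    // e.g. exponents 10..20 -> 0..10
        const int spread = maxExp - minExp;
        md.actualTypeCode = (spread <= 15) ? 1 : 0;       // narrow spread: spend bits on mantissa
        return md;
    }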
Fig. 4 is a block diagram of the operation core 120 according to another embodiment of the invention. The operation core 120 shown in fig. 4 includes an arithmetic circuit 121, a load unit 124, a status register 125, and an operand buffer 126. The load unit 124 is coupled to the memory 110. The status register 125 is coupled between the load unit 124 and the arithmetic circuit 121. The operand buffer 126 is coupled between the load unit 124 and the arithmetic circuit 121. Depending on the actual design, in various embodiments the arithmetic circuit 121 may include a tensor core, a general matrix multiplication (GEMM) core, an arithmetic logic unit (ALU), and/or other arithmetic units.
As an illustrative example, assume that the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the load unit 124 may read the metadata corresponding to the target operand from the memory 110 to obtain the actual data type of the target operand from the metadata. The load unit 124 stores the metadata in the status register 125 for use by the arithmetic circuit 121. In addition, the load unit 124 may read the target operand from the memory 110 based on the actual data type of the target operand recorded in the metadata. The load unit 124 stores the target operand in the operand buffer 126 for use by the arithmetic circuit 121.
As another illustrative example, assume that the current instruction is a Matrix Multiply and Accumulate (MMA) instruction whose corresponding target operands (a first operand and a second operand) have already been loaded into the operand buffer 126 by a previously executed load instruction. The first operand corresponds to first metadata and the second operand corresponds to second metadata. By analogy with the previous paragraph, the previously executed load instruction stores the first metadata and the actual data type it records (the actual data type of the first operand), as well as the second metadata and the actual data type it records (the actual data type of the second operand), in the status register 125. When the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the first operand is the adaptive data type, the arithmetic circuit 121 may directly obtain the actual data type of the first operand from the status register 125. When the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the second operand is the adaptive data type, the arithmetic circuit 121 may directly obtain the actual data type of the second operand from the status register 125. Based on the actual data type of the first operand and the actual data type of the second operand, the arithmetic circuit 121 can correctly read the first operand and the second operand from the operand buffer 126 and perform a matrix multiplication calculation on the first operand and the second operand.
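To make the "correctly read" step concrete, the following sketch decodes one stored element according to the layout and exponent shift recorded in its metadata before the element is fed into the multiplication; the function and structure names are hypothetical, and subnormal and special values are ignored for brevity.
    #include <cmath>
    #include <cstdint>

    struct FloatLayout { uint8_t exponentBits; uint8_t mantissaBits; };  // plus a 1-bit sign

    // Converts one stored element (e.g. an 8-bit value of the first or second
    // operand) to a float, using the layout from the actual data type field and
    // the exponent shift from the scale factor field of its metadata.
    float decodeElement(uint16_t raw, FloatLayout layout, int8_t scaleFactor) {
        const int sign     = (raw >> (layout.exponentBits + layout.mantissaBits)) & 0x1;
        const int exponent = (raw >> layout.mantissaBits) & ((1 << layout.exponentBits) - 1);
        const int mantissa = raw & ((1 << layout.mantissaBits) - 1);
        const int bias     = (1 << (layout.exponentBits - 1)) - 1;
        const float frac   = 1.0f + mantissa / static_cast<float>(1 << layout.mantissaBits);
        const int realExp  = exponent - bias - scaleFactor;  // undo format bias and recorded shift
        return (sign ? -1.0f : 1.0f) * std::ldexp(frac, realExp);
    }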
In summary, the arithmetic core 120 may check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. The fixed data type means that the data type of the target operand is known at compile time. The adaptive data type means that the data type of the target operand is unknown at compile time and is determined dynamically at program execution time. When the program is actually executed, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before executing the current instruction, the arithmetic core 120 may check the data type of the target operand corresponding to the current instruction. When the data type of the target operand is the adaptive data type, the arithmetic core 120 may obtain the actual data type of the target operand from the metadata corresponding to the target operand. Based on the actual data type of the target operand recorded in the metadata, the arithmetic core 120 can correctly execute the current instruction to process the target operand.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (27)

1. A method of operation of a computing device, the method of operation comprising:
checking data type information carried by a current instruction, wherein the data type information represents a data type of a target operand corresponding to the current instruction;
when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtaining the actual data type of the target operand; and
executing the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.
2. The method of claim 1, wherein the adaptive data type indicates that a data type of the target operand is unknown at compile time and is dynamically determined at execution time.
3. The method of operation of claim 1, further comprising:
when the data type information indicates that the data type of the target operand is a fixed data type, executing the current instruction to process the target operand based on the fixed data type.
4. The method of operation of claim 3, wherein the fixed data type comprises a 4-bit signed integer, an 8-bit signed integer, an 8-bit unsigned integer, an 8-bit floating point number, an 8-bit brain floating point number, a 16-bit signed integer, a standard 16-bit floating point number, a 16-bit brain floating point number, a standard 32-bit floating point number, a 32-bit fast floating point number, or a floating point number with more than 32 bits.
5. The method of claim 4, wherein the fixed data type is a 4-bit signed integer s4 when the data type information is a first value, the fixed data type is an 8-bit signed integer s8 when the data type information is a second value, the fixed data type is an 8-bit unsigned integer u8 when the data type information is a third value, the fixed data type is a standard 16-bit floating point number f16 when the data type information is a fourth value, the fixed data type is a standard 32-bit floating point number f32 when the data type information is a fifth value, the fixed data type is a 16-bit brain floating point number bf16 when the data type information is a sixth value, the fixed data type is an 8-bit floating point number f8 with a 4-bit exponent when the data type information is a seventh value, the fixed data type is an 8-bit brain floating point number bf8 with a 5-bit exponent when the data type information is an eighth value, and the data type of the target operand is the adaptive data type when the data type information is a ninth value.
6. The method of claim 1, wherein the current instruction comprises a load instruction, the method further comprising:
when the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, reading the metadata corresponding to the target operand from a memory so as to obtain the actual data type of the target operand from the metadata;
storing the metadata and the actual data type recorded in the metadata in a status register of an operation core;
reading the target operand from the memory based on the actual data type of the target operand recorded in the metadata; and
storing the target operand in an operand buffer of the operation core.
7. The method of operation of claim 6, wherein the operation core comprises a tensor core, a general matrix multiplication core, or an arithmetic logic unit.
8. The method of operation of claim 1, wherein the current instruction comprises a matrix multiply and accumulate instruction, the target operand comprises a first operand and a second operand, the metadata comprises a first metadata and a second metadata, the first metadata corresponds to the first operand, and the second metadata corresponds to the second operand, the method further comprising:
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the first operand is the adaptive data type, directly acquiring the actual data type of the first operand from a status register of an operation core;
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the second operand is the adaptive data type, directly acquiring the actual data type of the second operand from the status register of the operation core;
reading the first operand and the second operand from an operand buffer of the operation core based on the actual data type of the first operand and the actual data type of the second operand; and
performing a matrix multiplication calculation on the first operand and the second operand.
9. The method of claim 1, wherein the metadata comprises an actual data type field for describing the actual data type of the target operand to which the metadata corresponds.
10. The method of operation of claim 9, wherein the actual data type recorded in the actual data type field comprises an 8-bit floating point number having a first structure, an 8-bit floating point number having a second structure, or a 16-bit floating point number having a third structure.
11. The method of operation of claim 10, wherein the first structure is a "1-bit sign, a 5-bit exponent, and a 2-bit mantissa", the second structure is a "1-bit sign, a 4-bit exponent, and a 3-bit mantissa", and the third structure is a "1-bit sign, a 5-bit exponent, and a 10-bit mantissa".
12. The method of operation of claim 9, wherein the metadata further comprises a scale factor field to record a shift amount of an exponent of the target operand.
13. The method of operation of claim 1, further comprising:
generating a calculation result through an operation core;
performing statistics on a numerical characteristic of the calculation result through the operation core to generate a statistical result;
converting, by the operation core, the calculation result into the target operand and the metadata based on the statistical result; and
storing the target operand and the metadata in a memory.
14. A machine-readable storage medium storing non-transitory machine-readable instructions which, when executed by a computer, implement the method of operation of the computing device according to any one of claims 1 to 13.
15. A computing device, the computing device comprising:
a memory for storing a target operand; and
an operation core coupled to the memory, wherein,
the operation core checks data type information carried by a current instruction, wherein the data type information represents the data type of the target operand corresponding to the current instruction;
when the data type information indicates that the data type of the target operand is an adaptive data type, the operation core reads metadata corresponding to the target operand to obtain an actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand; and
the operation core executes the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.
16. The computing device of claim 15, wherein the adaptive data type represents that the data type of the target operand is unknown at compile time and is dynamically determined at execution time.
17. The computing device of claim 15,
when the data type information indicates that the data type of the target operand is a fixed data type, the operation core executes the current instruction to process the target operand based on the fixed data type.
18. The computing device of claim 17, wherein the fixed data type comprises a 4-bit signed integer, an 8-bit signed integer, an 8-bit unsigned integer, an 8-bit floating point number, an 8-bit brain floating point number, a 16-bit signed integer, a standard 16-bit floating point number, a 16-bit brain floating point number, a standard 32-bit floating point number, a 32-bit fast floating point number, or a floating point number with more than 32 bits.
19. The computing device of claim 18, wherein the fixed data type is a 4-bit signed integer s4 when the data type information is a first value, the fixed data type is an 8-bit signed integer s8 when the data type information is a second value, the fixed data type is an 8-bit unsigned integer u8 when the data type information is a third value, the fixed data type is a standard 16-bit floating point number f16 when the data type information is a fourth value, the fixed data type is a standard 32-bit floating point number f32 when the data type information is a fifth value, the fixed data type is a 16-bit brain floating point number bf16 when the data type information is a sixth value, the fixed data type is an 8-bit floating point number f8 with a 4-bit exponent when the data type information is a seventh value, the fixed data type is an 8-bit brain floating point number bf8 with a 5-bit exponent when the data type information is an eighth value, and the data type of the target operand is the adaptive data type when the data type information is a ninth value.
20. The computing device of claim 15, wherein the current instruction comprises a load instruction, and wherein the operation core comprises:
a load unit coupled to the memory, wherein when the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the load unit reads the metadata corresponding to the target operand from the memory to obtain the actual data type of the target operand from the metadata, and the load unit reads the target operand from the memory based on the actual data type of the target operand recorded in the metadata;
a status register coupled to the load unit, wherein the load unit stores the metadata and the actual data type recorded in the metadata in the status register; and
an operand buffer coupled to the load unit, wherein the load unit stores the target operand in the operand buffer.
21. The computing device of claim 20, wherein the operation core further comprises:
an arithmetic circuit coupled to the status register and the operand buffer, wherein the arithmetic circuit comprises a tensor core, a general matrix multiplication core, or an arithmetic logic unit.
22. The computing device of claim 15, wherein the current instruction comprises a matrix multiply and accumulate instruction, and wherein the operation core comprises:
an operand buffer to store the target operand, wherein the target operand comprises a first operand and a second operand;
a status register for storing the metadata and the actual data type recorded in the metadata, wherein the metadata includes a first metadata and a second metadata, the first metadata corresponds to the first operand, and the second metadata corresponds to the second operand; and
an arithmetic circuit coupled to the status register and the operand buffer, wherein
When the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the first operand is the adaptive data type, the arithmetic circuit directly acquires the actual data type of the first operand from the state register;
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the second operand is the adaptive data type, the arithmetic circuit directly acquires the actual data type of the second operand from the status register;
based on the actual data type of the first operand and the actual data type of the second operand, the arithmetic circuitry reads the first operand and the second operand from the operand buffer; and
the arithmetic circuit performs a matrix multiplication calculation on the first operand and the second operand.
23. The computing device of claim 15, wherein the metadata comprises an actual data type field to record the actual data type of the target operand to which the metadata corresponds.
24. The computing device of claim 23, wherein the actual data type recited in the actual data type field comprises an 8-bit floating point number having a first structure, an 8-bit floating point number having a second structure, or a 16-bit floating point number having a third structure.
25. The computing device of claim 24, wherein the first structure is a "1-bit sign, a 5-bit exponent, and a 2-bit mantissa", the second structure is a "1-bit sign, a 4-bit exponent, and a 3-bit mantissa", and the third structure is a "1-bit sign, a 5-bit exponent, and a 10-bit mantissa".
26. The computing device of claim 23, wherein the metadata further comprises a scale factor field to record a shift amount of an exponent of the target operand.
27. The computing device as claimed in claim 15, wherein the operation core generates a calculation result, the operation core performs statistics on a numerical characteristic of the calculation result to generate a statistical result, the operation core converts the calculation result into the target operand and the metadata based on the statistical result, and the operation core stores the target operand and the metadata in the memory.
CN202210979891.4A 2022-08-16 2022-08-16 Computing device, operating method, and machine-readable storage medium Pending CN115344826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210979891.4A CN115344826A (en) 2022-08-16 2022-08-16 Computing device, operating method, and machine-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210979891.4A CN115344826A (en) 2022-08-16 2022-08-16 Computing device, operating method, and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN115344826A true CN115344826A (en) 2022-11-15

Family

ID=83952625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210979891.4A Pending CN115344826A (en) 2022-08-16 2022-08-16 Computing device, operating method, and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN115344826A (en)

Similar Documents

Publication Publication Date Title
US11727276B2 (en) Processing method and accelerating device
CN107608715B (en) Apparatus and method for performing artificial neural network forward operations
US11544057B2 (en) Computer processor for higher precision computations using a mixed-precision decomposition of operations
KR102471606B1 (en) Floating-point instruction format with built-in rounding rules
US11175891B2 (en) Systems and methods to perform floating-point addition with selected rounding
US20170322805A1 (en) Performing Rounding Operations Responsive To An Instruction
CN111656367A (en) System and architecture for neural network accelerator
US7353368B2 (en) Method and apparatus for achieving architectural correctness in a multi-mode processor providing floating-point support
CN112148251A (en) System and method for skipping meaningless matrix operations
WO2002091166A2 (en) Apparatus and method for uniformly performing comparison operations on long word operands
KR20210028075A (en) System to perform unary functions using range-specific coefficient sets
EP4020169A1 (en) Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions
US20210279038A1 (en) Using fuzzy-jbit location of floating-point multiply-accumulate results
US20230161555A1 (en) System and method performing floating-point operations
CN115344826A (en) Computing device, operating method, and machine-readable storage medium
CN104823153B (en) Processor, method, communication equipment, machine readable media, the equipment and equipment for process instruction of normalization add operation for execute instruction
US11221826B2 (en) Parallel rounding for conversion from binary floating point to binary coded decimal
US9348535B1 (en) Compression format designed for a very fast decompressor
US20230281013A1 (en) Machine Code Instruction
US11704092B2 (en) High-precision anchored-implicit processing
CN112862086A (en) Neural network operation processing method and device and computer readable medium
CN117850882A (en) Single instruction multithreading processing device and method
WO2022191859A1 (en) Vector processing using vector-specific data type
CN116991362A (en) Modular multiplication operation processing method, device, electronic equipment and readable medium
CN115202617A (en) Method, system and device for recoding and decoding floating-point number

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
Country or region before: China