WO2023078364A1 - Operation method and apparatus for matrix multiplication

Info

Publication number: WO2023078364A1
Application number: PCT/CN2022/129619
Authority: WO
Inventors: 雷洪; 甄德根; 吴桐庆; 孔德辉; 徐科
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2021-11-03
Filing date: 2022-11-03
Publication date: 2023-05-11
Also published as: CN116090513A

Abstract

Embodiments of the present invention provide an operation method and apparatus for matrix multiplication. The operation method comprises: splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits; performing a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits; performing a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits; and multiplexing a multiplication unit and an addition unit between the matrix multiplication operation of the floating-point data and that of the integer data. By splitting input data of different data types, the multiplication and addition resources of an accelerator can be multiplexed during matrix multiplication, thereby greatly reducing the chip area of the accelerator and also reducing its cost.

Description

Operation method and device for matrix multiplication

Technical field

Embodiments of the present invention relate to the field of matrix multiplication, and in particular, to an operation method and device for matrix multiplication.

Background

With the advancement of technology, neural networks in artificial intelligence place increasing demands on the convolution and fully connected computing capabilities of accelerators. Convolution and fully connected operations can be converted into matrix multiplication operations, and matrix multiplication consists of multiplications and additions. The multiply-add computing power of existing accelerators has increased from the GOPS level to the TOPS level, and this increase requires more computing units to support it. Chip designers therefore need to support more computing units with as little area and cost as possible in order to achieve greater computing power.

Existing AI accelerators mainly support input data types such as INT8, INT16, INT32, FP16, FP32, and FP64. An AI accelerator that supports all six of these data types as input requires six independent computing units, one for each input type. The disadvantage is that a given neural network generally uses only one input data type, so only one computing unit is working at any time, yet multiple independent computing units must still be present, which increases chip area and cost.

Summary of the invention

Embodiments of the present invention provide a matrix multiplication operation method and device to at least solve the problem of increased chip area and cost caused by the need for independent operation units of various input data types in the accelerator in the related art.

According to one embodiment of the present invention, an operation method for matrix multiplication is provided, comprising: splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits; performing a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits; performing a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits; and multiplexing a multiplication unit and an addition unit between the matrix multiplication operation of the floating-point data and that of the integer data.

In an exemplary embodiment, performing the matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits includes: adding the exponent bits of the first floating-point data to the exponent bits of the second floating-point data, XORing the sign bit of the first floating-point data with the sign bit of the second floating-point data, and multiplying the precision bits of the first floating-point data by the precision bits of the second floating-point data.

In an exemplary embodiment, performing the matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits includes: XORing the sign bit of the first integer data with the sign bit of the second integer data, and multiplying the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and precision bits; XORing the sign bit of the third integer data with the sign bit of the fourth integer data, and multiplying the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and precision bits; and adding the first operation result to the second operation result.

In an exemplary embodiment, splitting each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting each of four N-bit integer data into corresponding sign bits and precision bits, includes: splitting each of two 16-bit floating-point data into a 1-bit sign bit, 11-bit precision bits and 4-bit exponent bits, and splitting each of four 8-bit integer data into a 1-bit sign bit and 7-bit precision bits.

In an exemplary embodiment, performing the matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits includes: multiplying the first floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, by the second floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, to obtain a 1-bit sign bit and a 22-bit sign-magnitude (original) code, and then converting the result into a 1-bit sign bit and a 22-bit two's-complement code.

In an exemplary embodiment, performing the matrix multiplication operation on the four integer data by XORing the sign bits and multiplying the precision bits includes: multiplying the first integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the second integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain a first operation result composed of a 1-bit sign bit and a 14-bit sign-magnitude (original) code; multiplying the third integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the fourth integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain a second operation result composed of a 1-bit sign bit and a 14-bit sign-magnitude code; and adding the first operation result to the second operation result to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, which is then converted into a 1-bit sign bit and a 15-bit two's-complement code.

In an exemplary embodiment, the addition operation in the matrix multiplication operation of floating-point data includes: selecting the largest exponent from the split exponents; calculating the difference of each exponent relative to the largest exponent; right-shifting the product data according to that exponent difference; and adding the shifted product data.

According to another embodiment of the present invention, an operation device for matrix multiplication is provided, including: a splitting module, configured to split each of two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and to split each of four N-bit integer data into corresponding sign bits and precision bits; and an operation module, configured to perform a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits, to perform a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits, and to multiplex a multiplication unit and an addition unit between the matrix multiplication operation of the floating-point data and that of the integer data.

In an exemplary embodiment, the operation module includes: a first operation unit, configured to add the exponent bits of the first floating-point data to the exponent bits of the second floating-point data, to XOR the sign bit of the first floating-point data with the sign bit of the second floating-point data, and to multiply the precision bits of the first floating-point data by the precision bits of the second floating-point data.

In an exemplary embodiment, the operation module further includes: a second operation unit, configured to XOR the sign bit of the first integer data with the sign bit of the second integer data, and to multiply the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and precision bits; to XOR the sign bit of the third integer data with the sign bit of the fourth integer data, and to multiply the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and precision bits; and to add the first operation result to the second operation result.

According to yet another embodiment of the present invention, a computer-readable storage medium is also provided. A computer program is stored in the computer-readable storage medium, wherein the computer program is configured to perform, when run, the steps in any one of the above method embodiments.

According to yet another embodiment of the present invention, an electronic device is also provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.

In the embodiment of the present invention, by splitting the input data of different data types, the multiplication and addition operation resources of the accelerator can be reused in the process of matrix multiplication, thereby greatly reducing the chip area and cost of the accelerator.

Description of the drawings

Fig. 1 is the flowchart of the computing method of matrix multiplication according to the embodiment of the present invention;

Fig. 2 is the structural block diagram of the computing device of matrix multiplication according to the embodiment of the present invention;

Fig. 3 is a structural block diagram of an arithmetic device for matrix multiplication according to another embodiment of the present invention;

FIG. 4 is a schematic diagram of multiplexing 4 pairs of FP16 multipliers and 8 pairs of INT8 multipliers according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of preprocessing before multiplication of FP16 and INT8 according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of FP16 and INT8 independently implementing multiplication and converting to a complement form according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of multiplication splitting and multiplexing of FP16 and INT8 according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of addition multiplexing of fp16 and int8 according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention will be described in detail below with reference to the drawings and in combination with the embodiments.

It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

An accelerator usually contains a separate matrix multiplication unit for each input data type it supports. Because a neural network computation usually uses only one input data type, only one of these matrix multiplication units is working at any given time, yet the units for all of the input data types must be present in the accelerator.

To solve the above problem, an embodiment of the present invention provides an operation method for matrix multiplication. The core of a matrix multiplication operation is the adder and the multiplier. The method provided in this embodiment mainly multiplexes the multipliers and adders used in matrix multiplication, and can greatly reduce area consumption while still realizing the required function.

In this embodiment, multiplier multiplexing is realized by performing the multiplication after the data has been split. The multiplexing principle is that two multiplications and one addition of N-bit integer data share resources with one multiplication of 2N-bit floating-point data. For example, INT8*INT8+INT8*INT8 resources are multiplexed with FP16*FP16 resources, INT16*INT16+INT16*INT16 resources are multiplexed with FP32*FP32 resources, and INT32*INT32+INT32*INT32 resources are multiplexed with FP64*FP64 resources.

Fig. 1 is a flowchart of the operation method for matrix multiplication according to an embodiment of the present invention. As shown in Fig. 1, the flow comprises the following steps:

Step S102, splitting the two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and splitting the four N-bit integer data into corresponding sign bits and precision bits;

Step S104, performing a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits; performing a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits; and multiplexing the multiplication unit and the addition unit between the matrix multiplication operation of the floating-point data and that of the integer data.

In an exemplary embodiment, in step S104, performing the matrix multiplication operation on the two split floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits includes: adding the exponent bits of the first floating-point data to the exponent bits of the second floating-point data, XORing the sign bit of the first floating-point data with the sign bit of the second floating-point data, and multiplying the precision bits of the first floating-point data by the precision bits of the second floating-point data.
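As a rough illustration of the three operations just described, the following Python sketch multiplies two half-precision values by XORing their signs, adding their exponents and multiplying their precision bits. It assumes the standard IEEE 754 binary16 layout (a 5-bit biased exponent and 10 stored fraction bits plus an implicit leading 1, giving 11 precision bits); note that this exponent width differs from the 4-bit figure quoted elsewhere in this description. The function names `split_fp16`, `multiply_split` and `to_float` are illustrative only, and infinities and NaNs are not handled.

```python
import struct

BIAS = 15  # exponent bias of IEEE 754 binary16 (assumed input format)


def split_fp16(value):
    """Split a half-precision value into (sign, 11-bit precision, exponent)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F
    precision = bits & 0x3FF
    if exponent != 0:
        precision |= 1 << 10  # restore the implicit leading 1
    else:
        exponent = 1          # subnormals use the smallest exponent
    return sign, precision, exponent


def multiply_split(a, b):
    """Multiply via sign XOR, exponent addition and precision multiplication."""
    sign_a, prec_a, exp_a = split_fp16(a)
    sign_b, prec_b, exp_b = split_fp16(b)
    return sign_a ^ sign_b, prec_a * prec_b, exp_a + exp_b


def to_float(sign, precision, exponent):
    """Interpret an unnormalized (sign, 22-bit product, summed exponent) triple."""
    # Each operand carries one bias and 10 fractional bits, so the product
    # carries two biases and 20 fractional bits.
    magnitude = precision * 2.0 ** (exponent - 2 * BIAS - 20)
    return -magnitude if sign else magnitude


print(to_float(*multiply_split(1.5, -2.25)))
```

The product is left unnormalized here, which matches the idea of passing a raw sign, product and exponent on to a later addition stage rather than rounding back to FP16 immediately.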

In an exemplary embodiment, in step S104, performing the matrix multiplication operation on the four split integer data in pairs by XORing the sign bits and multiplying the precision bits includes: XORing the sign bit of the first integer data with the sign bit of the second integer data, and multiplying the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and precision bits; XORing the sign bit of the third integer data with the sign bit of the fourth integer data, and multiplying the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and precision bits; and adding the first operation result to the second operation result.

In an exemplary embodiment in which INT8*INT8+INT8*INT8 resources are multiplexed with FP16*FP16 resources, step S102 includes: splitting each of two 16-bit floating-point data into a 1-bit sign bit, 11-bit precision bits and 4-bit exponent bits, and splitting each of four 8-bit integer data into a 1-bit sign bit and 7-bit precision bits.

In an exemplary embodiment in which INT8*INT8+INT8*INT8 resources are multiplexed with FP16*FP16 resources, performing the matrix multiplication operation on the two split floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits includes: multiplying the first floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, by the second floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, to obtain a 1-bit sign bit and a 22-bit sign-magnitude (original) code, and then converting the result into a 1-bit sign bit and a 22-bit two's-complement code.
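The conversion from the 22-bit sign-magnitude product to its two's-complement form can be sketched as below. This is only an illustration under the assumption that the result is held in a 23-bit word (1 sign bit plus 22 data bits); `fix12_multiply` and `to_twos_complement` are hypothetical helper names, not terms from this description.

```python
def fix12_multiply(a, b):
    """Multiply two (sign, 11-bit magnitude) operands in sign-magnitude form."""
    (sign_a, mag_a), (sign_b, mag_b) = a, b
    if mag_a >> 11 or mag_b >> 11:
        raise ValueError("precision bits must fit in 11 bits")
    # Sign bits are XORed; 11 bits x 11 bits gives at most 22 bits.
    return sign_a ^ sign_b, mag_a * mag_b


def to_twos_complement(sign, magnitude, width):
    """Encode sign/magnitude as a (1 + width)-bit two's-complement word."""
    if magnitude >> width:
        raise ValueError("magnitude does not fit in the given width")
    value = -magnitude if sign else magnitude
    return value & ((1 << (width + 1)) - 1)


sign, magnitude = fix12_multiply((1, 1536), (0, 1152))
word = to_twos_complement(sign, magnitude, 22)
print(sign, magnitude, hex(word))
```

Keeping the multiplier itself unsigned and applying the sign afterwards is what lets the same magnitude multipliers serve both the floating-point and the integer path.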

In an exemplary embodiment, performing the matrix multiplication operation on the four INT8 integer data by XORing the sign bits and multiplying the precision bits includes: multiplying the first integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the second integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain a first operation result composed of a 1-bit sign bit and a 14-bit sign-magnitude (original) code; multiplying the third integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the fourth integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain a second operation result composed of a 1-bit sign bit and a 14-bit sign-magnitude code; and adding the first operation result to the second operation result to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, which is then converted into a 1-bit sign bit and a 15-bit two's-complement code.
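The INT8 path can be sketched in the same style. The example below is a minimal illustration, assuming inputs in the range -127 to 127 so that the magnitude fits in 7 bits (how -128 would be handled is not stated in this description); the names `split_int8` and `int8_pair_multiply_add` are hypothetical.

```python
def split_int8(value):
    """Split a signed integer into (sign bit, 7-bit magnitude)."""
    if not -127 <= value <= 127:
        raise ValueError("magnitude must fit in 7 bits")
    return (1 if value < 0 else 0), abs(value)


def int8_pair_multiply_add(a, b, c, d):
    """Compute a*b + c*d from sign XORs and 7-bit magnitude products."""
    (sa, ma), (sb, mb) = split_int8(a), split_int8(b)
    (sc, mc), (sd, md) = split_int8(c), split_int8(d)
    sign1, mag1 = sa ^ sb, ma * mb  # first result: 1-bit sign, 14-bit magnitude
    sign2, mag2 = sc ^ sd, mc * md  # second result: 1-bit sign, 14-bit magnitude
    total = (-mag1 if sign1 else mag1) + (-mag2 if sign2 else mag2)
    sign = 1 if total < 0 else 0
    magnitude = abs(total)          # at most 2 * 127 * 127, so 15 bits suffice
    complement = total & 0xFFFF     # 1 sign bit + 15 bits, two's complement
    return sign, magnitude, complement


print(int8_pair_multiply_add(3, -4, 5, 6))
```

The largest possible magnitude, 2 x 127 x 127 = 32258, is below 2^15, which is consistent with the 15-bit result width given above.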

The resource multiplexing in this embodiment can be applied to, but is not limited to, multiplying a matrix A with 4 rows and 4 columns of N-bit integer data by a matrix B with 4 rows and 4 columns, or multiplying a matrix A with 4 rows and 8 columns of 2N-bit floating-point data by a matrix B with 8 columns and 4 rows.

In this embodiment, a computing device for matrix multiplication is also provided. The device is used to implement the above embodiments and preferred implementations, and what has already been explained is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that realizes a predetermined function, for example, an arithmetic unit consisting of a multiplier and an adder.

Fig. 2 is a structural block diagram of a computing device for matrix multiplication according to an embodiment of the present invention. As shown in Fig. 2, the computing device 100 includes a splitting module 10 and a computing module 20.

The splitting module 10 is configured to split two 2N-bit floating-point data into corresponding sign bits, precision bits and exponent bits, and split four N-bit integer data into corresponding sign bits respectively and precision bits.

The computing module 20 is configured to perform a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits and multiplying the precision bits, to perform a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits, and to multiplex the multiplication unit and the addition unit between the matrix multiplication operation of the floating-point data and that of the integer data.

Fig. 3 is a structural block diagram of a computing device for matrix multiplication according to another embodiment of the present invention. As shown in Fig. 3, the computing device 100 includes all the modules shown in Fig. 2, and the computing module 10 includes a first computing unit 11 and a second computing unit 12.

The first computing unit 11 is configured to add the exponent bits of the first floating-point data to the exponent bits of the second floating-point data, to XOR the sign bit of the first floating-point data with the sign bit of the second floating-point data, and to multiply the precision bits of the first floating-point data by the precision bits of the second floating-point data.

The second computing unit 12 is configured to XOR the sign bit of the first integer data with the sign bit of the second integer data, and to multiply the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and precision bits; to XOR the sign bit of the third integer data with the sign bit of the fourth integer data, and to multiply the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and precision bits; and to add the first operation result to the second operation result.

In the arithmetic device provided in this embodiment, by splitting the input data of different data types, the multiplication and addition operation resources of the accelerator can be reused in the process of matrix multiplication, thereby greatly reducing the chip area of the accelerator and reducing the cost.

It should be noted that the above modules can be implemented by software or hardware. For the latter, this can be achieved in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.

To facilitate understanding of the present invention, the multiplexing of INT8*INT8+INT8*INT8 resources with FP16*FP16 resources is used as an example below, as shown in Fig. 4. In the figure, 4 multiplications and 3 additions of the FP16 data type are multiplexed with 8 multiplications and 7 additions of the INT8 data type.

In this embodiment, the multiplexing in matrix multiplication mainly concerns multiplication and addition. The operation flow is divided into three stages: input data preprocessing, multiplication and addition.

First, the input data is preprocessed.

Specifically, in this embodiment, the input FP16 and INT8 data are first converted into a fixed format, mainly so that the subsequent multiplications can be multiplexed. As shown in Fig. 5, FP16 data is split into a fix12 part and an exponent part, and INT8 data is converted into a format consisting of a 1-bit sign bit and a 7-bit sign-magnitude (original) code.

Secondly, multiplication is performed. Each multiplication unit receives either two FP16 values to be multiplied or four INT8 values to be multiplied in pairs. For the two FP16 values, the two fix12 operands, each consisting of a 1-bit sign bit and an 11-bit sign-magnitude code, are multiplied to obtain a 1-bit sign bit and a 22-bit sign-magnitude code, which is then converted into a 1-bit sign bit and a 22-bit two's-complement code. For the four INT8 values, each pair of fix8 operands, each consisting of a 1-bit sign bit and a 7-bit sign-magnitude code, is multiplied to obtain a 1-bit sign bit and a 14-bit sign-magnitude code; the two products are then added to obtain a 1-bit sign bit and a 15-bit sign-magnitude code, which is finally converted into a 1-bit sign bit and a 15-bit two's-complement code.

In another embodiment, if the multiplications of FP16 and INT8 are processed separately, they are implemented as shown in Fig. 6, where the products of the two INT8 pairs are finally added to reduce the amount of output data.

As shown in Fig. 6, the multiplication of two FP16 values can be divided into three operations: exponent addition, sign bit XOR and 11-bit precision multiplication. The pairwise multiplication of four INT8 values can be expressed as four operations: two sign bit XORs and two 7-bit precision multiplications.

In this embodiment, in order to fully reuse the FP16 and INT8 multiplication resources, the following resource reuse method for multiplication is proposed.

In this embodiment, the multiplication of FP16 and the multiplication of INT8 can be split in the manner shown in Fig. 7. Specifically, operation 3 of FP16 shown in Fig. 6 is divided into finer-grained multiplications of 7bit*7bit, 7bit*4bit, 7bit*4bit and 4bit*4bit, and operation D of INT8 is divided into 7bit*4bit and 7bit*4bit. In this way, three multipliers, DSP7*4, DSP7*4 and DSP7*7, can ultimately be reused.
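The split into smaller multipliers can be checked with a short sketch. It assumes that each 11-bit FP16 precision operand is divided into its upper 7 bits and lower 4 bits, and that in the second INT8 product one 7-bit operand is divided into its upper 3 bits and lower 4 bits so that both parts fit a 7-by-4 multiplier; the exact bit assignment in Fig. 7 is not stated in the text, and the function names are illustrative.

```python
def mul7x7(a, b):
    """Model of a 7-bit by 7-bit unsigned multiplier (DSP7*7)."""
    assert a < 128 and b < 128
    return a * b


def mul7x4(a, b):
    """Model of a 7-bit by 4-bit unsigned multiplier (DSP7*4)."""
    assert a < 128 and b < 16
    return a * b


def mul4x4(a, b):
    """Model of a 4-bit by 4-bit unsigned multiplier."""
    assert a < 16 and b < 16
    return a * b


def fp16_precision_product(a, b):
    """11-bit x 11-bit product built from 7x7, 7x4, 7x4 and 4x4 parts."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((mul7x7(a_hi, b_hi) << 8)
            + ((mul7x4(a_hi, b_lo) + mul7x4(b_hi, a_lo)) << 4)
            + mul4x4(a_lo, b_lo))


def int8_pair_products(a, b, c, d):
    """Two 7-bit x 7-bit products using one 7x7 and two 7x4 multipliers."""
    first = mul7x7(a, b)
    d_hi, d_lo = d >> 4, d & 0xF  # d_hi has only 3 significant bits
    second = (mul7x4(c, d_hi) << 4) + mul7x4(c, d_lo)
    return first, second


print(fp16_precision_product(1536, 1152), int8_pair_products(127, 127, 127, 127))
```

Under these assumptions both modes exercise the same 7x7 multiplier and the same two 7x4 multipliers, and only the 4x4 multiplier is specific to the FP16 path.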

In this embodiment, the resources that implement addition in FP16 and INT8 matrix multiplication are also fully reused, so the following method for implementing addition with multiplexed resources is proposed.

In this embodiment, the addition in the FP16 matrix multiplication operation is performed in the manner shown in Fig. 8.

The first step is to find the largest exponent among the four split exponents, as shown in Fig. 8. The four exponents are compared in pairs and the larger value of each pair is selected, giving two exponents; these two are then compared, and the larger value is the maximum of the four exponents.

The second step is to calculate the exponent differences: each of the four exponents is subtracted from the largest exponent found in the first step, giving the difference of each of the four numbers relative to the largest exponent.

The third step is shifting: each of the four products is shifted right by the number of bits given by the exponent difference calculated in the second step.

The fourth step is addition. In the first round, the four numbers are added in pairs to obtain two numbers; in the second round, these two numbers are added to obtain one number. add_0_3 is the final result.
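The four steps above can be sketched as follows. This is a minimal illustration rather than the circuit of Fig. 8: each product is modeled as a pair of a signed Python integer (standing in for the two's-complement product) and its exponent, the right shift is arithmetic and simply discards low-order bits, and `align_and_add` is a hypothetical name.

```python
def align_and_add(products):
    """Align four (value, exponent) products to the largest exponent and sum them."""
    (v0, e0), (v1, e1), (v2, e2), (v3, e3) = products

    # Step 1: find the largest exponent by pairwise comparison.
    max_exp = max(max(e0, e1), max(e2, e3))

    # Step 2: difference of each exponent relative to the largest one.
    diffs = [max_exp - e for e in (e0, e1, e2, e3)]

    # Step 3: shift each product right by its exponent difference.
    shifted = [v >> d for v, d in zip((v0, v1, v2, v3), diffs)]

    # Step 4: add in pairs (4 -> 2), then add the two partial sums (2 -> 1).
    add_0_1 = shifted[0] + shifted[1]
    add_2_3 = shifted[2] + shifted[3]
    add_0_3 = add_0_1 + add_2_3
    return add_0_3, max_exp


print(align_and_add([(8, 3), (8, 1), (-16, 2), (4, 3)]))
```

Because the smaller terms lose their low-order bits when shifted, the sum is an approximation whose accuracy depends on how many guard bits the datapath keeps, a detail this description does not specify.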

In this embodiment, the addition in the INT8 matrix multiplication operation is also performed in the manner shown in Fig. 8.

That is, in the first round the four numbers are added in pairs to obtain two numbers, and in the second round these two numbers are added to obtain one number; add_0_7 is the final result.

In this embodiment, the additions of FP16 and INT8 are implemented by multiplexing the following parts: the 8 adders of the first addition, the 4 adders of the second addition, the 2 adders of the third addition, and the adder of the fourth addition.

In this embodiment, the matrix multiplication operation unit can be reused for various operation precisions, and area consumption is greatly reduced while the function is still guaranteed. That is, with limited chip area, matrix multiplication operations of more precisions can be realized, so that artificial intelligence accelerators can support more precisions, thereby improving the computing power of the artificial intelligence accelerator and broadening its application scenarios.
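The adder counts listed above are what a pairwise reduction tree produces level by level. The sketch below only illustrates that reading: the number of operands is not stated in the text, and 16 is an assumption chosen because it reproduces the counts 8, 4, 2 and 1; `adders_per_level` is a hypothetical name.

```python
def adders_per_level(operands):
    """Return the number of two-input adders used at each level of a pairwise tree."""
    counts = []
    while operands > 1:
        pairs = operands // 2
        counts.append(pairs)
        # An unpaired operand, if any, passes through to the next level.
        operands = pairs + operands % 2
    return counts


print(adders_per_level(16))
```

Under the same reading, a 4-operand sum needs only the last two levels of such a tree and an 8-operand sum the last three, which is one way the FP16 and INT8 additions could share adders.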

Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is set to execute the steps in any one of the above method embodiments when running.

In an exemplary embodiment, the above computer-readable storage medium may include, but is not limited to: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store a computer program.

An embodiment of the present invention also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.

In an exemplary embodiment, the electronic device may further include a transmission device and an input and output device, wherein the transmission device is connected to the processor, and the input and output device is connected to the processor.

For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementation manners, and details will not be repeated here in this embodiment.

Obviously, those skilled in the art should understand that each module or step of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. They can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be executed in an order different from that given here. Alternatively, they can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. As such, the present invention is not limited to any specific combination of hardware and software.

The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention shall be included in the protection scope of the present invention.

Claims

An operation method for matrix multiplication, comprising:

Split two 2N-bit floating-point data into corresponding sign bits, precision bits, and exponent bits, and split four N-bit integer data into corresponding sign bits and precision bits;

Perform matrix multiplication on the two floating-point data by adding exponent bits, XORing sign bits, and multiplying precision bits, and pairwise multiplying the four integer data by multiplying sign bits XOR and precision bits performing a matrix multiplication operation, and multiplexing a multiplication unit and an addition unit in the matrix multiplication operation of the floating-point data and the integer data.
The method according to claim 1, wherein performing matrix multiplication on the two floating-point data by adding exponent bits, XORing sign bits, and multiplying precision bits comprises:

Adding the exponent bit of the first floating-point data to the exponent bit of the second floating-point data, and exclusive-ORing the sign bit of the first floating-point data with the sign bit of the second floating-point data performing an OR operation, and performing a multiplication operation on the precision bits of the first floating-point data and the precision bits of the second floating-point data.
The method according to claim 1, wherein performing matrix multiplication operation on the four integer data two by two by sign bit XOR and precision bit multiplication comprises:

performing an XOR operation on the sign bit of the first integer data and the sign bit of the second integer data, and multiplying the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and a precision bit;

performing an XOR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and multiplying the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and a precision bit;

adding the first operation result and the second operation result.
The method according to claim 1, wherein splitting two 2N-bit floating-point data into corresponding sign bits, precision bits, and exponent bits, and splitting four N-bit integer data into corresponding sign bits and precision bits, comprises:

Splitting each of two 16-bit floating-point data into a 1-bit sign bit, 11-bit precision bits, and 4-bit exponent bits, and splitting each of four 8-bit integer data into a 1-bit sign bit and 7-bit precision bits.
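As a rough illustration of this split, the sketch below extracts the fields with shifts and masks. The field widths come from the claim, but the bit positions are an assumption made only for this example (sign in bit 15, exponent in bits 14 to 11, precision in bits 10 to 0); the claim does not state the ordering. The helper names `split_fp16` and `split_int8` are likewise hypothetical, and the integer helper assumes inputs in the range -127 to 127 so that the magnitude fits in 7 bits.

```python
def split_fp16(word):
    """Split a 16-bit word into (sign, exponent, precision).

    Assumed layout (not stated in the source): bit 15 is the sign,
    bits 14..11 are the 4-bit exponent, bits 10..0 are the 11-bit precision.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    sign = (word >> 15) & 0x1
    exponent = (word >> 11) & 0xF
    precision = word & 0x7FF
    return sign, exponent, precision


def split_int8(value):
    """Split a signed integer in [-127, 127] into (sign, 7-bit magnitude)."""
    if not -127 <= value <= 127:
        raise ValueError("magnitude must fit in 7 bits")
    sign = 1 if value < 0 else 0
    return sign, abs(value)
```

With this assumed layout, `split_fp16(0xB805)` gives sign 1, exponent 7, and precision 5, and `split_int8(-100)` gives sign 1 and magnitude 100.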
The method according to claim 4, wherein, performing matrix multiplication on the two floating-point data by adding exponent bits, XORing sign bits, and multiplying precision bits comprises:

Multiplying the first floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, by the second floating-point data, composed of a 1-bit sign bit and 11-bit precision bits, to obtain a 1-bit sign bit and a 22-bit original code (sign-magnitude representation), and then converting the original code into a complement code (two's complement) to obtain a 1-bit sign bit and a 22-bit complement code.
The method according to claim 4, wherein performing matrix multiplication on the four integer data by sign bit XOR and precision bit multiplication comprises:

Multiplying the first integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the second integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain the first operation result composed of a 1-bit sign bit and a 14-bit original code;

Multiplying the third integer data, composed of a 1-bit sign bit and 7-bit precision bits, by the fourth integer data, composed of a 1-bit sign bit and 7-bit precision bits, to obtain the second operation result composed of a 1-bit sign bit and a 14-bit original code;

Adding the first operation result and the second operation result to obtain a 1-bit sign bit and a 15-bit original code, and then converting the original code into a complement code to obtain a 1-bit sign bit and a 15-bit complement code.
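The bit widths in this claim can be checked with a small sketch. It assumes the operands arrive as `(sign, magnitude)` pairs with 7-bit magnitudes; the function name `int_pair_mac` is hypothetical, and Python integers stand in for the adder and for the conversion from original code to complement code. The largest possible sum is 2 × 127 × 127 = 32258, which fits in 15 bits.

```python
def int_pair_mac(a, b, c, d):
    """Compute a*b + c*d from sign-magnitude operands.

    Each operand is a (sign, magnitude) tuple with a 7-bit magnitude.
    Returns (sign, magnitude, twos_complement), where twos_complement is a
    16-bit pattern holding 1 sign bit and a 15-bit complement code.
    """
    def multiply(x, y):
        sign = x[0] ^ y[0]        # XOR of the sign bits
        magnitude = x[1] * y[1]   # 7 bit x 7 bit product, at most 14 bits
        return sign, magnitude

    s1, m1 = multiply(a, b)       # first operation result
    s2, m2 = multiply(c, d)       # second operation result

    # Add the two signed products.
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    sign = 1 if total < 0 else 0
    magnitude = abs(total)        # at most 32258, so 15 bits suffice

    # Original code -> complement code, as a 16-bit two's-complement pattern.
    twos_complement = total & 0xFFFF
    return sign, magnitude, twos_complement
```

For instance, (-100 × 27) + (5 × 6) = -2670, so `int_pair_mac((1, 100), (0, 27), (0, 5), (0, 6))` returns sign 1, magnitude 2670, and the 16-bit pattern 65536 - 2670.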
The method according to claim 4, wherein the addition operation in the matrix multiplication operation of floating-point data comprises:

Selecting the largest exponent from among the split exponents;

Calculating the step difference of each exponent relative to the largest exponent;

Right-shifting the corresponding product data bits according to the step difference;

Adding the shifted product data.
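The alignment steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each product is taken to carry a single exponent (for example, the sum of its operands' exponents) and a signed value, the name `aligned_sum` is hypothetical, and Python's arithmetic right shift on signed integers stands in for shifting a complement-code product. Low-order bits shifted out are simply discarded; the source does not specify rounding.

```python
def aligned_sum(products):
    """Add products after aligning them to the largest exponent.

    `products` is a list of (exponent, value) pairs, where `value` is the
    signed product of the precision bits.
    Returns (max_exponent, accumulated_value).
    """
    max_exp = max(exp for exp, _ in products)   # largest exponent
    total = 0
    for exp, value in products:
        step = max_exp - exp                    # step difference
        total += value >> step                  # right-shift, then add
    return max_exp, total
```

With inputs `[(5, 40), (3, 13), (4, -7)]`, the shifted values are 40, 3, and -4, so the call returns `(5, 39)`.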
A computing device for matrix multiplication, comprising:

A splitting module, configured to split two 2N-bit floating-point data into corresponding sign bits, precision bits, and exponent bits, and to split four N-bit integer data into corresponding sign bits and precision bits;

An operation module, configured to perform a matrix multiplication operation on the two floating-point data by adding the exponent bits, XORing the sign bits, and multiplying the precision bits, to perform a matrix multiplication operation on the four integer data in pairs by XORing the sign bits and multiplying the precision bits, and to multiplex a multiplication unit and an addition unit between the matrix multiplication operation of the floating-point data and that of the integer data.
The device according to claim 8, wherein the operation module comprises:

A first operation unit, configured to add the exponent bits of the first floating-point data to the exponent bits of the second floating-point data, perform an XOR operation on the sign bit of the first floating-point data and the sign bit of the second floating-point data, and perform a multiplication operation on the precision bits of the first floating-point data and the precision bits of the second floating-point data.
The device according to claim 8, wherein the operation module further comprises:

A second operation unit, configured to: perform an XOR operation on the sign bit of the first integer data and the sign bit of the second integer data, and multiply the precision bits of the first integer data by the precision bits of the second integer data, to obtain a first operation result including a sign bit and precision bits; perform an XOR operation on the sign bit of the third integer data and the sign bit of the fourth integer data, and multiply the precision bits of the third integer data by the precision bits of the fourth integer data, to obtain a second operation result including a sign bit and precision bits; and add the first operation result and the second operation result.
A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.