CN112416295B - Arithmetic unit for floating point data and tensor data operation - Google Patents


Info

Publication number
CN112416295B
CN112416295B (application CN202011427161.0A)
Authority
CN
China
Prior art keywords
value
tensor data
shared
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011427161.0A
Other languages
Chinese (zh)
Other versions
CN112416295A (en)
Inventor
罗闳訚
何日辉
周志新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yipu Intelligent Technology Co ltd
Original Assignee
Xiamen Yipu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yipu Intelligent Technology Co ltd filed Critical Xiamen Yipu Intelligent Technology Co ltd
Priority to CN202011427161.0A
Publication of CN112416295A
Application granted
Publication of CN112416295B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F 7/487 Multiplying; Dividing
    • G06F 7/4876 Multiplying

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Nonlinear Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an arithmetic unit for floating point data and tensor data operation. The arithmetic unit for tensor data operation comprises two input tensor data and their shared E values, and one output tensor data and its shared E value, wherein each number of the tensor data is represented by S+F in the EF16 data format, S being the sign value of the EF16 data and F the fraction value; the shared E value of the tensor data is the exponent value of the EF16 data. The numerical expression formula of EF16 data is: (-1)^signbit × 2^(-exponent) × fraction, where signbit is the sign value, exponent is the exponent value, and fraction is the fraction value. When the arithmetic unit executes operations such as multiplication and addition, the S+F parts and the shared E value parts of the two input tensor data can each be multiplied or added separately, without data conversion, which effectively simplifies tensor data operations and improves the computational efficiency of tensor data.

Description

Arithmetic unit for floating point data and tensor data operation
Technical Field
The invention relates to the field of computer mathematical computation, in particular to an arithmetic unit for floating point data and tensor data operation.
Background
The neural network algorithm uses tensor data in floating point format to perform operations such as addition, multiplication, multiply-accumulate and the like. To balance precision and speed of operation, a half-precision floating point format is typically used.
The most common half-precision floating point format is FP16, which is interpreted as follows:
(1) If the exponent bits are all 0:
if the fraction bits are all 0, the number 0 is represented;
if the fraction bits are not all 0, a very small number (a subnormal number) is represented, calculated as (-1)^signbit × 2^(-14) × (fraction/1024);
(2) If the exponent bits are all 1:
if the fraction bits are all 0, ±inf is represented;
if the fraction bits are not all 0, NaN is represented;
(3) In all other cases of the exponent bits:
the calculation formula is (-1)^signbit × 2^(exponent-15) × (1 + fraction/1024).
From the above FP16 rules, the data expression divides into three cases according to the content of the exponent: all 0, all 1, or otherwise. An FP16 calculation must therefore first examine the exponent value to determine which expression applies, and so contains a large amount of control logic.
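The three-way branch on the exponent field can be made concrete with a short decoder (a hedged sketch using the standard IEEE 754 binary16 constants; the helper name is ours, not the patent's):

```python
def decode_fp16(bits: int) -> float:
    """Decode a 16-bit half-precision pattern, branching on the exponent
    field exactly as the three cases above describe."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits
    fraction = bits & 0x3FF          # 10 fraction bits
    if exponent == 0:                # case (1): zero or subnormal
        value = (fraction / 1024) * 2.0 ** -14
    elif exponent == 0x1F:           # case (2): infinity or NaN
        value = float('inf') if fraction == 0 else float('nan')
    else:                            # case (3): normal numbers
        value = (1 + fraction / 1024) * 2.0 ** (exponent - 15)
    return -value if sign else value
```

The three branches are exactly the control logic the patent argues a hardware FP16 path must carry on every operation.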
In addition, the data expression used in most cases of FP16 is
(-1)^signbit × 2^(exponent-15) × (1 + fraction/1024).
Two FP16 data are then:
(-1)^s1 × 2^(e1-15) × (1 + f1/1024)
(-1)^s2 × 2^(e2-15) × (1 + f2/1024)
and the multiplication between them can be expressed as:
(-1)^(s1+s2) × 2^(e1+e2-30) × (1 + f1/1024) × (1 + f2/1024)
Finally, the multiplication of two FP16 data consists of the following calculations:
(1) s3 = s1 + s2
(2) e3 = (e1 + e2 - 15)
(3) (1 + f3/1024) = (1 + f1/1024) × (1 + f2/1024), with renormalization when the product reaches 2
Therefore, the multiplication of FP16 data consists of multiple addition, multiplication and division operations, and its hardware implementation is very complex. Similarly, the addition of FP16 data is very complex, which seriously affects the processing efficiency of neural network algorithms.
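The three-step decomposition above can be sketched for normal operands (hypothetical helper; no rounding, overflow or special-case handling, which is precisely what a real FP16 unit must add):

```python
def fp16_mul_fields(s1: int, e1: int, f1: int, s2: int, e2: int, f2: int):
    """Multiply two normal FP16 numbers given as raw (sign, exponent,
    10-bit fraction) fields; returns raw result fields, unrounded."""
    s3 = s1 ^ s2                        # step (1): sign bits add mod 2
    e3 = e1 + e2 - 15                   # step (2): remove one bias of 15
    sig = (1 + f1 / 1024) * (1 + f2 / 1024)  # step (3): significand product
    if sig >= 2:                        # product lies in [1, 4): renormalize
        sig /= 2
        e3 += 1
    f3 = round((sig - 1) * 1024)
    return s3, e3, f3
```

For example, 1.5 × 2.5 encodes as fields (0, 15, 512) and (0, 16, 256) and yields (0, 16, 896), i.e. 1.875 × 2^1 = 3.75.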
Disclosure of Invention
The invention aims to overcome at least one deficiency of the prior art by providing an arithmetic unit for floating point data and tensor data operation, based on a novel half-precision floating point expression and operation method, so as to restructure floating point operations, simplify tensor operation steps, and improve the processing capacity of large-data-volume workloads such as neural network algorithms.
To achieve the above object, the present invention proposes an arithmetic unit for floating point data operations, the arithmetic unit being used to perform a multiplication or addition operation and comprising two input floating point data and their exponent values and one output floating point data and its exponent value, the input floating point data and output floating point data being represented in the EF16 data format,
the numerical expression formula of the EF16 data is as follows:
(-1)^signbit × 2^(-exponent) × fraction
wherein signbit is the sign value; exponent is the exponent value; fraction is the fraction value;
in the arithmetic unit,
the first input floating point data is expressed as two parts of S1+F1 and E1;
the other input floating point data is expressed as two parts of S2+F2 and E2;
the output floating point data is expressed as S3+F3 and E3;
wherein S1, S2 and S3 are symbol values; e1, E2 and E3 are index values; f1, F2, F3 are fractional values.
Further, the data bit width of the EF16 data is 21 bits, comprising a sign bit of width 1 bit, exponent bits of width 5 bits, and fraction bits of width 15 bits.
Further, the operator is a multiplier for performing a multiplication operation, the multiplication operation of the multiplier being expressed as:
when E3 is not specified, s3=s1+s2; f3 =f1×f2; e3 =e1+e2; or (b)
At the time of E3 designation, s3=s1+s2; f3 =f1×f2> > E, where e=e3-E1-E2, > > represents a shift-to-right operation.
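A literal sketch of the two rules above (helper name ours; the shift amount E = E3 - E1 - E2 is taken verbatim from the patent's formula, and no saturation or bit-width clipping is modeled):

```python
def ef16_mul(s1: int, f1: int, e1: int, s2: int, f2: int, e2: int, e3=None):
    """EF16 multiply: sign values add (mod 2), integer fractions multiply,
    exponent values add; a specified E3 forces a right shift of the
    fraction product by E = E3 - E1 - E2."""
    s3 = (s1 + s2) % 2
    if e3 is None:
        return s3, f1 * f2, e1 + e2
    return s3, (f1 * f2) >> (e3 - e1 - e2), e3
```

Note there is no branch on special exponent patterns: unlike FP16, the same two integer operations apply to every operand.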
Further, the arithmetic unit is an adder for performing an addition operation, and the precondition for the adder to perform the addition is E1 = E2;
the addition operation of the adder is expressed as:
(S3+F3)=(S1+F1)+(S2+F2)。
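Under the precondition E1 = E2, the addition collapses to one integer add (sketch; S+F is treated as a single signed integer, as the embodiments later describe):

```python
def ef16_add(sf1: int, e1: int, sf2: int, e2: int):
    """EF16 add: when the shared exponents match, the signed (S+F)
    integers are added directly and the exponent passes through."""
    assert e1 == e2, "precondition: exponent values must be equal"
    return sf1 + sf2, e1
```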
the input and output of the arithmetic unit for floating point data operation are expressed in the EF16 data format. EF16 data has a better small-number expression range, while its maximum expression range is essentially the same as that of FP16 data; the exponent field directly represents the exponent value of the half-precision floating point data, so the (exponent - 15) bias subtraction need not be executed, which simplifies the mathematical expression of operations between two EF16 data; the fraction value need not be a true decimal fraction and is used directly as an integer. Provided the precondition of the operation is met, the arithmetic unit can directly add or multiply the fraction values of two EF16 data when performing an addition or multiplication, which effectively simplifies floating point operations and improves the computational efficiency of floating point data.
The invention also provides an arithmetic unit for tensor data operation, characterized by comprising two input tensor data and their shared E values and one output tensor data and its shared E value, wherein each number of the tensor data is represented by S+F in the EF16 data format, S being the sign value of the EF16 data and F the fraction value of the EF16 data; the shared E value of the tensor data is the exponent value of the EF16 data; according to the number of shared E values, the tensor data is expressed either as E-value-sharing tensor data or as split-channel E-value-sharing tensor data;
the numerical expression formula of the EF16 data is as follows:
(-1)^signbit × 2^(-exponent) × fraction
wherein signbit is the sign value; exponent is the exponent value; fraction is the fraction value;
the E-value sharing tensor data represents: all numbers in the tensor data share a shared E value;
the separation channel E value sharing tensor data representation: tensor data has c channels, each channel having a shared E value, each shared E value being shared only among the data within each channel;
the operator is used for executing multiplication operation, addition operation or multiplication accumulation operation of tensor data.
Further, the shared E value is transmitted as a parameter of the tensor data.
Further, the operator is a multiplier for performing tensor data multiplication; the input and output of the multiplier are E-value-sharing tensor data, and the multiplication operation of the multiplier is expressed as:
for two tensor data of the same size, each number is multiplied with its counterpart to obtain process tensor data of the same size;
the shared E values of the two input tensor data are added to obtain the shared E value of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the process tensor data is right-shifted by the difference between the specified shared E value and the shared E value of the process tensor data, to generate the output tensor data.
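The three steps above can be sketched as follows (flat Python lists stand in for (h, w, c) tensors of signed S+F integers; the function name is ours):

```python
def ef16_tensor_mul(t1, e1, t2, e2, e3=None):
    """E-value-sharing tensor multiply: one integer multiply per element
    plus a single exponent addition for the whole tensor."""
    process = [a * b for a, b in zip(t1, t2)]   # elementwise S+F products
    process_e = e1 + e2                          # shared E of process tensor
    if e3 is None:
        return process, process_e
    shift = e3 - process_e                       # right shift toward given E3
    return [x >> shift for x in process], e3
```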
Further, the arithmetic unit is a multiplier for performing tensor data multiplication; the input and output of the multiplier are split-channel E-value-sharing tensor data, and the multiplication operation of the multiplier is expressed as:
for two tensor data of the same size with c channels, each number is multiplied with its counterpart to obtain process tensor data of the same size;
for each of the c channels, the shared E values of the corresponding channels of the two input tensor data are added to obtain the c shared E values of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the process tensor data is right-shifted by the difference between the corresponding specified shared E value and the shared E value of the process tensor data, to generate the output tensor data.
Further, the arithmetic unit is an adder for executing tensor data addition; the input and output of the adder are E-value-sharing tensor data,
and the addition operation of the adder is expressed as:
subject to the operation precondition that the shared E values of the two input tensor data of the same size are equal:
each number of the two tensor data of the same size is added to its counterpart to obtain process tensor data of the same size;
the shared E value of the input tensor data is taken as the shared E value of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the process tensor data is right-shifted by the difference between the specified shared E value and the shared E value of the process tensor data, to generate the output tensor data.
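A minimal sketch of this adder (flat lists for tensors; the equal-E precondition is checked explicitly):

```python
def ef16_tensor_add(t1, e1, t2, e2):
    """E-value-sharing tensor add: with equal shared E values the signed
    S+F integers add elementwise and the exponent passes through."""
    assert e1 == e2, "precondition: the two shared E values must match"
    return [a + b for a, b in zip(t1, t2)], e1
```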
Further, the arithmetic unit is an adder for executing tensor data addition; the input and output of the adder are split-channel E-value-sharing tensor data,
and the addition operation of the adder is expressed as:
subject to the operation precondition that, channel by channel, the shared E values of the two input tensor data of the same size are equal:
each number of the two tensor data of the same size is added to its counterpart to obtain process tensor data of the same size;
the shared E value of each channel of the input tensor data is taken as the shared E value of the corresponding channel of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the process tensor data is right-shifted by the difference between the corresponding specified shared E value and the shared E value of the process tensor data, to generate the output tensor data.
Further, the operator is a multiply accumulator for performing a tensor data multiply-accumulate operation; the input and output of the multiply accumulator are E-value-sharing tensor data, and the multiply-accumulate operation of the multiply accumulator is expressed as:
for two tensor data of the same size, each number is multiplied with its counterpart to obtain first process tensor data of the same size;
the numbers of the first process tensor data are accumulated to form second process tensor data whose size is 1 in every dimension;
the shared E values of the two input tensor data are added to obtain the shared E value of the second process tensor data;
when the shared E value of the output tensor data is not specified, the second process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the second process tensor data is right-shifted by the difference between the specified shared E value and the shared E value of the second process tensor data, to generate the output tensor data.
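The multiply-accumulate path reduces to an integer dot product plus one exponent addition (sketch with flat lists; the all-ones-size output is represented here by a single S+F value):

```python
def ef16_tensor_mac(t1, e1, t2, e2, e3=None):
    """E-value-sharing multiply-accumulate: elementwise products are
    summed into one S+F value whose shared E is E1 + E2."""
    acc = sum(a * b for a, b in zip(t1, t2))   # integer dot product
    acc_e = e1 + e2
    if e3 is None:
        return acc, acc_e
    return acc >> (e3 - acc_e), e3             # align to the specified E3
```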
Further, the operator is a multiply accumulator for performing a tensor data multiply-accumulate operation; the input and output of the multiply accumulator are split-channel E-value-sharing tensor data, and the multiply-accumulate operation of the multiply accumulator is expressed as:
for two tensor data of the same size with c channels, each number is multiplied with its counterpart to obtain first process tensor data of the same size;
the numbers of the first process tensor data are accumulated within each channel to form second process tensor data whose channel dimension is c and whose other dimensions are 1;
the shared E values of the corresponding channels of the two input tensor data are added to obtain the c shared E values of the second process tensor data;
when the shared E value of the output tensor data is not specified, the second process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the second process tensor data is right-shifted by the difference between the corresponding specified shared E value and the shared E value of the second process tensor data, to generate the output tensor data.
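In the split-channel variant each channel accumulates independently with its own exponent pair (sketch; channels are given as lists of flat per-channel lists, and all names are ours):

```python
def ef16_tensor_mac_split(ch1, e1_per_ch, ch2, e2_per_ch):
    """Split-channel multiply-accumulate: c independent dot products and
    c exponent additions, one per channel."""
    sums = [sum(a * b for a, b in zip(c1, c2)) for c1, c2 in zip(ch1, ch2)]
    es = [ea + eb for ea, eb in zip(e1_per_ch, e2_per_ch)]
    return sums, es
```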
By extracting the shared E value (exponent value), the tensor data is divided into an integer part and a shared exponent part, which greatly simplifies the tensor data format; the arithmetic unit reduces floating point operations between tensor data to integer multiplication, addition and multiply-accumulate operations plus exponent additions, so the operation speed of the neural network can be greatly improved.
Drawings
FIG. 1 is a schematic diagram of an E-value sharing EF16 tensor data structure according to the present invention;
FIG. 2 is a diagram of a split channel E-value sharing EF16 tensor data structure according to the present invention;
FIG. 3 is a schematic diagram of an E-value sharing EF16 tensor multiplication operation according to the present invention;
FIG. 4 is a simplified diagram of an E-value sharing EF16 tensor multiplication operation of the present invention;
FIG. 5 is a schematic diagram of E-value sharing EF16 tensor addition according to the present invention;
FIG. 6 is a simplified diagram of an E-value sharing EF16 tensor addition operation of the present invention;
FIG. 7 is a schematic diagram of an E-value sharing EF16 tensor multiply-accumulate operation according to the present invention;
FIG. 8 is a simplified diagram of an E-value sharing EF16 tensor multiply-accumulate operation in accordance with the present invention;
FIG. 9 is a schematic diagram of a split-channel E-value sharing EF16 tensor multiply-accumulate operation in accordance with the present invention;
FIG. 10 is a simplified diagram of a split-channel E-value sharing EF16 tensor multiply-accumulate operation in accordance with the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. For better illustration of the following embodiments, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
The invention provides a half-precision floating point expression, called EF16, for neural network computation data and tensor data.
EF16 basic expression pattern:
the bit width of EF16 data is 21 bits, and the floating point number is expressed as follows:
(-1)^signbit × 2^(-exponent) × fraction
wherein signbit is the sign value, represented by S; exponent is the exponent value, represented by E; and fraction is the fraction value, represented by F. EF16 data can therefore be expressed as S+E+F.
According to the above formula, the minimum positive number expressible by EF16 is:
sign 0, exponent 11111₂, fraction 000000000000001₂: 2^(-31) × 1 ≈ 0.00000000046566
and the maximum number is:
sign 0, exponent 00000₂, fraction 111111111111111₂: 2^0 × 32767 = 32767
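The extreme positive magnitudes follow directly from the 5-bit subtracted exponent and the 15-bit fraction (note that fifteen one-bits give 2^15 - 1 = 32767):

```python
# Smallest positive EF16 value: exponent field 11111 (31), fraction 1.
MIN_EF16 = 2.0 ** -31 * 1
# Largest EF16 value: exponent field 00000, fraction all ones.
MAX_EF16 = 2.0 ** 0 * (2 ** 15 - 1)
```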
the FP16 and EF16 data expression ranges are compared as follows, and it can be seen that EF16 has a better small expression range and the maximum value indicates that the range is substantially the same as FP 16:
EF16 compares with other floating point number expressions as follows:
multiplication of EF16 data:
the two input EF16 data are as follows:
(-1)^s1 × 2^(-e1) × f1
(-1)^s2 × 2^(-e2) × f2
the multiplication between them can be expressed as:
(-1)^(s1+s2) × 2^(-(e1+e2)) × (f1 × f2)
in this embodiment, EF16 data is represented as two parts, S+F and E. The first input floating point data is represented as the two parts S1+F1 and E1; the other input floating point data as S2+F2 and E2; and the output floating point data as S3+F3 and E3. (s1 and S1, s2 and S2, e1 and E1, e2 and E2, f1 and F1, f2 and F2 carry the same meanings; for convenience, lower-case letters are used in formulas and upper-case letters in the running text.)
the multiplication of EF16 data can be expressed as:
when E3 is not specified: S3 = S1 + S2; F3 = F1 × F2; E3 = E1 + E2; or
when E3 is specified: S3 = S1 + S2; F3 = (F1 × F2) >> E, where E = E3 - E1 - E2 and >> denotes a right-shift operation.
Addition of EF16 data:
in this example, the precondition for the addition operation of EF16 data is e1=e2;
the addition of EF16 data is expressed as:
(S3+F3)=(S1+F1)+(S2+F2)。
according to the above established half-precision floating point expression form EF16 for the neural network tensor data and the rule of multiplication and addition of the EF16 floating point data, in this embodiment of the present invention, an arithmetic unit for floating point data is constructed, and the arithmetic unit supports multiplication and addition of the EF16 floating point data.
The arithmetic unit for EF16 floating point data comprises two input EF16 floating point data and one output EF16 floating point data, wherein during operation each EF16 floating point datum is split into an S+F value and an E value, which are computed on separately; S represents the sign bit of the EF16 data, F the fraction bits, and E the exponent bits.
Specifically, in the arithmetic unit,
the first input floating point data is expressed as two parts of S1+F1 and E1;
the other input floating point data is expressed as two parts of S2+F2 and E2;
the output floating point data is expressed as S3+F3 and E3;
wherein S1, S2 and S3 are symbol values; e1, E2 and E3 are index values; f1, F2, F3 are fractional values.
The multiplication operation of the multiplier is expressed as:
when E3 is not specified: S3 = S1 + S2; F3 = F1 × F2; E3 = E1 + E2; or
when E3 is specified: S3 = S1 + S2; F3 = (F1 × F2) >> E, where E = E3 - E1 - E2 and >> denotes a right-shift operation.
The operation method can be implemented in software, in hardware, or in a combination of the two, forming the basic operation units of a neural network computation module, such as multipliers and adders, to execute multiplication and addition operations between two floating point data.
The arithmetic unit for floating point data operation can realize the following technical effects:
the input and output of the arithmetic unit for floating point data operation are expressed by EF16 data format. EF16 data has a better small expression range, and the maximum expression range is basically the same as that of FP16 data; the exponents of the floating point data directly represent the exponent values of the semi-precision floating point data, and the operation of the exponents-15 is not needed, so that the mathematical expression of the two EP16 data operations is simplified; the fraction value of the fraction value does not need to be a decimal valueUnder the condition of meeting the precondition of operation, the arithmetic operation can directly carry out addition or multiplication processing on the decimal values of two EF16 data when carrying out addition or multiplication operation, thereby effectively simplifying the operation of floating point data and improving floating pointThe computational efficiency of the data.
Example 2
EF16 tensor data:
the EF16 is a half-precision floating point format specifically proposed for tensor data, and when the data type is tensor data, all tensor data share the same exponent value (E value), hereinafter referred to as E value.
For example, one size (h, w, c) of the EF16 tensor data is shown in fig. 1.
Each number in the tensor data is expressed using only the 16-bit S+F part of the EF16 data (specifically, as a signed integer); all the data share the same E value, which is transmitted as a parameter of the tensor data. We call tensor data in this EF16 format E-value-sharing EF16 tensor data.
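A container for E-value-sharing EF16 tensor data might look like this (hypothetical sketch; the field names are ours, not the patent's):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EF16Tensor:
    """E-value-sharing EF16 tensor: a block of 16-bit signed S+F
    integers plus one shared exponent carried as a parameter."""
    shape: Tuple[int, int, int]   # (h, w, c)
    sf: List[int]                 # flat signed S+F values, len == h*w*c
    shared_e: int                 # exponent shared by every element

    def value(self, i: int) -> float:
        # decode one element: signed S+F integer scaled by 2^(-E)
        return self.sf[i] * 2.0 ** -self.shared_e
```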
In addition, EF16 tensor data with c channels can instead carry c E values, where each E value is shared only among the h×w data within its channel. We call such EF16 tensor data split-channel E-value-sharing tensor data; a split-channel E-value-sharing EF16 tensor of size (h, w, c) is shown in fig. 2.
E-value-sharing EF16 tensor data supports multiplication, addition and multiply-accumulate operations. The multiplication operation takes two tensor data of size (h, w, c) and multiplies each number with its counterpart to obtain new tensor data of size (h, w, c); the addition operation takes two tensor data of size (h, w, c) and adds each number to its counterpart to obtain new tensor data of size (h, w, c); the multiply-accumulate operation takes two tensor data of size (h, w, c), multiplies each number with its counterpart and accumulates the products to obtain new tensor data of size (1, 1, 1).
Split-channel E-value-sharing EF16 tensor data supports multiplication, addition and split-channel multiply-accumulate operations. The multiplication operation takes two tensor data of size (h, w, c) and multiplies each number with its counterpart to obtain new tensor data of size (h, w, c); the addition operation takes two tensor data of size (h, w, c) and adds each number to its counterpart to obtain new tensor data of size (h, w, c); the split-channel multiply-accumulate operation takes two tensor data of size (h, w, c) and, within each channel, multiplies each number with its counterpart and accumulates the products to obtain new tensor data of size (1, 1, c).
(1) E-value sharing EF16 tensor multiplication operation
As shown in fig. 3, the three tensors have the same size (h, w, c), and their shared E values E1, E2 and E3 are determined in advance by the system according to the characteristics of the neural network data, so the floating-point multiplication of each number in the floating-point tensor data reduces to the multiplication and shift operations shown in fig. 4.
The split-channel E-value-sharing EF16 tensor multiplication operation is quite similar to the E-value-sharing EF16 tensor multiplication operation described above, except that the tensor data is replaced with the split-channel E-value-sharing EF16 tensor data described above.
(2) E-value sharing EF16 tensor addition operation
As shown in fig. 5, the three tensors have the same size (h, w, c), and the three tensors must have the same shared E value, at which time the floating point addition of each number in the floating point tensor data is reduced to the addition operation shown in fig. 6.
In addition, the split-channel E-value-sharing EF16 tensor addition operation is very similar to the E-value-sharing EF16 tensor addition operation described above, except that the tensor data is replaced by split-channel E-value-sharing EF16 tensor data and all the E values in the split-channel E-value-sharing EF16 tensor data are equal.
(3) E-value sharing EF16 tensor multiply-accumulate operation
As shown in fig. 7, the two tensors have the same size (h, w, c), and their shared E values E1 and E2 are determined in advance by the system according to the characteristics of the neural network data. The E value of the multiply-accumulate result equals E1+E2, and the accumulated data can be shifted correspondingly according to the requirements of subsequent operations. The floating-point multiply-accumulate of each number in the floating-point tensor data thus reduces to the multiply-accumulate operation shown in fig. 8.
(4) Split-channel E-value-sharing EF16 tensor multiply-accumulate operation
As shown in fig. 9, in the split-channel multiply-accumulate operation the split-channel E-value-sharing tensor data is processed with its c channels completely separated: multiply-accumulation occurs only among the numbers within the h and w extent of each channel, as shown in fig. 10.
It should be noted that, for ease of understanding, the examples use three-dimensional tensor data that can be depicted as a physical structure to illustrate the differences between exponent-value-sharing tensor data and split-channel exponent-value-sharing tensor data; in actual data calculation, the dimensionality of the tensor data is not limited.
According to the rule of multiplication, addition and multiply-accumulate of EF16 tensor data, an arithmetic unit for tensor data operation is also constructed in the embodiment of the invention.
The arithmetic unit comprises two input tensor data and their shared E values, and one output tensor data and its shared E value, wherein each number of the tensor data is represented by S+F in the EF16 data format, S being the sign value of the EF16 data, F the fraction value, and the shared E value being the exponent value of the EF16 data; according to the number of shared E values, the tensor data is expressed either as E-value-sharing tensor data or as split-channel E-value-sharing tensor data;
the E-value-sharing tensor data means: all numbers in the tensor data share one shared E value;
the split-channel E-value-sharing tensor data means: the tensor data has c channels, each channel having its own shared E value, each shared E value being shared only among the data within its channel.
According to the classification of the tensor data, the operators for tensor data operation specifically include: a multiplier, an adder and a multiply accumulator for E-value-sharing tensor data operations; and a multiplier, an adder and a multiply accumulator for split-channel E-value-sharing tensor data operations.
At the output of the operator, the shared E value of the output tensor data may be specified according to the requirements of the neural network application. When it is specified, each new tensor data generated in the operation process is right-shifted by the difference between the specified shared E value of the output tensor data and the shared E value of the new tensor data, to generate the output tensor data.
The operation method can be implemented in software, in hardware, or in a combination of the two, forming the basic operation units of a neural network computation module, such as multipliers, adders and multiply accumulators, to execute multiplication, addition and multiply-accumulate operations between two tensor data.
The arithmetic unit for tensor data operations achieves the following technical effects:
By extracting the shared E value (exponent value), tensor data is split into an integer part and a shared exponent part, which greatly simplifies the tensor data format. Floating-point operations between tensor data are thereby reduced to integer multiplication, addition and multiply-accumulate operations plus exponent additions, which can greatly increase the operation speed of the neural network.
In neural network computation, the shared E value of each tensor data may be specified in advance, so that the tensor computation becomes independent of the exponent value and the computation process is further simplified.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. An arithmetic unit for floating-point data operations, implemented in hardware or in a combination of hardware and software, forming a basic arithmetic unit of a neural network computing module comprising a multiplier and an adder, characterized in that: the arithmetic unit performs a multiplication or addition operation and comprises two input floating-point data with their exponent values and one output floating-point data with its exponent value, the input and output floating-point data being expressed in the EF16 data format,
the numerical value of EF16 data being given by:
(-1)^signbit × 2^(-exponent) × fraction
where signbit is the sign value, exponent is the exponent value, and fraction is the fraction value;
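A minimal sketch of this value formula in Python (the identifier names are illustrative, not from the claims):

```python
def ef16_value(signbit, exponent, fraction):
    # Numeric value of an EF16 triple per the formula
    # (-1)^signbit * 2^(-exponent) * fraction.
    return (-1) ** signbit * fraction * 2.0 ** (-exponent)
```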
in the case of the arithmetic unit,
one input floating point data is expressed as two parts of S1+F1 and E1;
the other input floating point data is expressed as two parts of S2+F2 and E2;
the output floating point data is expressed as S3+F3 and E3;
wherein S1, S2 and S3 are symbol values; e1, E2 and E3 are index values; f1, F2, F3 are fractional values;
the arithmetic unit comprises a multiplier for performing the multiplication operation;
the multiplication operation of the multiplier is expressed as:
when E3 is not specified: S3 = S1 + S2; F3 = F1 × F2; E3 = E1 + E2; or
when E3 is specified: S3 = S1 + S2; F3 = (F1 × F2) >> E, where E = E3 − E1 − E2 and >> denotes a right-shift operation;
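The two multiplication cases can be sketched as follows. This is an illustrative model: the sign rule S3 = S1 + S2 is read as a modulo-2 sum (XOR) of the sign bits, and a non-negative shift amount is assumed when E3 is specified:

```python
def ef16_mul(s1, f1, e1, s2, f2, e2, e3=None):
    s3 = s1 ^ s2                       # S3 = S1 + S2 (mod 2), i.e. XOR of signs
    if e3 is None:
        return s3, f1 * f2, e1 + e2    # F3 = F1*F2, E3 = E1+E2
    shift = e3 - e1 - e2               # E = E3 - E1 - E2
    return s3, (f1 * f2) >> shift, e3  # F3 = (F1*F2) >> E
```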
or the arithmetic unit comprises an adder, the precondition for the adder to perform the addition operation being E1 = E2;
the addition operation of the adder is expressed as:
(S3+F3)=(S1+F1)+(S2+F2);
wherein the EF16 data has a data bit width of 21 bits, comprising a 1-bit sign field, a 5-bit exponent field and a 15-bit fraction field.
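The 1+5+15-bit layout can be sketched as pack/unpack helpers. The field order (sign, exponent, fraction from most to least significant) is an assumption, since the claim only fixes the field widths:

```python
SIGN_BITS, EXP_BITS, FRAC_BITS = 1, 5, 15  # 21 bits in total

def ef16_pack(signbit, exponent, fraction):
    # Pack an EF16 triple into a 21-bit word: [sign | exponent | fraction].
    return (signbit << (EXP_BITS + FRAC_BITS)) | (exponent << FRAC_BITS) | fraction

def ef16_unpack(word):
    # Recover the three fields from the 21-bit word.
    fraction = word & ((1 << FRAC_BITS) - 1)
    exponent = (word >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    signbit = word >> (EXP_BITS + FRAC_BITS)
    return signbit, exponent, fraction
```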
2. An arithmetic unit for tensor data operations, implemented in hardware or in a combination of hardware and software, forming a basic arithmetic unit of a neural network computing module comprising a multiplier, an adder and a multiply-accumulator, characterized in that it comprises two input tensor data of size (h, w, c) with their shared E values and one output tensor data of size (h, w, c) with its shared E value, wherein each number of the tensor data is represented as S+F in the EF16 data format, S being the sign value and F the fraction value of the EF16 data; the shared E value of the tensor data is the exponent value of the EF16 data; according to the number of shared E values, the tensor data are classified as E-value-sharing tensor data and split-channel E-value-sharing tensor data;
the numerical value of EF16 data being given by:
(-1)^signbit × 2^(-exponent) × fraction
where signbit is the sign value, exponent is the exponent value, and fraction is the fraction value;
the E-value-sharing tensor data means: all numbers in tensor data of size (h, w, c) share a single shared E value;
the split-channel E-value-sharing tensor data means: tensor data of size (h, w, c) has c channels, each channel has a shared E value, and each shared E value is shared only among the h × w data within that channel;
the arithmetic unit is a multiplier for performing tensor data multiplication, whose input and output are E-value-sharing tensor data, the multiplication operation of the multiplier being expressed as:
the two input tensor data, which have the same size, are multiplied element-wise to obtain process tensor data of the same size;
the shared E values of the two input tensor data are added to obtain the shared E value of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the process tensor data is shifted right by the difference between the specified shared E value and the shared E value of the process tensor data to generate the output tensor data;
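A sketch of this multiplier, with the (h, w, c) tensors flattened to plain lists of signed integer fractions. The sign is folded into F for brevity, which departs from the separate S+F representation in the claim, and a non-negative shift amount is assumed:

```python
def tensor_mul_shared_e(f1, e1, f2, e2, e_out=None):
    # Multiplier for E-value-sharing tensor data.
    proc = [a * b for a, b in zip(f1, f2)]   # element-wise product -> process tensor
    proc_e = e1 + e2                         # shared E values add
    if e_out is None:
        return proc, proc_e                  # output = process tensor as-is
    # Specified output E: shift each number right by the E difference.
    return [v >> (e_out - proc_e) for v in proc], e_out
```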
or the arithmetic unit is a multiplier for performing tensor data multiplication, whose input and output are split-channel E-value-sharing tensor data, the multiplication operation of the multiplier being expressed as:
the two input tensor data, which have the same size and c channels, are multiplied element-wise to obtain process tensor data of the same size;
for each of the c channels, the shared E values of the corresponding channels of the two input tensor data are added to obtain the c per-channel shared E values of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the process tensor data is shifted right by the difference between the corresponding specified shared E value and the shared E value of the process tensor data to generate the output tensor data;
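The split-channel variant differs only in that the E values are per-channel lists. A sketch under the same simplifications, with each tensor represented as a list of channels and each channel a flat list of h×w fractions:

```python
def tensor_mul_split_channel_e(ch1, e1, ch2, e2, e_out=None):
    # Multiplier for split-channel E-value-sharing tensor data:
    # e1, e2, e_out are lists with one shared E per channel.
    proc = [[a * b for a, b in zip(c1, c2)] for c1, c2 in zip(ch1, ch2)]
    proc_e = [x + y for x, y in zip(e1, e2)]   # per-channel E values add
    if e_out is None:
        return proc, proc_e
    out = [[v >> (eo - ep) for v in ch]        # per-channel right shift
           for ch, ep, eo in zip(proc, proc_e, e_out)]
    return out, list(e_out)
```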
or the arithmetic unit is an adder for performing tensor data addition, whose input and output are E-value-sharing tensor data, the addition operation of the adder being expressed as:
the precondition is satisfied that the shared E values of the two input tensor data of the same size are identical;
the two input tensor data, which have the same size, are added element-wise to obtain process tensor data of the same size;
the shared E value of the input tensor data is taken as the shared E value of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the process tensor data is shifted right by the difference between the specified shared E value and the shared E value of the process tensor data to generate the output tensor data;
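A sketch of this adder under the same flattened representation; the E1 = E2 precondition is checked explicitly:

```python
def tensor_add_shared_e(f1, e1, f2, e2, e_out=None):
    # Adder for E-value-sharing tensor data; precondition: E1 == E2.
    assert e1 == e2, "operands must already share the same E value"
    proc = [a + b for a, b in zip(f1, f2)]   # element-wise sum -> process tensor
    if e_out is None:
        return proc, e1                      # shared E of the inputs carries over
    return [v >> (e_out - e1) for v in proc], e_out
```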
or the arithmetic unit is an adder for performing tensor data addition, whose input and output are split-channel E-value-sharing tensor data, the addition operation of the adder being expressed as:
the precondition is satisfied that, for each channel, the shared E values of the two input tensor data of the same size are identical;
the two input tensor data, which have the same size, are added element-wise to obtain process tensor data of the same size;
the per-channel shared E values of the input tensor data are taken as the per-channel shared E values of the process tensor data;
when the shared E value of the output tensor data is not specified, the process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the process tensor data is shifted right by the difference between the corresponding specified shared E value and the shared E value of the process tensor data to generate the output tensor data;
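The split-channel adder is analogous, with one shared E value (and one shift) per channel. A sketch, again with each tensor as a list of channels:

```python
def tensor_add_split_channel_e(ch1, ch2, e_ch, e_out=None):
    # Per-channel adder; precondition: both inputs already use the same
    # per-channel shared E values e_ch.
    proc = [[a + b for a, b in zip(c1, c2)] for c1, c2 in zip(ch1, ch2)]
    if e_out is None:
        return proc, list(e_ch)
    out = [[v >> (eo - ep) for v in ch]   # per-channel right shift
           for ch, ep, eo in zip(proc, e_ch, e_out)]
    return out, list(e_out)
```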
or the arithmetic unit is a multiply-accumulator for performing a tensor data multiply-accumulate operation, whose input and output are E-value-sharing tensor data, the multiply-accumulate operation being expressed as:
the two input tensor data, which have the same size, are multiplied element-wise to obtain first process tensor data of the same size;
every number of the first process tensor data is accumulated to form second process tensor data whose size is 1 in all dimensions;
the shared E values of the two input tensor data are added to obtain the shared E value of the second process tensor data;
when the shared E value of the output tensor data is not specified, the second process tensor data and its shared E value are assigned to the output tensor data and its shared E value; when the shared E value of the output tensor data is specified, each number of the second process tensor data is shifted right by the difference between the specified shared E value and the shared E value of the second process tensor data to generate the output tensor data;
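A sketch of this multiply-accumulator: the element-wise products are summed into a single fraction, and the shared E values add:

```python
def tensor_mac_shared_e(f1, e1, f2, e2, e_out=None):
    # Multiply-accumulator for E-value-sharing tensor data.
    acc = sum(a * b for a, b in zip(f1, f2))  # accumulate the products
    acc_e = e1 + e2                           # shared E values add
    if e_out is None:
        return acc, acc_e
    return acc >> (e_out - acc_e), e_out      # right-shift to the specified E
```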
or the arithmetic unit is a multiply-accumulator for performing a tensor data multiply-accumulate operation, whose input and output are split-channel E-value-sharing tensor data, the multiply-accumulate operation being expressed as:
the two input tensor data, which have the same size and c channels, are multiplied element-wise to obtain first process tensor data of the same size;
every number of the first process tensor data is accumulated within each channel to form second process tensor data whose channel dimension is c and whose other dimensions are 1;
for each of the c channels, the shared E values of the corresponding channels of the two input tensor data are added to obtain the c per-channel shared E values of the second process tensor data;
when the shared E value of the output tensor data is not specified, the second process tensor data and its shared E values are assigned to the output tensor data and its shared E values; when the shared E value of the output tensor data is specified, each number of the second process tensor data is shifted right by the difference between the corresponding specified shared E value and the shared E value of the second process tensor data to generate the output tensor data;
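The split-channel multiply-accumulator accumulates within each channel, producing one fraction and one shared E value per channel. A sketch:

```python
def tensor_mac_split_channel_e(ch1, e1, ch2, e2, e_out=None):
    # Per-channel multiply-accumulator: one accumulated number per channel.
    acc = [sum(a * b for a, b in zip(c1, c2)) for c1, c2 in zip(ch1, ch2)]
    acc_e = [x + y for x, y in zip(e1, e2)]   # per-channel E values add
    if e_out is None:
        return acc, acc_e
    return [v >> (eo - ep) for v, ep, eo in zip(acc, acc_e, e_out)], list(e_out)
```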
wherein the shared E value is transmitted as a parameter of the tensor data.
CN202011427161.0A 2020-12-09 2020-12-09 Arithmetic unit for floating point data and tensor data operation Active CN112416295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011427161.0A CN112416295B (en) 2020-12-09 2020-12-09 Arithmetic unit for floating point data and tensor data operation

Publications (2)

Publication Number Publication Date
CN112416295A CN112416295A (en) 2021-02-26
CN112416295B true CN112416295B (en) 2024-02-02

Family

ID=74775456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011427161.0A Active CN112416295B (en) 2020-12-09 2020-12-09 Arithmetic unit for floating point data and tensor data operation

Country Status (1)

Country Link
CN (1) CN112416295B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107526709A * 2016-06-15 2017-12-29 NVIDIA Corporation Tensor processing using low-precision formats
CN109445440A * 2018-12-13 2019-03-08 Chongqing University of Posts and Telecommunications Dynamic obstacle avoidance method based on sensor fusion and an improved Q-learning algorithm
CN111813371A * 2020-07-28 2020-10-23 Shanghai StarFive Technology Co., Ltd. Floating-point division method, system and readable medium for digital signal processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10853067B2 (en) * 2018-09-27 2020-12-01 Intel Corporation Computer processor for higher precision computations using a mixed-precision decomposition of operations



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant