CN110399117B

CN110399117B - Hybrid multiplication and addition processing method and device

Info

Publication number: CN110399117B
Application number: CN201910702995.9A
Authority: CN
Inventors: 历广绪; 冯闯
Original assignee: Shanghai Suiyuan Intelligent Technology Co ltd
Current assignee: Shanghai Suiyuan Intelligent Technology Co ltd
Priority date: 2019-07-31
Filing date: 2019-07-31
Publication date: 2021-05-28
Anticipated expiration: 2039-07-31
Also published as: CN110399117A

Abstract

The invention provides a hybrid multiply-add processing method and device. After first-stage to third-stage processing is performed in sequence on a first operand and a second operand, the fourth-stage processing obtains a compressed number based on the fifth intermediate operand and the first two bits of the sixth intermediate operand produced by the third-stage processing, and splices the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand. In the fifth-stage processing, the sign bit of the third operand is extended to obtain a fourth operand, the fourth operand and the seventh intermediate operand are added, and the addition result is shaped. The fourth operand and the seventh intermediate operand have the same number of bits, and the seventh intermediate operand is the sixth intermediate operand with a reduced-width compressed number spliced before it, so both are narrower than twice the number of bits of the first operand. The two addends in the fifth-stage processing are therefore narrower, which reduces the resources and time occupied by the adder.

Description

Hybrid multiplication and addition processing method and device

Technical Field

The invention belongs to the technical field of data calculation, and particularly relates to a hybrid multiplication and addition processing method and device.

Background

When performing a hybrid multiply-add operation on integer operands (the operand width is denoted N, where N is 8, 16, 32, 64, 128 bits, etc.), current processors use a multi-stage pipeline that works as follows: the first operand and the second operand are multiplied to obtain an intermediate operand; the third operand is expanded to the same number of bits as the intermediate operand, where the first, second and third operands each have N bits and the intermediate operand has 2N bits; the expanded operand and the intermediate operand are then added to obtain data to be shaped; and overflow processing is performed on the data to be shaped according to its first (most significant) N bits to obtain the result of the hybrid multiply-add operation. The specific multi-stage pipeline is shown in FIG. 1, and the process is as follows:

First stage: a decoder splits the N-bit first operand (SRC0 in FIG. 1) into an upper operand (A in FIG. 1) and a lower operand (B in FIG. 1), and splits the N-bit second operand (SRC1 in FIG. 1) into an upper operand (C in FIG. 1) and a lower operand (D in FIG. 1), where each upper operand and each lower operand has half the number of bits of the corresponding operand;

Second stage: four multipliers produce four N-bit intermediate operands, namely A*C, A*D, B*C and B*D;

Third stage: two adders add the four N-bit intermediate operands to obtain two intermediate results, H and L, as follows: H is the sum of A*C and the upper halves of A*D and B*C, and L is the sum of B*D and the lower halves of A*D and B*C;

Fourth stage: the two intermediate results H and L are fused into a 2N-bit intermediate operand, and the third operand is expanded to 2N bits;

Fifth stage: the 2N-bit intermediate operand and the expanded third operand are added, and shaping processing (also referred to as clamp processing) is performed on the addition result, where shaping means that when the integer addition result overflows, it is shaped to the positive maximum value or the negative maximum value for the number of bits of the addition result.

From the above process shown in fig. 1, it can be found that: because the fourth stage expands the third operand to 2N bits, the fourth stage needs to occupy at least 2N bit resources when transferring the operand to the fifth stage, and the fifth stage needs to adopt a high-bit-width adder to perform 2N bit addition operation, so that the high-bit-width adder occupies resources and time of an ALU (Arithmetic and Logic Unit) of the processor.
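The prior-art pipeline described above can be sketched for the unsigned case as follows. This is only an illustrative model, not the circuit of FIG. 1: the function name is hypothetical, the lower halves of the cross products are assumed to be shifted left by N/2 bits before they are added to B*D, and the final clamp is assumed to target the N-bit unsigned maximum.

```python
def prior_art_mad_unsigned(src0: int, src1: int, e: int, n: int) -> int:
    """Unsigned SRC0*SRC1 + E through the five prior-art stages (illustrative)."""
    half = n // 2
    half_mask = (1 << half) - 1

    # Stage 1: split each n-bit operand into upper and lower halves.
    a, b = src0 >> half, src0 & half_mask
    c, d = src1 >> half, src1 & half_mask

    # Stage 2: four partial products, each at most n bits wide.
    ac, ad, bc, bd = a * c, a * d, b * c, b * d

    # Stage 3: H takes the upper halves of the cross terms, L the lower halves.
    # Assumption: the lower halves are shifted by n/2 before being added to B*D.
    h = ac + (ad >> half) + (bc >> half)
    l = bd + (((ad & half_mask) + (bc & half_mask)) << half)

    # Stage 4: fuse H and L into the 2n-bit product. E is widened to 2n bits;
    # for an unsigned value, zero extension leaves the integer unchanged.
    product = (h << n) + l

    # Stage 5: 2n-bit addition, then clamp (assumed target: n-bit unsigned max).
    total = product + e
    return min(total, (1 << n) - 1)
```

The addition in the last stage operates on 2N-bit values, which is the cost the following paragraphs set out to reduce.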

Disclosure of Invention

In view of the above, the present invention provides a hybrid multiply-add processing method and apparatus for reducing resource occupation and processing time.

The invention provides a mixed multiply-add processing method, which is used for carrying out five-stage processing on a first operand, a second operand and a third operand, wherein the first operand, the second operand and the third operand have the same number of bits and the number of bits is an even number, and the method comprises the following steps:

the first stage of processing is used for splitting the first operand into a first upper operand and a first lower operand and splitting the second operand into a second upper operand and a second lower operand;

performing second-stage processing, namely multiplying the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand; multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand;

performing third-stage processing, namely adding the first intermediate operand, the upper part of the second intermediate operand and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand;

fourth-stage processing, namely obtaining a compressed number based on the fifth intermediate operand and the first two bits of the sixth intermediate operand, wherein the number of bits of the compressed number is less than that of the first operand; splicing the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand;

and performing fifth-stage processing, namely expanding sign bits of the third operand to obtain a fourth operand, wherein the fourth operand has the same bits as the seventh intermediate operand, adding the fourth operand and the seventh intermediate operand, and performing shaping processing on an addition result.

Preferably, the obtaining the compressed number based on the fifth intermediate operand and the first two bits of the sixth intermediate operand includes:

adding the first two bits of the sixth intermediate operand to the fifth intermediate operand to obtain an operand to be compressed;

if the operand to be compressed is a signed number, performing an AND operation on a first preset bit to a second preset bit of the operand to be compressed to obtain a first operation result, and performing an OR operation on the first preset bit to the second preset bit to obtain a second operation result, wherein the first preset bit is one of the last bit to the second bit of the operand to be compressed, the second preset bit is one of the second bit to the last second bit of the operand to be compressed, and the second preset bit is positioned before the first preset bit;

intercepting the operand to be compressed based on the first operation result or the second operation result to obtain the compressed number;

if the operand to be compressed is an unsigned number, performing an or operation on a third preset bit to a fourth preset bit of the operand to be compressed to obtain a third operation result, wherein the third preset bit is one of the last bit to the second bit of the operand to be compressed, the fourth preset bit is one of the first bit to the last second bit of the operand to be compressed, the fourth preset bit is located before the third preset bit, and the difference between the fourth preset bit and the third preset bit is smaller than the difference between the number of bits of the first operand and 1;

and intercepting the operand to be compressed based on the third operation result to obtain the compressed number.

Preferably, the expanding the sign bit of the third operand to obtain the fourth operand includes: if the third operand is a signed number, adding, before the third operand, a preset number of bits each equal to the value of the first bit of the third operand;

if the third operand is an unsigned number, adding the preset number of zero bits before the third operand;

the preset number of bits is the same as the number of bits of the compressed number.

Preferably, the shaping processing of the addition result includes:

obtaining an identifier for performing shaping processing on the addition result;

and performing shaping processing on the addition result based on the identifier.

Preferably, the obtaining the identifier for performing the shaping process on the addition result includes:

if the addition result is an unsigned number, obtaining a first identifier of the addition result according to a first preset digit of the addition result, and obtaining a second identifier of the addition result according to an operation type corresponding to the addition result and the compressed number;

and if the addition result is a signed number, obtaining a third identifier of the addition result according to a second preset digit of the addition result, and obtaining a fourth identifier of the addition result according to the operation type corresponding to the addition result and the compressed number, wherein the second preset digit is larger than the first preset digit.

Preferably, the shaping the addition result based on the identifier includes:

shaping the addition result to all 0's if the addition result is an unsigned number and both the first identifier of the addition result and the second identifier of the addition result are valid;

shaping an addition result into full F if the addition result is an unsigned number and a first identifier of the addition result is valid but a second identifier of the addition result is invalid;

forbidding to shape the addition result if the addition result is an unsigned number and the first identifier of the addition result and the second identifier of the addition result are both invalid;

shaping the addition result into a negative maximum value corresponding to the number of bits of the addition result if the addition result is a signed number and the third identifier of the addition result and the fourth identifier of the addition result are both valid;

shaping an addition result into a positive maximum value corresponding to the number of bits of the addition result if the addition result is a signed number and a third identifier of the addition result is valid but a fourth identifier of the addition result is invalid;

and inhibiting shaping of the addition result if the addition result is a signed number and both the third identifier of the addition result and the fourth identifier of the addition result are invalid.
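The six shaping cases listed above amount to a small decision table, sketched below. The names `shape`, `first_flag` and `second_flag` are hypothetical: `first_flag` stands for the first (unsigned) or third (signed) identifier and `second_flag` for the second or fourth identifier. The combination in which the first flag is invalid but the second is valid is not covered by the text; the sketch assumes the result is left unshaped in that case.

```python
def shape(result: int, bits: int, signed: bool,
          first_flag: bool, second_flag: bool) -> int:
    """Apply the shaping (clamp) decision to an addition result (illustrative)."""
    if not first_flag:
        # Both identifiers invalid: shaping is inhibited.
        # Assumption: the same applies when only the second flag is valid.
        return result
    if signed:
        if second_flag:
            return -(1 << (bits - 1))      # negative maximum for `bits` bits
        return (1 << (bits - 1)) - 1       # positive maximum for `bits` bits
    if second_flag:
        return 0                           # all 0's
    return (1 << bits) - 1                 # all F
```

For example, with `bits = 8` an unsigned result whose two identifiers are both valid is shaped to 0, and a signed result with only the third identifier valid is shaped to 127.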

The present invention also provides a hybrid multiply-add processing apparatus for performing five-stage processing on a first operand, a second operand, and a third operand, where the first operand, the second operand, and the third operand have the same number of bits and the number of bits is an even number, the apparatus comprising:

the first-stage processing module is used for splitting the first operand into a first upper-order operand and a first lower-order operand and splitting the second operand into a second upper-order operand and a second lower-order operand;

the second-stage processing module is used for multiplying the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand; multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand;

a third-stage processing module, configured to add the first intermediate operand, the upper part of the second intermediate operand, and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand;

a fourth-stage processing module, configured to obtain a compressed number based on the fifth intermediate operand and the first two bits of the sixth intermediate operand, where the number of bits of the compressed number is smaller than the number of bits of the first operand, and to splice the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand;

and the fifth-stage processing module is used for expanding the sign bit of the third operand to obtain a fourth operand, wherein the fourth operand has the same number of bits as the seventh intermediate operand, adding the fourth operand and the seventh intermediate operand, and shaping the addition result.

The present invention also provides a processor, comprising:

a decoder for splitting the first operand into a first upper operand and a first lower operand, and splitting the second operand into a second upper operand and a second lower operand, the first operand, the second operand and the third operand having the same number of bits and an even number of bits;

the multiplier is used for multiplying the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand; multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand;

a first adder for adding the first intermediate operand, the upper part of the second intermediate operand, and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand;

a compression module, configured to obtain a compressed number based on the fifth intermediate operand and the first two bits of the sixth intermediate operand, where the number of bits of the compressed number is smaller than the number of bits of the first operand;

the splicing module is used for splicing the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand;

the extension module is used for extending the sign bit of the third operand to obtain a fourth operand, and the bit numbers of the fourth operand and the seventh intermediate operand are the same;

a second adder, configured to add the fourth operand and the seventh intermediate operand to obtain an addition result;

and the shaping processing module is used for shaping the addition result.

Preferably, the compression module comprises:

the third adder is used for performing addition operation on the first two bits of the sixth intermediate operand and the fifth intermediate operand to obtain an operand to be compressed;

a first logic operation unit, configured to perform an and operation on a first preset bit to a second preset bit of the operand to be compressed to obtain a first operation result and perform an or operation on the first preset bit to the second preset bit to obtain a second operation result if the operand to be compressed is a signed number, where the first preset bit is one of a last bit to a second bit of the operand to be compressed, the second preset bit is one of the second bit to the last second bit of the operand to be compressed, and the second preset bit is located before the first preset bit;

the first compression unit is used for intercepting the operand to be compressed based on the first operation result or the second operation result to obtain the compressed number;

a second logic operation unit, configured to perform an or operation on a third preset bit to a fourth preset bit of the operand to be compressed to obtain a third operation result if the operand to be compressed is an unsigned number, where the third preset bit is one of the last bit to the second bit of the operand to be compressed, the fourth preset bit is one of the first bit to the last second bit of the operand to be compressed, and the fourth preset bit is located before the third preset bit, and a difference between the fourth preset bit and the third preset bit is smaller than a difference between the number of bits of the first operand and 1;

and the second compression unit is used for intercepting the operand to be compressed based on the third operation result to obtain the compressed number.

The present invention also provides a storage medium having stored therein a computer program code for implementing the above-described hybrid multiply-add processing method when executed by a processor.

According to the above technical solution, after the first-stage, second-stage and third-stage processing is performed in sequence on the first operand and the second operand, the fourth-stage processing obtains a compressed number with a reduced number of bits based on the fifth intermediate operand and the first two bits of the sixth intermediate operand produced by the third-stage processing, and splices the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand. In the fifth-stage processing, the sign bit of the third operand is then extended to obtain a fourth operand, the fourth operand and the seventh intermediate operand are added, and the addition result is shaped. The fourth operand and the seventh intermediate operand have the same number of bits, and the seventh intermediate operand is the sixth intermediate operand with the reduced-width compressed number spliced before it, so the number of bits of the fourth operand and of the seventh intermediate operand is less than twice the number of bits of the first operand. Compared with the prior art, the two addends in the fifth-stage processing are narrower, so the bit width of the adder in the fifth-stage processing can be reduced, which reduces the resources and time occupied by the adder.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram of a prior art multi-stage process;

FIG. 2 is a schematic diagram of a hybrid multiply-add processing method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of compression provided by an embodiment of the present invention;

FIG. 4 is a schematic illustration of another compression scheme provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of yet another compression scheme provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of yet another compression scheme provided by an embodiment of the present invention;

FIG. 7 is a hardware schematic diagram for implementing a hybrid multiply-add processing method according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a hybrid multiply-add processing apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a processor according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 2, a hybrid multiply-add processing method according to an embodiment of the present invention is shown, where the hybrid multiply-add processing method is used to perform five stages of processing on a first operand, a second operand, and a third operand, where the first operand, the second operand, and the third operand have the same number of bits (the number of bits is N) and the number of bits is an even number, and the corresponding stages of processing in conjunction with fig. 2 are as follows:

The first stage of processing splits the first operand (SRC0 in FIG. 2) into a first upper operand (the N/2-bit A in FIG. 2) and a first lower operand (the N/2-bit B in FIG. 2), and splits the second operand (SRC1 in FIG. 2) into a second upper operand (the N/2-bit C in FIG. 2) and a second lower operand (the N/2-bit D in FIG. 2).

The second stage of processing multiplies the first upper operand by the second upper operand and the second lower operand in sequence to obtain a first intermediate operand (A*C in FIG. 2) and a second intermediate operand (A*D in FIG. 2), and multiplies the first lower operand by the second upper operand and the second lower operand in sequence to obtain a third intermediate operand (B*C in FIG. 2) and a fourth intermediate operand (B*D in FIG. 2).

The third stage of processing adds the first intermediate operand, the upper part of the second intermediate operand (the upper half of A*D in FIG. 2), and the upper part of the third intermediate operand (the upper half of B*C in FIG. 2) to obtain a fifth intermediate operand (H in FIG. 2). The fourth intermediate operand, the lower part of the second intermediate operand (the lower half of A*D in FIG. 2), and the lower part of the third intermediate operand (the lower half of B*C in FIG. 2) are added to obtain a sixth intermediate operand (L in FIG. 2).

The fourth stage of processing obtains a compressed number (S in FIG. 2) based on the fifth intermediate operand and the first two bits of the sixth intermediate operand, the number of bits of the compressed number being smaller than the number of bits of the first operand; the compressed number is spliced before the sixth intermediate operand, resulting in a seventh intermediate operand (Hx).

And a fifth stage of processing, namely expanding sign bits of a third operand (E in fig. 2) to obtain a fourth operand (Ex in fig. 2), wherein the fourth operand and a seventh intermediate operand have the same number of bits, adding the fourth operand and the seventh intermediate operand, and shaping the addition result.

Wherein in the fourth stage of processing, the process of obtaining the compression number includes, but is not limited to, the following steps:

1) Adding the first two bits of the sixth intermediate operand to the fifth intermediate operand to obtain an operand to be compressed (M in FIG. 2), wherein the first two bits of the sixth intermediate operand are the carry result of the sixth intermediate operand, such as L[N+1:N] in FIG. 2.

2) If the operand to be compressed is a signed number, performing an AND operation ('&') on the bits from a first preset bit (denoted ls, indicating that compression of the operand to be compressed starts from its left side) to a second preset bit (denoted rs, indicating that compression of the operand to be compressed starts from its right side) to obtain a first operation result (denoted CA), and performing an OR operation ('|') on the bits from the first preset bit to the second preset bit to obtain a second operation result (denoted CO), wherein the first preset bit is one of the last bit to the second bit of the operand to be compressed (i.e., ls ∈ [0, N-2]), the second preset bit is one of the second bit to the second-to-last bit of the operand to be compressed (i.e., rs ∈ [N-2, 1]), and the second preset bit is located before the first preset bit.

3) And intercepting the operand to be compressed based on the first operation result or the second operation result to obtain the compressed number. In this embodiment, when the operand to be compressed is intercepted based on the first operation result or the second operation result, it is required to determine a value of the first bit of the operand to be compressed when the operand to be compressed is a signed number, and the process is as follows:

if the operand to be compressed is a signed number and the value of the first bit of the operand to be compressed is 1 (for example, M[N-1] = 1'b1), intercepting the operand to be compressed based on the first operation result to obtain a compressed number;

when the operand to be compressed is intercepted based on the first operation result, data for performing an operation in the compressed number is composed based on the first bit of the operand to be compressed to the (N-rs +1) th bit of the operand to be compressed, the first operation result, and the (ls-1) th bit of the operand to be compressed to the last bit of the operand to be compressed, for example, the data for performing the operation in the composed compressed number is as follows:

S' = {1'b1, M[N-2], M[…], M[N-rs+1], CA, M[ls-1], M[…], M[0]}, and a sign bit is added to the data S' used for operation to obtain the compressed number S.

If the operand to be compressed is a signed number and the value of the first bit of the operand to be compressed is 0 (M[N-1] = 1'b0), intercepting the operand to be compressed based on the second operation result to obtain a compressed number;

when the operand to be compressed is intercepted based on the second operation result, data for performing an operation in the compressed number is composed based on the first bit of the operand to be compressed to the (N-rs +1) th bit of the operand to be compressed, the second operation result, and the (ls-1) th bit of the operand to be compressed to the last bit of the operand to be compressed, for example, the data for performing the operation in the composed compressed number is as follows:

S' = {1'b0, M[N-2], M[…], M[N-rs+1], CO, M[ls-1], M[…], M[0]}, and a sign bit is added to the data S' used for operation to obtain the compressed number S.

4) If the operand to be compressed is an unsigned number, performing an OR operation ('|') on the bits from a third preset bit (denoted ls, indicating that compression of the operand to be compressed starts from its left side) to a fourth preset bit (denoted rs, indicating that compression of the operand to be compressed starts from its right side) to obtain a third operation result (denoted COO), wherein the third preset bit is one of the last bit to the second bit of the operand to be compressed (i.e., ls ∈ [0, N-2]), the fourth preset bit is one of the first bit to the second-to-last bit of the operand to be compressed (i.e., rs ∈ [N-1, 1]), the fourth preset bit is located before the third preset bit, and the difference between the fourth preset bit and the third preset bit is smaller than the number of bits of the first operand minus 1 (i.e., rs - ls is smaller than N-1).

The operand to be compressed is then intercepted based on the third operation result to obtain the compressed number. When the operand to be compressed is intercepted based on the third operation result, the data used for operation in the compressed number is composed of the first bit of the operand to be compressed to the (N-rs+1)th bit of the operand to be compressed, the third operation result, and the (ls-1)th bit of the operand to be compressed to the last bit of the operand to be compressed. For example, the composed data used for operation in the compressed number is as follows:

S' = {M[N-1], M[…], M[N-rs+1], COO, M[ls-1], M[…], M[0]}, and a sign bit is added in front of the data S' used for operation to obtain the compressed number S.
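Step 1) of the list above, folding the two carry bits L[N+1:N] into H to form M, can be sketched as follows. The function name `fold_carry` is hypothetical, and the example values assume that the lower halves of A*D and B*C are shifted left by N/2 bits before being added into L, which the text does not state explicitly.

```python
def fold_carry(h: int, l: int, n: int):
    """Return (M, low) with M = H + L[N+1:N] and low = L[N-1:0] (illustrative)."""
    carry = (l >> n) & 0b11        # the two bits of L above bit N-1
    low = l & ((1 << n) - 1)       # the N low bits kept from L
    return h + carry, low


# Hand-worked unsigned case for N = 8 and SRC0 = SRC1 = 0xFF:
# A = B = C = D = 0xF, so every partial product is 225 (upper half 14, lower half 1),
# giving H = 225 + 14 + 14 = 253 and L = 225 + ((1 + 1) << 4) = 257.
m, low = fold_carry(253, 257, 8)
```

In this example M is 254 and the retained low part of L is 1, and M followed by that low part reproduces the full product 0xFF * 0xFF.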

In the following, FIG. 3 to FIG. 6 give examples of the data S' used for operation in the compressed number S, where X denotes the maximum number of bits of S' obtained by intercepting the operand to be compressed. When the operand to be compressed is a signed number, the examples are: X = 2, ls = 0, rs = N-2; X = 3, ls = 1, rs = N-2; X = 3, ls = 0, rs = N-3; and X = 5, ls = 2, rs = N-3. When the operand to be compressed is an unsigned number, the examples are: X = 2, ls = 0, rs = N-1; X = 3, ls = 1 or ls = 2, rs = N-1; X = 3, ls = 0, rs = N-2 or rs = N-3; and X = 5, ls = 0 or ls = 1 or ls = 2, rs = N-1 or rs = N-2 or rs = N-3.

In FIG. 4, X = 3, ls = 1 and rs = N-2 are taken as an example for intercepting a signed operand to be compressed, and X = 3, ls = 1 or ls = 2, and rs = N-1 are taken as an example for intercepting an unsigned operand to be compressed:

For a signed operand to be compressed, an AND operation ('&') is performed on the (N-2)th bit to the second-to-last bit of the operand to be compressed to obtain the first operation result CA, and an OR operation ('|') is performed on the same bits to obtain the second operation result CO. If the first bit M[N-1] of the signed operand to be compressed is 1 (1'b1), interception uses S' = {1'b1, M[N-2], M[…], M[N-rs+1], CA, M[ls-1], M[…], M[0]}. Since ls = 1, the bits M[ls-1] to M[0] located after CA reduce to M[0]. Since rs = N-2, M[N-rs+1] preceding CA is M[3], and keeping every bit from 1'b1 to M[3] would make S' wider than 3 bits, so only 1'b1 is selected from the range 1'b1 to M[3], and S' is composed of 1'b1, CA and M[0], i.e., S' = {1'b1, CA, M[0]}. Similarly, if the first bit M[N-1] of the signed operand to be compressed is 0 (1'b0), the result is S' = {1'b0, CO, M[0]}.

For an unsigned operand to be compressed, if ls = 1, M[1] participates in the OR operation: an OR operation ('|') is performed on the (N-1)th bit to the second-to-last bit of the operand to be compressed, i.e., M[N-1] to M[1], to obtain the third operation result COO. Because M[N-1] to M[1] all participate in the OR operation, none of them can be used as data in S' when intercepting with S' = {M[N-1], M[…], M[N-rs+1], COO, M[ls-1], M[…], M[0]}, whereas M[0] does not participate in the OR operation, so the formula yields S' = {COO, M[0]}. If ls = 2, M[1] does not participate in the OR operation while M[N-1] to M[2] do, and the result is S' = {COO, M[1], M[0]}.
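The FIG. 4 cases just described can be sketched as follows. This is a minimal model of those specific examples only, not of the general interception rule; the function names are hypothetical, M is treated as an N-bit pattern, and the reduced range is assumed to be M[rs] down to M[ls] inclusive.

```python
def field(m: int, ls: int, rs: int) -> int:
    """Bits M[rs:ls] of m as an integer."""
    return (m >> ls) & ((1 << (rs - ls + 1)) - 1)


def compress_signed_fig4(m: int, n: int) -> int:
    """S' = {M[N-1], CA or CO, M[0]} for ls = 1, rs = N-2 (signed case)."""
    ls, rs = 1, n - 2
    mid = field(m, ls, rs)
    all_ones = (1 << (rs - ls + 1)) - 1
    ca = int(mid == all_ones)      # AND-reduction of M[N-2] .. M[1]
    co = int(mid != 0)             # OR-reduction of M[N-2] .. M[1]
    sign = (m >> (n - 1)) & 1
    flag = ca if sign else co      # CA when M[N-1] is 1, CO when it is 0
    return (sign << 2) | (flag << 1) | (m & 1)


def compress_unsigned_fig4(m: int, n: int, ls: int) -> int:
    """S' = {COO, M[ls-1], ..., M[0]} for rs = N-1 and ls = 1 or 2 (unsigned case)."""
    coo = int(field(m, ls, n - 1) != 0)   # OR-reduction of M[N-1] .. M[ls]
    low = m & ((1 << ls) - 1)             # bits below ls are kept unchanged
    return (coo << ls) | low
```

For N = 8, a signed pattern 0b11111111 compresses to 0b111 and 0b10000001 to 0b101, so the single reduced bit still records whether the discarded bits all match the sign bit.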

How S' is obtained from the signed and unsigned operands to be compressed in fig. 3, fig. 5 and fig. 6 is illustrated in those figures and is not explained again in this embodiment.

In the fifth stage, in order to ensure that the number of bits of the fourth operand obtained after the third operand is expanded is the same as that of the seventh intermediate operand obtained by the fourth stage, an expansion manner of the third operand in this embodiment is as follows:

If the third operand is a signed number, the value of the first bit of the third operand is added before the third operand a preset number of times, where the preset number of bits is the same as the number of bits of the compressed number. If the third operand is a signed number, the sign bit of the third operand is E[N-1], i.e., the value of the first bit, so the extension of the sign bit is realized by adding the value of the first bit of the third operand before the third operand. Assuming that the number of bits of the compressed number is (X+1), where X is the number of bits of the data S' used for operation in the compressed number S and 1 is the sign bit of the compressed number S, (X+1) copies of the value of the first bit are added before the third operand, so that the number of bits of the fourth operand is the same as the number of bits of the seventh intermediate operand obtained by the fourth-stage processing.

If the third operand is an unsigned number, a preset number of zeros are added before the third operand, where the preset number of bits is likewise the same as the number of bits of the compressed number. Zeros are added because an unsigned number has no sign bit, and adding zeros makes the number of bits of the fourth operand the same as that of the seventh intermediate operand obtained by the fourth-stage processing.
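The two extension rules above can be illustrated with a short sketch. The function name and parameters are hypothetical; the third operand E is modelled as an n-bit pattern held in a Python integer, and ext_bits stands for the number of bits of the compressed number (X+1 in the text).

```python
def extend_third_operand(e, n, ext_bits, signed):
    """Widen the n-bit pattern e by ext_bits leading bits.

    Signed operands replicate the sign bit E[n-1]; unsigned operands
    are padded with zeros.
    """
    e &= (1 << n) - 1
    sign = (e >> (n - 1)) & 1
    fill = (1 << ext_bits) - 1 if (signed and sign) else 0
    return (fill << n) | e


# 32-bit operand extended by 3 bits, giving a 35-bit fourth operand.
print(hex(extend_third_operand(0x80000000, 32, 3, signed=True)))
print(hex(extend_third_operand(0x80000000, 32, 3, signed=False)))
```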

In the fifth stage, the addition of the fourth operand and the seventh intermediate operand differs according to the operation type corresponding to the first operand, the second operand, and the third operand. Assuming that the first operand is SRC0, the second operand is SRC1, and the third operand is E, the operation types include E - SRC0 × SRC1 and SRC0 × SRC1 -/+ E, and the corresponding addition operations are as follows:

The seventh intermediate operand is Hx = {S, L[N-1:0]}, where S is the compressed number and L[N-1:0] is the sixth intermediate operand. Each bit of Hx is inverted to obtain Hv, and the number of bits of Hv is the same as that of Hx; each bit of the fourth operand Ex is inverted to obtain Ev, and the number of bits of Ev is the same as that of Ex. If the operation type is E - SRC0 × SRC1, the addition result F is Hx + Ev; if the operation type is SRC0 × SRC1 - E, the addition result F is Hx + Ex; and if the operation type is SRC0 × SRC1 + E, the addition result F is Hv + Ex.
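As a small illustration of the splicing and inversion steps only (the selection of addends is left to the text), the following sketch builds Hx from S and L and derives Hv. The function names are hypothetical, and the width of S is assumed to be X+1 bits, so that Hx is N+X+1 bits wide.

```python
def splice(s, l, n):
    """Hx = {S, L[n-1:0]}: place the compressed number S in front of L."""
    return (s << n) | (l & ((1 << n) - 1))


def invert(value, width):
    """Invert every bit of a width-bit pattern (Hx -> Hv, Ex -> Ev)."""
    return ~value & ((1 << width) - 1)


n, x = 32, 2
hx = splice(0b101, 0xDEADBEEF, n)
hv = invert(hx, n + x + 1)
print(hex(hx), hex(hv))
```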

Further, after the addition result is obtained, for example, the first identifier uo = F[X+N:N] is obtained from the first X+1 bits of the addition result, and the second identifier is calculated as: sgn_clp = sub0 && S[X-1:0] == {0, xxx} || sub1 && S[X-1:0] == {1, xxx}, where xxx denotes 0 or 1. In the actual calculation of the second identifier, only the first bit of S[X-1:0] needs to be considered in sub0 && S[X-1:0] and in sub1 && S[X-1:0]. Here sub0 denotes E - SRC0 × SRC1, and sub1 denotes SRC0 × SRC1 -/+ E.

If the addition result is a signed number, a third identifier of the addition result is obtained from a second preset number of leading bits of the addition result (e.g., the first X+2 bits), and a fourth identifier of the addition result is determined according to the operation type and the compressed number corresponding to the addition result, where the second preset number of bits is larger than the first preset number of bits.

For example, the third identifier so = F[X+N] && !(&F[X+N-1:N-1]) || !F[X+N] && |(F[X+N-1:N-1]) is obtained from the first X+2 bits of the addition result, and the fourth identifier is calculated as: usgn_clp = sub0 || sub1 && S[X-1:0] == {(X){1'b0}}, where S[X-1:0] == {(X){1'b0}} indicates that X zeros are taken for the comparison in the sub1 case.
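The identifiers uo and so can be modelled in software as below. This is a hedged sketch: the function names are hypothetical, F is treated as an (N+X+1)-bit pattern, and uo is assumed to be the reduction OR of F[X+N:N], since the text only gives the bit range. The expression for so follows the formula given above: it is set when the sign bit F[X+N] and the bits F[X+N-1:N-1] are not all equal, i.e., when F does not fit in N signed bits.

```python
def reduce_and(value, hi, lo):
    """Return 1 if every bit value[hi:lo] is 1, else 0."""
    mask = (1 << (hi - lo + 1)) - 1
    return int(((value >> lo) & mask) == mask)


def reduce_or(value, hi, lo):
    """Return 1 if any bit value[hi:lo] is 1, else 0."""
    mask = (1 << (hi - lo + 1)) - 1
    return int(((value >> lo) & mask) != 0)


def overflow_flags(f, n, x):
    """Return (uo, so) for an addition result f of n + x + 1 bits."""
    f &= (1 << (n + x + 1)) - 1
    # Assumption: uo is the OR of the leading X+1 bits F[X+N:N].
    uo = reduce_or(f, x + n, n)
    sign = (f >> (x + n)) & 1
    all_ones = reduce_and(f, x + n - 1, n - 1)
    any_one = reduce_or(f, x + n - 1, n - 1)
    so = int((sign and not all_ones) or (not sign and any_one))
    return uo, so
```

For example, with n = 8 and x = 2, the value 128 does not fit in 8 signed bits, so so is 1, while the 11-bit pattern of -128 leaves so at 0.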

The process of shaping the addition result based on the identifier is as follows:

If the addition result is an unsigned number and the first identifier of the addition result and the second identifier of the addition result are both valid, the addition result is shaped to all 0s.

If the addition result is an unsigned number and the first identifier of the addition result is valid but the second identifier of the addition result is invalid, the addition result is shaped to full F.

If the addition result is an unsigned number and both the first identifier of the addition result and the second identifier of the addition result are invalid, shaping of the addition result is disabled.

If the addition result is a signed number and the third identifier of the addition result and the fourth identifier of the addition result are both valid, the addition result is shaped into a negative maximum value corresponding to the number of bits of the addition result, for example, the number of bits of the addition result is 32, and a negative maximum value corresponding to 32 bits is obtained.

If the addition result is a signed number and the third identifier of the addition result is valid but the fourth identifier of the addition result is invalid, the addition result is shaped to a positive maximum value corresponding to the number of bits of the addition result.

If the addition result is a signed number and the third identifier of the addition result and the fourth identifier of the addition result are both invalid, shaping of the addition result is disabled.
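The shaping rules listed above, with valid identifiers equal to 1, can be summarised in a short sketch. The names are hypothetical: overflow stands for the first (unsigned) or third (signed) identifier, and clamp_low stands for the second or fourth identifier. Two points are assumptions not stated in the text: when the first or third identifier is invalid the result is left unshaped regardless of the other identifier, and the unshaped result is taken to be the low N bits of F.

```python
def shape(f, n, signed, overflow, clamp_low):
    """Shape the addition result f to n bits based on the two identifiers."""
    mask = (1 << n) - 1
    if not overflow:
        # Assumption: shaping disabled means the low n bits pass through.
        return f & mask
    if signed:
        negative_max = 1 << (n - 1)        # e.g. 0x8000_0000 for n = 32
        positive_max = (1 << (n - 1)) - 1  # e.g. 0x7FFF_FFFF for n = 32
        return negative_max if clamp_low else positive_max
    return 0 if clamp_low else mask        # all 0s or all Fs
```

For the inverted-polarity scheme, the same sketch applies after inverting the identifiers before the call.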

The first identifier and the second identifier are valid when the values of the first identifier and the second identifier are 1, invalid when the values of the first identifier and the second identifier are 0, valid when the values of the third identifier and the fourth identifier are 1, and invalid when the values of the third identifier and the fourth identifier are 0. If after the first identifier to the fourth identifier are obtained based on the above operation, the first identifier to the fourth identifier are inverted, and the first identifier to the fourth identifier are valid when the value of the first identifier to the fourth identifier is 0 and invalid when the value of the first identifier to the fourth identifier is 1, the corresponding process of shaping the addition result based on the identifiers is as follows:

If the addition result is an unsigned number and the first identifier of the addition result and the second identifier of the addition result are both valid, the addition result is shaped to all F.

If the addition result is an unsigned number and the first identifier of the addition result is valid but the second identifier of the addition result is not valid, the addition result is shaped to all 0 s.

If the addition result is a signed number and the third identifier of the addition result and the fourth identifier of the addition result are both valid, the addition result is shaped to a positive maximum value corresponding to the number of bits of the addition result, e.g. the number of bits of the addition result is 32, and a positive maximum value corresponding to 32 bits is obtained.

Shaping the addition result into a negative maximum value corresponding to the number of bits of the addition result if the addition result is a signed number and the third identifier of the addition result is valid but the fourth identifier of the addition result is invalid.

Based on the above scheme, taking N = 32 and X = 2 as an example, this embodiment provides a hardware implementation of the method in the fourth stage and the fifth stage:

When the fourth stage obtains the compressed number, for a signed number, if the value of the most significant bit M[31] of the operand M to be compressed is 1, CA is selected as the cps bit in the figure, and if the value of the most significant bit M[31] of the operand M to be compressed is 0, CO is selected as the cps bit in the figure; the cps bit and the most significant bit M[31] form S' in the compressed number. For an unsigned number, COO is obtained by the bitwise OR. Finally, the compressed number S is selected according to the is_sgn identifier, Hx is obtained by bit splicing the compressed number S with the 32 bits of L obtained by the fourth-stage calculation, and Hx is transmitted to the fifth stage. The is_sign identifier is used to indicate whether the operand to be compressed forming the compressed number is a signed number or an unsigned number: if the value of is_sign is 1, the operand to be compressed is a signed number (signed), and if the value of is_sign is 0, the operand to be compressed is an unsigned number (unsigned).

In the fifth stage, Hx is inverted to obtain Hv. A 3-bit sign-bit extension is applied to E to form Ex, and Ex is inverted to obtain Ev. Hx/Hv is then selected according to the sub0 or sub1 type of the instruction obtained in the instruction analysis stage and added with Ex/Ev by the adder to obtain a 35-bit addition result F, and so and uo are calculated for the signed number and the unsigned number respectively. For unsigned numbers, if the instruction is of the sub1 type, S is 2'b00 and uo is valid, or the instruction is of the sub0 type and uo is valid, the result is shaped to 32'h0000_0000 (all 0); otherwise, if uo is valid, the result is shaped to 32'hFFFF_FFFF (all F). For signed numbers, if the instruction is of the sub0 type, S is 2'b00 or 2'b01, and so is valid, the result is shaped to 32'h8000_0000 (i.e., the negative maximum value corresponding to 32 bits); if the instruction is of the sub1 type, S is 2'b10 or 2'b11, and so is valid, the result is shaped to 32'h8000_0000 (i.e., the negative maximum value corresponding to 32 bits); in the other cases where so is valid, the result is clamped to 32'h7FFF_FFFF (i.e., the positive maximum value corresponding to 32 bits).

While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Corresponding to the above method embodiments, an embodiment of the present invention further provides a hybrid multiply add processing apparatus, where the hybrid multiply add processing apparatus is configured to perform five-stage processing on a first operand, a second operand, and a third operand, where the first operand, the second operand, and the third operand have the same number of bits and an even number of bits, and the structure of the apparatus is as shown in fig. 8, and may include: a first stage processing module 10, a second stage processing module 20, a third stage processing module 30, a fourth stage processing module 40, and a fifth stage processing module 50.

The first stage processing module 10 is configured to split the first operand into a first upper-order operand and a first lower-order operand, and split the second operand into a second upper-order operand and a second lower-order operand.

A second-stage processing module 20, configured to multiply the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand; and multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand.

A third-stage processing module 30, configured to add the first intermediate operand, the upper part of the second intermediate operand, and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand.
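The first three stages can be checked numerically with a small sketch. It is an illustration under stated assumptions rather than the module's implementation: the function name is hypothetical, the operands are treated as unsigned, and the lower parts of the second and third intermediate operands are assumed to be shifted left by N/2 bits before being added to the fourth intermediate operand.

```python
def split_multiply(a, b, n):
    """Return (fifth, sixth) for n-bit unsigned operands a and b, n even."""
    half = n // 2
    lo_mask = (1 << half) - 1
    # Stage 1: split each operand into upper and lower halves.
    a_hi, a_lo = a >> half, a & lo_mask
    b_hi, b_lo = b >> half, b & lo_mask
    # Stage 2: four partial products (first to fourth intermediate operands).
    p1, p2 = a_hi * b_hi, a_hi * b_lo
    p3, p4 = a_lo * b_hi, a_lo * b_lo
    # Stage 3: upper parts go to the fifth operand, lower parts to the sixth.
    fifth = p1 + (p2 >> half) + (p3 >> half)
    sixth = p4 + ((p2 & lo_mask) << half) + ((p3 & lo_mask) << half)
    return fifth, sixth


fifth, sixth = split_multiply(0x12345678, 0x9ABCDEF0, 32)
# The full product is recovered as fifth * 2**32 + sixth.
print((fifth << 32) + sixth == 0x12345678 * 0x9ABCDEF0)
```

Under these assumptions the sixth intermediate operand can exceed N bits by at most two bits, which is consistent with the fourth stage using its first two bits.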

A fourth-stage processing module 40, configured to obtain a compressed number based on the first two bits of the fifth intermediate operand and the sixth intermediate operand, where a bit number of the compressed number is smaller than a bit number of the first operand; and splicing the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand.

The process of obtaining the compressed number by the fourth-stage processing module 40 includes, but is not limited to, the following steps:

For a detailed description of the process of obtaining the compressed number by the fourth-stage processing module 40, please refer to the above method embodiment, which will not be described again.

And a fifth-stage processing module 50, configured to expand a sign bit of the third operand to obtain a fourth operand, where the fourth operand and the seventh intermediate operand have the same number of bits, add the fourth operand and the seventh intermediate operand, and perform shaping processing on an addition result.

In this embodiment, one way for the fifth-stage processing module 50 to extend the sign bit of the third operand is as follows: if the third operand is a signed number, the value of the first bit of the third operand is added before the third operand a preset number of times, where the preset number of bits is the same as the number of bits of the compressed number. If the third operand is a signed number, the sign bit of the third operand is E[N-1], i.e., the value of the first bit, so the extension of the sign bit is realized by adding the value of the first bit of the third operand before the third operand. Assuming that the number of bits of the compressed number is (X+1), where X is the number of bits of the data S' used for operation in the compressed number S and 1 is the sign bit of the compressed number S, (X+1) copies of the value of the first bit are added before the third operand, so that the number of bits of the fourth operand is the same as the number of bits of the seventh intermediate operand obtained by the fourth-stage processing.

The addition of the fourth operand and the seventh intermediate operand by the fifth-stage processing module 50 differs according to the operation types corresponding to the first operand, the second operand, and the third operand, and after the addition result is obtained, different shaping processes are executed depending on whether the addition result is a signed number or an unsigned number.

As can be seen from the above technical solution, after the first-stage processing module 10, the second-stage processing module 20, and the third-stage processing module 30 sequentially perform the first-stage processing, the second-stage processing, and the third-stage processing on the first operand and the second operand, the fourth-stage processing module 40 can obtain a compressed number with a reduced number of bits based on the first two bits of the fifth intermediate operand and the sixth intermediate operand obtained by the third-stage processing module 30, and splice the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand; then, the sign bit of the third operand is expanded by the fifth-stage processing module 50 to obtain a fourth operand, the fourth operand and the seventh intermediate operand are added, and the addition result is shaped. The fourth operand and the seventh intermediate operand have the same number of bits, and the seventh intermediate operand is obtained by splicing a compressed number with a reduced number of bits onto the sixth intermediate operand, so that the number of bits of the fourth operand and of the seventh intermediate operand is less than twice the number of bits of the first operand. Compared with the prior art, the number of bits of the two addends in the fifth-stage processing is reduced, so that the bit width of the adder can be reduced in the fifth-stage processing, and the resources and time occupied by the adder are reduced.

An embodiment of the present invention further provides a processor, which is shown in fig. 9, and may include: decoder 100, multiplier 200, first adder 300, compression module 400, concatenation module 500, expansion module 600, second adder 700, and shaping processing module 800.

The decoder 100 is configured to split the first operand into a first upper-order operand and a first lower-order operand, and split the second operand into a second upper-order operand and a second lower-order operand, where the first operand, the second operand, and the third operand have the same number of bits and have an even number of bits.

A multiplier 200, configured to multiply the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand; and multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand.

A first adder 300 for adding the first intermediate operand, the upper part of the second intermediate operand, and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand.

The compressing module 400 is configured to obtain a compressed number based on the first two bits of the fifth intermediate operand and the sixth intermediate operand, where the number of bits of the compressed number is smaller than the number of bits of the first operand.

Among other things, the compression module 400 may include: the device comprises a third adder, a first logic operation unit, a first compression unit, a second logic operation unit and a second compression unit.

And the third adder is used for performing addition operation on the first two bits of the sixth intermediate operand and the fifth intermediate operand to obtain an operand to be compressed.

The first logic operation unit is used for performing AND operation on a first preset bit to a second preset bit of the operand to be compressed to obtain a first operation result and performing OR operation on the first preset bit to the second preset bit to obtain a second operation result if the operand to be compressed is a signed number, wherein the first preset bit is one of the last bit to the second bit of the operand to be compressed, the second preset bit is one of the second bit to the last second bit of the operand to be compressed, and the second preset bit is positioned in front of the first preset bit.

And the first compression unit is used for intercepting the operand to be compressed based on the first operation result or the second operation result to obtain the compressed number.

And the second logic operation unit is used for performing OR operation on a third preset bit to a fourth preset bit of the operand to be compressed to obtain a third operation result if the operand to be compressed is an unsigned number, the third preset bit is one of the last bit to the second bit of the operand to be compressed, the fourth preset bit is one of the first bit to the last second bit of the operand to be compressed, the fourth preset bit is positioned before the third preset bit, and the difference between the fourth preset bit and the third preset bit is smaller than the difference between the number of bits of the first operand and 1.

For the specific implementation processes of the first logic operation unit, the first compression unit, the second logic operation unit and the second compression unit, reference is made to the related descriptions in the above method embodiments, and the description of this embodiment is not repeated.

And the splicing module 500 is configured to splice the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand.

The expanding module 600 is configured to expand the sign bit of the third operand to obtain a fourth operand, where the number of bits of the fourth operand is the same as that of the seventh intermediate operand. One extension is as follows:

And a second adder 700 for adding the fourth operand and the seventh intermediate operand to obtain an addition result. In the fifth stage, the addition of the fourth operand and the seventh intermediate operand is different because the operation types corresponding to the first operand, the second operand, and the third operand are different, and it is specifically referred to the related description of the above method embodiment, and this embodiment is not described again.

And a shaping module 800, configured to perform shaping processing on the addition result. For example, the shaping module 800 obtains the identifier for shaping the addition result, and shapes the addition result based on the identifier.

The identifier is determined according to whether the addition result is an unsigned number or a signed number. For example, if the addition result is an unsigned number, a first identifier of the addition result is obtained from a first preset number of leading bits of the addition result (e.g., the first X+1 bits), and a second identifier of the addition result is obtained according to the operation type and the compressed number corresponding to the addition result; if the addition result is a signed number, a third identifier of the addition result is obtained from a second preset number of leading bits of the addition result (e.g., the first X+2 bits), and a fourth identifier of the addition result is determined according to the operation type and the compressed number corresponding to the addition result, where the second preset number of bits is larger than the first preset number of bits. For the detailed process, and for the shaping of the addition result based on the identifier, please refer to the related description of the method embodiment, which is not repeated in this embodiment.

The points to be explained here are: the decoder 100, the multiplier 200, the first adder 300, the compression module 400, the concatenation module 500, the expansion module 600, the second adder 700, and the shaping processing module 800 included in the processor, and the third adder, the first logic operation unit, the first compression unit, the second logic operation unit, and the second compression unit included in the compression module 400 can be implemented by existing computer logic components, as shown in fig. 7, or by a programming method, which is not limited in this embodiment.

The embodiment of the invention also provides a storage medium, wherein the storage medium is stored with a computer program code, and the computer program code realizes the hybrid multiplication-addition processing method when being operated by the processor.

It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A hybrid multiply-add processing method for five-stage processing of a first operand, a second operand, and a third operand, the first operand, the second operand, and the third operand having a same number of bits and an even number of bits, the method comprising:

first-stage processing, in which a decoder is used for splitting the first operand into a first upper-order operand and a first lower-order operand and splitting the second operand into a second upper-order operand and a second lower-order operand;

second-stage processing, in which a multiplier is used for multiplying the first high-order operand with the second high-order operand and the second low-order operand in sequence to obtain a first intermediate operand and a second intermediate operand, and multiplying the first low-order operand with the second high-order operand and the second low-order operand in sequence to obtain a third intermediate operand and a fourth intermediate operand;

third-stage processing, in which a first adder is used for adding the first intermediate operand, the upper part of the second intermediate operand and the upper part of the third intermediate operand to obtain a fifth intermediate operand, and adding the fourth intermediate operand, the lower part of the second intermediate operand, and the lower part of the third intermediate operand to obtain a sixth intermediate operand;

fourth-stage processing, in which a compression module is used for obtaining a compressed number based on the first two bits of the fifth intermediate operand and the sixth intermediate operand, the number of bits of the compressed number being smaller than that of the first operand, and a splicing module is used for splicing the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand;

fifth-stage processing, in which an expansion module is used for expanding the sign bit of the third operand to obtain a fourth operand, the number of bits of the fourth operand being the same as that of the seventh intermediate operand, and a second adder is used for adding the fourth operand and the seventh intermediate operand;

the shaping processing module is used for obtaining an identifier for shaping the addition result and shaping the addition result based on the identifier;

wherein the obtaining an identifier for performing a shaping process on the addition result includes:

2. The method of claim 1, wherein obtaining the compressed number based on the first two bits of the fifth intermediate operand and the sixth intermediate operand comprises:

3. The method of claim 1, wherein extending the sign bit of the third operand to obtain a fourth operand comprises: if the third operand is a signed number, adding the value of the first bit of the third operand before the third operand a preset number of times;

4. The method of claim 1, wherein the shaping the addition result based on the identifier comprises:

5. A hybrid multiply add processing apparatus for five-stage processing of a first operand, a second operand, and a third operand, the first operand, the second operand, and the third operand having a same number of bits and an even number of bits, the apparatus comprising:

a first stage processing module, wherein the decoder is configured to split the first operand into a first upper operand and a first lower operand, and to split the second operand into a second upper operand and a second lower operand;

a third stage processing module, wherein the first adder is configured to add the first intermediate operand, the upper part of the second intermediate operand, and the upper part of the third intermediate operand to obtain a fifth intermediate operand; adding the fourth intermediate operand, the lower portion of the second intermediate operand, and the lower portion of the third intermediate operand to obtain a sixth intermediate operand;

a fourth-stage processing module, wherein the compressing module is configured to obtain a compressed number based on the first two bits of the fifth intermediate operand and the sixth intermediate operand, and the number of bits of the compressed number is smaller than the number of bits of the first operand; the splicing module is used for splicing the compressed number before the sixth intermediate operand to obtain a seventh intermediate operand;

a fifth-stage processing module, wherein the expansion module is configured to expand a sign bit of the third operand to obtain a fourth operand, and the fourth operand has the same number of bits as the seventh intermediate operand, and the second adder is configured to add the fourth operand and the seventh intermediate operand, and shape an addition result;

in the fifth-stage processing module, the shaping processing module is specifically configured to: obtaining an identifier for performing shaping processing on an addition result, and performing shaping processing on the addition result based on the identifier;

6. A processor, comprising:

the decoder is used for splitting a first operand into a first upper-order operand and a first lower-order operand and splitting a second operand into a second upper-order operand and a second lower-order operand, wherein the first operand, the second operand and the third operand have the same number of bits and the number of bits is an even number;

the shaping processing module is used for shaping the addition result;

the shaping processing module is specifically configured to: an identifier for performing a shaping process on the addition result is obtained,

shaping the addition result based on the identifier;

7. The processor of claim 6, wherein the compression module comprises:

8. A storage medium having stored therein computer program code for implementing a hybrid multiply-add processing method according to any one of claims 1 to 4 when executed by a processor.