
CN101706712B - Operation device and method for multiplying and adding floating point vector

Info

Publication number: CN101706712B
Application number: CN2009102416148A
Authority: CN
Inventors: 胡伟武; 陈云霁; 朱海涛
Original assignee: BEIJING LOONGSON ZHONGKE TECHNOLOGY SERVICE CENTER Co Ltd
Current assignee: Loongson Technology Corp Ltd
Priority date: 2009-11-27
Filing date: 2009-11-27
Publication date: 2011-08-31
Anticipated expiration: 2029-11-27
Also published as: CN101706712A

Abstract

The invention discloses a device and a method for floating point vector multiply-add operation. The device comprises a multiplier selecting unit, n multiplication units and n addition units. The multiplier selecting unit selects the (m+1)th part of a second source operand floating point vector according to the value m of a fourth source operand, where m is less than or equal to n-1, and outputs it to the n multiplication units. The multiplication units multiply the selected (m+1)th part of the second source operand floating point vector by each of the n parts of a third source operand floating point vector and output the products to the corresponding n addition units. The addition units add the n products to the n parts of a first source operand floating point vector, respectively, to obtain the multiply-add result. A shuffle shift instruction is thereby saved in floating point vector operations, the program length is reduced, and the efficiency and speed of the microprocessor in performing multiply-add operations are improved.

Description

Floating point vector multiply-add operation apparatus and method

Technical field

The present invention relates to the field of arithmetic unit design in microprocessors, and in particular to a floating point vector multiply-add operation device and method.

Background technology

In a microprocessor, the use of a floating point vector multiply-add unit greatly improves floating-point operation speed.

The floating point vector multiply-add unit executes a floating point vector multiply-add instruction, which multiplies a floating-point operand A0 by a floating point vector (B0, B1, ..., Bn-1) and accumulates the product into a third floating point vector (C0, C1, ..., Cn-1), that is:

(C0, C1, ..., Cn-1) = (C0, C1, ..., Cn-1) + A0 * (B0, B1, ..., Bn-1)

Here, each floating-point operand is stored in a corresponding register.
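The accumulation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `vector_multiply_add` and the use of Python lists in place of registers are assumptions made here, and a hardware unit may round differently from the separate multiply and add used below.

```python
from typing import List


def vector_multiply_add(c: List[float], a0: float, b: List[float]) -> List[float]:
    """Return (C0 + A0*B0, C1 + A0*B1, ..., Cn-1 + A0*Bn-1)."""
    if len(c) != len(b):
        raise ValueError("C and B must have the same number of elements")
    # One multiply and one add per element; the same scalar A0 is used everywhere.
    return [ci + a0 * bi for ci, bi in zip(c, b)]


# Example with n = 4 (values chosen here for illustration only).
result = vector_multiply_add([1.0, 2.0, 3.0, 4.0], 2.0, [10.0, 20.0, 30.0, 40.0])
print(result)
```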

An important use of the floating point vector unit is to perform the matrix operation shown in part a of Fig. 1.

In the prior art, the matrix operation of Fig. 1 is completed by using shuffle instructions in combination with vector multiply-add instructions. Suppose A0, A1, ..., An-1 are stored in order in register REG_A; B0, B1, ..., Bn-1 are stored in register REG_B; C00, C01, ..., C0(n-1) are stored in register REG_C0; C10, C11, ..., C1(n-1) are stored in register REG_C1; and so on, with C(n-1)0, C(n-1)1, ..., C(n-1)(n-1) stored in register REG_C(n-1).

First, a vector multiply-add instruction performs the operation (C00, C01, ..., C0(n-1)) += A0 * (B0, B1, ..., Bn-1), as shown in part b of Fig. 1. A shuffle instruction then shifts the vector (A0, A1, ..., An-1) to obtain (A1, ..., An-1, 0), and the next vector multiply-add instruction performs the operation (C10, C11, ..., C1(n-1)) += A1 * (B0, B1, ..., Bn-1), as shown in part c of Fig. 1. This continues until a shuffle instruction shifts the vector (An-2, An-1, ..., 0) to obtain (An-1, 0, ..., 0), and a vector multiply-add instruction then performs the operation (C(n-1)0, C(n-1)1, ..., C(n-1)(n-1)) += A(n-1) * (B0, B1, ..., Bn-1), as shown in part d of Fig. 1.

This shows that, with the prior art, two instructions (one shift instruction and one arithmetic instruction) must be executed to complete each vector multiply-add operation, which is inefficient.
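The prior-art sequence described above can be modelled as follows. This is a minimal sketch, not the pseudocode of Fig. 7: the function names are hypothetical, Python lists stand in for REG_A, REG_B and the REG_C registers, and the instruction counter simply counts one unit per shuffle and one per multiply-add.

```python
from typing import List, Tuple


def shuffle_shift(a: List[float]) -> List[float]:
    """Model of the shuffle shift: (A0, A1, ..., An-1) -> (A1, ..., An-1, 0)."""
    return a[1:] + [0.0]


def multiply_add_first_element(c: List[float], a: List[float], b: List[float]) -> List[float]:
    """Model of the prior-art multiply-add, which always uses the first element of A."""
    return [ci + a[0] * bi for ci, bi in zip(c, b)]


def prior_art_matrix_update(
    c_rows: List[List[float]], a: List[float], b: List[float]
) -> Tuple[List[List[float]], int]:
    """Update every row of C and count the instructions issued."""
    if len(c_rows) != len(a):
        raise ValueError("need one row of C per element of A")
    a_reg = list(a)  # working copy of REG_A
    updated = []
    issued = 0
    for i, row in enumerate(c_rows):
        if i > 0:
            a_reg = shuffle_shift(a_reg)  # bring A_i to the first position
            issued += 1
        updated.append(multiply_add_first_element(row, a_reg, b))
        issued += 1
    return updated, issued


rows, count = prior_art_matrix_update(
    [[0.0] * 4 for _ in range(4)], [1.0, 2.0, 3.0, 4.0], [1.0, 10.0, 100.0, 1000.0]
)
print(count, rows[1])
```

Under these assumptions, n = 4 needs 4 multiply-adds plus 3 shifts, which is close to the two instructions per multiply-add noted above.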

Summary of the invention

The object of the present invention is to provide a floating point vector multiply-add operation device and method that overcome the defects of the prior art by fusing the shuffle shift instruction and the vector multiply-add instruction into a single new vector multiply-add instruction. This saves the shift instruction when performing floating point vector operations, reduces program length, and improves the efficiency and speed of the microprocessor when executing vector multiply-add operations.

To achieve this object, the invention provides a floating point vector multiply-add operation device comprising a multiplier selecting unit, n multiplication units and n addition units.

The multiplier selecting unit selects, according to the value m of the fourth source operand, the (m+1)th part of the second source operand floating point vector and outputs it to the n multiplication units, where m ≤ n-1.

The multiplication units multiply the selected (m+1)th part of the second source operand floating point vector by each of the n parts of the third source operand floating point vector, and output the products to the corresponding n addition units.

The addition units add the n products to the n parts of the first source operand floating point vector, respectively, to obtain the multiply-add result.

The first, second and third source operand floating point vectors are stored in registers, and the fourth source operand is an immediate. The multiply-add result is written to the register holding the first source operand; that is, the first source operand is also the destination operand.
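The three kinds of unit can be modelled as three small functions composed in sequence. This is a behavioural sketch only: the function names are hypothetical, lists stand in for registers, and the lower bound m ≥ 0 is an assumption (the text states only m ≤ n-1, with the example values starting at 0).

```python
from typing import List


def multiplier_select(src2: List[float], m: int) -> float:
    """Multiplier selecting unit: pick the (m+1)th part of the second source operand."""
    if not 0 <= m <= len(src2) - 1:
        raise ValueError("immediate m must satisfy 0 <= m <= n-1")
    return src2[m]


def multiply_units(multiplier: float, src3: List[float]) -> List[float]:
    """n multiplication units: the selected multiplier times each part of src3."""
    return [multiplier * part for part in src3]


def add_units(products: List[float], src1: List[float]) -> List[float]:
    """n addition units: each product plus the matching part of src1."""
    return [p + s for p, s in zip(products, src1)]


def multiply_add_device(
    src1: List[float], src2: List[float], src3: List[float], m: int
) -> List[float]:
    """Return the new contents of the first source (destination) operand."""
    if not (len(src1) == len(src2) == len(src3)):
        raise ValueError("all three vectors must have n parts")
    selected = multiplier_select(src2, m)
    products = multiply_units(selected, src3)
    return add_units(products, src1)


out = multiply_add_device(
    [1.0, 1.0, 1.0, 1.0], [2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0], 2
)
print(out)
```

With m = 2 the third part of the second operand (4.0) is broadcast to all four multipliers.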

To achieve the object of the invention, a processing method for the floating point vector multiply-add operation device is also provided, comprising the following steps:

Step A: the floating point vector addend with n parts that is to take part in the floating point vector multiply-add operation is stored in the first source operand register of the floating point vector multiply-add operation device;

the floating point vector multiplier with n parts that is to take part in the floating point vector multiply-add operation is stored in the second source operand register of the device;

the floating point vector multiplicand with n parts that is to take part in the floating point vector multiply-add operation is stored in the third source operand register of the device;

the position number, within the second source operand register, of the multiplier that takes part in the multiply-add operation is stored in the fourth source operand of the device; the fourth source operand is an immediate m.

Step B: according to the value m of the fourth source operand, the multiplier selecting unit of the device selects the (m+1)th part of the n-part second source operand floating point vector in the second source operand register and outputs it to the n multiplication units of the device;

and the n-part third source operand floating point vector in the third source operand register is output to the corresponding n multiplication units of the device; where m ≤ n-1.

Step C: the multiplication units of the device multiply the selected (m+1)th part of the second source operand floating point vector by each of the n parts of the third source operand floating point vector, and output the products to the corresponding n addition units of the device.

Step D: the addition units of the device add the n products from the multiplication units to the n parts of the first source operand floating point vector in the first source operand register, respectively, to obtain the multiply-add result.

Beneficial effects of the invention: with the floating point vector multiply-add operation device and method of the invention, the multiplier only needs to be specified by the fourth source operand in the vector multiply-add instruction. This avoids the defect of the prior art, in which a shuffle shift instruction must reset the multiplier after every vector multiply-add instruction, and thus reduces program length and improves operation efficiency and speed.

Description of drawings

Fig. 1 is a schematic diagram of a floating point vector multiply-add operation performed using the prior art;

Fig. 2 is a schematic structural diagram of the floating point vector multiply-add operation device of an embodiment of the invention;

Fig. 3 is a flow chart of the processing method of the floating point vector multiply-add operation device of an embodiment of the invention;

Fig. 4 is a schematic diagram of the execution of the floating point vector multiply-add instruction of an embodiment of the invention;

Fig. 5 is a schematic diagram of the matrix used for the floating point vector multiply-add operation in an embodiment of the invention;

Fig. 6 is a pseudocode diagram implementing the matrix multiplication of Fig. 5 with floating point vector multiply-add instructions in an embodiment of the invention;

Fig. 7 is a pseudocode diagram computing the matrix operation of Fig. 5 using the prior art.

Embodiment

To make the object, technical solution and advantages of the invention clearer, the floating point vector multiply-add operation device and method of the invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

The floating point vector multiply-add operation device of the embodiment merges the shuffle unit and the vector multiply-add unit used in the existing vector multiply-add method. When the fourth source operand of the vector multiply-add instruction is 0, the operation C = C + A0 * B is performed; when it is 1, the operation C = C + A1 * B is performed; and when it is n-1, the operation C = C + A(n-1) * B is performed. This avoids setting the multiplier by shifting the vector A with a shuffle instruction, as the prior art requires.

Fig. 2 is a schematic structural diagram of the floating point vector multiply-add operation device of the embodiment. The function implemented by the device is:

destination operand <- first source operand + (part of the second source operand, selected by the fourth source operand) * third source operand.

The device comprises a multiplier selecting unit 1, n multiplication units 2 and n addition units 3.

The multiplier selecting unit 1 selects, according to the value m of the fourth source operand, the (m+1)th part of the second source operand floating point vector and outputs it to the n multiplication units 2;

where m ≤ n-1.

When the value of the fourth source operand is 0, the first part of the second source operand floating point vector is selected; when it is 1, the second part is selected; and when it is n-1, the nth part is selected.

The multiplication units 2 multiply the selected (m+1)th part of the second source operand floating point vector by each of the n parts of the third source operand floating point vector, and output the products to the corresponding n addition units 3.

The addition units 3 add the n products to the n parts of the first source operand floating point vector, respectively, to obtain the multiply-add result.

The first, second and third source operand floating point vectors are stored in registers, and the fourth source operand is an immediate.

The result of the addition is stored in the first source operand as the final result of the whole multiply-add operation; that is, it is written to the corresponding positions of the register holding the first source operand, so the first source operand is also the destination operand.

Preferably, the multiplier selecting unit 1 is a multiplexer (MUX).

Preferably, the multiplication unit 2 consists of a plurality of parallel multiplication subunits.

Preferably, the addition unit 3 consists of a plurality of parallel addition subunits.
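The multiplexer behaviour of the multiplier selecting unit can be illustrated on a packed register image. The sketch below assumes the 256-bit register holding four 8-byte double-precision elements that the worked example of this description uses; the little-endian byte order, the placement of part 1 at the lowest offset, and the function names are assumptions made here and are not specified in the source.

```python
import struct

LANES = 4        # 256-bit register / 64-bit elements
LANE_BYTES = 8   # one double-precision element


def pack_register(values):
    """Pack four doubles into a 32-byte register image (assumed little-endian)."""
    if len(values) != LANES:
        raise ValueError("a 256-bit register holds exactly four doubles")
    return struct.pack("<4d", *values)


def mux_select(register: bytes, m: int) -> float:
    """Model of the MUX: return the (m+1)th part selected by immediate m."""
    if not 0 <= m < LANES:
        raise ValueError("immediate m must satisfy 0 <= m <= n-1")
    # The immediate acts as the select input; each lane starts at m * 8 bytes.
    (value,) = struct.unpack_from("<d", register, m * LANE_BYTES)
    return value


reg_a = pack_register([1.5, 2.5, 3.5, 4.5])
print(len(reg_a), mux_select(reg_a, 2))
```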

Based on the floating point vector multiply-add operation device shown in Fig. 2, Fig. 3 shows a flow chart of the processing method of the floating point vector multiply-add operation device of the embodiment. The method comprises the following steps:

Step S100: the floating point vector with n addends is stored in the first source operand register, the floating point vector with n multipliers is stored in the second source operand register, the floating point vector with n multiplicands is stored in the third source operand register, and the position number, within the second source operand register, of the multiplier that takes part in the operation is stored in the fourth source operand.

Step S200: according to the value m of the fourth source operand, the (m+1)th part of the n-part second source operand floating point vector is selected and output to the n multiplication units 2 as the multiplier; the third source operand floating point vector is output to the corresponding n multiplication units 2; where m ≤ n-1.

Step S300: the selected (m+1)th part of the second source operand floating point vector is multiplied by each of the n parts of the third source operand floating point vector, and the products are output to the corresponding n addition units 3.

Step S400: the n addends in the first source operand register are output to the corresponding addition units 3; once the products from step S300 have been input to the addition units 3, the addition is performed, and its result is stored in the first source operand as the final result of the whole multiply-add operation.

In one embodiment, the method can be implemented as a computer vector multiply-add instruction. Fig. 4 is a schematic diagram of the execution of the floating point vector multiply-add instruction of the embodiment. The format of the multiply-add instruction is: opcode, first source operand (destination operand), second source operand, third source operand, fourth source operand. The multiply-add instruction implements the following function: destination operand <- first source operand + (part of the second source operand, selected by the fourth source operand) * third source operand.

The first, second and third source operands are stored in registers, the fourth source operand is an immediate, and the first source operand is also the destination operand.
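To make the operand order concrete, the sketch below decodes a textual form of such an instruction and applies it to a small register file. The mnemonic `VFMADDSEL`, the comma-separated text syntax and the dictionary register file are all hypothetical; the source gives only the order of the fields (opcode, first, second, third and fourth source operand) and does not name the instruction or define an encoding.

```python
from typing import Dict, List

MNEMONIC = "VFMADDSEL"  # hypothetical opcode name, not taken from the source


def execute(instruction: str, regs: Dict[str, List[float]]) -> None:
    """Decode 'OPCODE dst, src2, src3, imm' and update regs[dst] in place."""
    opcode, operand_text = instruction.split(None, 1)
    if opcode != MNEMONIC:
        raise ValueError(f"unknown opcode: {opcode}")
    # Field order follows the format given in the text:
    # first source (also destination), second source, third source, immediate.
    dst, src2, src3, imm = [field.strip() for field in operand_text.split(",")]
    m = int(imm)
    if not 0 <= m < len(regs[src2]):
        raise ValueError("immediate m must satisfy 0 <= m <= n-1")
    multiplier = regs[src2][m]
    regs[dst] = [c + multiplier * b for c, b in zip(regs[dst], regs[src3])]


regs = {
    "REG_C1": [0.0, 0.0, 0.0, 0.0],
    "REG_A": [1.0, 2.0, 3.0, 4.0],
    "REG_B": [5.0, 6.0, 7.0, 8.0],
}
execute("VFMADDSEL REG_C1, REG_A, REG_B, 1", regs)
print(regs["REG_C1"])
```

Because the first source operand is also the destination, the result overwrites `REG_C1` while `REG_A` and `REG_B` are left unchanged.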

The processing method of the floating point vector multiply-add operation device of the invention is illustrated below by way of an example. Suppose the register length is 256 bits and each data element is an 8-byte double-precision number. Taking the matrix multiplication shown in Fig. 5 as an example, suppose A0, A1, A2, A3 are stored in order in register REG_A; B0, B1, B2, B3 are stored in register REG_B; C00, C01, C02, C03 are stored in register REG_C0; C10, C11, C12, C13 are stored in register REG_C1; C20, C21, C22, C23 are stored in register REG_C2; and C30, C31, C32, C33 are stored in register REG_C3. Fig. 6 shows pseudocode that performs the operation using the floating point vector multiply-add method of the invention.

In the first step, the value of the fourth source operand is 0. The multiplier selecting unit 1 selects A0 from the second source operand REG_A; A0 is then multiplied simultaneously by B0, B1, B2, B3 in the third source operand REG_B; the products are output to the four addition units 3 and added to C00, C01, C02, C03 in the first source operand REG_C0, respectively; and the final result is stored in the first source operand REG_C0.

In the second step, the value of the fourth source operand is 1. The multiplier selecting unit 1 selects A1 from the second source operand REG_A; A1 is then multiplied simultaneously by B0, B1, B2, B3 in the third source operand REG_B; the products are output to the four addition units 3 and added to C10, C11, C12, C13 in the first source operand REG_C1, respectively; and the final result is stored in the first source operand REG_C1.

In the third step, the value of the fourth source operand is 2. The multiplier selecting unit 1 selects A2 from the second source operand REG_A; A2 is then multiplied simultaneously by B0, B1, B2, B3 in the third source operand REG_B; the products are output to the four addition units 3 and added to C20, C21, C22, C23 in the first source operand REG_C2, respectively; and the final result is stored in the first source operand REG_C2.

In the fourth step, the value of the fourth source operand is 3. The multiplier selecting unit 1 selects A3 from the second source operand REG_A; A3 is then multiplied simultaneously by B0, B1, B2, B3 in the third source operand REG_B; the products are output to the four addition units 3 and added to C30, C31, C32, C33 in the first source operand REG_C3, respectively; and the final result is stored in the first source operand REG_C3.

After these four steps, the values held in registers REG_C0, REG_C1, REG_C2 and REG_C3 are the final result. Fig. 7 is a pseudocode diagram computing the matrix operation of Fig. 5 using the prior art; comparison with Fig. 6 shows that the invention is more efficient than the prior art, since no shift instruction is needed between the multiply-add instructions.
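The four steps can be checked numerically with a short script. This is a sketch under stated assumptions rather than the pseudocode of Fig. 6: the function name `select_multiply_add`, the sample values of A, B and C, and the reading of the operation as C[i][j] += A[i] * B[j] (which is what the four steps describe row by row) are choices made here for illustration.

```python
from typing import List


def select_multiply_add(
    c_row: List[float], a: List[float], b: List[float], m: int
) -> List[float]:
    """One fused instruction: C_row += A[m] * B, with m given as the immediate."""
    return [c + a[m] * bj for c, bj in zip(c_row, b)]


A = [1.0, 2.0, 3.0, 4.0]                        # contents of REG_A
B = [5.0, 6.0, 7.0, 8.0]                        # contents of REG_B
C = [[float(i)] * 4 for i in range(4)]          # contents of REG_C0 .. REG_C3

# Four instructions, with the fourth source operand taking the values 0, 1, 2, 3.
updated = [select_multiply_add(C[m], A, B, m) for m in range(4)]

# Reference result computed element by element from the definition.
expected = [[C[i][j] + A[i] * B[j] for j in range(4)] for i in range(4)]

print(updated == expected, updated[2])
```

Each row of C is produced by exactly one instruction, and the contents of REG_A are never shifted.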

With the floating point vector multiply-add operation device and method of the embodiment, an operation of the type shown in Fig. 1 is performed by storing the vector (A0, A1, ..., An-1) in the second source operand and then executing n consecutive vector multiply-add instructions. The values of the fourth source operand in the n instructions are 0, 1, ..., n-1 respectively, specifying the multiplier used by each instruction. This avoids the defect of the prior art, in which a shuffle shift instruction must be executed to reset the multiplier after every vector multiply-add instruction, and thus reduces program length and improves operation efficiency and speed.

Finally, it should be noted that those skilled in the art can obviously make various changes and modifications to the invention without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims

1. A floating point vector multiply-add operation device, characterized in that it comprises a multiplier selecting unit, n multiplication units and n addition units;

the addition units are used to add the n parts of the result of the multiplication to the n parts of the first source operand floating point vector, respectively, to obtain the multiply-add result;

2. The floating point vector multiply-add operation device according to claim 1, characterized in that the value of the fourth source operand selects the corresponding part of the second source operand floating point vector as follows:

3. The floating point vector multiply-add operation device according to claim 1 or 2, characterized in that the multiplier selecting unit is a multiplexer (MUX).

4. The floating point vector multiply-add operation device according to claim 1 or 2, characterized in that the multiplication unit consists of a plurality of parallel multiplication subunits.

5. The floating point vector multiply-add operation device according to claim 1 or 2, characterized in that the addition unit consists of a plurality of parallel addition subunits.

6. A processing method for a floating point vector multiply-add operation device, characterized in that it comprises the following steps:

7. The processing method for a floating point vector multiply-add operation device according to claim 6, characterized in that, in step B, the step in which, according to the value m of the fourth source operand, the multiplier selecting unit of the floating point vector multiply-add operation device selects the (m+1)th part of the n-part second source operand floating point vector in the second source operand register and outputs it to the n multiplication units of the device specifically comprises the following steps:

when the value of the fourth source operand is 0, the multiplier selecting unit of the device selects the 1st part of the second source operand floating point vector and outputs it to the n multiplication units of the device;

when the value of the fourth source operand is 1, the multiplier selecting unit of the device selects the 2nd part of the second source operand floating point vector and outputs it to the n multiplication units of the device;

when the value of the fourth source operand is n-1, the multiplier selecting unit of the device selects the nth part of the second source operand floating point vector and outputs it to the n multiplication units of the device.