CN101110016A

CN101110016A - Subword parallel integer multiplier

Info

Publication number: CN101110016A
Application number: CNA2007100356514A
Authority: CN
Inventors: 张民选; 董兰飞; 李少青; 陈吉华; 赵振宇; 陈怒兴; 马剑武; 徐炜遐; 孙岩; 乐大珩; 贺鹏; 刘婷; 喻仁峰; 何小威; 郑东裕
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2007-08-29
Filing date: 2007-08-29
Publication date: 2008-01-23

Abstract

The invention discloses a subword parallel integer multiplier. Its data preprocessing module extends the multiplicand and the multiplier according to the operation mode and the sign control signal, producing four groups of multiplicands and four groups of multipliers. The correction value selection module selects and merges correction values according to the operation mode and the sign bits of the product results. The partial product generation modules take the four groups of multiplicands, the four groups of multipliers, and the control signals as input and output partial products; each module consists of one group of Booth encoding units and one group of partial product selection units. The partial product compression tree module compresses the partial products together with the merged correction value. The invention has a simple structure, simplifies the algorithm and its implementation, and reduces the delay of the partial product compression module, thereby improving the performance of the whole multiplier.

Description

Subword parallel integer multiplier

Technical field

The present invention relates mainly to the design of 64-bit microprocessors, and in particular to a subword parallel integer multiplier.

Background technology

Advanced microprocessors and multimedia processing place high demands on multiplication. Single instruction, multiple data (SIMD) processing is a principal feature of the multimedia extensions and digital signal processing found in modern multimedia processors and general-purpose processors. The performance of a subword parallel multiplier that supports SIMD processing has therefore become a key factor in improving multimedia processor performance.

Fig. 1 shows the structure of an existing subword parallel multiplier. The existing design is based on mode selection: multiplexers (MUX) controlled by the mode are added to the partial product generation module, and mode-dependent carry-chain break signals are inserted at each mode boundary of the partial product compression tree and of the final carry propagate adder. Through this reconfiguration the multiplier can perform one 64 × 64, two 32 × 32, four 16 × 16, or eight 8 × 8 multiplications. In ordinary multiplication mode, once the two inputs A[63:0] and B[63:0] arrive, they first enter the partial product generation unit, which splits the multiplicand and the multiplier into four parts and produces partial products by Booth encoding. The partial products are then compressed in the partial product compression tree (PPRT), and the resulting pseudo-sum and pseudo-carry are fed to the carry propagate adder (CPA) for the final addition, which outputs the result.

In subword parallel mode, once the two inputs A[63:0] and B[63:0] arrive, all operations are the same as in ordinary multiplication mode, except that a correction value selection module is added. It selects the merged correction value and sends it to the partial product compression tree (PPRT), where it is compressed together with the partial products. Fig. 2 shows the carry-chain break logic in the partial product compression structure.

Wherein:

sum＝(ab)(c _in.□kill) (1)

c _out＝(a.b)+[(a+b).(c _in.□kill)] (2)
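Equations (1) and (2) can be exercised with a small software model. The sketch below is only an illustration, not the circuit of Fig. 2: the function names and the `kill_mask` parameter are hypothetical, `kill` is assumed to be active-high (the incoming carry is gated by NOT kill), and a plain ripple-carry chain is assumed so that the effect at a 16-bit boundary is easy to see.

```python
def full_adder_with_kill(a, b, c_in, kill):
    """One-bit full adder whose incoming carry is masked when kill is 1."""
    gated = c_in & (1 - kill)                # c_in AND NOT kill
    s = a ^ b ^ gated                        # equation (1)
    c_out = (a & b) | ((a | b) & gated)      # equation (2)
    return s, c_out


def ripple_add(x, y, width, kill_mask):
    """Ripple-carry addition; bit i of kill_mask blocks the carry entering bit i."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder_with_kill(
            (x >> i) & 1, (y >> i) & 1, carry, (kill_mask >> i) & 1
        )
        result |= s << i
    return result


# Two 16-bit lanes packed into 32 bits: the carry out of the low lane
# is discarded when the kill bit at position 16 is set.
x, y = 0x0001_FFFF, 0x0001_0001
independent_lanes = ripple_add(x, y, 32, kill_mask=1 << 16)
single_word = ripple_add(x, y, 32, kill_mask=0)
```

With the kill bit set at position 16 the two halves behave as independent 16-bit adders; with no kill bits set the same chain behaves as one 32-bit adder.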

However, the existing subword parallel multiplier has the following shortcomings:

1. A parallel multiplier designed with this algorithm has one extra level of selectors in partial product generation, which adds one level of gate delay;

2. Mode-dependent carry-chain break signals must be inserted at the mode boundaries, which makes the algorithm complex;

3. The carry-chain break signals have a large fan-out load and very long interconnect; in the layout they spread over almost the entire partial product compression section. The large fan-out load and long wiring increase the delay of partial product compression and make a full-custom layout complicated;

This parallel multiplier must also insert mode-dependent carry-chain break signals at each mode boundary of the final carry propagate adder, which adds delay and design difficulty to the carry propagate adder and severely limits the generality of the design.

Summary of the invention

The problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the invention provides a subword parallel integer multiplier that has a simple structure, simplifies the algorithm and its implementation, reduces the delay of the partial product compression unit, and improves the performance of the whole multiplier.

To solve the above technical problem, the present invention proposes a subword parallel integer multiplier, characterized in that it comprises a data preprocessing module, four independent partial product generation modules, a correction value selection module, and a partial product compression tree module. The data preprocessing module receives the multiplicand SRC1[63:0], the multiplier SRC2[63:0], and the control signals, extends the multiplicand and the multiplier according to the operation mode and the sign control signal, and produces the corresponding 4 groups of multiplicands and 4 groups of multipliers. The correction value selection module selects correction values according to the operation mode and the sign bits of the product results, and merges them. The inputs of the partial product generation modules are the 4 groups of multiplicands and 4 groups of multipliers produced by the data preprocessing module, together with the control signals; their outputs are partial products. Each partial product generation module consists of one group of Booth encoding units and one group of partial product selection units; the modules are functionally identical and operate in parallel. The first module produces the 9 partial products of the low multiplication, the second produces the 9 partial products of the second-lowest multiplication, the third produces the 9 partial products of the second-highest multiplication, and the fourth produces the 9 partial products of the high multiplication. The partial product compression tree module compresses the partial products produced by the generation modules together with the merged correction value.

In subword parallel mode multiplication, the correction value selection module operates as follows:

(1) When a subword parallel multiplication is performed, a correction value is first added to the corresponding partial products according to the sign of each subword product:

1. When the sign of the low subword product is positive, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0000_0000_0000;

2. When the sign of the low subword product is negative, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0001_0000_0000;

3. When the sign of the second-lowest subword product is positive, the correction value added for that product is:

128'hffff_fffe_0000_0000_0000_0000_0000_0000;

4. When the sign of the second-lowest subword product is negative, the correction value added for that product is:

128'hffff_fffe_0000_0001_0000_0000_0000_0000;

5. When the sign of the second-highest subword product is positive, the correction value added for that product is:

128'hfffe_0000_0000_0000_0000_0000_0000_0000;

6. When the sign of the second-highest subword product is negative, the correction value added for that product is:

128'hfffe_0001_0000_0000_0000_0000_0000_0000.
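Read bit by bit, each of the six constants above is a run of ones reaching up to bit 127, plus one extra bit when the product is negative. The sketch below rebuilds them from those bit positions. The positions 81, 97, and 113 are read off the listed constants rather than stated in the text, and the names `CORRECTION_BITS` and `correction` are hypothetical.

```python
MASK128 = (1 << 128) - 1

# subword -> (lowest bit of the run of ones, bit added when the product is negative)
CORRECTION_BITS = {
    "low": (81, 32),
    "second_lowest": (97, 64),
    "second_highest": (113, 96),
}


def correction(subword, negative):
    """Rebuild one 128-bit correction value of step (1)."""
    ones_from, negative_bit = CORRECTION_BITS[subword]
    value = (MASK128 >> ones_from) << ones_from   # ones in bits 127..ones_from
    if negative:
        value |= 1 << negative_bit
    return value


low_positive = correction("low", negative=False)
low_negative = correction("low", negative=True)
```

Printing a value with `format(value, "032x")` gives the same 32 hex digits as the corresponding constant, without the underscore grouping.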

(2) The three correction values are merged before they are sent to the partial product accumulation unit. The merge runs in parallel with the partial product generation units, and the merged value is finally sent, together with the partial products, to the partial product compression unit for accumulation. Depending on the sign of each subword product, the low, second-lowest, and second-highest subwords each have two possible correction values, giving 8 combinations. The merged correction value for each combination is listed below, where 0 means the product is positive and 1 means it is negative:

1. When second-highest is 0, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0000_0000_0000;

2. When second-highest is 0, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0001_0000_0000;

3. When second-highest is 0, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0000_0000_0000;

4. When second-highest is 0, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0001_0000_0000;

5. When second-highest is 1, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0000_0000_0000;

6. When second-highest is 1, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0001_0000_0000;

7. When second-highest is 1, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0000_0000_0000;

8. When second-highest is 1, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0001_0000_0000.
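The eight merged values can be checked against the six single correction values. The sketch below assumes that merging means adding the three selected values and keeping the low 128 bits (the carry out of bit 127 is dropped); the text does not state this explicitly, but the listed constants are consistent with it. The names `merge` and `LISTED` are hypothetical.

```python
MASK128 = (1 << 128) - 1  # results are kept to 128 bits (assumed wrap-around)

# Each tuple: (value for a positive product, value for a negative product)
LOW = (0xffff_ffff_fffe_0000_0000_0000_0000_0000,
       0xffff_ffff_fffe_0000_0000_0001_0000_0000)
SECOND_LOWEST = (0xffff_fffe_0000_0000_0000_0000_0000_0000,
                 0xffff_fffe_0000_0001_0000_0000_0000_0000)
SECOND_HIGHEST = (0xfffe_0000_0000_0000_0000_0000_0000_0000,
                  0xfffe_0001_0000_0000_0000_0000_0000_0000)


def merge(second_highest_neg, second_lowest_neg, low_neg):
    """Add the three selected correction values modulo 2**128."""
    total = (SECOND_HIGHEST[second_highest_neg]
             + SECOND_LOWEST[second_lowest_neg]
             + LOW[low_neg])
    return total & MASK128


# Merged values as listed, keyed by (second-highest, second-lowest, low) sign.
LISTED = {
    (0, 0, 0): 0xfffd_fffd_fffe_0000_0000_0000_0000_0000,
    (0, 0, 1): 0xfffd_fffd_fffe_0000_0000_0001_0000_0000,
    (0, 1, 0): 0xfffd_fffd_fffe_0001_0000_0000_0000_0000,
    (0, 1, 1): 0xfffd_fffd_fffe_0001_0000_0001_0000_0000,
    (1, 0, 0): 0xfffd_fffe_fffe_0000_0000_0000_0000_0000,
    (1, 0, 1): 0xfffd_fffe_fffe_0000_0000_0001_0000_0000,
    (1, 1, 0): 0xfffd_fffe_fffe_0001_0000_0000_0000_0000,
    (1, 1, 1): 0xfffd_fffe_fffe_0001_0000_0001_0000_0000,
}

all_match = all(merge(*signs) == value for signs, value in LISTED.items())
```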

When the correction value selection module merges the correction values, most correction bits are fixed. Only bits 97, 96, 64, and 32 change with the combination of subword product signs, according to the following rule: bits 97 and 96 follow the sign of the second-highest subword product, bit 64 follows the sign of the second-lowest subword product, and bit 32 follows the sign of the low subword product. All other correction bits are fixed. In subword parallel mode multiplication, the merged value only needs to be sent, together with the partial products, to the partial product compression unit for accumulation.
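Because only four bits vary, the merged value can be formed by wiring the sign bits into a constant instead of adding three 128-bit numbers. The sketch below illustrates this; the constant `FIXED` (the listed value with bits 97, 96, 64, and 32 cleared) and the function name are hypothetical, and the bit polarities are derived from the eight listed values.

```python
# Fixed correction bits: any listed merged value with bits 97, 96, 64, 32 cleared.
FIXED = 0xfffd_fffc_fffe_0000_0000_0000_0000_0000


def merged_correction(second_highest_neg, second_lowest_neg, low_neg):
    """Form the merged correction value from the three product signs (0 or 1)."""
    value = FIXED
    value |= second_highest_neg << 97          # bit 97 is set for a negative product
    value |= (1 - second_highest_neg) << 96    # bit 96 is set for a positive product
    value |= second_lowest_neg << 64           # bit 64 follows the second-lowest sign
    value |= low_neg << 32                     # bit 32 follows the low sign
    return value


all_positive = merged_correction(0, 0, 0)
all_negative = merged_correction(1, 1, 1)
```

In hardware terms this corresponds to a handful of wires and one inverter rather than an adder, which is in keeping with the stated goal of keeping the correction off the critical path.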

Compared with the prior art, the advantages of the present invention are:

1. The subword parallel integer multiplier of the present invention reduces the delay of the partial product compression unit and thereby improves the performance of the whole multiplier;

2. The subword parallel integer multiplier of the present invention does not need mode-dependent carry-chain break signals, which simplifies the algorithm and its implementation; it also avoids the long global interconnect of the carry-chain break signals, so the layout is simple and efficient;

3. With the subword parallel integer multiplier of the present invention, the 128-bit pseudo-sum and 128-bit pseudo-carry produced by partial product compression yield the final product after addition in any carry propagate adder, so no specially designed adder is needed;

4. The subword parallel integer multiplier of the present invention is widely applicable: besides serving as a 64-bit integer multiplier with 4 groups of subword parallel operation, it can, after suitable extension, be applied to multiplication with any n groups of subword parallel operation;

5. The algorithm of the subword parallel integer multiplier of the present invention reduces hardware overhead and overall delay, and achieves better performance.
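The pseudo-sum and pseudo-carry mentioned in advantage 3 are the two outputs of carry-save compression. The sketch below is a generic illustration of that idea, not the compression tree of the invention: it assumes simple 3:2 compressors applied in sequence, and the function names are hypothetical.

```python
MASK128 = (1 << 128) - 1


def compress_3_2(x, y, z):
    """3:2 carry-save compressor: x + y + z == s + c (modulo 2**128)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK128, c & MASK128


def compress(rows):
    """Reduce any number of 128-bit rows to a pseudo-sum and a pseudo-carry."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = compress_3_2(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


rows = [0x1234, 0xffff_0000_ffff, 7 << 100, MASK128, 0xfffd_fffd_fffe << 80]
pseudo_sum, pseudo_carry = compress(rows)
final = (pseudo_sum + pseudo_carry) & MASK128   # the one carry-propagating addition
```

No carry propagates between bit positions inside `compress`; only the last line performs an ordinary addition, which is why an unmodified carry propagate adder is sufficient at that point.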

Description of drawings

Fig. 1 is a schematic diagram of the structure of a prior-art multiplier;

Fig. 2 is a schematic diagram of the carry-chain break logic;

Fig. 3 is a schematic diagram of the partial products of the 64-bit parallel integer multiplier in parallel mode;

Fig. 4 is a schematic diagram of the structure of the multiplier of the present invention.

Embodiment

The present invention is described in further detail below with reference to the drawings and a specific embodiment.

As shown in Fig. 4, the subword parallel integer multiplier of the present invention comprises a data preprocessing (DPrepare) module, four independent partial product generation (PPGenertor) modules, a correction value selection module, and a partial product compression tree (PPRT). The data preprocessing module receives the multiplicand SRC1[63:0], the multiplier SRC2[63:0], and the control signals, extends the multiplicand and the multiplier according to the operation mode and the sign control signal, and produces the corresponding 4 groups of multiplicands and 4 groups of multipliers. The correction value selection module selects correction values according to the operation mode and the sign bits of the product results, and merges them. The inputs of the partial product generation modules are the 4 groups of multiplicands and 4 groups of multipliers produced by the data preprocessing module, together with the control signals; their outputs are partial products. Each partial product generation module consists of one group of Booth encoding units (Decode) and one group of partial product selection units (PPSelect); the modules are functionally identical and operate in parallel. In this embodiment, Decode1 and PPSelect1 form PPGenertor1, Decode2 and PPSelect2 form PPGenertor2, Decode3 and PPSelect3 form PPGenertor3, and Decode4 and PPSelect4 form PPGenertor4. The first module, PPGenertor1, produces the 9 partial products of the low multiplication; the second, PPGenertor2, produces the 9 partial products of the second-lowest multiplication; the third, PPGenertor3, produces the 9 partial products of the second-highest multiplication; and the fourth, PPGenertor4, produces the 9 partial products of the high multiplication. The partial product compression tree module compresses the partial products produced by the generation modules together with the merged correction value.
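To make the count of 9 partial products per module concrete, the sketch below models one 16 × 16 multiplication with radix-4 Booth recoding. It is an assumption-laden illustration, not the Decode/PPSelect logic of Fig. 4: it assumes each 16-bit operand is sign- or zero-extended by two bits so that signed and unsigned inputs are both covered, which is one common way to arrive at 9 Booth digits. All function names are hypothetical.

```python
def to_int(x, signed, width=16):
    """Interpret the low `width` bits of x as a signed or unsigned integer."""
    x &= (1 << width) - 1
    if signed and (x >> (width - 1)) & 1:
        x -= 1 << width
    return x


def booth2_digits(multiplier, signed, width=16):
    """Radix-4 Booth digits in {-2, ..., 2}, least significant digit first."""
    value = to_int(multiplier, signed, width)
    digits = []
    prev = 0                                  # implicit bit to the right of bit 0
    for i in range(width // 2 + 1):           # 9 digits for a 16-bit operand
        b0 = (value >> (2 * i)) & 1           # Python's >> sign-extends negatives
        b1 = (value >> (2 * i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def partial_products(multiplicand, multiplier, signed, width=16):
    """One partial product per Booth digit, already shifted to its weight."""
    m = to_int(multiplicand, signed, width)
    digits = booth2_digits(multiplier, signed, width)
    return [(d * m) << (2 * i) for i, d in enumerate(digits)]


pps_unsigned = partial_products(0xFFFF, 0xFFFF, signed=False)
pps_signed = partial_products(0x8000, 0xFFFF, signed=True)
```

Summing the nine entries of `partial_products(...)` gives the ordinary integer product of the two operands, for signed and unsigned inputs alike.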

The 64-bit parallel integer multiplier designed in the present invention selects, according to the operation mode, either ordinary mode, i.e. a signed or unsigned 64 × 64-bit integer multiplication, or subword parallel mode, i.e. 4 parallel signed or unsigned 16 × 16-bit integer multiplications. For a 64-bit integer multiplication, the given multiplicand SRC1[63:0] and multiplier SRC2[63:0] are multiplied, and the 128-bit result is the product of the 64-bit multiplication. For 4 parallel 16-bit integer multiplications, the multiplicand fields SRC1[63:48], SRC1[47:32], SRC1[31:16], SRC1[15:0] are multiplied by the corresponding multiplier fields SRC2[63:48], SRC2[47:32], SRC2[31:16], SRC2[15:0], and bits [127:96], [95:64], [63:32], [31:0] of the result hold the 4 corresponding products. The partial products of the 64-bit parallel integer multiplier in parallel mode use the booth2 algorithm and are presented in array form in Fig. 3. The design goal is that adding the 128-bit pseudo-sum and the 128-bit pseudo-carry produced by partial product compression directly yields the correct final product. The result of partial product compression is clearly correct in ordinary mode. In subword parallel mode, however, the lower multiplicand extends its sign bits toward the higher positions and the lower positions carry into the higher ones, so compressing the partial products directly produces an error. This error arises only from the influence of lower positions on higher positions (higher positions do not affect lower ones), and it must be corrected to guarantee a correct result.
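The intended result of the subword parallel mode can be written as a short reference model. The sketch below only describes the expected input/output behavior (four independent 16 × 16 products packed into 128 bits); it says nothing about how the hardware computes it, and the function name is hypothetical.

```python
def subword_multiply(src1, src2, signed):
    """Reference result of 4 parallel 16 x 16 multiplications packed into 128 bits."""
    result = 0
    for lane in range(4):
        a = (src1 >> (16 * lane)) & 0xFFFF
        b = (src2 >> (16 * lane)) & 0xFFFF
        if signed:
            # Reinterpret each 16-bit field as two's complement.
            a = a - 0x10000 if a & 0x8000 else a
            b = b - 0x10000 if b & 0x8000 else b
        product = (a * b) & 0xFFFF_FFFF          # 32-bit product of this lane
        result |= product << (32 * lane)         # lane 0 -> [31:0], lane 3 -> [127:96]
    return result


src1 = 0xFFFF_0002_8000_0003
src2 = 0xFFFF_0005_0001_0007
signed_result = subword_multiply(src1, src2, signed=True)
unsigned_result = subword_multiply(src1, src2, signed=False)
```

A model like this is useful as a golden reference when checking that partial products plus the merged correction value compress to the right 128-bit result.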

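Therefore, in subword parallel mode multiplication, the correction value selection module operates as follows:

(1) When a subword parallel multiplication is performed, a correction value is first added to the corresponding partial products according to the sign of each subword product:

1. When the sign of the low subword product is positive, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0000_0000_0000;

2. When the sign of the low subword product is negative, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0001_0000_0000;

3. When the sign of the second-lowest subword product is positive, the correction value added for that product is:

128'hffff_fffe_0000_0000_0000_0000_0000_0000;

4. When the sign of the second-lowest subword product is negative, the correction value added for that product is:

128'hffff_fffe_0000_0001_0000_0000_0000_0000;

5. When the sign of the second-highest subword product is positive, the correction value added for that product is:

128'hfffe_0000_0000_0000_0000_0000_0000_0000;

6. When the sign of the second-highest subword product is negative, the correction value added for that product is:

128'hfffe_0001_0000_0000_0000_0000_0000_0000.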

With the correction values, the multiplier can perform subword parallel operations without inserting carry-chain break signals. With this step, a 4-subword parallel multiplication only needs to add the corresponding correction value according to the sign of each subword product, and then compress the correction values together with the partial products to obtain the correct product. However, simply accumulating three correction values directly with the partial products would increase the delay of the partial product compression module and degrade the performance of the multiplier, so the present invention solves this problem with step (2) below.

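(2) The three correction values are merged before they are sent to the partial product accumulation unit. The merged correction value for each of the 8 sign combinations is listed below, where 0 means the product is positive and 1 means it is negative:

1. When second-highest is 0, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0000_0000_0000;

2. When second-highest is 0, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0001_0000_0000;

3. When second-highest is 0, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0000_0000_0000;

4. When second-highest is 0, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0001_0000_0000;

5. When second-highest is 1, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0000_0000_0000;

6. When second-highest is 1, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0001_0000_0000;

7. When second-highest is 1, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0000_0000_0000;

8. When second-highest is 1, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0001_0000_0000.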

When the correction value selection module merges the correction values, most correction bits are fixed. Only bits 97, 96, 64, and 32 change with the combination of subword product signs, according to the following rule: bits 97 and 96 follow the sign of the second-highest subword product, bit 64 follows the sign of the second-lowest subword product, and bit 32 follows the sign of the low subword product. All other correction bits are fixed. In subword parallel mode multiplication, the merged value only needs to be sent, together with the partial products, to the partial product compression unit for accumulation. Because the correction values are merged before they are sent to the partial product accumulation unit, and the merge runs in parallel with the partial product generation units, the delay of the partial product compression unit is reduced and the performance of the whole multiplier is improved.

Claims

1. A subword parallel integer multiplier, characterized in that it comprises a data preprocessing module, four independent partial product generation modules, a correction value selection module, and a partial product compression tree module; the data preprocessing module receives the multiplicand SRC1[63:0], the multiplier SRC2[63:0], and the control signals, extends the multiplicand and the multiplier according to the operation mode and the sign control signal, and produces the corresponding 4 groups of multiplicands and 4 groups of multipliers; the correction value selection module selects correction values according to the operation mode and the sign bits of the product results, and merges them; the inputs of the partial product generation modules are the 4 groups of multiplicands and 4 groups of multipliers produced by the data preprocessing module, together with the control signals, and their outputs are partial products; each partial product generation module consists of one group of Booth encoding units and one group of partial product selection units, the modules being functionally identical and operating in parallel; the first module produces the 9 partial products of the low multiplication, the second module produces the 9 partial products of the second-lowest multiplication, the third module produces the 9 partial products of the second-highest multiplication, and the fourth module produces the 9 partial products of the high multiplication; the partial product compression tree module compresses the partial products produced by the generation modules together with the merged correction value.

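2. The subword parallel integer multiplier according to claim 1, characterized in that, in subword parallel mode multiplication, the correction value selection module operates as follows:

(1) When a subword parallel multiplication is performed, a correction value is first added to the corresponding partial products according to the sign of each subword product:

1. When the sign of the low subword product is positive, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0000_0000_0000;

2. When the sign of the low subword product is negative, the correction value added for that product is:

128'hffff_ffff_fffe_0000_0000_0001_0000_0000;

3. When the sign of the second-lowest subword product is positive, the correction value added for that product is:

128'hffff_fffe_0000_0000_0000_0000_0000_0000;

4. When the sign of the second-lowest subword product is negative, the correction value added for that product is:

128'hffff_fffe_0000_0001_0000_0000_0000_0000;

5. When the sign of the second-highest subword product is positive, the correction value added for that product is:

128'hfffe_0000_0000_0000_0000_0000_0000_0000;

6. When the sign of the second-highest subword product is negative, the correction value added for that product is:

128'hfffe_0001_0000_0000_0000_0000_0000_0000.

(2) The three correction values are merged before they are sent to the partial product accumulation unit; the merged correction value for each of the 8 sign combinations is as follows, where 0 means the product is positive and 1 means it is negative:

1. When second-highest is 0, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0000_0000_0000;

2. When second-highest is 0, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0000_0000_0001_0000_0000;

3. When second-highest is 0, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0000_0000_0000;

4. When second-highest is 0, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffd_fffe_0001_0000_0001_0000_0000;

5. When second-highest is 1, second-lowest is 0, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0000_0000_0000;

6. When second-highest is 1, second-lowest is 0, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0000_0000_0001_0000_0000;

7. When second-highest is 1, second-lowest is 1, and low is 0, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0000_0000_0000;

8. When second-highest is 1, second-lowest is 1, and low is 1, the merged correction value is:

128'hfffd_fffe_fffe_0001_0000_0001_0000_0000.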

3. The subword parallel integer multiplier according to claim 2, characterized in that, when the correction value selection module merges the correction values, most correction bits are fixed; only bits 97, 96, 64, and 32 change with the combination of subword product signs, according to the following rule: bits 97 and 96 follow the sign of the second-highest subword product, bit 64 follows the sign of the second-lowest subword product, and bit 32 follows the sign of the low subword product; all other correction bits are fixed; in subword parallel mode multiplication, the merged value only needs to be sent, together with the partial products, to the partial product compression unit for accumulation.