CN100595729C - 32 bits integral multiplier based on CISC microprocessor - Google Patents


Publication number
CN100595729C
Authority
CN
China
Prior art keywords
multiplier
symbol
bit
result
multiplicand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810175922A
Other languages
Chinese (zh)
Other versions
CN101458617A (en)
Inventor
高德远
王党辉
王得利
樊晓桠
张盛兵
黄小平
魏廷存
张萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN200810175922A priority Critical patent/CN100595729C/en
Publication of CN101458617A publication Critical patent/CN101458617A/en
Application granted granted Critical
Publication of CN100595729C publication Critical patent/CN100595729C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Advance Control (AREA)

Abstract

The invention discloses a 32-bit integer multiplier in the field of computer microprocessor design. The multiplier comprises 4-2 compressors, characterized in that the 4-2 compressors form a three-level 4-2 compressor array. The multiplier can perform signed or unsigned 32-bit multiplication: after the multiplicand has been sign-extended, radix-4 Booth encoding is applied and 16 partial products are generated from the multiplicand register. A three-stage pipeline is adopted and the result is returned in parts: the second beat returns the low 32 bits of the result and the third beat returns the high 32 bits, so the result bus is 32 bits wide. One multiplication is controlled and completed by three micro-instructions or by two micro-instructions. With the three-level 4-2 compressor array, micro-instructions control the multiplier so that it serves multiplications that need their results at different times. The number of radix-4 Booth partial products for signed or unsigned 32-bit operands is reduced from 17 to 16, which simplifies the structure of the multiplier and reduces the multiplication delay.

Description

32-bit integer multiplier based on a CISC microprocessor
Technical field
The present invention relates to a 32-bit integer multiplier based on a CISC microprocessor.
Background art
Refer to Fig. 6. The Intel x86 instruction set has two kinds of multiply instruction: signed multiply and unsigned multiply. The most significant bit of the multiplier and of the multiplicand may therefore be either a sign bit or an ordinary data bit. To improve utilization, multipliers in Intel-based CISC (Complex Instruction Set Computer) microprocessors usually use a mixed signed/unsigned design, in which both operands are presented in two's-complement form.
Suppose a multiplier A and a multiplicand B, both in two's-complement representation, are multiplied. Let the bit width of A be N, i.e. A[N-1:0], with N even. Then A can be expressed as:
$$A = -A_{N-1}\times 2^{N-1} + A_{N-2}\times 2^{N-2} + \dots + A_1\times 2^{1} + A_0\times 2^{0}$$
$$= (-A_{N-1}+A_{N-2})\times 2^{N-1} + (-A_{N-2}+A_{N-3})\times 2^{N-2} + \dots + (-A_2+A_1)\times 2^{2} + (-A_1+A_0)\times 2^{1} + (-A_0+0)\times 2^{0}$$
$$= (-2A_{N-1}+A_{N-2}+A_{N-3})\times 2^{N-2} + (-2A_{N-3}+A_{N-4}+A_{N-5})\times 2^{N-4} + \dots + (-2A_3+A_2+A_1)\times 2^{2} + (-2A_1+A_0+0)\times 2^{0}$$
The product of A and B can therefore be expressed as
$$A \times B = \left(\sum_{i=0}^{N/2-1} (-2A_{2i+1}+A_{2i}+A_{2i-1})\times 2^{2i}\right)\times B, \qquad A_{-1}=0$$
$$= \sum_{i=0}^{N/2-1} \left((-2A_{2i+1}+A_{2i}+A_{2i-1})\times B\right)\times 2^{2i}, \qquad A_{-1}=0$$
The factor $(-2A_{2i+1}+A_{2i}+A_{2i-1})\times B$ can be obtained by preprocessing the multiplicand, which turns the multiplication into the addition of a few simple terms. Preprocessing yields four effective non-zero values: the multiplicand itself, twice the multiplicand, the inverted multiplicand plus 1, and the inverted doubled multiplicand plus 1. The operation applied to the multiplicand is selected from the table below according to the values of $A_{2i+1}$, $A_{2i}$ and $A_{2i-1}$. Here 2B means shifting the multiplicand left by one bit, -B means inverting the multiplicand, and -2B means shifting the multiplicand left by one bit and then inverting it. When B is multiplied by a negative digit, B is first multiplied by the magnitude of the digit and inverted, and 1 must still be added; $E_i$ denotes this added value.
A_{2i+1}   A_{2i}   A_{2i-1}   Operation on B   E_i
0          0        0          0·B              0
0          0        1          +1·B             0
0          1        0          +1·B             0
0          1        1          +2·B             0
1          0        0          -2·B             +1
1          0        1          -1·B             +1
1          1        0          -1·B             +1
1          1        1          0·B              0
The above is the principle of the radix-4 Booth algorithm. Each term $(-2A_{2i+1}+A_{2i}+A_{2i-1})\times B\times 2^{2i}$ is one partial product. The derivation shows that multiplying two operands of even bit width N generates N/2 partial products.
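The recoding can be checked numerically. The following is a minimal Python sketch, not part of the patent text; the function names `booth4_digits` and `booth4_multiply` are illustrative. It derives the N/2 radix-4 Booth digits of an N-bit two's-complement multiplier and sums the corresponding partial products.

```python
def booth4_digits(a: int, n: int) -> list[int]:
    """Radix-4 Booth digits (-2..+2) of the n-bit two's-complement value a; n must be even."""
    assert n % 2 == 0
    bits = a & ((1 << n) - 1)      # n-bit pattern of a
    digits = []
    prev = 0                       # A_{-1} = 0
    for i in range(n // 2):
        b0 = (bits >> (2 * i)) & 1          # A_{2i}
        b1 = (bits >> (2 * i + 1)) & 1      # A_{2i+1}
        digits.append(-2 * b1 + b0 + prev)  # -2*A_{2i+1} + A_{2i} + A_{2i-1}
        prev = b1
    return digits


def booth4_multiply(a: int, b: int, n: int) -> int:
    """Sum of the n/2 partial products digit_i * b * 4**i."""
    return sum(d * b * 4 ** i for i, d in enumerate(booth4_digits(a, n)))


print(booth4_digits(6, 4))          # digits of 0110
print(booth4_multiply(-7, 13, 8))   # agrees with -7 * 13
```

An 8-bit multiplier yields four digits, matching the N/2 partial products stated in the derivation.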
The usual approach for a 32-bit mixed signed/unsigned multiplier today is as follows. The operands are first extended into two 33-bit two's-complement operands. Because Booth encoding requires an even bit width, a second extension bit is added, giving 34 bits, and radix-4 Booth encoding then generates 17 partial products. Compressors, such as 4-2 compressors, sum the partial products and produce two result rows. Each level of 4-2 compressors halves the number of rows, so ⌈log₂(k/2)⌉ levels are needed to reach the final two rows, where k is the number of partial products and ⌈x⌉ denotes the smallest integer greater than or equal to x. 17 partial products therefore need 4 levels of 4-2 compression. The two result rows are then added to obtain the final 64-bit product. The structure of a multiplier designed this way is shown in Fig. 6.
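The level count can be illustrated with a short Python sketch. It assumes a simple reduction model that is not spelled out in the patent: each full group of four rows becomes two rows, and three leftover rows are reduced to two by a compressor with one input tied to zero. The function names are illustrative.

```python
import math


def compressor_levels(k: int) -> int:
    """Count the 4-2 compression levels needed to reduce k rows to 2 rows."""
    levels = 0
    while k > 2:
        # Each full group of four rows becomes two rows; one or two leftover
        # rows pass through, three leftover rows are compressed to two.
        k = 2 * (k // 4) + min(k % 4, 2)
        levels += 1
    return levels


def levels_formula(k: int) -> int:
    """Closed form for the same count: each level halves the number of rows."""
    return math.ceil(math.log2(k / 2))


for k in (16, 17):
    print(k, compressor_levels(k), levels_formula(k))
```

Under this model 17 partial products need one level more than 16, which is the area and delay cost that the background section describes.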
A multiplier designed in this way has several shortcomings. 1. Radix-4 Booth encoding requires the multiplier operand to have an even width, so after the 32-bit operand is extended to 34 bits, 17 partial products are generated. In the subsequent 4-2 compression many compressors then have incomplete inputs, which wastes considerable area, and because four levels of 4-2 compression are needed the operation also takes longer. 2. The full 64-bit result is produced at once, so writing it back to the registers requires a 64-bit common result bus. But the result of most x86 instructions is at most 32 bits wide, and doubling the bus width for multiplication alone is not worthwhile. Writing back 64 bits at once also increases the burden on the register file, instruction dispatch tracking and similar components: for example, the register file must support multi-port writes, the number of data-dependency list entries grows, and the register bypass logic becomes more complicated. 3. Different x86 multiply instructions place different requirements on the result. Instructions of the IMUL r/m32 class require the 64-bit product to be written back as a high 32-bit half and a low 32-bit half to two different registers, whereas instructions of the IMUL r32, r/m32 class write only the low 32 bits of the 64-bit product to a register and discard the high 32 bits. The multiplier should therefore be able to end the multiply operation at the appropriate time, depending on the instruction, and return the correct result. Multipliers designed by the conventional method adapt poorly in this respect.
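The two result conventions described above can be sketched in Python as follows. This is an illustrative model of the instruction semantics only, with hypothetical function names; it says nothing about how the hardware computes the product.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def imul_one_operand(eax: int, src: int) -> tuple[int, int]:
    """IMUL r/m32: the 64-bit signed product is returned as (EDX, EAX)."""
    product = (to_signed32(eax) * to_signed32(src)) & MASK64
    return product >> 32, product & MASK32


def imul_two_operand(dst: int, src: int) -> int:
    """IMUL r32, r/m32: only the low 32 bits are kept; the high half is discarded."""
    return (to_signed32(dst) * to_signed32(src)) & MASK32


edx, eax = imul_one_operand(0xFFFFFFFF, 2)   # -1 * 2
print(hex(edx), hex(eax))
print(hex(imul_two_operand(0x10000, 0x10000)))
```

The second form needs the multiplier's output only up to the low half, which is why a multiply that can stop early is useful.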
Summary of the invention
To overcome the above shortcomings of the prior art, the invention provides a 32-bit integer multiplier based on a CISC microprocessor. The multiplier needs few clock cycles, occupies a small area, keeps result dependencies simple and is highly reusable.
The technical solution adopted by the invention is a 32-bit integer multiplier based on a CISC microprocessor, comprising 4-2 compressors, characterized in that the 4-2 compressors form a three-level 4-2 compressor array. The multiplier can perform signed or unsigned 32-bit multiplication: after the multiplicand has been sign-extended, radix-4 Booth encoding is used and 16 partial products are generated from the multiplicand register.
The multiplier uses a three-stage pipeline and returns the result in parts: the second beat returns the low 32 bits of the result and the third beat returns the high 32 bits, so the result bus is 32 bits wide.
One multiplication is controlled and completed by three micro-instructions or by two micro-instructions.
The beneficial effects of the invention are as follows. The multiplier is designed from a system-level viewpoint, which avoids the mismatch with the architecture that arises when an execution unit is designed from its function alone. The multiplier places little pressure on the judgment of architectural registers, reservation stations and tags, or on the width of the common result bus. Once the micro-instructions are determined, a three-stage pipeline serves multiplications that need their results at different times. In addition, the number of radix-4 Booth partial products for sign-extended signed or unsigned 32-bit operands is reduced from 17 to 16, which both simplifies the structure of the multiplier and reduces the multiplication delay. The resulting multiplier is compact, smaller in area than prior-art multipliers, has simple result dependencies and is highly reusable. The invention is described in detail below with reference to the drawings and an embodiment.
Description of the drawings
Fig. 1 is a structural diagram of the 32-bit integer multiplier based on a CISC microprocessor according to the invention.
Fig. 2 is a schematic diagram of the original form of the 16 partial products generated by the partial product generator in Fig. 1.
Fig. 3 is a schematic diagram of the sign-simplified form of the 16 partial products generated by the partial product generator in Fig. 1.
Fig. 4 is a structural diagram of the 4-2 compressor in Fig. 1.
Fig. 5 is a diagram of the 4-2 compressor array in Fig. 1.
Fig. 6 is a structural diagram of the prior-art multiplier.
Embodiment
Refer to Figs. 1 to 5. Consider the properties of Intel instructions. Their general form is OPcode A, B: the two source operands A and B are operated on and the result is placed in A, so A serves both as a source and as the destination. To keep the form as uniform as possible, the micro-instructions of a CISC microprocessor usually adopt the same form: microcode A, B. An addition micro-operation, for example, adds A and B and writes the result back to A. To make full use of resources, the multiplier should perform both signed and unsigned multiplication. The micro-instruction set should therefore contain at least an unsigned multiply mul and a signed multiply imul. For single-operand instructions, the implicit operand EAX can be named explicitly in the micro-instruction. But the result is 64 bits wide and must be written back to two different 32-bit registers, and if write-back is decided by the destination register number alone, only one part of the result can be written. The alternative is to make register writes dual-ported and decode the current operation again at write-back, using logic of the form: if (microcode == mul) then EAX ← Mul_result[31:0], EDX ← Mul_result[63:32]. This makes the register write-back logic irregular and adds to its burden. The single-operand form of signed multiplication can be handled in the same way. For the two-operand and three-operand signed forms, however, this treatment is not feasible, because they keep only the low 32 bits of the result and discard the high 32 bits. If the whole result must be written back at once, the two micro-instructions mul and imul are therefore not enough; at least one further micro-instruction, imult, must be added to cover the case where only the low 32 bits of the result are kept.
A typical microprocessor uses a pipeline whose main stages are instruction fetch, decode, operand fetch, execute and write-back. Pipeline stalls caused by data dependencies during execution can greatly reduce pipeline performance. An in-order pipeline is subject to read-after-write data dependencies. Some 32-bit multiplications produce two 32-bit results at once. If both are written back together, the bypass design must compare both operands of the next micro-instruction against EDX and EAX during decoding to determine whether a data dependency exists. Furthermore, if register renaming is used to support out-of-order execution, the judgment of architectural registers, reservation stations and tags becomes more complicated, and the common result bus becomes wider and less well utilized; all of this increases timing pressure. The two results should therefore be written back at different times: writing back one result at a time makes the register write-back and bypass logic much simpler than writing back two results at once.
The analysis above shows that writing two results back to the registers at once should be avoided. The multiplier should therefore write the result back in two parts, which means that a multiply instruction needs at least two beats. However, there is also the multiply instruction IMUL A, B, C, in which the product of B and C is placed in A rather than written back to B. The first beat must name the two operands B and C, and since a micro-instruction of the form microcode A, B has only two operand fields, it cannot also name the destination A in that beat. One more beat must therefore be added to name the two operands, so that a multiply instruction takes at least three beats. This largely fixes the content of the micro-instructions: the first beat reads the two operands, the second beat writes back the low 32 bits of the result, and the third beat writes back the high 32 bits. This corresponds to 160 in the hardware. In addition, the multiply micro-operation is also used inside other complex instructions. Statistics show that its form there is mul A, B: the product of A and B is placed in A and only the low 32 bits are needed, so only two micro-instructions are required. A multiply operation should therefore be flexible enough to deliver its result at different times, and controlling the stages of the multiplication with a state machine is not well suited to this. Micro-instructions alone are therefore used to control the stages of the multiplication, which requires that the beat of each micro-instruction be easy to identify. The micro-instruction format has a write-back bit wb. In the first beat of the pipeline wb is 0, indicating that no result is written back. The second and third beats write back results, so wb is 1. A multiply instruction must also update the flags register when it finishes, for which the flag-update bit flag can be used: flag is 0 in the second beat and 1 in the third beat, which distinguishes these two execution stages. With this method, inserting the correct register numbers into the operand fields is enough to express every type of multiply instruction with only the two microcodes MUL and IMUL, which avoids widening the micro-instruction.
The micro-instructions for a multiplication are therefore as follows, taking MUL EBX as an example:
mul EAX,EBX no wb no flag
mul EAX,EBX wb no flag
mul EDX,EBX wb flag
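The three-beat sequence above can be modelled in a few lines of Python. This is a behavioural sketch under stated assumptions, not the patent's control logic: the tuple layout `(dest, src, wb, flag)` and the function name `run_mul` are hypothetical, and the flag update follows the x86 rule that MUL sets CF and OF when the high half of the product is non-zero.

```python
MASK32 = 0xFFFFFFFF


def run_mul(regs: dict, micro_ops: list) -> dict:
    """Execute unsigned-multiply micro-ops given as (dest, src, wb, flag) tuples."""
    product = 0
    flags = {}
    for dest, src, wb, flag in micro_ops:
        if not wb:
            # Beat 1: read both operands, write nothing back.
            product = regs[dest] * regs[src]
        elif not flag:
            # Beat 2: write back the low 32 bits.
            regs[dest] = product & MASK32
        else:
            # Beat 3: write back the high 32 bits and update the flags.
            regs[dest] = product >> 32
            flags["CF"] = flags["OF"] = int(regs[dest] != 0)
    return flags


regs = {"EAX": 0xFFFFFFFF, "EBX": 2, "EDX": 0}
mul_ebx = [
    ("EAX", "EBX", False, False),   # mul EAX,EBX  no wb  no flag
    ("EAX", "EBX", True, False),    # mul EAX,EBX  wb     no flag
    ("EDX", "EBX", True, True),     # mul EDX,EBX  wb     flag
]
flags = run_mul(regs, mul_ebx)
print(hex(regs["EAX"]), hex(regs["EDX"]), flags)
```

Dropping the third tuple gives the two-micro-instruction form, in which only the low 32 bits are written back.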
The analysis above shows that completing a 32-bit multiplication in three beats is optimal. Both source operands are extended to 33 bits (by the sign bit when they are signed), so every multiplication becomes a signed multiplication of two 33-bit numbers. Radix-4 Booth encoding controls the generation of the partial products, a tree of 4-2 compressors sums them, and an adder adds the two resulting rows.
Booth encoding has a precondition: the bit width of the multiplier A must be even. After signed and unsigned multiplication have been unified, the extended multiplier A is 33 bits wide, so A must be extended by one more bit to 34 bits, which gives 17 partial products. To speed up the summation of the partial products with 4-2 compressors, the number of partial products should be a multiple of 4, so it needs to be reduced to 16. Consider the Booth encoding with A extended to 34 bits. For IMUL, the bits A33, A32, A31 have only two possible values, all 0 or all 1, and the table above shows that the partial product is 0 in both cases. For MUL, A33, A32, A31 also have only two possible values, 000 or 001. The table shows that the partial product is 0 for 000 and equals the multiplicand for 001. Because i = 16 for this partial product, it affects only the high 32 bits of the final product. Since the low 32 bits are written back in the second beat and the high 32 bits in the third beat, this term can be added to the high 32 bits of the result in the last beat. This guarantees that only 16 partial products are generated, reduces the number of 4-2 compressors, and relaxes the timing pressure on the first two beats. The sign extension also needs only one bit. The 64-bit adder can then be split into two 32-bit adders, and the high 32-bit adder can be time-multiplexed. Sign extension corresponds to 110 and partial product generation to 120; the high and low 32-bit additions correspond to 140 and 150.
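The reduction from 17 to 16 partial products can be checked with a short Python sketch. It is an arithmetic model only, with the illustrative name `multiply_16pp`: the 16 Booth digits are taken from the 32 bits of the multiplier, and the seventeenth term is applied as a late correction to the high half for unsigned operands whose bit 31 is set.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def multiply_16pp(a: int, b: int, signed: bool) -> int:
    """Return the 64-bit product pattern of two 32-bit patterns a and b."""
    a &= MASK32
    b &= MASK32
    # Value of the multiplicand after extension to 33 bits.
    b_val = b - (1 << 32) if signed and (b >> 31) else b
    total = 0
    prev = 0                                   # A_{-1} = 0
    for i in range(16):
        b0 = (a >> (2 * i)) & 1
        b1 = (a >> (2 * i + 1)) & 1
        digit = -2 * b1 + b0 + prev            # radix-4 Booth digit
        total += (digit * b_val) << (2 * i)    # partial product i
        prev = b1
    # Seventeenth digit (A33, A32, A31): zero for signed operands, and +1 only
    # for an unsigned multiplier with A31 = 1. Its weight is 2**32, so it only
    # touches the high 32 bits and can be added in the last beat.
    if not signed and prev:
        total += b_val << 32
    return total & MASK64


print(hex(multiply_16pp(0xFFFFFFFF, 0xFFFFFFFF, signed=False)))
print(hex(multiply_16pp(0xFFFFFFFF, 0xFFFFFFFF, signed=True)))
```

The same 16 partial products serve both MUL and IMUL; only the late correction differs.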
Each partial product has a corresponding E_i. Adding it directly to its own partial product would complicate the calculation, so the value is appended at the tail of the next partial product. Fig. 2 shows the partial products obtained after Booth encoding. S_i is the sign extension of a partial product: if the partial product is positive S_i is 1, otherwise it is 0.
Applying the transformation:
Figure C20081017592200071
where $\hat{S} \oplus S = 1$, i.e. $\hat{S}$ is the complement of S.
This simplification effectively reduces power consumption and area.
After the 16 partial products have been generated they must all be summed, and summing them in parallel with multiple 4-2 compressors is faster. The 4-2 compressor has a regular structure that is well suited to VLSI implementation. The 4-2 compressor is shown in Fig. 5, where L1, L2, L3 and L4 are the four input bits, CIN is the carry from the previous bit position, output S is the sum, COUT is the carry passed to the next bit position, and CARRY is the second output of the compressor. In Fig. 5, COUT is produced two gate delays after the input bits become valid, and CIN is likewise not needed until two gate delays have elapsed, so many groups of four addends can be added at the same time. Every four partial products are reduced to two rows by a group of 4-2 compressors: the first level of 4-2 compression yields 8 rows, the second level yields 4 rows, and the third level yields two rows.
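A minimal Python sketch of a 4-2 compressor and the three-level reduction follows. It assumes one common construction, two cascaded full adders in which COUT does not depend on CIN; the circuit in the patent's figure may differ in detail. All function names are illustrative.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Bitwise full adder: returns (sum, carry) for each bit position."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def compressor_4_2(l1: int, l2: int, l3: int, l4: int, cin: int):
    """One-bit 4-2 compressor built from two full adders."""
    s1, cout = full_adder(l1, l2, l3)      # COUT depends only on L1..L3
    s, carry = full_adder(s1, l4, cin)     # CIN enters the second adder
    return s, carry, cout


def compress_rows(a: int, b: int, c: int, d: int, width: int = 64):
    """Word-level 4-2 compression: four rows in, two rows out (mod 2**width)."""
    mask = (1 << width) - 1
    s1, c1 = full_adder(a, b, c)
    # Shifting c1 left by one bit models COUT feeding the next position's CIN.
    s2, c2 = full_adder(s1, d, (c1 << 1) & mask)
    return s2 & mask, (c2 << 1) & mask


def reduce_tree(rows: list, width: int = 64):
    """Reduce 16 rows to 2 with successive 4-2 levels; returns (row, row, levels)."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows), 4):
            nxt.extend(compress_rows(*rows[i:i + 4], width=width))
        rows = nxt
        levels += 1
    return rows[0], rows[1], levels


rows = [i * 0x01234567 + 1 for i in range(16)]
s, c, levels = reduce_tree(rows)
print(levels, (s + c) & ((1 << 64) - 1) == sum(rows) & ((1 << 64) - 1))
```

For every input combination the one-bit cell satisfies L1 + L2 + L3 + L4 + CIN = S + 2·(CARRY + COUT), which is what lets four rows collapse into two.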
Because the multiplication of two 32-bit operands must be completed in three beats, the logic must be distributed sensibly across the three cycles, with the constraint that the low 32 bits are computed by the second beat and the high 32 bits by the third beat. Since the main delay is the carry propagating from the low bits to the high bits, the low 32 bits of the result are always ready earlier than the high 32 bits. One possible arrangement is to place the first-stage register after the second-level 4-2 compressors, and the second-stage register after the carry-select adder that adds the two rows produced by the third level of 4-2 compression. In the first cycle of a multiplication, the sign extension, partial product generation and first two levels of 4-2 compression are completed. The second beat completes the third level of 4-2 compression and produces the 64-bit product. The third beat decides whether to add the multiplicand to the high 32 bits to obtain the final product, and sets the flag bits according to the product. An enable signal is placed in front of each stage of logic. The three enable signals are determined jointly by the microcode and the flag and wb bits, and decide whether the stage operates and whether the intermediate result is latched into the register.
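The division of work across the three beats can be sketched as three Python functions, one per beat. This is a behavioural model under assumptions: the names `beat1`, `beat2` and `beat3` are hypothetical, the carry-save helper stands in for the 4-2 compressor array, flag setting and the enable signals are omitted, and a and b are assumed to be 32-bit patterns.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save add: three rows in, (sum row, carry row) out, modulo 2**64."""
    return (x ^ y ^ z) & MASK64, (((x & y) | (y & z) | (x & z)) << 1) & MASK64


def level_4_2(rows: list) -> list:
    """One 4-2 compression level: every four rows become two."""
    out = []
    for i in range(0, len(rows), 4):
        s1, c1 = csa(rows[i], rows[i + 1], rows[i + 2])
        out.extend(csa(s1, c1, rows[i + 3]))
    return out


def beat1(a: int, b: int, signed: bool) -> list:
    """Sign extension, 16 Booth partial products, first two 4-2 levels."""
    b_val = b - (1 << 32) if signed and (b >> 31) else b
    rows, prev = [], 0
    for i in range(16):
        b0 = (a >> (2 * i)) & 1
        b1 = (a >> (2 * i + 1)) & 1
        rows.append((((-2 * b1 + b0 + prev) * b_val) << (2 * i)) & MASK64)
        prev = b1
    return level_4_2(level_4_2(rows))   # four rows enter the first register


def beat2(rows4: list) -> tuple[int, int]:
    """Third 4-2 level and 64-bit addition; the low half is returned now."""
    s, c = level_4_2(rows4)
    raw = (s + c) & MASK64
    return raw & MASK32, raw >> 32


def beat3(high: int, a: int, b: int, signed: bool) -> int:
    """Add the deferred multiplicand term for unsigned operands with A31 = 1."""
    if not signed and (a >> 31):
        high = (high + b) & MASK32
    return high


a, b = 0xFFFFFFFF, 0xFFFFFFFE
low, high_raw = beat2(beat1(a, b, signed=False))
high = beat3(high_raw, a, b, signed=False)
print(hex(high), hex(low))
```

A two-micro-instruction multiply would stop after `beat2` and use only `low`.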

Claims (1)

1. A 32-bit integer multiplier, comprising 4-2 compressors, characterized in that: the 4-2 compressors form a three-level 4-2 compressor array; the multiplier can perform signed or unsigned 32-bit multiplication, and after the multiplicand has been sign-extended, radix-4 Booth encoding is used and 16 partial products are generated from the multiplicand register;
the multiplier uses a three-stage pipeline and returns the result in parts, the second beat returning the low 32 bits of the result and the third beat returning the high 32 bits, the result bus being 32 bits wide;
one multiplication is controlled and completed by three micro-instructions or by two micro-instructions.
CN200810175922A 2008-01-22 2008-10-29 32 bits integral multiplier based on CISC microprocessor Expired - Fee Related CN100595729C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810175922A CN100595729C (en) 2008-01-22 2008-10-29 32 bits integral multiplier based on CISC microprocessor

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200810017358 2008-01-22
CN200810017358.X 2008-01-22
CN200810175922A CN100595729C (en) 2008-01-22 2008-10-29 32 bits integral multiplier based on CISC microprocessor

Publications (2)

Publication Number Publication Date
CN101458617A CN101458617A (en) 2009-06-17
CN100595729C true CN100595729C (en) 2010-03-24

Family

ID=40769494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810175922A Expired - Fee Related CN100595729C (en) 2008-01-22 2008-10-29 32 bits integral multiplier based on CISC microprocessor

Country Status (1)

Country Link
CN (1) CN100595729C (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8495125B2 (en) * 2009-05-27 2013-07-23 Microchip Technology Incorporated DSP engine with implicit mixed sign operands
CN102999312B (en) * 2012-12-20 2015-09-30 西安电子科技大学 The optimization method of base 16 booth multiplier
CN105824601A (en) * 2016-03-31 2016-08-03 同济大学 Partial product multiplexing method supporting multi-mode multiplier
CN111258542B (en) * 2018-11-30 2022-06-17 上海寒武纪信息科技有限公司 Multiplier, data processing method, chip and electronic equipment
CN109976707B (en) * 2019-03-21 2023-05-05 西南交通大学 Automatic generation method of variable bit-width multiplier
CN113031918A (en) * 2019-12-24 2021-06-25 上海寒武纪信息科技有限公司 Data processor, method, device and chip
CN112948901B (en) * 2021-02-04 2023-10-03 深圳安捷丽新技术有限公司 Optimization method and device for acceleration operation in SSD main control chip
CN114816335B (en) * 2022-06-28 2022-11-25 之江实验室 Memristor array sign number multiplication implementation method, device and equipment
CN116205244B (en) * 2023-05-06 2023-08-11 中科亿海微电子科技(苏州)有限公司 Digital signal processing structure

Also Published As

Publication number Publication date
CN101458617A (en) 2009-06-17

Similar Documents

Publication Publication Date Title
CN100595729C (en) 32 bits integral multiplier based on CISC microprocessor
US10289605B2 (en) Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
CN100367269C (en) Executing partial-width packed data instructions
CN109597646A (en) Processor, method and system with configurable space accelerator
CN105453071B (en) For providing method, equipment, instruction and the logic of vector group tally function
CN102231102B (en) Method for processing RSA password based on residue number system and coprocessor
CN102012893B (en) Extensible vector operation device
CN105359129A (en) Methods, apparatus, instructions and logic to provide population count functionality for genome sequencing and alignment
CN104699458A (en) Fixed point vector processor and vector data access controlling method thereof
Balpande et al. Design of FPGA based Instruction fetch & decode Module of 32-bit RISC (MIPS) processor
CN105335127A (en) Scalar operation unit structure supporting floating-point division method in GPDSP
CN105302525B (en) Method for parallel processing for the reconfigurable processor of multi-level heterogeneous structure
US6463453B1 (en) Low power pipelined multiply/accumulator with modified booth's recoder
CN104536914B (en) The associated processing device and method marked based on register access
CN201145892Y (en) 32 bits integer multiplier unit
US20020040378A1 (en) Single instruction multiple data processing
CN112256330B (en) RISC-V instruction set extension method for accelerating digital signal processing
JPH096610A (en) Method and system for replacement of operand during execution of compound instruction in data-processing system
CN100356315C (en) Design method of number mixed multipler for supporting single-instruction multiple-operated
CN106528052A (en) Microprocessor architecture based on distributed function units
Al-sudany et al. FPGA-Based Multi-Core MIPS Processor Design
US20100005456A1 (en) Compiling method, compiling apparatus and computer system for a loop in a program
CN206470741U (en) A kind of microprocessor architecture design based on distributed function unit
CN202720631U (en) Single/double transmission instruction set-based microprocessor instruction processing system
Anjam et al. A shared reconfigurable VLIW multiprocessor system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100324

Termination date: 20121029