CN102087590A

CN102087590A - Execution device of resource-multiplexing floating point SIMD (single instruction multiple data) instruction

Info

Publication number: CN102087590A
Application number: CN2009101551405A
Authority: CN
Inventors: 傅可威; 高金加; 孟建熠; 严晓浪
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2009-12-03
Filing date: 2009-12-03
Publication date: 2011-06-08

Abstract

The invention discloses an execution device for resource-multiplexing floating-point SIMD (single instruction, multiple data) instructions. The device comprises a mantissa complement circuit, an exponent subtraction circuit, a mantissa alignment shift circuit, a mantissa addition circuit, a mantissa-sum rounding circuit and a result encapsulation circuit, and supports SIMD logic operation instructions. In each of these execution circuits, the low-lane and high-lane operands of a SIMD logic operation instruction both reuse the hardware resources provided for double-precision floating-point operations. The device can thereby accelerate single-precision floating-point operations.

Description

Execution device for resource-multiplexing floating-point SIMD instructions

Technical field

The present invention relates to an arithmetic logic execution device for floating-point SIMD (single instruction, multiple data) instructions, and in particular to an arithmetic logic execution device that reuses the computing resources of single-precision or double-precision floating-point operations.

Background technology

In the prior art, according to a technical report by Oberman, the floating-point add/subtract execution unit accounts for about 55% of all floating-point operations. Speeding up floating-point addition and subtraction is therefore significant for the performance of the floating-point arithmetic logic unit.

Floating-point addition and subtraction are executed in the following steps: exponent subtraction, mantissa alignment shift, mantissa addition, rounding, and result encapsulation. Fig. 1 shows the data path of a typical single-precision or double-precision floating-point add/subtract operation. In the figure, II (Instruction Issue) denotes the instruction issue stage, E1 (Execution 1st) the first execution stage, E2 (Execution 2nd) the second execution stage, and E3 (Execution 3rd) the third execution stage.

The mantissa complement circuit 11 computes the two's complement of one of the floating-point source operands according to the operation type (addition or subtraction), so that addition and subtraction are handled uniformly. This circuit needs a 25-bit/54-bit adder to complement the mantissa of a single-precision/double-precision floating-point number.

The exponent subtraction circuit 12 computes the exponent difference of the two floating-point source operands, which controls the mantissa alignment shift circuit 13. The exponent operation on single-precision/double-precision floating-point numbers needs a 9-bit/12-bit adder.

The mantissa alignment shift circuit 13 shifts the mantissa of the source operand with the smaller exponent so that the exponents of the two operands become equal. This circuit needs a 48-bit/106-bit shifter to shift the mantissa of a single-precision/double-precision floating-point number.

The mantissa addition circuit 14 computes the sum of the two operands' mantissas. Mantissa addition for single-precision/double-precision floating-point numbers needs a 27-bit/56-bit adder.

The rounding circuit 15 rounds the mantissa sum according to the sum itself and the information provided by the mantissa alignment shift circuit 13. This circuit needs a 26-bit/55-bit rounding adder.

The result encapsulation circuit 16 normalizes the rounded mantissa sum and adjusts the exponent accordingly to obtain the final result. During result encapsulation, the mantissa shift for single-precision/double-precision floating-point numbers needs a 26-bit/55-bit shifter, and the exponent adjustment needs an 8-bit/11-bit adder.

Summary of the invention

To overcome the slow execution of existing arithmetic logic execution devices when handling single-precision and double-precision floating-point operations, the invention provides an execution device for resource-multiplexing floating-point SIMD instructions that can speed up single-precision floating-point operations.

The technical solution adopted by the invention to solve this technical problem is as follows:

An execution device for resource-multiplexing floating-point SIMD instructions, comprising:

A mantissa complement circuit, used to take the two's complement of an operand mantissa so that addition and subtraction share one logic; it comprises a complement adder for double-precision floating-point numbers, and single-precision instructions reuse the low lane of that complement adder;

An exponent subtraction circuit, used to obtain the exponent difference and magnitude relation of the two groups of operands and to prepare control signals for the mantissa alignment shift circuit; it comprises an exponent subtraction adder that is divided into a high lane and a low lane, and single-precision instructions reuse the low lane of the exponent subtraction adder;

A mantissa alignment shift circuit, used to select the mantissa of the smaller operand and shift it for alignment, so that the exponents of the two operands of the floating-point addition or subtraction become equal, and to provide the input data of the mantissa addition circuit; it comprises a mantissa alignment shifter for double-precision floating-point numbers, which in turn comprises a high-lane small alignment shifter and a low-lane small alignment shifter, and single-precision instructions reuse the low-lane small shifter;

A mantissa addition circuit, used to add the two operand mantissas and obtain the mantissa sum in two's-complement form in preparation for rounding; it comprises a double-precision mantissa adder, and single-precision instructions reuse the low lane of that adder;

A mantissa-sum rounding circuit, used to perform the rounding of the floating-point addition or subtraction; it comprises a double-precision rounding adder, which single-precision instructions reuse;

A result encapsulation circuit, used to normalize the mantissa sum and adjust the exponent so that the result is expressed as a normalized floating-point number; it comprises a large shifter for normalizing the double-precision mantissa sum, made up of a low-lane small normalization shifter and a high-lane small normalization shifter, and single-precision instructions reuse the low-lane small normalization shifter; it also comprises an adder for exponent normalization adjustment, made up of a low-lane exponent normalization adder and a high-lane exponent normalization adder, where single-precision instructions reuse the low-lane exponent normalization adder and double-precision instructions reuse the low-lane exponent normalization adder together with part of the high-lane exponent normalization adder;

The execution device also comprises SIMD logic instructions;

In the mantissa complement circuit, the mantissa complement operation of the low-lane operand of a SIMD logic instruction reuses the low lane of the double-precision complement adder, and the mantissa complement operation of the high-lane operand reuses the high lane of the double-precision complement adder;

In the exponent subtraction circuit, the exponent subtraction of the low-lane operand of a SIMD logic instruction reuses the low lane of the exponent subtraction adder, and the exponent subtraction of the high-lane operand reuses the high lane of the exponent subtraction adder;

In the mantissa alignment shift circuit, the alignment shift of the low-lane operand of a SIMD logic instruction reuses the low-lane small alignment shifter of the double-precision mantissa alignment shifter, and the alignment shift of the high-lane operand reuses the high-lane small alignment shifter of the double-precision mantissa alignment shifter;

In the mantissa addition circuit, the mantissa addition of the low-lane operand of a SIMD logic instruction reuses the low lane of the double-precision mantissa adder, and the mantissa addition of the high-lane operand reuses the high lane of the double-precision mantissa adder;

In the mantissa-sum rounding circuit, the rounding of the low-lane mantissa sum of a SIMD logic instruction reuses the low lane of the double-precision rounding adder, and the rounding of the high-lane mantissa sum reuses the high lane of the double-precision rounding adder;

In the result encapsulation circuit, the normalization of the low-lane mantissa sum of a SIMD logic instruction reuses the low-lane small normalization shifter, and the normalization of the high-lane mantissa sum reuses the high-lane small normalization shifter; the exponent normalization adjustment of the low-lane operand reuses the low-lane normalization adder, and the exponent normalization adjustment of the high-lane operand reuses the high-lane normalization adder.

The SIMD logic instruction comprises a floating-point add SIMD instruction or a floating-point subtract SIMD instruction.

In a preferred scheme, the execution device further comprises a sign bit adjustment circuit, used to invert the sign bit of a floating-point operand or set it to 0.

Further, the SIMD logic instruction comprises a floating-point negate SIMD instruction or a floating-point absolute-value SIMD instruction. When a floating-point absolute-value SIMD instruction is executed, the sign adjustment circuit sets the sign bit to 0; when a floating-point negate SIMD instruction is executed, the sign adjustment circuit inverts the sign bit.

Still further, the execution device comprises a leading-zero counting logic circuit, used to compute the offset at which the first 1 appears in the mantissa sum. Using leading-zero counting together with normalization adjustment logic speeds up obtaining the result.

Double-precision instructions reuse the low lane and part of the high lane of the exponent subtraction adder, and they reuse the low-lane exponent normalization adder and part of the high-lane exponent normalization adder.

The technical concept of the invention is as follows. Single instruction, multiple data (SIMD) operation is a technique commonly used in microprocessor design to raise computing speed and reduce instruction count. In essence, it uses the storage space of one double-precision datum to hold two mutually independent single-precision data and operates on both at the same time. This reduces the total number of instructions that must be executed for tasks that apply the same operation to large amounts of data (for example multimedia graphics and image processing), and doubles the data throughput of the floating-point arithmetic logic unit per unit time. The higher throughput is significant both for completing such tasks faster and for lowering the power they consume.

In terms of instruction type, the floating-point absolute-value SIMD instruction and the floating-point negate SIMD instruction belong to the floating-point add/subtract SIMD instructions. By fixing the second source operand to zero, these two instructions can fully reuse the execution circuits of the floating-point add/subtract SIMD instructions. For example, the absolute value of a floating-point number A is |A| = |A + 0|, and its negation is -A = -(A + 0). The only circuit that has to be added is the sign bit adjustment circuit 17, which, depending on whether the instruction is a negate instruction or an absolute-value instruction, inverts the sign bit or sets it to 0 (0 denotes a positive number).
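The effect of the sign bit adjustment on a packed register can be sketched in a few lines of Python. This is only an illustration, not the patented circuit: it assumes the add-with-zero step leaves each lane's magnitude unchanged, so only the two sign bits (bit 31 for the low lane, bit 63 for the high lane) are touched. The function names `fabsm` and `fnegm` are borrowed from the mnemonics in the text; the helper constants are hypothetical.

```python
# Sketch of the sign adjustment for two packed single-precision lanes.
# Assumption: the 64-bit register image holds the low lane in bits 31..0
# and the high lane in bits 63..32, as described for the SIMD operands.
SIGN_LO = 1 << 31                 # sign bit of the low-lane single
SIGN_HI = 1 << 63                 # sign bit of the high-lane single
SIGN_MASK = SIGN_LO | SIGN_HI
REG_MASK = (1 << 64) - 1

def fabsm(reg):
    """Absolute value of both lanes: force both sign bits to 0."""
    return reg & (REG_MASK ^ SIGN_MASK)

def fnegm(reg):
    """Negation of both lanes: invert both sign bits."""
    return (reg ^ SIGN_MASK) & REG_MASK

# High lane = 2.0 (0x40000000), low lane = -1.5 (0xBFC00000).
reg = (0x40000000 << 32) | 0xBFC00000
print(hex(fabsm(reg)))  # 0x400000003fc00000 -> lanes 2.0 and 1.5
print(hex(fnegm(reg)))  # 0xc00000003fc00000 -> lanes -2.0 and 1.5
```

Both lanes are handled by one mask operation, which mirrors the point made above that the two instructions need no arithmetic hardware beyond the sign adjustment.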

For a microprocessor instruction set that supports floating-point operations, a floating-point add SIMD instruction (of the form FADDM) is added at the cost of a small amount of extra hardware, to speed up floating-point addition.

For a microprocessor instruction set that supports floating-point operations, a floating-point subtract SIMD instruction (of the form FSUBM) is added at the cost of a small amount of extra hardware, to speed up floating-point subtraction.

For a microprocessor instruction set that supports floating-point operations, a floating-point absolute-value SIMD instruction (of the form FABSM) is added at the cost of a small amount of extra hardware, to speed up the floating-point absolute-value operation.

For a microprocessor instruction set that supports floating-point operations, a floating-point negate SIMD instruction (of the form FNEGM) is added at the cost of a small amount of extra hardware, to speed up the floating-point negate operation.

The main beneficial effect of the invention is that it speeds up the operation of the execution device.

Description of drawings

Fig. 1 is a schematic diagram of the data path of a typical single-precision or double-precision floating-point add/subtract operation.

Fig. 2 is a schematic diagram of the behavior of the four floating-point SIMD instructions, where Fig. 2(a) shows FADDM R_d, R_m, R_n; Fig. 2(b) shows FSUBM R_d, R_m, R_n; Fig. 2(c) shows FABSM R_d, R_n; and Fig. 2(d) shows FNEGM R_d, R_n.

Fig. 3 is a schematic diagram of the hardware-reusing mantissa complement circuit.

Fig. 4 is a schematic diagram of the hardware-reusing exponent subtraction circuit.

Fig. 5 is a schematic diagram of the hardware-reusing mantissa alignment shift circuit.

Fig. 6 is a schematic diagram of the hardware-reusing mantissa addition circuit.

Fig. 7 is a schematic diagram of the hardware-reusing mantissa-sum rounding circuit.

Fig. 8 is a schematic diagram of the hardware-reusing result encapsulation circuit.

Fig. 9 is a schematic diagram of the sign bit adjustment circuits for double-precision and SIMD instructions, where Fig. 9(a) shows the sign bit adjustment circuit for the double-precision negate/absolute-value instructions and Fig. 9(b) shows the sign bit adjustment circuit for the SIMD negate/absolute-value instructions.

Embodiment

The invention is further described below with reference to the accompanying drawings.

Embodiment 1

With reference to Fig. 2 to Fig. 8, an execution device for resource-multiplexing floating-point SIMD instructions comprises:

A mantissa complement circuit, used to take the two's complement of an operand mantissa so that addition and subtraction share one logic; it comprises a complement adder for double-precision floating-point numbers, and single-precision instructions reuse the low lane of that complement adder;

An exponent subtraction circuit, used to obtain the exponent difference and magnitude relation of the two groups of operands and to prepare control signals for the mantissa alignment shift circuit; it comprises an exponent subtraction adder, and single-precision instructions reuse the low lane of that adder;

A result encapsulation circuit, used to normalize the mantissa sum and adjust the exponent so that the result is expressed as a normalized floating-point number; it comprises a large shifter for normalizing the double-precision mantissa sum, made up of a low-lane small normalization shifter and a high-lane small normalization shifter, and single-precision instructions reuse the low-lane small normalization shifter; it also comprises an exponent normalization adjustment adder, made up of a low-lane exponent normalization adder and a high-lane exponent normalization adder, and single-precision instructions reuse the low-lane exponent normalization adder;

The execution device also comprises SIMD logic instructions;

In the mantissa complement circuit, the mantissa complement operation of the low-lane operand of a SIMD logic instruction reuses the low bits of the double-precision complement adder, and the mantissa complement operation of the high-lane operand reuses the high bits of the double-precision complement adder;

In the exponent subtraction circuit, the exponent subtraction of the low-lane operand of a SIMD logic instruction reuses the low bits of the double-precision exponent subtraction adder, and the exponent subtraction of the high-lane operand reuses the high bits of the double-precision exponent subtraction adder;

A leading-zero counting logic circuit, used to compute the offset at which the first 1 appears in the mantissa sum;

In the result encapsulation circuit, the normalization of the low-lane mantissa sum of a SIMD logic instruction reuses the low-lane small normalization shifter, and the normalization of the high-lane mantissa sum reuses the high-lane small normalization shifter; the exponent normalization adjustment of the low-lane operand reuses the low-lane normalization adder, and the exponent normalization adjustment of the high-lane operand reuses the high-lane normalization adder.

The SIMD logic instruction of this embodiment comprises a floating-point add SIMD instruction or a floating-point subtract SIMD instruction.

Fig. 2 shows the instruction behavior performed by this device. The instruction set supported by the device comprises the floating-point add SIMD instruction (FADDM), the floating-point subtract SIMD instruction (FSUBM), the floating-point absolute-value SIMD instruction (FABSM) and the floating-point negate SIMD instruction (FNEGM).

A source operand of a SIMD instruction uses the storage space of one 64-bit double-precision (double) floating-point number to hold two 32-bit single-precision (single) floating-point numbers. The SIMD low lane (the low 32 bits of a double-precision-format register) and the SIMD high lane (the high 32 bits of a double-precision-format register) are logically fully independent.

Fig. 2(a) shows the data flow of the floating-point add SIMD instruction. Its assembly-language form is similar to: FADDM R_d, R_m, R_n

Here, "FADDM" is the opcode, which identifies the operation to be performed, and R_d, R_m and R_n are registers in double-precision floating-point format; the content of each register is treated as two independent 32-bit single-precision operands. The instruction adds the low 32 bits of source registers R_m and R_n (OP1l and OP2l in Fig. 2(a)) and stores the result in the low 32 bits of R_d, and adds the high 32 bits of R_m and R_n (OP1h and OP2h in Fig. 2(a)) and stores the result in the high 32 bits of R_d, thereby completing two floating-point additions with one instruction.

Fig. 2(b) shows the data flow of the floating-point subtract SIMD instruction. Its assembly-language form is similar to: FSUBM R_d, R_m, R_n

Here, "FSUBM" is the opcode, which identifies the operation to be performed, and R_d, R_m and R_n have the same meaning as in the floating-point add SIMD instruction. The instruction subtracts the low 32 bits of source registers R_m and R_n (OP1l and OP2l in Fig. 2(b)) and stores the result in the low 32 bits of R_d, and subtracts the high 32 bits of R_m and R_n (OP1h and OP2h in Fig. 2(b)) and stores the result in the high 32 bits of R_d, thereby completing two floating-point subtractions with one instruction.
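The architectural behavior of FADDM and FSUBM described above can be illustrated with a small software model. The sketch below is a behavioral emulation only, not a model of the shared hardware: it uses Python's `struct` module to round each lane to single precision, assumes that FSUBM computes R_m minus R_n (the text does not spell out the operand order), and does not model overflow or exception handling. The helper names `pack2` and `unpack2` are hypothetical.

```python
import struct

def to_bits(x):
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(b):
    """Value of a 32-bit single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def pack2(lo, hi):
    """64-bit register image: low lane in bits 31..0, high lane in bits 63..32."""
    return (to_bits(hi) << 32) | to_bits(lo)

def unpack2(reg):
    """Return the (low lane, high lane) values of a 64-bit register image."""
    return from_bits(reg & 0xFFFFFFFF), from_bits(reg >> 32)

def faddm(rm, rn):
    """Behavioral model of FADDM: two independent single-precision adds."""
    m_lo, m_hi = unpack2(rm)
    n_lo, n_hi = unpack2(rn)
    return pack2(m_lo + n_lo, m_hi + n_hi)

def fsubm(rm, rn):
    """Behavioral model of FSUBM, assuming the result is R_m - R_n per lane."""
    m_lo, m_hi = unpack2(rm)
    n_lo, n_hi = unpack2(rn)
    return pack2(m_lo - n_lo, m_hi - n_hi)

rm = pack2(1.5, 2.0)
rn = pack2(0.25, -4.0)
print(unpack2(faddm(rm, rn)))  # (1.75, -2.0)
print(unpack2(fsubm(rm, rn)))  # (1.25, 6.0)
```

The lanes never interact: changing the high lane of either source leaves the low lane of the result unchanged, which is the logical independence the text requires.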

Fig. 2(c) shows the data flow of the floating-point absolute-value SIMD instruction. Its assembly-language form is similar to:

FABSM R_d, R_n

Here, "FABSM" is the opcode, which identifies the operation to be performed, and R_d and R_n have the same meaning as in the floating-point add SIMD instruction. The instruction takes the absolute value of the low 32 bits of source register R_n (OP1l in Fig. 2(c)) and stores it in the low 32 bits of R_d, and takes the absolute value of the high 32 bits of R_n (OP1h in Fig. 2(c)) and stores it in the high 32 bits of R_d, thereby completing two floating-point absolute-value operations with one instruction.

Fig. 2(d) shows the data flow of the floating-point negate SIMD instruction. Its assembly-language form is similar to:

FNEGM R_d, R_n

Here, "FNEGM" is the opcode, which identifies the operation to be performed, and R_d and R_n have the same meaning as in the floating-point add SIMD instruction. The instruction inverts the sign of the low 32 bits of source register R_n (OP1l in Fig. 2(d)) and stores the result in the low 32 bits of R_d, and inverts the sign of the high 32 bits of R_n (OP1h in Fig. 2(d)) and stores the result in the high 32 bits of R_d, thereby completing two floating-point negate operations with one instruction.

Fig. 3 to Fig. 8 show the individual sub-circuits of this device, which are implemented and organized as a pipeline.

Fig. 3 shows the operand mantissa complement circuit of this device, characterized in that SIMD instructions fully reuse the adder used for complementing double-precision floating-point numbers (labeled 35 in the figure). For a double-precision floating-point number, the operand to be complemented comprises {1 sign bit, 1 leading-1 bit of the normalized floating-point number, 52 mantissa bits}, so a complement adder of at least 54 bits is needed. For a single-precision floating-point number, the operand to be complemented comprises {1 sign bit, 1 leading-1 bit of the normalized floating-point number, 23 mantissa bits}, 25 bits in total, and a SIMD instruction needs a complement adder of 2 × 25 = 50 bits altogether. The complement adder for single-precision and SIMD instructions can therefore fully reuse the 54-bit complement adder for double-precision floating-point numbers. In a specific embodiment, the mantissa complement operation of a double-precision instruction uses the whole 54-bit adder, and the complement operation of a single-precision instruction reuses the low 25 bits of the double-precision complement adder. The mantissa complement operation of the SIMD low-lane operand reuses the low 25 bits of adder 35, and that of the SIMD high-lane operand reuses the high 25 bits of adder 35, with 4 zero bits separating the two. Other reasonable bit allocation schemes also achieve the reuse goal.

The operand mantissa selection logic labeled 33 in Fig. 3 extracts the correct operands from the 64-bit floating-point registers R_2m and R_2n according to the instruction type (single/double/SIMD, i.e. single precision/double precision/single instruction multiple data). For an actual subtraction, the second operand must be complemented, and the data selector (MUX, labeled 34 in the figure) selects the correct data. (An actual subtraction is a subtraction of two numbers with the same sign or an addition of two numbers with opposite signs; an actual addition is an addition of two numbers with the same sign or a subtraction of two numbers with opposite signs.) Complementing consists of two steps, bitwise inversion and adding 1, so the carry input of the complement adder falls into the following five cases:

1. Actual subtraction for single-precision and double-precision instructions: the carry-in of the complement adder is 1;

2. Actual addition for single-precision and double-precision instructions, and for both lanes of a SIMD instruction: the carry-in of the complement adder is 0;

3. Both lanes of a SIMD instruction are actual subtractions: the complement adder adds 1 at the least significant bit of each of the two mantissas;

4. The high lane of a SIMD instruction is an actual subtraction and the low lane is an actual addition: the complement adder adds 1 at the least significant bit of the high-lane mantissa and adds 0 at the least significant bit of the low-lane mantissa;

5. The high lane of a SIMD instruction is an actual addition and the low lane is an actual subtraction: the complement adder adds 0 at the least significant bit of the high-lane mantissa and adds 1 at the least significant bit of the low-lane mantissa.

The adder carry-in selection circuit labeled 35 in Fig. 3 selects the corresponding carry-in signal for the complement adder according to the five cases above.
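The SIMD cases (2 to 5) can be illustrated with a small integer model of the 54-bit adder in the bit layout of the specific embodiment: low lane in bits 24..0, four zero bits in 28..25, high lane in bits 53..29. The sketch below is an assumption-laden illustration rather than the circuit of Fig. 3. It treats each 25-bit lane as an opaque value, models the per-lane "+1" as a second adder operand, and relies on the zero gap to absorb any carry out of the low lane. The function name `simd_complement` is hypothetical.

```python
LANE = 25                      # bits per single-precision complement operand
GAP = 4                        # zero bits separating the two lanes
HI_SHIFT = LANE + GAP          # high lane starts at bit 29
WIDTH = 2 * LANE + GAP         # 54-bit double-precision complement adder
LANE_MASK = (1 << LANE) - 1

def simd_complement(lo, hi, sub_lo, sub_hi):
    """Two's-complement each lane whose operation is an actual subtraction,
    using one 54-bit addition for both lanes."""
    # Step 1: bitwise inversion, applied only to lanes that subtract.
    a_lo = (~lo & LANE_MASK) if sub_lo else lo
    a_hi = (~hi & LANE_MASK) if sub_hi else hi
    a = (a_hi << HI_SHIFT) | a_lo          # gap bits stay zero
    # Step 2: inject +1 at the least significant bit of each subtracting lane.
    b = (int(sub_hi) << HI_SHIFT) | int(sub_lo)
    s = (a + b) & ((1 << WIDTH) - 1)       # single wide addition
    # A carry out of the low lane lands in the gap and never reaches bit 29.
    return s & LANE_MASK, (s >> HI_SHIFT) & LANE_MASK

print(simd_complement(5, 7, True, False))  # (33554427, 7): low lane is -5 mod 2**25
print(simd_complement(0, 3, True, True))   # (0, 33554429): carry from low lane absorbed
```

The second call is the interesting one: complementing 0 in the low lane produces a carry out of bit 24, and the zero gap keeps it from corrupting the high lane.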

Finally, this circuit stores the complemented operand in a pipeline register for use by the mantissa alignment shift circuit in the next stage.

Fig. 4 shows the operand exponent subtraction circuit of this device, characterized in that SIMD instructions fully reuse the exponent subtraction adder of the single-precision and double-precision floating-point instructions (labeled 44 in the figure). For a double-precision floating-point number, adder 44 needs {1 sign bit, 11 exponent bits}, 12 bits in total; for a single-precision floating-point number, adder 44 needs {1 sign bit, 8 exponent bits}, 9 bits in total; for a SIMD instruction, the exponent subtraction adder needs 2 × 9 = 18 bits. In one feasible embodiment, therefore, an 18-bit exponent subtraction adder is used: double-precision instructions use its low 12 bits, single-precision instructions use its low 9 bits, the SIMD low lane uses its low 9 bits, and the SIMD high lane uses its high 9 bits. Other reasonable schemes also achieve the goal, for example extending adder 44 to 21 bits, with double-precision instructions taking the low 12 bits, single-precision instructions taking the high 9 bits, and the SIMD high and low lanes taking the high and low 9 bits respectively, separated by zeros in the middle, or changing the bit positions that each instruction type occupies in the adder. What these schemes have in common is that they implement SIMD exponent subtraction by adding only a small hardware overhead to the hardware already present for double-precision and single-precision instructions.
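The 18-bit allocation of the feasible embodiment can be sketched as an integer model. The text does not say how the two 9-bit lanes are kept from interfering when there is no zero gap between them, so the sketch assumes a hypothetical `split` control that cuts the carry chain between bit 8 and bit 9 in SIMD mode. The names `adder18`, `exp_diff_double` and `exp_diff_simd` are illustrative, and subtraction is modeled as adding the inverted operand with a carry-in of 1.

```python
M9 = (1 << 9) - 1       # one SIMD lane: {1 sign bit, 8 exponent bits}
M12 = (1 << 12) - 1     # double precision: {1 sign bit, 11 exponent bits}
M18 = (1 << 18) - 1     # full adder width, 2 x 9 bits

def adder18(a, b, cin_lo=0, cin_hi=0, split=False):
    """18-bit adder; with split=True the carry chain is cut between bits 8 and 9
    (an assumed mechanism, not stated in the source)."""
    if not split:
        return (a + b + cin_lo) & M18
    lo = ((a & M9) + (b & M9) + cin_lo) & M9
    hi = ((a >> 9) + (b >> 9) + cin_hi) & M9
    return (hi << 9) | lo

def sign_extend(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    return v - (1 << bits) if (v >> (bits - 1)) & 1 else v

def exp_diff_double(ea, eb):
    """Double precision: ea - eb on the low 12 bits of the adder."""
    s = adder18(ea, ~eb & M12, cin_lo=1) & M12
    return sign_extend(s, 12)

def exp_diff_simd(ea_lo, ea_hi, eb_lo, eb_hi):
    """SIMD: two independent 9-bit differences in one pass through the adder."""
    a = (ea_hi << 9) | ea_lo
    b = ((~eb_hi & M9) << 9) | (~eb_lo & M9)
    s = adder18(a, b, cin_lo=1, cin_hi=1, split=True)
    return sign_extend(s & M9, 9), sign_extend(s >> 9, 9)

print(exp_diff_simd(130, 120, 127, 125))  # (3, -5)
print(exp_diff_double(1023, 1030))        # -7
```

The sign of each difference tells the alignment stage which operand is smaller, and its magnitude gives the shift amount.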

The operand exponent selection logic labeled 43 in Fig. 4 extracts the correct operand exponents from the 64-bit floating-point registers R_2m and R_2n according to the instruction type (single/double/SIMD, i.e. single precision/double precision/single instruction multiple data), preparing the inputs of the exponent subtraction adder.

In a specific embodiment, the operand mantissa complement operation of Fig. 3 and the exponent subtraction of Fig. 4 are arranged to execute in parallel in the same pipeline stage, which is the first stage of this device's pipeline.

Fig. 5 shows the operand mantissa alignment shift circuit, characterized in that SIMD instructions fully reuse the mantissa alignment shifters of the double-precision instruction (labeled 56 and 57 in the figure). A double-precision floating-point number has 53 significant bits, so the double-precision mantissa alignment shifter needs at least 106 bits. The reason is as follows: for round to nearest (the basic rounding mode defined in the floating-point standard IEEE Std 754-1985), enough shifted-out bits must be retained for correct rounding. If the shift amount is less than or equal to 53, a 106-bit shifter naturally meets the requirement. If the shift amount is greater than 53, the most significant shifted-out bit must be 0, so the shifted-out value is known to be less than 0.5, and a 106-bit shifter therefore fully preserves the information needed for rounding. Likewise, for single-precision floating-point numbers the mantissa alignment shifter needs only 2 × 24 = 48 bits; for a SIMD instruction it needs 2 × 48 = 96 bits. Single-precision and SIMD instructions can therefore fully reuse the alignment shifter of the double-precision instruction, with no extra hardware overhead.
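The width argument can be checked on the single-precision case (24 significant bits, 48-bit field) with a short integer model. This is a sketch under stated assumptions, not the shifter of Fig. 5: the significand is placed in the upper half of the field so that the lower half receives the shifted-out bits, and the function name `align` and its three return values are hypothetical.

```python
SIG = 24                  # single-precision significand width (with leading 1)
FIELD = 2 * SIG           # 48-bit alignment field for one lane
HALF_MASK = (1 << SIG) - 1

def align(sig, shift):
    """Right-shift a 24-bit significand inside a 48-bit field.

    Returns (kept, spill, lost):
      kept  - bits still aligned with the larger operand's significand
      spill - shifted-out bits retained in the lower half of the field
      lost  - True if any 1 bit fell off the bottom of the 48-bit field
    """
    wide = sig << SIG                          # significand in the upper half
    shifted = wide >> shift
    kept = shifted >> SIG
    spill = shifted & HALF_MASK
    lost = (wide & ((1 << shift) - 1)) != 0
    return kept, spill, lost

worst = (1 << SIG) - 1                         # all-ones significand

# Shift <= 24: nothing falls off the field, so rounding sees every bit.
print(any(align(worst, s)[2] for s in range(SIG + 1)))        # False

# Shift > 24: kept is 0 and the top shifted-out bit is 0, so the shifted
# operand contributes less than half of the last retained bit position.
print(all(align(worst, s)[0] == 0 and align(worst, s)[1] >> (SIG - 1) == 0
          for s in range(SIG + 1, 60)))                        # True
```

The same reasoning scales to double precision by replacing 24 with 53, which gives the 106-bit figure in the text.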

In a specific embodiment, the 106-bit mantissa alignment shifter is composed of two 53-bit small alignment shifters. When the current instruction is double precision, the two small alignment shifters combine into one large shifter; when it is single precision, only the low-lane small alignment shifter is used; when it is SIMD, the low-lane data of the SIMD instruction uses the low-lane small alignment shifter (labeled 57 in the figure) and the high-lane data uses the high-lane small alignment shifter (labeled 56 in the figure). The advantage of this implementation is that it both saves hardware and favors low-power design. Other reasonable bit allocation schemes can also achieve the reuse.

The operand selection logic labeled 52 in Fig. 5 is similar to that of Fig. 3 and Fig. 4: it selects the correct mantissas from the pipeline register according to the current instruction type. The operand mantissa swap logic labeled 53 uses the comparison of the two operands obtained from the previous stage's exponent subtraction to route the smaller operand into the alignment shifter for alignment, while the larger operand is stored in the pipeline register.

The operand mantissa alignment shift circuit shown in Fig. 5 is the second pipeline stage of this specific embodiment.

Fig. 6 shows the mantissa adder circuit, characterized in that SIMD instructions fully reuse the mantissa adder (labeled 65 in the figure) of the double-precision instruction. The two inputs of this adder are the unshifted operand (op_fix, labeled 62 in the figure) and the shifted operand (op_shift, labeled 63 in the figure) from pipeline register E1/E2, as selected by the operand selection logic (labeled 64 in the figure).

In a specific embodiment, for a double-precision instruction the input of adder 65 is {1-bit sign, 1-bit carry, 53-bit significand, 1-bit most significant shifted-out bit}, 56 bits in total; likewise, for a single-precision instruction the input of adder 65 is {1-bit sign, 1-bit carry, 24-bit significand, 1-bit most significant shifted-out bit}, 27 bits in total; for a SIMD instruction the input of adder 65 is 2 × 27 = 54 bits in total. Single-precision and SIMD instructions can therefore fully reuse the 56-bit double-precision mantissa adder. In a specific embodiment, the mantissa addition of a single-precision instruction reuses the low 27 bits of adder 65; the mantissa addition of the SIMD low lane reuses the low 27 bits of adder 65 and that of the SIMD high lane reuses the high 27 bits, with 2 zero bits in between. Other reasonable bit allocation schemes can also achieve the reuse.
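The lane allocation above can be sketched as follows, assuming the two zero bits sit at bit positions 27 and 28 of the 56-bit adder so that any carry out of the low lane lands in the gap rather than in the high lane. The helper names are hypothetical.

```python
LANE = 27                            # bits per single-precision lane of the adder
GAP = 2                              # zero bits separating the two lanes
HI_SHIFT = LANE + GAP                # the high lane starts at bit 29
LANE_MASK = (1 << LANE) - 1
ADDER_MASK = (1 << 56) - 1           # width of the double-precision mantissa adder


def pack(hi, lo):
    """Place two 27-bit lane operands in one 56-bit word with a zero gap."""
    return ((hi & LANE_MASK) << HI_SHIFT) | (lo & LANE_MASK)


def simd_add(a_hi, a_lo, b_hi, b_lo):
    """Add both lanes with a single 56-bit addition; return (high sum, low sum)."""
    total = (pack(a_hi, a_lo) + pack(b_hi, b_lo)) & ADDER_MASK
    return (total >> HI_SHIFT) & LANE_MASK, total & LANE_MASK
```

Each returned sum is the lane result modulo 2^27, which matches two's-complement addition of the 27-bit lane inputs.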

The result of adder 65 is the mantissa sum in two's-complement form, which feeds the subsequent rounding operation and leading-zero prediction.

Fig. 7 shows the mantissa-sum rounding circuit, characterized in that SIMD instructions fully reuse the rounding adder (labeled 77 in the figure) of the double-precision instruction. The feature of the rounding adder is that the add-1 step of converting the two's-complement mantissa sum (labeled 73 in the figure) to sign-magnitude form (invert, then add 1) and the rounding carry are unified and share the one rounding adder. The reason is as follows: the add-1 of the conversion is applied at the lowest bit of the infinitely precise mantissa sum, whereas the rounding carry is applied at the least significant bit of the normalized floating-point number, so the add-1 of the conversion does not necessarily affect the final result. In a specific embodiment, an algorithm unifies these two add-1 operations: the rounding control logic (labeled 74 in the figure) decides whether 1 must be added at the least significant bit and controls data selector 76 to select 0 or 1 accordingly. Data selector 75 selects either the original mantissa sum or its bitwise inverse according to whether the current mantissa sum is negative.

The rounding preparation logic (labeled 72 in the figure) supplies the rounding control logic with the information needed for rounding, namely how the part shifted out during alignment compares with 0.5. There are four cases: the shifted-out part is greater than 0.5, equal to 0.5, less than 0.5 but nonzero, or equal to 0.

Based on the four pieces of information provided by the rounding preparation logic and on the two's-complement mantissa sum, the rounding control logic (labeled 74 in the figure) determines the least significant bit of the mantissa sum and whether 1 is to be added at that bit.
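The decision made by the rounding control logic can be illustrated for the simplest case, a non-negative magnitude rounded to nearest with ties to even (the default mode of IEEE Std 754-1985). This sketch deliberately does not model the merging of the rounding increment with the add-1 of the two's-complement conversion described above; the function names and category strings are hypothetical.

```python
def round_increment(category, lsb):
    """Return 1 if a unit must be added at the least significant bit, else 0.

    `category` is one of the four cases reported by the rounding preparation
    logic: "gt_half", "eq_half", "lt_half" or "zero".
    """
    if category == "gt_half":
        return 1
    if category == "eq_half":
        return lsb & 1               # tie: round to the even neighbour
    return 0                         # "lt_half" and "zero" truncate


def round_nearest_even(kept, category):
    """Apply the increment to the kept magnitude (plays the role of selector 76)."""
    return kept + round_increment(category, kept & 1)
```

For example, a kept value of 5 with an exact half shifted out rounds up to 6, while a kept value of 4 in the same situation stays at 4.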

In a specific embodiment, for a double-precision instruction the input of the rounding adder is the 55-bit mantissa sum (the 56-bit result of the mantissa adder with the sign bit removed); for a single-precision instruction it is the 26-bit mantissa sum (the 27-bit result of the mantissa adder with the sign bit removed); for a SIMD instruction it is a 2 × 26 = 52-bit mantissa sum. Single-precision and SIMD instructions can therefore fully reuse the 55-bit rounding adder. One reasonable bit allocation is: a single-precision instruction occupies the low 26 bits of the rounding adder; the SIMD low lane occupies the low 26 bits and the SIMD high lane the high 26 bits, with three zero bits in between. Other reasonable bit allocation schemes can also achieve the reuse.

In a specific embodiment, the rounded mantissa sum is stored in pipeline register E2/E3 for use by the next stage. The mantissa adder circuit shown in Fig. 6 and the mantissa-sum rounding circuit shown in Fig. 7 together form the third pipeline stage of this device.

Fig. 8 shows the exponent and mantissa normalization adjustment circuit, characterized in that the two exponent normalization adders of the SIMD instruction (labeled 86 and 87 in the figure) are implemented by reusing the exponent normalization adders of the double-precision instruction with a small increase in bit width, and in that the shifter for SIMD mantissa-sum normalization reuses the mantissa-sum normalization shifter (labeled 88 in the figure) of the double-precision instruction. Another feature of this circuit is that leading-zero prediction logic in the previous stage (E2) obtains the number of bits preceding the first 1 of the mantissa sum, called the offset (bias, labeled 83 in the figure), which is then calibrated in the current stage (E3), so that the result becomes available sooner.

In a specific embodiment, for an effective addition the mantissa sum has only two cases: a one-bit carry, or neither carry nor borrow. The data selector (labeled 85 in the figure) therefore gates zero, indicating that the first 1 of the mantissa sum has no offset. Because the input of the leading-zero prediction logic is the mantissa sum before rounding, and the rounding operation may produce a carry, the exponent must be incremented by 1 if rounding caused a carry: the data selector (labeled 89 in the figure) then selects the result of adder 87 as the exponent of the final result. If rounding caused no carry, the exponent needs no calibration, and data selector 89 selects the result of adder 86 as the exponent of the final result. Whether rounding produced a carry is stored in pipeline register E2/E3, and this signal controls data selector 89. Note that if the input of the leading-zero prediction logic were the mantissa sum after rounding, no calibration would be needed, but timing would suffer considerably; the present invention covers both forms of leading-zero prediction logic.

For an effective subtraction, the mantissa sum has three cases: neither carry nor borrow, a one-bit borrow, or a multi-bit borrow. In the third case the result is necessarily exact, and the rounding operation does not affect it. For an effective subtraction, the offset obtained by leading-zero prediction must be subtracted from the exponent. If rounding caused a carry, the exponent must additionally be incremented by 1 as calibration, and data selector 89 selects the result of adder 87 as the exponent of the final result; if rounding caused no carry, the exponent needs no calibration, and data selector 89 selects the result of adder 86 as the exponent of the final result.
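The exponent path described in the two paragraphs above can be summarised in a small model. It is a sketch under assumptions: the offset is computed here by an exact leading-zero count rather than by prediction, the one-bit carry case of an effective addition is not modelled, and the function names are hypothetical.

```python
def leading_zero_offset(mant_sum, width):
    """Number of zero bits ahead of the first 1 in a `width`-bit mantissa sum."""
    return width - mant_sum.bit_length()


def normalize_exponent(exp, offset, round_carry, effective_subtract):
    """Select the final exponent as data selectors 85 and 89 are described to do."""
    mux_85 = offset if effective_subtract else 0   # an effective addition gates zero
    adder_86 = exp - mux_85                        # uncalibrated exponent
    adder_87 = exp - mux_85 + 1                    # exponent calibrated for a rounding carry
    return adder_87 if round_carry else adder_86   # data selector 89
```

Computing both candidates in parallel and selecting late is what lets the rounding-carry signal arrive from pipeline register E2/E3 without lengthening the exponent path.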

In a specific embodiment, the exponent normalization adjustment of a double-precision instruction needs two 11-bit adders, that of a single-precision instruction needs two 8-bit adders, and that of a SIMD instruction needs two 16-bit adders; one feasible scheme is therefore to implement adders 86 and 87 in the figure as two 16-bit adders. One feasible bit allocation is: the exponent normalization adjustment of a double-precision instruction occupies the low 11 bits of adders 86 and 87; that of a single-precision instruction occupies the low 8 bits of adders 86 and 87; that of the SIMD low lane occupies the low 8 bits of adders 86 and 87, and that of the SIMD high lane occupies the high 8 bits of adders 86 and 87. In this way, the exponent normalization adjustment of SIMD instructions is realized by adding only a small amount of hardware to what single-precision and double-precision execution already requires. Other reasonable bit allocation schemes can also achieve the reuse.

In a specific embodiment, the input of the normalization shifter (labeled 88 in the figure) is the output of the rounding operation of the previous stage (E2), which is held in pipeline register E2/E3. The mantissa normalization shift of a double-precision instruction therefore needs one 55-bit shifter; that of a single-precision instruction needs only one 26-bit shifter; that of a SIMD instruction needs two 26-bit shifters. The 55-bit normalization shifter (labeled 88 in the figure) is composed of two small normalization shifters: a 29-bit high-lane shifter and a 26-bit low-lane shifter. When the current instruction is double precision, the two small normalization shifters combine into one large shifter; when it is single precision, only the low-lane small normalization shifter is used; when it is SIMD, the low-lane data of the SIMD instruction uses the low-lane small normalization shifter and the high-lane data uses the high-lane small normalization shifter. The advantage of this implementation is that it both saves hardware and favors low-power design. Other reasonable bit allocation schemes can also achieve the reuse.

Embodiment 2

With reference to Fig. 9, the execution device of this embodiment further comprises a sign-bit adjustment circuit, which inverts the sign bit of a floating-point operand or sets it to 0.

The SIMD logic instruction includes the floating-point absolute-value SIMD instruction or the floating-point negation SIMD instruction.

In terms of instruction type, the floating-point absolute-value SIMD instruction and the floating-point negation SIMD instruction belong to the floating-point add/subtract SIMD instructions. By fixing the second source operand of the floating-point add/subtract SIMD instruction to zero, these two instructions can fully reuse the execution circuit of the floating-point add/subtract SIMD instruction. For example, the absolute value of a floating-point number A is |A| = |A + 0|, and its negation is -A = -(A + 0). The only circuit that must be added is the sign-bit adjustment circuit. Fig. 9 shows the sign-bit adjustment circuits of the double-precision floating-point negation/absolute-value instructions and of the SIMD floating-point negation/absolute-value instructions. The sign-bit adjustment circuit of the SIMD negation/absolute-value instructions is characterized in that it reuses the sign-bit adjustment circuit of the double-precision negation/absolute-value instructions and realizes its function by adding a small amount of logic. The double-precision negation instruction only needs to invert the most significant bit of the double-precision floating-point number (the sign bit, bit 63), and the double-precision absolute-value instruction only needs to set that bit to 0 (a sign bit of 0 denotes a positive number and a sign bit of 1 a negative number). Likewise, the single-precision negation/absolute-value instructions can reuse the sign-bit adjustment circuit of the double-precision instructions by placing the single-precision floating-point number in the high 32 bits of the double-precision floating-point register. For the SIMD negation/absolute-value instructions, the high-lane data can, like a single-precision instruction, fully reuse the sign-bit adjustment circuit of the double-precision instructions; the low-lane data requires additional logic to adjust its sign bit.

In a specific embodiment, the sign-bit adjustment circuit adjusts the sign bits according to whether the instruction is a negation or an absolute-value instruction and whether its type is SIMD. Whatever the instruction type (single precision, double precision or SIMD), the most significant bit of the double-precision floating-point register must be adjusted, which is the task of data selector 92: for a negation instruction, data selector 92 selects the inverted sign bit; for an absolute-value instruction, it selects 0 as the sign bit. When the instruction type is SIMD, the sign bit of the low-lane data must be adjusted as well. Data selector 93 performs a function similar to that of data selector 92: for a negation instruction it selects the inverted sign bit, and for an absolute-value instruction it selects 0 as the sign bit. On this basis, when the instruction type is SIMD, data selector 94 selects the adjusted sign bit as the sign bit of the low-lane data; otherwise data selector 94 writes the original bit to the pipeline register.
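The behaviour of data selectors 92, 93 and 94 can be sketched as below. The position of the low-lane sign bit is an assumption: the model places the two single-precision lanes in bits 63..32 and 31..0 of the 64-bit register, so the low-lane sign is taken to be bit 31. The function name and the `"neg"`/`"abs"` strings are hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN_HI = 1 << 63                    # sign bit of the register (all instruction types)
SIGN_LO = 1 << 31                    # assumed sign bit of the SIMD low lane


def adjust_sign(reg, op, is_simd):
    """Return the 64-bit register value after sign-bit adjustment.

    `op` is "neg" (invert the sign) or "abs" (force the sign to 0).
    """
    def adjusted(bit):
        return bit ^ 1 if op == "neg" else 0

    hi = adjusted((reg >> 63) & 1)               # data selector 92: always applied
    lo_orig = (reg >> 31) & 1
    lo_adj = adjusted(lo_orig)                   # data selector 93
    lo = lo_adj if is_simd else lo_orig          # data selector 94: SIMD only
    cleared = reg & MASK64 & ~(SIGN_HI | SIGN_LO)
    return cleared | (hi << 63) | (lo << 31)
```

For a non-SIMD instruction bit 31 is ordinary data and passes through unchanged, which is why the extra selector 94 is needed.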

In a specific embodiment, the adjusted sign bits are stored in pipeline register E1/E2 for use by the E2 pipeline stage.

It should be understood that this embodiment defines the hardware-resource-multiplexing circuit structure described above in terms of the output of each circuit and as defined by the claims. It should be noted that many different bit allocation strategies can yield identical or partly identical results, and that the pipeline can be partitioned in many ways depending on the integrated-circuit process and the target clock frequency. The present invention covers all such variations of the circuit structure of the resource-multiplexing floating-point SIMD instruction execution device.

Claims

1. An execution device for resource-multiplexing floating-point SIMD instructions, the execution device comprising:

a mantissa alignment shift circuit, for selecting the mantissa of the smaller operand and alignment-shifting it so that the exponents of the two operands of the floating-point addition or subtraction are equal, and for providing the input data of the mantissa adder circuit, the circuit comprising the mantissa alignment shifter for double-precision floating-point numbers, the mantissa alignment shifter comprising a high-lane small alignment shifter and a low-lane small alignment shifter, single-precision instructions reusing the low-lane small alignment shifter;

characterized in that the execution device further comprises a SIMD logic instruction;

in the exponent subtraction circuit, the exponent subtraction of the low-lane operand of the SIMD logic instruction reuses the low lane of the exponent subtraction adder for double-precision floating-point numbers, and the exponent subtraction of the high-lane operand of the SIMD logic instruction reuses the high lane of the exponent subtraction adder for double-precision floating-point numbers;

in the mantissa alignment shift circuit, the mantissa alignment shift of the low-lane operand of the SIMD logic instruction reuses the low-lane small alignment shifter of the mantissa alignment shifter for double-precision floating-point numbers, and the mantissa alignment shift of the high-lane operand of the SIMD logic instruction reuses the high-lane small alignment shifter of the mantissa alignment shifter for double-precision floating-point numbers;

in the result encapsulation circuit, the mantissa-sum normalization adjustment of the low-lane operand of the SIMD logic instruction reuses the low-lane small normalization shifter, and the mantissa-sum normalization adjustment of the high-lane operand of the SIMD logic instruction reuses the high-lane small normalization shifter; the exponent normalization adjustment of the low-lane operand of the SIMD logic instruction reuses the low-lane normalization adder, and the exponent normalization adjustment of the high-lane operand of the SIMD logic instruction reuses the high-lane normalization adder.

2. The execution device for resource-multiplexing floating-point SIMD instructions according to claim 1, characterized in that the execution device further comprises a sign-bit adjustment circuit for inverting the sign bit of a floating-point operand or setting it to 0.

3. The execution device for resource-multiplexing floating-point SIMD instructions according to claim 1, characterized in that the SIMD logic instruction includes a floating-point addition SIMD instruction or a floating-point subtraction SIMD instruction.

4. The execution device for resource-multiplexing floating-point SIMD instructions according to claim 2, characterized in that the SIMD logic instruction includes a floating-point absolute-value SIMD instruction or a floating-point negation SIMD instruction.

5. The execution device for resource-multiplexing floating-point SIMD instructions according to any one of claims 1 to 4, characterized in that the execution device further comprises a leading-zero prediction logic circuit for computing the offset at which the first 1 of the mantissa sum appears.

6. The execution device for resource-multiplexing floating-point SIMD instructions according to any one of claims 1 to 4, characterized in that double-precision instructions reuse the low lane and part of the high lane of the exponent subtraction adder, and double-precision instructions reuse the low-lane exponent normalization adder and part of the high-lane exponent normalization adder.