CN105335127A - Scalar operation unit structure supporting floating-point division method in GPDSP - Google Patents
- Publication number: CN105335127A
- Application number: CN201510718454.7A
- Authority: CN (China)
- Prior art keywords: floating point, instruction, scalar, precision
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Abstract
The invention discloses a scalar operation unit structure supporting floating-point division in a GPDSP. The structure comprises a first component SMAC1, a second component SMAC2 and a third component SIEU, which serve as scalar operation components and support basic scalar operations; each scalar operation component corresponds to one scalar instruction in a VLIW execute packet. The structure has few instruction execution cycles, low latency, a simple design and good feasibility.
Description
Technical field
The present invention relates mainly to the field of microprocessors, and in particular to a scalar operation unit structure that supports floating-point division and is suited to high-performance general-purpose DSP (GPDSP) chips.
Background technology
With the rapid growth of digital services driven by the internet, mobile communications, consumer electronics and multimedia technology, more powerful digital signal processors are needed to handle huge data workloads, for example high-definition 2D or 3D digital image processing, radar signal processing, autonomous navigation information processing and mobile communications. These algorithms are all compute-intensive and involve large numbers of floating-point, fixed-point, logic and complex-number basic operations as well as division. Division matters in particular: the performance of single-precision or double-precision floating-point division has a considerable effect on overall processor performance and becomes the bottleneck in some applications.
At present there is no high-performance general-purpose DSP (GPDSP) that directly supports floating-point division instructions. For example, TI's general floating-point DSP series cannot execute a floating-point division instruction directly; the hardware obtains an approximate reciprocal by table lookup, and a subroutine then carries out the division by Newton iteration. This approach occupies little area, but the iterative method cannot produce a floating-point division result that conforms to the IEEE-754 standard, and the iteration takes longer than a direct hardware implementation.
Because hardware division algorithms are complex, the resulting design is complicated and occupies a large area, so a division module is generally not placed directly in the highly parallel vector unit. A scalar operation unit design that supports floating-point division is therefore of great significance.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the invention provides a scalar operation unit structure supporting floating-point division in a GPDSP that has few instruction execution cycles, low latency, a simple structure and good feasibility.
To solve the above technical problem, the present invention adopts the following technical solution:
A scalar operation unit structure supporting floating-point division in a GPDSP comprises a first component SMAC1, a second component SMAC2 and a third component SIEU, which serve as scalar operation components and support basic scalar operations; each scalar operation component corresponds to one scalar instruction in a VLIW execute packet.
As a further improvement of the present invention: the structure also comprises a scalar register file for reading and writing back data. When a scalar instruction dispatched by the dispatch unit is received, it is decoded to determine which scalar operation component it belongs to, and the corresponding source operand addresses and read request are sent to the scalar register file at the same time. Once the instruction-valid signal reaches the scalar operation component, the component takes the data read from the scalar register file, begins execution, and finally writes the result back to the scalar register file.
As a further improvement of the present invention: the first component SMAC1 and the second component SMAC2 are homogeneous MAC operation components. Each MAC operation component comprises a floating-point multiply-add unit FMAC, a fixed-point multiply-add unit IMAC, a floating-point arithmetic logic unit FALU and a floating-point division unit FDIV. These functional units are independent units that share the same data path; in any given cycle only one functional unit can execute a valid instruction, and after execution the result passes through the final-stage selection logic and is output to the corresponding destination address.
As a further improvement of the present invention: the floating-point multiply-add unit FMAC handles complex multi-cycle floating-point operations and adopts a dynamic pipeline structure. One instruction can be issued per cycle, and within the same clock cycle each pipeline stage can perform a different operation.
As a further improvement of the present invention: the floating-point multiply-add unit FMAC adopts an FMAC structure in which the double-precision alignment shift is separated from the single-precision alignment shift. It comprises an operand preparation module R, a mantissa multiplication module X, a double-precision multiply-add path Y, a single-precision multiply-add path Z, and a normalization module S shared by the single-precision and double-precision paths. According to the instruction, the operand preparation module R separates the sign, exponent and mantissa of single-precision and double-precision floating-point operands and checks the input operands for exceptions, following the IEEE-754 standard. The mantissa multiplication module X computes the single-precision multiplication result mantissas for all instructions. The double-precision multiply-add path Y performs the exponent difference calculation for double-precision operations, the 161-bit alignment shift of the double-precision operand C, and the final-stage CSA 4:2 partial-product compression of the double-precision result mantissa. The single-precision multiply-add path Z performs the exponent difference calculation, the mantissa swap and the post-swap alignment for SIMD multiply-add, SIMD multiply-subtract, SIMD multiplication, dot-product and complex multiplication operations. The shared normalization module S performs the result mantissa calculation after the alignment shift, the normalization, and the exponent correction.
As a further improvement of the present invention: the fixed-point multiply-add unit IMAC performs fixed-point multiply-accumulate. When implementing fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands fed to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand. When executing fixed-point multiply-add and multiply-subtract instructions, the two multiplier operands are 32-bit signed or unsigned operands, the third operand is a 64-bit signed or unsigned operand, and the result is a 64-bit signed or unsigned destination operand.
As a further improvement of the present invention: the floating-point arithmetic logic unit FALU comprises a floating-point FALU short-cycle instruction module, a floating-point ALU conversion instruction module and a floating-point ALU add/subtract instruction module. The short-cycle instruction module contains all single-cycle floating-point arithmetic and logic instructions: single/double-precision greater-than, less-than and equality comparisons; instructions that extract the exponent, mantissa and absolute value of single/double-precision values; instructions that compute the single/double-precision reciprocal and reciprocal square root; and the instruction that converts a single-precision floating-point number to double precision.
As a further improvement of the present invention: the third component SIEU comprises a bit processing unit BP and a fixed-point arithmetic logic unit IALU. Both are independent units that share the same data path, and in any given cycle only one functional unit can execute a valid instruction. After execution, the result passes through the final-stage selection logic and is output to the corresponding destination address. In addition, depending on whether saturation occurs while the instruction executes, a flag signal is set to record that condition for the instruction.
As a further improvement of the present invention: the BP unit comprises three functional units, namely a 64-bit shifter unit (shifter), a bit processing unit (Bitp) and a pack/unpack unit (PK). All three functional units receive the decoded signals from the decode stage and the source operands from the registers and begin computing immediately; the final output is chosen by a selection signal from among the shift, pack/unpack and bit-processing results. If the instruction is a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register together with the result.
Compared with the prior art, the advantages of the present invention are:
1. The scalar operation unit structure supporting floating-point division in a GPDSP is built on a hardware implementation of floating-point division instructions based on the SRT-8 algorithm, and has few instruction execution cycles, low latency, a simple structure and good feasibility. The SPE structure as a whole also supports logic reuse, which better serves design portability and area control.
2. The scalar operation unit structure is a hybrid operation unit that implements 64-bit fixed-point, 32-bit fixed-point, double-precision floating-point and single-precision floating-point operations, giving it a complete function set.
3. The scalar operation unit structure allows three pipelines to execute in parallel and is well suited to implementation in a processor. The floating-point part supports multiply-add, multiplication and addition as well as complex-number and dot-product operations, so it can meet the requirements of a wide range of applications.
4. The scalar operation unit structure reuses its key component, the 64*64 multiplier, so fixed-point and floating-point multiplication share the same hardware with little area overhead. The invention supports a divider design based on the SRT-8 algorithm and can implement 64-bit signed/unsigned fixed-point integer division, 32-bit signed/unsigned fixed-point integer division, double-precision floating-point division and single-precision floating-point division.
Brief description of the drawings
Fig. 1 is a schematic diagram of the position of the scalar operation unit (SPE) of the present invention within the processor.
Fig. 2 is a schematic diagram of the topology of the scalar operation unit (SPE) of the present invention.
Fig. 3 is a schematic diagram of the data path structure of the scalar operation unit (SPE) of the present invention.
Fig. 4 is a schematic diagram of the topology of the SMAC component in a specific application example.
Fig. 5 is a structural diagram of the FMAC subunit of the SMAC component in a specific application example.
Fig. 6 is a structural diagram of the IMAC subunit of the SMAC component in a specific application example.
Fig. 7 is a structural diagram of the FALU subunit of the SMAC component in a specific application example.
Fig. 8 is a structural diagram of the FDIV subunit of the SMAC component in a specific application example.
Fig. 9 is a structural diagram of the reused 64*64 multiplier of the SMAC component in a specific application example.
Fig. 10 is a schematic diagram of the topology of the SIEU component in a specific application example.
Fig. 11 is a structural diagram of the BP subunit of the SIEU component in a specific application example.
Fig. 12 is a structural diagram of the IALU subunit of the SIEU component in a specific application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 shows the position of the scalar operation unit (SPE) of the present invention within the processor. The SPE is located in the scalar processing unit SC of the processor. It receives the scalar operation instructions dispatched by the dispatch unit in the instruction flow control unit, decodes them, and sends them to the corresponding functional unit in the SPE for execution. The scalar processing unit SC also contains a scalar data memory access unit, which implements the memory-access pipeline stages such as scalar memory-access instruction decoding, address calculation and data write-back, and also provides data support for the SPE. The SPE in turn can provide address processing, data reads and related operations to the scalar memory access unit.
Fig. 2 shows the topology of the scalar operation unit SPE of the present invention. The SPE integrates three operation components, namely the first component SMAC1, the second component SMAC2 and the third component SIEU, which support basic scalar operations. Each scalar operation component corresponds to one scalar instruction in a VLIW execute packet, so the SPE contains three pipelines that can execute in parallel. The SPE also contains a scalar register file for reading and writing back data. When the SPE receives a scalar instruction dispatched by the dispatch unit, it decodes the instruction to determine which operation component it belongs to and at the same time sends the corresponding source operand addresses and read request to the scalar register file. Once the instruction-valid signal reaches the operation component, the component takes the data read from the scalar register file, begins execution, and finally writes the result back to the scalar register file. The processor's scalar register file is located in the SPE and provides independent read and write ports for the three functional components SMAC1, SMAC2 and SIEU, so that each component can obtain the number of operands its instructions require: SMAC1 and SMAC2 each have 3 read ports and 1 write port, and SIEU has 2 read ports and 1 write port.
As shown in Fig. 3 and Fig. 4, the first component SMAC1 and the second component SMAC2 are homogeneous MAC operation components. Each MAC operation component contains four independent functional units: the floating-point multiply-add unit FMAC, the fixed-point multiply-add unit IMAC, the floating-point arithmetic logic unit FALU and the floating-point division unit FDIV. FMAC and IMAC share one 64*64 multiplier, which reduces the area of the whole processor. The four functional units are independent units that share the same data path; in any given cycle only one of them can execute a valid instruction, and after execution the result passes through the final-stage selection logic and is output to the corresponding destination address. In other words, they cannot start execution or write back in the same cycle, but parallelism can be achieved through software pipelining. Their operands come mainly from the scalar register file and can also come from an immediate Imm. FALU has only two-operand instructions, while the other three units all support three-operand instructions. The third component SIEU comprises the bit processing unit BP and the fixed-point arithmetic logic unit IALU. Both are independent units that share the same data path; they cannot start execution or write back in the same cycle, but parallelism can be achieved through software pipelining. Their operands come mainly from the scalar register file and can also come from an immediate Imm, the scalar/vector shared register SVR, or the vector unit local control register VULCR; both units have only two-operand instructions.
As shown in Fig. 5, the floating-point multiply-add unit FMAC is the functional component of the processor's operation unit that handles complex multi-cycle floating-point operations. It implements four classes of floating-point instructions: multiply instructions, multiply-add instructions, multiply-then-add instructions and add instructions. It adopts a dynamic pipeline structure: one instruction can be issued per cycle, and within the same clock cycle each pipeline stage can perform a different operation.
In this embodiment, the floating-point multiply-add unit FMAC adopts an FMAC structure in which the double-precision alignment shift is separated from the single-precision alignment shift, that is, a split-path FMAC structure. The aim is to simplify the hardware algorithm and to reduce the wide data registers between pipeline stages. The overall structure consists of five parts: the operand preparation module R, the mantissa multiplication module X, the double-precision multiply-add path Y, the single-precision multiply-add path Z, and the normalization module S shared by the single-precision and double-precision paths. According to the instruction, the operand preparation module R separates the sign, exponent and mantissa of single-precision and double-precision floating-point operands and checks the input operands for exceptions, following the IEEE-754 standard. The mantissa multiplication module X computes the single-precision multiplication result mantissas for all instructions. The double-precision path Y performs the exponent difference calculation for double-precision operations, the 161-bit alignment shift of the double-precision operand C, and the final-stage CSA 4:2 partial-product compression of the double-precision result mantissa. The single-precision path Z performs the exponent difference calculation, the mantissa swap and the post-swap alignment for SIMD multiply-add, SIMD multiply-subtract, SIMD multiplication, dot-product and complex multiplication operations. Module S performs the result mantissa calculation after the alignment shift, the normalization, and the exponent correction.
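The field separation and exception check attributed to module R can be illustrated in software. The following is a minimal sketch, assuming standard IEEE-754 binary32 and binary64 layouts; the function names and the returned class labels are hypothetical and are not taken from the patent.

```python
import struct

def split_fields(bits, exp_bits, frac_bits):
    """Split a raw IEEE-754 bit pattern into sign, biased exponent, fraction and class."""
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    # Exception check: an all-ones exponent means infinity or NaN,
    # an all-zeros exponent means zero or a subnormal number.
    if exponent == (1 << exp_bits) - 1:
        kind = "nan" if fraction else "inf"
    elif exponent == 0:
        kind = "zero" if fraction == 0 else "subnormal"
    else:
        kind = "normal"
    return sign, exponent, fraction, kind

def unpack_double(x):
    """Double precision: 1 sign bit, 11 exponent bits, 52 fraction bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return split_fields(bits, 11, 52)

def unpack_single(x):
    """Single precision: 1 sign bit, 8 exponent bits, 23 fraction bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return split_fields(bits, 8, 23)
```

For instance, `unpack_double(-2.5)` yields sign 1, biased exponent 1024 and a fraction with only bit 50 set, since 2.5 is 1.25 times 2.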
As shown in Fig. 6, the fixed-point multiply-add unit IMAC is the functional unit of the operation unit that performs fixed-point multiply-accumulate. It can execute fixed-point addition and subtraction, multiplication, multiply-add, multiply-subtract and MOV-class operations. Fixed-point algorithms and control code contain a large amount of fixed-point addition, subtraction and MOV operations; these are integrated into the fixed-point MAC unit to increase the number of VLIW instruction slots available for fixed-point add/subtract and MOV instructions and so improve throughput. When implementing fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands fed to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand. For fixed-point multiply-add and multiply-subtract instructions, the two multiplier operands are 32-bit signed or unsigned operands, the third operand is a 64-bit signed or unsigned operand, and the result is a 64-bit signed or unsigned destination operand. Fixed-point dot-product and complex-number operations are also supported.
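The operand widths given for the fixed-point multiply-add can be made concrete with a small model. This is only a sketch: the function names are hypothetical, and the wrap-around behaviour on 64-bit overflow is an assumption, since the text does not say whether the result wraps or saturates.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fixed_mul_add(a, b, c, signed=True):
    """Compute a * b + c with 32-bit a and b and a 64-bit c, giving a 64-bit result.

    Assumption: overflow wraps modulo 2**64 (not stated in the source).
    """
    if signed:
        a, b, c = to_signed(a, 32), to_signed(b, 32), to_signed(c, 64)
        return to_signed(a * b + c, 64)
    a, b, c = a & MASK32, b & MASK32, c & MASK64
    return (a * b + c) & MASK64
```

With `signed=True` the bit pattern `0xFFFFFFFF` is read as -1, so `fixed_mul_add(0xFFFFFFFF, 2, 1)` gives -1, while the unsigned form of the same call gives 8589934591.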
As shown in Fig. 7, the floating-point arithmetic logic unit FALU is divided, according to function and instruction latency, into three execution submodules: the floating-point FALU short-cycle instruction module, the floating-point ALU conversion instruction module and the floating-point ALU add/subtract instruction module. The short-cycle instruction module contains all single-cycle floating-point arithmetic and logic instructions: single/double-precision greater-than, less-than and equality comparisons; instructions that extract the exponent, mantissa and absolute value of single/double-precision values; instructions that compute the single/double-precision reciprocal and reciprocal square root; and the instruction that converts a single-precision floating-point number to double precision. The conversion instruction module implements the 2-cycle type conversions between floating point and fixed point and between single and double precision: single/double-precision floating point to integer, single/double-precision floating point to integer with truncation, signed or unsigned integer to single/double-precision floating point, and double precision to single precision. The add/subtract instruction module implements the 4-cycle single/double-precision floating-point addition and subtraction instructions.
Fig. 8 shows the floating-point division unit FDIV of this embodiment. For two operands A and B in IEEE-754 floating-point format, the division between them can be broken into the following steps:
S1. Detect whether the operands are exceptional data, and set the result accordingly.
S2. Compute the sign bit of the result: XOR the two sign bits.
S3. Compute the exponent of the result: subtract the two exponents.
S4. Divide the mantissas: zeros are appended to the low-order end of the divisor mantissa so that it grows to twice its original width, which extends the mantissa width and yields a result with the number of bits the precision requires.
S5. Normalize the result: after the mantissa division a left shift may be needed, with the exponent reduced accordingly, and the mantissa result is adjusted according to the rounding mode.
S6. Detect exceptions: IEEE-754 defines two exceptions for floating-point division, overflow and underflow. If the result exponent exceeds the largest exponent the precision allows, an overflow exception is returned; if the result exponent is smaller than the smallest exponent the precision defines, an underflow exception is returned.
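Steps S1 to S6 can be traced in a short software model. The sketch below is not the patented datapath: it assumes binary64 operands, accepts only normal numbers in S1, extends the dividend with low-order zeros before the integer division (a choice of this sketch), uses round-to-nearest-even as the only rounding mode, and returns infinity or zero with a status string instead of raising hardware exceptions. All names are hypothetical.

```python
import struct

BIAS = 1023                 # binary64 exponent bias
EXP_MAX = 0x7FF             # all-ones exponent field (infinity or NaN)
FRAC_MASK = (1 << 52) - 1   # 52 stored fraction bits

def to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def fdiv(a, b):
    """Divide two normal binary64 values; return (quotient, status)."""
    abits, bbits = to_bits(a), to_bits(b)
    ea, eb = (abits >> 52) & EXP_MAX, (bbits >> 52) & EXP_MAX
    # S1: detect exceptional operands (this sketch simply rejects them).
    if ea in (0, EXP_MAX) or eb in (0, EXP_MAX):
        raise ValueError("sketch handles normal operands only")
    # S2: the result sign is the XOR of the two sign bits.
    sign = (abits >> 63) ^ (bbits >> 63)
    # S3: subtract the biased exponents and add the bias back.
    exp = ea - eb + BIAS
    # S4: divide the mantissas (hidden bit restored) with extra low-order zeros.
    ma = (abits & FRAC_MASK) | (1 << 52)
    mb = (bbits & FRAC_MASK) | (1 << 52)
    shift = 55
    if ma < mb:
        # S5, part 1: the quotient would be below 1, so shift left one more
        # bit and reduce the exponent to keep a 56-bit normalized quotient.
        shift, exp = 56, exp - 1
    q, rem = divmod(ma << shift, mb)
    # S5, part 2: round to nearest even using 3 guard bits plus the remainder.
    mant, guard = q >> 3, q & 7
    if guard > 4 or (guard == 4 and (rem != 0 or mant & 1)):
        mant += 1
        if mant >> 53:          # rounding carried out of the mantissa
            mant, exp = mant >> 1, exp + 1
    # S6: exponent range check (gradual underflow is not modelled).
    if exp >= EXP_MAX:
        return from_bits((sign << 63) | (EXP_MAX << 52)), "overflow"
    if exp <= 0:
        return from_bits(sign << 63), "underflow"
    return from_bits((sign << 63) | (exp << 52) | (mant & FRAC_MASK)), "ok"
```

For normal operands whose quotient is also normal, the model agrees with the host's own correctly rounded division, which makes it easy to check against `a / b`.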
The structure of the floating-point division unit FDIV is designed around SIMD-structured floating-point division instructions based on the SRT-8 algorithm. The SRT-8 instruction is invoked in three forms, 01, 10 and 11; the selection signal shown in the figure chooses which form executes. The three forms perform, respectively, division iterations 1 to 6 for double precision (iterations 1 to 3 for SIMD dual single precision), iterations 7 to 12 for double precision (iterations 4 to 6 for SIMD dual single precision), and iterations 13 to 18 for double precision (iterations 7 to 9 for SIMD dual single precision). Finally, according to the remainder and quotient output by the SRT-8 instruction and the number of times the SRT-8 instruction was called, the quotient is normalized to the precision required by the double-precision division or the SIMD dual single-precision division.
Based on the SRT-8 algorithm, this structure uses hardware resource reuse and iteration splitting to implement both double-precision floating-point division and SIMD dual single-precision floating-point division on the same hardware.
The SPE structure of the present invention supports floating-point division. The scalar floating-point division component (SFDIV) is the functional unit in the SMAC component that performs scalar floating-point division. It implements four instructions: the double-precision floating-point division iteration instruction based on the SRT-8 division algorithm, the double-precision floating-point division normalization instruction, the SIMD dual single-precision floating-point division iteration instruction based on the SRT-8 division algorithm, and the SIMD dual single-precision floating-point division normalization instruction. In the GPDSP, instructions use a 40-bit encoding, and the SIMD floating-point division instruction set comprises two double-precision floating-point division instructions (FSRT8D and FNORMD) and two SIMD dual single-precision floating-point division instructions (FSRT8S32 and FNORMS32). The implemented instructions are described in Table 1:
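The split of the 18 double-precision iterations across three instruction calls can be pictured with a simplified radix-8 digit recurrence. This sketch is not a true SRT-8 implementation: real SRT dividers pick quotient digits from a redundant signed-digit set using a few leading bits of the partial remainder, whereas here each digit is selected exactly from 0 to 7. The function names and the 54-bit quotient width (18 iterations of 3 bits each) are assumptions of the sketch.

```python
RADIX_BITS = 3            # radix 8 retires 3 quotient bits per iteration
ITERATIONS_PER_CALL = 6   # double precision: 6 iterations per instruction call

def srt8_call(select, remainder, quotient, divisor):
    """Model one call of the iteration instruction; select is 0b01, 0b10 or 0b11."""
    assert select in (0b01, 0b10, 0b11)
    for _ in range(ITERATIONS_PER_CALL):
        remainder <<= RADIX_BITS                 # multiply partial remainder by 8
        digit = remainder // divisor             # exact digit in 0..7 (simplification)
        remainder -= digit * divisor
        quotient = (quotient << RADIX_BITS) | digit
    return remainder, quotient

def divide_mantissas(dividend, divisor):
    """Return (quotient, remainder) where quotient = floor(dividend * 2**54 / divisor)."""
    assert 0 <= dividend < divisor               # keeps every digit within 0..7
    remainder, quotient = dividend, 0
    for select in (0b01, 0b10, 0b11):            # iterations 1-6, 7-12, 13-18
        remainder, quotient = srt8_call(select, remainder, quotient, divisor)
    return quotient, remainder
```

The remainder and quotient carried between calls correspond to the values a following normalization step would consume for rounding.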
Table 1: Divider instruction types and functions
Instruction name | Cycles (beats) | Encoding width (bits) | Function | Scalar instruction |
---|---|---|---|---|
SFSRT8D | 7 | 40 | Double-precision floating-point division iteration | Yes |
SFNORMD | 2 | 40 | Double-precision floating-point division normalization | Yes |
SFSRT8S32 | 4 | 40 | SIMD dual single-precision floating-point division iteration | Yes |
SFNORMS32 | 2 | 40 | SIMD dual single-precision floating-point division normalization | Yes |
Figure 9 is a schematic diagram of the 64x64 multiplier architecture reused within the SMAC component of the present invention. Because a 64x64-bit multiply instruction occupies a very large area in the design, the multiplier in the SMAC component adopts a logic-module reuse scheme: the body of the shared fixed-point/floating-point module is four 32x32-bit multipliers. After data is input, operands are processed before the multiply begins: depending on the dispatched instruction, operand selection and bit extension are performed first, and the prepared operands are then fed to the four 32x32 multipliers for multiplication. The result is separated into a fixed-point result and a floating-point result and then, depending on the instruction, either written back to a register or passed to the next pipeline stage.
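As a rough illustration of how four 32x32 multipliers can cover a 64x64 product, the sketch below splits each operand into high and low halves and combines the four partial products. The source does not specify how the partial products are summed or how signed operands and bit extension are handled, so the unsigned treatment and all names here (`mul32x32`, `mul64x64`) are assumptions.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul32x32(a, b):
    """Stand-in for one 32x32-bit hardware multiplier (64-bit product)."""
    if not (0 <= a <= MASK32 and 0 <= b <= MASK32):
        raise ValueError("operands must fit in 32 bits")
    return a * b


def mul64x64(a, b):
    """Unsigned 64x64 -> 128-bit product built from four 32x32 products."""
    if not (0 <= a <= MASK64 and 0 <= b <= MASK64):
        raise ValueError("operands must fit in 64 bits")
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    ll = mul32x32(a_lo, b_lo)   # weight 2**0
    lh = mul32x32(a_lo, b_hi)   # weight 2**32
    hl = mul32x32(a_hi, b_lo)   # weight 2**32
    hh = mul32x32(a_hi, b_hi)   # weight 2**64
    return ll + ((lh + hl) << 32) + (hh << 64)


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    print(mul64x64(x, y) == x * y)
```

The same four multipliers can serve narrower instructions by feeding each one an independent pair of 32-bit operands, which is presumably what makes the reuse scheme attractive for area.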
Figure 10 is a schematic diagram of the topology of the third component SIEU in a specific application example. The third component SIEU comprises a bit processing unit BP and a fixed-point arithmetic logic unit IALU. Both are independent units sharing the same data path, and only one functional unit can execute a valid instruction in a given cycle. That is, the two cannot start execution or write back in the same cycle, although parallelism can be achieved through software pipelining. After execution, the result passes through the final-stage selection logic and is output to the corresponding destination address. In addition, depending on whether saturation occurs during execution, a set signal is generated to flag that condition. Operands come mainly from the scalar register file; they can also come from an immediate Imm, the scalar-vector shared register SVR, or the vector unit local control register VULCR, and the latter two are used only by two-operand instructions.
Figure 11 is a schematic diagram of the topology of the bit processing unit BP in a specific application example. The BP unit comprises three functional units: a 64-bit shifter unit Shifter, a bit processing unit Bitp, and a pack/unpack unit PK. The decode signals from the decode stage and the source operands from the registers are received by all three functional units, which begin computing immediately. The final output is chosen by a select signal from among the shift result, the pack/unpack result, and the bit processing result. For a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register at the same time as the result.
Figure 12 is a schematic diagram of the topology of the fixed-point arithmetic logic unit IALU in a specific application example. The IALU contains 8 submodules. Because add/subtract operations have the longest delay, the IALU design does not share the adder with comparison-class instructions and instead uses a separate adder structure, which is placed at the final stage of the selection logic. To reduce delay further, saturating add/subtract is split off and implemented by combining the add/subtract instruction, the saturation instruction, and the relevant control register.
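One way to read the separation of saturating add/subtract into an add step and a saturation step is sketched below. This is only an interpretation: the source does not describe the control register's contents or the instruction semantics, so the overflow indicator returned by `add64` and all function names are hypothetical.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def wrap64(value):
    """Reduce an integer to 64-bit two's-complement, as a plain adder would."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def add64(a, b):
    """Plain (wrapping) add. Also returns an overflow indicator that a
    hypothetical control register could hold: +1, -1, or 0."""
    exact = a + b
    wrapped = wrap64(exact)
    if exact == wrapped:
        return wrapped, 0
    return wrapped, (1 if exact > 0 else -1)


def saturate64(wrapped, overflow):
    """Separate saturation step: clamp using the recorded overflow direction."""
    if overflow > 0:
        return INT64_MAX
    if overflow < 0:
        return INT64_MIN
    return wrapped


def saturating_add64(a, b):
    """Saturating add composed from the two simpler steps."""
    return saturate64(*add64(a, b))


if __name__ == "__main__":
    print(saturating_add64(INT64_MAX, 1) == INT64_MAX)
    print(saturating_add64(1, 2))
```

Keeping the clamp out of the adder's critical path is the design point: the common wrapping add stays as fast as possible, and saturation costs extra only when it is requested.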
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to these embodiments; all technical solutions that fall under the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention should also be regarded as within its scope of protection.
Claims (10)
1. A scalar operation unit structure supporting floating-point division in a GPDSP, characterized in that it comprises a first component SMAC1, a second component SMAC2, and a third component SIEU serving as scalar operation components for supporting basic scalar operations; each scalar operation component corresponds to one scalar instruction in a VLIW execution packet.
2. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 1, characterized in that it further comprises a scalar register file for reading and writing back data; when a scalar instruction dispatched by the dispatch unit is received, it is decoded to determine which scalar operation component it belongs to, and the corresponding source operand addresses and read request are sent to the scalar register file at the same time; once the instruction-valid signal reaches the scalar operation component, the data read from the scalar register file is obtained and the operation begins; finally, the result is written back to the scalar register file.
3. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 1, characterized in that the first component SMAC1 and the second component SMAC2 are isomorphic MAC operation units; each MAC operation unit comprises a floating-point multiply-add unit FMAC, a fixed-point multiply-add unit IMAC, a floating-point arithmetic logic unit FALU, and a floating-point division unit FDIV; these functional units are independent units sharing the same data path, and only one functional unit can execute a valid instruction in a given cycle; after execution, the result passes through the final-stage selection logic and is output to the corresponding destination address.
4. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that the multipliers of the floating-point multiply-add unit FMAC and the fixed-point multiply-add unit IMAC adopt a logic-module reuse scheme, in which the body of the shared fixed-point/floating-point module is four 32x32-bit multipliers; after data is input, operands are processed before the multiply begins: depending on the dispatched instruction, operand selection and bit extension are performed first, and the prepared operands are fed to the four 32x32 multipliers for multiplication; the result is separated into a fixed-point result and a floating-point result and then, depending on the instruction, either written back to a register or passed to the next pipeline stage.
5. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that the floating-point multiply-add unit FMAC handles complex multi-cycle floating-point operations and adopts a dynamic pipeline structure; one instruction can be issued per cycle, and in the same clock cycle each pipeline stage can perform a different operation.
6. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 5, characterized in that the floating-point multiply-add unit FMAC adopts an FMAC structure in which the double-precision alignment shift is separated from the single-precision alignment shift, comprising: an operand preparation module R, a mantissa multiplier module X, a double-precision multiply-add path Y, a single-precision multiply-add path Z, and a normalization module S shared by the single-precision and double-precision paths; the operand preparation module R, according to the instruction and the IEEE-754 standard, separates the sign, exponent and mantissa of single-precision and double-precision floating-point operands and checks the input operands for exceptions; the mantissa multiplier module X computes the mantissa of the single-precision multiplication result for all instructions; the double-precision multiply-add path Y performs the exponent difference calculation for double-precision operations, the 161-bit alignment shift of the double-precision operand C, and the final-stage CSA 4:2 partial product compression of the double-precision result mantissa; the single-precision multiply-add path Z performs the exponent difference calculation, mantissa swap, and post-swap alignment for SIMD multiply-add, SIMD multiply-subtract, SIMD multiplication, dot product, and complex multiplication; the shared normalization module S performs the result mantissa calculation after the alignment shift, normalization, and exponent correction.
7. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that the fixed-point multiply-add unit IMAC performs fixed-point multiply-accumulate; in implementing the fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands input to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand; when executing fixed-point multiply-add/subtract instructions, the two multiplier operands are 32-bit signed/unsigned operands, the third operand is a 64-bit signed/unsigned operand, and the result is a 64-bit signed/unsigned destination operand.
8. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that the floating-point arithmetic logic unit FALU comprises a floating-point FALU short-cycle instruction module, a floating-point ALU conversion instruction module, and a floating-point ALU add/subtract instruction module; the floating-point FALU short-cycle instruction module contains all single-cycle floating-point arithmetic logic instructions, including single/double-precision greater-than, less-than and equality comparison instructions, instructions that extract the exponent, mantissa and absolute value of single/double-precision values, instructions that compute the single/double-precision reciprocal and reciprocal square root, and the instruction that converts a single-precision floating-point number to a double-precision floating-point number.
9. The scalar operation unit structure supporting floating-point division in a GPDSP according to any one of claims 1 to 8, characterized in that the third component SIEU comprises a bit processing unit BP and a fixed-point arithmetic logic unit IALU, both of which are independent units sharing the same data path, and only one functional unit can execute a valid instruction in a given cycle; after execution, the result passes through the final-stage selection logic and is output to the corresponding destination address; in addition, depending on whether saturation occurs during execution, a set signal is generated to flag that condition.
10. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 9, characterized in that the BP unit comprises three functional units: a 64-bit shifter unit Shifter, a bit processing unit Bitp, and a pack/unpack unit PK; the decode signals from the decode stage and the source operands from the registers are received by all three functional units, which begin computing immediately; the final output is chosen by a select signal from among the shift result, the pack/unpack result, and the bit processing result; for a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register at the same time as the result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510718454.7A CN105335127A (en) | 2015-10-29 | 2015-10-29 | Scalar operation unit structure supporting floating-point division method in GPDSP |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105335127A true CN105335127A (en) | 2016-02-17 |
Family
ID=55285703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510718454.7A Pending CN105335127A (en) | 2015-10-29 | 2015-10-29 | Scalar operation unit structure supporting floating-point division method in GPDSP |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105335127A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160217 |