CN105335127A - Scalar operation unit structure supporting floating-point division method in GPDSP - Google Patents


Publication number
CN105335127A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510718454.7A
Other languages
Chinese (zh)
Inventor
彭元喜
雷元武
彭浩
陈书明
郭阳
刘祥远
田甜
徐恩
胡封林
刘仲
孙永节
陈虎
刘胜
王耀华
吴虎成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201510718454.7A priority Critical patent/CN105335127A/en
Publication of CN105335127A publication Critical patent/CN105335127A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]

Abstract

The invention discloses a scalar operation unit structure supporting floating-point division in a GPDSP. The structure comprises a first component SMAC1, a second component SMAC2 and a third component SIEU, which serve as scalar operation components and support basic scalar operations; each scalar operation component corresponds to one scalar instruction in a VLIW execute packet. The structure needs few instruction execution cycles, has low latency, is structurally simple and is readily feasible.

Description

Scalar operation unit structure supporting floating-point division in a GPDSP
Technical field
The present invention relates mainly to the field of microprocessors, and in particular to an implementation structure for a scalar operation unit that supports floating-point division and is suitable for high-performance general-purpose DSP (GPDSP) chips.
Background
With the rapid growth of digital services driven by the internet, mobile communications, consumer electronics and multimedia technology, more powerful digital signal processors are needed to handle very large data workloads, for example high-definition 2D or 3D digital image processing, radar signal processing, autonomous navigation information processing and mobile communications. These algorithms are computation-intensive and involve large numbers of floating-point, fixed-point, logical and complex-number basic operations as well as division. Division matters in particular: the performance of single-precision or double-precision floating-point division has a considerable effect on overall processor performance and becomes the bottleneck in some applications.
At present there is no high-performance general-purpose DSP (GPDSP) that directly supports a floating-point divide instruction. TI's general floating-point DSP series, for example, cannot execute floating-point division directly: the hardware obtains an approximate reciprocal by table lookup, and a subroutine then completes the division by Newton iteration. This approach occupies little area, but the iterative method cannot deliver a floating-point division result that conforms to the IEEE-754 standard, and the iteration takes longer than a direct hardware implementation.
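For context, the prior-art scheme just described (a table-lookup seed refined by Newton iteration) can be sketched as follows. This is a minimal illustration, not TI's actual routine: the 16-entry seed table, the names `reciprocal_newton` and `divide_newton`, and the iteration count are all assumptions made for the example.

```python
# Hypothetical 16-entry seed table for divisors d in [1, 2):
# entry i approximates 1/d at the midpoint of the i-th sub-interval.
SEED_TABLE = [1.0 / (1.0 + (i + 0.5) / 16.0) for i in range(16)]

def reciprocal_newton(d, iterations=4):
    """Approximate 1/d for 1 <= d < 2 by table lookup plus Newton iteration."""
    if not 1.0 <= d < 2.0:
        raise ValueError("sketch only handles d in [1, 2)")
    x = SEED_TABLE[int((d - 1.0) * 16.0)]   # lookup: roughly 5 correct bits
    for _ in range(iterations):
        x = x * (2.0 - d * x)               # Newton step: the error is squared each time
    return x

def divide_newton(a, d):
    """a / d computed as a * (1/d); the final product is rounded separately."""
    return a * reciprocal_newton(d)
```

Because the quotient is formed as a product with an already rounded reciprocal, its last bit can differ from the correctly rounded quotient unless further correction steps are added, which is the limitation the background points out.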
Because hardware division algorithms are complex, the design is intricate and the area cost is high, a divider is generally not placed in the highly parallel vector units. A scalar operation unit design that supports floating-point division is therefore of considerable value.
Summary of the invention
The technical problem addressed by the present invention is as follows: in view of the technical problems in the prior art, the invention provides a scalar operation unit structure supporting floating-point division in a GPDSP that needs few instruction execution cycles, has low latency, is structurally simple and is readily feasible.
To solve the above technical problem, the present invention adopts the following technical solution:
A scalar operation unit structure supporting floating-point division in a GPDSP comprises a first component SMAC1, a second component SMAC2 and a third component SIEU, which serve as scalar operation components and support basic scalar operations. Each scalar operation component corresponds to one scalar instruction in a VLIW execute packet.
As a further improvement of the present invention: the structure also comprises a scalar register file for reading data and writing results back. When a scalar instruction issued by the dispatch component is received, it is decoded to determine which scalar operation component it belongs to, and the corresponding source operand addresses and read requests are sent to the scalar register file at the same time. Once the instruction-valid signal reaches the scalar operation component, the component takes the data read from the scalar register file, starts the operation, and finally writes the result back to the scalar register file.
As a further improvement of the present invention: the first component SMAC1 and the second component SMAC2 are isomorphic MAC arithmetic units. Each MAC arithmetic unit comprises a floating-point multiply-add unit FMAC, a fixed-point multiply-add unit IMAC, a floating-point arithmetic logic unit FALU and a floating-point division unit FDIV. These functional units are mutually independent and share the same data path; in any given cycle only one of them can execute a valid instruction, and after execution the result passes through the final-stage selection logic and is output to the corresponding destination address.
As a further improvement of the present invention: the floating-point multiply-add unit FMAC handles complex multi-cycle floating-point operations. It uses a dynamic pipeline structure: one instruction can be issued per cycle, and within the same clock cycle each pipeline stage can perform a different operation.
As a further improvement of the present invention: the floating-point multiply-add unit FMAC adopts a path-separated structure in which the double-precision alignment shift and the single-precision alignment shift are separate. It comprises an operand preparation module R, a mantissa multiplication module X, a double-precision multiply-add path Y, a single-precision multiply-add path Z, and a normalization module S shared by the single- and double-precision paths. According to the instruction, the operand preparation module R separates the sign, exponent and mantissa of single- and double-precision floating-point operands as defined by IEEE-754 and checks the input operands for exceptional values. The mantissa multiplication module X computes the single-precision multiplication-result mantissa for all instructions. The double-precision multiply-add path Y computes the exponent difference for double-precision operations, performs the 161-bit alignment shift of the double-precision operand C, and performs the final-stage CSA 4:2 partial-product compression for the double-precision result mantissa. The single-precision multiply-add path Z performs the exponent-difference calculation, the mantissa swap and the post-swap alignment for SIMD multiply-add, SIMD multiply-subtract, SIMD multiply, dot-product and complex-multiplication operations. The shared normalization module S computes the result mantissa after the alignment shift and performs normalization and exponent correction.
As a further improvement of the present invention: the fixed-point multiply-add unit IMAC performs fixed-point multiply-accumulate. When implementing the fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands fed to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand. When executing fixed-point multiply-add and multiply-subtract instructions, the two multiplier operands are 32-bit signed or unsigned operands, the third operand is a 64-bit signed or unsigned operand, and the result is a 64-bit signed or unsigned destination operand.
As a further improvement of the present invention: the floating-point arithmetic logic unit FALU comprises a floating-point FALU short-cycle instruction module, a floating-point ALU conversion instruction module and a floating-point ALU add/subtract instruction module. The short-cycle instruction module covers all single-cycle floating-point arithmetic and logic instructions: single/double-precision greater-than, less-than and equality comparisons; instructions that extract the exponent, mantissa and absolute value of single/double-precision values; instructions that compute single/double-precision reciprocals and reciprocal square roots; and the instruction that converts a single-precision floating-point number to double precision.
As a further improvement of the present invention: the third component SIEU comprises a bit processing unit BP and a fixed-point arithmetic logic unit IALU. The two are mutually independent units sharing the same data path, and in any given cycle only one of them can execute a valid instruction. After execution the result passes through the final-stage selection logic and is output to the corresponding destination address. In addition, depending on whether saturation occurs while the instruction executes, a set signal may also be generated to flag that condition.
As a further improvement of the present invention: the BP unit comprises three functional units, namely a 64-bit shifter unit (shifter), a bit processing unit (Bitp) and a pack/unpack unit (PK). The decode signals from the decode stage and the source operands from the registers are received by all three functional units, which start computing immediately; the final output is selected, according to a selection signal, from the shift result, the pack/unpack result and the bit-processing result. For a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register at the same time as the result.
Compared with the prior art, the advantages of the present invention are:
1. The scalar operation unit structure supporting floating-point division in a GPDSP implements the floating-point divide instructions in hardware based on the SRT-8 algorithm; it needs few instruction execution cycles, has low latency, is structurally simple and is readily feasible. The SPE structure as a whole also allows logic to be reused, which helps design portability and area control.
2. The scalar operation unit structure is a hybrid operation unit: it implements 64-bit fixed-point, 32-bit fixed-point, double-precision floating-point and single-precision floating-point operations, giving complete functional coverage.
3. The scalar operation unit structure allows three pipelines to execute in parallel and is well suited to implementation in a processor. The floating-point part supports multiply-add, multiplication and addition as well as complex-number and dot-product operations, and so meets the needs of a variety of applications.
4. The scalar operation unit structure reuses the key component, a 64*64 multiplier, so that fixed-point and floating-point multiplication share the same hardware with little area overhead. The invention supports a divider design based on the SRT-8 algorithm, implementing 64-bit signed/unsigned fixed-point integer division, 32-bit signed/unsigned fixed-point integer division, double-precision floating-point division and single-precision floating-point division instructions.
Brief description of the drawings
Fig. 1 is a schematic diagram of the position of the scalar operation unit (SPE) of the present invention within the processor.
Fig. 2 is a schematic diagram of the topology of the scalar operation unit (SPE) of the present invention.
Fig. 3 is a schematic diagram of the data path of the scalar operation unit (SPE) of the present invention.
Fig. 4 is a schematic diagram of the topology of the SMAC component in a specific application example.
Fig. 5 is a schematic diagram of the structure of the FMAC subunit of the SMAC component in a specific application example.
Fig. 6 is a schematic diagram of the structure of the IMAC subunit of the SMAC component in a specific application example.
Fig. 7 is a schematic diagram of the structure of the FALU subunit of the SMAC component in a specific application example.
Fig. 8 is a schematic diagram of the structure of the FDIV subunit of the SMAC component in a specific application example.
Fig. 9 is a schematic diagram of the structure of the reused 64*64 multiplier of the SMAC component in a specific application example.
Fig. 10 is a schematic diagram of the topology of the SIEU component in a specific application example.
Fig. 11 is a schematic diagram of the structure of the BP subunit of the SIEU component in a specific application example.
Fig. 12 is a schematic diagram of the structure of the IALU subunit of the SIEU component in a specific application example.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 shows the position of the scalar operation unit (SPE) of the present invention within the processor. The SPE sits in the scalar processing unit SC of the processor. It receives the scalar operation instructions issued by the dispatch component of the instruction flow control unit, decodes them and sends them to the corresponding functional unit in the SPE for execution. The scalar processing unit SC also contains a scalar data memory access unit, which controls the memory access pipeline stages such as decoding of scalar access instructions, address calculation and data write-back, and which also supplies data to the SPE; conversely, the SPE provides address processing, data reading and related operations to the scalar memory access unit.
Fig. 2 shows the topology of the scalar operation unit SPE of the present invention. The SPE integrates three arithmetic components, the first component SMAC1, the second component SMAC2 and the third component SIEU, which support basic scalar operations. Each scalar operation component corresponds to one scalar instruction in a VLIW execute packet, so the SPE contains three pipelines that can execute in parallel. The SPE also contains a scalar register file for reading data and writing results back. When the SPE receives a scalar instruction issued by the dispatch component, it decodes the instruction to determine which arithmetic component it belongs to and at the same time sends the corresponding source operand addresses and read requests to the scalar register file. Once the instruction-valid signal reaches the arithmetic component, the component takes the data read from the scalar register file, starts the operation, and finally writes the result back to the scalar register file. The scalar register file of the processor is located inside the SPE and provides independent read and write ports for the three functional components SMAC1, SMAC2 and SIEU, so that each has as many operands as its instructions require: SMAC1 and SMAC2 each have 3 read ports and 1 write port, and SIEU has 2 read ports and 1 write port.
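The dispatch flow described above (decode to a unit, read operands, execute, write back) can be modelled in a few lines. This is a behavioural sketch only, assuming a register file represented as a Python list; the `ScalarInstr` type, its fields and `execute_packet` are hypothetical names, while the read-port counts are those given in the text.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Read ports per component, as stated in the description.
READ_PORTS = {"SMAC1": 3, "SMAC2": 3, "SIEU": 2}

@dataclass
class ScalarInstr:
    unit: str              # component selected by decode (hypothetical field)
    op: Callable[..., int] # the operation, modelled as a Python function
    srcs: Tuple[int, ...]  # source register indices
    dst: int               # destination register index

def execute_packet(regfile, packet):
    """Run one execute packet: at most one scalar instruction per component."""
    units = [ins.unit for ins in packet]
    if len(set(units)) != len(units):
        raise ValueError("two instructions target the same component")
    results = []
    for ins in packet:
        if len(ins.srcs) > READ_PORTS[ins.unit]:
            raise ValueError("not enough read ports on " + ins.unit)
        operands = [regfile[r] for r in ins.srcs]      # register read
        results.append((ins.dst, ins.op(*operands)))   # execute
    for dst, value in results:                         # write-back (1 port per unit)
        regfile[dst] = value
```

A packet holding a three-operand multiply-add on SMAC1 and a two-operand shift on SIEU would read all of its sources before either result is written, which mirrors the parallel pipelines of Fig. 2.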
As shown in Fig. 3 and Fig. 4, the first component SMAC1 and the second component SMAC2 are isomorphic MAC arithmetic units. Each contains four independent functional units: the floating-point multiply-add unit FMAC, the fixed-point multiply-add unit IMAC, the floating-point arithmetic logic unit FALU and the floating-point division unit FDIV. FMAC and IMAC share one 64*64 multiplier, which reduces the area of the processor. The four functional units are mutually independent and share the same data path; in any given cycle only one of them can execute a valid instruction, and after execution the result passes through the final-stage selection logic and is output to the corresponding destination address. In other words, two units cannot start execution or write back in the same cycle, but they can work in parallel under software pipelining. Operands come mainly from the scalar register file and may also come from an immediate Imm. FALU has only two-operand instructions; the other three units all support three-operand instructions. The third component SIEU comprises the bit processing unit BP and the fixed-point arithmetic logic unit IALU. The two are mutually independent units sharing the same data path; they cannot start execution or write back in the same cycle, but can run in parallel under software pipelining. Their operands come mainly from the scalar register file and may also come from an immediate Imm, the scalar-vector shared register SVR or the vector unit local control register VULCR; both units have only two-operand instructions.
As shown in Fig. 5, the floating-point multiply-add unit FMAC is the functional unit of the processor's arithmetic components that handles complex multi-cycle floating-point operations. It implements four classes of floating-point instruction: multiply, multiply-add, multiply-then-add and add. It uses a dynamic pipeline structure: one instruction can be issued per cycle, and within the same clock cycle each pipeline stage can perform a different operation.
In this embodiment FMAC adopts a path-separated structure, in which the double-precision alignment shift and the single-precision alignment shift are separate, in order to simplify the hardware algorithm and to reduce the wide data registers between pipeline stages. The overall structure has five parts: the operand preparation module R, the mantissa multiplication module X, the double-precision multiply-add path Y, the single-precision multiply-add path Z, and the normalization module S shared by the single- and double-precision paths. According to the instruction, module R separates the sign, exponent and mantissa of single- and double-precision floating-point operands as defined by IEEE-754 and checks the input operands for exceptional values. Module X computes the single-precision multiplication-result mantissa for all instructions. Path Y computes the exponent difference for double-precision operations, performs the 161-bit alignment shift of the double-precision operand C, and performs the final-stage CSA 4:2 partial-product compression for the double-precision result mantissa. Path Z performs the exponent-difference calculation, the mantissa swap and the post-swap alignment for SIMD multiply-add, SIMD multiply-subtract, SIMD multiply, dot-product and complex-multiplication operations. Module S computes the result mantissa after the alignment shift and performs normalization, exponent correction and related operations.
As shown in Fig. 6, the fixed-point multiply-add unit IMAC is the functional unit that performs fixed-point multiply-accumulate. It executes fixed-point addition and subtraction, multiplication, multiply-add, multiply-subtract and MOV-class operations. Fixed-point algorithms and control code contain large numbers of fixed-point add/subtract and MOV operations; integrating them into the fixed-point MAC unit increases the number of VLIW instruction slots available for fixed-point add/subtract and MOV instructions and so raises throughput. When implementing the fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands fed to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand. For fixed-point multiply-add and multiply-subtract instructions, the two multiplier operands are 32-bit signed or unsigned operands, the third operand is a 64-bit signed or unsigned operand, and the result is a 64-bit signed or unsigned destination operand. Fixed-point dot-product and complex-number operations are also supported.
As shown in Fig. 7, the floating-point arithmetic logic unit FALU is divided into three execution submodules according to function and instruction latency: the floating-point FALU short-cycle instruction module, the floating-point ALU conversion instruction module and the floating-point ALU add/subtract instruction module. The short-cycle instruction module covers all single-cycle floating-point arithmetic and logic instructions: single/double-precision greater-than, less-than and equality comparisons; instructions that extract the exponent, mantissa and absolute value of single/double-precision values; instructions that compute single/double-precision reciprocals and reciprocal square roots; and the instruction that converts a single-precision floating-point number to double precision. The conversion instruction module implements the 2-cycle type conversions between floating point and fixed point and between single and double precision: single/double-precision floating point to integer, single/double-precision floating point to integer with truncation, signed or unsigned integer to single/double-precision floating point, and double precision to single precision. The add/subtract instruction module implements the 4-cycle single/double-precision floating-point add and subtract instructions.
As shown in Fig. 8, the floating-point division unit FDIV of this embodiment divides two operands A and B that follow the IEEE-754 floating-point format. The operation can be divided into the following steps:
S1. Detect whether either operand is an exceptional value and set the result accordingly.
S2. Compute the sign bit of the result: XOR the two sign bits.
S3. Compute the exponent of the result: subtract the two exponents.
S4. Divide the mantissas: zeros are appended to the low-order end of the divisor's mantissa, doubling its width, so that the quotient is obtained with the number of bits the precision requires.
S5. Normalize the result: after the mantissa division a left shift may be needed, with a matching decrease of the exponent; the mantissa is then adjusted according to the rounding mode.
S6. Detect exceptions: two exceptions defined in IEEE-754 can arise here, overflow and underflow. If the result exponent exceeds the largest exponent the precision allows, an overflow exception is returned; if the result exponent is smaller than the smallest exponent the precision defines, an underflow exception is returned.
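Steps S1 to S6 can be traced in software for double precision. The sketch below is an illustration under stated assumptions, not the FDIV hardware: it handles normal operands only, rounds to nearest-even, reports overflow and underflow as Python exceptions, and obtains the extra quotient bits by widening the dividend, which is the usual formulation in integer arithmetic. The names `unpack` and `fdiv` are hypothetical.

```python
import struct

BIAS = 1023            # double-precision exponent bias
FRAC_BITS = 52         # stored fraction bits
EXP_ALL_ONES = 0x7FF   # exponent field of infinity/NaN

def unpack(x):
    """Split a Python float into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> FRAC_BITS) & EXP_ALL_ONES, bits & ((1 << FRAC_BITS) - 1)

def fdiv(a, b):
    sa, ea, fa = unpack(a)
    sb, eb, fb = unpack(b)
    # S1: special operands (zero, subnormal, infinity, NaN) are not modelled.
    if ea in (0, EXP_ALL_ONES) or eb in (0, EXP_ALL_ONES):
        raise ValueError("special operands are outside this sketch")
    sign = sa ^ sb                      # S2: XOR of the sign bits
    exp = ea - eb + BIAS                # S3: exponent difference, re-biased
    ma = (1 << FRAC_BITS) | fa          # restore the hidden leading 1
    mb = (1 << FRAC_BITS) | fb
    if ma < mb:                         # S5 (shift): quotient would be below 1.0
        ma <<= 1
        exp -= 1
    # S4: 55 appended zero bits give 53 result bits plus 3 rounding bits.
    q, rem = divmod(ma << 55, mb)
    keep, low = q >> 3, q & 7
    # S5 (rounding): round to nearest, ties to even; rem acts as a sticky bit.
    if low > 4 or (low == 4 and (rem != 0 or keep & 1)):
        keep += 1
        if keep >> 53:                  # rounding carried out of the mantissa
            keep >>= 1
            exp += 1
    # S6: exponent range checks.
    if exp >= EXP_ALL_ONES:
        raise OverflowError("result exponent too large")
    if exp <= 0:
        raise ArithmeticError("underflow: result exponent too small")
    bits = (sign << 63) | (exp << FRAC_BITS) | (keep & ((1 << FRAC_BITS) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For normal operands and results, the value returned matches the host's own double-precision division, for example `fdiv(1.0, 3.0) == 1.0 / 3.0` on a platform with IEEE-754 doubles.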
The floating-point division unit FDIV is designed around SIMD-structured floating-point divide instructions based on the SRT-8 algorithm. The SRT-8 instruction is invoked in three forms, coded 01, 10 and 11; the selection signal shown in the figure chooses which form executes. The three forms respectively perform division iterations 1 to 6 for double precision (1 to 3 for SIMD dual single precision), iterations 7 to 12 (4 to 6 for SIMD dual single precision), and iterations 13 to 18 (7 to 9 for SIMD dual single precision). Finally, the quotient is normalized from the remainder and quotient output by the SRT-8 instruction and from the number of SRT-8 calls, giving the result for double-precision division or for SIMD dual single-precision division at the corresponding precision.
Based on the SRT-8 algorithm, this structure uses hardware resource reuse and iteration partitioning to implement both double-precision floating-point division and SIMD dual single-precision floating-point division on the same hardware.
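The partitioning of the iterations across three calls can be illustrated with a radix-8 digit recurrence. A radix-8 recurrence retires three quotient bits per iteration, so 18 iterations yield 54 quotient bits for a 53-bit double-precision significand and 9 iterations yield 27 bits for a 24-bit single-precision significand. The sketch below is a deliberate simplification: it picks each digit from 0 to 7 by exact comparison, whereas a true SRT-8 divider uses a redundant signed-digit set and a small selection table. The function names are hypothetical.

```python
RADIX_BITS = 3   # radix 8: each iteration retires 3 quotient bits

def iterate_call(rem, quo, divisor, iterations):
    """One iteration instruction: run `iterations` radix-8 steps.

    The partial remainder and partial quotient are carried between calls.
    """
    for _ in range(iterations):
        rem <<= RADIX_BITS            # rem = 8 * rem
        digit = rem // divisor        # 0..7, because rem < divisor on entry
        rem -= digit * divisor
        quo = (quo << RADIX_BITS) | digit
    return rem, quo

def divide_in_three_calls(dividend, divisor, iters_per_call):
    """Integer digit first, then three calls standing in for forms 01, 10, 11."""
    quo, rem = divmod(dividend, divisor)
    for _form in ("01", "10", "11"):
        rem, quo = iterate_call(rem, quo, divisor, iters_per_call)
    return quo, rem
```

With `iters_per_call=6` the result equals the dividend shifted left by 54 bits and divided by the divisor; with `iters_per_call=3` the shift is 27 bits. A separate normalization step, as in the text, would then round and pack the quotient.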
The SPE structure of the present invention supports floating-point division. The scalar floating-point division component (SFDIV) is the functional unit in the SMAC component that performs scalar floating-point division. It implements four instructions: the double-precision floating-point division iteration instruction based on the SRT-8 division algorithm, the double-precision floating-point division normalization instruction, the SIMD dual single-precision floating-point division iteration instruction based on the SRT-8 division algorithm, and the SIMD dual single-precision floating-point division normalization instruction. In the GPDSP, instructions are encoded in 40 bits; the SIMD floating-point divide instruction set comprises two double-precision floating-point divide instructions (FSRT8D and FNORMD) and two SIMD dual single-precision floating-point divide instructions (FSRT8S32 and FNORMS32). The implemented instructions are described in Table 1:
Table 1. Divider instruction types and functions
Instruction name   Beats   Encoding bits   Function                                                            Scalar instruction
SFSRT8D            7       40              Double-precision floating-point division iteration                  Yes
SFNORMD            2       40              Double-precision floating-point division normalization              Yes
SFSRT8S32          4       40              SIMD dual single-precision floating-point division iteration        Yes
SFNORMS32          2       40              SIMD dual single-precision floating-point division normalization    Yes
Fig. 9 shows the structure of the 64*64 multiplier reused within the SMAC component. Because a 64x64-bit multiplier occupies a very large area in the design, the multiplier in the SMAC component adopts a logic-module reuse scheme: the body of the shared fixed-point/floating-point module is four 32x32-bit multipliers. After data are input, operand processing precedes the multiplication: depending on the dispatched instruction, operands are first selected and bit-extended, and the prepared operands are then fed to the four 32x32 multipliers. The result is either a fixed-point or a floating-point result and, depending on the instruction, is written back to a register or passed to the next pipeline stage.
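How four 32x32 multipliers combine into one 64x64 product can be shown with the schoolbook decomposition. This is a sketch of the arithmetic identity only, for unsigned operands; the operand selection, bit extension and partial-product compression of the actual hardware are not modelled, and `mul32` and `mul64x64` are hypothetical names.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def mul32(a, b):
    """Stand-in for one 32x32-bit unsigned multiplier (64-bit product)."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    return a * b

def mul64x64(a, b):
    """Unsigned 64x64 -> 128-bit product built from four 32x32 products."""
    assert 0 <= a <= MASK64 and 0 <= b <= MASK64
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    ll = mul32(a_lo, b_lo)   # weight 2**0
    lh = mul32(a_lo, b_hi)   # weight 2**32
    hl = mul32(a_hi, b_lo)   # weight 2**32
    hh = mul32(a_hi, b_hi)   # weight 2**64
    return ll + ((lh + hl) << 32) + (hh << 64)
```

The same four multipliers can also serve narrower operations independently, which is one way to read the operand selection step described above.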
As shown in Figure 10, be the topological structure schematic diagram of the 3rd parts IEU in embody rule example.3rd parts SIEU comprises bit processing unit BP and fixed point arithmetic logical block IALU, and both are the separate units with identical data path, and same period can only have a functional unit to perform effective instruction.After executing, result selects logic by afterbody, exports corresponding destination address to.Meanwhile, saturated and unsaturation situation when performing according to instruction, also can produce asserts signal, indicate this condition execution instruction.That is: can not start both same period to perform or write back simultaneously, can be dispatched by software flow and realize walking abreast; Its operand is originated the data be mainly in scalar register file, can also share register SVR, vector location Partial controll register VULCR, both only have two operand instruction from immediate Imm, mark vector.
Figure 11 is a schematic diagram of the topology of the bit processing unit BP in a specific application example. The BP unit comprises three functional units: a 64-bit shifter unit (shifter), a bit processing unit (Bitp), and a pack/unpack unit (PK). The decode signals from the decode station and the source operands from the registers are received by all three functional units, which start computing immediately. The final output is chosen, according to a select signal, from the results of the shift, pack/unpack, and bit processing functional units. If the instruction is a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register at the same time as the result.
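A pack operation with a saturation check, of the kind attributed to the PK unit above, might look like the following sketch. The lane width (two signed 16-bit lanes packed into one 32-bit word), the function names, and the way the flag is returned are all assumptions made for illustration; the patent text does not specify the pack formats.

```python
def saturate_signed(value, bits):
    """Clamp value to the signed range of the given width; report whether it clamped."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if value < lo:
        return lo, True
    if value > hi:
        return hi, True
    return value, False


def pack_sat16(hi_word, lo_word):
    """Pack two signed integers into one 32-bit word as two saturated 16-bit lanes.

    Returns (packed_word, saturation_flag); the flag models the bit that
    would be sent to the status register together with the result.
    """
    h, sat_h = saturate_signed(hi_word, 16)
    l, sat_l = saturate_signed(lo_word, 16)
    packed = ((h & 0xFFFF) << 16) | (l & 0xFFFF)
    return packed, sat_h or sat_l


def unpack16(packed):
    """Split a 32-bit word back into two sign-extended 16-bit lanes."""
    def sign_extend(v):
        return v - 0x10000 if v & 0x8000 else v
    return sign_extend((packed >> 16) & 0xFFFF), sign_extend(packed & 0xFFFF)


word, flag = pack_sat16(40000, -40000)
print(hex(word), flag, unpack16(word))
```

Producing the flag alongside the data, rather than in a later step, matches the statement that the saturation flag is output while the result is output.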
Figure 12 is a schematic diagram of the topology of the fixed-point arithmetic logic unit IALU in a specific application example. The IALU contains 8 submodules. Because addition and subtraction have the largest delay, the IALU design does not let the compare-class instructions reuse the adder; a separate adder structure is adopted instead, and it is placed at the last stage of the logic selection order. To reduce the delay further, the saturating add/subtract operation is split out and is realized by combining the add/subtract instruction, the saturation instruction, and the relevant control register.
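The separation of plain add/subtract from the saturation step can be illustrated with a small behavioral model. It is only a sketch: the 64-bit width, the function names, and the use of a boolean argument in place of the control register are assumptions, and the model says nothing about the actual adder circuit.

```python
def wrap_signed(value, bits=64):
    """Two's-complement wrap-around of an exact integer result."""
    half = 1 << (bits - 1)
    return ((value + half) % (1 << bits)) - half


def clamp_signed(value, bits=64):
    """Saturation step: clamp to the signed range and report whether clamping occurred."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if value < lo:
        return lo, True
    if value > hi:
        return hi, True
    return value, False


def add_sub(a, b, subtract=False, saturating=False, bits=64):
    """Add or subtract, then apply either wrap-around or the separate saturation step.

    Returns (result, saturation_flag). The 'saturating' argument stands in for
    the saturation mode that the text says is selected by instruction and
    control register.
    """
    exact = a - b if subtract else a + b
    if saturating:
        return clamp_signed(exact, bits)
    return wrap_signed(exact, bits), False


INT64_MAX = (1 << 63) - 1
print(add_sub(INT64_MAX, 1, saturating=True))
print(add_sub(INT64_MAX, 1, saturating=False))
```

Keeping the clamp as its own step means the common, non-saturating path does not pay for the extra comparison logic, which is the delay argument made in the paragraph above.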
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the concept of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A scalar operation unit structure supporting floating-point division in a GPDSP, characterized by comprising, as scalar operation components, a first component SMAC1, a second component SMAC2, and a third component SIEU, for supporting basic scalar operations; each of said scalar operation components corresponds to one scalar instruction in a VLIW execution packet.
2. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 1, characterized by further comprising a scalar register file for data read and write-back operations; when a scalar instruction dispatched by the dispatch unit is received, it is decoded to determine which scalar operation component it belongs to, and the corresponding source operand addresses and read request are sent to the scalar register file at the same time; once the instruction-valid signal has been delivered to the scalar operation component, the component obtains the data read from the scalar register file, starts the computation, and finally writes the result back to the scalar register file.
3. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 1, characterized in that said first component SMAC1 and second component SMAC2 are homogeneous MAC operation units; each MAC operation unit comprises a floating-point multiply-add unit FMAC, a fixed-point multiply-add unit IMAC, a floating-point arithmetic logic unit FALU, and a floating-point division unit FDIV; these functional units are independent units sharing the same data path, and only one functional unit can execute a valid instruction in a given cycle; after execution, the result passes through the final-stage selection logic and is output to the corresponding destination address.
4. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that the multipliers of said floating-point multiply-add unit FMAC and fixed-point multiply-add unit IMAC adopt a logic-module reuse scheme, the main body of the shared fixed-point/floating-point module being four 32x32-bit multipliers; after the data are input, the operands are processed before the multiply operation begins: according to the dispatched instruction, operand selection and bit extension are performed first, and the prepared operands are then fed into the four 32x32 multipliers for multiplication; the result is either a fixed-point result or a floating-point result and, depending on the instruction, is either written back to a register or passed on to the next pipeline station.
5. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that said floating-point multiply-add unit FMAC is used to process multi-cycle complex floating-point operations and adopts a dynamic pipeline structure in which one instruction can be issued every cycle, and each pipeline station can perform a different operation in the same clock cycle.
6. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 5, characterized in that said floating-point multiply-add unit FMAC adopts an FMAC structure in which the double-precision alignment shift and the single-precision alignment shift are separated, comprising: an operand preparation module R, a mantissa multiplication module X, a double-precision multiply-add path Y, a single-precision multiply-add path Z, and a normalization module S shared by the single- and double-precision paths; said operand preparation module R, according to the instruction, separates the sign, exponent, and mantissa of single- and double-precision floating-point operands in accordance with the IEEE-754 standard and performs exception checks on the input operands; said mantissa multiplication module X is responsible for computing the single-precision multiplication result mantissa for all instructions; said double-precision multiply-add path Y performs the exponent-difference calculation for double-precision operations, the 161-bit alignment shift of the double-precision operand C, and the final-stage CSA 4:2 partial-product compression of the double-precision result mantissa; said single-precision multiply-add path Z performs the exponent-difference calculation, mantissa swap, and post-swap alignment for the SIMD multiply-add, SIMD multiply-subtract, SIMD multiplication, dot-product, and complex multiplication operations; said shared normalization module S performs the result mantissa calculation after the alignment shift, the normalization, and the exponent correction.
7. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that said fixed-point multiply-add unit IMAC is used to perform fixed-point multiply-accumulate; in implementing the fixed-point and floating-point multiply-add and multiply-subtract instructions, the two operands input to the multiplier are 64-bit floating-point data, the third operand is a 53-bit floating-point operand, and the result is a 64-bit floating-point operand; when a fixed-point multiply-add/subtract instruction is executed, the two multiplier operands are 32-bit signed/unsigned operands, the third operand is a 64-bit signed/unsigned operand, and the result is a 64-bit signed/unsigned destination operand.
8. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 3, characterized in that said floating-point arithmetic logic unit FALU comprises a floating-point ALU short-cycle instruction module, a floating-point ALU conversion instruction module, and a floating-point ALU add/subtract instruction module; said short-cycle instruction module contains all single-cycle floating-point arithmetic-logic instructions, including the single/double-precision greater-than, less-than, and equality comparison instructions, the instructions that extract the exponent, mantissa, and absolute value of single/double-precision values, the instructions that compute the single/double-precision reciprocal and reciprocal square root, and the instruction that converts a single-precision floating-point number to a double-precision floating-point number.
9. The scalar operation unit structure supporting floating-point division in a GPDSP according to any one of claims 1 to 8, characterized in that said third component SIEU comprises a bit processing unit BP and a fixed-point arithmetic logic unit IALU, the two being independent units that share the same data path, and only one functional unit can execute a valid instruction in a given cycle; after execution, the result passes through the final-stage selection logic and is output to the corresponding destination address; in addition, depending on whether saturation occurs during execution of the instruction, a set signal is also generated to flag this condition for conditional-execution instructions.
10. The scalar operation unit structure supporting floating-point division in a GPDSP according to claim 9, characterized in that said BP unit comprises three functional units: a 64-bit shifter unit (shifter), a bit processing unit (Bitp), and a pack/unpack unit (PK); the decode signals from the decode station and the source operands from the registers are received by all three functional units, which start computing immediately; the final output is chosen, according to a select signal, from the results of the shift, pack/unpack, and bit processing functional units; if the instruction is a pack/unpack instruction that requires a saturation check, the saturation flag is output to the status register at the same time as the result.
CN201510718454.7A 2015-10-29 2015-10-29 Scalar operation unit structure supporting floating-point division method in GPDSP Pending CN105335127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510718454.7A CN105335127A (en) 2015-10-29 2015-10-29 Scalar operation unit structure supporting floating-point division method in GPDSP

Publications (1)

Publication Number Publication Date
CN105335127A true CN105335127A (en) 2016-02-17

Family

ID=55285703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510718454.7A Pending CN105335127A (en) 2015-10-29 2015-10-29 Scalar operation unit structure supporting floating-point division method in GPDSP

Country Status (1)

Country Link
CN (1) CN105335127A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021832A (en) * 2007-03-19 2007-08-22 中国人民解放军国防科学技术大学 64 bit floating-point integer amalgamated arithmetic group capable of supporting local register and conditional execution
CN101093442A (en) * 2007-07-18 2007-12-26 中国科学院计算技术研究所 Carry verification device of floating point unit for multiply and summation, and multiplication CSA compression tree
CN101174200A (en) * 2007-05-18 2008-05-07 清华大学 5-grade stream line structure of floating point multiplier adder integrated unit
CN101986264A (en) * 2010-11-25 2011-03-16 中国人民解放军国防科学技术大学 Multifunctional floating-point multiply and add calculation device for single instruction multiple data (SIMD) vector microprocessor
CN103984521A (en) * 2014-05-27 2014-08-13 中国人民解放军国防科学技术大学 Method and device for achieving SIMD structure floating point division in general-purpose digital signal processor (GPDSP)
CN103984522A (en) * 2014-05-27 2014-08-13 中国人民解放军国防科学技术大学 Method for achieving fixed point and floating point mixed division in general-purpose digital signal processor (GPDSP)

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
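Wu Shan et al.: "Design and Verification of a Path-Separated Multiply-Add Unit for a 32-bit DSP", Proceedings of the 18th National Conference on Semiconductor Integrated Circuits and Silicon Materials *
Song Borong: "Design and Implementation of the X-DSP SIMD Floating-Point Arithmetic Logic Unit", China Excellent Master's Degree Theses Full-text Database, Information Science and Technology *
Song Borong: "Design and Implementation of the X-DSP SIMD Floating-Point Arithmetic Logic Unit", China Excellent Master's Theses Full-text Database, Information Science and Technology *
Peng Hao: "Design and Implementation of the X-DSP 64-bit SIMD Bit Processing Unit and Shuffle Unit", China Excellent Master's Degree Theses Full-text Database, Information Science and Technology *
Deng Ziye: "Design and Implementation of a SIMD Floating-Point Divider Based on the SRT-8 Algorithm", Computer Engineering & Science *
Han Shanshan et al.: "Design and Implementation of a SIMD Multiplier Based on Fixed-Point and Floating-Point Reuse", Proceedings of the 18th National Conference on Semiconductor Integrated Circuits and Silicon Materials *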

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709858A (en) * 2016-12-12 2017-05-24 中国航空工业集团公司西安航空计算技术研究所 Single-instruction multi-thread staining processing unit structure for uniform staining graphic processing unit
CN109426738B (en) * 2017-08-23 2021-11-12 中芯国际集成电路制造(上海)有限公司 Hardware encryptor, encryption method and electronic device
CN109426738A (en) * 2017-08-23 2019-03-05 中芯国际集成电路制造(上海)有限公司 A kind of hardware decoder and encryption method, electronic device
CN107748674A (en) * 2017-09-07 2018-03-02 中国科学院微电子研究所 The information processing system of Bit Oriented granularity
CN107748674B (en) * 2017-09-07 2021-08-31 中国科学院微电子研究所 Information processing system oriented to bit granularity
CN109783055B (en) * 2017-11-10 2021-02-12 瑞昱半导体股份有限公司 Floating-point number arithmetic circuit and method
CN109783055A (en) * 2017-11-10 2019-05-21 瑞昱半导体股份有限公司 Floating point arithmetic circuit and method
CN108762720B (en) * 2018-06-14 2021-06-29 北京比特大陆科技有限公司 Data processing method, data processing device and electronic equipment
CN108762720A (en) * 2018-06-14 2018-11-06 北京比特大陆科技有限公司 Data processing method, data processing equipment and electronic equipment
CN111290790A (en) * 2020-01-22 2020-06-16 安徽大学 Conversion device for converting fixed point into floating point
CN111290790B (en) * 2020-01-22 2023-03-24 安徽大学 Conversion device for converting fixed point into floating point
CN112506468A (en) * 2020-12-09 2021-03-16 上海交通大学 RISC-V general processor supporting high throughput multi-precision multiplication
CN112835551A (en) * 2021-03-09 2021-05-25 上海壁仞智能科技有限公司 Data processing method for processing unit, electronic device, and computer-readable storage medium
CN113157247A (en) * 2021-04-23 2021-07-23 西安交通大学 Reconfigurable integer-floating point multiplier

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217