CN102520903B - Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points - Google Patents

Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points

Info

Publication number
CN102520903B
CN102520903B CN201110415155.8A CN201110415155A CN102520903B CN 102520903 B CN102520903 B CN 102520903B CN 201110415155 A CN201110415155 A CN 201110415155A CN 102520903 B CN102520903 B CN 102520903B
Authority
CN
China
Prior art keywords
floating
data
bit
maximum
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110415155.8A
Other languages
Chinese (zh)
Other versions
CN102520903A (en)
Inventor
王东琳
汪涛
尹磊祖
谢少林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Silang Technology Co ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201110415155.8A
Publication of CN102520903A
Application granted
Publication of CN102520903B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data, which comprises a parallel floating-point data preprocessing unit, a Mask register, a reconfigurable comparator network and a result selection unit. The parallel floating-point data preprocessing unit analyzes the format of the received 512-bit vector data, processes the data according to its format, outputs the processed floating-point data to the reconfigurable comparator network and outputs the flag bits obtained during processing to the result selection unit. The Mask register controls which data participate in the maximum/minimum operation. The reconfigurable comparator network takes as input the floating-point data received from the parallel floating-point data preprocessing unit and the value received from the Mask register, compares the vector data stage by stage and outputs the resulting maximum/minimum to the result selection unit. The result selection unit receives the output of the reconfigurable comparator network and produces the final vector maximum/minimum result according to the flag bits received from the parallel floating-point data preprocessing unit.

Description

Length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data
Technical field
The present invention relates to the technical field of high-performance digital signal processors, and in particular to a length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data.
Background art
With the rapid development of computer and information science, digital signal processor (DSP) technology emerged, and over the past 40 years DSPs have advanced by leaps and bounds. In a DSP, every computation, however complex, is ultimately carried out by the arithmetic unit, so the arithmetic unit is the core component of the whole DSP. In recent years, as digital signal processing has developed, applications have been driving the development of DSPs, and DSPs tailored to specific fields and specific requirements are the direction of that development.
Maximum/minimum operations are very common in digital signal processing, for example in median filtering, maximum/minimum pixel extraction, Viterbi decoding, threshold detection and accuracy detection. In a traditional DSP, the maximum/minimum operation reuses the existing fixed-point and floating-point arithmetic logic unit (ALU). Although this saves chip area, it has the following limitations:
1) Low efficiency. A general DSP maximum/minimum instruction can only compare two data items, so obtaining the maximum/minimum of a large amount of data takes many instructions.
2) Coarse support for data granularity. Generally only one fixed-point format or one floating-point format is supported. Taking the ADI TS20XS series DSP as an example, a 32-bit fixed-point datum can be configured as 1/2/4 fixed-point data items of 32/16/8 bits, so 8/16/32-bit fixed-point formats are supported. In 8/16-bit mode, however, it can only take the maximum/minimum of each pair of corresponding data items in turn, rather than the maximum/minimum of all 8/4 data items of 8/16 bits (two 32-bit fixed-point values can be configured as eight 8-bit or four 16-bit values).
3) The number of data items is not configurable. The maximum/minimum can only be taken from two data items; the number of data items cannot be configured flexibly, and multiple data items cannot take part in one maximum/minimum operation.
Fields such as modern radar signal processing, spaceborne image processing, image compression and high-definition video involve a large amount of variable-granularity, high-density computation. This places ever higher demands on the arithmetic unit, and the maximum/minimum operation is becoming a major bottleneck of the arithmetic unit. Some existing patents and publications have optimized the maximum/minimum operation, but all of them are confined to reusing the ALU of a scalar processor, treat the fixed-point and floating-point maximum/minimum operations completely separately, and do not further study a maximum/minimum network specific to vector processors.
Therefore, the problem that the present invention urgently needs to solve is to analyze the similarity of fixed-point and floating-point data at the arithmetic level, use reconfiguration techniques to compare fixed-point data of different granularities, add extra control circuitry to the fixed-point data path to compare floating-point data, use a dedicated register to configure the number of data items taking part in the maximum/minimum operation, and use a dedicated set of configurable resources to execute the vector maximum/minimum operation, thereby providing a vector maximum/minimum network that supports different granularities, different data formats and different numbers of data items, and that meets the demand for intensive vector maximum/minimum operations in specific fields.
It should be noted that the "/" in "maximum/minimum", "1/2/4", "32/16/8", "8/16/32" and the like means "or" throughout this document; this will not be repeated below.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main purpose of the present invention is to provide a length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data. The network supports operations on 8/16/32-bit signed/unsigned fixed-point data and on 32-bit IEEE 754 simplified single-precision floating-point data, allows the number of data items taking part in the vector maximum/minimum operation to be configured flexibly through a register, and executes the vector maximum/minimum operation, so as to speed up maximum/minimum operations on large amounts of data and meet the demand for intensive vector maximum/minimum operations in specific fields.
(2) Technical solution
To achieve the above purpose, the invention provides a length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data, comprising: a parallel floating-point data preprocessing unit 100, which analyzes the format of the received 512-bit vector data A, processes the data according to its format, outputs the processed floating-point data to the reconfigurable comparator network 300, and outputs the flag bits obtained during processing to the result selection unit 400; a Mask register 200, which is a 64-bit configurable register that controls which data participate in the maximum/minimum operation; a reconfigurable comparator network 300, which takes as input the floating-point data received from the parallel floating-point data preprocessing unit 100 and the value received from the Mask register 200, compares the vector data stage by stage according to the Opcode, the FBS data format option, the U option, the M option and the value of the Mask register, and outputs the resulting maximum/minimum to the result selection unit 400; and a result selection unit 400, which receives the output of the reconfigurable comparator network 300 and outputs the final vector maximum/minimum result according to the flag bits received from the parallel floating-point data preprocessing unit 100.
In the above solution, the parallel floating-point data preprocessing unit 100 analyzes the format of the received 512-bit vector data A and processes it according to its format as follows: when the 512-bit vector data A is in floating-point format, the unit performs special-value analysis on the floating-point data to obtain the not-a-number flag NaNFlag, the positive-infinity flag PosInfFlag and the negative-infinity flag NegInfFlag, and performs a complement operation on negative floating-point data; when the 512-bit vector data A is in fixed-point format, the unit outputs the fixed-point data directly.
In the above solution, the Mask register 200 controls which data participate in the maximum/minimum operation as follows: the Mask register 200 is a 64-bit configurable register that directly controls the 512-bit vector data A, each bit of the Mask register 200 controlling one byte of the 512-bit vector data A. When the M option is present, only the units whose corresponding Mask register bit is 1 participate in the maximum/minimum operation; when the M option is absent, the maximum/minimum operation is not affected by the Mask register and all of the 512-bit vector data A participates in the maximum/minimum operation.
In the above solution, the reconfigurable comparator network 300 is formed by cascading 8/16/32-bit comparators, and each comparator obtains the corresponding maximum/minimum according to the input opcode.
In the above solution, the reconfigurable comparator network 300 consists of multiple 32-bit comparators, one 16-bit comparator and one 8-bit comparator. In addition to its data inputs, each comparator also receives the control signals U, M and FBS. In 32-bit single-precision floating-point mode or 32-bit fixed-point mode, the 512-bit vector data passes through a 4-level network of 32-bit comparators to produce the 32-bit maximum/minimum. In 16-bit half-word fixed-point mode, the output of the fourth-level 32-bit comparator enters the 16-bit comparator to produce the 16-bit maximum/minimum. In 8-bit byte mode, the output of the fifth-level 16-bit comparator enters the 8-bit comparator to produce the 8-bit maximum/minimum. By adding the corresponding control signals to the comparator resources required for the 8-bit fixed-point format, the network becomes reconfigurable across several data formats: 16/32-bit fixed point and 32-bit IEEE 754 simplified single-precision floating point.
In the above solution, in the result selection unit 400, the 4-to-1 selector MUX3 614 is controlled by the FBS option. When FBS=2'b00, the network works in 32-bit fixed-point mode and MUX3 614 directly outputs the 32-bit result of the reconfigurable comparator network 300. When FBS=2'b10, the network works in 8-bit fixed-point mode and MUX3 614 directly outputs the 8-bit result of the reconfigurable comparator network 300. When FBS=2'b11, the network works in 16-bit fixed-point mode and MUX3 614 directly outputs the 16-bit result of the reconfigurable comparator network 300. When FBS=2'b01, the network works in 32-bit simplified single-precision floating-point format, and the final vector maximum/minimum is selected according to the floating-point flag signals: if NaNFlag=1, at least one of the 16 floating-point data items is NaN and the maximum/minimum network outputs 32'hFFFF_FFFF; if PosInfFlag=1 and Opcode=1, the floating-point data contains positive infinity and the network is working in maximum (Max) mode, so the network outputs positive infinity, 32'h7F80_0000; if NegInfFlag=1 and Opcode=0, the floating-point data contains negative infinity and the network is working in minimum (Min) mode, so the network outputs negative infinity, 32'hFF80_0000; in all other cases, the 32-bit floating-point result of the reconfigurable comparator network 300 is output.
(3) Beneficial effects
The length-configurable vector maximum/minimum network supporting reconfigurable fixed-point and floating-point data provided by the invention uses reconfiguration techniques to compare fixed-point data of different granularities, adds an extra control logic unit to the fixed-point data path to compare floating-point data, uses a register to configure flexibly the number of vector data items taking part in the maximum/minimum operation, and executes the vector maximum/minimum operation. It supports operations on 8/16/32-bit signed/unsigned fixed-point data and on 32-bit IEEE 754 simplified single-precision floating-point data, while the vector data length is configured through the Mask register. This speeds up intensive vector maximum/minimum operations in specific fields, reduces software programming complexity, improves code density, and improves the efficiency and flexibility with which the processor obtains maximum/minimum values.
Brief description of the drawings
Fig. 1 is a schematic diagram of the fixed-point/floating-point reconfigurable, data-length-configurable vector maximum/minimum network according to an embodiment of the present invention.
Fig. 2 is an internal structure diagram of the parallel floating-point data preprocessing unit 100 according to an embodiment of the present invention.
Fig. 3 is an internal structure diagram of the reconfigurable comparator network 300 according to an embodiment of the present invention.
Fig. 4 is an internal structure diagram of the 8-bit comparator supporting different data formats according to an embodiment of the present invention.
Fig. 5 is an internal structure diagram of the reconfigurable 32-bit comparator supporting different data formats according to an embodiment of the present invention.
Fig. 6 is an internal structure diagram of the result selection unit 400 according to an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The principal features of the present invention are that the data format is reconfigurable and the data length is configurable. The following notation is used throughout the description. The maximum/minimum network instruction is written as B = Max/Min A {(M)} {(U)} {(FBS)}. B is 32-bit scalar data and A is 512-bit vector data. Opcode is the operation code, represented by 1 binary bit: 0 denotes minimum (Min) and 1 denotes maximum (Max). Mask is a 64-bit configurable register, each bit of which controls one 8-bit byte of vector register A. M indicates that the maximum/minimum operation is affected by the Mask register; when the M option is absent, the Mask register has no effect on the maximum/minimum operation. U denotes the unsigned option. FBS denotes the data format, represented by 2 binary bits: "00" denotes 32-bit fixed point, "01" denotes 32-bit simplified single-precision floating point, "10" denotes 8-bit byte, and "11" denotes 16-bit half-word. OpaValid/OpbValid indicates that operand Opa/Opb is valid and depends on the M option: when the M option is present, OpaValid/OpbValid equal to 1 indicates that operand Opa/Opb is valid; when the M option is absent, OpaValid/OpbValid has no effect and operands Opa/Opb are always valid.
In the embodiments of the present invention, A is assumed to be 512-bit vector data, but the invention is applicable whenever the width of A is a multiple of 32 bits. The width of the Mask register and the length of vector A are related by LengthMask = LengthA/8.
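The notation above can be summarized in a short sketch. This is only an illustration of the FBS encodings and of the relation LengthMask = LengthA/8 stated in the description; the names `FBS_FORMATS`, `mask_width` and `element_count` are hypothetical and do not appear in the patent.

```python
# FBS encodings as listed in the description: (format name, element width in bits).
FBS_FORMATS = {
    0b00: ("32-bit fixed point", 32),
    0b01: ("32-bit simplified single-precision floating point", 32),
    0b10: ("8-bit byte", 8),
    0b11: ("16-bit half-word", 16),
}


def mask_width(vector_bits: int) -> int:
    """LengthMask = LengthA / 8: one Mask bit per byte of vector A."""
    if vector_bits % 32 != 0:
        raise ValueError("the width of A must be a multiple of 32 bits")
    return vector_bits // 8


def element_count(vector_bits: int, fbs: int) -> int:
    """Number of elements that vector A holds in the format selected by FBS."""
    _, width = FBS_FORMATS[fbs]
    return vector_bits // width
```

For the 512-bit vector of the embodiment this gives a 64-bit Mask register and 16, 32 or 64 elements for 32-bit, 16-bit or 8-bit data respectively.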
Fig. 1 is a schematic diagram of the fixed-point/floating-point reconfigurable, data-length-configurable vector maximum/minimum network according to an embodiment of the present invention. As shown in Fig. 1, the network comprises a parallel floating-point data preprocessing unit 100, a Mask register 200, a reconfigurable comparator network 300 and a result selection unit 400 connected in sequence, wherein:
The parallel floating-point data preprocessing unit 100 analyzes the format of the received 512-bit vector data A. When the 512-bit vector data A is in floating-point format (FBS=2'b01), the unit performs special-value analysis on the floating-point data to obtain special flag bits such as the not-a-number flag (NaNFlag), the positive-infinity flag (PosInfFlag) and the negative-infinity flag (NegInfFlag), and performs a complement operation on negative floating-point data. When the 512-bit vector data A is in fixed-point format, the parallel floating-point data preprocessing unit 100 outputs the fixed-point data directly. The parallel floating-point data preprocessing unit 100 outputs the processed floating-point data to the reconfigurable comparator network 300 and outputs the special flag bits to the result selection unit 400.
The Mask register 200 is a configurable 64-bit register that controls which data participate in the maximum/minimum operation. The 64-bit Mask register directly controls the 512-bit vector data A, each bit of the Mask register 200 controlling one byte of the vector data A. When the M option is present, only the units whose corresponding Mask register bit is 1 participate in the maximum/minimum operation; when the M option is absent, the maximum/minimum operation is not affected by the Mask register and all of the 512-bit vector data participates in the maximum/minimum operation.
The reconfigurable comparator network 300 takes as input the data processed by the parallel floating-point data preprocessing unit 100 and the value of the Mask register, compares the vector data stage by stage according to the Opcode, the FBS data format option, the U option, the M option and the value of the Mask register to obtain the maximum/minimum, and outputs the maximum/minimum result to the result selection unit 400.
The result selection unit 400 receives the output of the reconfigurable comparator network 300 and outputs the final vector maximum/minimum result according to flag bits such as NaNFlag, PosInfFlag and NegInfFlag received from the parallel floating-point data preprocessing unit 100.
The fixed-point/floating-point reconfigurable, data-length-configurable maximum/minimum network provided by the invention is described in detail below with reference to Figs. 2 to 6. The implementation involves parallel, reconfigurable and configurable design: the parallel floating-point data preprocessing unit 100 achieves parallelism with 16 identical copies of the same hardware structure; four 8-bit comparators can be reconfigured as 1/2/4 comparators of 32/16/8 bits; and the value of the Mask register provides flexible configuration of the vector length.
Fig. 2 is an internal structure diagram of the parallel floating-point preprocessing unit according to an embodiment of the present invention. As shown in Fig. 2, the parallel floating-point preprocessing unit comprises a vector decomposition unit 110, 16 identical floating-point flag generation units 120 and a vector floating-point result flag generation unit 140, connected in sequence.
The vector decomposition unit 110 decomposes the input 512-bit vector data A into 16 32-bit scalar floating-point data items A_0 to A_15 and sends them to the 16 identical floating-point flag generation units 120 respectively.
The floating-point flag generation unit 120 analyzes each 32-bit single-precision floating-point data item, determines whether it is a special case such as NaN or infinity, and performs a complement operation on negative floating-point data. Within the floating-point flag generation unit 120, the sign/exponent/mantissa separation unit 121 splits the 32-bit floating-point data into sign bit, exponent and mantissa. The exponent is sent to the exponent comparator 122 for comparison: when the exponent is 0, Exp_0=1 is output; when the exponent is 255, Exp_255=1 is output; for any other exponent value, Exp_0 and Exp_255 are both 0. The mantissa comparator 123 receives the 23-bit mantissa (without the implicit leading 1) output by the sign/exponent/mantissa separation unit 121; when the 23-bit mantissa is 0, Mant_0=1, and for any other mantissa Mant_0=0. At the same time, the 31 bits of exponent and mantissa are extended to 32 bits with a high-order 0 and enter the inversion circuit 124 and the MUX0 selector 128 through a separate path. The control signal of the MUX0 selector 128 is the sign bit of the floating-point data: when the sign bit is 1, MUX0 outputs the inverted exponent and mantissa; otherwise it outputs the exponent and mantissa before inversion. The MUX1 selector 129 receives the output of the MUX0 selector 128 and 0 as its inputs, and its control signal is the output Exp_0 of the exponent comparator 122. When Exp_0=1, the floating-point data is treated as 0 and the MUX1 selector 129 outputs 0; otherwise it outputs the normal non-zero 32-bit value. The output of the MUX1 selector 129 is the preprocessed 32-bit floating-point data DisFloat_0. The signals Exp_255 and Mant_0 enter the NaN decision logic unit 130 and the infinity decision logic unit 126. When Exp_255=1 and Mant_0=0, the NaN decision logic unit 130 outputs NaN_0=1, indicating that the floating-point data is NaN; when Exp_255=1 and Mant_0=1, the infinity decision logic unit 126 outputs 1, indicating that the floating-point data is infinite. The positive-infinity decision logic unit 131 and the negative-infinity decision logic unit 132 receive the output of the infinity decision logic unit 126 and the floating-point sign bit as inputs and generate the positive-infinity flag PosInf_0 and the negative-infinity flag NegInf_0. At this point, the special flag bits of each floating-point data item have all been generated and the preprocessed floating-point data has been obtained.
The vector floating-point result flag generation unit 140 obtains the floating-point flag bits of the whole vector and the vector floating-point data from the flag bits of each floating-point data item and each processed floating-point data item. The vector NaN flag generation unit 141 is a 16-input OR gate that receives the individual floating-point flags NaN_0 to NaN_15 and outputs the vector NaN flag NaNFlag. The vector positive-infinity flag generation unit 142 and the vector negative-infinity flag generation unit 143 are 16-input OR gates that receive PosInf_0 to PosInf_15 and NegInf_0 to NegInf_15 respectively as inputs and produce the vector positive-infinity flag PosInfFlag and the vector negative-infinity flag NegInfFlag. The vector merging unit 144 combines the preprocessed floating-point data DisFloat_0 to DisFloat_15 into 512-bit vector data.
At this point, floating-point vector preprocessing is complete, and the flag bits of the floating-point vector and the preprocessed floating-point data have been obtained. In fixed-point mode, the floating-point data preprocessing unit passes the fixed-point vector data through a separate path, without any preprocessing, directly to the next unit.
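The per-element analysis and the 16-input OR reduction described above can be modeled in software as follows. This is a behavioral sketch only, not the hardware itself; the function names `preprocess_element` and `preprocess_vector` are hypothetical, and each 32-bit element is represented as a Python integer holding its bit pattern.

```python
def preprocess_element(word: int):
    """Model one flag generation unit: return (dis_float, nan, pos_inf, neg_inf)."""
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF

    exp_0 = exponent == 0        # exponent comparator output Exp_0
    exp_255 = exponent == 255    # exponent comparator output Exp_255
    mant_0 = mantissa == 0       # mantissa comparator output Mant_0

    nan = exp_255 and not mant_0
    inf = exp_255 and mant_0
    pos_inf = inf and sign == 0
    neg_inf = inf and sign == 1

    # Exponent and mantissa, zero-extended from 31 to 32 bits.
    magnitude = word & 0x7FFFFFFF
    # MUX0: inverted value when the sign bit is 1, otherwise the value as is.
    selected = (~magnitude & 0xFFFFFFFF) if sign else magnitude
    # MUX1: data with a zero exponent is treated as 0.
    dis_float = 0 if exp_0 else selected
    return dis_float, nan, pos_inf, neg_inf


def preprocess_vector(words):
    """Model the whole unit for 16 elements; the vector flags are OR reductions."""
    if len(words) != 16:
        raise ValueError("a 512-bit vector holds 16 32-bit elements")
    results = [preprocess_element(w) for w in words]
    dis_floats = [r[0] for r in results]
    nan_flag = any(r[1] for r in results)
    pos_inf_flag = any(r[2] for r in results)
    neg_inf_flag = any(r[3] for r in results)
    return dis_floats, nan_flag, pos_inf_flag, neg_inf_flag
```

Note that inverting the zero-extended 32-bit value sets its top bit to 1, which agrees with the statement elsewhere in the description that the sign bit of a negative value stays 1 after the complement operation.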
The Mask register 200 is a 64-bit user-configurable register. Each bit of the 64-bit Mask register corresponds to one 8-bit byte of the 512-bit vector data. When the Mask register is in effect (the M option is present), only the vector data whose corresponding Mask register bit is 1 participates in the maximum/minimum operation; otherwise all of the vector data participates in the maximum/minimum operation.
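As a rough illustration of the Mask semantics, the sketch below applies a 64-bit mask to a 64-byte vector in unsigned byte mode. It assumes that Mask bit i corresponds to byte i of the vector; the description does not state the bit ordering, so this mapping, like the function names, is an assumption made for the example.

```python
def participating_bytes(vector: bytes, mask: int, m_option: bool):
    """Return the bytes of a 64-byte vector that take part in the max/min."""
    if len(vector) != 64:
        raise ValueError("a 512-bit vector holds 64 bytes")
    if not m_option:
        # M option absent: the Mask register is ignored.
        return list(vector)
    # M option present: keep only bytes whose Mask bit is 1 (assumed bit i <-> byte i).
    return [b for i, b in enumerate(vector) if (mask >> i) & 1]


def masked_byte_max(vector: bytes, mask: int, m_option: bool):
    """Unsigned byte maximum over the participating bytes, or None if there are none."""
    selected = participating_bytes(vector, mask, m_option)
    return max(selected) if selected else None
```

Returning `None` when no byte participates is a choice made for this sketch; in the hardware this case corresponds to an invalid comparator result.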
Fig. 3 is an internal structure diagram of the reconfigurable comparator network 300 according to an embodiment of the present invention. As shown in Fig. 3, the reconfigurable comparator network 300 is formed by cascading 8/16/32-bit comparators, and each comparator obtains the corresponding maximum/minimum according to the input opcode (maximum/minimum). Four levels of 32-bit comparators produce the 32-bit maximum/minimum; adding a 16-bit comparator at the fifth level produces the 16-bit maximum/minimum; and adding an 8-bit comparator at the sixth level produces the 8-bit maximum/minimum. A single set of comparator resources can thus compare 8/16/32-bit signed/unsigned fixed-point data and 32-bit simplified single-precision floating-point data and obtain the final maximum/minimum.
The comparator network 300 consists of multiple 32-bit comparators, one 16-bit comparator and one 8-bit comparator. In addition to its data inputs, each comparator also receives control signals such as U, M and FBS. In 32-bit single-precision floating-point mode or 32-bit fixed-point mode, the 512-bit vector data passes through a 4-level network of 32-bit comparators to produce the 32-bit maximum/minimum. In 16-bit half-word fixed-point mode, the output of the fourth-level 32-bit comparator enters the 16-bit comparator to produce the 16-bit maximum/minimum. In 8-bit byte mode, the output of the fifth-level 16-bit comparator enters the 8-bit comparator to produce the 8-bit maximum/minimum. By adding the corresponding control signals to the comparator resources required for the 8-bit fixed-point format, the network becomes reconfigurable across several data formats such as 16/32-bit fixed point and 32-bit IEEE 754 simplified single-precision floating point.
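One way to read the cascade described above is as a lane-wise reduction tree: in 16-bit and 8-bit modes each 32-bit comparator acts on 2 or 4 independent lanes, so the first four levels leave one 32-bit word holding per-lane winners, and levels five and six then reduce the lanes. The sketch below models that reading for the unsigned maximum only. The lane-wise interpretation and the names `split_lanes` and `vector_max` are assumptions of this example, not statements from the patent.

```python
def split_lanes(word: int, width: int):
    """Split a 32-bit word into 32 // width lanes, least significant lane first."""
    lane_mask = (1 << width) - 1
    return [(word >> (i * width)) & lane_mask for i in range(32 // width)]


def vector_max(words, width: int) -> int:
    """Unsigned maximum over all width-bit elements of sixteen 32-bit words."""
    if len(words) != 16 or width not in (8, 16, 32):
        raise ValueError("expected 16 words and a width of 8, 16 or 32")
    rows = [split_lanes(w, width) for w in words]

    # Levels 1 to 4: pairwise, lane-by-lane comparison (16 -> 8 -> 4 -> 2 -> 1 words).
    while len(rows) > 1:
        rows = [
            [max(a, b) for a, b in zip(rows[i], rows[i + 1])]
            for i in range(0, len(rows), 2)
        ]

    # Level 5 (16-bit and 8-bit modes) and level 6 (8-bit mode): halve the lanes.
    lanes = rows[0]
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [max(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]
```

With 32-bit data the second loop does not run, with 16-bit data it runs once, and with 8-bit data it runs twice, matching the four, five and six levels named in the description.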
The 16/32-bit comparators are formed by cascading basic 8-bit comparators. Fig. 4 is an internal structure diagram of the 8-bit comparator supporting different data formats according to an embodiment of the present invention. As shown in Fig. 4, the adder 412 calculates the difference between the inputs Opa and Opb. The first logic circuit 413 receives the output of the adder and the signed/unsigned data option U and generates the result flags of the difference between Opa and Opb: the carry flag (CF), the overflow flag (OF) and the negative flag (NF). The second logic circuit 414 receives the flags CF, OF and NF and the control signals U, M, Opcode, OpaValid and OpbValid, and produces the result selection signal Sel[1:0]. Sel[1:0] is a 2-bit binary value: "00" indicates that the output is invalid, "01" indicates that Opa should be output, "10" indicates that Opb should be output, and "11" indicates that the two data items are equal, in which case Opa is still output. Sel[0] enters the MUX selector 417 as the control signal that selects the output of the comparator. At the same time, OpaValid, OpbValid and the M option enter the comparator result valid generation unit 416 through a separate path to produce the comparator result valid control signal ResultValid, where $\mathrm{ResultValid} = \mathrm{OpaValid} \mid \mathrm{OpbValid} \mid \overline{M}$.
The second logic circuit 414 produces the result selection signal Sel through combinational logic, and the generation of Sel follows the truth table below:
Table 1: 8-bit comparator Sel generation truth table
Note: an Opcode of 0 denotes minimum and 1 denotes maximum; X denotes any value.
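The behavior of the 8-bit comparator can be approximated in software as below. Because the truth table is not reproduced in this text, the Sel logic here is an assumption built from the standard flag rules (unsigned "less than" is the borrow, signed "less than" is NF XOR OF) and from the Sel encodings and the ResultValid expression given in the description. The function name `compare8` and its parameter names are hypothetical.

```python
def compare8(opa, opb, opcode, unsigned, m_option=False, opa_valid=True, opb_valid=True):
    """Model one 8-bit comparator; returns (sel, result_valid, result)."""
    diff9 = (opa - opb) & 0x1FF           # Opa - Opb with one extra bit
    diff8 = diff9 & 0xFF
    cf = diff9 >> 8                       # borrow: 1 when opa < opb as unsigned
    nf = diff8 >> 7                       # sign bit of the 8-bit difference
    of = (((opa ^ opb) & (opa ^ diff8)) >> 7) & 1   # signed overflow of the subtraction

    less = bool(cf) if unsigned else bool(nf ^ of)
    equal = diff8 == 0

    # Without the M option both operands are always valid.
    a_ok = opa_valid or not m_option
    b_ok = opb_valid or not m_option
    result_valid = opa_valid or opb_valid or not m_option

    if not a_ok and not b_ok:
        sel = 0b00                        # output invalid
    elif not b_ok:
        sel = 0b01                        # only Opa valid
    elif not a_ok:
        sel = 0b10                        # only Opb valid
    elif equal:
        sel = 0b11                        # equal, Opa is still output
    else:
        opa_wins = (not less) if opcode == 1 else less
        sel = 0b01 if opa_wins else 0b10

    result = opa if sel & 1 else opb      # Sel[0] drives the output MUX
    return sel, result_valid, result
```

For example, with Opa = 0x80 and Opb = 0x01 in Max mode, the unsigned comparison selects Opa, while the signed comparison treats 0x80 as -128 and selects Opb.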
Fig. 5 is an internal structure diagram of the reconfigurable 32-bit comparator supporting different data formats according to an embodiment of the present invention. As shown in Fig. 5, the 32-bit comparator consists of four basic 8-bit comparators of the kind shown in Fig. 4 and some control logic. The four basic 8-bit comparators (511, 512, 513, 514) work in parallel. The fourth logic circuit 516 concatenates the ResultValid signals of the four 8-bit comparators (ResultValid0 to ResultValid3) into the 32-bit result valid signal ResultValid[3:0], where ResultValid[3:0] = {ResultValid3, ResultValid2, ResultValid1, ResultValid0}. The third logic circuit 515 receives the Sel signals of the four basic 8-bit comparators (Sel0 to Sel3) and the FBS option and produces the selection signal Sel[3:0] of the 32-bit comparator. Each bit of Sel[3:0] indicates whether the corresponding byte of the 32-bit comparator output comes from Opa or from Opb: 1 selects Opa and 0 selects Opb. The generation of Sel[3:0] follows the truth table below:
Table 2: 32-bit comparator Sel generation logic table
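Since the logic table is not reproduced in this text, the sketch below shows only the general principle by which per-byte comparison results can decide a wider comparison: the most significant byte that differs determines the outcome, and only the top byte is treated as signed. This is a generic technique offered as an illustration, not the patent's actual Sel[3:0] logic; the names `byte_relation` and `compare_by_bytes` are hypothetical.

```python
def byte_relation(a: int, b: int, signed: bool) -> int:
    """Return -1, 0 or 1 for one byte pair, optionally as signed bytes."""
    if signed:
        a = a - 256 if a & 0x80 else a
        b = b - 256 if b & 0x80 else b
    return (a > b) - (a < b)


def compare_by_bytes(opa: int, opb: int, width: int, unsigned: bool) -> int:
    """Relation of two width-bit values, decided from per-byte comparisons."""
    n = width // 8
    for i in reversed(range(n)):                 # most significant byte first
        a = (opa >> (8 * i)) & 0xFF
        b = (opb >> (8 * i)) & 0xFF
        # Only the top byte carries the sign in a signed comparison.
        rel = byte_relation(a, b, signed=(not unsigned and i == n - 1))
        if rel != 0:
            return rel
    return 0                                     # all bytes equal
```

In hardware the four byte comparisons run in parallel and the control logic combines their results, whereas this loop evaluates them one after another for clarity.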
Fig. 6 is an internal structure diagram of the result selection unit 400 according to an embodiment of the present invention. As shown in Fig. 6, the result selection unit 400 obtains the final vector maximum/minimum result from the floating-point special-case flags NaNFlag, PosInfFlag and NegInfFlag and from the 8/16/32-bit maximum/minimum result of the reconfigurable comparator network. With reference to Fig. 1, the result selection unit 400 selects the final vector maximum/minimum according to the 8/16/32-bit result output by the reconfigurable comparator network 300 and the floating-point flag signals (NaNFlag, PosInfFlag, NegInfFlag) output by the parallel floating-point data preprocessing unit 100. The 4-to-1 selector MUX3 614 is controlled by the FBS option. When FBS=2'b00, the network works in 32-bit fixed-point mode and MUX3 614 directly outputs the 32-bit result of the reconfigurable comparator network 300. When FBS=2'b10, the network works in 8-bit fixed-point mode and MUX3 614 directly outputs the 8-bit result of the reconfigurable comparator network 300. When FBS=2'b11, the network works in 16-bit fixed-point mode and MUX3 614 directly outputs the 16-bit result of the reconfigurable comparator network 300. When FBS=2'b01, the network works in 32-bit IEEE 754 simplified single-precision floating-point format, and the final vector maximum/minimum is selected according to the floating-point flag signals (NaNFlag, PosInfFlag, NegInfFlag). If NaNFlag=1, at least one of the 16 floating-point data items is NaN, and the maximum/minimum network outputs 32'hFFFF_FFFF. If PosInfFlag=1 and Opcode=1, the floating-point data contains positive infinity and the network is working in maximum (Max) mode, so the maximum/minimum network outputs positive infinity, 32'h7F80_0000. If NegInfFlag=1 and Opcode=0, the floating-point data contains negative infinity and the network is working in minimum (Min) mode, so the maximum/minimum network outputs negative infinity, 32'hFF80_0000. In all other cases, the 32-bit floating-point result of the reconfigurable comparator network 300 is output.
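The selection rules above translate almost directly into code. The following sketch uses the FBS encodings and the three special output patterns given in the description; checking NaNFlag before the infinity flags follows the order in which the text lists the cases, and masking the network result to the element width is an assumption of this example. The name `select_result` is hypothetical.

```python
NAN_PATTERN = 0xFFFFFFFF   # output when any element is NaN
POS_INF = 0x7F800000       # positive infinity
NEG_INF = 0xFF800000       # negative infinity


def select_result(fbs, opcode, network_result,
                  nan_flag=False, pos_inf_flag=False, neg_inf_flag=False):
    """Model the result selection unit: pick the final max/min output."""
    if fbs == 0b00:                      # 32-bit fixed point
        return network_result & 0xFFFFFFFF
    if fbs == 0b10:                      # 8-bit byte
        return network_result & 0xFF
    if fbs == 0b11:                      # 16-bit half-word
        return network_result & 0xFFFF
    if fbs != 0b01:
        raise ValueError("FBS is a 2-bit field")

    # 32-bit simplified single-precision floating point.
    if nan_flag:
        return NAN_PATTERN
    if pos_inf_flag and opcode == 1:     # Max mode with +infinity present
        return POS_INF
    if neg_inf_flag and opcode == 0:     # Min mode with -infinity present
        return NEG_INF
    return network_result & 0xFFFFFFFF
```

Note that a positive infinity in Min mode, or a negative infinity in Max mode, does not override the comparator result.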
Support fixed and floating restructural based on shown in above-mentioned Fig. 1 to Fig. 6, the configurable vector maximum/minimum value of vector length network, the present invention also provides a kind of fixed point restructural, the configurable comparative approach of data length, it is characterized in that, comprises the following steps:
8/16/32-bit fixed-point reconfiguration: the 8-bit fixed-point datum is the basic unit; two 8-bit fixed-point units together with the corresponding control logic are reconfigured as one 16-bit fixed-point unit; four 8-bit fixed-point units together with the corresponding control logic are reconfigured as one 32-bit fixed-point unit;
Fixed/floating-point reconfiguration: whether a floating-point datum undergoes the complement (bitwise inversion) operation is determined by its sign bit. When the sign bit is 1, the exponent and the mantissa are each inverted while the sign bit remains 1, and the sign bit, exponent and mantissa form a new 32-bit datum; when the sign bit is 0, the floating-point datum remains unchanged. After this sign-bit processing, floating-point data can reuse the fixed-point data path;
Configurable data length, realized through the Mask register: each bit of the Mask register controls one bit field of the data, and the length of the data participating in the operation is configured by setting the value of the Mask register.
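The second and third steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented hardware: it assumes that the reused fixed-point path performs a signed two's-complement comparison, that bit i of the Mask register controls byte i of the vector, and that an empty selection is reported as `None`. The helper names `float_to_key` and `masked_max_bytes` are hypothetical.

```python
import struct
from typing import Optional

SIGN = 0x8000_0000


def float_to_key(x: float) -> int:
    """Map a single-precision value to an integer that orders like the float.

    Negative values have their exponent and mantissa bits inverted while the
    sign bit stays 1; the word is then read as a signed 32-bit integer
    (assumed behaviour of the shared fixed-point comparator).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if bits & SIGN:
        bits = SIGN | (~bits & 0x7FFF_FFFF)
    return bits - (1 << 32) if bits & SIGN else bits


def masked_max_bytes(vector: bytes, mask: int,
                     m_option: bool = True) -> Optional[int]:
    """Maximum over the bytes of a 512-bit vector selected by a 64-bit mask.

    With m_option False the mask is ignored and all 64 bytes participate.
    Returns None when no byte participates.
    """
    if len(vector) != 64:
        raise ValueError("expected a 512-bit (64-byte) vector")
    lanes = [b for i, b in enumerate(vector)
             if not m_option or (mask >> i) & 1]
    return max(lanes) if lanes else None
```

Sorting a list of floats by `float_to_key` gives the same order as sorting the floats themselves, which is why one integer comparator can serve both formats. Setting only the low mask bits shortens the effective vector length to the corresponding leading bytes.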
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

1. A length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point, characterized in that it comprises:
a parallel floating-point data preprocessing unit (100), configured to analyze the format of received 512-bit vector data A and to process it according to its data format, outputting the processed floating-point data to a reconfigurable comparator network (300) and outputting the flag bits obtained from the processing to a result selection unit (400);
a Mask register (200), which is a 64-bit configurable Mask register, configured to control which data participate in the maximum/minimum comparison;
the reconfigurable comparator network (300), configured to take as input the floating-point data received from the parallel floating-point data preprocessing unit (100) and the value received from the Mask register (200), to compare the vector data in sequence according to the Opcode operation code, the FBS data-format option, the U option, the M option and the value of the Mask register, and to output the resulting maximum/minimum result to the result selection unit (400); and
the result selection unit (400), configured to receive the output of the reconfigurable comparator network (300) and to output the final vector maximum/minimum result according to the flag bits received from the parallel floating-point data preprocessing unit (100).
2. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 1, characterized in that the parallel floating-point data preprocessing unit (100) analyzing the format of the received 512-bit vector data A and processing it according to its data format comprises:
the parallel floating-point data preprocessing unit (100) analyzes the format of the received 512-bit vector data A; when the 512-bit vector data A is in floating-point format, it performs special-value analysis on the floating-point data to obtain the abnormal floating-point data flag NaNFlag, the positive-infinity flag PosInfFlag and the negative-infinity flag NegInfFlag, and performs the complement (bitwise inversion) operation on negative floating-point data; when the 512-bit vector data A is in fixed-point format, it outputs the fixed-point data directly.
3. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 2, characterized in that the parallel floating-point data preprocessing unit (100) comprises a vector decomposition unit (110), 16 identical floating-point flag generation units (120) and a vector floating-point result flag generation unit (140) connected in sequence, wherein:
the vector decomposition unit (110) is configured to decompose the input 512-bit vector data A into 16 32-bit scalar floating-point data A_0-A_15 and to deliver them respectively to the 16 identical floating-point flag generation units (120);
each floating-point flag generation unit (120) is configured to analyze one 32-bit single-precision floating-point datum, to determine whether it is NaN or infinity, and to perform the complement (bitwise inversion) operation on negative floating-point data;
the vector floating-point result flag generation unit (140) is configured to obtain the floating-point flags of the whole vector and the vector floating-point data from the flag bits of each floating-point datum and from each processed floating-point datum.
4. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 3, characterized in that, in the floating-point flag generation unit (120), the sign/exponent/mantissa separation unit (121) separates the 32-bit floating-point datum into sign bit, exponent and mantissa; the exponent is sent to the exponent comparator (122) for exponent comparison, which outputs Exp_0=1 when the exponent is 0, outputs Exp_255=1 when the exponent is 255, and outputs Exp_0=0 and Exp_255=0 for any other exponent value; the mantissa comparator (123) receives the 23-bit mantissa output by the sign/exponent/mantissa separation unit (121), with Mant_0=1 when the 23-bit mantissa is 0 and Mant_0=0 for any other mantissa; meanwhile, the 31 bits of exponent and mantissa are extended to 32 bits with a high-order 0 and enter the inversion circuit (124) and the MUX0 selector (128) through another path; the control signal of the MUX0 selector (128) comes from the sign bit of the floating-point datum: when the sign bit is 1, MUX0 outputs the inverted exponent and mantissa, otherwise it outputs the exponent and mantissa before inversion; the MUX1 selector (129) receives the output of the MUX0 selector (128) and 0 as its inputs, and its control signal comes from the output Exp_0 of the exponent comparator (123); when Exp_0=1, the floating-point datum is treated as 0 and the MUX1 selector (129) outputs 0, and in other cases it outputs the normal non-zero 32-bit value, so that the MUX1 selector (129) yields the preprocessed 32-bit floating-point datum DisFloat_0; the signals Exp_255 and Mant_0 enter the NaN decision logic unit (130) and the infinity decision logic unit (126); when Exp_255=1 and Mant_0=0, the NaN decision logic unit (130) outputs NaN_0=1, indicating that the floating-point datum is NaN; when Exp_255=1 and Mant_0=1, the infinity decision logic unit (126) outputs 1, indicating that the floating-point datum is infinite; the positive-infinity decision logic unit (131) and the negative-infinity decision logic unit (132) receive the output of the infinity decision logic unit (126) and the floating-point sign bit as inputs and further generate the positive-infinity flag PosInf_0 and the negative-infinity flag NegInf_0; at this point, the special-value flags of each floating-point datum have all been generated and the preprocessed floating-point datum has been obtained.
5. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 3, characterized in that, in the vector floating-point result flag generation unit (140), the vector NaN flag generation unit (141) is a 16-input OR gate that receives the individual floating-point flags NaN_0-NaN_15 and outputs the vector NaN flag NaNFlag; the vector positive-infinity flag generation unit (142) and the vector negative-infinity flag generation unit (143) are 16-input OR gates that receive PosInf_0-PosInf_15 and NegInf_0-NegInf_15 respectively as inputs and produce the vector positive-infinity flag PosInfFlag and the vector negative-infinity flag NegInfFlag; the vector merging unit (144) combines the preprocessed floating-point data DisFloat_0-DisFloat_15 to obtain 512-bit vector data.
6. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 1, characterized in that the Mask register (200) controlling which data participate in the maximum/minimum operation comprises:
the Mask register (200) is a 64-bit configurable register that directly controls the 512-bit vector data A, each bit of the Mask register (200) controlling one byte of the 512-bit vector data A; when the M option is valid, only the units whose corresponding Mask register bit is 1 participate in the maximum/minimum operation; when the M option is absent, the maximum/minimum operation is not affected by the Mask register and all of the 512-bit vector data A participates in the maximum/minimum operation.
7. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 1, characterized in that the reconfigurable comparator network (300) is formed by cascading 8/16/32-bit comparators, and each comparator obtains the corresponding maximum/minimum according to the input operation code.
8. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 7, characterized in that, in the reconfigurable comparator network (300), the 32-bit maximum/minimum is obtained through four stages of 32-bit comparators; adding a 16-bit comparator as a fifth stage yields the 16-bit maximum/minimum, and adding an 8-bit comparator as a sixth stage yields the 8-bit maximum/minimum.
9. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 7, characterized in that, in the reconfigurable comparator network (300), a single set of comparator resources implements the comparison of 8/16/32-bit signed/unsigned fixed-point data and of 32-bit simplified IEEE 754 single-precision floating-point data, and obtains the final maximum/minimum.
10. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 1, characterized in that the reconfigurable comparator network (300) is composed of a plurality of 32-bit comparators, one 16-bit comparator and one 8-bit comparator; in addition to the corresponding data inputs, each comparator also receives the control signals U, M and FBS; when operating in 32-bit single-precision floating-point mode or 32-bit fixed-point mode, the 512-bit vector data passes through a four-layer 32-bit comparator network to obtain the 32-bit maximum/minimum; when operating in 16-bit half-word fixed-point mode, the output of the fourth-layer 32-bit comparator enters the one 16-bit comparator to obtain the 16-bit maximum/minimum; when operating in 8-bit byte mode, the output of the fifth-layer 16-bit comparator enters the one 8-bit comparator to obtain the 8-bit maximum/minimum; by adding the corresponding control signals to the comparator resources required for the 8-bit fixed-point data format, reconfiguration among 16/32-bit fixed-point and 32-bit simplified IEEE 754 single-precision floating-point data formats is realized.
11. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 10, characterized in that the 16/32-bit comparators are formed by cascading 8-bit comparators, wherein an adder (412) computes the difference of the inputs Opa and Opb; a first logic circuit (413) receives the output of the adder and the signed/unsigned data option U and generates the flag bits of the difference of Opa and Opb: the carry flag CF, the overflow flag OF and the negative flag NF; a second logic circuit (414) receives the flag bits CF, OF and NF and the control signals U, M, Opcode, OpaValid and OpbValid, and produces the result selection signal Sel[1:0]; Sel[1:0] is a 2-bit binary value, where "00" indicates that the output is invalid, "01" indicates that Opa should be selected for output, "10" indicates that Opb should be selected for output, and "11" indicates that the two data are equal, in which case Opa is still output; Sel[0] enters the MUX selector (417) as the control signal that selects the comparator output; meanwhile, OpaValid, OpbValid and the M option enter, through another path, the comparator result valid generation unit (416), which produces the comparator result valid control signal ResultValid.
12. The length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point according to claim 1, characterized in that, in the result selection unit (400), the 4-to-1 selector MUX3 (614) is controlled by the FBS option; when FBS=2'b00, the network operates in 32-bit fixed-point mode and MUX3 (614) directly outputs the 32-bit result of the reconfigurable comparator network (300); when FBS=2'b10, the network operates in 8-bit fixed-point mode and MUX3 (614) directly outputs the 8-bit result of the reconfigurable comparator network (300); when FBS=2'b11, the network operates in 16-bit fixed-point mode and MUX3 (614) directly outputs the 16-bit result of the reconfigurable comparator network (300); when FBS=2'b01, the network operates on the 32-bit simplified single-precision floating-point data format, and the final vector maximum/minimum output is selected according to the floating-point flag signals; if NaNFlag=1, at least one of the 16 floating-point data is NaN, and the maximum/minimum network outputs 32'hFFFF_FFFF; if PosInfFlag=1 and Opcode=1, positive infinity is present in the floating-point data and the network is operating in maximum (Max) mode, and the maximum/minimum network outputs positive infinity, 32'h7F80_0000; if NegInfFlag=1 and Opcode=0, negative infinity is present in the floating-point data and the network is operating in minimum (Min) mode, and the maximum/minimum network outputs negative infinity, 32'hFF80_0000; in all other cases, the 32-bit floating-point result of the reconfigurable comparator network (300) is output.
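As an informal illustration of the flag generation recited in claims 4 and 5, the following Python sketch models one floating-point flag generation unit and the 16-input OR reduction. It is a software approximation under assumptions, not the claimed circuit: the function names `preprocess_float` and `vector_flags` and the tuple return layout are hypothetical, while the signal names in the comments (Exp_0, Exp_255, Mant_0, DisFloat) follow the claims.

```python
def preprocess_float(bits: int):
    """Model one flag generation unit (120) for a 32-bit word.

    Returns (dis_float, nan, pos_inf, neg_inf).
    """
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7F_FFFF

    exp_0 = exponent == 0        # Exp_0
    exp_255 = exponent == 255    # Exp_255
    mant_0 = mantissa == 0       # Mant_0

    # 31 bits of exponent and mantissa, zero-extended to 32 bits.
    magnitude = bits & 0x7FFF_FFFF
    # MUX0: inverted word for negative inputs, unchanged otherwise.
    mux0 = (~magnitude & 0xFFFF_FFFF) if sign else magnitude
    # MUX1: a zero exponent is treated as the value 0.
    dis_float = 0 if exp_0 else mux0

    nan = exp_255 and not mant_0
    inf = exp_255 and mant_0
    return dis_float, nan, inf and not sign, inf and bool(sign)


def vector_flags(words):
    """OR the per-element flags of a 16-element vector (unit 140)."""
    results = [preprocess_float(w) for w in words]
    dis_floats = [r[0] for r in results]
    nan_flag = any(r[1] for r in results)
    pos_inf_flag = any(r[2] for r in results)
    neg_inf_flag = any(r[3] for r in results)
    return dis_floats, nan_flag, pos_inf_flag, neg_inf_flag
```

Because a zero exponent forces the output to 0, denormal inputs are flushed to zero in this model, which matches the "simplified" single-precision format named in the claims.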
CN201110415155.8A 2011-12-13 2011-12-13 Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points Active CN102520903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110415155.8A CN102520903B (en) 2011-12-13 2011-12-13 Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points

Publications (2)

Publication Number Publication Date
CN102520903A CN102520903A (en) 2012-06-27
CN102520903B true CN102520903B (en) 2014-07-23

Family

ID=46291846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110415155.8A Active CN102520903B (en) 2011-12-13 2011-12-13 Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points

Country Status (1)

Country Link
CN (1) CN102520903B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3087473A1 (en) 2013-12-23 2016-11-02 Intel Corporation Instruction and logic for identifying instructions for retirement in a multi-strand out-of-order processor
CN105511836A (en) * 2016-01-22 2016-04-20 成都三零嘉微电子有限公司 High-speed and multimode modulo addition operation circuit
CN106775579B (en) * 2016-11-29 2019-06-04 北京时代民芯科技有限公司 Floating-point operation accelerator module based on configurable technology
CN107340992B (en) * 2017-06-15 2020-07-28 西安微电子技术研究所 Fixed point data screening circuit
CN107301031B (en) * 2017-06-15 2020-08-04 西安微电子技术研究所 Normalized floating point data screening circuit
CN111381805A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381804A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381806A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111260044B (en) * 2018-11-30 2023-06-20 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381875B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381802B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN117519637A (en) * 2018-12-28 2024-02-06 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN110888992A (en) * 2019-11-15 2020-03-17 北京三快在线科技有限公司 Multimedia data processing method and device, computer equipment and readable storage medium
CN113094020B (en) * 2021-03-15 2023-03-28 西安交通大学 Hardware device and method for quickly searching maximum or minimum N values of data set

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301137A (en) * 1990-07-23 1994-04-05 Mitsubishi Denki Kabushiki Kaisha Circuit for fixed point or floating point arithmetic operations
KR20090117451A (en) * 2008-05-09 2009-11-12 연세대학교 산학협력단 Reconfigurable arithmetic unit for performing fixed point operation or floating point operation based on input data type
CN101847087A (en) * 2010-04-28 2010-09-29 中国科学院自动化研究所 Reconfigurable transverse summing network structure for supporting fixed and floating points

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
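Design of a high-performance reconfigurable adder in a digital signal processor (数字信号处理器中高性能可重构加法器设计); Ma Hong (马鸿) et al.; Computer Engineering (《计算机工程》); 20090620; Vol. 35, No. 12; 1-12 *
Ma Hong (马鸿) et al. Design of a high-performance reconfigurable adder in a digital signal processor. Computer Engineering (《计算机工程》). 2009, Vol. 35, No. 12.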

Also Published As

Publication number Publication date
CN102520903A (en) 2012-06-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171129

Address after: 102412 Beijing City, Fangshan District Yan Village Yan Fu Road No. 1 No. 11 building 4 layer 402

Patentee after: Beijing Si Lang science and Technology Co.,Ltd.

Address before: 100190 Zhongguancun East Road, Beijing, No. 95, No.

Patentee before: Institute of Automation, Chinese Academy of Sciences

CP03 Change of name, title or address

Address after: 201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Silang Technology Co.,Ltd.

Address before: 102412 room 402, 4th floor, building 11, No. 1, Yanfu Road, Yancun Town, Fangshan District, Beijing

Patentee before: Beijing Si Lang science and Technology Co.,Ltd.