WO2021120851A1 - Floating-point processing device and data processing method - Google Patents

Floating-point processing device and data processing method

Info

Publication number
WO2021120851A1
WO2021120851A1 PCT/CN2020/123736 CN2020123736W WO2021120851A1 WO 2021120851 A1 WO2021120851 A1 WO 2021120851A1 CN 2020123736 W CN2020123736 W CN 2020123736W WO 2021120851 A1 WO2021120851 A1 WO 2021120851A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
exponent
mantissa
register
result
Prior art date
Application number
PCT/CN2020/123736
Other languages
English (en)
Chinese (zh)
Inventor
张磊
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2021120851A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers

Definitions

  • This application relates to the field of data communication, and in particular to a floating-point processing device and a data processing method.
  • the current processor mainly includes a floating-point processing unit and a fixed-point processing unit.
  • the fixed-point processing unit has low processing accuracy but low hardware resource overhead, while the floating-point processing unit has high processing accuracy but high hardware resource overhead.
  • the amount of data processed by processors has gradually increased, and the requirements for data accuracy have gradually increased.
  • Traditional fixed-point processing can no longer meet the accuracy requirements, and there is an urgent need for a scheme that improves processing accuracy while keeping hardware resource overhead controllable.
  • block floating point is used to improve the accuracy of the data processing process under the condition that the hardware resource overhead is small.
  • the basic principle of block floating point is shown in Figure 1.
  • The data in a data block have different mantissas M0 to M7, but share a single exponent E.
  • This exponent E corresponds to the exponent of the data with the largest absolute value in the data block.
  • The dynamic range of the data within a data block does not differ much. Because the entire data segment in block floating point has only one exponent, the exponent does not need to be packed together with each data mantissa.
  • block floating point can retain more mantissas and higher precision than pure floating point.
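  • The shared-exponent idea above can be illustrated with a minimal Python sketch. It is not taken from the application: the function names, the 8-bit mantissa width, and the choice to express the exponent as a right-shift count are all assumptions made for illustration.

```python
def to_block_floating_point(block, mantissa_bits=8):
    """Encode integers as signed mantissas that share one block exponent.

    Assumption: the exponent is the number of right shifts needed so that
    the value with the largest absolute value fits in `mantissa_bits`.
    """
    limit = (1 << (mantissa_bits - 1)) - 1      # largest positive mantissa
    max_abs = max(abs(v) for v in block)
    exponent = 0
    while (max_abs >> exponent) > limit:        # exponent is set by the largest value
        exponent += 1
    # Arithmetic right shift: smaller values lose their low-order bits.
    mantissas = [v >> exponent for v in block]
    return mantissas, exponent


def from_block_floating_point(mantissas, exponent):
    """Decode mantissas back to integers (lossy for small values)."""
    return [m << exponent for m in mantissas]
```

  • With `[1000, -37, 512, 5]` the largest value fixes the exponent at 3, so the small entries are stored with reduced precision, which is the trade-off block floating point makes.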
  • the existing processor only includes one data register.
  • The data register stores both the exponent data and the mantissa data of the block floating point. When block floating-point data is processed, this causes register overflow problems and results in poor accuracy.
  • This application provides a floating-point processing device and a data processing method.
  • An embodiment of the present application provides a floating-point processing device, which includes:
  • a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit. The normalization unit is connected to the exponent register and to the mantissa register, and the multiplication and accumulation unit is connected to the mantissa register. The normalization unit is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register for storage, and to send the mantissa data of the floating-point data to the mantissa register for storage. The multiplication and accumulation unit is configured to perform multiplication operations on the mantissa data.
  • An embodiment of the present application provides a data processing method, which includes:
  • storing a first data set and a second data set in the mantissa register, and determining a multiplication result data set of the first data set and the second data set by the multiplication and accumulation unit; performing, by the normalization unit, a normalization operation on at least one multiplication result in the multiplication result data set, and storing the generated result mantissa data and result exponent data in the mantissa register and the exponent register respectively; and determining the block floating-point exponent from the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
  • Fig. 1 is an example diagram of block floating point data in the prior art
  • FIG. 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • FIG. 5 is a flowchart of steps of a data processing method provided by an embodiment of this application.
  • FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • FIG. 7 is an example diagram of a data processing method provided by an embodiment of this application.
  • FIG. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • The floating-point processing device includes a normalization unit 10, an exponent register 11, a mantissa register 12, and a multiplication and accumulation unit 13. The normalization unit 10 is connected to the exponent register 11 and to the mantissa register 12, and the multiplication and accumulation unit 13 is connected to the mantissa register 12. The normalization unit 10 is configured to perform a normalization operation on data to generate floating-point data; it sends the exponent data of the floating-point data to the exponent register 11 for storage and sends the mantissa data of the floating-point data to the mantissa register 12 for storage. The multiplication and accumulation unit 13 is configured to perform multiplication operations on the mantissa data.
  • the normalization unit 10 can be a processor module that processes data, and can process input data into floating-point data, and the data processed by the normalization unit 10 can be fixed-point data and floating-point data.
  • The normalization unit 10 can convert data according to the bit width requirement of the mantissa in the mantissa register 12, converting fixed-point or floating-point data into floating-point data that meets the bit width requirement of the mantissa register 12.
  • For example, the fixed-point data 6023 can be converted by the normalization unit 10 into the floating-point number 6.023E3.
  • The exponent register 11 and the mantissa register 12 can be small storage areas that hold binary data and can temporarily store the data and operation results involved in an operation.
  • The exponent register 11 and the mantissa register 12 can be implemented as sequential logic circuits; the exponent register 11 stores the exponent data of the floating-point data, and the mantissa register 12 stores the mantissa data of the floating-point data.
  • the floating-point processing device in the embodiment of the present application may include multiple exponent registers 11 and mantissa registers 12.
  • For example, the device may include N mantissa registers 12, each 40 bits wide, and 2N exponent registers 11, each 8 bits wide, where N may be related to the number of data items processed at the same time in the embodiment of the present application.
  • the lower 32 bits in the mantissa register 12 can be data bits for storing mantissa data, and the upper 8 bits can be extended bits.
  • the mantissa register 12 can store the mantissa of floating-point data or fixed-point data.
  • the mantissa data and exponent data stored in the mantissa register 12 and the exponent register 11 have a corresponding relationship, which belongs to
  • the storage address of the mantissa data in the mantissa register 12 may be the same as the storage address of the exponent data in the exponent register 11.
  • The multiplication and accumulation unit 13 may also be included.
  • The multiplication and accumulation unit 13 may perform multiplication and multiply-add operations on the mantissa data. For example, when the multiplication and accumulation unit 13 is an 18-bit multiplication and accumulation unit, the mantissa data can be multiplied to generate a 32-bit multiplication result and multiplied and accumulated to generate a 40-bit multiply-accumulate result.
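  • The role of the 8 extension bits in a 40-bit accumulator can be sketched as follows. This is an illustrative model only: the helper names are hypothetical, and the 16-bit signed operand width is an assumption (the text above mentions an 18-bit unit), so it is left as a parameter.

```python
def wrap_signed(value, bits):
    """Interpret `value` modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_accumulate(xs, ys, operand_bits=16, acc_bits=40):
    """Sum of products, with the running sum held in an `acc_bits`-wide accumulator.

    Assumption: operands are signed `operand_bits`-wide values, so each
    product fits in 2 * operand_bits bits and the extra accumulator bits
    act as headroom for the additions.
    """
    acc = 0
    for x, y in zip(xs, ys):
        product = wrap_signed(x, operand_bits) * wrap_signed(y, operand_bits)
        acc = wrap_signed(acc + product, acc_bits)
    return acc
```

  • Summing four products of `32767 * 32767` already exceeds the signed 32-bit range, but the 40-bit accumulator in this model holds the exact value.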
  • In the technical solution of the embodiment of the present application, the floating-point processing device is composed of a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit.
  • The normalization unit is connected to the exponent register and to the mantissa register, and the multiplication and accumulation unit is connected to the mantissa register.
  • The normalization unit normalizes data to generate floating-point data; the exponent data of the floating-point data is sent to the exponent register for storage, the mantissa data is sent to the mantissa register for storage, and the multiplication and accumulation unit multiplies the mantissa data, thereby realizing the processing of floating-point data.
  • Storing the exponent data and the mantissa data in separate registers reduces the probability of register overflow, increases the storage bit width available for mantissa data, and improves the data accuracy of the floating-point data.
  • Providing an independent exponent register also allows operations on exponent data to be decoupled from the hardware structure and from data calculation, which reduces the design difficulty of the hardware structure.
  • FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • The floating-point processing device in this embodiment of the application further includes a mantissa calculation unit 14, an exponent calculation unit 15, an association update unit 16, and a first-level cache 17.
  • The mantissa calculation unit 14 and the first-level cache 17 are each connected to the mantissa register 12, the exponent calculation unit 15 is connected to the exponent register 11, and the association update unit 16 is connected to the exponent register 11 and to the first-level cache 17. The mantissa calculation unit 14 is configured to perform calculations on the mantissa data; the exponent calculation unit 15 is configured to perform calculations on the exponent data; the association update unit 16 is configured to store, when the exponent register 11 overflows, the overflowed exponent data in the first-level cache 17; and the first-level cache 17 is also configured to store mantissa data overflowed from the mantissa register 12.
  • The exponent register 11 and the mantissa register 12 may also be connected to the exponent calculation unit 15 and the mantissa calculation unit 14, respectively. The mantissa calculation unit 14 and the exponent calculation unit 15 may each be an arithmetic logic unit (ALU).
  • The mantissa calculation unit 14 can perform calculations on the mantissa data, and the exponent calculation unit 15 can perform calculations on the exponent data. The calculations performed by the exponent calculation unit 15 and the mantissa calculation unit 14 may include, but are not limited to, addition, subtraction, multiplication, division, AND, OR, NOT, exclusive OR, and shift operations.
  • The association update unit 16 may store exponent data that overflows from the exponent register 11 in the first-level cache 17. Because the mantissa data in the mantissa register 12 has the same storage address as the corresponding exponent data in the exponent register 11, overflowed exponent data would conflict with overflowed mantissa data at the same address in the first-level cache 17. The storage address of the exponent data can therefore be converted, for example by adding or subtracting a corresponding offset, so that the exponent data and the mantissa data are not stored at the same location in the first-level cache 17.
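  • The offset-based address conversion can be pictured with a toy model in which the first-level cache is a dictionary keyed by address. Everything here is hypothetical: the offset value, the function names, and the dictionary model are illustrative assumptions, since the application does not specify how the offset is chosen.

```python
EXPONENT_OFFSET = 0x1000  # hypothetical offset; the source gives no concrete value


def spill_mantissa(l1_cache, address, mantissa):
    """Store overflowed mantissa data at its original address."""
    l1_cache[address] = mantissa


def spill_exponent(l1_cache, address, exponent):
    """Store overflowed exponent data at a converted (offset) address."""
    l1_cache[address + EXPONENT_OFFSET] = exponent


def reload_pair(l1_cache, address):
    """Fetch the mantissa and its associated exponent for one register address."""
    return l1_cache[address], l1_cache[address + EXPONENT_OFFSET]
```

  • Because the exponent is written at `address + EXPONENT_OFFSET`, a mantissa and an exponent that share a register address no longer collide in the cache model, provided the offset is larger than the mantissa address range.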
  • The first-level cache 17 can specifically be an L1 cache, which can be integrated inside the processor and used to temporarily store data during data processing.
  • The first-level cache 17 can store the mantissa data overflowed from the mantissa register 12 and the exponent data overflowed from the exponent register 11.
  • FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • In this embodiment, the association update unit is described in more detail.
  • The association update unit may include a zero-level cache 161 and an address translation lookaside buffer 162 (also referred to below as the bypass forwarding buffer 162). The zero-level cache 161 is connected to the exponent register 11; the address translation lookaside buffer 162 is connected to the zero-level cache 161 and to the first-level cache 17. The zero-level cache 161 is configured to store exponent data overflowed from the exponent register 11. The address translation lookaside buffer 162 is configured, when the zero-level cache 161 itself overflows, to convert the address of the overflowed exponent data and store that data in the first-level cache 17, such that the exponent data after address conversion does not conflict with the addresses of the mantissa data stored in the first-level cache 17.
  • The zero-level cache 161 may be a cache dedicated to exponent data and can support byte addressing.
  • When the zero-level cache 161 overflows, the exponent data can be sent to the first-level cache 17 for storage. Because the address of the exponent data in the exponent register is the same as the address of the mantissa data in the mantissa register, the exponent data overflowed from the zero-level cache 161 can be address-converted by the bypass forwarding buffer 162. The bypass forwarding buffer 162 can be a buffer that converts between physical addresses and virtual addresses; it can convert the physical address of the exponent data to a virtual address so that storing the exponent data overflowed from the zero-level cache 161 in the first-level cache 17 does not cause an address conflict.
  • The zero-level cache 161 and the bypass forwarding buffer 162 together implement an independent overflow update mechanism, so that block floating-point calculations can proceed without having to handle exponent data overflow.
  • A data storage unit 19 and a data loading unit 18 are also connected between the first-level cache 17 and the mantissa register 12. The data storage unit 19 is configured to store the mantissa data overflowed from the mantissa register 12 into the first-level cache 17; the data loading unit 18 is configured to load the mantissa data stored in the first-level cache 17 into the mantissa register 12.
  • The mantissa calculation unit 14 is also connected to the exponent register 11. It is configured to obtain the exponent data stored in the exponent register 11 and to perform shift operations on the mantissa data in the mantissa register 12 according to that exponent data.
  • The data storage unit 19 and the data loading unit 18 can be used to transfer mantissa data between the first-level cache 17 and the mantissa register 12. Similarly, a data storage unit and a data loading unit can be arranged between the zero-level cache 161 and the exponent register 11 to store and load exponent data.
  • The mantissa calculation unit 14 may also be connected to the exponent register 11 to obtain the exponent data stored there, and may shift the corresponding mantissa data in the mantissa register 12 according to the obtained exponent data.
  • FIG. 5 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • The embodiment of the application may be applied to data processing in a processor.
  • the method may be executed by the floating-point processing device in the embodiment of the application.
  • the device can be implemented in software and/or hardware, and generally can be integrated in a processor.
  • the data processing method in the embodiment of the present application includes:
  • Step 101 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • The first data set and the second data set are the data sets participating in the operation. Each may include at least one piece of data, and the data may be fixed-point data, floating-point data, or block floating-point data.
  • The acquired first data set and second data set can be stored in the mantissa register as fixed-point data or as mantissa data. It can be understood that if the first data set and the second data set contain floating-point data or block floating-point data, the corresponding mantissa data can be obtained through the normalization unit and stored in the mantissa register. After the first data set and the second data set are stored in the mantissa register, the multiplication result data set can be computed by the multiplication and accumulation unit. For example, suppose there are two arrays X and Y of length n, and the bit width of each element in the arrays is 16 bits.
  • Step 102 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • A multiplication result is an element of the multiplication result data set. It can be a full-precision result that does not meet the normalization requirement.
  • For example, the multiplication result may be 123E6, whereas the normalized form required in the embodiment of this application is 1.23E8. The normalization unit therefore normalizes the multiplication results in the multiplication result data set, which changes the precision of their mantissa data.
  • After the normalization operation, the result mantissa data of each multiplication result is stored in the mantissa register, and the result exponent data is stored in the exponent register.
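  • The decimal examples in the text (6023 to 6.023E3, and 123E6 to 1.23E8) can be reproduced with Python's standard `decimal` module. This is only an illustration of what "normalized form" means in those examples; the function name is hypothetical, and the hardware described here operates on binary data rather than decimal strings.

```python
from decimal import Decimal


def normalize_decimal(text):
    """Return (mantissa, exponent) with the mantissa scaled to d.ddd form.

    For non-zero input the mantissa has exactly one digit before the
    decimal point; the digits themselves are unchanged.
    """
    value = Decimal(text)
    exponent = value.adjusted()          # power of ten of the leading digit
    mantissa = value.scaleb(-exponent)   # move the decimal point only
    return mantissa, exponent
```

  • Calling `normalize_decimal("123E6")` yields the mantissa 1.23 and the exponent 8, matching the example in the text.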
  • For example, the 32-bit multiplication result Z[i] can be read from the mantissa register and normalized to obtain 16-bit mantissa data M[i] and the corresponding 8-bit exponent data E[i].
  • M[i] is stored back to the mantissa register and E[i] is stored in the exponent register; the indexes of M[i] and E[i] in the mantissa register and the exponent register are in one-to-one correspondence.
  • Here i can be any value from 0 up to the number of multiplication results in the multiplication result data set.
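  • One way to picture the step from a 32-bit Z[i] to a 16-bit M[i] and an exponent E[i] is the sketch below. The exact normalization rule is not given in the text, so this assumes the simplest one: shift right until the value fits in a signed 16-bit mantissa and record the shift count as the exponent, so that Z[i] is approximately M[i] * 2**E[i]. The function name is hypothetical.

```python
def normalize(z, mantissa_bits=16):
    """Split a signed integer into (mantissa, exponent) with z ~= mantissa * 2**exponent.

    Assumption: the exponent is a non-negative right-shift count. For a
    signed 32-bit input it is at most 16, so it fits easily in 8 bits.
    """
    low = -(1 << (mantissa_bits - 1))
    high = (1 << (mantissa_bits - 1)) - 1
    exponent = 0
    while not (low <= (z >> exponent) <= high):
        exponent += 1
    # Arithmetic right shift discards the low-order bits of z.
    return z >> exponent, exponent
```

  • Values that already fit in 16 bits keep an exponent of 0 and lose no precision; larger values trade low-order bits for range.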
  • the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.
  • The result mantissa data can be stored in association with the result exponent data: the mantissa storage address of the result mantissa data in the mantissa register can be the same as the exponent storage address of the result exponent data in the exponent register. "The same" here may mean that the physical addresses are the same, that the logical addresses are the same, that the physical address of the mantissa storage address is the same as the logical address of the exponent storage address, or that the logical address of the mantissa storage address is the same as the physical address of the exponent storage address.
  • Step 103 Determine the block floating point exponent of the result exponent data in the exponent register, and process the result mantissa data in the mantissa register according to the block floating point exponent to generate block floating point data.
  • The block floating-point exponent is the shared exponent used when the multiplication results in the multiplication result data set are converted into block floating-point format; it may be the maximum value among the result exponent data of the multiplication results.
  • The result exponent with the largest value can be found in the exponent register and used as the block floating-point exponent. Because the result exponent data of the multiplication results differ, and all results must share the same block floating-point exponent, the precision of the result mantissa data of each multiplication result has to be adjusted: the result mantissa data in the mantissa register is shifted accordingly, and the adjusted multiplication results are used as block floating-point data.
  • In the embodiment of the present application, the first data set and the second data set participating in data processing are stored in the mantissa register, and the multiplication result data set of the two data sets is determined by the multiplication and accumulation unit. The multiplication result data set is normalized in the normalization unit, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The block floating-point exponent is determined from the result exponent data in the exponent register, and the result mantissa data is shifted accordingly to convert the multiplication result data set into block floating-point data.
  • By providing separate mantissa and exponent registers, the technical solution of the embodiment of the present application increases the data bit width available for mantissa data, which improves data accuracy, and reduces the coupling of the data processing process, which can reduce the design complexity of the hardware.
  • FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • the data processing method of the embodiment of the application specifically includes:
  • Step 201 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • The first data set and the second data set can be pre-stored in the mantissa register. The data in the first data set and the second data set can be multiplied with full precision in the multiplication and accumulation unit, and the multiplication results form the multiplication result data set.
  • Step 202 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • The multiplication results in the multiplication result data set can be normalized to compress the precision of the mantissa data stored in the mantissa register. This yields new result mantissa data and result exponent data, which are stored in the mantissa register and the exponent register respectively.
  • Step 203 The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent.
  • The exponent calculation unit may sequentially read the result exponent data stored in the exponent register, compare the values read, and use the result exponent data with the largest value as the block floating-point exponent.
  • Step 204 The mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent.
  • The block floating-point exponent can be used as the basis for shifting each result mantissa data in the mantissa register, so that the result exponent data corresponding to each result mantissa data becomes equal to the block floating-point exponent. For example, if the difference between the result exponent data and the block floating-point exponent is 1, the result mantissa data in the mantissa register can be shifted one bit to the right.
  • Step 205 Use the mantissa data of the result of the shift operation and the block floating point exponent as block floating point data.
  • The data composed of the shifted result mantissa data and the block floating-point exponent may be used as the block floating-point data; the exponent data of every result mantissa data is then the block floating-point exponent.
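  • Steps 203 to 205 can be sketched in a few lines of Python. The function name and the list-based representation of the registers are assumptions made for illustration; the logic follows the description above (largest exponent becomes the block exponent, each mantissa is shifted right by its exponent difference).

```python
def to_block(mantissas, exponents):
    """Convert per-element (mantissa, exponent) pairs to block floating point."""
    # Step 203: the largest result exponent becomes the block floating-point exponent.
    block_exponent = max(exponents)
    # Step 204: shift each mantissa right by the difference to the block exponent.
    shifted = [m >> (block_exponent - e) for m, e in zip(mantissas, exponents)]
    # Step 205: shifted mantissas plus the shared exponent form the block data.
    return shifted, block_exponent
```

  • For three equal mantissas `0x4000` with exponents 3, 2 and 0, the first is left unchanged, the second moves one bit to the right, and the third moves three bits to the right.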
  • FIG. 7 is an example diagram of a data processing method provided by an embodiment of the application.
  • The normalization unit performs normalization and compression on the 64-bit multiplication result output by the multiplication and accumulation unit.
  • This yields four 32-bit result mantissa data (M0 to M3) and four independent result exponent data (E0 to E3), which are stored in the mantissa register and the exponent register respectively.
  • The exponent calculation unit can compare E0 to E3 in the exponent register simultaneously and take the largest result exponent data as the block floating-point exponent.
  • In the embodiment of the present application, the first data set and the second data set are stored in the mantissa register, and the multiplication result data set of the two data sets is determined by the multiplication and accumulation unit. The normalization unit normalizes the multiplication results in the multiplication result data set; the generated result mantissa data is stored in the mantissa register and the generated result exponent data is stored in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit shifts the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
  • By computing the result mantissa data and the result exponent data in two separate registers, the technical solution of the embodiment of the present application reduces the coupling between hardware design and data processing, reduces the complexity of calculation, increases the available bit width of the result mantissa data, and improves data accuracy.
  • Fig. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • In this embodiment, the shifting of the result mantissa data is described in more detail.
  • the data processing method of the embodiment of the application includes:
  • Step 301 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • Step 302 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • Step 303 The exponent calculation unit reads the result exponent data with the lowest address in the exponent register as the current maximum result exponent.
  • Here the address is the storage address of the result exponent data in the exponent register, the lowest address is the storage address with the smallest value, and the current maximum result exponent is the largest result exponent data read from the exponent register so far.
  • The exponent calculation unit may read the result exponent data at the lowest address from the exponent register and use it as the current maximum result exponent. It is understandable that the exponent calculation unit can instead start from the result exponent data with the highest address, or from any result exponent data, as the initial current maximum result exponent.
  • Step 304 The exponent calculation unit sequentially reads the remaining result exponent data in the exponent register and compares each with the current maximum result exponent. If the result exponent data is greater than the current maximum result exponent, that result exponent data becomes the current maximum result exponent.
  • The exponent calculation unit can sequentially read the remaining result exponent data in the exponent register. Each time a result exponent data is read, it is compared with the current maximum result exponent; if the value read is greater, it is used as the new current maximum result exponent.
  • The difference between the current maximum result exponent and the result exponent data can be stored in the exponent register to replace the originally stored result exponent data.
  • Step 305 When the exponent calculation unit finishes reading the result exponent data in the exponent register, use the current maximum result exponent as the block floating point exponent.
  • the current maximum result index can be used as the block floating index. Since the current maximum result index has been compared with all the result index data, the current maximum result index can be The maximum value among all result index data.
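  • Steps 303 to 305 amount to a single sequential scan for the largest exponent. The following is a minimal software sketch of that scan, not the hardware described in the application; the function name `find_block_exponent` and the use of a Python list to stand in for the exponent register are assumptions made for illustration.

```python
def find_block_exponent(exponent_register):
    """Return the largest result exponent found by one sequential scan.

    `exponent_register` is a hypothetical stand-in for the exponent register:
    a list whose position 0 plays the role of the lowest address.
    """
    if not exponent_register:
        raise ValueError("exponent register is empty")

    # Step 303: seed the running maximum with the entry at the lowest address.
    current_max = exponent_register[0]

    # Step 304: compare every remaining entry against the running maximum.
    for exponent in exponent_register[1:]:
        if exponent > current_max:
            current_max = exponent

    # Step 305: once every entry has been read, the running maximum
    # is the block floating-point exponent.
    return current_max


print(find_block_exponent([3, 7, 2, 7, 5]))  # prints 7
```

  • Seeding from the lowest address is only one choice; as the text notes, any entry can serve as the starting value because every entry is eventually compared.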
  • Step 306 The mantissa calculation unit sequentially reads the result mantissa data and reads the result exponent data corresponding to the result mantissa data in the exponent register.
  • The storage addresses of the result mantissa data and the result exponent data are the same, so the corresponding result exponent data can be read from the exponent register according to the storage address of the result mantissa data.
  • Step 307 Use the difference between the block floating point exponent and the result exponent data as the number of shift bits.
  • The difference between the block floating-point exponent and the result exponent data may be determined by the exponent calculation unit or the mantissa calculation unit and used as the number of shift bits, that is, the number of bit positions by which the result mantissa data is shifted.
  • Step 308 Perform a shift operation on the result mantissa data according to the number of shift bits.
  • Each result mantissa data in the mantissa register can be shifted to the right by the number of shift bits.
  • Step 309 Use the result mantissa data after the shift operation, together with the block floating-point exponent, as the block floating-point data.
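  • Steps 306 to 309 can be illustrated with a short sketch. It assumes that each value is represented as mantissa × 2^exponent with signed integer mantissas, and that the right shift is arithmetic (sign-preserving); neither convention is stated in the text above, and the name `align_mantissas` is hypothetical.

```python
def align_mantissas(mantissas, exponents):
    """Shift each mantissa right so that all share the largest exponent.

    Assumes value = mantissa * 2**exponent, and that mantissas[i] and
    exponents[i] sit at the same address in their respective registers.
    """
    if len(mantissas) != len(exponents):
        raise ValueError("mantissa and exponent registers must have equal length")
    if not exponents:
        raise ValueError("registers are empty")

    block_exponent = max(exponents)

    shifted = []
    for mantissa, exponent in zip(mantissas, exponents):
        # Step 307: the shift count is the gap to the block exponent.
        shift_bits = block_exponent - exponent
        # Step 308: Python's >> on int is an arithmetic right shift,
        # so negative mantissas keep their sign.
        shifted.append(mantissa >> shift_bits)

    # Step 309: the shifted mantissas plus one shared exponent
    # form the block floating-point data.
    return shifted, block_exponent


aligned, block_exponent = align_mantissas([0x4000, 0x6000, -0x4000], [3, 1, 2])
print([hex(m) for m in aligned], block_exponent)
# prints ['0x4000', '0x1800', '-0x2000'] 3
```

  • In this example 0x6000 × 2^1 and 0x1800 × 2^3 are the same value, so the entry is preserved exactly; entries whose low bits are shifted out lose precision, which is the usual cost of sharing one exponent across a block.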
  • the data loading unit continuously reads X[i] and Y[i] from the memory to the mantissa register;
  • the bit width is 32 bits;
  • The normalization unit synchronously reads the 32-bit multiplication result Z[i] from the mantissa register, performs a normalization operation on it, and obtains the 16-bit result mantissa data M[i] and the corresponding 8-bit result exponent data E[i].
  • The result mantissa data M[i] can be stored in the mantissa register and the result exponent data E[i] in the exponent register, and M[i] in the mantissa register corresponds one-to-one, by position, with E[i] in the exponent register.
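  • The text does not specify how the normalization unit derives M[i] and E[i] from Z[i]. The sketch below shows one common convention, offered purely as an assumption: Z[i] is treated as a signed 32-bit integer, redundant sign bits are shifted out, the top 16 bits are kept as a truncated signed mantissa, and the exponent is chosen so that Z[i] ≈ M[i] × 2^E[i]. The function name `normalize_product` is hypothetical.

```python
def normalize_product(z):
    """Split a signed 32-bit product into a 16-bit mantissa and an exponent.

    Assumed convention (not taken from the source): z ~= mantissa * 2**exponent,
    with the mantissa truncated rather than rounded.
    """
    if not -(1 << 31) <= z < (1 << 31):
        raise ValueError("z must fit in a signed 32-bit integer")

    # Number of bits needed for the magnitude, excluding the sign bit.
    magnitude_bits = z.bit_length() if z >= 0 else (~z).bit_length()

    # Bits below the sign bit that merely repeat the sign (0..31).
    redundant_sign_bits = 31 - magnitude_bits

    # Shift the redundant sign bits out, then keep the top 16 bits.
    mantissa = (z << redundant_sign_bits) >> 16

    # Undo both shifts in the exponent: range -15..16, which fits in 8 bits.
    exponent = 16 - redundant_sign_bits
    return mantissa, exponent


print(normalize_product(0x12345))  # prints (18641, 2)
```

  • Here 18641 × 2^2 = 74564, one less than the input 0x12345 = 74565, because the two lowest bits are truncated. Under this convention a zero input yields mantissa 0 with the smallest exponent, so it never raises the block maximum.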
  • The exponent calculation unit monitors the dynamic range of the result exponent data E[i] that the normalization unit stores in the exponent register, that is, it searches for the maximum value of the result exponent data E[i].
  • The exponent calculation unit can obtain the maximum value Emax of the result exponent data E[i] sequence and store Emax as independent data in the exponent register.
  • The result mantissa data M[i] finally stored in the mantissa register is the block floating-point mantissa, and the Emax stored in the exponent register is the block floating-point exponent.
  • Steps (1) to (4) are executed in a pipeline, that is, each subsequent step does not have to wait for the entire sequence to complete its calculation before the next level of calculation starts; steps (6) to (7) are pipelined in the same way.
  • Step (5) cannot be performed in a pipeline, so it involves storing the results M[i] and E[i] of step (4).
  • When step (4) has finished outputting its results, the processor retrieves the result mantissa data M[i] stored in the first-level cache through the data loading unit and stores it in the mantissa register again. At the same time, the data loading unit of the exponent register automatically fetches the result exponent data E[i] stored in the zero-level cache and replays it to the exponent register, following the data loading unit operation of the mantissa register, and then the operation of step (6) is performed.
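  • The reason one step blocks pipelining can be seen in a two-pass sketch. It assumes that step (5) is the Emax search and that steps (6) to (7) are the mantissa alignment, which the surrounding text suggests but does not state outright. The list `buffered` is a hypothetical stand-in for the cached M[i] and E[i], and the function name is illustrative only.

```python
def block_float_two_pass(normalized_stream):
    """Consume (mantissa, exponent) pairs, then replay them aligned to Emax.

    Pass 1 can run element by element as pairs arrive. Pass 2 cannot start
    until the stream is exhausted, because Emax is not final before then.
    """
    buffered = []   # stands in for the cached M[i] and E[i]
    e_max = None

    # Pass 1 (streaming): store each pair and track the running maximum.
    for mantissa, exponent in normalized_stream:
        buffered.append((mantissa, exponent))
        if e_max is None or exponent > e_max:
            e_max = exponent

    # Barrier: only here is e_max known to be the maximum of the whole block.

    def aligned():
        # Pass 2 (streaming): replay the stored pairs and shift each mantissa.
        for mantissa, exponent in buffered:
            yield mantissa >> (e_max - exponent)

    return e_max, aligned()


e_max, mantissas = block_float_two_pass(iter([(16384, -14), (18641, 2), (-32768, -15)]))
print(e_max, list(mantissas))  # prints 2 [0, 18641, -1]
```

  • The small entries collapse to 0 and -1 after alignment because their exponents lie far below Emax; the buffering between the two passes corresponds to the store and replay of M[i] and E[i] described above.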
  • The first data set and the second data set are stored in the mantissa register, and the multiplication and accumulation unit determines the multiplication result data set of the two data sets. The normalization unit normalizes the multiplication results in the multiplication result data set, stores the generated result mantissa data in the mantissa register, and stores the generated result exponent data in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent to generate the block floating-point data.
  • The technical solution of the embodiment of the present application stores and processes the result mantissa data and the result exponent data separately in two registers, which reduces the coupling between hardware design and data processing, reduces the complexity of calculation, increases the available bit width of the result mantissa data, and improves the data accuracy.
  • The term user terminal encompasses any suitable type of wireless user equipment, such as a mobile phone, a portable data processing device, a portable web browser, or a vehicle-mounted mobile station.
  • the various embodiments of the present application can be implemented in hardware or dedicated circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device, although the present application is not limited thereto.
  • Computer program instructions can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
  • the block diagram of any logic flow in the drawings of the present application may represent program steps, or may represent interconnected logic circuits, modules, and functions, or may represent a combination of program steps and logic circuits, modules, and functions.
  • the computer program can be stored on the memory.
  • The memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical storage devices and systems (digital versatile disc (DVD) or compact disc (CD)).
  • Computer-readable media may include non-transitory storage media.
  • The data processor can be of any type suitable for the local technical environment, such as but not limited to general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic devices (FPGA), and processors based on a multi-core processor architecture.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Nonlinear Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The present application relates to a floating-point processing device and a data processing method. The device comprises a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit. The normalization unit is connected separately to the exponent register and to the mantissa register, and the multiplication and accumulation unit is connected to the mantissa register. The normalization unit is configured to perform a normalization operation on data to generate floating-point data; it sends the exponent data of the floating-point data to the exponent register for storage and sends the mantissa data of the floating-point data to the mantissa register for storage. The multiplication and accumulation unit is configured to perform a multiplication operation according to the mantissa data. According to the technical solutions of the embodiments of the present application, the exponent data and the mantissa data of the floating-point data are stored in the exponent register and the mantissa register respectively, so that the probability of mantissa overflow during floating-point operations is reduced, the bit width of the mantissa data is extended, and data accuracy can be improved.
PCT/CN2020/123736 2019-12-17 2020-10-26 Dispositif de traitement en virgule flottante et procédé de traitement de données WO2021120851A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911304990.7A CN112988110A (zh) 2019-12-17 2019-12-17 一种浮点处理装置和数据处理方法
CN201911304990.7 2019-12-17

Publications (1)

Publication Number Publication Date
WO2021120851A1 true WO2021120851A1 (fr) 2021-06-24

Family

ID=76343611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123736 WO2021120851A1 (fr) 2019-12-17 2020-10-26 Dispositif de traitement en virgule flottante et procédé de traitement de données

Country Status (2)

Country Link
CN (1) CN112988110A (fr)
WO (1) WO2021120851A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202333043A (zh) * 2022-02-02 2023-08-16 國立清華大學 一種浮點數運算方法以及相關的算術單元
EP4343684A1 (fr) * 2022-06-24 2024-03-27 Calterah Semiconductor Technology (Shanghai) Co., Ltd Procédé et appareil de traitement de données, et capteur radar

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090292750A1 (en) * 2008-05-22 2009-11-26 Videolq, Inc. Methods and apparatus for automatic accuracy- sustaining scaling of block-floating-point operands
CN110050256A (zh) * 2016-12-07 2019-07-23 微软技术许可有限责任公司 用于神经网络实现的块浮点

Also Published As

Publication number Publication date
CN112988110A (zh) 2021-06-18

Similar Documents

Publication Publication Date Title
US10514912B2 (en) Vector multiplication with accumulation in large register space
US11175891B2 (en) Systems and methods to perform floating-point addition with selected rounding
US11036504B2 (en) Systems and methods for performing 16-bit floating-point vector dot product instructions
US20060179092A1 (en) System and method for executing fixed point divide operations using a floating point multiply-add pipeline
CN112639722A (zh) 加速矩阵乘法的装置和方法
WO2021120851A1 (fr) Dispositif de traitement en virgule flottante et procédé de traitement de données
TWI493453B (zh) 提高精確度積和演算之微處理器及其視頻解碼裝置、其方法及其電腦程式產品
TW201730752A (zh) 用於有狀態壓縮和解壓縮操作的硬體加速器及方法
EP3394729B1 (fr) Unité fonctionnelle basse à multiplication-ajout fusionnés (fma)
WO2010051298A2 (fr) Instruction et logique de réalisation d’une détection de distance
CN111813371B (zh) 数字信号处理的浮点除法运算方法、系统及可读介质
EP4276608A2 (fr) Appareils, procédés et systèmes pour des instructions de produit à points de matrice 8 bits à virgule flottante
US20160253235A1 (en) Recycling Error Bits in Floating Point Units
US20210279038A1 (en) Using fuzzy-jbit location of floating-point multiply-accumulate results
US20050172210A1 (en) Add-compare-select accelerator using pre-compare-select-add operation
TWI774093B (zh) 用於轉換資料類型的轉換器、晶片、電子設備及其方法
US9280316B2 (en) Fast normalization in a mixed precision floating-point unit
US10503473B1 (en) Floating-point division alternative techniques
US20210182067A1 (en) Apparatuses, methods, and systems for instructions to multiply floating-point values of about one
CN108292219B (zh) 浮点(fp)加法低指令功能单元
CN117785113B (zh) 计算装置及方法、电子设备和存储介质
US9141586B2 (en) Method, apparatus, system for single-path floating-point rounding flow that supports generation of normals/denormals and associated status flags
US11875154B2 (en) Apparatuses, methods, and systems for instructions to multiply floating-point values of about zero
US11847450B2 (en) Apparatuses, methods, and systems for instructions to multiply values of zero
CN116382618A (zh) 单精度浮点运算装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20901527

Country of ref document: EP

Kind code of ref document: A1