WO2021120851A1 - Floating point processing device and data processing method - Google Patents

Floating point processing device and data processing method

Info

Publication number
WO2021120851A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
exponent
mantissa
register
result
Prior art date
Application number
PCT/CN2020/123736
Other languages
French (fr)
Chinese (zh)
Inventor
张磊
Original Assignee
深圳市中兴微电子技术有限公司
Application filed by 深圳市中兴微电子技术有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers

Definitions

  • This application relates to the field of data communication, and in particular to a floating-point processing device and a data processing method.
  • Current processors mainly include a floating-point processing unit and a fixed-point processing unit.
  • The fixed-point processing unit has low processing accuracy but low hardware resource overhead, while the floating-point processing unit has high processing accuracy but high hardware resource overhead.
  • The amount of data processed by processors has gradually increased, and so have the requirements for data accuracy.
  • Traditional fixed-point processing can no longer meet the accuracy requirements, and there is an urgent need for a solution that improves processing accuracy while keeping hardware resource overhead controllable.
  • Block floating point is used to improve the accuracy of data processing while keeping the hardware resource overhead small.
  • The basic principle of block floating point is shown in Figure 1.
  • The data in a data segment have different mantissas M0 to M7, but there is only one exponent E.
  • This exponent E corresponds to the exponent of the data with the largest absolute value in the data block.
  • The dynamic range of the data within a data block does not vary much. Because the entire data segment in block floating point has only one exponent, the exponent does not need to be packed together with each data mantissa.
  • Block floating point can therefore retain more mantissa bits, and thus higher precision, than pure floating point.
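  • As a minimal sketch of the principle described above, the following Python code encodes a block of values as integer mantissas that share one exponent, chosen from the value with the largest absolute magnitude. The function names, the 16-bit mantissa width, and the saturation on rounding are assumptions made for illustration; they are not taken from the application.

```python
import math


def to_block_floating_point(values, mantissa_bits=16):
    """Encode real values as signed integer mantissas sharing one exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 0
    # frexp gives max_abs = f * 2**e with 0.5 <= f < 1, so scaling by
    # 2**(e - (mantissa_bits - 1)) keeps the largest value inside the mantissa.
    _, e = math.frexp(max_abs)
    exponent = e - (mantissa_bits - 1)
    limit = (1 << (mantissa_bits - 1)) - 1
    mantissas = []
    for v in values:
        m = round(v / 2.0 ** exponent)
        mantissas.append(max(-limit, min(limit, m)))  # saturate after rounding
    return mantissas, exponent


def from_block_floating_point(mantissas, exponent):
    """Decode mantissas back to real values using the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


mantissas, exponent = to_block_floating_point([0.5, -0.25, 0.125, 0.75])
print(mantissas, exponent)
```

  • Smaller values in the block lose low-order bits relative to the largest one, which is why the scheme works best when the dynamic range inside a block is narrow.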
  • Existing processors include only one data register.
  • The data register stores both the exponent data and the mantissa data of the block floating point. When block floating-point data is processed, this causes register overflow problems and poor accuracy.
  • This application provides a floating-point processing device and a data processing method.
  • An embodiment of the present application provides a floating-point processing device, which includes:
  • a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit; the normalization unit is connected to the exponent register and the mantissa register respectively, and the multiplication and accumulation unit is connected to the mantissa register; wherein the normalization unit is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register for storage, and to send the mantissa data of the floating-point data to the mantissa register for storage; and the multiplication and accumulation unit is configured to perform multiplication operations according to the mantissa data.
  • An embodiment of the present application provides a data processing method, which includes:
  • the first data set and the second data set are stored in the mantissa register, and the multiplication result data set of the first data set and the second data set is determined by the multiplication and accumulation unit; at least one multiplication result in the multiplication result data set is normalized by the normalization unit, and the generated result mantissa data and result exponent data are stored in the mantissa register and the exponent register respectively; the block floating-point exponent of the result exponent data in the exponent register is determined, and the result mantissa data in the mantissa register is processed according to the block floating-point exponent to generate block floating-point data.
  • Fig. 1 is an example diagram of block floating point data in the prior art
  • FIG. 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • FIG. 5 is a flowchart of steps of a data processing method provided by an embodiment of this application.
  • FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • FIG. 7 is an example diagram of a data processing method provided by an embodiment of this application.
  • FIG. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • the normalization unit 10 is connected to the exponent register 11 and the mantissa register 12 respectively, and the multiplication and accumulation unit 13 is connected to the mantissa register 12; wherein the normalization unit 10 is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register 11 for storage, and to send the mantissa data of the floating-point data to the mantissa register 12 for storage; the multiplication and accumulation unit 13 is configured to perform multiplication operations according to the mantissa data.
  • the normalization unit 10 can be a processor module that processes data and can convert input data into floating-point data; the data processed by the normalization unit 10 can be fixed-point data or floating-point data.
  • the normalization unit 10 can convert the data according to the bit width requirement of the mantissa in the mantissa register 12, so that fixed-point data or floating-point data is converted into floating-point data that meets the bit width requirement of the mantissa register 12.
  • for example, the fixed-point data 6023 can be converted by the normalization unit 10 into the floating-point number 6.023E3.
  • the exponent register 11 and the mantissa register 12 can be small storage areas that store binary data, and can temporarily store the data and operation results involved in the operation.
  • the exponent register 11 and the mantissa register 12 can specifically be sequential logic circuits; the exponent register 11 can store the exponent data of the floating-point data, and the mantissa register 12 can store the mantissa data of the floating-point data.
  • the floating-point processing device in the embodiment of the present application may include multiple exponent registers 11 and mantissa registers 12.
  • for example, the device may include N mantissa registers 12 with a bit width of 40 bits and 2N exponent registers 11 with a bit width of 8 bits, where N may be related to the number of data items processed at the same time in the embodiment of the present application.
  • the lower 32 bits in the mantissa register 12 can be data bits for storing mantissa data, and the upper 8 bits can be extended bits.
  • the mantissa register 12 can store the mantissa of floating-point data or fixed-point data.
  • the mantissa data and exponent data stored in the mantissa register 12 and the exponent register 11 have a corresponding relationship: the storage address of the mantissa data in the mantissa register 12 may be the same as the storage address of the exponent data in the exponent register 11.
  • the floating-point processing device may also include the multiplication and accumulation unit 13.
  • the multiplication and accumulation unit 13 may perform multiplication and multiply-accumulate operations on the mantissa data. For example, when the multiplication and accumulation unit 13 is an 18-bit multiplication and accumulation unit, the mantissa data can be multiplied to generate a 32-bit multiplication result, and multiplied and accumulated to generate a 40-bit multiply-accumulate result.
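  • The role of the wider accumulator can be sketched as follows. This Python model assumes signed 16-bit operands (so that every product fits in 32 bits) and a signed 40-bit accumulator, matching the 32 data bits plus 8 extension bits described for the mantissa register; the function name and the overflow checks are illustrative assumptions, not details from the application.

```python
def multiply_accumulate(xs, ys, operand_bits=16, acc_bits=40):
    """Multiply signed operands pairwise and sum into a fixed-width accumulator."""
    op_lo, op_hi = -(1 << (operand_bits - 1)), (1 << (operand_bits - 1)) - 1
    acc_lo, acc_hi = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    acc = 0
    for x, y in zip(xs, ys):
        if not (op_lo <= x <= op_hi and op_lo <= y <= op_hi):
            raise ValueError("operand does not fit the assumed operand width")
        acc += x * y  # each product fits in 2 * operand_bits bits
        if not acc_lo <= acc <= acc_hi:
            raise OverflowError("accumulator exceeded its bit width")
    return acc


# 256 worst-case positive products still fit in the 40-bit accumulator.
total = multiply_accumulate([32767] * 256, [32767] * 256)
print(total.bit_length())
```

  • The extra accumulator bits act as headroom: many full-width products can be summed before the accumulated value needs to be normalized.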
  • the technical solution of the embodiment of the present application forms a floating-point processing device from a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit.
  • the normalization unit is connected to the exponent register and the mantissa register respectively, and the multiplication and accumulation unit is connected to the mantissa register.
  • the normalization unit normalizes data to generate floating-point data; the exponent data of the floating-point data is sent to the exponent register for storage, the mantissa data of the floating-point data is sent to the mantissa register for storage, and the multiplication and accumulation unit performs multiplication according to the mantissa data, thereby realizing the processing of floating-point data.
  • storing the exponent data and the mantissa data in the exponent register and the mantissa register respectively reduces the probability of register overflow, increases the storage bit width available for mantissa data, and improves the accuracy of the floating-point data.
  • the independent exponent register also allows operations on exponent data to be decoupled from the hardware structure and from data calculation, which simplifies the design of the hardware structure.
  • FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • the floating-point processing device in an embodiment of the application further includes a mantissa calculation unit 14, an exponent calculation unit 15, and an association update unit 16.
  • the mantissa calculation unit 14 and the first-level cache 17 are each connected to the mantissa register 12, the exponent calculation unit 15 is connected to the exponent register 11, and the association update unit 16 is connected to the exponent register 11 and the first-level cache 17 respectively; wherein the mantissa calculation unit 14 is configured to perform calculations on the mantissa data; the exponent calculation unit 15 is configured to perform calculations on the exponent data; the association update unit 16 is configured to store the exponent data overflowed from the exponent register 11 in the first-level cache 17 when the exponent register 11 overflows; and the first-level cache 17 is also configured to store the mantissa data overflowed from the mantissa register 12.
  • the exponent register 11 and the mantissa register 12 may also be connected to the exponent calculation unit 15 and the mantissa calculation unit 14, respectively; each of the mantissa calculation unit 14 and the exponent calculation unit 15 may be an arithmetic logic unit (ALU).
  • the mantissa calculation unit 14 can perform calculations on the mantissa data, and the exponent calculation unit 15 can perform calculations on the exponent data. The calculations performed by the exponent calculation unit 15 and the mantissa calculation unit 14 may include, but are not limited to, addition, subtraction, multiplication, division, AND, OR, NOT, exclusive OR, and shift operations.
  • the association update unit 16 may store the exponent data overflowed from the exponent register 11 in the first-level cache 17. Because the mantissa data in the mantissa register 12 has the same storage address as the exponent data in the exponent register 11, the overflowed exponent data would have an address conflict with the overflowed mantissa data. The storage address of the exponent data can therefore be converted, for example by adding or subtracting a corresponding offset, which prevents the overflowed exponent data and mantissa data from being stored at the same location in the first-level cache 17 and causing a data conflict.
  • the first-level cache 17 can specifically be an L1 cache, which can be integrated inside the processor and can be configured to temporarily store data during data processing.
  • the first-level cache 17 can store the mantissa data overflowed from the mantissa register 12 and the exponent data overflowed from the exponent register 11.
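  • The offset-based address conversion mentioned above can be illustrated with a toy model. Everything in this Python sketch is hypothetical: the `FirstLevelCache` class, the helper names, and the offset value `0x1000` are assumptions chosen only to show why an offset avoids the conflict between mantissa and exponent data that share a register address.

```python
EXPONENT_SPILL_OFFSET = 0x1000  # hypothetical offset, not taken from the source


class FirstLevelCache:
    """Toy model of the first-level cache as an address-to-value map."""

    def __init__(self):
        self.memory = {}

    def store(self, address, value):
        if address in self.memory:
            raise ValueError(f"address conflict at {address:#x}")
        self.memory[address] = value


def spill_mantissa(cache, register_address, mantissa):
    # Mantissa data keeps its register address when it overflows.
    cache.store(register_address, mantissa)


def spill_exponent(cache, register_address, exponent):
    # Exponent data shares the register address, so it is offset before storing.
    cache.store(register_address + EXPONENT_SPILL_OFFSET, exponent)


cache = FirstLevelCache()
spill_mantissa(cache, 3, 1234)
spill_exponent(cache, 3, 7)  # no conflict, stored at 3 + offset
```

  • Without the offset, the second store would target the same address as the mantissa and raise the conflict error in this model.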
  • FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application.
  • the association update unit is described in more detail.
  • the association update unit 16 may include a zero-level cache 161 and a translation lookaside buffer (TLB) 162; the zero-level cache 161 is connected to the exponent register 11; the TLB 162 is connected to the zero-level cache 161 and the first-level cache 17 respectively; wherein the zero-level cache 161 is configured to store the exponent data overflowed from the exponent register 11; and the TLB 162 is configured to, when exponent data overflows from the zero-level cache 161, convert the address of the exponent data and store it in the first-level cache 17, such that the exponent data after address conversion does not conflict with the address of the mantissa data stored in the first-level cache 17.
  • the zero-level cache 161 may be a cache responsible for caching exponent data, and can support byte addressing.
  • when the zero-level cache 161 overflows, the exponent data can be sent to the first-level cache 17 for storage.
  • because the address of the exponent data in the exponent register is the same as the address of the mantissa data in the mantissa register, the exponent data overflowed from the zero-level cache 161 can be address-converted by the TLB 162. The TLB 162 can be a buffer that converts between physical addresses and virtual addresses; it can convert the physical address of the exponent data to a virtual address to prevent the exponent data overflowed from the zero-level cache 161 from causing an address conflict when stored in the first-level cache 17.
  • an independent overflow update mechanism is implemented through the zero-level cache 161 and the TLB 162, so that block floating-point calculations can proceed without considering the overflow of exponent data.
  • a data storage unit 19 and a data loading unit 18 are also connected between the first-level cache 17 and the mantissa register 12; wherein the data storage unit 19 is configured to store the mantissa data overflowed from the mantissa register 12 into the first-level cache 17, and the data loading unit 18 is configured to load the mantissa data stored in the first-level cache 17 into the mantissa register 12.
  • the mantissa calculation unit 14 is also connected to the exponent register 11, and the mantissa calculation unit 14 is configured to obtain the exponent data stored in the exponent register 11 and perform shift operations on the mantissa data in the mantissa register 12 according to the exponent data.
  • the data storage unit 19 and the data loading unit 18 can be used to transfer the mantissa data between the first-level cache 17 and the mantissa register 12.
  • a data storage unit and a data loading unit can likewise be arranged between the zero-level cache 161 and the exponent register 11 to realize the storage and loading of exponent data.
  • the mantissa calculation unit 14 may also be connected to the exponent register 11 to obtain the exponent data stored in the exponent register 11, and may shift the corresponding mantissa data in the mantissa register 12 according to the obtained exponent data.
  • FIG. 5 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • the embodiment of the application may be applied to data processing in a processor.
  • the method may be executed by the floating-point processing device in the embodiment of the application.
  • the device can be implemented in software and/or hardware, and generally can be integrated in a processor.
  • the data processing method in the embodiment of the present application includes:
  • Step 101 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • the first data set and the second data set may be data sets participating in operations; each may include at least one piece of data, and the data may be fixed-point data, floating-point data, or block floating-point data.
  • the acquired first data set and second data set can be stored as fixed-point data or mantissa data in the mantissa register. It can be understood that if floating-point data or block floating-point data exists in the first data set and the second data set, the corresponding mantissa data can be obtained through the normalization unit and stored in the mantissa register. After the first data set and the second data set are stored in the mantissa register, the multiplication result data set can be computed by the multiplication and accumulation unit. For example, suppose there are two arrays X and Y of length n, and the bit width of each element in the arrays can be 16 bits.
  • Step 102 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • the multiplication result can be an element of the multiplication result data set; it can be a full-precision result that does not meet the normalization requirements.
  • for example, the multiplication result is 123E6, whereas the normalized form required in the embodiment of this application is 1.23E8; the normalization unit is therefore required to normalize the multiplication result in the multiplication result data set, which changes the precision of its mantissa data.
  • the multiplication results in the multiplication result data set that do not meet the normalization requirements can be normalized to adjust their precision; after the normalization operation, the result mantissa data is stored in the mantissa register, and the result exponent data is stored in the exponent register.
  • for example, the 32-bit multiplication result Z[i] can be read from the mantissa register and normalized to obtain 16-bit mantissa data M[i] and the corresponding 8-bit exponent data E[i]. M[i] is stored back to the mantissa register and E[i] is stored in the exponent register; the indexes of M[i] and E[i] in the mantissa register and the exponent register are in one-to-one correspondence.
  • i can be any value from 0 to the number of multiplication results in the multiplication result data set.
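  • One possible reading of this normalization step is sketched below in Python: a wide signed product is shifted right until it fits a signed 16-bit mantissa, and the number of dropped bits becomes the exponent. The function name `normalize` and the choice to only shift right (never left) are assumptions for illustration; the application does not specify the exact rule.

```python
def normalize(z, mantissa_bits=16):
    """Return (m, e) with z approximately m * 2**e and m a signed mantissa."""
    lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    m, e = z, 0
    while not lo <= m <= hi:
        m >>= 1  # arithmetic right shift drops the least significant bit
        e += 1
    return m, e


m, e = normalize(0x12345678)  # a 29-bit value compressed to 15 magnitude bits
print(m, e, m << e)
```

  • For 32-bit inputs the exponent produced this way never exceeds 17, so it fits comfortably in the 8-bit exponent data mentioned above.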
  • the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.
  • the result mantissa data can be stored in association with the result exponent data: the mantissa storage address of the result mantissa data in the mantissa register can be the same as the exponent storage address of the result exponent data in the exponent register. This may mean that the physical addresses are the same, that the logical addresses are the same, that the physical address of the mantissa storage address is the same as the logical address of the exponent storage address, or that the logical address of the mantissa storage address is the same as the physical address of the exponent storage address.
  • Step 103 Determine the block floating point exponent of the result exponent data in the exponent register, and process the result mantissa data in the mantissa register according to the block floating point exponent to generate block floating point data.
  • the block floating-point exponent may be the block floating-point exponent when each multiplication result in the multiplication result data set is converted into a block floating-point data format, and the block floating-point exponent may be the maximum value in the result exponent data corresponding to each multiplication result.
  • the result exponent with the largest value can be found in the exponent register and used as the block floating-point exponent. Because the result exponent data of the multiplication results differ in size while all results must share the same block floating-point exponent, the precision of the result mantissa data corresponding to each multiplication result needs to be adjusted: the result mantissa data in the mantissa register can be shifted to adjust the precision, and the multiplication results in the multiplication result data set after precision adjustment can be used as block floating-point data.
  • in the embodiment of this application, the first data set and the second data set participating in data processing are stored in the mantissa register; the multiplication result data set of the first data set and the second data set is determined by the multiplication and accumulation unit; the multiplication result data set is normalized in the normalization unit, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register; the block floating-point exponent is determined from the result exponent data in the exponent register, and the result mantissa data is shifted to convert the multiplication result data set into block floating-point data.
  • by providing separate mantissa and exponent registers, the technical solution of the embodiment of the present application increases the data bit width available for mantissa data, thereby improving data accuracy, and reduces the coupling of the data processing process, which can reduce the design complexity of the hardware.
  • FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • the data processing method of the embodiment of the application specifically includes:
  • Step 201 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • the first data set and the second data set can be pre-stored in the mantissa register; the data in the first data set and the second data set can be multiplied at full precision in the multiplication and accumulation unit, and the multiplication results form the multiplication result data set.
  • Step 202 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • the multiplication results in the multiplication result data set can be normalized to compress the precision of the mantissa data stored in the mantissa register; new result mantissa data and result exponent data are obtained, and the result mantissa data and the result exponent data are stored in the mantissa register and the exponent register respectively.
  • Step 203 The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent.
  • the exponent calculation unit may sequentially read the result exponent data stored in the exponent register, compare the read result exponent data, and use the result exponent data with the largest value as the block floating-point exponent.
  • Step 204 The mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent.
  • the block floating-point exponent can be used as the shift basis to shift each result mantissa data in the mantissa register, so that the result exponent data corresponding to each result mantissa data becomes the same as the block floating-point exponent. For example, if the result exponent data differs from the block floating-point exponent by 1, the result mantissa data in the mantissa register can be shifted one bit to the right.
  • Step 205 Use the mantissa data of the result of the shift operation and the block floating point exponent as block floating point data.
  • the data composed of the shifted result mantissa data and the block floating-point exponent may be used as the block floating-point data, and the exponent data of each result mantissa data may be the block floating-point exponent.
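  • Steps 203 to 205 can be sketched in a few lines of Python. The function name `align_to_block_exponent` and the use of plain lists to stand in for the mantissa and exponent registers are assumptions for illustration.

```python
def align_to_block_exponent(mantissas, exponents):
    """Shift each mantissa right so that all entries share the largest exponent."""
    block_exponent = max(exponents)                      # Step 203
    aligned = [m >> (block_exponent - e)                 # Step 204
               for m, e in zip(mantissas, exponents)]
    return aligned, block_exponent                       # Step 205


aligned, block_exponent = align_to_block_exponent([12, 8, 5], [3, 2, 3])
print(aligned, block_exponent)
```

  • In this example the second entry has an exponent one less than the block exponent, so its mantissa is shifted one bit to the right (8 becomes 4) and its value 8 * 2**2 = 4 * 2**3 is preserved.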
  • FIG. 7 is an example diagram of a data processing method provided by an embodiment of the application.
  • the normalization unit performs normalization compression processing on the 64-bit multiplication result output by the multiplication and accumulation unit.
  • this yields 4 32-bit result mantissa data (M0 to M3) and 4 independent result exponent data (E0 to E3), which are stored in the mantissa register and the exponent register respectively.
  • the exponent calculation unit can compare E0 to E3 in the exponent register simultaneously and take the largest result exponent data as the block floating-point exponent.
  • in the embodiment of this application, the first data set and the second data set are stored in the mantissa register, and the multiplication result data set of the first data set and the second data set is determined by the multiplication and accumulation unit; the normalization unit normalizes the multiplication results in the multiplication result data set, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register; the exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
  • the technical solution of the embodiment of the present application processes the result mantissa data and the result exponent data separately in two registers, which reduces the coupling between hardware design and data processing, reduces the complexity of calculation, increases the available bit width of the result mantissa data, and improves data accuracy.
  • Fig. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application.
  • the shifting of the result mantissa data is described in more detail.
  • the data processing method of the embodiment of the application includes:
  • Step 301 Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
  • Step 302 Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
  • Step 303 The index calculation unit reads the result index data with the lowest address in the index register as the current maximum result index.
  • the address can be the storage address corresponding to the result index data in the index register
  • the lowest address can be the address with the smallest value corresponding to the storage address
  • the current maximum result index can be the result index with the largest value in the result index data read from the index register. data.
  • the index calculation unit may read the result index data from the index register according to the minimum address requirement, and may use the read result index data as the current maximum result index. It is understandable that when the index calculation unit reads the current maximum result index, it can also read the result index data with the highest address or read any result index data as the current maximum result index.
  • Step 304 The index calculation unit sequentially reads the result index data remaining in the index register and compares it with the current maximum result index. If the result index data is greater than the current maximum result index, then the The result index data is used as the current maximum result index.
  • the index calculation unit can sequentially read the remaining result index data in the index register. Each time a result index data is read, the read result index data can be compared with the current maximum result index. The result index data obtained is greater than the current maximum result index, and the read result index data can be used as the current maximum result index.
  • the difference between the current maximum result exponent and the result exponent data can be stored in the exponent register to replace the original There are stored result index data.
  • Step 305 When the exponent calculation unit finishes reading the result exponent data in the exponent register, use the current maximum result exponent as the block floating point exponent.
  • the current maximum result exponent can be used as the block floating-point exponent. Since the current maximum result exponent has been compared with all the result exponent data, it is the maximum value among all result exponent data.
  • Step 306 The mantissa calculation unit sequentially reads the result mantissa data and reads the result exponent data corresponding to the result mantissa data in the exponent register.
  • the storage addresses of the result mantissa data and the result exponent data are the same, so the corresponding result exponent data can be read from the exponent register according to the storage address of the result mantissa data.
  • Step 307 Use the difference between the block floating point exponent and the result exponent data as the number of shift bits.
  • the difference between the result exponent data and the block floating-point exponent may be determined by the exponent calculation unit or the mantissa calculation unit as the number of shift bits, that is, the number of bits by which the mantissa data is shifted.
  • Step 308 Perform a shift operation on the result mantissa data according to the number of shift bits.
  • each result mantissa data in the mantissa register can be shifted to the right by the number of shift bits.
  • Step 309 Use the mantissa data of the result of the shift operation and the block floating point exponent as block floating point data.
  • the data loading unit continuously reads X[i] and Y[i] from the memory to the mantissa register;
  • the bit width is 32 bits;
  • the normalization unit synchronously reads the 32-bit multiplication result Z[i] from the mantissa register for the normalization operation, and obtains the 16-bit result mantissa data M[i] and the corresponding 8-bit result exponent data E[i].
  • the result mantissa data M[i] can be stored in the mantissa register and the result exponent data E[i] in the exponent register; the result mantissa data M[i] in the mantissa register and the result exponent data E[i] in the exponent register are in one-to-one correspondence.
  • the exponent calculation unit monitors the dynamic range of the result exponent data E[i] stored in the exponent register by the normalization unit, that is, it searches for the maximum value of the result exponent data E[i].
  • the exponent calculation unit can obtain the maximum value Emax of the result exponent data sequence E[i] and store Emax as independent data in the exponent register.
  • the result mantissa data M[i] finally stored in the mantissa register are the mantissas after block floating-point conversion, and the Emax stored in the exponent register is the block floating-point exponent.
  • Steps (1) to (4) are executed in a pipeline, that is, subsequent steps need not wait for the entire sequence to finish before starting the next stage of computation; steps (6) to (7) are pipelined in the same way.
  • Step (5) cannot be performed in a pipeline, so it involves storing the results M[i] and E[i] of step (4).
  • When the output of the calculation in step (4) is complete, the processor retrieves the result mantissa data M[i] stored in the first-level cache through the data loading unit and stores it in the mantissa register again; at the same time, the data loading unit of the exponent register automatically fetches the result exponent data E[i] stored in the zero-level cache and reloads it into the exponent register, following the data loading operation of the mantissa register, and then the operation of step (6) is performed.
  • the first data set and the second data set are stored in the mantissa register, and the multiply-accumulate unit determines the multiplication result data set of the two data sets. The normalization unit normalizes the multiplication results in the multiplication result data set, storing the generated result mantissa data in the mantissa register and the generated result exponent data in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit shifts the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
  • the technical solution of the embodiment of the present application processes the result mantissa data and the result exponent data separately in two registers, which reduces the coupling between hardware design and data processing, reduces computational complexity, increases the available bit width of the result mantissa data, and improves data accuracy.
  • user terminal encompasses any suitable type of wireless user equipment, such as a mobile phone, a portable data processing device, a portable web browser, or a vehicle-mounted mobile station.
  • the various embodiments of the present application can be implemented in hardware or dedicated circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device, although the present application is not limited thereto.
  • Computer program instructions can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
  • ISA instruction set architecture
  • the block diagram of any logic flow in the drawings of the present application may represent program steps, or may represent interconnected logic circuits, modules, and functions, or may represent a combination of program steps and logic circuits, modules, and functions.
  • the computer program can be stored on the memory.
  • the memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical storage devices and systems (digital versatile disc (DVD) or compact disc (CD)).
  • Computer-readable media may include non-transitory storage media.
  • the data processor can be of any type suitable for the local technical environment, such as but not limited to general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic devices (FPGA), and processors based on a multi-core processor architecture.
  • DSP digital signal processors
  • ASIC application-specific integrated circuits
  • FPGA programmable logic devices
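
Steps 303 to 309 listed above can be sketched in software as follows. This is only an illustrative model, not the patented hardware: the function name `to_block_floating_point` is hypothetical, plain Python lists stand in for the mantissa and exponent registers, and each element is assumed to represent the value `M[i] * 2**E[i]`.

```python
def to_block_floating_point(mantissas, exponents):
    """Align per-element (mantissa, exponent) pairs to one shared block exponent."""
    if not mantissas or len(mantissas) != len(exponents):
        raise ValueError("need equally sized, non-empty mantissa and exponent lists")

    # Steps 303-305: start from the entry at the lowest address, then scan the rest
    # and keep the largest exponent seen.
    e_max = exponents[0]
    for e in exponents[1:]:
        if e > e_max:
            e_max = e

    # Steps 306-308: shift each mantissa right by (block exponent - own exponent).
    aligned = []
    for m, e in zip(mantissas, exponents):
        shift = e_max - e
        aligned.append(m >> shift)  # arithmetic shift, so negative mantissas keep their sign

    # Step 309: the shifted mantissas plus the shared exponent form the block.
    return aligned, e_max


M = [0x4000, 0x6000, -0x5000]   # assumed 16-bit result mantissas
E = [3, 5, 4]                   # assumed result exponents
block_mantissas, block_exponent = to_block_floating_point(M, E)
print(block_mantissas, block_exponent)
```

The right shift discards low-order bits of elements whose exponent is below the block exponent, which is the precision cost of sharing one exponent across the block.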

Abstract

The present application provides a floating-point processing device and a data processing method. The device comprises: a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit; the normalization unit is separately connected to the exponent register and the mantissa register, and the multiply-accumulate unit is connected to the mantissa register; the normalization unit is configured to perform a normalization operation on data to generate floating-point data, the normalization unit sends exponent data of the floating-point data to the exponent register for storage, and the normalization unit sends mantissa data of the floating-point data to the mantissa register for storage; the multiply-accumulate unit is configured to perform multiplication operations according to the mantissa data. According to the technical solutions of the embodiments of the present application, the exponent data and the mantissa data of the floating-point data are stored in the exponent register and the mantissa register respectively, so that the probability of mantissa overflow during floating-point operations is reduced, the bit width of the mantissa data is expanded, and data accuracy can be improved.

Description

Floating-point processing device and data processing method
Technical field
This application relates to the field of data communication, and in particular to a floating-point processing device and a data processing method.
Background
Current processors mainly include floating-point processing units and fixed-point processing units. Fixed-point processing units have lower processing precision but lower hardware resource overhead, while floating-point processing units have higher processing precision but higher hardware resource overhead. As technologies and protocols evolve, the volume of data that processors handle keeps growing and the demands on data precision keep rising. In some fields (for example, baseband communication), traditional fixed-point processing can no longer meet the precision requirements, and there is an urgent need for a processing approach that improves precision while keeping hardware resource overhead under control.
In the prior art, block floating point is used to improve the precision of data processing with only a small increase in hardware resource overhead. The basic principle of block floating point is shown in Figure 1: the data in a data segment each have their own mantissa, M0 to M7, but share a single exponent E, which is the exponent of the datum with the largest absolute value in the block. This works well when the dynamic range of the data within a block is small. Because the whole data segment has only one exponent, the exponent does not need to be packed together with each mantissa, so at the same data bit width block floating point retains more mantissa bits than pure floating point and achieves higher precision. In addition, since all data in a block share the same scaling, additions can be performed directly without extra shift operations, which reduces the number of operations. However, existing processors include only one data register, which stores both the exponent data and the mantissa data of block floating point; this causes register overflow problems when block floating-point data are processed, and the resulting precision is poor.
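The shared-exponent idea described above can be illustrated with a minimal sketch. It is not taken from the application: the helper names `encode_block` and `decode_block`, the 16-bit signed mantissa width, and the use of Python floats as input are all assumptions made for illustration.

```python
import math

def encode_block(values, mantissa_bits=16):
    """Quantize a block of floats to signed integer mantissas with one shared exponent."""
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return [0] * len(values), 0
    # frexp gives largest = f * 2**e with 0.5 <= f < 1, so every |v| < 2**e.
    _, e = math.frexp(largest)
    # Choose the scale so the largest magnitude fills the signed mantissa range.
    shared_exponent = e - (mantissa_bits - 1)
    mantissas = [math.floor(math.ldexp(v, -shared_exponent)) for v in values]
    return mantissas, shared_exponent

def decode_block(mantissas, shared_exponent):
    """Recover approximate values as mantissa * 2**shared_exponent."""
    return [math.ldexp(m, shared_exponent) for m in mantissas]

block = [0.75, -0.5, 0.125, 0.3]
mantissas, exponent = encode_block(block)
restored = decode_block(mantissas, exponent)
print(mantissas, exponent, restored)
```

Because every element uses the same scale, two encoded blocks with equal shared exponents can be added mantissa by mantissa without any per-element alignment shift.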
Summary of the invention
This application provides a floating-point processing device and a data processing method.
An embodiment of the present application provides a floating-point processing device, which includes:
a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit, where the normalization unit is connected to the exponent register and the mantissa register respectively, and the multiply-accumulate unit is connected to the mantissa register; the normalization unit is configured to perform a normalization operation on data to generate floating-point data, the normalization unit sends the exponent data of the floating-point data to the exponent register for storage, and the normalization unit sends the mantissa data of the floating-point data to the mantissa register for storage; the multiply-accumulate unit is configured to perform multiplication operations according to the mantissa data.
An embodiment of the present application provides a data processing method, which includes:
storing a first data set and a second data set in the mantissa register, and determining the multiplication result data set of the first data set and the second data set through the multiply-accumulate unit; performing a normalization operation on at least one multiplication result in the multiplication result data set through the normalization unit, and storing the generated result mantissa data and result exponent data in the mantissa register and the exponent register respectively; determining the block floating-point exponent of the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
Further details on the above embodiments, other aspects of the application, and their implementations are provided in the description of the drawings, the detailed description, and the claims.
Description of the drawings
Figure 1 is an example diagram of block floating-point data in the prior art;
Figure 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the present application;
Figure 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the present application;
Figure 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the present application;
Figure 5 is a flowchart of the steps of a data processing method provided by an embodiment of the present application;
Figure 6 is a flowchart of the steps of a data processing method provided by an embodiment of the present application;
Figure 7 is an example diagram of a data processing method provided by an embodiment of the present application;
Figure 8 is a flowchart of the steps of a data processing method provided by an embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments can be combined with each other arbitrarily.
Figure 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the present application. This embodiment is applicable to processing floating-point data in a processor. The device may be implemented in software and/or hardware and can generally be integrated in a processor. Referring to Figure 2, the floating-point processing device of this embodiment includes a normalization unit 10, an exponent register 11, a mantissa register 12, and a multiply-accumulate unit 13. The normalization unit 10 is connected to the exponent register 11 and the mantissa register 12 respectively, and the multiply-accumulate unit 13 is connected to the mantissa register 12. The normalization unit 10 is configured to perform a normalization operation on data to generate floating-point data; it sends the exponent data of the floating-point data to the exponent register 11 for storage and sends the mantissa data of the floating-point data to the mantissa register 12 for storage. The multiply-accumulate unit 13 is configured to perform multiplication operations according to the mantissa data.
The normalization unit 10 can be a processor module that processes data and converts input data into floating-point data. The data it processes can be fixed-point data or floating-point data. The normalization unit 10 can convert data according to the mantissa bit-width requirement of the mantissa register 12, turning fixed-point or floating-point data into floating-point data that meets the bit-width requirement of the mantissa register 12. For example, the fixed-point data 6023 can be converted by the normalization unit 10 into the floating-point number 6.023E3.
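A binary analogue of this normalization can be sketched as below. The application does not spell out the normalization algorithm at this point, so everything here is an assumption for illustration: the function name `normalize`, the 16-bit signed mantissa width, and the convention that the exponent counts right shifts so that the input is approximately `mantissa * 2**exponent`.

```python
def normalize(z, mantissa_bits=16):
    """Return (mantissa, exponent) with z approximately equal to mantissa * 2**exponent."""
    lo = -(1 << (mantissa_bits - 1))
    hi = (1 << (mantissa_bits - 1)) - 1
    exponent = 0
    mantissa = z
    # Shift right one bit at a time until the value fits the signed mantissa width.
    while not lo <= mantissa <= hi:
        exponent += 1
        mantissa = z >> exponent  # arithmetic shift, truncates toward negative infinity
    return mantissa, exponent

print(normalize(6023))          # small enough to need no shift
print(normalize(0x12345678))    # a 32-bit value reduced to a 16-bit mantissa
```

Under this convention a value that already fits the mantissa width keeps all of its bits and gets exponent 0, while wider values lose only their lowest-order bits.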
Optionally, the exponent register 11 and the mantissa register 12 can be small storage areas that hold binary data and temporarily store the operands and results of operations. They can be implemented as sequential logic circuits. The exponent register 11 can store the exponent data of floating-point data, and the mantissa register 12 can store the mantissa data of floating-point data. It can be understood that the floating-point processing device in this embodiment may include multiple exponent registers 11 and mantissa registers 12, for example N mantissa registers 12 with a bit width of 40 bits and 2N exponent registers 11 with a bit width of 8 bits, where N may be related to the number of data items processed simultaneously in this embodiment. In the mantissa register 12, the lower 32 bits can be data bits used to store mantissa data, and the upper 8 bits can be extension bits. In this embodiment, the mantissa register 12 can store the mantissa of floating-point data and can also store fixed-point data directly. Optionally, the mantissa data and exponent data stored in the mantissa register 12 and the exponent register 11 correspond to each other: they belong to the same floating-point data, and the storage address of the mantissa data in the mantissa register 12 can be the same as the storage address of the exponent data in the exponent register 11.
The embodiment of the present application may also include the multiply-accumulate unit 13, which can perform multiplication and multiply-accumulate operations on the mantissa data. For example, when the multiply-accumulate unit 13 is an 18-bit multiply-accumulate unit, multiplying mantissa data can generate a 32-bit multiplication result, and a multiply-accumulate operation can generate a 40-bit multiply-accumulate result.
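The role of the 8 extra accumulator bits can be illustrated with a short sketch. It assumes 16-bit signed operands (the width used for the arrays in step 101) rather than modelling the 18-bit unit mentioned above, and the name `multiply_accumulate` and the overflow check are hypothetical.

```python
ACC_BITS = 40
ACC_MIN = -(1 << (ACC_BITS - 1))
ACC_MAX = (1 << (ACC_BITS - 1)) - 1

def multiply_accumulate(xs, ys):
    """Sum of products, checking that the running total stays within 40 signed bits."""
    acc = 0
    for x, y in zip(xs, ys):
        product = x * y          # a 16-bit by 16-bit signed product fits in 32 bits
        acc += product
        if not ACC_MIN <= acc <= ACC_MAX:
            raise OverflowError("accumulator exceeded 40 bits")
    return acc

# Worst case: 256 products of the largest-magnitude 16-bit operands.
worst = [-32768] * 256
total = multiply_accumulate(worst, worst)
print(total)
```

Each product has magnitude at most 2**30, so 256 of them sum to at most 2**38, which still fits a 40-bit signed accumulator even though it no longer fits in 32 bits.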
In the technical solution of this embodiment, the floating-point processing device is composed of a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit. The normalization unit is connected to the exponent register and the mantissa register respectively, and the multiply-accumulate unit is connected to the mantissa register. The normalization unit can perform a normalization operation on data to generate floating-point data; the exponent data of the floating-point data are sent to the exponent register for storage, and the mantissa data are sent to the mantissa register for storage. The multiply-accumulate unit is configured to perform multiplication operations according to the mantissa data. This realizes the processing of floating-point data. Storing the exponent data and mantissa data in separate registers reduces the probability of register overflow, increases the storage bit width available to the mantissa data, and improves the precision of the floating-point data. Providing a dedicated exponent register decouples the handling of exponent data in both hardware structure and data computation, which simplifies the hardware design.
Figure 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the present application. Referring to Figure 3, the floating-point processing device in this embodiment further includes a mantissa calculation unit 14, an exponent calculation unit 15, an association update unit 16, and a first-level cache 17. The mantissa calculation unit 14 and the first-level cache 17 are each connected to the mantissa register 12, the exponent calculation unit 15 is connected to the exponent register 11, and the association update unit 16 is connected to the exponent register 11 and the first-level cache 17 respectively. The mantissa calculation unit 14 is configured to perform calculations on the mantissa data; the exponent calculation unit 15 is configured to perform calculations on the exponent data; the association update unit 16 is configured to store the exponent data that overflow from the exponent register 11 in the first-level cache 17 when the exponent register 11 overflows; the first-level cache 17 is also configured to store the mantissa data that overflow from the mantissa register 12.
In this embodiment, the exponent register 11 and the mantissa register 12 can also be connected to the exponent calculation unit 15 and the mantissa calculation unit 14 respectively. The mantissa calculation unit 14 and the exponent calculation unit 15 can be arithmetic logic units (Arithmetic and Logic Unit, ALU), that is, combinational logic circuits that implement arithmetic and logical operations, and can specifically be execution units of the processor. The mantissa calculation unit 14 performs calculations on mantissa data, and the exponent calculation unit 15 performs calculations on exponent data. The calculations performed by the exponent calculation unit 15 and the mantissa calculation unit 14 may include, but are not limited to, addition, subtraction, multiplication, division, AND, OR, NOT, exclusive OR, and shift operations.
The association update unit 16 can store the exponent data that overflow from the exponent register 11 in the first-level cache 17. Because the mantissa data in the mantissa register 12 and the exponent data in the exponent register 11 share the same storage addresses, the overflowed exponent data would conflict in address with the overflowed mantissa data. The storage address of the exponent data can therefore be converted, for example by adding or subtracting a corresponding offset, which prevents the exponent data and mantissa data from being written to the same location in the first-level cache 17 and causing a data conflict. The first-level cache 17 can specifically be an L1 cache, which can be integrated inside the processor and used to hold temporary data during data processing. In this embodiment, the first-level cache 17 can store the mantissa data that overflow from the mantissa register 12 and the exponent data that overflow from the exponent register 11.
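The offset-based address conversion described above can be pictured with a toy model. Everything in it is hypothetical: the dictionary standing in for the first-level cache, the constant `EXPONENT_OFFSET`, and the function names are illustrative choices, not details from the application.

```python
EXPONENT_OFFSET = 0x1000  # hypothetical offset, assumed larger than any mantissa address

l1_cache = {}  # toy stand-in for the first-level cache, keyed by address

def spill_mantissa(address, mantissa):
    """Store an overflowed mantissa at its own register address."""
    l1_cache[address] = mantissa

def spill_exponent(address, exponent):
    """Store an overflowed exponent at a remapped address to avoid the mantissa slot."""
    l1_cache[address + EXPONENT_OFFSET] = exponent

def reload(address):
    """Fetch the mantissa and its matching exponent for one register address."""
    return l1_cache[address], l1_cache[address + EXPONENT_OFFSET]

# The mantissa and exponent share register address 0x10 but land in different cache slots.
spill_mantissa(0x10, 18641)
spill_exponent(0x10, 14)
print(reload(0x10))
```

Keeping the remapping a pure function of the register address means the exponent that belongs to a given mantissa can always be found again from the mantissa address alone.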
Figure 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the present application, in which the association update unit is described in more detail. Referring to Figure 4, in the floating-point processing device of this embodiment the association update unit may include a zero-level cache 161 and a translation lookaside buffer (TLB) 162. The zero-level cache 161 is connected to the exponent register 11, and the translation lookaside buffer 162 is connected to the zero-level cache 161 and the first-level cache 17 respectively. The zero-level cache 161 is configured to store the exponent data that overflow from the exponent register 11. The translation lookaside buffer 162 is configured to perform address translation on the exponent data when the zero-level cache 161 produces overflowed exponent data and then store them in the first-level cache 17, so that the address of the translated exponent data does not conflict with the addresses of the mantissa data stored in the first-level cache 17.
In this embodiment, the zero-level cache 161 can be a cache dedicated to exponent data and can support byte addressing. When the zero-level cache 161 overflows, the exponent data can be sent to the first-level cache 17 for storage. Because the address of the exponent data in the exponent register is the same as the address of the mantissa data in the mantissa register, the translation lookaside buffer 162 can translate the addresses of the exponent data that overflow from the zero-level cache 161. The translation lookaside buffer 162 can be a buffer that converts between physical and virtual addresses; it can convert the physical address of the exponent data into a virtual address, which prevents an address conflict when the exponent data that overflow from the zero-level cache 161 are stored in the first-level cache 17. The zero-level cache 161 and the translation lookaside buffer 162 together provide an independent overflow update mechanism, so that block floating-point calculations need not deal with overflow of the exponent data.
Referring to Figure 4, in the floating-point processing device of this embodiment a data storage unit 19 and a data loading unit 18 are also connected between the first-level cache 17 and the mantissa register 12. The data storage unit 19 is configured to store the mantissa data that overflow from the mantissa register 12 in the first-level cache 17; the data loading unit 18 is configured to load the mantissa data stored in the first-level cache 17 into the mantissa register 12. The mantissa calculation unit 14 is also connected to the exponent register 11 and is configured to obtain the exponent data stored in the exponent register 11 and perform shift operations on the mantissa data in the mantissa register 12 according to the exponent data.
In this embodiment, mantissa data can be transferred between the first-level cache 17 and the mantissa register 12 through the data storage unit 19 and the data loading unit 18. Optionally, a data storage unit and a data loading unit can also be provided between the zero-level cache 161 and the exponent register 11 to store and load exponent data. Optionally, the mantissa calculation unit 14 can also be connected to the exponent register 11 so that it can obtain the exponent data stored there and shift the corresponding mantissa data in the mantissa register 12 according to the obtained exponent data.
Figure 5 is a flowchart of the steps of a data processing method provided by an embodiment of the present application. This embodiment is applicable to data processing in a processor. The method can be executed by the floating-point processing device of the embodiments of the present application, which can be implemented in software and/or hardware and can generally be integrated in a processor. The data processing method of this embodiment includes:
步骤101、将第一数据集和第二数据集存储到尾数寄存器,并通过乘累加单元确定所述第一数据集和第二数据集的乘法结果数据集。Step 101: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.
The first data set and the second data set are the sets of data that take part in the operation. Each may include at least one data item, and the data may be fixed-point data, floating-point data, or block floating-point data.
In this embodiment, the acquired first data set and second data set can be stored in the mantissa register as fixed-point data or mantissa data. It can be understood that if the first data set and the second data set contain floating-point data or block floating-point data, the corresponding mantissa data can be obtained through the normalization unit and stored in the mantissa register. Once the first data set and the second data set are stored in the mantissa register, the multiplication result data set can be computed by the multiply-accumulate unit. For example, suppose there are two arrays X and Y of length n whose elements are each 16 bits wide, and Z[i]=X[i]*Y[i] is to be computed, where i is any integer from 0 to n-1. Block floating-point data can first be normalized to 16 bits. X[i] and Y[i] are read consecutively from memory into the mantissa register, the multiply-accumulate unit concurrently computes the multiplication result data set Z[i]=X[i]*Y[i], and the multiplication result data set can also be stored in the mantissa register.
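The element-wise multiplication in this example can be sketched in software as follows. This is only an illustration under stated assumptions: the function name `multiply_full_precision` and the sample values are hypothetical, the inputs are assumed to be signed 16-bit integers, and plain Python lists stand in for the mantissa register.

```python
def multiply_full_precision(x, y):
    """Return Z[i] = X[i] * Y[i] for two equal-length lists of signed 16-bit integers.

    The product of two signed 16-bit values always fits in a signed 32-bit
    value, so no precision is lost at this stage.
    """
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length n")
    for value in (*x, *y):
        if not -(1 << 15) <= value < (1 << 15):
            raise ValueError("inputs are assumed to be signed 16-bit integers")
    return [a * b for a, b in zip(x, y)]


# Illustrative input arrays (not taken from the application).
X = [12000, -300, 7, 32767]
Y = [25000, 450, -9, 32767]
Z = multiply_full_precision(X, Y)
```

Each entry of `Z` is a full-precision product, which is why a later normalization step is needed before the results can be stored again as 16-bit mantissas.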
Step 102: Perform a normalization operation on at least one multiplication result in the multiplication result data set through the normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
A multiplication result is an element of the multiplication result data set. It may be a full-precision result that does not meet the normalization requirement. For example, if a multiplication result is 123.E6 and the normalized form required in this embodiment is 1.23E8, the normalization unit needs to normalize the multiplication results in the multiplication result data set, changing the precision of their mantissa data.
In this embodiment, the multiplication results in the multiplication result data set can be normalized so that results that do not meet the normalization requirement are changed and their precision is adjusted. After normalization, the result mantissa data of each multiplication result is stored in the mantissa register and the result exponent data is stored in the exponent register.
For example, a 32-bit multiplication result Z[i] can be read from the mantissa register and normalized to obtain 16-bit mantissa data M[i] and the corresponding 8-bit exponent data E[i]. M[i] is stored back to the mantissa register and E[i] is stored in the exponent register, and the indexes of M[i] and E[i] in the mantissa register and the exponent register correspond one to one. Here i can be any index into the multiplication result data set (0 to n-1 for n multiplication results).
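One possible reading of this normalization step is sketched below. The application does not spell out the exact rule, so the scheme here is an assumption: the product is shifted right until it fits in a signed 16-bit mantissa, and the exponent records how many bits were dropped, so that Z is approximately M * 2**E. The name `normalize` and its `mant_bits` parameter are hypothetical.

```python
def normalize(z, mant_bits=16):
    """Split a signed integer z into (mantissa, exponent) with z ~= mantissa * 2**exponent.

    Assumed rule: shift right one bit at a time until the value fits in a
    signed `mant_bits`-bit field. Python's >> on integers is an arithmetic
    shift that rounds toward negative infinity; real hardware may round
    differently.
    """
    lowest = -(1 << (mant_bits - 1))
    highest = (1 << (mant_bits - 1)) - 1
    mantissa, exponent = z, 0
    while not lowest <= mantissa <= highest:
        mantissa >>= 1
        exponent += 1
    return mantissa, exponent


# Parallel lists stand in for the two registers; index i links M[i] and E[i].
products = [300000000, -135000, -63, 1073676289]
mantissa_register = []
exponent_register = []
for z in products:
    m, e = normalize(z)
    mantissa_register.append(m)
    exponent_register.append(e)
```

Under this rule a 32-bit product needs at most 16 shifts, so the exponent comfortably fits the 8-bit field mentioned in the text.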
In one implementation, the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.
Optionally, to simplify data processing, the result mantissa data can be stored in association with the result exponent data: the mantissa storage address of the result mantissa data in the mantissa register can be the same as the exponent storage address of the result exponent data in the exponent register. This may mean that the physical addresses are the same or that the logical addresses are the same. Alternatively, the physical address of the mantissa storage address may equal the logical address of the exponent storage address, or the logical address of the mantissa storage address may equal the physical address of the exponent storage address.
Step 103: Determine the block floating-point exponent from the result exponent data in the exponent register, and process the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
The block floating-point exponent is the exponent shared by the multiplication results in the multiplication result data set once they are converted to block floating-point format. It can be the maximum of the result exponent data corresponding to the multiplication results.
Optionally, the result exponent data with the largest value can be looked up in the exponent register and used as the block floating-point exponent. Because the result exponent data of the individual multiplication results differ in value while all results must share the same block floating-point exponent, the precision of the result mantissa data of each multiplication result has to be adjusted. This can be done by shifting the result mantissa data in the mantissa register, and the multiplication results in the data set after this adjustment serve as the block floating-point data.
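The alignment described here can be written out as a short derivation. The symbols x_i (the value represented by the i-th multiplication result), s_i (its shift amount), M_i' (its aligned mantissa), and E_blk (the block floating-point exponent) are introduced only for illustration, and the floor models an arithmetic right shift, which is an assumption about the rounding behavior.

```latex
\begin{aligned}
x_i &\approx M_i \cdot 2^{E_i}, \qquad i = 0, \dots, n-1 \\
E_{\text{blk}} &= \max_{0 \le i < n} E_i \\
s_i &= E_{\text{blk}} - E_i \;\ge\; 0 \\
M_i' &= \left\lfloor M_i / 2^{s_i} \right\rfloor \\
x_i &\approx M_i' \cdot 2^{E_{\text{blk}}}
\end{aligned}
```

Choosing the maximum exponent guarantees that every s_i is non-negative, so every mantissa is shifted right (or left unchanged) and none can overflow its field.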
In the technical solution of this embodiment, the first data set and the second data set that take part in data processing are stored in the mantissa register, and the multiply-accumulate unit determines their multiplication result data set. The normalization unit normalizes the multiplication result data set, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The block floating-point exponent is determined from the result exponent data in the exponent register, and the result mantissa data is shifted to convert the multiplication result data set into block floating-point data. By providing a separate mantissa register and exponent register, the technical solution increases the data bit width available to the mantissa data, improves data precision, reduces coupling in the data processing flow, and can reduce hardware design complexity.
FIG. 6 is a flowchart of the steps of a data processing method provided by an embodiment of this application. Referring to FIG. 6, the data processing method of this embodiment specifically includes:
Step 201: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiply-accumulate unit.
Optionally, the first data set and the second data set can be stored in the mantissa register in advance, the multiply-accumulate unit can perform full-precision multiplication on the data in the first data set and the second data set, and the multiplication results together form the multiplication result data set.
Step 202: Perform a normalization operation on at least one multiplication result in the multiplication result data set through the normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
In this embodiment, the multiplication results in the multiplication result data set can be normalized to compress the precision of the mantissa data stored in the mantissa register, producing new result mantissa data and result exponent data. The result mantissa data and the result exponent data are then stored in the mantissa register and the exponent register, respectively.
Step 203: The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent.
Optionally, the exponent calculation unit can read the result exponent data stored in the exponent register one by one, compare the values it reads, and take the result exponent data with the largest value as the block floating-point exponent.
Step 204: The mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent.
Optionally, the block floating-point exponent can be used as the basis for shifting each result mantissa data in the mantissa register, so that the result exponent data corresponding to each result mantissa data becomes equal to the block floating-point exponent. For example, if the result exponent data differs from the block floating-point exponent by 1, the corresponding result mantissa data in the mantissa register can be shifted right by one bit.
Step 205: Use the shifted result mantissa data together with the block floating-point exponent as the block floating-point data.
In this embodiment, the data formed by the shifted result mantissa data and the block floating-point exponent can be used as the block floating-point data; the exponent of every result mantissa data is then the block floating-point exponent.
For example, FIG. 7 is an example diagram of a data processing method provided by an embodiment of this application. Referring to FIG. 7, after the normalization unit normalizes and compresses the 64-bit multiplication results output by the multiply-accumulate unit, four 32-bit result mantissa data (M0~M3) and four independent result exponent data (E0~E3) are obtained and stored in the mantissa register and the exponent register, respectively. At this point the exponent calculation unit serving the exponent register can concurrently compare E0~E3 to find the largest result exponent data, which becomes the block floating-point exponent. Once the maximum of the four result exponent data (E0~E3) is found (for example E3), the mantissa calculation unit re-shifts the result mantissa data (M0~M2) in the mantissa register, scaled to the block floating-point exponent E3, so that the precision of the result mantissa data is aligned to E3.
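A small numeric sketch of this four-element case is given below. The mantissa and exponent values are made up for illustration and do not come from FIG. 7, and the function name `align_to_block_exponent` is hypothetical. The exponents are chosen so that E3 is the largest, matching the example in the text.

```python
def align_to_block_exponent(mantissas, exponents):
    """Align every mantissa to the largest exponent in the block.

    Returns (aligned_mantissas, block_exponent). Each mantissa is shifted
    right by (block_exponent - its own exponent); the entry that already
    holds the largest exponent is left unchanged.
    """
    block_exponent = max(exponents)
    aligned = [m >> (block_exponent - e) for m, e in zip(mantissas, exponents)]
    return aligned, block_exponent


# Illustrative M0..M3 and E0..E3 (assumed values).
M = [1000, -2000, 3000, 4000]
E = [2, 5, 3, 7]
aligned, block_exponent = align_to_block_exponent(M, E)
```

Here only M0 to M2 are actually shifted, because M3 already carries the block floating-point exponent.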
In the technical solution of this embodiment, the first data set and the second data set are stored in the mantissa register, and the multiply-accumulate unit determines their multiplication result data set. The normalization unit normalizes the multiplication results in the multiplication result data set, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit shifts the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data. By keeping the result mantissa data and the result exponent data in two separate registers for calculation, the technical solution reduces the coupling between hardware design and data processing, can reduce computational complexity, increases the bit width available to the result mantissa data, and improves data precision.
FIG. 8 is a flowchart of the steps of a data processing method provided by an embodiment of this application. This embodiment describes the shifting of the result mantissa data in more detail. Referring to FIG. 8, the data processing method of this embodiment includes:
Step 301: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiply-accumulate unit.
Step 302: Perform a normalization operation on at least one multiplication result in the multiplication result data set through the normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.
Step 303: The exponent calculation unit reads the result exponent data at the lowest address in the exponent register as the current maximum result exponent.
Here, the address is the storage address of the result exponent data in the exponent register, the lowest address is the storage address with the smallest numeric value, and the current maximum result exponent is the largest of the result exponent data read from the exponent register so far.
Optionally, the exponent calculation unit can read the result exponent data at the lowest address of the exponent register and use it as the current maximum result exponent. It can be understood that, when initializing the current maximum result exponent, the exponent calculation unit may instead read the result exponent data at the highest address, or any one result exponent data.
Step 304: The exponent calculation unit reads the remaining result exponent data in the exponent register one by one and compares each with the current maximum result exponent. If the result exponent data is greater than the current maximum result exponent, that result exponent data becomes the current maximum result exponent.
Optionally, the exponent calculation unit can read the remaining result exponent data in the exponent register one by one. Each time it reads a result exponent data, it compares that value with the current maximum result exponent; if the value read is greater, it becomes the current maximum result exponent. Optionally, to simplify the later shift of the result mantissa data, if the value read is less than or equal to the current maximum result exponent, the difference between the current maximum result exponent and the result exponent data can be stored in the exponent register in place of the originally stored result exponent data.
Step 305: When the exponent calculation unit has finished reading the result exponent data in the exponent register, the current maximum result exponent is used as the block floating-point exponent.
Optionally, after all result exponent data in the exponent register has been read, the current maximum result exponent can be used as the block floating-point exponent. Because it has been compared with every result exponent data, the current maximum result exponent is the maximum of all result exponent data.
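Steps 303 to 305 amount to a sequential running-maximum scan, which can be sketched as follows. The function name `find_block_exponent` is hypothetical, and a Python list ordered from the lowest address to the highest stands in for the exponent register.

```python
def find_block_exponent(exponent_register):
    """Scan the exponent register once and return the largest result exponent."""
    if not exponent_register:
        raise ValueError("the exponent register holds no result exponent data")
    # Step 303: the entry at the lowest address seeds the current maximum.
    current_max = exponent_register[0]
    # Step 304: compare each remaining entry against the current maximum.
    for exponent in exponent_register[1:]:
        if exponent > current_max:
            current_max = exponent
    # Step 305: after the last read, the current maximum is the block exponent.
    return current_max


block_exponent = find_block_exponent([14, 3, 0, 15])
```

The scan needs the whole sequence before its result is final, which is why the text later treats this stage as the one that cannot be overlapped with the mantissa shift.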
Step 306: The mantissa calculation unit reads the result mantissa data one by one and reads the result exponent data corresponding to each result mantissa data from the exponent register.
In this embodiment, the result mantissa data and the result exponent data have the same storage address, so the corresponding result exponent data can be read from the exponent register using the storage address of the result mantissa data.
Step 307: Use the difference between the block floating-point exponent and the result exponent data as the number of shift bits.
Optionally, either the exponent calculation unit or the mantissa calculation unit can compute the difference between the result exponent data and the block floating-point exponent as the number of shift bits, that is, the number of bits by which the mantissa data is shifted.
Step 308: Perform a shift operation on the result mantissa data according to the number of shift bits.
Optionally, each result mantissa data in the mantissa register can be shifted right by the number of shift bits.
Step 309: Use the shifted result mantissa data together with the block floating-point exponent as the block floating-point data.
For example, suppose there are two arrays X and Y of length n whose elements are 16-bit integers. Z[i]=X[i]*Y[i] is to be computed and block-floating-point normalized to 16 bits, where i=0~n-1.
(1) The data loading unit reads X[i] and Y[i] consecutively from memory into the mantissa register.
(2) Concurrently, the multiply-accumulate unit computes Z[i]=X[i]*Y[i] and stores it in the mantissa register. At this point the multiplication result data set Z[i] holds full-precision multiplication results with a bit width of 32 bits.
(3) Concurrently, the normalization unit reads the 32-bit multiplication result Z[i] from the mantissa register and normalizes it, producing 16-bit result mantissa data M[i] and the corresponding 8-bit result exponent data E[i]. The result mantissa data M[i] is stored back to the mantissa register and the result exponent data E[i] is stored in the exponent register, and the indexes of M[i] and E[i] in the mantissa register and the exponent register correspond one to one.
(4) Concurrently, the exponent calculation unit monitors the dynamic range of the result exponent data E[i] that the normalization unit stores in the exponent register, that is, it searches for the maximum of the result exponent data E[i].
(5) When the exponent calculation unit has finished comparing the last result exponent data E[n-1] of the E[i] sequence, it has the maximum value Emax of the sequence and stores Emax in the exponent register as a separate data item.
(6) The exponent calculation unit reads the result exponent data E[i] from the exponent register one by one and computes E[i]=Emax-E[i], which is the number of shift bits. The result is written back to the same index of the exponent register, overwriting the original E[i].
(7) Concurrently, the mantissa calculation unit reads the E[i] value updated by the exponent calculation unit from the exponent register and, according to that value, reads the result mantissa data M[i] stored at the corresponding index of the mantissa register and applies the alignment shift, that is, M[i]=M[i]>>E[i]. The result is stored back to the mantissa register, overwriting the original M[i]. The result mantissa data M[i] finally held in the mantissa register are the block floating-point mantissas, and the Emax stored in the exponent register is the block floating-point exponent.
(8) Steps (1) to (4) are pipelined: a later step does not have to wait for the whole sequence to finish before the next stage of calculation begins. Steps (6) and (7) are pipelined in the same way. Step (5), however, cannot be pipelined, so the results M[i] and E[i] of step (4) have to be stored. When the sequence length n>2N, neither the mantissa register nor the exponent register can hold the complete output of step (4). The data storage unit therefore stores part of the M[i] data that step (4) left in the mantissa register into the first-level cache, and the corresponding data loading unit stores part of the E[i] sequence that step (4) left in the exponent register into the zero-level cache.
(9) When the results of step (4) have all been output, the processor uses the data loading unit to fetch the result mantissa data M[i] from the first-level cache and store it in the mantissa register again. At the same time, the data loading unit of the exponent register automatically follows the operation of the mantissa register's data loading unit and fetches the result exponent data E[i] from the zero-level cache back into the exponent register, after which step (6) is performed.
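Steps (1) to (7) above can be sketched end to end in software as follows. This is a sequential model under stated assumptions, not the pipelined hardware: the function name `block_float_multiply` is hypothetical, Python lists stand in for the mantissa and exponent registers, the normalization rule (shift right until the value fits a signed 16-bit field) is assumed, and the cache spill of steps (8) and (9) is not modeled.

```python
def block_float_multiply(x, y, mant_bits=16):
    """Multiply two integer arrays element-wise and return (mantissas, Emax).

    Each product is approximately mantissas[i] * 2**Emax after alignment.
    """
    if not x or len(x) != len(y):
        raise ValueError("X and Y must be non-empty and of equal length")
    lowest = -(1 << (mant_bits - 1))
    highest = (1 << (mant_bits - 1)) - 1
    mantissa_register = []
    exponent_register = []

    # Steps (1)-(3): load, multiply at full precision, normalize each product.
    for a, b in zip(x, y):
        z = a * b
        e = 0
        while not lowest <= z <= highest:
            z >>= 1  # arithmetic shift; rounds toward negative infinity
            e += 1
        mantissa_register.append(z)   # M[i]
        exponent_register.append(e)   # E[i], stored at the same index

    # Steps (4)-(5): find Emax once the whole exponent sequence is known.
    e_max = max(exponent_register)

    # Step (6): overwrite E[i] with the shift amount Emax - E[i].
    for i, e in enumerate(exponent_register):
        exponent_register[i] = e_max - e

    # Step (7): alignment shift M[i] = M[i] >> E[i], written back in place.
    for i, shift in enumerate(exponent_register):
        mantissa_register[i] >>= shift

    return mantissa_register, e_max


mantissas, e_max = block_float_multiply([12000, -300, 7, 32767],
                                        [25000, 450, -9, 32767])
```

Small products lose most of their bits when aligned to a large Emax, which is the usual trade-off of sharing one exponent across a block. Note also that an arithmetic right shift rounds negative values toward negative infinity, so a small negative mantissa settles at -1 rather than 0.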
In the technical solution of this embodiment, the first data set and the second data set are stored in the mantissa register, and the multiply-accumulate unit determines their multiplication result data set. The normalization unit normalizes the multiplication results in the multiplication result data set, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit shifts the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data. By keeping the result mantissa data and the result exponent data in two separate registers for calculation, the technical solution reduces the coupling between hardware design and data processing, can reduce computational complexity, increases the bit width available to the result mantissa data, and improves data precision.
The above are only exemplary embodiments of this application and are not intended to limit its scope of protection.
Those skilled in the art should understand that the term user terminal covers any suitable type of wireless user equipment, such as a mobile phone, a portable data processing device, a portable web browser, or a vehicle-mounted mobile station.
In general, the various embodiments of this application can be implemented in hardware or dedicated circuits, software, logic, or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executable by a controller, microprocessor, or other computing device, although this application is not limited thereto.
The embodiments of this application may be implemented by a data processor of a mobile device executing computer program instructions, for example in a processor entity, or by hardware, or by a combination of software and hardware. The computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages.
The block diagram of any logic flow in the drawings of this application may represent program steps, or interconnected logic circuits, modules, and functions, or a combination of program steps with logic circuits, modules, and functions. The computer program can be stored in a memory. The memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical storage devices and systems (digital versatile disc (DVD) or compact disc (CD)). Computer-readable media may include non-transitory storage media. The data processor can be of any type suitable for the local technical environment, such as but not limited to a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (FPGA), and a processor based on a multi-core processor architecture.
A detailed description of exemplary embodiments of this application has been provided above by way of exemplary and non-limiting examples. Considered together with the drawings and the claims, various modifications and adjustments to the above embodiments will be obvious to those skilled in the art without departing from the scope of the present invention. The proper scope of the present invention is therefore to be determined by the claims.

Claims (10)

  1. A floating-point processing device, comprising: a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit, wherein the normalization unit is connected to the exponent register and to the mantissa register, and the multiply-accumulate unit is connected to the mantissa register; wherein the normalization unit is configured to perform a normalization operation on data to generate floating-point data, the normalization unit sends the exponent data of the floating-point data to the exponent register for storage, and the normalization unit sends the mantissa data of the floating-point data to the mantissa register for storage; and the multiply-accumulate unit is configured to perform multiplication according to the mantissa data.
  2. The device according to claim 1, further comprising: a mantissa calculation unit, an exponent calculation unit, an association update unit, and a level-1 cache, wherein the mantissa calculation unit and the level-1 cache are each connected to the mantissa register, the exponent calculation unit is connected to the exponent register, and the association update unit is connected to the exponent register and to the level-1 cache; wherein the mantissa calculation unit is configured to perform calculations according to the mantissa data; the exponent calculation unit is configured to perform calculations according to the exponent data; the association update unit is configured to, when the exponent register overflows, store the exponent data that overflowed from the exponent register into the level-1 cache; and the level-1 cache is further configured to store mantissa data that overflowed from the mantissa register.
  3. The device according to claim 2, wherein the association update unit comprises:
    a level-0 cache and a translation lookaside buffer; the level-0 cache is connected to the exponent register; the translation lookaside buffer is connected to the level-0 cache and to the level-1 cache;
    wherein the level-0 cache is configured to store exponent data that overflowed from the exponent register; and the translation lookaside buffer is configured to, when exponent data overflows from the level-0 cache, perform address translation on that exponent data and store it into the level-1 cache, such that the address of the address-translated exponent data does not conflict with the address of the mantissa data stored in the level-1 cache.
  4. The device according to claim 2, wherein a data storage unit and a data loading unit are further connected between the level-1 cache and the mantissa register;
    wherein the data storage unit is configured to store the mantissa data that overflowed from the mantissa register into the level-1 cache; and
    the data loading unit is configured to load the mantissa data stored in the level-1 cache into the mantissa register.
  5. The device according to claim 2, wherein the mantissa calculation unit is further connected to the exponent register, and the mantissa calculation unit is configured to obtain the exponent data stored in the exponent register and to perform a shift operation on the mantissa data in the mantissa register according to the exponent data.
  6. A data processing method, comprising:
    storing a first data set and a second data set in a mantissa register, and determining, by a multiply-accumulate unit, a multiplication result data set of the first data set and the second data set;
    performing, by a normalization unit, a normalization operation on at least one multiplication result in the multiplication result data set, and storing the generated result mantissa data and result exponent data in the mantissa register and an exponent register, respectively; and
    determining a block floating-point exponent of the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.
  7. The method according to claim 6, wherein the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.
  8. The method according to claim 6, wherein determining the block floating-point exponent of the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data, comprises:
    selecting, by an exponent calculation unit, the result exponent data with the largest value in the exponent register as the block floating-point exponent;
    performing, by a mantissa calculation unit, a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent; and
    taking the shifted result mantissa data and the block floating-point exponent as the block floating-point data.
  9. The method according to claim 8, wherein selecting, by the exponent calculation unit, the result exponent data with the largest value in the exponent register as the block floating-point exponent comprises:
    reading, by the exponent calculation unit, the result exponent data at the lowest address in the exponent register as a current maximum result exponent;
    sequentially reading, by the exponent calculation unit, the remaining result exponent data in the exponent register and comparing each with the current maximum result exponent, and, if the result exponent data is greater than the current maximum result exponent, taking that result exponent data as the current maximum result exponent; and
    when the exponent calculation unit has finished reading the result exponent data in the exponent register, taking the current maximum result exponent as the block floating-point exponent.
  10. The method according to claim 8, wherein performing, by the mantissa calculation unit, the shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent comprises:
    sequentially reading, by the mantissa calculation unit, the result mantissa data, and reading from the exponent register the result exponent data corresponding to the result mantissa data;
    taking the difference between the block floating-point exponent and the result exponent data as a shift amount; and
    performing a shift operation on the result mantissa data according to the shift amount.
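
The block floating-point conversion recited in claims 6 and 8 to 10 can be illustrated with a minimal Python sketch. It is a software illustration under stated assumptions, not the claimed hardware: the 16-bit signed mantissa width, the function names (`normalize`, `block_exponent`, `align_mantissas`, `to_block_floating_point`), the use of plain Python lists in place of the mantissa and exponent registers, and the choice of an arithmetic right shift for alignment are all hypothetical choices made for the example.

```python
from typing import List, Tuple

MANT_BITS = 16  # assumed signed mantissa width; the claims do not fix a width


def normalize(value: int, mant_bits: int = MANT_BITS) -> Tuple[int, int]:
    """Split an integer into (mantissa, exponent), value ~= mantissa * 2**exponent.

    The mantissa is kept within a signed mant_bits-wide range and is shifted
    left until its redundant sign bits are removed (assumed normalization rule).
    """
    if value == 0:
        return 0, 0
    limit = 1 << (mant_bits - 1)
    mantissa, exponent = value, 0
    # Too wide: shift right (arithmetic shift, low bits are discarded).
    while mantissa >= limit or mantissa < -limit:
        mantissa >>= 1
        exponent += 1
    # Too narrow: shift left until the top magnitude bit is used.
    while -(limit >> 1) <= mantissa < (limit >> 1):
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


def block_exponent(exponents: List[int]) -> int:
    """Linear scan for the largest exponent, as described in claim 9."""
    current_max = exponents[0]          # entry at the lowest address
    for e in exponents[1:]:             # remaining entries, in order
        if e > current_max:
            current_max = e
    return current_max


def align_mantissas(mantissas: List[int], exponents: List[int],
                    block_exp: int) -> List[int]:
    """Shift each mantissa right by (block_exp - its exponent), as in claim 10."""
    return [m >> (block_exp - e) for m, e in zip(mantissas, exponents)]


def to_block_floating_point(products: List[int]) -> Tuple[List[int], int]:
    """Normalize each product, then align all mantissas to one shared exponent."""
    pairs = [normalize(p) for p in products]
    mantissas = [m for m, _ in pairs]   # same index in both lists (cf. claim 7)
    exponents = [e for _, e in pairs]
    block_exp = block_exponent(exponents)
    return align_mantissas(mantissas, exponents, block_exp), block_exp


if __name__ == "__main__":
    first = [3, 1000, -20000]
    second = [5, 1000, 30000]
    products = [x * y for x, y in zip(first, second)]  # element-wise multiply
    print(to_block_floating_point(products))
```

Because every mantissa is aligned to the largest exponent in the block, small values lose low-order bits (the product 15 in the usage example shifts out to 0), which is the usual precision trade-off of sharing one exponent across a block.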
PCT/CN2020/123736 2019-12-17 2020-10-26 Floating point processing device and data processing method WO2021120851A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911304990.7 2019-12-17
CN201911304990.7A CN112988110A (en) 2019-12-17 2019-12-17 Floating point processing device and data processing method

Publications (1)

Publication Number Publication Date
WO2021120851A1 true WO2021120851A1 (en) 2021-06-24

Family

ID=76343611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123736 WO2021120851A1 (en) 2019-12-17 2020-10-26 Floating point processing device and data processing method

Country Status (2)

Country Link
CN (1) CN112988110A (en)
WO (1) WO2021120851A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023147770A1 (en) * 2022-02-02 2023-08-10 吕仁硕 Floating point number operation method and related arithmetic unit
EP4343684A1 (en) * 2022-06-24 2024-03-27 Calterah Semiconductor Technology (Shanghai) Co., Ltd Data processing method and apparatus, and radar sensor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090292750A1 (en) * 2008-05-22 2009-11-26 Videolq, Inc. Methods and apparatus for automatic accuracy- sustaining scaling of block-floating-point operands
CN110050256A (en) * 2016-12-07 2019-07-23 微软技术许可有限责任公司 Block floating point for neural fusion

Also Published As

Publication number Publication date
CN112988110A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
US10514912B2 (en) Vector multiplication with accumulation in large register space
US11175891B2 (en) Systems and methods to perform floating-point addition with selected rounding
US11036504B2 (en) Systems and methods for performing 16-bit floating-point vector dot product instructions
US20060179092A1 (en) System and method for executing fixed point divide operations using a floating point multiply-add pipeline
CN112639722A (en) Apparatus and method for accelerating matrix multiplication
JP2014093085A (en) Reducing power consumption in fma unit responsive to input data values
WO2021120851A1 (en) Floating point processing device and data processing method
TWI493453B (en) Microprocessor, video decoding device, method and computer program product for enhanced precision sum-of-products calculation on a microprocessor
TW201730752A (en) Hardware accelerators and methods for stateful compression and decompression operations
EP3394729B1 (en) Fused multiply add (fma) low functional unit
WO2010051298A2 (en) Instruction and logic for performing range detection
CN111813371B (en) Floating point division operation method, system and readable medium for digital signal processing
EP4276608A2 (en) Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions
US20210279038A1 (en) Using fuzzy-jbit location of floating-point multiply-accumulate results
US20050172210A1 (en) Add-compare-select accelerator using pre-compare-select-add operation
TWI774093B (en) Converter, chip, electronic equipment and method for converting data types
US20140136820A1 (en) Recycling Error Bits in Floating Point Units
US10503473B1 (en) Floating-point division alternative techniques
US20210182067A1 (en) Apparatuses, methods, and systems for instructions to multiply floating-point values of about one
CN108292219B (en) Floating Point (FP) add low instruction functional unit
US11875154B2 (en) Apparatuses, methods, and systems for instructions to multiply floating-point values of about zero
US11847450B2 (en) Apparatuses, methods, and systems for instructions to multiply values of zero
CN117785113A (en) Computing device and method, electronic device, and storage medium
CN116382618A (en) Single-precision floating point arithmetic device
JPH06301710A (en) Method and device for double precision product-sum operation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20901527

Country of ref document: EP

Kind code of ref document: A1