WO2021120851A1

WO2021120851A1 - Floating point processing device and data processing method

Info

Publication number: WO2021120851A1
Application number: PCT/CN2020/123736
Authority: WO
Inventors: 张磊
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2019-12-17
Filing date: 2020-10-26
Publication date: 2021-06-24
Also published as: CN112988110A

Abstract

The present application provides a floating-point processing device and a data processing method. The device comprises a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit. The normalization unit is connected to both the exponent register and the mantissa register, and the multiply-accumulate unit is connected to the mantissa register. The normalization unit is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register for storage, and to send the mantissa data of the floating-point data to the mantissa register for storage. The multiply-accumulate unit is configured to perform multiplication operations on the mantissa data. Because the exponent data and the mantissa data of the floating-point data are stored in the exponent register and the mantissa register respectively, the probability of mantissa overflow during floating-point operations is reduced, the bit width available to the mantissa data is expanded, and data accuracy can be improved.

Description

Floating point processing device and data processing method

Technical field

This application relates to the field of data communication, and in particular to a floating-point processing device and a data processing method.

Background

Current processors mainly include a floating-point processing unit and a fixed-point processing unit. The fixed-point processing unit has low processing accuracy but low hardware resource overhead, while the floating-point processing unit has high processing accuracy but high hardware resource overhead. As technologies and protocols evolve, the amount of data handled by processors keeps growing, and so do the requirements on data accuracy. In some fields (such as baseband communications), traditional fixed-point processing can no longer meet the accuracy requirements, and there is an urgent need for a solution that improves processing accuracy while keeping hardware resource overhead under control.

In the prior art, block floating point is used to improve the accuracy of data processing while keeping hardware resource overhead small. The basic principle of block floating point is shown in FIG. 1. The data in a data block have different mantissas M0 to M7 but share a single exponent E, which corresponds to the exponent of the data with the largest absolute value in the block. When the dynamic range of the data within a block does not vary much, the whole block needs only one exponent, so the exponent does not have to be packed together with each mantissa. For the same data bit width, block floating point can therefore retain more mantissa bits, and hence higher precision, than ordinary floating point. In addition, because all data in a block share the same scaling, additions can be performed directly without extra shift operations, so fewer operations are needed. However, existing processors include only one kind of data register, which stores both the exponent data and the mantissa data of the block floating point. When block floating-point data is processed, this causes register overflow and poor accuracy.

Summary of the invention

This application provides a floating-point processing device and a data processing method.

An embodiment of the present application provides a floating-point processing device, which includes:

A normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit, wherein the normalization unit is connected to both the exponent register and the mantissa register, and the multiplication and accumulation unit is connected to the mantissa register; the normalization unit is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register for storage, and to send the mantissa data of the floating-point data to the mantissa register for storage; and the multiplication and accumulation unit is configured to perform multiplication operations on the mantissa data.

An embodiment of the present application provides a data processing method, which includes:

Storing a first data set and a second data set in the mantissa register, and determining a multiplication result data set of the first data set and the second data set by the multiplication and accumulation unit; performing, by the normalization unit, a normalization operation on at least one multiplication result in the multiplication result data set, and storing the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively; and determining a block floating-point exponent from the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data.

Regarding the above embodiments and other aspects of the application and their implementation manners, more descriptions are provided in the description of the drawings, the specific implementation manners, and the claims.

Description of the drawings

FIG. 1 is an example diagram of block floating point data in the prior art;

FIG. 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the application;

FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application;

FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application;

FIG. 5 is a flowchart of steps of a data processing method provided by an embodiment of this application;

FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application;

FIG. 7 is an example diagram of a data processing method provided by an embodiment of this application;

FIG. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application will be described in detail below in conjunction with the accompanying drawings. It should be noted that the embodiments in the application and the features in the embodiments can be combined with each other arbitrarily if there is no conflict.

FIG. 2 is a schematic structural diagram of a floating-point processing device provided by an embodiment of the application. The embodiment of the application is applicable to processing floating-point data in a processor. The device may be implemented in software and/or hardware and can generally be integrated in a processor. Referring to FIG. 2, the floating-point processing device of the embodiment of the present application includes a normalization unit 10, an exponent register 11, a mantissa register 12, and a multiplication and accumulation unit 13. The normalization unit 10 is connected to both the exponent register 11 and the mantissa register 12, and the multiplication and accumulation unit 13 is connected to the mantissa register 12. The normalization unit 10 is configured to perform a normalization operation on data to generate floating-point data, to send the exponent data of the floating-point data to the exponent register 11 for storage, and to send the mantissa data of the floating-point data to the mantissa register 12 for storage. The multiplication and accumulation unit 13 is configured to perform multiplication operations on the mantissa data.

The normalization unit 10 may be a processor module that processes data and can convert input data into floating-point data. The data processed by the normalization unit 10 may be fixed-point data or floating-point data. The normalization unit 10 can convert the data according to the bit-width requirement of the mantissa in the mantissa register 12, turning fixed-point or floating-point data into floating-point data that meets the bit-width requirement of the mantissa register 12. For example, the fixed-point data 6023 can be converted by the normalization unit 10 into the floating-point number 6.023E3.
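The 6023 to 6.023E3 example can be illustrated with a minimal sketch. It works in decimal only because the example in the text is decimal; a hardware normalization unit would operate on binary data. The function name `normalize_decimal` is hypothetical and does not come from the application.

```python
from decimal import Decimal


def normalize_decimal(value: int) -> tuple[Decimal, int]:
    """Split an integer into (mantissa, exponent) so that
    value == mantissa * 10**exponent and 1 <= |mantissa| < 10.

    Hypothetical helper for illustration; not the unit described in the text.
    """
    if value == 0:
        return Decimal(0), 0
    d = Decimal(value)
    exponent = d.adjusted()          # position of the most significant digit
    mantissa = d.scaleb(-exponent)   # move the decimal point, digits unchanged
    return mantissa, exponent


mantissa, exponent = normalize_decimal(6023)
print(f"{mantissa}E{exponent}")
```

In this sketch the mantissa would go to the mantissa register and the exponent to the exponent register.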

Optionally, the exponent register 11 and the mantissa register 12 may be small storage units that store binary data and can temporarily hold the data and operation results involved in an operation. The exponent register 11 and the mantissa register 12 may be implemented as sequential logic circuits; the exponent register 11 can store the exponent data of floating-point data, and the mantissa register 12 can store the mantissa data of floating-point data. It can be understood that the floating-point processing device in the embodiment of the present application may include multiple exponent registers 11 and multiple mantissa registers 12. Exemplarily, it may include N mantissa registers 12 with a bit width of 40 bits and 2N exponent registers 11 with a bit width of 8 bits, where N may be related to the number of data items processed at the same time in the embodiment of the present application. The lower 32 bits of the mantissa register 12 may be data bits for storing mantissa data, and the upper 8 bits may be extension bits. In the embodiment of the present application, the mantissa register 12 can store either the mantissa of floating-point data or fixed-point data. Optionally, the mantissa data stored in the mantissa register 12 and the exponent data stored in the exponent register 11 correspond to each other, that is, they belong to the same floating-point data, and the storage address of the mantissa data in the mantissa register 12 may be the same as the storage address of the exponent data in the exponent register 11.
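The register layout described above can be modelled roughly as follows. This is only a software sketch under stated assumptions: the class name `RegisterFile`, its methods, and the choice of two's complement storage are hypothetical and are not specified in the application.

```python
MANTISSA_BITS = 40   # full width of a mantissa register
DATA_BITS = 32       # lower bits that hold the mantissa data
EXPONENT_BITS = 8    # width of an exponent register


class RegisterFile:
    """Hypothetical model: N 40-bit mantissa registers, 2N 8-bit exponent registers."""

    def __init__(self, n: int) -> None:
        self.mantissa = [0] * n
        self.exponent = [0] * (2 * n)

    def write_mantissa(self, index: int, value: int) -> None:
        # Assumption: values are kept as 40-bit two's complement words.
        self.mantissa[index] = value & ((1 << MANTISSA_BITS) - 1)

    def read_mantissa(self, index: int) -> int:
        raw = self.mantissa[index]
        sign_bit = 1 << (MANTISSA_BITS - 1)
        return raw - (1 << MANTISSA_BITS) if raw & sign_bit else raw

    def uses_extension_bits(self, index: int) -> bool:
        # True when the value no longer fits in the 32 data bits,
        # so the upper 8 extension bits are carrying information.
        value = self.read_mantissa(index)
        limit = 1 << (DATA_BITS - 1)
        return not (-limit <= value < limit)
```

The extension bits give accumulated results some headroom before the mantissa register itself overflows.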

In the embodiment of the present application, the multiplication and accumulation unit 13 can perform multiplication and multiply-add operations on the mantissa data. For example, when the multiplication and accumulation unit 13 is an 18-bit multiplication and accumulation unit, it can multiply the mantissa data to generate a 32-bit multiplication result and multiply-accumulate the mantissa data to generate a 40-bit multiply-accumulate result.

In the technical solution of the embodiment of the present application, the floating-point processing device is formed by a normalization unit, an exponent register, a mantissa register, and a multiplication and accumulation unit. The normalization unit is connected to both the exponent register and the mantissa register, and the multiplication and accumulation unit is connected to the mantissa register. The normalization unit normalizes data to generate floating-point data, sends the exponent data of the floating-point data to the exponent register for storage, and sends the mantissa data of the floating-point data to the mantissa register for storage, and the multiplication and accumulation unit performs multiplication on the mantissa data, thereby realizing the processing of floating-point data. Storing the exponent data and the mantissa data in the exponent register and the mantissa register respectively reduces the probability of register overflow, increases the storage bit width of the mantissa data, and improves the accuracy of the floating-point data. In addition, the independently provided exponent register allows operations on exponent data to be decoupled from the hardware structure and from the data calculation, which simplifies the hardware design.

FIG. 3 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application. Referring to FIG. 3, the floating-point processing device in the embodiment of the application further includes a mantissa calculation unit 14, an exponent calculation unit 15, an association update unit 16, and a first-level cache 17. The mantissa calculation unit 14 and the first-level cache 17 are each connected to the mantissa register 12, the exponent calculation unit 15 is connected to the exponent register 11, and the association update unit 16 is connected to both the exponent register 11 and the first-level cache 17. The mantissa calculation unit 14 is configured to perform calculations on the mantissa data; the exponent calculation unit 15 is configured to perform calculations on the exponent data; the association update unit 16 is configured to store the exponent data overflowed from the exponent register 11 in the first-level cache 17 when the exponent register 11 overflows; and the first-level cache 17 is also configured to store the mantissa data overflowed from the mantissa register 12.

In the embodiment of the present application, the exponent register 11 and the mantissa register 12 may also be connected to the exponent calculation unit 15 and the mantissa calculation unit 14, respectively. The mantissa calculation unit 14 and the exponent calculation unit 15 may each be an arithmetic logic unit (ALU), that is, a combinational logic circuit that implements arithmetic and logical operations, and may specifically be an execution unit of the processor. The mantissa calculation unit 14 performs calculations on the mantissa data, and the exponent calculation unit 15 performs calculations on the exponent data. The calculations performed by the exponent calculation unit 15 and the mantissa calculation unit 14 may include, but are not limited to, addition, subtraction, multiplication, division, AND, OR, NOT, exclusive OR, and shift operations.

The association update unit 16 can store the exponent data overflowed from the exponent register 11 in the first-level cache 17. When the exponent data in the exponent register 11 overflows, the overflowed exponent data can be stored in the first-level cache 17. Because the mantissa data in the mantissa register 12 has the same storage address as the exponent data in the exponent register 11, the overflowed exponent data would conflict in address with the overflowed mantissa data. The storage address of the exponent data can therefore be converted, for example by adding or subtracting a corresponding offset, so that the overflowed exponent data and mantissa data are not stored at the same location in the first-level cache 17 and no data conflict occurs. The first-level cache 17 may specifically be an L1 cache, which can be integrated inside the processor and configured to temporarily store data during data processing. In the embodiment of the present application, the first-level cache 17 can store both the mantissa data overflowed from the mantissa register 12 and the exponent data overflowed from the exponent register 11.

FIG. 4 is a schematic structural diagram of another floating-point processing device provided by an embodiment of the application, in which the association update unit is described in more detail. Referring to FIG. 4, in the floating-point processing device of the embodiment of the application, the association update unit includes a zero-level cache 161 and an address translation lookaside buffer 162. The zero-level cache 161 is connected to the exponent register 11, and the address translation lookaside buffer 162 is connected to both the zero-level cache 161 and the first-level cache 17. The zero-level cache 161 is configured to store the exponent data overflowed from the exponent register 11. The address translation lookaside buffer 162 is configured, when exponent data overflows from the zero-level cache 161, to convert the address of that exponent data and store it in the first-level cache 17, so that the exponent data after address conversion does not conflict with the address of the mantissa data stored in the first-level cache 17.

In the embodiment of the present application, the zero-level cache 161 may be a cache responsible for caching exponent data and can support byte addressing. When the zero-level cache 161 overflows, the exponent data can be sent to the first-level cache 17 for storage. Because the address of the exponent data in the exponent register is the same as the address of the mantissa data in the mantissa register, the exponent data overflowed from the zero-level cache 161 can undergo address conversion in the translation lookaside buffer 162. The translation lookaside buffer 162 may be a buffer that converts between physical addresses and virtual addresses; it can convert the physical address of the exponent data into a virtual address, which prevents an address conflict when the exponent data overflowed from the zero-level cache 161 is stored in the first-level cache 17. The zero-level cache 161 and the translation lookaside buffer 162 thus implement an independent overflow update mechanism, so that block floating-point calculations can be carried out without having to consider the overflow of exponent data.
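The idea of converting the exponent address so that it does not collide with the mantissa address can be sketched as below. This is a simplified illustration, assuming a fixed offset as suggested by the "increase or decrease an offset" example; the constant `EXPONENT_REGION_OFFSET`, both function names, and the dictionary standing in for the first-level cache are hypothetical.

```python
EXPONENT_REGION_OFFSET = 0x1000  # hypothetical offset, not from the application


def translate_exponent_address(address: int,
                               offset: int = EXPONENT_REGION_OFFSET) -> int:
    """Map an exponent address to a different cache address than the
    mantissa that shares the same register address."""
    return address + offset


def spill_to_l1(l1_cache: dict, address: int, mantissa: int, exponent: int) -> None:
    """Store overflowed mantissa and exponent data without an address clash."""
    l1_cache[address] = mantissa                               # mantissa keeps its address
    l1_cache[translate_exponent_address(address)] = exponent   # exponent is relocated


l1_cache = {}
spill_to_l1(l1_cache, address=3, mantissa=0x1234, exponent=7)
```

After the call, both values for address 3 are present in the cache at distinct keys, which is the property the address conversion is meant to guarantee.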

Referring to FIG. 4, in the floating-point processing device in the embodiment of the present application, a data storage unit 19 and a data loading unit 18 are also connected between the first-level cache 17 and the mantissa register 12. The data storage unit 19 is configured to store the mantissa data overflowed from the mantissa register 12 into the first-level cache 17, and the data loading unit 18 is configured to load the mantissa data stored in the first-level cache 17 into the mantissa register 12. The mantissa calculation unit 14 is also connected to the exponent register 11 and is configured to obtain the exponent data stored in the exponent register 11 and to perform shift operations on the mantissa data in the mantissa register 12 according to the exponent data.

In the embodiment of the present application, the data storage unit 19 and the data loading unit 18 transfer the mantissa data between the first-level cache 17 and the mantissa register 12. Optionally, a data storage unit and a data loading unit may also be arranged between the zero-level cache 161 and the exponent register 11 to store and load exponent data. Optionally, in the embodiment of the present application, the mantissa calculation unit 14 may also be connected to the exponent register 11 to obtain the exponent data stored in the exponent register 11, and the mantissa calculation unit 14 may shift the corresponding mantissa data in the mantissa register 12 according to the obtained exponent data.

FIG. 5 is a flow chart of the steps of a data processing method provided by an embodiment of the application. The embodiment of the application is applicable to data processing in a processor. The method may be executed by the floating-point processing device in the embodiment of the application. The device can be implemented in software and/or hardware and can generally be integrated in a processor. The data processing method in the embodiment of the present application includes:

Step 101: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.

Here, the first data set and the second data set are the data sets participating in the operation. Each of them may include at least one piece of data, and the data may be fixed-point data, floating-point data, or block floating-point data.

In the embodiment of the present application, the acquired first data set and second data set can be stored in the mantissa register as fixed-point data or as mantissa data. It can be understood that if the first data set and the second data set contain floating-point data or block floating-point data, the corresponding mantissa data can be obtained through the normalization unit and stored in the mantissa register. After the first data set and the second data set are stored in the mantissa register, the multiplication result data set can be calculated by the multiplication and accumulation unit. Exemplarily, suppose there are two arrays X and Y of length n, each element of which has a bit width of 16 bits, and Z[i]=X[i]*Y[i] is to be calculated, where i can be any integer from 0 to n-1. Block floating-point data can first be normalized to 16 bits by normalization processing. X[i] and Y[i] are then read continuously from memory into the mantissa register while the multiplication and accumulation unit calculates the multiplication result data set Z[i]=X[i]*Y[i], and the multiplication result data set can also be stored in the mantissa register.
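The element-wise product Z[i]=X[i]*Y[i] can be sketched in software as follows. This is an illustrative model only, assuming signed 16-bit elements; the function name `multiply_sets` is hypothetical. The product of two signed 16-bit values always fits in a signed 32-bit word, which is why the results here are treated as full-precision 32-bit values.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def multiply_sets(x: list[int], y: list[int]) -> list[int]:
    """Return Z with Z[i] = X[i] * Y[i] for two equally long lists of
    signed 16-bit values (hypothetical software model of the multiplier)."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length")
    for value in (*x, *y):
        if not INT16_MIN <= value <= INT16_MAX:
            raise ValueError(f"{value} does not fit in 16 bits")
    # Full-precision products; no rounding or truncation happens here.
    return [a * b for a, b in zip(x, y)]


z = multiply_sets([3, -4, 20000], [5, 6, 5])
print(z)
```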

Step 102: Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.

Here, a multiplication result is an element of the multiplication result data set. It may be a full-precision result that does not meet the normalization requirement. For example, if the multiplication result is 123E6 and the normalized form required in the embodiment of this application is 1.23E8, the normalization unit needs to normalize the multiplication result in the multiplication result data set, which changes the precision of its mantissa data.

In the embodiment of the present application, the multiplication results in the multiplication result data set can be normalized: a multiplication result that does not meet the normalization requirement is converted and its precision adjusted. After the normalization operation, the result mantissa data is stored in the mantissa register and the result exponent data is stored in the exponent register.

Exemplarily, the 32-bit multiplication result Z[i] can be read from the mantissa register and normalized to obtain 16-bit mantissa data M[i] and the corresponding 8-bit exponent data E[i]. M[i] is stored back in the mantissa register and E[i] is stored in the exponent register, and the indexes of M[i] and E[i] in the mantissa register and the exponent register are in one-to-one correspondence. Here, i can be any integer from 0 to one less than the number of multiplication results in the multiplication result data set.
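One possible way to compress a 32-bit product into a 16-bit mantissa and an exponent is sketched below. The application does not spell out the exact normalization rule, so this sketch assumes a binary exponent that counts the right shifts needed for the value to fit in a signed 16-bit word, with the shifted-out low bits simply discarded. The function name `normalize_product` is hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def normalize_product(z: int) -> tuple[int, int]:
    """Return (mantissa, exponent) with mantissa in signed 16-bit range and
    z approximately equal to mantissa * 2**exponent.

    Assumption: only right shifts are applied, so small values keep
    exponent 0 and lose no precision.
    """
    mantissa, exponent = z, 0
    while not INT16_MIN <= mantissa <= INT16_MAX:
        mantissa >>= 1   # arithmetic shift, drops the lowest bit
        exponent += 1
    return mantissa, exponent


for z in (15, 100000, 1 << 30):
    m, e = normalize_product(z)
    print(z, "->", m, "* 2 **", e)
```

In this model M[i] would be written back to the mantissa register and E[i] to the exponent register at the same index.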

In one embodiment, the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.

Optionally, to facilitate data processing, the result mantissa data can be stored in association with the result exponent data: the mantissa storage address of the result mantissa data in the mantissa register can be the same as the exponent storage address of the result exponent data in the exponent register. This may mean that the physical addresses are the same, that the logical addresses are the same, that the physical address of the mantissa storage address is the same as the logical address of the exponent storage address, or that the logical address of the mantissa storage address is the same as the physical address of the exponent storage address.

Step 103: Determine the block floating point exponent of the result exponent data in the exponent register, and process the result mantissa data in the mantissa register according to the block floating point exponent to generate block floating point data.

Here, the block floating-point exponent is the exponent shared by all multiplication results in the multiplication result data set once they are converted into the block floating-point data format, and it may be the maximum value among the result exponent data corresponding to the multiplication results.

Optionally, the result exponent with the largest value can be found in the exponent register and used as the block floating-point exponent. Because the result exponent data of the individual multiplication results differ in value, all results need to share the same block floating-point exponent, and the precision of the result mantissa data corresponding to each multiplication result needs to be adjusted accordingly. The result mantissa data in the mantissa register can be shifted to adjust the precision, and the multiplication results in the multiplication result data set after this adjustment can be used as block floating-point data.

In the technical solution of the embodiment of the present application, the first data set and the second data set participating in data processing are stored in the mantissa register, and the multiplication result data set of the first data set and the second data set is determined by the multiplication and accumulation unit. The multiplication result data set is normalized in the normalization unit, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The block floating-point exponent is determined from the result exponent data in the exponent register, and the result mantissa data is shifted accordingly to convert the multiplication result data set into block floating-point data. By providing the mantissa register and the exponent register separately, the technical solution of the embodiment of the present application allows the data bit width of the mantissa data to be increased or decreased, which improves data accuracy, reduces the coupling of the data processing, and can reduce the design complexity of the hardware.

FIG. 6 is a flow chart of the steps of a data processing method provided by an embodiment of the application. Referring to FIG. 6, the data processing method of the embodiment of the application specifically includes:

Step 201: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.

Optionally, the first data set and the second data set can be pre-stored in the mantissa register, the data in the first data set and the second data set can be multiplied at full precision in the multiplication and accumulation unit, and the multiplication results form the multiplication result data set.

Step 202: Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.

In the embodiment of the present application, the multiplication results in the multiplication result data set can be normalized to compress the precision of the mantissa data stored in the mantissa register, yielding new result mantissa data and result exponent data, which are stored in the mantissa register and the exponent register, respectively.

Step 203: The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent.

Optionally, the exponent calculation unit may sequentially read the result exponent data stored in the exponent register, compare the values read, and use the result exponent data with the largest value as the block floating-point exponent.

Step 204: The mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent.

Optionally, the block floating-point exponent can be used as the basis for shifting each result mantissa data in the mantissa register, so that the exponent corresponding to each result mantissa data becomes equal to the block floating-point exponent. For example, if the result exponent data differs from the block floating-point exponent by 1, the corresponding result mantissa data in the mantissa register can be shifted one bit to the right.

Step 205: Use the mantissa data of the result of the shift operation and the block floating point exponent as block floating point data.

In the embodiment of the present application, the data composed of the shifted result mantissa data and the block floating-point exponent can be used as the block floating-point data, and the exponent of every result mantissa data is then the block floating-point exponent.

Exemplarily, FIG. 7 is an example diagram of a data processing method provided by an embodiment of the application. Referring to FIG. 7, after the normalization unit performs normalization compression on the 64-bit multiplication results output by the multiplication and accumulation unit, four 32-bit result mantissa data (M0 to M3) and four independent result exponent data (E0 to E3) are obtained and stored in the mantissa register and the exponent register, respectively. The exponent calculation unit then compares E0 to E3 in the exponent register and takes the largest result exponent data as the block floating-point exponent. If the maximum of the four result exponent data is, for example, E3, the mantissa calculation unit shifts the result mantissa data M0 to M2 in the mantissa register according to the block floating-point exponent E3, so that the precision of all result mantissa data is aligned to E3.
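The alignment shown in FIG. 7 can be sketched as below. The mantissa and exponent values are made up for illustration and are not taken from the figure, and the function name `to_block_floating_point` is hypothetical. The sketch assumes binary exponents, so a difference of d between the block exponent and a result exponent corresponds to a right shift by d bits.

```python
def to_block_floating_point(mantissas: list[int],
                            exponents: list[int]) -> tuple[list[int], int]:
    """Align every mantissa to the largest exponent in the block.

    Returns (aligned_mantissas, block_exponent). Right shifts discard
    low-order bits, so values with smaller exponents lose some precision.
    """
    if not mantissas or len(mantissas) != len(exponents):
        raise ValueError("need equally many mantissas and exponents")
    block_exponent = max(exponents)
    aligned = []
    for mantissa, exponent in zip(mantissas, exponents):
        shift = block_exponent - exponent   # 0 for the entry with the largest exponent
        aligned.append(mantissa >> shift)
    return aligned, block_exponent


# Hypothetical M0..M3 and E0..E3; E3 is the largest, so M3 is left unshifted.
aligned, block_exponent = to_block_floating_point(
    [1200, -800, 640, 1000], [5, 6, 4, 7])
print(aligned, block_exponent)
```

With these inputs only M0 to M2 are actually shifted, which mirrors the description of FIG. 7.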

In the technical solution of the embodiment of the present application, the first data set and the second data set are stored in the mantissa register, and the multiplication result data set of the first data set and the second data set is determined by the multiplication and accumulation unit. The normalization unit normalizes the multiplication results in the multiplication result data set, the generated result mantissa data is stored in the mantissa register, and the generated result exponent data is stored in the exponent register. The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent, and the mantissa calculation unit shifts the result mantissa data in the mantissa register according to the block floating-point exponent to generate block floating-point data. By keeping the result mantissa data and the result exponent data in two separate registers for calculation, the technical solution of the embodiment of the present application reduces the coupling between hardware design and data processing, reduces the complexity of calculation, increases the available bit width of the result mantissa data, and improves data accuracy.

FIG. 8 is a flow chart of the steps of a data processing method provided by an embodiment of the application. This embodiment details the shifting of the result mantissa data. Referring to FIG. 8, the data processing method of the embodiment of the application includes:

Step 301: Store the first data set and the second data set in the mantissa register, and determine the multiplication result data set of the first data set and the second data set through the multiplication and accumulation unit.

Step 302: Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register, respectively.

Step 303: The exponent calculation unit reads the result exponent data with the lowest address in the exponent register as the current maximum result exponent.

Here, the address may be the storage address of the result exponent data in the exponent register, the lowest address may be the storage address with the smallest value, and the current maximum result exponent may be the largest of the result exponent data read from the exponent register so far.

Optionally, the exponent calculation unit may read the result exponent data at the lowest address of the exponent register and use it as the current maximum result exponent. It is understandable that the exponent calculation unit may instead read the result exponent data with the highest address, or any result exponent data, as the initial current maximum result exponent.

Step 304: The exponent calculation unit sequentially reads the remaining result exponent data in the exponent register and compares each with the current maximum result exponent. If the result exponent data is greater than the current maximum result exponent, the result exponent data is used as the current maximum result exponent.

Optionally, the exponent calculation unit can sequentially read the remaining result exponent data in the exponent register and compare each one, as it is read, with the current maximum result exponent. If the read result exponent data is greater than the current maximum result exponent, the read result exponent data can be used as the current maximum result exponent. Optionally, to facilitate the shift operation on the result mantissa data, if the read result exponent data is less than or equal to the current maximum result exponent, the difference between the current maximum result exponent and the result exponent data can be stored in the exponent register to replace the originally stored result exponent data.

Step 305: When the exponent calculation unit finishes reading the result exponent data in the exponent register, use the current maximum result exponent as the block floating point exponent.

Optionally, after all the result exponent data in the exponent register have been read, the current maximum result exponent can be used as the block floating-point exponent. Since the current maximum result exponent has been compared with all the result exponent data, it is the maximum value among all result exponent data.
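Steps 303 to 305 amount to a sequential running-maximum scan. The following Python sketch models the exponent register as a plain list ordered by address. The function name find_block_exponent and the list representation are illustrative assumptions, not part of the application.

```python
def find_block_exponent(exponent_register):
    """Running-maximum scan over a non-empty list of result exponents.

    The list index stands in for the register address (an assumption
    made for illustration only).
    """
    # Step 303: start from the entry at the lowest address.
    current_max = exponent_register[0]
    # Step 304: read the remaining entries in address order and compare.
    for exponent in exponent_register[1:]:
        if exponent > current_max:
            current_max = exponent
    # Step 305: after the last entry, the running maximum is the
    # block floating-point exponent.
    return current_max
```

Because every entry is compared exactly once, the starting entry does not affect the outcome, which matches the remark that the highest address or any other entry could serve as the initial value.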

Step 306: The mantissa calculation unit sequentially reads the result mantissa data and reads the result exponent data corresponding to the result mantissa data in the exponent register.

In the embodiment of the present application, the result mantissa data and the corresponding result exponent data have the same storage address, so the corresponding result exponent data can be read from the exponent register according to the storage address of the result mantissa data.

Step 307: Use the difference between the block floating point exponent and the result exponent data as the number of shift bits.

Optionally, the difference between the block floating-point exponent and the result exponent data may be determined by the exponent calculation unit or the mantissa calculation unit and used as the number of shift bits, where the number of shift bits is the number of bits by which the result mantissa data is shifted.

Step 308: Perform a shift operation on the result mantissa data according to the number of shift bits.

Optionally, each result mantissa data in the mantissa register can be shifted to the right by the number of shift bits.

Step 309: Use the shifted result mantissa data and the block floating-point exponent as the block floating-point data.
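Steps 306 to 309 can be sketched in Python as follows. The registers are modeled as lists that share the same index for a mantissa and its exponent. The function name to_block_floating_point is hypothetical. Python's >> on integers is an arithmetic shift, which is assumed here to stand in for the hardware shifter on signed mantissas.

```python
def to_block_floating_point(mantissa_register, exponent_register, block_exponent):
    """Align every mantissa to the block exponent by a right shift.

    Assumes block_exponent is the maximum of exponent_register, so every
    shift count is non-negative.
    """
    shifted = []
    # Step 306: read each mantissa and the exponent stored at the same address.
    for address, mantissa in enumerate(mantissa_register):
        exponent = exponent_register[address]
        # Step 307: the shift count is the distance to the block exponent.
        shift = block_exponent - exponent
        # Step 308: shift right (arithmetic shift, sign is preserved).
        shifted.append(mantissa >> shift)
    # Step 309: shifted mantissas plus the shared exponent form the block.
    return shifted, block_exponent
```

For example, with mantissas [64, -64, 5], exponents [0, 2, 3], and a block exponent of 3, the call returns ([8, -32, 5], 3).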

Exemplarily, suppose there are two arrays X and Y of length n, each element of which is a 16-bit integer. The task is to calculate Z[i]=X[i]*Y[i] and perform block floating-point normalization of the results to 16 bits, where i=0～n-1.

(1) The data loading unit continuously reads X[i] and Y[i] from the memory to the mantissa register;

(2) Synchronously, the multiplication and accumulation unit calculates Z[i]=X[i]*Y[i] and stores it in the mantissa register. At this point, the multiplication result data set Z[i] can be the full-precision multiplication result, with a bit width of 32 bits;

(3) Synchronously, the normalization unit reads the 32-bit multiplication result Z[i] from the mantissa register and normalizes it to obtain the 16-bit result mantissa data M[i] and the corresponding 8-bit result exponent data E[i]. The result mantissa data M[i] can be stored in the mantissa register and the result exponent data E[i] in the exponent register, with a one-to-one correspondence between the index of M[i] in the mantissa register and the index of E[i] in the exponent register.

(4) Synchronously, the exponent calculation unit monitors the dynamic range of the result exponent data E[i] stored in the exponent register by the normalization unit, that is, it searches for the maximum value of the result exponent data E[i].

(5) After the exponent calculation unit has compared the last result exponent data E[n-1] of the E[i] sequence, it obtains the maximum value Emax of the sequence and stores Emax as independent data in the exponent register.

(6) The exponent calculation unit sequentially reads the result exponent data E[i] in the exponent register, calculates E[i]=Emax-E[i], which is the number of shift bits, and saves the result back to the same index in the exponent register, overwriting the original E[i].

(7) Synchronously, the mantissa calculation unit reads the E[i] value updated by the exponent calculation unit from the exponent register, reads the result mantissa data M[i] stored at the corresponding index in the mantissa register, and shifts it right by E[i] bits, that is, M[i]=M[i]>>E[i]. It then stores the result back to the mantissa register, overwriting the original M[i]. The result mantissa data M[i] finally stored in the mantissa register are the block floating-point mantissas, and the Emax stored in the exponent register is the block floating-point exponent.

(8) Steps (1) to (4) are executed in a pipeline, that is, each subsequent step does not have to wait for the entire sequence to finish before starting the next stage of calculation; steps (6) to (7) are pipelined in the same way. Step (5) cannot be pipelined, so the results M[i] and E[i] of step (4) must be stored. When the sequence length n>2N, neither the mantissa register nor the exponent register can hold the complete output of step (4). The data storage unit therefore stores the part of the M[i] sequence output to the mantissa register in the first-level cache, and the corresponding data loading unit stores the part of the E[i] sequence output to the exponent register in the zero-level cache.

(9) When the output of the calculation results of step (4) is complete, the processor retrieves the result mantissa data M[i] stored in the first-level cache through the data loading unit and stores it in the mantissa register again. At the same time, following the data loading unit operation of the mantissa register, the data loading unit of the exponent register automatically fetches the result exponent data E[i] stored in the zero-level cache and replays it to the exponent register, after which the operation of step (6) is performed.
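The arithmetic of steps (2) to (7) can be traced with a small Python sketch. It is a software model under stated assumptions, not the hardware design. The normalization rule in normalize (the smallest right shift that makes the product fit a signed 16-bit mantissa) is an assumption, since this example does not spell out the rule. The function names are hypothetical. The final alignment uses a right shift, as described for step 308. Pipelining and the cache spill of steps (8) and (9) are not modeled.

```python
def normalize(value, mantissa_bits=16):
    """Return (mantissa, exponent) with value ~= mantissa * 2**exponent.

    Assumed rule: use the smallest right shift for which the result fits
    in a signed integer of mantissa_bits bits.
    """
    low = -(1 << (mantissa_bits - 1))
    high = (1 << (mantissa_bits - 1)) - 1
    exponent = 0
    while not low <= (value >> exponent) <= high:
        exponent += 1
    return value >> exponent, exponent


def block_multiply(x, y):
    """Element-wise product of two integer lists as block floating point."""
    products = [a * b for a, b in zip(x, y)]           # step (2): Z[i]
    pairs = [normalize(z) for z in products]           # step (3): M[i], E[i]
    mantissas = [m for m, _ in pairs]
    exponents = [e for _, e in pairs]
    e_max = max(exponents)                             # steps (4)-(5): Emax
    shifts = [e_max - e for e in exponents]            # step (6): Emax - E[i]
    mantissas = [m >> s for m, s in zip(mantissas, shifts)]  # step (7)
    return mantissas, e_max
```

For X = [1000, -20000, 3] and Y = [1000, 30000, 4], the sketch returns ([30, -18311, 0], 15). The largest product sets the shared exponent, and the smallest product (12) underflows to 0, which is the usual precision trade-off of a shared block exponent.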

The above are only exemplary embodiments of the present application, and are not used to limit the protection scope of the present application.

Those skilled in the art should understand that the term user terminal encompasses any suitable type of wireless user equipment, such as a mobile phone, a portable data processing device, a portable web browser, or a vehicle-mounted mobile station.

In general, the various embodiments of the present application can be implemented in hardware or dedicated circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device, although the present application is not limited thereto.

The embodiments of the present application may be implemented by executing computer program instructions by a data processor of a mobile device, for example, in a processor entity, or by hardware, or by a combination of software and hardware. Computer program instructions can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.

The block diagram of any logic flow in the drawings of the present application may represent program steps, or may represent interconnected logic circuits, modules, and functions, or may represent a combination of program steps and logic circuits, modules, and functions. The computer program can be stored in a memory. The memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical storage devices and systems (digital versatile discs (DVD) or compact discs (CD)). Computer-readable media may include non-transitory storage media. The data processor can be of any type suitable for the local technical environment, such as but not limited to general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic devices (FPGA), and processors based on a multi-core processor architecture.

By way of exemplary and non-limiting examples, a detailed description of the exemplary embodiments of the present application has been provided above. However, considering the accompanying drawings and claims, various modifications and adjustments to the above embodiments are obvious to those skilled in the art, but they do not deviate from the scope of the present invention. Therefore, the proper scope of the present invention will be determined according to the claims.

Claims

A floating-point processing device, comprising: a normalization unit, an exponent register, a mantissa register, and a multiply-accumulate unit, wherein the normalization unit is connected to the exponent register and the mantissa register respectively, and the multiply-accumulate unit is connected to the mantissa register; the normalization unit is configured to perform a normalization operation on data to generate floating-point data, the normalization unit sends exponent data of the floating-point data to the exponent register for storage, and the normalization unit sends mantissa data of the floating-point data to the mantissa register for storage; and the multiply-accumulate unit is configured to perform a multiplication operation according to the mantissa data.
The device according to claim 1, further comprising: a mantissa calculation unit, an exponent calculation unit, an association update unit, and a first-level cache, wherein the mantissa calculation unit and the first-level cache are respectively connected to the mantissa register, the exponent calculation unit is connected to the exponent register, and the association update unit is respectively connected to the exponent register and the first-level cache; the mantissa calculation unit is configured to perform calculations based on the mantissa data; the exponent calculation unit is configured to perform calculations based on the exponent data; the association update unit is configured to store the exponent data overflowing from the exponent register in the first-level cache when the exponent register overflows; and the first-level cache is further configured to store the mantissa data overflowing from the mantissa register.
The device according to claim 2, wherein the association update unit comprises:

A zero-level cache and an address conversion backup buffer; the zero-level cache is connected to the exponent register; the address conversion backup buffer is respectively connected to the zero-level cache and the first-level cache;

Wherein, the zero-level cache is configured to store the exponent data overflowing from the exponent register; the address conversion backup buffer is configured to, when exponent data overflows from the zero-level cache, perform address conversion on the overflowing exponent data and store it in the first-level cache, such that the address of the converted exponent data does not conflict with the addresses of the mantissa data stored in the first-level cache.
The device according to claim 2, wherein a data storage unit and a data loading unit are also connected between the first level cache and the mantissa register;

Wherein, the data storage unit is configured to store the mantissa data overflowed from the mantissa register in the first level cache;

The data loading unit is configured to load the mantissa data stored in the first level cache to the mantissa register.
The device according to claim 2, wherein the mantissa calculation unit is further connected to the exponent register, and the mantissa calculation unit is configured to obtain the exponent data stored in the exponent register and shift the mantissa data in the mantissa register according to the exponent data.
A data processing method, the method includes:

Storing the first data set and the second data set in a mantissa register, and determining the multiplication result data set of the first data set and the second data set through a multiplying and accumulating unit;

Perform a normalization operation on at least one multiplication result in the multiplication result data set by a normalization unit, and store the generated result mantissa data and result exponent data in the mantissa register and the exponent register respectively;

Determine the block floating point exponent of the result exponent data in the exponent register, and process the result mantissa data in the mantissa register according to the block floating point exponent to generate block floating point data.
The method according to claim 6, wherein the mantissa storage address of the result mantissa data in the mantissa register is the same as the exponent storage address of the result exponent data in the exponent register.
The method according to claim 6, wherein the determining the block floating point exponent of the result exponent data in the exponent register, and processing the result mantissa data in the mantissa register according to the block floating point exponent to generate block floating point data, comprises:

The exponent calculation unit selects the result exponent data with the largest value in the exponent register as the block floating-point exponent;

A mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent;

The mantissa data of the result of the shift operation and the block floating point exponent are used as block floating point data.
The method according to claim 8, wherein the exponent calculation unit selecting the result exponent data with the largest value in the exponent register as the block floating-point exponent comprises:

The exponent calculation unit reads the result exponent data with the lowest address in the exponent register as the current maximum result exponent;

The exponent calculation unit sequentially reads the result exponent data remaining in the exponent register and compares it with the current maximum result exponent, and if the result exponent data is greater than the current maximum result exponent, the result exponent data is used as the current maximum result exponent;

When the exponent calculation unit finishes reading the result exponent data in the exponent register, the current maximum result exponent is used as the block floating-point exponent.
The method according to claim 8, wherein the mantissa calculation unit performs a shift operation on the result mantissa data in the mantissa register according to the block floating-point exponent, comprising:

The mantissa calculation unit sequentially reads the result mantissa data and reads the result exponent data corresponding to the result mantissa data in the exponent register;

Taking the difference between the block floating point exponent and the result exponent data as the number of shift bits;

Perform a shift operation on the result mantissa data according to the shift bit number.