TW202115560A

TW202115560A - Multiplier and method for floating-point arithmetic, integrated circuit chip, and computing device

Info

Publication number: TW202115560A
Application number: TW109135585A
Authority: TW
Inventors: 張堯 (Zhang Yao); 劉少禮 (Liu Shaoli)
Original assignee: 大陸商安徽寒武紀信息科技有限公司 (Anhui Cambricon Information Technology Co., Ltd., mainland China)
Priority date: 2019-10-14
Filing date: 2020-10-14
Publication date: 2021-04-16
Also published as: CN112732220A; CN112732221A; TWI763079B

Abstract

The present invention relates to a multiplier and a method for floating-point arithmetic, an integrated circuit chip, and a computing device. The computing device can be included in a combined processing device, which may also include a universal interconnection interface and other processing device(s). The computing device interacts with the other processing device(s) to jointly complete an arithmetic operation specified by a user. The combined processing device may also include a storage device, which is connected to the computing device and the other processing device(s) and stores their data. The disclosed solution can be widely applied to a variety of floating-point arithmetic operations.

Description

Multiplier, method, integrated circuit chip and computing device for floating-point operation

This disclosure relates to the technical field of floating-point arithmetic. In particular, it relates to a method, a multiplier, an integrated circuit chip and a computing device for floating-point operations.

Current signal processing algorithms, such as inner products between vectors and matrix convolutions, perform a large number of multiply-add operations. The efficiency of these operations often depends on how fast the multiplier executes. Multipliers have improved significantly in execution efficiency, but there is still room for improvement when they process floating-point data. A multiplier that performs floating-point multiplication with high efficiency, low power consumption and low cost therefore remains a problem to be solved in the prior art.

To at least partially solve the technical problems mentioned above, the present disclosure provides a multiplier and a method for floating-point operations. It also provides an integrated circuit chip and a computing device that include the multiplier.

In one aspect, the present disclosure provides a multiplier for multiplying floating-point numbers. The multiplier includes a mantissa processing unit configured to obtain the mantissa of the product from the mantissas of the floating-point numbers. The mantissa processing unit includes a control circuit. When the mantissa bit width of at least one of the two floating-point numbers is greater than the data bit width that the mantissa processing unit can process at one time, the control circuit invokes the mantissa processing unit multiple times.
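The disclosure does not specify how the control circuit splits the operands between invocations. Below is a minimal sketch that assumes schoolbook splitting:

- Each wide mantissa is cut into unit-width chunks.
- Every pair of chunks is multiplied by one pass of the narrow unit.
- Each partial result is shifted by its chunk weight and accumulated.

`chunked_multiply` and `narrow_unit` are hypothetical names.

```python
def chunked_multiply(x: int, y: int, unit_width: int) -> int:
    """Multiply wide mantissas x and y with a unit that only accepts
    unit_width-bit operands, invoking it once per pair of chunks."""
    mask = (1 << unit_width) - 1

    def split(v: int) -> list:
        # Least significant chunk first; always returns at least one chunk.
        chunks = []
        while True:
            chunks.append(v & mask)
            v >>= unit_width
            if v == 0:
                return chunks

    def narrow_unit(a: int, b: int) -> int:
        # Stands in for one pass of the (narrow) mantissa processing unit.
        assert a <= mask and b <= mask
        return a * b

    total = 0
    for i, xa in enumerate(split(x)):
        for j, yb in enumerate(split(y)):
            # Weight each chunk product by the combined position of its chunks.
            total += narrow_unit(xa, yb) << ((i + j) * unit_width)
    return total
```

For example, `chunked_multiply(0xFFFFFF, 0xABCDEF, 12)` multiplies two 24-bit (FP32-sized) mantissas. It uses four passes of a 12-bit unit instead of a 24-bit multiplier array.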

In another aspect, the present disclosure provides a method for multiplying floating-point numbers using a multiplier. In this method, the mantissa processing unit of the multiplier obtains the mantissa of the product from the mantissas of the floating-point numbers. The mantissa processing unit includes a control circuit. When the mantissa bit width of at least one of the two floating-point numbers is greater than the data bit width that the mantissa processing unit can process at one time, the control circuit invokes the mantissa processing unit multiple times.

In yet another aspect, the present disclosure provides an integrated circuit chip that includes the multiplier described above. In one or more embodiments, the multiplier may form an independent integrated circuit chip. It may also be arranged on an integrated circuit chip or a computing device. In either case, it operates on floating-point numbers in a variety of data formats.

The multiplier, method, integrated circuit chip and computing device of the present disclosure support operations on multiple floating-point data types without a separate multiplier for each type. The multiplier is therefore flexible and can be widely applied to various floating-point operations. In addition, when processing input data with a large bit width, the multiplier reuses its mantissa processing unit iteratively (loop multiplexing). No additional processing chips need to be arranged, which also reduces the layout area of the integrated circuit.

Overall, the present disclosure provides a multiplier, a method, an integrated circuit chip and a computing device for floating-point arithmetic. Conventional floating-point multipliers support only one type of floating-point operation. The disclosed multiplier instead supports multiple operation modes, each indicating different floating-point data types. During floating-point multiplication, the data are processed according to one of these modes, for example by encoding, compression, summation, normalization and rounding. In this way the multiplier carries out operations associated with one of several floating-point data types. Supporting multiple modes further improves the flexibility of floating-point operations and reduces their cost.

The technical solution of the present disclosure and its embodiments are described in detail below with reference to the accompanying drawings. Many specific details about floating-point operations are set forth to provide a thorough understanding of the embodiments. However, a person of ordinary skill in the art, given the teaching of this disclosure, can practice the embodiments without these specific details. In other instances, well-known methods, processes and elements are not described in detail, to avoid unnecessarily obscuring the embodiments. This description should not be regarded as limiting the scope of the embodiments.

Figure 1 is a schematic diagram of a floating-point data format 100 according to an embodiment of the present disclosure. As shown in Figure 1, a floating-point number handled by the disclosed solution may include three parts: a sign (sign bit 102), an exponent (exponent bits 104) and a mantissa (mantissa bits 106). An unsigned floating-point number may have no sign bit.

In some embodiments, the multiplier supports at least one of the following: half-precision, single-precision, brain floating-point, double-precision and custom floating-point numbers. In some embodiments, the format conforms to the IEEE 754 standard, such as:

- double precision (float64, "FP64");
- single precision (float32, "FP32");
- half precision (float16, "FP16").

In other embodiments, the format may be the existing 16-bit brain floating-point format (bfloat16, "BF16"). It may also be a custom format, such as:

- an 8-bit brain floating-point number (bfloat8, "BF8");
- an unsigned half-precision floating-point number (unsigned float16, "UFP16");
- an unsigned 16-bit brain floating-point number (unsigned bfloat16, "UBF16").

For ease of understanding, Table 1 lists some of these formats. The sign, exponent and mantissa bit widths are for illustration only.

Table 1
| Data type | Sign bit width | Exponent bit width | Mantissa bit width |
|---|---|---|---|
| FP16 | 1 | 5 | 10 |
| BF16 | 1 | 8 | 7 |
| FP32 | 1 | 8 | 23 |
| BF8 | 1 | 5 | 3 |
| UFP16 | 0 | 5 (or 6) | 11 (or 10) |
| UBF16 | 0 | 8 | 8 |
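The field widths in Table 1 fully determine how a bit pattern maps to a value. The sketch below decodes a raw bit pattern from its sign, exponent and mantissa widths. It assumes IEEE 754-style conventions:

- the exponent bias is 2^(w-1) - 1;
- normal numbers have an implicit leading 1;
- an all-zero exponent field denotes a subnormal number.

Infinities and NaNs are ignored. The disclosure does not say which conventions its custom formats follow, so `decode` and `FORMATS` are hypothetical names covering only the standard formats.

```python
def decode(bits: int, sign_w: int, exp_w: int, man_w: int) -> float:
    """Decode an IEEE-754-style bit pattern with the given field widths.

    Assumes bias = 2**(exp_w - 1) - 1, an implicit leading 1 for normal
    numbers and subnormals when the exponent field is 0 (no inf/NaN handling).
    """
    man = bits & ((1 << man_w) - 1)
    exp = (bits >> man_w) & ((1 << exp_w) - 1)
    sign = (bits >> (man_w + exp_w)) & 1 if sign_w else 0
    bias = (1 << (exp_w - 1)) - 1
    if exp == 0:
        # Subnormal (or zero): no hidden bit, exponent fixed at 1 - bias.
        value = man * 2.0 ** (1 - bias - man_w)
    else:
        # Normal: restore the hidden 1 above the stored mantissa bits.
        value = ((1 << man_w) | man) * 2.0 ** (exp - bias - man_w)
    return -value if sign else value


# (sign, exponent, mantissa) widths taken from Table 1.
FORMATS = {
    "FP16": (1, 5, 10),
    "BF16": (1, 8, 7),
    "FP32": (1, 8, 23),
}
```

For example, `decode(0x4049, *FORMATS["BF16"])` gives 3.140625, the bfloat16 value nearest to pi.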

For the formats above, the multiplier supports at least the multiplication of two floating-point numbers in any of these formats. The two operands may have the same format or different formats. For example, the multiplication may be FP16*FP16, BF16*BF16, FP32*FP32, FP32*BF16, FP16*BF16, FP32*FP16, BF8*BF16, UBF16*UFP16 or UBF16*FP16.

Figure 2 is a schematic structural block diagram of a multiplier 200 according to an embodiment of the present disclosure. As mentioned above, the multiplier supports multiplication of floating-point numbers in various data formats. The operation mode indicates these formats, so that the multiplier works in one of several operation modes.

As shown in Figure 2, the multiplier may generally include an exponent processing unit 202 and a mantissa processing unit 204. The exponent processing unit processes the exponent bits of the floating-point numbers, and the mantissa processing unit processes their mantissa bits. Alternatively or additionally, in some embodiments the floating-point numbers have sign bits. In that case the multiplier may further include a sign processing unit 206 that processes the sign bits.

In operation, the multiplier multiplies a received, input or buffered first floating-point number and second floating-point number according to one of the operation modes. Each operand has one of the data formats discussed above. For example:

- in the first operation mode, the multiplier supports FP16*FP16 multiplication;
- in the second, BF16*BF16;
- in the third, FP32*FP32;
- in the fourth, FP32*BF16.

Table 2 shows this example correspondence between operation modes and floating-point types.

Table 2
| Operation mode | Operand types |
|---|---|
| 1 | FP16*FP16 |
| 2 | BF16*BF16 |
| 3 | FP32*FP32 |
| 4 | FP32*BF16 |

In one embodiment, Table 2 may be stored in a memory of the multiplier. The multiplier then selects one of the operation modes in the table according to an instruction received from an external device, such as the external device 1012 shown in Figure 10. In another embodiment, the mode selection unit 308 shown in Figure 3 may set the operation mode automatically. For example, when two FP16 floating-point numbers are input to the multiplier, the mode selection unit can select the first operation mode based on their data formats. Similarly, when one FP32 number and one BF16 number are input, it can select the fourth operation mode.
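The automatic selection performed by the mode selection unit 308 can be pictured as a lookup over Table 2. This is a sketch only, and `MODE_TABLE` and `select_mode` are hypothetical names. Serving BF16*FP32 with mode 4 by swapping the operands is an assumption; the text only lists FP32*BF16.

```python
# Table 2: operation mode -> (first operand format, second operand format).
MODE_TABLE = {
    1: ("FP16", "FP16"),
    2: ("BF16", "BF16"),
    3: ("FP32", "FP32"),
    4: ("FP32", "BF16"),
}


def select_mode(fmt_a: str, fmt_b: str) -> int:
    """Return the Table 2 mode whose operand formats match the inputs."""
    for mode, pair in MODE_TABLE.items():
        # Assumption: operand order does not matter, so BF16*FP32 maps to mode 4.
        if pair in ((fmt_a, fmt_b), (fmt_b, fmt_a)):
            return mode
    raise ValueError(f"no operation mode for {fmt_a}*{fmt_b}")
```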

The operation modes are thus associated with the corresponding floating-point data types. An operation mode can indicate the data format of the first floating-point number and that of the second. In another embodiment, the operation mode also indicates the data format of the product. Table 3 extends Table 2 in this way.

Table 3
| Operation mode | Operand types | Output type |
|---|---|---|
| 11 | FP16*FP16 | FP16 |
| 12 | FP16*FP16 | BF16 |
| 13 | FP16*FP16 | FP32 |
| 21 | BF16*BF16 | FP16 |
| 22 | BF16*BF16 | BF16 |
| 23 | BF16*BF16 | FP32 |
| 31 | FP32*FP32 | FP16 |
| 32 | FP32*FP32 | BF16 |
| 33 | FP32*FP32 | FP32 |
| 41 | FP32*BF16 | FP16 |
| 42 | FP32*BF16 | BF16 |
| 43 | FP32*BF16 | FP32 |

Compared with Table 2, each mode number in Table 3 has one extra digit, which indicates the data format of the floating-point product. For example, in operation mode 21 the multiplier multiplies two BF16 floating-point numbers and outputs the product in FP16 format.

Indicating floating-point formats by numbered operation modes is merely exemplary and not limiting. Following the teaching of this disclosure, an operation mode can also be built from indices that determine the formats of the multiplicand and the multiplier. For example, an operation mode may include two indices: the first indicates the type of the first floating-point number, and the second indicates the type of the second. In operation mode 13:

- the first index "1" indicates that the first floating-point number (the multiplicand) is in the first floating-point format, FP16;
- the second index "3" indicates that the second floating-point number (the multiplier) is in the format with index 3, FP32.

A third index can be added to indicate the data format of the output. For example, the third index "1" in operation mode 131 indicates that the output is in the first floating-point format, FP16. As the number of operation modes grows, further indices or index levels can be added as needed to relate operation modes to data formats.
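A sketch of decoding such index-style mode numbers digit by digit. The mappings "1" -> FP16 and "3" -> FP32 come from the examples in the text. The mapping "2" -> BF16 is an assumption, borrowed from the output digit of Table 3. `FORMAT_INDEX` and `decode_mode` are hypothetical names.

```python
# Format index used by each digit of a mode number.
# "1" -> FP16 and "3" -> FP32 follow the examples; "2" -> BF16 is an assumption.
FORMAT_INDEX = {"1": "FP16", "2": "BF16", "3": "FP32"}


def decode_mode(mode: int) -> tuple:
    """Split an index-style mode number into per-field formats.

    Two digits   -> (first operand, second operand)
    Three digits -> (first operand, second operand, output)
    """
    digits = str(mode)
    if len(digits) not in (2, 3):
        raise ValueError("expected a two- or three-index mode number")
    return tuple(FORMAT_INDEX[d] for d in digits)
```

For example, `decode_mode(131)` yields `("FP16", "FP32", "FP16")`. Adding an index level amounts to allowing one more digit.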

Operation modes are referred to here by numbers, but in other examples they may be designated by other symbols or codes, as the application requires. These may be letters, symbols, numbers or combinations thereof. Such a designation identifies the operation mode together with the data formats of the first floating-point number, the second floating-point number and the output. When the designation takes the form of an instruction, the instruction may include three fields:

- the first field indicates the data format of the first floating-point number;
- the second field indicates the data format of the second floating-point number;
- the third field indicates the data format of the output.

These fields may also be merged into one, or new fields may be added to convey further information related to floating-point formats. The operation mode is thus associated not only with the input data formats. It can also be used to normalize the output, so that the product is obtained in the desired data format.
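The three-field instruction form can be sketched as a packed bit field. The disclosure fixes neither field widths nor format codes. The 2-bit codes below, the field order and the names `CODES`, `pack_mode` and `unpack_mode` are all hypothetical, for illustration only.

```python
# Hypothetical 2-bit format codes; the disclosure does not define an encoding.
CODES = {"FP16": 0b00, "BF16": 0b01, "FP32": 0b10}
NAMES = {v: k for k, v in CODES.items()}
FIELD_BITS = 2


def pack_mode(first: str, second: str, out: str) -> int:
    """Pack [first | second | output] into one word, output in the low bits."""
    word = 0
    for name in (first, second, out):
        word = (word << FIELD_BITS) | CODES[name]
    return word


def unpack_mode(word: int) -> tuple:
    """Recover (first, second, output) formats from a packed word."""
    mask = (1 << FIELD_BITS) - 1
    out = NAMES[word & mask]
    second = NAMES[(word >> FIELD_BITS) & mask]
    first = NAMES[(word >> 2 * FIELD_BITS) & mask]
    return first, second, out
```

Mode 21 of Table 3 (BF16*BF16 with FP16 output) would pack to `0b010100` under these assumed codes.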

Figure 3 is a block diagram showing the multiplier 300 in more detail, according to an embodiment of the present disclosure. Figure 3 includes the exponent processing unit 202, the mantissa processing unit 204 and the optional sign processing unit 206 of Figure 2. It also shows the internal components these units may include and the units associated with their operation. Exemplary operations of these units are described below with reference to Figure 3.

To multiply floating-point numbers, the exponent processing unit computes the exponent of the product. It uses the operation mode, the exponent of the first floating-point number and the exponent of the second floating-point number. In one embodiment, the exponent processing unit may be implemented with an adder-subtractor circuit. For example, it may first add together:

- the exponent of the first floating-point number;
- the exponent of the second floating-point number;
- the offset values of their respective input data formats.

It then subtracts the offset value of the output data format to obtain the exponent of the product.
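A sketch of this exponent arithmetic using the standard IEEE 754 / bfloat16 biases. The text's "offset values" are read here as the negated biases. Under that reading, adding the input offsets and subtracting the output offset is the same as unbiasing each input exponent and re-biasing for the output format. `BIAS` and `product_exponent` are hypothetical names, and normalization of the result (see the regularization unit) is not included.

```python
# Exponent biases of the formats in Table 2 (IEEE 754 and bfloat16).
BIAS = {"FP16": 15, "BF16": 127, "FP32": 127}


def product_exponent(e1: int, fmt1: str, e2: int, fmt2: str, fmt_out: str) -> int:
    """Stored (biased) exponent of the product, before normalization.

    Each stored input exponent is unbiased, the true exponents are added,
    and the sum is re-biased for the output format.
    """
    return (e1 - BIAS[fmt1]) + (e2 - BIAS[fmt2]) + BIAS[fmt_out]
```

For example, FP16 1.0 (stored exponent 15) times FP16 2.0 (stored exponent 16), output in FP32, gives 0 + 1 + 127 = 128, the stored FP32 exponent of 2.0.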

Further, the mantissa processing unit computes the mantissa of the product from the operation mode and the first and second floating-point numbers. In one embodiment, the mantissa processing unit may include a partial product operation unit 312 and a partial product summation unit 314.

- The partial product operation unit obtains an intermediate result from the mantissas of the first and second floating-point numbers. In some embodiments, the intermediate result consists of the partial products produced while multiplying the two numbers (as schematically shown in Figures 5 and 6).
- The partial product summation unit adds up the intermediate result to obtain a sum, which serves as the mantissa of the product.
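The two roles can be illustrated with the simplest partial product scheme, a plain AND array: one shifted copy of the multiplicand per set bit of the multiplier, followed by a summation. This is not the Booth-encoded scheme the disclosure describes next; it only shows the split between "generate partial products" and "sum them". The function names are hypothetical.

```python
def partial_products(x: int, y: int, n: int) -> list:
    """Intermediate result: one partial product per bit of the n-bit multiplier y."""
    return [(x if (y >> i) & 1 else 0) << i for i in range(n)]


def mantissa_product(x: int, y: int, n: int) -> int:
    """Summation stage: add up all partial products to get the mantissa product."""
    return sum(partial_products(x, y, n))
```

With 11-bit FP16 mantissas (hidden bit included), `mantissa_product(1536, 1536, 11)` multiplies 1.5 by 1.5 in fixed point. It produces 11 partial products, and their sum equals 1536 * 1536.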

To obtain the intermediate result, one embodiment uses a Booth encoding circuit on the mantissa of the second floating-point number (for example, the multiplier of the operation). The circuit first pads this mantissa with zeros at its high and low ends; the zero at the high end turns the mantissa, taken as an unsigned number, into a signed number. It then encodes the padded mantissa to obtain the intermediate result. Depending on the encoding method, the mantissa of the first floating-point number (for example, the multiplicand) may be encoded instead, for example by zero-padding at both ends. Alternatively, both mantissas may be encoded. Either way, the encoding yields the partial products, which are described further below with reference to the drawings.
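The disclosure does not state the radix of its Booth encoding. The sketch below assumes the common radix-4 (modified) Booth scheme. A zero is appended below the least significant bit and zeros are added at the top, so the unsigned mantissa reads as a non-negative signed number. Each overlapping 3-bit window then becomes a digit in {-2, -1, 0, 1, 2}, and each digit selects 0, ±x or ±2x as a partial product. The function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list:
    """Radix-4 Booth digits (each in -2..2) of an n-bit unsigned multiplier y."""
    # Enough digits to cover n bits plus one zero bit that acts as the sign.
    count = (n + 2) // 2
    # Low-side zero padding: bit 0 of `padded` is y_{-1} = 0.
    padded = y << 1
    digits = []
    for i in range(count):
        window = (padded >> (2 * i)) & 0b111  # bits y_{2i+1}, y_{2i}, y_{2i-1}
        digits.append((window & 1) + ((window >> 1) & 1) - 2 * (window >> 2))
    return digits


def booth_partial_products(x: int, y: int, n: int) -> list:
    """Each digit selects 0, +-x or +-2x, weighted by 4**i (a shift by 2*i)."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, n))]
```

For an 11-bit FP16 mantissa this produces 6 partial products instead of 11, which shrinks the summation stage. For example, `booth_radix4_digits(7, 3)` returns `[-1, 2]`, since -1 + 2 * 4 = 7.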

In another embodiment, the partial product summation unit may include an adder that adds up the intermediate result to obtain the sum. In yet another embodiment, the partial product summation unit includes a Wallace tree and an adder. The Wallace tree compresses the intermediate result into a second intermediate result, and the adder adds up the second intermediate result to obtain the sum. In these embodiments, the adder may include at least one of a full adder, a serial adder and a carry-lookahead adder.
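A Wallace tree is built from 3:2 compressors (carry-save adders). Each compressor turns three rows into a sum row and a carry row without propagating carries, and compression repeats until only two rows remain. A single carry-propagate adder then produces the final sum. The sketch below models rows as Python integers; `csa` and `wallace_reduce` are hypothetical names.

```python
def csa(a: int, b: int, c: int) -> tuple:
    """3:2 compressor (carry-save adder): a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(rows: list) -> tuple:
    """Compress rows with 3:2 compressors until two remain (the second intermediate result)."""
    rows = list(rows)
    while len(rows) < 2:
        rows.append(0)
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3  # rows forming complete groups of three
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[grouped:])  # leftover rows pass through unchanged
        rows = next_rows
    return rows[0], rows[1]
```

The final adder then computes `s + c` from `s, c = wallace_reduce(rows)`, which equals `sum(rows)`.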

In one embodiment, the multiplier further includes a regularization unit 318 and a rounding unit 320.

The regularization unit normalizes the mantissa and exponent of the product. The resulting regularized exponent and regularized mantissa are used as the exponent and mantissa of the product. For example, it can adjust the bit widths of the exponent and mantissa to meet the requirements of the data format indicated by the operation mode. The regularization unit can also make other adjustments:

- In some application scenarios, a nonzero mantissa should have 1 as its most significant bit. If it does not, the exponent can be modified and the mantissa shifted accordingly, to bring the number into normalized form.
- In another embodiment, the regularization unit adjusts the exponent of the product according to its mantissa. For example, when the highest bit of the product mantissa is 1, the exponent can be incremented by 1.

Correspondingly, the rounding unit rounds the regularized mantissa according to a rounding mode and uses the rounded mantissa as the mantissa of the product. Depending on the application, the rounding unit may perform, for example, rounding down, rounding up or rounding to nearest. In some application scenarios, the rounding unit can also round based on the 1 bits shifted out when the mantissa is shifted right.
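A sketch of the regularization and rounding steps for a raw mantissa product. The two input mantissas carry an explicit leading 1, so their product lies in [1, 4). If the bit above the [1, 2) position is set, the binary point moves by one and the exponent is incremented. The result is then rounded to the output mantissa width, and a carry out of rounding bumps the exponent again.

The supported modes are round-to-nearest-even, "up" and "down". They act on the magnitude only; sign-aware directed rounding is not modeled. This choice is an assumption, as is the name `normalize_and_round`.

```python
def normalize_and_round(product: int, frac_bits: int, out_frac_bits: int,
                        mode: str = "nearest_even") -> tuple:
    """Normalize a raw mantissa product and round it to out_frac_bits fraction bits.

    product = ma * mb, where ma and mb have an explicit leading 1 and frac_bits
    fraction bits, so product has 2 * frac_bits fraction bits and lies in [1, 4).
    Returns (mantissa with explicit leading 1, exponent increment).
    Rounding acts on the magnitude: "down" truncates, "up" rounds away from zero.
    """
    point = 2 * frac_bits
    exp_inc = 0
    if product >> (point + 1):  # product >= 2: move the binary point, exponent + 1
        point += 1
        exp_inc = 1
    shift = point - out_frac_bits
    if shift <= 0:  # widening the mantissa: no bits are lost
        return product << -shift, exp_inc
    kept = product >> shift
    rem = product & ((1 << shift) - 1)  # bits shifted out on the right
    half = 1 << (shift - 1)
    if mode == "nearest_even":
        round_up = rem > half or (rem == half and kept & 1)
    elif mode == "up":
        round_up = rem != 0
    elif mode == "down":
        round_up = False
    else:
        raise ValueError(f"unknown rounding mode {mode!r}")
    if round_up:
        kept += 1
        if kept >> (out_frac_bits + 1):  # rounding carried out, e.g. 1.11..1 -> 10.00..0
            kept >>= 1
            exp_inc += 1
    return kept, exp_inc
```

For example, with FP16 mantissas (10 fraction bits), 1.5 * 1.5 gives `normalize_and_round(1536 * 1536, 10, 10) == (1152, 1)`, i.e. 1.125 * 2^1 = 2.25.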

Besides the exponent and mantissa processing units, the multiplier may optionally include a sign processing unit. When the input floating-point numbers are signed, this unit obtains the sign of the product from the signs of the first and second floating-point numbers. For example, in one embodiment the sign processing unit may include an XOR logic circuit 322 that XORs the two signs to obtain the sign of the product. In another embodiment, the sign processing unit may be implemented with a truth table or with logical judgment.
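The XOR rule in one line: the product is negative exactly when one operand is negative. How unsigned formats such as UFP16 enter this unit is not specified in the text. Passing them in with sign 0 is an assumption here, as is the name `product_sign`.

```python
def product_sign(sign_a: int, sign_b: int) -> int:
    """Sign bit of the product: 1 (negative) iff exactly one operand is negative.

    Assumption: an operand in an unsigned format (e.g. UFP16) has no sign bit
    and is passed in as 0, so the result takes the other operand's sign.
    """
    return sign_a ^ sign_b
```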

To make the input or received first and second floating-point numbers conform to the prescribed format, in one embodiment the multiplier may further include a normalization processing unit 324. When the first or second floating-point number is a nonzero denormalized number, this unit normalizes it according to the operation mode to obtain the corresponding exponent and mantissa. For example, suppose the selected operation mode is the second mode in Table 2 and the inputs are FP16 data. The normalization processing unit can convert the FP16 data into BF16 data, so that the multiplier can operate in the second mode.

In one or more embodiments, the normalization processing unit can also preprocess mantissas to ease subsequent processing by the mantissa processing unit, for example by extending them. This applies both to normalized floating-point numbers, which have an implicit leading 1, and to denormalized ones, which do not.

The normalization processing unit 324 and the regularization unit 318 may therefore perform the same or similar operations in some embodiments. The difference is that the normalization processing unit 324 normalizes the input floating-point data, whereas the regularization unit 318 regularizes the mantissa and exponent to be output.
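A sketch of the input-side preprocessing, assuming IEEE 754-style fields:

- For a normal number, the implicit leading 1 is written out explicitly (mantissa extension).
- For a nonzero subnormal, the mantissa is shifted left until a leading 1 reaches the hidden-bit position, and the exponent is lowered by one per shift.

After this step, every nonzero operand reaches the mantissa unit in the same shape. `unpack_normalized` is a hypothetical name.

```python
def unpack_normalized(bits: int, exp_w: int, man_w: int) -> tuple:
    """Return (unbiased exponent, mantissa with explicit leading 1) of a nonzero finite value.

    The value equals mantissa * 2 ** (exponent - man_w). The sign bit is ignored.
    """
    bias = (1 << (exp_w - 1)) - 1
    man = bits & ((1 << man_w) - 1)
    exp = (bits >> man_w) & ((1 << exp_w) - 1)
    if exp != 0:
        # Normal number: restore the implicit 1.
        return exp - bias, (1 << man_w) | man
    if man == 0:
        raise ValueError("zero has no normalized form")
    # Subnormal: shift the leading 1 up to the hidden-bit position.
    e = 1 - bias
    while not (man >> man_w) & 1:
        man <<= 1
        e -= 1
    return e, man
```

For example, the smallest FP16 subnormal, `0x0001`, becomes `(-24, 1024)`, i.e. 1.0 * 2^-24, with the same mantissa shape as a normal FP16 value.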

The multiplier and its embodiments have been described above with reference to Figure 3. A person of ordinary skill in the art will understand that the disclosed solution obtains the product (the exponent, the mantissa and, optionally, the sign) by executing the multiplier. Depending on the application:

- When the regularization and rounding described above are not required, the outputs of the mantissa and exponent processing units can be taken as the final result.
- When they are required, the exponent and mantissa obtained after regularization and rounding form the final result. They form part of it when the final sign is also considered.

Through its multiple operation modes, the multiplier can handle floating-point numbers of different types or data formats. It can therefore be reused, which saves chip design overhead and computing cost. Through the multiple-invocation mechanism, the multiplier also supports high-bit-width floating-point numbers.

The multiplication of the mantissas (also called the mantissa bits or mantissa part) is critical to the performance of the whole floating-point multiplication. The mantissa operation is therefore described below with reference to Figure 4.

FIG. 4 is a schematic block diagram showing operation 400 of the mantissa processing unit according to an embodiment of the present disclosure. As shown in FIG. 4, the mantissa processing operation mainly involves two units: the partial product operation unit and the partial product summation unit discussed above in connection with FIG. 3. In terms of timing, the operation can be roughly divided into two stages: in the first stage it obtains intermediate results, and in the second stage it obtains the mantissa result output by the adder 408.

In an exemplary operation, the first and second floating-point numbers received by the multiplier can each be divided into the parts described above: the sign (optional), the exponent and the mantissa. Optionally, after normalization, the mantissas of the two floating-point numbers enter the mantissa processing unit (such as the mantissa processing unit in FIG. 2 or FIG. 3) as inputs, specifically its partial product operation unit. As shown in FIG. 4, the Booth encoding circuit 402 pads the high and low ends of the mantissa of the second floating-point number (the multiplier in the floating-point operation) with zeros and Booth-encodes it, so that the intermediate results are obtained in the partial product generation circuit 404. The labels "first" and "second" are illustrative rather than limiting: in some application scenarios the first floating-point number may be the multiplier and the second the multiplicand, and correspondingly, some encoding schemes may encode the floating-point number that serves as the multiplicand.

To aid understanding of the technical solution of the present disclosure, Booth encoding is briefly introduced below. In general, multiplying two binary numbers produces a large number of intermediate results called partial products, which are then accumulated to obtain the final product. The more partial products there are, the larger the area and power consumption of an array multiplier, the slower it runs, and the harder its circuit is to implement. The purpose of Booth encoding is to reduce the number of partial-product summation terms and thereby the circuit area. The algorithm first encodes the input multiplier according to a set of rules; in one embodiment, the rules may be those shown in Table 4:

Table 4
Data to be encoded               Coded signal
y_{2i+1}   y_{2i}   y_{2i-1}     PPi
0          0        0            0
0          0        1            X
0          1        0            X
0          1        1            2X
1          0        0            -2X
1          0        1            -X
1          1        0            -X
1          1        1            -0 (= 0)
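The rules of Table 4 can be checked with a minimal Python sketch. It assumes a two's-complement multiplier y whose implicit low bit y_{-1} is 0 and whose high end is sign-extended; the function names are hypothetical, not part of the disclosure. Each Booth digit is the multiple of X selected for PPi, and PPi is weighted by 4^i.

```python
# Table 4: (y_{2i+1}, y_{2i}, y_{2i-1}) -> multiple of the multiplicand X for PPi
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y: int, width: int) -> list:
    """Radix-4 Booth recoding of a width-bit two's-complement multiplier y."""
    y &= (1 << width) - 1

    def bit(k: int) -> int:
        if k < 0:
            return 0                              # implicit y_{-1} = 0 (low-end zero pad)
        return (y >> min(k, width - 1)) & 1       # sign-extend beyond the top bit

    groups = (width + 1) // 2                     # one digit per overlapping 3-bit group
    return [BOOTH_TABLE[(bit(2 * i + 1), bit(2 * i), bit(2 * i - 1))]
            for i in range(groups)]


def booth_multiply(x: int, y: int, width: int) -> int:
    """Sum the partial products PPi = d_i * X, each weighted by 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_digits(y, width)))


# An 11-bit FP16 mantissa (implicit bit included), zero-extended to 12 bits
x, y = 0b10110011101, 0b11001010011
print(booth_digits(y, 12))              # six digits, each in {-2, -1, 0, 1, 2}
print(booth_multiply(x, y, 12) == x * y)  # True
```

A 12-bit multiplier yields six partial products instead of twelve, which is the reduction in summation terms described above.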

In Table 4, y_{2i+1}, y_{2i} and y_{2i-1} denote the values of each group of sub-data to be encoded (i.e., bits of the multiplier), and X denotes the mantissa of the first floating-point number (i.e., the multiplicand). Booth-encoding each group yields the corresponding coded signal PPi (i = 0, 1, 2, ..., n). As Table 4 shows, the coded signals fall into five kinds: -2X, 2X, -X, X and 0. For example, under these rules, if the received multiplicand X is 8-bit data, the following partial products can be obtained:

1) When the multiplier bits contain the three-bit group "001" from the table, the partial product is X, represented by the 8 bits of X plus a 9th bit that serves as the sign bit;

2) When the multiplier bits contain the group "011", the partial product is 2X, obtained by shifting X left by one bit (appending a 0 at the least significant end);

3) When the multiplier bits contain the group "101", the partial product is -X, obtained by inverting every bit of X and then adding 1;

4) When the multiplier bits contain the group "100", the partial product is -2X, obtained by shifting X left by one bit, inverting every bit, and then adding 1;

5) When the multiplier bits contain the group "111" or "000", the partial product is 0.
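The five cases can be sketched at the bit level in Python. For illustration this assumes the 8-bit X is a two's-complement value sign-extended to 9 bits (so the 9th bit is its sign bit) and lies in [-127, 127], since -2 × (-128) = 256 would not fit in 9 bits; the helper names are hypothetical.

```python
PP_BITS = 9
PP_MASK = (1 << PP_BITS) - 1


def partial_product_bits(x: int, digit: int) -> int:
    """9-bit pattern of digit * X built only from shifts and bitwise inversion."""
    if digit == 0:
        return 0                                          # case 5): "000" or "111"
    pattern = x & PP_MASK                                 # case 1): X with 9th (sign) bit
    if abs(digit) == 2:
        pattern = (pattern << 1) & PP_MASK                # case 2): shift X left one bit
    if digit < 0:
        pattern = ((~pattern & PP_MASK) + 1) & PP_MASK    # cases 3), 4): invert, add 1
    return pattern


def to_signed(value: int, bits: int) -> int:
    """Interpret a bits-wide pattern as a two's-complement integer."""
    return value - (1 << bits) if value >> (bits - 1) else value


print(format(partial_product_bits(0b01011011, -1), "09b"))   # 110100101, i.e. -91
```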

It should be understood that the above description of obtaining partial products with Table 4 is exemplary rather than limiting. Guided by this disclosure, a person of ordinary skill in the art may modify the rules of Table 4 to obtain partial products different from those shown there. For example, when the multiplier contains a particular run of several consecutive bits (e.g., three or more), the resulting partial product may be the complement of the multiplicand; alternatively, the "add 1" in items 3) and 4) above may be performed after the partial products have been summed.
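The second variation, deferring the "add 1", can be sketched in Python as follows: a negative digit contributes only the bitwise inverse of |d|·X, and one correction bit at that row's weight 4^i is added after the rows have been summed. This is a hypothetical illustration with a fixed output width; it computes the Booth digits from d = -2·y_{2i+1} + y_{2i} + y_{2i-1}, which gives the same values as Table 4.

```python
def booth_digits(y: int, width: int) -> list:
    """Booth digits d_i = -2*y[2i+1] + y[2i] + y[2i-1] (same values as Table 4)."""
    y &= (1 << width) - 1

    def bit(k: int) -> int:
        return 0 if k < 0 else (y >> min(k, width - 1)) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range((width + 1) // 2)]


def booth_deferred_plus_one(x: int, y: int, width: int, out_bits: int) -> int:
    """Product modulo 2**out_bits, with every '+1' correction added after the rows."""
    mask = (1 << out_bits) - 1
    rows, corrections = [], 0
    for i, d in enumerate(booth_digits(y, width)):
        magnitude = abs(d) * x                 # X or 2X
        if d < 0:
            row = ~magnitude & mask            # invert only; the +1 is not applied yet
            corrections |= 1 << (2 * i)        # remember the +1 at weight 4**i
        else:
            row = magnitude & mask
        rows.append((row << (2 * i)) & mask)
    row_sum = sum(rows) & mask                 # stands in for the Wallace tree + adder
    return (row_sum + corrections) & mask      # the deferred "+1" operations


print(booth_deferred_plus_one(200, 173, 9, 16) == 200 * 173)   # True
```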

From the introduction above, Booth-encoding the mantissa of the second floating-point number with the Booth encoding circuit and applying the codes to the mantissa of the first floating-point number lets the partial product generation circuit produce multiple partial products as intermediate results, which are sent to the Wallace tree compressor 406 in the partial product summation unit. Obtaining the partial products with Booth encoding is only a preferred approach; a person of ordinary skill in the art may obtain them in other ways. For example, they can be obtained by shifting: depending on whether a multiplier bit is 1 or 0, either the correspondingly shifted multiplicand or 0 is selected as the partial product. Likewise, using a Wallace tree compressor for the partial-product additions is exemplary rather than limiting; other kinds of adders, such as one or more full adders, half adders, or combinations of the two, may also be used.
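For comparison, the shift-based alternative can be sketched in Python (hypothetical helper name): each multiplier bit selects either the multiplicand shifted to that bit's position or 0. It yields one partial product per multiplier bit, about twice as many rows as radix-4 Booth encoding, which is why Booth encoding is the preferred option here.

```python
def shift_add_partial_products(x: int, y: int, width: int) -> list:
    """One partial product per bit of the unsigned multiplier y."""
    return [(x << k) if (y >> k) & 1 else 0 for k in range(width)]


rows = shift_add_partial_products(0b10110011101, 0b11001010011, 12)
print(len(rows), sum(rows) == 0b10110011101 * 0b11001010011)   # 12 True
```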

The Wallace tree compressor (or simply Wallace tree) is mainly used to sum the intermediate results above (i.e., the multiple partial products), reducing the number of partial-product accumulations (i.e., compressing them). A Wallace tree compressor typically adopts a carry-save adder (CSA) architecture together with the Wallace tree algorithm, and a Wallace tree array computes much faster than conventional carry-propagate addition.
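The carry-save principle can be illustrated with a word-level 3:2 carry-save adder, sketched in Python with an illustrative function name. Every bit position acts as an independent full adder, so three operands become a sum word and a carry word without any carry travelling along the word; only the final addition of the two words needs carry propagation.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple:
    """3:2 compression of non-negative integers: a + b + c == s + carry."""
    s = a ^ b ^ c                                   # per-bit full-adder sum
    carry = ((a & b) | (a & c) | (b & c)) << 1      # per-bit majority, moved up one bit
    return s, carry


s, carry = carry_save_add(0b1011, 0b0111, 0b1101)
print(s, carry, s + carry == 0b1011 + 0b0111 + 0b1101)   # 1 30 True
```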

Specifically, the Wallace tree compressor computes the sums of the partial products in parallel; for example, the number of sequential accumulation steps for N partial products can be reduced from N-1 to about log₂N, which increases the speed of the multiplier and makes effective use of resources. Depending on application needs, Wallace tree compressors can be designed in many variants, such as 7-2, 4-2 and 3-2 Wallace trees. In one or more embodiments, the present disclosure uses the 7-2 Wallace tree as the example for implementing its various floating-point operations; it is described in detail later with reference to FIG. 5 and FIG. 6.

In some embodiments, the Wallace tree compression of the present disclosure may be arranged so that each Wallace tree has M inputs and N outputs and the number of Wallace trees is no less than K, where N is a preset positive integer smaller than M and K is a positive integer no less than the maximum bit width of the intermediate result. For example, M may be 7 and N may be 2, i.e., the 7-2 Wallace tree described in detail below. When the maximum bit width of the intermediate result is 48, K may be 48, meaning that 48 Wallace trees may be used.

In some embodiments, depending on the operation mode, one or more groups of the Wallace trees may be selected to sum the intermediate results, where each group has X Wallace trees and X is the number of bits of the intermediate result. The Wallace trees within a group may be linked by a sequential carry relationship, while there is no carry relationship between groups. In an exemplary connection, the Wallace tree compressors are chained through their carries: the carry output of a lower-order compressor is fed to the next higher-order Wallace tree as its carry input (Cin in FIG. 6), and that tree's carry output (Cout) in turn becomes the carry input of the next higher-order compressor. When one or more Wallace trees are selected from the available compressors, the choice can be arbitrary, for example in the order numbered 0, 1, 2, 3 or in the order numbered 0, 2, 4, 6, as long as the selected compressors are connected according to the carry relationship described above.

An illustrative example of the Wallace trees and their operation follows. Suppose the first and second floating-point numbers are 16-bit data (e.g., FP16*FP16), the data bit width supported by the multiplier is 32 bits (so that two 16-bit multiplications can run in parallel), and each Wallace tree is a 7-2 Wallace tree compressor with 7 inputs (an example value of M above) and 2 outputs (an example value of N above). In this scenario, 48 Wallace trees (an example value of K above) can be used to perform the two multiplications in parallel.

Of these 48 Wallace trees, trees 0 to 23 (the 24 trees of the first group) complete the partial-product summation of the first multiplication, and the trees in that group are connected in turn by carries. Trees 24 to 47 (the 24 trees of the second group) complete the partial-product summation of the second multiplication and are likewise connected in turn by carries. There is no carry connection between tree 23 (the last tree of the first group) and tree 24 (the first tree of the second group); that is, Wallace trees in different groups have no carry relationship.
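The effect of cutting the carry between tree 23 and tree 24 can be mimicked in software with a SWAR-style ("SIMD within a register") addition, used here purely as an analogy: two 24-bit lanes share one 48-bit word, and the carry out of bit 23 never reaches bit 24. The constants and function names are hypothetical.

```python
LANE_BITS = 24
LANE_MASK = (1 << LANE_BITS) - 1
TOP_BITS = (1 << (LANE_BITS - 1)) | (1 << (2 * LANE_BITS - 1))   # bits 23 and 47


def pack(low: int, high: int) -> int:
    """Place two 24-bit lane values in one 48-bit word (columns 0-23 and 24-47)."""
    return ((high & LANE_MASK) << LANE_BITS) | (low & LANE_MASK)


def grouped_add(a: int, b: int) -> int:
    """Lane-wise addition modulo 2**24 with no carry from column 23 into column 24."""
    # Add everything except each lane's top bit, so no carry can leave a lane ...
    partial = (a & ~TOP_BITS) + (b & ~TOP_BITS)
    # ... then restore the top bits with XOR, which drops their carry-out.
    return partial ^ ((a ^ b) & TOP_BITS)


a = pack(0xFFFFFF, 0x000001)
b = pack(0x000001, 0x000002)
print(hex(grouped_add(a, b)))   # 0x3000000: lane 0 wraps to 0, lane 1 holds 3
print(hex(a + b))               # 0x4000000: plain addition leaks a carry into lane 1
```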

Returning to FIG. 4, after the Wallace tree compressor has summed and compressed the partial products, the compressed partial products are summed by an adder to obtain the result of the mantissa multiplication. In one or more embodiments of the present disclosure, the adder may be one of a full adder, a serial adder and a carry-lookahead adder, and it sums the last two rows of partial products produced by the Wallace tree compressor to obtain the result of the mantissa multiplication.
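As an illustration of this final carry-propagate step, the Python sketch below adds the two rows left by the Wallace tree with a Kogge-Stone parallel-prefix adder, which is one particular way of building a carry-lookahead adder; the disclosure does not say which lookahead structure is used. Generate and propagate signals are merged over spans that double at each step, so a width-bit addition needs about log2(width) combining levels instead of a width-long ripple. The function name is illustrative.

```python
def kogge_stone_add(a: int, b: int, width: int) -> int:
    """(a + b) mod 2**width via parallel-prefix (Kogge-Stone) carry lookahead."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    generate, propagate = a & b, a ^ b
    g, p = generate, propagate      # group signals for the span ending at each bit
    span = 1
    while span < width:
        # Merge each bit's span with the adjacent lower span of equal length.
        g = g | (p & (g << span))
        p = p & (p << span)
        span <<= 1
    carries = (g << 1) & mask       # carry into bit i = group generate of bits 0..i-1
    return propagate ^ carries


row_sum, row_carry = 0x5A5A5A5A5A5A, 0x0F0F0F0F0F0F   # two 48-bit Wallace-tree rows
print(kogge_stone_add(row_sum, row_carry, 48) == (row_sum + row_carry) % 2 ** 48)   # True
```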

The mantissa multiplication shown in FIG. 4, in particular the exemplary use of Booth encoding and a Wallace tree, thus obtains the mantissa product efficiently: Booth encoding reduces the number of partial-product summation terms and hence the circuit area, while the Wallace compression tree computes the sums of the partial products in parallel and hence increases the speed of the multiplier.

An example of the partial products and of the operation of the 7-2 Wallace tree is described in detail below with reference to FIG. 5 and FIG. 6. The description is exemplary rather than limiting and is intended only to aid understanding of the present disclosure.

FIG. 5 shows the partial products 500 obtained from the partial product generation circuit in the mantissa processing unit described with reference to FIGS. 2 to 4: the four rows of white dots between the two dashed lines, each row representing one partial product. To facilitate the subsequent Wallace tree compression, the bit width can be extended in advance. For example, the black dots in FIG. 5 are copies of the most significant bit of each 9-bit partial product; as can be seen, the partial products are extended and aligned to 16 (8+8) bits, i.e., the 8-bit width of the multiplicand mantissa plus the 8-bit width of the multiplier mantissa. In another embodiment, for the partial products of a 25×13 binary multiplication, the partial products are extended to 38 (25+13) bits, i.e., the 25-bit multiplicand mantissa width plus the 13-bit multiplier mantissa width.
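The alignment in FIG. 5 can be reproduced numerically. The Python sketch below assumes 8-bit two's-complement operands, with X restricted to [-127, 127] so that every Booth multiple fits in 9 bits. It forms the four 9-bit partial products of a radix-4 Booth multiplication, copies each one's most significant bit up to 16 bits (the black dots), shifts row i left by 2i bits, and checks that the rows sum to the product modulo 2^16. Function names are hypothetical.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the MSB of a from_bits-wide pattern up to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def partial_product_array(x: int, y: int) -> list:
    """Four aligned 16-bit rows for an 8-bit by 8-bit radix-4 Booth multiplication."""
    y8 = y & 0xFF

    def bit(k: int) -> int:
        return 0 if k < 0 else (y8 >> k) & 1

    rows = []
    for i in range(4):
        digit = -2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)   # Table 4 rule
        pp9 = (digit * x) & 0x1FF                    # 9-bit partial product
        row = sign_extend(pp9, 9, 16) << (2 * i)     # black dots, then alignment
        rows.append(row & 0xFFFF)
    return rows


def to_signed16(value: int) -> int:
    return value - 0x10000 if value & 0x8000 else value


rows = partial_product_array(-93, 58)
print([format(r, "016b") for r in rows])
print(to_signed16(sum(rows) & 0xFFFF))   # -5394
```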

FIG. 6 is a schematic block diagram 600 showing the operation flow of the Wallace tree compressor according to an embodiment of the present disclosure.

As shown in FIG. 6, when the mantissas of two floating-point numbers are multiplied, the 7 partial products shown in FIG. 6 can be obtained, as described above, by Booth-encoding the multiplier and applying the codes to the multiplicand. Using the Booth algorithm reduces the number of partial products generated. For ease of understanding, a dashed box in the partial-product part of the figure marks a Wallace tree of 7 elements, and arrows show how it is compressed from 7 elements to 2. In one embodiment, this compression (or summation) can be implemented with full adders, each taking three elements as input and producing two as output: a sum ("sum") and a carry to the next higher bit ("carry"). The right side of FIG. 6 shows a schematic block diagram of the 7-2 Wallace tree compressor, which takes 7 inputs from one column of partial products (the seven elements in the dashed box on the left of FIG. 6). In operation, the carry input of the Wallace tree for column 0 is 0, and the carry output Cout of each column's Wallace tree serves as the carry input Cin of the Wallace tree for the next column.

As the left part of FIG. 6 shows, four compression stages reduce the 7-element Wallace tree to 2 elements. As mentioned above, the present disclosure uses the 7-2 Wallace tree compressor to compress the 7 rows of partial products into 2 rows (the second intermediate result of the present disclosure), and then uses an adder (e.g., a carry-lookahead adder) to obtain the mantissa result.
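The four-stage reduction can be reproduced with a word-level Python model, a simplification that treats each row as an integer rather than wiring individual columns. At each level the rows are taken three at a time through 3:2 carry-save adders and any leftover rows pass straight through, so 7 rows shrink as 7 → 5 → 4 → 3 → 2. The names and sample rows are illustrative.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple:
    """Full adders on every bit: returns (sum word, carry word)."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(rows: list) -> tuple:
    """Compress rows to two with levels of 3:2 compressors; return (rows, levels)."""
    rows, levels = list(rows), 0
    while len(rows) > 2:
        used = len(rows) // 3 * 3
        next_rows = []
        for k in range(0, used, 3):
            next_rows.extend(carry_save_add(*rows[k:k + 3]))
        next_rows.extend(rows[used:])        # leftover rows wait for the next level
        rows, levels = next_rows, levels + 1
    return rows, levels


seven_rows = [0x1F3, 0x2A7 << 2, 0x0C5 << 4, 0x1FF << 6, 0x101 << 8, 0x07E << 10, 0x155 << 12]
two_rows, levels = wallace_reduce(seven_rows)
print(levels, sum(two_rows) == sum(seven_rows))   # 4 True
```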

To further illustrate the principle of the present disclosure, the following describes by example how the multiplier completes the first-stage operation in four operation modes, FP16*FP16, BF16*BF16, FP32*FP32 and FP32*BF16, that is, up to the point where the Wallace tree compressors finish summing the intermediate results to obtain the second intermediate result:

(1) FP16*FP16

In this operation mode, the mantissa of each floating-point number has 10 bits. Taking into account the denormalized non-zero numbers of the IEEE 754 standard, the mantissa can be extended by 1 bit (the implicit leading bit) to 11 bits. In addition, because the mantissa is unsigned, a 0 bit can be added at the high end when the Booth algorithm is used, giving a total mantissa width of 12 bits. When the second floating-point number, i.e., the multiplier, is Booth-encoded and applied to the first floating-point number, the partial product generation circuit obtains 7 partial products for each of the high and low parts, the seventh of which is 0, and each partial product is 24 bits wide. Compression is then performed with 48 7-2 Wallace trees, and the carry from tree 23 into tree 24 is 0.

(2) BF16*BF16

In this operation mode, the mantissa has 7 bits. Taking into account the denormalized non-zero numbers of the IEEE 754 standard and the extension to a signed number, the mantissa can be extended to 9 bits. When the second floating-point number (the multiplier) is Booth-encoded and applied to the first floating-point number, the partial product generation circuit obtains 7 effective partial products for each of the high and low parts, of which the 6th and 7th are 0, and each partial product is 18 bits wide. Compression is performed with two groups of 7-2 Wallace trees, trees 0 to 17 and trees 24 to 41, and the carry from tree 23 into tree 24 is 0.

(3) FP32*FP32

In this operation mode, the mantissa has 23 bits; taking into account the denormalized non-zero numbers of the IEEE 754 standard and the extension to a signed number, it can be extended to 25 bits. To save multiplication-unit area, the bit width supported by the multiplier may, for example, be designed smaller, so that in this mode the multiplier is invoked twice to complete one operation. Each invocation therefore performs a 25-bit × 13-bit mantissa multiplication: the mantissa of the first floating-point number ina is extended by one 0 bit into a 25-bit signed number, and the 24-bit mantissa of the second floating-point number inb is split into a high 12-bit part and a low 12-bit part, each extended by one 0 bit to give two 13-bit multipliers, denoted inb_high13 and inb_low13. The first invocation of the multiplier computes ina*inb_low13 and the second computes ina*inb_high13. In each computation, Booth encoding generates 7 effective partial products, each 38 bits wide, which are compressed by 7-2 Wallace trees 0 to 37.
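The disclosure does not spell out how the two partial results are merged here; a natural assumption is that the inb_high13 product is shifted left by 12 bits (its weight within inb) and added to the inb_low13 product. A minimal Python sketch of that split follows, with Python's integer multiply standing in for each 25-bit × 13-bit pass through the Booth/Wallace datapath; the function name is hypothetical.

```python
def fp32_mantissa_product_two_calls(ina: int, inb: int) -> int:
    """ina, inb: 24-bit FP32 significands (implicit leading bit included)."""
    inb_low13 = inb & 0xFFF             # low 12 bits; the added 13th (sign) bit is 0
    inb_high13 = (inb >> 12) & 0xFFF    # high 12 bits; the added 13th (sign) bit is 0
    first_call = ina * inb_low13        # one 25-bit x 13-bit pass
    second_call = ina * inb_high13      # the same unit, invoked a second time
    return first_call + (second_call << 12)   # assumed recombination step


ina = 0x800000 | 0x3A5F12   # significand of a normal FP32 value (example bits)
inb = 0x800000 | 0x6C0D91
print(fp32_mantissa_product_two_calls(ina, inb) == ina * inb)   # True
```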

(4) FP32*BF16

In this operation mode, the mantissa of the first floating-point number ina has 23 bits and that of the second floating-point number inb has 7 bits. Taking into account the denormalized non-zero numbers of the IEEE 754 standard and the extension to signed numbers, the mantissas can be extended to 25 bits and 9 bits respectively, and a 25-bit × 9-bit multiplication is performed, yielding 7 effective partial products, of which the 6th and 7th are 0. Each partial product is 34 bits wide, and the partial products are compressed by Wallace trees 0 to 33.
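The widths quoted in modes (1) to (4) follow from simple arithmetic, which the Python sketch below reproduces. It assumes the extensions described in the text: one implicit bit plus one zero or sign bit per mantissa, and for FP32*FP32 a 12-bit half of inb plus one zero bit. Radix-4 Booth gives ceil(w/2) rows for a w-bit multiplier, and the partial product width is the sum of the two extended widths. The table layout and names are illustrative, not taken from the disclosure.

```python
# (multiplicand payload bits, multiplicand extension bits,
#  multiplier payload bits, multiplier extension bits)
MODES = {
    "FP16*FP16": (10, 2, 10, 2),              # 10 + implicit bit + zero bit = 12
    "BF16*BF16": (7, 2, 7, 2),                # 7 + implicit bit + sign bit = 9
    "FP32*FP32 (per call)": (23, 2, 12, 1),   # 25-bit ina, 13-bit half of inb
    "FP32*BF16": (23, 2, 7, 2),               # 25-bit ina, 9-bit inb
}


def mode_geometry(a_bits, a_ext, b_bits, b_ext, tree_inputs=7):
    """Return (multiplicand width, multiplier width, Booth rows, partial product width)."""
    a_width, b_width = a_bits + a_ext, b_bits + b_ext
    booth_rows = (b_width + 1) // 2           # radix-4 Booth: one row per 2 multiplier bits
    if booth_rows > tree_inputs:
        raise ValueError("more Booth rows than 7-2 Wallace tree inputs")
    return a_width, b_width, booth_rows, a_width + b_width


for name, spec in MODES.items():
    print(name, mode_geometry(*spec))
```

The Booth-row counts 6, 5, 7 and 5 are padded with zero rows up to the seven inputs of the 7-2 Wallace tree, which corresponds to the zero partial products mentioned in modes (1), (2) and (4).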

The examples above show how the multiplier of the present disclosure completes the first-stage operation in the four operation modes, preferably using the Booth algorithm and 7-2 Wallace trees. From this description, a person of ordinary skill in the art will understand that because every mode uses 7 partial products, the 7-2 Wallace trees can be reused across the different operation modes.

The case in which the multiplier of the present disclosure (its mantissa processing unit and exponent processing unit) is invoked multiple times is described in more detail below.

According to another aspect of the present disclosure, as shown in FIG. 3, the mantissa processing unit may include a control circuit 316, which invokes the mantissa processing unit multiple times when the mantissa bit width of at least one of the two floating-point numbers exceeds the data bit width that the mantissa processing unit can handle in one pass. That data bit width refers to the two bit widths supported by the mantissa processing unit (e.g., the multiplier bit width and the multiplicand bit width). Accordingly, the control circuit determines that the mantissa processing unit is to be invoked multiple times to obtain the mantissa of the product either from the mantissa bit width of one of the two floating-point numbers and the corresponding one of the two supported bit widths, or from the mantissa bit widths of both floating-point numbers and both supported bit widths. Such repeated invocation of the mantissa processing unit avoids both laying out a large-area multiplier to handle wide mantissas and being limited to a small-area multiplier that cannot handle them; it therefore improves applicability while helping to reduce chip area.

According to a first embodiment of the present disclosure, the two floating-point numbers include a first floating-point number and a second floating-point number, and the mantissa processing unit supports a first bit width and a second bit width. The mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width. The bit width of the first input is less than or equal to the first bit width, and the control circuit invokes the mantissa processing unit multiple times to obtain the mantissa of the product when the bit width of the second input is greater than the second bit width. In this embodiment, one of the two inputs is known always to fit within its corresponding supported bit width, so only the other input needs to be compared with its supported bit width to decide whether multiple invocations are needed.

According to a second embodiment of the present disclosure, the two floating-point numbers include a first floating-point number and a second floating-point number, and the mantissa processing unit supports a first bit width and a second bit width. The mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width. The control circuit invokes the mantissa processing unit multiple times to obtain the mantissa of the product in any of the following cases: the first input is wider than the first bit width while the second input is no wider than the second bit width; the second input is wider than the second bit width while the first input is no wider than the first bit width; or the first input is wider than the first bit width and the second input is wider than the second bit width. In this embodiment, the relationship between the input widths and the supported widths is not known in advance, so both inputs must be compared with their respective supported bit widths to decide whether multiple invocations are needed.

In this second embodiment, when the mantissa of the first floating-point number is narrower than that of the second and the first bit width is larger than the second bit width, or when the mantissa of the first floating-point number is wider than that of the second and the first bit width is smaller than the second bit width, the control circuit selects the mantissa of the first floating-point number as the second input (corresponding to the second bit width) and the mantissa of the second floating-point number as the first input (corresponding to the first bit width). In other words, when the two mantissas arrive in no particular order, they can first be matched to the two supported bit widths wide-to-wide and narrow-to-narrow, which avoids multiple invocations for a mantissa multiplication that could have been completed in a single pass.

Further, when the bit width of the first input is greater than the first bit width and the bit width of the second input is less than or equal to the second bit width, the control circuit determines, from the bit width of the first input and the first bit width, how many times to call the mantissa processing unit and what data to feed it in each call. When the bit width of the second input is greater than the second bit width and the bit width of the first input is less than or equal to the first bit width, the control circuit makes this determination from the bit width of the second input and the second bit width. When the bit width of the first input is greater than the first bit width and the bit width of the second input is greater than the second bit width, the control circuit makes this determination from both pairs: the bit width of the first input with the first bit width, and the bit width of the second input with the second bit width.
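The call-count decision can be modeled as follows. This is a sketch that assumes the slicing rule used in the examples of this section (each call consumes port width minus one magnitude bits, because one port bit holds the added 0 sign bit); the helper names `pieces` and `total_calls` are hypothetical. An operand that fits its port is used whole in every call, and when both operands are sliced every multiplier slice must meet every multiplicand slice, so the counts multiply.

```python
def pieces(signed_width: int, port_width: int) -> int:
    """Number of slices an operand needs on a port of the given width.

    signed_width includes the leading 0 sign bit added for Booth encoding.
    """
    if signed_width <= port_width:
        return 1                        # fits: fed whole in every call
    magnitude = signed_width - 1        # B + 1 (or D + 1): sign bit dropped
    per_call = port_width - 1           # A - 1 (or C - 1): one bit reserved for sign
    return (magnitude + per_call - 1) // per_call   # integer ceiling


def total_calls(mult_w: int, mcand_w: int, max_mult: int, max_mcand: int) -> int:
    """Calls to the mantissa processing unit for one multiplication."""
    return pieces(mult_w, max_mult) * pieces(mcand_w, max_mcand)
```

For the three examples in this section this gives `total_calls(9, 25, 8, 32) == 2`, `total_calls(9, 25, 12, 16) == 2` and `total_calls(25, 25, 8, 16) == 8`, matching the closed-form counts derived there.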

In the present disclosure, the terms "first floating-point number" and "second floating-point number" serve only to distinguish the two floating-point numbers; "first" and "second" are not limiting. Likewise, "first bit width" and "second bit width" serve only to distinguish the two maximum bit widths supported by the mantissa processing unit, and "first input" and "second input" serve only to distinguish the two inputs of the mantissa processing unit that correspond to those two maximum bit widths. In none of these cases do "first" and "second" have a limiting effect.

Note that the floating-point numbers input to the multiplier in the above embodiments already meet the format required by the operation and suit the internal and external components of the multiplier; that is, they have undergone preprocessing such as normalization. In general, the floating-point numbers input to the multiplier may be normalized or denormalized. As described above for the normalization unit, if at least one of the two input floating-point numbers is a nonzero denormalized number, the normalization unit first normalizes it to obtain a normalized exponent and mantissa, and the normalized mantissa is then used as the input of the mantissa processing unit for the floating-point multiplication described above. In addition, the Booth encoding circuit mentioned earlier in the present disclosure performs signed fixed-point multiplication, so each mantissa is also extended with one leading 0 bit, turning it into a signed positive number, and the extended signed mantissa is used as the input of the mantissa processing unit.
Other preprocessing of the floating-point numbers is also possible, with the mantissa of the preprocessed number used as the input of the mantissa processing unit for the floating-point multiplication described above. One example is the normalization performed to suit the operation mode, mentioned above in the description of the normalization unit; the first and second embodiments of the present disclosure apply equally to floating-point operations performed according to the operation mode in this way.
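A small Python model of the two preprocessing steps, assuming IEEE 754-style fraction fields. `normalize_subnormal` and `booth_operand_width` are illustrative names, not components of the disclosure. Prepending the 0 sign bit does not change the integer value, so in this sketch it only changes the tracked width.

```python
def normalize_subnormal(frac: int, frac_bits: int) -> tuple[int, int]:
    """Normalize the nonzero fraction field of a denormalized number.

    Returns (mantissa, shift): mantissa is frac_bits + 1 bits wide with its
    leading 1 in the top position, and shift is how many places the fraction
    moved left, i.e. how much the exponent must be decreased to compensate.
    """
    if not 0 < frac < (1 << frac_bits):
        raise ValueError("expected a nonzero fraction field")
    shift = frac_bits + 1 - frac.bit_length()
    return frac << shift, shift


def booth_operand_width(frac_bits: int) -> int:
    """Operand width after normalization (+1 bit) and the 0 sign bit (+1 bit)."""
    return frac_bits + 2


# FP32: 23 -> 24 -> 25 bits; BF16: 7 -> 8 -> 9 bits.
widths = (booth_operand_width(23), booth_operand_width(7))
```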

Three examples of calling the mantissa processing unit multiple times according to the second embodiment are described in detail below. To make them easier to follow, take the first input to be the multiplier, the second input to be the multiplicand, the first bit width to be the maximum multiplier bit width supported by the mantissa processing unit, and the second bit width to be the maximum multiplicand bit width supported by the mantissa processing unit.

In the first example of calling the mantissa processing unit multiple times, consider the floating-point multiplication according to the operation mode described above, where both floating-point numbers input to the multiplier are nonzero denormalized numbers and the Booth encoding circuit of the present disclosure performs signed fixed-point multiplication. The two floating-point numbers are first normalized, which extends each mantissa by 1 bit; to suit the Booth encoding circuit, each mantissa is then extended by one more bit to form a signed number. After this preprocessing, the two mantissas are matched to the inputs of the mantissa processing unit. When the bit width of the multiplier is greater than the maximum multiplier bit width and the bit width of the multiplicand is less than or equal to the maximum multiplicand bit width, the control circuit takes the mantissa obtained by only normalizing (not sign-extending) the original mantissa of the multiplier as the mantissa to be truncated, and extends each truncated part with a sign bit to suit the Booth encoding circuit.
So that the mantissa processing unit can handle this mantissa, each call truncates a part A-1 bits wide from it, where A is the maximum multiplier bit width supported by the mantissa processing unit. Each (A-1)-bit part is extended with a 0 in its most significant position as a sign bit, forming an A-bit multiplier part that serves as one input of the mantissa processing unit in that call. The multiplicand (in this example, the normalized and sign-extended mantissa) serves as the other input in every call. The number of calls to the mantissa processing unit is therefore:

n = ceil((B+1)/(A-1)),

where n is the number of calls to the mantissa processing unit; B is the bit width of the original mantissa (neither normalized nor sign-extended); B+1 is the bit width of the normalized mantissa, which can also be read as (B+2)-1, i.e. the bit width of the multiplier minus its sign bit; A is the bit width of a multiplier part (the maximum multiplier bit width supported by the mantissa processing unit); and A-1 is the bit width of the part truncated from the mantissa to be truncated in each call.

For example, suppose the mantissa processing unit supports a maximum multiplier bit width of 8 bits and a maximum multiplicand bit width of 32 bits, and the two floating-point numbers input to the multiplier are of types FP32 and BF16, so the multiplication is performed in the FP32*BF16 operation mode. Both numbers are nonzero denormalized numbers whose mantissas are 23 and 7 bits wide, respectively; following the IEEE 754 standard, normalization extends them to 24 and 8 bits. To suit the Booth encoding circuit, each mantissa is extended by one more 0 bit, giving 25-bit and 9-bit signed numbers. The control circuit therefore uses the 9-bit mantissa as the multiplier, corresponding to the maximum multiplier bit width, and the 25-bit mantissa as the multiplicand, corresponding to the maximum multiplicand bit width.
Since only the multiplier's bit width (9 bits) exceeds the maximum multiplier bit width (8 bits), while the multiplicand's bit width (25 bits) is less than the maximum multiplicand bit width (32 bits), the mantissa obtained by only normalizing the multiplier's original mantissa is used as the mantissa to be truncated, inb, and the multiplicand is used directly as the multiplicand input ina of the mantissa processing unit. By the formula above, ceil((7+1)/(8-1)) = 2, so the mantissa processing unit is called twice. Each call truncates 7 bits of inb; in the last (second) call fewer than 7 bits remain, so all remaining bits are taken and padded with leading 0s to 7 bits. Each 7-bit part is then extended with a 0 sign bit to form the 8-bit multiplier part inb_m. Each call therefore computes ina*inb_m, the product of the 25-bit multiplicand and the 8-bit multiplier part, yielding the mantissa result for that call. The mantissa to be truncated may be consumed either from the most significant bits to the least significant bits or in the reverse order. This example also applies to the first embodiment of the present disclosure described above.
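The slicing in this example can be sketched in Python as follows. The helper name `slice_high_to_low` and the sample mantissa values are arbitrary assumptions; the sketch only checks that two 7-bit slices, each leaving the top bit of its 8-bit multiplier part free for the 0 sign bit, cover the 8-bit normalized mantissa.

```python
def slice_high_to_low(value: int, width: int, chunk: int) -> list[int]:
    """Cut a width-bit unsigned value into chunk-bit slices, most significant first.

    The last slice holds whatever bits remain; zero-padding it at the top
    leaves its integer value equal to those remaining low bits.
    """
    slices = []
    remaining = width
    while remaining > 0:
        take = min(chunk, remaining)
        remaining -= take
        slices.append((value >> remaining) & ((1 << take) - 1))
    return slices


A = 8                     # maximum multiplier bit width of the unit
inb = 0b10110011          # sample normalized 8-bit BF16 mantissa
ina = 0xAAAAAA            # sample normalized 24-bit FP32 mantissa, fed whole
inb_m = slice_high_to_low(inb, 8, A - 1)          # 7-bit slice, then 1 bit
assert all(m < (1 << (A - 1)) for m in inb_m)     # sign bit of each part is 0
partials = [ina * m for m in inb_m]               # one product per call
```

Here `inb_m` is `[89, 1]`. The partial products still have to be realigned before they are summed, which is the job of the shift-and-add circuit described later in this section.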

In the second example of calling the mantissa processing unit multiple times, again consider the floating-point multiplication according to the operation mode described above, where both floating-point numbers input to the multiplier are nonzero denormalized numbers and the Booth encoding circuit of the present disclosure performs signed fixed-point multiplication. The two floating-point numbers are first normalized, which extends each mantissa by 1 bit; to suit the Booth encoding circuit, each mantissa is then extended by one more bit to form a signed number. After this preprocessing, the two mantissas are matched to the inputs of the mantissa processing unit. When the bit width of the multiplicand is greater than the maximum multiplicand bit width and the bit width of the multiplier is less than or equal to the maximum multiplier bit width, the control circuit takes the mantissa obtained by only normalizing (not sign-extending) the original mantissa of the multiplicand as the mantissa to be truncated, and extends each truncated part with a sign bit to suit the Booth encoding circuit.
So that the mantissa processing unit can handle this mantissa, each call truncates a part C-1 bits wide from it, where C is the maximum multiplicand bit width supported by the mantissa processing unit. Each (C-1)-bit part is extended with a 0 in its most significant position as a sign bit, forming a C-bit multiplicand part that serves as one input of the mantissa processing unit in that call. The multiplier (in this example, the normalized and sign-extended mantissa) serves as the other input in every call. The number of calls to the mantissa processing unit is therefore:

n = ceil((D+1)/(C-1)),

where n is the number of calls to the mantissa processing unit; D is the bit width of the original mantissa (neither normalized nor sign-extended); D+1 is the bit width of the normalized mantissa, which can also be read as (D+2)-1, i.e. the bit width of the multiplicand minus its sign bit; C is the bit width of a multiplicand part (the maximum multiplicand bit width supported by the mantissa processing unit); and C-1 is the bit width of the part truncated from the mantissa to be truncated in each call.

For example, suppose the mantissa processing unit supports a maximum multiplier bit width of 12 bits and a maximum multiplicand bit width of 16 bits, and the two floating-point numbers input to the multiplier are of types FP32 and BF16, so the multiplication is performed in the FP32*BF16 operation mode. Both numbers are nonzero denormalized numbers whose mantissas are 23 and 7 bits wide, respectively; following the IEEE 754 standard, normalization extends them to 24 and 8 bits. To suit the Booth encoding circuit, each mantissa is extended by one more 0 bit, giving 25-bit and 9-bit signed numbers. The control circuit therefore uses the 9-bit mantissa as the multiplier, corresponding to the maximum multiplier bit width, and the 25-bit mantissa as the multiplicand, corresponding to the maximum multiplicand bit width.
Since only the multiplicand's bit width (25 bits) exceeds the maximum multiplicand bit width supported by the mantissa processing unit (16 bits), while the multiplier's bit width (9 bits) is less than the maximum multiplier bit width (12 bits), the mantissa obtained by only normalizing the multiplicand's original mantissa is used as the mantissa to be truncated, ina, and the multiplier is used directly as the multiplier input inb of the mantissa processing unit. By the formula above, ceil((23+1)/(16-1)) = 2, so the mantissa processing unit is called twice. Each call truncates 15 bits of ina; in the last (second) call fewer than 15 bits remain, so they are padded with leading 0s to 15 bits. Each 15-bit part is then extended with a 0 sign bit to form the 16-bit multiplicand part ina_m. Each call therefore computes ina_m*inb, the product of the 16-bit multiplicand part and the 9-bit multiplier, yielding the mantissa result for that call. The mantissa to be truncated may be consumed either from the most significant bits to the least significant bits or in the reverse order. This example also applies to the first embodiment of the present disclosure described above.
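Since the text allows either truncation order, this sketch shows the least-significant-first variant for the multiplicand of the second example. It records each slice's bit offset so that the per-call products can be realigned afterwards; the function name `slice_low_to_high` and the sample values are hypothetical.

```python
def slice_low_to_high(value: int, width: int, chunk: int) -> list[tuple[int, int]]:
    """Cut a width-bit value into chunk-bit slices, least significant first.

    Returns (slice, offset) pairs, where offset is the bit position of the
    slice's lowest bit; the final slice holds the remaining high bits.
    """
    out = []
    offset = 0
    while offset < width:
        take = min(chunk, width - offset)
        out.append(((value >> offset) & ((1 << take) - 1), offset))
        offset += take
    return out


C = 16                    # maximum multiplicand bit width of the unit
ina = 0xAAAAAA            # sample normalized 24-bit FP32 mantissa (to be truncated)
inb = 0b10110011          # sample normalized 8-bit BF16 mantissa; 9 bits signed, fits 12
ina_m = slice_low_to_high(ina, 24, C - 1)          # 15 low bits, then 9 high bits
partials = [(m * inb, off) for m, off in ina_m]    # ina_m * inb for each call
```

Shifting each partial product left by its offset and summing reproduces ina*inb, whichever order the slices were taken in.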

In the third example of calling the mantissa processing unit multiple times, again consider the floating-point multiplication according to the operation mode described above, where both floating-point numbers input to the multiplier are nonzero denormalized numbers and the Booth encoding circuit of the present disclosure performs signed fixed-point multiplication. The two floating-point numbers are first normalized, which extends each mantissa by 1 bit; to suit the Booth encoding circuit, each mantissa is then extended by one more bit to form a signed number. After this preprocessing, the two mantissas are matched to the inputs of the mantissa processing unit.
When the bit width of the multiplier is greater than the maximum multiplier bit width and the bit width of the multiplicand (in this example, the normalized and sign-extended mantissa) is greater than the maximum multiplicand bit width, the control circuit takes both the mantissa obtained by only normalizing the multiplier's original mantissa and the mantissa obtained by only normalizing the multiplicand's original mantissa as mantissas to be truncated, and extends each truncated part with a sign bit to suit the Booth encoding circuit. So that the mantissa processing unit can handle these two mantissas, each call truncates a part A-1 bits wide from the mantissa corresponding to the multiplier and a part C-1 bits wide from the mantissa corresponding to the multiplicand, where A is the maximum multiplier bit width and C is the maximum multiplicand bit width supported by the mantissa processing unit. Each (A-1)-bit part is extended with a 0 in its most significant position as a sign bit, forming an A-bit multiplier part that serves as one input of the mantissa processing unit in that call; each (C-1)-bit part is likewise extended with a leading 0 sign bit, forming a C-bit multiplicand part that serves as the other input. The number of calls to the mantissa processing unit is therefore:

n = ceil((B+1)/(A-1)) * ceil((D+1)/(C-1)),

where n is the number of calls to the mantissa processing unit; B and D are the bit widths of the original (neither normalized nor sign-extended) mantissas of the multiplier and the multiplicand, respectively; B+1 and D+1 are their normalized bit widths, which can also be read as (B+2)-1 and (D+2)-1, i.e. the bit widths of the multiplier and the multiplicand minus their sign bits; A is the bit width of a multiplier part (the maximum multiplier bit width supported by the mantissa processing unit) and C is the bit width of a multiplicand part (the maximum multiplicand bit width supported by the mantissa processing unit); and A-1 and C-1 are the bit widths of the parts truncated in each call from the mantissas to be truncated that correspond to the multiplier and the multiplicand, respectively.

For example, suppose the mantissa processing unit supports a maximum multiplier bit width of 8 bits and a maximum multiplicand bit width of 16 bits, and both floating-point numbers input to the multiplier are of type FP32, so the multiplication is performed in the FP32*FP32 operation mode. Both numbers are nonzero denormalized numbers with 23-bit mantissas; following the IEEE 754 standard, normalization extends both to 24 bits. To suit the Booth encoding circuit, each mantissa is extended by one more 0 bit, giving two 25-bit signed numbers.
The control circuit assigns one mantissa to the multiplier, corresponding to the maximum multiplier bit width, and the other to the multiplicand, corresponding to the maximum multiplicand bit width (since both extended mantissas have the same width, either can serve as the multiplier). Because the multiplier's bit width (25 bits) exceeds the maximum multiplier bit width (8 bits) and the multiplicand's bit width (25 bits) exceeds the maximum multiplicand bit width (16 bits), the mantissa obtained by normalizing the multiplier's original mantissa is used as the mantissa to be truncated, inb, and the mantissa obtained by normalizing the multiplicand's original mantissa is used as the mantissa to be truncated, ina. By the formula above, ceil((23+1)/(8-1)) * ceil((23+1)/(16-1)) = 4 * 2 = 8, so the mantissa processing unit is called eight times. Each truncation of inb takes 7 bits; for the last one fewer than 7 bits remain, so all remaining bits are taken and padded with leading 0s to 7 bits. Each 7-bit part is extended with a 0 sign bit to form an 8-bit multiplier part inb_m; since inb is cut into four parts, there are four multiplier parts inb_m1, inb_m2, inb_m3 and inb_m4. Likewise, each truncation of ina takes 15 bits; for the last one fewer than 15 bits remain, so all remaining bits are taken and padded with leading 0s to 15 bits. Each 15-bit part is extended with a 0 sign bit to form a 16-bit multiplicand part ina_m; since ina is cut into two parts, there are two multiplicand parts ina_m1 and ina_m2.
The eight calls may, for example, compute in sequence ina_m1*inb_m1, ina_m1*inb_m2, ina_m1*inb_m3, ina_m1*inb_m4, ina_m2*inb_m1, ina_m2*inb_m2, ina_m2*inb_m3 and ina_m2*inb_m4, or alternatively inb_m1*ina_m1, inb_m1*ina_m2, inb_m2*ina_m1, inb_m2*ina_m2, inb_m3*ina_m1, inb_m3*ina_m2, inb_m4*ina_m1 and inb_m4*ina_m2. Each call multiplies a 16-bit multiplicand part by an 8-bit multiplier part, yielding the mantissa result for that call. The mantissas to be truncated may be consumed either from the most significant bits to the least significant bits or in the reverse order.
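The slice widths and both call orderings listed above can be enumerated as below. This is bookkeeping only, assuming most-significant-first truncation; `split_widths` is an illustrative helper, and the strings simply reuse the part names from the text.

```python
def split_widths(width: int, chunk: int) -> list[int]:
    """Widths of the slices of a width-bit mantissa, most significant first."""
    full, rest = divmod(width, chunk)
    return [chunk] * full + ([rest] if rest else [])


A, C = 8, 16
inb_w = split_widths(24, A - 1)   # multiplier parts inb_m1 .. inb_m4
ina_w = split_widths(24, C - 1)   # multiplicand parts ina_m1, ina_m2

# First ordering: hold a multiplicand part, sweep all multiplier parts.
calls = [(f"ina_m{i}", f"inb_m{j}")
         for i in range(1, len(ina_w) + 1)
         for j in range(1, len(inb_w) + 1)]

# Second ordering: hold a multiplier part, sweep both multiplicand parts.
calls_alt = [(f"inb_m{j}", f"ina_m{i}")
             for j in range(1, len(inb_w) + 1)
             for i in range(1, len(ina_w) + 1)]
```

The slice widths come out as `[7, 7, 7, 3]` and `[15, 9]`, and each list holds the 8 calls in the order given in the text.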

The above examples are illustrative rather than limiting. Based on them, a person of ordinary skill in the art can devise floating-point multiplications in other operation modes that call, multiple times, a mantissa processing unit supporting maximum bit widths of any size.

When the mantissa processing unit is called multiple times as described above, it may further include a shift-and-add circuit that obtains the mantissa of the product from the mantissa results of the individual calls.

Further, the shift-and-add circuit includes a shifter, an intermediate memory and an adder. When the control circuit calls the mantissa processing unit multiple times according to the operation mode, after the first call the shifter shifts the mantissa result of that call and stores the shifted result in the intermediate memory. From the second call onward, the shifter shifts the mantissa result of the current call, and the adder adds this shifted result to the value stored in the intermediate memory and writes the sum back, updating the intermediate memory. After the last call, the value stored in the intermediate memory is the mantissa of the product.
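A behavioral Python model of the shift-and-add circuit, applied to the eight calls of the FP32*FP32 example. The class `ShiftAdd`, its methods and the sample mantissas are assumptions for illustration; the shift for each call follows the rule Y = k + j given in this section, with most-significant-first truncation. It models data flow only, not timing or register widths.

```python
class ShiftAdd:
    """Data-flow model of the shifter, intermediate memory and adder."""

    def __init__(self):
        self.memory = None  # intermediate memory; empty before the first call

    def accept(self, partial: int, shift: int) -> None:
        shifted = partial << shift         # shifter
        if self.memory is None:
            self.memory = shifted          # first call: store the shifted result
        else:
            self.memory += shifted         # later calls: adder, then write back

    def result(self) -> int:
        return self.memory                 # after the last call: product mantissa


def slices_with_shift(value: int, width: int, chunk: int) -> list[tuple[int, int]]:
    """(slice, number of bits after it) pairs, most significant slice first."""
    out, remaining = [], width
    while remaining > 0:
        take = min(chunk, remaining)
        remaining -= take
        out.append(((value >> remaining) & ((1 << take) - 1), remaining))
    return out


A, C = 8, 16
ina, inb = 0xAAAAAA, 0xF0F0F1  # sample normalized 24-bit FP32 mantissas
unit = ShiftAdd()
for a, j in slices_with_shift(ina, 24, C - 1):      # multiplicand parts ina_m1, ina_m2
    for b, k in slices_with_shift(inb, 24, A - 1):  # multiplier parts inb_m1 .. inb_m4
        unit.accept(a * b, k + j)                   # shift by Y = k + j
```

After the eight calls, `unit.result()` equals the full 24-bit by 24-bit product `ina * inb`.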

In this embodiment, the mantissa to be truncated is consumed, for example, from the most significant bits to the least significant bits. On each call of the mantissa processing unit, the shifter shifts the mantissa result of the current call by:

Y = k + j

where Y is the number of bit positions by which the mantissa result of the current call is shifted; k is the total number of bits that follow (are less significant than) the part used in the current call within the mantissa to be truncated that corresponds to the multiplier; and j is the corresponding total within the mantissa to be truncated that corresponds to the multiplicand. If only the multiplier's bit width exceeds the maximum multiplier bit width, or only the multiplicand's bit width exceeds the maximum multiplicand bit width, only the corresponding mantissa is truncated. The other mantissa is used in full in every call, so no bits follow it and its term (j or k, respectively) is 0. The formula therefore reduces to Y = k when only the multiplier's bit width exceeds the maximum multiplier bit width, and to Y = j when only the multiplicand's bit width exceeds the maximum multiplicand bit width.
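The quantities k and j depend only on the slice widths. In this sketch (`shift_amounts` is a hypothetical helper, and most-significant-first slicing is assumed), an operand that is not truncated is represented as a single slice, which gives it a contribution of 0 and reproduces the reduced forms Y = k and Y = j.

```python
def shift_amounts(widths: list[int]) -> list[int]:
    """Bits that follow each slice, for slices listed most significant first."""
    total = sum(widths)
    out, consumed = [], 0
    for w in widths:
        consumed += w
        out.append(total - consumed)
    return out


k = shift_amounts([7, 7, 7, 3])   # multiplier parts of the FP32*FP32 example
j = shift_amounts([15, 9])        # multiplicand parts of the same example
# Call pairing ina_m1 with inb_m2: Y = k[1] + j[0]
y = k[1] + j[0]
```

For the FP32*FP32 example, `k` is `[17, 10, 3, 0]`, `j` is `[9, 0]`, and the call pairing ina_m1 with inb_m2 is shifted by 10 + 9 = 19 bits.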

For example, as described above, in the FP32*BF16 operation mode, when only the bit width of the multiplier exceeds the maximum multiplier bit width, the mantissa processing unit is called twice, with the to-be-truncated mantissa segmented, for example, from high-order to low-order bits. Specifically, let the multiplier segments used in the two calls be inb_m1 and inb_m2. After the first call, the shifter shifts the result of ina*inb_m1 to the left. Because the first call takes 7 bits, the bits following those 7 bits total k = 8 - 7 = 1 bit, so by the formula above Y = 1. The result is shifted left by 1 bit to give R1, which the adder stores in the intermediate memory. After the second (last) call, the shifter processes the result of ina*inb_m2. Because the second call takes the final 1 bit, no bits follow it, so Y = 0 and the result R2 is not shifted. The adder adds R2 to the R1 stored in the intermediate memory and writes the sum back to update the intermediate memory. Since the second call is the last call, the value held in the intermediate memory after it is the mantissa of the multiplication result. The shift-and-add circuit works in the same way when only the bit width of the multiplicand exceeds the maximum multiplicand bit width.
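The two-call FP32*BF16 case can be sketched as follows. The sketch assumes, as the value k = 8 - 7 implies, an 8-bit multiplier mantissa inb split high-to-low into 7 + 1 bits, and a multiplicand mantissa ina that is used whole in both calls. The function name is hypothetical, and the intermediate memory is modeled as a plain integer.

```python
def mul_two_calls(ina, inb):
    """Emulate the two-call FP32*BF16 mantissa multiply with shift-and-add."""
    assert 0 <= inb < (1 << 8)          # 8-bit multiplier mantissa (hidden bit included)
    inb_m1 = inb >> 1                   # high 7 bits; 1 bit follows it, so Y = k = 1
    inb_m2 = inb & 0b1                  # last 1 bit; nothing follows it, so Y = 0

    # Call 1: the shifter shifts ina*inb_m1 left by 1; the adder stores R1.
    intermediate = (ina * inb_m1) << 1
    # Call 2 (last): R2 is not shifted; it is added to R1 and written back.
    r2 = ina * inb_m2
    intermediate += r2
    return intermediate                 # mantissa of the product


print(mul_two_calls(0xC90FDB, 0xB5) == 0xC90FDB * 0xB5)   # prints True
```

Because each segment is shifted back by the number of bits that followed it, the accumulated value equals the full-width product exactly.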

For example, as described above, in the FP32*FP32 operation mode, when the bit width of the multiplier exceeds the maximum multiplier bit width and the bit width of the multiplicand exceeds the maximum multiplicand bit width, the mantissa processing unit is called eight times, again segmenting the to-be-truncated mantissas, for example, from high-order to low-order bits. Specifically, let the multiplier segments be inb_m1, inb_m2, inb_m3 and inb_m4 and the multiplicand segments be ina_m1 and ina_m2. The eight calls then compute, in order: ina_m1*inb_m1, ina_m1*inb_m2, ina_m1*inb_m3, ina_m1*inb_m4, ina_m2*inb_m1, ina_m2*inb_m2, ina_m2*inb_m3 and ina_m2*inb_m4.

In the first call, the shifter shifts the result of ina_m1*inb_m1 to the left. This call takes 7 bits from the multiplier's mantissa, leaving k = 24 - 7 = 17 bits after them, and 15 bits from the multiplicand's mantissa, leaving j = 24 - 15 = 9 bits after them. By the formula above, Y = 17 + 9 = 26, so the result is shifted left by 26 bits to give S1, which the adder stores in the intermediate memory.

In the second call, the shifter shifts the result of ina_m1*inb_m2 to the left. This call takes the next 7 bits of the multiplier's mantissa, leaving k = 24 - 7 - 7 = 10 bits, and reuses the same 15-bit multiplicand segment as the previous call, so j is still 24 - 15 = 9 bits. Hence Y = 10 + 9 = 19, giving S2 shifted left by 19 bits. The adder adds S2 to the S1 stored in the intermediate memory and writes the sum back to update it.

The mantissa processing unit is called in this way up to the fourth call, in which the shifter shifts the result of ina_m1*inb_m4. This call takes the last 3 bits of the multiplier's mantissa, so no bits follow them and k = 0, while the multiplicand segment is again the same 15 bits, so j = 24 - 15 = 9. Hence Y = 0 + 9 = 9, giving S4 shifted left by 9 bits. The adder adds S4 to the value stored in the intermediate memory and writes the sum back to update it.

In the fifth to eighth calls, the multiplicand segment is the last 9 bits of its mantissa, after which no bits remain, so j = 0 in all four calls. In the fifth call, the shifter shifts the result of ina_m2*inb_m1. The multiplier segment is the same 7 bits as in the first call, so k = 24 - 7 = 17 and Y = 17 + 0 = 17, giving S5 shifted left by 17 bits, which the adder adds to the value in the intermediate memory before writing the sum back. The calls continue in this way up to the eighth, in which the shifter processes the result of ina_m2*inb_m4. The multiplier segment is again its last 3 bits, so k = 0 and Y = 0 + 0 = 0; that is, the result S8 is not shifted. The adder adds S8 to the value in the intermediate memory and writes the sum back. Since the eighth call is the last call, the value held in the intermediate memory after it is the mantissa of the multiplication result.
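The eight-call FP32*FP32 walk-through can be checked with a short sketch. It assumes 24-bit mantissas (hidden bit included), a multiplicand split high-to-low into 15 + 9 bits and a multiplier split into 7 + 7 + 7 + 3 bits, as in the text. The names split_high_to_low and fp32_mantissa_product are hypothetical, and the narrow hardware multiplications are modeled with ordinary Python integer products.

```python
def split_high_to_low(value, width, seg):
    """Cut a width-bit value into seg-bit pieces from the high end.
    Returns (piece, bits_after_piece) pairs; the last piece may be shorter."""
    pieces = []
    used = 0
    while used < width:
        size = min(seg, width - used)
        used += size
        after = width - used
        pieces.append(((value >> after) & ((1 << size) - 1), after))
    return pieces


def fp32_mantissa_product(ina, inb):
    """Multiply two 24-bit mantissas in eight calls, accumulating shifted
    partial products in a stand-in for the intermediate memory."""
    intermediate = 0
    shifts = []
    for a_part, j in split_high_to_low(ina, 24, 15):       # ina_m1, ina_m2
        for b_part, k in split_high_to_low(inb, 24, 7):    # inb_m1 .. inb_m4
            y = k + j                                      # Y = k + j
            shifts.append(y)
            intermediate += (a_part * b_part) << y         # shifter, then adder
    return intermediate, shifts


product, shifts = fp32_mantissa_product(0xFFFFFF, 0xC90FDB)
print(shifts)                              # [26, 19, 12, 9, 17, 10, 3, 0]
print(product == 0xFFFFFF * 0xC90FDB)      # True
```

The printed shift list matches S1 (26), S2 (19), S4 (9), S5 (17) and S8 (0) in the description.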

In addition, to further reduce the area of the multiplier, the exponent processing unit includes a second control circuit (not shown in the figures). The second control circuit determines whether to call the exponent processing unit multiple times to obtain the exponent of the multiplication result. It bases this decision either on the exponent bit width of one of the two floating-point numbers and one of the two bit widths supported by the exponent processing unit, or on the exponent bit widths of both floating-point numbers and both supported bit widths.

According to a third embodiment of the present disclosure, the two floating-point numbers comprise a first floating-point number and a second floating-point number, and the exponent processing unit supports a third bit width and a fourth bit width. The exponent of the first floating-point number serves as a third input corresponding to the third bit width, and the exponent of the second floating-point number serves as a fourth input corresponding to the fourth bit width. The bit width of the third input is less than or equal to the third bit width. When the bit width of the fourth input exceeds the fourth bit width, the second control circuit calls the exponent processing unit multiple times to obtain the exponent of the multiplication result. In this embodiment, one of the two inputs always fits within its corresponding supported bit width. Only the other input therefore needs to be compared with its supported bit width to decide whether multiple calls are required.

According to a fourth embodiment of the present disclosure, the two floating-point numbers again comprise a first floating-point number and a second floating-point number, and the exponent processing unit supports a third bit width and a fourth bit width. The exponent of the first floating-point number serves as the third input corresponding to the third bit width, and the exponent of the second floating-point number serves as the fourth input corresponding to the fourth bit width. The second control circuit calls the exponent processing unit multiple times to obtain the exponent of the multiplication result in any of three cases: (1) the third input is wider than the third bit width and the fourth input is no wider than the fourth bit width; (2) the fourth input is wider than the fourth bit width and the third input is no wider than the third bit width; or (3) the third input is wider than the third bit width and the fourth input is wider than the fourth bit width. In this embodiment, the relationship between the two input bit widths and the two supported bit widths is not known in advance. Each input must therefore be compared with its own supported bit width to decide whether multiple calls are needed.
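The decision rules of the third and fourth embodiments can be written as two small predicates. This is a sketch under the assumption that bit widths are plain integers. The function names are hypothetical, and the example widths (a 6-bit and a 9-bit input against 8-bit ports) are borrowed from the addition example later in this section.

```python
def needs_multiple_calls_third(in3_width, in4_width, w3, w4):
    """Third embodiment: the third input is known to fit, so only the
    fourth input is compared with its supported width."""
    assert in3_width <= w3              # guaranteed in this embodiment
    return in4_width > w4


def needs_multiple_calls_fourth(in3_width, in4_width, w3, w4):
    """Fourth embodiment: the three cases listed in the text."""
    case1 = in3_width > w3 and in4_width <= w4
    case2 = in4_width > w4 and in3_width <= w3
    case3 = in3_width > w3 and in4_width > w4
    return case1 or case2 or case3      # i.e. either input exceeds its width


print(needs_multiple_calls_fourth(6, 9, 8, 8))   # True
print(needs_multiple_calls_fourth(5, 8, 8, 8))   # False
```

Taken together, the three cases reduce to "either input is wider than its supported width"; they are spelled out separately here to mirror the wording of the embodiment.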

In the fourth embodiment, the second control circuit may swap the inputs. This happens when the exponent bit width of the first floating-point number is smaller than that of the second and the third bit width is greater than the fourth bit width, or when the exponent bit width of the first floating-point number is greater than that of the second and the third bit width is smaller than the fourth bit width. In either case, the second control circuit routes the exponent of the first floating-point number to the fourth input (corresponding to the fourth bit width) and the exponent of the second floating-point number to the third input (corresponding to the third bit width). In other words, when the two exponents arrive in no particular order, they can first be matched to the two supported bit widths on a wider-to-wider, narrower-to-narrower basis. This avoids making multiple calls for an exponent operation that a single call could have completed.
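The wider-to-wider matching can be sketched as a routing function that returns which exponent feeds the third port and which feeds the fourth. The function name and the string labels are hypothetical. The example port widths (8 and 5) are an assumption chosen to show a case in which routing decides between one call and several.

```python
def route_exponents(e1_width, e2_width, w3, w4):
    """Return (third_input, fourth_input) as 'first'/'second', swapping
    when that pairs the wider exponent with the wider supported width."""
    swap = (e1_width < e2_width and w3 > w4) or (e1_width > e2_width and w3 < w4)
    return ("second", "first") if swap else ("first", "second")


# A 5-bit and an 8-bit exponent against ports of width 8 (third) and 5 (fourth):
print(route_exponents(5, 8, 8, 5))   # ('second', 'first'): 8 -> 8-bit port, 5 -> 5-bit port
print(route_exponents(8, 5, 8, 5))   # ('first', 'second'): already matched
```

Without the swap in the first example, the 8-bit exponent would land on the 5-bit port and trigger multiple calls, although a single call would have sufficed.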

Further, the three cases above are: the third input is wider than the third bit width and the fourth input is no wider than the fourth bit width; the fourth input is wider than the fourth bit width and the third input is no wider than the third bit width; or the third input is wider than the third bit width and the fourth input is wider than the fourth bit width. In each of these cases, when the bit width of the third input is less than or equal to that of the fourth input and the third bit width is less than or equal to the fourth bit width, the second control circuit determines the number of calls to the exponent processing unit, and the data fed to it in each call, from the bit width of the fourth input and the third bit width. Note that in all three cases, the number of calls and the per-call input data are determined by the larger of the two input bit widths and the smaller of the two supported bit widths. When the two inputs have the same bit width, or the two supported bit widths are equal, either member of the equal pair may be chosen.

In this embodiment, the terms "first floating-point number" and "second floating-point number" serve only to distinguish the two floating-point numbers. Likewise, "third input" and "fourth input" only distinguish the two inputs of the exponent processing unit, and "third bit width" and "fourth bit width" only distinguish the two maximum processing bit widths that the exponent processing unit supports for those two inputs. The ordinals "first" through "fourth" are therefore not limiting.

Note that the floating-point numbers input to the multiplier in the embodiments above are assumed to be in the format the operation requires and suitable for the multiplier's internal and external components, i.e., already preprocessed, for example by normalization. The floating-point numbers input to the multiplier may be normalized or denormalized. As described above for the normalization unit, if at least one of the two input floating-point numbers is a non-zero denormalized number, the normalization unit may first normalize it to obtain a normalized exponent and mantissa. The normalized exponent is then used as the input of the exponent processing unit for the floating-point multiplication described above. Other preprocessing may also be applied to the floating-point numbers, with the exponent of the preprocessed number used as the input of the exponent processing unit. One example is the normalization performed to fit an operation mode, described above for the normalization unit. The third and fourth embodiments of the present disclosure apply equally to floating-point operations performed according to an operation mode as described above.

An example of calling the exponent processing unit multiple times is described in detail below. For clarity, the third input may be taken to be an addend and the fourth input an augend. Correspondingly, the third bit width is the maximum addend bit width supported by the exponent processing unit, and the fourth bit width is the maximum augend bit width it supports.

In this example of calling the exponent processing unit multiple times, combined with the mode-dependent floating-point multiplication described above, suppose the two floating-point numbers input to the multiplier of the present disclosure are non-zero denormalized numbers. Both are first normalized, which extends each mantissa by 1 bit. After this preprocessing, the exponents of the two floating-point numbers are matched to the inputs of the exponent processing unit. The second control circuit may then determine the number of calls to the exponent processing unit by the formula below in any of these cases: the addend is wider than the maximum addend bit width and the augend is no wider than the maximum augend bit width; the augend is wider than the maximum augend bit width and the addend is no wider than the maximum addend bit width; or both exceed their respective maximum bit widths. The formula is:

$$m = \left\lceil \frac{P}{Q-1} \right\rceil$$

where m is the number of calls to the exponent processing unit, P is the bit width of the augend, Q is the maximum addend bit width, and Q - 1 is the bit width of the segment taken from both the addend and the augend in each call. In each call, a (Q - 1)-bit segment is taken from the addend and from the augend simultaneously, so that segments of the same width occupying the same bit positions are added together. If fewer than Q - 1 bits, or no bits at all, remain to be taken in a call, zeros are padded in front (or the whole segment is filled with zeros) to make up Q - 1 bits. Each segment taken from the addend and the augend is then extended with one leading carry bit to form the addend part and augend part fed to the exponent processing unit. Q is therefore also the bit width of the addend part and augend part input in each call.
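The call count and the per-call operands can be illustrated with a short sketch that builds the Q-bit inputs as bit strings: a carry bit followed by a (Q - 1)-bit segment, taken high-to-low and zero-padded in front when short. The function name is hypothetical. The sample values 0b101101 (6-bit addend) and 0b110010111 (9-bit augend) are illustrative and do not come from the source.

```python
import math


def exponent_unit_inputs(addend, augend, P, Q):
    """Return m = ceil(P / (Q - 1)) and, per call, the Q-bit addend and
    augend operands as bit strings (carry bit + (Q - 1)-bit segment)."""
    seg = Q - 1
    m = math.ceil(P / seg)
    calls = []
    for i in range(m):
        hi = P - i * seg              # bits not yet consumed
        lo = max(hi - seg, 0)         # bits remaining after this call's segment
        mask = (1 << (hi - lo)) - 1
        a_seg = (addend >> lo) & mask # addend is implicitly zero-extended to P bits
        b_seg = (augend >> lo) & mask
        # format() pads a short segment with leading zeros to Q - 1 bits;
        # the extra leading '0' is the carry bit that absorbs any overflow.
        calls.append(("0" + format(a_seg, f"0{seg}b"),
                      "0" + format(b_seg, f"0{seg}b")))
    return m, calls


m, calls = exponent_unit_inputs(0b101101, 0b110010111, P=9, Q=8)
print(m)       # 2
print(calls)   # [('00001011', '01100101'), ('00000001', '00000011')]
```

The second call shows the zero padding: only 2 bits (01 and 11) remain, so each is padded to 7 bits before the carry bit is prepended.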

Accordingly, each time the exponent processing unit is called, the second control circuit takes a (Q - 1)-bit segment from the addend and from the augend, in the same order, as the input of the exponent processing unit. The exponent processing unit produces that call's exponent result, and the final exponent is obtained after m calls. The common order may run either from high-order to low-order bits or from low-order to high-order bits.

For example, suppose the addend is 6 bits wide, the augend is 9 bits wide, and the maximum addend bit width and maximum augend bit width supported by the exponent processing unit are both 8 bits. The exponent processing unit is then called ceil(9/(8-1)) = 2 times. The addend is first padded with leading zeros to the same width as the augend. In each call, a 7-bit segment is taken from the addend and from the augend simultaneously, from high-order to low-order bits, and each segment is extended with one carry bit to form two 8-bit operands with carry, which are added. In the second (last) call, only 2 bits remain in each operand, so the 2 bits taken are padded with leading zeros to 7 bits and then extended with a carry bit to form two 8-bit operands for the addition.

Note that the way this example calls the exponent processing unit when only the addend exceeds the maximum addend bit width, or when only the augend exceeds the maximum augend bit width, applies equally to the third embodiment of the present disclosure described above.

According to an embodiment, the exponent processing unit may further include a second shift-and-add circuit, which obtains the exponent of the multiplication result from the exponent results produced by each call to the exponent processing unit.

Further, the second shift-and-add circuit includes a second shifter, a second intermediate memory and a second adder. When the second control circuit calls the exponent processing unit multiple times, after the first call the second shifter shifts the exponent result of the first call and stores the shifted result in the second intermediate memory. From the second call onward, the second shifter shifts the exponent result of the current call, and the second adder adds the shifted result to the value stored in the second intermediate memory and writes the sum back to update it. The value held in the second intermediate memory after the last call is taken as the exponent of the multiplication result.

Each time the exponent processing unit is called, the second shifter shifts that call's exponent result as follows. If the addend and augend are segmented from high-order to low-order bits, the result obtained from the segments taken from the addend and augend in the current call is shifted left by the number of augend bits that follow the segment taken from the augend in that call.

For example, continuing the example above, the addend is 6 bits wide, the augend is 9 bits wide, the maximum addend and augend bit widths supported by the exponent processing unit are both 8 bits, and in each call a 7-bit segment is taken from both the addend and the augend, from high-order to low-order bits. After the first call, the second shifter shifts the exponent result left by 2 bits, because 2 bits of the augend follow the segment taken in that call, and stores the shifted result in the second intermediate memory. In the second call, no bits follow the segment taken, so the second shifter shifts the result left by 0 bits, i.e., does not shift it. The second adder adds this result to the value in the second intermediate memory and writes the sum back to update it. Since the second call is the last call, the value in the second intermediate memory after it is the exponent of the multiplication result.
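The second shift-and-add circuit can be modeled end to end in a few lines. This is a sketch assuming unsigned exponent fields and high-to-low segmentation; the function name is hypothetical, and the second intermediate memory is a plain integer. Each call adds one pair of (Q - 1)-bit segments, which the carry bit lets fit in Q bits, shifts the sum left by the number of augend bits remaining, and accumulates it.

```python
def segmented_exponent_add(addend, augend, P, Q):
    """Add two exponents using repeated calls to a Q-bit adder, shifting
    each call's result left by the augend bits that follow its segment."""
    seg = Q - 1
    calls = -(-P // seg)                      # ceil(P / (Q - 1))
    memory = 0                                # second intermediate memory
    shifts = []
    for i in range(calls):
        hi = P - i * seg
        lo = max(hi - seg, 0)                 # augend bits after this segment
        mask = (1 << (hi - lo)) - 1
        partial = ((addend >> lo) & mask) + ((augend >> lo) & mask)
        assert partial < (1 << Q)             # the carry bit absorbs overflow
        shifts.append(lo)
        memory += partial << lo               # second shifter + second adder
    return memory, shifts


print(segmented_exponent_add(0b101101, 0b110010111, P=9, Q=8))   # (452, [2, 0])
```

The shifts [2, 0] match the example: the first result is shifted left by 2 bits, the second is not shifted, and 45 + 407 = 452 is recovered exactly.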

The detailed description above covers multiple calls to the multiplier's mantissa processing unit and exponent processing unit. It follows that the control module may include multiple sub-modules, each performing one of the operations involved in multiple calls. These operations include deciding to call the mantissa processing unit multiple times, determining the number of calls, determining the data fed to the mantissa processing unit in each call, checking whether the mantissa bit width matches the bit widths supported by the mantissa processing unit, and adjusting the mantissa inputs. The second control module may likewise include multiple sub-modules that respectively perform the various operations of multiple calls.

The foregoing, with reference to FIGS. 4 to 6, describes in detail the operations that the multiplier of the present disclosure performs to multiply the mantissas of the first and second floating-point numbers during a floating-point operation. To focus on the operation of the mantissa processing unit, FIG. 4 omits other units, such as the exponent processing unit and the sign processing unit, and does not describe them. The multiplier of the present disclosure is described as a whole below with reference to FIG. 7. The preceding description of the mantissa processing unit applies equally to the arrangement shown in FIG. 7.

FIG. 7 is an overall schematic block diagram of a multiplier 700 according to an embodiment of the present disclosure. The positions, presence and connections of the units depicted in the figure are merely exemplary and not restrictive. For example, some of the units may be integrated, while others may be separated, omitted or replaced depending on the application scenario.

In each operation mode, the processing flow of the multiplier can be divided, by way of example, into a first stage and a second stage, as indicated by the dashed line in the figure. In the first stage, the multiplier outputs the final sign bit, an intermediate result for the exponent, and an intermediate result for the mantissa (the latter including, for example, the Booth encoding used for fixed-point multiplication of the input mantissas and the Wallace tree compression described above). In the second stage, the exponent and mantissa are normalized and rounded to produce the final exponent and mantissa.
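
The two-stage split can be illustrated with a small behavioral model of an FP16*FP16 multiplication. This is a sketch under stated assumptions, not the disclosed hardware: both inputs and the result are assumed to be normal FP16 numbers (no zero, subnormal, infinity, NaN, overflow, or underflow handling), rounding is round to nearest with ties to even, and the helper names are hypothetical. Python's struct format "e" is used only to encode and decode IEEE binary16 values.

```python
import struct

FP16_BIAS = 15  # IEEE 754 binary16 exponent bias


def to_bits(x):
    """Encode a Python float as an FP16 bit pattern (round to nearest even)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def from_bits(bits):
    """Decode an FP16 bit pattern to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def stage1(a_bits, b_bits):
    """First stage: final sign, intermediate (unbiased) exponent, raw 22-bit mantissa product."""
    sign = ((a_bits ^ b_bits) >> 15) & 1
    exp = ((a_bits >> 10) & 0x1F) - FP16_BIAS + ((b_bits >> 10) & 0x1F) - FP16_BIAS
    mant = (0x400 | (a_bits & 0x3FF)) * (0x400 | (b_bits & 0x3FF))  # hidden 1s restored
    return sign, exp, mant


def stage2(sign, exp, mant):
    """Second stage: normalize the product into [1, 2) and round to nearest even."""
    shift = 10
    if mant >> 21:            # product of two [1, 2) significands lies in [1, 4)
        shift, exp = 11, exp + 1
    kept, rem, half = mant >> shift, mant & ((1 << shift) - 1), 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    if kept >> 11:            # rounding carried out of the 11-bit significand
        kept, exp = kept >> 1, exp + 1
    assert -14 <= exp <= 15, "sketch covers normal results only"
    return (sign << 15) | ((exp + FP16_BIAS) << 10) | (kept & 0x3FF)


def fp16_mul(a, b):
    return from_bits(stage2(*stage1(to_bits(a), to_bits(b))))
```

For instance, fp16_mul(1.5, 2.5) evaluates to 3.75. Because the exact product of two FP16 values fits in a double, the result can be cross-checked against from_bits(to_bits(a * b)).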

As shown in Figure 7, the multiplier of the present disclosure may include a mode selection unit 702 and a normalization processing unit 704. The mode selection unit selects an operation mode according to the input mode signal (in_mode). In one embodiment, the input mode signal corresponds to an operation mode number in Table 2. For example, when the input mode signal indicates operation mode number "1" in Table 2, the multiplier operates in the FP16*FP16 mode; when it indicates operation mode number "3", the multiplier operates in the FP32*FP32 mode. For illustration, Figure 7 shows only four exemplary operation modes: FP16*FP16, BF16*BF16, FP32*FP32, and FP32*BF16. As noted above, however, the multiplier of the present disclosure also supports many other operation modes.

The normalization processing unit may be configured so that, when the first or second floating-point number is a non-zero subnormal (denormalized) number, it normalizes that number according to the operation mode to obtain the corresponding exponent and mantissa, for example by normalizing a floating-point number in the data format indicated by the operation mode in accordance with the IEEE 754 standard.
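
As an illustration of normalizing a subnormal input, the sketch below decodes an FP16 bit pattern into a sign, an unbiased exponent, and an 11-bit significand whose leading 1 sits in bit 10, so that the value is (-1)^sign * significand * 2^(exponent - 10). It assumes FP16 inputs only and leaves zero, infinity, and NaN to separate handling; the function name decode_fp16 is hypothetical.

```python
def decode_fp16(bits):
    """Return (sign, unbiased_exponent, significand) with an explicit leading 1 in bit 10.

    Value = (-1)**sign * significand * 2**(exponent - 10). Zero, Inf and NaN are rejected.
    """
    sign = (bits >> 15) & 1
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0x1F:
        raise ValueError("Inf/NaN not handled in this sketch")
    if exp_field != 0:
        return sign, exp_field - 15, (1 << 10) | frac   # normal number: restore hidden 1
    if frac == 0:
        raise ValueError("zero is handled separately in this sketch")
    # Subnormal: value = frac * 2**-24, i.e. exponent -14 with no hidden 1.
    exponent = -14
    while not (frac >> 10) & 1:   # shift left until bit 10 (the hidden-bit position) is 1
        frac <<= 1
        exponent -= 1
    return sign, exponent, frac
```

The smallest FP16 subnormal, 0x0001, needs ten shifts and decodes to (0, -24, 1024), that is, 2^-24.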

The multiplier further includes a mantissa processing unit that multiplies the mantissa of the first floating-point number by the mantissa of the second floating-point number. To this end, in one or more embodiments, the mantissa processing unit may include a bit extension circuit 706, a Booth encoder 708, a partial product generation circuit 710, a Wallace tree compressor 712, and an adder 714. The bit extension circuit extends the mantissa of at least one of the first and second floating-point numbers, for example by padding its high-order bits with zeros, so that it suits the operation of the Booth encoder. The control circuit can then perform the multiple invocations of the mantissa processing unit described above on the mantissas produced by the bit extension circuit. Since the Booth encoder, partial product generation circuit, Wallace tree compressor, and adder have already been described in detail with reference to Figures 4 to 6, that description applies here and is not repeated.
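
The datapath from zero extension through Booth encoding, partial product generation, compression, and the final adder can be sketched behaviorally as follows. Assumptions: radix-4 Booth encoding of an unsigned multiplier, partial products held in a fixed-width two's-complement word, and a row-level reduction with 3:2 carry-save adders standing in for a bit-level Wallace tree (a common simplification, not necessarily the compressor arrangement of Figures 4 to 6). All function names are hypothetical.

```python
def booth_radix4_digits(y, n):
    """Radix-4 Booth digits (each in -2..2) of an unsigned n-bit multiplier, least significant first."""
    width = n + 1 + ((n + 1) & 1)   # zero-extend: at least one leading 0, even total width
    digits, prev = [], 0            # prev holds bit 2i-1 (implicit 0 below bit 0)
    for i in range(0, width, 2):
        b0, b1 = (y >> i) & 1, (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def csa(a, b, c, mask):
    """3:2 carry-save adder: a + b + c == s + carry (mod 2**W)."""
    s = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry


def booth_wallace_multiply(x, y, n):
    """Multiply unsigned n-bit mantissas x and y via Booth partial products and CSA reduction."""
    W = 2 * n + 4                   # internal width, wide enough for every partial product
    mask = (1 << W) - 1
    # Partial products d*x, shifted by 2i and held in W-bit two's complement.
    rows = [((d * x) << (2 * i)) & mask for i, d in enumerate(booth_radix4_digits(y, n))]
    while len(rows) > 2:            # compress groups of three rows into two until two remain
        nxt = []
        for k in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2], mask))
        nxt.extend(rows[len(rows) - len(rows) % 3:])
        rows = nxt
    return sum(rows) & mask         # final carry-propagate adder
```

Radix-4 encoding halves the number of partial products (six rows for an 11-bit FP16 significand), which is why the zero extension to an even width matters.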

In some embodiments, the multiplier of the present disclosure further includes a regularization unit 716 and a rounding unit 718, which have the same functions as the corresponding units shown in Figure 3. Specifically, the regularization unit performs floating-point normalization on the sum from the adder and on the exponent data from the exponent processing unit, according to the data format indicated by the output mode signal "out_mode" shown in Figure 7, to obtain a normalized exponent and a normalized mantissa. For example, the regularization unit can adjust the bit widths of the exponent and the mantissa to meet the requirements of the indicated data format. As another example, when the most significant bit of the mantissa is 0 and the mantissa is non-zero, the regularization unit repeatedly shifts the mantissa left by 1 bit and decrements the exponent by 1 until the most significant bit is 1. In one embodiment, the rounding unit rounds the normalized mantissa according to a rounding mode and uses the rounded mantissa as the mantissa of the product.
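
The shift-and-decrement loop, followed by a split to the output mantissa width, might look like the sketch below. It assumes the adder output is an unsigned `width`-bit mantissa and that the exponent is adjusted one step per shift, as described above; the bits beyond the output width are returned separately so a rounding unit can consume them. The function name and return layout are hypothetical.

```python
def regularize(mantissa, exponent, width, out_width):
    """Normalize a `width`-bit mantissa so its MSB is 1, then split it for an `out_width`-bit format.

    Returns (kept, dropped, drop_bits, exponent); `dropped` is left for the rounding unit.
    """
    drop_bits = width - out_width
    if mantissa == 0:
        return 0, 0, drop_bits, exponent   # zero is not normalized in this sketch
    assert mantissa < (1 << width) and out_width <= width
    while not (mantissa >> (width - 1)) & 1:
        mantissa <<= 1      # shift left by 1 bit ...
        exponent -= 1       # ... and decrement the exponent by 1
    kept = mantissa >> drop_bits
    dropped = mantissa & ((1 << drop_bits) - 1)
    return kept, dropped, drop_bits, exponent
```

For example, regularize(0b00110101, 10, 8, 4) shifts twice and returns (0b1101, 0b0100, 4, 8).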

In one or more embodiments, the output mode signal may be part of the operation mode and indicates the data format of the product. For example, as described in Table 3 above, for operation mode number "12", the digit "1" corresponds to the in_mode signal and selects an FP16*FP16 multiplication, while the digit "2" corresponds to the out_mode signal and indicates that the result is in BF16 format. Thus, in some application scenarios, the output mode signal can be merged with the input mode signal and supplied to the mode selection unit. From the merged mode signal, the mode selection unit knows the data formats of both the inputs and the result at the start of the multiplication, so the output mode signal need not be supplied separately to the regularization unit, which further simplifies operation.
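
A decoder for the merged mode code could be sketched as follows. Only the table entries actually stated in the text are filled in (in_mode 1 = FP16*FP16, in_mode 3 = FP32*FP32, out_mode 2 = BF16); the full contents of Tables 2 and 3 are not reproduced, and representing the code as a one- or two-digit string, with None meaning "no separate output format given", is a hypothetical choice.

```python
# Hypothetical, partial lookup tables: only the entries stated in the text are filled in.
IN_MODES = {1: ("FP16", "FP16"), 3: ("FP32", "FP32")}
OUT_MODES = {2: "BF16"}


def decode_mode(code):
    """Split an operation-mode string such as "12" into (input formats, output format).

    The first digit plays the role of in_mode; a second digit, if present, plays the role of out_mode.
    """
    if not code.isdigit() or len(code) not in (1, 2):
        raise ValueError(f"unsupported mode code: {code!r}")
    in_formats = IN_MODES[int(code[0])]
    out_format = OUT_MODES[int(code[1])] if len(code) == 2 else None
    return in_formats, out_format
```

With this encoding, decode_mode("12") yields (("FP16", "FP16"), "BF16"), so both the input and output formats are known before the multiplication starts.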

In one or more embodiments, the rounding operation may support, for example, the following five rounding modes.

(1) Round to nearest, ties to even: the result is rounded to the nearest representable value; when two representable values are equally close, the even one (the one whose least significant binary digit is 0) is chosen;

(2) Round half up: see the example below;

(3) Round toward +∞: the result is rounded in the direction of positive infinity;

(4) Round toward -∞: the result is rounded in the direction of negative infinity; and

(5) Round toward 0: the result is rounded in the direction of zero.
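
The five modes can be compared on a sign-magnitude mantissa from which `drop` low-order bits must be discarded. In this sketch, "round half up" is taken to mean rounding ties up in magnitude, consistent with the example that follows; carry-out into a new leading bit is left to the caller. The enum and function names are hypothetical.

```python
from enum import Enum


class Rounding(Enum):
    NEAREST_EVEN = 1     # (1) round to nearest, ties to even
    HALF_UP = 2          # (2) round half up (ties rounded up in magnitude)
    TOWARD_POS_INF = 3   # (3) round toward +inf
    TOWARD_NEG_INF = 4   # (4) round toward -inf
    TOWARD_ZERO = 5      # (5) round toward 0 (truncate the magnitude)


def round_magnitude(sign, mag, drop, mode):
    """Discard the `drop` low bits of a sign-magnitude mantissa according to `mode`."""
    if drop == 0:
        return mag
    kept = mag >> drop
    rem = mag & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if mode is Rounding.NEAREST_EVEN:
        up = rem > half or (rem == half and kept & 1)
    elif mode is Rounding.HALF_UP:
        up = rem >= half
    elif mode is Rounding.TOWARD_POS_INF:
        up = rem != 0 and sign == 0    # positive values grow, negative values truncate
    elif mode is Rounding.TOWARD_NEG_INF:
        up = rem != 0 and sign == 1    # negative values grow in magnitude
    else:
        up = False
    return kept + int(up)
```

For a tie such as 0b1010 with two bits dropped, round-to-nearest-even gives 0b10 while round-half-up gives 0b11.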

As an example of mantissa rounding in round-half-up mode: multiplying the 24-bit mantissas of two normalized floating-point numbers yields a 48-bit mantissa (bits 47 to 0). This mantissa is then normalized: if its most significant bit is 0, the mantissa is shifted left by 1 bit; if its most significant bit is 1, the mantissa is left unchanged and the provisional exponent computed earlier is incremented by 1. Only bits 46 to 24 are output. If bit 23 of the mantissa is 0, bits 23 to 0 are discarded; if bit 23 is 1, a 1 is added at bit 24 and bits 23 to 0 are discarded.
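
The 48-bit example above translates almost line for line into the following sketch. It assumes both inputs are 24-bit significands with the hidden 1 in bit 23 and that `exp` is the provisional exponent. The handling of a carry out of the 23-bit fraction after rounding is an addition not described in the text, included so the sketch returns a well-formed result; the function name is hypothetical.

```python
def round_fp32_product(ma, mb, exp):
    """Round-half-up rounding of the 48-bit product of two 24-bit significands.

    ma, mb include the hidden leading 1 (bit 23). Returns (23-bit fraction, adjusted exponent).
    """
    p = ma * mb                              # 48-bit product, bits 47..0
    if (p >> 47) & 1:
        exp += 1                             # MSB is 1: keep the mantissa, bump the exponent
    else:
        p <<= 1                              # MSB is 0: shift the mantissa left by 1 bit
    frac = (p >> 24) & ((1 << 23) - 1)       # output bits 46..24
    if (p >> 23) & 1:                        # bit 23 is 1: add 1 at bit 24, drop bits 23..0
        frac += 1
        if frac >> 23:                       # carry out (not covered by the text): 1.11..1 + ulp
            frac, exp = 0, exp + 1
    return frac, exp
```

For example, 1.5 x 1.5 (both significands equal to 3 << 22) returns a fraction of 0x100000 with the exponent incremented by 1, that is, 1.125 x 2^1 = 2.25.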

Returning to Figure 7, the multiplier of the present disclosure further includes an exponent processing unit 720 and a sign processing unit 722. The exponent processing unit obtains the exponent of the product from the operation mode, the exponent of the first floating-point number, and the exponent of the second floating-point number. For example, the exponent processing circuit may add the exponent field of the first floating-point number, the exponent field of the second floating-point number, and the offsets associated with their respective input floating-point data types, and then subtract the offset associated with the output floating-point data type, to obtain the exponent field of the product of the first and second floating-point numbers. In one or more embodiments, the exponent processing unit may be implemented as, or include, an adder/subtractor circuit that obtains the exponent of the product from the operation mode, the exponent of the first floating-point number, and the exponent of the second floating-point number.
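
Under the IEEE 754 convention, a stored exponent field E relates to the true exponent e by E = e + bias, so the product's stored exponent before any normalization adjustment is E1 - bias1 + E2 - bias2 + bias_out. This agrees with the wording above if each "offset" is read as the negated IEEE bias; that reading is an assumption of this sketch. The bias values below are the standard ones for the three formats named in the text, and the function name is hypothetical.

```python
# IEEE 754 exponent biases for the formats named in the text.
EXP_BIAS = {"FP16": 15, "BF16": 127, "FP32": 127}


def product_exponent(e1, fmt1, e2, fmt2, fmt_out):
    """Biased exponent field of the product in fmt_out, before the normalization adjustment.

    Matches "add the input offsets, subtract the output offset" if each offset is taken
    to be the negated IEEE bias (an assumption of this sketch).
    """
    return (e1 - EXP_BIAS[fmt1]) + (e2 - EXP_BIAS[fmt2]) + EXP_BIAS[fmt_out]
```

For an FP16 1.0 (field 15) times an FP16 2.0 (field 16) with an FP32 result, the field is 128, which encodes 2^1 in FP32. A later left shift or +1 adjustment by the regularization unit still applies on top of this value.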

In one embodiment, the sign processing unit may be implemented as an exclusive-OR (XOR) circuit that XORs the sign bits of the first and second floating-point numbers to obtain the sign bit of their product.
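
The sign path is a single XOR of the two sign bits. The sketch below only adds a lookup of where the sign bit sits in each format (bit 15 for FP16 and BF16, bit 31 for FP32) so that mixed-format operands can be handled; the table and function names are hypothetical.

```python
# Position of the sign bit in each format named in the text.
SIGN_BIT = {"FP16": 15, "BF16": 15, "FP32": 31}


def product_sign(a_bits, fmt_a, b_bits, fmt_b):
    """Sign bit of a product: the XOR of the two operands' sign bits."""
    return ((a_bits >> SIGN_BIT[fmt_a]) & 1) ^ ((b_bits >> SIGN_BIT[fmt_b]) & 1)
```

For example, FP16 -1.0 (0xBC00) times FP32 1.0 (0x3F800000) yields sign bit 1.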

The multiplier of the present disclosure as a whole has been described in detail above with reference to Figure 7. From this description, those of ordinary skill in the art will understand that the multiplier supports operation in multiple operation modes, overcoming the limitation of prior-art multipliers that support only a single floating-point format. Furthermore, because the multiplier can be reused through repeated invocation, it also supports floating-point data with wide bit widths, reducing computational cost and overhead. In one or more embodiments, the multiplier may also be arranged in or included in an integrated circuit chip or a computing device to perform floating-point multiplication in multiple operation modes.

Figure 8 is a flowchart of a method 800 for performing floating-point multiplication using a multiplier according to an embodiment of the present disclosure. The multiplier referred to here is the one described in detail with reference to Figures 1 to 7, so the earlier descriptions of the multiplier and its internal composition, functions, and operations apply here as well.

As shown in Figure 8, at step S802 the method 800 may use the exponent processing unit of the multiplier to obtain the exponent of the product from the operation mode, the exponent of the first floating-point number, and the exponent of the second floating-point number. As described above, the operation mode may be one of multiple operation modes and may indicate the data format of the floating-point numbers. In one or more embodiments, the operation mode may also determine the data format of the floating-point result.

Next, at step S804, the method 800 may use the mantissa processing unit of the multiplier to obtain the mantissa of the product from the operation mode, the first floating-point number, and the second floating-point number. For the mantissa, some preferred embodiments of the present disclosure use Booth encoding and a Wallace tree compressor to improve the efficiency of mantissa processing. In addition, when the first and second floating-point numbers are signed, the method 800 may, at step S806, obtain the sign of the product from the signs of the first and second floating-point numbers.

Although the method above is presented as a sequence of steps that use the multiplier of the present disclosure to perform floating-point multiplication, this order does not require the steps to be executed as listed; they may be performed in a different order or in parallel. For brevity, other steps of the method 800 are not described here, but those of ordinary skill in the art will understand from this disclosure that the method can also use the multiplier to perform the various operations described with reference to Figures 1 to 7.

In the above embodiments of the present disclosure, each description has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.

Figure 9 is a structural diagram of a combined processing device 900 according to an embodiment of the present disclosure. As shown, the combined processing device 900 includes a computing device 902, which may include the multiplier of the present disclosure described above with reference to the drawings. The combined processing device further includes a universal interconnection interface 904 and other processing devices 906. The computing device interacts with the other processing devices to jointly complete user-specified operations.

According to the present disclosure, the other processing devices may include one or more types of general-purpose and/or special-purpose processors, such as a central processing unit ("CPU"), a graphics processing unit ("GPU"), or a neural network processor; their number is not limited and is determined by actual needs. In one or more embodiments, the other processing devices may serve as the interface between the computing device of the present disclosure (which may be embodied as a machine learning computing device) and external data and control, performing tasks including but not limited to data transfer and basic control such as starting and stopping the machine learning computing device. The other processing devices may also cooperate with the machine learning computing device to complete computing tasks.

According to the present disclosure, the universal interconnection interface may be used to transfer data and control instructions between the computing device and the other processing devices. For example, the computing device may obtain required input data from the other processing devices via the universal interconnection interface and write it to the on-chip storage of the computing device. Further, the computing device may obtain control instructions from the other processing devices via the universal interconnection interface and write them into the on-chip control cache of the computing device. Alternatively or additionally, the universal interconnection interface may read data from the storage module of the computing device and transfer it to the other processing devices.

Optionally, the combined processing device may further include a storage device 908 connected to both the computing device and the other processing devices. In one or more embodiments, the storage device may store data for the computing device and the other processing devices, and is particularly useful when the data to be processed cannot be held entirely in the internal storage of the computing device or the other processing devices.

Depending on the application scenario, the combined processing device of the present disclosure may serve as a system on chip for devices such as mobile phones, robots, drones, video capture equipment, and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as a monitor, display, mouse, keyboard, network card, or Wi-Fi interface.

In some embodiments, the present disclosure further discloses a chip, or integrated circuit chip, that includes the above computing device, the combined processing device, and the multiplier of the present disclosure. In other embodiments, the present disclosure further discloses a chip package structure that includes the above chip.

In some embodiments, the present disclosure further discloses a board card that includes the above chip package structure. Referring to Figure 10, which shows the exemplary board card 1000, the board card 1000 may include, in addition to the chip 1002, other supporting components, including but not limited to a storage device 1004, an interface device 1006, and a control device 1008.

The storage device is connected to the chip in the chip package structure via a bus and is used to store data. The storage device may include multiple groups of storage units 1010, each connected to the chip via a bus. Each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the data rate of SDRAM without raising the clock frequency, because it transfers data on both the rising and falling edges of the clock; its data rate is therefore twice that of standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include multiple DDR4 memory chips. In one embodiment, the chip may contain four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking.

In one embodiment, each group of storage units may include multiple DDR SDRAM devices arranged in parallel. DDR can transfer data twice per clock cycle. A DDR controller is provided in the chip to control data transfer to, and data storage in, each storage unit.

The interface device is electrically connected to the chip in the chip package structure and implements data transfer between the chip and an external device 1012 (for example, a server or computer). In one embodiment, the interface device may be a standard PCIe interface; for example, data to be processed is transferred from the server to the chip through the standard PCIe interface. In another embodiment, the interface device may be another type of interface; the present disclosure does not limit its specific form, provided that the interface unit can perform the relay function. In addition, the computation results of the chip are sent back to the external device (for example, a server) through the interface device.

The control device is electrically connected to the chip to monitor the chip's state. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit ("MCU"). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads, so it may operate in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, processing cores, and/or processing circuits in the chip.

In some embodiments, the present disclosure further discloses an electronic device or apparatus that includes the above board card. Depending on the application scenario, the electronic device or apparatus may include a data processing device, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, dashboard camera, navigator, sensor, monitor, server, cloud server, camera, video camera, projector, watch, headset, portable storage, wearable device, vehicle, household appliance, and/or medical device. The vehicles include aircraft, ships, and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; and the medical devices include nuclear magnetic resonance scanners, B-mode ultrasound scanners, and/or electrocardiographs.

It should be noted that, for simplicity, each of the foregoing method embodiments is expressed as a series of combined actions, but those of ordinary skill in the art should understand that the present disclosure is not limited by the described order of actions, since according to the present disclosure certain steps may be performed in other orders or simultaneously. Those of ordinary skill in the art should also understand that the embodiments described in the specification are optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

In the above embodiments, each description has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, optical, acoustic, magnetic, or of other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware or as a software program module.

If the integrated unit is implemented as a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. On this understanding, the technical solution of the present disclosure may be embodied as a software product stored in a memory and including instructions that cause a computer device (such as a personal computer, server, or network device) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The memory includes any medium that can store program code, such as a USB flash drive, read-only memory ("ROM"), random access memory ("RAM"), removable hard disk, magnetic disk, or optical disc.

The foregoing can be better understood in light of the following clauses:

Clause A1. A multiplier for performing multiplication of floating-point numbers, wherein the multiplier includes:

a mantissa processing unit configured to obtain the mantissa of the product from the mantissas of the floating-point numbers,

wherein the mantissa processing unit includes a control circuit configured to invoke the mantissa processing unit multiple times when the mantissa bit width of at least one of the two floating-point numbers is greater than the data bit width that the mantissa processing unit can process at one time.

Clause A2. The multiplier according to Clause A1, wherein the two floating-point numbers include a first floating-point number and a second floating-point number; the mantissa processing unit supports a first bit width and a second bit width; the mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width; the bit width of the first input is less than or equal to the first bit width; and the control circuit is configured to invoke the mantissa processing unit multiple times to obtain the mantissa of the product when the bit width of the second input is greater than the second bit width.

Clause A3. The multiplier of clause A1 or A2, wherein the two floating-point numbers comprise a first floating-point number and a second floating-point number; the mantissa processing unit supports a first bit width and a second bit width; the mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width; and the control circuit is configured to invoke the mantissa processing unit multiple times to obtain the mantissa of the product in any of the following cases: the bit width of the first input is greater than the first bit width and the bit width of the second input is less than or equal to the second bit width; the bit width of the second input is greater than the second bit width and the bit width of the first input is less than or equal to the first bit width; or the bit width of the first input is greater than the first bit width and the bit width of the second input is greater than the second bit width.

Clause A4. The multiplier of any one of clauses A1 to A3, wherein, when the mantissa bit width of the first floating-point number is smaller than that of the second floating-point number and the first bit width is greater than the second bit width, or when the mantissa bit width of the first floating-point number is greater than that of the second floating-point number and the first bit width is smaller than the second bit width, the control circuit selects the mantissa of the first floating-point number as the second input corresponding to the second bit width and selects the mantissa of the second floating-point number as the first input corresponding to the first bit width.
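In effect, the selection rule of clause A4 routes the wider mantissa to the wider input port. The Python sketch below illustrates that rule only; the function name `assign_inputs` and the (value, width) tuple representation are illustrative assumptions, not part of the disclosure.

```python
def assign_inputs(mant_a: int, width_a: int, mant_b: int, width_b: int,
                  port1_width: int, port2_width: int):
    """Return ((first_input, width), (second_input, width)) following clause A4.

    Hypothetical helper: the first input feeds the port that is port1_width bits
    wide, the second input feeds the port that is port2_width bits wide.
    """
    swap = ((width_a < width_b and port1_width > port2_width) or
            (width_a > width_b and port1_width < port2_width))
    if swap:
        # Exchange operands so the wider mantissa lands on the wider port.
        return (mant_b, width_b), (mant_a, width_a)
    return (mant_a, width_a), (mant_b, width_b)


# FP16 significand (11 bits) times FP32 significand (24 bits) on an assumed 24 x 12 unit.
first, second = assign_inputs(0x400, 11, 0xC00000, 24, port1_width=24, port2_width=12)
```

Here the 11-bit operand was listed first, so the rule swaps the two and the 24-bit significand is sent to the 24-bit port.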

Clause A5. The multiplier of any one of clauses A1 to A4, wherein, when the bit width of the first input is greater than the first bit width and the bit width of the second input is less than or equal to the second bit width, the control circuit determines, from the bit width of the first input and the first bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.

Clause A6. The multiplier of any one of clauses A1 to A5, wherein, when the bit width of the second input is greater than the second bit width and the bit width of the first input is less than or equal to the first bit width, the control circuit determines, from the bit width of the second input and the second bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.

Clause A7. The multiplier of any one of clauses A1 to A6, wherein, when the bit width of the first input is greater than the first bit width and the bit width of the second input is greater than the second bit width, the control circuit determines, from the bit width of the first input and the first bit width together with the bit width of the second input and the second bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.
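One way to read clauses A5 to A7 is that each oversized input is cut into slices no wider than its port, and every pair of slices becomes one invocation. The sketch below assumes that slicing scheme (the clauses do not fix it) and uses the hypothetical names `split_operand` and `plan_invocations`. The shift attached to each invocation is the amount a shift-and-add circuit such as the one in clauses A8 and A9 would later apply.

```python
import math


def split_operand(value: int, width: int, chunk_width: int):
    """Split a width-bit operand into slices of at most chunk_width bits.

    Returns (slice, bit_offset) pairs, least significant slice first.
    """
    count = max(1, math.ceil(width / chunk_width))
    mask = (1 << chunk_width) - 1
    return [((value >> (i * chunk_width)) & mask, i * chunk_width) for i in range(count)]


def plan_invocations(in1: int, w1: int, p1: int, in2: int, w2: int, p2: int):
    """List (first_data, second_data, shift) for each invocation of the mantissa unit.

    p1 and p2 are the first and second bit widths the unit supports. An input that
    already fits its port yields a single slice, so the number of invocations is
    ceil(w1 / p1) * ceil(w2 / p2), which covers the cases of clauses A5 to A7.
    """
    return [(c1, c2, s1 + s2)
            for c1, s1 in split_operand(in1, w1, p1)
            for c2, s2 in split_operand(in2, w2, p2)]


# A 24-bit input on a 12-bit port and an 11-bit input on a 12-bit port: clause A5 case.
calls = plan_invocations(0xFFFFFF, 24, 12, 0x7FF, 11, 12)
```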

Clause A8. The multiplier of any one of clauses A1 to A7, wherein the mantissa processing unit further comprises a shift-and-add circuit configured to obtain the mantissa of the product from the mantissa results obtained in each invocation of the mantissa processing unit.

Clause A9. The multiplier of any one of clauses A1 to A8, wherein the shift-and-add circuit comprises a shifter, an intermediate memory, and an adder. When the control circuit invokes the mantissa processing unit multiple times: after the first invocation, the shifter shifts the mantissa result of the first invocation and stores the shifted mantissa result in the intermediate memory; from the second invocation onward, the shifter shifts the mantissa result of the current invocation to obtain the current mantissa result, and the adder adds the current mantissa result to the result stored in the intermediate memory and writes the sum back to the intermediate memory to update it; and the result held in the intermediate memory after the last invocation serves as the mantissa of the product.
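The shift-and-accumulate loop of clause A9 can be sketched in Python as follows. `narrow_multiply` stands in for one pass of a hypothetical p1 x p2 mantissa unit, and the variable `intermediate` models the intermediate memory. The slicing order shown is an assumption; the clause does not specify one.

```python
def narrow_multiply(x: int, y: int, p1: int, p2: int) -> int:
    """Stand-in for one invocation of a p1 x p2 mantissa unit (hypothetical)."""
    assert x < (1 << p1) and y < (1 << p2), "operand exceeds the unit's port width"
    return x * y


def multiply_with_shift_add(a: int, wa: int, b: int, wb: int, p1: int, p2: int) -> int:
    """Multiply two wide mantissas by repeatedly invoking a narrow unit.

    The first shifted result initializes `intermediate` (the intermediate memory);
    each later shifted result is added to it by the adder.
    """
    intermediate = None
    for i in range(-(-wa // p1)):                      # ceil(wa / p1) slices of a
        a_slice = (a >> (i * p1)) & ((1 << p1) - 1)
        for j in range(-(-wb // p2)):                  # ceil(wb / p2) slices of b
            b_slice = (b >> (j * p2)) & ((1 << p2) - 1)
            # Shifter: move this invocation's result to its bit position.
            shifted = narrow_multiply(a_slice, b_slice, p1, p2) << (i * p1 + j * p2)
            # Adder: accumulate into the intermediate memory.
            intermediate = shifted if intermediate is None else intermediate + shifted
    return intermediate


# Two FP64 significands (53 bits each, hidden bit included) on an assumed 27 x 27 unit.
a = (1 << 52) | 0x123456789ABCD
b = (1 << 52) | 0xFEDCBA9876543
product = multiply_with_shift_add(a, 53, b, 53, 27, 27)  # four invocations
```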

Clause A10. The multiplier of any one of clauses A1 to A9, wherein the multiplier further comprises an exponent processing unit configured to obtain the exponent of the product from the exponents of the two floating-point numbers; the exponent processing unit comprises a second control circuit configured to determine that the exponent processing unit is to be invoked multiple times to obtain the exponent of the product, based either on the exponent bit width of one of the two floating-point numbers and one of the two bit widths supported by the exponent processing unit, or on the exponent bit widths of both floating-point numbers and both bit widths supported by the exponent processing unit.

Clause A11. The multiplier of any one of clauses A1 to A10, wherein the two floating-point numbers comprise a first floating-point number and a second floating-point number; the exponent processing unit supports a third bit width and a fourth bit width; the exponent of the first floating-point number serves as a third input corresponding to the third bit width, and the exponent of the second floating-point number serves as a fourth input corresponding to the fourth bit width; the bit width of the third input is less than or equal to the third bit width; and the second control circuit is configured to invoke the exponent processing unit multiple times to obtain the exponent of the product when the bit width of the fourth input is greater than the fourth bit width.

Clause A12. The multiplier of any one of clauses A1 to A11, wherein the two floating-point numbers comprise a first floating-point number and a second floating-point number; the exponent processing unit supports a third bit width and a fourth bit width; the exponent of the first floating-point number serves as a third input corresponding to the third bit width, and the exponent of the second floating-point number serves as a fourth input corresponding to the fourth bit width; and the second control circuit is configured to invoke the exponent processing unit multiple times to obtain the exponent of the product in any of the following cases: the bit width of the third input is greater than the third bit width and the bit width of the fourth input is less than or equal to the fourth bit width; the bit width of the fourth input is greater than the fourth bit width and the bit width of the third input is less than or equal to the third bit width; or the bit width of the third input is greater than the third bit width and the bit width of the fourth input is greater than the fourth bit width.

Clause A13. The multiplier of any one of clauses A1 to A12, wherein, when the exponent bit width of the first floating-point number is smaller than that of the second floating-point number and the third bit width is greater than the fourth bit width, or when the exponent bit width of the first floating-point number is greater than that of the second floating-point number and the third bit width is smaller than the fourth bit width, the second control circuit selects the exponent of the first floating-point number as the fourth input corresponding to the fourth bit width and selects the exponent of the second floating-point number as the third input corresponding to the third bit width.

Clause A14. The multiplier of any one of clauses A1 to A13, wherein, when the bit width of the third input is less than or equal to the bit width of the fourth input and the third bit width is less than or equal to the fourth bit width, the second control circuit determines, from the bit width of the fourth input and the third bit width, the number of times the exponent processing unit is invoked and the data input to the exponent processing unit in each invocation.

Clause A15. The multiplier of any one of clauses A1 to A14, wherein the exponent processing unit further comprises a second shift-and-add circuit configured to obtain the exponent of the product from the exponent results obtained in each invocation of the exponent processing unit.
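Clauses A10 to A15 leave the exponent arithmetic itself unspecified. Assuming IEEE 754 style biased exponents, the exponent field of the product before regularization is ea + eb - bias. The sketch below performs the addition on a hypothetical adder only `adder_width` bits wide. It invokes that adder once per slice, shifts each slice sum into place, and passes the carry to the next invocation. This is one plausible reading of the second shift-and-add circuit, not the disclosed design, and the function names are illustrative.

```python
def add_exponents_narrow(ea: int, eb: int, width: int, adder_width: int) -> int:
    """Add two unsigned width-bit exponent fields using an adder_width-bit adder.

    Each loop iteration is one invocation of the narrow adder on one slice; the
    slice sum is shifted into position and its carry-out feeds the next invocation.
    """
    mask = (1 << adder_width) - 1
    result, carry, shift = 0, 0, 0
    while shift < width:
        s = ((ea >> shift) & mask) + ((eb >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> adder_width
        shift += adder_width
    return result | (carry << shift)  # the final carry becomes the top bit


def product_exponent(ea: int, eb: int, width: int, adder_width: int, bias: int) -> int:
    """Biased exponent of the product before regularization (assumes an IEEE-style bias)."""
    return add_exponents_narrow(ea, eb, width, adder_width) - bias


# FP32 fields for 2**3 and 2**-5 added on an assumed 4-bit adder: two invocations.
e = product_exponent(127 + 3, 127 - 5, 8, 4, 127)  # field for 2**-2
```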

Clause A16. The multiplier of any one of clauses A1 to A15, wherein the mantissa processing unit comprises a partial product operation unit and a partial product summation unit; the partial product operation unit is configured to obtain intermediate results from the mantissas of the two floating-point numbers, and the partial product summation unit is configured to sum the intermediate results to obtain a summation result and to use the summation result as the mantissa of the product.

Clause A17. The multiplier of any one of clauses A1 to A16, wherein the partial product operation unit comprises a Booth encoding circuit configured to Booth-encode the mantissa of the first floating-point number or of the second floating-point number to obtain the intermediate results.
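Clause A17 does not say which Booth variant is used. The sketch below assumes radix-4 (modified) Booth encoding of an unsigned mantissa: each overlapping 3-bit group of the multiplier yields a digit in {-2, -1, 0, 1, 2}, so roughly half as many partial products are produced as with one row per multiplier bit. The function names are illustrative.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an unsigned n-bit multiplier y, least significant first.

    Digit i examines bits y[2i+1], y[2i], y[2i-1] (with y[-1] = 0) and equals
    -2*y[2i+1] + y[2i] + y[2i-1]. Using n // 2 + 1 digits guarantees the top
    group sees a zero bit, so y is treated as unsigned.
    """
    y_ext = y << 1  # append the implicit y[-1] = 0
    digits = []
    for i in range(n // 2 + 1):
        group = (y_ext >> (2 * i)) & 0b111
        low, mid, high = group & 1, (group >> 1) & 1, (group >> 2) & 1
        digits.append(-2 * high + mid + low)
    return digits


def booth_partial_products(x: int, y: int, n: int) -> list[int]:
    """Partial products d_i * x * 4**i; their sum equals x * y."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, n))]


# An 11-bit (FP16-sized) significand pair: 6 partial products instead of 11.
pps = booth_partial_products(0b10110101101, 0b10110100101, 11)
```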

Clause A18. The multiplier of any one of clauses A1 to A17, wherein the partial product summation unit comprises an adder configured to sum the intermediate results to obtain the summation result.

Clause A19. The multiplier of any one of clauses A1 to A18, wherein the partial product summation unit comprises Wallace trees and an adder; the Wallace trees are configured to sum the intermediate results to obtain second intermediate results, and the adder is configured to sum the second intermediate results to obtain the summation result.
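A Wallace tree repeatedly applies 3:2 compressors (carry-save adders) until only two operands remain, which a single carry-propagate adder then sums, matching the two-stage structure of clause A19. The word-level Python sketch below models that reduction. Real hardware works bit column by bit column, and the names here are illustrative.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor applied to whole words: a + b + c == sum_bits + carry_bits."""
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1  # majority, moved one bit up
    return sum_bits, carry_bits


def wallace_reduce(operands: list[int]) -> list[int]:
    """Reduce operands with layers of 3:2 compressors until at most two remain."""
    ops = list(operands)
    while len(ops) > 2:
        leftover = len(ops) % 3
        next_layer = []
        for i in range(0, len(ops) - leftover, 3):
            next_layer.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        next_layer.extend(ops[len(ops) - leftover:])  # 0 to 2 operands pass through
        ops = next_layer
    return ops


partial_products = [13, 7, 255, 1024, 3, 99, 500]
reduced = wallace_reduce(partial_products)  # the "second intermediate results"
final = sum(reduced)                        # the final carry-propagate adder
```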

Clause A20. The multiplier of any one of clauses A1 to A19, wherein the adder comprises at least one of a full adder, a serial adder, and a carry-lookahead adder.
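A serial (ripple-carry) adder passes its carry bit by bit, whereas a carry-lookahead adder derives every carry directly from generate (g = a AND b) and propagate (p = a XOR b) signals. The bit-level sketch below computes each carry from g and p alone, which is what lets hardware evaluate all carries in parallel. It illustrates the technique named in clause A20 and is not the adder design of the disclosure.

```python
def carry_lookahead_add(a: int, b: int, width: int) -> int:
    """Add two unsigned width-bit integers with carry-lookahead logic (bit-level model)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate signals
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate signals

    def carry_into(i: int) -> int:
        # c_i = OR over j < i of (g_j AND p_{j+1} AND ... AND p_{i-1}); carry-in is 0.
        # Each carry depends only on g and p, never on a previously computed carry.
        return int(any(g[j] and all(p[k] for k in range(j + 1, i)) for j in range(i)))

    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i)) << i
    return total | (carry_into(width) << width)  # carry-out becomes the top bit
```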

Clause A21. The multiplier of any one of clauses A1 to A20, wherein, when there are fewer than M intermediate results, zero values are added as intermediate results so that the number of intermediate results equals M, where M is a preset positive integer.

Clause A22. The multiplier of any one of clauses A1 to A21, wherein each Wallace tree has M inputs and N outputs, and the number of Wallace trees is not less than K, where N is a preset positive integer less than M, and K is a positive integer not less than the maximum bit width of the intermediate results.

Clause A23. The multiplier of any one of clauses A1 to A22, wherein the partial product summation unit is configured to select one or more groups of the Wallace trees to sum the intermediate results, each group containing X Wallace trees, where X is the number of bits of the intermediate results; the Wallace trees within a group pass carries from one to the next in sequence, whereas no carries pass between Wallace trees of different groups.
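Clauses A21 to A23 can be pictured as follows: the intermediate results are padded with zeros up to M rows, and each bit column of that M-row array supplies the M inputs of one Wallace tree, which is why at least as many trees are needed as the intermediate results have bits. The sketch below builds that column view; `pad_to_m` and `bit_columns` are hypothetical names, and carries between trees are not modeled.

```python
def pad_to_m(partial_products: list[int], m: int) -> list[int]:
    """Append zero-valued intermediate results until there are exactly m of them."""
    if len(partial_products) > m:
        raise ValueError("more intermediate results than tree inputs")
    return partial_products + [0] * (m - len(partial_products))


def bit_columns(rows: list[int], width: int) -> list[list[int]]:
    """Column j lists bit j of every row: the M inputs of the j-th Wallace tree."""
    return [[(row >> j) & 1 for row in rows] for j in range(width)]


rows = pad_to_m([0b101, 0b011], 4)  # M = 4 tree inputs
cols = bit_columns(rows, 3)         # K = 3 trees, one per bit position
```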

Clause A24. The multiplier of any one of clauses A1 to A23, wherein the multiplier further comprises:

a normalization processing unit configured to, when at least one of the two floating-point numbers is a denormalized non-zero floating-point number, normalize that floating-point number to obtain the corresponding exponent and mantissa.
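Under IEEE 754 conventions, a denormalized (subnormal) number has an all-zero exponent field and no hidden 1, so its value is frac * 2**(1 - bias - frac_bits). The sketch below shows one way to normalize it as clause A24 describes: shift the fraction left until its leading 1 reaches the hidden-bit position, and lower the exponent by the same amount. The function name and the convention of returning a possibly non-positive exponent field are assumptions.

```python
def normalize_subnormal(frac: int, frac_bits: int) -> tuple[int, int]:
    """Normalize a non-zero subnormal fraction field.

    Returns (exp_field, significand) with significand in
    [2**frac_bits, 2**(frac_bits + 1)), such that
    significand * 2**(exp_field - bias - frac_bits) equals the subnormal value.
    The returned exponent field may be zero or negative.
    """
    if frac == 0:
        raise ValueError("zero cannot be normalized")
    shift = frac_bits - frac.bit_length() + 1  # moves the leading 1 to bit frac_bits
    return 1 - shift, frac << shift


# Smallest FP16 subnormal (bias 15, 10 fraction bits): value 2**-24.
e, sig = normalize_subnormal(1, 10)
```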

Clause A25. The multiplier of any one of clauses A1 to A24, wherein the multiplier is configured to multiply the two floating-point numbers according to an operation mode that indicates the data formats of the two floating-point numbers; the mantissa processing unit is configured to obtain the mantissa of the product according to the operation mode and the mantissas of the two floating-point numbers, and the exponent processing unit is configured to obtain the exponent of the product according to the operation mode and the exponents of the two floating-point numbers.

Clause A26. The multiplier of any one of clauses A1 to A25, wherein the normalization processing unit is further configured to normalize at least one of the two floating-point numbers according to the operation mode to obtain the corresponding exponent and mantissa.

Clause A27. The multiplier of any one of clauses A1 to A26, wherein the data formats include at least one of half-precision floating point, single-precision floating point, brain floating point (bfloat16), double-precision floating point, and custom floating point.
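The operation mode of clauses A25 to A27 essentially selects a field layout. The table below records the standard layouts of the named formats (a custom format would add its own entry). `decode` splits a raw bit pattern into sign, exponent, and fraction fields accordingly. The dictionary and function names are illustrative; only Python's standard `struct` module is used to obtain real bit patterns.

```python
import struct

# (exponent bits, fraction bits, exponent bias); the sign is always 1 bit.
FORMATS = {
    "fp16": (5, 10, 15),
    "bf16": (8, 7, 127),
    "fp32": (8, 23, 127),
    "fp64": (11, 52, 1023),
}


def decode(bits: int, fmt: str) -> tuple[int, int, int]:
    """Split a raw bit pattern into (sign, exponent field, fraction field)."""
    exp_bits, frac_bits, _bias = FORMATS[fmt]
    frac = bits & ((1 << frac_bits) - 1)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (frac_bits + exp_bits)
    return sign, exp, frac


fp32_bits = struct.unpack(">I", struct.pack(">f", -1.5))[0]
fp32_fields = decode(fp32_bits, "fp32")
bf16_fields = decode(fp32_bits >> 16, "bf16")  # bf16 keeps the top 16 bits of fp32
```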

Clause A28. The multiplier of any one of clauses A1 to A27, wherein the mantissa processing unit comprises a bit-width extension circuit configured to extend the bit width of the mantissa of at least one of the first floating-point number and the second floating-point number.

Clause A29. The multiplier of any one of clauses A1 to A28, wherein each floating-point number further comprises a sign, and the multiplier further comprises:

a sign processing unit configured to obtain the sign of the product from the signs of the two floating-point numbers.

Clause A30. The multiplier of any one of clauses A1 to A29, wherein the sign processing unit comprises an exclusive-OR (XOR) logic circuit configured to XOR the signs of the two floating-point numbers to obtain the sign of the product.
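The sign rule of clause A30 is a single XOR gate: the product is negative exactly when the operand signs differ. The short sketch below checks that rule against Python's own float multiplication, reading sign bits from IEEE 754 single-precision encodings with the standard `struct` module; the helper names are illustrative.

```python
import struct


def sign_bit(x: float) -> int:
    """Sign bit of x in IEEE 754 single-precision encoding."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 31


def product_sign(sign_a: int, sign_b: int) -> int:
    """Sign of a product: XOR of the operand signs."""
    return sign_a ^ sign_b
```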

Clause A31. The multiplier of any one of clauses A1 to A30, further comprising a regularization unit configured to:

perform floating-point regularization on the mantissa and exponent of the product to obtain a regularized exponent result and a regularized mantissa result, and use the regularized exponent result and the regularized mantissa result as the exponent and the mantissa of the product, respectively.
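For normal inputs, both significands lie in [1, 2), so their product lies in [1, 4) and regularization needs at most a one-position adjustment. The sketch below makes that assumption. It locates the leading 1 of the integer product and folds any extra integer bit into the exponent. All product bits are kept so that a later rounding step can inspect them. The name `regularize` and its return convention are hypothetical.

```python
def regularize(product: int, exp: int, frac_bits: int) -> tuple[int, int]:
    """Regularize the raw product of two significands.

    `product` = sig_a * sig_b, each significand having its hidden 1 at bit
    frac_bits, so the value is product * 2**(exp - 2 * frac_bits). Returns
    (exp', lead), where lead is the index of the leading 1 and the value equals
    (product / 2**lead) * 2**exp', with product / 2**lead in [1, 2).
    """
    if product == 0:
        raise ValueError("zero product needs no regularization")
    lead = product.bit_length() - 1  # 2*frac_bits or 2*frac_bits + 1 for normal inputs
    return exp + (lead - 2 * frac_bits), lead


# 1.5 * 1.5 with 2 fraction bits: significands 0b110, product 0b100100 = 2.25 = 1.125 * 2**1.
new_exp, lead = regularize(0b110 * 0b110, 0, 2)
```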

Clause A32. The multiplier of any one of clauses A1 to A31, further comprising:

a rounding unit configured to round the regularized mantissa result according to a rounding mode to obtain a rounded mantissa, and to use the rounded mantissa as the mantissa of the product.
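Clause A32 leaves the rounding mode open. The sketch below assumes IEEE 754 round-to-nearest, ties-to-even. It takes the exponent and leading-bit index that a regularization step like the one in clause A31 would produce. A final check handles the carry that can push the rounded significand up to 2.0, in which case the exponent is incremented. The function name and signature are illustrative.

```python
def round_nearest_even(product: int, lead: int, exp: int, frac_bits: int) -> tuple[int, int]:
    """Round a regularized product to frac_bits fraction bits, ties to even.

    Input value: (product / 2**lead) * 2**exp. Returns (exp', sig) with sig in
    [2**frac_bits, 2**(frac_bits + 1)) and value about (sig / 2**frac_bits) * 2**exp'.
    """
    drop = lead - frac_bits                 # number of low-order bits to discard
    if drop <= 0:
        return exp, product << -drop        # already narrow enough: exact
    sig = product >> drop
    rest = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and sig & 1):
        sig += 1                            # round up, or break a tie toward even
    if sig >> (frac_bits + 1):              # rounding carried out to 2.0
        sig >>= 1
        exp += 1
    return exp, sig
```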

Clause A33. A method for performing floating-point multiplication using a multiplier, wherein:

the mantissa processing unit of the multiplier is used to obtain the mantissa of the product from the mantissas of the floating-point numbers,

Clause A34. An integrated circuit chip comprising the multiplier of any one of clauses A1 to A31.

Clause A35. A computing device comprising the multiplier of any one of clauses A1 to A31 or the integrated circuit chip of clause A34.

The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the disclosure. The description of the above embodiments is intended only to help understand the method of the disclosure and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the disclosure, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be construed as limiting the disclosure.

It should be understood that the terms "first", "second", "third", "fourth", and the like in the claims, the specification, and the drawings of this disclosure are used to distinguish different objects, not to describe a particular order. The terms "include" and "comprise", as used in the specification and claims of this disclosure, indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the term "and/or", as used in the specification and claims of this disclosure, refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the disclosure. The description of the above embodiments is intended only to help understand the method of the disclosure and its core idea. Meanwhile, any change or modification that a person of ordinary skill in the art makes to the specific implementations and scope of application based on the idea of this disclosure falls within the protection scope of this disclosure. In summary, the content of this specification should not be construed as limiting the disclosure.

100: floating-point data format
102: sign bit
104: exponent bits
106: mantissa bits
200, 300, 700: multiplier
202, 720: exponent processing unit
204: mantissa processing unit
206, 722: sign processing unit
308: mode selection unit
312: partial product operation unit
314: partial product summation unit
316: control circuit
318, 716: regularization unit
320: rounding unit
322: XOR logic circuit
324: normalization processing unit
400: mantissa processing unit operation
402: Booth encoding circuit
404: partial product generation circuit
406: Wallace compressor
408, 714: adder
500: partial products
600: operation flow and schematic block diagram of the Wallace tree compressor
702: mode selection unit
704: normalization processing unit
706: bit-width extension circuit
708: Booth encoder
710: partial product generation circuit
712: Wallace tree compressor
800: method of performing floating-point multiplication using a multiplier
716: mode selection unit
720: mode selection unit
722: mode selection unit
900: combined processing device
902: computing device
904: universal interconnection interface
906: other processing devices
908: storage device
1000: board card
1002: chip
1004: storage device
1006: interface device
1008: control device
1010: storage unit
1012: external device
S802-S806: steps

[Figure 1] is a schematic diagram of a floating-point data format according to an embodiment of the present disclosure.
[Figure 2] is a schematic structural block diagram of a multiplier according to an embodiment of the present disclosure.
[Figure 3] is a structural block diagram showing the multiplier in more detail according to an embodiment of the present disclosure.
[Figure 4] is a schematic block diagram of a mantissa processing unit according to an embodiment of the present disclosure.
[Figure 5] is a schematic diagram of the partial product operation according to an embodiment of the present disclosure.
[Figure 6] shows the operation flow and a schematic block diagram of a Wallace tree compressor according to an embodiment of the present disclosure.
[Figure 7] is an overall schematic block diagram of a multiplier according to an embodiment of the present disclosure.
[Figure 8] is a flowchart of a method of performing floating-point multiplication using a multiplier according to an embodiment of the present disclosure.
[Figure 9] is a structural diagram of a combined processing device according to an embodiment of the present disclosure.
[Figure 10] is a schematic structural diagram of a board card according to an embodiment of the present disclosure.

900: combined processing device

902: computing device

904: universal interconnection interface

906: other processing devices

908: storage device

Claims

A multiplier for performing multiplication of floating-point numbers, wherein the multiplier comprises: a mantissa processing unit configured to obtain the mantissa of the product from the mantissas of the floating-point numbers, wherein the mantissa processing unit comprises a control circuit configured to invoke the mantissa processing unit multiple times when the mantissa bit width of at least one of the two floating-point numbers is greater than the data bit width that the mantissa processing unit can process at one time.

The multiplier of claim 1, wherein the two floating-point numbers comprise a first floating-point number and a second floating-point number; the mantissa processing unit supports a first bit width and a second bit width; the mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width; the bit width of the first input is less than or equal to the first bit width; and the control circuit is configured to invoke the mantissa processing unit multiple times to obtain the mantissa of the product when the bit width of the second input is greater than the second bit width.

The multiplier of claim 1, wherein the two floating-point numbers comprise a first floating-point number and a second floating-point number; the mantissa processing unit supports a first bit width and a second bit width; the mantissa of the first floating-point number serves as a first input corresponding to the first bit width, and the mantissa of the second floating-point number serves as a second input corresponding to the second bit width; and the control circuit is configured to invoke the mantissa processing unit multiple times to obtain the mantissa of the product in any of the following cases: the bit width of the first input is greater than the first bit width and the bit width of the second input is less than or equal to the second bit width; the bit width of the second input is greater than the second bit width and the bit width of the first input is less than or equal to the first bit width; or the bit width of the first input is greater than the first bit width and the bit width of the second input is greater than the second bit width.

The multiplier of claim 3, wherein, when the mantissa bit width of the first floating-point number is smaller than that of the second floating-point number and the first bit width is greater than the second bit width, or when the mantissa bit width of the first floating-point number is greater than that of the second floating-point number and the first bit width is smaller than the second bit width, the control circuit selects the mantissa of the first floating-point number as the second input corresponding to the second bit width and selects the mantissa of the second floating-point number as the first input corresponding to the first bit width.

The multiplier of claim 4, wherein, when the bit width of the first input is greater than the first bit width and the bit width of the second input is less than or equal to the second bit width, the control circuit determines, from the bit width of the first input and the first bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.

The multiplier of claim 4, wherein, when the bit width of the second input is greater than the second bit width and the bit width of the first input is less than or equal to the first bit width, the control circuit determines, from the bit width of the second input and the second bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.

The multiplier of claim 4, wherein, when the bit width of the first input is greater than the first bit width and the bit width of the second input is greater than the second bit width, the control circuit determines, from the bit width of the first input and the first bit width together with the bit width of the second input and the second bit width, the number of times the mantissa processing unit is invoked and the data input to the mantissa processing unit in each invocation.

The multiplier of any one of claims 2 to 7, wherein the mantissa processing unit further comprises a shift-and-add circuit configured to obtain the mantissa of the product from the mantissa results obtained in each invocation of the mantissa processing unit.

The multiplier of claim 8, wherein the shift-and-add circuit includes a shifter, an intermediate memory, and an adder. When the control circuit invokes the mantissa processing unit multiple times, after the first invocation the shifter shifts the mantissa result obtained in the first invocation to obtain a shifted mantissa result and stores it in the intermediate memory. From the second invocation onward, the shifter shifts the mantissa result obtained in the current invocation to obtain a current mantissa result, and the adder adds the current mantissa result to the result stored in the intermediate memory and writes the sum back to the intermediate memory, thereby updating it. The result stored in the intermediate memory after the last invocation is used as the mantissa after the multiplication operation.
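
The shifter, intermediate memory and adder sequence above can be modelled in Python as follows. This sketch makes several assumptions:

- Each call multiplies one slice of each operand.
- Slices are taken least significant first.
- A plain Python integer stands in for the intermediate memory.

`split` and `shift_and_add_multiply` are hypothetical names.

```python
def split(value, width, slice_width):
    """Cut an unsigned value into slice_width-bit pieces, least significant first."""
    count = max(1, -(-width // slice_width))  # ceiling division
    mask = (1 << slice_width) - 1
    return [(value >> (k * slice_width)) & mask for k in range(count)]


def shift_and_add_multiply(x, x_width, y, y_width, port1_width, port2_width):
    """Multiply wide mantissas by repeated calls to a narrow multiplier."""
    intermediate = None  # models the intermediate memory
    for i, xs in enumerate(split(x, x_width, port1_width)):
        for j, ys in enumerate(split(y, y_width, port2_width)):
            partial = xs * ys                                        # one call of the mantissa unit
            shifted = partial << (i * port1_width + j * port2_width)  # shifter
            if intermediate is None:
                intermediate = shifted            # first call: store the shifted result
            else:
                intermediate = intermediate + shifted  # later calls: add, then write back
    return intermediate


# Two 53-bit (FP64) significands on an assumed 27 x 27 unit: 4 calls.
x = (1 << 52) | 0x9E3779B97F4A
y = (1 << 52) | 0x243F6A8885A3
print(shift_and_add_multiply(x, 53, y, 53, 27, 27) == x * y)  # True
```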

The multiplier of claim 1, further comprising an exponent processing unit configured to obtain the exponent after the multiplication operation according to the exponents of the two floating-point numbers. The exponent processing unit includes a second control circuit. The second control circuit is configured to determine that the exponent processing unit is to be invoked multiple times to obtain the exponent after the multiplication operation. It makes this determination either according to the exponent bit width of one of the two floating-point numbers and one of the two bit widths supported by the exponent processing unit, or according to the exponent bit widths of both floating-point numbers and both bit widths supported by the exponent processing unit.

The multiplier of claim 10, wherein the two floating-point numbers include a first floating-point number and a second floating-point number; the exponent processing unit supports a third bit width and a fourth bit width; the exponent of the first floating-point number is used as a third input corresponding to the third bit width, and the exponent of the second floating-point number is used as a fourth input corresponding to the fourth bit width; the bit width of the third input is less than or equal to the third bit width; and the second control circuit is configured to invoke the exponent processing unit multiple times to obtain the exponent after the multiplication operation when the bit width of the fourth input is greater than the fourth bit width.

The multiplier of claim 10, wherein the two floating-point numbers include a first floating-point number and a second floating-point number; the exponent processing unit supports a third bit width and a fourth bit width; the exponent of the first floating-point number is used as a third input corresponding to the third bit width, and the exponent of the second floating-point number is used as a fourth input corresponding to the fourth bit width; and the second control circuit is configured to invoke the exponent processing unit multiple times to obtain the exponent after the multiplication operation in any of the following cases: when the bit width of the third input is greater than the third bit width and the bit width of the fourth input is less than or equal to the fourth bit width; when the bit width of the fourth input is greater than the fourth bit width and the bit width of the third input is less than or equal to the third bit width; or when the bit width of the third input is greater than the third bit width and the bit width of the fourth input is greater than the fourth bit width.

The multiplier of claim 12, wherein, when the exponent bit width of the first floating-point number is smaller than the exponent bit width of the second floating-point number and the third bit width is greater than the fourth bit width, or when the exponent bit width of the first floating-point number is greater than the exponent bit width of the second floating-point number and the third bit width is smaller than the fourth bit width, the second control circuit selects the exponent of the first floating-point number as the fourth input, corresponding to the fourth bit width, and selects the exponent of the second floating-point number as the third input, corresponding to the third bit width.

The multiplier of claim 13, wherein, when the bit width of the third input is less than or equal to the bit width of the fourth input and the third bit width is less than or equal to the fourth bit width, the second control circuit determines, according to the bit width of the fourth input and the third bit width, the number of times the exponent processing unit is invoked and the data input to the exponent processing unit in each invocation.

The multiplier of any one of claims 11 to 14, wherein the exponent processing unit further includes a second shift-and-add circuit configured to obtain the exponent after the multiplication operation from the exponent results obtained in each invocation of the exponent processing unit.
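
The claims do not say what arithmetic the exponent processing unit performs. The Python sketch below therefore assumes it is a narrow adder forming `ea + eb` on IEEE-style biased exponent fields, followed by a bias subtraction. On that assumption, invoking the unit multiple times amounts to adding slice by slice, passing each carry to the next call, and shifting each slice sum into place. `chunked_exponent_add` and `product_exponent` are hypothetical names.

```python
def chunked_exponent_add(ea, eb, exp_width, adder_width):
    """Add two unsigned exponent fields with a narrow adder called once per slice.

    The carry out of each call feeds the next call. Each slice sum is shifted
    into position and accumulated (the shift-and-add step).
    """
    calls = max(1, -(-exp_width // adder_width))  # ceiling division
    mask = (1 << adder_width) - 1
    total, carry = 0, 0
    for k in range(calls):
        a = (ea >> (k * adder_width)) & mask
        b = (eb >> (k * adder_width)) & mask
        s = a + b + carry                        # one call of the exponent adder
        carry = s >> adder_width                 # carry into the next call
        total += (s & mask) << (k * adder_width)
    return total + (carry << (calls * adder_width))


def product_exponent(ea, eb, exp_width, adder_width, bias):
    """Biased exponent of the product before normalization: ea + eb - bias."""
    return chunked_exponent_add(ea, eb, exp_width, adder_width) - bias


# FP64 exponent fields (11 bits) on an assumed 5-bit adder take 3 calls.
print(product_exponent(0x400, 0x3FF, 11, 5, 1023))  # 1024, i.e. 2**1 * 2**0 = 2**1
```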

The multiplier of claim 1, wherein the mantissa processing unit includes a partial product operation unit and a partial product summation unit. The partial product operation unit is configured to obtain intermediate results according to the mantissas of the two floating-point numbers. The partial product summation unit is configured to add the intermediate results to obtain a summation result and to use the summation result as the mantissa after the multiplication operation.

The multiplier of claim 16, wherein the partial product operation unit includes a Booth encoding circuit configured to perform Booth encoding on the mantissa of the first floating-point number or the second floating-point number to obtain the intermediate results.
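
The claims do not say which Booth variant is used. The Python sketch below assumes radix-4 (modified) Booth recoding, a common choice. Each overlapping bit triplet of the multiplier becomes a digit in {-2, -1, 0, 1, 2}, and each digit selects a partial product (an intermediate result) of 0, ±1 or ±2 times the multiplicand. Python integers stand in for the sign extension and two's-complement negation that hardware would need. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier, width):
    """Radix-4 Booth recoding of an unsigned `width`-bit multiplier.

    Digit j comes from the triplet (y[2j+1], y[2j], y[2j-1]) with y[-1] = 0.
    """
    bits = multiplier << 1  # append the implicit y[-1] = 0 below the LSB
    digits = []
    for i in range(0, width + 2, 2):  # enough groups that the top triplet sees a 0 sign bit
        low, mid, high = (bits >> i) & 1, (bits >> (i + 1)) & 1, (bits >> (i + 2)) & 1
        digits.append(low + mid - 2 * high)
    return digits


def booth_partial_products(multiplicand, multiplier, width):
    """One partial product per Booth digit, already weighted by 4**j."""
    return [(d * multiplicand) << (2 * j)
            for j, d in enumerate(booth_radix4_digits(multiplier, width))]


a, b = 0x5A3, 0x7FF  # two 11-bit significands, e.g. FP16 with the hidden bit
pp = booth_partial_products(a, b, 11)
print(len(pp), sum(pp) == a * b)  # 7 True
```

Radix-4 recoding roughly halves the number of partial products compared with one row per multiplier bit. This reduces the load on the summation unit.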

The multiplier of claim 17, wherein the partial product summation unit includes an adder configured to add the intermediate results to obtain the summation result.

The multiplier of claim 17, wherein the partial product summation unit includes a Wallace tree and an adder. The Wallace tree is configured to add the intermediate results to obtain second intermediate results. The adder is configured to add the second intermediate results to obtain the summation result.

The multiplier of claim 18 or 19, wherein the adder includes at least one of a full adder, a ripple-carry (serial-carry) adder, and a carry-lookahead adder.
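
The translated adder types read most naturally as a full adder, a ripple-carry adder and a carry-lookahead adder, but that reading is our interpretation. The Python sketch below contrasts the last two at bit level:

- **Ripple-carry:** chains full adders, so each carry waits for the previous stage.
- **Carry-lookahead:** computes every carry directly from generate and propagate signals, with a carry-in of 0.

Software still evaluates the expressions one after another; only the dependency structure differs. The function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def ripple_carry_add(x, y, width):
    """Chain `width` full adders; each carry depends on the previous stage."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


def carry_lookahead_add(x, y, width):
    """Compute each carry from generate/propagate terms instead of a carry chain."""
    g = [(x >> i) & (y >> i) & 1 for i in range(width)]    # generate
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]  # propagate
    carries = [0] * (width + 1)
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ... (sum of products)
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries[i + 1] = c
    result = 0
    for i in range(width):
        result |= (p[i] ^ carries[i]) << i
    return result | (carries[width] << width)


print(ripple_carry_add(200, 100, 8), carry_lookahead_add(200, 100, 8))  # 300 300
```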

The multiplier of claim 19, wherein, when the number of intermediate results is less than M, zeros are added as intermediate results so that the number of intermediate results equals M, where M is a preset positive integer.

The multiplier of claim 21, wherein each Wallace tree has M inputs and N outputs and the number of Wallace trees is not less than K, where N is a preset positive integer less than M and K is a positive integer not less than the maximum bit width of the intermediate results.

The multiplier of claim 22, wherein the partial product summation unit is configured to select one or more groups of the Wallace trees to add the intermediate results. Each group contains X Wallace trees, where X is the number of bits of the intermediate results. The Wallace trees within each group are linked by carries in sequence, and there is no carry relationship between different groups.
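
The zero padding and Wallace-tree reduction in the preceding claims can be approximated at word level in Python. This is a simplification. The claims describe per-column trees with M inputs and N outputs, grouped with carries chained inside each group. This sketch does not model those trees or groups. Instead it applies 3:2 carry-save compressors to whole rows until two rows remain, then performs one final carry-propagate addition. The value M = 16 and the function names are hypothetical.

```python
def pad_rows(rows, m):
    """Append zero rows until there are m intermediate results."""
    return list(rows) + [0] * max(0, m - len(rows))


def csa(a, b, c):
    """3:2 carry-save compressor applied to every bit position at once."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_sum(rows, m):
    """Reduce rows with layers of 3:2 compressors, then do one final addition."""
    rows = pad_rows(rows, m)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete groups of three
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        rows = nxt + rows[full:]          # leftover rows pass to the next layer
    return sum(rows)                      # final carry-propagate adder


a, b = 0x5A3, 0x7FF
rows = [(a << i) if (b >> i) & 1 else 0 for i in range(11)]  # simple AND-array partial products
print(wallace_sum(rows, 16) == a * b)  # True
```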

The multiplier of claim 10, further comprising a normalization processing unit configured to normalize a floating-point number to obtain its corresponding exponent and mantissa when at least one of the two floating-point numbers is a denormalized (subnormal) non-zero floating-point number.
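
Normalizing a subnormal operand can be sketched in Python for IEEE 754-style formats. The sketch shifts the stored mantissa left until the hidden-bit position is set and lowers the exponent by the same amount, so the value is unchanged. In hardware the shift amount would typically come from a leading-zero counter; here `int.bit_length` stands in for it. `normalize_subnormal` is a hypothetical name, and the IEEE encoding is an assumption.

```python
def normalize_subnormal(fraction, exponent, man_width):
    """Normalize a nonzero subnormal significand.

    `fraction` is the stored mantissa field (no hidden bit). `exponent` is the
    unbiased exponent that subnormals use (1 - bias). The value
    fraction * 2**(exponent - man_width) is preserved while the fraction is
    shifted left until bit `man_width` (the hidden-1 position) is set.
    """
    if fraction == 0:
        raise ValueError("zero has no normalized form")
    shift = man_width + 1 - fraction.bit_length()  # leading-zero count
    return fraction << shift, exponent - shift


# Smallest FP16 subnormal, 2**-24: mantissa field 1, exponent 1 - 15 = -14.
print(normalize_subnormal(1, -14, 10))  # (1024, -24)
```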

The multiplier of claim 10, wherein the multiplier is configured to perform the multiplication operation of the two floating-point numbers according to an operation mode, the operation mode indicating the data format of the two floating-point numbers; the mantissa processing unit is configured to obtain the mantissa after the multiplication operation according to the operation mode and the mantissas of the two floating-point numbers; and the exponent processing unit is configured to obtain the exponent after the multiplication operation according to the operation mode and the exponents of the two floating-point numbers.

The multiplier of claim 25, wherein the normalization processing unit is further configured to normalize at least one of the two floating-point numbers according to the operation mode to obtain the corresponding exponent and mantissa.

The multiplier of claim 26, wherein the data format includes at least one of half-precision floating-point, single-precision floating-point, brain floating-point (bfloat16), double-precision floating-point, and custom floating-point formats.
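
An operation mode that names the data format is, in effect, a choice of field widths. The Python sketch below tabulates the standard widths of the four named formats and splits a raw bit pattern into its sign, exponent and mantissa fields. The claims do not give the widths of the custom format, so it is deliberately absent from the table; it would be one more entry. `FORMATS` and `decode` are hypothetical names.

```python
# (exponent bits, stored mantissa bits, exponent bias) of the named formats.
FORMATS = {
    "fp16": (5, 10, 15),     # IEEE 754 half precision
    "bf16": (8, 7, 127),     # brain floating point (bfloat16)
    "fp32": (8, 23, 127),    # IEEE 754 single precision
    "fp64": (11, 52, 1023),  # IEEE 754 double precision
}


def decode(bits, mode):
    """Split a raw bit pattern into (sign, exponent field, mantissa field) for the given mode."""
    exp_w, man_w, _bias = FORMATS[mode]
    sign = (bits >> (exp_w + man_w)) & 1
    exponent = (bits >> man_w) & ((1 << exp_w) - 1)
    mantissa = bits & ((1 << man_w) - 1)
    return sign, exponent, mantissa


print(decode(0x3C00, "fp16"))  # (0, 15, 0), i.e. 1.0
print(decode(0x3F80, "bf16"))  # (0, 127, 0), i.e. 1.0
```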

The multiplier of claim 17, wherein the mantissa processing unit includes a bit-width extension circuit configured to extend the bit width of the mantissa of at least one of the first floating-point number and the second floating-point number.

The multiplier of claim 1, wherein each floating-point number further includes a sign, and the multiplier further includes a sign processing unit configured to obtain the sign after the multiplication operation according to the signs of the two floating-point numbers.

The multiplier of claim 29, wherein the sign processing unit includes an exclusive-OR logic circuit configured to perform an exclusive-OR operation on the signs of the two floating-point numbers to obtain the sign after the multiplication operation.

The multiplier of claim 25, further comprising a regularization unit configured to perform floating-point regularization on the mantissa and exponent after the multiplication operation to obtain a regularized exponent result and a regularized mantissa result, and to use the regularized exponent result and the regularized mantissa result as the exponent and the mantissa after the multiplication operation.

The multiplier of claim 31, further comprising a rounding unit configured to round the regularized mantissa result according to a rounding mode to obtain a rounded mantissa, and to use the rounded mantissa as the mantissa after the multiplication operation.
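
Several of the claimed units can be tied together in a minimal end-to-end Python sketch for one assumed operation mode, IEEE 754 binary16:

1. **Sign:** the XOR of the operand signs.
2. **Exponent:** the sum of the exponents minus the bias.
3. **Mantissa:** a single 11 x 11-bit product of the significands.
4. **Regularization:** renormalizing a product in [2, 4).
5. **Rounding:** round-to-nearest-even, chosen as the example rounding mode.

The sketch assumes normal inputs and a normal result; zeros, subnormals, infinities, NaNs, overflow and underflow are not handled. The function names are hypothetical, and this is not the patent's datapath.

```python
import struct

MAN_W, BIAS = 10, 15  # IEEE 754 binary16: 10 stored mantissa bits, exponent bias 15


def round_nearest_even(value, drop):
    """Discard the low `drop` bits of value, rounding to nearest, ties to even."""
    kept = value >> drop
    rest = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
    return kept


def fp16_multiply(a_bits, b_bits):
    """Multiply two normal binary16 bit patterns whose product is also normal."""
    sa, ea, ma = a_bits >> 15, (a_bits >> MAN_W) & 0x1F, a_bits & 0x3FF
    sb, eb, mb = b_bits >> 15, (b_bits >> MAN_W) & 0x1F, b_bits & 0x3FF
    sign = sa ^ sb                                       # sign processing: XOR
    exponent = ea + eb - BIAS                            # exponent processing
    product = ((1 << MAN_W) | ma) * ((1 << MAN_W) | mb)  # 11 x 11 -> 22-bit product
    # Regularization: a product of two values in [1, 2) lies in [1, 4).
    if product >> (2 * MAN_W + 1):                       # >= 2.0: renormalize by one place
        exponent += 1
        drop = MAN_W + 1
    else:
        drop = MAN_W
    significand = round_nearest_even(product, drop)      # rounding unit
    if significand >> (MAN_W + 1):                       # rounding carried up to 2.0
        significand >>= 1
        exponent += 1
    return (sign << 15) | (exponent << MAN_W) | (significand & 0x3FF)


def to_bits(x):
    """Binary16 bit pattern of x, using the struct module's 'e' (half-precision) format."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


print(hex(fp16_multiply(to_bits(1.5), to_bits(2.0))))   # 0x4200 (3.0)
print(hex(fp16_multiply(to_bits(-1.5), to_bits(1.5))))  # 0xc080 (-2.25)
```

Because two 11-bit significands give an exact 22-bit product, a double-precision product rounded to binary16 by `struct` is a convenient reference for checking the sketch.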

A method of performing floating-point multiplication using a multiplier, comprising: obtaining, by a mantissa processing unit of the multiplier, the mantissa after the multiplication operation according to the mantissas of the floating-point numbers, wherein the mantissa processing unit includes a control circuit configured to invoke the mantissa processing unit multiple times when the bit width of at least one of the two floating-point numbers is greater than the data bit width that the mantissa processing unit can process at one time.

An integrated circuit chip, comprising the multiplier of any one of claims 1 to 32.

A computing device, comprising the multiplier of any one of claims 1 to 32 or the integrated circuit chip of claim 34.