TWI489375B - Carryless multiplication apparatus and method

Info

Publication number: TWI489375B
Application number: TW100136024A
Authority: TW
Inventors: Timothy A Elliott
Original assignee: Via Tech Inc
Priority date: 2010-12-03
Filing date: 2011-10-05
Publication date: 2015-06-21
Also published as: TW201224916A; CN102360276B; CN102360276A

Description

Carryless multiplication apparatus and processing method thereof

The present invention relates to microelectronics, and more particularly to a technique for performing a carryless multiplication operation.

Most present-day communications can be encrypted. Effective methods range from simple authentication to hashed enciphered messages that use symmetric-key cryptography. Among symmetric-key techniques, a commonly used mode of operation is Galois/Counter Mode (GCM), which both encrypts and authenticates a message.

As those skilled in the art appreciate, GCM combines counter-mode encryption with the more recently developed Galois mode of authentication. GCM uses multiplication in a Galois field to perform authentication. Galois field multiplication itself is beyond the scope of this disclosure, but it is a carryless multiplication.

In general, carryless multiplication is binary polynomial multiplication: a mathematical operation that computes the product of two operands without generating or propagating carry bits. Intel, for example, provides an instruction, PCLMULQDQ, that directs an x86-compatible microprocessor to perform this function.

Therefore, when microprocessor designers revise a design to add functionality, carryless multiplication must be taken into account. The operation is simple, but as those skilled in the art appreciate, it requires a considerable amount of hardware. For example, a 64-bit carryless multiplication produces 64 partial products, which are combined by exclusive-OR (XOR) to yield a 128-bit final result. Most present-day microprocessor designs have no unit or logic that can perform such an operation. Most microprocessors do, however, have at least one multiplication unit for performing ordinary multiplication.
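The partial-product view of carryless multiplication can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware described in this document; the function name `clmul` and its `width` parameter are assumptions made for the example.

```python
def clmul(a, b, width=64):
    """Carryless (binary polynomial) product of two width-bit operands."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(width):
        # Partial product i is b shifted left by i when bit i of a is set,
        # and zero otherwise. A 64-bit operand yields 64 partial products.
        if (a >> i) & 1:
            # XOR replaces addition, so no carry moves between bit columns.
            result ^= b << i
    return result
```

For two 64-bit operands the result fits in a 128-bit register, which matches the 128-bit final result mentioned above.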

In recent years, many improvements have made multiplication units faster. Booth encoding is one common technique; it halves the number of partial products in a multiplication. The Wallace tree is another common technique, used to sum the partial products that Booth encoding produces.

Although they improve performance, these techniques generate or propagate carries. Present-day multiplication units are therefore entirely unsuited to carryless multiplication.

To address these shortcomings, the inventor has observed that it is desirable to reuse existing hardware as much as possible, to avoid increasing power consumption and component count. From a debug and test standpoint as well, it is preferable to have the existing hardware architecture serve the additional function.

Therefore, what is needed is an apparatus and method for performing carryless multiplication in a processor or other device that makes extensive use of existing hardware elements.

What is also needed is a multiplication unit that can perform binary carryless multiplication without requiring extensive modification of the existing multiplication unit.

The present invention addresses the problems described above, along with other problems, disadvantages, and limitations of the prior art. It provides a superior technique for performing carryless multiplication in a processor or other device using conventional Booth hardware. In one embodiment, a carryless multiplication apparatus for performing a carryless multiplication operation includes a carryless preformat unit, a Booth encoder, a compressor, a left shifter, and an exclusive-OR (XOR) gate. The carryless preformat unit receives a multiplier operand and formats it into a plurality of parts. The Booth encoder receives and evaluates the parts and selects a plurality of first partial products of a multiplicand operand. Because of how the parts are formatted, a plurality of second partial products of the multiplicand operand, which would cause carries, are never selected. The compressor is coupled to the Booth encoder and sums the first partial products through a plurality of carry-save adders, which generate a plurality of sum bits and a plurality of carry bits. The carry-save adders are arranged in a Wallace tree configuration. During a carryless multiplication operation, the carry bits are disabled. The left shifter is coupled to the compressor and shifts the compressor output left by at least one bit. The XOR gate is coupled to the compressor and the left shifter; it performs an exclusive-OR operation and produces the carryless multiplication result.

The present invention also provides a method for performing a carryless multiplication operation. The method includes: in a multiplication unit within a processor, formatting a multiplier operand into a plurality of parts; evaluating the parts with a Booth encoder and selecting a plurality of first partial products of a multiplicand operand, where the formatting of the parts prevents the selection of a plurality of second partial products of the multiplicand operand that would cause carries; processing the first partial products through a plurality of carry-save adders to generate a plurality of sum bits and a plurality of carry bits, where the carry-save adders are arranged in a Wallace tree configuration and the carry bits are disabled during the carryless multiplication operation; shifting the output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on the output of the Wallace tree to produce a carryless multiplication result.

In terms of industrial applicability, the present invention can be implemented in a microprocessor, which may be used in a general-purpose or special-purpose computing device.

In another embodiment, the present invention provides a carryless multiplication apparatus for performing a carryless multiplication operation that includes a first operand register, a second operand register, an opcode detector, a carryless preformat unit, a compressor, a left shifter, and an exclusive-OR (XOR) gate. The first and second operand registers respectively receive a first operand and a second operand for the carryless multiplication operation. The opcode detector receives a carryless multiplication instruction and asserts a carryless signal in response. When the carryless signal is asserted, the carryless preformat unit formats the first operand into a plurality of parts. Because of how the parts are formatted, a Booth encoder never selects a plurality of second partial products of the second operand, which would cause carries. The compressor sums a plurality of first partial products of the second operand through a plurality of carry-save adders, which generate a plurality of sum bits and a plurality of carry bits. The carry-save adders are arranged in a Wallace tree configuration. When the carryless signal is asserted, the carry bits are disabled. The left shifter is coupled to the compressor and shifts the compressor output left by at least one bit. The XOR gate is coupled to the compressor and the left shifter; it performs an exclusive-OR operation and produces the carryless multiplication result.

The present invention also provides a method for performing a carryless multiplication operation. The method includes: in a multiplication unit within a processor, receiving a first operand and a second operand for the carryless multiplication operation; asserting a carryless signal in response to a carryless multiplication instruction; when the carryless signal is asserted, formatting the first operand into a plurality of parts, where the formatting of the parts prevents a Booth encoder from selecting a plurality of second partial products of the second operand that would cause carries; summing a plurality of first partial products of the second operand through a plurality of carry-save adders, which generate a plurality of sum bits and a plurality of carry bits, where the carry-save adders are arranged in a Wallace tree configuration and the carry bits are disabled when the carryless signal is asserted; shifting the output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on the output of the Wallace tree to produce a carryless multiplication result.

In a further embodiment, the present invention provides an apparatus for performing a carryless multiplication operation that includes an opcode detector and a carryless preformat unit. The opcode detector receives a carryless multiplication instruction and asserts a carryless signal in response. When the carryless signal is asserted, the carryless preformat unit formats a first operand into a plurality of parts. Using the parts, a Booth encoder selects a plurality of first partial products of a second operand and never selects a plurality of second partial products of the second operand, which would cause carries. The first partial products are combined by an exclusive-OR operation to produce a carryless multiplication result.

The present invention also provides a method for performing a carryless multiplication operation. The method includes: in a multiplication unit within a processor, receiving a carryless multiplication instruction together with a first operand and a second operand for the carryless multiplication operation; asserting a carryless signal in response to the carryless multiplication instruction; and, when the carryless signal is asserted, formatting the first operand into a plurality of parts, where a Booth encoder uses the parts to select a plurality of first partial products of the second operand and never selects a plurality of second partial products of the second operand that would cause carries. The first partial products are combined by an exclusive-OR operation to produce a carryless multiplication result.

To make the features and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

The following description enables those skilled in the art to make and use the present invention within the context and requirements of a particular application. Those skilled in the art may also make minor modifications to arrive at other embodiments. The present invention is therefore not limited to the particular embodiments described below, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

In view of the background discussion above on multiplication, carryless multiplication, and the techniques processors use to produce multiplication results, the limitations of present-day apparatus are described with reference to Figs. 1-3. Figs. 4-7 then show how the present invention overcomes the disadvantages and limitations of conventional multiplication apparatus, and how it performs carryless multiplication using the hardware that already performs ordinary multiplication.

Fig. 1 shows one possible embodiment of a 64-bit multiplication unit. The 64-bit multiplication unit 100 may be used in a microprocessor or other device. Multiplication unit 100 has a first operand register 101 coupled to a Booth encoder 104, and a second operand register 102 coupled to a partial product generator 103. Booth encoder 104 and partial product generator 103 are both coupled to a Booth multiplexer 105. Booth multiplexer 105 is coupled to a compressor 106 through bus PARTPROD. Compressor 106 has a plurality of carry-save adders (CSAs) 108, arranged in a conventional Wallace tree configuration to reduce the propagation delay of summing many partial products. Compressor 106 is coupled to a full adder 109 through buses CARRIES and SUMS. Full adder 109 outputs the multiplication result, a 128-bit two's-complement value, on bus RESULT. A product synchronizer 107 generates a synchronization signal CLK, which is routed to Booth encoder 104 and compressor 106 to synchronize the operations within multiplication unit 100 that produce the final 128-bit product.

In operation, an instruction (not shown) causes, directly or indirectly, two operands to be delivered to multiplication unit 100. A 64-bit multiplier operand OP A is provided to first operand register 101, and a 64-bit multiplicand operand OP B is provided to second operand register 102. Both OP A and OP B are in two's-complement format. Because 64-bit operands are the common case, registers 101 and 102 are 64-bit registers, although other multiplication unit architectures may use registers of other widths. For example, as those skilled in the art appreciate, a 64-bit multiplication can split its two 64-bit operands into four 32-bit operands, which multiplication unit 100 processes with known techniques and apparatus to obtain the product.

As those skilled in the art appreciate, multiplication unit 100 typically uses Booth encoding to reduce the number of partial products that must be summed to obtain the final product. Booth encoder 104 is generally a 3-bit Booth encoder. Operating it iteratively produces a series of partial products that correspond to radix-4 multiplication, which is why fewer partial products are needed. Under the timing of synchronization signal CLK, Booth encoder 104 evaluates successive 3-bit segments of multiplier operand OP A and, through bus PPSEL, directs Booth multiplexer 105 to select one of five partial products derived from multiplicand operand OP B. All five are produced by partial product generator 103, which multiplies OP B by 0 to produce partial product 0, by +1 to produce partial product B, by -1 to produce partial product -B, by +2 to produce partial product 2B, and by -2 to produce partial product -2B. As those skilled in the art appreciate, partial product generator 103 can produce all five partial products (-2B, -B, 0, B, 2B) by taking the two's complement of OP B, by shifting OP B left, or by taking the two's complement and then shifting the result left.

Besides directing Booth encoder 104 to evaluate successive 3-bit segments of multiplier operand OP A, synchronization signal CLK directs compressor 106 to store the corresponding partial products until every 3-bit segment of OP A has been evaluated. The partial products are distributed to inputs A, B, and C of carry-save adders 108, which produce carry bits on bus CARRIES and sum bits on bus SUMS. Full adder 109 then adds the carry bits and the sum bits and outputs the final 128-bit two's-complement product on bus RESULT.
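The compressor and full-adder stages can be modeled with a short Python sketch. It assumes non-negative integers stand in for the partial-product bit vectors, and the function names are hypothetical. The sequential 3:2 reduction used here only approximates the parallel layout of a Wallace tree; it is not the circuit of Fig. 1.

```python
def carry_save_add(a, b, c):
    """3:2 carry-save adder: three input words in, a sum word and a carry word out."""
    sum_bits = a ^ b ^ c                              # per-column sum, no propagation
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1   # per-column carry, one column up
    return sum_bits, carry_bits


def compress(partial_products):
    """Reduce the partial products to a single (sums, carries) pair."""
    terms = list(partial_products) + [0, 0]  # padding guarantees at least two terms
    while len(terms) > 2:
        s, c = carry_save_add(terms[0], terms[1], terms[2])
        terms = terms[3:] + [s, c]
    return terms[0], terms[1]


def full_add(sums, carries):
    """Final carry-propagating addition, the role played by full adder 109."""
    return sums + carries
```

Each carry-save step preserves the arithmetic total of its inputs, so only the final addition has to propagate carries across the full word.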

Fig. 2 illustrates how multiplication unit 100 of Fig. 1 uses the Booth encoder to reduce the number of partial products. As noted above, many multiplication units in microprocessors and other devices use Booth encoding, which generates partial products from groups of bits of one of the two operands. The technique recodes a radix-2 multiplier into a higher radix. With 3-bit Booth encoding, the radix-2 multiplier is recoded into radix 4, which cuts the number of partial products roughly in half. Booth encoding is disclosed in U.S. Patent No. 5,691,930 to Kim and is not described further here. Fig. 2 shows the mapping from the value of each 3-bit segment of the multiplier (OP A) to a multiplication coefficient; multiplying the multiplicand (OP B) by that coefficient gives the corresponding partial product. Segment values 000 and 111 map to coefficient 0. Values 001 and 010 map to +1. Values 101 and 110 map to -1. Value 011 maps to +2, and value 100 maps to -2. Partial product generator 103 multiplies multiplicand operand OP B by each coefficient to produce the partial products and supplies them to multiplexer 105. Booth encoder 104 evaluates each 3-bit segment of multiplier operand OP A and, through bus PPSEL, selects the corresponding partial product.
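The mapping of Fig. 2 can be written as a lookup table. The Python sketch below recodes a two's-complement multiplier into radix-4 Booth coefficients; the names `BOOTH_RADIX4` and `booth_digits` are hypothetical and chosen only for this illustration.

```python
# Segment value (bits 2i+1, 2i, 2i-1 of the multiplier) -> multiplication coefficient.
BOOTH_RADIX4 = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_digits(multiplier, width):
    """Radix-4 Booth coefficients of a width-bit two's-complement multiplier.

    The list is ordered from least significant segment to most significant.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    # Append a 0 below the least significant bit, as 3-bit encoding requires.
    padded = (multiplier & ((1 << width) - 1)) << 1
    return [BOOTH_RADIX4[(padded >> i) & 0b111] for i in range(0, width, 2)]
```

A width-bit multiplier yields width/2 coefficients, which is where the roughly twofold reduction in partial products comes from. Weighting coefficient i by 4 to the power i and summing recovers the signed multiplier value.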

Fig. 3 illustrates how Booth encoding reduces the number of partial products in a multiplication, using a 4-bit multiplication as the example. Fig. 3 shows a 4-bit multiplicand operand 301, which would be provided to the partial product generator described above, and a 4-bit multiplier operand 302, which would be provided to the Booth encoder described above. As those skilled in the art appreciate, 3-bit Booth encoding requires a bit 303 of value 0 to be placed just below the least significant bit (LSB) of multiplier operand 302. The value of the first 3-bit segment 304 determines a multiplication coefficient from table 200 of Fig. 2, and that coefficient selects a 4-bit partial product. In Fig. 3, the first 3-bit segment 304 has the value 110, which maps to coefficient -1. The two's complement of multiplicand operand 301 is therefore taken and sign-extended, giving the extended partial product 307 with value 11111001. The next 3-bit segment 305 overlaps segment 304 by one bit. According to table 200 of Fig. 2, segment 305 maps to coefficient +1, so multiplicand operand 301 is used directly as partial product 308. Because the encoding is radix 4, partial product 308 is shifted left by 2 bits; that is, its least significant bit is aligned with bit 2 of partial product 307 (where the least significant bit of partial product 307 is bit 0). According to the table of Fig. 2, the last 3-bit segment 306 maps to coefficient 0. Partial product 309 is therefore 0000, and it is shifted left by a further 2 bits; that is, its least significant bit is aligned with bit 2 of partial product 308 (where the least significant bit of partial product 308 is bit 0).

Summing partial products 307 through 309 gives the 8-bit product 310, with value 00010101.
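The Fig. 3 walkthrough can be reproduced in Python. The text does not state the operand values, so this sketch assumes they are 0111 (multiplicand) and 0011 (multiplier), which are the values implied by partial product 11111001, the segment values, and the product 00010101. The function name is hypothetical.

```python
BOOTH = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_multiply_unsigned(multiplicand, multiplier, width=4):
    """Radix-4 Booth multiply of unsigned width-bit operands.

    Returns the list of partial products (already aligned and truncated to
    2 * width bits) and the final product.
    """
    out_mask = (1 << (2 * width)) - 1
    padded = multiplier << 1          # 0 appended below the LSB (bit 303)
    partial_products = []
    # One extra segment covers the zero extension above the top multiplier bit.
    for i in range(0, width + 2, 2):
        coeff = BOOTH[(padded >> i) & 0b111]
        # Negative coefficients become two's-complement, sign-extended words.
        partial_products.append(((coeff * multiplicand) << i) & out_mask)
    product = sum(partial_products) & out_mask
    return partial_products, product


pps, product = booth_multiply_unsigned(0b0111, 0b0011)
print([format(p, "08b") for p in pps], format(product, "08b"))
```

With the assumed operands, the three segments are 110, 001, and 000, giving coefficients -1, +1, and 0, the same sequence described for segments 304, 305, and 306.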

The present invention concerns Booth encoding. Booth encoding delivers high performance for multiplication, but on its own it cannot perform carryless multiplication. A 3-bit segment may map to coefficient +2 or -2, so carries arise when the partial products are summed. Performing a multiplication without carries in a processor or other device would therefore seem to rule out Booth encoding. Carries also arise in the compressor that computes the sum, so the compressor would seem to be ruled out as well.

Consequently, the conventional way to perform carryless multiplication is to provide an entirely separate carryless multiplication unit, or at least separate carryless multiplication hardware within a multiplication unit. As those skilled in the art appreciate, adding new hardware increases power consumption, reduces reliability, and makes the device harder to test and debug.

As those skilled in the art appreciate, the best approach is to make the most effective use of the multiplication hardware already present in a processor or other device. Because of the characteristics of Booth encoding and compression hardware, however, it has not been possible to perform carryless multiplication with that hardware as it stands.

The present invention provides an apparatus and method for performing carryless multiplication in a processor or other device. It uses the existing Booth encoding and compression elements with only minor modifications. The carryless multiplication disclosed here thus requires the minimum necessary changes to the existing multiplication unit and does not affect that unit's speed. The invention is described below with reference to Figs. 4-7.

To summarize, the value of each bit segment of the multiplier determines a multiplication coefficient. Because that coefficient may be +2 or -2, the multiplication may generate carries. The carry-save adders (CSAs) of the existing Wallace tree structure also generate carries. The present invention therefore provides a carryless multiplication technique that splits a single operation into two passes, so that no carry arises when the partial products are combined. It also provides a modified compressor that can selectively enable or disable the carries it generates.
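The effect of a compressor with selectable carries can be sketched in Python. The `carries_enabled` flag is an assumed software stand-in for the hardware carry enable, and the function names are hypothetical. With carries disabled, each 3:2 adder degenerates into a three-input XOR.

```python
def carry_save_add(a, b, c, carries_enabled=True):
    """3:2 carry-save adder whose carry output can be forced to zero."""
    sum_bits = a ^ b ^ c
    majority = (a & b) | (a & c) | (b & c)
    carry_bits = (majority << 1) if carries_enabled else 0
    return sum_bits, carry_bits


def compress(partial_products, carries_enabled=True):
    """Reduce partial products to a (sums, carries) pair.

    Carries enabled:  sums + carries equals the arithmetic total.
    Carries disabled: sums equals the XOR of all inputs and carries is 0.
    """
    terms = list(partial_products) + [0, 0]  # padding guarantees at least two terms
    while len(terms) > 2:
        s, c = carry_save_add(terms[0], terms[1], terms[2], carries_enabled)
        terms = terms[3:] + [s, c]
    return terms[0], terms[1]
```

The same reduction structure serves both modes, which is the point of modifying the compressor rather than adding a separate XOR network.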

Fig. 4 shows the Booth encoding coefficients of the present invention, which do not generate carries. Fig. 4 is similar to Fig. 2, except that only two multiplication coefficients remain, 0 and +1, corresponding to the segment values 000 and 010, respectively. Formatting the multiplier operand as described in the present invention prevents the values struck through in Fig. 4 (001 and 011 to 111) from occurring. The invention formats the multiplier and then applies the Booth encoder to the formatted result. Because the multiplication coefficients that would generate carries are avoided, a carry-free multiplication can be performed.

Fig. 5 shows how the present invention formats an operand so that Booth encoding can be used to perform carry-free multiplication. Fig. 5 shows three expressions 501, 511, and 521. Expression 501 has an 8-bit operand 502 and a bit 503. Bit 503 has the value 0 and is appended after the least significant bit (LSB) of operand 502. By convention, the least significant bit of operand 502 is called bit 0. Setting the odd-numbered bits of operand 502 (bit 1, bit 3, bit 5, and bit 7) to 0 yields the even portion 512 of expression 511. To apply Booth encoding to even portion 512, a bit 513 with the value 0 is appended after the least significant bit of even portion 512. Shifting the odd-numbered bits of operand 502 (bit 1, bit 3, bit 5, and bit 7) right by one position (so that they become bit 0, bit 2, bit 4, and bit 6) and filling the odd-numbered bit positions with 0 yields the odd portion 522 of expression 521. To apply Booth encoding to odd portion 522, a bit 523 with the value 0 is appended after the least significant bit of odd portion 522.

Together, even portion 512 and odd portion 522 fully represent the original operand 502 and can take its place in the multiplication. The summed partial products of odd portion 522 are shifted left by one bit and then added to the summed partial products of even portion 512 to produce the final multiplication result.
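
The formatting of Fig. 5 can be sketched in a few lines. This is a minimal illustration, assuming an unsigned operand of a fixed width; the function names `preformat` and `recombine` and the sample value are hypothetical and do not come from the patent.

```python
from typing import Tuple


def preformat(multiplier: int, width: int = 8) -> Tuple[int, int]:
    """Split a multiplier into (even portion, odd portion) as described for Fig. 5."""
    even_mask = sum(1 << i for i in range(0, width, 2))  # bit 0, bit 2, bit 4, ...
    multiplier &= (1 << width) - 1
    even = multiplier & even_mask         # odd-numbered bits set to 0
    odd = (multiplier >> 1) & even_mask   # odd-numbered bits moved to even positions
    return even, odd


def recombine(even: int, odd: int) -> int:
    """Undo the split: the odd portion moves back one position to the left."""
    return even | (odd << 1)


if __name__ == "__main__":
    even, odd = preformat(0b10110110)
    print(f"even = {even:08b}, odd = {odd:08b}")
    print(f"recombined = {recombine(even, odd):08b}")
```

Because the two portions never have a 1 in the same position after the odd portion is shifted back, combining them with OR, XOR, or ordinary addition gives the same value.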

In this embodiment, operand 502 is pre-formatted to produce even portion 512 and odd portion 522, and the Booth encoder then evaluates even portion 512 and odd portion 522 to obtain a multiplication result. Compared with a conventionally formatted operand 502, multiplying in this way requires every step of the multiplication unit to be carried out twice. However, by pre-formatting operand 502 into an even portion 512 and an odd portion 522, the present invention can use Booth encoding without generating carries, because every 3-bit segment 514~518 and 524~528 in expressions 511 and 521 has the value 000 or 010, so the corresponding multiplication coefficient is either 0 or 1. Because the invention performs carry-free multiplication through the conventional Booth encoding structure, it does not add to the complexity of carry-free multiplication hardware in a microprocessor or other device. In the original expression 501, evaluating segments 504~508 would produce a carry in the multiplication, because the multiplication coefficient corresponding to segment 505 is -2. In this embodiment, by contrast, the values of the pre-formatted segments 514~518 and 524~528 cannot cause a carry.
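
The claim that every segment of the formatted portions is 000 or 010 can be checked exhaustively for 8-bit operands. The sketch below assumes the operand is zero-extended above its most significant bit, which gives five overlapping segments for an 8-bit value; `booth_segments` and `is_carry_free` are hypothetical helper names, and the sample value is not the one drawn in Fig. 5.

```python
def booth_segments(value: int, width: int = 8):
    """Yield the overlapping 3-bit radix-4 Booth segments, least significant first.

    A 0 bit is appended below bit 0. The value is zero-extended above its
    most significant bit (an assumption made for this sketch).
    """
    padded = (value & ((1 << width) - 1)) << 1
    for i in range(0, width + 1, 2):
        yield (padded >> i) & 0b111


def is_carry_free(value: int, width: int = 8) -> bool:
    """True when every segment maps to coefficient 0 (000) or +1 (010)."""
    return all(seg in (0b000, 0b010) for seg in booth_segments(value, width))


if __name__ == "__main__":
    operand = 0b10110110
    even = operand & 0b01010101
    odd = (operand >> 1) & 0b01010101
    for name, value in (("operand", operand), ("even", even), ("odd", odd)):
        segs = " ".join(f"{s:03b}" for s in booth_segments(value))
        print(f"{name:8s} {segs}  carry-free: {is_carry_free(value)}")
```

For an unformatted operand the segments generally include values such as 100 or 011, which select -2B or +2B; after formatting, only 000 and 010 remain.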

Fig. 6 shows the carry-free multiplication unit of the present invention. Multiplication unit 600 is similar to multiplication unit 100 of Fig. 1. Multiplication unit 600 has a first operand register 601, which is coupled to a carry-free pre-format unit 612. Carry-free pre-format unit 612 is coupled to a Booth encoder 604. Multiplication unit 600 also has a second operand register 602, which is coupled to a partial product generator 603. Booth encoder 604 and partial product generator 603 are both coupled to a Booth multiplexer 605. Booth multiplexer 605 is coupled to a compressor 606 through a bus PARTPROD. Compressor 606 has a number of carry-free compression coefficients. Compressor 606 has a carry-free enable input signal and includes a plurality of carry-save adders (CSAs) 608, which are arranged in a Wallace tree structure. Compressor 606 is coupled to a left shifter 609 through a bus SUMS and to a full adder 610 through a bus CARRIES. Left shifter 609 is coupled to full adder 610. In one possible embodiment, full adder 610 outputs a 128-bit multiplication result on a bus RESULT. Full adder 610 is coupled to a register 613 through bus RESULT. In addition, a product synchronizer 607 generates a synchronization signal CLK. So that multiplication unit 600 produces the final 128-bit product, carry-free pre-format unit 612, compressor 606, left shifter 609, and full adder 610 receive synchronization signal CLK and operate in synchronization. Multiplication unit 600 also has an opcode detector 611, which generates a carry-free signal CARRYLESS. Carry-free pre-format unit 612, compressor 606, and left shifter 609 receive carry-free signal CARRYLESS.

For either a conventional multiplication or a carry-free multiplication, an instruction (not shown) is passed to multiplication unit 600 directly, indirectly, or split into two operands. In one possible embodiment, a multiplier operand OP A is provided to first operand register 601 and a multiplicand operand OP B is provided to second operand register 602. Multiplier operand OP A and multiplicand operand OP B are both in 2's complement format. In this embodiment, OP A and OP B each have 64 bits, although this is not intended to limit the invention. In other embodiments, OP A and OP B have other numbers of bits. In another embodiment of a 64-bit multiplication, the two 64-bit operands can be divided into four 32-bit operands that are multiplied by multiplication unit 600.

To produce a final product, multiplication unit 600 uses the Booth encoding of multiplication unit 100 of Fig. 1 to reduce the number of partial products that must be summed. In one embodiment, a 3-bit Booth encoder 604 is used. When Booth encoder 604 operates, it causes Booth multiplexer 605 to output the corresponding partial products in sequence, these partial products being radix-4 multiplication results. The number of partial products to be summed is thereby reduced. Summing the partial products together with a coefficient produces a final result. For Booth encoding with other radices, carry-free multiplication requires commensurate modifications to the carry-free pre-formatting, the post-formatting, and the partial products in order to eliminate possible carries. Under control of synchronization signal CLK, Booth encoder 604 evaluates the consecutive 3-bit segments it receives and controls Booth multiplexer 605 through a bus PPSEL. The signal on bus PPSEL causes Booth multiplexer 605 to select one of five partial products (-2B, -B, 0, B, 2B) derived from multiplicand operand OP B. These partial products are all generated by partial product generator 603. Partial product generator 603 multiplies multiplicand operand OP B by 0 to produce partial product 0, by 1 to produce partial product B, by -1 to produce partial product -B, by 2 to produce partial product 2B, and by -2 to produce partial product -2B. As those skilled in the art will appreciate, partial product generator 603 can obtain these five partial products by taking the complement of multiplicand operand OP B, by shifting OP B left, or by first taking the complement of OP B and then shifting the complemented result left.
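
The selection among the five partial products can be illustrated with a small software model of ordinary (non-carry-free) radix-4 Booth multiplication. This is only a sketch under stated assumptions: the multiplier is treated as unsigned and zero-extended, whereas the text describes 2's complement operands, and the function names `booth_digit`, `partial_products`, and `booth_multiply` are hypothetical.

```python
def booth_digit(segment: int) -> int:
    """Map a 3-bit segment (b2 b1 b0) to its radix-4 Booth coefficient."""
    b2, b1, b0 = (segment >> 2) & 1, (segment >> 1) & 1, segment & 1
    return -2 * b2 + b1 + b0


def partial_products(b: int) -> dict:
    """The five candidates (-2B, -B, 0, B, 2B) derived from multiplicand B."""
    return {-2: -(b << 1), -1: -b, 0: 0, 1: b, 2: b << 1}


def booth_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply unsigned a (the multiplier) by b using radix-4 Booth recoding."""
    candidates = partial_products(b)
    padded = (a & ((1 << width) - 1)) << 1      # 0 appended below bit 0
    total = 0
    for i in range(0, width + 1, 2):            # one segment per pair of multiplier bits
        digit = booth_digit((padded >> i) & 0b111)
        total += candidates[digit] << i         # the role played by the multiplexer
    return total


if __name__ == "__main__":
    print(booth_multiply(182, 3), 182 * 3)
```

An 8-bit multiplier needs five partial products here instead of eight, which is the reduction the text attributes to Booth encoding.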

If opcode detector 611 detects a normal multiply instruction, signal CARRYLESS is not enabled, and carry-free pre-format unit 612 simply passes the multiplier operand OP A received by first operand register 601 on to Booth encoder 604. If opcode detector 611 detects a carry-free multiply instruction, signal CARRYLESS is enabled. Carry-free pre-format unit 612 then formats the multiplier into an even portion and an odd portion (as shown by even portion 512 and odd portion 522 of Fig. 5), which are evaluated by the Booth encoder one after the other.

Synchronization signal CLK causes Booth encoder 604 to evaluate the consecutive 3-bit segments it receives and causes compressor 606 to store the corresponding partial products until all segments have been evaluated. If carry-free signal CARRYLESS is not enabled (that is, a normal multiply instruction has been detected), the partial products are distributed to inputs A, B, and C of carry-save adders 608 to generate carry bits on bus CARRIES and sum bits on bus SUMS. Full adder 610 sums the bits on bus SUMS to produce the final 128-bit result on bus RESULT, the final result being in 2's complement format. In this case, with signal CARRYLESS not enabled, left shifter 609 passes the value on bus SUMS directly to full adder 610. If signal CARRYLESS is enabled (that is, a carry-free multiply instruction has been detected), the carry bits output by all carry-save adders 608 are invalidated (that is, set to 0), and only the sum bits output by carry-save adders 608 are valid. In one possible embodiment, the partial products selected for the even portion of multiplier operand OP A are distributed to inputs A, B, and C of carry-save adders 608 to generate the sum bits of the even portion on bus SUMS. These even-portion sum bits are stored in register 613. Next, the partial products selected for the odd portion of multiplier operand OP A are distributed to inputs A, B, and C of carry-save adders 608 to generate the sum bits of the odd portion on bus SUMS. These odd-portion sum bits are shifted left by one bit by left shifter 609. In both cases, the value that full adder 610 receives on bus CARRIES is 0. After the odd-portion sum bits have been generated on bus SUMS, the final carry-free result is obtained by performing an exclusive-OR (XOR) of the data stored in register 613 (the even portion) with the data on bus RESULT (the odd portion). In one possible embodiment, an XOR gate performs this exclusive-OR operation.
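
The effect of disabling the carry outputs can be modeled with a word-wide 3:2 carry-save adder. The sketch below is a software analogy only, not the circuit of compressor 606; the function names and the `carryless` flag are hypothetical stand-ins for the behavior tied to signal CARRYLESS.

```python
def carry_save_add(a: int, b: int, c: int, carryless: bool = False):
    """Word-wide 3:2 carry-save adder with inputs A, B, and C.

    Returns (sum_bits, carry_bits). With carryless=True the carry output is
    forced to 0, so only the bitwise XOR of the three inputs remains.
    """
    sum_bits = a ^ b ^ c
    carry_bits = 0 if carryless else ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits


def combine_passes(even_sum: int, odd_sum: int) -> int:
    """Final step: shift the odd-portion sum left by one bit and XOR it with the even sum."""
    return even_sum ^ (odd_sum << 1)


if __name__ == "__main__":
    s, c = carry_save_add(5, 3, 1)
    print(s, c, s + c)                        # normal mode: s + c equals 5 + 3 + 1
    print(carry_save_add(5, 3, 1, carryless=True))
```

With the carries forced to 0, a tree of such adders reduces any number of partial products to their bitwise XOR, which is exactly the sum needed for polynomial multiplication over GF(2).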

Multiplication unit 600 of the present invention can perform both ordinary multiplication and carry-free multiplication. Multiplication unit 600 comprises logic, circuits, devices, or microcode (that is, microinstructions or native instructions), or a combination of logic, circuits, devices, or microcode, or other equivalent elements that perform the operations described above. In one possible embodiment, the elements used to perform these operations may be shared with other circuits, microcode, and so on that perform other functions of the processor or other device. In this embodiment, microcode is a term that refers to a plurality of microinstructions. A microinstruction (also called a native instruction) is an instruction that is executed by a unit. For example, microinstructions are executed directly by a reduced instruction set computer (RISC) microprocessor. For a complex instruction set computer (CISC) microprocessor, such as an x86-compatible microprocessor, x86 instructions are translated into microinstructions, and the translated microinstructions are executed by at least one unit within the CISC microprocessor.

Those skilled in the art will appreciate that combining opcode detector 611, carry-free pre-format unit 612, and left shifter 609 with compressor 606 and full adder 610 is only a minor modification and does not make the processor hardware significantly more complex. Moreover, the benefits of the present invention (such as low power consumption, high reliability, and short debug and test time) far outweigh any impact on its performance characteristics.

Fig. 7 shows the carry-free multiplication method of the present invention. The method begins at step 701, in which a processor, microprocessor, or other device executes ordinary multiply instructions or carry-free multiply instructions. The flow then proceeds to step 702.

In step 702, the next multiply instruction is fetched and executed, and the fetched result is provided to a multiplication unit. The flow then proceeds to step 703.

In step 703, an evaluation determines whether the multiplication unit has received a carry-free multiply instruction. If the multiplication unit has not received a carry-free multiply instruction, step 705 is performed. If it has, step 704 is performed.

In step 705, the multiplication unit performs an ordinary multiplication. The multiplication unit uses Booth encoding and compression to reduce the number of partial products and produces a final result. Step 713 is then performed.

In step 704, the even bits of the multiplier are evaluated, and partial products of the multiplicand are obtained from the evaluation. In detail, the odd-numbered bits of the multiplier operand are set to 0, and the consecutive 3-bit segments of the modified result are then evaluated according to Booth encoding to obtain a plurality of partial products whose sum cannot produce a carry. Because the odd-numbered bits of the multiplier operand are all 0, every carry that Booth encoding could otherwise introduce is excluded. Step 706 is then performed.

In step 706, the carry bits in the Wallace tree structure of the compressor are set to 0, that is, the carry bits in the Wallace tree are disabled. Step 707 is then performed.

In step 707, the partial products for the even portion are summed to obtain a first carry-free sum SUM1. In detail, all of the partial products are combined by an exclusive-OR (XOR) operation to produce the first carry-free sum SUM1. Step 708 is then performed.

In step 708, the multiplier operand is shifted right by 1 bit. Step 709 is then performed.

In step 709, the even bits of the shifted multiplier are evaluated, and partial products of the multiplicand are obtained from the evaluation. In detail, the odd-numbered bits of the right-shifted multiplier operand are set to 0 to produce the odd portion of the multiplier operand. The consecutive 3-bit segments of this odd portion are evaluated according to Booth encoding to select a plurality of partial products. A carry-free multiplication result can be obtained from these partial products. Step 710 is then performed.

In step 710, the partial products of the multiplicand are summed to obtain a second carry-free sum SUM2. In detail, the partial products obtained for the odd portion are combined by an exclusive-OR operation to produce the second carry-free sum SUM2. Step 711 is then performed.

In step 711, the second carry-free sum SUM2 is shifted left by 1 bit. Step 712 is then performed.

In step 712, the left-shifted second carry-free sum SUM2 is summed with the first carry-free sum SUM1 to produce a final carry-free multiplication result. Step 713 is then performed.

In step 713, the flow ends.
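
The carry-free branch of Fig. 7 (steps 704 to 712) can be sketched as a small software model and checked against a bit-serial reference. This is an illustrative sketch rather than the hardware described above: operands are treated as unsigned, the final combination in step 712 is done with XOR (the carry-free sum), and the names `clmul_reference`, `xor_partial_products`, and `clmul_two_pass` are hypothetical.

```python
def clmul_reference(a: int, b: int) -> int:
    """Bit-serial carry-free (GF(2) polynomial) multiplication, used only as a check."""
    result = 0
    while a:
        if a & 1:
            result ^= b
        a >>= 1
        b <<= 1
    return result


def xor_partial_products(portion: int, multiplicand: int, width: int) -> int:
    """Booth-select the partial products for one portion and XOR them together."""
    padded = portion << 1                        # 0 appended below bit 0
    total = 0
    for i in range(0, width + 1, 2):
        segment = (padded >> i) & 0b111
        assert segment in (0b000, 0b010)         # only coefficients 0 and +1 occur
        if segment == 0b010:
            total ^= multiplicand << i           # carries disabled: the sum is an XOR
    return total


def clmul_two_pass(multiplier: int, multiplicand: int, width: int = 64) -> int:
    """Two-pass carry-free multiplication following steps 704 to 712."""
    even_mask = sum(1 << i for i in range(0, width, 2))
    multiplier &= (1 << width) - 1
    sum1 = xor_partial_products(multiplier & even_mask, multiplicand, width)  # 704-707
    shifted = multiplier >> 1                                                 # 708
    sum2 = xor_partial_products(shifted & even_mask, multiplicand, width)     # 709-710
    return sum1 ^ (sum2 << 1)                                                 # 711-712


if __name__ == "__main__":
    a, b = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
    print(hex(clmul_two_pass(a, b)))
    print(clmul_two_pass(a, b) == clmul_reference(a, b))
```

The model makes the cost of the approach visible: the same selection and reduction path is used twice, once per portion, in exchange for reusing the Booth structure unchanged.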

Although the foregoing describes the objects, features, and advantages of the present invention in detail, the invention also encompasses other embodiments. For example, because 64-bit carry-free multiplication is the most common size in present-day processors and other devices, the description above treats 64-bit carry-free multiplication in the most detail. However, the invention is equally applicable to processors or devices with other bit widths. The invention is therefore not limited to 64 bits.

In addition, many multiplication units use a multi-pass arrangement. For example, 64-bit operands are divided into four 32-bit operands. The multiplication unit produces several product results from these four operands, and the product results are summed together to produce a final result. One object of the present invention is to reuse the Booth encoding and partial product generation hardware used for ordinary multiplication.

Finally, although the description above uses radix-4 Booth encoding, this is not intended to limit the invention. To make use of an existing Booth encoding hardware structure, other embodiments may use a radix greater than 4. To use Booth encoding without generating carries, certain specific bits of an input operand are selected and the values of the unselected bits are set to 0. The input operand can thus be formatted into a plurality of portions.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection of the invention is defined by the appended claims.

100, 600‧‧‧multiplication unit

101, 601‧‧‧first operand register

102, 602‧‧‧second operand register

103, 603‧‧‧partial product generator

104, 604‧‧‧Booth encoder

105, 605‧‧‧Booth multiplexer

106, 606‧‧‧compressor

107, 607‧‧‧product synchronizer

108, 608‧‧‧carry-save adder

109, 610‧‧‧full adder

301, 302, 502‧‧‧operand

303, 503, 513, 523‧‧‧bit

304~306, 504~508, 514~518, 524~528‧‧‧segment

307~309‧‧‧partial product

310‧‧‧multiplication result

501, 511, 521‧‧‧expression

512‧‧‧even portion

522‧‧‧odd portion

609‧‧‧left shifter

611‧‧‧opcode detector

612‧‧‧carry-free pre-format unit

613‧‧‧register

Fig. 1 is a block diagram of a 64-bit multiplication unit in a microprocessor or similar device.

Fig. 2 is a table illustrating how the multiplication unit of Fig. 1 uses Booth encoding to reduce the number of partial products.

Fig. 3 illustrates how Booth encoding reduces the number of partial products in a 4-bit multiplication.

Fig. 4 shows the Booth encoding coefficients of the present invention that allow a carry-free multiplication to be performed.

Fig. 5 shows how the present invention formats an operand.

Fig. 6 is a block diagram of a carry-free multiplication unit of the present invention.

Fig. 7 is a flowchart of the carry-free multiplication of the present invention.

Claims

A carry-free multiplication device for performing a carry-free multiplication operation, comprising: a carry-free pre-format unit that receives a multiplier operand and formats the multiplier operand into a plurality of portions, wherein the portions include an even portion and an odd portion; a Booth multiplexer that receives a plurality of first partial products and a plurality of second partial products of a multiplicand operand; a Booth encoder that receives and evaluates the portions and causes the Booth multiplexer to select the first partial products, wherein the portions prevent the second partial products, which would cause carries, from being selected; a compressor, coupled to the Booth encoder through the Booth multiplexer, that sums the first partial products through a plurality of carry-save adders, the carry-save adders generating a plurality of sum bits and a plurality of carry bits, wherein the carry-save adders are arranged in a Wallace tree structure and the carry bits are disabled when the carry-free multiplication operation is performed; a left shifter, coupled to the compressor, that shifts the output of the sum bits corresponding to the odd portion left by at least one bit when the carry-free multiplication operation is performed; and an exclusive-OR gate, coupled to the output of the compressor and the output of the left shifter, that, when the carry-free multiplication operation is performed, performs an exclusive-OR operation on the left-shifted sum bits corresponding to the odd portion and the sum bits of the even portion.

The carry-free multiplication device of claim 1, wherein the multiplier operand is in 2's complement format.

The carry-free multiplication device of claim 1, wherein the even portion has the values of the even bits of the multiplier operand and the values of the odd bits of the multiplier operand, wherein the values of the odd bits of the multiplier operand in the even portion are set to 0; and the odd portion has the values of the odd bits of the multiplier operand and the values of the even bits of the multiplier operand, wherein the values of the even bits of the multiplier operand in the odd portion are set to 0 and the values of the odd portion are shifted right by 1 bit; wherein the even bits of the multiplier operand include the least significant bit of the multiplier operand.

The carry-free multiplication device of claim 1, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

The carry-free multiplication device of claim 1, wherein the first partial products comprise a plurality of first multiplications of the multiplicand operand, the first multiplications comprising: multiplying the multiplicand operand by 0; and multiplying the multiplicand operand by 1.

The carry-free multiplication device of claim 5, wherein the second partial products comprise a plurality of second multiplications of the multiplicand operand, the second multiplications comprising: multiplying the multiplicand operand by 2; and multiplying the multiplicand operand by -2.

The carry-free multiplication device of claim 1, wherein the carry-free multiplication device is a multiplication unit in a processor or a multiplication unit in a device.

The carry-free multiplication device of claim 7, wherein the multiplication unit is configured to perform a carry-free multiplication operation and a normal multiplication operation.

A method for performing a carry-free multiplication operation, comprising: formatting, in a multiplication unit in a processor, a multiplier operand into a plurality of portions; evaluating the portions through a Booth encoder and selecting a plurality of first partial products of a multiplicand operand, wherein the portions prevent a plurality of second partial products of the multiplicand operand, which would cause carries, from being selected; processing the first partial products with carry-save adders to generate a plurality of sum bits and a plurality of carry bits, wherein the carry-save adders are arranged in a Wallace tree structure and the carry bits are disabled when the carry-free multiplication operation is performed; shifting an output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on outputs of the Wallace tree to produce a carry-free multiplication result.

The method of claim 9, wherein the multiplier operand is in the form of a 2's complement.

The method of claim 9, wherein the portion comprises: an even portion, a value having an even bit of the multiplier, and a value of an odd bit of the multiplier, wherein The value of the odd bit of the multiplier operand that the even part has is set to 0; An odd-numbered portion having a value of an odd-numbered bit of the multiplier operation element and a value of an even-numbered bit of the multiplier operation element, wherein the odd-numbered portion has an even-numbered bit of the multiplier operation element set 0, and the value of the odd portion is shifted to the right by 1 bit; wherein the even bit of the multiplier is included in the even bit of the multiplier.

The method of claim 9, wherein the Booth encoder comprises a radix-4 encoder for selecting the partial products.

The method of claim 9, wherein the first partial products comprise a plurality of first multiples of the multiplicand operand, the first multiples comprising: the multiplicand operand multiplied by 0; and the multiplicand operand multiplied by 1.

The method of claim 13, wherein the second partial products comprise a plurality of second multiples of the multiplicand operand, the second multiples comprising: the multiplicand operand multiplied by 2; and the multiplicand operand multiplied by -2.
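The distinction between the first and second multiples can be seen from a radix-4 Booth recoding table. The sketch below assumes the textbook radix-4 Booth table, which the claims do not spell out; `BOOTH_RADIX4` and `booth_multiples` are hypothetical names. It shows that an unformatted multiplier may select the 2 and -2 multiples, whereas the even and odd portions of the same multiplier select only 0 and 1.

```python
# Textbook radix-4 Booth recoding: (b[2i+1], b[2i], b[2i-1]) -> multiple.
BOOTH_RADIX4 = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_multiples(value, width):
    """List the multiple selected for each 2-bit group, least significant first."""
    padded = value << 1  # append the implicit 0 to the right of bit 0
    selected = []
    for i in range(0, width, 2):
        triplet = (
            (padded >> (i + 2)) & 1,
            (padded >> (i + 1)) & 1,
            (padded >> i) & 1,
        )
        selected.append(BOOTH_RADIX4[triplet])
    return selected


print(booth_multiples(0b0110, 4))  # unformatted multiplier: [-2, 2]
print(booth_multiples(0b0100, 4))  # even portion of 0b0110: [0, 1]
print(booth_multiples(0b0001, 4))  # odd portion of 0b0110, shifted right: [1, 0]
```

Because every triplet taken from a formatted portion has the form (0, b, 0), only the rows of the table that yield 0 or 1 are reachable.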

The method of claim 9, wherein each of the operands is a 64-bit operand.

A carry-free multiplication device for performing a carry-free multiplication operation, comprising: a first operand register, receiving a first operand for performing the carry-free multiplication operation; a second operand register, receiving a second operand for performing the carry-free multiplication operation; an opcode detector, receiving a carry-free multiplication instruction and enabling a carry-free signal according to the carry-free multiplication instruction; a carry-free pre-format unit, coupled to the first operand register, wherein when the carry-free signal is enabled, the carry-free pre-format unit formats the first operand into a plurality of portions, the portions comprising an even portion and an odd portion; a Booth multiplexer, receiving a plurality of first partial products and a plurality of second partial products of the second operand; a Booth encoder which, by means of the portions, causes the Booth multiplexer to avoid selecting the second partial products, the second partial products being those that would cause carries; a compressor, adding the first partial products of the second operand through a plurality of carry-save adders, wherein the carry-save adders generate a plurality of sum bits and a plurality of carry bits, the carry-save adders are arranged in a Wallace tree, and the carry bits are disabled when the carry-free signal is enabled; a left shifter, coupled to the compressor, for shifting the sum bits output for the odd portion left by at least one bit when the carry-free multiplication operation is performed; and an exclusive-OR gate, coupled to an output of the compressor and an output of the left shifter, for performing, when the carry-free multiplication operation is performed, an exclusive-OR operation on the left-shifted sum bits corresponding to the odd portion and the sum bits corresponding to the even portion.
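The effect of disabling the carry bits in the compressor can be sketched with a word-wide 3:2 carry-save adder. This is an illustrative model under stated assumptions: `carry_save_add`, `compress`, and the `carry_enable` flag are hypothetical names, and the reduction below chains the adders sequentially rather than laying them out as a true Wallace tree.

```python
def carry_save_add(a, b, c, carry_enable=True):
    """3:2 compressor on whole words: returns (sum_bits, carry_bits)."""
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1 if carry_enable else 0
    return sum_bits, carry_bits


def compress(partial_products, carry_enable=True):
    """Reduce partial products to a (sum, carry) pair using 3:2 compressors."""
    terms = list(partial_products)
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(carry_save_add(a, b, c, carry_enable))
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


partial_products = [0b1011, 0b10110, 0b101100]

# Normal multiplication: sum and carry words add up to the arithmetic total.
s, c = compress(partial_products, carry_enable=True)
print(s + c == sum(partial_products))  # prints True

# Carry-free mode: the carry word stays 0 and the sum word is a pure XOR.
s, c = compress(partial_products, carry_enable=False)
print(bin(s), c)  # prints "0b110001 0"
```

With the carry bits enabled the same structure serves normal multiplication, which is consistent with a single multiplication unit handling both kinds of operation.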

The carry-free multiplication device of claim 16, wherein the first and second operands are in two's-complement format.

The carry-free multiplication device of claim 16, wherein the even portion has the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even portion are set to 0; and the odd portion has the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd portion are set to 0, and the value of the odd portion is shifted right by 1 bit; wherein the even bits of the first operand include the least significant bit of the first operand.

The carry-free multiplication device of claim 16, wherein the Booth encoder comprises a radix-4 encoder for selecting the partial products.

The carry-free multiplication device of claim 16, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising: the second operand multiplied by 0; and the second operand multiplied by 1.

The carry-free multiplication device of claim 20, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising: the second operand multiplied by 2; and the second operand multiplied by -2.

The carry-free multiplication device of claim 16, wherein the carry-free multiplication device is a multiplication unit in a processor or a multiplication unit in a device.

The carry-free multiplication device of claim 22, wherein the multiplication unit is configured to perform both a carry-free multiplication operation and a normal multiplication operation.

A method for performing a carry-free multiplication operation, comprising: receiving, in a multiplication unit in a processor, a first operand and a second operand for performing the carry-free multiplication operation; enabling a carry-free signal according to a carry-free multiplication instruction; when the carry-free signal is enabled, formatting the first operand into a plurality of portions, wherein a Booth encoder, by means of the portions, avoids selecting a plurality of second partial products of the second operand, the second partial products being those that would cause carries; adding a plurality of first partial products of the second operand with a plurality of carry-save adders, the carry-save adders generating a plurality of sum bits and a plurality of carry bits, wherein the carry-save adders are arranged in a Wallace tree and the carry bits are disabled when the carry-free signal is enabled; shifting an output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on the outputs of the Wallace tree to produce a carry-free multiplication result.

The method of claim 24, wherein the first and second operands are in two's-complement format.

The method of claim 24, wherein the portions comprise: an even portion having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even portion are set to 0; and an odd portion having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd portion are set to 0, and the value of the odd portion is shifted right by 1 bit; wherein the even bits of the first operand include the least significant bit of the first operand.

The method of claim 24, wherein the Booth encoder comprises a radix-4 encoder for selecting the partial products.

The method of claim 24, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising: the second operand multiplied by 0; and the second operand multiplied by 1.

The method of claim 28, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising: the second operand multiplied by 2; and the second operand multiplied by -2.

The method of claim 24, wherein each of the operands is a 64-bit operand.

A carry-free multiplication device that formats a first operand for performing a carry-free multiplication operation, comprising: an opcode detector, receiving a carry-free multiplication instruction and enabling a carry-free signal according to the carry-free multiplication instruction; and a carry-free pre-format unit, wherein when the carry-free signal is enabled, the carry-free pre-format unit formats the first operand into a plurality of portions, wherein a Booth encoder, by means of the portions, causes a Booth multiplexer to select a plurality of first partial products of a second operand and prevents the Booth multiplexer from selecting a plurality of second partial products of the second operand, the second partial products being those that would cause carries; wherein an exclusive-OR operation is performed on the first partial products to generate a carry-free multiplication result.

The carry-free multiplication device of claim 31, wherein the first and second operands are in two's-complement format.

The carry-free multiplication device of claim 31, wherein the portions comprise: an even portion having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even portion are set to 0; and an odd portion having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd portion are set to 0, and the value of the odd portion is shifted right by 1 bit; wherein the even bits of the first operand include the least significant bit of the first operand.

The carry-free multiplication device of claim 31, wherein the Booth encoder comprises a radix-4 encoder for selecting the partial products.

The carry-free multiplication device of claim 31, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising: the second operand multiplied by 0; and the second operand multiplied by 1.

The carry-free multiplication device of claim 35, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising: the second operand multiplied by 2; and the second operand multiplied by -2.

The carry-free multiplication device of claim 31, wherein the carry-free multiplication device is a multiplication unit in a processor or a multiplication unit in a device.

The carry-free multiplication device of claim 37, wherein the multiplication unit is configured to perform both a carry-free multiplication operation and a normal multiplication operation.

A method for performing a carry-free multiplication operation, comprising: receiving, in a multiplication unit in a processor, a carry-free multiplication instruction, the carry-free multiplication operation being performed on a first operand and a second operand; enabling a carry-free signal according to the carry-free multiplication instruction; and when the carry-free signal is enabled, formatting the first operand into a plurality of portions, wherein a Booth encoder, by means of the portions, causes a Booth multiplexer to select a plurality of first partial products of the second operand and avoids having the Booth multiplexer select a plurality of second partial products of the second operand, the second partial products being those that would cause carries; wherein an exclusive-OR operation is performed on the first partial products to generate a carry-free multiplication result.

The method of claim 39, wherein the first and second operands are in two's-complement format.

The method of claim 39, wherein the portions comprise: an even portion having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even portion are set to 0; and an odd portion having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd portion are set to 0, and the value of the odd portion is shifted right by 1 bit; wherein the even bits of the first operand include the least significant bit of the first operand.

The method of claim 39, wherein the Booth encoder comprises a radix-4 encoder for selecting the partial products.

The method of claim 39, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising: the second operand multiplied by 0; and the second operand multiplied by 1.

The method of claim 43, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising: the second operand multiplied by 2; and the second operand multiplied by -2.

The method of claim 39, wherein each of the operands is a 64-bit operand.