TW201224916A - Carryless multiplication apparatus and method - Google Patents

Carryless multiplication apparatus and method

Info

Publication number
TW201224916A
Authority
TW
Taiwan
Prior art keywords
carry
operand
bit
multiplication
value
Prior art date
Application number
TW100136024A
Other languages
Chinese (zh)
Other versions
TWI489375B (en)
Inventor
Timothy A Elliott
Original Assignee
Via Tech Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/960,231 (US8645448B2)
Priority claimed from US12/960,239 (US8667040B2)
Priority claimed from US12/960,246 (US8635262B2)
Application filed by Via Tech Inc
Publication of TW201224916A
Application granted
Publication of TWI489375B

Landscapes

  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

An apparatus having a carryless preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic. The carryless preformat unit receives a multiplier operand and partitions the multiplier operand into parts. The Booth encoder receives the parts and directs selection of first partial products of a multiplicand that do not reflect implicit carry operations. The compressor sums the first partial products via a configuration of carry save adders that generate sum bits and carry bits, where generation of the carry bits is disabled during execution of the carryless multiplication. The left shifter shifts bits of one or more outputs of the compressor. The exclusive-OR logic is coupled to the compressor and the left shifter, and is configured to execute an exclusive-OR function on the outputs to yield a carryless multiplication result.

Description

201224916
CNTR2522I00-TW/0608-A42859-TW/Final

VI. Description of the Invention:

[Technical Field of the Invention]
The present invention relates to microelectronics, and more particularly to a technique for performing a carryless multiplication operation.

[Prior Art]
Much of today's communication traffic is encrypted. Usable schemes range from simple authentication to hashed enciphered messages that use symmetric-key cryptography. Among symmetric-key techniques, a common mode of operation is Galois/Counter Mode (GCM), which both encrypts and authenticates a message.

As those skilled in the art know, GCM combines counter-mode encryption with the more recently developed Galois mode of authentication. GCM uses multiplication in a Galois field for authentication. Galois field multiplication as such is beyond the scope of this application, but it is a carryless multiplication.

In general, carryless multiplication is binary polynomial multiplication: a mathematical operation that computes the product of two operands without generating or propagating carry bits. Indeed, Intel provides an instruction (PCLMULQDQ) that directs x86-compatible microprocessors to perform this function.
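The definition above (a product formed without generating or propagating carries) can be illustrated in software. The following is a minimal reference sketch, not the hardware method of this disclosure; the function name `clmul` is an assumption chosen for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carryless product of two non-negative integers.

    Each set bit of b contributes a shifted copy of a, exactly as in
    ordinary shift-and-add multiplication, but the copies are combined
    with XOR, so no carry ever moves from one bit position to the next.
    (`clmul` is an illustrative name, not taken from the source.)
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR replaces addition
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    # Ordinary 3 * 3 is 9; the carryless product differs because the
    # carry out of bit 1 is dropped.
    print(bin(clmul(0b11, 0b11)))
```

For two 64-bit operands the result always fits in 128 bits, which matches the result width quoted in the text.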
Consequently, when microprocessor designers revise an existing design to add functionality, carryless multiplication has to be taken into account. The operation itself is simple, but as those skilled in the art appreciate, it takes a substantial amount of hardware to implement. In a 64-bit carryless multiplication, for example, 64 partial products are generated, and exclusive-ORing (XORing) those partial products together yields a 128-bit result. Most present-day microprocessor designs contain no unit or logic that can perform this operation. Most microprocessors do, however, contain at least one multiplication unit for ordinary multiplication.

Many improvements developed over the years have made multiplication units faster. Booth encoding is one common technique; it roughly halves the number of partial products in a multiplication.
The Wallace tree is another common technique, used to sum the partial products that Booth encoding produces. Both techniques improve performance, but both generate or propagate carries, so a conventional multiplication unit cannot be used as it stands for carryless multiplication.

To address these shortcomings, the inventor observed that it is preferable to reuse existing hardware as far as possible, which avoids added power consumption and component count. From a debug and test standpoint as well, it is preferable to make the existing hardware architecture serve more than one function.

What is needed, therefore, is an apparatus and method for performing carryless multiplication in a processor or other device that reuses most of the existing hardware elements. What is also needed is a multiplication unit that can perform binary carryless multiplication without extensive modification of the existing unit.

[Summary of the Invention]
The present invention addresses the problems above, along with other problems, disadvantages, and limitations of the prior art. It provides a superior technique for performing carryless multiplication in a processor or other device using conventional Booth hardware. In one embodiment, an apparatus for performing a carryless multiplication includes a carryless preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic. The carryless preformat unit receives a multiplier operand and formats it into a plurality of parts. The Booth encoder receives and evaluates the parts and selects a plurality of first partial products of a multiplicand operand.
Because of how the parts are formed, second partial products of the multiplicand operand, namely those that would reflect implicit carry operations, are never selected. The compressor is coupled to the Booth encoder and sums the first partial products through a plurality of carry save adders. The carry save adders generate sum bits and carry bits and are arranged in a Wallace tree configuration. Generation of the carry bits is disabled while the carryless multiplication executes. The left shifter is coupled to the compressor and shifts the compressor's output left by at least one bit. The exclusive-OR logic is coupled to the compressor and the left shifter and performs an exclusive-OR operation to produce a carryless multiplication result.

The invention also provides a method for performing a carryless multiplication. The method includes: within a multiplication unit in a processor, formatting a multiplier operand into a plurality of parts; via a Booth encoder, evaluating the parts and selecting a plurality of first partial products of a multiplicand operand, where the parts preclude selection of second partial products of the multiplicand operand that would reflect implicit carry operations; processing the first partial products through a plurality of carry save adders that generate sum bits and carry bits, where the carry save adders are arranged in a Wallace tree configuration and the carry bits are disabled while the carryless multiplication executes; shifting the output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on the outputs of the Wallace tree to produce a carryless multiplication result.

As to industrial applicability, the invention can be implemented within a microprocessor, and the microprocessor may be used in a general-purpose or special-purpose computing device.

In another embodiment, likewise using conventional Booth hardware, an apparatus for performing a carryless multiplication includes a first operand register, a second operand register, an opcode detector, a carryless preformat unit, a compressor, a left shifter, and exclusive-OR logic. The first and second operand registers receive a first operand and a second operand, respectively, on which the carryless multiplication is performed.
水凌、纟〇果的技術,將藉由第1-3 的限制。接著,將力筮47θ 說明裝置 將在第4-7圖說明本發明係如 習知的乘法裝置的缺戥% 了%决目刖 制,並說明本發明如何利用原 =乘法操作的硬體架構,進行-無進位乘法操作 第/圖係為64纟元乘法單元之—可能實施例。Μ位元 乘法單元1〇〇可雇田y·, 應用在一破處理器或其它裝置中。 元100具有一筮—,雷蝥_此丄 水次早 ^ 乐運 疋暫存器(operand register)l〇i。第 運算元暫存器101耦接一布斯編碼器(Booth :ncoder^lG,。乘法單幻⑻具有—第二運算元暫存器⑽。 第一運鼻凡暫存器102耦接一部分乘積產生器103。布斯 編碼益1〇4與部分乘積產生器103均耦接布斯多工器1〇5。 布斯多工!§ 105透過匯流排PARTpR〇D,耦接到壓縮器 106。壓縮器]06具有複數進位儲存加法器(cari^_save adder ’ CAS)l〇8。進位儲存加法器1〇8係以一習知的華萊 士樹(Wallace Tree)架構排列,用以降低加總多個部分乘積 時的傳遞延遲(propagation delay)。壓縮器1〇6透過匯流排 CARRIES及SUMS ’輕接一全加法器1〇9。全加法器1〇9 透過匯流排RESULT ’輸出一乘法結果,該乘法結果係為2 的補數,並具有128位元。為了使乘法單元1〇〇產生最終 的128位元乘積,一乘積同步器1〇7產生一同步信號CLK。 CNTR2522I00-TW/0608-A42859-TW/Final 1〇S 201224916 bit and the complex carry position of the Lai Shi tree architecture, two of which carry the error adder to a carry bit of Huaneng; when Chiang performs the carry-in multiplication, it does not and the Wallace The tree ^ Hua Ershu's turn, left shift at least one yuan, · no carry multiplication results. A mutual exclusion or operation is performed to generate - in the industrial field ^, and the microprocessor H can be used in the present-microprocessor. The invention can solve the above problems, disadvantages and limitations in the computer device of the 1 Wei or (4) function. 2 questions' and other conventional techniques that can satisfy the prior art, or a more prioritized technique, which can operate without carry multiplication. In a. Using a conventional Booth hardware, a carry-multiplication device is used to enable the present invention to provide a non-operating element register, a two-order: no carry multiplication operation, which includes - No carry preformatted different 70 scratchpads, - opcode prescalers, or gates. The first and second operations: a compressor, a left shifter, and a mutually exclusive and a second operation unit, respectively receive a first operation element by the =兀 register to receive - no carry multiplication = 仃 no Carry multiplication. 
The opcode detector receives a carryless multiplication instruction and, in response, asserts a carryless signal. When the carryless signal is asserted, the carryless preformat unit formats the first operand into a plurality of parts. By means of these parts, a Booth encoder avoids selecting second partial products of the second operand, which would reflect implicit carry operations. The compressor sums a plurality of first partial products of the second operand through a plurality of carry save adders. The carry save adders generate sum bits and carry bits and are arranged in a Wallace tree configuration. When the carryless signal is asserted, the carry bits are disabled. The left shifter is coupled to the compressor and shifts the compressor's output left by at least one bit. The exclusive-OR logic is coupled to the compressor and the left shifter and performs an exclusive-OR operation to produce a carryless multiplication result.

The invention also provides a method for performing a carryless multiplication, including: within a multiplication unit in a processor, receiving a first operand and a second operand on which carryless multiplication is to be performed; asserting a carryless signal in response to a carryless multiplication instruction; when the carryless signal is asserted, formatting the first operand into a plurality of parts, by which a Booth encoder avoids selecting second partial products of the second operand that would reflect implicit carry operations; summing a plurality of first partial products of the second operand through a plurality of carry save adders that generate sum bits and carry bits, where the carry save adders are arranged in a Wallace tree configuration and the carry bits are disabled when the carryless signal is asserted; shifting the output of the Wallace tree left by at least one bit; and performing an exclusive-OR operation on the outputs of the Wallace tree to produce a carryless multiplication result.

As to industrial applicability, the invention can be implemented within a microprocessor, and the microprocessor may be used in a general-purpose or special-purpose computing device.

In yet another embodiment, likewise using conventional Booth hardware, an apparatus for performing a carryless multiplication includes an opcode detector and a carryless preformat unit. The opcode detector receives a carryless multiplication instruction and, in response, asserts a carryless signal. When the carryless signal is asserted, the carryless preformat unit formats a first operand into a plurality of parts. By means of these parts, a Booth encoder selects a plurality of first partial products of a second operand and avoids selecting second partial products of the second operand that would reflect implicit carry operations. The first partial products are exclusive-ORed to produce a carryless multiplication result.

The invention also provides a method for performing a carryless multiplication.
The method includes: within a multiplication unit in a processor, receiving a carryless instruction together with a first operand and a second operand on which carryless multiplication is to be performed; asserting a carryless signal in response to the carryless instruction; and, when the carryless signal is asserted, formatting the first operand into a plurality of parts, by which a Booth encoder selects a plurality of first partial products of the second operand and avoids selecting second partial products of the second operand that would reflect implicit carry operations. The first partial products are exclusive-ORed to produce a carryless multiplication result.

As to industrial applicability, the invention can be implemented within a microprocessor, and the microprocessor may be used in a general-purpose or special-purpose computing device.

To make the features and advantages of the invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

[Embodiments]
The following description enables one skilled in the art to make and use the invention within the scope and requirements of a particular application. Those skilled in the art may also make minor modifications to the embodiments described, and the general principles can be applied to other embodiments. The invention is therefore not limited to the embodiments below; it is to be accorded the widest scope consistent with the principles and novel features disclosed.

In view of the background above on multiplication and carryless multiplication, and on the techniques processors use to generate products, Figures 1-3 are used first to explain the limitations of present-day apparatus.
Figures 4-7 then explain how the invention overcomes the shortcomings and limitations of present-day multiplication apparatus, and how it uses the hardware architecture of ordinary multiplication to perform carryless multiplication.

Figure 1 shows one possible embodiment of a 64-bit multiplication unit. The 64-bit multiplication unit 100 may be used in a microprocessor or other device. The multiplication unit 100 has a first operand register 101 coupled to a Booth encoder 104, and a second operand register 102 coupled to a partial product generator 103. The Booth encoder 104 and the partial product generator 103 are both coupled to a Booth multiplexer 105, which is coupled to a compressor 106 through a bus PARTPROD. The compressor 106 contains a plurality of carry save adders (CSAs) 108, arranged in a conventional Wallace tree configuration to reduce the propagation delay of summing many partial products. The compressor 106 is coupled through buses CARRIES and SUMS to a full adder 109, which outputs the 128-bit two's-complement product on a bus RESULT. A product synchronizer 107 generates a synchronization signal CLK so that the multiplication unit 100 can produce the final 128-bit product.
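The role of the carry save adders in the compressor can be sketched at word level. This is a simplified illustration under stated assumptions, not the bit-level Wallace tree wiring of the figure: the names `csa` and `compress` and the `carryless` flag are hypothetical, and the flag only models the idea, stated in the abstract, that carry generation can be disabled.

```python
def csa(a, b, c, carryless=False):
    """3:2 carry save adder applied to whole words.

    Returns (sum_word, carry_word). In normal mode,
    a + b + c == sum_word + carry_word. With carryless=True the carry
    word is forced to zero, so only the XOR of the inputs survives.
    """
    sum_word = a ^ b ^ c
    carry_word = 0 if carryless else ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compress(partial_products, carryless=False):
    """Reduce a list of partial products to a (sums, carries) pair.

    Each layer feeds groups of three terms through CSAs and passes any
    leftover terms through unchanged, until two terms remain.
    """
    terms = list(partial_products)
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3  # terms that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(terms[i], terms[i + 1], terms[i + 2], carryless))
        nxt.extend(terms[full:])
        terms = nxt
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


if __name__ == "__main__":
    sums, carries = compress([3, 5, 7, 9])
    print(sums + carries)          # a final adder recovers the ordinary sum
    sums, carries = compress([3, 5, 7, 9], carryless=True)
    print(sums ^ carries)          # XOR of all inputs when carries are disabled
```

In normal mode a final adder combines the two outputs, which corresponds to the full adder on buses SUMS and CARRIES described above.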

To synchronize operations within the multiplication unit 100 so that it produces the final 128-bit product, the synchronization signal CLK is routed to the Booth encoder 104 and the compressor 106.

In operation, an instruction (not shown) is delivered to the multiplication unit 100, directly or indirectly, as two operands. A 64-bit multiplier operand OP A is provided to the first operand register 101, and a 64-bit multiplicand operand OP B is provided to the second operand register 102. Both operands are in two's-complement format. Because 64-bit operands are the most common, registers 101 and 102 are both 64 bits wide, although other multiplication unit architectures may use registers of other widths. For example, as those skilled in the art know, a 64-bit multiplication can be carried out by dividing the two 64-bit operands into four 32-bit operands, which the multiplication unit 100 processes using known techniques and apparatus to obtain the product.

As those skilled in the art also know, multiplication units such as unit 100 commonly use Booth encoding to reduce the number of partial products that must be summed to obtain the final product.
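The operand split mentioned above, in which two 64-bit operands are handled as four 32-bit operands, can be sketched as follows. The text does not say how unit 100 sequences the pieces, so this only shows the arithmetic identity; the function name `mul64_via_32` is hypothetical, and the sketch treats operands as unsigned for simplicity, whereas the text describes two's-complement operands.

```python
MASK32 = (1 << 32) - 1


def mul64_via_32(a: int, b: int) -> int:
    """Unsigned 64 x 64 -> 128-bit product built from four 32 x 32 products.

    a and b are assumed to be in the range 0 .. 2**64 - 1.
    """
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32
    high = (a_hi * b_hi) << 64                      # weight 2**64
    middle = (a_hi * b_lo + a_lo * b_hi) << 32      # weight 2**32
    low = a_lo * b_lo                               # weight 2**0
    return high + middle + low


if __name__ == "__main__":
    a, b = 0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0
    print(mul64_via_32(a, b) == a * b)
```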
Typically, the Booth encoder 104 is a 3-bit Booth encoder. Operating the Booth encoder 104 on successive bit groups produces a series of partial products that correspond to radix-4 multiplication, which reduces the number of partial products; summing them yields the final result. Synchronized by CLK, the Booth encoder 104 evaluates successive 3-bit groups of the multiplier operand OP A and, through a bus PPSEL, controls the Booth multiplexer 105 to select one of five partial products of the multiplicand operand OP B. All five are produced by the partial product generator 103: multiplying OP B by 0 gives partial product 0; by +1 gives partial product B; by -1 gives partial product -B; by +2 gives partial product 2B; and by -2 gives partial product -2B. As those skilled in the art know, the five partial products (-2B, -B, 0, B, 2B) can be obtained simply from OP B itself, OP B shifted left, the two's complement of OP B, and that two's complement shifted left.

Besides directing the Booth encoder 104 to evaluate successive 3-bit groups of OP A, CLK also directs the compressor 106 to store the corresponding partial products until every 3-bit group of OP A has been evaluated. The partial products are distributed to the inputs A, B, and C of the carry save adders 108, which produce carry bits on bus CARRIES and sum bits on bus SUMS. The full adder 109 then adds the carry bits and sum bits and outputs the final 128-bit two's-complement product on bus RESULT.

Figure 2 illustrates how the multiplication unit 100 of Figure 1 uses Booth encoding to reduce the number of partial products. As noted above, many multiplication units in microprocessors or other devices generate partial products from groups of multiplier bits. The technique essentially recodes a radix-2 multiplier into a higher radix. With 3-bit Booth encoding, the radix-2 multiplier is recoded into a radix-4 multiplier, which cuts the number of partial products roughly in half.

Booth encoding is disclosed in U.S. Patent No. 5,691,930 to Kim and is therefore not described in further detail here. Figure 2 shows the mapping between the value of each 3-bit group of the multiplier (OP A) and a multiplication factor; multiplying the multiplicand (OP B) by that factor gives the corresponding partial product. Groups 000 and 111 map to factor 0. Groups 001 and 010 map to factor +1. Groups 101 and 110 map to factor -1. Group 011 maps to factor +2, and group 100 maps to factor -2. The partial product generator 103 multiplies OP B by each factor to produce the partial products and supplies them to the multiplexer 105. The Booth encoder 104 evaluates each 3-bit group of OP A and, through bus PPSEL, selects the corresponding partial product.

Figure 3 illustrates how 3-bit Booth encoding reduces the number of partial products in a multiplication. It shows a 4-bit multiplicand operand 301 multiplied by a 4-bit multiplier operand 302, which would be supplied to the Booth encoder described above. As those skilled in the art know, 3-bit Booth encoding requires a bit 303 of value 0 to be appended below the least significant bit (LSB) of the multiplier operand 302. The value of the first 3-bit group 304 gives a multiplication factor from table 200 of Figure 2, and that factor selects a 4-bit partial product. In Figure 3, group 304 has the value 110, which maps to factor -1; the multiplicand operand 301 is therefore two's-complemented and sign-extended, giving the extended partial product 307 with value 11111001. The next 3-bit group 305 overlaps group 304 by one bit. According to table 200, group 305 maps to factor +1, so the multiplicand operand 301 itself serves as partial product 308. Because the encoding is radix-4, partial product 308 is shifted left by 2 bits, so that its LSB aligns with bit 2 of partial product 307 (taking the LSB of partial product 307 as bit 0). The last 3-bit group 306 maps to factor 0, so partial product 309 is 0000; it too is shifted left by 2 bits, so that its LSB aligns with bit 2 of partial product 308. Summing partial products 307-309 gives the 8-bit product 310, with value 00010101.

The present invention relates to Booth encoding, which is highly efficient for ordinary multiplication but cannot, as it stands, perform carryless multiplication. A 3-bit group may map to factor +2 or -2, so carries occur once the partial products are summed. To perform carryless multiplication in a processor or other device, conventional Booth encoding therefore cannot be used. Carries also arise in the compressor that computes the sum, so the compressor cannot be used either. As a result, carryless multiplication has required a completely separate carryless multiplication unit, or at least separate carryless hardware within a multiplication unit. As those skilled in the art know, adding new hardware increases power consumption, lowers reliability, and complicates test and debug.

Those skilled in the art also know that the preferable approach is to make effective use of the multiplication hardware already present in the processor or other device. However, given the nature of Booth encoding and compression hardware, they cannot perform carryless multiplication as they stand.

The present invention provides an apparatus and method for carryless multiplication in a processor or other device that uses the existing Booth encoding and compression elements with only slight modification. The carryless multiplication disclosed requires minimal changes to the existing multiplication unit and does not affect its speed. The invention is described below with reference to Figures 4-7.

As summarized above, the value of each multiplier bit group determines a multiplication factor. Because that factor can be +2 or -2, carries can arise in the multiplication. The carry save adders (CSAs) of the Wallace tree also generate carries. The invention therefore provides a carryless multiplication technique that splits a single operation into two, so that no carries occur when the partial products are summed. The invention also provides a modified compressor in which carry generation can be selectively enabled or disabled.

Figure 4 is a Booth encoding table according to the invention that produces no carries. Figure 4 is similar to Figure 2, the difference being that only the group values left unstruck, 000 and 010, remain in use, mapping to factors 0 and +1. With a preformatted multiplier operand, the values struck out in Figure 4 (001 and 011 through 111) never occur. The invention formats the multiplier and then applies the Booth encoding hardware to the formatted result. Because the factors that would produce carries are avoided, a carryless multiplication can be performed.

Figure 5 shows how the invention formats an operand so that Booth encoding can then be used to perform carryless multiplication. It shows three representations, 501, 511, and 521. Representation 501 has an 8-bit operand 502 and a bit 503. Bit 503 has the value 0 and is placed below the least significant bit (LSB) of operand 502; by convention, the LSB of operand 502 is called bit 0. Setting the odd-numbered bits of operand 502 (bit 1, bit 3, bit 5, and bit 7) to 0 gives the even part 512 of representation 511. For Booth encoding of the even part 512, a bit 513 of value 0 is placed below its LSB. Shifting the odd-numbered bits of operand 502 (bit 1, bit 3, bit 5, and bit 7) right by one position (to bit 0, bit 2, bit 4, and bit 6) and filling the odd-numbered positions with 0 gives the odd part 522 of representation 521. For Booth encoding of the odd part 522, a bit 523 of value 0 is placed below its LSB.

Together, the even part 512 and the odd part 522 fully represent the original operand 502 and can take its place in the multiplication. Combining the product obtained from the even part 512 with the product obtained from the odd part 522 yields the final product. In this embodiment, the even part 512 and the odd part 522 are generated from operand 502, and Booth encoding is then applied to each part to obtain a product. Relative to Booth multiplication of a conventionally formatted operand, this requires two passes through the steps, one per part. By preformatting operand 502 into an even part 512 and an odd part 522, however, the invention can use Booth encoding without producing carries, because in representations 511 and 521 every 3-bit group 514-518 and 524-528 has the value 000 or 010, so the corresponding factor is always 0 or +1. Because the invention performs carryless multiplication through a conventional Booth encoding structure, it does not add complexity to the multiplication apparatus already present in a microprocessor or other device. In the original representation 501, evaluating groups 504-508 would cause carries in the multiplication, because group 505 maps to factor +2. In this embodiment, the groups 514-518 and 524-528 produced by preformatting cause no carries.

Figure 6 shows a carryless multiplication unit according to the invention. The multiplication unit 600 is similar to the multiplication unit 100 of Figure 1. It has a first operand register 601 coupled to a carryless preformat unit 612, which is coupled to a Booth encoder 604. It also has a second operand register 602 coupled to a partial product generator 603. The Booth encoder 604 and the partial product generator 603 are both coupled to a Booth multiplexer 605, which is coupled to a compressor 606 through a bus PARTPROD. The compressor 606 supports carryless compression: it has a carryless enable input and includes a plurality of carry save adders (CSAs) 608 arranged in a Wallace tree configuration. The compressor 606 is coupled to a left shifter 609 through a bus SUMS and to a full adder 610 through a bus CARRIES. The left shifter 609 is coupled to the full adder 610. In one embodiment, the full adder 610 outputs a 128-bit product on a bus RESULT, through which it is coupled to a register 613. A product synchronizer 607 generates a synchronization signal CLK. So that the multiplication unit 600 produces the final 128-bit product, the carryless preformat unit 612, the compressor 606, the left shifter 609, and the full adder 610 receive CLK to synchronize their operation. The multiplication unit 600 also has an opcode detector 611, which generates a carryless signal CARRYLESS. The carryless preformat unit 612, the compressor 606, and the left shifter 609 receive CARRYLESS.

For either a conventional multiplication or a carryless multiplication, an instruction (not shown) is delivered to the multiplication unit 600, directly or indirectly, as two operands. In one embodiment, a multiplier operand OP A is provided to the first operand register 601 and a multiplicand operand OP B is provided to the second operand register 602. Both operands are in two's-complement format. In this embodiment both operands are 64 bits wide, but this does not limit the invention; in other embodiments OP A and OP B have other widths. In another embodiment, a 64-bit multiplication divides the two 64-bit operands into four 32-bit operands, which are multiplied through the multiplication unit 600.

To produce a final product, the multiplication unit 600 uses the same Booth encoding as the multiplication unit 100 of Figure 1 to reduce the number of partial products to be summed. This embodiment uses a 3-bit Booth encoder 604. As the Booth encoder 604 operates, it causes the Booth multiplexer 605 to output successive partial products that correspond to radix-4 multiplication, which reduces the number of partial products to be summed; summing them yields the final result. With other Booth encodings, carryless multiplication would require corresponding changes to the carryless preformatting, the post-formatting, and the partial products so that carries are still avoided. Synchronized by CLK, the Booth encoder 604 evaluates the successive 3-bit groups it receives and controls the Booth multiplexer 605 through a bus PPSEL. The signals on PPSEL cause the Booth multiplexer 605 to select one of five partial products (-2B, -B, 0, B, 2B) of the multiplicand operand OP B. All five are produced by the partial product generator 603: multiplying OP B by 0 gives partial product 0; by 1 gives partial product B; by -1 gives partial product -B; by 2 gives partial product 2B; and by -2 gives partial product -2B.
However, the multiply operation by the compressed hardware to achieve no carry is not possible by the Booth code and the present invention provides a device for achieving a carry-free multiplication for a processor or other coding element and compression element, To make this micro=invention, the original multiplication operation using the original Booth road is necessary to modify the original multiplication _t, and does not affect the original multiplication unit to be the most with the fourth figure. The invention is illustrated. The law is early. In summary, the multiplication coefficient corresponding to the value of the bit data segment of the multiplier is as follows. Since this multiplication factor may be: In the normal operation, a carry may occur. In addition, carry-save adders (CSAs) that originally had a Wallace configuration also generate carry. Therefore, it is provided that the technique of no-touch multiplication is used to divide the single-sense into two; in order to avoid the addition of a partial product, a carry occurs. The present invention = CNTR2522I00-TW/0608-A42859-TW/Final 15 ^ 201224916 Improved compressor, which can selectively enable or disable the 4th ® is a Boots-like drawing similar to the present invention 2, the difference is that the fourth picture is the second two, ^ 4 and ten correspond to the value _ and _ respectively. By taking the multiplier operation element of m (four), it can avoid the occurrence of the figure as shown in Fig. 4_line = 001, 011~111. The present invention formats the multiplier and, after erasing the result, uses a Booth encoding device. Since the rounding format factor is avoided, a carry-less multiplication operation can be performed. Figure 5 shows how the present invention formats the operands and re-executes the multiply-free multiplication operation. Figure 5 shows three representations of 5 s, 511 and 521. The expression 501 has 8-bit arithmetic elements 5〇2^ and 5〇1, 5〇3. 
Bit 503 has a value of 0 and is arranged after the most significant bit (LSB) of operand 5〇2. In general, the final end of operand 502 is referred to as bit 0 (bit 0). If the values of the odd-numbered bit positions 7° bit 3, bit 5, and bit 7) of the operand 502 are modified to 〇, the modified result Μ# shows the even-numbered portion 512 of the expression 511. In order to calculate the even portion 512 and the Tables code, the bit 513 can be arranged after the investment-skill-effective bit of the operand 512, where the value of the bit 513 is 〇. The value of the odd-numbered bits of the operand (ie, bit 3, bit 5, and bit 7) is shifted to the right (gp# is bit 2, bit 4, and bit 6), and the odd-numbered operands of the operand 5〇2 are only The axe & 〇 enters the odd part 522 of the expression 521. In order to perform the Bussian coding calculation on the odd-numbered portion 522, the bit 523 is arranged after the last significant bit of the odd-numbered portion 522, where the value of the bit 523 is 〇. The even-numbered portion 512 and the odd-numbered portion 522 of the knives completely represent the original _ _ 5 〇 2, and can be multiplied instead of the arithmetic unit 5 〇 2 . The even part CNTR2522I00-TW/0608-A42859-TW/Final 16 201224916 is multiplied by the odd part 522 and then added to the odd part 522 to produce the final part. According to the operation element 5〇2, the even part M2 is generated. 77 522 'Re-use the Booth code to check the even part 512 and the odd dr' to obtain a multiplication result. For the general formatted operation = H material (four) method into the hiding operation, you need to repeat twice before all the steps. 
However, the present invention can use the flat code technique by pre-casting the operand 502 into an even-numbered portion 512 and an odd-numbered portion 522, since no carry is generated because in the expression 5ΐι and in the middle of all 3 bits The value of Yuanzi oblique ρ T s 1 兀 料 514 514 514 518 518 518 518 518 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 514 2, so 'does not increase the complexity of the multiplication of the microprocessor in the microprocessor or other devices. In the original operation element 5〇1, when 2 = the value of the segment 5〇4~508, the multiplication operation will be carried out because the f:multiplication coefficient of the data segment 505 is +2. However, in the case of the Besch example, the values of the data segments ~518 and 524~528 generated by the pre-formatting do not cause a carry. The 6th ® is the no-shout method unit of the present invention. The green unit _ ” 1st _ multiplication unit _ is similar. The multiplication unit _ has a first register 601. The first operation temporary storage please transfer - no carry pre-code no carry pre-format unit 612 · connect - Booth code W multiplication Early το 6〇〇 has a second operand register 6〇2. The first-transport 7C register is a partial product generator, and the 4 and partial product generators 603 are coupled to a Busto. The second is a CNTR2522I00-TW/〇6〇8-A42859-TW/Final \η 201224916. The Buss multiplexer 605 is coupled to a compressor 606 via a bus bar PARTPROD. The compressor 606 has a plurality of carry-free compression coefficients. Compressor 606 has a carry-in enable input signal and includes complex carry-storage adders (CSAs) 608. Carry-save adder 608 is arranged in a Wallace tree architecture. 
Compressor 606 is coupled through a bus, SUMS, The left shifter 6〇9, and the full adder 610 is coupled to the left adder 610 via a bus bar CARRIES 'consumer full adder 61 。 left shift 609. In a possible embodiment, the 'full adder 61 〇 passes through a bus RESULT , the output has a multiplication result of 128 bits. The full adder 610 passes through the bus bar RESU LT is coupled to a register 613. In addition, a product synchronizer 607 generates a synchronization signal CLK. In order for the multiplication unit 600 to generate a final 128-bit product, the non-carry preformat unit 612, the compressor 606, and the left shifter The 609 and the full adder 610 receive the synchronization number CLK' for synchronous operation. In addition, the multiplication unit 6 of the present invention has an opcode detector 611. The operation detector 611 generates a null. The carry signal CARRYLESS. The carry-free pre-format unit 612, the compressor 606, and the left shifter 609 receive the carry-free signal CARRYLESS. In a conventional multiplication operation or a carry-less multiplication operation, an instruction (not shown) is directly or Interleaved or divided into two operands, passed to multiplication unit 600. In a possible embodiment, a multiplier operand 〇p A is provided to first operand register 601, and a multiplicand operation Yuanxiao pB will be supplied to the first one by the first time. The multiplier operation unit 〇pa and the multiplicand transport element OP B are both two's complement. In this embodiment, the multiplier Operator 0PA and multiplicand operator 0PB has 64 bits, but is not intended to limit the present invention. In other embodiments, the multiplier operation unit 〇p A and CNTR2522I00-TW/0608-A42859-TW/Final lg s 201224916 multiplicand operand Op I In the other embodiment _64Λ multiplication, two 64-bit operands can be divided into four tiling elements and multiplied by multiplication unit 600. Operation. 
In order to generate - the final product 'multiplication unit continues to use the first! The graph is multiplied to the Booth code of the early element 100 to reduce the total number of partial products. In the A embodiment, the 3-bit Buss encoder _ is used. When the cloth encoder 6G4 is operated, the Bussor 6 () 5 can continuously output the relative: scoop P-knife product, wherein the partial product is the multiplication knot of the base _4, so j can reduce the partial product plus The total number. A final result can be produced by summing the partial multiplication coefficients. For the other different H, the encoding is carried out in the carry-free multiplication, the non-carry pre-formatting, post-formatting, and the product multiplication must be the same amount, with ^, can be carried. Therefore, by the synchronizing signal clk, the Booth code encodes a continuous 3-bit data segment received by itself, and controls the Buss multiplexer 6〇5 through the row PPSEL'. (4) on the busbar pulsator (4) causes Buss multiplexer 1() 5 to select one of the five partial products ((_2B m, 2B), where the above five partial products are related to the multiplicand operator 0PB. These partial products are generated by partial product generation. The partial product generator 603 multiplies the multiplicative operation element 〇PB by 〇, so that a partial product 〇 can be generated; the partial product generator 6〇3 will be the multiplicand operation element. The 〇PB is multiplied by the suffix to generate a partial product B. The partial product generator 〇3 multiplies the multiplicand operator 0PB by _丨, so that the partial product _b can be generated. The partial product generator 603 will be the multiplicand operator The partial product 2B can be generated by multiplying 0PB by 2. The partial product generator 603 will be multiplied by 运算ρΒ· at -2, so that a partial product -2 可 can be generated. 
For those skilled in the art, CNTR2522I00-TW/ 0608-A42859-TW/Final 19 201224916
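The recoding of Fig. 2, the worked example of Fig. 3 and the operand formatting of Fig. 5 can be illustrated with a short Python sketch. This is a minimal software model under stated assumptions, not the circuit itself: the function names (`booth_segments`, `booth_multiply`, `preformat`) are hypothetical, and the operand values 7 and 3 are inferred from the partial product 11111001 and the result 00010101 quoted for Fig. 3 rather than read from the figure.

```python
# Radix-4 Booth table as described for Fig. 2: 3-bit segment -> coefficient.
BOOTH_TABLE = {
    0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_segments(value, width):
    """Overlapping 3-bit segments of `value`, least significant first.

    A 0 bit is placed below the least significant bit and consecutive
    segments overlap by one bit, so `width` bits (width even) give
    width // 2 segments.
    """
    bits = (value & ((1 << width) - 1)) << 1
    return [(bits >> i) & 0b111 for i in range(0, width, 2)]


def booth_multiply(multiplicand, multiplier, width):
    """Ordinary (carrying) product; the multiplier is read as a
    two's-complement value of `width` bits."""
    total = 0
    for k, segment in enumerate(booth_segments(multiplier, width)):
        # Each successive partial product is shifted left by 2 more bits.
        total += (BOOTH_TABLE[segment] * multiplicand) << (2 * k)
    return total


def preformat(multiplier, width):
    """Split a multiplier into an even part and a right-shifted odd part."""
    even_mask = int("01" * (width // 2), 2)   # keeps bits 0, 2, 4, ...
    even_part = multiplier & even_mask        # odd-numbered bits set to 0
    odd_part = (multiplier >> 1) & even_mask  # odd bits moved down one place
    return even_part, odd_part


# Fig. 3 style example (operand values 7 and 3 are assumed, see above).
print([bin(s) for s in booth_segments(3, 6)])  # segments 110, 001, 000
print(booth_multiply(7, 3, 6))                 # 21, i.e. 00010101

# After formatting, only the two segment values of Fig. 4 can occur.
even, odd = preformat(0b10110110, 8)
print([bin(s) for s in booth_segments(even, 8)])
print([bin(s) for s in booth_segments(odd, 8)])
```

The last two lines print only the segment values `0b0` and `0b10`, so the table lookup can return only 0 or +1 for a formatted operand, which is the property the preformatting relies on.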

As those skilled in the art will appreciate, the partial product generator 603 can obtain these five partial products by taking the two's complement of the multiplicand operand OP B, by shifting OP B to the left, or by first taking the two's complement of OP B and then shifting the complemented result to the left.

If the opcode detector 611 detects a normal multiplication instruction, the signal CARRYLESS is not asserted, and the carryless preformat unit 612 simply passes the multiplier operand OP A received by the first operand register 601 on to the Booth encoder 604. If the opcode detector 611 detects a carryless multiplication instruction, the signal CARRYLESS is asserted, and the carryless preformat unit 612 formats the multiplier into an even part and an odd part (such as the even part 512 and the odd part 522 of Fig. 5), which are then examined by the Booth encoder one after the other.

The synchronization signal CLK causes the Booth encoder 604 to examine the successive 3-bit data segments it receives and causes the compressor 606 to store the corresponding partial products until all data segments have been examined. If the signal CARRYLESS is not asserted (that is, a normal multiplication instruction has been detected), the partial products are routed to the inputs A, B and C of the carry save adders 608, which generate carry bits on the bus CARRIES and sum bits on the bus SUMS. The full adder 610 adds the bits on the bus SUMS to the carry bits on the bus CARRIES to produce the 128-bit final result, in two's complement format, on the bus RESULT. For example, if the signal CARRYLESS is not asserted, the left shifter 609 passes the value on the bus SUMS directly to the full adder 610. If the signal CARRYLESS is asserted (that is, a carryless multiplication instruction has been detected), all carry bits output by the carry save adders 608 are disabled (that is, set to 0), and only the sum bits output by the carry save adders 608 are valid.
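The effect of disabling carry generation in a carry save adder can be seen in a small Python model. It is a sketch only: `carry_save_add` and its `carry_enable` parameter are hypothetical names standing in for a CSA 608 and the CARRYLESS control, and whole integers are used in place of individual bit slices.

```python
def carry_save_add(a, b, c, carry_enable=True):
    """Word-wide 3:2 carry save adder on non-negative integers.

    Returns (sum_bits, carry_bits). With carry_enable=False the carry
    output is forced to 0, modelling disabled carry generation.
    """
    sum_bits = a ^ b ^ c  # per-bit sum without carry propagation
    # Per-bit majority function, weighted one position higher.
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1 if carry_enable else 0
    return sum_bits, carry_bits


# Normal mode: sum and carry words together still represent a + b + c.
s, c = carry_save_add(13, 7, 9)
print(s + c)  # 29

# Carryless mode: only the sum word is meaningful, and it is a pure XOR.
s, c = carry_save_add(13, 7, 9, carry_enable=False)
print(s == 13 ^ 7 ^ 9, c)
```

With the carry output forced to 0, a tree of such adders reduces any number of partial products to their bitwise exclusive-OR, which is the sum required for a carryless product.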

In one embodiment, the partial products for the even part of the multiplier operand OP A are routed to the inputs A, B and C of the carry save adders 608 to produce the sum bits of the even part on the bus SUMS. These sum bits are stored in the register 613. Next, the partial products for the odd part of the multiplier operand OP A are routed to the inputs A, B and C of the carry save adders 608 to produce the sum bits of the odd part on the bus SUMS. These sum bits are shifted left by one bit by the left shifter 609. In both cases, the value the full adder 610 receives on the bus CARRIES is 0. After the sum bits of the odd part have been produced on the bus SUMS, an exclusive-OR (XOR) operation on the data stored in the register 613 (even part) and the data on the bus RESULT (odd part) gives the final carryless result. In one embodiment, an XOR gate performs the exclusive-OR operation.

The multiplication unit 600 of the invention performs both ordinary multiplication and carryless multiplication. The multiplication unit 600 comprises logic, circuits, devices or microcode (that is, microinstructions or native instructions), or a combination of logic, circuits, devices and microcode, or equivalent elements that perform the operations described above. In one embodiment, the elements that perform these operations may be shared with other circuits, microcode and so on that perform other functions of the processor or other device. In this description, microcode is a term referring to a plurality of microinstructions. A microinstruction (also called a native instruction) is an instruction at the level that a unit executes. For example, microinstructions are executed directly by a reduced instruction set computer (RISC). In a complex instruction set computer (CISC), such as an x86-compatible microprocessor, x86 instructions are translated into microinstructions, and the translated microinstructions are executed by at least one unit within the CISC microprocessor.

Those skilled in the art will appreciate that combining the opcode detector 611, the carryless preformat unit 612 and the left shifter 609 with the compressor 606 and the full adder 610 is only a minor modification and does not make the processor hardware significantly more complex. The benefits of the invention (such as low power consumption, high reliability and short debug and test time) far outweigh any effect on performance.

Fig. 7 shows the carryless multiplication method of the invention. In step 701, a processor or other device executes ordinary multiplication instructions or carryless multiplication instructions. The flow then proceeds to step 702.

In step 702, the next multiplication instruction is fetched and executed, and the result is provided to a multiplication unit. The flow then proceeds to step 703.

In step 703, it is determined whether the multiplication unit has received a carryless multiplication instruction. If it has not, step 705 is performed. If it has, step 704 is performed.

In step 705, the multiplication unit performs an ordinary multiplication, using Booth encoding and compression to reduce the number of partial products and produce a final result. Step 713 is then performed.

In step 704, the even bits of the multiplier are examined, and partial products of the multiplicand are obtained according to the result. In detail, the odd-numbered bits of a multiplier operand are set to 0, and the successive 3-bit data segments of the modified operand are evaluated by Booth encoding to obtain a plurality of partial products whose sum does not generate a carry. Because the odd-numbered bits of the multiplier operand are all 0, every carry that Booth encoding could otherwise produce is excluded. Step 706 is then performed.

In step 706, the carry bits of the Wallace tree in the compressor are set to 0, that is, the carry bits in the Wallace tree are disabled. Step 707 is then performed.

In step 707, the partial products of the even part are summed to obtain a first carryless sum SUM1. In detail, all of the partial products are combined by an exclusive-OR (XOR) operation to produce the first carryless sum SUM1. Step 708 is then performed.

In step 708, the multiplier operand is shifted right by 1 bit. Step 709 is then performed.

In step 709, the even bits of the shifted multiplier are examined, and partial products of the multiplicand are obtained according to the result. In detail, the odd-numbered bits of the right-shifted multiplier operand are set to 0 to produce an odd part of the multiplier operand. The successive 3-bit data segments of the odd part are evaluated by Booth encoding to select a plurality of partial products, from which a carryless multiplication result can be obtained. Step 710 is then performed.

In step 710, the partial products are shifted and summed to obtain a second carryless sum SUM2. In detail, the partial products obtained from the odd part are combined by an exclusive-OR operation to obtain the second carryless sum SUM2. Step 711 is then performed.

In step 711, the second carryless sum SUM2 is shifted left by 1 bit. Step 712 is then performed.

In step 712, the left-shifted second carryless sum SUM2 is combined by exclusive-OR with the first carryless sum SUM1 to produce the final carryless multiplication result. Step 713 is then performed.

In step 713, the flow ends.

Although the objects, functions and advantages of the invention have been described in detail above, the invention also encompasses other embodiments. For example, 64-bit carryless multiplication is described in detail above because 64 bits is a common operand size in present-day processors and other devices. The invention is equally applicable to processors or devices with other numbers of bits and is not limited to 64 bits.

In addition, many multiplication units use a multi-pass arrangement. For example, 64-bit operands are divided into four 32-bit operands, the multiplication unit produces several products from these four operands, and the products are summed to produce a final result. One object of the invention is to use the Booth encoding and partial product generation hardware that is used for ordinary multiplication.

Finally, although the description above uses radix-4 Booth encoding, this does not limit the invention. To make use of existing Booth encoding hardware, other embodiments may use a radix greater than 4. To use Booth encoding without generating carries, certain bits of an input operand are selected and the values of the unselected bits are set to 0, so that the input operand is formatted into a plurality of parts.

Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit the invention. A person of ordinary skill in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram of a 64-bit multiplication unit in a microprocessor or similar device.
Fig. 2 is a table illustrating how the multiplication unit of Fig. 1 uses Booth encoding to reduce the number of partial products.
Fig. 3 illustrates how Booth encoding reduces the number of partial products in a 4-bit multiplication.
Fig. 4 shows the Booth encoding coefficients of the invention that allow a carryless multiplication.
Fig. 5 shows how the invention formats an operand.
Fig. 6 is a block diagram of a carryless multiplication unit of the invention.
Fig. 7 is a flow chart of the carryless multiplication of the invention.

DESCRIPTION OF MAIN REFERENCE NUMERALS

100, 600: multiplication unit
101, 601: first operand register
102, 602: second operand register
103, 603: partial product generator
104, 604: Booth encoder
105, 605: Booth multiplexer
106, 606: compressor
107, 607: product synchronizer
108, 608: carry save adder
109, 610: full adder
301, 302, 502: operand
303, 503, 513, 523: bit
304 to 306, 504 to 508, 514 to 518, 524 to 528: data segment
307 to 309: partial product
310: multiplication result
512: even part
522: odd part
609: left shifter
611: opcode detector
612: carryless preformat unit
613: register
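The carryless branch of the flow of Fig. 7 (steps 704 through 712) can be modelled in a few lines of Python and checked against a plain shift-and-XOR carryless product. This is an illustrative sketch rather than the patented hardware: `clmul_booth`, `clmul_reference` and `FIG4_TABLE` are hypothetical names, the operand width is assumed to be even, and integers stand in for the buses and registers.

```python
# Segment values that can occur after formatting, per the Fig. 4 table.
FIG4_TABLE = {0b000: 0, 0b010: 1}


def clmul_reference(a, b, width):
    """Carryless product by shift-and-XOR, used here only as a check."""
    result = 0
    for i in range(width):
        if (b >> i) & 1:
            result ^= a << i
    return result


def clmul_booth(multiplicand, multiplier, width):
    """Carryless product following steps 704 to 712 of Fig. 7."""
    even_mask = int("01" * (width // 2), 2)  # keeps bits 0, 2, 4, ...

    def xor_partial_products(part):
        bits = part << 1  # 0 placed below the least significant bit
        acc = 0
        for k in range(width // 2):
            segment = (bits >> (2 * k)) & 0b111
            if FIG4_TABLE[segment]:              # coefficient is 0 or +1
                acc ^= multiplicand << (2 * k)   # carries disabled: XOR
        return acc

    sum1 = xor_partial_products(multiplier & even_mask)         # steps 704-707
    sum2 = xor_partial_products((multiplier >> 1) & even_mask)  # steps 708-710
    return sum1 ^ (sum2 << 1)                                   # steps 711-712


print(bin(clmul_booth(0b11, 0b11, 8)))  # 0b101, whereas 3 * 3 = 0b1001
```

Looking up every segment in `FIG4_TABLE` doubles as a check on the formatting: a segment value other than 000 or 010 would raise a `KeyError`, which never happens for an even-width operand split in this way.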

Claims (1)

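VII. Claims:

1. A carryless multiplication apparatus for performing a carryless multiplication, comprising:
a carryless preformat unit, configured to receive a multiplier operand and to format the multiplier operand into a plurality of parts;
a Booth encoder, configured to receive and evaluate the parts and to select a plurality of first partial products of a multiplicand operand, wherein the parts prevent selection of a plurality of second partial products of the multiplicand operand, the second partial products being those that would cause a carry;
a compressor, coupled to the Booth encoder and configured to sum the first partial products through a plurality of carry save adders, the carry save adders generating a plurality of sum bits and a plurality of carry bits, wherein the carry save adders are arranged in a Wallace tree and the carry bits are disabled when the carryless multiplication is performed;
a left shifter, coupled to the compressor and configured to shift an output of the compressor left by at least one bit; and
an exclusive-OR gate, coupled to the compressor and the left shifter and configured to perform an exclusive-OR operation and produce a carryless multiplication result.

2. The carryless multiplication apparatus of claim 1, wherein the multiplier operand is in two's complement format.

3. The carryless multiplication apparatus of claim 1, wherein the parts comprise:
an even part, having the values of the even bits of the multiplier operand and the values of the odd bits of the multiplier operand, wherein the values of the odd bits of the multiplier operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the multiplier operand and the values of the even bits of the multiplier operand, wherein the values of the even bits of the multiplier operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the multiplier operand include the least significant bit of the multiplier operand.

4. The carryless multiplication apparatus of claim 1, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

5. The carryless multiplication apparatus of claim 1, wherein the first partial products comprise a plurality of first multiples of the multiplicand operand, the first multiples comprising:
the multiplicand operand multiplied by 0; and
the multiplicand operand multiplied by 1.

6. The carryless multiplication apparatus of claim 5, wherein the second partial products comprise a plurality of second multiples of the multiplicand operand, the second multiples comprising:
the multiplicand operand multiplied by 2; and
the multiplicand operand multiplied by -2.

7. The carryless multiplication apparatus of claim 1, wherein the carryless multiplication apparatus is disposed in a multiplication unit of a processor or a multiplication unit of a device.

8. The carryless multiplication apparatus of claim 7, wherein the multiplication unit is configured to perform a carryless multiplication and a normal multiplication.

9. A method for performing a carryless multiplication, comprising:
in a multiplication unit within a processor, formatting a multiplier operand into a plurality of parts;
evaluating the parts with a Booth encoder and selecting a plurality of first partial products of a multiplicand operand, wherein the parts prevent selection of a plurality of second partial products of the multiplicand operand, the second partial products being those that would cause a carry;
processing the first partial products through a plurality of carry save adders to generate a plurality of sum bits and a plurality of carry bits, wherein the carry save adders are arranged in a Wallace tree and the carry bits are disabled when the carryless multiplication is performed;
shifting an output of the Wallace tree left by at least one bit; and
performing an exclusive-OR operation on the output of the Wallace tree to produce a carryless multiplication result.

10. The method of claim 9, wherein the multiplier operand is in two's complement format.

11. The method of claim 9, wherein the parts comprise:
an even part, having the values of the even bits of the multiplier operand and the values of the odd bits of the multiplier operand, wherein the values of the odd bits of the multiplier operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the multiplier operand and the values of the even bits of the multiplier operand, wherein the values of the even bits of the multiplier operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the multiplier operand include the least significant bit of the multiplier operand.

12. The method of claim 9, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

13. The method of claim 9, wherein the first partial products comprise a plurality of first multiples of the multiplicand operand, the first multiples comprising:
the multiplicand operand multiplied by 0; and
the multiplicand operand multiplied by 1.

14. The method of claim 13, wherein the second partial products comprise a plurality of second multiples of the multiplicand operand, the second multiples comprising:
the multiplicand operand multiplied by 2; and
the multiplicand operand multiplied by -2.

15. The method of claim 9, wherein each of the operands is a 64-bit operand.

16. A carryless multiplication apparatus for performing a carryless multiplication, comprising:
a first operand register, configured to receive a first operand for the carryless multiplication;
a second operand register, configured to receive a second operand for the carryless multiplication;
an opcode detector, configured to receive a carryless multiplication instruction and to assert a carryless signal according to the carryless multiplication instruction;
a carryless preformat unit, configured to format the first operand into a plurality of parts when the carryless signal is asserted, wherein, by means of the parts, a Booth encoder avoids selecting a plurality of second partial products of the second operand, the second partial products being those that would cause a carry;
a compressor, configured to sum a plurality of first partial products of the second operand through a plurality of carry save adders, the carry save adders generating a plurality of sum bits and a plurality of carry bits, wherein the carry save adders are arranged in a Wallace tree and the carry bits are disabled when the carryless signal is asserted;
a left shifter, coupled to the compressor and configured to shift an output of the compressor left by at least one bit; and
an exclusive-OR gate, coupled to the compressor and the left shifter and configured to perform an exclusive-OR operation and produce a carryless multiplication result.

17. The carryless multiplication apparatus of claim 16, wherein the first and second operands are in two's complement format.

18. The carryless multiplication apparatus of claim 16, wherein the parts comprise:
an even part, having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the first operand include the least significant bit of the first operand.

19. The carryless multiplication apparatus of claim 16, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

20. The carryless multiplication apparatus of claim 16, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising:
the second operand multiplied by 0; and
the second operand multiplied by 1.

21. The carryless multiplication apparatus of claim 20, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising:
the second operand multiplied by 2; and
the second operand multiplied by -2.

22. The carryless multiplication apparatus of claim 16, wherein the carryless multiplication apparatus is disposed in a multiplication unit of a processor or a multiplication unit of a device.

23. The carryless multiplication apparatus of claim 22, wherein the multiplication unit is configured to perform a carryless multiplication and a normal multiplication.

24. A method for performing a carryless multiplication, comprising:
in a multiplication unit within a processor, receiving a first operand and a second operand for the carryless multiplication;
asserting a carryless signal according to a carryless multiplication instruction;
when the carryless signal is asserted, formatting the first operand into a plurality of parts, wherein, by means of the parts, a Booth encoder avoids selecting a plurality of second partial products of the second operand, the second partial products being those that would cause a carry;
summing a plurality of first partial products of the second operand through a plurality of carry save adders, the carry save adders generating a plurality of sum bits and a plurality of carry bits, wherein the carry save adders are arranged in a Wallace tree and the carry bits are disabled when the carryless signal is asserted;
shifting an output of the Wallace tree left by at least one bit; and
performing an exclusive-OR operation on the output of the Wallace tree to produce a carryless multiplication result.

25. The method of claim 24, wherein the first and second operands are in two's complement format.

26. The method of claim 24, wherein the parts comprise:
an even part, having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the first operand include the least significant bit of the first operand.

27. The method of claim 24, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

28. The method of claim 24, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising:
the second operand multiplied by 0; and
the second operand multiplied by 1.

29. The method of claim 28, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising:
the second operand multiplied by 2; and
the second operand multiplied by -2.

30. The method of claim 24, wherein each of the operands is a 64-bit operand.

31. A carryless multiplication apparatus that formats a first operand in order to perform a carryless multiplication, comprising:
an opcode detector, configured to receive a carryless multiplication instruction and to assert a carryless signal according to the carryless multiplication instruction; and
a carryless preformat unit, configured to format the first operand into a plurality of parts when the carryless signal is asserted, wherein, by means of the parts, a Booth encoder selects a plurality of first partial products of a second operand and avoids selecting a plurality of second partial products of the second operand, the second partial products being those that would cause a carry;
wherein an exclusive-OR operation is performed on the first partial products to produce a carryless multiplication result.

32. The carryless multiplication apparatus of claim 31, wherein the first and second operands are in two's complement format.

33. The carryless multiplication apparatus of claim 31, wherein the parts comprise:
an even part, having the values of the even bits of the second operand and the values of the odd bits of the second operand, wherein the values of the odd bits of the second operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the second operand and the values of the even bits of the second operand, wherein the values of the even bits of the second operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the second operand include the least significant bit of the second operand.

34. The carryless multiplication apparatus of claim 31, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

35. The carryless multiplication apparatus of claim 31, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising:
the second operand multiplied by 0; and
the second operand multiplied by 1.

36. The carryless multiplication apparatus of claim 35, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising:
the second operand multiplied by 2; and
the second operand multiplied by -2.

37. The carryless multiplication apparatus of claim 31, wherein the carryless multiplication apparatus is disposed in a multiplication unit of a processor or a multiplication unit of a device.

38. The carryless multiplication apparatus of claim 37, wherein the multiplication unit is configured to perform a carryless multiplication and a normal multiplication.

39. A method for performing a carryless multiplication, comprising:
in a multiplication unit within a processor, receiving a carryless instruction together with a first operand and a second operand for the carryless multiplication;
asserting a carryless signal according to the carryless instruction; and
when the carryless signal is asserted, formatting the first operand into a plurality of parts, wherein, by means of the parts, a Booth encoder selects a plurality of first partial products of the second operand and avoids selecting a plurality of second partial products of the second operand, the second partial products being those that would cause a carry;
wherein an exclusive-OR operation is performed on the first partial products to produce a carryless multiplication result.

40. The method of claim 39, wherein the first and second operands are in two's complement format.

41. The method of claim 39, wherein the parts comprise:
an even part, having the values of the even bits of the first operand and the values of the odd bits of the first operand, wherein the values of the odd bits of the first operand in the even part are set to 0; and
an odd part, having the values of the odd bits of the first operand and the values of the even bits of the first operand, wherein the values of the even bits of the first operand in the odd part are set to 0 and the values in the odd part are shifted right by 1 bit;
wherein the even bits of the first operand include the least significant bit of the first operand.

42. The method of claim 39, wherein the Booth encoder comprises a radix-4 encoder for selecting partial products.

43. The method of claim 39, wherein the first partial products comprise a plurality of first multiples of the second operand, the first multiples comprising:
the second operand multiplied by 0; and
the second operand multiplied by 1.

44. The method of claim 43, wherein the second partial products comprise a plurality of second multiples of the second operand, the second multiples comprising:
the second operand multiplied by 2; and
the second operand multiplied by -2.

45. The method of claim 39, wherein each of the operands is a 64-bit operand.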
久201224916 An odd-numbered portion having the value of the odd-numbered bit of the multiplier, and the value of the even-numbered bit of the multiplier, wherein the odd-numbered portion has the even-order operator The value of the digit is set to 〇, and the value of the odd portion is shifted to the right by 1 bit; wherein the even bit of the multiplier is included in the even bit of the multiplier. 4. The carry-less multiplying device of claim 1, wherein the Booth encoder comprises a substrate-4 encoder for selecting a partial product. 5. The carry-in multiplication device of claim 1, wherein the first partial product comprises a complex first multiplication of the multiplicand operand, the first multiplications comprising: the multiplicand operand Multiply the 〇; and multiply the multiplicand operand by 1. 6. The carry-in multiplication device of claim 5, wherein the second partial product comprises a complex second multiplication of the multiplicand operand, the second multiplications comprising: the multiplicative operation Multiply the element by 2; and multiply the multiplicand operator by -2. 7. The carry-free multiplying device of claim 1, wherein the carry-free multiplying device is a multiplying unit in a processor or a multiplying unit in a device. 8. The carry-less multiplying device of claim 7, wherein the multiplying unit is configured to perform a carry-free multiplication operation and a normal multiplication operation. 9. 
A method for performing a carry-free multiplication operation, comprising: CNTR2522I00-TW/0608-A42859-TW/Final 28 201224916 In a multiply unit within a processor, a multiplier operand is formatted into a complex portion Determining the portions through a Booth encoder and selecting a first partial product of a multiplicand operator, wherein the complex second partial product of the multiplicand operator is avoided by the plurality of products The second partial product causes a carry phenomenon; the first partial product is processed by a complex carry storage adder for generating a complex total bit and a complex carry bit, wherein the carry store adder The Lex tree architecture is arranged, and the carry bit is not enabled when performing the carry-less multiplication; shifting the output of the Wallace tree to the left by at least one bit; and performing the output of the Wallace tree A mutually exclusive OR operation to produce a carry-free multiplication result. 10. The method of claim 9, wherein the multiplier operand is in the form of a 2's complement. 11. The method of claim 9, wherein the portion comprises: an even portion, a value having an even bit of the multiplier, and a value of an odd bit of the multiplier, Wherein the even-numbered portion has a value of an odd-numbered bit of the multiplier operation element set to 0; and an odd-numbered portion having a value of an odd-numbered bit of the multiplier operation element, and an even-numbered bit of the multiplier operation element a value of a meta-number, wherein the odd-numbered portion has a value of an even-numbered bit of the multiplier operation element set to 0, and the value of the odd-numbered portion is shifted to the right by 1 bit; wherein the multiplier operation element The even-numbered bits include the last significant bit of the multiplier operation CNTR2522100-TW/0608-A42859-TW/Fina1 29 201224916. 12. 
The method of claim 9, wherein the Booth encoder comprises a substrate-4 encoder for selecting a partial product. 13. The method of claim 9, wherein the first partial product comprises a complex first multiplication of the multiplicanded operand, the first multiplications comprising: multiplying the multiplicand operator by 〇 ; and multiply the multiplicand operand by 1. 14. The method of claim 13, wherein the second partial product comprises a complex second multiplication of the multiplicand operator, the second multiplications comprising: multiplying the multiplicand operator by 2; and multiply the multiplicand operator by -2. 15. The method of claim 9, wherein the operands each have a 64-bit operand. 16. A carry-free multiply device for performing a carry-free multiplication operation, comprising: a first operand register, receiving a first operand for performing the carry-less multiplication operation; and a second operand a register, receiving a second operand for performing the carry-free multiplication operation; an opcode detector receiving a carry-free multiplication instruction and enabling a carry-free signal according to the carry-free multiplication instruction; The carry-free pre-format unit, when the no-carrying signal is enabled, the non-carry pre-format unit formats the first operand into a complex part 30 CNTR2522I00-TW/0608-A42859-TW/Final S 201224916, one of which With the parts, the Booth encoder avoids selecting the second partial product of the second operand, and the second partial product will cause a carry phenomenon; 'a compressor, through the complex carry storage adder, plus Generating a first partial product of the plurality of second operands, the carry storage adder generating a complex summing bit and a complex carry bit, wherein the carry stores the adder The Wallace tree architecture is arranged such that when the carry-free signal is enabled, the carry bit is not enabled; a left shifter coupled to the compressor for shifting the 
output of the compressor to the left by at least one And a mutex or a mutex coupled to the compressor and the left shifter for performing a mutually exclusive OR operation and generating a carry-free multiplication result. 17. The carry-less multiply device of claim 16, wherein the format of the first and second operands is a complement of two. 18. The carry-less multiply device of claim 16, wherein the portion comprises: an even portion having a value of an even bit of the first operand, and an odd bit of the first operand a value, wherein the even-numbered portion has a value of an odd-numbered bit of the first operand set to 0; and an odd-numbered portion having a value of the odd-numbered bit of the first operand, and the first operand a value of an even bit, wherein the odd-numbered portion has a value of an even-numbered bit of the first operand set to 0, and the odd-numbered portion has a value shifted to the right by 1 bit; wherein the first The even bit of the operand includes the last significant bit of the first operand. The non-carrying multiplying device of claim 16, wherein the Booth encoder includes a base_4 encoder for selecting a partial product. . 20. The carry-less multiplication device of claim 16, wherein the first partial product comprises a complex first multiplication of the second operational element, the first multiplications comprising: multiplying the second operational element Up 0; and multiplying the second operand by 1. 21. The carry-in multiplication device of claim 2, wherein the second partial product comprises a complex second multiplication of the second operand, the second multiplication comprising: the second operand Multiply by 2; and multiply the second operand by -2. 22. The carry-free multiplying device of claim 16, wherein the carry-free multiplying device is set in a multiplication unit in the processor or a multiplying unit in a device. 23. 
The non-carrying multiplying device described in item 22 (4) ; 22; ;, = multiplication unit is used to perform - no wire feeding operation and - normal multiplication operation. 24. The method 'is used to perform a carry-free multiplication operation, including: in a processor - vertical, soil 5S - mountain and - first - Yunyuan Yuantian receives a first operation element to first open 70 ' For performing the carry-free multiplication operation; enabling a carry-free signal according to a carry method instruction; when the non-carry signal is enabled, the eighth Buss encoder is prevented from selecting the first part by using the portion The second operand of the second operand (four) k is selected to _522Ι__.Α42859_τ~ one product 'the second partial product will be 201224916 causing a carry phenomenon; through the complex carry storage adder, add the plural of the second operand a portion of a product, the carry storage adder generating a complex sum bit and a plurality of carry bits, wherein the carry storage adders are arranged in a Wallace tree architecture, and when the carry signal is enabled, the carry The bit is not enabled; the output of the Wallace tree is shifted left by at least one element; and the output of the Wallace tree is mutually exclusive ORed to produce a carry-free multiplication result. 25. The method of claim 24, wherein the format of the first and second operands is a complement of two. 26. 
The method of claim 24, wherein the portion comprises: an even portion having a value of an even bit of the first operand, and a value of an odd bit of the first operand, Wherein the even-numbered portion has a value of an odd-numbered bit of the first operand set to 0; and an odd-numbered portion having a value of an odd-numbered bit of the first operand, and an even-numbered bit of the first operand a value of a meta-number, wherein the odd-numbered portion has a value of an even-numbered bit of the first operand set to 0, and the odd-numbered portion has a value shifted to the right by 1 bit; wherein the first operand is The even bit includes the last significant bit of the first operand. 27. The method of claim 24, wherein the Booth encoder comprises a substrate-4 encoder for selecting a partial product. 28. The method of claim 24, wherein the first CNTR2522I00-TW/0608-A42859-TW/Fina1 33 201224916 partial product comprises a complex first first multiplication of the second operand comprising: The first multiplication multiplies the second operand by 〇; and multiplies the second operand by one. 29. The method of claim 28, wherein the partial product comprises a complex second multiplication of the second operand = the first method comprises: inverse multiplying the second operand by multiplying 2; Multiply the second operand by -2. 30. The method described in claim 24 of the patent application has 64-bit operands. A movement difference of 31. 
A kind of non-carrying multiplication device, formatting a first operation element, performing a carry-free multiplication operation, including ········································································· Japanese, enabling - no carry signal; and the enthalpy unit, # when the carry-in money is enabled, ::,,,:, format unit formats the first operand into a plurality of knives, wherein By using the parts, a Booth knitting machine can select a first partial product of the second operation number and avoid selecting the complex = partial product of the second operational element, and the second partial product will cause a carry phenomenon. , · : In the middle, the first partial product performs a mutual exclusion or operation to generate a carry-free multiplication result. The non-carrying multiplying device described in Item 31 of the Chinese Patent No. 31, wherein the entanglement of the Γ and the second operational element is a complement of two. two. The non-carrying multiplying device of claim 31, wherein the R2522IO〇-TW/0608-A42859-TW/FinaI 201224916 includes: an even portion having an even bit of the second operand a value, and a value of an odd bit of the second operand, wherein the even-numbered portion has a value of an odd bit of the second operand set to 〇; and an odd-numbered portion having the second operand a value of an odd bit, and a value of an even bit of the second operand, wherein a value of the even bit of the second operand having the odd portion is set to 〇, and the value of the odd portion Shifting 1 bit to the right; wherein the even bit of the second operand includes the last significant bit of the second operand. 34. The carry-less multiplying device of claim 31, wherein the Booth encoder comprises a substrate-4 encoder for selecting a partial product. 35. 
The carry-in multiplication device of claim 31, wherein the first partial product comprises a complex first multiplication of the second operational element, the first multiplications comprising: multiplying the second operational element 0; and multiply the second operand by 1. 36. The carry-in multiplication device of claim 35, wherein the second partial product comprises a complex second multiplication of the second operand, the second multiplications comprising: multiplying the second operand Up 2; and multiplying the second operand by -2. 37. The carry-less multiplying device of claim 31, wherein the carry-free multiplying device is a multiplying unit in a processor or a multiplying unit in a device. The non-carrying multiplying device of claim 37, wherein the multiplying unit is configured to perform a carry-free multiplication operation and a normal multiplication operation, as described in claim 37. 39. A method for performing a carry-free multiplication operation, comprising: receiving a carry-free instruction in a multiply unit within a processor, and performing together with a first operand and a second operand The carry-less multiplication operation; enabling a carry-free signal according to the carry-free instruction; and formatting the first operand into a complex portion when the carry-free signal is enabled, wherein a Booth encoder is used by the An equal portion, selecting a first partial product of the complex of the second operand, and avoiding selecting a second partial product of the second operand, the second partial product causing a carry phenomenon; wherein the first partial product is performed A mutually exclusive OR operation to produce a carry-free multiplication result. 40. The method of claim 39, wherein the format of the first and second operands is a complement of two. 41. 
The method of claim 39, wherein the portion comprises: an even portion, a value having an even bit of the first operand, and a value of an odd bit of the first operand, Wherein the even-numbered portion has a value of an odd-numbered bit of the first operand set to 〇; and an odd-numbered portion having a value of an odd-numbered bit of the first operand, and an even-numbered bit of the first operand The value of the element, wherein the odd-numbered portion has the value of the even-numbered bit of the first operand set to 0, and CNTR2522I00-TW/0608-A42859-TW/Final 36 201224916 the odd-numbered portion has the value to the right Shifting 1 bit; wherein in the even bit of the first operand, the last significant bit of the first operand is included. 42. The method of claim 39, wherein the Booth encoder comprises a substrate-4 encoder for selecting a partial product. 43. The method of claim 39, wherein the first partial product comprises a complex first multiplication of the second operand, the first multiplications comprising: multiplying the second operand by 〇; Multiply the second operand by 1. 44. The method of claim 43, wherein the second partial product comprises a complex second multiplication of the second operand, the second multiplications comprising: multiplying the second operand by 2; And multiplying the second operand by -2. 45. The method of claim 39, wherein the operands each have a 64-bit operand. CNTR2522!00-TW/0608-A42859-TW/Final 37
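
The data flow recited in the claims can be illustrated with a small software model. The Python sketch below is only an illustration under stated assumptions and is not taken from the patent. The function names (`preformat`, `booth_radix4_digits`, `xor_sum_partial_products`, `clmul_booth`, `clmul_reference`) are hypothetical, the Wallace tree with disabled carries is modeled simply as an XOR accumulation, and the final left shift by one bit is assumed to undo the right shift applied to the odd part. It shows that once the multiplier is split into an even part and a right-shifted odd part, a radix-4 Booth recoding can only produce the digits 0 and 1, so the ×2 and ×(-2) multiples are never selected.

```python
EVEN_MASK_64 = 0x5555555555555555  # bits 0, 2, 4, ... of a 64-bit word


def preformat(multiplier, width=64):
    """Split the multiplier into an even part and an odd part shifted right by 1.

    After this step both parts have zeros in every odd bit position.
    """
    mask = (1 << width) - 1
    even_mask = EVEN_MASK_64 & mask
    even = multiplier & even_mask
    odd = (multiplier & ~even_mask & mask) >> 1
    return even, odd


def booth_radix4_digits(part, width=64):
    """Standard radix-4 Booth recoding: digit = -2*b[2i+1] + b[2i] + b[2i-1]."""
    digits = []
    prev = 0  # implicit bit to the right of bit 0
    for i in range(0, width, 2):
        b0 = (part >> i) & 1
        b1 = (part >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def xor_sum_partial_products(multiplicand, digits):
    """Model of carry save addition with carry bits disabled: XOR of partial products."""
    acc = 0
    for i, d in enumerate(digits):
        # Only the 0x and 1x multiples may appear for a preformatted part.
        assert d in (0, 1)
        if d:
            acc ^= multiplicand << (2 * i)
    return acc


def clmul_booth(multiplicand, multiplier, width=64):
    """Carryless product built from the even and odd parts (assumed combination)."""
    even, odd = preformat(multiplier, width)
    even_sum = xor_sum_partial_products(multiplicand, booth_radix4_digits(even, width))
    odd_sum = xor_sum_partial_products(multiplicand, booth_radix4_digits(odd, width))
    # The left shift by one bit restores the weight of the odd part.
    return even_sum ^ (odd_sum << 1)


def clmul_reference(multiplicand, multiplier):
    """Bit-by-bit shift-and-XOR carryless multiplication, used as a cross-check."""
    result = 0
    shift = 0
    while multiplier:
        if multiplier & 1:
            result ^= multiplicand << shift
        multiplier >>= 1
        shift += 1
    return result
```

For example, `clmul_booth(0b101, 0b11)` and `clmul_reference(0b101, 0b11)` both return `0b1111`, since (x² + 1)(x + 1) = x³ + x² + x + 1 over GF(2).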
TW100136024A 2010-12-03 2011-10-05 Carryless multiplication apparatus and method TWI489375B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/960,231 US8645448B2 (en) 2010-12-03 2010-12-03 Carryless multiplication unit
US12/960,239 US8667040B2 (en) 2010-12-03 2010-12-03 Mechanism for carryless multiplication that employs booth encoding
US12/960,246 US8635262B2 (en) 2010-12-03 2010-12-03 Carryless multiplication preformatting apparatus and method

Publications (2)

Publication Number Publication Date
TW201224916A true TW201224916A (en) 2012-06-16
TWI489375B TWI489375B (en) 2015-06-21

Family

ID=45585610

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100136024A TWI489375B (en) 2010-12-03 2011-10-05 Carryless multiplication apparatus and method

Country Status (2)

Country Link
CN (1) CN102360276B (en)
TW (1) TWI489375B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10817587B2 (en) * 2017-02-28 2020-10-27 Texas Instruments Incorporated Reconfigurable matrix multiplier system and method
CN108363559B (en) * 2018-02-13 2022-09-27 北京旷视科技有限公司 Multiplication processing method, device and computer readable medium for neural network
CN110196709B (en) * 2019-06-04 2021-06-08 浙江大学 Nonvolatile 8-bit Booth multiplier based on RRAM
CN110673823B (en) * 2019-09-30 2021-11-30 上海寒武纪信息科技有限公司 Multiplier, data processing method and chip
CN113031909B (en) * 2019-12-24 2023-09-08 上海寒武纪信息科技有限公司 Data processor, method, device and chip
EP4080350A4 (en) * 2020-04-01 2022-12-28 Huawei Technologies Co., Ltd. Multimode fusion multiplier

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0721159A1 (en) * 1995-01-03 1996-07-10 Texas Instruments Incorporated Multiple-input binary adder
US6684236B1 (en) * 2000-02-15 2004-01-27 Conexant Systems, Inc. System of and method for efficiently performing computations through extended booth encoding of the operands thereto
CN100382011C (en) * 2001-12-14 2008-04-16 Nxp股份有限公司 Pipeline core in montgomery multiplier
US7139787B2 (en) * 2003-01-30 2006-11-21 Sun Microsystems, Inc. Multiply execution unit for performing integer and XOR multiplication
JP2004334212A (en) * 2003-05-09 2004-11-25 Samsung Electronics Co Ltd Montgomery modular multiplier and method thereof
TWI258694B (en) * 2004-04-02 2006-07-21 Ali Corp Method and system for sign extension of multiplier
US8271570B2 (en) * 2007-06-30 2012-09-18 Intel Corporation Unified integer/galois field (2m) multiplier architecture for elliptic-curve crytpography
CN100552620C (en) * 2007-09-21 2009-10-21 清华大学 Large number multiplication device based on quadratic Booth coding

Also Published As

Publication number Publication date
CN102360276A (en) 2012-02-22
CN102360276B (en) 2014-06-25
TWI489375B (en) 2015-06-21
