TWI423121B

TWI423121B - System and method for determination of a horizontal minimum of digital values

Info

Publication number: TWI423121B
Application number: TW99130018A
Authority: TW
Inventors: Rochelle L Stortz; Raymond A Bertram
Original assignee: Via Tech Inc
Priority date: 2009-10-26
Filing date: 2010-09-06
Publication date: 2014-01-11
Also published as: CN103941601B; CN101937333B; CN101937333A; TW201419138A; CN103365624B; CN103941601A; TW201115460A; TWI489374B; CN103365624A

Description

Determination system and method

The present invention relates to microprocessor instructions, and more particularly to a system and method for determining the minimum of a set of digital values, where the smallest digital value serves as the horizontal minimum.

Modern microprocessors often execute media instructions to make multimedia applications more efficient. For example, a microprocessor architecture may include one or more media instructions that identify the horizontal minimum of a set of digital values together with its location on a bus or in a register. A concrete example is the PHMINPOSUW instruction in Intel's SSE4 programming reference manual. PHMINPOSUW finds the smallest of eight unsigned 16-bit words (128 bits in total) and the position of that word. Some conventional microprocessors need many processing steps or clock cycles to execute PHMINPOSUW. For example, to identify the smaller word of each of several word pairs, four 16-bit magnitude comparators narrow the search from eight words to four in a first cycle; the four surviving words are fed back to two of the comparators, which narrow the search from four words to two in a second cycle; and the result is fed back to one comparator, which picks the smaller of the two remaining words in a third and final cycle. Another conventional approach executes the instruction in a single cycle by adding 16-bit comparators. With seven 16-bit comparators, four comparators perform the first comparison within the single cycle and narrow the search from eight words to four, two more narrow it from four words to two, and the last one selects the smaller of the remaining two. However, each 16-bit comparator occupies considerable area in the microprocessor, which increases cost and reduces performance.
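As a point of reference for the conventional approach described above, the following Python sketch models the three-level reduction (8 to 4 to 2 to 1 words) in software. It is only an illustrative model, not circuitry from the patent; the function name `tournament_min` and the tie rule (keep the lower-indexed word) are assumptions made for the example.

```python
def tournament_min(words):
    """Return (minimum, index, comparisons) for eight unsigned 16-bit words.

    Models the conventional reduction: three levels of pairwise
    comparison (8 -> 4 -> 2 -> 1), i.e. 4 + 2 + 1 = 7 comparisons.
    """
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    candidates = list(enumerate(words))      # (index, value) pairs
    comparisons = 0
    while len(candidates) > 1:
        survivors = []
        for k in range(0, len(candidates), 2):
            left, right = candidates[k], candidates[k + 1]
            comparisons += 1
            # Assumed tie rule: keep the left (lower-index) word on a tie.
            survivors.append(left if left[1] <= right[1] else right)
        candidates = survivors
    index, value = candidates[0]
    return value, index, comparisons
```

The count of seven comparisons corresponds to the seven 16-bit comparators of the single-cycle variant, or to four comparators reused over three cycles in the multi-cycle variant.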

An object of the present invention is to find the smallest digital value in a set of digital values, and its position, in a single cycle without adding circuitry.

The present invention provides a determination system for finding the smallest of at least two binary values. In one embodiment, the determination system includes a first adder, a second adder, and a comparison circuit. The first adder adds a plurality of first bits to a plurality of second bits to provide a first carry output and a first propagate output. The first bits are the upper bits of a first binary value, and the second bits are the inverted upper bits of a second binary value. The second adder adds a plurality of third bits to a plurality of fourth bits to provide a second carry output. The third bits are the lower bits of the first binary value, and the fourth bits are the inverted lower bits of the second binary value. The comparison circuit determines, from the first and second carry outputs and the first propagate output, whether the first binary value is greater than the second binary value. Both binary values are unsigned, and both adders perform unsigned binary addition. The first propagate output indicates whether the first adder receives a carry input.
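The two-adder comparison can be sketched in Python as follows. This is a minimal model under stated assumptions, not the patented circuit: the name `greater_than_split`, the even split into two halves, and the reading of the propagate output as "the high sum is all ones, so a carry into the high adder would ripple through" are all choices made for illustration.

```python
def greater_than_split(a, b, width=16):
    """Return True when unsigned a > b, using two half-width additions.

    Each half computes X + NOT(Y) with no carry-in. For n-bit halves,
    X + NOT(Y) = 2**n + (X - Y - 1), so the carry-out is 1 exactly
    when X > Y, and the sum is all ones exactly when X == Y.
    """
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = (a >> half) & mask, a & mask
    b_hi, b_lo = (b >> half) & mask, b & mask

    hi_sum = a_hi + (~b_hi & mask)   # high adder: upper bits of a + inverted upper bits of b
    lo_sum = a_lo + (~b_lo & mask)   # low adder: lower bits of a + inverted lower bits of b

    carry_hi = hi_sum >> half                # first carry output
    carry_lo = lo_sum >> half                # second carry output
    prop_hi = int((hi_sum & mask) == mask)   # propagate output (assumed meaning)

    # a > b when the upper halves already decide it, or when the upper
    # halves are equal and the lower halves decide it.
    return bool(carry_hi | (prop_hi & carry_lo))
```

Splitting the comparison this way lets two narrower adders stand in for one full-width magnitude comparator, which is the property the later embodiments rely on when the same adders also serve another instruction.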

The present invention further provides a determination system for quickly finding the horizontal minimum of a plurality of digital values. The determination system includes a plurality of difference circuits, a routing circuit, and a comparison circuit. Each difference circuit compares two digital values. The routing circuit assigns each of the digital values to at least one difference circuit so that each digital value is compared with the other digital values. Each difference circuit may include a high adder and a low adder. The high adder compares the upper portion of a first digital value with the upper portion of a second digital value to provide a first carry output and a propagate output. The low adder compares the lower portion of the first digital value with the lower portion of the second digital value to provide a second carry output. The comparison circuit compares the first and second carry outputs and the propagate outputs to determine the smallest of the digital values.

Each propagate output indicates whether the high adder of one of the difference circuits receives a carry input. The comparison circuit includes a decoding circuit that decodes compare bits to provide a plurality of minimum bits, each of which indicates whether the corresponding digital value is the smallest. A location circuit reports the memory location of the smallest digital value. The determination system may be integrated on a microprocessor chip to execute a fast horizontal minimum instruction.
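The decoding step can be illustrated with a short Python sketch. It assumes one compare bit per pair (i, j) with i < j, set when value i is greater than value j, and it assumes that ties go to the lower index; the ordering of the compare bits and the names `compare_bits`, `decode_minimum`, and `locate` are hypothetical and are not taken from the patent.

```python
from itertools import combinations


def compare_bits(values):
    """One bit per pair (i, j), i < j: 1 when values[i] > values[j]."""
    return {(i, j): int(values[i] > values[j])
            for i, j in combinations(range(len(values)), 2)}


def decode_minimum(bits, count):
    """Decode the pairwise compare bits into one 'minimum bit' per value."""
    min_bits = []
    for k in range(count):
        # Strictly smaller than every earlier value ...
        below_earlier = all(bits[(i, k)] == 1 for i in range(k))
        # ... and not greater than any later value.
        not_above_later = all(bits[(k, j)] == 0 for j in range(k + 1, count))
        min_bits.append(int(below_earlier and not_above_later))
    return min_bits


def locate(min_bits):
    """Position of the single set minimum bit."""
    return min_bits.index(1)
```

With these rules exactly one minimum bit is set for any input, so the location follows directly from the decoded bits. For four values the scheme needs six compare bits, since every value is compared with every other value in parallel rather than in successive rounds.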

The present invention provides a determination method for finding the smallest of a plurality of digital values. In one possible embodiment, the method includes: comparing the upper bits of a first digital value with the upper bits of a second digital value to provide a first carry output and a propagate output; comparing the lower bits of the first digital value with the lower bits of the second digital value to provide a second carry output; and determining, from the first and second carry outputs and the propagate output, which of the first and second digital values is the smaller. The method may include sending each of the digital values to at least one of a plurality of adder pairs so that each digital value is compared with the other digital values and the smallest is identified. The method further includes decoding compare bits and determining the location of the smallest digital value in a memory.

The present invention provides a system that uses a shared adder circuit to execute either a horizontal minimum instruction or a sum of absolute differences instruction. In one embodiment, the system includes a plurality of adders, a summing circuit, a comparison circuit, and a routing circuit. The input operands include a plurality of digital values. For the sum of absolute differences instruction, the digital values include a first set of digital values and a second set of digital values. For the horizontal minimum instruction, the digital values include a plurality of digital value pairs, each with a high digital value and a low digital value. Each adder compares a first digital value with a second digital value to provide an absolute difference and a carry output. The summing circuit adds the absolute differences to provide a plurality of sums of absolute differences. The adders form a plurality of adder pairs and provide a propagate output. The comparison circuit combines the carry outputs and the propagate outputs to find the smallest of the digital value pairs. When the horizontal minimum instruction is executed, the routing circuit sends each digital value pair to at least one of the adder pairs so that each pair is compared with the other pairs. When the sum of absolute differences instruction is executed, the routing circuit sends the first and second sets of digital values to the adder pairs to obtain the absolute difference between each digital value of the first set and each digital value of the second set, the second set consisting of consecutive digital values.
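One way a single adder can yield both an absolute difference and a carry output is sketched below in Python. The text does not state how the adders form the absolute value, so the A + NOT(B) formulation, the correction step, and the name `abs_diff_and_carry` are assumptions chosen to show why sharing the adders between the two instructions is plausible.

```python
def abs_diff_and_carry(a, b, width=8):
    """Return (|a - b|, carry) for unsigned values of the given width.

    The adder computes a + NOT(b) with no carry-in, which equals
    2**width + (a - b - 1). The carry-out is therefore 1 exactly
    when a > b.
    """
    mask = (1 << width) - 1
    total = a + (~b & mask)
    carry = total >> width           # carry output: 1 when a > b
    low = total & mask               # (a - b - 1) modulo 2**width
    if carry:
        diff = (low + 1) & mask      # a > b: add the missing 1 to get a - b
    else:
        diff = ~low & mask           # a <= b: inverting gives b - a
    return diff, carry
```

In this model the carry output is the same "greater than" indication used for the minimum search, while the corrected sum is the absolute difference used for the sum of absolute differences.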

The present invention further provides a method that uses a shared adder circuit to execute either a horizontal minimum instruction or a sum of absolute differences instruction. In one embodiment, the method includes receiving a plurality of digital values. When the sum of absolute differences instruction is executed, the digital values include a first set of digital values and a second set of digital values. When the horizontal minimum instruction is executed, the digital values include a high digital value and a low digital value. The method further includes providing a plurality of adders, each of which compares a first digital value with a second digital value to provide an absolute difference and a carry output. The method further includes: summing the absolute differences to provide a plurality of sums of absolute differences; grouping the adders into a plurality of adder pairs that provide a propagate output; combining the carry outputs and the propagate outputs to find the smallest of the digital value pairs; and, when the horizontal minimum instruction is executed, sending each digital value pair to at least one of the adder pairs so that each pair is compared with the other pairs, or, when the sum of absolute differences instruction is executed, sending the first and second sets of digital values to the adder pairs to obtain the absolute difference between each digital value of the first set and each consecutive digital value of the second set.

The following description of embodiments is presented to enable one of ordinary skill in the art to make and use the disclosed invention. Modifications to the preferred embodiments will be apparent to those skilled in the art, and the general principles described herein may be applied to other embodiments. The invention is therefore not limited to the specific embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The inventors have observed that conventional microprocessors need many cycles to execute a horizontal minimum instruction. The present invention executes the same instruction in a single cycle without a large increase in circuitry. The invention provides a system and method for quickly determining a horizontal minimum. To make its features and advantages easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings (FIGS. 1 to 8).

FIG. 1 is a block diagram of a microprocessor 100 according to an embodiment of the present invention. The microprocessor 100 has a comparison circuit 114 that quickly finds the horizontal minimum of a set of digital values and computes the sum of absolute differences between a first set of digital values and a second set of digital values. FIG. 1 omits other conventional systems and functions, such as instruction fetch, the instruction queue, instruction decoding, and instruction reordering; their omission does not affect an understanding of the invention. The microprocessor 100 has a scheduler 102, which routes instructions or operations to a selected arithmetic logic unit (ALU) or execution unit (EU). As shown in FIG. 1, the scheduler 102 is coupled to a complex integer execution unit (complex IEU) 104, a simple integer execution unit (simple IEU) 106, a floating point execution unit (FPEU) 108, a media unit 110, and other units 112, which are other similar or different processing units. The media unit 110 generally executes media-based instructions and operations, such as those of the Streaming SIMD Extensions (SSE), the MultiMedia eXtension (MMX), and similar instruction sets. SSE is a SIMD instruction set of Intel's x86 architecture, where SIMD stands for single instruction, multiple data.

The media unit 110 contains the comparison circuit 114, which executes at least two separate media instructions. In this embodiment, the two media instructions are called the horizontal minimum instruction (PMIN) and the sum of absolute differences instruction (PSAD). The PSAD instruction produces sums of absolute differences between a first set of digital (or binary) values and a second set of digital (or binary) values, where the second set follows the first set; both sets are described in detail below. The PMIN instruction yields the smallest digital value and its position. In this embodiment the terms digital value, binary value, and corresponding formats are interchangeable, and these values represent a plurality of bits or hexadecimal digits. The scheduler 102 has a memory 116, which stores the operands of the PSAD and PMIN instructions and has a first bus ABUS and a second bus BBUS. In one embodiment ABUS and BBUS each carry 128 bits, although the invention is not limited to this width, and in other embodiments they may carry other numbers of bits. Although the media unit 110 generally executes many other media instructions well known to those skilled in the art, the comparison circuit 114 is the part that executes the PSAD and PMIN instructions.

In one possible embodiment, the first set of digital values for the PSAD instruction has four unsigned bytes of 8 bits each. The second set of digital values for the PSAD instruction is a set of 11 consecutive bytes, of which four consecutive bytes at a time are treated as a group. Each successive 4-byte group of the second set starts at the next higher byte; that is, each group is shifted by one byte and therefore overlaps the last three bytes of the previous group. Suppose the second set has 11 bytes B0 to B10. Bytes B0 to B3 form the first group, and the second group starts at the next higher byte, B1, and consists of B1 to B4. The second group (B1 to B4) thus overlaps the last three bytes (B1 to B3) of the first group (B0 to B3). The magnitude of the difference between a byte of the first set and a byte of the second set is called an absolute difference, and the absolute differences are summed. A concrete example is the MPSADBW instruction in Intel's SSE4 programming reference manual. For the PSAD instruction, the first bus ABUS carries the first operand, which consists of 4 unsigned bytes, and the second bus BBUS carries the second operand, which consists of 11 unsigned bytes. The result is eight unsigned 10-bit sums of absolute differences. The PSAD instruction may include one or more offsets for locating the operands. The invention does not limit the size of the offsets; any offset can be applied through ABUS and BBUS so that the corresponding operand is placed in the right-most bit positions of ABUS and BBUS. The offsets are omitted in this embodiment.

In one embodiment, the PMIN instruction returns the minimum of eight unsigned 16-bit words on the first bus ABUS and the position of that minimum. A concrete example is the PHMINPOSUW instruction in Intel's SSE4 programming reference manual. For the PMIN instruction, ABUS carries eight words of 16 bits each. The bits on BBUS may be undefined or ignored, or BBUS may carry the same words as ABUS. In this embodiment, the comparison circuit 114 uses the same adder circuits to execute both instructions (PMIN and PSAD), each within a single cycle.
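The sliding grouping of the PSAD operands can be made concrete with a small Python reference model. It follows the description above (4 bytes compared against eight overlapping 4-byte groups taken from 11 bytes) and ignores the offsets; the function name `sliding_sad` is hypothetical, and the model says nothing about how the hardware computes the result.

```python
def sliding_sad(first, second):
    """Eight sums of absolute differences between `first` (4 unsigned
    bytes) and the overlapping 4-byte groups of `second` (11 unsigned
    bytes): B0..B3, B1..B4, ..., B7..B10.
    """
    assert len(first) == 4 and len(second) == 11
    assert all(0 <= v <= 0xFF for v in list(first) + list(second))
    sums = []
    for start in range(8):                   # each group is shifted by one byte
        group = second[start:start + 4]
        sums.append(sum(abs(a - b) for a, b in zip(first, group)))
    return sums
```

Each sum is at most 4 × 255 = 1020, which is why 10 bits per result are sufficient and why eight results occupy 80 bits in total.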

FIG. 2 shows an embodiment of the comparison circuit 114. As shown, the comparison circuit 114 includes a routing circuit 202, a low-order (LO) adder circuit 203, a high-order (HI) adder circuit 207, and a high/low comparator circuit 212. The routing circuit 202 has two inputs coupled to the first bus ABUS and the second bus BBUS, respectively, and a further input that receives a control code INSTR. According to INSTR, the routing circuit 202 rearranges or reroutes the bytes from ABUS and BBUS in order to split the two buses. INSTR has at least one bit. In this embodiment, INSTR equal to 1 selects the PMIN instruction and INSTR equal to 0 selects the PSAD instruction. ABUS is split into a high portion AH<31:0> and a low portion AL<31:0> of 32 bits each. BBUS is split into a high portion BH<55:0> and a low portion BL<55:0> of 56 bits each. How the bytes of ABUS and BBUS are rearranged or rerouted for the instruction being executed is described in detail below. The low-order adder circuit 203 has a first adder circuit 204 coupled to a first PMIN circuit 206. The high-order adder circuit 207 has a second adder circuit 208 coupled to a second PMIN circuit 210.

The first adder circuit 204 receives the control code INSTR and the low portions AL<31:0> and BL<55:0>, and outputs the 40-bit sums of absolute differences PSAD<39:0> and the 6 compare bits C<5:0>. The compare bits C<5:0>, AL<15:0>, and BL<47:0> are passed to the first PMIN circuit 206, which outputs the minimum value PMINVAL<15:0> of the low portion and its position PMINLOC<1:0>. The control code INSTR and the high portions AH<31:0> and BH<55:0> are passed to the second adder circuit 208, which outputs the 40-bit sums of absolute differences PSAD<79:40> and the 6 compare bits C<11:6>. The compare bits C<11:6>, AH<15:0>, and BH<47:0> are passed to the second PMIN circuit 210, which outputs the minimum value PMINVAL<31:16> of the high portion and its position PMINLOC<3:2>. The outputs of the two PMIN circuits are combined to form PMINVAL<31:0> and PMINLOC<3:0>. The high/low comparator circuit 212 receives PMINVAL<31:0> and PMINLOC<3:0> and produces the final minimum value MINVAL<15:0> and its position MINLOC<2:0>.
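The final selection between the two half results can be sketched in Python as follows. This is a behavioural model only: the name `combine_halves`, the way the 3-bit position is formed (adding 4 for the high half), and the tie rule (prefer the low half) are assumptions for illustration.

```python
def combine_halves(pminval, pminloc):
    """Combine the per-half minima into (MINVAL, MINLOC).

    pminval: 32 bits, low-half minimum in <15:0>, high-half minimum in <31:16>.
    pminloc: 4 bits, low-half position in <1:0>, high-half position in <3:2>.
    """
    lo_val, hi_val = pminval & 0xFFFF, (pminval >> 16) & 0xFFFF
    lo_loc, hi_loc = pminloc & 0b11, (pminloc >> 2) & 0b11
    if hi_val < lo_val:
        return hi_val, 4 + hi_loc    # words W4..W7 map to positions 4..7
    return lo_val, lo_loc            # assumed tie rule: prefer the low half
```

For instance, with a low-half minimum of 5 at position 1 and a high-half minimum of 3 at position 2, the model selects the value 3 and position 6, that is, word W6.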

The first adder circuit 204 and the second adder circuit 208 arrange the input bytes according to the instruction (the control code INSTR) and compare them with one another. For the PSAD instruction, the combined output PSAD<79:0> holds eight unsigned 10-bit values, which are the results of the sum of absolute differences operation. The first PMIN circuit 206, the second PMIN circuit 210, and the high/low comparator circuit 212 may be ignored for the PSAD instruction. For the PMIN instruction, PSAD<79:0> may be ignored for both the high and low portions of the input; the first PMIN circuit 206 and the second PMIN circuit 210 use the compare bits C<11:0> to determine the smallest value and its position in each portion. When ABUS supplies 128 bits of input data, the high/low comparator circuit 212 receives and compares the minimum values of the high and low portions and outputs the minimum value MINVAL<15:0> and its position MINLOC<2:0>.

FIG. 3 shows an embodiment of the routing circuit 202, which arranges or reroutes the digital values supplied on ABUS and BBUS according to the instruction being executed. A buffer circuit 302 receives ABUS<31:0> and outputs it as AL<31:0> for both the PSAD and PMIN instructions. In one embodiment, the buffer circuit 302 contains a separate buffer for each bit, so that ABUS<31:0> is effectively copied to AL<31:0>; in other words, AL<31> = ABUS<31>, AL<30> = ABUS<30>, …, AL<0> = ABUS<0>. For both instructions, AL<31:0> holds four bytes A3 to A0. For the PMIN instruction, these bytes form two pairs: A3 and A2 form word W1, and A1 and A0 form word W0, each word having 16 bits. A multiplexer 304 receives ABUS<95:64> and ABUS<31:0>. When its control signal is logic 1 (high), its output AH<31:0> equals ABUS<95:64>; when its control signal is logic 0 (low), AH<31:0> equals ABUS<31:0>. In one embodiment, a separate 1-bit-wide multiplexer is provided for each of the 32 bits of AH<31:0>, so that every input and output bit has its own multiplexer path. When INSTR indicates the PMIN instruction, the multiplexer 304 selects ABUS<95:64> as AH<31:0>. These 32 bits form four bytes A11 to A8, which for PMIN form two words: A11 and A10 form word W5, and A9 and A8 form word W4. When INSTR indicates the PSAD instruction, the multiplexer 304 selects ABUS<31:0> as AH<31:0>, which holds the four bytes A3 to A0. The bytes are duplicated because the first operand of the PSAD instruction is the same for the high-order and low-order portions, as explained in detail below.

When the control signal of multiplexer 306 is logic 1 (that is, the control code INSTR=1), multiplexer 306 selects eight high-order bits 0x8, each with a logic value of 0, together with ABUS<63:16>. The output BL<55:0> of multiplexer 306 is then the eight zero bits followed by ABUS<63:16>. When the control signal of multiplexer 306 is logic 0, multiplexer 306 selects BBUS<55:0>, and the output BL<55:0> equals BBUS<55:0>. In one embodiment, multiplexers having a 1-bit width may be used for each byte of each bus. If the control code INSTR indicates the PMIN instruction, ABUS<63:16> is selected. ABUS<63:16> holds the six bytes A7~A2, which form three pairs: bytes A7 and A6 form word W3, bytes A5 and A4 form word W2, and bytes A3 and A2 form word W1. If the control code INSTR indicates the PSAD instruction, BBUS<55:0> is selected. BBUS<55:0> holds the seven low-order bytes B6~B0 of the second operand.

When the control terminal of multiplexer 308 is logic 1, multiplexer 308 selects eight high-order bits 0x8, each with a logic value of 0, together with ABUS<127:80>. The output BH<55:0> of multiplexer 308 is then the eight zero bits followed by ABUS<127:80>. When the control terminal of multiplexer 308 is logic 0, multiplexer 308 selects BBUS<87:32>, and the output BH<55:0> equals BBUS<87:32>. If the control code INSTR indicates the PMIN instruction, ABUS<127:80> is selected. ABUS<127:80> holds the six bytes A15~A10, which form three pairs: bytes A15 and A14 form word W7, bytes A13 and A12 form word W6, and bytes A11 and A10 form word W5. If the control code INSTR indicates the PSAD instruction, BBUS<87:32> is selected. BBUS<87:32> holds the seven high-order bytes B10~B4, which form part of the second operand of the PSAD instruction.
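The routing performed by multiplexers 304, 306, and 308 can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the function names `bits` and `select_paths` are hypothetical, the encoding PMIN=1 and PSAD=0 follows the INSTR=1 case described above, AL<31:0> is assumed to be ABUS<31:0> for both instructions, and the six bytes A15~A10 are taken to occupy ABUS<127:80>.

```python
PMIN, PSAD = 1, 0  # assumed encoding of the control code INSTR


def bits(value, hi, lo):
    """Return the bit field value<hi:lo> as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def select_paths(instr, abus, bbus):
    """Behavioral model of the path selection (hypothetical helper).

    Returns (AL, AH, BL, BH). AL is assumed to be ABUS<31:0> in both cases.
    """
    al = bits(abus, 31, 0)
    if instr == PMIN:
        ah = bits(abus, 95, 64)    # bytes A11..A8  (words W5, W4)
        bl = bits(abus, 63, 16)    # 8 zero bits above bytes A7..A2  (W3..W1)
        bh = bits(abus, 127, 80)   # 8 zero bits above bytes A15..A10 (W7..W5)
    else:
        ah = bits(abus, 31, 0)     # bytes A3..A0, duplicated for the high half
        bl = bits(bbus, 55, 0)     # bytes B6..B0
        bh = bits(bbus, 87, 32)    # bytes B10..B4
    return al, ah, bl, bh
```

Because the 48-bit PMIN fields are returned as plain integers, the eight zero bits above them are implicit in the 56-bit outputs BL and BH.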

Referring to FIG. 2, for the PMIN instruction, the byte routing of the path selection circuit 202 shown in FIG. 3 supplies words W1 and W0 on the AL bus and words W3~W1 on the BL bus to the first adder circuit 204. The first adder circuit 204 compares word W0 with each of words W1~W3, compares word W1 with each of words W2~W3, and compares word W2 with word W3, and provides the corresponding comparison bits C<5:0>. The first PMIN circuit 206 receives words W3~W0 and outputs the smallest of them as PMINVAL<15:0>. The first PMIN circuit 206 also indicates the position of that smallest word within the low-order portion of the first bus ABUS as PMINLOC<1:0>. For example, PMINLOC<1:0>=00 if the smallest word is at ABUS<15:0>, and PMINLOC<1:0>=01 if it is at ABUS<31:16>. Likewise, for the PMIN instruction, words W5 and W4 are supplied on the AH bus and words W7~W5 on the BH bus to the second adder circuit 208. The second adder circuit 208 compares word W4 with each of words W5~W7, compares word W5 with each of words W6~W7, and compares word W6 with word W7, and provides the corresponding comparison bits C<11:6>. The second PMIN circuit 210 receives words W7~W4 and outputs the smallest of them as PMINVAL<31:16>. The second PMIN circuit 210 also indicates the position of that smallest word within the high-order portion of the first bus ABUS as PMINLOC<3:2>. For example, PMINLOC<3:2>=00 if the smallest word is at ABUS<79:64>, and PMINLOC<3:2>=01 if it is at ABUS<95:80>. The high/low comparator circuit 212 compares the word in PMINVAL<15:0> with the word in PMINVAL<31:16> to identify which is the minimum value in ABUS<127:0>. The result of this comparison also gives the relative position MINLOC<2:0> of the minimum value.
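The two-stage structure described for PMIN (a minimum over each group of four words, followed by one high/low comparison) can be sketched behaviorally. This Python model is an illustration only: the function names are hypothetical, and breaking ties in favor of the lowest-indexed word is an assumption chosen to match the documented behavior of PHMINPOSUW, not something stated in this passage.

```python
def min_of_four(words):
    """Return (value, 2-bit location) of the smallest of four 16-bit words.

    Ties go to the lowest index (assumption, see lead-in).
    """
    best = 0
    for i in range(1, 4):
        if words[i] < words[best]:
            best = i
    return words[best], best


def horizontal_min(words):
    """words[0..7] hold W0..W7; return (minimum value, 3-bit location)."""
    lo_val, lo_loc = min_of_four(words[0:4])   # role of the first PMIN circuit 206
    hi_val, hi_loc = min_of_four(words[4:8])   # role of the second PMIN circuit 210
    if hi_val < lo_val:                        # role of the high/low comparator 212
        return hi_val, 4 + hi_loc
    return lo_val, lo_loc
```

In this model the most significant bit of the 3-bit location comes from the final high/low comparison, and the two low bits come from whichever half supplied the minimum.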

Referring to FIG. 2, for the PSAD instruction, the path selection circuit 202 (shown in FIG. 3) routes bytes A3~A0 of the first operand from the first bus ABUS to both AL<31:0> and AH<31:0>, supplying AL<31:0> to the first adder circuit 204 and AH<31:0> to the second adder circuit 208. The path selection circuit 202 routes bytes B6~B0 of the second operand from the second bus BBUS to the first adder circuit 204 as BL<55:0>, and routes bytes B10~B4 of the second operand from the second bus BBUS to the second adder circuit 208 as BH<55:0>. For the PSAD instruction, the first adder circuit 204 sums the absolute differences between bytes A0 and B0, A1 and B1, A2 and B2, and A3 and B3, and provides the first 10-bit result PSAD<9:0>. The first adder circuit 204 sums the absolute differences between bytes A0 and B1, A1 and B2, A2 and B3, and A3 and B4, and provides the second 10-bit result PSAD<19:10>. The first adder circuit 204 sums the absolute differences between bytes A0 and B2, A1 and B3, A2 and B4, and A3 and B5, and provides the third 10-bit result PSAD<29:20>. The first adder circuit 204 sums the absolute differences between bytes A0 and B3, A1 and B4, A2 and B5, and A3 and B6, and provides the fourth 10-bit result PSAD<39:30>. Likewise, the second adder circuit 208 sums the absolute differences between bytes A0 and B4, A1 and B5, A2 and B6, and A3 and B7, and provides its first 10-bit result PSAD<49:40>. The second adder circuit 208 sums the absolute differences between bytes A0 and B5, A1 and B6, A2 and B7, and A3 and B8, and provides its second 10-bit result PSAD<59:50>. The second adder circuit 208 sums the absolute differences between bytes A0 and B6, A1 and B7, A2 and B8, and A3 and B9, and provides its third 10-bit result PSAD<69:60>. The second adder circuit 208 sums the absolute differences between bytes A0 and B7, A1 and B8, A2 and B9, and A3 and B10, and provides its fourth 10-bit result PSAD<79:70>.
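The eight sums listed above follow one pattern: sum k adds |A0-Bk|, |A1-B(k+1)|, |A2-B(k+2)|, and |A3-B(k+3)| for k from 0 to 7. A minimal Python sketch of that arithmetic follows; the names `psad` and `pack_psad` are hypothetical, and the sketch models only the results, not the adder hardware.

```python
def psad(a, b):
    """a holds bytes A0..A3, b holds bytes B0..B10.

    Returns the eight sums of absolute differences, lowest field first.
    """
    return [sum(abs(a[i] - b[i + k]) for i in range(4)) for k in range(8)]


def pack_psad(sums):
    """Pack eight sums into one integer modelling PSAD<79:0>, 10 bits each."""
    result = 0
    for k, s in enumerate(sums):
        result |= (s & 0x3FF) << (10 * k)
    return result
```

Each sum is at most 4 x 255 = 1020, which is why ten bits per field are sufficient.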

FIG. 4 shows an embodiment of the first adder circuit 204 of the present invention. The first adder circuit 204 processes the bytes in AL<31:0> and BL<31:0> and provides PSAD<39:0> or C<5:0>. The first adder circuit 204 includes a difference circuit 402, a sum circuit 404, a selection logic circuit 410, and a selection logic circuit 412. The difference circuit 402 has a plurality of independent difference units DIFF1~DIFF8. The sum circuit 404 has independent sum units S1~S4. Each difference unit determines the unsigned differences among four bytes (that is, two pairs of bytes). For each pair, the difference unit inverts one of the bytes and adds it to the other; the difference produced for each pair is an absolute difference. Which bytes a difference unit receives depends on the instruction being executed. The selection logic circuit 410 has a plurality of mutually independent multiplexer circuits, which select particular bytes for the difference unit DIFF3 according to the instruction being executed. As shown, for the PMIN instruction, when the control terminal of the selection logic circuit 410 is logic 1 (that is, the control code INSTR=1), the selection logic circuit 410 selects bytes BL<47:40>, BL<31:24>, BL<39:32>, and BL<23:16> and outputs them to the difference unit DIFF3. These correspond to bytes A7, A5, A6, and A4, respectively. For the PSAD instruction, when the control terminal of the selection logic circuit 410 is logic 0 (that is, the control code INSTR=0), the selection logic circuit 410 selects bytes BL<23:16>, AL<15:8>, BL<15:8>, and AL<7:0> and outputs them to the difference unit DIFF3. These correspond to bytes B2, A1, B1, and A0, respectively. Likewise, for the PMIN instruction, when the control terminal of the selection logic circuit 412 is logic 1, the selection logic circuit 412 selects bytes AL<15:8> and AL<7:0>, which correspond to bytes A1 and A0, and outputs them to the difference unit DIFF8. For the PSAD instruction, when the control terminal of the selection logic circuit 412 is logic 0, the selection logic circuit 412 selects bytes AL<23:16> and AL<15:8>, which correspond to bytes A2 and A1, and outputs them to the difference unit DIFF8.

For the PSAD instruction, the first inverting input of the difference unit DIFF1 receives byte BL<15:8>, which corresponds to byte B1. The second, non-inverting input of DIFF1 receives byte AL<15:8>, which corresponds to byte A1. DIFF1 determines the absolute difference between bytes A1 and B1 (|A1-B1|) and outputs it at its first output as result AD1. Similarly, the third, inverting input of DIFF1 receives byte BL<7:0>, which corresponds to byte B0, and the fourth, non-inverting input of DIFF1 receives byte AL<7:0>, which corresponds to byte A0. DIFF1 determines the absolute difference between bytes A0 and B0 (|A0-B0|) and outputs it at its second output as result AD2. In the same way, the difference unit DIFF2 determines the absolute difference between bytes A3 and B3 (|A3-B3|) and outputs it at its first output as AD3, and determines the absolute difference between bytes A2 and B2 (|A2-B2|) and outputs it at its second output as AD4. In summary, when the control code INSTR indicates the PSAD instruction, the difference circuit 402 determines the absolute differences between byte A0 and each of bytes B0~B3, between byte A1 and each of bytes B1~B4, between byte A2 and each of bytes B2~B5, and between byte A3 and each of bytes B3~B6.

The sum unit S1 adds the four bytes AD1~AD4 and outputs the result as the 10-bit value PSAD<9:0>. The result of S1 corresponds to (|A0-B0|)+(|A1-B1|)+(|A2-B2|)+(|A3-B3|). For the PSAD instruction, the difference unit DIFF3 determines the absolute difference between A0 and B1 as AD6 and the absolute difference between A1 and B2 as AD5. The difference unit DIFF4 determines the absolute difference between A2 and B3 as AD8 and the absolute difference between A3 and B4 as AD7. The sum unit S2 adds the four bytes AD5~AD8 and outputs the result as the 10-bit value PSAD<19:10>. The result of S2 corresponds to (|A0-B1|)+(|A1-B2|)+(|A2-B3|)+(|A3-B4|). Similarly, for the PSAD instruction, the sum unit S3 adds the four bytes AD9~AD12 and outputs the result as the 10-bit value PSAD<29:20>. The result of S3 corresponds to (|A0-B2|)+(|A1-B3|)+(|A2-B4|)+(|A3-B5|). Finally, for the PSAD instruction, the sum unit S4 adds the four bytes AD13~AD16 and outputs the result as the 10-bit value PSAD<39:30>. The result of S4 corresponds to (|A0-B3|)+(|A1-B4|)+(|A2-B5|)+(|A3-B6|). Although FIG. 4 shows only an embodiment of the first adder circuit 204, the second adder circuit 208 is substantially similar. It determines the absolute differences between byte A0 and each of bytes B4~B7, between byte A1 and each of bytes B5~B8, between byte A2 and each of bytes B6~B9, and between byte A3 and each of bytes B7~B10. The second adder circuit 208 also sums these absolute differences in groups of four and provides four sums, which make up PSAD<79:40>.

In summary, for the PSAD instruction, the difference circuit 402 determines the absolute differences between the bytes of the first set of digital values (A3:A0) and the bytes of the second set of digital values (B10:B0). After the first group B3:B0 has been processed, the comparison is repeated starting one byte higher each time, that is, with B4:B1, B5:B2, B6:B3, and so on. The eight groups therefore produce the absolute differences AD1~AD4, AD5~AD8, ..., AD29~AD32. The sum circuit 404 adds the absolute differences of each group and provides the corresponding sums of absolute differences as PSAD<79:0>.

When the control code INSTR indicates the PMIN instruction, the difference circuit 402 operates in substantially the same way, except that different bytes are routed to it. The sums of AD1~AD16 and PSAD<39:0> can be ignored; only the comparison bits C<5:0> are needed. The difference unit DIFF1 compares, or otherwise determines the absolute difference between, A1 and A3, and likewise A0 and A2. The first byte A3 is the high byte of word W1, and the second byte A1 is the high byte of word W0. The third byte A2 is the low byte of word W1, and the fourth byte A0 is the low byte of word W0. In this embodiment, DIFF1 thus compares the high bytes and the low bytes of words W1 and W0 separately and determines the comparison bit C<0>, which indicates which word (W1 or W0) is smaller. Similarly, the difference unit DIFF2 compares the high bytes A5 and A3 and the low bytes A4 and A2 of words W2 and W1 to determine which word (W2 or W1) is smaller, and provides the comparison bit C<3>. The difference unit DIFF3 compares the high bytes A7 and A5 and the low bytes A6 and A4 of words W3 and W2 to determine which word (W3 or W2) is smaller, and provides the comparison bit C<5>. For the PMIN instruction, the difference unit DIFF4 can be ignored. The difference unit DIFF5 compares the high bytes A5 and A1 and the low bytes A4 and A0 of words W2 and W0 to determine which word (W2 or W0) is smaller, and provides the comparison bit C<1>. The difference unit DIFF6 compares the high bytes A7 and A3 and the low bytes A6 and A2 of words W3 and W1 to determine which word (W3 or W1) is smaller, and provides the comparison bit C<4>. For PMIN, the difference unit DIFF7 can be ignored. The difference unit DIFF8 compares the high bytes A7 and A1 and the low bytes A6 and A0 of words W3 and W0 to determine which word (W3 or W0) is smaller, and provides the comparison bit C<2>.

In summary, for the PMIN instruction, the comparison bit C<0> from the difference circuit 402 of the first adder circuit 204 indicates which of words W0 and W1 is smaller. C<1> indicates which of words W0 and W2 is smaller. C<2> indicates which of words W0 and W3 is smaller. C<3> indicates which of words W1 and W2 is smaller. C<4> indicates which of words W1 and W3 is smaller. C<5> indicates which of words W2 and W3 is smaller. Although FIG. 4 does not show the detailed circuit of the second adder circuit 208, the second adder circuit 208 has the same difference circuit as the first adder circuit 204. It performs the same comparisons on words W4~W7 of the high-order adder circuit 207 and provides the corresponding comparison bits C<11:6>. Thus, for PMIN, C<6> indicates which of words W4 and W5 is smaller. C<7> indicates which of words W4 and W6 is smaller. C<8> indicates which of words W4 and W7 is smaller. C<9> indicates which of words W5 and W6 is smaller. C<10> indicates which of words W5 and W7 is smaller. C<11> indicates which of words W6 and W7 is smaller. The first PMIN circuit 206 uses the comparison bits C<5:0> to identify the smallest of words W0~W3. The second PMIN circuit 210 uses the comparison bits C<11:6> to identify the smallest of words W4~W7.
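This passage does not spell out how a PMIN circuit turns six comparison bits into a location, so the following Python sketch shows one possible decoding rather than the circuit of the patent. It assumes that each comparison bit is 1 when the lower-indexed word of the pair is strictly greater (the polarity implied by the description of FIG. 5), and that ties resolve to the lowest index. The function names are hypothetical.

```python
def compare_bits(w):
    """w holds W0..W3. Return [C0..C5] in the pair order given in the text:
    (W0,W1), (W0,W2), (W0,W3), (W1,W2), (W1,W3), (W2,W3).

    A bit is 1 when the lower-indexed word is strictly greater (assumed polarity).
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    return [1 if w[i] > w[j] else 0 for i, j in pairs]


def decode_min(c):
    """Return the 2-bit location of the minimum word from [C0..C5]."""
    c0, c1, c2, c3, c4, c5 = c
    if not (c0 or c1 or c2):        # W0 is not greater than any other word
        return 0
    if c0 and not (c3 or c4):       # W1 < W0 and W1 <= W2, W3
        return 1
    if c1 and c3 and not c5:        # W2 < W0, W2 < W1 and W2 <= W3
        return 2
    return 3                        # otherwise W3 is strictly smallest
```

For example, `decode_min(compare_bits([5, 3, 9, 3]))` evaluates to 1, the lower index of the two equal minima.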

FIG. 5 shows an embodiment of the difference unit DIFF1 of the present invention. As shown, DIFF1 has an adder pair consisting of a high (or first) adder 502 and a low (or second) adder 504. Each of adders 502 and 504 has an inverting input B and a non-inverting input A, so each can perform a subtraction to determine the difference between the signals at inputs A and B. For the PSAD instruction, the inverting input B of adder 502 receives byte B1. For the PMIN instruction, the inverting input B of adder 502 receives byte A3. For both PSAD and PMIN, the non-inverting input A of adder 502 receives byte A1. Adder 502 inverts each bit of the byte received at the inverting input B to obtain the inverted value ~B, where ~ denotes bitwise inversion. Adder 502 then performs an unsigned addition of ~B and the byte received at input A (that is, A+~B, representing A-B) and outputs the result at the SUM output. Adder 502 has a carry-out (CO) output that provides a carry output signal CO1, which is logic 1 when the addition overflows. Adder 502 also increments the sum and outputs the incremented result at the INCSUM output. Adder 502 further has a propagate output CP. The propagate output signal CP1 is logic 1 if the adder would pass a carry input (not provided) through to its carry output. Although FIG. 5 has no carry input, CP1 is logic 1 whenever adder 502 would propagate a carry input if one were received. In one embodiment, each bit of the byte at input A is ORed with the corresponding bit of the (inverted) byte at input B, giving eight results, and these eight results are ANDed together; the outcome determines the logic level of CP1. The SUM output is coupled to the input of inverter 508, which comprises a separate inverter for each bit of the byte. The output of inverter 508 is coupled to input 0 of multiplexer 506. The INCSUM output is coupled to input 1 of multiplexer 506. The select input of multiplexer 506 receives the carry output signal CO1. The output signal AD1 of multiplexer 506 is the absolute difference between the bytes received at inputs A and B of adder 502.

Similarly, for the PSAD instruction, the inverting input B of adder 504 receives byte B0. For the PMIN instruction, the inverting input B of adder 504 receives byte A2. For both PSAD and PMIN, input A of adder 504 receives byte A0. Adder 504 inverts each bit of the byte received at the inverting input B to produce the inverted value ~B, performs an unsigned addition of ~B and the byte received at input A, and drives the outputs INCSUM, SUM, and CO. Because these outputs of adder 504 behave like those of adder 502, they are not described again. The CO output of adder 504 provides a carry output signal CO2. If adder 504 has a propagate output CP, that output may be left unused or omitted. The INCSUM output of adder 504 is coupled to input 1 of multiplexer 510, which provides AD2. The SUM output of adder 504 is coupled to the input of inverter 512, and the output of inverter 512 is coupled to input 0 of multiplexer 510. The select input of multiplexer 510 receives the carry output signal CO2. The OR gate 516 generates the comparison bit C<0>. One of the two inputs of OR gate 516 receives the carry output signal CO1; the other is coupled to the output of AND gate 514. One input of AND gate 514 is coupled to the propagate output CP of adder 502, and the other input of AND gate 514 receives the carry output signal CO2 from the CO output of adder 504.
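The gate arrangement just described amounts to C<0> = CO1 OR (CP1 AND CO2), which builds a 16-bit "greater than" comparison out of two byte-wide adders. The Python sketch below models that logic; the function names are hypothetical, and the propagate signal is computed as the AND of the bit-wise OR of A with the inverted B, which is an interpretation consistent with the worked examples in the description.

```python
def byte_adder_flags(a, b):
    """Return (CO, CP) for one 8-bit adder computing a + ~b with no carry-in."""
    nb = ~b & 0xFF                 # inverted byte at the inverting input B
    co = (a + nb) >> 8             # carry out: 1 exactly when a > b
    cp = int((a | nb) == 0xFF)     # propagate: every bit-wise OR result is 1
    return co, cp


def compare_bit(w_a, w_b):
    """Model C = CO1 OR (CP1 AND CO2) for 16-bit words on inputs A and B."""
    co1, cp1 = byte_adder_flags(w_a >> 8, w_b >> 8)      # high adder (502)
    co2, _ = byte_adder_flags(w_a & 0xFF, w_b & 0xFF)    # low adder (504)
    return co1 | (cp1 & co2)
```

With the PMIN routing for DIFF1 (W0 on the A inputs, W1 on the B inputs), this model returns 1 when W0 is strictly greater than W1, so a 1 marks W1 as the smaller word.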

For adders 502 and 504, if the byte at input A is greater than the byte at input B, the CO output is logic 1 and the INCSUM output gives the absolute difference between inputs A and B, that is, |A-B|. When adder 502 sets the carry output signal CO1 to logic 1, the comparison bit C<0> output by OR gate 516 is 1. When CO1 is logic 1, the propagate output signal CP1 of adder 502 may be logic 0 or 1 depending on the bit values at inputs A and B; because OR gate 516 already sets C<0> to logic 1 in that case, the value of CP1 does not matter for C<0>. For example, if input A receives binary 00000100 (decimal 4) and input B receives binary 00000010 (decimal 2), the difference A-B is 00000010 (decimal 2). The value at input B is first inverted, giving ~B=11111101. The unsigned addition of the value at input A and ~B gives A+~B=00000001 with the carry output signal CO1 at logic 1 (and CP1=0). The sum (the value at the SUM output) is therefore not the correct value. The output of the inverter (508 or 512), ~SUM=11111110, is not the correct value either. The value at the INCSUM output is 00000001+1=00000010, which is the correct value. Therefore, for adders 502 and 504, when the byte at input A is greater than the byte at input B, CO=1 and the corresponding multiplexer (506 or 510) selects the value at its input 1 (INCSUM) as the correct output (the absolute difference between inputs A and B).

If the value at input A is less than or equal to the value at input B, CO=0 and the corresponding multiplexer selects the output ~SUM of the corresponding inverter (508 or 512) as the correct output. When A equals B, the correct output is 00000000. Both INCSUM and ~SUM hold the correct result in this case, but because CO=0 the multiplexer selects ~SUM. When A equals B, the propagate output CP is 1. For example, when inputs A and B are both 00001111, A plus the inverted value ~B is 00001111+11110000=11111111=SUM, and CP=1. The inverse of SUM (~SUM) is 00000000, which is correct. INCSUM is 1+11111111, which wraps to 00000000 and is also correct (although the multiplexer does not select it). When A is less than B, CO=0 and the multiplexer selects ~SUM as the correct value. For example, if A is 00000010 and B is 00000100, then |A-B|=00000010. In this case A+~B=00000010+11111011=11111101=SUM. Because CO=0, ~SUM=00000010 is selected as the correct value. Here INCSUM equals 1+11111101=11111110, which is not correct.
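The three worked cases above (A greater than, equal to, and less than B) can be checked with a small software model. This is only an illustrative sketch, not the patented circuit: the function name `abs_diff_byte` is hypothetical, and the propagate output CP is modelled as the AND of the bitwise OR of A and ~B, which is an assumption about how CP is formed.

```python
MASK8 = 0xFF  # the adders in this sketch are 8 bits wide


def abs_diff_byte(a: int, b: int) -> tuple:
    """Return (|a - b|, CO, CP) for two unsigned bytes (hypothetical helper)."""
    not_b = ~b & MASK8                   # input B is inverted before the addition
    total = a + not_b                    # unsigned A + ~B with no carry-in
    s = total & MASK8                    # SUM output
    co = total >> 8                      # carry output CO: 1 exactly when a > b
    # Propagate output CP, modelled as AND over the bits of (A OR ~B).
    # This is an assumption; with CO = 0 it is 1 exactly when a == b.
    cp = 1 if (a | not_b) == MASK8 else 0
    incsum = (s + 1) & MASK8             # INCSUM output (SUM + 1, wrapping)
    not_sum = ~s & MASK8                 # inverter output ~SUM
    result = incsum if co else not_sum   # multiplexer controlled by CO
    return result, co, cp


print(abs_diff_byte(0b00000100, 0b00000010))  # A > B
print(abs_diff_byte(0b00001111, 0b00001111))  # A == B
print(abs_diff_byte(0b00000010, 0b00000100))  # A < B
```

Selecting INCSUM when CO=1 and ~SUM when CO=0 yields |A-B| for every pair of bytes, which is why a single adder plus an inverter and a multiplexer is enough for one absolute difference.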

When the control code INSTR indicates the PSAD instruction, the adder 502 produces the absolute difference AD1=|A1-B1| and the adder 504 produces the absolute difference AD2=|A0-B0| according to the PSAD operation, and the comparison bit C<0> may be ignored. When INSTR indicates the PMIN instruction, if A1>A3, the high byte of word W0 is greater than the high byte of word W1, so W0>W1. In this case CO1=1, so C<0>=1. When A3>A1, both CO1 and CP1 of the adder 502 are logic 0, so C<0>=0, indicating W0<W1. If A1=A3, the adder 502 outputs CO1=0 and CP1=1, and the comparison of the low bytes by the adder 504 decides the relative values of W0 and W1. With the high bytes equal (CP1=1), if A0>A2, the low byte of W0 is greater than the low byte of W1, so W0>W1; here CP1 and CO2 are both logic 1, so C<0>=1. With the high bytes equal (CP1=1), if A0 is less than or equal to A2, CO2 is logic 0, so C<0>=0; here W0 is less than or equal to W1, and W0 is taken as the minimum of the pair even when the two words are equal. The other difference units (DIFF2 to DIFF8) have the same structure and operation and produce AD3 to AD16. The difference units DIFF4 and DIFF7 may be simplified: in particular, the logic that receives CO and CP to determine the corresponding comparison bit C<x> is not needed for them. If desired, the propagate logic used by each individual adder may also be omitted.
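The word comparison described above reduces to C<0> = CO1 OR (CP1 AND CO2). The following sketch models it in software under stated assumptions: the helper names `byte_compare` and `word_greater` are hypothetical, and CP is again modelled as the AND of the bitwise OR of A and ~B.

```python
def byte_compare(a: int, b: int) -> tuple:
    """Return (CO, CP) of an 8-bit adder computing A + ~B (hypothetical helper)."""
    not_b = ~b & 0xFF
    co = (a + not_b) >> 8                 # 1 exactly when a > b
    cp = 1 if (a | not_b) == 0xFF else 0  # assumed propagate form; 1 when a == b
    return co, cp


def word_greater(w0: int, w1: int) -> int:
    """Return comparison bit C<0>: 1 when the 16-bit word w0 > w1, else 0."""
    a1, a0 = w0 >> 8, w0 & 0xFF           # high and low bytes of W0
    a3, a2 = w1 >> 8, w1 & 0xFF           # high and low bytes of W1
    co1, cp1 = byte_compare(a1, a3)       # high adder (adder 502 in the text)
    co2, _ = byte_compare(a0, a2)         # low adder (adder 504 in the text)
    return co1 | (cp1 & co2)              # AND gate 514 feeding OR gate 516


print(word_greater(0x0102, 0x0101))  # high bytes equal, low byte decides
print(word_greater(0x00FF, 0x0100))  # high byte of W1 is larger
```

When CO1 is 1 the OR gate already outputs 1, so whatever value CP1 takes in that case has no effect on the result, as the text notes.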

Referring to Figures 4 and 5, the same adder circuits are used for both the PMIN and PSAD instructions; in particular, each adder pair in each difference unit serves both instructions. For the PSAD instruction, each individual adder produces the absolute difference of its input byte pair. For the PMIN instruction, the sum of absolute differences computed for PSAD is not needed, but each adder pair performs a byte-wise comparison to determine which word of a pair is smaller. The path selection circuit routes the operands so that the adders provided for the PSAD instruction are reused as fully as possible for the PMIN instruction. As described above, for the PMIN instruction the adders are grouped into adder pairs. The high portions (e.g., the high bytes) of a pair of digital codes (e.g., two words) are provided to the corresponding inputs of the first adder of a pair, and the low portions (e.g., the low bytes) are provided to the corresponding inputs of the second adder. Both adders are modified to provide a carry output, and the high adder of each pair is modified to provide a propagate output. The carry outputs and the propagate output of each adder pair determine the smaller value of each pair of digital codes. For the PSAD instruction, the adder results give the absolute differences between a first operand of 4 bytes and a second operand of 11 bytes; for the PMIN instruction, the adder results identify the minimum of a set of 8 words.

Figure 6 shows one possible embodiment of the summation unit S1 of the present invention. The summation unit S1 has adders 602, 604, and 606 and provides a 10-bit result PSAD<9:0>. The adders 602 and 604 are 8-bit adders and the adder 606 is a 9-bit adder. The adders 602 and 604 are similar to the adder 502, except that they have no inverting input and do not need the INCSUM circuit or the propagate output circuit, both of which may be omitted. The adder 602 performs an unsigned addition of the binary values AD1 and AD2 and provides a first sum SUM1 (=AD1+AD2) and a corresponding carry output C1. The adder 604 performs an unsigned addition of AD3 and AD4 and provides a second sum SUM2 (=AD3+AD4) and a corresponding carry output C2. The carry output C1 serves as the most significant bit (MSB) above SUM1, and C2 serves as the MSB above SUM2. The first input of the adder 606 receives the 9-bit concatenation of C1 and SUM1, and its second input receives the 9-bit concatenation of C2 and SUM2. The adder 606 performs an unsigned addition of its two 9-bit inputs, (C1,SUM1)+(C2,SUM2), and provides the 10-bit result PSAD<9:0>. The lower 9 bits PSAD<8:0> are the sum bits of this unsigned addition, and the most significant bit PSAD<9> is its carry output. In this embodiment, the summation unit S1 sums the first group of absolute differences (AD1 to AD4) to give the first sum of absolute differences PSAD<9:0>. The other summation units S2 to S4 have the same structure and sum the groups AD5 to AD8, AD9 to AD12, and AD13 to AD16 to provide the sums of absolute differences PSAD<19:10>, PSAD<29:20>, and PSAD<39:30>, respectively.
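The two-level addition in the summation unit can be sketched as follows. The function name `sum_unit` is hypothetical, and the model only tracks bit widths; it makes no claim about the actual adder implementation.

```python
def sum_unit(ad1: int, ad2: int, ad3: int, ad4: int) -> int:
    """Return the 10-bit sum PSAD<9:0> of four 8-bit absolute differences."""
    t1 = ad1 + ad2                       # adder 602 (8-bit)
    sum1, c1 = t1 & 0xFF, t1 >> 8        # SUM1 and its carry output C1
    t2 = ad3 + ad4                       # adder 604 (8-bit)
    sum2, c2 = t2 & 0xFF, t2 >> 8        # SUM2 and its carry output C2
    in1 = (c1 << 8) | sum1               # 9-bit input (C1, SUM1) of adder 606
    in2 = (c2 << 8) | sum2               # 9-bit input (C2, SUM2) of adder 606
    return (in1 + in2) & 0x3FF           # 9 sum bits plus carry: PSAD<9:0>


print(sum_unit(255, 255, 255, 255))  # largest possible inputs
print(sum_unit(1, 2, 3, 4))
```

Four bytes sum to at most 4 x 255 = 1020, which fits in 10 bits, so the 10-bit result never overflows.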

Figure 7 shows an embodiment of the PMIN circuit 206 of the present invention. The PMIN circuit 206 has a decode logic circuit 701, a selection circuit 728, and a location logic circuit 703. The decode logic circuit 701 has inverters 702, 704, 706, 712, 714, 720, 710, 718, and 724 and AND gates 708, 716, 722, and 726, each AND gate having three inputs. The location logic circuit 703 has OR gates 730 and 732, each with two inputs. The comparison bits C<2:0> are provided to the inverters 702, 704, and 706, respectively. The AND gate 708 receives the outputs of the inverters 702, 704, and 706 and outputs the signal W0_MIN, which is logic 1 when the word W0 is the minimum word. The comparison bits C<3> and C<4> are provided to the inverters 712 and 714, respectively. The three inputs of the AND gate 716 receive the outputs of the inverters 712 and 714 and the comparison bit C<0>; the AND gate 716 outputs the signal W1_MIN, which is logic 1 when the word W1 is the minimum word. The inverter 720 receives C<5>. The AND gate 722 receives the output of the inverter 720, C<1>, and C<3>, and outputs the signal W2_MIN, which is logic 1 when the word W2 is the minimum word. The inverters 710, 718, and 724 receive W0_MIN, W1_MIN, and W2_MIN and produce ~W0_MIN, ~W1_MIN, and ~W2_MIN, respectively, each indicating that the corresponding word is not the minimum. The AND gate 726 receives ~W0_MIN, ~W1_MIN, and ~W2_MIN and outputs the signal W3_MIN, which is logic 1 when the word W3 is the minimum word.
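The decode logic can be modelled directly from the gate connections listed above. In this sketch the function names are hypothetical, and the assignment of comparison bits to word pairs (C<0>: W0>W1, C<1>: W0>W2, C<2>: W0>W3, C<3>: W1>W2, C<4>: W1>W3, C<5>: W2>W3) is an assumption inferred from the gate wiring; only C<0> is defined explicitly in the text.

```python
def compare_bits(words):
    """Return [C<0>, ..., C<5>] for four words (pair assignment is assumed)."""
    w0, w1, w2, w3 = words
    pairs = [(w0, w1), (w0, w2), (w0, w3), (w1, w2), (w1, w3), (w2, w3)]
    return [1 if x > y else 0 for x, y in pairs]


def decode_min(c):
    """Return (W0_MIN, W1_MIN, W2_MIN, W3_MIN) from six comparison bits."""
    inv = lambda bit: 1 - bit                 # models an inverter
    w0_min = inv(c[0]) & inv(c[1]) & inv(c[2])        # AND gate 708
    w1_min = inv(c[3]) & inv(c[4]) & c[0]             # AND gate 716
    w2_min = inv(c[5]) & c[1] & c[3]                  # AND gate 722
    w3_min = inv(w0_min) & inv(w1_min) & inv(w2_min)  # AND gate 726
    return w0_min, w1_min, w2_min, w3_min


print(decode_min(compare_bits([7, 3, 9, 3])))  # W1 is the first minimum
```

Under that assumed bit assignment, exactly one output is 1 for any four words, and on ties it marks the lowest-numbered word holding the minimum value.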

AL<15:0>, BL<15:0>, BL<31:16>, and BL<47:32> carry the words W0 to W3, respectively. The selection circuit 728 receives AL<15:0>, BL<15:0>, BL<31:16>, BL<47:32>, and the signals W0_MIN to W3_MIN. At any given time only one of the signals W0_MIN to W3_MIN is logic 1, indicating that the corresponding word is the minimum for that cycle. The selection circuit 728 therefore selects that one of the words W0 to W3 and outputs it as PMINVAL<15:0>. The OR gate 730 receives the signals W3_MIN and W2_MIN and outputs the location bit PMINLOC<1>. The OR gate 732 receives the signals W3_MIN and W1_MIN and outputs the location bit PMINLOC<0>. In this embodiment, PMINVAL<15:0> gives the minimum of the words W0 to W3, and PMINLOC<1:0> gives the location of that minimum among the words in the lower half of the first bus ABUS received by the low-order adder circuit 203. The PMIN circuit 210 has a structure similar to that of the PMIN circuit 206 and provides PMINVAL<31:16>, the minimum of the words W4 to W7, and PMINLOC<3:2>, the location of that minimum among the words in the upper half of the first bus ABUS received by the high-order adder circuit 207.
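The selection and location logic can be sketched as a one-hot multiplexer plus two OR gates. The function name `select_and_locate` is hypothetical, and the sketch assumes its `min_flags` argument is one-hot, as the text states for W0_MIN to W3_MIN.

```python
def select_and_locate(words, min_flags):
    """Return (PMINVAL, PMINLOC) from four words and one-hot flags W0_MIN..W3_MIN."""
    w0_min, w1_min, w2_min, w3_min = min_flags
    pminval = 0
    for word, flag in zip(words, min_flags):
        if flag:                       # one-hot select: only one word passes
            pminval |= word
    loc1 = w3_min | w2_min             # OR gate 730: location bit 1
    loc0 = w3_min | w1_min             # OR gate 732: location bit 0
    return pminval, (loc1 << 1) | loc0


print(select_and_locate([9, 8, 7, 6], (0, 0, 1, 0)))  # W2 flagged as minimum
```

The two OR gates simply encode the one-hot flags as a 2-bit index: W0 gives 00, W1 gives 01, W2 gives 10, and W3 gives 11.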

Figure 8 shows an embodiment of the high-order/low-order comparison circuit 212 of the present invention. The inverting input of the 16-bit comparison circuit 802 receives PMINVAL<31:16> from the high-order adder circuit 207, and its non-inverting input receives PMINVAL<15:0> from the low-order adder circuit 203. The comparison circuit 802 compares the high-order and low-order minimum words and provides its carry output CO as the signal MINLOC<2>; this carry output behaves like the CO output of the adders described above. If the word PMINVAL<15:0> is greater than the word PMINVAL<31:16>, MINLOC<2> is logic 1; otherwise MINLOC<2> is logic 0. MINLOC<2> is the most significant bit (MSB) of the location value MINLOC<2:0>. When MINLOC<2> is logic 1, the minimum lies among the words in the upper half of the first bus ABUS; conversely, when MINLOC<2> is logic 0, the minimum lies among the words in the lower half of the first bus ABUS. MINLOC<2> drives the select inputs of the multiplexers 804, 806, and 808. The multiplexer 804 selects the byte PMINVAL<23:16> or PMINVAL<7:0>, which are the low bytes of the minimum words found in the high-order and low-order portions, as the low byte MINVAL<7:0>. The multiplexer 806 selects the byte PMINVAL<31:24> or PMINVAL<15:8>, which are the high bytes of those minimum words, as the high byte MINVAL<15:8>. The multiplexer 808 selects the location bits PMINLOC<3:2> or PMINLOC<1:0>, which are the least significant location bits for the high-order or low-order portion, as MINLOC<1:0>. As noted above, the comparison circuit 802 determines the most significant bit of MINLOC, namely MINLOC<2>. MINLOC<2:0> therefore gives the location of the minimum word on the first bus ABUS.
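The final stage can be sketched as one 16-bit comparison followed by three multiplexers. The function name `combine_halves` and its argument layout are hypothetical; the two location arguments stand for the 2-bit fields PMINLOC<1:0> and PMINLOC<3:2>.

```python
def combine_halves(pminval_lo, pminloc_lo, pminval_hi, pminloc_hi):
    """Return (MINVAL<15:0>, MINLOC<2:0>) from the two half results."""
    # Comparison circuit 802: carry out of LO + ~HI is 1 exactly when LO > HI.
    minloc2 = (pminval_lo + (~pminval_hi & 0xFFFF)) >> 16
    src = pminval_hi if minloc2 else pminval_lo
    low_byte = src & 0xFF                          # multiplexer 804
    high_byte = (src >> 8) & 0xFF                  # multiplexer 806
    loc10 = pminloc_hi if minloc2 else pminloc_lo  # multiplexer 808
    minval = (high_byte << 8) | low_byte
    minloc = (minloc2 << 2) | loc10
    return minval, minloc


print(combine_halves(5, 1, 3, 2))  # upper half holds the smaller word
print(combine_halves(3, 1, 3, 2))  # tie: the lower half is kept
```

Because MINLOC<2> is 1 only when the low-order minimum is strictly greater, a tie between the two halves resolves to the lower half in this model.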

Although a number of preferred embodiments of the present invention have been described in detail, other variations are contemplated. For example, all of the circuits described above may be implemented with any logic devices or logic circuitry. The functions of these logic circuits may also be implemented in software or firmware within an integrated device. The circuits may include inverting devices to provide positive or negative logic for any signal. The disclosed circuits operate on digital codes or on binary bytes or words, but the number of bits of a digital code or binary code is not limited. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone of ordinary skill in the art may make modifications and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.

100 ... microprocessor

102 ... scheduler

104 ... complex integer execution unit

106 ... simple integer execution unit

108 ... floating-point execution unit

110 ... media unit

114, 802 ... comparison circuit

112 ... other units

202 ... path selection circuit

203 ... low-order adder circuit

204 ... first adder circuit

206 ... first PMIN circuit

207 ... high-order adder circuit

208 ... second adder circuit

210 ... second PMIN circuit

212 ... high-order/low-order comparison circuit

302 ... buffer circuit

304, 306, 308, 506, 510, 804, 806, 808 ... multiplexers

402 ... difference circuit

404 ... summation circuit

410, 412 ... selection logic circuits

502, 504, 602, 604, 606 ... adders

514, 708, 716, 722, 726 ... AND gates

516 ... OR gate

508, 512, 702, 704, 706, 712, 714, 720, 710, 718, 724 ... inverters

728 ... selection circuit

DIFF1~DIFF8 ... difference units

S1~S4 ... summation units

Figure 1 shows an embodiment of the microprocessor 100.

Figure 2 shows an embodiment of the comparison circuit.

Figure 3 shows an embodiment of the path selection circuit of the present invention.

Figure 4 shows an embodiment of the first adder circuit of the present invention.

Figure 5 shows an embodiment of the difference unit DIFF1 of the present invention.

Figure 6 shows an embodiment of the summation unit S1 of the present invention.

Figure 7 shows an embodiment of the PMIN circuit 206 of the present invention.

Figure 8 shows an embodiment of the high-order/low-order comparison circuit 212 of the present invention.

Claims

A judging system for finding a minimum binary code among at least two binary codes, the judging system comprising: a first adder that adds a plurality of first bits and a plurality of second bits to provide a first carry output and a first propagate output, wherein the first bits are the high bits of a first binary code and the second bits are the inverted high bits of a second binary code; a second adder that adds a plurality of third bits and a plurality of fourth bits to provide a second carry output, wherein the third bits are the low bits of the first binary code and the fourth bits are the inverted low bits of the second binary code; and a comparison circuit that determines, based on the first carry output, the second carry output, and the first propagate output, whether the first binary code is greater than the second binary code.

The judging system of claim 1, wherein the first binary code and the second binary code are unsigned.

The judging system of claim 1, wherein the first adder and the second adder perform an unsigned binary addition.

The judging system of claim 1, wherein the first binary code and the second binary code are a first unsigned binary word and a second unsigned binary word, respectively, the first adder compares the high bytes of the first and second unsigned binary words, and the second adder compares the low bytes of the first and second unsigned binary words.

The judging system of claim 1, wherein the first propagate output indicates whether the first adder propagates a carry input.

The judging system of claim 1, wherein the first bits and the second bits are ORed pairwise, the results of the OR operations are ANDed together, and the first propagate output is determined from the OR results and the AND result.

The judging system of claim 1, wherein the comparison circuit comprises: an OR gate having a first input, a second input, and a first output, the first input receiving the first carry output, and the level of the first output indicating whether the first binary code is greater than the second binary code; and an AND gate having a third input, a fourth input, and a second output, the third input receiving the first propagate output, the fourth input receiving the second carry output, and the second output being coupled to the second input.

A judging system for quickly finding a horizontal minimum among a plurality of digital codes, the judging system comprising: a plurality of difference circuits, each comparing a first digital code with a second digital code and each comprising: a high adder that compares a high portion of the first digital code with a high portion of the second digital code to provide a first carry output and a propagate output; and a low adder that compares a low portion of the first digital code with a low portion of the second digital code to provide a second carry output; a comparison circuit that compares the first carry outputs and the second carry outputs and compares the propagate outputs to identify a minimum digital code among the digital codes; and a path selection circuit that routes each of the digital codes to at least one of the high adders and low adders so that each digital code is compared with the other digital codes.

The judging system of claim 8, wherein the comparison circuit comprises: a first comparison circuit that compares the first carry output, the second carry output, and the propagate output of each difference circuit to provide a plurality of comparison bits; and a second comparison circuit that identifies the minimum digital code among the digital codes based on the comparison bits.

The judging system of claim 9, wherein the first comparison circuit of each difference circuit comprises an AND gate and an OR gate, the AND gate combining the propagate output with the second carry output to generate a first bit, and the OR gate combining the first bit with the first carry output to provide one of the comparison bits.

The judging system of claim 9, wherein the second comparison circuit decodes the comparison bits to provide a plurality of minimum bits, each minimum bit indicating whether a corresponding one of the digital codes is the minimum.

The judging system of claim 8, wherein the digital codes comprise unsigned binary words, the high portion of each digital code is the high byte of the corresponding word, and the low portion of each digital code is the low byte of the corresponding word.

The judging system of claim 8, wherein the high adder and the low adder of each difference circuit perform unsigned binary addition.

The judging system of claim 8, wherein each propagate output indicates whether the high adder of the corresponding difference circuit propagates a carry input.

The judging system of claim 8, wherein the digital codes are stored in a memory, the judging system further comprising a location circuit that determines the memory location of the minimum digital code among the digital codes.

The judging system of claim 9, further comprising: a memory for storing the digital codes, wherein the second comparison circuit comprises a decoding circuit that decodes the comparison bits to provide a plurality of minimum bits; a selection circuit that uses the minimum bits to select one of the digital codes stored in the memory as the minimum digital code; and a location circuit that provides a location value based on the minimum bits, the location value indicating the location of the minimum digital code in the memory.

A judging method for finding a minimum among a plurality of digital codes, the judging method comprising the steps of: comparing the high bits of a first digital code with the high bits of a second digital code to provide a first carry output and a propagate output; comparing the low bits of the first digital code with the low bits of the second digital code to provide a second carry output; and determining, based on the first and second carry outputs and the propagate output, which of the first digital code and the second digital code is the smaller code.

The judging method of claim 17, further comprising: using each adder pair of a plurality of adder pairs to compare the high bits of a first digital code with the high bits of a second digital code, to compare the low bits of the first digital code with the low bits of the second digital code, and to determine which of the first digital code and the second digital code is the smaller code; routing each of the digital codes to at least one of the adder pairs so that each digital code is compared with the other digital codes; and identifying, based on the comparison results, one of the digital codes as the minimum digital code.

The judging method of claim 18, wherein a plurality of comparison bits are generated by combining a plurality of carry outputs and a plurality of propagate outputs, and the minimum digital code among the digital codes is determined by decoding the comparison bits.

The judging method of claim 18, further comprising: determining the location, in a memory, of the minimum digital code among the digital codes stored in the memory.

A judging system for performing one of a horizontal minimum instruction and a sum of absolute differences instruction using common adder circuitry, the judging system comprising: a plurality of digital codes, wherein, for the sum of absolute differences instruction, the digital codes comprise a first digital code set and a second digital code set, and, for the horizontal minimum instruction, the digital codes comprise a plurality of digital code pairs, each digital code pair having a high digital code and a low digital code; a plurality of adders, each comparing a first digital code with a second digital code to provide an absolute difference, a carry output, and a propagate output; a summation circuit that sums the absolute differences to provide a plurality of sums of absolute differences; a comparison circuit that combines the carry outputs and the propagate outputs to find a minimum digital code pair among the digital code pairs; and a path selection circuit that, when the horizontal minimum instruction is executed, routes each of the digital code pairs to at least one adder pair of the adders so that each digital code pair is compared with the other digital code pairs, and, when the sum of absolute differences instruction is executed, routes the first digital code set and the second digital code set to the adders to obtain the absolute difference between each digital code of the first digital code set and each digital code of the second digital code set, the second digital code set having consecutive digital codes.

The judging system of claim 21, wherein, when the sum of absolute differences instruction is executed, the path selection circuit routes the first digital code set and the second digital code set from a first bus and a second bus to a third bus and a fourth bus, respectively, and, when the horizontal minimum instruction is executed, the path selection circuit routes the digital code pairs from the first bus to the third bus and the fourth bus.

The judging system of claim 21, wherein, when the sum of absolute differences instruction is executed, the path selection circuit routes each digital code of the first digital code set to a first adder of an adder pair of the adders and routes each digital code of the second digital code set to a first adder pair of the adders.

The judging system of claim 23, wherein the first adder provides an absolute value of the error to the first adder.

The judging system of claim 21, wherein the summation circuit comprises: a first adder that sums a first pair of absolute differences provided by a first adder pair of the adders to provide a first sum; a second adder that sums a second pair of absolute differences provided by a second adder pair of the adders to provide a second sum; and a third adder that sums the first sum and the second sum to provide a sum of absolute differences.

The judging system of claim 25, wherein the first and second adder pairs each comprise: a high adder that compares the high digital code of a first digital code pair with the high digital code of a second digital code pair and provides the propagate output; and a low adder that compares the low digital code of the first digital code pair with the low digital code of the second digital code pair.

The judging system of claim 21, wherein, when the sum of absolute differences instruction is executed, each of the digital codes comprises an unsigned byte, and, when the horizontal minimum instruction is executed, each of the digital codes comprises an unsigned word.

The judging system of claim 21, wherein each propagate output indicates whether a carry input is propagated by the high adder of an adder pair of the adders.

The judging system of claim 21, wherein the comparing circuit comprises: a first comparison circuit that combines the carry outputs of each adder pair of the adders with the propagate output to generate a comparison bit; and a second comparison circuit that determines a minimum digit code pair among the digit code pairs based on the comparison bits.

The judging system of claim 29, wherein the first comparison circuit has, for each adder pair of the adders, an AND gate and an OR gate, the AND gate combining the propagate output of a high adder with a carry output of a low adder to generate a first bit, and the OR gate combining the first bit with a carry output of the high adder to provide a comparison bit.
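
The AND/OR combination recited above resembles a carry-lookahead step across two 8-bit adders. The Python sketch below shows one way such a comparison bit could be formed for two unsigned 16-bit words. It rests on assumptions that the claims do not state: the comparison is modeled as a subtraction (adding the inverted second operand), the low adder receives a carry-in of 1, the high adder's own carry output is evaluated with a carry-in of 0, and the AND gate takes the low adder's carry output. The names `adder8` and `comparison_bit` are hypothetical.

```python
def adder8(a, b, carry_in):
    """Model an 8-bit adder; return (carry_out, propagate).

    propagate is 1 when every bit position would pass a carry along,
    that is, when a XOR b is all ones.
    """
    total = a + b + carry_in
    carry_out = total >> 8
    propagate = int((a ^ b) == 0xFF)
    return carry_out, propagate


def comparison_bit(a, b):
    """Return 1 when unsigned 16-bit a >= b, using two 8-bit adders."""
    not_b = ~b & 0xFFFF                      # assumed: compare via a + ~b + 1
    a_hi, a_lo = (a >> 8) & 0xFF, a & 0xFF
    nb_hi, nb_lo = (not_b >> 8) & 0xFF, not_b & 0xFF

    carry_lo, _ = adder8(a_lo, nb_lo, 1)         # low adder
    carry_hi, prop_hi = adder8(a_hi, nb_hi, 0)   # high adder

    first_bit = prop_hi & carry_lo   # AND gate
    return first_bit | carry_hi      # OR gate
```

Under these assumptions the high adder's carry output is 1 only when the high byte of `a` exceeds that of `b`, and its propagate ("pass") output is 1 only when the high bytes are equal, in which case the low adder's carry decides the result.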

The judging system of claim 29, wherein the second comparison circuit decodes the comparison bits to provide a plurality of minimum bits, each minimum bit indicating whether a corresponding one of the digit code pairs is the minimum digit code pair.

The judging system of claim 29, further comprising: a memory for storing the digit code pairs, wherein the second comparison circuit comprises: a decoding circuit that decodes the comparison bits to provide a plurality of minimum bits; a selection circuit that, based on the minimum bits, selects one of the digit code pairs as the minimum digit code pair and stores the minimum digit code pair in the memory; and a position circuit that provides a position value based on the minimum bits, the position value indicating the position of the minimum digit code pair in the memory.
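
The decode, select and position steps recited above can be sketched in Python as follows. This is a simplified model under stated assumptions: every digit code pair is treated as a single unsigned integer, one comparison bit is produced for every pair of positions (i, j) with i < j, a bit of 1 means the value at j is greater than or equal to the value at i, and ties resolve to the lowest position. None of these conventions, nor the function names, come from the claims.

```python
from itertools import combinations


def comparison_bits(values):
    """Assumed convention: bit (i, j) is 1 when values[j] >= values[i]."""
    indices = range(len(values))
    return {(i, j): int(values[j] >= values[i])
            for i, j in combinations(indices, 2)}


def decode_minimum_bits(bits, count):
    """Decode comparison bits into one minimum bit per position."""
    min_bits = []
    for k in range(count):
        # Position k must not exceed any later value ...
        not_above_later = all(bits[(k, j)] for j in range(k + 1, count))
        # ... and must be strictly below every earlier value.
        below_earlier = all(not bits[(i, k)] for i in range(k))
        min_bits.append(int(not_above_later and below_earlier))
    return min_bits


def select_minimum(values):
    """Return (minimum value, position, minimum bits)."""
    min_bits = decode_minimum_bits(comparison_bits(values), len(values))
    position = min_bits.index(1)   # position circuit
    return values[position], position, min_bits   # selection circuit
```

With the eight values `[7, 3, 9, 3, 12, 8, 3, 40]`, `select_minimum` returns the value 3 at position 1, and only the minimum bit for position 1 is set.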

A judging method for executing a horizontal minimum instruction and a sum of absolute errors instruction by using a common adder circuit, the judging method comprising: receiving a plurality of digit codes, wherein, when the sum of absolute errors instruction is executed, the digit codes comprise a first digit code set and a second digit code set, and when the horizontal minimum instruction is executed, the digit codes comprise high digit codes and low digit codes; providing a plurality of adders, each adder comparing a first digit code with a second digit code to provide an absolute error value and a carry output; summing the absolute error values to provide a sum of the absolute error values; grouping the adders into a plurality of adder pairs and providing a propagate output; combining the carry outputs and the propagate outputs to determine a minimum digit code of the digit codes; when the horizontal minimum instruction is executed, transmitting each digit code of the digit codes to at least one adder pair of the adder pairs to compare each digit code with another digit code; and when the sum of absolute errors instruction is executed, transmitting the first digit code set and the second digit code set to the adder pairs to determine the absolute error value between each digit code of the first digit code set and each corresponding digit code of the second digit code set.

The judging method of claim 33, wherein, when the sum of absolute errors instruction is executed, the method further comprises: transmitting each digit code of the first digit code set to a first adder of a first adder pair of the adder pairs; and transmitting each digit code of the second digit code set to a second adder of the first adder pair.

The judging method of claim 33, wherein the summing step comprises: adding a first absolute error value pair provided by a first adder pair of the adder pairs to generate a first summed value; adding a second absolute error value pair provided by a second adder pair of the adder pairs to generate a second summed value; and summing the first summed value and the second summed value to provide a sum of the absolute error values.

The judging method of claim 33, wherein the step of providing and grouping the adders comprises: comparing, by a high adder of each adder pair, a high digit code of a first digit code pair with a high digit code of a second digit code pair to provide a first carry output and the propagate output; and comparing, by a low adder of each adder pair, a low digit code of the first digit code pair with a low digit code of the second digit code pair to provide a second carry output.

The judging method of claim 33, wherein the combining step comprises: combining the first carry output and the second carry output of each adder pair of the adder pairs with one of the propagate outputs to provide a comparison bit; and determining, based on the comparison bits, the minimum digit code of the digit codes.

The judging method of claim 37, wherein the step of determining the minimum digit code of the digit codes comprises: decoding the comparison bits to provide a plurality of minimum bits, each minimum bit indicating whether a corresponding digit code is the minimum digit code.

The judging method of claim 38, further comprising: storing the digit codes in a memory; and finding, based on the minimum bits, the position of the minimum digit code of the digit codes in the memory.