CN103365624A

CN103365624A - Determination system and determination method

Info

Publication number: CN103365624A
Application number: CN2013102439659A
Authority: CN
Inventors: 罗沙尔.L.史托兹; 雷蒙.A.贝特伦
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2009-10-26
Filing date: 2010-09-07
Publication date: 2013-10-23
Anticipated expiration: 2030-09-07
Also published as: CN103365624B; CN103941601A; CN101937333B; CN103941601B; TWI489374B; TW201419138A; TWI423121B; CN101937333A; TW201115460A

Abstract

The invention relates to a determination system and a determination method. The system, which includes multiple adders, a sum circuit, a compare circuit, and a routing circuit, uses common adder circuitry to perform either one of a horizontal minimum instruction and a sum of absolute differences instruction. The input operands include multiple digital values, which the routing circuit delivers to the adders depending upon which instruction is indicated. Each adder determines a difference between a pair of digital values. For the sum of absolute differences instruction, the differences are grouped and summed together by the sum circuit. For the horizontal minimum instruction, the adders are paired together, and each pair provides carry and propagate outputs. The upper portions of a pair of digital values are compared by the upper adder and the lower portions are compared by the lower adder, and the carry and propagate outputs are collectively used to determine the minimum value.

Description

Determination system and method

This application is a divisional application of the invention patent application filed on September 7, 2010, with application number 201010277155.1, entitled "Determination system and method".

Technical field

The present invention relates to microprocessor instructions, and more particularly to a system and method for determining the minimum value in a set of digital values, where the minimum value serves as a horizontal minimum.

Background technology

Modern microprocessors are often used to execute media instructions in order to improve the performance of multimedia applications. For example, a microprocessor architecture may include one or more media instructions that identify a horizontal minimum in a set of digital values, together with the relative location of that minimum on a bus or in a register. One concrete example is the PHMINPOSUW instruction in the Intel SSE4 programming reference manual. The PHMINPOSUW instruction finds the minimum word, and the relative location of that minimum word, among 8 unsigned words (128 bits in total), where each word has 16 bits. Some known microprocessors need extra processing steps or extra clock cycles to execute the PHMINPOSUW instruction. For example, four 16-bit magnitude comparators are used to find the smaller word of each pair of words, narrowing the search from 8 words to 4 words in a first cycle. The 4 remaining words are fed back to 2 of the comparators, which narrow the search from 4 words to 2 words in a second cycle. Finally, the result is fed back to 1 comparator, which finds the minimum of the 2 words in a third (last) cycle. In another known approach, the number of 16-bit comparators is increased so that the instruction executes in a single cycle. With seven 16-bit comparators, for example, 4 comparators perform the first comparison within the single cycle and narrow the search from 8 words to 4 words, 2 further comparators narrow the search from 4 words to 2 words, and 1 last comparator finds the minimum of the 2 words. However, each 16-bit comparator occupies a considerable area of the microprocessor, which increases cost and reduces efficiency.
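The comparator-tree approach described above can be sketched as follows. This is a minimal illustration rather than the circuit of any particular processor: the function name `horizontal_min_tree` is hypothetical, and resolving ties toward the lower index is an assumption made for the example.

```python
def horizontal_min_tree(words):
    """Return (minimum word, index, comparisons used) for eight unsigned
    16-bit words, using a three-level tree of pairwise comparisons."""
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    candidates = list(enumerate(words))  # (index, value) pairs
    comparisons = 0
    while len(candidates) > 1:
        next_level = []
        for i in range(0, len(candidates), 2):
            (ia, a), (ib, b) = candidates[i], candidates[i + 1]
            comparisons += 1  # one 16-bit magnitude comparator per pair
            # Assumption: on a tie the lower-indexed word is kept.
            next_level.append((ia, a) if a <= b else (ib, b))
        candidates = next_level  # 8 -> 4 -> 2 -> 1 candidates
    index, value = candidates[0]
    return value, index, comparisons
```

The three loop passes correspond to the three cycles of the multi-cycle approach, and the total of 4 + 2 + 1 comparisons corresponds to the seven comparators of the single-cycle approach.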

Summary of the invention

The object of the invention is to find the minimum digital value in a set of digital values, and its relative location, in a single cycle without increasing the amount of circuitry.

The invention provides a determination system for finding the minimum binary code among at least two binary codes. In one embodiment, the determination system includes a first adder, a second adder and a compare circuit. The first adder adds a plurality of first bits and a plurality of second bits to provide a first carry output and a first propagate output. The first bits are the upper bits of a first binary code. The second bits are the inverted upper bits of a second binary code. The second adder adds a plurality of third bits and a plurality of fourth bits to provide a second carry output. The third bits are the lower bits of the first binary code. The fourth bits are the inverted lower bits of the second binary code. The compare circuit determines whether the first binary code is greater than the second binary code based on the first and second carry outputs and the first propagate output. The first and second binary codes are both unsigned. The first and second adders perform unsigned binary addition. The first propagate output indicates whether the first adder would propagate a carry input.
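The comparison just described can be modelled in a few lines. The sketch below assumes that the propagate output is asserted when the sum of the upper adder is all ones (which happens exactly when the two upper halves are equal); the helper names `adder` and `greater_than` are hypothetical.

```python
def adder(x, y, width):
    """Unsigned addition without carry-in; returns (carry_out, propagate).
    propagate is 1 when the sum is all ones, i.e. when a carry input
    would ripple straight through to the carry output."""
    mask = (1 << width) - 1
    total = x + y
    carry = total >> width
    propagate = int((total & mask) == mask)
    return carry, propagate


def greater_than(a, b, width=16):
    """Decide a > b for unsigned values using two half-width adders."""
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    # Upper adder: upper bits of a plus inverted upper bits of b.
    c_hi, p_hi = adder(a_hi, ~b_hi & mask, half)
    # Lower adder: lower bits of a plus inverted lower bits of b.
    c_lo, _ = adder(a_lo, ~b_lo & mask, half)
    # a > b if the upper half already wins, or the upper halves are
    # equal (propagate) and the lower half wins.
    return bool(c_hi or (p_hi and c_lo))
```

Adding a value to the inverse of another produces a carry exactly when the first value is strictly larger, which is why no subtractor or dedicated magnitude comparator is needed.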

The present invention also provides a determination system for quickly finding a horizontal minimum among a plurality of digital values. The determination system includes a plurality of difference circuits, a routing circuit and a compare circuit. Each difference circuit compares two digital values. The routing circuit delivers each of the digital values to at least one difference circuit so that each digital value is compared with every other digital value. Each difference circuit may include a high adder and a low adder. The high adder compares the upper portion of a first digital value with the upper portion of a second digital value to provide a first carry output and a propagate output. The low adder compares the lower portion of the first digital value with the lower portion of the second digital value to provide a second carry output. The compare circuit evaluates the first and second carry outputs and the propagate outputs to determine the minimum digital value among the digital values.

Each propagate output indicates whether the high adder of the corresponding difference circuit would propagate a carry input. The compare circuit includes a decode circuit. The decode circuit decodes the compare bits to provide a plurality of minimum bits. Each minimum bit indicates whether the corresponding digital value is the minimum digital value. A location circuit reports the storage location of the minimum digital value. The determination system may be integrated in a microprocessor chip in order to execute a horizontal minimum instruction quickly.
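As an illustration of decoding compare bits into minimum bits, the sketch below compares every pair of values once and marks the single value that no other value beats. The function names and the rule that ties go to the lower index are assumptions for the example, not details taken from the text.

```python
from itertools import combinations


def minimum_bits(words):
    """Decode all-pairs compare bits into one minimum bit per word."""
    n = len(words)
    # One compare bit per pair (i < j): set when words[i] > words[j].
    gt = {(i, j): words[i] > words[j] for i, j in combinations(range(n), 2)}
    bits = []
    for k in range(n):
        # Word k must be strictly smaller than every earlier word
        # (assumed tie rule: the lowest index wins) ...
        beats_lower = all(gt[(i, k)] for i in range(k))
        # ... and must not be greater than any later word.
        not_above_higher = all(not gt[(k, j)] for j in range(k + 1, n))
        bits.append(int(beats_lower and not_above_higher))
    return bits


def locate(bits):
    """Return the position of the single minimum bit that is set."""
    return bits.index(1)
```

For four words this uses six compare bits, and exactly one minimum bit is set for any input.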

The invention provides a determination method for finding the minimum digital value among a plurality of digital values. In one possible embodiment, the method includes the following steps: comparing the upper bits of a first digital value with the upper bits of a second digital value to provide a first carry output and a propagate output; comparing the lower bits of the first digital value with the lower bits of the second digital value to provide a second carry output; and determining, based on the first and second carry outputs and the propagate output, which of the first and second digital values is the smaller. The method may include delivering each of the digital values to at least one adder pair of a plurality of adder pairs, so that each digital value is compared with every other digital value to determine a minimum digital value. The method may also include decoding the compare bits. The method may also include determining the location of the minimum digital value in a memory.

The invention provides a system that uses common adder circuitry to perform either one of a horizontal minimum instruction and a sum of absolute differences instruction. In one embodiment, the system includes a plurality of adders, a sum circuit, a compare circuit and a routing circuit. The input operands include a plurality of digital values. For the sum of absolute differences instruction, the digital values include a first set of digital values and a second set of digital values. For the horizontal minimum instruction, the digital values include a plurality of digital value pairs. Each digital value pair has a high digital value and a low digital value. Each adder compares a first digital value with a second digital value to provide an absolute difference and a carry output. The sum circuit adds the absolute differences to provide a plurality of sums of absolute differences. The adders form a plurality of adder pairs, each of which provides a propagate output. The compare circuit combines the carry outputs and the propagate outputs to find the minimum digital value pair among the digital value pairs. When the horizontal minimum instruction is performed, the routing circuit delivers each digital value pair to at least one of the adder pairs so that each digital value pair is compared with every other digital value pair. When the sum of absolute differences instruction is performed, the routing circuit delivers the first and second sets of digital values to the adder pairs in order to determine the absolute differences between each digital value of the first set and each digital value of the second set, where the second set consists of consecutive digital values.

The present invention also provides a method that uses common adder circuitry to perform either one of a horizontal minimum instruction and a sum of absolute differences instruction. In one embodiment, the method includes receiving a plurality of digital values. When the sum of absolute differences instruction is performed, the digital values include a first set of digital values and a second set of digital values. When the horizontal minimum instruction is performed, the digital values include digital value pairs, each having a high digital value and a low digital value. The method also includes providing a plurality of adders. Each adder compares a first digital value with a second digital value to provide an absolute difference and a carry output. The method also includes: adding the absolute differences to provide a plurality of sums of absolute differences; grouping the adders into a plurality of adder pairs, each of which provides a propagate output; combining the carry outputs and the propagate outputs to determine the minimum digital value pair among the digital value pairs; when the horizontal minimum instruction is performed, delivering each digital value pair to at least one of the adder pairs so that each digital value pair is compared with every other digital value pair; and when the sum of absolute differences instruction is performed, delivering the first and second sets of digital values to the adder pairs in order to determine the absolute differences between each digital value of the first set and each consecutive digital value of the second set.

Description of drawings

Fig. 1 shows an embodiment of microprocessor 100.

Fig. 2 is an embodiment of comparator circuit.

Fig. 3 is an embodiment of the routing circuit of the present invention.

Fig. 4 is an embodiment of first adder circuit of the present invention.

Fig. 5 is the embodiment of difference unit DIFF1 of the present invention.

Fig. 6 shows an embodiment of sum unit S1 of the present invention.

Fig. 7 is an embodiment of PMIN circuit 206 of the present invention.

Fig. 8 is an embodiment of the high-order/low-order comparator circuit 212 of the present invention.

[Description of main reference numerals]

100: microprocessor;

102: scheduler;

104: complex integer execution unit;

106: simple integer execution unit;

108: floating point execution unit;

110: media unit;

114, 802: comparator circuit;

112: other units;

202: routing circuit;

203: low-order adder circuit;

204: first adder circuit;

206: first PMIN circuit;

207: high-order adder circuit;

208: second adder circuit;

210: second PMIN circuit;

212: high-order/low-order comparator circuit;

302: buffer circuit;

304, 306, 308, 506, 510, 804, 806, 808: multiplexer;

402: difference circuit;

404: sum circuit;

410, 412: select logic;

502, 504, 602, 604, 606: adder;

514, 708, 716, 722, 726: AND gate;

516: OR gate;

508, 512, 702, 704, 706, 712, 714, 720, 710, 718, 724: inverter;

728: select circuit;

DIFF1 to DIFF8: difference unit;

S1 to S4: sum unit.

Embodiment

The following description of embodiments is presented to enable one of ordinary skill in the art to make and use the present invention. Modifications to the preferred embodiments will be apparent to those skilled in the art, and the general principles described herein may be applied to other embodiments. The present invention is therefore not limited to the particular embodiments shown and described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The inventors have observed that known microprocessors need several cycles to execute a horizontal minimum instruction. The present invention needs only a single cycle to execute the same instruction, without significantly increasing the amount of circuitry. The invention provides a system and method for quickly determining a horizontal minimum. To make the features and advantages of the invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings (Fig. 1 to Fig. 8).

Fig. 1 is a block diagram of a microprocessor 100 according to one embodiment of the invention. The microprocessor 100 has a comparator circuit 114, which can quickly find a horizontal minimum in a set of digital values and can also obtain the sum of absolute differences between a first set of digital values and a second set of digital values. Fig. 1 does not show other well-known systems and functions, such as instruction fetch, instruction queue, instruction decoding and instruction reordering. Omitting these well-known parts does not affect the understanding of the invention. The microprocessor 100 has a scheduler 102. The scheduler 102 routes instructions or operations to a selected arithmetic logic unit (ALU) or execution unit (EU). As shown in Fig. 1, the scheduler 102 is coupled to a complex integer execution unit (complex IEU) 104, a simple integer execution unit (simple IEU) 106, a floating point execution unit (FPEU) 108, a media unit 110 and other units 112, where the other units 112 are other similar or different processing units. The media unit 110 generally executes media-based instructions and operations, such as Streaming SIMD Extensions (SSE), MultiMedia eXtension (MMX) and other similar instruction sets. SSE is a SIMD instruction set in the Intel x86 architecture, where SIMD stands for single instruction, multiple data. The media unit 110 has the comparator circuit 114, which executes at least two independent media instructions. In this embodiment, the two media instructions are called the horizontal minimum instruction (PMIN instruction) and the sum of absolute differences instruction (PSAD instruction). The PSAD instruction produces the sums of absolute differences between a first set of digital values (or binary codes) and a second set of digital values (or binary codes), where the second set immediately follows the first set. The first and second sets are described in detail later. Executing the PMIN instruction yields the minimum digital value and its relative location. In this embodiment, the terms digital value and binary code, and their corresponding forms, are interchangeable, and such a value may be expressed as a number of bits or as a hexadecimal value. The scheduler 102 has a memory 116. The memory 116 stores the operands of the PSAD and PMIN instructions and has a first bus ABUS and a second bus BBUS. In one embodiment, the first bus ABUS and the second bus BBUS each carry 128 bits, but the invention is not limited to this. In other embodiments, the first bus ABUS and the second bus BBUS may carry other numbers of bits. Although the media unit 110 generally executes many other media instructions well known to those skilled in the art, the comparator circuit 114 is used to execute the PSAD and PMIN instructions.

In one possible embodiment, for the PSAD instruction, the first set of digital values has 4 bytes (each byte has 8 bits), and these 4 bytes are unsigned. The second set of digital values is a set of 11 consecutive bytes. Every 4 consecutive bytes of the second set are treated as a group. Each successive 4-byte group starts at the next higher byte; in other words, each successive group is shifted by 1 byte and therefore overlaps the last 3 bytes of the preceding group. Suppose the second set has 11 bytes B0 to B10. Bytes B0 to B3 form the first group; the second group then starts at the next higher byte (B1) and consists of B1 to B4. The second group (B1 to B4) therefore overlaps the last 3 bytes (B1 to B3) of the first group (B0 to B3). The difference between a byte of the first set and the corresponding byte of a group of the second set is called an absolute difference, and the absolute differences of each group are added together. One concrete example is the MPSADBW instruction in the Intel SSE4 programming reference manual. For the PSAD instruction, the first bus ABUS carries the first operand, which comprises 4 unsigned bytes. The second bus BBUS carries the second operand, which has 11 unsigned bytes. The result is eight unsigned 10-bit sums of absolute differences. The PSAD instruction may include one or more offsets used to locate the operands. The invention does not limit the size of the offsets; any offset can be accommodated on the first bus ABUS and the second bus BBUS, so that the corresponding operand is placed at the right-most bit positions of the first bus ABUS and the second bus BBUS. In this embodiment, the offsets are omitted. In one embodiment, the PMIN instruction provides the minimum of 8 unsigned words on the first bus ABUS and the relative location of that minimum, where each of the 8 unsigned words has 16 bits. One concrete example is the PHMINPOSUW instruction in the Intel SSE4 programming reference manual. For the PMIN instruction, the first bus ABUS carries 8 words of 16 bits each. The bits carried on the second bus BBUS may be undefined or ignored, or the second bus BBUS may carry the same words as the first bus ABUS. In this embodiment, the comparator circuit 114 uses the same adder circuitry to execute either of the two instructions (the PMIN instruction and the PSAD instruction) in a single cycle.
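The sliding-window grouping of the PSAD instruction can be illustrated with a short behavioural model. This is a sketch of the arithmetic only, with offsets omitted as in this embodiment; the function name `psad` is hypothetical.

```python
def psad(a_bytes, b_bytes):
    """Return the eight sums of absolute differences between 4 bytes
    A0..A3 and each 4-byte window of 11 consecutive bytes B0..B10."""
    assert len(a_bytes) == 4 and len(b_bytes) == 11
    assert all(0 <= v <= 0xFF for v in list(a_bytes) + list(b_bytes))
    sums = []
    for shift in range(8):                   # windows B0..B3, B1..B4, ..., B7..B10
        window = b_bytes[shift:shift + 4]    # overlaps the previous window by 3 bytes
        sums.append(sum(abs(a - b) for a, b in zip(a_bytes, window)))
    return sums
```

Each sum is at most 4 × 255 = 1020, which is why 10 bits per result are sufficient.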

Fig. 2 shows an embodiment of the comparator circuit 114 of the present invention. As shown, the comparator circuit 114 includes a routing circuit 202, a low-order (LO) adder circuit 203, a high-order (HI) adder circuit 207 and a high-order/low-order comparator circuit 212. The routing circuit 202 has two inputs, coupled to the first bus ABUS and the second bus BBUS respectively, and another input that receives a control code INSTR. According to the control code INSTR, the routing circuit 202 rearranges or reroutes the bytes from the first bus ABUS and the second bus BBUS, thereby splitting the bytes of the first bus ABUS and the second bus BBUS. The control code INSTR has at least 1 bit. In this embodiment, INSTR equal to 1 indicates that the PMIN instruction is executed, and INSTR equal to 0 indicates that the PSAD instruction is executed. The first bus ABUS is split into a high-order portion AH<31:0> and a low-order portion AL<31:0>, each of which has 32 bits. The second bus BBUS is split into a high-order portion BH<55:0> and a low-order portion BL<55:0>, each of which has 56 bits. How the bytes of the first bus ABUS and the second bus BBUS are rearranged or rerouted according to the instruction being executed is described in detail later. The low-order adder circuit 203 has a first adder circuit 204, which is coupled to a first PMIN circuit 206. The high-order adder circuit 207 has a second adder circuit 208, which is coupled to a second PMIN circuit 210.

The first adder circuit 204 receives the control code INSTR and the low-order portions AL<31:0> and BL<55:0>, and outputs the sums of absolute differences PSAD<39:0> and the compare bits C<5:0>. PSAD<39:0> has 40 bits and C<5:0> has 6 bits. The compare bits C<5:0>, AL<15:0> and BL<47:0> are delivered to the first PMIN circuit 206. For the low-order portion, the first PMIN circuit 206 outputs the minimum value PMINVAL<15:0> and its relative location PMINLOC<1:0>. The control code INSTR and the high-order portions AH<31:0> and BH<55:0> are delivered to the second adder circuit 208. The second adder circuit 208 outputs the sums of absolute differences PSAD<79:40> and the compare bits C<11:6>. PSAD<79:40> has 40 bits and C<11:6> has 6 bits. The compare bits C<11:6>, AH<15:0> and BH<47:0> are delivered to the second PMIN circuit 210. For the high-order portion, the second PMIN circuit 210 outputs the minimum value PMINVAL<31:16> and its relative location PMINLOC<3:2>. Combining the minimum value PMINVAL<15:0> and relative location PMINLOC<1:0> output by the first PMIN circuit 206 with the minimum value PMINVAL<31:16> and relative location PMINLOC<3:2> output by the second PMIN circuit 210 produces PMINVAL<31:0> and PMINLOC<3:0>. The high-order/low-order comparator circuit 212 receives PMINVAL<31:0> and PMINLOC<3:0> and produces the final minimum digital value MINVAL<15:0> and its relative location MINLOC<2:0>.

The first adder circuit 204 and the second adder circuit 208 arrange the input bytes according to the instruction (that is, the control code INSTR) and perform the comparisons between bytes. For the PSAD instruction, the combined PSAD<79:0> contains eight unsigned 10-bit values, which are the results of the sum of absolute differences operation. For the PSAD instruction, the first PMIN circuit 206, the second PMIN circuit 210 and the high-order/low-order comparator circuit 212 can be disregarded. For the PMIN instruction, PSAD<79:0> can be disregarded for both the high-order and low-order portions; the first PMIN circuit 206 and the second PMIN circuit 210 use the compare bits C<11:0> they receive to determine the minimum digital value and its relative location. When the first bus ABUS provides 128 bits of input data, the high-order/low-order comparator circuit 212 receives and compares the minimum digital values of the high-order and low-order portions, and outputs the minimum value MINVAL<15:0> and its relative location MINLOC<2:0>.

Fig. 3 shows an embodiment of the routing circuit 202 of the present invention. The routing circuit 202 arranges or reroutes the digital values provided on the first bus ABUS and the second bus BBUS according to the particular instruction. A buffer circuit 302 receives ABUS<31:0> and outputs the corresponding AL<31:0> for both the PSAD instruction and the PMIN instruction. In one embodiment, the buffer circuit 302 includes a separate buffer for each bit, so that ABUS<31:0> is effectively copied to AL<31:0>. In other words, AL<31> = ABUS<31>, AL<30> = ABUS<30>, ..., AL<0> = ABUS<0>. For both the PSAD instruction and the PMIN instruction, AL<31:0> holds 4 bytes A3 to A0. For the PMIN instruction, bytes A3 to A0 are divided into two pairs: A3 and A2 form word W1, and A1 and A0 form word W0. Words W1 and W0 each have 16 bits. A multiplexer 304 receives ABUS<95:64> and ABUS<31:0>. When the control signal of the multiplexer 304 is logic 1 (high level), the output AH<31:0> of the multiplexer 304 equals ABUS<95:64>. When the control signal of the multiplexer 304 is logic 0 (low level), the output AH<31:0> equals ABUS<31:0>. In one embodiment, a separate 1-bit-wide multiplexer is provided for each of the 32 bits of AH<31:0>, so that each input and output has its own multiplexer path (MUX path). When the control code INSTR indicates the PMIN instruction, the multiplexer 304 selects ABUS<95:64> as AH<31:0>. These 32 bits form 4 bytes A11 to A8. For the PMIN instruction, bytes A11 to A8 are divided into two words: bytes A11 and A10 form word W5, and bytes A9 and A8 form word W4. When the control code INSTR indicates the PSAD instruction, the multiplexer 304 selects ABUS<31:0> as AH<31:0>. These 32 bits form 4 bytes A3 to A0. For the PSAD instruction, the bytes of the first operand are thus duplicated for the high-order and low-order portions, as described in detail later.

When the control signal of the multiplexer 306 is logic 1 (control code INSTR = 1), the multiplexer 306 outputs 8 high-order zero bits concatenated with ABUS<63:16>, so that the output BL<55:0> consists of 8 high-order zeros followed by ABUS<63:16>. When the control signal of the multiplexer 306 is logic 0, the multiplexer 306 outputs BBUS<55:0> as BL<55:0>. In one embodiment, a 1-bit-wide multiplexer may be used for each bit of each byte of each bus. When the control code INSTR indicates the PMIN instruction, ABUS<63:16> is selected. ABUS<63:16> holds 6 bytes A7 to A2, which form 3 pairs: bytes A7 and A6 form word W3, bytes A5 and A4 form word W2, and bytes A3 and A2 form word W1. When the control code INSTR indicates the PSAD instruction, BBUS<55:0> is selected. BBUS<55:0> holds the 7 low bytes B6 to B0 of the second operand. When the control input of the multiplexer 308 is logic 1, the multiplexer 308 outputs 8 high-order zero bits concatenated with ABUS<127:80>, so that the output BH<55:0> consists of 8 high-order zeros followed by ABUS<127:80>. When the control input of the multiplexer 308 is logic 0, the multiplexer 308 outputs BBUS<87:32> as BH<55:0>. When the control code INSTR indicates the PMIN instruction, ABUS<127:80> is selected. ABUS<127:80> holds 6 bytes A15 to A10, which form 3 pairs: bytes A15 and A14 form word W7, bytes A13 and A12 form word W6, and bytes A11 and A10 form word W5. When the control code INSTR indicates the PSAD instruction, BBUS<87:32> is selected. BBUS<87:32> holds the 7 high bytes B10 to B4 of the second operand of the PSAD instruction.
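The routing performed by the buffer circuit 302 and the multiplexers 304, 306 and 308 can be modelled on plain integers, as in the sketch below. The helper names `bits` and `route` are hypothetical, and the six bytes A15 to A10 are taken here as bus bits 127 down to 80.

```python
def bits(value, hi, lo):
    """Extract the bit field value<hi:lo> from an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def route(abus, bbus, instr):
    """Return (AL, AH, BL, BH); instr = 1 selects PMIN, instr = 0 selects PSAD."""
    al = bits(abus, 31, 0)            # buffer circuit 302: always ABUS<31:0>
    if instr == 1:                    # PMIN: every field comes from ABUS
        ah = bits(abus, 95, 64)       # words W5, W4
        bl = bits(abus, 63, 16)       # words W3..W1; top 8 bits of BL<55:0> are zero
        bh = bits(abus, 127, 80)      # words W7..W5; top 8 bits of BH<55:0> are zero
    else:                             # PSAD: first operand duplicated, second from BBUS
        ah = bits(abus, 31, 0)        # bytes A3..A0 again
        bl = bits(bbus, 55, 0)        # bytes B6..B0
        bh = bits(bbus, 87, 32)       # bytes B10..B4
    return al, ah, bl, bh
```

Note that word W1 appears on both the AL and BL fields and word W5 on both the AH and BH fields in PMIN mode, which is what lets every word in each half be compared with every other word in that half.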

Referring to Fig. 2, for the PMIN instruction, the selection and routing performed by the routing circuit 202 shown in Fig. 3 places words W1 and W0 on the AL bus and words W3 to W1 on the BL bus for delivery to the first adder circuit 204. The first adder circuit 204 compares word W0 with each of words W1 to W3, compares word W1 with each of words W2 and W3, and compares word W2 with word W3, and provides the corresponding compare bits C<5:0> according to the comparison results. The first PMIN circuit 206 receives words W3 to W0 and outputs the minimum word as PMINVAL<15:0>. The first PMIN circuit 206 also indicates the relative location PMINLOC<1:0> of the minimum word within the low-order portion of the first bus ABUS. For example, if the minimum word is located at ABUS<15:0>, then PMINLOC = 00; if the minimum word is located at ABUS<31:16>, then PMINLOC = 01. Similarly, for the PMIN instruction, words W5 and W4 are placed on the AH bus and words W7 to W5 on the BH bus for delivery to the second adder circuit 208. The second adder circuit 208 compares word W4 with each of words W5 to W7, compares word W5 with each of words W6 and W7, and compares word W6 with word W7, and provides the corresponding compare bits C<11:6> according to the comparison results. The second PMIN circuit 210 receives words W7 to W4 and outputs the minimum word among words W7 to W4 as PMINVAL<31:16>. The second PMIN circuit 210 also indicates the relative location PMINLOC<3:2> of the minimum word within the high-order portion of the first bus ABUS. For example, if the minimum word is located at ABUS<79:64>, then PMINLOC = 00; if the minimum word is located at ABUS<95:80>, then PMINLOC = 01. The high-order/low-order comparator circuit 212 compares the word PMINVAL<15:0> with the word PMINVAL<31:16> to determine which of them is the minimum value of ABUS<127:0>. The comparison result of the high-order/low-order comparator circuit 212 also yields the relative location MINLOC<2:0> of the minimum value.
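The two-stage flow above (a minimum per half, then a final high-order/low-order comparison) can be summarised behaviourally as follows. The function names are hypothetical, and the choice to keep the lower-indexed word on a tie is an assumption for the example.

```python
def min_of_four(words):
    """Return (minimum value, 2-bit location) for four 16-bit words."""
    # Python's min() returns the first of several equal minima, which
    # gives the assumed lowest-index tie rule.
    loc = min(range(4), key=lambda i: words[i])
    return words[loc], loc


def pmin(words):
    """Return (MINVAL, MINLOC) for eight words W0..W7."""
    lo_val, lo_loc = min_of_four(words[0:4])   # role of the first PMIN circuit
    hi_val, hi_loc = min_of_four(words[4:8])   # role of the second PMIN circuit
    # Final high-order/low-order comparison: the top bit of the 3-bit
    # location selects the half, the low two bits come from that half.
    if hi_val < lo_val:
        return hi_val, 4 + hi_loc
    return lo_val, lo_loc
```

For example, `pmin([9, 4, 7, 4, 12, 30, 3, 8])` selects word W6 from the high-order half, so the location is binary 110.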

Referring to Fig. 2, for the PSAD instruction, the path selection circuit 202 (shown in Fig. 3) routes bytes A3 to A0 of the first operand on the first bus ABUS to both AL<31:0> and AH<31:0>, providing AL<31:0> to the first adder circuit 204 and AH<31:0> to the second adder circuit 208. The path selection circuit 202 routes bytes B6 to B0 of the second operand on the second bus BBUS as BL<55:0> to the first adder circuit 204, and routes bytes B10 to B4 of the second operand on the second bus BBUS as BH<55:0> to the second adder circuit 208. For the PSAD instruction, the first adder circuit 204 sums the differences between bytes A0 and B0, A1 and B1, A2 and B2, and A3 and B3, and provides the first 10-bit result PSAD<9:0>. The first adder circuit 204 sums the differences between bytes A0 and B1, A1 and B2, A2 and B3, and A3 and B4, and provides the second 10-bit result PSAD<19:10>. The first adder circuit 204 sums the differences between bytes A0 and B2, A1 and B3, A2 and B4, and A3 and B5, and provides the third 10-bit result PSAD<29:20>. The first adder circuit 204 sums the differences between bytes A0 and B3, A1 and B4, A2 and B5, and A3 and B6, and provides the fourth 10-bit result PSAD<39:30>. In the same way, the second adder circuit 208 sums the differences between bytes A0 and B4, A1 and B5, A2 and B6, and A3 and B7, and provides its first 10-bit result PSAD<49:40>. The second adder circuit 208 sums the differences between bytes A0 and B5, A1 and B6, A2 and B7, and A3 and B8, and provides its second 10-bit result PSAD<59:50>. The second adder circuit 208 sums the differences between bytes A0 and B6, A1 and B7, A2 and B8, and A3 and B9, and provides its third 10-bit result PSAD<69:60>. The second adder circuit 208 sums the differences between bytes A0 and B7, A1 and B8, A2 and B9, and A3 and B10, and provides its fourth 10-bit result PSAD<79:70>.
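The eight results described above can be summarized by a short behavioral sketch. It models only the arithmetic, not the circuit; the function names `psad` and `pack_psad` are hypothetical and are introduced here purely for illustration.

```python
def psad(a_bytes, b_bytes):
    """Eight sliding-window sums of absolute differences.

    a_bytes: [A0, A1, A2, A3]; b_bytes: [B0, ..., B10] (unsigned bytes).
    sums[k] corresponds to the 10-bit field PSAD<10k+9:10k>.
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 11
    sums = []
    for k in range(8):
        # Window k pairs A0..A3 with Bk..B(k+3).
        s = sum(abs(a_bytes[i] - b_bytes[i + k]) for i in range(4))
        sums.append(s)  # at most 4 * 255 = 1020, so it fits in 10 bits
    return sums


def pack_psad(sums):
    """Pack eight 10-bit sums into one 80-bit integer (PSAD<79:0>)."""
    packed = 0
    for k, s in enumerate(sums):
        packed |= s << (10 * k)
    return packed
```

In this sketch, windows 0 to 3 correspond to the results of the first adder circuit 204 and windows 4 to 7 to those of the second adder circuit 208.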

Fig. 4 is an embodiment of the first adder circuit 204 of the present invention. The first adder circuit 204 processes the bytes of AL<31:0> and BL<31:0> and provides PSAD<39:0> or C<5:0>. The first adder circuit 204 includes a difference circuit 402, a summation circuit (sum circuit) 404, selection logic 410, and selection logic 412. The difference circuit 402 has a plurality of mutually independent difference units DIFF1 to DIFF8. The summation circuit 404 has mutually independent summation units S1 to S4. Each difference unit determines the unsigned differences among 4 bytes (that is, 2 byte pairs). For each byte pair, the difference unit inverts one byte and adds it to the other byte. The difference produced for each byte pair is an absolute difference. Which bytes a difference unit receives depends on the instruction being executed. The selection logic 410 has a plurality of mutually independent multiplexers, which select the bytes supplied to the difference unit DIFF3 according to the instruction being executed. As shown, for the PMIN instruction, when the control input of the selection logic 410 is logic 1 (control code INSTR=1), the selection logic 410 selects bytes BL<47:40>, BL<31:24>, BL<39:32>, and BL<23:16> and outputs them to the difference unit DIFF3. Bytes BL<47:40>, BL<31:24>, BL<39:32>, and BL<23:16> correspond to bytes A7, A5, A6, and A4, respectively. For the PSAD instruction, when the control input of the selection logic 410 is logic 0 (control code INSTR=0), the selection logic 410 selects bytes BL<23:16>, AL<15:8>, BL<15:8>, and AL<7:0> and outputs them to the difference unit DIFF3. Bytes BL<23:16>, AL<15:8>, BL<15:8>, and AL<7:0> correspond to bytes B2, A1, B1, and A0, respectively. In the same way, for the PMIN instruction, when the control input of the selection logic 412 is logic 1, the selection logic 412 selects bytes AL<15:8> and AL<7:0> and outputs them to the difference unit DIFF8. Bytes AL<15:8> and AL<7:0> correspond to bytes A1 and A0, respectively. For the PSAD instruction, when the control input of the selection logic 412 is logic 0, the selection logic 412 selects bytes AL<23:16> and AL<15:8> and outputs them to the difference unit DIFF8. Bytes AL<23:16> and AL<15:8> correspond to bytes A2 and A1, respectively.

For the PSAD instruction, the first (inverting) input of the difference unit DIFF1 receives byte BL<15:8>, which corresponds to byte B1. The second (non-inverting) input of the difference unit DIFF1 receives byte AL<15:8>, which corresponds to byte A1. The difference unit DIFF1 determines the absolute difference between bytes A1 and B1 (|A1-B1|) and outputs it as result AD1 at its first output. Similarly, the third (inverting) input of the difference unit DIFF1 receives byte BL<7:0>, which corresponds to byte B0, and the fourth (non-inverting) input receives byte AL<7:0>, which corresponds to byte A0. The difference unit DIFF1 determines the absolute difference between bytes A0 and B0 (|A0-B0|) and outputs it as result AD2 at its second output. Similarly, the difference unit DIFF2 determines the absolute difference between bytes A3 and B3 (|A3-B3|) and outputs it as AD3 at its first output, and determines the absolute difference between bytes A2 and B2 (|A2-B2|) and outputs it as AD4 at its second output. In summary, when the control code INSTR indicates the PSAD instruction, the difference circuit 402 determines the absolute differences between byte A0 and each of bytes B0 to B3, between byte A1 and each of bytes B1 to B4, between byte A2 and each of bytes B2 to B5, and between byte A3 and each of bytes B3 to B6.

The summation unit S1 computes the sum of the 4 bytes AD1 to AD4 and provides the result as the 10-bit value PSAD<9:0>. The result of the summation unit S1 corresponds to |A0-B0| + |A1-B1| + |A2-B2| + |A3-B3|. For the PSAD instruction, the difference unit DIFF3 determines the absolute difference between A0 and B1 as AD6, and the absolute difference between A1 and B2 as AD5. The difference unit DIFF4 determines the absolute difference between A2 and B3 as AD8, and the absolute difference between A3 and B4 as AD7. The summation unit S2 computes the sum of the 4 bytes AD5 to AD8 and provides the result as the 10-bit value PSAD<19:10>. The result of the summation unit S2 corresponds to |A0-B1| + |A1-B2| + |A2-B3| + |A3-B4|. Similarly, for the PSAD instruction, the summation unit S3 computes the sum of the 4 bytes AD9 to AD12 and provides the result as the 10-bit value PSAD<29:20>. The result of the summation unit S3 corresponds to |A0-B2| + |A1-B3| + |A2-B4| + |A3-B5|. Finally, for the PSAD instruction, the summation unit S4 computes the sum of the 4 bytes AD13 to AD16 and provides the result as the 10-bit value PSAD<39:30>. The result of the summation unit S4 corresponds to |A0-B3| + |A1-B4| + |A2-B5| + |A3-B6|. Although Fig. 4 shows only an embodiment of the first adder circuit 204, the second adder circuit 208 is substantially similar to the first adder circuit 204. It determines the absolute differences between byte A0 and each of bytes B4 to B7, between byte A1 and each of bytes B5 to B8, between byte A2 and each of bytes B6 to B9, and between byte A3 and each of bytes B7 to B10. In addition, the second adder circuit 208 sums the absolute differences in groups of 4 and provides 4 sums, which make up PSAD<79:40>.

In summary, for the PSAD instruction, the difference circuit 402 determines the absolute differences between the bytes (A3:A0) of the first set of digital values and the bytes (B10:B0) of the second set of digital values. After the first group B3:B0 has been processed, the comparison starts again one byte higher, that is, with B4:B1, B5:B2, B6:B3, and so on. The 8 groups therefore yield the absolute differences AD1 to AD4, AD5 to AD8, ..., AD29 to AD32. The summation circuit 404 sums the absolute differences of each group and provides the corresponding sums of absolute differences PSAD<79:0>.

When the control code INSTR indicates the PMIN instruction, the difference circuit 402 operates in substantially the same way, except that different bytes are routed to it. The sums of AD1 to AD16 and PSAD<39:0> can be ignored; only the compare bits C<5:0> are needed. The difference unit DIFF1 compares, or otherwise determines the absolute difference between, A1 and A3, and likewise A0 and A2. The first byte A3 is the high byte of word W1, and the second byte A1 is the high byte of word W0. The third byte A2 is the low byte of word W1, and the fourth byte A0 is the low byte of word W0. In this embodiment, the difference unit DIFF1 compares the high bytes and the low bytes of words W1 and W0, respectively, and determines the compare bit C<0>, which indicates which word (W1 or W0) is smaller. Similarly, the difference unit DIFF2 compares the high bytes A5 and A3 and the low bytes A4 and A2 of words W2 and W1 to determine which word (W2 or W1) is smaller, and provides the compare bit C<3>. Similarly, the difference unit DIFF3 compares the high bytes A7 and A5 and the low bytes A6 and A4 of words W3 and W2 to determine which word (W3 or W2) is smaller, and provides the compare bit C<5>. For the PMIN instruction, the difference unit DIFF4 is not used. The difference unit DIFF5 compares the high bytes A5 and A1 and the low bytes A4 and A0 of words W2 and W0 to determine which word (W2 or W0) is smaller, and provides the compare bit C<1>. The difference unit DIFF6 compares the high bytes A7 and A3 and the low bytes A6 and A2 of words W3 and W1 to determine which word (W3 or W1) is smaller, and provides the compare bit C<4>. For the PMIN instruction, the difference unit DIFF7 is not used. The difference unit DIFF8 compares the high bytes A7 and A1 and the low bytes A6 and A0 of words W3 and W0 to determine which word (W3 or W0) is smaller, and provides the compare bit C<2>.

In summary, for the PMIN instruction, the compare bit C<0> of the difference circuit 402 of the first adder circuit 204 indicates the smaller of words W0 and W1. The compare bit C<1> indicates the smaller of words W0 and W2. The compare bit C<2> indicates the smaller of words W0 and W3. The compare bit C<3> indicates the smaller of words W1 and W2. The compare bit C<4> indicates the smaller of words W1 and W3. The compare bit C<5> indicates the smaller of words W2 and W3. Although Fig. 4 does not show the detailed circuit of the second adder circuit 208, the second adder circuit 208 has the same difference circuit as the first adder circuit 204. It performs the same comparisons on words W4 to W7 of the high-order adder circuit 207 and provides the corresponding compare bits C<11:6>. Therefore, for the PMIN instruction, the compare bit C<6> indicates the smaller of words W4 and W5. The compare bit C<7> indicates the smaller of words W4 and W6. The compare bit C<8> indicates the smaller of words W4 and W7. The compare bit C<9> indicates the smaller of words W5 and W6. The compare bit C<10> indicates the smaller of words W5 and W7. The compare bit C<11> indicates the smaller of words W6 and W7. The first PMIN circuit 206 uses the compare bits C<5:0> to identify the minimum of words W0 to W3. The second PMIN circuit 210 uses the compare bits C<11:6> to identify the minimum of words W4 to W7.

Fig. 5 is an embodiment of the difference unit DIFF1 of the present invention. As shown, the difference unit DIFF1 has an adder pair comprising an upper (or first) adder 502 and a lower (or second) adder 504. Adders 502 and 504 each have an inverting input B and a non-inverting input A. Adders 502 and 504 can therefore each perform a subtraction to determine the difference between the values at the inverting input B and the non-inverting input A. For the PSAD instruction, the inverting input B of adder 502 receives byte B1. For the PMIN instruction, the inverting input B of adder 502 receives byte A3. For both the PSAD and PMIN instructions, the non-inverting input A of adder 502 receives byte A1. Adder 502 inverts each bit of the byte received at the inverting input B to obtain the inverted value ~B, where ~ denotes bitwise binary inversion. Adder 502 adds ~B to the byte received at input A as an unsigned addition and outputs the sum at output SUM. Because there is no carry input, A + ~B equals A - B - 1 modulo 256, so it is the incremented sum described below that equals A - B. Adder 502 has a carry output (CO) that provides a carry output signal CO1. When the sum produced by adder 502 overflows, the carry output signal CO1 is logic 1. Adder 502 also increments the sum and outputs the incremented result at output INCSUM. Adder 502 has a propagate output CP. The propagate output signal CP1 at output CP is logic 1 if the adder would propagate a carry input (which is not provided) to the carry output. That is, although Fig. 5 has no carry input, CP1 is logic 1 whenever adder 502 would have propagated a carry input had it received one. In one embodiment, each bit of the byte received at input A is ORed with the corresponding bit of the byte from input B, giving 8 results, and these 8 results are then ANDed together. The outcome of the OR and AND operations determines the logic level of the propagate output signal CP1 at output CP. Output SUM is coupled to the input of inverter 508, which contains a separate inverter for each bit of the byte. The output of inverter 508 is coupled to input 0 of multiplexer 506. Output INCSUM is coupled to input 1 of multiplexer 506. The select input of multiplexer 506 receives the carry output signal CO1. The output signal AD1 of multiplexer 506 is the absolute difference between the bytes received at inputs A and B of adder 502.

Similarly, for the PSAD instruction, the inverting input B of adder 504 receives byte B0. For the PMIN instruction, the inverting input B of adder 504 receives byte A2. For both the PSAD and PMIN instructions, input A of adder 504 receives byte A0. Adder 504 inverts each bit of the byte received at the inverting input B to produce the opposite logic values, ~B. Adder 504 adds ~B to the byte received at input A as an unsigned addition and provides output signals at outputs INCSUM, SUM, and CO. Because outputs INCSUM, SUM, and CO of adder 504 are similar to those of adder 502, they are not described again. Output CO of adder 504 provides a carry output signal CO2. If adder 504 has a propagate output CP, that output may be left unused or omitted; it provides no signal that is used. Output INCSUM of adder 504 is coupled to input 1 of multiplexer 510, which provides AD2. Output SUM of adder 504 is coupled to the input of inverter 512, and the output of inverter 512 is coupled to input 0 of multiplexer 510. The select input of multiplexer 510 receives the carry output signal CO2. AND gate 514 has two inputs: one is coupled to the propagate output CP of adder 502, and the other receives the carry output signal CO2 from output CO of adder 504. OR gate 516 produces the compare bit C<0>. One of its two inputs receives the carry output signal CO1, and the other is coupled to the output of AND gate 514.

For adders 502 and 504, if the byte at input A is greater than the byte at input B, output CO is logic 1 and output INCSUM gives the absolute difference between inputs A and B, that is, |A-B|. When adder 502 sets the carry output signal CO1 to logic 1, OR gate 516 outputs the compare bit C<0>=1. When CO1 is logic 1, the values at inputs A and B determine whether the propagate output signal CP1 of adder 502 is logic 0 or 1. However, because OR gate 516 sets the compare bit C<0> to logic 1 whenever CO1 is logic 1, the value of CP1 does not matter for C<0> in that case. For example, if input A receives binary 00000100 (decimal 4) and input B receives binary 00000010 (decimal 2), the difference between inputs A and B is A-B=00000010 (decimal 2). The value received at input B is inverted first, giving ~B=11111101. The unsigned sum of the value at input A and ~B is A+~B=00000001 with the carry output signal CO1 at logic 1 (and the propagate output signal CP1=0). The sum (the value at output SUM) is therefore not the correct value. The output of the inverter (508 or 512) is ~SUM (the bitwise inverse of the value at output SUM)=11111110, which is also not the correct value. The value at output INCSUM is 00000001+1=00000010, which is the correct value. Therefore, for adders 502 and 504, when the byte at input A is greater than the byte at input B, output CO=1 and the corresponding multiplexer (506 or 510) selects the value at its input 1 (that is, INCSUM) as the correct output (the absolute difference between inputs A and B).

If the value at input A is less than or equal to the value at input B, output CO=0 and the corresponding multiplexer selects the output ~SUM of the corresponding inverter (508 or 512) as the correct output. When the value at input A equals the value at input B, the correct output is 00000000. Although the correct value appears at both INCSUM and ~SUM in this case, the corresponding multiplexer selects ~SUM because output CO=0. When the value at input A equals the value at input B, the propagate output CP=1. For example, when inputs A and B both equal 00001111, the value at input A plus the inverted value ~B of input B is 00001111+11110000=11111111=SUM, and output CP=1. The inverse of output SUM (that is, ~SUM) is 00000000, which is the correct value. The value at output INCSUM is 1+11111111, which gives 00000000 and is also correct (although the multiplexer does not select it). When the value at input A is less than the value at input B, output CO=0 and the multiplexer selects ~SUM as the correct value. For example, if the value at input A is 00000010 and the value at input B is 00000100, then |A-B|=00000010. In this example, A+~B=00000010+11111011=11111101=SUM. Because output CO=0, ~SUM=00000010 is used as the correct value. In this example, the value at output INCSUM is 1+11111101=11111110, which is not the correct value.
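The three worked examples above can be reproduced with a small behavioral model of one 8-bit adder together with its inverter and multiplexer. This is a sketch only, assuming no carry input as in Fig. 5; the function name `abs_diff_adder` is hypothetical.

```python
def abs_diff_adder(a, b):
    """Model one 8-bit adder (502 or 504) with its inverter and multiplexer.

    Returns (AD, CO, SUM, INCSUM) for unsigned bytes a and b.
    """
    not_b = ~b & 0xFF            # bitwise inversion of the byte at input B
    total = a + not_b            # unsigned addition, no carry input
    co = total >> 8              # carry output CO (1 only when a > b)
    s = total & 0xFF             # output SUM
    incsum = (s + 1) & 0xFF      # output INCSUM
    not_sum = ~s & 0xFF          # inverter output ~SUM
    ad = incsum if co else not_sum   # multiplexer selected by CO
    return ad, co, s, incsum


# The three cases discussed in the text: A > B, A == B, A < B.
for a, b in [(0b00000100, 0b00000010),
             (0b00001111, 0b00001111),
             (0b00000010, 0b00000100)]:
    ad, co, s, incsum = abs_diff_adder(a, b)
    print(f"A={a:08b} B={b:08b} CO={co} SUM={s:08b} "
          f"INCSUM={incsum:08b} AD={ad:08b}")
```

The model selects INCSUM when CO=1 and ~SUM when CO=0, which matches the multiplexer wiring described for Fig. 5.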

When the control code INSTR indicates the PSAD instruction, adder 502 yields the absolute difference AD1=|A1-B1| and adder 504 yields the absolute difference AD2=|A0-B0| in accordance with the PSAD operation, and the compare bit C<0> can be ignored. When the control code INSTR indicates the PMIN instruction, if A1>A3, the high byte of word W0 is greater than the high byte of word W1, so W0>W1. In this case CO1=1, and therefore C<0>=1. When A3>A1, CO1 and CP1 of adder 502 are both logic 0, so C<0>=0, indicating W0<W1. If A1=A3, the outputs of adder 502 are CO1=0 and CP1=1. In this case, the comparison of the low bytes of the two words by adder 504 determines the relative values of words W0 and W1. When the high bytes are equal (CP1=1) and A0>A2, the low byte of word W0 is greater than the low byte of word W1, so W0>W1. In this case CP1 and CO2 are both logic 1, and therefore C<0>=1. When the high bytes are equal (CP1=1) and A0 is less than or equal to A2, CO2 is logic 0, so C<0>=0. In this case word W0 is less than or equal to W1, and when the two are equal, word W0 is treated as the minimum. The other difference units (DIFF2 to DIFF8) have the same structure and operation, and determine AD3 to AD16. The difference units DIFF4 and DIFF7 can be simplified. In particular, the logic that receives CO and CP to determine a corresponding compare bit C<x> is not needed in those units. If desired, the propagate logic of each individual adder in those units can also be omitted.
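A minimal sketch of how the carry and propagate outputs of an adder pair combine into a compare bit is given below. It assumes that the propagate signal is the AND of the bitwise ORs of A with the inverted B byte, which is one reading of the embodiment that is consistent with the worked examples; the function names are hypothetical.

```python
def adder_flags(a, b):
    """Carry (CO) and propagate (CP) outputs of one 8-bit adder, no carry input."""
    not_b = ~b & 0xFF
    co = (a + not_b) >> 8                   # 1 exactly when a > b
    cp = 1 if (a | not_b) == 0xFF else 0    # assumed form: AND of the 8 ORs
    return co, cp


def compare_bit(word_a, word_b):
    """Compare bit for two 16-bit words; 1 when word_a > word_b.

    The upper adder (502) compares high bytes, the lower adder (504)
    compares low bytes, then C = CO1 OR (CP1 AND CO2).
    """
    co1, cp1 = adder_flags(word_a >> 8, word_b >> 8)
    co2, _ = adder_flags(word_a & 0xFF, word_b & 0xFF)
    return co1 | (cp1 & co2)                # OR gate 516, AND gate 514
```

For C<0>, `word_a` would be W0 and `word_b` would be W1. With this form of propagate, CP1 can be 1 even when CO1 is 1, but the OR gate then makes CP1 irrelevant, as the description notes.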

Referring to Fig. 4 and Fig. 5, the PMIN instruction and the PSAD instruction use the same adder circuitry. In particular, each adder pair in each difference unit can serve both the PMIN instruction and the PSAD instruction. For the PSAD instruction, each individual adder circuit determines the absolute difference of the byte pair it receives. For the PMIN instruction, the sums of absolute differences produced for the PSAD instruction are not needed, but each adder pair uses the comparison between bytes to determine which word has the minimum value. The path selection circuit makes maximum use of the adders provided for the PSAD instruction in order to support the PMIN instruction. As described above, for the PMIN instruction the adders are grouped into adder pairs. The upper portions (such as the high bytes) of a pair of digital values (such as two words) are provided to the corresponding inputs of the first adder, and the lower portions (such as the low bytes) of the pair are provided to the corresponding inputs of the second adder. Both adders of a pair are modified to provide a carry output, and the upper adder of the pair is modified to provide a propagate output. The carry outputs and the propagate output of each adder pair are used to determine the minimum of each pair of digital values. For the PSAD instruction, the adder results give the absolute differences between the first operand, which has 4 bytes, and the second operand, which has 11 bytes. For the PMIN instruction, the adder results give the minimum of a set of 8 words.

Fig. 6 shows one possible embodiment of the summation unit S1 of the present invention. The summation unit S1 has adders 602, 604, and 606 and provides the 10-bit result PSAD<9:0>. Adders 602 and 604 are each 8 bits wide, and adder 606 is 9 bits wide. Adders 602 and 604 are similar to adder 502, except that they have no inverting input and do not need the INCSUM circuit, which can therefore be omitted. The propagate output circuit is also unnecessary and can be omitted. Adder 602 performs an unsigned addition of the binary values AD1 and AD2 and provides a first sum SUM1 (=AD1+AD2) and a corresponding carry output C1. Adder 604 performs an unsigned addition of the binary values AD3 and AD4 and provides a second sum SUM2 (=AD3+AD4) and a corresponding carry output C2. The carry output C1 serves as the most significant bit (MSB) of SUM1, and the carry output C2 serves as the MSB of SUM2. The first input of adder 606 receives the carry output C1 combined with the first sum SUM1, and the second input of adder 606 receives the carry output C2 combined with the second sum SUM2, so both inputs of adder 606 are 9 bits wide. Adder 606 performs an unsigned addition of the data at its two inputs ({C1, SUM1} + {C2, SUM2}) and provides the 10-bit output PSAD<9:0>. The lowest 9 bits PSAD<8:0> are the unsigned binary sum, and the most significant bit PSAD<9> is the carry output of that sum. In this embodiment, the summation unit S1 sums the first group of absolute differences (AD1 to AD4) to obtain the first sum of absolute differences PSAD<9:0>. The other summation units S2 to S4 have the same structure and sum the groups of absolute differences AD5 to AD8, AD9 to AD12, and AD13 to AD16, respectively, to provide the sums of absolute differences PSAD<19:10>, PSAD<29:20>, and PSAD<39:30>.
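The adder tree of the summation unit can be sketched as follows, with bit widths made explicit. The function name `sum_unit` is hypothetical, and the sketch models only the arithmetic of adders 602, 604, and 606.

```python
def sum_unit(ad1, ad2, ad3, ad4):
    """Model summation unit S1: two 8-bit adders feeding one 9-bit adder.

    Inputs are unsigned bytes; the return value is the 10-bit PSAD field.
    """
    t1 = ad1 + ad2
    c1, sum1 = t1 >> 8, t1 & 0xFF        # adder 602: carry C1 and SUM1
    t2 = ad3 + ad4
    c2, sum2 = t2 >> 8, t2 & 0xFF        # adder 604: carry C2 and SUM2

    in1 = (c1 << 8) | sum1               # 9-bit input {C1, SUM1}
    in2 = (c2 << 8) | sum2               # 9-bit input {C2, SUM2}

    t3 = in1 + in2                       # adder 606 (9 bits wide)
    carry, low9 = t3 >> 9, t3 & 0x1FF    # PSAD<9> and PSAD<8:0>
    return (carry << 9) | low9
```

The largest possible result is 4 × 255 = 1020, which is why 10 bits are sufficient for each PSAD field.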

Fig. 7 is an embodiment of the PMIN circuit 206 of the present invention. The PMIN circuit 206 has decode logic 701, selection logic 728, and location logic 703. The decode logic 701 has inverters 702, 704, 706, 712, 714, 720, 710, 718, and 724 and AND gates 708, 716, 722, and 726. AND gates 708, 716, 722, and 726 each have three inputs. The location logic 703 has OR gates 730 and 732, each with two inputs. The compare bits C<2:0> are provided to inverters 702, 704, and 706, respectively. AND gate 708 receives the outputs of inverters 702, 704, and 706 and outputs signal W0_MIN. When word W0 is the minimum word, signal W0_MIN is logic 1. The compare bits C<3> and C<4> are provided to inverters 712 and 714, respectively. The three inputs of AND gate 716 receive the outputs of inverters 712 and 714 and the compare bit C<0>, and AND gate 716 outputs signal W1_MIN. When word W1 is the minimum word, signal W1_MIN is logic 1. The input of inverter 720 receives C<5>. AND gate 722 receives the output of inverter 720, C<1>, and C<3>, and outputs signal W2_MIN. When word W2 is the minimum word, signal W2_MIN is logic 1. Inverters 710, 718, and 724 receive signals W0_MIN, W1_MIN, and W2_MIN, respectively, and produce signals ~W0_MIN, ~W1_MIN, and ~W2_MIN, each of which indicates that the corresponding word is not the minimum. AND gate 726 receives signals ~W0_MIN, ~W1_MIN, and ~W2_MIN and outputs signal W3_MIN. When word W3 is the minimum word, signal W3_MIN is logic 1.
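The decode logic described above can be written out as Boolean expressions. The sketch below assumes the compare-bit polarity used for Fig. 5, where a bit is 1 when the lower-indexed word is the larger one; the helper `compare_bits` and the function `decode_min` are hypothetical names.

```python
def compare_bits(w):
    """C<0>..C<5> for words w[0]..w[3].

    Bit order follows the description: (W0,W1), (W0,W2), (W0,W3),
    (W1,W2), (W1,W3), (W2,W3). A bit is 1 when the first word is larger.
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    return [1 if w[i] > w[j] else 0 for i, j in pairs]


def decode_min(c):
    """Model decode logic 701: one-hot (W0_MIN, W1_MIN, W2_MIN, W3_MIN)."""
    inv = lambda bit: bit ^ 1                      # a single inverter
    w0_min = inv(c[0]) & inv(c[1]) & inv(c[2])     # AND gate 708
    w1_min = c[0] & inv(c[3]) & inv(c[4])          # AND gate 716
    w2_min = c[1] & c[3] & inv(c[5])               # AND gate 722
    w3_min = inv(w0_min) & inv(w1_min) & inv(w2_min)   # AND gate 726
    return w0_min, w1_min, w2_min, w3_min
```

With this polarity, ties resolve to the lowest-indexed word, so exactly one of the four signals is 1 for any input.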

AL<15:0>, BL<15:0>, BL<31:16>, and BL<47:32> represent words W0 to W3, respectively. The selection logic 728 receives AL<15:0>, BL<15:0>, BL<31:16>, BL<47:32>, and signals W0_MIN to W3_MIN. At any one time, only one of signals W0_MIN to W3_MIN is logic 1, indicating that the corresponding word is the minimum in that cycle. The selection logic 728 therefore selects one of words W0 to W3 as the minimum word and outputs it as PMINVAL<15:0>. OR gate 730 receives signals W3_MIN and W2_MIN and has an output that provides the location bit PMINLOC<1>. OR gate 732 receives signals W3_MIN and W1_MIN and has an output that provides the location bit PMINLOC<0>. In this embodiment, PMINVAL<15:0> gives the minimum of words W0 to W3, and PMINLOC<1:0> indicates the relative location of the minimum word within the lower half of the words on the first bus ABUS received by the low-order adder circuit 203. The PMIN circuit 210 is similar in structure to the PMIN circuit 206 and provides PMINVAL<31:16>, which represents the minimum of words W4 to W7, and PMINLOC<3:2>. PMINLOC<3:2> indicates the relative location of the minimum within the upper half of the words on the first bus ABUS received by the high-order adder circuit 207.
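The selection and location logic can be modeled as a one-hot multiplexer plus two OR gates. In the sketch below, the function name `select_min` is hypothetical and the flags are assumed to be one-hot, as the description states.

```python
def select_min(words, flags):
    """Model selection logic 728 and location logic 703.

    words: (W0, W1, W2, W3) as 16-bit values.
    flags: one-hot (W0_MIN, W1_MIN, W2_MIN, W3_MIN).
    Returns (PMINVAL<15:0>, PMINLOC<1:0>).
    """
    w0_min, w1_min, w2_min, w3_min = flags
    pminval = 0
    for word, flag in zip(words, flags):
        if flag:
            pminval = word          # the one-hot flag picks the word
    loc1 = w3_min | w2_min          # OR gate 730 -> PMINLOC<1>
    loc0 = w3_min | w1_min          # OR gate 732 -> PMINLOC<0>
    return pminval, (loc1 << 1) | loc0
```

The two OR gates simply encode the one-hot position as a 2-bit index: W0 gives 00, W1 gives 01, W2 gives 10, and W3 gives 11.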

Fig. 8 is an embodiment of the high-order/low-order compare circuit 212 of the present invention. The inverting input of the 16-bit comparator circuit 802 receives PMINVAL<31:16> provided by the high-order adder circuit 207. The non-inverting input of the comparator circuit 802 receives PMINVAL<15:0> provided by the low-order adder circuit 203. The comparator circuit 802 has a carry output CO that provides signal MINLOC<2>. The comparator circuit 802 compares the minimum words of the high-order and low-order portions and provides its carry output as MINLOC<2>. The carry output CO of the comparator circuit 802 behaves in the same way as the output CO of the adders described above. If the word PMINVAL<15:0> is greater than the word PMINVAL<31:16>, MINLOC<2> at the carry output CO of the comparator circuit 802 is logic 1; otherwise MINLOC<2> is logic 0. MINLOC<2> is the most significant bit (MSB) of the location value MINLOC<2:0>. When MINLOC<2> is logic 1, the minimum lies in the upper half of the words on the first bus ABUS. Conversely, when MINLOC<2> is logic 0, the minimum lies in the lower half of the words on the first bus ABUS. MINLOC<2> drives the select inputs of multiplexers 804, 806, and 808. Multiplexer 804 selects byte PMINVAL<23:16> or PMINVAL<7:0> as the low byte MINVAL<7:0>. Bytes PMINVAL<23:16> and PMINVAL<7:0> are the low bytes of the minimum words found in the high-order and low-order portions, respectively. Multiplexer 806 selects byte PMINVAL<31:24> or PMINVAL<15:8> as the high byte MINVAL<15:8>. Bytes PMINVAL<31:24> and PMINVAL<15:8> are the high bytes of the minimum words found in the high-order and low-order portions, respectively. Multiplexer 808 selects location bits PMINLOC<3:2> or PMINLOC<1:0> as MINLOC<1:0>. Location bits PMINLOC<3:2> and PMINLOC<1:0> are the least significant location bits of the high-order and low-order portions, respectively. As described above, the comparator circuit 802 determines MINLOC<2>, the most significant bit of MINLOC. MINLOC<2:0> therefore indicates the location of the minimum word on the first bus ABUS.
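The final merge of the two half results can be sketched as follows. The function name `merge_halves` is hypothetical, and the comparator is modeled only by its stated behavior (carry output 1 when the low-order minimum is greater than the high-order minimum).

```python
def merge_halves(lo_val, lo_loc, hi_val, hi_loc):
    """Model comparator 802 and multiplexers 804, 806, and 808.

    lo_val/lo_loc: PMINVAL<15:0> and PMINLOC<1:0> (words W0..W3).
    hi_val/hi_loc: PMINVAL<31:16> and PMINLOC<3:2> (words W4..W7).
    Returns (MINVAL<15:0>, MINLOC<2:0>).
    """
    minloc2 = 1 if lo_val > hi_val else 0     # carry output CO of 802
    pminval = (hi_val << 16) | lo_val         # PMINVAL<31:0>
    pminloc = (hi_loc << 2) | lo_loc          # PMINLOC<3:0>
    if minloc2:
        low_byte = (pminval >> 16) & 0xFF     # mux 804 picks PMINVAL<23:16>
        high_byte = (pminval >> 24) & 0xFF    # mux 806 picks PMINVAL<31:24>
        loc_bits = (pminloc >> 2) & 0x3       # mux 808 picks PMINLOC<3:2>
    else:
        low_byte = pminval & 0xFF             # mux 804 picks PMINVAL<7:0>
        high_byte = (pminval >> 8) & 0xFF     # mux 806 picks PMINVAL<15:8>
        loc_bits = pminloc & 0x3              # mux 808 picks PMINLOC<1:0>
    minval = (high_byte << 8) | low_byte
    minloc = (minloc2 << 2) | loc_bits
    return minval, minloc
```

Because MINLOC<2> is 1 only when the low-order minimum is strictly greater, a tie between the two halves resolves to the low-order word in this model.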

Although the present invention has been described in detail with reference to several preferred embodiments, other variations are possible and contemplated. For example, all of the circuits described above may be implemented with any logic devices or logic circuits. The functions of the logic circuits described above may also be implemented in software or firmware within an integrated device. The circuits described above may include any number of inverting devices to provide positive logic or negative logic for any signal. The circuits disclosed herein operate on digital values, or on binary bytes or words, but the number of bits of a digital value or binary code is not limited. Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

Claims

1. A system that uses shared adder circuitry to perform either one of a horizontal minimum instruction and a sum of absolute differences instruction, the system comprising:

a plurality of digital values, wherein for the sum of absolute differences instruction the digital values comprise a first set of digital values and a second set of digital values, and for the horizontal minimum instruction the digital values comprise a plurality of digital value pairs, each digital value pair having an upper digital value and a lower digital value;

a plurality of adders, each adder comparing a first digital value with a second digital value to provide an absolute difference, a carry output, and a propagate output;

a summing circuit that sums the absolute differences to provide a plurality of sums of absolute differences;

a compare circuit that combines the carry outputs and the propagate outputs to find the minimum digital value of the digital value pairs; and

a path selection circuit, wherein when the horizontal minimum instruction is performed, the path selection circuit delivers each of the digital value pairs to at least one adder pair of the adders so that each digital value pair is compared with every other digital value pair, and when the sum of absolute differences instruction is performed, the path selection circuit delivers the first set of digital values and the second set of digital values to the adder pairs to determine the absolute differences between each digital value of the first set of digital values and each digital value of the second set of digital values, the second set of digital values having consecutive digital values.

2. The system of claim 1, wherein, for the sum of absolute differences instruction, the routing circuit delivers the first set of digital values and the second set of digital values from a first bus and a second bus to a third bus and a fourth bus, respectively, and, for the horizontal minimum instruction, the routing circuit delivers the digital value pairs from the first bus to the third bus and the fourth bus.

3. The system of claim 1, wherein, for the sum of absolute differences instruction, the routing circuit delivers each digital value of the first set of digital values to a first adder of a first adder pair of the adders, and delivers each digital value of the second set of digital values to a second adder of the first adder pair of the adders.

4. The system of claim 3, wherein the first adder of the first adder pair provides an absolute difference.

5. The system of claim 1, wherein the sum circuit comprises:

a first adder that sums a first pair of absolute differences provided by a first adder pair of the adders to provide a first sum;

a second adder that sums a second pair of absolute differences provided by a second adder pair of the adders to provide a second sum; and

a third adder that sums the first sum and the second sum to provide an absolute difference sum.

6. The system of claim 1, wherein each adder pair comprises:

an upper adder that compares an upper digital value of a first digital value pair with an upper digital value of a second digital value pair, the upper adder providing the propagate outputs; and

a lower adder that compares a lower digital value of the first digital value pair with a lower digital value of the second digital value pair.

7. The system of claim 1, wherein, for the sum of absolute differences instruction, each of the digital values comprises an unsigned byte, and, for the horizontal minimum instruction, each of the digital values comprises an unsigned word.

8. The system of claim 1, wherein each of the propagate outputs indicates whether a carry input is propagated through the upper adder of an adder pair of the adders.

9. The system of claim 1, wherein the compare circuit comprises:

a first compare circuit that, for each adder pair of the adders, combines the carry outputs with the propagate output to produce a compare bit; and

a second compare circuit that determines the minimum digital value pair of the digital value pairs according to the compare bits.

10. The system of claim 9, wherein the first compare circuit of each adder pair of the adders has an AND gate and an OR gate, the AND gate combining a propagate output of an upper adder with a carry output of a lower adder to produce a first bit, and the OR gate combining the first bit with a carry output of the upper adder to provide a compare bit.

11. The system of claim 9, wherein the second compare circuit decodes the compare bits to provide a plurality of minimum bits, each minimum bit indicating whether a corresponding digital value pair of the digital value pairs is a minimum digital value pair.

12. The system of claim 9, further comprising:

a memory that stores the digital value pairs, wherein the second compare circuit comprises a decode circuit that decodes the compare bits to provide a plurality of minimum bits;

a select circuit that selects, according to the minimum bits, a digital value pair of the digital value pairs as a minimum digital value pair, and stores the minimum digital value pair in the memory; and

a location circuit that provides, according to the minimum bits, a location value indicating the location of the minimum digital value pair in the memory.

13. A method of using common adder circuitry to perform either one of a horizontal minimum instruction and a sum of absolute differences instruction, the method comprising:

receiving a plurality of digital values, wherein for the sum of absolute differences instruction the digital values comprise a first set of digital values and a second set of digital values, and for the horizontal minimum instruction the digital values comprise an upper digital value and a lower digital value;

providing a plurality of adders, each of which compares a first digital value with a second digital value to provide an absolute difference and a carry output;

summing the absolute differences to provide a plurality of absolute difference sums;

grouping the adders into a plurality of adder pairs and providing a propagate output;

combining the carry outputs and the propagate outputs to determine a minimum digital value pair of the digital value pairs; and

for the horizontal minimum instruction, delivering each digital value pair of the digital value pairs to at least one adder pair of the adders so that each digital value pair is compared with the other digital value pairs, and, for the sum of absolute differences instruction, delivering the first set of digital values and the second set of digital values to the adder pairs to determine the absolute difference between each digital value of the first set of digital values and each consecutive digital value of the second set of digital values.

14. The method of claim 13, further comprising, for the sum of absolute differences instruction:

delivering each digital value of the first set of digital values to a first adder of a first adder pair of the adders; and

delivering each digital value of the second set of digital values to a second adder of the first adder pair of the adders.

15. The method of claim 13, wherein the summing step comprises:

summing a first pair of absolute differences provided by a first adder pair of the adders to produce a first sum;

summing a second pair of absolute differences provided by a second adder pair of the adders to produce a second sum; and

summing the first sum and the second sum to provide an absolute difference sum.

16. The method of claim 13, wherein, for the horizontal minimum instruction, the steps of providing and grouping the adders comprise:

comparing, by an upper adder of each adder pair, an upper digital value of a first digital value pair with an upper digital value of a second digital value pair to provide a first carry output and the propagate outputs; and

comparing, by a lower adder of each adder pair, a lower digital value of the first digital value pair with a lower digital value of the second digital value pair to provide a second carry output.

17. The method of claim 13, wherein the combining step comprises:

for each adder pair of the adders, combining a first carry output and a second carry output with one of the propagate outputs to provide a compare bit; and

determining the minimum digital value pair of the digital value pairs according to the compare bits.

18. The method of claim 17, wherein the step of determining the minimum digital value pair of the digital value pairs comprises:

decoding the compare bits to provide a plurality of minimum bits, each minimum bit indicating whether a corresponding digital value pair is the minimum digital value pair.

19. The method of claim 18, further comprising:

storing the digital value pairs in a memory; and

determining, according to the minimum bits, the location in the memory of the minimum digital value pair of the digital value pairs.