CN100418054C - Apparatus and method for generating packed sum of absolute differences - Google Patents


Publication number
CN100418054C
Authority
CN
China
Prior art keywords
package
difference
order
instruction
summation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2005100058802A
Other languages
Chinese (zh)
Other versions
CN1641565A (en
Inventor
强森·丹尼尔
路伯·亚伯特
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/765,497 external-priority patent/US7376686B2/en
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN1641565A publication Critical patent/CN1641565A/en
Application granted granted Critical
Publication of CN100418054C publication Critical patent/CN100418054C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)
  • Executing Machine-Instructions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An apparatus for performing an MMX PSADBW instruction is disclosed. The apparatus includes carry-generating subtraction logic that generates packed differences of the subtrahend from the minuend and associated carry bits indicating whether the difference is positive or negative. The apparatus selectively inverts the differences based on the carry bits. Addition logic adds the selectively inverted differences and carry bits substantially in parallel to generate the PSADBW instruction result. In one embodiment, the apparatus also includes two muxes. The first mux selects the selectively inverted differences in the case of a PSADBW instruction and selects a multiply instruction's partial products otherwise. The second mux selects the carry bits in the case of a PSADBW instruction and selects a second multiply instruction's partial products otherwise. The two mux outputs are provided to the addition logic of the embodiment for the last calculation.

Description

Apparatus and method for generating packed sum of absolute differences
Related application
This application claims priority based on U.S. Provisional Application Serial No. 60/444531, filed January 31, 2003, entitled "APPARATUS AND METHOD FOR GENERATING PACKED SUM OF ABSOLUTE DIFFERENCE".
Technical field
The present invention relates to arithmetic operations in microprocessors, and in particular to a method and apparatus for generating a packed sum of absolute differences in a microprocessor that implements MMX (multimedia extension) technology.
Background
The instruction set of x86-architecture microprocessors includes a packed sum of absolute differences (PSADBW) instruction. The PSADBW instruction takes two 64-bit input operands, each arranged as eight packed unsigned byte integers. One input operand serves as the minuend of the subtraction and the other as the subtrahend. The instruction subtracts each subtrahend byte from the corresponding minuend byte and produces an unsigned 16-bit result, which is the sum of the absolute values of the eight differences. This result is widely useful wherever the instruction is applied, for example in multimedia audio, video, and graphics applications and in scientific applications.
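The behavior described above can be sketched as a small software reference model. This is only an illustration of the arithmetic, not the disclosed hardware; the function name `psadbw_reference` and the convention that byte 0 occupies the least significant eight bits of each 64-bit operand are assumptions made for the example.

```python
def psadbw_reference(minuend: int, subtrahend: int) -> int:
    """Return the unsigned 16-bit sum of |Xi - Yi| over eight packed bytes.

    Both operands are treated as 64-bit values holding eight unsigned bytes;
    byte 0 is assumed to be the least significant byte.
    """
    total = 0
    for i in range(8):
        x = (minuend >> (8 * i)) & 0xFF      # minuend byte Xi
        y = (subtrahend >> (8 * i)) & 0xFF   # subtrahend byte Yi
        total += abs(x - y)
    # The largest possible sum is 8 * 255 = 2040, so it always fits in 16 bits.
    return total & 0xFFFF
```

For instance, `psadbw_reference(0x0D, 0x09)` adds a single nonzero difference of 13 - 9, and an all-ones minuend against a zero subtrahend gives the maximum of 2040.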
One straightforward way for a microprocessor to execute the PSADBW instruction is to subtract the two packed operands to produce the differences, take the absolute value of each difference, and then add the absolute values together one after another. The drawback of this approach is that it takes a relatively large number of processor clock cycles to produce the instruction result, particularly for the serial additions. A faster arithmetic apparatus is therefore needed so that the PSADBW instruction executes more quickly.
Summary of the invention
An object of the present invention is to provide an apparatus, in a microprocessor having an MMX (multimedia extension) unit, for executing the MMX packed sum of absolute differences (PSADBW) instruction.
The apparatus includes carry-generating packed subtraction logic, which subtracts a packed unsigned subtrahend operand from a packed unsigned minuend operand to produce packed differences and a carry bit associated with each difference. An associated carry bit with Boolean value 1 indicates that the corresponding difference is negative; conversely, a Boolean value of 0 indicates that the corresponding difference is positive. The apparatus also includes a multiplexer that selects according to whether the carry bit associated with each difference is 0 or 1. For a negative difference, the absolute value is obtained by inverting the difference and adding its associated carry bit, whose Boolean value is 1. For a positive difference, the difference is left non-inverted and its associated carry bit, whose Boolean value is 0, is added, which likewise yields the absolute value.
In addition, to produce the result of the PSADBW instruction, the selectively inverted differences and the carry bits are added in parallel, that is, at the same time. Put simply, the absolute-value computation is completed by adding the selectively inverted differences and the carry bits together substantially simultaneously, which produces the PSADBW instruction result quickly.
In one embodiment of the apparatus, the MMX unit includes two 16-bit multiplier pipelines. Each multiplier pipeline includes a partial product generator that uses Booth encoding to generate nine partial products. The disclosed apparatus includes addition logic, which comprises a carry-save adder in each multiplier pipeline that adds the partial products to produce a sum. The addition logic further includes a full adder that adds the sums of the two multiplier pipelines. The apparatus also includes a multiplexer in each multiplier pipeline. For a multiply instruction, the multiplexer selects the multiplier's partial products and provides them to the addition logic. For a PSADBW instruction, however, the multiplexers provide the selectively inverted differences and the carry bits to the addition logic.
In one embodiment of the invention, the microprocessor translates the PSADBW macroinstruction into first and second microinstructions, which are executed by the MMX unit.
To achieve the above object, the apparatus provided by the invention has the following technical features:
An apparatus for executing an MMX (multimedia extension) packed sum of absolute differences (PSADBW) instruction, characterized in that it comprises:
a subtractor that generates packed differences of the packed operands of the instruction and generates a carry bit corresponding to each difference; an inverter, coupled to the subtractor, that selectively inverts each difference according to the value of its corresponding carry bit, thereby producing a selectively inverted difference for each packed difference;
at least one first multiplexer, coupled to the inverter and the subtractor, that determines from the carry bit whether the packed difference is positive or negative and accordingly selects an output value corresponding to the packed difference, wherein an associated carry bit with Boolean value 1 indicates that the corresponding difference is negative and, conversely, a Boolean value of 0 indicates that the corresponding difference is positive; and
an adder, coupled to the first multiplexer and the subtractor, that adds the carry bits and the output values of the first multiplexer to produce the result of the MMX PSADBW instruction.
To achieve the above object, the method provided by the invention has the following technical features:
A method for executing an MMX (multimedia extension) packed sum of absolute differences (PSADBW) instruction, characterized in that it comprises:
generating packed differences of the packed operands of the instruction, and generating a carry bit associated with each packed difference;
determining, from the carry bit associated with each packed difference, whether the packed difference is positive or negative;
selecting a value corresponding to the carry bit: if the packed difference is positive, the packed difference itself is selected; if the packed difference is negative, the complement of the packed difference is selected;
providing the microinstruction type to the addend multiplexers;
determining, by the addend multiplexers, whether the microinstruction type is a PMULSAD microinstruction;
if so,
selecting, by the addend multiplexers, the selectively inverted differences and the carry bits as output values;
adding the carry bits to produce a first sum;
adding the corresponding values to produce a second sum; and
adding the first sum and the second sum to produce the instruction result.
The method of the invention further comprises:
storing the carry bits after generating the carry bit associated with each packed difference; and, before generating the carry bits, translating the PSADBW instruction into at least a first microinstruction and a second microinstruction, wherein the carry bits are generated in response to the first microinstruction and the step of adding the carry bits is performed in response to the second microinstruction;
determining, by the addend multiplexers, whether the microinstruction type is a PMULSAD microinstruction;
if not,
selecting, by the addend multiplexers, the partial products from the partial product generators as output values, and adding the partial products to produce the result of the multiply instruction.
Description of drawings
Fig. 1 is a block diagram of the prior-art MMX packed sum of absolute differences (PSADBW) instruction;
Fig. 2 is a block diagram of a microprocessor that executes the PSADBW instruction according to the present invention;
Fig. 3 is a block diagram of the MMX unit of Fig. 2 according to the present invention; and
Fig. 4 is a flowchart of the operation of the microprocessor of Fig. 2 in executing the PSADBW instruction according to the present invention.
Description of main reference numerals:
100 MMX packed sum of absolute differences (PSADBW) instruction
102 PSADBW instruction opcode
104 minuend instruction operand
106 subtrahend instruction operand
108 PSADBW instruction result
200 microprocessor
202 instruction translation logic
204 microinstruction queue
206 MMX unit
306 microinstruction
308 subtractor (carry-generating packed subtraction logic)
312 carry bits of the differences
314 packed differences
316 byte inverters
318 first multiplexers
322 selectively inverted differences
324 microinstruction type
326 second multiplexers
326A addend multiplexer A
326B addend multiplexer B
328 adders
328A first adder (carry-save adder A)
328B second adder (carry-save adder B)
332 full adder
334A partial products A
334B partial products B
336A partial product generator A
336B partial product generator B
338A multiplier A
338B multiplier B
Detailed description
Referring to Fig. 1, a block diagram of the prior-art MMX (multimedia extension) packed sum of absolute differences (PSADBW) instruction 100 is shown. The PSADBW instruction 100 includes an instruction opcode 102, which specifies the MMX PSADBW instruction, and two instruction operands 104 and 106. The first instruction operand 104 is a minuend operand comprising eight packed unsigned bytes, denoted X0 through X7. The second instruction operand 106 is a subtrahend operand comprising eight packed unsigned bytes, denoted Y0 through Y7.
The PSADBW instruction 100 produces a PSADBW instruction result 108, which is the sum of the absolute values of the eight differences obtained by subtracting the subtrahend operand 106 from the minuend operand 104. For a detailed description of the PSADBW instruction, see the 1999 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference, at pages 3-545 through 3-547.
Referring to Fig. 2, a block diagram of a microprocessor 200 for executing the PSADBW instruction is shown. The microprocessor 200 includes instruction translation logic 202, a microinstruction queue 204 coupled to the instruction translation logic 202, and an MMX unit 206 coupled to the microinstruction queue 204.
The main function of the instruction translation logic 202 in the microprocessor 200 is to translate macroinstructions into one or more microinstructions. One such macroinstruction is the PSADBW macroinstruction 100 of Fig. 1. In this embodiment, the macroinstructions comprise instructions of the x86-architecture microprocessor instruction set, such as MMX instructions. The instruction translation logic 202 translates the PSADBW instruction 100 into two microinstructions, shown in Fig. 2 as PMULSAD 212 and PSUBSAD 214. The PSUBSAD microinstruction 214 directs the MMX unit 206 to generate the packed differences of the PSADBW operands, to generate a carry bit corresponding to each difference, and to selectively invert each difference according to the value of its corresponding carry bit. The PMULSAD microinstruction 212 directs the MMX unit 206 to add the selectively inverted differences and the corresponding carry bits, thereby producing the result of the PSADBW instruction. The operation of the PSUBSAD 214 and PMULSAD 212 microinstructions is described in detail with respect to Fig. 3 and Fig. 4.
The instruction translation logic 202 comprises logic, circuits, devices, or microcode (for example, microinstructions or native instructions), or a combination of these, or equivalent elements, used to translate macroinstructions into associated microinstruction sequences. The elements that perform translation within the instruction translation logic 202 may be shared with other circuits, microcode, and so on, that perform other functions within the microprocessor 200. A microinstruction (also commonly called a native instruction) is an instruction at the level that an execution unit, such as the MMX unit 206, executes. For example, a reduced instruction set computer (RISC) microprocessor executes microinstructions directly. In a complex instruction set computer (CISC) microprocessor, such as an x86-compatible microprocessor, x86 instructions are translated into associated microinstructions, which are then executed directly by a unit or units within the CISC microprocessor.
The instruction translation logic 202 provides microinstructions to the microinstruction queue 204, which stores them while they wait to be executed by an execution unit of the microprocessor 200, such as the MMX unit 206. The microinstruction queue 204 has a plurality of microinstruction entries, and it supplies the microinstructions to the execution units of the microprocessor 200, such as the MMX unit 206.
In one embodiment, the MMX unit 206 includes an MMX register file with a plurality of registers for storing instruction operands, such as the minuend operand 104 and the subtrahend operand 106 of the PSADBW instruction of Fig. 1. The MMX unit 206 performs the operations prescribed by the microinstructions passed to it from the preceding pipeline stages of the microprocessor 200. The MMX unit 206 comprises logic, circuits, devices, or microcode (for example, microinstructions or native instructions), or a combination of these, or equivalent elements, used to perform the operations prescribed by the microinstructions. The elements used to perform these operations within the MMX unit 206 may be shared with other circuits or microcode that perform other functions within the microprocessor 200.
In one embodiment, the MMX unit 206 may operate concurrently with other execution units, for example an integer unit, a floating-point unit, and so on. In an embodiment compatible with the x86 architecture, the MMX unit 206 operates concurrently with an x86 integer unit, an x86 floating-point unit, and an x86 SSE unit. As used in this disclosure, an embodiment is compatible with the x86 architecture if it can correctly execute most application programs designed to run on an x86 microprocessor, where an application program is correctly executed if it obtains its expected results. In another x86-compatible embodiment, it is contemplated that the MMX unit 206 is combined with the x86 execution units mentioned above and operates concurrently with them. The MMX unit 206 is described in detail below with respect to Fig. 3 and Fig. 4.
Referring to Fig. 3, a block diagram of the MMX unit 206 of Fig. 2 according to the present invention is shown. The MMX unit 206 includes carry-generating packed subtraction logic (subtractor) 308, which receives a microinstruction 306, for example the PMULSAD 212 or PSUBSAD 214 microinstruction of Fig. 2 supplied by the microinstruction queue 204. The subtractor 308 also receives the minuend operand 104 and the subtrahend operand 106 of the PSADBW instruction of Fig. 1. The subtractor 308 performs the subtraction and produces packed unsigned byte differences 314, one for each subtrahend/minuend pair, shown in Fig. 3 as X7-Y7 through X0-Y0. The values of the differences 314 are computed in two's-complement arithmetic.
For each difference 314, the subtractor 308 also produces a corresponding carry bit 312, shown in Fig. 3 as C7 through C0. In one embodiment, the carry bits 312 are stored in storage elements such as latches or registers. The carry bits 312 are also commonly referred to as borrow bits, underflow bits, or sign bits, because each one indicates whether the minuend 104 needed to borrow a ninth bit, that is, whether the associated difference 314 is positive or negative.
For each difference 314, an associated carry bit 312 with Boolean value 1 indicates that the difference 314 is negative, and a Boolean value of 0 indicates that the difference 314 is positive. For example, suppose the minuend X4 104 is 13₁₀, or 00001101₂, and the subtrahend Y4 106 is 9₁₀, or 00001001₂. The difference 314 of X4-Y4 is then 4₁₀, or 00000100₂, and carry bit C4 312 is 0, because no borrow is needed. That is, C4 indicates that the difference 314 of X4-Y4 is positive, so the absolute value of the difference 314 is simply the difference 314 itself. Now suppose instead that the minuend X4 104 is 9₁₀, or 00001001₂, and the subtrahend Y4 106 is 13₁₀, or 00001101₂. The difference 314 of X4-Y4 is then -4₁₀, or 11111100₂, and carry bit C4 312 is 1, because a borrow is needed. That is, C4 indicates that the difference 314 of X4-Y4 is negative, so the absolute value of the difference 314 is not equal to the difference 314 itself. To obtain the absolute value, the two's complement of the difference 314 is taken: the value of the difference 314 is first inverted, and then 1 is added to the inverted value, giving 00000100₂, or 4₁₀, which is the absolute value of -4₁₀, or 11111100₂.
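The single-byte example above can be checked with a short sketch. The function name `byte_abs_diff` is hypothetical, and modeling the carry bit as "1 when a borrow occurs" follows the description in the text; this illustrates the arithmetic only, not the circuit.

```python
def byte_abs_diff(x: int, y: int):
    """Model one byte lane: 8-bit difference, carry (borrow) bit, absolute value."""
    raw = x - y
    diff = raw & 0xFF                 # 8-bit two's-complement difference
    carry = 1 if raw < 0 else 0       # 1 means a borrow occurred (negative result)
    inverted = ~diff & 0xFF           # bitwise inverse of the difference
    selected = inverted if carry else diff
    # Adding the carry bit to the selected value completes the two's complement
    # for negative differences and leaves positive differences unchanged.
    absolute = (selected + carry) & 0xFF
    return diff, carry, absolute


print(byte_abs_diff(13, 9))   # (4, 0, 4)
print(byte_abs_diff(9, 13))   # (252, 1, 4); 252 is 11111100 in binary
```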
The MMX unit 206 includes eight byte inverters 316. Each byte inverter 316 is coupled to receive a corresponding packed difference 314 and produces its Boolean complement, that is, its bitwise inverse.
The MMX unit 206 also includes eight byte-wide two-input multiplexers 318, referred to herein as the first multiplexers 318, each coupled to a corresponding byte inverter 316. Each multiplexer 318 receives the output of its corresponding byte inverter 316 on one input and the corresponding packed difference 314 on the other. Each multiplexer 318 is controlled by its corresponding carry bit 312. If the carry bit 312 is 0, the multiplexer 318 selects the difference 314; if the carry bit 312 is 1, the multiplexer 318 selects the output of the byte inverter 316. The outputs of the multiplexers 318 are therefore the selectively inverted differences 322 of the eight differences 314 produced by the subtractor 308, shown in Fig. 3 as Z7 through Z0.
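As a rough software picture of the subtract, invert, and select stage across all eight byte lanes, the sketch below returns the Z values and the C bits as Python lists. The name `psubsad_stage` borrows the PSUBSAD microinstruction name from the text, but the function itself, its list-based return values, and the byte ordering (lane 0 in the least significant byte) are assumptions made for illustration.

```python
def psubsad_stage(minuend: int, subtrahend: int):
    """Return (Z, C): selectively inverted differences and carry bits, lane 0 first."""
    selected, carries = [], []
    for i in range(8):
        x = (minuend >> (8 * i)) & 0xFF
        y = (subtrahend >> (8 * i)) & 0xFF
        raw = x - y
        diff = raw & 0xFF                      # packed difference Xi - Yi
        carry = 1 if raw < 0 else 0            # carry (borrow) bit Ci
        inverted = ~diff & 0xFF                # byte inverter output
        # The multiplexer picks the inverted value when the carry bit is 1.
        selected.append(inverted if carry else diff)
        carries.append(carry)
    return selected, carries
```

In every lane, Zi + Ci equals |Xi - Yi|, which is why adding all the Z values and all the C bits later yields the sum of absolute differences.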
The MMX unit 206 also includes two 16-bit multiplier pipelines, shown in Fig. 3 as multiplier A 338A and multiplier B 338B. Each multiplier 338 includes a partial product generator: partial product generator 336A in multiplier 338A produces partial products 334A, and partial product generator 336B in multiplier 338B produces partial products 334B.
In one embodiment, the partial product generators 336 comprise Booth encoders, each of which produces nine 16-bit partial products by examining three bits of the multiplier at a time. The partial products may include additional bits, for example sign-extension bits. Furthermore, when the partial products are added, at least eight of them overlap in at least eight bit positions.
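To see why examining three multiplier bits at a time gives nine partial products for a 16-bit multiplier, here is a minimal sketch of radix-4 Booth recoding. It assumes an unsigned 16-bit multiplier and standard radix-4 recoding; the text does not spell out the exact encoder, so the function names and the use of unbounded Python integers in place of fixed-width partial products are illustrative assumptions.

```python
def booth_radix4_digits(multiplier: int, width: int = 16):
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    # Shift left by one to append the implicit 0 below bit 0; the bits above
    # the top of the multiplier read as 0 (zero extension).
    padded = (multiplier & ((1 << width) - 1)) << 1
    digits = []
    for i in range(width // 2 + 1):            # 9 digits when width is 16
        window = (padded >> (2 * i)) & 0b111   # bits y[2i+1], y[2i], y[2i-1]
        low = window & 1
        mid = (window >> 1) & 1
        high = (window >> 2) & 1
        digits.append(low + mid - 2 * high)
    return digits


def booth_partial_products(multiplicand: int, multiplier: int, width: int = 16):
    """Return the weighted partial products; their sum is the full product."""
    digits = booth_radix4_digits(multiplier, width)
    return [(d * multiplicand) << (2 * i) for i, d in enumerate(digits)]
```

Summing the nine values returned by `booth_partial_products(x, y)` reproduces `x * y` for any unsigned 16-bit `x` and `y`.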
Multipliers 338A and 338B each include a two-input multiplexer, referred to herein as the second multiplexers 326: addend multiplexer A 326A and addend multiplexer B 326B in Fig. 3. Addend multiplexer 326A receives the partial products 334A produced by partial product generator 336A on one input and the carry bits 312 on the other. Addend multiplexer 326B receives the partial products 334B produced by partial product generator 336B on one input and the selectively inverted differences 322 on the other. Each addend multiplexer 326 receives a microinstruction type signal 324 as its control input, which indicates whether the microinstruction is a PMULSAD microinstruction or a multiply instruction. If the microinstruction type signal 324 indicates a PMULSAD microinstruction, addend multiplexer 326A selects the carry bits 312 as its output; otherwise, it selects the partial products 334A. Likewise, if the microinstruction type signal 324 indicates a PMULSAD microinstruction, addend multiplexer 326B selects the selectively inverted differences 322 as its output; otherwise, it selects the partial products 334B.
Multipliers 338A and 338B also include a first adder 328A and a second adder 328B, respectively. In one embodiment, the adders 328 comprise carry-save adders. The first adder 328A receives the output of addend multiplexer 326A, and the second adder 328B receives the output of addend multiplexer 326B. That is, the first adder 328A adds either the partial products 334A or the carry bits 312, as determined by the microinstruction type signal 324. Similarly, the second adder 328B adds either the partial products 334B or the selectively inverted differences 322, also as determined by the microinstruction type signal 324.
In one embodiment, each adder 328 is configured to add at least nine addends of at least 16 bits each. In particular, all the addends overlap in at least eight bit positions, and each adder 328 produces a 32-bit sum.
In one embodiment, each adder 328 comprises a first row of 3:2 carry-save adders, which reduces the nine partial products to six intermediate partial products; a second row of 3:2 carry-save adders, which reduces the six partial products to four; a third row of 3:2 carry-save adders, which reduces the four partial products to three; and a fourth row of 3:2 carry-save adders, which reduces the three partial products to two intermediate values, namely a carry value and a sum value.
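The row-by-row reduction described above can be sketched as a small software model. This is a minimal illustration under assumptions, not the patented circuit: the function names, the 32-bit mask, and the rule that leftover addends pass through a row unchanged are choices made for the sketch.

```python
MASK32 = (1 << 32) - 1  # assumed 32-bit datapath width


def csa(a, b, c):
    """3:2 carry-save adder: three addends in, a sum word and a carry word out."""
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK32, cy & MASK32


def reduce_row(values):
    """One row of 3:2 CSAs; addends that do not fill a group of three pass through."""
    out = []
    full = len(values) - len(values) % 3
    for i in range(0, full, 3):
        out.extend(csa(*values[i:i + 3]))
    out.extend(values[full:])
    return out


def csa_tree(addends):
    """Apply rows of CSAs until only a sum word and a carry word remain."""
    rows = [list(addends)]
    while len(rows[-1]) > 2:
        rows.append(reduce_row(rows[-1]))
    return rows


addends = [0x1234, 0xFFFF, 0x00FF, 0x8000, 0x0001, 0x7FFF, 0xABCD, 0x0F0F, 0xF0F0]
rows = csa_tree(addends)
print([len(r) for r in rows])  # [9, 6, 4, 3, 2]
```

Each row preserves the total of its inputs, so adding the final sum word and carry word in a conventional adder gives the same value as adding all nine addends directly.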
The multimedia extension (MMX) unit 206 also includes an adder 332, which receives the sums produced by first adder 328A and second adder 328B. In one embodiment, adder 332 is a full adder. Full adder 332 adds the outputs of first adder 328A and second adder 328B to produce a final sum. If the microinstruction type signal 324 indicates a PMULSAD microinstruction, this final sum is the packed sum of absolute differences instruction result 108; otherwise, it is the sum of the products generated by the two 16-bit multipliers. In one embodiment, if the desired result is a single 16-bit multiplication product, a zero value is supplied to full adder 332 (that is, as the input from the multiplier that does not perform the 16-bit multiplication), so that full adder 332 outputs the final 16-bit multiplication product. In this embodiment, the two multipliers 338 and full adder 332 are coupled together to perform a 32-bit multiplication.
In this embodiment, the multimedia extension (MMX) unit 206 also includes two carry-save adders (not shown). These carry-save adders reduce the sums and carries from adders 328A and 328B to a single sum and a single carry, which are provided to full adder 332 to produce the final single sum 108.
As described above, the absolute value of each packed difference 314 can be generated more efficiently by adding the carry bit 312 together with the selectively inverted difference 322. That is, if a given packed difference 314 is negative, its absolute value is obtained by inverting the packed difference 314 supplied to first multiplexer 318 and then effectively adding the carry bit 312 (which has a Boolean value of 1 when the packed difference 314 is negative) to the inverted difference. In that case, the inverted packed difference 314, to which the Boolean value 1 is added, serves as the selectively inverted difference 322. Conversely, if a given packed difference 314 is positive, its absolute value is obtained from the non-inverted packed difference 314 supplied to first multiplexer 318, to which the carry bit 312 (which has a Boolean value of 0 when the packed difference 314 is positive) is effectively added. In that case, the non-inverted packed difference 314, to which the Boolean value 0 is added, serves as the selectively inverted difference 322. Moreover, because the carry bits 312 are added at the same time as the selectively inverted differences 322, rather than first generating the absolute value of each difference and then accumulating the absolute values one after another, the packed sum of absolute differences instruction result is generated quickly.
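The identity relied on here, that a selectively inverted byte difference plus its carry bit equals the absolute difference, can be checked with a short Python sketch. The function name and the modelling of the carry bit 312 as a borrow flag (1 when the minuend is smaller than the subtrahend) are assumptions made for illustration.

```python
def selectively_invert(minuend, subtrahend):
    """Return (selectively inverted difference, carry bit) for two unsigned bytes."""
    diff = (minuend - subtrahend) & 0xFF       # 8-bit wrapped difference
    carry = 1 if minuend < subtrahend else 0   # 1 marks a negative difference
    sel = (~diff & 0xFF) if carry else diff    # invert only when negative
    return sel, carry


# Example: 3 - 5 wraps to 0xFE; inverting gives 0x01, and adding the carry bit gives 2.
sel, carry = selectively_invert(3, 5)
print(sel, carry, sel + carry)  # 1 1 2
```

Inverting and then adding 1 is the two's complement negation, so deferring the "+1" to the adder lets the carry bits be summed alongside the differences instead of in a separate step.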
Fig. 4 is a flowchart illustrating execution of the packed sum of absolute differences instruction by microprocessor 200 of Fig. 2 according to the present invention. Flow begins at block 402.
At block 402, microprocessor 200 fetches a packed sum of absolute differences macroinstruction 100 of Fig. 1. Flow proceeds to block 404.
At block 404, the instruction translation logic 202 of Fig. 2 translates the packed sum of absolute differences macroinstruction into a PSUBSAD microinstruction 214 and a PMULSAD microinstruction 212, which are stored in microinstruction queue 204 and then sent to the multimedia extension (MMX) unit 206 of Fig. 2. Flow proceeds to block 406.
At block 406, the multimedia extension (MMX) unit 206 executes PSUBSAD microinstruction 214. In response to PSUBSAD microinstruction 214, the subtractors of subtract logic 308 of Fig. 3 subtract subtrahend 106 from minuend 104 to produce the eight packed differences 314 and the carry bits 312, as shown in Fig. 3. A carry bit 312 with a Boolean value of 1 indicates that the corresponding packed difference 314 is negative; a carry bit 312 with a Boolean value of 0 indicates that the corresponding packed difference 314 is positive. Flow proceeds to block 408.
At block 408, byte inverters 316 invert the packed differences 314. If the corresponding carry bit 312 has a Boolean value of 1, first multiplexer 318 selects the difference inverted by byte inverter 316; if the carry bit 312 has a Boolean value of 0, first multiplexer 318 selects the packed difference 314. The value selected by first multiplexer 318 is the selectively inverted difference 322 of Fig. 3. Flow proceeds to block 412.
At block 412, the microinstruction type signal 324 is provided to second multiplexer 326, as shown in Fig. 3. Flow proceeds to decision block 414.
At decision block 414, second multiplexer 326 determines whether the microinstruction type 324 indicates a PMULSAD microinstruction 212. If so, flow proceeds to block 422; otherwise, flow proceeds to block 416.
At block 416, second multiplexer 326 selects the partial products 334 generated by partial product generators 336 of Fig. 3. Flow proceeds to block 418.
At block 418, adders 328 and 332 of Fig. 3 add the partial products 334 to produce the result of the multiply instruction. Flow ends at block 418; this is the path taken when the microinstruction type 324 indicates a multiply instruction.
At block 422, because the microinstruction type 324 indicates a PMULSAD microinstruction 212, addend multiplexers 326 select the eight selectively inverted differences 322 and the carry bits 312 as their outputs. Flow proceeds to block 424.
At block 424, adders 328 and 332 add the eight selectively inverted differences 322 and the eight carry bits 312 to produce the result 108 of the PSADBW instruction 100. Flow ends at block 424.
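The PMULSAD path of the flowchart (blocks 406, 408, 422 and 424) can be summarised as a two-step software model. This is a behavioural sketch under assumptions, not the hardware: the function names mirror the microinstruction names for readability, operands are modelled as lists of eight unsigned bytes, and the carry bit is modelled as a borrow flag.

```python
def psubsad(minuend_bytes, subtrahend_bytes):
    """Blocks 406 and 408: packed subtract, then selective inversion."""
    sel_inv_diffs, carry_bits = [], []
    for m, s in zip(minuend_bytes, subtrahend_bytes):
        diff = (m - s) & 0xFF                 # packed difference 314
        carry = 1 if m < s else 0             # carry bit 312 (1 means negative)
        sel_inv_diffs.append((~diff & 0xFF) if carry else diff)  # value 322
        carry_bits.append(carry)
    return sel_inv_diffs, carry_bits


def pmulsad(sel_inv_diffs, carry_bits):
    """Blocks 422 and 424: add carry bits and selectively inverted differences."""
    first_sum = sum(carry_bits)       # role of adder 328A
    second_sum = sum(sel_inv_diffs)   # role of adder 328B
    return first_sum + second_sum     # role of full adder 332


def psadbw(minuend_bytes, subtrahend_bytes):
    """Macroinstruction modelled as PSUBSAD followed by PMULSAD."""
    return pmulsad(*psubsad(minuend_bytes, subtrahend_bytes))


print(psadbw([10, 20, 30, 40, 50, 60, 70, 80],
             [80, 70, 60, 50, 40, 30, 20, 10]))  # 320
```

The split mirrors the two microinstructions: the first produces only differences and carry bits, and the second does all of the addition, which is the work that can share adders with the multiplier.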
As can be seen from the foregoing, the present invention executes the packed sum of absolute differences instruction in two microinstructions. In this embodiment, the packed sum of absolute differences macroinstruction 100 executes in four core clock cycles of microprocessor 200. That is, the multimedia extension (MMX) unit 206 performs the packed subtraction and the selective inversion of the differences in a single core clock cycle in response to PSUBSAD microinstruction 214, and performs the addition of the carry bits 312 and the selectively inverted differences 322 in three core clock cycles in response to PMULSAD microinstruction 212.
As can also be seen from the foregoing, second multiplexer 326 enables the multimedia extension (MMX) unit 206 to use adders 328 and 332 more efficiently by selectively executing either the packed sum of absolute differences instruction or a multiply instruction. Reusing shared circuitry in this way can reduce the total amount of circuitry needed to execute multiple instructions.
Although the objects, features, and advantages of the present invention have been described in detail, other embodiments are also encompassed by the invention. For example, although the embodiment is described with respect to the multimedia extension (MMX) version of the packed sum of absolute differences instruction, which operates on 64-bit packed operands, the apparatus is also applicable to the version of the instruction that operates on 128-bit packed operands. Furthermore, although the Boolean values are described in this embodiment as 1 or 0, they may be represented in other ways in the circuit elements, for example by different logic levels or by conventional voltages or currents, without departing from the concept of the invention. Finally, the microprocessor of the embodiment may also include multiple MMX execution units.
Likewise, in addition to implementations using hardware, the invention can be embodied in computer readable code (for example, computer readable program code, data, and so on) embodied in a computer usable medium. The computer code enables the functions or the fabrication, or both, of the invention disclosed herein. For example, this can be accomplished using general programming languages (for example, C, C++, JAVA, and the like), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and Altera HDL (AHDL), or other programming or circuit (for example, schematic) capture tools.
The computer code can be placed in any known computer usable medium, including semiconductor memory, magnetic disk, and optical disk (for example, CD-ROM, DVD-ROM, and the like), and can be embodied as a computer data signal in a computer usable (for example, readable) transmission medium (for example, a carrier wave or any other medium, including digital, optical, or analog media). As such, the computer code can be transmitted over communication networks, including the Internet and intranets. It is therefore understood that the invention can be embodied in computer code (for example, as part of an intellectual property (IP) core, such as a microprocessor core, or as a system-level design, such as a System on Chip (SOC)) and transformed into hardware as part of the production of integrated circuits.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the claims of the invention; all equivalent changes or modifications made without departing from the disclosed spirit should be included within the scope of the claims.

Claims (14)

1. An apparatus for executing a multimedia extension (MMX) packed sum of absolute differences instruction, characterized in that it comprises:
a subtractor (308), which generates the packed differences (314) of the packed operands of the instruction and generates a carry bit (312) corresponding to each packed difference;
an inverter (316), coupled to the subtractor, which selectively inverts each difference according to the value of its corresponding carry bit, thereby generating a selectively inverted difference (322) for each packed difference;
at least one first multiplexer (318), coupled to the inverter (316) and the subtractor (308), for determining from the carry bit whether the corresponding packed difference is positive or negative and for selecting an output value accordingly, wherein a carry bit having a Boolean value of 1 indicates that the corresponding difference is negative and, conversely, a carry bit having a Boolean value of 0 indicates that the corresponding difference is positive; and
an adder (338), coupled to the first multiplexer and the subtractor, which adds the carry bits and the output values of the first multiplexer to generate the result of the MMX packed sum of absolute differences instruction.
2. The apparatus for executing an MMX packed sum of absolute differences instruction as claimed in claim 1, characterized in that it further comprises:
an instruction type (324) input, for indicating whether the apparatus is to execute a packed sum of absolute differences instruction or a multiply instruction; and
a second multiplexer (326), coupled to the first multiplexer (318) and the subtractor, which receives the carry bits generated by the subtractor and which, when the instruction type input indicates a packed sum of absolute differences instruction, provides the carry bits from the subtractor and the output values of the first multiplexer to the adder, and, when the instruction type input indicates a multiply instruction, provides partial products from a partial product generator (336) to the adder.
3. The apparatus for executing an MMX packed sum of absolute differences instruction as claimed in claim 1, characterized in that the adder further comprises:
a first adder (328A) and a second adder (328B), wherein, when a multiply instruction is executed, the first adder (328A) and the second adder (328B) add first partial products (334A) and second partial products (334B), respectively, and, when a packed sum of absolute differences instruction is executed, the first adder (328A) and the second adder (328B) add the carry bits (312) and the selectively inverted differences (322), respectively; and
a third adder (332), coupled to the first adder (328A) and the second adder (328B), which adds a first sum generated by the first adder and a second sum generated by the second adder to generate a packed sum of absolute differences instruction result.
4. The apparatus for executing an MMX packed sum of absolute differences instruction as claimed in claim 3, characterized in that, when a multiply instruction is executed, a zero value is input to the third adder (332), and the third adder (332) generates a partial product value result.
5. The apparatus for executing an MMX packed sum of absolute differences instruction as claimed in claim 3, characterized in that the first sum comprises the sum of the carry bits and the second sum comprises the sum of the outputs of the first multiplexer.
6. A microprocessor for generating a packed sum of absolute differences, characterized in that the microprocessor comprises:
an instruction translator, which translates a multimedia extension (MMX) packed sum of absolute differences macroinstruction into at least a first microinstruction and at least a second microinstruction; and
an MMX unit, coupled to the instruction translator, which generates the result of the packed sum of absolute differences macroinstruction in response to the at least first microinstruction and the at least second microinstruction,
wherein the MMX unit comprises multiplexing logic having a microinstruction type control input, and wherein, if the microinstruction type control input indicates the second microinstruction, the multiplexing logic selects the selectively inverted packed differences of the operands and provides them to an adder as a plurality of addends.
7. The microprocessor for generating a packed sum of absolute differences as claimed in claim 6, characterized in that the MMX unit generates the packed differences of the operands in response to the first microinstruction, the MMX unit comprising a plurality of subtractors for generating the packed differences of the operands, and generates the sum of the absolute values of the packed differences in response to the second microinstruction.
8. The microprocessor for generating a packed sum of absolute differences as claimed in claim 7, characterized in that the plurality of subtractors generate the packed differences of the operands within a single microprocessor clock cycle, and the plurality of subtractors generate a signal corresponding to each packed difference of the operands.
9. The microprocessor for generating a packed sum of absolute differences as claimed in claim 8, characterized in that each packed difference is selectively inverted according to whether the packed difference is positive or negative, wherein the packed difference is inverted if it is negative and is not inverted if it is positive.
10. The microprocessor for generating a packed sum of absolute differences as claimed in claim 9, characterized in that, if the control input indicates that the microinstruction type is not the second microinstruction, the plurality of multiplexers select a plurality of partial products of a multiplier and provide them to the adder as the plurality of addends.
11. An apparatus for generating a packed sum of absolute differences instruction result, located in a microprocessor, the microprocessor having subtract logic for subtracting each subtrahend operand of the instruction from the corresponding minuend operand to generate packed bytes of differences, characterized in that the apparatus comprises:
a plurality of storage elements, each for storing a signal bit, wherein the signal bit indicates whether the corresponding difference is positive or negative;
a plurality of multiplexers, coupled to the corresponding storage elements, each generating an output value, wherein the output value comprises the difference if the signal bit indicates a positive number, and comprises the complement of the difference if the signal bit indicates a negative number; and
multiplexing logic, coupled to the plurality of multiplexers, for selecting partial products to provide to adder logic when a multiply instruction is executed, and for selecting the output values associated with the signal bits to provide to the adder logic when a packed sum of absolute differences instruction is executed.
12. A method for executing a multimedia extension (MMX) packed sum of absolute differences instruction, characterized in that it comprises:
generating the packed differences of the packed operands of an instruction, and generating a carry bit associated with each packed difference;
determining, from the carry bit associated with each packed difference, whether the packed difference is positive or negative;
selecting a value corresponding to the carry bit, wherein the selected value is the packed difference if the packed difference is positive, and is the complement of the packed difference if the packed difference is negative;
providing a microinstruction type to an addend multiplexer;
determining, by the addend multiplexer, whether the microinstruction type is a PMULSAD microinstruction;
if so,
selecting, by the addend multiplexer, the inverted differences and the carry bits as its output values;
adding the carry bits to generate a first sum;
adding the corresponding values to generate a second sum; and
adding the first sum and the second sum to generate an instruction result.
13. the method for execution multimedia elongation technology absolute difference package summation instruction as claimed in claim 12 is characterized in that, more comprises:
After producing this relevant carry bit of this each package difference, store this carry bit; Before producing this carry bit, change this absolute difference package summation and instruct at least one first micro-order and one second micro-order, wherein produce this carry bit by this first micro-order, reach and carry out this carry bit addition step by this second micro-order;
When the addend multiplexer judges whether the micro-order kenel is a PMULSAD micro-order;
If not,
It is output valve that the addend multiplexer is selected partial product by the partial product generator, the partial product addition is produced the result of multiplying order.
14. the method for execution multimedia elongation technology absolute difference package summation instruction as claimed in claim 13, it is characterized in that, this selects the step of the corresponding numerical value of this carry bit, and is the step of carrying out simultaneously by the step that second micro-order is carried out this carry bit addition.
CNB2005100058802A 2004-01-27 2005-01-27 Apparatus and method for generating packed sum of absolute differences Active CN100418054C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/765,497 US7376686B2 (en) 2003-01-31 2004-01-27 Apparatus and method for generating packed sum of absolute differences
US10/765,497 2004-01-27

Publications (2)

Publication Number Publication Date
CN1641565A CN1641565A (en) 2005-07-20
CN100418054C true CN100418054C (en) 2008-09-10

Family

ID=34886504

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100058802A Active CN100418054C (en) 2004-01-27 2005-01-27 Apparatus and method for generating packed sum of absolute differences

Country Status (2)

Country Link
CN (1) CN100418054C (en)
TW (1) TWI249685B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013095599A1 (en) 2011-12-23 2013-06-27 Intel Corporation Systems, apparatuses, and methods for performing a double blocked sum of absolute differences
US10481870B2 (en) * 2017-05-12 2019-11-19 Google Llc Circuit to perform dual input value absolute value and sum operation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377970B1 (en) * 1998-03-31 2002-04-23 Intel Corporation Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry
US20030005267A1 (en) * 2001-06-21 2003-01-02 Koba Igor M. System and method for parallel computing multiple packed-sum absolute differences (PSAD) in response to a single instruction

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377970B1 (en) * 1998-03-31 2002-04-23 Intel Corporation Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry
US20020062331A1 (en) * 1998-03-31 2002-05-23 Abdallah Mohammad A. A method and apparatus for computing a packed sum of absolute differences
US20030005267A1 (en) * 2001-06-21 2003-01-02 Koba Igor M. System and method for parallel computing multiple packed-sum absolute differences (PSAD) in response to a single instruction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Instruction Set Reference. Intel Architecture Software Developer's Manual, No. 2. 1999 *

Also Published As

Publication number Publication date
CN1641565A (en) 2005-07-20
TW200525381A (en) 2005-08-01
TWI249685B (en) 2006-02-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant