GB2365593A - System and method for performing popcount using a multiplier

Info

Publication number: GB2365593A
Application number: GB0104187A
Authority: GB
Inventors: Richard B Zeng
Original assignee: Hewlett Packard Co
Current assignee: HP Inc
Priority date: 2000-02-21
Filing date: 2001-02-20
Publication date: 2002-02-20
Also published as: GB0104187D0

Abstract

A system and method are disclosed which utilize an existing multiplier 300, 302 to perform both multiplication and popcount, thereby eliminating the separate, dedicated circuitry required for performing popcount in the prior art. In a preferred embodiment, a popcount generator for generating a count of the number of high bits of an operand is provided, which comprises a multiplier. The multiplier is configured 30, 32, 34, 36 to receive a first operand and a second operand as inputs, and output a popcount for the first input operand. In a preferred embodiment, the multiplier is configured to perform both multiplication and popcount. That is, in a preferred embodiment the multiplier is configured to output a product resulting from the multiplication of the operands input to the multiplier, if multiplication is enabled for the multiplier, and the multiplier is configured to output the popcount for the first operand, if popcount is enabled for the multiplier.

Description

SYSTEM AND METHOD FOR PERFORMING POPCOUNT USING A MULTIPLIER

RELATED APPLICATIONS

This application is related to co-pending and commonly assigned U.S. Application Serial Number (Attorney Docket No. 10971278) entitled "LINEAR SUMMATION MULTIPLIER ARRAY IMPLEMENTATION FOR BOTH SIGNED AND UNSIGNED MULTIPLICATION," the disclosure of which is hereby incorporated herein by reference.

TECHNICAL FIELD

The present invention relates in general to performing a population count for an operand, and more specifically to a system and method that utilizes a multiplier to perform a population count for an operand.

BACKGROUND

Traditionally, computer processors of the prior art include dedicated circuitry for performing a "popcount" function. Popcount is the function of counting the number of high bits (i.e., 1's) in an input or output operand. That is, popcount counts the population of 1's in an operand. Popcount is commonly performed for multimedia instructions, for example. Prior art processors commonly include a dedicated circuit for performing popcount. That is, prior art processors typically include stand-alone circuitry that performs popcount as its sole function.

Prior art popcount circuitry typically utilizes a Carry-Save-Adder (CSA) array to perform the popcount function. An example of such prior art popcount circuitry is illustrated in FIG. 1. As an example, suppose popcount is being performed for a 16-bit operand A[15:0]. As shown in FIG. 1, the operand A[15:0] would be input to CSA array 100, of which only a portion is illustrated for simplicity. The three least significant bits A[0], A[1], and A[2] are input to CSA 102, the next three significant bits A[3], A[4], and A[5] are input to CSA 104, and the next three significant bits A[6], A[7], and A[8] are input to CSA 106. As shown in FIG. 1, the most significant bit A[15] is input to CSA 112 along with two zeroes. CSA 102 functions to sum bits A[0], A[1], and A[2] to output a sum S1 and carry C1. CSA 104 functions to sum bits A[3], A[4], and A[5] to output a sum S2 and carry C2. Likewise, CSA 106 functions to sum bits A[6], A[7], and A[8] to output a sum S3 and carry C3. It should be understood that CSAs 112, 114, and 116 function in a like manner to sum the input bits and output a sum and carry. The sums S1, S2, and S3 output by CSAs 102, 104, and 106, respectively, are input to CSA 110, which adds the sums and outputs a sum S5 and carry C5. The carries C1, C2, and C3 output by CSAs 102, 104, and 106, respectively, are input to CSA 108, which adds the carries and outputs a sum S4 and carry C4. CSA array 100 comprises further CSAs that function in a like manner to add the sums and carries until ultimately a final sum and final carry are added together to generate the popcount for operand A[15:0].
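The carry-save reduction described above can be modelled in software. The following is a minimal sketch under stated assumptions, not the wiring of FIG. 1: the names `full_adder` and `csa_popcount` are hypothetical, and the order in which bits are grouped into CSAs is chosen for brevity rather than to match CSAs 102 through 116. Each CSA takes three bits of equal weight and emits a sum bit of the same weight and a carry bit of twice that weight; the bits that remain are then added.

```python
def full_adder(a, b, c):
    """One 3:2 carry-save adder cell: three equal-weight bits in, (sum, carry) out."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def csa_popcount(value, width=16):
    """Count the 1 bits of `value` by carry-save reduction (illustrative model)."""
    # columns[w] holds the bits that currently carry weight 2**w.
    columns = {0: [(value >> i) & 1 for i in range(width)]}
    w = 0
    while w in columns:
        bits = columns[w]
        while len(bits) > 2:
            s, carry = full_adder(bits.pop(), bits.pop(), bits.pop())
            bits.append(s)                               # sum keeps weight 2**w
            columns.setdefault(w + 1, []).append(carry)  # carry moves to 2**(w+1)
        w += 1
    # Final add of the surviving bits, each scaled by its weight.
    return sum(bit << weight for weight, bits in columns.items() for bit in bits)
```

For instance, `csa_popcount(0b1011000000000001)` returns 4.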

As discussed above, prior art popcount circuitry, such as that illustrated in FIG. 1, is implemented as dedicated circuitry that only performs popcount. Such dedicated circuitry is problematic because it requires dedicated components and circuitry to be included within a processor chip that only perform a single function (i.e., popcount). Also, the dedicated popcount circuitry of the prior art increases the complexity of the chip's design. Furthermore, such dedicated popcount circuitry of the prior art results in increased time and effort required for both functional verification and electrical verification of the design, which causes delays in design schedules (e.g., delays getting the product to market). Furthermore, because the dedicated circuitry requires such additional components, it increases the cost of producing the chip. Also, the dedicated popcount circuitry of the prior art consumes area within a chip. Given the ever-increasing performance requirements of processor chips and the size restrictions typically placed on such chips, surface area is a valuable asset within a chip. Thus, prior art popcount circuitry consumes this valuable surface area of a chip. Further, the dedicated popcount circuitry of the prior art consumes power, thus requiring additional power for the chip. Moreover, the dedicated popcount circuitry of the prior art generally has a negative impact on the processor's performance. For example, the dedicated popcount circuitry of the prior art typically results in delays in scheduling instructions, thereby hindering the processor's efficiency.

SUMMARY OF THE INVENTION

In view of the above, a desire exists for a system and method for performing popcount that does not require dedicated circuitry within the processor to perform such popcount. A further desire exists for a system and method for performing popcount that utilizes existing circuitry within a processor to perform popcount, rather than requiring a dedicated circuit to solely perform popcount. Accordingly, a desire exists for a system and method that reduces the cost, number of components, amount of circuitry, area consumption, power consumption, and complexity of design of a processor chip by eliminating the dedicated circuitry for performing popcount. Yet a further desire exists for a system and method that performs popcount in a manner that increases a processor's efficiency by allowing faster scheduling to be accomplished by the processor when popcount is performed. Still a further desire exists for a system and method that decreases the schedule required for completion of the design of a processor chip, thereby allowing the processor chip to reach the market in a more timely manner.

These and other objects, features and technical advantages are achieved by a system and method which utilize an existing multiplier to perform both multiplication and popcount. In a preferred embodiment, a popcount generator for generating a count of the number of high bits of an operand is provided, which comprises a multiplier. The multiplier is configured to receive a first operand and a second operand as inputs, and output a popcount for the first operand. In a preferred embodiment, the multiplier is configured to perform both multiplication and popcount. That is, in a preferred embodiment the multiplier is configured to output a product resulting from the multiplication of the operands input to the multiplier, if multiplication is enabled for the multiplier, and the multiplier is configured to output the popcount for the first operand, if popcount is enabled for the multiplier.

In a preferred embodiment, a popcount control signal is utilized to enable the multiplier to perform either multiplication or popcount. For example, when the popcount control signal is high, the multiplier may perform popcount, and when the popcount control signal is low, the multiplier may perform multiplication. In a preferred embodiment, the multiplier comprises a multiplier array. Furthermore, in a preferred embodiment very little circuitry is added to an existing multiplier to enable the multiplier to perform both multiplication and popcount, thereby eliminating the requirement of a dedicated circuit for performing popcount. In a preferred embodiment, the multiplier's circuitry for implementing the "all-inclusive" diagonal of the multiplier array, which includes every bit of the first operand (i.e., the operand for which popcount is desired when popcount is enabled), is configured to output each bit of such first operand as a partial product when popcount is enabled. Furthermore, in a preferred embodiment, the multiplier's circuitry for implementing the all-inclusive diagonal of the multiplier array is configured to output the appropriate product of the elements of such all-inclusive diagonal as a partial product when popcount is not enabled (meaning that multiplication is enabled). That is, when popcount is not enabled the circuitry for the all-inclusive diagonal of the multiplier array produces the appropriate partial products for the respective bits of the first and second operands included within the all-inclusive diagonal, as is desired for performing multiplication. The circuitry for implementing the remaining portions of the multiplier array is as commonly implemented for such multiplier, in a preferred embodiment.

Additionally, in a preferred embodiment, when popcount is enabled for the multiplier, each bit of the second operand is set to 0. Accordingly, because the second operand is set to 0, the resulting partial product for every element of the multiplier array is 0, except for those elements of the all-inclusive diagonal of the multiplier array. Therefore, when popcount is enabled for a preferred embodiment, the resulting partial products from the multiplier array are all 0, except for those partial products of the all-inclusive diagonal of the multiplier array, which correspond to each bit of the first operand. Thus, when the partial products of the multiplier array are input to the CSA array of the multiplier, the output results in the correct popcount for the first operand (i.e., the operand for which popcount is desired when popcount is enabled). On the other hand, when popcount is not enabled, the appropriate partial products of the multiplier array for multiplication of the first and second operands input to the multiplier are generated. Accordingly, when such partial products are input to the CSA array of the multiplier, the output of the multiplier results in the correct product resulting from the multiplication of the first and second input operands, when popcount is not enabled for the multiplier.
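The argument above can be written out as a short derivation. It is a sketch that assumes a plain unsigned N x N array in which the element formed from bits X_i and Y_j carries weight 2^(i+j), and in which the all-inclusive diagonal is taken to be the elements with i + j = N - 1. The symbols P and P_pop are introduced here for illustration only.

```latex
P = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} X_i\,Y_j\,2^{\,i+j}
\qquad\text{(multiplication enabled)}

P_{\mathrm{pop}} = \sum_{i=0}^{N-1} X_i\,2^{\,N-1}
= 2^{\,N-1}\,\operatorname{popcount}(X)
\qquad\text{(popcount enabled: off-diagonal terms are 0, each diagonal term reduces to } X_i\text{)}
```

Under these assumptions the diagonal terms share one column weight, so the count appears in the summed output scaled by 2^(N-1); for X = 1011 and N = 4, P_pop = 8 x 3 = 24. The source does not state how the result is aligned at the output, so this scaling is a property of the model rather than a statement about the disclosed circuit.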

It should be appreciated that a technical advantage of one aspect of the present invention is that a multiplier is implemented to perform both multiplication and popcount, thereby eliminating the dedicated circuitry to only perform popcount that is commonly implemented in the prior art. Accordingly, a dedicated circuitry is not required within the processor to perform popcount. Thus, it should be appreciated that a technical advantage of one aspect of the present invention is that the number of components and amount of circuitry implemented within a processor to allow for multiplication and popcount are minimized because a separate, dedicated circuit is not required to perform each function. Additionally, it should be appreciated that a technical advantage of one aspect of the present invention is that the amount of area required within the processor for the circuitry required for performing multiplication and popcount is minimized. Moreover, a further technical advantage of one aspect of the present invention is it allows for a processor to be produced more economically. That is, the reduction in the number of required components allows for a reduction in production costs for the processor. A further technical advantage of one aspect of the present invention is that because of a reduction in the number of components and complexity, the time and effort required for functional verification and circuit verification of the design is reduced, thereby allowing the product to be sent to market faster. Yet a further technical advantage of one aspect of the present invention is that it implements circuitry for performing both multiplication and popcount in a manner that reduces power consumption by such circuitry. 
Still a further technical advantage of one aspect of the present invention is that it allows for popcount to be performed in a manner that increases the processor's efficiency by allowing faster scheduling to be accomplished by the processor when popcount is performed.

The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:

FIGURE 1 shows an exemplary dedicated circuitry of the prior art for performing popcount;

FIGURE 2 shows an exemplary multiplier array that results from the multiplication of operands X[3:0] and Y[3:0];

FIGURE 3A shows circuitry for generating the partial products of a multiplier array and inputting such partial products to a CSA array to generate a final sum and final carry;

FIGURE 3B shows circuitry for generating output of a multiplier of a preferred embodiment;

FIGURE 4 shows a preferred embodiment of the present invention for implementing partial product circuitry that produces inputs to a CSA in the all-inclusive diagonal of a multiplier to perform both multiplication and popcount; and

FIGURE 5 shows exemplary circuitry that may be implemented for generating the appropriate values to be input to the multiplier array for bits of a Y operand in a preferred embodiment.

DETAILED DESCRIPTION

In a preferred embodiment, a multiplier is utilized not only for multiplication, but also for popcount. To understand the popcount algorithm utilized in a multiplier for a preferred embodiment, attention is directed to FIG. 2. Shown in FIG. 2 is a multiplier array 40 that results from the multiplication of operands X[3:0] and Y[3:0]. Assume that popcount is desired for operand X[3:0]. It will be recognized from multiplier array 40 that area 42 (i.e., the diagonal of multiplier array 40) includes all of the bits X[3:0], i.e., bits X[0], X[1], X[2], and X[3]. Accordingly, if the bits for operand Y[3:0] contained in area 42 of multiplier array 40 were ignored (meaning X3*Y0=X3, X2*Y1=X2, X1*Y2=X1, and X0*Y3=X0), then area 42 may be utilized to perform popcount for operand X[3:0]. That is, by summing all of the bits of the X[3:0] operand found on the diagonal 42 of multiplier array 40, while ignoring the Y[3:0] operand, popcount can be accomplished for the X[3:0] operand. Thus, areas 44 and 46 of the multiplier array 40 may be set to zero (e.g., by setting the Y[3:0] operand to be zero), and the bits for operand X[3:0] found in the diagonal 42 may be utilized to perform popcount for operand X[3:0], ignoring the bits for the Y[3:0] operand found in the diagonal 42. In this manner, the multiplier array 40 for an existing multiplier found within a processor may be utilized to accomplish popcount.

As shown in FIG. 2, it will be understood that area 42 of the multiplier array 40 is the diagonal of the multiplier array 40, which includes all of the bits of the operand X[3:0]. Accordingly, when summing all of the bits for the X[3:0] operand found in the diagonal 42 of the multiplier array 40 and ignoring the bits for the Y[3:0] operand found in the diagonal 42, the popcount result for the X[3:0] operand is obtained. Thus, the diagonal 42 of the multiplier array 40 may be referred to as an "all-inclusive diagonal" of multiplier array 40 because such diagonal 42 includes all bits of the operand for which popcount is desired (i.e., operand X[3:0]).
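The use of the all-inclusive diagonal can be illustrated with a small behavioural model. This is a sketch under stated assumptions rather than the disclosed circuit: the function names `multiplier_array` and `popcount_via_multiplier` are hypothetical, the array is treated as plain unsigned, and the element X_i*Y_j is given weight 2^(i+j). In that model the diagonal elements (i + j = n - 1) all share weight 2^(n-1), so the sketch shifts the summed result right by n - 1 bits to read the count; the source does not describe this alignment step.

```python
def multiplier_array(x, y, n=4, pop_control=0):
    """Sum the partial products of an n x n unsigned multiplier array (model)."""
    total = 0
    for i in range(n):
        for j in range(n):
            x_bit = (x >> i) & 1
            y_bit = (y >> j) & 1
            if pop_control:
                # Popcount mode: Y is zeroed everywhere except on the
                # all-inclusive diagonal, where it is ignored (treated as 1).
                y_bit = 1 if i + j == n - 1 else 0
            total += (x_bit & y_bit) << (i + j)
    return total


def popcount_via_multiplier(x, n=4):
    # Assumption of this model: the diagonal sits at column weight 2**(n-1),
    # so the count is read back by shifting right by n - 1 bits.
    return multiplier_array(x, 0, n, pop_control=1) >> (n - 1)
```

With these definitions, `popcount_via_multiplier(0b1011)` returns 3, while `multiplier_array(0b1011, 0b0110)` returns 66, the ordinary product of 11 and 6.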

As is well known in the art, a multiplier typically includes circuitry to AND each element of the multiplier array 40, such as element X0*Y0, to produce a partial product (e.g., the product of X0*Y0). The partial products of the multiplier array 40 are then input to a CSA array included within the multiplier to generate the final results (the final sum output and final carry output). The final output for multiplication is then generated by summing the final sum and carry in an adder. For example, as shown in FIG. 3A, AND gates, such as AND gates 32, 34, and 36, may be included within a multiplier to each receive an input bit from X[3:0] and from Y[3:0] to produce an element of the multiplier array 40 as input. For instance, AND gate 36 may receive X2 and Y0 as input to produce the partial product for element X2*Y0 of multiplier array 40 as its output. Similarly, AND gate 34 may receive X1 and Y1 as input to produce the partial product for element X1*Y1 of multiplier array 40 as its output. Likewise, AND gate 32 may receive X0 and Y2 as input to produce the partial product for element X0*Y2 of multiplier array 40 as its output. Of course, additional AND gates may be included within a multiplier to produce the partial products for all of the elements of multiplier array 40 in a like manner. As shown in FIG. 3A, the partial products are fed to a CSA array of the multiplier, including CSAs such as CSA 38, to sum the partial products to generate the final sum and final carry.

Once the final sum and final carry are generated by the CSA array, they are added to produce the final product to be output by the multiplier. For example, as shown in FIG. 3B, a multiplier CSA array 300 generates a final sum and carry, which are summed in adder 302.

In a preferred embodiment, adder 302 outputs the final result for multiplication or popcount, depending on which function is enabled.
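The data path of FIGS. 3A and 3B (AND gates, CSA array, final adder) can be sketched at word level as follows. The function names are hypothetical and the order in which rows are fed to the carry-save adders is an assumption made for brevity; only the overall structure (partial products reduced to a final sum and final carry, which are then added) follows the description.

```python
def partial_products(x, y, n=4):
    """Row j is X AND-ed bit by bit with Y[j], placed at its column weight."""
    rows = []
    for j in range(n):
        y_bit = (y >> j) & 1
        row = 0
        for i in range(n):
            row |= (((x >> i) & 1) & y_bit) << (i + j)
        rows.append(row)
    return rows


def csa(a, b, c):
    """Word-level carry-save adder: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def csa_array(rows):
    """Reduce the partial products to a final sum and a final carry."""
    rows = list(rows)
    while len(rows) > 2:
        rows.extend(csa(rows.pop(), rows.pop(), rows.pop()))
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(x, y, n=4):
    final_sum, final_carry = csa_array(partial_products(x, y, n))
    return final_sum + final_carry  # role of the final adder
```

In this sketch no carry propagates until the last line, which mirrors the reason carry-save arrays are used ahead of a single carry-propagate adder.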

A preferred embodiment utilizes a multiplier to not only perform multiplication, but also to accomplish popcount for an operand. Suppose, for example, that popcount is desired for operand X[3:0]. Operand X[3:0] may be input to the multiplier along with operand Y[3:0], wherein each bit of operand Y[3:0] is set to 0. A variety of techniques may be utilized to set operand Y[3:0] to 0. One technique that may be implemented in a preferred embodiment is to read operand Y[3:0] from a special register, which sets Y[3:0] to 0, when popcount is being performed. Another technique that may be implemented in a preferred embodiment is to force the Y[3:0] value input to the multiplier array to zero when popcount is being performed, in the manner shown in FIG. 5. In the exemplary implementation of FIG. 5, bit Y0 of operand Y is input from its register to an AND gate 504. A pop_control signal is input to an inverter 502, the output of which is input to AND gate 504. Thus, when pop_control is a high voltage level, meaning that popcount is enabled, the high voltage value is inverted by inverter 502 resulting in a low voltage value being input to the AND gate 504. Accordingly, the value for Y0 output by AND gate 504 is a 0 (low voltage value), which is desired when popcount is being performed for the X operand. On the other hand, when pop_control is a low voltage level, meaning that multiplication is enabled for the multiplier, the low voltage value is inverted by inverter 502 resulting in a high voltage level being input to the AND gate 504. Accordingly, the value of Y0 from its register will be passed as the output of AND gate 504, which is desired when multiplication is being performed. It should be recognized that any other technique for generating the Y[3:0] operand having a 0 value may be implemented in a preferred embodiment, and any such implementation is intended to be within the scope of the present invention. Accordingly, the multiplier will result in a multiplier array 40, as shown in FIG. 2. Because each bit of operand Y[3:0] is 0 when popcount is being performed, each partial product of the multiplier array 40 in areas 44 and 46 will be 0. Thus, each partial product of areas 44 and 46 of the multiplier array 40 is zeroed out as a bit of the X operand is ANDed with a bit of the Y operand having value 0. As discussed above, having areas 44 and 46 of the multiplier array 40 zeroed out is desirable when performing popcount. However, for the diagonal 42 of the multiplier array 40, the values of the Y[3:0] operand contained therein must be disregarded in a manner that allows the number of 1's in the X[3:0] operand of the diagonal 42 to be counted. In a preferred embodiment, additional circuitry is added to an existing multiplier to allow the multiplier to accomplish this task (i.e., counting the number of 1's in the X[3:0] operand contained in the diagonal 42 of the multiplier array 40), as discussed more fully below in conjunction with FIG. 4.
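The Y-gating of FIG. 5 reduces to a one-line Boolean function. The sketch below models voltage levels as the integers 0 and 1; the function name `gate_y_bit` is hypothetical.

```python
def gate_y_bit(y_bit, pop_control):
    """Model of FIG. 5: Y bit as seen by the multiplier array."""
    inverted = pop_control ^ 1  # inverter 502
    return y_bit & inverted     # AND gate 504


def gate_y_operand(y, pop_control, n=4):
    """Apply the same gating to every bit of an n-bit Y operand."""
    return sum(gate_y_bit((y >> j) & 1, pop_control) << j for j in range(n))
```

With `pop_control` at 1 every Y bit is forced to 0, and with `pop_control` at 0 the register value passes through unchanged.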

In a preferred embodiment, very little circuitry is added to an existing multiplier to allow the multiplier to perform both multiplication and popcount. Turning to FIG. 4, a preferred embodiment for implementing a multiplier to perform both multiplication and popcount is shown. More specifically, FIG. 4 illustrates the circuitry implemented for generating partial products of the diagonal 42 of multiplier array 40, which allows the multiplier to perform both multiplication and popcount, in a preferred embodiment. That is, in a preferred embodiment, the circuitry implementation illustrated in FIG. 4 need only be implemented for the portion of a multiplier that functions to generate the partial products of the diagonal 42 of the multiplier array 40, and the remainder of the multiplier circuitry, which is utilized to generate the partial products of areas 44 and 46 of the multiplier array 40, is as commonly implemented for such multiplier. Thus, in a preferred embodiment, only the circuitry for producing the partial products of diagonal 42 of the multiplier array 40 is implemented as shown in FIG. 4, and the remainder of the multiplier circuitry is unchanged. In a preferred embodiment, an inverter 30, pass gate 32, and P-channel field effect transistor (PFET) 34 are added to the existing multiplier circuitry that functions to produce the partial products of diagonal 42 of the multiplier array 40 (AND gate for the partial product) to enable the multiplier to be utilized for both multiplication and popcount.

As shown in FIG. 4, a popcount control bit (shown as pop_control) is provided to indicate whether popcount is being performed (i.e., whether popcount is enabled for the multiplier). For example, when the pop_control bit is high (1), popcount is enabled for the multiplier, and when the pop_control bit is low (0), multiplication is enabled for the multiplier. To better illustrate operation of a preferred embodiment, suppose that popcount is desired for operand X[3:0]. In a preferred embodiment, operands X[3:0] and Y[3:0] are input to the multiplier, wherein each bit of operand Y[3:0] is set to 0. Accordingly, as discussed above, the partial products of areas 44 and 46 of multiplier array 40 are 0, as a bit of operand X[3:0] is ANDed with a bit of operand Y[3:0] having value 0.

In a preferred embodiment, the portion of the multiplier's circuitry for generating the partial products of the diagonal 42 of multiplier array 40 is implemented as shown in FIG. 4. FIG. 4 illustrates an example of generating the partial product for the element X3*Y0 of the diagonal 42 of multiplier array 40. As shown, operand Y0 is input to pass gate 32, and the pop_control bit is fed to pass gate 32 to control the gate's operation. Node Y0' is output from pass gate 32 and fed to the AND gate 36 along with the X3 operand. Therefore, the AND gate 36 outputs the partial product X3*Y0', which is then fed to the multiplier's CSA (not shown) for summing, as is commonly performed within such multiplier.

Because popcount is enabled for the multiplier, the pop_control bit is set high (1) and each bit of the Y[3:0] operand is set to 0. Because the pop_control bit is high (1), the pass gate 32 is cut off. Additionally, the pop_control bit is fed through inverter 30 resulting in pop_control_b, which is then fed to PFET 34, as shown in FIG. 4. Thus, when pop_control is high (1) (meaning that popcount is enabled), pop_control_b is low (0), which turns on the PFET 34 that is connected to VDD. Thus, node Y0' of FIG. 4 is pulled up to a high voltage value (1) when popcount is enabled. Thereafter, node Y0' having a high value (1) is fed to AND gate 36 along with operand X3, which causes AND gate 36 to output the value of X3 as the partial product; Y0 is ignored and has no effect on the partial product in the diagonal area 42 of the multiplier array 40. Circuitry for generating the partial product for the other elements of the diagonal 42 of multiplier array 40 is implemented in a like manner. Accordingly, when popcount is enabled, only the values of the X[3:0] operand are passed to the multiplier's CSA array to be summed. That is, because areas 44 and 46 of the multiplier array are zeroed and the circuitry of FIG. 4 functions to pass each bit of operand X[3:0] found in the diagonal 42 of multiplier array 40 to the multiplier's CSA array when popcount is enabled, the results of such CSA array are summed to produce the final product that is the popcount of operand X[3:0]. Therefore, the output of the multiplier is the popcount for operand X[3:0] when popcount is enabled.

Suppose now that operands X[3:0] and Y[3:0] are input to the multiplier and popcount is not enabled for the multiplier (meaning that multiplication is enabled). Thus, operands X[3:0] and Y[3:0] are to be multiplied together, and the multiplier is to output the resulting product. Again, the multiplier will produce the multiplier array 40, as shown in FIG. 2. Of course, when multiplication is enabled for the multiplier, the circuitry for generating the partial products of areas 44 and 46 functions as is common for such multiplier because such circuitry of the multiplier is unchanged, in a preferred embodiment. Therefore, the multiplier will generate the appropriate partial product for each element of areas 44 and 46 of the multiplier array 40 and input the generated partial products to the multiplier's CSA array, as is commonly performed for such multiplier.

However, now that multiplication is enabled for the multiplier, the circuitry for generating the partial products of the elements of the diagonal 42 of the multiplier array 40 must generate the appropriate partial products for the elements, rather than passing the value of the X[3:0] operand to the CSA array as with popcount. As discussed above, FIG. 4 illustrates an example of generating the partial product for the element X3*Y0 of the diagonal 42 of multiplier array 40. As shown, operand Y0 is input to pass gate 32, and the pop_control bit is fed to pass gate 32 to control the gate's operation. Node Y0' is output from pass gate 32 and fed to the AND gate 36 along with the X3 operand. Therefore, the AND gate 36 outputs the partial product X3*Y0', which is then fed to the multiplier's CSA (not shown) for summing, as is commonly performed within such multiplier.

Because popcount is not enabled for the multiplier, the pop_control bit is set low (0), which turns the pass gate 32 on. Additionally, the pop_control bit is fed through inverter 30 resulting in pop_control_b, which is then fed to PFET 34, as shown in FIG. 4. Thus, when pop_control is low (0) (meaning that popcount is not enabled), pop_control_b is high (1), which cuts off the pull-up PFET 34. Thus, the value for Y0 is passed through the pass gate 32 to node Y0' when popcount is disabled. That is, node Y0' has value Y0 when popcount is not enabled. Thereafter, node Y0' having value Y0 is fed to AND gate 36 along with operand X3, which causes AND gate 36 to output the partial product of X3*Y0, which is the appropriate action for multiplication. Circuitry for generating the partial product for the other elements of the diagonal 42 of multiplier array 40 is implemented in a like manner. Accordingly, when popcount is not enabled (meaning that multiplication is enabled), the appropriate partial products of the diagonal 42 of multiplier array 40 are passed to the multiplier's CSA array to be summed. Therefore, the output of the multiplier is the product of operands X[3:0] and Y[3:0] when popcount is not enabled.
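The two operating modes of the FIG. 4 diagonal cell can be summarised in a small logic-level model. This is a sketch only: it treats the pass gate and the PFET as ideal switches, models signals as 0 or 1, and uses the hypothetical function name `diagonal_partial_product`.

```python
def diagonal_partial_product(x_bit, y_bit, pop_control):
    """Logic-level model of one all-inclusive-diagonal cell of FIG. 4."""
    pop_control_b = pop_control ^ 1   # inverter 30
    pfet_on = pop_control_b == 0      # PFET 34 conducts when its gate is low
    # Exactly one path drives node Y0': the pull-up to VDD in popcount mode,
    # or pass gate 32 carrying the Y bit in multiplication mode.
    y_node = 1 if pfet_on else y_bit
    return x_bit & y_node             # AND gate 36
```

At the logic level this is equivalent to `x_bit & (y_bit | pop_control)`: the cell outputs the X bit alone when popcount is enabled and the ordinary partial product otherwise.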

Even though operation of a preferred embodiment has been discussed above with reference to two 4-bit operands (X[3:0] and Y[3:0]), it should be understood that a preferred embodiment is not intended to be limited only to 4 x 4 multiplication, but may be implemented for any size multiplication (i.e., any N x N multiplication). In a most preferred embodiment, the multiplier is utilized for 16 x 16 multiplication, as well as 16-bit popcount. Additionally, a preferred embodiment has been described above wherein popcount is enabled upon a pop_control bit being set to a high value (1). However, it will be recognized by one of ordinary skill in the art that alternative embodiments may be implemented in a similar manner such that popcount is enabled upon the pop_control bit being set to a low value (0), and any such embodiment is intended to be within the scope of the present invention.

Thus, a preferred embodiment utilizes an existing multiplier to perform both multiplication and popcount, thereby eliminating the dedicated circuitry typically required within a processor of the prior art for performing popcount. As a result, fewer components and less circuitry are required for the processor design, thereby resulting in reduced cost, reduced area consumption for the design, reduced power consumption, and reduced complexity of the design, as well as lower research and development costs and faster time to market for the resulting processor design. Additionally, a preferred embodiment results in an increase in processor efficiency in that it allows faster scheduling to be accomplished when the processor performs popcount.

It should be understood that the scope of the present invention is intended to encompass any multiplier now known or later discovered being implemented to perform popcount. In a most preferred embodiment, however, a Baugh-Wooley multiplier capable of performing both signed and unsigned multiplication, as disclosed in application serial number (Attorney Docket No. 10971278) entitled "LINEAR SUMMATION MULTIPLIER ARRAY IMPLEMENTATION FOR BOTH SIGNED AND UNSIGNED MULTIPLICATION," the disclosure of which is hereby incorporated herein by reference, is implemented to perform both multiplication and popcount. It should be understood that a preferred embodiment may be implemented within a processor that is utilized for a computer system, such as a personal computer (PC), laptop computer, or personal data assistant (e.g., a palmtop PC). Of course, it should be understood that the present invention is intended to encompass any other type of device in which a preferred embodiment may be implemented, as well.

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims

WHAT IS CLAIMED IS:
1. A popcount generator for generating a count of the number of high bits of an operand, said popcount generator comprising: a multiplier configured to receive a first operand and a second operand as inputs; and said multiplier adapted to output a popcount for said first operand.
2. The popcount generator of claim 1 wherein said multiplier is further adapted to output a product resulting from the multiplication of said first and said second operands, if said multiplier is enabled to perform multiplication, and wherein said multiplier is adapted to output said popcount for said first operand if said multiplier is enabled to perform popcount.
3. The popcount generator of claim 2 wherein said multiplier includes a multiplier array, said popcount generator further including: circuitry for the diagonal of said multiplier array that includes every bit of said first operand adapted to output each bit of said first operand as a partial product of said diagonal of said multiplier array, when popcount is enabled for said multiplier.
4. A computer system comprising: at least one processor, wherein said at least one processor comprises a multiplier; and said multiplier configured to perform both popcount and multiplication, wherein dedicated circuitry for performing only popcount is not required within said processor.
5. The computer system of claim 4 further including: said multiplier configured to receive a first operand and a second operand as inputs; said multiplier configured to output the popcount for said first operand if popcount is enabled for said multiplier; and said multiplier configured to output the product resulting from the multiplication of said first and second operands if multiplication is enabled for said multiplier.

6. The computer system of claim 4 wherein said multiplier provides a multiplier array, further including: circuitry for implementing the diagonal of said multiplier array that includes every bit of said first operand, said circuitry configured to output each bit of said first operand as a partial product of said diagonal of said multiplier array when popcount is enabled for said multiplier; said circuitry for implementing said diagonal of said multiplier array that includes every bit of said first operand further configured to output for each element of said diagonal the product of each element as said partial product when multiplication is enabled for said multiplier; and said multiplier configured to input each partial product resulting from said multiplier array to a CSA array, wherein said CSA array outputs the product resulting from the multiplication of said first operand and said second operand when multiplication is enabled for said multiplier and wherein said CSA array outputs the popcount result for said first operand when popcount is enabled for said multiplier.
7. A method of using a multiplier comprising the steps of: inputting a first operand to said multiplier; and inputting a second operand to said multiplier; and said multiplier outputting the popcount of said first operand if popcount operation is enabled for said multiplier.
8. The method of claim 7 further including the step of: enabling said multiplier for performing popcount operation.

9. The method of claim 7 further including the step of: selecting each bit of said first operand from a diagonal of said multiplier array, said diagonal comprising each bit of said first operand; and setting all partial products of said multiplier array to 0, except for the partial products of said diagonal of said multiplier array, wherein each partial product of said diagonal of said multiplier array is set to the corresponding bit of said first operand.
10. The method of claim 7 further including the step of: enabling said multiplier for performing multiplication, wherein said multiplier outputs the product of said first and second operands.
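
By way of informal illustration only, and forming no part of the claims, the summation of the diagonal partial products by a CSA array may be sketched in Python as follows. The sketch assumes a generic word-wide 3:2 carry-save adder followed by a final carry-propagate addition, which is a standard technique and not necessarily the arrangement of the disclosed array. It further assumes that every diagonal partial product sits at column weight 2^(N-1). The names csa, csa_reduce, popcount_rows, and popcount_via_csa are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder on whole words: a + b + c == s + cy."""
    s = a ^ b ^ c                                # bitwise sum, no carry propagation
    cy = ((a & b) | (a & c) | (b & c)) << 1      # majority bits, shifted to next column
    return s, cy


def csa_reduce(rows):
    """Compress a list of rows to two with CSAs, then do one ordinary add."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(csa(a, b, c))                # three rows in, two rows out
    return sum(rows)                             # final carry-propagate addition


def popcount_rows(x, n):
    """Row i holds only its diagonal partial product: bit i of x at column n - 1.

    Assumption: all other partial products in the row have been set to 0.
    """
    return [((x >> i) & 1) << (n - 1) for i in range(n)]


def popcount_via_csa(x, n=16):
    """Sum the diagonal partial products and read the count from bit n - 1 up."""
    return csa_reduce(popcount_rows(x, n)) >> (n - 1)
```

For example, popcount_via_csa(0b1011, n=4) sums four rows, three of which carry a 1 in column 3, and returns the count of high bits in the operand.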