GB2300054A - Clipping integers - Google Patents

Info

Publication number
GB2300054A
Authority
GB
United Kingdom
Prior art keywords
output
input
mask
bit
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9600781A
Other versions
GB9600781D0 (en
Inventor
Alan H Karp
Dennis Brzezinski
Rajiv Gupta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HP Inc
Original Assignee
Hewlett Packard Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Co
Publication of GB9600781D0
Publication of GB2300054A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/76Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
    • G06F7/764Masking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • G06F9/30014Arithmetic instructions with variable precision
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Description

SYSTEM AND METHOD FOR CLIPPING INTEGERS
The present invention relates to data processing systems and, more particularly, to a method and apparatus for parallel integer clipping.
Since their first use, computers have evolved dramatically. The first computers were primarily used for numerical calculations. As the storage and retrieval capabilities of computers improved, computers were used largely for data management. Whereas in the past computer storage was primarily used for storing text and numerical information, modern computers have storage capabilities sufficiently large to store large quantities of image data.
Another dramatic change in computers is the vast improvement in processor speed. Increased processor speed has also led to many new applications.
Parallel to the evolution of computers has been the evolution of video and sound technology, as well as communications technology. These technologies have become increasingly computerized.
A result of the parallel evolution of these technologies is the merging of applications, for example, enabling a user to interact with stored images and sound. The merging of video and audio technologies with computing is often called Multimedia. Multimedia relies both on modern computers' abilities for large data storage and for rapid calculations.
An important functionality for multimedia applications is the ability to transmit images and sequences of images from one computer to another over existing communication networks, such as telephone networks. The amount of data that needs to be transmitted is huge. For example, one color image on an SVGA terminal requires 780K bytes of data. A sequence of images, such as a motion picture, may contain billions of bytes. Unfortunately, the bandwidth of existing networks tends to be very narrow. To solve the problem of transmitting large quantities of data over narrow bandwidth networks, several industry standard image compression techniques have been devised, e.g., MPEG, JPEG, and H.261. These techniques are continuously being refined. Therefore, while it has been possible to design processors for executing particular compression techniques, it is very difficult to design a processor that may be programmed to handle any one of the current and future compression techniques.
MPEG, JPEG, H.261 and other compression techniques have in common that they perform integer arithmetic operations on the numerical representation of the images they compress. These operations may result in numerical quantities outside the allowed range of the hardware. Consider, for example, that certain standards allow only eight bits per pixel. It would not be unusual for compression operations to produce pixel values with 11-12 significant bits. When it is attempted to display such values on a hardware display with an eight bit per pixel limit, the result is unpredictable. Therefore, it is necessary to clip the results to an acceptable range.
In the prior art, byte clipping to unsigned integer values has been accomplished with the following sequence:
if (ix < 0) ix := 0;
if (ix > 255) ix := 255.
Unfortunately, when translated into machine code, this simple sequence results in code with two compare operations and two branches. Furthermore, unless multiple processors are used, the sequence would have to be executed sequentially on all pixels in an image. Because the bus width and register width of most modern computers are much greater than 8 or 16 bits, performing the clipping operation one pixel at a time does not use the full capability of the computer.
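For reference, the two-compare sequence quoted above can be sketched in Python as follows. The function name `clip_byte_scalar` and the sample pixel values are hypothetical and only illustrate the one-value-at-a-time cost that the text describes.

```python
def clip_byte_scalar(ix: int) -> int:
    """Clip one integer to the unsigned byte range 0..255 using two compares."""
    if ix < 0:      # first compare and branch
        ix = 0
    if ix > 255:    # second compare and branch
        ix = 255
    return ix


# Each pixel needs its own call, so an image is processed sequentially.
pixels = [-20, 92, 300]
clipped = [clip_byte_scalar(p) for p in pixels]
```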
Some architectures, for example RISC, have only a limited number of opcodes. In these architectures it is therefore important to provide new functionality without adding substantially to the number of opcodes used in processors. One such functionality is the clipping of integers to specified ranges. It is therefore desirable to provide these computers or processors with the capability of clipping integers to a specified range using only one machine instruction.
It is therefore desirable to provide a general purpose computer or an image processor with the capability of rapidly clipping integers. It is also desirable to provide a computer or processor with the capability of clipping integers in parallel.
Thus, there is a definite need for an improved technique for high-speed clipping of integers.
Broadly speaking, the invention enables parallel clipping of integer values. Hardware incorporating the invention contains means for storing input data to be clipped. The input data is sub-divided into one or more input blocks which are clipped in parallel. The hardware also contains means for storing a mask. The mask is applied to the input data such that integers that are larger than the largest integer of a specified output range are replaced with that largest integer, and, conversely, integers that are smaller than the smallest integer of the output range are replaced with that smallest integer. In the preferred embodiment, all bits of the input data word are operated upon in parallel.
The mask defines the range of the output data. The necessity of clipping an integer is determined by comparing the mask bits to corresponding data bits. Generally, if at least one set mask bit corresponds to a set data bit, clipping is necessary. If clipping is not necessary, the input data block is passed through to the output data block with any required movement of sign bit. If clipping is necessary, a clip signal is set to indicate to the hardware to force in the required smallest or largest integer into the output block.
The invention allows clipping from both signed and unsigned input data and to both signed and unsigned output data in any combination. However, it is generally not preferred to clip unsigned input to signed output. The data type of the input and output may be encoded in the clip opcode, encoded in the mask, or encoded in a combination of opcode and mask.
It is an object of the invention to allow parallel integer clipping.
It is a further object of the invention to provide for parallel integer clipping and to provide alternate embodiments which can perform the parallel clipping function with the minimum number of opcodes used in a processor.
FIG. 1 is a block diagram of a processor having an integer clipping circuit in accordance with a preferred embodiment of the invention;
FIG. 2 is a block diagram showing the building blocks of the clipping circuit in accordance with a preferred embodiment of the invention and the data, mask, and output registers;
FIG. 3 is a block diagram showing circuitry for determining the value of the sign bit of the input bucket corresponding to each byte of output data;
FIG. 4 is a block diagram showing details of the half-word and byte logic circuits of the clipping circuit of FIG. 2;
FIG. 5 is a block diagram showing the size select gate of each byte logic circuit of FIG. 4; and
FIG. 6 is a logic diagram of the bit logic circuit of the byte logic circuits of FIG. 4.
The invention is intended for use in a general purpose computer processor or any other programmable processor. The invention enables the processor to clip integers using a single assembly language instruction. Furthermore, on processors having a bus width wider than the width of unclipped integers, the invention permits the clipping of several integers in parallel.
Several embodiments of the invention are discussed below with reference to Figures 1 through 6. However, those skilled in the art will readily appreciate that the detailed description given herein with respect to these figures is for explanatory purposes as the invention extends beyond these limited embodiments.
FIG. 1 is a block diagram of a microprocessor 100 having a register file 101, an adder 103, and integer clipping circuitry 105. The adder 103 and the integer clipping circuitry 105 are both connected to accept input from the register file 101 and to return results to the register file 101. The register file 101 is further connected to a memory 107. The register file 101 is operable to selectively load its registers from the memory 107 and to store the contents of registers back into the memory 107.
The microprocessor 100 includes a control unit 109 that is connected to a micro-code memory 111. The control unit is connected via a signal path 113 to the register file 101, the adder 103, the clipping circuit 105, as well as to other components not shown. The micro-code memory 111 contains instructions for controlling the operation of the various components of the microprocessor 100. The control unit is operable to receive instructions from a program memory, e.g., the memory 107. The control unit decodes these instructions and fetches a corresponding micro-code program from the micro-code memory 111. In response to the micro-code instructions, which constitute this micro-code program, the control unit 109 sends signals via the signal path 113 to the various components of the processor 100, including the register file 101, the adder 103, and the clipping circuitry 105.
As a person skilled in the art would know, the microprocessor 100 is connected to other processors and/or peripheral devices via a bus. These components are well understood by persons skilled in the art and are, therefore, not shown.
In the preferred embodiment, the memory 107 contains addressable locations that are each 64 bits wide. Further, the registers in the register file 101 are at least 64 bits wide, and the paths between the memory 107 and the register file 101, between the register file 101 and the adder 103, and between the register file 101 and the clipping circuit 105 are all at least 64 bits wide.
In one application of the invention, the memory 107 contains digitized images. In image processing, it is common to perform arithmetic operations on the integers representing the data associated with a given picture element (pixel). These operations frequently extend the dynamic range of the data beyond that supported by the graphics hardware. To prevent unpredictable behavior, the data is clipped to fall in the range expected by the display hardware.
For example, a device capable of displaying 256 intensity values would have 8-bit (1 byte) pixels. A modification of the image, such as edge enhancement, done with normal 4-byte integer arithmetic might result in some values less than 0 or greater than 255.
For illustrative purposes, consider the following 64 bits:
63                                                                    0
00000000 01011100 00000000 11000010 00000000 10001101 00000000 11110010
Example 1.
These 64 bits exemplify what may be stored in one memory location in the memory 107. In this example, each half word (16 bits) contains the numeric value for the light intensity to be displayed by one pixel. Each pixel value is held in one byte and the other byte contains zeros. Certain arithmetic operations, e.g., those associated with image compression protocols such as MPEG, may cause the pixel values to overflow the 8 bits of the least significant byte of its half word. For example, an arithmetic operation (or series of operations) may cause the following result:
63                                                                    0
00000000 01101001 11111111 11001110 00001000 00110101 11110110 10101001
Example 2.
If the dynamic range for pixel values of a device is 0 to 255, to prevent the unpredictable outputs from those half-words that are less than 0 or greater than 255, the values in the half-words are clipped to conform to the acceptable range. The clipping is done in such a fashion that negative values are replaced by 0 (00000000) and those greater than 255 are replaced by 255 (11111111).
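The effect described above can be checked with a plain arithmetic reference model. The following Python sketch is not the patented circuit; it only assumes that each 16-bit half-word of the 64-bit word is a two's complement value and clamps it to 0 through 255. The function name `clip_halfwords_to_byte` is hypothetical.

```python
def clip_halfwords_to_byte(word: int) -> int:
    """Clamp four signed 16-bit lanes of a 64-bit word to the range 0..255."""
    out = 0
    for lane in range(4):
        value = (word >> (16 * lane)) & 0xFFFF
        if value & 0x8000:            # sign bit set: two's complement negative
            value -= 1 << 16
        value = max(0, min(255, value))
        out |= value << (16 * lane)
    return out


example_2 = 0x0069_FFCE_0835_F6A9     # the bit pattern of Example 2
result = clip_halfwords_to_byte(example_2)
```

With the half-words of Example 2 read as signed values, the two negative half-words become 0, the half-word 0x0835 becomes 255, and 0x0069 passes through unchanged.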
Binary data formats exist for representing signed and unsigned integers. The 2's complement format is the most common representation for signed integers and is used in this description of the preferred embodiments of the invention. This allows three different types of clipping depending on the input and output type, namely, clip unsigned input to unsigned output, clip signed input to unsigned output, and clip signed input to signed output. A fourth possibility is to clip unsigned input to signed output. However, according to the preferred embodiment, such a clipping is illegal and an attempt to do such a clipping results in returning the input data as the result. Of course, alternative embodiments could also allow clipping of unsigned input to signed output.
An instruction, CLIP, directs the processor 100 to clip integer values stored in a register in the register file 101. The CLIP instruction has the following format:
CLIP xyz source, mask, target
where x indicates the size of the input data (D for 64 bits, W for 32 bits, H for 16 bits, and B for 8 bits); y indicates whether the input is unsigned (U) or signed (S); and the flag z indicates whether the output should be clipped to an unsigned (U) or a signed (S) integer. The mask allows the integers stored in the source register to be clipped to any range. Furthermore, the mask allows clipping all the pixel values stored in the register in parallel. In one embodiment, y equals 0 for unsigned input and 1 for signed input; similarly, z equals 0 for unsigned output and 1 for signed output.
The mask contains a 1 for each out-of-range bit and 0 for each in-range bit.
Figure 2 is a block diagram of certain components of the microprocessor 100; namely, it shows the building blocks of the clipping circuit 105 and the data, mask, and output registers. The microprocessor 100 contains a mask register 201, a data register 203, and an output register 205. Each register 201 through 205 is contained in the register file 101. The CLIP instruction indicates to the processor 100 which registers in the register file to use as the data, mask, and output registers, respectively.
The microprocessor 100 also contains the clip logic 105. The clip logic 105 is connected to each register 201 through 205.
The clip logic 105 is built from a set of replicated circuits. The clip logic 105 contains one double word logic circuit 207, two word logic circuits 209a-b, four half-word logic circuits 211a-d, eight byte logic circuits 213a-h, and 64 bit logic circuits 215 (not individually named). The word logic circuits 209 are identical to each other, the half-word logic circuits 211 are identical to each other, the byte logic circuits 213 are identical to each other, and the bit logic circuits 215 are identical to each other. What differentiates the various replicated circuits from one another is how each is connected to other circuit components. Those connections are discussed below in conjunction with Figures 3 through 5. Each bit of the mask register 201 and of the data register 203 is connected as input to a corresponding bit logic 215. Similarly, each bit of the output register is connected as an output to the corresponding bit logic 215.
The parallel integer input data is divided into one or more input blocks. For example, each input block could be a half-word, in which case, supposing 64-bit wide input data, there would be four input blocks, each being 16 bits wide.
Figure 3 is a block diagram of the double word logic 207. This figure shows the circuitry for determining the value of the sign bit of the input data block corresponding to each byte of output data. The high order bit of each byte in the data register is connected to one or more muxes 301a through 301h. For example, bit 63 is connected to all of the muxes, and bit 7 is connected only to mux 301a. Each mux is selected using the BHWD control lines. The BHWD control lines are set from the value of the x field of the CLIP instruction. Outputs DB0 through DB7 are inputs to the logic 105 in Figure 2, and more specifically to the SIGN input in logic 215 in Figure 6. Outputs DB0 through DB7 control these sign inputs on B0 through B7 in logic 213 in Figure 2.
The following table shows for each mux from which bit of the data register 203 the output of that mux is taken depending on the value of the BHWD control lines.
MUX    Half-word (H)  Byte (B)  Word (W)  Doubleword (D)
301a   15             7         31        63
301b   15             15        31        63
301c   31             23        31        63
301d   31             31        31        63
301e   47             39        63        63
301f   47             47        63        63
301g   63             55        63        63
301h   63             63        63        63
Table 1.
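The selection performed by the muxes can be sketched as a small function. This is an illustrative model only: it assumes that each mux outputs the most significant bit of the input block that contains its byte, and that byte index 0 denotes bits 0 through 7. Both are assumptions chosen to agree with the statements that bit 63 feeds every mux and bit 7 feeds only one. The names `BLOCK_BYTES` and `sign_bit_source` are hypothetical.

```python
BLOCK_BYTES = {"B": 1, "H": 2, "W": 4, "D": 8}   # bytes per input block


def sign_bit_source(byte_index: int, size: str) -> int:
    """Return the data-register bit holding the sign of the block that
    contains the given byte (byte_index 0 = bits 0..7, assumed numbering)."""
    n = BLOCK_BYTES[size]
    block = byte_index // n                  # which block the byte belongs to
    return block * n * 8 + n * 8 - 1         # top bit of that block


# Sign-bit source for every byte when the register holds four half-words.
halfword_sources = [sign_bit_source(k, "H") for k in range(8)]
```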
The following table contains pseudo-code for each type of clipping operation. The pseudo-code is with respect to each byte in the data register 203.
Type of clipping / For each byte k:

clip = OR(D(0)M(0), ..., D(7)M(7)) (i.e., clip is 1 if any D(i) is 1 where M(i) is 1)

signed input / signed output:
    if clip=0 then
        if M(i)=1 then O(i)=DB(k)
        else if M(i)=0 then O(i)=D(i)
    else if clip=1 then
        if M(i)=0 then O(i)=not_DB(k)
        else if M(i)=1 then O(i)=DB(k)

signed input / unsigned output:
    if clip=0 then
        if DB(k)=1 then O(i)=0
        else if DB(k)=0 then
            if M(i)=0 then O(i)=D(i)
            else if M(i)=1 then O(i)=0
    else if clip=1 then
        if DB(k)=1 then O(i)=0
        else if DB(k)=0 then
            if M(i)=0 then O(i)=1
            else if M(i)=1 then O(i)=0

unsigned input / unsigned output:
    if clip=0 then O(i)=D(i)
    else if clip=1 then
        if M(i)=0 then O(i)=1
        else if M(i)=1 then O(i)=0

Table 2.
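The per-bit rules of Table 2 can be applied to a whole input block with ordinary integer operations. The Python sketch below is a software model under stated assumptions, not the hardware: `clip_block` and `clip_word` are hypothetical names, the clip signal is evaluated over the whole block rather than byte by byte, and for the signed-to-signed case the clip signal is assumed to be raised when a masked data bit differs from the sign bit, as in the description of the XOR gates 401. The unsigned-to-signed combination returns the input unchanged, as stated for the preferred embodiment.

```python
def clip_block(d: int, m: int, width: int, signed_in: bool, signed_out: bool) -> int:
    """Clip one width-bit block d with mask m, following the rules of Table 2."""
    full = (1 << width) - 1
    db = ((d >> (width - 1)) & 1) if signed_in else 0    # sign bit of the block
    sign_fill = full if db else 0                        # sign replicated in every bit
    if signed_in and signed_out:
        # Assumption: clip when any masked bit differs from the sign bit.
        clip = ((d ^ sign_fill) & m) != 0
        if clip:
            return (m & sign_fill) | (~m & ~sign_fill & full)
        return (m & sign_fill) | (d & ~m & full)
    if signed_in and not signed_out:
        if db:
            return 0                                     # negative input clips to 0
        return (~m & full) if (d & m) else (d & ~m & full)
    if not signed_in and not signed_out:
        return (~m & full) if (d & m) else d
    return d                                             # unsigned to signed: pass through


def clip_word(data: int, mask: int, width: int, signed_in: bool,
              signed_out: bool, total: int = 64) -> int:
    """Apply clip_block to every width-bit block of a total-bit word."""
    lane = (1 << width) - 1
    out = 0
    for shift in range(0, total, width):
        block = clip_block((data >> shift) & lane, (mask >> shift) & lane,
                           width, signed_in, signed_out)
        out |= block << shift
    return out


data = 0x0069_FFCE_0835_F6A9    # half-word input pattern used in the examples
mask = 0xFF00_FF00_FF00_FF00    # upper byte of every half-word is out of range
unsigned_result = clip_word(data, mask, 16, False, False)
signed_to_unsigned = clip_word(data, mask, 16, True, False)
```

For a signed 8-bit output range, a mask that also covers bit 7 of each half-word (for example 0xFF80, a hypothetical choice) makes the model saturate to 127 and -128.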
For this embodiment, wherein the type of the input and the output is obtained from the clip instruction opcode, no data type information is encoded into the mask. Therefore, the mask does not depend upon the data type of the inputs and outputs. For example, the mask applied to clip either signed or unsigned half-word (16 bits) integer inputs to either a signed or unsigned byte (8 bits) integer output is shown below:
63                                                                    0
11111111 00000000 11111111 00000000 11111111 00000000 11111111 00000000
Example 3.
When the mask shown as the bit pattern of Example 3 is applied to the unsigned input word shown as the bit pattern of Example 2, the following unsigned integer result is obtained (the first line is the input bit pattern of Example 2, the second line is the mask bit pattern of Example 3, and the third line is the result):
63                                                                    0
00000000 01101001 11111111 11001110 00001000 00110101 11110110 10101001
11111111 00000000 11111111 00000000 11111111 00000000 11111111 00000000
00000000 01101001 00000000 11111111 00000000 11111111 00000000 11111111
Example 4.
When the mask shown as the bit pattern of Example 3 is applied to the signed input word shown as the bit pattern of Example 2, the following signed integer result is obtained (the first line is the input bit pattern of Example 2, the second line is the mask bit pattern of Example 3, and the third line is the result):
63                                                                    0
00000000 01101001 11111111 11001110 00001000 00110101 11110110 10101001
11111111 00000000 11111111 00000000 11111111 00000000 11111111 00000000
00000000 01101001 11111111 11001110 00000000 01111111 11111111 10000000
Example 5.
When the mask shown as the bit pattern of Example 3 is applied to the signed input word shown as the bit pattern of Example 2, the following unsigned integer result is obtained (the first line is the input bit pattern of Example 2, the second line is the mask bit pattern of Example 3, and the third line is the result):
63                                                                    0
00000000 01101001 11111111 11001110 00001000 00110101 11110110 10101001
11111111 00000000 11111111 00000000 11111111 00000000 11111111 00000000
00000000 01101001 00000000 00000000 00000000 11111111 00000000 00000000
Example 6.
Figure 4 is a schematic of one of the half-word logic circuits, namely, the half-word logic circuit 211d. The half-word logic 211d contains the two byte logic circuits 213g and 213h. The byte logic circuit 213h corresponds to bits 0 through 7 of the data, mask and output words. Similarly, the byte logic circuit 213g corresponds to bits 8 through 15. Each of the byte logic circuits 213 contains bit logic circuits 215 for each bit to which it corresponds.
The data and mask registers are connected to the clip logic so that the bits in the data and mask registers that correspond to one another are input into the byte logic circuit 213 to which those bits correspond. The byte logic circuits each contain eight XOR gates 401, each corresponding to one data bit. Each of the XOR gates 401 has two inputs, namely, the data bit to which it corresponds and the sign bit, which is determined for the corresponding byte as shown in Figure 3.
The outputs from the XOR gates 401 are bitwise-ANDed with their corresponding MASK bits in AND gates 403. The outputs from these ANDing operations are ORed (in gate 405) together with the output from a size select gate 407. The output of the OR gate 405 is the CLIP signal of the pseudo-code of Table 2. The CLIP signal indicates whether the data needs to be clipped or if it can be passed through to the output register 205. The CLIP signal is input to each of the bit logic circuits 215.
Each byte logic circuit 213, except for the highest order byte logic circuit 213a, has a size select gate 407. Each size select gate 407 is an AND gate. One input to each size select gate 407 is the value of the CLIP signal for the next higher order byte logic circuit 213. Each of the size select gates 407 also receives as input a signal 409 that indicates whether the next higher order byte is part of the same input data as the present byte. The signal 409 depends on the size of the input block and where the particular byte in question fits into the input register 203. Figure 5 shows the inputs to the various size select gates 407 for the eight bytes in a 64 bit register. For each of byte logic circuits 213b, 213d, 213f, and 213h, the signal 409 is the value "H or W or D", where H is 1 if the input register is divided on half-word boundaries, W is 1 if the input register is divided on word boundaries, and D is 1 if the input register is divided on double-word boundaries (i.e., the register has only one input). If H, W, and D are all zero, the input register is divided on byte boundaries.
For each of byte logic circuits 213c, 213e, and 213g, the signal 409 is "W or D".
Table 3 below is a truth table for the operation of the bit logic circuits 215 in Figure 6. Table 3 is divided into three sub-tables, one for signed-input/signed-output (Table 3a), one for signed-input/unsigned-output (Table 3b), and one for unsigned-input/unsigned-output (Table 3c).
clip   db(k) (sign)   m(i) (mask)   d(i) (data)   output(i)
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 - 0 1 1 1 1. 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
Table 3a: Truth table for bit logic 215 for the signed-input/signed-output case.
clip   db(k) (sign)   m(i) (mask)   d(i) (data)   output(i)
0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 -- 0 0 0 1 11 0 1 1 0 0 0 1
Table 3b: Truth table for bit logic 215 for the signed-input/unsigned-output case.
clip   db(k) (sign)   m(i) (mask)   d(i) (data)   output(i)
-0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 -0 1 0 0 0 0 - 9 1 0 1 -1 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 io 1 1 1 1 0 0 i 1 i 0 1 0
Table 3c: Truth table for bit logic 215 for the unsigned-input/unsigned-output case.
Figure 6 is a schematic of the bit logic circuits 215 corresponding to the truth tables of Table 3.
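A truth table for each clipping type can be enumerated mechanically from the per-bit rules of Table 2. The Python sketch below is derived from that pseudo-code and is not a transcription of Table 3; the names `output_bit` and `truth_table` and the short labels "SS", "SU" and "UU" are hypothetical. It lists all sixteen input combinations, including some that may never occur in the circuit.

```python
from itertools import product


def output_bit(kind: str, clip: int, db: int, m: int, d: int) -> int:
    """Per-bit output following Table 2 (kind: 'SS', 'SU' or 'UU')."""
    if kind == "SS":                     # signed input, signed output
        if m:
            return db                    # masked bits take the sign
        return (1 - db) if clip else d
    if kind == "SU":                     # signed input, unsigned output
        if db or m:
            return 0                     # negative input or masked bit gives 0
        return 1 if clip else d
    if kind == "UU":                     # unsigned input, unsigned output
        if not clip:
            return d
        return 0 if m else 1
    raise ValueError(kind)


def truth_table(kind: str):
    """Rows of (clip, db, m, d, output) for every input combination."""
    return [(c, s, m, d, output_bit(kind, c, s, m, d))
            for c, s, m, d in product((0, 1), repeat=4)]


for row in truth_table("UU"):
    print(*row)
```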
In the embodiment discussed above in conjunction with Figures 1 through 6, the type of clipping is indicated by the instruction itself, by way of two opcode completer bits, y and z, and passed to the clipping circuit as signals IN_TYPE and OUT_TYPE. There are three alternative embodiments that use bits in the mask register to remove some or all of the requirement for having opcode completer bits y (signed/unsigned input) and/or z (signed/unsigned output).
The three alternative embodiments are summarized in Table 4.
Table 4: Use of mask and completer bits to indicate data type of input and output in alternative embodiments.
Embodiment II. Mask: AXXX XXXX XXXX XXXX
    A: input type; A=0, signed; A=1, unsigned.
    The z completer in the opcode controls output type selection.
Embodiment III. Mask: ABXX XXXX XXXX XXXX
    A: input type; A=0, signed; A=1, unsigned. If A=1, B is not controllable.
    B: output type, iff A indicates signed input; B=0, signed output; B=1, unsigned output.
Embodiment IV. Mask: AXXX XXXX XXXX XXXX
    A: output type; A=0, signed; A=1, unsigned.
    The y completer in the opcode is used for input type selection.
In the second embodiment, the most significant bit of the mask, bit A, is used to indicate if the input data is signed or unsigned (an opcode completer, z, contains the output data type selection). This can be done because there is no reason to clip a 16 bit signed input to a 16 bit signed output; and for 16 bit signed input data, the 16th bit must be the sign bit and hence would never be masked off. In the embodiment described below, for signed input data, A equals 0, and for unsigned input data, A equals 1.
In the third embodiment, the most significant bit in the mask, bit A, is used to indicate if the input data is signed or unsigned. If the input data is signed, the mask bit B tells the hardware state machine if the output should be clipped to a signed or unsigned value, thereby removing the need for opcode completer z. When B equals 0, the hardware clips the output to signed integers, and when B equals 1, the hardware clips to unsigned integers. It is possible to use bit B to encode the output data type because when clipping to a 15 bit signed output, the fifteenth bit, which is the bit under mask bit B control, is the sign bit, and, thus, does not need to be clipped.
In the fourth embodiment, opcode completer bit y controls the input data type and mask bit A controls the output. When A equals 0, the hardware clips to signed output, and when A equals 1, the hardware clips to unsigned output. The use of the opcode completer y on the input specification provides the hardware earlier knowledge of the input typing to ease timing.
Table 4a contains examples for the fourth embodiment and the case where the input is a halfword for the case of signed output.
1x0x xxxx xxxx xxxx (signed data)
0111 1111 1000 0000 (mask)
1111 1111 1000 0000 (signed result = -128)

0x1x xxxx xxxx xxxx (signed data)
0111 1111 1000 0000 (mask)
0000 0000 0111 1111 (signed result = 127)

1xxx xxxx xxxx xxxx (signed data)
1111 1111 0000 0000 (mask)
0000 0000 0000 0000 (unsigned result = 0)

0x1x xxxx xxxx xxxx (signed data)
1111 1111 0000 0000 (mask)
0000 0000 1111 1111 (unsigned result = 255)

Table 4a: Examples of clipping using one completer bit and signed output.
The sign bit of the mask indicates whether the input is treated as signed or unsigned. The following pseudo-code describes the clipping operation:
sb := sign bit of input;
IF mask[sb] = 0
    IF any bit in (mask && data) != data[sb]
        IF data[sb] = 1
            result := mask
        ELSE
            result := !mask
    ELSE
        result := data
ELSE
    IF data[sb] = 1
        result := 0
    ELSE IF mask && data != 0
        mask[sb] := !mask[sb]
        result := !mask
    ELSE
        result := data.
Table 4b: Pseudo-code for clipping to signed output using one completer bit.
The following example illustrates clipping for the unsigned case using one completer bit:
CLIP,U data, mask, result

1xxx xxxx xxxx xxxx (unsigned data)
1111 1111 0000 0000 (mask)
0000 0000 1111 1111 (result = 255)

Table 4c: Clipping to unsigned result using one completer bit.
The corresponding pseudo-code is shown below:
IF mask && data != 0
    result := !mask
ELSE
    result := data.
Table 4d: Pseudo-code for clipping to unsigned result using one completer bit.
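The unsigned case of Table 4d translates almost directly into code. In the Python sketch below, the pseudo-code's `mask && data` is read as a bitwise AND and `!mask` as a bitwise complement limited to the block width; both readings, the function name `clip_unsigned`, and the sample data values are assumptions made for illustration.

```python
def clip_unsigned(data: int, mask: int, width: int = 16) -> int:
    """Clip an unsigned width-bit value: saturate if any masked bit is set."""
    full = (1 << width) - 1
    if mask & data:              # some out-of-range bit is set in the data
        return ~mask & full      # largest value that fits under the mask
    return data                  # already in range: pass through


# Half-word with its top bit set, clipped with the mask 1111 1111 0000 0000.
result = clip_unsigned(0x8001, 0xFF00)
```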
The third embodiment uses the fact that the leading bit of the mask is always 1 for the unsigned case. This is due to the fact that clipping to the full width of the input data is not necessary, e.g., for half-words clipping to the full 16-bit range is never done. Similarly, for signed output, it does not make sense to clip to a 15-bit signed range. Therefore, it is possible to use the two leading bits of the mask to distinguish the cases.
The following half-word examples illustrate the third embodiment:
CLIP data, mask, result

1x0x xxxx xxxx xxxx (signed data)
0011 1111 1000 0000 (mask)
1111 1111 1000 0000 (signed result = -128)

0x1x xxxx xxxx xxxx (signed data)
0011 1111 1000 0000 (mask)
0000 0000 0111 1111 (signed result = 127)

11xx xxxx xxxx xxxx (signed data)
0111 1111 0000 0000 (mask)
0000 0000 0000 0000 (unsigned result = 0)

0x1x xxxx xxxx xxxx (signed data)
0111 1111 0000 0000 (mask)
0000 0000 1111 1111 (unsigned result = 255)

1xxx xxxx xxxx xxxx (unsigned data)
1111 1111 0000 0000 (mask)
0000 0000 1111 1111 (unsigned result = 255)

Table 5: Examples of clipping where clipping type is encoded in the first two bits of the mask.
The following pseudo-code shows the operation of the clipping operation wherein the type of clipping is encoded in the first two bits of the mask:
sb := sign bit; sb2 := sb - 1;
IF mask[sb:sb2] = 00
    if any bit in (mask && data) != data[sb]
        if data[sb] == 1
            result := mask
        else
            result := !mask;
    else
        result := data;
ELSE IF mask[sb:sb2] = 01
    if data[sb] == 1
        result := 0
    else if mask && data != 0
        mask[sb] := !mask[sb]
        result := !mask;
    else
        result := data
else if mask[sb:sb2] = 11
    if mask && data != 0
        result := !mask
    else
        result := data
else ERROR! when mask[sb:sb2] = 10.
Table 5b: Pseudo-code for clipping where type of clipping is encoded in the first two bits of the mask.
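As a rough illustration of the third embodiment, the following Python sketch decodes the clipping type from the two leading mask bits, using the encodings that appear in the examples of Table 5 (00 for signed-to-signed, 01 for signed-to-unsigned, 11 for unsigned-to-unsigned). The function name `clip_encoded` and the `width` parameter are assumptions made for this sketch; it models the behaviour in software and says nothing about how the clipping logic is built.

```python
def clip_encoded(data: int, mask: int, width: int = 16) -> int:
    """Clip `data`, reading the clipping type from the two leading mask bits."""
    full = (1 << width) - 1
    top = 1 << (width - 1)                 # position of the sign bit
    lead2 = top | (top >> 1)               # the two leading bit positions
    data &= full
    mask &= full
    negative = bool(data & top)
    code = (mask & lead2) >> (width - 2)   # 0b00, 0b01, 0b11 or 0b10

    if code == 0b00:                       # signed input, signed output
        mask |= lead2                      # restore the real mask bits
        in_range = mask if negative else 0 # masked bits must all equal the sign
        if (data & mask) != in_range:
            return mask if negative else (~mask & full)
        return data
    if code == 0b01:                       # signed input, unsigned output
        if negative:
            return 0
        if data & mask:
            mask |= top                    # restore the leading mask bit
            return ~mask & full
        return data
    if code == 0b11:                       # unsigned input, unsigned output
        return (~mask & full) if (data & mask) else data
    raise ValueError("leading mask bits 10 do not encode a clipping type")
```

For instance, `clip_encoded(0x8000, 0x3F80)` yields `0xFF80`, the 16-bit pattern for -128, matching the first example of Table 5.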
The clipping logic 105 is operable to use any of the three alternative embodiments.
An alternative embodiment allows for variable width integers. In the alternative embodiment, each integer to be clipped may be of different width than each other integer stored in a common register, and each integer may be clipped to a different range.
As an example, the following 64-bit bit pattern contains six input elements, each 10 bits wide:
     6          5          4          3          2          1
0000 0000011010 0111100110 1110111000 1010000011 0101111101 0010101001 Example 7.
Numbers 1 and 6 are positive numbers that fit in the output range; numbers 2 and 5 are positive numbers that do not fit the output range; number 4 is a negative number that fits in the output range; and number 3 is a negative number that does not fit in the signed output range. The pseudo-code of the first embodiment is applied for clipping. The following bit pattern is used to clip six ten-bit numbers to an 8-bit unsigned range:
0000 1100000000 1100000000 1100000000 1100000000 1100000000 1100000000 Example 8.
The pseudo-code of Table 1 is used to perform the clipping operation. Applying the bit pattern of Example 8 to the bit pattern of Example 7, using the method of the pseudocode of Table 1, results in the following output:
0000 0000011010 0011111111 0000000000 0000000000 0011111111 0010101001 Example 9.
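The step from Examples 7 and 8 to Example 9 can be checked with a short Python sketch. Table 1 is not reproduced here, so the rule used is an assumption read off the result itself: negative fields clip to 0, positive fields with a set out-of-range bit clip to the complement of the mask, and all other fields pass through. The function name and the explicit `field_width` and `count` arguments are hypothetical conveniences; the text states that the hardware infers the layout from the mask.

```python
def clip_fields_to_unsigned(data: int, mask: int, field_width: int, count: int) -> int:
    """Clip `count` packed signed fields to the unsigned range left open by `mask`."""
    field_ones = (1 << field_width) - 1
    sign_bit = 1 << (field_width - 1)
    result = 0
    for i in range(count):                 # field 0 is the rightmost element
        shift = i * field_width
        d = (data >> shift) & field_ones
        m = (mask >> shift) & field_ones
        if d & sign_bit:                   # negative input clips to 0
            out = 0
        elif d & m:                        # positive but too large
            out = ~m & field_ones          # largest in-range value
        else:
            out = d                        # fits: copy unchanged
        result |= out << shift
    return result


# Bit patterns copied from Examples 7, 8 and 9 (4 unused bits + six 10-bit fields).
EXAMPLE_7 = int("0000" "0000011010" "0111100110" "1110111000"
                "1010000011" "0101111101" "0010101001", 2)
EXAMPLE_8 = int("0000" + "1100000000" * 6, 2)
EXAMPLE_9 = int("0000" "0000011010" "0011111111" "0000000000"
                "0000000000" "0011111111" "0010101001", 2)

if __name__ == "__main__":
    print(clip_fields_to_unsigned(EXAMPLE_7, EXAMPLE_8, 10, 6) == EXAMPLE_9)  # True
```

The loop processes the fields one after another only for clarity; in the described hardware all fields would be clipped in parallel.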
To clip an input to an eight-bit signed range the following mask bit pattern is applied:
0000 1110000000 1110000000 1110000000 1110000000 1110000000 1110000000 Example 10.
Applying the mask of Example 10 to the input of Example 7 using the signed output pseudo-code of Table produces the following result:
0000 0000011010 0001111111 1110111000 1110000000 0011111111 0010101001
The clipping instruction for the unsigned and signed integer clipping of the alternative embodiment is:
CLIP t source, mask, target
where t is either U, for unsigned, or S, for signed.
Note that the instruction gives no indication of input data size or whether parallel execution is to be used. The hardware automatically infers this information from the mask. The data is assumed to be right justified in the register. The first 1 in the mask indicates the position of the sign bit of the first data element; the first 0 indicates the position of the highest order bit in the output field. The next 1 indicates the sign of the next number; the next 0 indicates the highest order bit in the next output field, and so on, until the end of the mask word.
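The way field boundaries can be read out of the mask may be easier to follow in code. The Python sketch below scans a mask from the most significant bit downward and records, for each field, the sign-bit position, the highest-order output bit, and the lowest bit of the field. It is an illustrative model only: the function name `fields_from_mask`, the 64-bit default register width, and the requirement that every field contain at least one 0 mask bit are assumptions of this sketch, not statements from the text.

```python
def fields_from_mask(mask: int, reg_width: int = 64) -> list[tuple[int, int, int]]:
    """Return (sign_pos, top_output_pos, low_pos) for each field, leftmost first."""
    sign_positions = []    # each 0 -> 1 transition marks a field's sign bit
    top_positions = []     # each 1 -> 0 transition marks the top output bit
    prev = 0               # bits to the left of the register count as 0
    for pos in range(reg_width - 1, -1, -1):
        bit = (mask >> pos) & 1
        if bit and not prev:
            sign_positions.append(pos)
        elif prev and not bit:
            top_positions.append(pos)
        prev = bit
    if len(top_positions) != len(sign_positions):
        raise ValueError("every field needs at least one in-range (0) mask bit")
    # A field ends just above the next field's sign bit; the last ends at bit 0.
    low_positions = [p + 1 for p in sign_positions[1:]] + [0]
    return list(zip(sign_positions, top_positions, low_positions))
```

Applied to the mask of Example 8, this sketch reports six fields, the leftmost with its sign bit at position 59 and an output field spanning bits 57 down to 50, which corresponds to ten-bit inputs clipped to eight-bit outputs.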

Claims (29)

1. A method of operating a digital processor to clip binary input integers to a specified range using one instruction, comprising the steps of:
(a) defining a clip instruction having a mask field; (b) accepting a mask from said mask field wherein a bit is set in said mask for each out-of-range bit and not set for in-range bits; and (c) applying said mask to said input integers so that any integer outside of said range is clipped to the quantity in the range closest to said integer, thereby producing output integers within a range specified by said mask.
2. The method of Claim 1 wherein said masking operation is performed according to the following steps:
using said mask to determine for each byte whether clipping is necessary; if clipping is necessary using said mask to define where to set the output to zero and where to set the output to one.
3. The method of Claim 2 where the step of determining whether clipping is necessary comprises the further step of obtaining a clip signal of a next higher order byte and ANDing said higher order clip signal with a size expression for said byte, thereby producing a size select signal for said byte.
4. The method of Claim 3 further comprising the step of ANDing the input bits in the byte with their corresponding mask bits.
5. The method of Claim 4 further comprising the step of ORing the results of ANDing the input bits and the mask bits with the size select signal for said byte.
6. The method of Claim 1 wherein said clipping instruction causes said digital processor to clip input integers having a first data type to output integers having a second data type.
7. The method of Claim 6 wherein said first data type and said second data type are selected from the group containing the elements signed data and unsigned data.
8. The method of Claim 7 wherein said clipping instruction may indicate input data of both signed and unsigned integers and output clipped data of both signed and unsigned integers and wherein opcode completer bits of said instruction indicate the data types of integers in said input data and in said output data.
9. The method of Claim 8 wherein said clipping instruction contains opcode completer bits for specifying said data types.
10. The method of Claim 8 wherein said mask contains codes for specifying said data types.
11. The method of Claim 7 wherein said clipping instruction may indicate input data of both signed and unsigned integers and output clipped data of both signed and unsigned integers and wherein said mask indicates the data types of integers in said input data and in said output data.
12. The method of Claim 1 wherein said masking operation is performed according to the following steps:
determine for each byte in the output whether clipping is necessary;
determine clipping type;
for signed input to signed output:
    if clipping is not necessary,
        for each set mask bit set the output bit to the sign bit of the input data block
        for each other bit set the output bit to the data bit
    if clipping is necessary,
        where the mask is set, set the output bit to the sign bit of the input data block
        where the mask is not set, set the output bit to the inverse of the sign bit of the input data block;
for signed input to unsigned output:
    if clipping is not necessary
        if the input is negative, set the output to 0
        if the input is positive, where the mask is zero copy the input to the output and where the mask is 1 set the output to 0
    else if clipping is necessary
        if the input is negative, set the output to 0
        else set the output to NOT mask;
for unsigned input to unsigned output:
    if clipping is not necessary
        write the input byte into the output destination
    else if clipping is necessary
        where the mask is set, set the output to 0 and where the mask is not set, set the output to 1.
13. A method of operating a digital processor to clip binary input integers in parallel, comprising the steps of:
(a) accepting input data having at least one input data block; (b) accepting a mask having a bit corresponding to each bit in said input data; and (c) applying said mask to said input data so that each input data block in said input data is clipped in parallel to produce output data containing said clipped integers.
14. The method of Claim 13 wherein said digital processor clips said integers in response to a single instruction having a mask field, a data field, and an output field, and wherein said mask field points to said mask, said data field points to said input data; and said output field points to a storage location for said output data.
15. The method of Claim 14 wherein said input integers and said output clipped integers are of specified data types from the set of unsigned integers and signed integers.
16. The method of Claim 15 wherein said data types of said input integers and output integers are indicated by op code completer bits.
17. The method of Claim 15 wherein at least one of said data types of said input integers and output integers is indicated by codes contained in said mask.
18. A circuit for clipping integers stored in an input source logically divided into at least one input data block, each input data block having a size, to produce a clipped result stored in an output destination, comprising:
a control circuit operable to accept a clip instruction as input, said clip instruction defining the size of said input data blocks, wherein each input data block contains one integer;
a mask source connected to said control circuit for storing a mask corresponding to each input data block;
a clip determination circuit operable to compare said mask to input stored in said input source thereby determining whether clipping is required for each output block and to produce a clip signal whose value depends on said clip determination; and
at least one bit-level clip circuit connected to said clip determination circuit, said input source, said mask source, and said output destination such that each bit-level clip circuit provides a one-to-one mapping between said input source and said output source, and wherein all bit-level clip circuits corresponding to one input data block are collectively selectively operable, responsive to said clip signal, to apply said mask to said input so that those bits corresponding to in-range bits in said output destination are set to a bit pattern that represents an in-range number closest in value to the value of the integer contained in said input data block.
19. The circuit of Claim 18 wherein each output block is a byte; and wherein said clip determination circuit produces a clip signal for each byte in said output destination.
20. The circuit of Claim 18 wherein said clip instruction specifies that each input data block is selected from the set of double word, word, half word, and byte.
21. The circuit of Claim 20 wherein each integer stored in said input source has a sign bit and wherein said circuit for clipping integers is hierarchically organized and further comprises: a byte sign select circuit; at least one byte-level circuit connected to said sign select circuit such that the sign select circuit provides to said byte-level circuit the value of the sign bit of the integer corresponding to said byte-level circuit and wherein said byte-level circuit contains a plurality of said bit-level circuits.
22. The circuit of Claim 21 further comprising at least one additional level of hierarchical circuits each containing at least one byte-level circuit wherein control signals are routed from said additional levels of hierarchical circuits through said byte-level circuits to said bit-level circuits such that each said bit-level circuit responds to said clip signal to selectively apply said mask to a bit in said input data corresponding to said bit-level circuit.
23. The circuit of Claim 21 wherein the clip determination circuit is divided over said byte-level circuits such that each byte-level circuit, except for the byte-level circuit of highest order, contains a size select AND gate with one input connected to the clip signal of the next higher order byte-level circuit and a second input connected to a signal line whose value depends on the size of the input data block and the byte-level circuit's location relative to other byte-level circuits.
24. The circuit of Claim 21 wherein each bit-level circuit is connected to said byte-level clip signal (c), said byte sign select circuit (db), a mask bit in said mask corresponding to said bit-level circuit (m), a data bit in said input source corresponding to said bit-level circuit (d), and a clipping-type signal and wherein said bit-level circuit comprises a plurality of logic gates which produces an output bit (o) according to a truth table.
25. The clip circuit of Claim 24 wherein said truth table has the following values for signed input-to-signed output clipping:

clip   byte sign   mask   data   output
(c)    (db)        (m)    (d)    (o)
 0      0           0      0      0
 0      0           0      1      1
 0      0           1      0      0
 0      0           1      1      0
 0      1           0      0      0
 0      1           0      1      1
 0      1           1      0      1
 0      1           1      1      1
 1      0           0      0      1
 1      0           0      1      1
 1      0           1      0      0
 1      0           1      1      0
 1      1           0      0      0
 1      1           0      1      0
 1      1           1      0      1
 1      1           1      1      1

and the following values for signed input-to-unsigned output clipping:

clip   byte sign   mask   data   output
(c)    (db)        (m)    (d)    (o)
 0      0           0      0      0
 0      0           0      1      1
 0      0           1      0      0
 0      0           1      1      0
 0      1           0      0      0
 0      1           0      1      0
 0      1           1      0      0
 0      1           1      1      0
 1      0           0      0      1
 1      0           0      1      1
 1      0           1      0      0
 1      0           1      1      0
 1      1           0      0      0
 1      1           0      1      0
 1      1           1      0      0
 1      1           1      1      0

and the following values for unsigned input-to-unsigned output clipping:

clip   byte sign   mask   data   output
(c)    (db)        (m)    (d)    (o)
 0      0           0      0      0
 0      0           0      1      1
 0      0           1      0      0
 0      0           1      1      0
 0      1           0      0      0
 0      1           0      1      1
 0      1           1      0      0
 0      1           1      1      0
 1      0           0      0      1
 1      0           0      1      1
 1      0           1      0      0
 1      0           1      1      0
 1      1           0      0      1
 1      1           0      1      1
 1      1           1      0      0
 1      1           1      1      0

26. The circuit of Claim 18 wherein clipping is done according to a clipping type selected from the group having the members signed input-to-signed output, signed input-to-unsigned output, and unsigned input-to-unsigned output.
27. The circuit of Claim 26 wherein said clipping type is encoded in said mask.
28. A method of operating a digital processor to clip binary input integers substantially as herein described with reference to the accompanying drawings.
29. A circuit for clipping integers substantially as herein described with reference to the accompanying drawings.
GB9600781A 1995-01-17 1996-01-15 Clipping integers Withdrawn GB2300054A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US37373995A 1995-01-17 1995-01-17

Publications (2)

Publication Number Publication Date
GB9600781D0 GB9600781D0 (en) 1996-03-20
GB2300054A true GB2300054A (en) 1996-10-23

Family

ID=23473674

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9600781A Withdrawn GB2300054A (en) 1995-01-17 1996-01-15 Clipping integers

Country Status (3)

Country Link
JP (1) JPH08272591A (en)
DE (1) DE19601575A1 (en)
GB (1) GB2300054A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0845741A2 (en) * 1996-11-29 1998-06-03 Matsushita Electric Industrial Co., Ltd. Processor which can favorably execute a rounding process

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015143949A (en) * 2014-01-31 2015-08-06 富士通株式会社 Arithmetic program, arithmetic unit, and arithmetic method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0540155A2 (en) * 1991-10-29 1993-05-05 Advanced Micro Devices, Inc. Digital limit checking system
EP0540150A2 (en) * 1991-10-29 1993-05-05 Advanced Micro Devices, Inc. Improved arithmetic logic unit
EP0657804A1 (en) * 1993-12-08 1995-06-14 Hewlett-Packard Company Overflow control for arithmetic operations
EP0660226A2 (en) * 1993-12-27 1995-06-28 Nec Corporation Limiter circuit
EP0686910A1 (en) * 1994-06-10 1995-12-13 Nec Corporation Data processing system having a saturation arithmetic operation function

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3103914B2 (en) * 1992-08-21 2000-10-30 ソニー株式会社 Data rounding circuit and data restoration circuit
DE4304198A1 (en) * 1993-02-12 1994-08-18 Itt Ind Gmbh Deutsche Method of speeding up the data processing of a signal processor

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0845741A2 (en) * 1996-11-29 1998-06-03 Matsushita Electric Industrial Co., Ltd. Processor which can favorably execute a rounding process
EP0845741A3 (en) * 1996-11-29 2000-11-29 Matsushita Electric Industrial Co., Ltd. Processor which can favorably execute a rounding process
US6237084B1 (en) 1996-11-29 2001-05-22 Matsushita Electric Industrial Co., Ltd. Processor which can favorably execute a rounding process composed of positive conversion and saturated calculation processing
EP1306752A1 (en) * 1996-11-29 2003-05-02 Matsushita Electric Industrial Co., Ltd. Processor which can favourably execute a rounding process
USRE39121E1 (en) 1996-11-29 2006-06-06 Matsushita Electric Industrial Co., Ltd. Processor which can favorably execute a rounding process composed of positive conversion and saturated calculation processing
CN100356316C (en) * 1996-11-29 2007-12-19 松下电器产业株式会社 Processor for decoding and executing instruction
USRE43145E1 (en) 1996-11-29 2012-01-24 Panasonic Corporation Processor which can favorably execute a rounding process composed of positive conversion and saturated calculation processing
USRE43729E1 (en) 1996-11-29 2012-10-09 Panasonic Corporation Processor which can favorably execute a rounding process composed of positive conversion and saturated calculation processing

Also Published As

Publication number Publication date
JPH08272591A (en) 1996-10-18
DE19601575A1 (en) 1996-07-18
GB9600781D0 (en) 1996-03-20

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)