WO2007047167A2 - Fast rotator with embedded masking and method therefor - Google Patents

Fast rotator with embedded masking and method therefor

Info

Publication number
WO2007047167A2
Authority
WO
WIPO (PCT)
Prior art keywords
operand
rotator
input
shift
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2006/039180
Other languages
English (en)
French (fr)
Other versions
WO2007047167A3 (en)
Inventor
Lincoln R. Nunes
Albert N. Danysh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NXP USA Inc
Original Assignee
Freescale Semiconductor Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Freescale Semiconductor Inc
Priority to JP2008536674A
Publication of WO2007047167A2
Publication of WO2007047167A3
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/76 Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
    • G06F7/764 Masking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/017 Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising using recirculating storage elements
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/501 Half or full adders, i.e. basic adder cells for one denomination
    • G06F7/503 Half or full adders, i.e. basic adder cells for one denomination using carry switching, i.e. the incoming carry being connected directly, or only via an inverter, to the carry output under control of a carry propagate signal
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/76 Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
    • G06F7/768 Data position reversal, e.g. bit reversal, byte swapping

Definitions

  • the present disclosure is generally related to arithmetic circuits, and more particularly to systems for rotating and shifting operands in an integrated circuit.
  • a data processor requires a variety of shift operations to implement its instruction set.
  • the shift operations may include left shifts, right shifts, and rotates.
  • the shifts can be arithmetic or logical, which determines how bits at either end of the operand are handled.
  • Each shift or rotate operation has a variable length. Which bit is shifted into a given bit position is determined by the type of shift operation and the rotate amount.
  • a simple shift register stores an input operand in parallel, and then shifts the operand serially by one bit position for each clock cycle. When the operand has been shifted by the desired number of bits, the result is read out of the shift register in parallel.
  • Another type of shifter is a barrel shifter.
  • the barrel shifter includes connections from each bit of a source operand to each bit of a destination operand. Thus, the barrel shifter can perform a shift instruction by any arbitrary number of bit positions.
  • Barrel shifters conventionally include two registers, each of which functions as either the source register or the destination register of the shift operation, depending on the direction.
  • the source and destination registers are coupled to a shifter array, which is essentially an M-by-M matrix of transistors, where M is the operand size. Barrel shifters are fast but require large amounts of circuit area.
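The contrast between the serial shift register and the barrel shifter described above can be sketched in a few lines of Python. This is an illustrative model only, not circuitry taken from the disclosure: the function names `serial_rotate_left` and `barrel_rotate_left`, the list-of-bits representation (index 0 is the leftmost bit), and the choice of a left rotate are all assumptions made for the example.

```python
def serial_rotate_left(bits, amount):
    """Shift-register model: the operand moves one bit position per clock cycle."""
    reg = list(bits)
    cycles = 0
    for _ in range(amount % len(reg)):
        reg = reg[1:] + reg[:1]  # one-position left rotate per cycle
        cycles += 1
    return reg, cycles


def barrel_rotate_left(bits, amount):
    """Barrel model: an M-by-M switch matrix connects every source bit to
    every destination bit, so any rotate amount is handled in one pass."""
    m = len(bits)
    n = amount % m
    # matrix[dst][src] is True where the switch from src to dst is closed.
    matrix = [[src == (dst + n) % m for src in range(m)] for dst in range(m)]
    return [next(bits[src] for src in range(m) if matrix[dst][src])
            for dst in range(m)]


operand = [1, 0, 1, 1, 0, 0, 0, 0]
serial_result, cycles = serial_rotate_left(operand, 3)
barrel_result = barrel_rotate_left(operand, 3)
```

Both models give the same result, but the serial model needs one cycle per bit position, while the barrel model pays for its single pass with M × M switches (64 for this 8-bit operand, 1024 for a 32-bit operand).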
  • a data processor may also be required to support vector operations, also known as single instruction multiple data (SIMD).
  • the data processor is required to perform arithmetic and logical operations, including shift and rotate operations, on vector operands.
  • the vector operands can be of varying size.
  • One known technique for performing shifts in vector processors is to have multiple shifters in parallel that support each possible vector size. However, this technique requires multiple barrel shifters and large amounts of circuit area.
  • FIG. 1 is a block diagram of a rotator system according to the present invention.
  • FIG. 2 is a block diagram of a rotator system according to another embodiment of the present invention.
  • FIG. 3 illustrates in block diagram form a circuit that forms a part of the decoder and the rotator and masking module of FIG. 2.
  • FIG. 4 illustrates in block diagram form a circuit 400 that illustrates circuit 300 of FIG. 3 in greater detail.
  • a system and method of rotating an operand includes a first decoder with a first input to receive an operand size indicating one of a plurality of operand sizes, a second input for receiving a rotate amount signal and a control output to provide a plurality of control signals.
  • the system also includes a rotator with a first input connected to the control output of the first decoder, a second input to receive a data element and an output to provide rotated data.
  • the rotator is responsive to the plurality of control signals to rotate portions of the data element corresponding to one of the plurality of operand sizes by an amount corresponding to the rotate amount signal.
  • the first decoder further has a third input for receiving a shift type, and provides the plurality of control signals further in response to the shift type.
  • the rotator is further responsive to the plurality of control signals to perform a masking operation on a rotated data element to provide shifted data to the output thereof.
  • the system includes a second decoder with a first input to receive the operand size, a second input to receive a shift amount signal and a control output to provide a plurality of masking control signals.
  • the system also includes a masking module with a first input coupled to the control output of the second decoder, a second input coupled to an output of the rotator module, and an output to provide shifted data.
  • the masking module is responsive to the plurality of masking control signals to shift portions of the data element corresponding to one of the plurality of operand sizes by an amount corresponding to the shift amount signal.
  • the first decoder includes first decoding logic to decode a first portion of the rotate amount signal and second decoding logic to decode a second portion of the rotate amount signal.
  • the rotator includes a first stage of multiplexers with an input coupled to an output of the first decoding logic, the first stage of multiplexers to partially shift the data element.
  • the rotator includes a second stage of multiplexers with a first input coupled to an output of the second decoding logic and a second input coupled to an output of the first stage of multiplexers to receive the partially shifted data element, the second stage of multiplexers to provide the rotated data.
  • the decoder includes third decoding logic to decode a third portion of the rotate amount and the rotator includes a third stage of multiplexers.
  • the plurality of operand sizes includes a byte size, a half word size, and a word size. In another particular aspect, the plurality of operand sizes includes a double word size or other multiples of the word size.
  • the rotator is responsive to the plurality of control signals to rotate portions of the vector data in a leftward or rightward direction.
  • the system includes a decoder with a first input to receive an operand size indicating one of a plurality of operand sizes, a second input for receiving a rotation amount, a third input for receiving a shift amount, and a control output to provide a plurality of control signals.
  • the system also includes a rotator and mask logic circuit with a first input coupled to the control output of the decoder, a second input to receive a data element and an output to provide rotated or shifted data, wherein the rotator and shifter is responsive to the plurality of control signals to rotate or shift portions of the data element corresponding to one of the plurality of operand sizes by an amount corresponding to the rotation amount or the shift amount signal.
  • the rotator and shifter includes a first stage of multiplexers responsive to a first portion of the plurality of control signals to partially rotate or shift the data element.
  • the rotator and shifter further includes a second stage of multiplexers to receive the partially rotated or shifted data element and to further rotate or shift the data element.
  • the first stage of multiplexers includes a sign extend input, and the first stage of multiplexers is responsive to the sign extend input to shift the data element.
  • the method includes receiving a first operand size indicating one of a plurality of operand sizes at a first decoder at a first time and receiving a rotate amount signal at the first decoder.
  • the method also includes providing a plurality of control signals from the first decoder to a rotator and rotating portions of a data element corresponding to one of the plurality of operand sizes by an amount corresponding to the rotate amount signal.
  • the method further includes receiving a second operand size at the first decoder at a second time, the second operand size different from the first operand size.
  • the method also includes rotating portions of the data element corresponding to the second operand size by an amount corresponding to the rotate amount signal.
  • the method includes receiving a shift amount signal at the first decoder and shifting portions of the vector data corresponding to one of the plurality of operand sizes by an amount corresponding to the shift amount signal.
  • the portions of the vector data are shifted in a manner corresponding to an algebraic right shift operation.
  • the plurality of operand sizes includes a byte size, a half word size, and a word size.
  • the plurality of operand sizes includes a double word size or other multiples of the word size.
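The method summarized above, in which the same rotator handles a first operand size at one time and a different operand size at another, can be illustrated with a small software model. The sketch below is an assumption-laden illustration rather than the claimed circuit: the name `rotate_lanes_left`, the 32-bit data element width, the use of one common rotate amount for every portion, and the left direction are all choices made for the example.

```python
WIDTH = 32  # assumed width of the data element


def rotate_lanes_left(value, lane_bits, amount):
    """Rotate each lane_bits-wide portion of a 32-bit data element left
    by the same amount; lane_bits selects byte, half word, or word."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("operand size must be 8, 16, or 32 bits")
    lane_mask = (1 << lane_bits) - 1
    n = amount % lane_bits
    result = 0
    for pos in range(0, WIDTH, lane_bits):
        lane = (value >> pos) & lane_mask
        rotated = ((lane << n) | (lane >> (lane_bits - n))) & lane_mask
        result |= rotated << pos
    return result


data = 0x12345678
as_bytes = rotate_lanes_left(data, 8, 4)    # first time: byte operand size
as_halves = rotate_lanes_left(data, 16, 4)  # second time: half word operand size
as_word = rotate_lanes_left(data, 32, 4)    # word operand size
```

Only the operand size argument changes between the calls; the data element and the rotate amount are the same, which mirrors the idea of one rotator serving every supported vector format.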
  • the operand rotator 100 includes a first decoder 102, a rotator 104, a second decoder 106, and a masking module 108.
  • the first decoder 102 has first control input terminals for receiving shift amount signals labeled "Amw," "Amh," and "Amb," second control input terminals for receiving operand size signals labeled "B/H/W," and a plurality of control output terminals.
  • the rotator 104 has a data input for receiving an input operand labeled "VA[0:31]," a set of control input terminals connected to corresponding ones of the control output terminals of decoder 102, and a data output terminal for providing a rotated data signal.
  • the second decoder 106 has first control input terminals for receiving shift amount signals Amw, Amh, and Amb, and second control input terminals for receiving shift type signals labeled "Rot/Shf."
  • the signals labeled Rot/Shf include an op code indicating whether the operation should be a shift or rotate operation, a shift left or shift right operation, and a logical shift or arithmetic shift operation.
  • the second decoder 106 also includes a plurality of control output terminals.
  • the masking module 108 has a data output terminal for providing a data output signal labeled "FINAL RESULT."
  • the operand rotator 100 is a vector rotator capable of performing rotation and shift operations on operands capable of being represented in different vector formats.
  • the operand rotator 100 supports shifts and rotates on 8-, 16-, and 32-bit (i.e., byte, half-word, and word) vector operands.
  • Other operand sizes such as double word sizes or other multiples of the word size, can be supported in alternative embodiments.
  • the operand rotator 100 is capable of rotating different portions of the vector operand by different amounts. For example, for a 32-bit operand, the operand rotator 100 may shift the first half-word of the operand by a first amount and the second half-word of the operand by a second amount.
  • the operand rotator 100 performs shifts in two steps. In the first step, the bits are rotated in a rotation operation by the rotator 104. In the second step, certain bit positions are masked by the masking module 108 to handle boundary conditions, converting the simple rotation operation into an arithmetic shift or a logical shift as determined by the instruction.
  • the first decoder 102 decodes the control signals Amw, Amh, and Amb as well as the vector size B/H/W. Based on the decoded control signals, the rotator 104 rotates each portion of the vector operand by the appropriate amount.
  • the operand rotator 100 converts simple rotation operations into shift operations by using the second decoder 106 and the masking module 108.
  • the masking module 108 is responsive to the control signals Amw, Amh, and Amb, as well as the shift type signals Rot/Shf to determine the type of shift and boundary conditions of the shift to be performed.
  • the masking module 108 applies a mask to determine the value to be inserted into vacated bit positions, such as a sign bit after the arithmetic shift operation.
  • By performing additional decoding using both the rotate amount and the vector size, rotator 100 is able to generate control signals in a single 32-by-32 matrix to handle all supported shift amounts and vector sizes.
  • decoder 102 may be somewhat larger than that of a comparable barrel shifter, but the matrix in rotator 104 is approximately the same size as a shift array used for a 32-by-32 barrel shifter.
  • the operand rotator 100 saves significant amounts of circuit area in vector processors because it can be used for all supported vector sizes, and may independently shift or rotate different portions of a particular operand.
  • the operand rotator saves power and is faster than some other solutions.
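The two-step operation of FIG. 1, a pure rotation followed by masking of the vacated bit positions, can be modeled as follows. This is a minimal sketch under stated assumptions: the function names, the operation labels `'rotl'`, `'shl'`, `'shr'`, and `'sar'`, and the convention that `amounts[0]` applies to the least significant portion are all hypothetical and are not taken from the disclosure.

```python
WIDTH = 32  # assumed width of the data element


def rotl(lane, n, bits):
    """Rotate a bits-wide value left by n positions."""
    mask = (1 << bits) - 1
    n %= bits
    return ((lane << n) | (lane >> (bits - n))) & mask


def rotate_then_mask(value, lane_bits, amounts, op):
    """op is 'rotl', 'shl' (logical left), 'shr' (logical right),
    or 'sar' (arithmetic right); amounts holds one amount per portion."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for i in range(WIDTH // lane_bits):
        pos = i * lane_bits
        lane = (value >> pos) & lane_mask
        n = amounts[i] % lane_bits
        # Step 1: rotation only. A right shift is modeled as a left
        # rotate by the complementary amount.
        rotated = rotl(lane, n if op in ("rotl", "shl") else lane_bits - n,
                       lane_bits)
        # Step 2: masking. 'keep' marks positions that hold valid rotated
        # data; the remaining positions are the vacated ones.
        if op == "rotl":
            keep = lane_mask
        elif op == "shl":
            keep = (lane_mask << n) & lane_mask
        else:
            keep = lane_mask >> n
        sign = (lane >> (lane_bits - 1)) & 1
        fill = lane_mask if (op == "sar" and sign) else 0
        result |= ((rotated & keep) | (fill & ~keep & lane_mask)) << pos
    return result


vec = 0x800100F0  # two half words: 0x8001 (upper) and 0x00F0 (lower)
logical = rotate_then_mask(vec, 16, [4, 4], "shr")
arithmetic = rotate_then_mask(vec, 16, [4, 4], "sar")
mixed = rotate_then_mask(vec, 16, [8, 1], "shr")  # a different amount per half word
```

The rotation step is identical for every shift type; only the mask and the fill value differ, which is why a single rotator can serve rotates, logical shifts, and arithmetic shifts.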
  • the operand rotator includes a decoder 202 and a rotator and masking module 204.
  • the decoder 202 includes first control input terminals for receiving shift amount signals labeled "Amw," "Amh," and "Amb," second control input terminals for receiving operand size signals labeled "B/H/W," third control input terminals for receiving shift type signals labeled "Rot/Shf," and a plurality of control output terminals.
  • the rotator and masking module 204 has a data input for receiving an input operand labeled "VA[0:31]," a set of control input terminals connected to corresponding ones of the control output terminals of the decoder 202, and a data output terminal for providing a data output signal labeled "FINAL RESULT."
  • the operand rotator 200 is a vector rotator capable of performing rotation and shift operations on operands capable of being represented in different vector formats.
  • the operand rotator 200 supports shifts and rotates on 8-, 16-, and 32-bit (i.e., byte, half-word, and word) vector operands, but other operand sizes, such as double word sizes, can be supported in alternative embodiments.
  • the decoder 202 decodes the control signals Amw, Amh, and Amb, as well as the control signals B/H/W and Rot/Shf. Based on the decoded control signals, the rotator and masking module 204 rotates each portion of the vector operand by the appropriate amount and performs masking to handle boundary conditions for various supported shift operations.
  • operand rotator 200 integrates the rotation and masking functions into a single circuit, saving additional circuit area and power and providing additional speed.
  • FIG. 3 illustrates in block diagram form a circuit 300 that forms a part of decoder 202 and rotator and masking module 204 of FIG. 2.
  • the circuit includes a first decoder 302, a second decoder 304, and a first stage of multiplexers including a first multiplexer 306, a second multiplexer 308, a third multiplexer 310 and a fourth multiplexer 312.
  • the system further includes a sign extend module 314 and a second stage of multiplexers including a fifth multiplexer 316, a sixth multiplexer 318, a seventh multiplexer 320, and an eighth multiplexer 322.
  • the system also includes an output register 324, and input registers 326, 328, and 330.
  • the first decoder 302 has first control input terminals for receiving shift type signals labeled "Il RS-OP" stored in the second input register 328, second control input terminals for receiving shift amount signals labeled "Il RS-VB" stored in the third input register 330, and a plurality of control output terminals.
  • the second decoder 304 also has first control input terminals for receiving shift type signals labeled "Il RS-OP," second control input terminals for receiving shift amount signals labeled "Il RS-VB," and a plurality of control output terminals.
  • the sign extend module 314 has control inputs for receiving shift type signals labeled "Il RS-OP," data inputs for receiving 4 bits of an input operand labeled "Il RS-VA" stored in the first input register 326, and a data output terminal.
  • the multiplexers 306, 308, 310, and 312 each comprise two multiplexers, labeled "M0A," "M0B," "M1A," "M1B," "M2A," "M2B," "M3A," and "M3B," respectively.
  • Each of the multiplexers 306, 308, 310 and 312 has first data inputs for receiving the input operand labeled Il RS-VA, second data inputs corresponding to the data output terminal of the sign extend module 314, and a plurality of control input terminals connected to corresponding ones of the control output terminals of the first decoder 302.
  • the first multiplexer 306 includes a data output terminal for providing a data output signal labeled "M0_res[0:15]."
  • the second multiplexer 308 includes a data output terminal for providing a data output signal labeled "M1_res[0:15]."
  • the third multiplexer 310 includes a data output terminal for providing a data output signal labeled "M2_res[0:15]."
  • the fourth multiplexer 312 includes a data output terminal for providing a data output signal labeled "M3_res[0:15]."
  • the multiplexers 316, 318, 320 and 322 each have first data inputs for receiving the corresponding data output of the multiplexers 306, 308, 310 and 312 respectively and a plurality of control input terminals connected to corresponding ones of the control output terminals of the second decoder 304.
  • the fifth multiplexer 316 includes a data output terminal for providing a data output signal labeled "R[0:7]."
  • the sixth multiplexer 318 includes a data output terminal for providing a data output signal labeled "R[8:15]."
  • the seventh multiplexer 320 includes a data output terminal for providing a data output signal labeled "R[16:23]."
  • the eighth multiplexer 322 includes a data output terminal for providing a data output signal labeled "R[24:31]."
  • the first decoder 302 receives the higher or more significant bits of a rotate amount signal, an operand size and shift type signal. The decoder 302 decodes these received bits to provide control signals to the first stage of multiplexers.
  • the first stage of multiplexers 306, 308, 310 and 312 receives a vector data element and the sign extend signal from the sign extend module 314 and, based on the control signals provided by the first decoder 302, provides a shifted output based on the received data element.
  • the first stage of multiplexers performs a coarse shifting operation. In particular, the first stage of multiplexers operates to perform a shift operation on coarse portions of the data element. For example, if the data element is 32 bits long, the first stage of multiplexers shifts each byte or word that comprises the data element.
  • the multiplexers receive data from the sign extend module 314.
  • the sign extend module 314 is used to apply a masking or shift operation to the first stage of multiplexers.
  • the sign extend module 314 can be used to place ones or zeroes in the data element in such a way as to perform a masking operation on the data element in order to modify a rotate operation into a shift operation.
  • the second decoder 304 decodes the lower three bits of a rotation amount.
  • the second decoder 304 also receives an operand size and shift type. Based on these inputs, the second decoder 304 provides control signals to the second stage of multiplexers 316, 318, 320, and 322.
  • the second stage of multiplexers receives an output of the first stage of multiplexers.
  • the second stage of multiplexers then rotates the output of the first stage of multiplexers based on the control signals provided by the second decoder 304.
  • the second stage of multiplexers performs a "fine" shift operation.
  • each of the multiplexers 316, 318, 320, 322 receives 16 bits from the first stage of multiplexers and performs a shift or rotate operation on the received bits. After the bits have been rotated, the rotated bits are integrated into a single operand at the register 324.
  • the register 324 thus stores the rotated and shifted result.
  • rotate and shift operations may be performed using a single stage of multiplexers. More than two stages of multiplexers may also be used. Further, the multiplexers may be configured so that first and second decoders are reversed.
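The division of labor between the two decoders of FIG. 3 can be sketched in software: one decoder handles the more significant bits of the rotate amount and drives a coarse stage, and the other handles the lower three bits and drives a fine stage. The model below is illustrative only. It assumes a word-sized rotate left with a 5-bit amount and one-hot select lines, and the helper names `one_hot`, `decode`, `mux`, and `two_stage_rotl32` are hypothetical.

```python
def rotl32(x, n):
    """Reference 32-bit rotate left."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def one_hot(index, width):
    """Return a list of select lines with exactly one line asserted."""
    return [int(i == index) for i in range(width)]


def decode(amount):
    """Split a 5-bit rotate amount into coarse and fine one-hot selects."""
    coarse_sel = one_hot((amount >> 3) & 0x3, 4)  # upper bits: whole bytes
    fine_sel = one_hot(amount & 0x7, 8)           # lower three bits: 0 to 7
    return coarse_sel, fine_sel


def mux(select, candidates):
    """Pass through the candidate whose select line is asserted."""
    return next(c for s, c in zip(select, candidates) if s)


def two_stage_rotl32(x, amount):
    coarse_sel, fine_sel = decode(amount)
    # First stage: coarse rotate by 0, 8, 16, or 24 bit positions.
    stage1 = mux(coarse_sel, [rotl32(x, 8 * k) for k in range(4)])
    # Second stage: fine rotate by 0 to 7 bit positions.
    return mux(fine_sel, [rotl32(stage1, k) for k in range(8)])


result = two_stage_rotl32(0x12345678, 12)  # coarse select 1, fine select 4
```

Splitting the amount this way means each stage chooses among only 4 or 8 candidates rather than 32, at the cost of a second level of multiplexing.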
  • FIG. 4 illustrates in block diagram form a circuit 400 that illustrates circuit 300 of FIG. 3 in greater detail.
  • the system 400 can be used to implement the rotate circuit 300 of FIG. 3.
  • the system 400 includes a first decoder module 402, a second decoder module 406 and a sign extend module 404.
  • the system also includes a first stage of multiplexers, comprised of multiplexers 408, 410, 412, 414, 416, 418, 420, and 422.
  • the system 400 further includes a second stage of multiplexers, comprised of multiplexers 424, 426, 428, and 430.
  • the first decoder module 402 includes first control input terminals for receiving shift amount signals, labeled "VB[12,27:28]," second control input terminals for receiving shift type signals, labeled "Shf/Rot," third control input terminals for receiving an operand size, labeled "Byte," "Half," and "Word," and fourth control input terminals for receiving shift direction signals, labeled "Left/Right."
  • the first decoder module 402 further includes a plurality of control outputs for providing control signals.
  • the second decoder module 406 includes first control input terminals for receiving shift amount signals, labeled "VB[5:7, 13:15, 21:23, 29:31]," second control input terminals for receiving an operand size, labeled "Byte," "Half," and "Word," and third control input terminals for receiving shift direction signals, labeled "Left/Right."
  • the second decoder module 406 further includes a plurality of control outputs for providing control signals.
  • the system 400 receives an input operand, labeled "VA[0:31]."
  • Each of the first stage of multiplexers, including multiplexers 408, 410, 412, 414, 416, 418, 420 and 422, receives a plurality of data inputs based on the input operand.
  • the multiplexer 408 receives a plurality of data inputs, each input including a portion of the bits that comprise the input operand. Therefore, as illustrated, the multiplexer 408 receives a data input labeled "0:7" which consists of bits 0 through 7 of the input operand.
  • the other multiplexers included in the first stage of multiplexers receive similar data inputs.
  • the sign extend module 404 includes first data inputs for receiving a plurality of sign bits, labeled "VA[0,8,16,24]," and first control inputs.
  • the sign extend module 404 also includes a plurality of data outputs, labeled "S0," "S1," "S2" and "S3." Each of the first stage of multiplexers includes a data input connected to a corresponding one of the data outputs of the sign extend module 404. Thus, the multiplexers 408 and 410 each include a data input corresponding to the "S0" data output of the sign extend module 404.
  • each of the first stage of multiplexers includes a plurality of control inputs that are connected to corresponding ones of the control outputs of the first decoder 402.
  • Each of the first stage of multiplexers also includes a data output.
  • Each of the second stage of multiplexers, including the multiplexer 424, the multiplexer 426, the multiplexer 428, and the multiplexer 430, includes a plurality of data inputs based on a data output of one or more of the first stage of multiplexers.
  • the data input of the multiplexer 424 is connected to the data output of the multiplexer 408 and the multiplexer 410.
  • the data outputs of the multiplexers 408 and 410 form a sixteen bit half word labeled "R0[0:15]."
  • the input to the multiplexer 424 is based on certain bits of the half word R0[0:15].
  • the first input of the multiplexer 424 consists of bits 0:7 of the half word R0[0:15], while the second input consists of bits 1:8 of the half word R0[0:15].
  • the multiplexers 426, 428, and 430 are configured in a similar fashion, based on different outputs of the first stage of multiplexers.
  • Each of the second stage of multiplexers also includes a plurality of control inputs connected to corresponding ones of the control outputs of the second decoder module 406.
  • each of the second stage of multiplexers includes a data output.
  • the multiplexer 424 provides a data output labeled ROT RES[0:7].
  • the outputs of each of the multiplexers 424, 426, 428 and 430 may be integrated in an appropriate fashion, such as placed in a 32 bit register, to produce a rotation result.
  • the first decoder module 402 decodes the received shift amount, shift type, operand size, and shift direction signals to produce control signals for the first stage of multiplexers. Based on these control signals, each of the first stage of multiplexers selects one of its plurality of inputs to provide as an output. Thus, for example, the multiplexer 414 can select the input 0:7, 8:15, 16:23, or 24:31 to provide as an output. In this fashion, each of the first stage of multiplexers performs a coarse shift on the input operand VA[0:31].
  • the sign extend module 404 provides data to the first stage of multiplexers based on the control signals provided to the sign extend module.
  • the data provided by the sign extend module 404 is selected by each multiplexer of the first stage of multiplexers to apply the proper boundary conditions according to the shift type, shift direction, and other control signals.
  • the output of each of the first stage of multiplexers is therefore based on one of the inputs provided to each multiplexer and the data provided by the sign extend module.
  • the second decoder module 406 decodes the received shift amount, operand size, and shift direction signals to produce control signals for the second stage of multiplexers. Based on these control signals, each of the second stage of multiplexers selects one of its plurality of inputs to provide as an output.
  • the multiplexer 424 can select the input 0:7, 1:8, 2:9, 3:10, 4:11, 5:12, 6:13, 7:14, or 8:15 to provide as an output. In this fashion, each of the second stage of multiplexers performs a fine shift on the corresponding output of the first stage of multiplexers.
  • the outputs of the second stage multiplexers are integrated to form the final rotated or shifted result.
  • the system of FIG. 4 may also be configured to perform operations on scalar elements. In that case, assuming that the scalar operands are always the same size, the decoders may not be provided an operand size. Furthermore, another data input may be provided to each of the second stage multiplexers. These data inputs may inject bits from the sign extend module or other appropriate source to perform a second masking operation. This second masking operation allows the system to perform "injected masking" operations, or other operations, to perform the appropriate shifts.
  • the second stage multiplexers are larger than those illustrated in FIG. 4 to accommodate the new data input but the first stage is simplified and can have just half plus one of the multiplexers used in FIG. 4.
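The byte and window selection described for FIG. 4 can be modeled as follows. This is a rough sketch under several assumptions that go beyond the text: it covers only the word operand size and the leftward direction, it uses zero as the injected fill value for a logical left shift, and the function names and the `shift` flag are hypothetical. Bit 0 is treated as the most significant bit, consistent with the sign bits being labeled VA[0,8,16,24].

```python
def byte_of(x, j):
    """Byte j of a 32-bit word; byte 0 holds bits 0:7 (most significant)."""
    return (x >> (24 - 8 * j)) & 0xFF


def first_stage(x, j, coarse, shift, fill=0x00):
    """Coarse select for output byte j: returns a 16-bit half word built
    from two adjacent source bytes. For a rotate the source bytes wrap
    around; for a shift, positions past the end take the fill byte."""
    halves = []
    for k in (j + coarse, j + coarse + 1):
        if shift and k > 3:
            halves.append(fill)  # injected instead of wrapped data
        else:
            halves.append(byte_of(x, k % 4))
    return (halves[0] << 8) | halves[1]


def second_stage(half, fine):
    """Fine select: the 8-bit window fine:fine+7 of the half word,
    i.e. 0:7, 1:8, ..., 7:14 for fine = 0 to 7."""
    return (half >> (8 - fine)) & 0xFF


def rotate_or_shift_left(x, amount, shift=False):
    coarse, fine = (amount >> 3) & 0x3, amount & 0x7
    out = 0
    for j in range(4):  # one second-stage multiplexer per output byte
        out = (out << 8) | second_stage(first_stage(x, j, coarse, shift), fine)
    return out


rotated = rotate_or_shift_left(0x12345678, 12)
shifted = rotate_or_shift_left(0x12345678, 12, shift=True)
```

In this model the only difference between a rotate and a shift is which input the first stage selects, so no separate masking pass is needed after the second stage. The 8:15 window listed for multiplexer 424 is not exercised here, since the fine amount of a left rotate only spans 0 to 7.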

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • Executing Machine-Instructions (AREA)
PCT/US2006/039180 2005-10-17 2006-10-04 Fast rotator with embedded masking and method therefor Ceased WO2007047167A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008536674A JP2009512090A (ja) 2005-10-17 2006-10-04 埋め込み型マスキングを備える高速ローテータ及びその方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/252,061 2005-10-17
US11/252,061 US20070088772A1 (en) 2005-10-17 2005-10-17 Fast rotator with embedded masking and method therefor

Publications (2)

Publication Number Publication Date
WO2007047167A2 true WO2007047167A2 (en) 2007-04-26
WO2007047167A3 WO2007047167A3 (en) 2008-01-17

Family

ID=37949361

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/039180 Ceased WO2007047167A2 (en) 2005-10-17 2006-10-04 Fast rotator with embedded masking and method therefor

Country Status (4)

Country Link
US (1) US20070088772A1 (en)
JP (1) JP2009512090A (en)
KR (1) KR20080049825A (en)
WO (1) WO2007047167A2 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602006005020D1 (de) * 2005-05-04 2009-03-19 St Microelectronics Sa Ringschieberegister
FR2914447B1 (fr) * 2007-03-28 2009-06-26 St Microelectronics Sa Dispositif electronique de decalage de donnees en particulier pour du codage/decodage avec un code ldpc
US8041755B2 (en) * 2007-06-08 2011-10-18 Apple Inc. Fast static rotator/shifter with non two's complemented decode and fast mask generation
JP5206603B2 (ja) * 2009-07-01 2013-06-12 富士通株式会社 シフト演算器
US8356145B2 (en) * 2010-01-15 2013-01-15 Qualcomm Incorporated Multi-stage multiplexing operation including combined selection and data alignment or data replication
US8768989B2 (en) * 2011-03-18 2014-07-01 Apple Inc. Funnel shifter implementation
US8972469B2 (en) 2011-06-30 2015-03-03 Apple Inc. Multi-mode combined rotator
US20130151820A1 (en) * 2011-12-09 2013-06-13 Advanced Micro Devices, Inc. Method and apparatus for rotating and shifting data during an execution pipeline cycle of a processor
US10289382B2 (en) 2012-12-20 2019-05-14 Wave Computing, Inc. Selectively combinable directional shifters
US9933996B2 (en) * 2012-12-20 2018-04-03 Wave Computing, Inc. Selectively combinable shifters
US9419792B2 (en) * 2012-12-28 2016-08-16 Intel Corporation Instruction for accelerating SNOW 3G wireless security algorithm
US9490971B2 (en) * 2012-12-28 2016-11-08 Intel Corporation Instruction for fast ZUC algorithm processing
US9904511B2 (en) * 2014-11-14 2018-02-27 Cavium, Inc. High performance shifter circuit
US9904545B2 (en) * 2015-07-06 2018-02-27 Samsung Electronics Co., Ltd. Bit-masked variable-precision barrel shifter
GB2637295A (en) * 2024-01-10 2025-07-23 Imagination Tech Ltd Vector bitwise rotations

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4139899A (en) * 1976-10-18 1979-02-13 Burroughs Corporation Shift network having a mask generator and a rotator
US4396994A (en) * 1980-12-31 1983-08-02 Bell Telephone Laboratories, Incorporated Data shifting and rotating apparatus
US4653019A (en) * 1984-04-19 1987-03-24 Concurrent Computer Corporation High speed barrel shifter
JPH02197919A (ja) * 1989-01-27 1990-08-06 Matsushita Electric Ind Co Ltd 異種サイズ対応ローテータとシフタ
US5961635A (en) * 1993-11-30 1999-10-05 Texas Instruments Incorporated Three input arithmetic logic unit with barrel rotator and mask generator
US6116768A (en) * 1993-11-30 2000-09-12 Texas Instruments Incorporated Three input arithmetic logic unit with barrel rotator
US5652718A (en) * 1995-05-26 1997-07-29 National Semiconductor Corporation Barrel shifter
US5729482A (en) * 1995-10-31 1998-03-17 Lsi Logic Corporation Microprocessor shifter using rotation and masking operations
US5844825A (en) * 1996-09-03 1998-12-01 Wang; Song-Tine Bidirectional shifter circuit
US5822231A (en) * 1996-10-31 1998-10-13 Samsung Electronics Co., Ltd. Ternary based shifter that supports multiple data types for shift functions
US6260055B1 (en) * 1997-10-15 2001-07-10 Kabushiki Kaisha Toshiba Data split parallel shifter and parallel adder/subtractor
US6098087A (en) * 1998-04-23 2000-08-01 Infineon Technologies North America Corp. Method and apparatus for performing shift operations on packed data
US6393446B1 (en) * 1999-06-30 2002-05-21 International Business Machines Corporation 32-bit and 64-bit dual mode rotator

Also Published As

Publication number Publication date
JP2009512090A (ja) 2009-03-19
KR20080049825A (ko) 2008-06-04
US20070088772A1 (en) 2007-04-19
WO2007047167A3 (en) 2008-01-17

Similar Documents

Publication Publication Date Title
US10649772B2 (en) Method and apparatus for efficient matrix transpose
US8909901B2 (en) Permute operations with flexible zero control
US20070088772A1 (en) Fast rotator with embedded masking and method therefor
EP2798464B1 (en) Packed rotate processors, methods, systems, and instructions
EP1267257A2 (en) Conditional execution per data path slice
JPH11249894A (ja) 処理デバイスに命令ストリームを供給する方法及び装置
US10459728B2 (en) Apparatus and method of improved insert instructions
EP2919112B1 (en) Packed two source inter-element shift merge processors, methods, systems, and instructions
US10909259B2 (en) Instruction execution that broadcasts and masks data values at different levels of granularity
CN104583980A (zh) 用于响应于单个指令来执行循环和异或的系统、装置和方法
JPH0916397A (ja) 複数のコンピュータ・ワードにパックされている複数のサブ・ワード・アイテムの選択混合システム
US10719317B2 (en) Hardware apparatuses and methods relating to elemental register accesses
US20200249955A1 (en) Pair merge execution units for microinstructions
US7761694B2 (en) Execution unit for performing shuffle and other operations
EP3394755B1 (en) Apparatus and method for enforcement of reserved bits
US20160139924A1 (en) Machine Level Instructions to Compute a 4D Z-Curve Index from 4D Coordinates
KR102528073B1 (ko) 벡터 비트 수집을 수행하기 위한 방법 및 장치
EP1267255A2 (en) Conditional branch execution in a processor with multiple data paths
US20080148018A1 (en) Shift Processing Unit
US11385897B2 (en) Merge execution unit for microinstructions
US20040024992A1 (en) Decoding method for a multi-length-mode instruction set
US11074213B2 (en) Apparatuses, methods, and systems for vector processor architecture having an array of identical circuit blocks
US6976049B2 (en) Method and apparatus for implementing single/dual packed multi-way addition instructions having accumulation options
WO2019005151A1 (en) SYSTEMS, APPARATUSES, AND METHODS FOR ADDING-MULTIPLYING COMPLEX DOUBLE NUMBERS OF SIGNED WORDS
US8572147B2 (en) Method for implementing a bit-reversed increment in a data processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2008536674

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020087009079

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06836212

Country of ref document: EP

Kind code of ref document: A2