WO2007057832A2 - Vector shuffle unit - Google Patents
- Publication number
- WO2007057832A2 (application PCT/IB2006/054214)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- shuffle
- multiplexer units
- bit
- control information
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/76—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data
- G06F7/762—Arrangements for rearranging, permuting or selecting data according to predetermined rules, independently of the content of the data having at least two separately controlled rearrangement levels, e.g. multistage interconnection networks
Abstract
A vector shuffle unit (50) comprises a number of base multiplexer units (mux0, mux1, mux2, mux3), which are connected to an output multiplexer (11). The vector shuffle unit (50) can be configured to shuffle a vector having any one of a number of different element sizes (for example 8, 16 and 32 bit element sizes). A power saving circuit (15) is provided for reducing the power consumption in the base multiplexer units mux1, mux2 and mux3, by masking the inputs to these multiplexer units when performing shuffle operations on certain element sizes. For example, no masking is required for mux0 as it is always needed for each of the 8, 16 and 32 bit element sizes. Multiplexer units mux1 and mux3 are only used for 8 bit elements and can be masked together as they are always used together. Mux2 is only used for 8 and 16 bit elements and requires its own power saving circuitry.
Description
Data processing apparatus and method
The invention relates to a data processing apparatus and method, and in particular to a data processing apparatus and method having power reduction when processing vectors.
The invention further relates to a device, such as a mobile phone, PDA or the like, comprising such data processing apparatus.
Power efficiency for processor based equipment is becoming increasingly important. A number of techniques have been used to reduce power usage. These include designing the processor's circuitry to use less power, or designing the processor in a manner which allows power usage to be managed. Also, for a given processor architecture, power consumption can be saved by optimizing its programming.
The mapping of several similar operations onto one piece of hardware is quite common in the area of processor design. This often means that the result is sub-optimal for each of the specific operations. Therefore, the mapping of several operations onto one piece of hardware tends to result in higher power dissipation per operation when compared to dedicated circuitry being provided for each specific operation.
For example, in many data processing applications there is the need to shuffle vectors on a per element basis. The most common element sizes to be supported are of 8, 16 and 32 bits. All of these element sizes are usually supported by providing only 8 bit element size support, with the 16 and 32 bit element sizes then being catered for as just a subset of the 8 bit element size support. In other words, in a standard instruction set the vector shuffling only needs to be described for 8 bit element support because all of the other sizes (16 and 32 bit) are in essence a subset of this instruction. An example of a vector shuffle operation for a vector with 32 elements is given in Fig. 1. For example, this vector can be 256 bits, with 32 elements of 8 bits each.
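The statement that 16 and 32 bit shuffles are a subset of the 8 bit shuffle can be illustrated with a short sketch. The function names and the list-based pattern encoding below are assumptions made for illustration only; they are not taken from any instruction set described in this document.

```python
def expand_to_byte_pattern(element_pattern, element_bytes):
    """Expand an element-level shuffle pattern into a byte-level pattern.

    element_pattern[i] is the index of the input element that is copied to
    output element i. The result gives, for each output byte, the index of
    the input byte it is taken from.
    """
    byte_pattern = []
    for src_element in element_pattern:
        base = src_element * element_bytes
        byte_pattern.extend(base + offset for offset in range(element_bytes))
    return byte_pattern


def shuffle_bytes(vector, byte_pattern):
    """Apply a byte-level shuffle: output byte i is vector[byte_pattern[i]]."""
    return [vector[src] for src in byte_pattern]


vector = list(range(8))  # a 64 bit vector holding bytes 0..7
swap_halves = expand_to_byte_pattern([1, 0], element_bytes=4)
print(swap_halves)                         # [4, 5, 6, 7, 0, 1, 2, 3]
print(shuffle_bytes(vector, swap_halves))  # [4, 5, 6, 7, 0, 1, 2, 3]
```

Any 16 or 32 bit element pattern expands to a valid byte pattern in this way, which is why a byte-granular shuffle unit can serve all three element sizes.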
Referring to Fig. 2, consider the basic operation of a vector shuffle unit for a 64 bit vector, i.e. using 8 bytes, in which an output vector 3 can be a shuffled version of the input vector 5. The 64 bit vector shuffle operation is performed using a vector shuffle unit configured around four base multiplexers (mux0, mux1, mux2, mux3 - not shown). Each base multiplexer has a distance of 4 bytes between its inputs. Since the vector in this example has 8 elements of 1 byte, each of the base multiplexers will be a 2:1 multiplexer.
In particular, Fig. 2 shows the configuration for the first base multiplexer, mux0. As can be seen, for the first base multiplexer mux0, byte 0 in the output vector 3 can therefore come from byte 0 or byte 4 of the input vector 5. It will be appreciated that this configuration provides all the shuffling options for shuffling vector elements of 32 bits. In other words, Fig. 2 shows how an input vector having 64 bits, comprising two elements of 32 bits each (i.e. first element being bytes 0-3 and the second element being bytes 4-7), can be shuffled to provide an output vector 3 in which bytes 0-3 come from bytes 4-7 of the input, and bytes 4-7 come from bytes 0-3 of the input.
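As a rough behavioural sketch of the first base multiplexer, the 2:1 choice per output byte can be modelled as follows. The names used here are hypothetical, and the wrap-around used for output bytes 4-7 is an assumption; the text only states the byte 0 connections and the overall result of the 32 bit swap.

```python
VECTOR_BYTES = 8   # 64 bit vector
MUX_DISTANCE = 4   # distance in bytes between the two inputs of a base multiplexer


def mux0_output(vector, select):
    """Model the first base multiplexer as one 2:1 choice per output byte.

    select[j] == 0 picks input byte j; select[j] == 1 picks the input byte
    MUX_DISTANCE positions away, wrapping around the vector (assumption).
    """
    return [
        vector[(j + MUX_DISTANCE * select[j]) % VECTOR_BYTES]
        for j in range(VECTOR_BYTES)
    ]


vector = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
unchanged = mux0_output(vector, [0] * VECTOR_BYTES)
swapped = mux0_output(vector, [1] * VECTOR_BYTES)
print([hex(b) for b in swapped])
```

Selecting the far input for every byte swaps the two 32 bit elements, and selecting the near input leaves the vector unchanged, so this single multiplexer covers both possible source elements for each 32 bit output element.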
Fig. 3 shows the connection to byte 0 for each of the base multiplexers (mux0, mux1, mux2, mux3) in a 64 bit vector comprising 8 bytes. As indicated above, in the first base multiplexer, mux0, byte 0 in the output can come from byte 0 or byte 4 of the input. In the second base multiplexer, mux1, byte 0 in the output can come from byte 1 or byte 5 of the input. In the third base multiplexer, mux2, byte 0 in the output can come from byte 2 or byte 6 of the input. In the fourth base multiplexer, mux3, byte 0 in the output can come from byte 3 or byte 7 of the input. In this way, the input vector can be shuffled using the four base multiplexers such that byte 0 in the output can be derived from any of the input bytes 0 to 7.
Fig. 4 shows a conventional vector shuffle unit 1 that is capable of performing vector shuffle operations for vectors having element sizes of 8, 16 and 32 bits. In other words, the circuit shown in Fig. 4 is an example whereby several similar operations have been mapped onto one piece of hardware, which results in sub-optimal performance for the specific operations, as will be explained below. The vector shuffle unit 1 comprises a register 7 for storing an input vector 5. The register 7 is connected to each of the four base multiplexer units, mux0, mux1, mux2, mux3, using appropriate bus connections 9. The output of each base multiplexer unit mux0, mux1, mux2, mux3 is connected to an output multiplexer 11, again using appropriate bus connections 13. Table 1 below illustrates how, for certain element sizes, only some of the base multiplexer units mux0, mux1, mux2, mux3 are utilized.
Element size (bits) | Base multiplexers needed |
---|---|
8 | mux0, mux1, mux2 and mux3 |
16 | mux0 and mux2 |
32 | mux0 |
Table 1
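One way to read Table 1 is that a base multiplexer is needed only if the input bytes it offers for output byte 0 start on an element boundary. The sketch below derives the table from the byte 0 connections given for Fig. 3. The function names are hypothetical, and this alignment argument is an interpretation of the text rather than something it states explicitly.

```python
MUX_DISTANCE = 4   # distance in bytes between the two inputs of a base multiplexer
NUM_BASE_MUXES = 4


def mux_inputs_for_byte0(mux_index):
    """Input bytes that base multiplexer mux_index can route to output byte 0."""
    return (mux_index, mux_index + MUX_DISTANCE)


def needed_muxes(element_bits):
    """Base multiplexers whose byte 0 inputs all start on an element boundary."""
    element_bytes = element_bits // 8
    return [
        k for k in range(NUM_BASE_MUXES)
        if all(src % element_bytes == 0 for src in mux_inputs_for_byte0(k))
    ]


for bits in (8, 16, 32):
    print(bits, ["mux%d" % k for k in needed_muxes(bits)])
```

For 16 bit elements only bytes 0, 2, 4 and 6 start an element, which leaves mux0 and mux2; for 32 bit elements only bytes 0 and 4 qualify, which leaves mux0.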
As can be seen, in the conventional hardware which is configured to allow resource sharing, i.e. different sized vector elements to be shuffled, power is wasted when shuffling certain sized elements. This is because some of the base multiplexer units will be consuming power unnecessarily. For example, when processing a 64 bit vector with an element size of 32 bits, base multiplexer units mux1, mux2, and mux3 will be consuming power unnecessarily, because their results are not used. This results in higher power dissipation per operation when compared to dedicated circuitry.
The aim of the present invention is to provide a data processing apparatus and method for shuffling vectors having different sized elements, but without wasting power.
According to a first aspect of the invention, there is provided a data processing apparatus for performing vector shuffle operations on an input vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element. The data processing apparatus comprises a plurality of multiplexer units configured to shuffle at least an input vector comprising elements of a first size or an input vector comprising elements of a second size. A power saving circuit is connected to receive control information indicative of the element size of a vector being shuffled. The power saving circuit is configured to disable operation of one or more of the multiplexer units in accordance with the received control information.
This invention allows maximum reuse of the vector shuffle hardware (resource sharing) while minimizing the power dissipated.
According to a second aspect of the invention, there is provided a method of reducing power in a data processing apparatus configured to perform vector shuffle operations on an input vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element. The method comprises the step of providing a plurality of multiplexer units for shuffling at least an input vector comprising elements of a first size or an input vector comprising elements of a second size. The method also comprises the step of providing a power saving circuit for masking an input vector from one or more of the plurality of multiplexer units, by receiving control information indicative of the element size of a vector being shuffled, and disabling the operation of one or more of the multiplexer units by masking the input vector therefrom, in accordance with the received control information.
According to a third aspect of the invention, there is provided a vector shuffle instruction for performing vector shuffle operations on a vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element, wherein the vector shuffle instruction comprises at least one data bit for indicating the element size of the vector being shuffled.
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
Fig. 1 shows a basic vector shuffle operation on a vector having 256 bits, with 32 elements of 8 bits each;
Fig. 2 shows a basic shuffle operation on a vector with 64 bits, with each element having a granularity of 32 bits;
Fig. 3 shows how the first byte is obtained for each multiplexer in a shuffle operation with 8 bits granularity, on a vector of 64 bits;
Fig. 4 shows a conventional vector shuffle unit for shuffling 8, 16 and 32 bit element sizes;
Fig. 5 shows a vector shuffle unit having a power saving circuit according to the present invention.
Fig. 5 shows a vector shuffle unit 50 according to the present invention. In a similar manner to Fig. 4, the vector shuffle unit 50 comprises a plurality of base multiplexer units (mux0, mux1, mux2, mux3), which are connected to an output multiplexer 11. However, unlike Fig. 4, a power saving circuit 15 is provided for reducing the power dissipation in the base multiplexer units. In particular, power dissipation in base multiplexer units mux1, mux2 and mux3 can be reduced by masking the inputs to these multiplexer units, because of the element size, when their result is not needed. No masking is required for mux0 as it is always needed for 8, 16 and 32 bit element sizes (as seen from Table 1). Mux1 and mux3 are only used for 8 bit element sizes and can therefore be masked together as they are always used together. Mux2 is only used for 8 and 16 bit elements and requires its own masking circuitry within the power saving circuitry 15.
The power saving circuit 15 is disposed between the input register 7 and the base multiplexer units. The power saving circuit 15 receives first and second control bits 17, 19. The first and second control bits 17, 19 form part of, or are derived from, the instruction set, for example, part of a vector shuffle instruction.
The first control bit 17 can be set "high" to indicate when a 16 bit element is being shuffled, and set "low" at other times. The second control bit 19 can be set "high" to indicate when a 32 bit element is being shuffled, and set "low" at other times. The first and second control bits 17, 19 are connected to an OR gate 21. The output of the OR gate 21 is connected to the input of a first AND gate 23. The AND gate 23 is connected to receive its other input from the register 7, and has its output connected to the second and fourth multiplexer units, mux1, mux3. Thus, it can be seen that the first AND gate 23 receives the input vector 5 via bus connection 9 at one input, which can be masked using the signal received from the OR gate 21. A second AND gate 25 is connected to receive the input vector 5 at its first input, and the second control bit 19 at its other input. The output of the second AND gate 25 is connected to the third multiplexer unit mux2. Thus, it can be seen that the second AND gate 25 receives the input vector 5 via bus connection 9 at one input, which can be masked using the signal received from the second control bit 19.
When processing a vector having a granularity of 8 bit elements, the first and second control bits 17, 19 will be set low. This in turn will result in each of the AND gates 23 and 25 having one of its inputs set low, thus resulting in the input vector 5 being connected to each of the multiplexer units mux0, mux1, mux2 and mux3 in the normal way. In other words, multiplexer unit mux0 will receive the input vector directly, multiplexer units mux1 and mux3 will receive their inputs via the first AND gate 23, while multiplexer unit mux2 will receive its input from the second AND gate 25.
When processing a vector having a granularity of 16 bit elements, the first control bit 17 is set high. This has the effect of setting the output of the OR gate 21 high, which in turn provides a high signal on one of the input connections to the first AND gate 23. This has the effect of masking the input vector from the multiplexer units mux1 and mux3, which are connected to the output of the first AND gate 23. Since the input to the second AND gate 25 is connected to the second control bit 19 (i.e. the control bit for the 32 bit element which will be set low), the multiplexer unit mux2 will receive the input vector 5 at its input in the normal manner. In this way, only base multiplexer units mux0 and mux2 are used when processing a 16 bit vector. Power is therefore saved because base multiplexer units mux1 and mux3 are masked from operation.
When processing a vector having a granularity of 32 bit elements, the second control bit 19 is set high. This has the effect of setting the output of the OR gate 21 high, which in turn provides a high signal on one of the input connections to the first AND gate 23. This has the effect of masking the input vector 5 from the multiplexer units mux1 and mux3, which are connected to the output of the first AND gate 23. In addition, since the input to the second AND gate 25 is also set high (i.e. because this input is connected to the second control bit 19), the multiplexer unit mux2 will also be masked from receiving the input vector 5. In this way, only base multiplexer unit mux0 is used when processing a 32 bit vector.
Power is therefore saved because base multiplexer units mux1, mux2 and mux3 are masked from operation.
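The three cases described above can be summarised in a small behavioural model of the power saving circuit. This is a sketch only: the function names are hypothetical, and it assumes that each control signal is inverted before it reaches its AND gate, so that a high control value forces the gated bus to all zeros, which is the masking behaviour the text describes.

```python
def and_gate_bus(vector, mask_control):
    """AND every byte on the bus with a replicated enable signal.

    Assumption: the control signal is inverted before the AND gate, so a
    high mask_control drives the whole bus low.
    """
    enable = 0x00 if mask_control else 0xFF
    return [byte & enable for byte in vector]


def power_saving_circuit(vector, ctrl16, ctrl32):
    """Return the bus value seen at the input of each base multiplexer.

    ctrl16 models control bit 17 (high for 16 bit elements) and ctrl32
    models control bit 19 (high for 32 bit elements).
    """
    or_out = ctrl16 or ctrl32                   # OR gate 21
    gate23_out = and_gate_bus(vector, or_out)   # first AND gate 23: mux1, mux3
    gate25_out = and_gate_bus(vector, ctrl32)   # second AND gate 25: mux2
    return {
        "mux0": list(vector),                   # never masked
        "mux1": gate23_out,
        "mux2": gate25_out,
        "mux3": gate23_out,
    }


vector = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
cases = (("8 bit", False, False), ("16 bit", True, False), ("32 bit", False, True))
for label, ctrl16, ctrl32 in cases:
    inputs = power_saving_circuit(vector, ctrl16, ctrl32)
    unmasked = [name for name, bus in inputs.items() if any(bus)]
    print(label, unmasked)
```

The multiplexers that still see the input vector in each case match the rows of Table 1: all four for 8 bit elements, mux0 and mux2 for 16 bit elements, and mux0 alone for 32 bit elements.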
It is noted that in the analysis above, it is assumed that when the shuffling unit is not active, the inputs are kept "low". As will be appreciated from the above, by differentiating the different element sizes in the instruction set (i.e. providing first and second control bits 17, 19), a power saving opportunity is made possible in the hardware. The power saving circuitry 15 detects the power saving opportunity using the first and second control bits 17, 19, and masks the appropriate busses. It will be appreciated that modifications are required to both the instruction set and the hardware circuitry in order to realize the power saving.
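The reason masking saves power can be illustrated by counting bit transitions on a multiplexer input bus, since dynamic power in CMOS logic grows with switching activity. The sketch below is a crude proxy rather than a power model; the function names and the example data are hypothetical, and the control signal is again assumed to be inverted ahead of the AND gate.

```python
def masked(vector, mask_control):
    """Force the bus low when mask_control is high (inverted-enable assumption)."""
    enable = 0x00 if mask_control else 0xFF
    return [byte & enable for byte in vector]


def count_toggles(bus_values):
    """Count bit flips between consecutive values on a bus of bytes."""
    toggles = 0
    for prev, curr in zip(bus_values, bus_values[1:]):
        toggles += sum(bin(a ^ b).count("1") for a, b in zip(prev, curr))
    return toggles


# The bus starts low, matching the assumption that inactive inputs are kept low.
vectors = [[0x00] * 8, [0xFF] * 8, [0x0F] * 8, [0xF0] * 8]

unmasked_activity = count_toggles(vectors)
masked_activity = count_toggles([masked(v, True) for v in vectors])
print(unmasked_activity, masked_activity)  # 160 0
```

With the bus held low, a masked multiplexer sees no input transitions at all, whereas an unmasked one sees every change of the input vector even when its result is discarded by the output multiplexer.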
This means of power saving can be applied to all hardware that has to support shuffling vectors on a per element basis where there is a manner (e.g. instruction set) of differentiating multiple element sizes. Although the preferred embodiment has been described in relation to a vector shuffle unit configured to shuffle 8, 16 or 32 bit elements, it will be appreciated that the invention could also be used with a vector shuffle unit configured to shuffle fewer, or more, differently sized elements.
It will also be appreciated that, although the preferred embodiment refers to the control signals 17, 19 having a logic high signal for indicating a particular state, a logic low signal could also be used, with the power saving circuitry adapted accordingly to give the same logic output. Furthermore, although the preferred embodiment has been described using AND gates in the power saving circuit, it will be appreciated that other logic circuitry can be used to provide operand isolation.
Also, although the preferred embodiment has been described in relation to shuffling an input vector, the invention may equally be used with more than one input vector, for example in a system having two to one shuffle units.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfill the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
Claims
1. A data processing apparatus for performing vector shuffle operations on an input vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element, the data processing apparatus comprising: a plurality of multiplexer units configured to shuffle at least an input vector comprising elements of a first size or an input vector comprising elements of a second size; a power saving circuit connected to receive control information indicative of the element size of a vector being shuffled; wherein the power saving circuit is configured to disable operation of one or more of the multiplexer units in accordance with the received control information.
2. A data processing apparatus as claimed in claim 1, wherein the control information is contained in a vector shuffle instruction forming part of an instruction set.
3. A data processing apparatus as claimed in claim 1 or 2, wherein the power saving circuit comprises logic circuitry for masking an input vector from the one or more multiplexer units in accordance with the received control information.
4. A data processing apparatus as claimed in any one of the preceding claims, wherein the data processing apparatus is configured to shuffle vectors having 8, 16 or 32 bit element sizes, the apparatus comprising: first, second, third and fourth base multiplexer units forming the plurality of multiplexer units; a first logic gate for masking the second and fourth base multiplexer units when either a first control bit or a second control bit in the control information is enabled; and a second logic gate for masking the third base multiplexer unit when the second control bit in the control information is enabled.
5. A method of reducing power in a data processing apparatus configured to perform vector shuffle operations on an input vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element, the method comprising the steps of: providing a plurality of multiplexer units for shuffling at least an input vector comprising elements of a first size or an input vector comprising elements of a second size; providing a power saving circuit for masking an input vector from one or more of the plurality of multiplexer units; receiving control information indicative of the element size of a vector being shuffled; and disabling the operation of one or more of the multiplexer units by masking the input vector therefrom, in accordance with the received control information.
6. A method as claimed in claim 5, wherein the control information is received from a vector shuffle instruction forming part of an instruction set.
7. A method as claimed in claim 5 or 6, wherein the step of disabling the operation of one or more of the multiplexer units comprises the step of masking an input vector from the one or more multiplexer units in accordance with the received control information.
8. A method as claimed in any one of claims 5 to 7, wherein the data processing apparatus is configured to shuffle vectors having 8, 16 or 32 bit element sizes, the method comprising the steps of: providing first, second, third and fourth base multiplexer units as the plurality of multiplexer units; masking the second and fourth base multiplexer units when either a first control bit or a second control bit in the control information is enabled; and masking the third base multiplexer unit when the second control bit in the control information is enabled.
9. A vector shuffle instruction for performing vector shuffle operations on a vector having a plurality of elements, each element comprising a predetermined number of data bits, and the number of data bits defining the size of an element, wherein the vector shuffle instruction comprises at least one data bit for indicating the element size of the vector being shuffled.
10. Device comprising a data processing apparatus according to any of claims 1-4.
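The gating recited in claims 4 and 8 can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit of the application: the names `mux_enables`, `masked_inputs`, `ctrl_1` and `ctrl_2` are hypothetical, the first logic gate is modelled as an OR of the two control bits, and masking is modelled as forcing a unit's input word to zero. The claims do not state which control-bit pattern corresponds to which element size, so the model makes no claim about that either.

```python
from typing import List, Tuple


def mux_enables(ctrl_1: bool, ctrl_2: bool) -> Tuple[bool, bool, bool, bool]:
    """Return enable flags for base multiplexer units 1 to 4.

    ctrl_1 and ctrl_2 are hypothetical names for the first and second
    control bits of the control information.
    """
    # First logic gate: masks units 2 and 4 when either control bit is set.
    mask_2_and_4 = ctrl_1 or ctrl_2
    # Second logic gate: masks unit 3 when the second control bit is set.
    mask_3 = ctrl_2
    # Unit 1 is never masked in this model.
    return (True, not mask_2_and_4, not mask_3, not mask_2_and_4)


def masked_inputs(vector: int, ctrl_1: bool, ctrl_2: bool) -> List[int]:
    """Input word seen by each unit: the vector itself, or 0 when masked.

    Forcing a masked input to zero is an assumption of this sketch.
    """
    return [vector if enabled else 0 for enabled in mux_enables(ctrl_1, ctrl_2)]


if __name__ == "__main__":
    for c1, c2 in [(False, False), (True, False), (False, True)]:
        print(c1, c2, mux_enables(c1, c2))
```

With neither control bit enabled all four units see the input vector; with only the first bit enabled units 1 and 3 remain active; with the second bit enabled only unit 1 remains active. A masked unit sees a constant input, which is what keeps it from switching and dissipating power.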
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05110769 | 2005-11-15 | ||
EP05110769.6 | 2005-11-15 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007057832A2 (en) | 2007-05-24 |
WO2007057832A3 (en) | 2007-08-02 |
Family
ID=37907071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2006/054214 WO2007057832A2 (en) | 2005-11-15 | 2006-11-13 | Vector shuffle unit |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW200811705A (en) |
WO (1) | WO2007057832A2 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013095620A1 (en) | 2011-12-23 | 2013-06-27 | Intel Corporation | Apparatus and method of improved insert instructions |
CN108241504A (en) | 2011-12-23 | 2018-07-03 | 英特尔公司 | The device and method of improved extraction instruction |
US9632980B2 (en) | 2011-12-23 | 2017-04-25 | Intel Corporation | Apparatus and method of mask permute instructions |
US9946540B2 (en) | 2011-12-23 | 2018-04-17 | Intel Corporation | Apparatus and method of improved permute instructions with multiple granularities |
WO2013095637A1 (en) | 2011-12-23 | 2013-06-27 | Intel Corporation | Apparatus and method of improved permute instructions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62105524A (en) * | 1985-11-01 | 1987-05-16 | Nec Corp | Signal selecting circuit |
EP0757312A1 (en) * | 1995-08-01 | 1997-02-05 | Hewlett-Packard Company | Data processor |
US20030095547A1 (en) * | 2001-11-21 | 2003-05-22 | Schofield William G.J. | 2n-1 Shuffling network |
US6622242B1 (en) * | 2000-04-07 | 2003-09-16 | Sun Microsystems, Inc. | System and method for performing generalized operations in connection with bits units of a data word |
2006
- 2006-11-13 WO PCT/IB2006/054214 patent/WO2007057832A2/en active Application Filing
- 2006-11-15 TW TW95142325A patent/TW200811705A/en unknown
Non-Patent Citations (1)
Title |
---|
LEE R B: "Subword permutation instructions for two-dimensional multimedia processing in microSIMD architectures" APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES, AND PROCESSORS, 2000. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON JULY 10-12, 2000, PISCATAWAY, NJ, USA,IEEE, 10 July 2000 (2000-07-10), pages 3-14, XP010507732 ISBN: 0-7695-0716-6 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017112170A1 (en) * | 2015-12-20 | 2017-06-29 | Intel Corporation | Instruction and logic for vector permute |
CN108292271A (en) * | 2015-12-20 | 2018-07-17 | 英特尔公司 | Instruction for vector permutation and logic |
US10467006B2 (en) | 2015-12-20 | 2019-11-05 | Intel Corporation | Permutating vector data scattered in a temporary destination into elements of a destination register based on a permutation factor |
CN108292271B (en) * | 2015-12-20 | 2024-03-29 | 英特尔公司 | Instruction and logic for vector permutation |
Also Published As
Publication number | Publication date |
---|---|
TW200811705A (en) | 2008-03-01 |
WO2007057832A3 (en) | 2007-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7571303B2 (en) | Reconfigurable integrated circuit | |
KR101310044B1 (en) | Increasing workload performance of one or more cores on multiple core processors | |
KR101050554B1 (en) | Masking in Data Processing Systems Applicable to Development Interfaces | |
US9348792B2 (en) | Coarse-grained reconfigurable processor and code decompression method thereof | |
EP1870813A1 (en) | Page processing circuits, devices, methods and systems for secure demand paging and other operations | |
JP6239130B2 (en) | System and method for reducing memory bus bandwidth according to workload | |
EP2580657B1 (en) | Information processing device and method | |
US10678710B2 (en) | Protection scheme for embedded code | |
WO2007057832A2 (en) | Vector shuffle unit | |
US8275975B2 (en) | Sequencer controlled system and method for controlling timing of operations of functional units | |
US20200042321A1 (en) | Low power back-to-back wake up and issue for paired issue queue in a microprocessor | |
US5784642A (en) | System for establishing a transfer mode between system controller and peripheral device | |
US9697163B2 (en) | Data path configuration component, signal processing device and method therefor | |
US20140325183A1 (en) | Integrated circuit device, asymmetric multi-core processing module, electronic device and method of managing execution of computer program code therefor | |
US9000804B2 (en) | Integrated circuit device comprising clock gating circuitry, electronic device and method for dynamically configuring clock gating | |
DE102022121048A1 (en) | SELECTING THE POWER SUPPLY FOR A HOST SYSTEM | |
US20050210219A1 (en) | Vliw processsor | |
US20060101291A1 (en) | System, method, and apparatus for reducing power consumption in a microprocessor | |
US9442788B2 (en) | Bus protocol checker, system on chip including the same, bus protocol checking method | |
TW202340967A (en) | Encoding byte information on a data bus | |
US9501584B2 (en) | Apparatus and method for distributing a search key in a ternary memory array | |
US20120005438A1 (en) | Input/output control apparatus and information processing apparatus | |
EP2585907B1 (en) | Accelerating execution of compressed code | |
JP2005327062A (en) | Control method for input/output terminal module and input/output terminal module | |
KR20150002319A (en) | Processor of heterogeneous cluster architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase in: | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 06821408 Country of ref document: EP Kind code of ref document: A2 |