US20060155958A1 - Processor architecture - Google Patents
Processor architecture
- Publication number
- US20060155958A1, US11/293,845
- Authority
- US
- United States
- Prior art keywords
- processor
- instruction
- unit
- execution units
- groups
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30149—Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
Definitions
- This invention relates to a processor architecture, and in particular to a processor architecture which is particularly useful in signal processing applications.
- Modern high-performance wireless communications systems require digital processors which can provide billions of compute operations per second to achieve acceptable performance, for example to carry out operations such as filtering, equalisation and decoding functions.
- ALUs: arithmetic logic units
- LIW: Long Instruction Word
- instructions for each of a number of execution units are concatenated into one “long instruction word” which can be executed in a single processor cycle.
- a bit field within the long instruction is reserved for an instruction for each of the execution units, regardless of whether a particular execution unit will be active within any one processor cycle. This has the disadvantageous effect that it creates excessively long instruction words, which can contain a lot of redundant information for execution units that are not active.
- the present invention relates to an alternative implementation of an LIW processor.
- a processor which comprises multiple execution units.
- the multiple execution units of the processor are divided into groups, and an input instruction word can contain instructions for one execution unit in each of the groups.
- the processor is optimised for use in signal processing operations, in that the multiple execution units of the processor are divided into groups which do not place significant restrictions on the desirable uses of the processor. That is, it has been determined that, in signal processing applications, it is not usually necessary for certain execution units to operate simultaneously.
- execution units can therefore be grouped together, in such a way that only one of them can operate at a particular time, without significantly impacting on the operation of the device.
- an array comprising a plurality of interconnected processors, wherein each of the processors comprises multiple execution units as defined above.
- FIG. 1 is a block schematic diagram of a processor array according to an aspect of the present invention
- FIG. 2 is a block schematic diagram of a processor within the processor array of FIG. 1 , according to another aspect of the present invention
- FIG. 3 is an overview of the format of an instruction word for use in the processor of FIG. 2 ;
- FIG. 4 illustrates in more detail the format of a part of the instruction word shown in FIG. 3 ;
- FIG. 5 illustrates the operation of a second part of the instruction word shown in FIG. 3 ;
- FIG. 6 illustrates the operation of a third part of the instruction word shown in FIG. 3 .
- FIG. 1 is a block schematic diagram of a processor array, as generally described in WO02/50624.
- the array is made up of array elements 20 , which are interconnected by buses and switches.
- the array architecture includes first bus pairs 30 , shown running horizontally in FIG. 1 , each pair including a respective first bus 32 carrying data from left to right in FIG. 1 and a respective second bus 36 carrying data from right to left.
- the array architecture includes second bus pairs 40 , shown running vertically in FIG. 1 , each pair including a respective third bus 42 carrying data upwards in FIG. 1 and a respective fourth bus 46 carrying data downwards.
- each diamond connection 50 represents a switch, which connects an array element 20 to a respective bus 32 , 36 .
- the array further includes a switch matrix 55 at each intersection of a first and second bus pair 30 , 40 .
- the data buses, and the switches and switch matrices, therefore allow data to be switched from one array element to another for processing, as required.
- the array elements 20 take the form of processors, as shown in more detail in FIG. 2 .
- the processors 20 are adapted to make them particularly suitable for use as array elements, although the invention is also applicable to individual processors.
- the processor 20 includes a 64×64 bit instruction memory 60 , which contains instructions loaded into the memory to control the operation of the processor.
- instructions are fetched from the instruction memory 60 , and passed to an instruction decoder 62 , where they are decoded to configure the datapaths and execution units in the processor.
- the processor comprises six execution units.
- the first available execution unit is a first Arithmetic Logic Unit (ALU) 64 , which can perform a number of arithmetic and logical operations.
- ALU Arithmetic Logic Unit
- the second available execution unit is a communications unit 66 , which is connected to the input communications bus 68 and the output communications bus 70 , and is able to perform “put” and “get” operations to move data to and from the external communications buses 68 , 70 , and is also able to move data to and from the 15×16 bit data registers 84 .
- the registers 84 are connected to the execution units by means of a data bus 85 .
- the communications unit 66 is thereby optimised to support the processing performed in the array, whereby data flows from one processor 20 to another, with parts of the processing being performed at each stage.
- the third available execution unit is a combined Memory Access Unit (MAU)/second ALU 72 , which performs a variety of load and store operations over a bus 74 to a 64×32 bit data memory 76 , and also provides a subset of the ALU operations performed by the first ALU 64 .
- MAU Memory Access Unit
- the fourth available execution unit is a branch unit 78 , which performs a number of conditional and unconditional branch operations.
- the fifth available execution unit is a Multiplier Accumulator (MAC) Unit 80 , which performs a variety of multiply and multiply accumulate operations with various bit widths.
- MAC Multiplier Accumulator
- this unit may be replaced by a simpler Multiply unit.
- an Application Specific Unit (ASU) 82 .
- the ASU 82 is adapted to perform a number of highly specialised operations for wireless signal processing applications, such as complex spread and complex despread, in order to support CDMA transmit and receive functionality.
- this unit may be omitted.
- each execution unit is able to perform one operation in one clock cycle.
- the first ALU 64 is also able to perform a shift operation on the first operand of the basic arithmetic or logical operations.
- two instructions can effectively execute simultaneously on that one execution unit.
- the execution units are clustered into three groups, each controlled by a separate instruction in a LIW instruction.
- the first group 86 includes only the first Arithmetic Logic Unit (ALU) 64 ; the second group 88 is made up of the communications unit 66 , and the combined Memory Access Unit (MAU)/second ALU 72 ; and the third group 90 is made up of the branch unit 78 , the Multiplier Accumulator (MAC) Unit 80 , and the Application Specific Unit (ASU) 82 .
- ASU Application Specific Unit
- the device is then controlled such that any one, any two, or all three of the groups 86 , 88 , 90 can be active at any one time, but such that no more than one of the execution units within a group can be active at any one time.
- the instruction format is such that this can be achieved efficiently in each case.
- a long instruction word can include an instruction LIW# 1 for the first group 86 , an instruction LIW# 2 for the second group 88 , and an instruction LIW# 3 for the third group 90 .
- FIG. 3 shows the basic structure of a long instruction word instruction, which is also explained in more detail in FIGS. 4, 5 and 6 .
- the long instruction word first contains a short, 3 bit, bit sequence, which indicates whether the first group 86 is active in that processor cycle and, if so, indicates what class of operation is to be performed, so that execution units and datapaths can be configured.
- the first group 86 is active in that processor cycle and that three bit sequence indicates what operation is to be performed by the first Arithmetic Logic Unit (ALU) 64 .
- the operation is an ALU operation with three operands, for example adding two values to give a result, with the three operands then being the register addresses of the two values to be added plus the register address in which the result is to be stored.
- the operation is a load or store operation between the data memory and a nominated register or register pair.
- the operation is an ALU operation with two operands, one operand, or no operands, for example nop.
- the fourth bit indicates whether an extension byte is to be used, as will be described in more detail below.
- the remaining four bits of byte 0 , and the eight bits of byte 1 then indicate the operands or opcode values, depending on the value of the first three bits of byte 0 , as shown in FIG. 4 . More specifically, where FIG. 4 says that four of these bits represent an operand, they define the address, within the registers 84 , from which the first ALU 64 should retrieve the respective operand on which it will perform the defined operation.
- the fourth bit must be set to “1”, and the extension byte must be used, if either the second group 88 or the third group 90 is active.
- the first group 86 is not active in that processor cycle, and byte 0 of the long instruction word then contains further short bit sequences, which indicate whether the second group 88 and third group 90 are active and, if so, what class of operation is to be performed.
- additional bytes LIW# 2 108 provide required information to allow the second group 88 to perform the intended function
- additional bytes LIW# 3 110 provide required information to allow the third group 90 to perform the intended function.
- in the case where the first three bits of byte 0 are not 000 , and an LIW# 1 instruction or “short” Memory Access operation is to be executed, the extension byte must be used if either or both of the second group 88 and third group 90 is active. If so, the extension byte carries Lcode 2 and Lcode 3 , and additional bytes LIW# 2 108 and LIW# 3 110 contain the required information to allow the relevant group to perform the intended function.
- the extension byte also carries a 2-bit extension opcode “ex op”, which allows more possible instructions for ALU# 0 .
- the extension byte also includes a 1 bit flag, S. If set, the flag S indicates the presence of a shift operation on the ALU first operand. In that case, an additional byte following the extension byte is used to define whether the shift is logical or arithmetic, to the left or right, and how many bits are shifted (4-bit value).
- the instruction set architecture supports the use of short constants (which, in this illustrated embodiment, are 4 bits long) and long constants (which, in this illustrated embodiment, are 16 bits long).
- operands are generally 4 bits long, and one of these 4-bit operands normally refers to one of the registers 84 , but it can alternatively be used to indicate a 4-bit constant value.
- the operand value “15” is used to direct the instruction decoder 62 to take the value in the 16-bit field 112 , which in that case appears at the end of the long instruction word instruction, as a 16-bit constant value. No useful information is therefore stored at the register address “15” (R 15 ). Thus, writing to R 15 is used to indicate that an operation result should be discarded.
- the encoded instruction word is organized on byte boundaries. It can further be seen from FIGS. 3-6 that an individual LIW instruction can be between 1 byte (the special case where none of the groups is active, and there are no LIW# 1 , no LIW# 2 and no LIW# 3 instructions) and 9 bytes in length.
- the instruction decoder 62 can therefore support any combination of instruction lengths within a single 64-bit instruction word and can tolerate LIW instructions which are contained in successive 64-bit instruction words.
- the length of any single LIW instruction cannot exceed 8 bytes. However, in other embodiments of the invention, this maximum length can be set to any desired value. This restriction results in a small number of combinations of LIW# 1 , LIW# 2 and LIW# 3 instructions which cannot be supported because they exceed this length. These illegal combinations are trapped by the Instruction Decode block 62 , resulting in the setting of an Illegal Instruction flag.
- a compiler and assembler operating to support the processor architecture should also intercept disallowed instruction combinations at compile time.
- the architecture relies on an instruction being decoded every processor cycle and therefore it is necessary that a branch destination is aligned at the beginning of a 64-bit instruction word.
- the instruction decoder 62 interprets an all-0 byte instruction (equivalent to “no LIW# 1 , no LIW# 2 , no LIW# 3 ”) as a “new line” and will fetch the next 64-bit instruction word.
- the compiler and assembler can use the “new line” instruction at the end of an instruction sequence immediately prior to a branch destination, in order to ensure 64-bit alignment of the instruction at the branch destination.
- the long instruction word format therefore has the property that the length LIW inst of the long instruction word is independent of the total number of execution units. Rather, it is determined by the maximum number of execution units which can be active in a single processor cycle. In the illustrated embodiment, a maximum of three execution units out of the six available can be active in a single LIW instruction/processor cycle, and the maximum length of a single LIW instruction is limited to 64 bits.
- LIW inst of the long instruction word can vary, from one instruction to the next, depending on the number of active execution units within a given cycle. Thus, in many instruction cycles, it is likely that LIW inst will be less than 64 bits.
- multiple instructions can be packed into the 64 bit wide instruction memory 60 , usually without the need for alignment to word boundaries, and the instructions can overrun a 64-bit instruction word boundary into the following instruction word.
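The packing and branch-alignment behaviour described in the preceding items can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pack_instructions`, its arguments, and the choice of zero bytes as filler after the "new line" byte are hypothetical and are not taken from the patent. Only the 64-bit word width, the 8-byte limit of the described embodiment, and the all-zero "new line" byte come from the text.

```python
WORD_BYTES = 8        # one 64-bit instruction word
MAX_INSTR_BYTES = 8   # per-instruction limit in the described embodiment
NEW_LINE = 0x00       # all-zero byte: "no LIW#1, no LIW#2, no LIW#3"


def pack_instructions(instructions, branch_targets):
    """Pack byte-encoded instructions back to back into 64-bit words.

    instructions   -- list of bytes objects, one per LIW instruction
    branch_targets -- set of indices into `instructions` that are branch
                      destinations and must start on a word boundary
    Returns (image, offsets), where offsets[i] is the byte offset of
    instruction i inside the packed image.
    """
    image = bytearray()
    offsets = []
    for index, encoded in enumerate(instructions):
        if not 1 <= len(encoded) <= MAX_INSTR_BYTES:
            # The decoder would flag this as an illegal instruction;
            # here the (hypothetical) assembler rejects it up front.
            raise ValueError(f"instruction {index} has illegal length")
        if index in branch_targets and len(image) % WORD_BYTES:
            # Emit a "new line" so the decoder fetches the next word,
            # then fill the rest of the word (filler value is assumed).
            image.append(NEW_LINE)
            while len(image) % WORD_BYTES:
                image.append(0x00)
        offsets.append(len(image))
        image += encoded  # may overrun into the following word
    return bytes(image), offsets
```

In this sketch, ordinary instructions are free to straddle a 64-bit boundary, and only branch destinations are forced onto the start of a word.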
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
- This invention relates to a processor architecture, and in particular to a processor architecture which is particularly useful in signal processing applications.
- Modern high-performance wireless communications systems require digital processors which can provide billions of compute operations per second to achieve acceptable performance, for example to carry out operations such as filtering, equalisation and decoding functions. Increasingly these very high processing demands are satisfied by the use of multiple execution units (such as arithmetic logic units (ALUs), multipliers, address generators etc.) which can operate in parallel within a single processor cycle, and can thus increase the aggregate number of operations which can be completed per cycle.
- One architectural approach which has been developed, in order to allow parallel operation of multiple execution units, is the Long Instruction Word (LIW) architecture. In this approach, instructions for each of a number of execution units are concatenated into one “long instruction word” which can be executed in a single processor cycle. Typically, in implementations of this approach, a bit field within the long instruction is reserved for an instruction for each of the execution units, regardless of whether a particular execution unit will be active within any one processor cycle. This has the disadvantageous effect that it creates excessively long instruction words, which can contain a lot of redundant information for execution units that are not active.
- The end result is a larger and more costly design.
- The present invention relates to an alternative implementation of an LIW processor.
- According to a preferred embodiment of the present invention, there is provided a processor which comprises multiple execution units. The multiple execution units of the processor are divided into groups, and an input instruction word can contain instructions for one execution unit in each of the groups.
- In a further preferred embodiment of the invention, the processor is optimised for use in signal processing operations, in that the multiple execution units of the processor are divided into groups which do not place significant restrictions on the desirable uses of the processor. That is, it has been determined that, in signal processing applications, it is not usually necessary for certain execution units to operate simultaneously.
- These execution units can therefore be grouped together, in such a way that only one of them can operate at a particular time, without significantly impacting on the operation of the device.
- According to a further aspect of the present invention, there is provided an array, comprising a plurality of interconnected processors, wherein each of the processors comprises multiple execution units as defined above.
- For a better understanding of the present invention, and to show how it may be put into effect, reference will now be made to the accompanying drawings, in which:
- FIG. 1 is a block schematic diagram of a processor array according to an aspect of the present invention;
- FIG. 2 is a block schematic diagram of a processor within the processor array of FIG. 1, according to another aspect of the present invention;
- FIG. 3 is an overview of the format of an instruction word for use in the processor of FIG. 2;
- FIG. 4 illustrates in more detail the format of a part of the instruction word shown in FIG. 3;
- FIG. 5 illustrates the operation of a second part of the instruction word shown in FIG. 3;
- FIG. 6 illustrates the operation of a third part of the instruction word shown in FIG. 3.
- FIG. 1 is a block schematic diagram of a processor array, as generally described in WO02/50624. The array is made up of array elements 20, which are interconnected by buses and switches.
- The array architecture includes first bus pairs 30, shown running horizontally in FIG. 1, each pair including a respective first bus 32 carrying data from left to right in FIG. 1 and a respective second bus 36 carrying data from right to left.
- The array architecture includes second bus pairs 40, shown running vertically in FIG. 1, each pair including a respective third bus 42 carrying data upwards in FIG. 1 and a respective fourth bus 46 carrying data downwards.
- In FIG. 1, each diamond connection 50 represents a switch, which connects an array element 20 to a respective bus 32, 36. The array further includes a switch matrix 55 at each intersection of a first and second bus pair 30, 40.
- In this embodiment of the invention, at least some of the array elements 20 take the form of processors, as shown in more detail in FIG. 2. In accordance with this illustrated embodiment of the present invention, the processors 20 are adapted to make them particularly suitable for use as array elements, although the invention is also applicable to individual processors.
- The processor 20 includes a 64×64 bit instruction memory 60, which contains instructions loaded into the memory to control the operation of the processor. In operation of the device, instructions are fetched from the instruction memory 60, and passed to an instruction decoder 62, where they are decoded to configure the datapaths and execution units in the processor.
- In this illustrated embodiment, the processor comprises six execution units. The first available execution unit is a first Arithmetic Logic Unit (ALU) 64, which can perform a number of arithmetic and logical operations.
- The second available execution unit is a communications unit 66, which is connected to the input communications bus 68 and the output communications bus 70, and is able to perform “put” and “get” operations to move data to and from the external communications buses 68, 70, and is also able to move data to and from the 15×16 bit data registers 84. The registers 84 are connected to the execution units by means of a data bus 85.
- In this illustrated embodiment, the communications unit 66 is thereby optimised to support the processing performed in the array, whereby data flows from one processor 20 to another, with parts of the processing being performed at each stage.
- The third available execution unit is a combined Memory Access Unit (MAU)/second ALU 72, which performs a variety of load and store operations over a bus 74 to a 64×32 bit data memory 76, and also provides a subset of the ALU operations performed by the first ALU 64.
- The fourth available execution unit is a branch unit 78, which performs a number of conditional and unconditional branch operations.
- The fifth available execution unit is a Multiplier Accumulator (MAC) Unit 80, which performs a variety of multiply and multiply accumulate operations with various bit widths. In an alternative embodiment of the invention, this unit may be replaced by a simpler Multiply unit.
- In this illustrated embodiment of the invention, there is a sixth available execution unit in the form of an Application Specific Unit (ASU) 82. More specifically, the ASU 82 is adapted to perform a number of highly specialised operations for wireless signal processing applications, such as complex spread and complex despread, in order to support CDMA transmit and receive functionality. In an alternative embodiment of the invention, this unit may be omitted.
- As is conventional, in general each execution unit is able to perform one operation in one clock cycle. However, the first ALU 64 is also able to perform a shift operation on the first operand of the basic arithmetic or logical operations. Thus, in this special case, two instructions can effectively execute simultaneously on that one execution unit.
- Analysis of a wide range of signal processing applications has now led to the conclusion that it is not necessary for all of the execution units to be able to operate simultaneously. In this illustrated embodiment of the invention, the execution units are clustered into three groups, each controlled by a separate instruction in a LIW instruction.
- Specifically, in this illustrated embodiment, the first group 86 includes only the first Arithmetic Logic Unit (ALU) 64; the second group 88 is made up of the communications unit 66 and the combined Memory Access Unit (MAU)/second ALU 72; and the third group 90 is made up of the branch unit 78, the Multiplier Accumulator (MAC) Unit 80, and the Application Specific Unit (ASU) 82.
- According to this preferred embodiment of the invention, the device is then controlled such that any one, any two, or all three of the groups 86, 88, 90 can be active at any one time, but such that no more than one of the execution units within a group can be active at any one time. The instruction format is such that this can be achieved efficiently in each case.
- Specifically, a long instruction word can include an instruction LIW#1 for the first group 86, an instruction LIW#2 for the second group 88, and an instruction LIW#3 for the third group 90.
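The grouping rule can be illustrated with a minimal Python sketch. The unit labels (`ALU1`, `COMM`, `MAU_ALU2`, `BRANCH`, `MAC`, `ASU`), the `GROUPS` table and both function names are hypothetical, chosen only for this example. The group membership and the rule that at most one unit per group is active in a cycle are taken from the description.

```python
from math import prod

# Membership of groups 86, 88 and 90 as described above.
# The string labels are hypothetical names used only in this sketch.
GROUPS = {
    1: ("ALU1",),
    2: ("COMM", "MAU_ALU2"),
    3: ("BRANCH", "MAC", "ASU"),
}


def assign_groups(active_units):
    """Map each active unit to its group; reject two units of one group."""
    chosen = {}
    for unit in active_units:
        group = next(
            (g for g, members in GROUPS.items() if unit in members), None
        )
        if group is None:
            raise ValueError(f"unknown execution unit: {unit}")
        if group in chosen:
            raise ValueError(f"{chosen[group]} and {unit} share group {group}")
        chosen[group] = unit
    return chosen


def count_legal_selections():
    """Each group is either idle or runs exactly one of its units."""
    return prod(len(members) + 1 for members in GROUPS.values())
```

Under these assumptions, `assign_groups(["ALU1", "COMM", "MAC"])` is accepted, while `assign_groups(["MAC", "ASU"])` is rejected because both units belong to the third group. Counting the idle state of each group, there are 2 × 3 × 4 = 24 possible unit selections per cycle.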
- FIG. 3 shows the basic structure of a long instruction word instruction, which is also explained in more detail in FIGS. 4, 5 and 6.
- Thus, the long instruction word first contains a short, 3-bit sequence, which indicates whether the first group 86 is active in that processor cycle and, if so, indicates what class of operation is to be performed, so that execution units and datapaths can be configured.
- As shown in FIG. 4, except in the case where the first three bits of byte 0 are 000, the first group 86 is active in that processor cycle and that three-bit sequence indicates what operation is to be performed by the first Arithmetic Logic Unit (ALU) 64.
- Thus, when the value of the first three bits is within the range 001-100, the operation is an ALU operation with three operands, for example adding two values to give a result, with the three operands then being the register addresses of the two values to be added plus the register address in which the result is to be stored.
- When the value of the first three bits is within the range 101-110, the operation is a load or store operation between the data memory and a nominated register or register pair.
- When the value of the first three bits is 111, the operation is an ALU operation with two operands, one operand, or no operands, for example nop.
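The three ranges above can be sketched as a small classifier in Python. The function name, the returned labels, and in particular the assumption that the "first three bits" are the three most significant bits of byte 0 are illustrative choices; the actual bit order is defined by FIG. 4, which is not reproduced here.

```python
def classify_group1(byte0):
    """Classify byte 0 of an instruction by its first three bits.

    Assumption: the "first three bits" are bits 7..5 of the byte.
    """
    if not 0 <= byte0 <= 0xFF:
        raise ValueError("byte0 must fit in 8 bits")
    code = byte0 >> 5
    if code == 0b000:
        return "group1_inactive"            # first group 86 not active
    if code <= 0b100:
        return "alu_three_operand"          # codes 001-100
    if code <= 0b110:
        return "load_store"                 # codes 101-110
    return "alu_two_or_fewer_operands"      # code 111, e.g. nop
```

With this bit-order assumption, a byte 0 of `0b0010_0000` is classified as a three-operand ALU operation and `0b1110_0000` as an ALU operation with two or fewer operands.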
- In any of these cases, the fourth bit then indicates whether an extension byte is to be used, as will be described in more detail below. The remaining four bits of byte 0, and the eight bits of byte 1, then indicate the operands or opcode values, depending on the value of the first three bits of byte 0, as shown in FIG. 4. More specifically, where FIG. 4 says that four of these bits represent an operand, they define the address, within the registers 84, from which the first ALU 64 should retrieve the respective operand on which it will perform the defined operation.
- When the first three bits of byte 0 are not 000, and thus the first group 86 is active in that processor cycle, the fourth bit must be set to “1”, and the extension byte must be used, if either the second group 88 or the third group 90 is active.
- In the case where the first three bits of byte 0 are 000, the first group 86 is not active in that processor cycle, and byte 0 of the long instruction word then contains further short bit sequences, which indicate whether the second group 88 and third group 90 are active and, if so, what class of operation is to be performed.
- Thus, there is a 3-bit sequence Lcode2 relating to the second group 88, as shown in FIG. 5, and a 2-bit sequence Lcode3 relating to the third group 90, as shown in FIG. 6.
- If either or both of the second group 88 and third group 90 is active, then additional bytes LIW#2 108 provide required information to allow the second group 88 to perform the intended function, and additional bytes LIW#3 110 provide required information to allow the third group 90 to perform the intended function.
- In the case where the first three bits of byte 0 are not 000, and an LIW#1 instruction or “short” Memory Access operation is to be executed, the extension byte must be used if either or both of the second group 88 and third group 90 is active. If so, the extension byte carries Lcode2 and Lcode3, and additional bytes LIW#2 108 and LIW#3 110 contain the required information to allow the relevant group to perform the intended function.
- The extension byte also carries a 2-bit extension opcode “ex op”, which allows more possible instructions for ALU#0. The extension byte also includes a 1-bit flag, S. If set, the flag S indicates the presence of a shift operation on the ALU first operand. In that case, an additional byte following the extension byte is used to define whether the shift is logical or arithmetic, to the left or right, and how many bits are shifted (4-bit value).
- The instruction set architecture supports the use of short constants (which, in this illustrated embodiment, are 4 bits long) and long constants (which, in this illustrated embodiment, are 16 bits long). As shown in FIG. 4, operands are generally 4 bits long, and one of these 4-bit operands normally refers to one of the registers 84, but it can alternatively be used to indicate a 4-bit constant value. Where it is required to use a longer constant value, the operand value “15” is used to direct the instruction decoder 62 to take the value in the 16-bit field 112, which in that case appears at the end of the long instruction word instruction, as a 16-bit constant value. No useful information is therefore stored at the register address “15” (R15). Thus, writing to R15 is used to indicate that an operation result should be discarded.
- It can therefore be seen that the encoded instruction word is organized on byte boundaries. It can further be seen from FIGS. 3-6 that an individual LIW instruction can be between 1 byte (the special case where none of the groups is active, and there are no LIW#1, no LIW#2 and no LIW#3 instructions) and 9 bytes in length. The instruction decoder 62 can therefore support any combination of instruction lengths within a single 64-bit instruction word and can tolerate LIW instructions which are contained in successive 64-bit instruction words.
- In one embodiment of the invention, the length of any single LIW instruction cannot exceed 8 bytes. However, in other embodiments of the invention, this maximum length can be set to any desired value. This restriction results in a small number of combinations of LIW#1, LIW#2 and LIW#3 instructions which cannot be supported because they exceed this length. These illegal combinations are trapped by the Instruction Decode block 62, resulting in the setting of an Illegal Instruction flag. Preferably, a compiler and assembler operating to support the processor architecture should also intercept disallowed instruction combinations at compile time.
- There is one situation where alignment of LIW instructions must be restricted further, and that is in the case of branch destinations. The architecture relies on an instruction being decoded every processor cycle and therefore it is necessary that a branch destination is aligned at the beginning of a 64-bit instruction word. The
instruction decoder 62 interprets an all-0 byte instruction (equivalent to “noLIW# 1, noLIW# 2, noLIW# 3”) as a “new line” and will fetch the next 64-bit instruction word. Thus the compiler and assembler can use the “new line” instruction at the end of an instruction sequence immediately prior to a branch destination, in order to ensure 64-bit alignment of the instruction at the branch destination. - The long instruction word format therefore has the property that the length LIWinst of the long instruction word is independent of the total number of execution units. Rather, it is determined by the maximum number of execution units which can be active in a single processor cycle. In the illustrated embodiment, a maximum of three execution units out of the six available can be active in a single LIW instruction/processor cycle, and the maximum length of a single LIW instruction is limited to 64 bits.
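The operand convention described earlier in this passage (4-bit operand fields, the value 15 selecting the 16-bit constant field, and writes to R15 being discarded) can be illustrated with a minimal Python sketch. The function names `read_operand` and `write_result`, the `short_constant` flag and the 16-entry register list are assumptions made for illustration only; the passage does not say how the decoder distinguishes a register reference from a 4-bit constant, and the sketch assumes the value 15 always selects the long constant.

```python
LONG_CONST_SELECTOR = 15  # operand value that selects the 16-bit constant field


def read_operand(field, registers, long_constant=None, short_constant=False):
    """Resolve a 4-bit source operand field to a value.

    `short_constant` is a hypothetical flag saying the field holds a 4-bit
    constant rather than a register number.
    """
    if not 0 <= field <= 15:
        raise ValueError("operand field must fit in 4 bits")
    if field == LONG_CONST_SELECTOR:
        # Assumption: 15 always means "use the 16-bit field at the end
        # of the LIW instruction".
        if long_constant is None:
            raise ValueError("operand 15 requires a 16-bit constant field")
        return long_constant & 0xFFFF
    if short_constant:
        return field  # the field itself is the 4-bit constant
    return registers[field]


def write_result(dest, value, registers):
    """Write a result to a register; a write to R15 discards the result."""
    if dest == LONG_CONST_SELECTOR:
        return
    registers[dest] = value
```

Reserving one operand value for both purposes costs a single register address but avoids a separate encoding bit for long constants and for discarded results.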
- Further, the length LIWinst of the long instruction word can vary, from one instruction to the next, depending on the number of active execution units within a given cycle. Thus, in many instruction cycles, it is likely that LIWinst will be less than 64 bits.
- Also, multiple instructions can be packed into the 64-bit wide instruction memory 60, usually without the need for alignment to word boundaries, and the instructions can overrun a 64-bit instruction word boundary into the following instruction word.
- Taken together, these factors yield object code which is compact and highly efficient both in high-throughput signal processing applications with multiple parallel operations per cycle and in lower-throughput, more complex control operations.
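The packing rules described above (byte-granular instructions that may straddle a 64-bit word boundary, the 8-byte limit of the described embodiment, and a “new line” byte before a branch destination) can be sketched from the assembler's side as follows. This is a hedged illustration, not the patent's implementation: the `layout` function, the `IllegalInstruction` exception and the input format are hypothetical, the instruction bytes in the usage note are arbitrary placeholders rather than real encodings, and filling the skipped remainder of a word with zero bytes is an assumption.

```python
WORD_BYTES = 8      # one 64-bit instruction word
MAX_LIW_BYTES = 8   # length limit in the embodiment described
NEW_LINE = 0x00     # all-zero byte: no LIW#1, no LIW#2, no LIW#3


class IllegalInstruction(ValueError):
    """Raised for an instruction combination that exceeds the length limit."""


def layout(instructions):
    """Pack instructions into a byte image of the instruction memory.

    `instructions` is a list of (encoded_bytes, is_branch_destination) pairs.
    Returns (image, addresses), where addresses[i] is the byte offset at
    which instruction i starts.
    """
    image = bytearray()
    addresses = []
    for encoded, is_branch_destination in instructions:
        if not 1 <= len(encoded) <= MAX_LIW_BYTES:
            raise IllegalInstruction(f"{len(encoded)}-byte LIW instruction")
        if is_branch_destination and len(image) % WORD_BYTES:
            # Emit a "new line" so the decoder fetches the next 64-bit word,
            # then fill the skipped bytes (zero fill is an assumption).
            image.append(NEW_LINE)
            image.extend(bytes(-len(image) % WORD_BYTES))
        addresses.append(len(image))
        image.extend(encoded)
    return bytes(image), addresses
```

For example, packing a 2-byte instruction, a 7-byte instruction and a 3-byte branch destination gives start offsets 0, 2 and 16: the second instruction straddles the first word boundary, and the branch destination is pushed to the start of the third word.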
- This reduces hardware complexity significantly and, more importantly, reduces the complexity of the compiler/assembler required to support the architecture.
Claims (13)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/981,973 US9104426B2 (en) | 2004-12-03 | 2007-11-01 | Processor architecture for processing variable length instruction words |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0426606.0 | 2004-12-03 | ||
GB0426606A GB2420884B (en) | 2004-12-03 | 2004-12-03 | Processor architecture |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/981,973 Continuation US9104426B2 (en) | 2004-12-03 | 2007-11-01 | Processor architecture for processing variable length instruction words |
Publications (1)
Publication Number | Publication Date |
---|---|
US20060155958A1 (en) | 2006-07-13 |
Family
ID=34044035
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/293,845 Abandoned US20060155958A1 (en) | 2004-12-03 | 2005-12-02 | Processor architecture |
US11/981,973 Expired - Fee Related US9104426B2 (en) | 2004-12-03 | 2007-11-01 | Processor architecture for processing variable length instruction words |
Country Status (4)
Country | Link |
---|---|
US (2) | US20060155958A1 (en) |
EP (1) | EP1667016A3 (en) |
JP (1) | JP5112627B2 (en) |
GB (1) | GB2420884B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100813662B1 (en) | 2006-11-17 | 2008-03-14 | 삼성전자주식회사 | Profiler for optimizing processor architecture and application |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5692139A (en) * | 1988-01-11 | 1997-11-25 | North American Philips Corporation, Signetics Div. | VLIW processing device including improved memory for avoiding collisions without an excessive number of ports |
US20020198606A1 (en) * | 2001-06-25 | 2002-12-26 | Takeshi Satou | Data processing system and control method |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0605927B1 (en) * | 1992-12-29 | 1999-07-28 | Koninklijke Philips Electronics N.V. | Improved very long instruction word processor architecture |
DE69424370T2 (en) * | 1993-11-05 | 2001-02-15 | Intergraph Corp | Instruction cache with crossbar switch |
US5848288A (en) * | 1995-09-20 | 1998-12-08 | Intel Corporation | Method and apparatus for accommodating different issue width implementations of VLIW architectures |
JP3623840B2 (en) * | 1996-01-31 | 2005-02-23 | 株式会社ルネサステクノロジ | Data processing apparatus and microprocessor |
US5826054A (en) * | 1996-05-15 | 1998-10-20 | Philips Electronics North America Corporation | Compressed Instruction format for use in a VLIW processor |
US6216223B1 (en) * | 1998-01-12 | 2001-04-10 | Billions Of Operations Per Second, Inc. | Methods and apparatus to dynamically reconfigure the instruction pipeline of an indirect very long instruction word scalable processor |
US6317820B1 (en) * | 1998-06-05 | 2001-11-13 | Texas Instruments Incorporated | Dual-mode VLIW architecture providing a software-controlled varying mix of instruction-level and task-level parallelism |
US6240510B1 (en) * | 1998-08-06 | 2001-05-29 | Intel Corporation | System for processing a cluster of instructions where the instructions are issued to the execution units having a priority order according to a template associated with the cluster of instructions |
US6249861B1 (en) * | 1998-12-03 | 2001-06-19 | Sun Microsystems, Inc. | Instruction fetch unit aligner for a non-power of two size VLIW instruction |
JP2000305781A (en) * | 1999-04-21 | 2000-11-02 | Mitsubishi Electric Corp | Vliw system processor, code compressing device, code compressing method and medium for recording code compression program |
JP2001034471A (en) * | 1999-07-19 | 2001-02-09 | Mitsubishi Electric Corp | Vliw system processor |
US6631439B2 (en) * | 2000-03-08 | 2003-10-07 | Sun Microsystems, Inc. | VLIW computer processing architecture with on-chip dynamic RAM |
US7127588B2 (en) * | 2000-12-05 | 2006-10-24 | Mindspeed Technologies, Inc. | Apparatus and method for an improved performance VLIW processor |
GB2370380B (en) * | 2000-12-19 | 2003-12-31 | Picochip Designs Ltd | Processor architecture |
KR100464406B1 (en) * | 2002-02-08 | 2005-01-03 | 삼성전자주식회사 | Apparatus and method for dispatching very long instruction word with variable length |
WO2004029796A2 (en) * | 2002-09-24 | 2004-04-08 | Koninklijke Philips Electronics N.V. | Apparatus, method ,and compiler enabling processing of load immediate instructions in a very long instruction word processor |
AU2003267692A1 (en) * | 2002-10-11 | 2004-05-04 | Koninklijke Philips Electronics N.V. | Vliw processor with power saving |
US7404067B2 (en) * | 2003-09-08 | 2008-07-22 | Intel Corporation | Method and apparatus for efficient utilization for prescient instruction prefetch |
JP4283131B2 (en) * | 2004-02-12 | 2009-06-24 | パナソニック株式会社 | Processor and compiling method |
GB2414308B (en) * | 2004-05-17 | 2007-08-15 | Advanced Risc Mach Ltd | Program instruction compression |
US7840953B2 (en) * | 2004-12-22 | 2010-11-23 | Intel Corporation | Method and system for reducing program code size |
- 2004-12-03: GB application GB0426606A, patent GB2420884B, not active (Expired - Fee Related)
- 2005-12-02: US application US11/293,845, publication US20060155958A1, not active (Abandoned)
- 2005-12-02: JP application JP2005349339A, patent JP5112627B2, not active (Expired - Fee Related)
- 2005-12-02: EP application EP05257447A, publication EP1667016A3, not active (Withdrawn)
- 2007-11-01: US application US11/981,973, patent US9104426B2, not active (Expired - Fee Related)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100185818A1 (en) * | 2009-01-21 | 2010-07-22 | Lanping Sheng | Resource pool managing system and signal processing method |
US8612686B2 (en) * | 2009-01-21 | 2013-12-17 | Huawei Technologies Co., Ltd. | Resource pool managing system and signal processing method |
WO2015035339A1 (en) * | 2013-09-06 | 2015-03-12 | Huawei Technologies Co., Ltd. | System and method for an asynchronous processor with heterogeneous processors |
WO2015035306A1 (en) * | 2013-09-06 | 2015-03-12 | Huawei Technologies Co., Ltd. | System and method for an asynchronous processor with token-based very long instruction word architecture |
US9928074B2 (en) | 2013-09-06 | 2018-03-27 | Huawei Technologies Co., Ltd. | System and method for an asynchronous processor with token-based very long instruction word architecture |
US10133578B2 (en) | 2013-09-06 | 2018-11-20 | Huawei Technologies Co., Ltd. | System and method for an asynchronous processor with heterogeneous processors |
Also Published As
Publication number | Publication date |
---|---|
EP1667016A3 (en) | 2008-01-02 |
GB0426606D0 (en) | 2005-01-05 |
JP2006164279A (en) | 2006-06-22 |
JP5112627B2 (en) | 2013-01-09 |
EP1667016A2 (en) | 2006-06-07 |
GB2420884A (en) | 2006-06-07 |
GB2420884B (en) | 2009-04-15 |
US9104426B2 (en) | 2015-08-11 |
US20080065859A1 (en) | 2008-03-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP0994413B1 (en) | Data processing system with conditional execution of extended compound instructions | |
KR100464406B1 (en) | Apparatus and method for dispatching very long instruction word with variable length | |
US9477475B2 (en) | Apparatus and method for asymmetric dual path processing | |
JP3880056B2 (en) | RISC microprocessor architecture with multiple type register set | |
US5903769A (en) | Conditional vector processing | |
US7676653B2 (en) | Compact instruction set encoding | |
EP1735700B1 (en) | Apparatus and method for control processing in dual path processor | |
US7581082B2 (en) | Software source transfer selects instruction word sizes | |
JPH117387A (en) | Vliw processor | |
JP2002333978A (en) | Vliw type processor | |
US7139899B2 (en) | Selected register decode values for pipeline stage register addressing | |
US20040078554A1 (en) | Digital signal processor with cascaded SIMD organization | |
CN108139911B (en) | Conditional execution specification of instructions using conditional expansion slots in the same execution packet of a VLIW processor | |
US9104426B2 (en) | Processor architecture for processing variable length instruction words | |
US20060095713A1 (en) | Clip-and-pack instruction for processor | |
EP1735699B1 (en) | Apparatus and method for dual data path processing | |
US7340591B1 (en) | Providing parallel operand functions using register file and extra path storage | |
US6438680B1 (en) | Microprocessor | |
US6654870B1 (en) | Methods and apparatus for establishing port priority functions in a VLIW processor | |
US20060095714A1 (en) | Clip instruction for processor | |
EP0924602B1 (en) | Instruction masking in providing instruction streams to a processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: PICOCHIP DESIGNS LIMITED, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DULLER, ANDREW;PANESAR, GAJINDER;CLAYDON, PETER;AND OTHERS;REEL/FRAME:017306/0637;SIGNING DATES FROM 20060103 TO 20060112 |
| | AS | Assignment | Owner name: ETV CAPITAL S.A., LUXEMBOURG. Free format text: SECURITY AGREEMENT;ASSIGNOR:PICOCHIP DESIGNS LIMITED;REEL/FRAME:018329/0480. Effective date: 20060804 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| | AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MINDSPEED TECHNOLOGIES, INC.;MINDSPEED TECHNOLOGIES U.K., LIMITED;MINDSPEED TELECOMMUNICATIONS TECHNOLOGIES DEVELOPMENT (SHENSHEN) CO. LTD.;AND OTHERS;SIGNING DATES FROM 20140204 TO 20140214;REEL/FRAME:032372/0154 |