GB2332075A - Optimized instruction storage and distribution for parallel processor architecture - Google Patents

Info

Publication number
GB2332075A
Authority
GB
United Kingdom
Prior art keywords
instructions
instruction
routing
irf
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9725808A
Other versions
GB2332075B (en)
GB9725808D0 (en)
Inventor
Alexander Tulai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsemi Semiconductor ULC
Original Assignee
Mitel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitel Corp
Priority to GB9725808A (GB2332075B)
Publication of GB9725808D0
Priority to CA 2254200 (CA2254200A1)
Priority to DE1998154810 (DE19854810A1)
Priority to FR9815270A (FR2772952B1)
Priority to SE9804202A (SE9804202L)
Publication of GB2332075A
Application granted
Publication of GB2332075B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Multi Processors (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A method of improving the utilization of program memory in a multi-parallel processor architecture which utilizes an instruction register file (IRF). The IRF is partitioned into two pages and grouping bits are added to the program instructions to designate the fetch cycle to which the instruction belongs. Routing bits are also used to route the instructions properly to the designated processor. The relative position of the routing instruction within the set of instructions is also used to provide routing information.

Description

OPTIMIZED INSTRUCTION STORAGE AND DISTRIBUTION FOR PARALLEL PROCESSORS ARCHITECTURES
Field of the Invention
This invention relates to multiple processors in a parallel configuration and more particularly to a method of improving the utilization of program memory in the process of fetching and distributing instructions when an instruction register file is used.
Background of the Invention
In a single processor architecture, the execution of a program is conventionally divided into three major phases. These phases are: instruction fetching which involves reading one instruction from the program memory into the instruction register (IR); instruction decoding which involves decoding the instructions from IR and preparing the control signals for its execution; and executing the instruction.
In a parallel processor architecture, multiple instructions have to be read from the program memory into multiple instruction registers that could be organized into an instruction register file (IRF). If there are n processors in a multi-parallel processor architecture, at least n instructions should be read from the program memory into the IRF if unnecessary delays are to be avoided. In multiple processor architectures, however, it is not guaranteed that all the processors will have an instruction to execute every cycle. In this case, a no operation (NOP) instruction will have to be routed to the processor for decoding and execution. Obviously, storing NOP instructions in the program memory is wasteful of program memory, and ways of eliminating NOPs have been investigated.
Simply eliminating the NOP instructions creates routing problems in a multiple parallel processor architecture, as it is impossible to successfully route the instructions without additional information.
To overcome this problem it is known to add control bits to the instructions stored in the program memory for use in grouping instructions and routing information to the intended processor. The requirement to introduce grouping and routing bits to the instructions adds complexity to the system architecture and increases power requirements.
Summary of the Invention
The present invention seeks to provide better utilization of the program memory in a multi-parallel processor implementation by allowing instructions to stretch between two consecutive instruction packs.
The present invention provides a simplified instruction distribution circuit in that the bit routing coding makes use of the instruction position within the group (for groups of two or more instructions).
In the present invention unused distribution control bits are created in certain cases and these bits may be used for additional functionality.
Therefore, in accordance with a first aspect of the present invention there is provided, in a multi-parallel processor architecture having a program memory for storing processor instructions and an instruction register file for decoding instructions fetched from the program memory for execution by selected ones of the parallel processors, a method of improving utilization of the program memory. The method comprises a) partitioning the instruction register file; b) providing a grouping bit to the instructions to identify the fetch cycle to which the instruction belongs; and c) providing routing bits to the instructions to designate which of the processors the instruction is for, wherein the relative position of the routing bit within the instruction provides routing information.
In accordance with a second aspect of the invention there is provided a system for distributing instructions from a program memory to multi-parallel processors comprising: a partitioned instruction register file (IRF) for receiving and decoding instructions from the program memory; multiple buses for carrying instructions from the memory to the IRF; routing circuitry for directing instructions to designated processors; and a bit route coding sequence to distribute instructions, the route coding utilizing the instruction position within the sequence to provide routing information.
Brief Description of the Drawings
The invention will now be described in greater detail with reference to the attached drawings wherein:
Figure 1 illustrates a multi-parallel processor architecture according to the prior art;
Figure 2 illustrates the problem caused by eliminating non-operational instructions from the set of instructions; and
Figure 3 illustrates a four-processor architecture which implements the present invention.
Detailed Description of the Invention
As previously discussed the execution of a program is divided in most of the processors in use today into three major phases. These are:
1) instruction fetching;
2) instruction decoding; and
3) instruction execution.
In a parallel processor architecture multiple instructions have to be read from the Program Memory into multiple IRs that could be organized in an Instruction Register File (IRF). Assuming that a certain architecture uses "n" processors, at least "n" instructions should be read from the Program Memory into the IRF if unnecessary delays are to be avoided. However, in multi-processor architectures it is not guaranteed that all the processors will have an instruction to execute every cycle, in which case a NOP (No Operation) instruction will have to be routed to the processor for decoding and execution. Storing NOP instructions in the Program Memory is rather wasteful and ways of eliminating them have been sought. Figure 1 illustrates a multi-parallel processor architecture in which the program memory has stored NOP instructions respecting processors 2 and 3. These NOP instructions are fetched to the Instruction Register File for delivery to the respective processors. Obviously this results in memory usage involving no exchange of meaningful data.
If the undesired NOP instructions are eliminated, the routing of the instructions is impossible without additional information. Figure 2 illustrates the assignment problem for the case of n=4. In this example the NOP instructions relating to processors 2 and 3 have been eliminated and the program memory which would have been used for NOP instructions is used for other processor instructions. As indicated in Figure 2, this results in instructions intended for processor 4 being wrongly directed to processor 2.
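The assignment problem of Figure 2 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that without control bits slot k of a fetch line always feeds processor k. The function name `route_by_slot` and the instruction labels are hypothetical and do not come from the patent.

```python
NOP = "NOP"

def route_by_slot(fetch_line):
    """Slot-based routing: the k-th slot of a fetch line feeds processor k."""
    return {proc: instr for proc, instr in enumerate(fetch_line, start=1)}

# Figure 1 style storage: NOPs occupy the slots of idle processors 2 and 3.
with_nops = ["i_for_p1", NOP, NOP, "i_for_p4"]

# Figure 2 style storage: the NOPs are dropped, so later instructions move up.
without_nops = ["i_for_p1", "i_for_p4", "next_a", "next_b"]

print(route_by_slot(with_nops))     # processor 4 receives "i_for_p4"
print(route_by_slot(without_nops))  # processor 2 wrongly receives "i_for_p4"
```

With the NOPs removed, the slot position no longer identifies the target processor, which is why extra grouping and routing information is needed.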
To solve this problem, control bits are added to the instructions stored in the Program Memory. The number of instructions goes from constant (4 in Figure 1) to variable (anywhere from 1 to 4 in the case of a 4 processor architecture, or 1 to n in the general case of n processors). These control bits carry two kinds of information:
1) grouping information (grouping together all the instructions that have to be executed in the same cycle but on different processors); and
2) routing information (maps an instruction to a certain processing unit).
Because of these definitions they shall be referred to as grouping control bits and routing control bits.
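As a rough illustration of how grouping and routing control bits might sit alongside the instruction bits, the sketch below packs one grouping bit, r routing bits and L instruction bits into a single word. The field order (grouping bit, then routing bits, then instruction bits) mirrors the g, r, i notation used in the description's examples. The width L = 16 and the helper names `pack_word` and `unpack_word` are assumptions made only for this example.

```python
def pack_word(g, route, instr, L=16, r=2):
    """Pack a grouping bit, r routing bits and L instruction bits into one integer."""
    if g not in (0, 1):
        raise ValueError("grouping bit must be 0 or 1")
    if not 0 <= route < (1 << r):
        raise ValueError("routing field does not fit in r bits")
    if not 0 <= instr < (1 << L):
        raise ValueError("instruction does not fit in L bits")
    return (g << (r + L)) | (route << L) | instr

def unpack_word(word, L=16, r=2):
    """Split a stored word back into (grouping bit, routing field, instruction bits)."""
    instr = word & ((1 << L) - 1)
    route = (word >> L) & ((1 << r) - 1)
    g = (word >> (L + r)) & 1
    return g, route, instr

word = pack_word(1, 0b10, 0xABCD)
print(unpack_word(word) == (1, 0b10, 0xABCD))
```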
In such systems the issues that have to be addressed are:
1) How are the instructions stored in the Program Memory and how many of them are written into the IRF in one fetch cycle?
2) What is the optimal size of the IRF (how many instructions can it accommodate)?
3) What configuration of control bits allows for an optimal distribution of the instructions from the IRF to the processing units?
4) How are the flow control changes (jumps, calls to subroutines etc.) handled? and
5) How does the size of the IRF influence the number of reads from the PM and the impact on the power consumption of the device?
The present invention demonstrates that the Program Memory waste could be further reduced and the instruction distributing circuitry could be simplified by:
1. allowing a set of instructions (that is to be executed in the same cycle) to spread over two consecutive Program Memory fetch lines;
2. dimensioning the IRF to 2n, where n is a power of 2 (but not necessarily);
3. coding the distribution control bits as follows:
3.1) use r = [log2(2n-1)] bits per instruction for routing control;
3.2) in a set of p instructions belonging to the same cycle, with p ≥ m (where m is the minimum integer such that mr ≥ n), assign each distribution control bit of the first m instructions to one of the n processors and set them to 0 or 1 to indicate which processor receives which instruction in the set of n (the matching is done positionally from left to right);
4. if the grouping control bits indicate that more than n instructions belong to the same group, do not advance the decoding pointer in the IRF;
5. upon a flow control change, set all the grouping bits of the IRF that will not be written to during the first fetch cycle to such a value that an instruction spreading over two consecutive Program Memory locations (at the address jumped to) would not be falsely grouped with instructions left over in the IRF before the flow control change occurred.
As mentioned above, in a system with n processors, at least n instructions should be read at a time from the Program Memory (PM) if delays are to be avoided.
Consequently a minimum IRF capacity of n instructions guarantees that no delays are introduced during the fetching phase.
If the IRF can store more than n instructions, that would allow the elimination of the fetching phase upon jumps to locations that are already in the IRF. From this point of view a larger IRF would behave like a cache memory. However, the size of the routing circuit needed to send an instruction from the IRF to the proper processor becomes huge when any IRF register could be routed to any processor, a situation that occurs if one wants to eliminate the memory-wasteful NOPs by accepting a variable instruction register. In these conditions the size of the routing circuitry is kept to a minimum and no additional delays are introduced during the fetching phase if the capacity of the IRF is set to exactly n instructions. However, a third factor in deciding the size of the IRF is the waste of program memory locations that occurs when the size of the IRF is exactly n and the number of instructions to be decoded and executed every cycle is variable (anywhere from 1 to n).
When variable sized instructions are packed in groups of n and stored in the program memory, it could happen that the room left in the current pack is not enough to fit the next instruction, in which case the rest of the pack will be filled with NOPs and a new pack started. The worst scenario possible is that (n-1) locations are available in the current pack while the next cycle instruction length is exactly n. In such a case the waste could be as high as (n-1) instructions. The best case is obviously when instructions could be fitted exactly in an n instruction pack.
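The padding cost described above can be estimated with a short Python sketch. It assumes each cycle's group of 1 to n instructions must fit entirely inside one pack of n slots, and counts the NOP slots needed to skip to the next pack. The function name `nop_padding` is hypothetical, and trailing padding after the last group is ignored.

```python
def nop_padding(group_sizes, n):
    """Count NOP slots inserted when a group may not straddle two packs of n slots."""
    used = 0      # slots already occupied in the current pack
    padding = 0   # NOP slots inserted so far
    for size in group_sizes:
        if not 1 <= size <= n:
            raise ValueError("each group must hold between 1 and n instructions")
        free = n - used
        if size > free:
            padding += free   # fill the rest of the pack with NOPs
            used = 0          # and start a new pack
        used = (used + size) % n
    return padding

# Worst case for n = 4: one slot used, then a group of exactly n instructions.
print(nop_padding([1, 4], n=4))     # 3, i.e. n - 1 wasted slots
print(nop_padding([3, 2, 3], n=4))  # 3
print(nop_padding([2, 2, 4], n=4))  # 0, every group fits its pack exactly
```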
To address this waste, we could allow an instruction to stretch between two packs of length n and thus eliminate any waste of PM locations. However, this feature requires the extension of the IRF capacity from n to 2n.
A pointer within the IRF will indicate where the next instruction to be decoded starts. When this points to an instruction that starts in one pack and finishes in the next pack, the instruction cannot be decoded unless the rest of it is fetched from the PM and available in the IRF. That is why doubling the size of the IRF from n to 2n solves the problem: we can fetch alternately into one half of the IRF or the other, and when an instruction that stretches between two consecutive packs is to be decoded, both the beginning and the end of the instruction are found in the IRF. Wrap-around is used in such a case to maintain the continuity of an instruction.
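The two-page arrangement with a wrap-around pointer can be sketched as a small circular buffer. This is only an illustrative model under stated assumptions: the class name `TwoPageIRF`, its methods and the instruction labels are hypothetical, and the timing of fetches is not modelled.

```python
class TwoPageIRF:
    """Toy model of an IRF with two pages of n slots and a wrap-around read pointer."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * (2 * n)
        self.pointer = 0  # slot where the next instruction group starts

    def load_page(self, page, pack):
        """Write one fetched pack of n instructions into page 0 or page 1."""
        if page not in (0, 1) or len(pack) != self.n:
            raise ValueError("page must be 0 or 1 and the pack must hold n instructions")
        start = page * self.n
        self.slots[start:start + self.n] = pack

    def take(self, count):
        """Read `count` consecutive instructions, wrapping from page 1 back to page 0."""
        size = 2 * self.n
        group = [self.slots[(self.pointer + k) % size] for k in range(count)]
        self.pointer = (self.pointer + count) % size
        return group

irf = TwoPageIRF(4)
irf.load_page(0, ["a1", "b1", "b2", "b3"])
irf.load_page(1, ["b4", "c1", "c2", "d1"])
print(irf.take(1))  # ['a1']
print(irf.take(4))  # ['b1', 'b2', 'b3', 'b4'], a group stretching over both pages
irf.load_page(0, ["d2", "e1", "e2", "e3"])
print(irf.take(2))  # ['c1', 'c2']
print(irf.take(2))  # ['d1', 'd2'], wrapping from the end of page 1 to page 0
```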
Having two pages of n instructions significantly increases the size of the IRF circuitry and that of the routing circuitry; however, considerable program memory savings are made possible (some examples on a 4 processor architecture have shown savings of up to 20% for certain programs). In addition to this, 2n locations are enough for the code of some tight loops, the kind we encounter in filtering, to be fully stored in the IRF, which would avoid program memory fetches during filtering and consequently reduce the overall power consumption of the chip. Considering these advantages, an IRF with a capacity of 2n is optimal. Increasing the capacity of the IRF beyond 2n instructions could reduce the power consumption in certain cases, but at a very high cost in increased IRF and routing circuitry, and it is not justifiable in general.
The control bits used for grouping are used to indicate which instructions from the IRF should be routed to the n processors for decoding during the current cycle. The minimum number of bits used for this operation for each instruction is 1. The routing circuitry will analyse the grouping bit for n consecutive IRF instructions and the decisions taken are summarized in Table 1.
Table 1: Grouping control bits decoder (a)

Grouping bits, Instr. 1 to Instr. n    Decision
p̄ x x ... x x                          1 instruction cycle
p p̄ x ... x x                          2 instruction cycle
p p p̄ ... x x                          3 instruction cycle
And so on
p p p ... p̄ x                          n-1 instruction cycle
p p p ... p p̄                          n instruction cycle
p p p ... p p                          0 instruction cycle, NOPs will be pushed to all processors

a. x - don't care, p - 0/1, p̄ - 1/0

r = [log2(2n-1)] bits are required to identify to what processor an individual instruction should be routed, where [ ] is the integer part function defined as:

$$[x] = \begin{cases} x, & x \in \mathbb{N} \\ n, & n < x < n+1,\ n \in \mathbb{N} \end{cases}$$

However, if each instruction carries its own routing bits, a redundancy appears when groups of p instructions with p > m, where m is such that mr ≥ n, do not exploit the position of the instruction in the group.
Consider the example where n=4.
The number of routing bits r required for routing one instruction is: r = [log2(2·4-1)] = [2.8] = 2, which indeed corresponds to the 4 possible combinations one can make with two bits.
For m=2 we have: mr=4=n so for any group of p > 2 instructions we have some redundancy within the routing bits if the position of the instruction in the group is not exploited.
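The values of r and m used in this example can be checked with a few lines of Python. The sketch computes the integer part of log2(2n-1) with integer arithmetic and takes m as the smallest integer with m·r ≥ n; the function names are hypothetical, and n ≥ 2 is assumed so that r is at least 1.

```python
def routing_bits(n):
    """r = integer part of log2(2n - 1), computed without floating point."""
    if n < 2:
        raise ValueError("this sketch assumes at least two processors")
    return (2 * n - 1).bit_length() - 1

def min_instructions_for_map(n):
    """m = smallest integer such that m * r >= n."""
    r = routing_bits(n)
    return -(-n // r)  # ceiling division

for n in (2, 4, 8):
    print(n, routing_bits(n), min_instructions_for_map(n))
# n = 4 gives r = 2 and m = 2, matching the worked example.
```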
For p=3, r=2, n=4 and the following three instructions:

$$g^1\, r_0^1 r_1^1\, i_1^1 i_2^1 \ldots i_L^1 \qquad g^2\, r_0^2 r_1^2\, i_1^2 i_2^2 \ldots i_L^2 \qquad g^3\, r_0^3 r_1^3\, i_1^3 i_2^3 \ldots i_L^3$$

where: g's are the grouping bits, r's are the routing bits, i's are the instruction bits and L is the instruction length, and assuming as well that the 3 instructions should go to processors 1, 3 and 4, with g=1 and the natural assignment of the 2 bit combinations, we have the following control bits for the example given:

$$0\,00\; i_1^1 i_2^1 \ldots i_L^1 \qquad 0\,10\; i_1^2 i_2^2 \ldots i_L^2 \qquad 1\,11\; i_1^3 i_2^3 \ldots i_L^3$$

However, the following group of identical instructions would be distributed just as well to the processors 1, 3 and 4, for the simple reason that each instruction carries its own routing bits and the order of the instructions in the group doesn't count:

$$0\,11\; i_1^3 i_2^3 \ldots i_L^3 \qquad 0\,10\; i_1^2 i_2^2 \ldots i_L^2 \qquad 1\,00\; i_1^1 i_2^1 \ldots i_L^1$$

This introduces a redundancy that translates into a somewhat faster circuit, but a significantly larger one than in the case where the position of the instructions in the group is exploited.
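The order independence described here can be shown with a minimal Python sketch in which every instruction carries its own 2-bit routing field. It assumes that the "natural assignment" maps routing value k to processor k+1 (00 to processor 1, 01 to processor 2, and so on); the function name `route_by_own_bits` and the labels are hypothetical.

```python
def route_by_own_bits(group, n=4, nop="NOP"):
    """Each (routing_value, instruction) pair names its own target processor slot."""
    outputs = [nop] * n            # outputs[k] feeds processor k + 1
    for routing_value, instr in group:
        outputs[routing_value] = instr
    return outputs

in_order = [(0b00, "instr1"), (0b10, "instr2"), (0b11, "instr3")]
reordered = list(reversed(in_order))

print(route_by_own_bits(in_order))
print(route_by_own_bits(reordered))  # same distribution, so the order carries no information
```

Because both orderings yield the same result, the position of an instruction in the group is unused information, which is the redundancy the positional coding sets out to exploit.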
If we assume that the three instructions to be routed to processors 1, 3 and 4 are placed exactly in this order when packed and placed in the program memory, we would need exactly n=4 bits to show how the mapping is done.
By concatenating the routing bits of the first two instructions ($r_0^1 r_1^1$ and $r_0^2 r_1^2$) we get exactly the 4 bits needed to show how the assignment is done, and the following table will cover all possible cases for groups of 2, 3 and 4 instructions.
Table 2: Proposed routing bits assignment
(instr. k stands for the instruction bits $i_1^k i_2^k \ldots i_L^k$ of the k-th instruction in the group)

p    $r_0^1 r_1^1 r_0^2 r_1^2$    Routing decision
2    0011    NOP -> proc. 1, NOP -> proc. 2, instr. 1 -> proc. 3, instr. 2 -> proc. 4
2    0101    NOP -> proc. 1, instr. 1 -> proc. 2, NOP -> proc. 3, instr. 2 -> proc. 4
2    1001    instr. 1 -> proc. 1, NOP -> proc. 2, NOP -> proc. 3, instr. 2 -> proc. 4
2    0110    NOP -> proc. 1, instr. 1 -> proc. 2, instr. 2 -> proc. 3, NOP -> proc. 4
2    1010    instr. 1 -> proc. 1, NOP -> proc. 2, instr. 2 -> proc. 3, NOP -> proc. 4
2    1100    instr. 1 -> proc. 1, instr. 2 -> proc. 2, NOP -> proc. 3, NOP -> proc. 4
3    0111    NOP -> proc. 1, instr. 1 -> proc. 2, instr. 2 -> proc. 3, instr. 3 -> proc. 4
3    1011    instr. 1 -> proc. 1, NOP -> proc. 2, instr. 2 -> proc. 3, instr. 3 -> proc. 4
3    1101    instr. 1 -> proc. 1, instr. 2 -> proc. 2, NOP -> proc. 3, instr. 3 -> proc. 4
3    1110    instr. 1 -> proc. 1, instr. 2 -> proc. 2, instr. 3 -> proc. 3, NOP -> proc. 4
4    1111    instr. 1 -> proc. 1, instr. 2 -> proc. 2, instr. 3 -> proc. 3, instr. 4 -> proc. 4

Depending on how the circuit is implemented, the last coding could prove redundant just as well, because for the case p=4 it is clear which instruction goes to which processor (because we have an equal number of instructions and processors).
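The positional matching of Table 2 can be sketched in Python as follows. The sketch treats the four concatenated routing bits as a map over processors 1 to 4 and hands the group's instructions to the processors whose bit is 1, from left to right. The function name `route_by_position` and the instruction labels are hypothetical, and the sketch says nothing about how the hardware would implement the matching.

```python
def route_by_position(map_bits, instructions, nop="NOP"):
    """Assign instructions, in stored order, to the processors whose map bit is 1."""
    if sum(map_bits) != len(instructions):
        raise ValueError("the number of 1 bits must equal the group size")
    pending = iter(instructions)
    # Element k of the result is what processor k + 1 receives this cycle.
    return [next(pending) if bit else nop for bit in map_bits]

print(route_by_position([1, 0, 1, 1], ["instr1", "instr2", "instr3"]))
# ['instr1', 'NOP', 'instr2', 'instr3'], the row "3  1011" of Table 2
print(route_by_position([0, 0, 1, 1], ["instr1", "instr2"]))
# ['NOP', 'NOP', 'instr1', 'instr2'], the row "2  0011" of Table 2
```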
Going back to the previous example, we see now that bits $r_0^3 r_1^3$ are not used any more. These bits are redundant.
In the case of 4 instructions in the group, not only do two more bits become redundant ($r_0^4 r_1^4$) but, depending on the implementation, even the first four bits ($r_0^1 r_1^1$ and $r_0^2 r_1^2$) could be redundant.
Because the routing of the instructions from the IRF to the processors is done after the instructions have been fetched from the program memory into the IRF, at least one cycle of delay will be introduced during an instruction flow change to a PM location that is not already loaded into IRF.
During this time NOPs will be pushed to all processors.
However, if the instruction at the address jumped to is one that stretches over two consecutive packs of n instructions, an additional cycle is needed to load the second pack into the second IRF page before the instruction can actually be routed to the appropriate processor.
Because of the previous instructions left over in that IRF page, an early and wrong routing might take place. To avoid this, during a jump the second page of the IRF (the first in this case being the one into which the first PM location is fetched) will have all its grouping bits set to a value such that the decoder is forced to default to the last case in Table 1, with NOPs being pushed to all processors and the pointer within the IRF preserving the old value.
This is a very elegant way of handling the jumps because it does not require any additional circuitry.
Moreover the circuit will work just as well during RESET, when all the grouping bits will be held to the same value and NOPs will be pushed automatically to all units while the pointer will be locked at 0.
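The jump handling above can be sketched in Python under one loudly stated assumption about Table 1: the last instruction of a group carries the complemented grouping bit (taken here as 1), and a window of n bits with no such mark is the "0 instruction cycle" case. That convention, the function names and the bit values are assumptions for illustration, not a statement of the patented decoder.

```python
def decode_group(window_bits, end_mark=1):
    """Return the group size found in n grouping bits, or 0 if no end mark is present."""
    for position, bit in enumerate(window_bits, start=1):
        if bit == end_mark:
            return position
    return 0  # more than n instructions appear grouped: push NOPs

def step(pointer, window_bits, irf_size):
    """Decode one cycle and advance the IRF pointer only if a group was issued."""
    count = decode_group(window_bits)
    return count, (pointer + count) % irf_size

n = 4
# After a jump, page 0 holds the target pack; the group starts at slot 2 and
# continues into page 1, whose grouping bits were forced to 0 (no end mark).
bits = [1, 1, 0, 0] + [0] * n
window = [bits[(2 + k) % (2 * n)] for k in range(n)]
print(step(2, window, 2 * n))   # (0, 2): NOPs are pushed, the pointer is preserved

# One cycle later page 1 has been fetched and the end mark is visible.
bits[n:] = [1, 0, 0, 1]
window = [bits[(2 + k) % (2 * n)] for k in range(n)]
print(step(2, window, 2 * n))   # (3, 5): the three-instruction group is issued
```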
Figure 3 shows an architecture with 4 processors, 3 control bits (one for grouping and two for routing), and an IRF with two pages of 4 instructions each. Not shown in Figure 3 is the circuitry that enables writing to the IRF, based on the value of the current IRF pointer, the grouping and control bits and the instructions to be executed. Figure 3 does show a 4 processor implementation wherein the IRF has two pages of four instructions each. From the program memory, four buses carry four instructions to the IRF during a fetch cycle.
Each bus is $L_c + 1$ bits wide, with three control bits and L instruction bits, so that $L_c = L + 2$.
Although one implementation of the invention has been described and illustrated, it will be apparent to one skilled in the art that several alterations can be made without departing from the basic concept. It is to be understood that such alterations will fall within the scope of the invention as defined by the appended claims.

Claims (8)

  1. In a multi-parallel processor architecture having a program memory for storing processor instructions and an instruction register file for decoding instructions fetched from said program memory for execution by selected ones of said parallel processors, a method of improving utilization of said program memory comprising: a) partitioning said instruction register file; b) providing a grouping bit to said instructions to identify the fetch cycle to which said instruction belongs; and c) providing routing bits to said instructions to designate to which of said processors said instruction is to be routed, wherein the position of said routing bit within said instructions provides routing information.
  2. A method as defined in claim 1, wherein said instruction register file is partitioned into two sections.
  3. A method as defined in claim 2 wherein the number of parallel processors is n and the capacity of the instruction register file is 2n.
  4. A method as defined in claim 3 wherein the number of routing bits (r) is in accordance with the expression: r = [log2(2n-1)].
  5. A method as defined in claim 1 wherein said grouping bit is used to indicate which instructions from the instruction register file are to be decoded in the current cycle.
  6. A system for distributing instructions from a program memory to multi-parallel processors comprising: a partitioned instruction register file (IRF) for receiving and decoding instructions from the program memory; multiple buses for carrying instructions from the memory to the IRF; routing circuitry for directing instructions to designated processors; and a bit route coding sequence to distribute instructions, the route coding utilizing the instruction position within the sequence to provide routing information.
  7. A system as defined in claim 6 wherein said instruction register file is partitioned into two pages.
  8. A system substantially as herein described, with reference to figure 3 of the accompanying drawings.
GB9725808A 1997-12-06 1997-12-06 Optimized instruction storage and distribution for parallel processors architectures Expired - Fee Related GB2332075B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
GB9725808A GB2332075B (en) 1997-12-06 1997-12-06 Optimized instruction storage and distribution for parallel processors architectures
CA 2254200 CA2254200A1 (en) 1997-12-06 1998-11-20 Optimized instruction storage and distribution for parallel processors architectures
DE1998154810 DE19854810A1 (en) 1997-12-06 1998-11-27 Optimized storage and distribution of instructions for parallel processor architectures
FR9815270A FR2772952B1 (en) 1997-12-06 1998-12-03 INSTRUCTION STORAGE AND DISTRIBUTION METHOD AND SYSTEM FOR PARALLEL PROCESSOR ARCHITECTURES AND CORRESPONDING ARCHITECTURE
SE9804202A SE9804202L (en) 1997-12-06 1998-12-04 Optimized instruction storage and distribution for architectures for parallel processor architectures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB9725808A GB2332075B (en) 1997-12-06 1997-12-06 Optimized instruction storage and distribution for parallel processors architectures

Publications (3)

Publication Number Publication Date
GB9725808D0 GB9725808D0 (en) 1998-02-04
GB2332075A true GB2332075A (en) 1999-06-09
GB2332075B GB2332075B (en) 2002-08-07

Family

ID=10823193

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9725808A Expired - Fee Related GB2332075B (en) 1997-12-06 1997-12-06 Optimized instruction storage and distribution for parallel processors architectures

Country Status (5)

Country Link
CA (1) CA2254200A1 (en)
DE (1) DE19854810A1 (en)
FR (1) FR2772952B1 (en)
GB (1) GB2332075B (en)
SE (1) SE9804202L (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7395532B2 (en) 2002-07-02 2008-07-01 Stmicroelectronics S.R.L. Process for running programs on processors and corresponding processor system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69130588T2 (en) * 1990-05-29 1999-05-27 National Semiconductor Corp., Santa Clara, Calif. Partially decoded instruction cache and method therefor
GB2263985B (en) * 1992-02-06 1995-06-14 Intel Corp Two stage window multiplexors for deriving variable length instructions from a stream of instructions
EP1338957A3 (en) * 1993-11-05 2003-10-29 Intergraph Corporation Software scheduled superscalar computer architecture
US5974534A (en) * 1994-02-14 1999-10-26 Hewlett-Packard Company Predecoding and steering mechanism for instructions in a superscalar processor

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7395532B2 (en) 2002-07-02 2008-07-01 Stmicroelectronics S.R.L. Process for running programs on processors and corresponding processor system
US7617494B2 (en) * 2002-07-02 2009-11-10 Stmicroelectronics S.R.L. Process for running programs with selectable instruction length processors and corresponding processor system
US8176478B2 (en) 2002-07-02 2012-05-08 Stmicroelectronics S.R.L Process for running programs on processors and corresponding processor system

Also Published As

Publication number Publication date
FR2772952A1 (en) 1999-06-25
GB2332075B (en) 2002-08-07
GB9725808D0 (en) 1998-02-04
FR2772952B1 (en) 2001-09-21
CA2254200A1 (en) 1999-06-06
SE9804202D0 (en) 1998-12-04
SE9804202L (en) 1999-06-07
DE19854810A1 (en) 1999-06-10

Similar Documents

Publication Publication Date Title
US6851041B2 (en) Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor
US6002880A (en) VLIW processor with less instruction issue slots than functional units
US5930508A (en) Method for storing and decoding instructions for a microprocessor having a plurality of function units
US7941648B2 (en) Methods and apparatus for dynamic instruction controlled reconfigurable register file
JP3880056B2 (en) RISC microprocessor architecture with multiple type register set
US7725687B2 (en) Register file bypass with optional results storage and separate predication register file in a VLIW processor
US6343356B1 (en) Methods and apparatus for dynamic instruction controlled reconfiguration register file with extended precision
US5051885A (en) Data processing system for concurrent dispatch of instructions to multiple functional units
US7865692B2 (en) Methods and apparatus for automated generation of abbreviated instruction set and configurable processor architecture
JP4657455B2 (en) Data processor
EP0605927A1 (en) Improved very long instruction word processor architecture
US7149875B2 (en) Data reordering processor and method for use in an active memory device
US7383419B2 (en) Address generation unit for a processor
US7308559B2 (en) Digital signal processor with cascaded SIMD organization
US4223381A (en) Lookahead memory address control system
WO2019133258A1 (en) Look up table with data element promotion
US7340591B1 (en) Providing parallel operand functions using register file and extra path storage
US7096344B2 (en) Processor for improving instruction utilization using multiple parallel processors and computer system equipped with the processor
US5642523A (en) Microprocessor with variable size register windowing
GB2332075A (en) Optimized instruction storage and distribution for parallel processor architecture
US5862399A (en) Write control unit
US6243798B1 (en) Computer system for allowing a two word jump instruction to be executed in the same number of cycles as a single word jump instruction
USRE41012E1 (en) Register file indexing methods and apparatus for providing indirect control of register addressing in a VLIW processor
CA2150518C (en) Vector processing unit with reconfigurable data buffer
JPH05173778A (en) Data processor

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20031206