EP0463296A2 - An in-memory preprocessor for a scalable compound instruction set machine processor - Google Patents

An in-memory preprocessor for a scalable compound instruction set machine processor

Info

Publication number
EP0463296A2
EP0463296A2 EP91104324A EP91104324A EP0463296A2 EP 0463296 A2 EP0463296 A2 EP 0463296A2 EP 91104324 A EP91104324 A EP 91104324A EP 91104324 A EP91104324 A EP 91104324A EP 0463296 A2 EP0463296 A2 EP 0463296A2
Authority
EP
European Patent Office
Prior art keywords
instructions
instruction
compounding
tag
parallel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP91104324A
Other languages
German (de)
English (en)
Other versions
EP0463296A3 (fr
Inventor
Bartholomew Blaner
Stamatis Vassiliadis
Richard James Eickemeyer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of EP0463296A2
Publication of EP0463296A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3818 Decoding for concurrent execution
    • G06F9/382 Pipelined decoding, e.g. using predecoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Definitions

  • This invention relates to digital computers and digital data processors, and particularly to digital computers and data processors capable of executing two or more instructions in parallel.
  • a recent mechanism for accelerating the computational speed of uni-processors is found in reduced instruction set architecture that employs a limited set of very simple instructions.
  • Another acceleration mechanism is complex instruction set architecture which is based upon a minimal set of complex multi-operand instructions.
  • Superscalar computers suffer from disadvantages which it is desirable to minimize. A concrete amount of time is consumed in deciding at instruction execution time which instructions can be executed in parallel. This time cannot be readily masked by overlapping with other machine operations. This disadvantage becomes more pronounced as the complexity of the instruction set architecture increases. Also, the parallel execution decision must be repeated each time the same instructions are to be executed.
  • A scalable compound instruction set machine (SCISM) architecture is proposed in which instruction level parallelism is achieved by statically analyzing a sequence of scalar instructions at a time prior to instruction execution to generate compound instructions formed by adjacent grouping of existing instructions in the sequence which are capable of parallel execution. Relevant control information in the form of tags is added to the instruction stream to indicate where a compound instruction starts, as well as to indicate the number of existing instructions which are incorporated into a compound instruction.
  • The term "compounding" refers to the grouping of instructions contained in a sequence of instructions, the grouping being for the purpose of concurrent or parallel execution of the grouped instructions.
  • Compounding is satisfied by "pairing" of two instructions for simultaneous execution.
  • Compounded instructions are unaltered from the forms they have when presented for scalar execution.
  • Compounded instructions are accompanied by compounding tag information, that is, bits appended to the grouped instructions which denote the grouping of the instructions for parallel execution.
  • A particularly advantageous embodiment of the invention is based upon a memory architecture which provides for compounding of instructions prior to their issue and execution.
  • a memory is a component of a hierarchical memory structure which provides instructions to the CPU (central processing unit) of a computer.
  • a structure includes a high-speed cache storage containing frequently accessed instructions, a lower speed main memory or primary storage connected to the cache, and a low-speed, high-capacity auxiliary storage.
  • the cache and main storage contain instructions which can be directly referenced for execution. Access to instructions in the auxiliary storage is had through an input/output (I/O) adaptor connected between the main memory and the auxiliary storage.
  • I/O input/output
  • the invention resides in a combination including an input/output interface for providing, from secondary storage, a sequence of instructions for execution, an instruction compounding mechanism which produces compounding tag information in response to the instruction sequence, the compound tag information indicating instructions of the sequence which may be executed in parallel, and a main storage connected to the input/output interface and to the instruction compounding mechanism for storing the sequence of instructions with the compound tag information.
  • main memory provides residence for data and instructions which are immediately accessible to a CPU for reference for execution.
  • The use of the main memory in a well-designed hierarchical storage system in and of itself serves to improve the overall performance of a scalar computer.
  • The storing of the compounding tag information in the main memory enables the information to be used over and over so long as the instructions remain in the main memory.
  • Instructions in main memory, once passed to a cache, frequently remain in the cache long enough to be used more than once.
  • Referring to FIG. 1 of the drawings, there is shown a representative embodiment of a portion of a digital computer system or digital data processing system constructed in accordance with this invention.
  • the illustrated computer system is capable of executing two or more instructions in parallel. It includes a hierarchically-arranged storage system in which auxiliary or secondary storage devices are connected via an I/O bus to a computer.
  • the computer interfaces with the I/O bus through an adaptor which is also connected to a memory bus.
  • the main memory and a high-speed cache are connected to the memory bus.
  • This hierarchy typically permits the computational components of the computer system to directly access or refer to the contents of the main memory and the cache, while the adaptor provides access to the auxiliary storage. Instructions and data which must be accessed or referenced to support current computer operations are kept in the memory. When no longer required, these are returned to the auxiliary memory by way of the adaptor, while new instructions and data are entered into the main memory.
  • The cache supports high-speed access by the CPU and is used to store instructions and data which are currently being used or are highly likely to be used next by the CPU. Hierarchical storage structures are explained in detail in Chapter 7 of Deitel's OPERATING SYSTEMS, second edition, 1990.
  • In FIG. 1 of the drawings, a representative embodiment of a portion of a digital computer system having a hierarchically-arranged memory structure is shown in accordance with the present invention.
  • This computer system is capable of processing two or more instructions in parallel. It includes a first storage mechanism for storing instructions and data to be processed.
  • the storage mechanism is identified as a main memory 10.
  • the main memory 10 is connected to a memory bus 9 having an address and command bus 9a and a text bus 9b.
  • the main memory 10 exchanges instructions and data over the memory bus with an IO adaptor 8.
  • the IO adaptor 8 is connected to the memory bus 9 and to an IO bus 7. It is asserted that one or more auxiliary storage devices (not shown) are coupled to the IO bus 7.
  • the adaptor 8 transfers data over the IO bus 7 by storing program information to and obtaining program information from the auxiliary storage devices.
  • the adaptor 8 also exchanges program data on the memory bus 9 with the main memory 10 by providing instructions and data to and receiving instructions and data from the main memory over the bus 9.
  • the adaptor 8 buffers instructions and data between the buses 7 and 9, which may have differing speeds and formats.
  • the adaptor 8 also includes checking and error functions.
  • An IO adaptor corresponding to the component indicated by reference number 8 is found in, for example, the channelized IO subsystem of the model 3090 computer system, available from IBM Corporation, the assignee of this patent application.
  • the main memory 10 is a relatively large capacity, medium-speed storage mechanism which is connected by way of the memory bus 9 to a lower capacity, higher-speed cache.
  • This cache is identified as a compound instruction cache 12.
  • the computer system in Fig. 1 also includes an instruction compounding mechanism 11 for receiving instructions from the adaptor 8 and associating with those instructions compound tag information in the form of tag fields indicating which of these instructions may be processed in parallel.
  • This instruction compounding mechanism is represented by instruction compounding unit 11.
  • the compounding unit 11 analyzes the incoming instructions for determining which ones may be processed in parallel. Furthermore, the instruction compounding unit 11 produces for those analyzed instructions compounding tag information in the form of tag fields which indicate which instructions may be processed in parallel with one another and which ones may not be processed in parallel with one another.
  • instructions are provided to the computing system from an auxiliary storage device by way of the adaptor 8, the instruction compounding unit 11, and the main memory 10.
  • the main memory 10 receives and stores the analyzed instructions and their associated tag fields.
  • the main memory 10 then provides the analyzed instructions and their associated tag fields to the compound instruction cache 12.
  • the cache 12 has a smaller capacity and higher speed than the main memory 10 and is of the kind commonly used for improving performance rate of a computer system by reducing the frequency of the access to the main memory 10.
  • The computer system of Fig. 1 also includes a plurality of functional instruction processing units. These functional instruction processing units are represented by functional units 13, 14, 15 and so on. These functional units 13-15 operate in parallel with one another in a concurrent manner and each, on its own, is capable of processing one or more types of machine-level instructions. Examples of functional units which may be used include a general purpose arithmetic and logic unit (ALU), an address generation type ALU, a data dependency collapsing ALU of the type taught in co-pending application serial no. 07/504,910 (IBM Docket EN9-90-014), a branch instruction processing unit, a data shifting unit, a floating-point processing unit, and so on. A given computer system may include two or more of some of these types of functional units. For example, a given computer system may include two or more general purpose ALUs. Also, a given computer system may include each and every one of these different types of functional units. The particular configuration of functional units will depend on the nature of the particular computer system being considered.
  • the computer system of Fig. 1 also includes an instruction fetch and issue mechanism coupled to the compound instruction cache 12 for supplying adjacent instructions stored therein to different ones of the functional instruction processing units 13-15 when the instruction tag fields indicate that they may be processed in parallel.
  • This mechanism is represented by instruction fetch and issue unit 16.
  • Instruction fetch and issue unit 16 fetches instructions from cache 12, examines their tag fields and operation code (OP code) fields and, based upon such examinations, sends the instructions to the appropriate ones of the functional units 13-15. If a desired instruction is resident in the compound instruction cache 12, the appropriate address is sent to the cache 12 to fetch therefrom the desired instruction. This is sometimes referred to as a "cache hit". If the requested instruction does not reside in the cache 12, then it must be fetched from the main memory 10 and brought in to the cache 12.
  • OP code operation code
  • This is sometimes referred to as a "cache miss".
  • the address of the requested instruction is sent to the main memory 10.
  • the main memory 10 commences the transfer out or read out of a line of instructions which includes the requested instruction, together with the tag fields of the instructions in the line.
  • a cache miss causes reference to be made to the main memory 10 to determine whether the requested instruction is contained in the memory 10.
  • Instructions are commonly stored in the main memory in blocks called "pages", and the memory management facility (not shown) of the computing system is able to determine from the requested instruction whether the page which contains it is in the main memory. If the page is in the main memory, the line containing the instruction is transferred out or read out of the main memory 10 into the cache 12. However, if the page containing the requested instruction is not in main memory 10, a "page fault" occurs, requiring the missing page to be "fetched" from auxiliary storage and placed into the main memory 10. When a page is fetched, the identification of the missing page is sent to the adaptor 8, which retrieves it and then provides it, over the memory bus 9, for storage into the main memory 10.
  • pages which are fetched for storage in the main memory 10 are transferred to the input of the instruction compounding unit 11, which unit proceeds to analyze these incoming instructions and generate the appropriate tag field for each instruction.
  • the tags and instructions are thereafter applied to the main memory 10 and stored therein for subsequent placement, if needed, in the compound instruction cache 12.
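The fetch path described above (cache hit, cache miss, page fault, compounding on page-in) can be sketched as a toy model. This is only an illustration under stated assumptions: the class `Machine`, the sizes `PAGE_SIZE` and `LINE_SIZE`, and the `compound` callback are hypothetical names that do not appear in the patent, and addresses count whole instructions rather than bytes.

```python
PAGE_SIZE = 4  # instructions per page (toy value, assumed)
LINE_SIZE = 2  # instructions per cache line (toy value, assumed)


class Machine:
    """Toy hierarchy: auxiliary storage -> compounding unit -> main memory -> cache."""

    def __init__(self, auxiliary, compound):
        self.auxiliary = auxiliary  # page number -> list of instructions
        self.compound = compound    # stands in for unit 11: instructions -> tags
        self.main_memory = {}       # page number -> list of (instruction, tag)
        self.cache = {}             # line number -> list of (instruction, tag)
        self.cache_misses = 0
        self.page_faults = 0

    def fetch(self, address):
        line_no = address // LINE_SIZE
        if line_no not in self.cache:                 # cache miss
            self.cache_misses += 1
            page_no = address // PAGE_SIZE
            if page_no not in self.main_memory:       # page fault
                self.page_faults += 1
                text = self.auxiliary[page_no]
                tags = self.compound(text)            # compounded once, on page-in
                self.main_memory[page_no] = list(zip(text, tags))
            page = self.main_memory[page_no]
            start = (address % PAGE_SIZE) // LINE_SIZE * LINE_SIZE
            self.cache[line_no] = page[start:start + LINE_SIZE]
        return self.cache[line_no][address % LINE_SIZE]
```

In this sketch the tags are produced only when a page is brought in from auxiliary storage, so later cache misses on the same page reuse the stored tags without calling `compound` again.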
  • Although the instruction compounding unit 11 is illustrated in Fig. 1 as being connected between the adaptor 8 and the main memory 10, it is contemplated that the unit may be a separate drop on the memory bus 9 or connected at the input to the main memory 10.
  • Storage of compounded instructions in the main memory 10 can be implemented in a number of ways, some of which are illustrated in Figs. 2A-2D.
  • the examples in Figs. 2A-2D assume an 8-byte wide text bus 9b plus extra lines for the tag information.
  • the basic memory transfer between the main memory 10 and the compound instruction cache 12 involves a 64-byte cache line, with one tag bit for each two bytes of instruction text.
  • One cache line is shown in each of the examples of Figs. 2A-2D.
  • the number of tag bits is determined by the maximum number of instructions to be compounded and the information available to the instruction compounding unit 11. These considerations are covered in co-pending applications serial nos. 07/519,382 and 07/504,910 (IBM Dockets EN9-90-019 and EN9-90-014).
  • The simplest tag storage implementation from a control point of view is illustrated in Fig. 2A. If it is assumed that compounding is limited to two instructions, a minimum of a one-bit tag for each two bytes of instruction text is required. Thus, for the line stored in the memory of Fig. 2A, every 64 bits (that is, every eight bytes) requires four bits of compounding tag information. As illustrated in Fig. 2A, storage of this information involves extension of the word size from 64 to 68 bits. Other optional tag bits would increase the size of the extended words.
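The arithmetic behind the 68-bit extended word can be checked with a short sketch. The function names and the choice to place the tag bits above the 64 text bits are assumptions made for illustration; the text only states that the word grows from 64 to 68 bits.

```python
TEXT_BITS = 64      # one memory word of instruction text
BYTES_PER_TAG = 2   # one tag bit per two bytes of text


def tag_bits_per_word(text_bits=TEXT_BITS, bits_per_tag=1):
    """Number of tag bits needed to cover one word of instruction text."""
    halfwords = text_bits // (8 * BYTES_PER_TAG)
    return halfwords * bits_per_tag


def pack_extended_word(text, tags):
    """Combine 64 text bits and 4 tag bits (assumed layout: tags in the high bits)."""
    n = tag_bits_per_word()
    if not 0 <= text < (1 << TEXT_BITS):
        raise ValueError("text does not fit in 64 bits")
    if not 0 <= tags < (1 << n):
        raise ValueError("too many tag bits")
    return (tags << TEXT_BITS) | text


def unpack_extended_word(word):
    """Split an extended word back into (text, tags)."""
    return word & ((1 << TEXT_BITS) - 1), word >> TEXT_BITS
```

With one tag bit per halfword, `tag_bits_per_word()` gives 4, so a packed word never needs more than 68 bits. Passing `bits_per_tag=2` shows how wider tags would extend the word further.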
  • A second approach, more compatible with available memory technology, is illustrated in Fig. 2B.
  • In Fig. 2B, separate text and tag memories are provided for storage of instructions and associated compounding tag information.
  • The tag memory operates in parallel with the text memory. Implicit in the memory structure of Fig. 2B is the requirement for an extra set of tag lines forming a tag bus on the memory bus 9 to provide parallel operation of the text and tag memories. This has several advantages over the extended word approach in Fig. 2A.
  • the tag memory may cover only part of the words in main memory.
  • If the operating system uses certain parts of memory only for data pages (as opposed to instruction pages), tags are not necessary over these parts.
  • Distinction between data and instruction pages can be a hardware decision, or one made in software and implemented by commands to the tag memory which indicate that certain pages contain data only and therefore do not require the memory page address to be mapped into the tag memory address for these pages.
  • the second advantage is that the tag memory can be removed at will to produce a lower cost system. This broadens the performance range possible in a family of computers. If more tag bits are needed, as would be required for more than two-way compounding, a new tag memory will be substituted for the tag memory in Fig. 2B without requiring a change in the main memory design. Further, each memory can be provided with its own error correction.
  • the compounding tags accompany the instruction stream in the memory, whether woven into the stream, appended to sections of it, or maintained in parallel with it.
  • A first section of the main memory contains tag tables, and a second section stores instruction text pages.
  • operating system support is required to reserve the tag table portion of the memory and pair memory pages with tag pages.
  • portions of each page are reserved for tags. This requires a capability in the compiler for page constructions. For example, with 64 byte cache lines, a compiler would use 60 bytes for instructions and 4 bytes for tags.
  • Tags are paired with instruction bytes in the instruction cache when requested by the CPU.
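For the layout in which part of each line is reserved for tags, a small sketch confirms that 4 tag bytes are enough for 60 bytes of instruction text at one tag bit per two bytes. The byte order and bit positions used here are assumptions for illustration only; the text does not specify where within the line the tags sit.

```python
LINE_BYTES = 64
TAG_BYTES = 4
TEXT_BYTES = LINE_BYTES - TAG_BYTES  # 60 bytes of instruction text
BYTES_PER_TAG_BIT = 2


def tag_budget():
    """Return (tag bits needed, tag bits available) for one line."""
    needed = TEXT_BYTES // BYTES_PER_TAG_BIT
    available = TAG_BYTES * 8
    return needed, available


def tag_for_halfword(line, k):
    """Read the tag bit for halfword k.

    Assumed layout: tags occupy the last 4 bytes of the line, big-endian,
    with the most significant bit belonging to halfword 0.
    """
    if len(line) != LINE_BYTES:
        raise ValueError("expected a 64-byte line")
    if not 0 <= k < TEXT_BYTES // BYTES_PER_TAG_BIT:
        raise IndexError("halfword index out of range")
    tags = int.from_bytes(line[TEXT_BYTES:], "big")
    return (tags >> (TAG_BYTES * 8 - 1 - k)) & 1
```

Here `tag_budget()` shows 30 bits needed against 32 available, which leaves two spare bits per line in this particular layout.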
  • An implication of the computer system in Fig. 1 is that the instruction compounding unit 11 can form a part of the bus adaptor 8. Thus, when any page is brought in from the IO system, it is subjected to the compounding process implicit in the unit 11 and moved on the memory bus 9 to the main memory 10. From here on, the discussion assumes a page structure according to Fig. 2A, implying that the text bus 9b is 68 bits wide and that the main memory is configured and controlled to store pages such as that illustrated in Fig. 2A. Of course, the compound instruction cache 12 is configured and controlled to receive lines including extended words as illustrated in Fig. 2A.
  • a page is loaded into a page buffer in the adaptor 8 and provided to the instruction compounding unit 11 as described below.
  • two page buffers 18a and 18b send a sequence of pages to the instruction compounding unit 11 which undertakes compounding operations by adding compounding tag information to page instructions. Pages processed by the compounding unit 11 are fed to the main memory 10 through compounded page buffers 19a and 19b.
  • the compounding unit adds time to that required to fetch a text segment from auxiliary storage and enter it into the main memory 10. However, the time added is small relative to the total time required, and is asynchronous to the CPU.
  • each segment i is transferred from an auxiliary storage device such as a disk drive to one of the page buffers 18a or 18b.
  • Each of the time segments b_i indicates the time required to transfer text segment i from a page buffer to the main memory 10.
  • Text segment i is transferred in time a_i into one of the page buffers 18a or 18b, following which text segment i+1 is transferred to the other of the buffers.
  • Text segment i is transferred in time b_i from the page buffer, where it is currently stored, into the main memory. As Fig. 4 shows, this time is substantially shorter than the time required to fetch a page to one of the buffers 18a or 18b.
  • The time required for the operation of the compounding unit 11 to be performed on a text segment in one of the page buffers, plus the time spent in a compounded buffer 19a or 19b, is represented by compound time c_i.
  • The time b_i is that required to transfer text segment i from a page buffer to the compounding unit 11.
  • The compounding time c_i is incurred while text segment i is subjected to the process of the compounding unit 11.
  • The sum of the times b_i and c_i is less than the time a_i.
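The effect of overlapping compounding with the fetch of the next segment can be illustrated with a simple timing model. This is a sketch under assumptions, not the timing of Fig. 4: it assumes segments are fetched back to back from auxiliary storage, that transfer plus compounding of one segment cannot start before its fetch completes, and that two page buffers are always enough (which holds when transfer plus compounding is shorter than a fetch). The function name `completion_time` is hypothetical.

```python
def completion_time(a, b, c):
    """Time at which the last segment has been stored in main memory.

    a[i]: time to fetch segment i from auxiliary storage into a page buffer
    b[i]: time to transfer segment i out of the page buffer
    c[i]: time added by compounding segment i (use 0 for no compounding)
    """
    fetch_done = 0.0
    stage_done = 0.0
    for a_i, b_i, c_i in zip(a, b, c):
        fetch_done += a_i  # fetches run back to back
        # Transfer and compounding start once the segment is fetched and
        # the previous segment has cleared the transfer/compounding stage.
        stage_done = max(fetch_done, stage_done) + b_i + c_i
    return stage_done
```

For three segments with fetch time 10, transfer time 1 and compounding time 2, the model gives 33 time units with compounding and 31 without: only the compounding time of the final segment is exposed, because the rest overlaps with later fetches.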
  • The superscalar machine must decide at instruction execution time whether instructions may be executed in parallel.
  • Figs. 3 and 4 illustrate two principal advantages of compounding in main memory.
  • the compounding can be made part of the asynchronous page fault process without extending the time to complete that process.
  • Compounding of large blocks of instruction text, such as pages, provides a larger scope of consideration for compounding, which can result in more optimized compounding.
  • an in-memory instruction compounding unit such as that illustrated in Fig. 1, will provide performance advantages, as the CPU will always execute instructions that have been compounded and the compounding can be better optimized than when performed synchronously on a smaller section of instruction text.
  • Fig. 5a shows a portion of a stream of compounded or tagged instructions as they might appear at the output of the instruction compounding unit 11 of Fig. 1.
  • each instruction (Instr.) has a tag field added to it by the instruction compounding unit 11.
  • the tagged instructions like those shown in Fig. 5a are stored into the main memory in the page block for the page containing the instructions.
  • tagged instructions in the main memory 10 are transferred to the cache 12 when a "miss" occurs. Thereafter, the tagged instructions in the cache 12 are fetched by the instruction fetch and issue unit 16.
  • The tag fields of the fetched instructions are examined to determine whether the instructions may be processed in parallel, and their operation code (OP CODE) fields are examined to determine which of the available functional units is most appropriate for their processing. If the tag fields indicate that two or more of the instructions are suitable for processing in parallel, then they are sent to the appropriate ones of the functional units in accordance with the codings of their OP CODE fields. Such instructions are then processed concurrently with one another by their respective functional units.
  • OP CODE operation code
  • the instruction execution rate of the computer system would be N times as great as for the case where instructions are executed one at a time, with N being the number of instructions in the groups which are being processed in parallel.
  • the tagged instruction stream of Fig. 5a is easier to preprocess by an instruction compounding unit if known reference points exist to indicate where instructions begin. Such a reference point will provide precise knowledge of where an instruction boundary occurs.
  • instruction boundaries are expressly known only by a compiler at compile time and only by a CPU when instructions are fetched.
  • a boundary reference point is unknown between compile time and instruction fetch unless a special boundary reference scheme is adopted.
  • Such a scheme is illustrated in Fig. 5b by instruction boundary bits B.
  • As Fig. 5b illustrates, the boundary bits may be placed in the instruction stream by the compiler at compile time to provide a reference for instruction alignment just prior to compounding.
  • Fig. 6 shows in greater detail the internal construction of a representative embodiment of an instruction compounding unit in accordance with the present invention.
  • This instruction compounding unit 20 is suitable for use as the instruction compounding unit 11 of Fig. 1.
  • the instruction compounding unit 20 of Fig. 6 is designed for the case where a maximum of two instructions at a time may be processed in parallel. However, this is not meant to limit the invention only to pairwise compounding.
  • a 1-bit tag field is used.
  • a tag bit value of "1" (one) means that the instruction is a "first" instruction.
  • A tag bit value of "0" (zero) means that the instruction is a "second" instruction and may be executed in parallel with the preceding first instruction.
  • An instruction having a tag bit value of 1 may be executed either by itself or at the same time and in parallel with the next instruction, depending upon the tag bit value for such next instruction.
  • Each pairing of an instruction having a tag bit value of one with a succeeding instruction having a tag bit value of zero forms a compound instruction for parallel execution purposes, that is, the instructions in such a pair may be processed in parallel with one another.
  • If the tag bits for two succeeding instructions each have a value of one, the first of these instructions is executed by itself in a nonparallel manner.
  • all of the instructions in the sequence would have a tag bit value of one.
  • all of the instructions would be executed one at a time in a nonparallel manner.
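The one-bit tag convention above can be made concrete with a short sketch that walks a tagged stream and forms issue groups. The function name `issue_groups` and the representation of the stream as `(instruction, tag)` pairs are assumptions for illustration. A tag of 0 that does not follow a tag of 1 is simply issued alone here, which is one possible way to handle a case the text does not discuss.

```python
def issue_groups(tagged):
    """Group a tagged instruction stream for issue.

    tagged: list of (instruction, tag) pairs, where tag 1 marks a "first"
    instruction and tag 0 marks a "second" instruction that may execute in
    parallel with the preceding first instruction.
    Returns a list of tuples; each tuple is issued together.
    """
    groups = []
    i = 0
    while i < len(tagged):
        instr, tag = tagged[i]
        next_is_second = i + 1 < len(tagged) and tagged[i + 1][1] == 0
        if tag == 1 and next_is_second:
            groups.append((instr, tagged[i + 1][0]))  # compound pair
            i += 2
        else:
            groups.append((instr,))                   # executed alone
            i += 1
    return groups
```

For example, the stream A(1), B(0), C(1), D(1), E(0) is issued as the pair (A, B), then C alone, then the pair (D, E). A stream whose tags are all one degenerates to purely sequential issue.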
  • an instruction alignment unit receives from the I/O adaptor the instruction stream which is to be compounded.
  • the instruction stream may include boundary bits B, as illustrated in Fig. 5b.
  • instruction alignment is simply a matter of detecting boundary bits and decoding instruction OP codes.
  • OP codes include bits which give instruction length in bytes or half words. Therefore, once a boundary bit B has been identified for an instruction, the next instruction can be unambiguously identified by counting the number of bytes or half words from the boundary bit. Instruction alignment is not a feature of this invention, it being understood that instruction boundaries are identified by any known method, including the use of boundary bits.
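Walking a text segment from a known boundary by reading the length bits of each OP code might look like the following sketch. The length encoding used here is an assumption: it follows the System/370 convention in which the two high-order bits of the first OP code byte select a 2-, 4-, or 6-byte instruction. The passage above says only that OP codes carry length bits, so other encodings are equally possible, and the function names are hypothetical.

```python
def instruction_length(first_byte):
    """Length in bytes, from the two high-order bits of the OP code.

    Assumed encoding (System/370 style): 00 -> 2, 01 or 10 -> 4, 11 -> 6.
    """
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[first_byte >> 6]


def instruction_boundaries(text, start=0):
    """Return the byte offsets at which instructions begin.

    `start` plays the role of a known boundary reference point, such as
    the position marked by a boundary bit.
    """
    offsets = []
    pos = start
    while pos < len(text):
        offsets.append(pos)
        pos += instruction_length(text[pos])
    return offsets
```

Starting from the wrong offset would generally yield a different and incorrect set of boundaries, which is why a reliable reference point matters before compounding.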
  • The instruction compounding unit 20 of Fig. 6 includes a plural-instruction instruction register 21 for receiving a plurality of successive instructions from the page buffers 18a and 18b of the adaptor. Instruction compounding unit 20 also includes a plurality of rule-based instruction analyzer mechanisms. Each such instruction analyzer mechanism analyzes a different pair of side-by-side instructions in the instruction register 21 and produces a compoundability signal which indicates whether or not the two instructions in its pair may be processed in parallel. In Fig. 6, there are shown a plurality of compound analyzer units 22-25. Each of these compound analyzer units 22-25 includes two of the instruction analyzer mechanisms just mentioned. Thus, each of these analyzer units 22-25 produces two of the compoundability signals.
  • the first compound analyzer unit 22 produces a first compoundability signal M01 which indicates whether or not Instructions 0 and 1 may be processed in parallel.
  • Compound analyzer unit 22 also produces a second compoundability signal M12 which indicates whether or not Instructions 1 and 2 may be processed in parallel.
  • the second compound analyzer unit 23 produces a first compoundability signal M23 which indicates whether or not Instructions 2 and 3 may be processed in parallel and a second compoundability signal M34 which indicates whether Instructions 3 and 4 may be processed in parallel.
  • The third compound analyzer 24 produces a first compoundability signal M45 which indicates whether or not Instructions 4 and 5 may be processed in parallel and a second compoundability signal M56 which indicates whether or not Instructions 5 and 6 may be processed in parallel.
  • the fourth compound analyzer 25 produces a first compoundability signal M67 which indicates whether or not Instructions 6 and 7 may be processed in parallel and a second compoundability signal M78 which indicates whether Instructions 7 and 8 may be processed in parallel.
  • the instruction compounding unit 20 further includes a tag generating mechanism 26 responsive to the compoundability signals appearing at the outputs of the analyzer units 22-25 for generating the individual tag fields for the different instructions in the instruction register 21.
  • These tag fields T0, T1, T2 etc. are supplied to a tagged instruction register 27, as are the instructions themselves, the latter being obtained from the input instruction register 21. In this manner, there is provided in the compounding unit output register 27 a tag field T0 for Instruction 0, a tag field T1 for Instruction 1, etc.
  • each tag field T0, T1, T2, etc. is comprised of a single binary bit.
  • A tag bit value of "one" indicates that the immediately following instruction to which it is attached is a "first" instruction.
  • A tag bit value of "zero" indicates that the immediately following instruction is a "second" instruction.
  • An instruction having a tag bit value of one followed by an instruction having a tag bit value of zero indicates that those two instructions may be executed in parallel with one another.
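One way a tag generator could turn the pairwise compoundability signals (M01, M12, M23, ...) into one-bit tags is a left-to-right greedy pass. This is only an assumed selection policy for illustration; the passage above does not say how the tag generating mechanism 26 chooses between overlapping candidate pairs, and the function name `generate_tags` is hypothetical.

```python
def generate_tags(m):
    """Derive one-bit tags from pairwise compoundability signals.

    m[i] is truthy when instructions i and i+1 may be processed in
    parallel (m[0] is M01, m[1] is M12, and so on).
    Returns one tag per instruction: 1 = "first", 0 = "second".
    Greedy policy (assumed): pair the earliest available instructions.
    """
    n = len(m) + 1
    tags = [1] * n
    i = 0
    while i < n - 1:
        if m[i]:
            tags[i + 1] = 0  # instruction i+1 becomes the "second" of a pair
            i += 2           # an instruction belongs to at most one pair
        else:
            i += 1
    return tags
```

With M01 and M12 both set, the greedy pass pairs Instructions 0 and 1 and leaves Instruction 2 as a new "first" instruction. A different policy could prefer the pair (1, 2) if that led to more pairs overall.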
  • the tagged instructions in the compounding unit output register 27 are supplied to the input of the main memory 10 of Fig. 1 via one or the other of the compounding buffers 19a or 19b of Fig. 3.
  • the compounded instructions are stored into the main memory 10.
  • the compound analyzer 22 includes instruction compatibility logic 30 for examining the op code of Instruction 0 and the op code of Instruction 1 and determining whether these two op codes are compatible for purposes of execution in parallel.
  • Logic 30 is constructed in accordance with predetermined rules to select which pairs of op codes are compatible for execution in parallel. More particularly, logic 30 includes logic circuitry for implementing rules which define which types of instructions are compatible for parallel execution in the particular hardware configuration used for the computer system being considered. If the op codes for Instruction 0 and 1 are compatible, then logic 30 produces at its output a binary one level signal. If they are not compatible, logic 30 produces a binary zero value on its output line.
  • Compound analyzer 22 further includes a second instruction compatibility logic 31 for examining the op codes of Instructions 1 and 2 and determining whether they are compatible for parallel execution.
  • Logic 31 is constructed in the same manner as logic 30 in accordance with the same predetermined rules used for logic 30 to select which pairs of op codes are compatible for execution in parallel for the case of Instructions 1 and 2.
  • logic 31 includes logic circuitry for implementing rules which define which types of instructions are compatible for parallel execution, these rules being the same as those used in logic 30. If the op codes for Instructions 1 and 2 are compatible, then logic 31 produces a binary one level output. Otherwise, it produces a binary zero level output.
  • Compound analyzer 22 further includes first register dependency logic 32 for detecting conflicts in the usage of the general purpose registers designated by the R1 and R2 fields of Instructions 0 and 1. These general purpose registers will be discussed in greater detail hereinafter.
  • dependency logic 32 may be constructed to detect the occurrence of a data dependency condition wherein a second instruction (Instruction 1) needs to use the results obtained by the performance of the preceding instruction (Instruction 0). In this case, either the second instruction can be executed by the dependency collapsing hardware, thus executing in parallel with the first instruction, or the execution of the second instruction must await completion of the execution of the preceding instruction and, hence, cannot be executed in parallel with the preceding instruction.
  • Compound analyzer 22 further includes second register dependency logic 33 for detecting conflicts in the usage of the general purpose registers designated by the R1 and R2 fields of Instructions 1 and 2.
  • This logic 33 is of the same construction as the previously discussed logic 32 and produces a binary one level output if there are no register dependencies or the register dependencies can be executed by the data dependency collapsing hardware, and a binary zero level output otherwise.
  • the output lines from the instruction compatibility logic 30 and the register dependency logic 32 are connected to the two inputs of an AND circuit 34.
  • the output line of AND 34 has a binary one value if the two op codes being considered are compatible and if there are no register dependencies. This binary one value on the AND 34 output line indicates that the two instructions being considered are compatible, that is, are executable in parallel. If, on the other hand, the AND 34 output line has a binary value of zero, then the two instructions are not compoundable. Thus, there is produced on the AND 34 output line a first compoundability signal M01 which indicates whether or not Instructions 0 and 1 may be processed in parallel. This M01 signal is supplied to the tag generator 26.
  • AND 35 produces on its output line a second compoundability signal M12 which has a binary value of one if the two op codes being considered (op codes for Instructions 1 and 2) are compatible and if there are no register dependencies for Instructions 1 and 2 or register dependencies that can be executed by the data dependency collapsing hardware. Otherwise, the AND 35 output line has a binary value of zero.
  • the output line from AND 35 runs to a second input of the tag generator 26.
  • the other compound analyzers 23-25 shown in Fig. 6 are of the same internal construction as shown in Fig. 7 for the first compound analyzer.
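
The analyzer structure described above (one compatibility check and one register dependency check per adjacent pair, combined by an AND circuit) can be sketched in Python as follows. This is an illustrative model only: the instruction tuple layout and the two `demo_*` predicates are assumptions invented for the example and are not the rules used by logic 30-33.

```python
from typing import Callable, List, Tuple

# An instruction is modelled as (op_code, r1, r2). This layout is an
# assumption made for illustration only.
Instr = Tuple[str, int, int]


def compoundability(first: Instr, second: Instr,
                    opcodes_compatible: Callable[[str, str], bool],
                    registers_ok: Callable[[Instr, Instr], bool]) -> int:
    """Model of one AND circuit (such as AND 34): 1 only if both checks pass."""
    return int(opcodes_compatible(first[0], second[0])
               and registers_ok(first, second))


def analyze(instrs: List[Instr], opcodes_compatible, registers_ok) -> List[int]:
    """Return [M01, M12, M23, ...], one signal per adjacent pair."""
    return [compoundability(a, b, opcodes_compatible, registers_ok)
            for a, b in zip(instrs, instrs[1:])]


def demo_compatible(op_first: str, op_second: str) -> bool:
    # Hypothetical rule: any pair whose second op code is "A" is compatible.
    return op_second == "A"


def demo_registers_ok(first: Instr, second: Instr) -> bool:
    # Hypothetical rule: the second instruction must not use the first
    # instruction's R1 register (dependency collapsing is ignored here).
    return first[1] not in (second[1], second[2])
```

With these hypothetical predicates, `analyze([("A", 1, 2), ("A", 3, 4), ("B", 3, 5)], demo_compatible, demo_registers_ok)` yields one signal for Instructions 0 and 1 and one for Instructions 1 and 2, mirroring the M01 and M12 outputs of a single analyzer.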
  • In Fig. 8 there is shown an example of the logic circuitry that can be used to implement the compound analyzer 22 and the portion of the tag generator 26 which is used to generate the first three tags, Tag 0, Tag 1 and Tag 2.
  • Fig. 5 it is assumed that there are two categories of instructions which are designated as category A and category B.
  • the rules for compounding these categories of instructions are assumed to be as follows:
  • Fig. 8 shows the internal logic circuitry that may be used for the instruction compatibility logic 30 and the instruction compatibility logic 31 of Fig. 7.
  • the instruction compatibility logic 30 includes decoders 40 and 41, AND circuits 42 and 43 and OR circuit 44.
  • the second instruction compatibility logic 31 includes decoders 41 and 45, AND circuits 46 and 47 and OR circuit 48.
  • the middle decoder 41 is shared by both logics 30 and 31.
  • the first logic 30 examines the op codes OP0 and OP1 for Instructions 0 and 1 to determine their compatibility for parallel execution purposes. This is done in accordance with Rules (1)-(4) set forth above. Decoder 40 looks at the op code of the first instruction and if it is a category A op code, the A output line of decoder 40 is set to the one level. If OP0 is a category B op code, then the B output line of decoder 40 is set to a one level. If OP0 belongs to neither category A nor category B, then both outputs of decoder 40 are at the binary zero level. The second decoder 41 does a similar kind of decoding for the second op code OP1.
  • AND circuit 42 implements Rule (1) above. If OP0 is a category A op code and OP1 is also a category A op code, then AND 42 produces a one level output. Otherwise, the output of AND 42 is a binary zero level. AND 43 implements Rule (4) above. If the first op code is a category B op code and the second op code is a category A op code, then AND 43 produces a one level output. Otherwise, it produces a zero level output. If either AND 42 or AND 43 produces a one level output, this drives the output of OR circuit 44 to the one level, in which case, the compoundability signal M01 has a value of one. This one value indicates that the first and second instructions (Instructions 0 and 1) are compatible for parallel execution purposes.
  • the second instruction compatibility logic 31 performs a similar type of op code analysis for the second and third instructions (Instructions 1 and 2). If the second op code OP1 is a category A op code and the third op code OP2 is a category A op code, then, per Rule (1), AND 46 produces a one level output and the second compoundability signal M12 is driven to the compoundability-indicating binary one level. If, on the other hand, OP1 is a category B op code and OP2 is a category A op code, then, per Rule (4), AND 47 is activated to produce a binary one level for the second compoundability signal M12. For any op code combination other than those set forth in Rules (1) and (4), the M12 signal has a value of zero.
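
The decoder, AND and OR arrangement described above can be modelled with a few lines of Python. This is a sketch under stated assumptions: the op code names placed in `CATEGORY_A` and `CATEGORY_B` are hypothetical, since the text here does not say which op codes belong to each category. Only the two pairings the text attributes to Rules (1) and (4) are treated as compatible.

```python
# Hypothetical category membership; the text does not list the op codes
# that belong to category A or category B.
CATEGORY_A = {"ADD", "SUB", "CMP"}
CATEGORY_B = {"LOAD", "STORE"}


def decode(op_code):
    """Model of a decoder such as 40, 41 or 45: returns the (A, B) output lines."""
    return int(op_code in CATEGORY_A), int(op_code in CATEGORY_B)


def compatible(op_first, op_second):
    """Model of logic 30 or 31: two AND circuits feeding one OR circuit."""
    a_first, b_first = decode(op_first)
    a_second, _ = decode(op_second)
    rule_1 = a_first & a_second   # AND 42 / AND 46: category A followed by A
    rule_4 = b_first & a_second   # AND 43 / AND 47: category B followed by A
    return rule_1 | rule_4        # OR 44 / OR 48
```

An op code that belongs to neither category drives both decoder outputs to zero, so any pair containing it is reported as not compatible, as in the description of decoder 40.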
  • Fig. 8 shows the logic circuitry that can be used in tag generator 26 to respond to the M01 and M12 compoundability signals to produce the desired tag bit values for Tags 0, 1 and 2.
  • a tag bit value of one indicates that the associated instruction is a "first" instruction for parallel execution purposes.
  • a tag bit value of zero indicates that the associated instruction is a "second" instruction for parallel execution purposes.
  • the only instruction in the pair has a tag bit value of zero. Any instruction having a tag bit value of one which is followed by another instruction having a tag bit value of one is executed by itself in a singular manner and not in parallel with the following instruction.
  • Tag 2 has a binary value of one
  • the status of its associated Instruction 2 is dependent on the binary value for Tag 3. If Tag 3 has a binary value of zero, then Instructions 2 and 3 can be executed in parallel. If, on the other hand, Tag 3 has a binary value of one, then Instruction 2 will be executed in a singular, nonparallel manner. It is noted that the logic implemented for the tag generator 26 does not permit the occurrence of two successive tag bits having binary values of zero.
  • An examination of Fig. 9 reveals the logic needed to be implemented by the portion of tag generator 26 shown in Fig. 8. As indicated in Fig. 9, Tag 0 will always have a binary value of one. This is accomplished by providing a constant binary value of one to tag generator output line 50 which constitutes the Tag 0 output line. An examination of Fig. 9 further reveals that the bit value for Tag 1 is always the opposite of the bit value of the M01 compoundability signal. This result is accomplished by connecting output line 51 for Tag 1 to the output of NOT circuit 52, the input of which is connected to the M01 signal line.
  • the binary level on Tag 2 output line 53 is determined by an OR circuit 54 and a NOT circuit 55.
  • One input of OR 54 is connected to the M01 line. If M01 has a value of one, then Tag 2 has a value of one. This takes care of the Tag 2 values in the second and fourth rows of Fig. 9.
  • the other input of OR 54 is connected by way of NOT 55 to the M12 signal line. If M12 has a binary value of zero, this value is inverted by NOT 55 to supply a binary one value to the second input of OR 54. This causes the Tag 2 output line 53 to have a binary one value. This takes care of the Tag 2 value for row one of Fig. 9. Note that for the row 3 case, Tag 2 must have a value of zero. This will occur because, for this case, M01 will have a value of zero and M12 will have a value of one which is inverted by NOT 55 to produce a zero at the second input of OR 54.
  • Implicit in the logic of Fig. 9 is a prioritization rule for the row four case where each of M01 and M12 has a binary value of one.
  • this row four case can be produced by an instruction category sequence of BAA.
  • Rule (5) is followed and the 101 sequence shown in Fig. 9 is chosen.
  • the BA pairing is given preference over the AA pairing.
  • the 1,1 pattern for M01 and M12 can also be produced by an op code sequence of AAA.
  • the 101 tag sequence of Fig. 9 is again selected. This is better because it provides a one value for Tag 2 and, hence, potentially enables Instruction 2 to be compounded with Instruction 3 if Instruction 2 is compatible with Instruction 3.
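
The tag generator logic described above for Tags 0, 1 and 2 reduces to three Boolean expressions. The following Python sketch models it; the function name is an assumption, and the signal values are represented as the integers 0 and 1.

```python
def generate_tags(m01, m12):
    """Return (Tag 0, Tag 1, Tag 2) from the two compoundability signals."""
    tag0 = 1                   # output line 50: constant binary one
    tag1 = 1 - m01             # NOT 52 inverts M01
    tag2 = m01 | (1 - m12)     # OR 54, fed by M01 and by NOT 55 (inverted M12)
    return tag0, tag1, tag2


# Print the four input combinations and the resulting tag bits.
for m01, m12 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(m01, m12, generate_tags(m01, m12))
```

When both signals are one, the sketch returns the 101 sequence, so Instructions 0 and 1 are paired and Instruction 2 is left free to pair with Instruction 3. No input combination yields two successive zero tag bits.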
  • In Fig. 10 there is shown a detailed example of how a computer system can be constructed for using the compounding tags of the present invention to provide parallel processing of machine-level computer instructions.
  • the instruction compounding unit 20 used in Fig. 10 is assumed to be of the type described in Fig. 6 and, as such, it adds to each instruction a one-bit tag field. These tag fields are used to identify which pairs of instructions may be processed in parallel. Pages containing these tagged instructions are supplied to and stored into the main memory 10. As the tagged instructions are needed, they are read or transferred into the cache 12.
  • Fetch/Issue control unit 60 fetches the tagged instructions from cache 12, as needed, and arranges for their processing by the appropriate one or ones of a plurality of functional instruction processing units 61, 62, 63 and 64.
  • Fetch/Issue unit 60 examines the tag fields and op code fields of the fetched instructions. If the tag fields indicate that two successive instructions may be processed in parallel, then fetch/issue unit 60 assigns them to the appropriate ones of the functional units 61-64 as determined by their op codes and they are processed in parallel by the selected functional units. If the tag fields indicate that a particular instruction is to be processed in a singular, nonparallel manner, then fetch/issue unit 60 assigns it to a particular functional unit as determined by its op code and it is processed or executed by itself.
  • the first functional unit 61 is a branch instruction processing unit for processing branch type instructions.
  • the second functional unit 62 is a three input address generation arithmetic and logic unit (ALU) which is used to calculate the storage address for instructions which transfer operands to or from storage.
  • the third functional unit 63 is a general purpose arithmetic and logic unit (ALU) which is used for performing mathematical and logical type operations.
  • the fourth functional unit 64 in the present example is a data dependency collapsing ALU of the kind described in the above-referenced co-pending application Serial No.: 07/504,910 (IBM Docket EN9-90-014). This dependency collapsing ALU 64 is a three-input ALU capable of performing two arithmetical/logical operations in a single machine cycle.
  • the computer system embodiment of Fig. 10 also includes a set of general purpose registers 65 for use in executing some of the machine-level instructions.
  • these general purpose registers 65 are used for temporarily storing data operands and address operands or are used as counters or for other data processing purposes.
  • sixteen (16) such general purpose registers are provided.
  • general purpose registers 65 are assumed to be of the multiport type wherein two or more registers may be accessed at the same time.
  • the computer system of Fig. 10 further includes a high-speed data cache storage mechanism 66 for storing data operands obtained from the higher-level storage unit 10. Data in the cache 66 may also be transferred back to the main memory 10. Data cache 66 may be of a known type and its operation relative to the main memory 10 may be conducted in a known manner.
  • Fig. 11 shows an example of a compounded or tagged instruction sequence which may be processed by the computer system of Fig. 10.
  • the Fig. 11 example is composed of the following instructions in the following sequence: Load, Add, Compare, Branch on Condition and Store. These are identified as instructions 11-15, respectively.
  • the tag bits for these instructions are 1,1,0,1, and 0, respectively. Because of the organization of the machine shown in Fig. 10, the Load instruction is processed in a singular manner by itself.
  • the Add and Compare instructions are treated as a compound instruction and are processed in parallel with one another.
  • the Branch and Store instructions are also treated as a compound instruction and are also processed in parallel with one another.
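
How a fetch/issue stage might turn a run of tag bits into issue groups can be sketched as follows. The function name and the use of tuples for groups are assumptions made for illustration; the pairing rule itself (a one followed by a zero forms a pair, otherwise the instruction issues alone) is taken from the text.

```python
def group_by_tags(instructions, tags):
    """Split a tagged instruction sequence into singles and compound pairs."""
    groups = []
    i = 0
    while i < len(instructions):
        # A tag of one followed by a tag of zero marks a compound pair.
        if tags[i] == 1 and i + 1 < len(instructions) and tags[i + 1] == 0:
            groups.append((instructions[i], instructions[i + 1]))
            i += 2
        else:
            groups.append((instructions[i],))
            i += 1
    return groups


fig11 = ["Load", "Add", "Compare", "Branch on Condition", "Store"]
print(group_by_tags(fig11, [1, 1, 0, 1, 0]))
```

For the Fig. 11 sequence and tag bits 1,1,0,1,0, this produces a single Load followed by the Add/Compare pair and the Branch/Store pair.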
  • the table of Fig. 12 gives further information on each of these Fig. 11 instructions.
  • the R/M column in Fig. 12 indicates the content of a first field in each instruction which is typically used to identify the particular one of general purpose registers 65 which contains the first operand. An exception is the case of the Branch on Condition instruction, wherein the R/M field contains a condition code mask.
  • the R/X column in Fig. 12 indicates the content of a second field in each instruction, which field is typically used to identify a second one of the general purpose registers 65. Such register may contain the second operand or may contain an address index value (X).
  • the B column in Fig. 12 indicates the content of a third possible field in each instruction, which field may identify a particular one of the general purpose registers 65 which contains a base address value.
  • a zero in the B column indicates the absence of a B field or the absence of a corresponding address component in the B field.
  • the D field of Fig. 12 indicates the content of a further field in each instruction which, when used for address generation purposes, includes an address displacement value.
  • a zero in the D column may also indicate the absence of a corresponding field in the particular instruction being considered or, alternatively, an address displacement value of zero.
  • the fetch/issue control unit 60 determines from the tag bits for this Load instruction and the following Add instruction that the Load instruction is to be processed in a singular manner by itself.
  • the action to be performed by this Load instruction is to fetch an operand from storage, in this case the data cache 66, and to place such operand into the R2 general purpose register.
  • the storage address from which this operand is to be fetched is determined by adding together the index value in register X, the base value in register B and the displacement value D.
  • the fetch/issue control unit 60 assigns this address generation operation to the address generation ALU 62.
  • ALU 62 adds together the address index value in register X (a value of zero in the present example), the base address value contained in general purpose register R7 and the displacement address value (a value of zero in the present example) contained in the instruction itself.
  • the resulting calculated storage address appearing at the output of ALU 62 is supplied to the address input of data cache 66 to access the desired operand. This accessed operand is loaded into the R2 general purpose register in register set 65.
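
The three-input address calculation performed for the Load instruction can be illustrated with a short Python sketch. Two things here are assumptions: the data cache is modelled as a plain dictionary keyed by address, and a register number of zero in the X or B field is treated as contributing a zero address component, in line with the remark that a zero in the B column indicates the absence of that component.

```python
def generate_address(regs, x, b, d):
    """Model of address generation ALU 62: index + base + displacement."""
    index = regs[x] if x else 0   # assumption: X field of 0 means no index
    base = regs[b] if b else 0    # assumption: B field of 0 means no base
    return index + base + d


def load(regs, data_cache, r_target, x, b, d):
    """Fetch the operand at the generated address into register r_target."""
    regs[r_target] = data_cache[generate_address(regs, x, b, d)]


regs = [0] * 16                  # sixteen general purpose registers
regs[7] = 0x1000                 # hypothetical base address held in R7
data_cache = {0x1000: 42}        # hypothetical cache contents
load(regs, data_cache, r_target=2, x=0, b=7, d=0)
```

After the call, `regs[2]` holds the operand found at address 0 + R7 + 0, matching the Load of the example.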
  • the control unit 60 examines the compounding tags for these two instructions and notes that they may be executed in parallel.
  • the Compare instruction has an apparent data dependency on the Add instruction since the Add must be completed before R3 can be compared. This dependency, however, can be handled by the data dependency collapsing ALU 64. Consequently, these two instructions can be processed in parallel in the Fig. 10 configuration.
  • the control unit 60 assigns the processing of the Add instruction to ALU 63 and assigns the processing of the Compare instruction to the dependency collapsing ALU 64.
  • ALU 63 adds the contents of the R2 general purpose register to the contents of the R3 general purpose register and places the result of the addition back into the R3 general purpose register.
  • the dependency collapsing ALU 64 performs the following mathematical operation: R3 + R2 - R4
  • condition code for the result of this operation is sent to a condition code register located in branch unit 61.
  • the data dependency is collapsed because ALU 64, in effect, calculates the sum of R3 + R2 and then compares this sum with R4 to determine the condition code. In this manner, ALU 64 does not have to wait on the results from the ALU 63 which is performing the Add instruction.
  • the numerical result calculated by ALU 64 and appearing at its output is not supplied back to the general purpose registers 65. In this case, ALU 64 merely sets the condition code.
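
The effect of collapsing the Add/Compare dependency can be checked with a small Python model that runs the pair both serially and as a compound. This is a sketch only: register values are unbounded Python integers (overflow is ignored), and the string condition codes returned by `classify` are a hypothetical encoding chosen for readability.

```python
def classify(diff):
    """Hypothetical condition code encoding for a comparison result."""
    if diff == 0:
        return "equal"
    return "low" if diff < 0 else "high"


def execute_serial(regs):
    """Add R3,R2 completes first; Compare R3,R4 then uses the new R3."""
    regs = list(regs)
    regs[3] = regs[3] + regs[2]
    return regs, classify(regs[3] - regs[4])


def execute_compound(regs):
    """Both units read the original register values in the same cycle."""
    old = list(regs)
    new = list(regs)
    new[3] = old[3] + old[2]                  # ALU 63 performs the Add
    cc = classify(old[3] + old[2] - old[4])   # ALU 64 computes R3 + R2 - R4
    return new, cc


regs = [0] * 16
regs[2], regs[3], regs[4] = 5, 7, 12
assert execute_serial(regs) == execute_compound(regs)
```

Both paths leave the same value in R3 and produce the same condition code, which is why the pair can be issued together even though the Compare nominally depends on the Add.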
  • Control unit 60 determines from the tag bits for these instructions that they may be processed in parallel with one another. It further determines from the op codes of the two instructions that the Branch instruction should be processed by the branch unit 61 and the Store instruction should be processed by the address generation ALU 62. In accordance with this determination, the mask field M and the displacement field D of the Branch instruction are supplied to the branch unit 61. Likewise, the address index value in register X and the address base value in register B for this Branch instruction are obtained from the general purpose registers 65 and supplied to the branch unit 61. In the present example, the X value is zero and the base value is obtained from the R7 general purpose register. The displacement value D has a hexadecimal value of twenty, while the mask field M has a mask position value of eight.
  • the branch unit 61 commences to calculate the potential branch address (0 + R7 + 20) and at the same time compares the condition code obtained from the previous Compare instruction with the condition code mask M. If the condition code value is the same as the mask code value, the necessary branch condition is met and the branch address calculated by the branch unit 61 is thereupon loaded into an instruction counter in control unit 60. This instruction counter controls the fetching of the instructions from the compound instruction cache 12. If, on the other hand, the condition is not met (that is, the condition code set by the previous instruction does not have a value of eight), then no branch is taken and no branch address is supplied to the instruction counter in control unit 60.
  • the address generation ALU 62 is busy doing the address calculation (0 + R7 + 0) for the Store instruction.
  • the address calculation by ALU 62 is supplied to the data cache 66. If no branch is taken by the branch unit 61, then the Store instruction operates to store the operand in the R3 general purpose register into the data cache 66 at the address calculated by ALU 62. If, on the other hand, the branch condition is met and the branch is taken, then the contents of the R3 general purpose register is not stored into the data cache 66.
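
The interaction between the Branch and Store instructions of the compound pair can be sketched as below. The function name, the dictionary model of the data cache and the `None` return for a branch that is not taken are assumptions for illustration. The branch test is written as a simple equality between the condition code and the mask, following the wording of the description above; an actual machine may interpret the mask field differently.

```python
def branch_and_store(regs, data_cache, condition_code, mask=8):
    """Model the compound Branch on Condition / Store pair of the example.

    Returns the branch address if the branch is taken, otherwise None.
    """
    branch_address = 0 + regs[7] + 0x20   # branch unit 61: X + B + D
    store_address = 0 + regs[7] + 0       # ALU 62 works at the same time
    if condition_code == mask:
        # Branch taken: the Store is suppressed.
        return branch_address
    # Branch not taken: the contents of R3 are stored into the data cache.
    data_cache[store_address] = regs[3]
    return None
```

Both address calculations are performed unconditionally, and only the commit of the Store depends on the branch outcome.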
  • The foregoing instruction sequence of Fig. 11 is intended as an example only.
  • the computer system embodiment of Fig. 10 is equally capable of processing various and sundry other instruction sequences.
  • Each pairing of an instruction having a tag bit value of one with a succeeding instruction having a tag bit value of zero forms a compound instruction for parallel execution purposes, that is, the instructions in such a pair may be processed in parallel with one another.
  • where the tag bits for two succeeding instructions each have a value of one, the first of these instructions is executed by itself in a nonparallel manner.
  • all of the instructions in the sequence would have a tag bit value of one.
  • all of the instructions would be executed one at a time in a nonparallel manner.
  • each pair of adjacent instructions is analyzed to determine whether the pair can be executed in parallel.
  • memory compounding offers the possibility of examining many compoundings over more than two instructions and choosing the best grouping available.
  • the instruction compounding unit has been illustrated particularly as being positioned between the I/O adaptor and the memory bus. This example is not meant to exclude any other locations in memory where the instruction compounding unit can operate. For example, it can be absorbed into the I/O adaptor, it can operate as a separate unit on the memory bus 9 (at which location it could compound either in the main memory 10 or in the compound instruction cache 12), or it can comprise a unit attached only to the main memory through a private memory port not accessible over the memory bus 9.
  • the compounder can also function between the main memory and instruction cache, as taught in the co-pending application entitled "Compounding Preprocessor for Cache".

EP91104324A 1990-06-26 1991-03-20 Préprocesseur disposé en mémoire pour un processeur ayant un jeu échelonable d'instructions combinées Withdrawn EP0463296A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US543464 1990-06-06
US54346490A 1990-06-26 1990-06-26

Publications (2)

Publication Number Publication Date
EP0463296A2 true EP0463296A2 (fr) 1992-01-02
EP0463296A3 EP0463296A3 (fr) 1994-03-23

Family

ID=24168174

Family Applications (1)

Application Number Title Priority Date Filing Date
EP91104324A Withdrawn EP0463296A2 (fr) 1990-06-26 1991-03-20 Préprocesseur disposé en mémoire pour un processeur ayant un jeu échelonable d'instructions combinées

Country Status (10)

Country Link
US (2) US5355460A (fr)
EP (1) EP0463296A2 (fr)
JP (1) JPH0778738B2 (fr)
BR (1) BR9102128A (fr)
CA (1) CA2038264C (fr)
CZ (1) CZ280269B6 (fr)
HU (1) HUT57920A (fr)
PL (1) PL165585B1 (fr)
RU (1) RU2109333C1 (fr)
SK (1) SK93491A3 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2274527A (en) * 1993-01-20 1994-07-27 Hitachi Ltd Microprocessor
WO1997025669A1 (fr) * 1996-01-04 1997-07-17 Advanced Micro Devices, Inc. Procede et appareil permettant de traduire un premier ensemble d'instructions en un second ensemble d'instructions
EP1023660A1 (fr) * 1997-10-13 2000-08-02 Idea Corporation Processeur faisant appel a un codage d'instructions par champ de gabarit
US6360313B1 (en) 1993-11-05 2002-03-19 Intergraph Corporation Instruction cache associative crossbar switch

Families Citing this family (82)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69129569T2 (de) * 1990-09-05 1999-02-04 Philips Electronics Nv Maschine mit sehr langem Befehlswort für leistungsfähige Durchführung von Programmen mit bedingten Verzweigungen
JP2642529B2 (ja) * 1991-04-30 1997-08-20 株式会社東芝 並列プロセッサーの命令分配処理装置
US5539911A (en) * 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
US5493687A (en) 1991-07-08 1996-02-20 Seiko Epson Corporation RISC microprocessor architecture implementing multiple typed register sets
US5961629A (en) * 1991-07-08 1999-10-05 Seiko Epson Corporation High performance, superscalar-based computer system with out-of-order instruction execution
GB2263565B (en) * 1992-01-23 1995-08-30 Intel Corp Microprocessor with apparatus for parallel execution of instructions
EP0650612B1 (fr) * 1992-03-25 1999-08-18 Zilog Incorporated Decodage d'instructions rapide dans un processeur pipeline
US5438668A (en) 1992-03-31 1995-08-01 Seiko Epson Corporation System and method for extraction, alignment and decoding of CISC instructions into a nano-instruction bucket for execution by a RISC computer
WO1993020505A2 (fr) 1992-03-31 1993-10-14 Seiko Epson Corporation Planification d'instructions pour ordinateurs superscalaires a jeu d'instructions reduit
KR950701437A (ko) 1992-05-01 1995-03-23 요시오 야마자끼 슈퍼스칼라 마이크로프로세서에서의 명령어 회수를 위한 시스템 및 방법
US5416913A (en) * 1992-07-27 1995-05-16 Intel Corporation Method and apparatus for dependency checking in a multi-pipelined microprocessor
US5590348A (en) * 1992-07-28 1996-12-31 International Business Machines Corporation Status predictor for combined shifter-rotate/merge unit
DE59301610D1 (de) * 1992-09-22 1996-03-21 Siemens Ag Verfahren zur bearbeitung eines anwenderprogramms auf einem parallelrechnersystem
US6735685B1 (en) 1992-09-29 2004-05-11 Seiko Epson Corporation System and method for handling load and/or store operations in a superscalar microprocessor
DE69329778T2 (de) 1992-09-29 2001-04-26 Seiko Epson Corp System und verfahren zur handhabung von laden und/oder speichern in einem superskalar mikroprozessor
DE69330889T2 (de) 1992-12-31 2002-03-28 Seiko Epson Corp System und Verfahren zur Änderung der Namen von Registern
US5628021A (en) 1992-12-31 1997-05-06 Seiko Epson Corporation System and method for assigning tags to control instruction processing in a superscalar processor
US6154828A (en) * 1993-06-03 2000-11-28 Compaq Computer Corporation Method and apparatus for employing a cycle bit parallel executing instructions
US5504914A (en) * 1993-06-23 1996-04-02 National Science Council Multi-level instruction boosting method using plurality of ordinary registers forming plurality of conjugate register pairs that are shadow registers to each other with different only in MSB
DE69424370T2 (de) * 1993-11-05 2001-02-15 Intergraph Corp Befehlscachespeicher mit Kreuzschienenschalter
US5509129A (en) * 1993-11-30 1996-04-16 Guttag; Karl M. Long instruction word controlling plural independent processor operations
US5598546A (en) * 1994-08-31 1997-01-28 Exponential Technology, Inc. Dual-architecture super-scalar pipeline
US5819059A (en) * 1995-04-12 1998-10-06 Advanced Micro Devices, Inc. Predecode unit adapted for variable byte-length instruction set processors and method of operating the same
US5991869A (en) * 1995-04-12 1999-11-23 Advanced Micro Devices, Inc. Superscalar microprocessor including a high speed instruction alignment unit
US5931941A (en) * 1995-04-28 1999-08-03 Lsi Logic Corporation Interface for a modularized computational unit to a CPU
US5848288A (en) * 1995-09-20 1998-12-08 Intel Corporation Method and apparatus for accommodating different issue width implementations of VLIW architectures
US5872947A (en) * 1995-10-24 1999-02-16 Advanced Micro Devices, Inc. Instruction classification circuit configured to classify instructions into a plurality of instruction types prior to decoding said instructions
US5958042A (en) * 1996-06-11 1999-09-28 Sun Microsystems, Inc. Grouping logic circuit in a pipelined superscalar processor
US5924128A (en) * 1996-06-20 1999-07-13 International Business Machines Corporation Pseudo zero cycle address generator and fast memory access
US5845099A (en) * 1996-06-28 1998-12-01 Intel Corporation Length detecting unit for parallel processing of variable sequential instructions
US6049863A (en) * 1996-07-24 2000-04-11 Advanced Micro Devices, Inc. Predecoding technique for indicating locations of opcode bytes in variable byte-length instructions within a superscalar microprocessor
US5867680A (en) * 1996-07-24 1999-02-02 Advanced Micro Devices, Inc. Microprocessor configured to simultaneously dispatch microcode and directly-decoded instructions
US5724422A (en) * 1996-08-05 1998-03-03 Industrial Technology Research Institute Encrypting and decrypting instruction boundaries of instructions in a superscalar data processing system
US5941980A (en) * 1996-08-05 1999-08-24 Industrial Technology Research Institute Apparatus and method for parallel decoding of variable-length instructions in a superscalar pipelined data processing system
US5852727A (en) * 1997-03-10 1998-12-22 Advanced Micro Devices, Inc. Instruction scanning unit for locating instructions via parallel scanning of start and end byte information
US5887161A (en) * 1997-03-31 1999-03-23 International Business Machines Corporation Issuing instructions in a processor supporting out-of-order execution
US5875336A (en) * 1997-03-31 1999-02-23 International Business Machines Corporation Method and system for translating a non-native bytecode to a set of codes native to a processor within a computer system
US5913048A (en) * 1997-03-31 1999-06-15 International Business Machines Corporation Dispatching instructions in a processor supporting out-of-order execution
US6098167A (en) * 1997-03-31 2000-08-01 International Business Machines Corporation Apparatus and method for fast unified interrupt recovery and branch recovery in processors supporting out-of-order execution
US5898885A (en) * 1997-03-31 1999-04-27 International Business Machines Corporation Method and system for executing a non-native stack-based instruction within a computer system
US5870582A (en) * 1997-03-31 1999-02-09 International Business Machines Corporation Method and apparatus for completion of non-interruptible instructions before the instruction is dispatched
US5898850A (en) * 1997-03-31 1999-04-27 International Business Machines Corporation Method and system for executing a non-native mode-sensitive instruction within a computer system
US5805849A (en) * 1997-03-31 1998-09-08 International Business Machines Corporation Data processing system and method for using an unique identifier to maintain an age relationship between executing instructions
US5940602A (en) * 1997-06-11 1999-08-17 Advanced Micro Devices, Inc. Method and apparatus for predecoding variable byte length instructions for scanning of a number of RISC operations
US6134649A (en) * 1997-11-17 2000-10-17 Advanced Micro Devices, Inc. Control transfer indication in predecode which identifies control transfer instruction and an alternate feature of an instruction
US6167506A (en) * 1997-11-17 2000-12-26 Advanced Micro Devices, Inc. Replacing displacement in control transfer instruction with encoding indicative of target address, including offset and target cache line location
US6118940A (en) * 1997-11-25 2000-09-12 International Business Machines Corp. Method and apparatus for benchmarking byte code sequences
US6314493B1 (en) 1998-02-03 2001-11-06 International Business Machines Corporation Branch history cache
US6061786A (en) * 1998-04-23 2000-05-09 Advanced Micro Devices, Inc. Processor configured to select a next fetch address by partially decoding a byte of a control transfer instruction
US6175908B1 (en) 1998-04-30 2001-01-16 Advanced Micro Devices, Inc. Variable byte-length instructions using state of function bit of second byte of plurality of instructions bytes as indicative of whether first byte is a prefix byte
US6141745A (en) * 1998-04-30 2000-10-31 Advanced Micro Devices, Inc. Functional bit identifying a prefix byte via a particular state regardless of type of instruction
US6230260B1 (en) 1998-09-01 2001-05-08 International Business Machines Corporation Circuit arrangement and method of speculative instruction execution utilizing instruction history caching
US6658552B1 (en) * 1998-10-23 2003-12-02 Micron Technology, Inc. Processing system with separate general purpose execution unit and data string manipulation unit
US6332215B1 (en) 1998-12-08 2001-12-18 Nazomi Communications, Inc. Java virtual machine hardware for RISC and CISC processors
US7225436B1 (en) 1998-12-08 2007-05-29 Nazomi Communications Inc. Java hardware accelerator using microcode engine
US6826749B2 (en) 1998-12-08 2004-11-30 Nazomi Communications, Inc. Java hardware accelerator using thread manager
JP2000284970A (ja) * 1999-03-29 2000-10-13 Matsushita Electric Ind Co Ltd Program conversion device and processor
GB2352066B (en) * 1999-07-14 2003-11-05 Element 14 Ltd An instruction set for a computer
US6711670B1 (en) * 1999-10-14 2004-03-23 Hewlett-Packard Development Company, L.P. System and method for detecting data hazards within an instruction group of a compiled computer program
US6438664B1 (en) 1999-10-27 2002-08-20 Advanced Micro Devices, Inc. Microcode patch device and method for patching microcode using match registers and patch routines
EP1102165A1 (fr) * 1999-11-15 2001-05-23 Texas Instruments Incorporated Microprocessor with execution packet spanning two or more fetch packets
US7039790B1 (en) 1999-11-15 2006-05-02 Texas Instruments Incorporated Very long instruction word microprocessor with execution packet spanning two or more fetch packets with pre-dispatch instruction selection from two latches according to instruction bit
US6618801B1 (en) * 2000-02-02 2003-09-09 Hewlett-Packard Development Company, L.P. Method and apparatus for implementing two architectures in a chip using bundles that contain microinstructions and template information
DE10043003A1 (de) * 2000-09-01 2002-03-14 Infineon Technologies Ag Program-controlled unit
KR20020028814A (ko) 2000-10-10 2002-04-17 Nazomi Communications, Inc. Java hardware accelerator using microcode engine
US7149878B1 (en) 2000-10-30 2006-12-12 Mips Technologies, Inc. Changing instruction set architecture mode by comparison of current instruction execution address with boundary address register values
US7711926B2 (en) * 2001-04-18 2010-05-04 Mips Technologies, Inc. Mapping system and method for instruction set processing
US6826681B2 (en) * 2001-06-18 2004-11-30 Mips Technologies, Inc. Instruction specified register value saving in allocated caller stack or not yet allocated callee stack
US7107439B2 (en) * 2001-08-10 2006-09-12 Mips Technologies, Inc. System and method of controlling software decompression through exceptions
US8769508B2 (en) 2001-08-24 2014-07-01 Nazomi Communications Inc. Virtual machine hardware for RISC and CISC processors
JP3564445B2 (ja) * 2001-09-20 2004-09-08 松下電器産業株式会社 Processor, compiling device, and compiling method
US7395408B2 (en) * 2002-10-16 2008-07-01 Matsushita Electric Industrial Co., Ltd. Parallel execution processor and instruction assigning making use of group number in processing elements
US7917734B2 (en) 2003-06-30 2011-03-29 Intel Corporation Determining length of instruction with multiple byte escape code based on information from other than opcode byte
US7269715B2 (en) * 2005-02-03 2007-09-11 International Business Machines Corporation Instruction grouping history on fetch-side dispatch group formation
US7475223B2 (en) * 2005-02-03 2009-01-06 International Business Machines Corporation Fetch-side instruction dispatch group formation
US7664765B2 (en) * 2005-07-12 2010-02-16 Cipherflux, Llc Method for accelerating the computational speed of a computer algorithm
US7562206B2 (en) * 2005-12-30 2009-07-14 Intel Corporation Multilevel scheme for dynamically and statically predicting instruction resource utilization to generate execution cluster partitions
JP2007272353A (ja) * 2006-03-30 2007-10-18 Nec Electronics Corp Processor device and compound condition processing method
GR1006531B (el) 2008-08-04 2009-09-10 Configuration of a multiple-choice form readable by electronic means, and system and method for interpreting at least one user selection
US9354888B2 (en) * 2012-03-28 2016-05-31 International Business Machines Corporation Performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching
US9348596B2 (en) * 2013-06-28 2016-05-24 International Business Machines Corporation Forming instruction groups based on decode time instruction optimization
RU2620731C1 (ru) * 2016-07-20 2017-05-29 федеральное государственное казенное военное образовательное учреждение высшего образования "Военная академия связи имени Маршала Советского Союза С.М. Буденного" Method of joint arithmetic and error-correcting coding and decoding

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3401376A (en) * 1965-11-26 1968-09-10 Burroughs Corp Central processor
US4025771A (en) * 1974-03-25 1977-05-24 Hughes Aircraft Company Pipe line high speed signal processor
JPS53108254A (en) * 1977-03-02 1978-09-20 Nec Corp Information processor
US4295193A (en) * 1979-06-29 1981-10-13 International Business Machines Corporation Machine for multiple instruction execution
US4439828A (en) * 1981-07-27 1984-03-27 International Business Machines Corp. Instruction substitution mechanism in an instruction handling unit of a data processing system
US4594655A (en) * 1983-03-14 1986-06-10 International Business Machines Corporation (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions
US4847755A (en) * 1985-10-31 1989-07-11 Mcc Development, Ltd. Parallel processing method and apparatus for increasing processing throughout by parallel processing low level instructions having natural concurrencies
US5021945A (en) * 1985-10-31 1991-06-04 Mcc Development, Ltd. Parallel processor system for processing natural concurrencies and method therefor
JPH0769824B2 (ja) * 1988-11-11 1995-07-31 株式会社日立製作所 Multiple-instruction simultaneous processing system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
17TH ANNUAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 28 May 1990, SEATTLE, US, pages 216-226, HORST ET AL. 'Multiple instruction issue in the NonStop Cyclone Processor' *
IEEE TRANSACTIONS ON COMPUTERS, vol. C-32, no. 5, May 1983, pages 425-438, REQUA AND MCGRAW 'The piecewise dataflow architecture' *
PROCEEDINGS SUPERCOMPUTING 88, 14 November 1988, ORLANDO, US, pages 88-95, WANG AND WU 'I-NET mechanism for issuing multiple instructions' *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2274527A (en) * 1993-01-20 1994-07-27 Hitachi Ltd Microprocessor
GB2274527B (en) * 1993-01-20 1997-07-23 Hitachi Ltd Microprocessor
US6360313B1 (en) 1993-11-05 2002-03-19 Intergraph Corporation Instruction cache associative crossbar switch
WO1997025669A1 (fr) * 1996-01-04 1997-07-17 Advanced Micro Devices, Inc. Method and apparatus for translating a first instruction set into a second instruction set
US5826089A (en) * 1996-01-04 1998-10-20 Advanced Micro Devices, Inc. Instruction translation unit configured to translate from a first instruction set to a second instruction set
EP1023660A1 (fr) * 1997-10-13 2000-08-02 Idea Corporation Processor using template field instruction encoding
EP1023660A4 (fr) * 1997-10-13 2001-08-01 Idea Corp Processor using template field instruction encoding

Also Published As

Publication number Publication date
CZ280269B6 (cs) 1995-12-13
RU2109333C1 (ru) 1998-04-20
US5355460A (en) 1994-10-11
EP0463296A3 (fr) 1994-03-23
CZ93491A3 (en) 1995-07-12
CA2038264C (fr) 1995-06-27
HU911101D0 (en) 1991-10-28
SK93491A3 (en) 1995-09-13
PL289724A1 (en) 1992-04-21
JPH04232532A (ja) 1992-08-20
PL165585B1 (pl) 1995-01-31
JPH0778738B2 (ja) 1995-08-23
US5459844A (en) 1995-10-17
HUT57920A (en) 1991-12-30
BR9102128A (pt) 1991-12-24

Similar Documents

Publication Publication Date Title
US5355460A (en) In-memory preprocessor for compounding a sequence of instructions for parallel computer system execution
US5475853A (en) Cache store of instruction pairs with tags to indicate parallel execution
US5295249A (en) Compounding preprocessor for cache for identifying multiple instructions which may be executed in parallel
US5659722A (en) Multiple condition code branching system in a multi-processor environment
Sakai et al. An architecture of a dataflow single chip processor
US5881308A (en) Computer organization for multiple and out-of-order execution of condition code testing and setting instructions out-of-order
US6542985B1 (en) Event counter
US5418973A (en) Digital computer system with cache controller coordinating both vector and scalar operations
US5197135A (en) Memory management for scalable compound instruction set machines with in-memory compounding
US5983336A (en) Method and apparatus for packing and unpacking wide instruction word using pointers and masks to shift word syllables to designated execution units groups
US5398321A (en) Microcode generation for a scalable compound instruction set machine
Benitez et al. Code generation for streaming: An access/execute mechanism
EP0518420A2 (fr) Computer system for concurrent out-of-order processing of multiple instructions
KR20040005927A (ko) Source register locking in a data processing apparatus
JP2626675B2 (ja) Data-induced state signal generating apparatus and method
EP0825529A2 (fr) System for preparing instructions for a parallel instruction processor and system with a mechanism for branching into the middle of a compound instruction
DE69031232T2 (de) Method and apparatus for preprocessing multiple instructions in a pipeline processor
Richardson The Fred VHDL Model
CA2040637C (fr) Compounding preprocessor for cache
Craig et al. PIPE: A High Performance VLSI Processor Implementation
WO1998006039A1 (fr) Disambiguation memory circuit and method of operation
Mori et al. Implementation and Evaluation of Oki 32-bit Microprocessor 032
JPH0778737B2 (ja) Compounding preprocessor system for cache
EP0186668A1 (fr) Three-word instruction pipeline

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT CH DE DK ES FR GB IT LI NL SE

17P Request for examination filed

Effective date: 19911219

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT CH DE DK ES FR GB IT LI NL SE

17Q First examination report despatched

Effective date: 19961206

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19970417