GB2586258A - Efficient processor machine instruction handling - Google Patents

Publication number
GB2586258A
Authority
GB
United Kingdom
Prior art keywords
instruction
operations
opcode
instructions
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB1911667.2A
Other versions
GB201911667D0 (en
Inventor
Kirkshaw Watts Christopher
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
1inspiries Tech Ltd
Original Assignee
1inspiries Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 1inspiries Tech Ltd
Priority to GB1911667.2A
Publication of GB201911667D0
Publication of GB2586258A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30156 Special purpose encoding of instructions, e.g. Gray coding

Abstract

A computer system comprising a processing unit configured to process the machine instructions comprised in an instruction set architecture of the processing unit. Each machine instruction comprises at least one computer operation, each computer operation having an operation frequency, wherein each operation is classified, by its operation frequency, into one of a plurality of subsets of operations, subsets of operations corresponding to ranges of operation frequencies. One or more opcode fields are present in each machine instruction, each opcode field corresponding to one of the plurality of subsets of operations and being configured to identify either an operation in the corresponding subset of operations or a reserved value.

Description

EFFICIENT PROCESSOR MACHINE INSTRUCTION HANDLING
[0001] The subject matter disclosed herein relates to electronic processors for processing computer instructions, in particular to electronic processors for high-performance processing of instructions in shorter time periods and with enhanced energy efficiency, as well as to a method for processing such instruction structures, and to an improved instruction set architecture.
BACKGROUND OF THE INVENTION
[0002] Computing hardware comprises one or more processing units or processors, including, but not limited to, a central processing unit (CPU), processing units serving to control the operations of the computer, to perform programs and to communicate with other functional units of the computer, such as logic units, registers, memories, caches, input and output devices. Computer programs installed and stored on the processor, or on other functional units, comprise commands, or sets of instructions, which perform specified tasks when executed on the computer.
[0003] The subject-matter disclosed herein relates, in particular, to those instructions which interface between the software and the hardware of the processing system, i.e. the machine instructions, which implement the set of instructions in the various software installed on the computer, and which are permanently stored, i.e. hard-wired, as permanent electronic circuitry in the processor silicon, to this extent determining the structure or architecture of the processor and controlling its operation either directly or by hard-wired addressing of a stored microprogram. The set of machine instructions defines the processor's Instruction Set Architecture (ISA), forms the fundamental operating instructions of the processor, and initiates all other instructions to be executed by the computer.
[0004] "Machine instructions" will be understood to refer to those instructions defined by a pattern of bit values ("0" or "1"), producing a physical effect on the processing device which alone handles the machine instructions, as opposed to micro-instructions, if they exist, which are initiated by machine instructions, where one machine instruction usually comprises multiple micro-instructions. Unless otherwise stated, all references to instructions hereinafter are to machine instructions.
[0005] As will be explained in greater detail in later sections herewith, the processor circuitry is designed to process one or more instruction sets comprising machine instructions typically comprising one or more operations, each operation having a plurality of fields, where a field is a group of one or more bits which together serve a common purpose. The instruction typically includes a number of distinct bit fields: a first bit field (but not necessarily in the first position) representing the operation code, or "opcode", which once decoded identifies the computer operation, or simply the operation or action of the instruction, e.g. ADD, MULTIPLY, etc., and one or more bit fields for the operands, which contain or reference data on which the operation is to be performed, as well as unused bit fields. Additionally, there may be bits or bit fields which directly control parts of the processor's circuitry. The ISA is crucial to the overall working of the computer, and determines task duration, processor speed and processor efficiency. Processing time is the time required to process an operation.
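By way of illustration only, the division of a conventional fixed-length instruction into an opcode field, operand fields and unused bits can be sketched in Python. The 16-bit layout, the field positions and the opcode values below are hypothetical assumptions chosen for this sketch; they are not taken from this disclosure or from any particular ISA.

```python
# Hypothetical 16-bit layout (an assumption for illustration only):
#   bits 15..12 opcode | bits 11..8 operand A | bits 7..4 operand B | bits 3..0 unused
OPCODE_NAMES = {0b0001: "ADD", 0b0010: "MULTIPLY"}  # illustrative values only


def extract_field(word: int, high_bit: int, low_bit: int) -> int:
    """Return bits high_bit..low_bit (inclusive) of word as an integer."""
    width = high_bit - low_bit + 1
    return (word >> low_bit) & ((1 << width) - 1)


def decode(word: int) -> dict:
    """Split a 16-bit instruction word into its (hypothetical) bit fields."""
    opcode = extract_field(word, 15, 12)
    return {
        "operation": OPCODE_NAMES.get(opcode, "UNKNOWN"),
        "operand_a": extract_field(word, 11, 8),
        "operand_b": extract_field(word, 7, 4),
        "unused": extract_field(word, 3, 0),
    }


example = decode(0b0001_0011_0101_0000)
```

In this sketch every instruction spends four bits on the opcode regardless of how often the operation occurs, and four further bits are left unused.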
[0006] Some digital computers run as CISCs (Complex Instruction Set Computers) in which individual machine instructions are comprised of one or more operations and each operation is comprised of multiple computer micro-operations, which are built into the architecture. Each complex machine instruction interacts directly with the processor's internal memory, the contents of which are permanent or semi-permanent and control the processor's functionality.
[0007] Various proposals have been put forward to improve computing efficiency, by, for example, reducing execution times or instruction lengths, in order to allow a greater throughput of instructions to occur, with more instructions being executed in a given time period. One additional advantage of shortening instruction lengths is that a smaller amount of memory is required to hold programs awaiting execution. Techniques have been proposed which seek to minimise the length of longest opcodes by the automatic generation of instruction structures, or which pre-compress the instructions to shorter lengths, and decompress the instructions prior to execution. Such schemes have their own drawbacks. In the case of automatic generation of opcodes, this would only be beneficial if all opcodes were equally likely to occur, which is unlikely ever to be the case. With the decompression of compressed opcodes supplementary tasks are required, which necessarily also require time and extra circuitry, and in many cases the time penalties incurred can offset or exceed the time savings sought. Functionality-based solutions, in which a dedicated bit of the opcode may trigger the loading of other instructions into the cache in anticipation of the change in functionality in the program, have also been proposed.
[0008] One widely implemented approach to improving processor efficiency is based on RISC (Reduced Instruction Set Computer) architectures, such as that described in W0009748041. Compared to complex instructions, RISC machine instructions perform simpler functions and so are not broken into smaller tasks each of which is stored separately. RISC architectures had fixed-length instructions, e.g. 32-bit. Subsequently, RISC architectures have taken a subset of commonly used existing instructions and, by means of compiler programs, converted these to simpler instructions of shorter length, e.g. 32-bit instructions become 16-bit, with the processing time for whole instructions usually taking on average one clock cycle to complete. Each shorter instruction contains a single fixed-length opcode and is effectively expanded on the fly to its longer equivalent instruction. Unlike other compression approaches, compression in RISC occurs at the individual instruction level. RISC architecture continues the trend toward longer instructions and has become commonplace.
[0009] After one or more implementations of some ISAs had been produced, opcode values, known as extension codes, were introduced within instructions, the extension code permitting expansion of the instruction set for future implementations of the ISA, by denoting one or more additional opcode fields identifying the supplementary instructions.
[00010] Other moves to enhance efficiency include developments based on longer instructions, but seek to process the larger data volumes, by parallel processing, using multiple synchronised ALUs running simultaneously.
[00011] This disclosure relates to a novel and inventive apparatus and method for enhancing computer processor efficiency and speed when executing computer instructions.
BRIEF DESCRIPTION OF THE INVENTION
[00012] Reference will now be made in detail to present embodiments of the invention, one or more examples of which are illustrated in the accompanying drawings. The detailed description uses numerical and letter designations to refer to features in the drawings. Like or similar designations in the drawings and description have been used to refer to like or similar parts of the invention.
[00013] Unless the context indicates otherwise, the terms "first", "second", "third", "last", as well as "highest", "higher", "lowest" and "lower" may be used interchangeably to distinguish one component from another and shall not be understood to indicate importance, order or position of the components specified. Singular articles such as "a", "an", and "the" shall denote also plural forms unless the context indicates otherwise.
[00014] The present disclosure is directed to systems and methods configured to provide and execute instructions or instruction types. In an example aspect of the invention, a processor may provide an instruction set architecture in which opcodes are maintained in a structure comprising dedicated fields for a subset of operations which reflect a range of frequencies of occurrence. The instructions in instruction sets and their operations are realized in the form of units of "hard-wired" electronic circuitry such as adders, multiplexers, registers, ALUs, sequencers etc. in the processing unit.
[00015] An example aspect of the present disclosure is directed to processors comprising an instruction set architecture comprising or storing instructions and configured to execute said instructions, wherein the instructions are stored in a structure which takes into account the frequency of occurrence of the instruction's operation (frequency of occurrence is explained in later passages herein).
[00016] Operations and operation types do not occur with uniform frequency: some operations, or types thereof, occur with a greater frequency than others. Perforce, frequent operations are responsible for a larger proportion of processing than other operations. Therefore, in accordance with an aspect of the invention, by enhancing the bit processing efficiency in respect of those operations identified as frequent, which account for the major part of the volume of total operation executions, a considerable saving can be achieved in the overall amount of processing, as well as the associated power consumption.
[00017] In accordance with the invention, the length of the more frequently, or most frequently, occurring instructions may be decreased, in order to reduce the processing load on the processor when executing the instruction, thereby improving system performance and decreasing power consumption. The apparatus and method in accordance with an aspect of the invention may discriminate between computer instructions, operations or opcodes according to their respective frequencies, and provide for disparate processing of operations of different frequency. In particular, the high frequency opcodes (HFO) and the least frequently occurring opcodes (LFO) may be identified and designated to be implemented by different processes. As stated below, in accordance with an aspect of the invention, operations may be categorised into subsets according to how frequently they occur or are executed. The frequency of occurrence or execution (see below) is referred to herein as the operation frequency.
[00018] In accordance with an aspect of the apparatus and method of this disclosure, operations may be categorised according to frequency, specifying two or more groups or subsets of operations, comprising at least a group of high frequency opcodes (HFO) and, potentially, also a group of least frequent opcodes (LFO). The structure of the opcodes, according to an aspect of the invention, may comprise one or more bit-fields to indicate a frequency group of the operations concerned, such as fields indicating instructions with HFO status or LFO status.
[00019] As set out in greater detail in later sections of this disclosure, shortening the instruction length of the more, and most, frequently occurring instructions, i.e. using fewer bits in the opcode than in a standard opcode, may reduce the processing required to execute those instructions. The number of bits required for the shortened opcodes will depend on the number of operations determined to be of greatest or greater frequency, and the size of the subsets of such frequencies.
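The dependence of opcode length on subset size can be sketched as follows. The sketch assumes, for illustration only, that each opcode field sets aside exactly one value as the reserved value, so that a field of k bits can identify 2^k - 1 operations, and that an operation identified in a later field costs the combined width of all fields up to and including that field. The field widths used are hypothetical.

```python
def operations_per_field(field_bits: int, reserved_values: int = 1) -> int:
    """Operations identifiable by one opcode field of the given width,
    assuming reserved_values of its bit patterns are set aside (assumption)."""
    return (1 << field_bits) - reserved_values


def cumulative_opcode_bits(field_widths):
    """Opcode length, in bits, for an operation identified in each successive field."""
    lengths, total = [], 0
    for width in field_widths:
        total += width
        lengths.append(total)
    return lengths


widths = [2, 3, 4]  # hypothetical widths, most frequent subset first
capacities = [operations_per_field(w) for w in widths]  # [3, 7, 15]
lengths = cumulative_opcode_bits(widths)                # [2, 5, 9]
```

Under these assumptions the three most frequent operations need only a 2-bit opcode, while operations in the third subset need 9 opcode bits in total.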
[00020] An aspect of the apparatus and method of this disclosure provides an alternative means of identifying operations according to their frequency of occurrence.
[00021] In many architectures, the access scheme is byte-addressing, rather than word-addressing or bit-addressing. In order to keep the overhead of aligning instructions low, such schemes stipulate that the opcodes are byte-aligned, meaning that instructions start or end at byte-addressed locations which are separated by intervals comprising a whole number of bytes. The trend in the last decades has been to increase the alignment offset, with more recent schemes going beyond byte-addressing and operating 32-bit, 64-bit or 128-bit addressing.
[00022] Unlike other schemes, the system and method in accordance with the current invention envisages bit-alignment, rather than byte or multi-byte alignment, meaning that instruction start points do not have to be located at intervals of a whole number of bytes or multi-byte sections. As a bit is the smallest unit of data, the restrictions on starting points for byte-addressed or multi-byte addressed schemes are eliminated: the starting point of a bit-aligned instruction is unrestricted.
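A minimal sketch of reading a field that starts at an arbitrary bit position is given below. The helper name read_bits, the most-significant-bit-first ordering and the example memory contents are assumptions made for this illustration; in the disclosed apparatus the corresponding function would be realised in circuitry rather than software.

```python
def read_bits(memory: bytes, bit_offset: int, width: int) -> int:
    """Read `width` bits starting at `bit_offset`, most significant bit first.

    The field may begin at any bit and may straddle a byte boundary.
    """
    value = 0
    for i in range(bit_offset, bit_offset + width):
        byte = memory[i // 8]            # byte that holds bit i
        bit = (byte >> (7 - i % 8)) & 1  # bit 0 is the leftmost bit of a byte
        value = (value << 1) | bit
    return value


memory = bytes([0b10110100, 0b11000000])  # hypothetical instruction stream
first = read_bits(memory, 0, 3)   # 3-bit field starting at bit 0
second = read_bits(memory, 3, 5)  # 5-bit field starting at bit 3
third = read_bits(memory, 6, 4)   # 4-bit field straddling the byte boundary
```

Here the second field begins at bit 3, immediately after the first, with no padding to a byte boundary.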
[00023] The shortening of the more frequently occurring instructions may be accompanied by the lengthening of less frequently occurring instructions. In such cases, because there may be a processing cost of such lengthening, there may be a trade-off between the benefits and costs, influencing how many HFO instruction groups are included in the ISA, as discussed below.
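The trade-off can be illustrated by comparing the average opcode length under a frequency-based scheme with a single fixed-width opcode. The occurrence shares and opcode lengths below are invented purely for illustration and are not measurements; the helper names are likewise hypothetical.

```python
def fixed_opcode_bits(num_operations: int) -> int:
    """Bits needed by a single fixed-width opcode field covering all operations."""
    return (num_operations - 1).bit_length()


def expected_opcode_bits(subsets) -> float:
    """Average opcode length; subsets is a list of (share of occurrences, opcode bits)."""
    return sum(share * bits for share, bits in subsets)


# Hypothetical profiles for 25 operations split into subsets of 3, 7 and 15
# operations whose opcodes take 2, 5 and 9 bits respectively.
skewed_profile = [(0.60, 2), (0.30, 5), (0.10, 9)]  # frequent operations dominate
flat_profile = [(0.20, 2), (0.30, 5), (0.50, 9)]    # rare operations dominate

baseline_bits = fixed_opcode_bits(3 + 7 + 15)          # 5 bits
skewed_average = expected_opcode_bits(skewed_profile)  # about 3.6 bits
flat_average = expected_opcode_bits(flat_profile)      # about 6.4 bits
```

With the skewed profile the average opcode is shorter than the fixed-width baseline; with the flat profile the lengthened rare opcodes outweigh the saving, which is the cost side of the trade-off.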
[00024] An aspect of the apparatus and method in this disclosure improves system performance by better utilisation of processor and memory resources, irrespective of the length of opcodes: under the apparatus and method of the invention, fewer tasks are performed and tasks are performed in a shorter duration.
[00025] In accordance with an aspect of the apparatus and method in this disclosure, simpler, shorter opcodes for the most frequently occurring operations facilitate better utilisation of processor, bus and connection resources. By fitting more instructions into fixed-sized caches or memories, more instructions can be fetched and executed in any given time, or any given instruction may be executed in a shorter time. In accordance with the invention, bit alignment (rather than byte-alignment), in combination with shorter opcodes, may provide more unused bits for additional addressing and an increased jump/branch addressing range. This scheme may facilitate less gate switching due to the aggregation of the more frequently occurring instructions, leading to less heat generation and reduced power consumption.
[00026] It will be apparent from the foregoing that the technical advantages provided by the invention as indicated above will result in enhanced system performance.
[00027] The description and the drawings set out certain illustrative aspects of the subject-matter of this disclosure. These aspects merely serve to indicate, however, a number of the various ways in which the subject-matter of this disclosure may be implemented. The advantages and novel features of the invention set out in this disclosure will become apparent from the detailed description herein when considered in conjunction with the drawings.
[00028] In an exemplary aspect of the apparatus of this disclosure a computer system is provided which comprises a processing unit comprising an instruction set architecture comprising a set of machine instructions, wherein the processing unit is configured to process the set of machine instructions, each machine instruction comprising at least one computer operation, each computer operation having an operation frequency, wherein each operation is classified, by its operation frequency, into one of a plurality of subsets of operations, each such subset of operations corresponding to a range of operation frequencies, and each machine instruction comprises one or more opcode fields, each opcode field (HFOx) corresponding to one of the plurality of subsets of operations and being configured to identify either an operation in the corresponding subset of operations or a reserved value; and one or more memory subsystems comprising one or more memory devices in electronic communication with the processing unit.
[00029] In another exemplary aspect of the apparatus of this disclosure the number of operations in each subset of operations is determined at least by the number of bits in the corresponding opcode field.
[00030] In another exemplary aspect of the apparatus of this disclosure a first opcode field (HFO0) present in the instruction corresponds to the subset of operations of highest operation frequency in the instruction set.
[00031] In another exemplary aspect of the apparatus of this disclosure the order of logically consecutive opcode fields is the order of decreasing operation frequency, the logical first opcode field (HFO0) corresponding to the subset with the highest operation frequency range, with, if present in the instruction, logically consecutive opcode fields (HFOx) corresponding to subsets of successively lower operation frequency ranges.
[00032] In another exemplary aspect of the apparatus of this disclosure a least frequent opcode field (LFO) corresponds to the subset of operations of lowest frequency of all operations in the instruction set and is the logically last opcode field in the instruction.
[00033] In another exemplary aspect of the apparatus of this disclosure the logically last opcode field (HFON) present in the instruction is, apart from the LFO, if present in the instruction, the opcode field corresponding to the subset of operations with the lowest operation frequencies.
[00034] In another exemplary aspect of the apparatus of this disclosure the number (N+1) of opcode fields present in the instruction, apart from the LFO if present, is determined at least by the number of bits in the opcode fields present in the instruction.
[00035] In another exemplary aspect of the apparatus of this disclosure the plurality of subsets of operations corresponds collectively to all the operations in the instruction set.
[00036] In another exemplary aspect of the apparatus of this disclosure the instructions each have an instruction length, the instruction lengths including lengths other than a multiple of 8 bits.
[00037] In another exemplary aspect of the apparatus of this disclosure at least one of the instructions is a multi-bit instruction and further comprises one or more operand fields.
[00038] In another exemplary aspect of the apparatus of this disclosure at least one of the instructions comprises only opcode fields and operand fields.
[00039] In another exemplary aspect of the apparatus of this disclosure at least one of the instructions contains no unused bits.
[00040] In another exemplary aspect of the apparatus of this disclosure the opcode field (HFOx, LFO) comprises a value which identifies either the operation in the corresponding subset of operations or the reserved value.
[00041] In another exemplary aspect of the apparatus of this disclosure, if the value comprised in the opcode field identifies the operation, the instruction, if processed, is executed.
[00042] In another exemplary aspect of the apparatus of this disclosure, if the value comprised in the opcode field identifies the operation, the opcode field is, apart from the LFO, if present, the logically last opcode field in the instruction.
[00043] In another exemplary aspect of the apparatus of this disclosure, if the value comprised in the opcode field is the reserved value, the value indicates that there is a logically further opcode field.
[00044] In another exemplary aspect of the apparatus of this disclosure, if the value comprised in the opcode field is the reserved value, the value is an extension indication value which indicates that there is an extension field.
[00045] In another exemplary aspect of the apparatus of this disclosure the processing unit is configured to process instructions comprising instruction addressing which is not byte or multi-byte-addressing.
[00046] In another exemplary aspect of the apparatus of this disclosure the processing unit is configured to process instructions comprising instruction addressing which is bit-addressing or address offsets for instruction addressing which is bit-addressing.
[00047] In another exemplary aspect of the apparatus of this disclosure the instruction set architecture is hard-wired in the processor.
[00048] In another exemplary aspect of the apparatus of this disclosure the machine instructions are hard-wired in the processor.
[00049] In another exemplary aspect of the apparatus of this disclosure the operation frequency is a static frequency of occurrence of the operation in the instruction set which comprises the machine instruction comprising the operation.
[00050] In another exemplary aspect of the apparatus of this disclosure the operation frequency is a dynamic frequency of occurrence of the operation, when executed, in the instruction set.
[00051] In another exemplary aspect of the apparatus of this disclosure each operation has a processing time, wherein the processing time is variable and specific to each operation.
[00052] In an exemplary aspect of the method of this disclosure a method is implemented on a computer system, comprising a processing unit and one or more electronic devices in electronic communication with the processing unit, comprising the steps of: processing, by a processing unit comprising an instruction set architecture comprising a set of machine instructions, each machine instruction comprising at least one computer operation, each computer operation having an operation frequency, wherein each operation is classified, by its operation frequency, into one of a plurality of subsets of operations, each such subset of operations corresponding to a range of operation frequencies, and each machine instruction comprises one or more opcode fields, each opcode field (HFOx, LFO) corresponding to one of the plurality of subsets of operations and being configured to identify either an operation in the corresponding subset of operations or a reserved value; and communicating, by the processing unit, with the one or more memory subsystems comprising one or more memory devices.
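The chained opcode fields described in the aspects above, in which each field either identifies an operation in its subset or holds a reserved value indicating a logically further field, can be sketched in software as follows. The field widths, the operation names assigned to each field and the choice of the all-ones pattern as the reserved value are hypothetical assumptions for this sketch; in the disclosed apparatus decoding would be performed by hard-wired circuitry.

```python
from typing import Tuple

# Hypothetical opcode fields, most frequent subset first. Widths, operation
# names and the all-ones reserved value are illustrative assumptions.
FIELDS = [
    (2, ["ADD", "MOVE", "JUMP"]),                                       # HFO0
    (3, ["SUBTRACT", "LOAD", "STORE", "COMPARE", "AND", "OR", "XOR"]),  # HFO1
    (4, ["MULTIPLY", "DIVIDE"]),                                        # LFO
]


def decode_opcode(bits: str) -> Tuple[str, int]:
    """Decode one opcode from a string of '0'/'1' characters.

    Returns the operation name and the number of opcode bits consumed.
    """
    position = 0
    for width, operations in FIELDS:
        value = int(bits[position:position + width], 2)
        position += width
        reserved = (1 << width) - 1
        if value == reserved:
            continue  # reserved value: a logically further opcode field follows
        if value >= len(operations):
            raise ValueError("unassigned opcode value")
        return operations[value], position
    raise ValueError("reserved value in last field: extension not modelled")


frequent = decode_opcode("01")         # identified in the first field
less_frequent = decode_opcode("11010") # first field reserved, second identifies
rare = decode_opcode("111110001")      # both earlier fields reserved
```

In this sketch a frequent operation is identified after two bits, whereas a rare operation is identified only after all three fields (nine bits) have been examined.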
BRIEF DESCRIPTION OF THE DRAWINGS
[00053] Various aspects, implementations, embodiments, objects and advantages of the present invention will be apparent to the reader upon consideration of this detailed description, in combination with the drawings, in which reference signs refer to corresponding components throughout.
[00054] FIG. 1 illustrates an exemplary computer system with a typical arrangement of caches and main areas of storage as well as the relative access speeds for the CPU to access instructions from them;
[00055] FIGs. 2.1 to 2.4 illustrate instruction structures according to the prior art;
[00056] FIGs. 3.1 to 3.14 illustrate example instruction structures and implementations thereof in accordance with various aspects and implementations disclosed herein;
[00057] FIG. 4 illustrates instructions according to the prior art, and exemplary instructions in accordance with various aspects and implementations disclosed herein;
[00058] FIG. 5 illustrates a diagram of example decode memory in accordance with various exemplary aspects and implementations disclosed herein;
[00059] FIG. 6 illustrates a circuit representation of FIG. 5 in accordance with various exemplary aspects and implementations disclosed herein;
[00060] FIG. 7 illustrates a diagram of example decode memory in accordance with various exemplary aspects and implementations disclosed herein;
[00061] FIG. 8 illustrates a circuit representation of FIG. 7 in accordance with various exemplary aspects and implementations disclosed herein;
[00062] FIG. 9 illustrates fetching instructions according to the prior art, and fetching the same instructions in accordance with various aspects and implementations described herein.
[00063] In this description and accompanying drawings, bit numbers are shown in big endian structure though could equally be shown in little endian structure. The drawings illustrate the sequence of various bit fields in various exemplary machine instructions. Opcode fields are shown ordered left to right from the opcode field representing most frequent operations to that representing least frequent but could be in any order. Bit fields are shown with the opcode fields on the left but the bit fields could be in any order. X = any bit value, i.e. binary 0 or 1. In the description and accompanying figures, it is assumed, for illustrative purposes only, that each instruction is read from left to right, wherein the leftmost bit (as depicted) is received by the processing unit first and then the second left bit, and third left bit and so on, until the rightmost bit, the last bit in the instruction, is received. As stated above, terms such as "left", "right", "first" and "last" etc. are used herein to explain the various aspects disclosed herein but do not limit the alignment or structure of the embodiments described herein. Unless the context clearly dictates otherwise, "first", "next", "successive", "last" generally refer to the logical order of instructions or fields or bits as logically interpreted by the processing unit of the computer system and do not refer to any physical location, alignment, structure or actual order of receipt by the processing unit.
DETAILED DESCRIPTION OF THE INVENTION
[00064] Reference is made herein to examples and embodiments of the disclosed apparatus and method, one or more of which are, for the purpose of explanation, illustrated in the drawings. Such examples and embodiments are not limitations of the invention. It will be apparent to the skilled reader that various modifications and variations can be made to the apparatus and method in this disclosure without departing from the scope of the invention, which is defined in the claims. For example, features shown or described in any embodiment can be part of any other embodiment to provide a further embodiment. The present invention covers all modifications and variations which fall within the scope of the claims and their equivalents.
[00065] Computer instructions and their operations do not occur with uniform frequency, some occurring very frequently and others occurring very rarely. Uniform treatment of all instructions and uniform allocation of resources for the storage and processing of instructions is not efficient. By frequency of an operation's occurrence is meant the frequency with which instructions' operations are invoked, called, fetched or executed. Some instruction operations (e.g. ADD, SUBTRACT, MOVE, JUMP etc.) are of high frequency, whereas more complex tasks (e.g. obscure or complex functions, equations etc.) occur less frequently.
[00066] In accordance with an aspect of the invention, operations may be categorised into subsets according to, for a given instruction set in respect of one or more programs, how frequently the operations occur or how frequently they are executed. The frequency of occurrence or execution (see below) is referred to herein as the operation frequency. For an operation deemed to be of high frequency, an identifier may be assigned which serves as a proxy for the underlying operation, uniquely distinguishing the operation from other operations. The unique identifier of the high frequency operation is shorter than the opcode in the corresponding conventional instruction. The determination of the absolute or relative frequency of the operation is discussed in further detail below.
[00067] The static frequency of occurrence of an operation in an instruction in an instruction set is a count of the occurrences of the operation in the code produced from one or more software programs. Note that the count relates to the count of executable instructions, i.e. instructions to be run, in a set of instructions, irrespective of how often the instructions, when run, are actually executed. The frequency of occurrence is, in this sense, a static count of operations in instructions in the instruction set. The counting of operations (or instruction types) to establish the absolute frequency of each operation in the population of operations conducted by a processor provides frequency orders, or frequency rankings, of operations, and benchmark analyses have been performed in which such frequencies have been measured. Benchmark programs may be written specifically for performance analysis or may be a collection of one or more ordinary application programs whose analysis would be representative of system performance. Such benchmarks may be based on assembled source programs or disassembled from one or more executables, with corresponding instruction mnemonics, such as ADD, MULTIPLY, MOVE, JUMP etc., being totalled. In conventional systems, instruction opcodes and operations generally stand in a one-to-one relationship, and measurement of opcode frequency is the same as, or a proxy for, operation frequency. A frequency profile for the population of operations in the instruction set may be ascertained without the source code having actually been run. In addition to benchmarking of raw frequencies, weighted frequencies may be used in which certain operations are given preferential weightings. Different benchmarks may also be combined or aggregated, leading to various assessments of absolute and relative frequency of instruction operations.
Whatever the exact method used, it is well known that it is possible to derive a frequency distribution of operations for the population of operations in a given instruction set. Such analyses relate to static frequency, meaning that for any instruction set architecture and one or more benchmark programs representative of the system's performance, the relative frequencies of the operations in the instruction set are generally fixed, i.e. an instruction set has a repeatable operation frequency profile.
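By way of illustration only, a static count of this kind can be sketched in a few lines. The sketch assumes a disassembly listing with one instruction per line whose first token is the mnemonic; the listing itself and the helper name static_frequency are hypothetical and are not taken from any particular benchmark.

```python
from collections import Counter


def static_frequency(listing):
    """Count how often each mnemonic appears in the code, ignoring how often it runs."""
    counts = Counter()
    for line in listing.splitlines():
        tokens = line.split()
        if tokens:  # skip blank lines
            counts[tokens[0].upper()] += 1
    return counts


# Hypothetical disassembly: each instruction is counted once, even inside a loop.
LISTING = """
MOVE R1, 0
MOVE R2, 10
ADD  R1, R1, R2
ADD  R2, R2, -1
JUMP NZ, -2
MOVE R3, R1
"""

static_counts = static_frequency(LISTING)
# Frequency ranking, most frequent mnemonic first.
static_ranking = [op for op, _ in static_counts.most_common()]
print(static_counts)
print(static_ranking)
```

The ranking produced here depends only on the text of the program, which is why the same listing always yields the same static profile.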
[00068] The frequency profile may also be measured dynamically, based on the frequency of occurrence of executed operations. Rather than static frequency, this latter approach is referred to as dynamic frequency, as explained below.
[00069] While static frequency relates to frequency of occurrence of executable (but not executed) operations, dynamic frequency takes into account the number of times an operation is actually executed. Some instructions are subject to an iterative process and comprise loop-type or repeatable elements, such that some instructions, when run, are required to be re-run a plurality of times before execution of the instructions can be said to be completed. The number of iterations may be absolute or may depend on some conditionality being met. The count of operations, in this context, is not simply the count of executable, but unexecuted, operations (as in static frequency measurements, see above), but a count of executions of each iteration of each operation as the instructions are run. For a predetermined time period a logic analyser or similar equipment connected to the processor records the stream of instructions as they are executed, and, at the end of the period, derives from the record a count of executions of operations. During the recording, a table of operation counts is constantly updated, such that whenever a particular instruction is about to be decoded for execution or has been executed, the count for the particular instruction's operation(s) is incremented by 1. At the end of the counting period, the count of operations is the dynamic frequency of that instruction's operation. Some instructions will be more susceptible to iterative elements (loops and/or repeat executions) than others, such that for each instruction in the instruction set the operation dynamic frequency may be very different when the iteration executions are also taken into account, i.e. the dynamic frequency of an instruction operation may differ from the corresponding static frequency of the same instruction operation.
[00070] As with static frequency, a profile of the dynamic frequency of operations may be obtained for each set of operations, but, as the reader will appreciate, for any given instruction set the dynamic frequency profile of the instruction set will be different from the static frequency profile of the same instruction set. Consequently, once the operations are ordered or ranked (see above), the resulting order of operations for the dynamic frequency profile will be different from the order of operations using a static frequency profile.
Accordingly, subsets of operations corresponding to discrete ranges of operation frequencies will be different for the static and dynamic frequency profiles of the same set of instructions.
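The difference between the two profiles can be seen in a small sketch. It assumes a hypothetical program made of a prologue, a loop body that runs ten times, and an epilogue; the names PROLOGUE, LOOP_BODY, EPILOGUE, executed_stream and count_stream are illustrative only. The table of counts is incremented by 1 per instruction, as in the recording described above.

```python
from collections import Counter

# Hypothetical program, given as mnemonics only.
PROLOGUE = ["MOVE", "MOVE"]
LOOP_BODY = ["ADD", "ADD", "JUMP"]
EPILOGUE = ["MOVE"]


def executed_stream(iterations):
    """Yield mnemonics in the order they would be executed."""
    yield from PROLOGUE
    for _ in range(iterations):
        yield from LOOP_BODY
    yield from EPILOGUE


def count_stream(stream):
    """Update a table of operation counts, one increment per instruction seen."""
    table = Counter()
    for op in stream:
        table[op] += 1
    return table


# Static: every instruction in the code counted once.
static = count_stream(PROLOGUE + LOOP_BODY + EPILOGUE)
# Dynamic: every execution counted, with the loop body run 10 times.
dynamic = count_stream(executed_stream(10))

static_rank = [op for op, _ in static.most_common()]
dynamic_rank = [op for op, _ in dynamic.most_common()]
print(static_rank, dynamic_rank)
```

In this toy case MOVE leads the static ranking but falls to last place in the dynamic ranking, so a subset built from one profile would not match a subset built from the other.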
[00071] In an aspect of the apparatus and method of this disclosure, the recording of the dynamic operation count may be performed directly on the processor or, alternatively, on a simulator of the processor. In another aspect an operating system emulator may be utilised, such that as each instruction is about to be executed or has been executed, a table of operation counts in the emulator is incremented, thereby providing an overall count for each operation.
[00072] Aspects of the apparatus and method of this disclosure relate to either the static or dynamic frequency approaches or to both.
[00073] In accordance with an example aspect of the apparatus and method of this disclosure the operation frequency distribution may be used as a basis for efficiency enhancements in a computer processor and method for processing computer instructions.
[00074] In an example aspect of the invention operations may be ranked, or ordered, by frequency, and then categorised or grouped by frequency, such that each operation in a group or category of operations is linked by an opcode field, the value of the opcode field being the identifier of the operation within the group and the concatenated values of all opcode fields in the operation together being the identifier of the operation within the instruction set.
[00075] In an example aspect of the apparatus and method of this disclosure, processor operations may be arranged according to their operation frequency. In the frequency analyses referred to above, each instruction generally comprised a single opcode field, reflecting a single operation, so instructions and operations enjoyed a one-to-one relationship. By means of such analyses on the population of operations associated with an instruction set, operations may be categorised or grouped by frequency into subsets of operations represented by a single operation code (opcode) field. Operations with similar frequencies (or, for example, consecutive frequency rankings) may be grouped together, thereby defining one or more subsets of operations. The subset of operations of the same or similar, or related, frequencies may be represented by a single designated opcode field, a particular value of the opcode identifying a specific operation. Different subsets of operations are represented by different opcode fields, the operations represented by each opcode field being linked by their respective frequencies, but each opcode field comprising operations of different frequency levels to those represented by other opcode fields. In accordance with an aspect of the apparatus and method of the invention a single instruction may comprise one or more operations each comprising a plurality of such opcodes, each with a corresponding opcode field, HFO0, HFO1, HFO2, HFOn, etc. Despite the possible plurality of opcodes in an operation, at any one time only one opcode in any single operation is of primary relevance, as explained in a later passage herewith.
[00076] Each successive opcode field represents a different subset of operations, grouped by frequency, but having a lower frequency than the operations represented by the previous opcode field. The first opcode field HFO0 comprises the operations of highest frequencies, the second opcode field HFO1, if it exists, comprises the operations of next highest frequencies, the third opcode field HFO2, if it exists, comprises operations with the third highest frequencies, etc. The value of the last opcode field for an operation serves as the identifier of the underlying operation. If the opcode field has 4 bits, it can contain up to 16 opcode values: one of these is the reserved value (see below), the others ("0001", "0010", "0011" etc.) being operation identifiers. This approach of grouping operations into subsets is applied consistently across all operations in the population of operations in the instruction set architecture (ISA), until the last opcode field LFO, which comprises the operations of lowest frequencies. The subsets collectively cover all operations, i.e. there are enough subsets to cover the whole population of operations. Because each opcode field represents a subset of operations, a smaller number of opcode fields is required than in an arrangement where each opcode field represents a single operation.
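The grouping described above can be sketched as follows. This is a minimal illustration, not the patent's own procedure: it assumes that operations are already ranked by frequency, that the all-ones value of every field is the reserved value, and that values within a field are assigned in frequency order. The function name assign_subsets and the operation names OP0 to OP19 are hypothetical.

```python
def assign_subsets(ranked_ops, field_bits):
    """Map each operation to (field index, value), most frequent operations first.

    ranked_ops: operations ordered from highest to lowest frequency.
    field_bits: bit width of each successive opcode field (HFO0, HFO1, ...).
    Assumption: the all-ones value of every field is reserved, so an n-bit
    field identifies at most 2**n - 1 operations.
    """
    assignment = {}
    remaining = list(ranked_ops)
    for index, bits in enumerate(field_bits):
        capacity = (1 << bits) - 1
        for value, op in enumerate(remaining[:capacity]):
            assignment[op] = (index, value)
        remaining = remaining[capacity:]
    if remaining:
        raise ValueError("not enough opcode fields to cover all operations")
    return assignment


# Hypothetical population of 20 operations, OP0 being the most frequent.
ops = [f"OP{i}" for i in range(20)]
table = assign_subsets(ops, [2, 3, 4])  # capacities 3, 7 and 15
print(table["OP0"], table["OP3"], table["OP19"])
```

With these illustrative widths the three most frequent operations land in the first field, the next seven in the second, and the remainder in the third.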
[00077] Instructions in conventional systems generally have a single opcode field for denoting the operation in question, and with a bit length sufficient to cover all operations in the instruction set with no use of subsets or any reference to frequency of the instruction. As will be appreciated, in an aspect of the apparatus and method according to the invention, there is, unlike in conventional systems, no one-to-one relationship between the instruction and the opcode for all instructions in the instruction set, as the instruction may contain a plurality of operations and each operation may contain a plurality of opcodes. As the reader will understand, each opcode field according to the apparatus and method of this disclosure represents a subset of operations and not the whole population of operations, and opcode lengths for high frequency operations may therefore be much shorter than opcodes designed to cover the entirety of the population of operations. The exact bit-length of the opcode field will depend on the number of operations, grouped into the subset, by frequency, as described above.
[00078] Within each opcode field, HFO0, HFO1, HFO2, HFOn, etc., the values for individual operations may be assigned according to order of frequency within the population of operations or according to order of frequency within the subset, or may be arbitrary, as described in a later passage herewith.
[00079] Unlike conventional systems, operations in accordance with the apparatus and method of the invention may contain a plurality of opcode fields based upon the frequency of operations. In conventional systems, operations contain a single opcode field and the opcode value in that field identifies the action to be performed, so there is no need to specify that further opcode fields need be considered. In accordance with an aspect of the apparatus and method of the invention, the relevant opcode, this being the one containing bits corresponding to the operation to be processed and executed by the processor, has to be flagged accordingly. As explained below, in accordance with an aspect of the apparatus and method of the invention, each opcode is identified by value, flagging whether it is a relevant opcode or not. If it is not a relevant opcode, the processor will effectively pass over or bypass one or more opcode fields which are not relevant, until it receives the bits of a relevant opcode, which it then processes.
[00080] In accordance with the apparatus and method of the invention, the number of opcodes in an operation will depend on the position of the opcode comprising the relevant action to be performed. Opcodes are arranged in order of associated frequencies, with HFO0 first, then HFO1, then HFO2 etc., and opcode bits are processed in the same order. Only one opcode field contains the relevant operation for any given instruction, so the processor, as explained below, can be considered to serially process consecutive bits in the instruction, receiving the bits of each opcode, one after another, until it receives the relevant opcode, the one containing the relevant operation. The instruction will contain no further opcodes: the last opcode received by the processor will be the relevant opcode. Thus, the number of opcodes present in the instruction will be determined by that opcode which identifies the relevant operation. Another way of understanding this is to consider that the instruction will only contain those opcodes required to determine the relevant opcode.
[00081] If the opcode value for the relevant operation, due to the frequencies of the operations for the corresponding subset, is located, for example, in the seventh opcode field comprising the seventh subset of operations, the operation will not contain subsequent opcodes i.e. the eighth, ninth, tenth etc. opcode fields will be absent from the instruction. The operation will therefore not contain opcodes corresponding to frequencies less than that of the relevant operation i.e. the operation of interest.
[00082] As the reader will understand, there is an implicit presumption in this process that higher frequency opcodes are prioritised, i.e. received and processed first, because they are more likely, being of higher frequency, to contain the operation of interest. HFO0 is logically more likely to contain the relevant operation than HFO1, HFO1 is more likely to contain the relevant operation than HFO2, and so on. By a mechanism explained below, the higher the frequency of the opcode of the operation, the sooner it may be received by the processor, thereby effectively prioritising high frequency operations over those of lower frequency. The system may, in this way, arrange to process opcodes and operations in operation frequency order.
[00083] As stated above, instruction bits including the opcode bits are successively received by the processor, with opcodes which are not relevant being flagged to the processor as such, until the bits of the relevant opcode are received. The bits of each opcode flag to the processor whether the opcode received is relevant or not. The processor may receive a plurality of non-relevant opcodes before it receives the relevant opcode corresponding to the relevant operation. Non-relevant opcodes have a value which is reserved, i.e. if the bits of an opcode comprise a reserved value, the processor will treat that opcode as not relevant and move on to the immediately following opcode. In effect the processor is "informed", by the presence in the received opcode field of a reserved value, that there is another opcode and that the current opcode may be passed over. To "find" the relevant operation in, say, the seventh opcode (where the operation is comprised), the instruction will nevertheless comprise the higher opcodes, i.e. the first, second, third, down to the sixth, opcode, but in each of these higher opcodes a reserved value is present, the reserved value signalling that another opcode, the next successive opcode in the chain, is present in the instruction and that these higher opcodes should be, in effect, "bypassed" or ignored. As the higher opcodes (in this example: the first six opcodes) are consecutively bypassed in this way, the processor will "arrive" at the relevant opcode (in this example, the seventh opcode), where the relevant operation forms part of the subset, and the relevant value is acted upon. Having "located" the operation, which can then be executed, the bypassing of opcodes ceases, such that the relevant opcode, i.e. the one containing a value of the relevant operation (the operation to be executed), is the last opcode in the instruction.
If the operation is comprised in a high frequency opcode, it will be "located" in one of the first opcodes, with relatively fewer opcodes being bypassed, and conversely if it is in a lower frequency opcode, then the chain to arrive at the opcode in question will be relatively longer. The reader will understand that the number of opcodes present in the instruction will in this way be determined by, inter alia, the relevant subset and relevant opcode, containing the value for the relevant operation, which is dependent on the frequency of the relevant operation. The number of bypasses required depends on the corresponding frequency subset of the opcode which contains the relevant value and is directly linked to the frequency of the relevant operation relative to the frequency of other operations. It will be understood that operations of higher frequency will benefit more than lower frequency operations, to the extent that the processing required is less for high frequency operations. Correspondingly, according to the arrangement set forth in this disclosure, the processing savings are greater for higher frequency operations than for lower frequency operations. Operations comprised in opcode HFO0 will require the least processing of all and benefit from the maximum processing saving over conventional systems in which instructions may contain not only many unused bits but also much longer opcodes.
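The bypass chain described above can be sketched as a serial walk over the opcode fields. This is an illustrative software model only, not the decoding circuitry: it represents the instruction as a string of "0" and "1" characters and assumes that the reserved value of every field is all ones. The function name decode_opcode and the field widths used in the examples are assumptions.

```python
def decode_opcode(bits, field_bits):
    """Return (field index, opcode value, opcode bits consumed) for the relevant opcode.

    bits: the instruction as a string of '0'/'1', opcode fields first.
    field_bits: bit width of each successive opcode field.
    Assumption: an all-ones field holds the reserved value and is bypassed.
    """
    pos = 0
    for index, width in enumerate(field_bits):
        field = bits[pos:pos + width]
        if len(field) < width:
            raise ValueError("instruction ended before a relevant opcode was found")
        pos += width
        if field != "1" * width:
            # First non-reserved field: this is the relevant, and last, opcode.
            return index, int(field, 2), pos
    raise ValueError("every opcode field held the reserved value")


# Relevant opcode in the first 4-bit field: nothing is bypassed.
print(decode_opcode("010001010010", [4, 4]))
# First field reserved ("1111"), so the second field is the relevant opcode.
print(decode_opcode("11110110", [4, 4]))
```

The further down the chain the relevant opcode sits, the more fields are read and bypassed before the function returns, which mirrors the longer chain for lower frequency operations.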
[00084] It will be understood that the above process for accessing operations in the relevant subset comprised in the corresponding opcode does not involve extension values.
[00085] We now consider the size of subsets of operations (the subset capacity in terms of operations) and the bit-length of the corresponding opcode field according to the apparatus and method of this disclosure. The reader will appreciate this will determine how many subsets are needed to cover the population of operations: bigger subsets mean fewer subsets and vice versa. Clearly, as the subsets and the operations identified by them are classified and ordered by frequency, the size of the subset (which is fixed by the number of bits in the opcode) will determine in which subset and opcode a particular operation is comprised. By using small subsets, fewer operations will be identified by values in each opcode field, and any particular operation not in the first subsets will be located "further down" in the series of successive opcode fields. There is a trade-off between implementing smaller subsets which "push" operations down to later operation subsets and larger subsets which "maintain" operations at a relatively high level in the opcode chain, and the choice of numerous small subsets or few large subsets may depend on the particular frequency profile of the population concerned. Alternatively, this choice may be dictated by an ideal opcode size (say 5-8 bits) which in turn determines the number of subsets and opcode fields required to cover the whole population.
[00086] Once the processor has located the relevant operation from the corresponding opcode, it decodes and executes the operation in the conventional way (see below). However, high frequency operations will only comprise higher level opcode fields, meaning the bit length of these instructions (the ones most frequently used) will be shorter, saving on bits and on processing for those instructions. The more frequently the operation occurs, the shorter the total opcode bit length of the instruction and the greater the saving. For lower frequency instructions, it will be apparent that the saving is lower and may even be negative, due to the number of bits in opcode fields containing reserved values which need to be decoded in addition to those in the opcode field identifying the operation.
[00087] In terms of decoding operations (see later passage herein), where a single block of decode memory is implemented, the access time for decode memory is slower if the memory has more locations (as is the case if there are more opcode bits to decode). Thus, the potential decode time is shorter for operations in the HFO0 subset, if dedicated decode memory for this subset is implemented, compared to the same operations with a classic ISA. However, it is also the case that the potential decode time will be longer for operations in the LFO subset compared to the same operations in a classic ISA.
[00088] In view of the above it will be apparent to the reader that there is a trade-off between the number of opcode fields and opcode field length, and the opcode may have an ideal size, as cited above, this being the level at which processing saving is optimised.
[00089] Further advantages of the invention will be apparent on consideration of the operation and workings of typical computer systems, as illustrated in FIG. 1 and explained later herein. FIG. 1 is a schematic diagram of an example computer system and the subsystems thereof, along with an indication of the relative speeds of interaction between the different subsystems. The processor (the CPU) on the left of the diagram comprises a number of subcomponents (not shown): those communications and interactions occurring within the processor occur at the fastest speed. Reading from left to right in FIG. 1, the computer system typically comprises a number of other storage subsystems (caches, memories of various types), such as non-volatile memory (e.g. Solid-State Drives, hard disks, magnetic tape etc.), which can include "external" or removable media (for example CD-ROMs, DVDs etc.), normally with a progressively increasing storage volume, but also progressively reducing access speed.
When instructions are required by the processor, the delay in accessing and/or retrieving the instructions will depend not only on the speed of the subsystem or device where the instructions are stored, but also the number and speed of other subsystems through which the data has to be transferred before it arrives at the processor. It will be readily understood that instructions stored, for example, in the non-volatile memory, such as a CD-ROM, are slowest to access and have the greatest delay in being transferred to the processor. We shall return to this arrangement in a later section herein, as it is in the context of such a computer arrangement that the apparatus and method of an aspect of the invention operate, but shall first consider how instructions are implemented in conventional systems.
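A rough worked example of the saving, and of how it can turn negative, is sketched below. All figures are assumptions chosen for illustration: three opcode fields of 3, 5 and 8 bits, one reserved value per field, a conventional fixed opcode just wide enough for the same population, and a hypothetical share of occurrences per subset (80%, 15%, 5%).

```python
def opcode_bits(rank, field_bits):
    """Total opcode bits for the operation at 0-based frequency rank `rank`.

    Includes the bits of every bypassed field holding a reserved value.
    """
    total = 0
    for width in field_bits:
        total += width
        capacity = (1 << width) - 1  # one value per field assumed reserved
        if rank < capacity:
            return total
        rank -= capacity
    raise ValueError("rank not covered by the opcode fields")


FIELD_BITS = [3, 5, 8]                 # assumed widths of HFO0, HFO1 and a last field
N_OPS = 7 + 31 + 255                   # population covered by those fields
FIXED_BITS = (N_OPS - 1).bit_length()  # conventional single opcode field width

# Saving in opcode bits for the first operation of each subset.
savings = [FIXED_BITS - opcode_bits(rank, FIELD_BITS) for rank in (0, 7, 38)]

# Hypothetical share of occurrences falling in each subset.
shares = [0.80, 0.15, 0.05]
average_bits = sum(s * opcode_bits(r, FIELD_BITS) for s, r in zip(shares, (0, 7, 38)))
print(FIXED_BITS, savings, average_bits)
```

Under these assumptions the most frequent operations save six opcode bits each, the second subset saves one, and the last subset costs seven more bits than the fixed field, while the weighted average remains well below the fixed width.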
[00090] As stated previously, as well as an opcode, a conventional instruction structure may optionally contain one or more operands, as well as one or more unused bits, which may be utilised for some processing purpose, as illustrated in FIG. 2.1 with a specific example in FIG. 2.2. The instruction may also contain some bits which directly control processor functionality (not shown). Instructions commonly have a byte or multiple-byte length (FIGs 2.2 and 2.4) and can be any length, as long as they are byte or multi-byte aligned, meaning they are stored at a memory address that is a multiple of 8 bits and terminate at points a multiple of 8 bits apart. Instructions may also include one or more extension opcode fields, which facilitate future additional instructions and may be invoked or activated by the presence of a particular value (e.g. "01001") in the second opcode field, FIG. 2.4.
[00091] FIG. 2.3 shows a generalised instruction structure with extension opcode field.
Instructions contain a main opcode field whose value indicates there is also an extension opcode field. The instruction may contain 1 or more operands and/or 1 or more unused bits. Only when both opcode fields have been decoded can the operation (to be performed) be determined, meaning that a considerable number of bits must be processed before that determination is made.
[00092] An example of this concept is shown at FIG. 2.4 showing exemplary field values. This may be for example a 4-byte instruction: ADD R1, R2, R3 (where R1, R2, R3 are the names of the corresponding registers). The instruction adds the contents of register R1 to the contents of R2 and stores the result in R3. One value (here shown as '111111') in the main opcode field acts as an indicator of extension instructions.
[00093] In the instruction depicted in FIG. 2.4 there are 5 useful fields comprising a total of 23 bits, so there are 9 unused bits, as it is a 4-byte instruction. We can see from this that considerable bit redundancy is inherent in rigid length, multi-byte long instructions such as the 4-byte alignment structure. Instructions are not only long, but many of the bits are actually unused. Depending on how the opcodes and operands "fit in" to the multi-byte structure, a greater or smaller number of unused bits will be present. If, for example, the number of bits required for an instruction just exceeds a multiple of thirty-two bits, another whole 4 bytes has to be utilised to accommodate the excess bit(s) and the rest of the 4 bytes will be padded out with supplementary unused bits. Even if this example was part of an ISA which supported byte aligned instructions, 7 bits of padding would be required. These supplementary bits have to be "carried" in the instruction and serve merely to ensure the instruction is of a specified length. Creating, carrying, and processing all these unused bits is inefficient. As the same instruction has to be moved between different subsystems of the computer system (see later passage herein), this inefficiency is compounded, with the processing of unused bits being replicated at multiple locations.
[00094] Opcodes in the conventional instruction structure are generally n bits wide, and can support up to 2^n different operations. In order to accommodate all the operations in the instruction set, n is generally set at a high value, leading to large opcode field lengths which are fixed, irrespective of how seldom a particular opcode value may occur. Furthermore, the long opcode fields in conventional instruction structures provide a large number of opcode permutations, which allows some future expansion if all the permutations are not required for the current instruction set. The inherent redundancy of the fixed length single opcode field scheme is inflexible and inefficient, as will be explained in later sections herewith.
[00095] The structure of an opcode, according to an example aspect of the invention, comprises a dedicated field of one or more bits for a subset of operations which are operations having a range of operation frequencies. Each different opcode field corresponds to a different subset of operations having a different range of operation frequencies. One opcode field corresponds to the highest frequency operations, another opcode field corresponds to another subset and another range of frequencies, the next opcode field corresponds to yet another subset, and so on. Depending on the instruction, one such opcode field may represent the operations of lowest operation frequency. As the number of operations having a given frequency is limited, or may be limited, to a predetermined number of operations, the number of bits in the opcode field required to identify the relevant operation can be limited: as each bit in the field is binary ("0" or "1"), n such bits would permit up to 2^n high frequency instructions to be identified. For example, 3 bits would allow 8 such operations, 5 bits would allow 32 operations to be identified, etc. Allowing for a reserved value, the opcode field for a particular frequency or frequency range may therefore contain one of 2^n - 1 values, the values entered in the opcode field serving as unique identifiers for the underlying operations. The opcode field has decoding circuitry to convert its bits into binary values on the processor control lines. As we explain below, there may be a plurality of such opcode fields, each corresponding to a different frequency or frequency range and representing a different subset of potential operations. Each opcode has a predetermined number of bits to identify the relevant frequency operation.
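The padding arithmetic in this passage can be checked with a one-line helper. The function name padding_bits is hypothetical, and the 33-bit case is an assumed instance of an instruction that "just exceeds" a multiple of thirty-two bits by a single bit, which is one reading of the example in the text.

```python
def padding_bits(useful_bits, alignment_bits):
    """Unused bits needed to round an instruction up to the next alignment boundary."""
    return -useful_bits % alignment_bits


# FIG. 2.4: 23 useful bits in a 4-byte (32-bit) aligned instruction.
fig_2_4_padding = padding_bits(23, 32)

# Assumed case: 33 useful bits, one more than a multiple of thirty-two.
word_aligned_padding = padding_bits(33, 32)  # a whole further 4 bytes is needed
byte_aligned_padding = padding_bits(33, 8)   # byte alignment still pads the last byte

print(fig_2_4_padding, word_aligned_padding, byte_aligned_padding)
```

On these figures, relaxing the alignment from four bytes to one byte reduces the padding but does not remove it.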
As already explained, each opcode field according to the invention represents just a subset of operations and not the whole population of operations, such that the length of each opcode field may therefore be much shorter than opcodes designed to cover the entirety of the population of operations. The exact bit-length of the opcode will depend on the number of operations, grouped into the subset, by frequency, as described above.
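As a small worked example of the relationship between field width and subset size, the sketch below computes the number of operations an n-bit field can identify once one value is reserved, and how many equal-width fields would be needed to cover a population. The function names and the population of 64 operations are assumptions for illustration.

```python
def subset_capacity(bits, reserved_values=1):
    """Operations identifiable by an opcode field of `bits` bits: 2**bits minus reserved values."""
    return (1 << bits) - reserved_values


def fields_needed(population, bits):
    """Number of equal-width opcode fields needed to cover `population` operations."""
    capacity = subset_capacity(bits)
    return -(-population // capacity)  # ceiling division


capacities = {bits: subset_capacity(bits) for bits in (3, 4, 5)}
print(capacities)
# Hypothetical population of 64 operations: narrower fields mean more fields.
print(fields_needed(64, 4), fields_needed(64, 5))
```

Narrower fields give shorter opcodes for the most frequent operations but require a longer chain of fields to reach the least frequent ones.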
[00096] According to an aspect of the invention, one advantage of using such a structure with such dedicated opcode fields, denoting and specifying the higher frequency operations, is that the combined opcode field lengths for high frequency operations are shorter than the opcode fields in a conventional ISA. In accordance with the structure described herein, once the high frequency operations are allocated an identifier, which is the opcode value, serving as a proxy for the operation itself, the operation may be permanently identified by a smaller number of bits, i.e. by a shorter opcode, than would normally be required to identify the underlying instruction without such an identifier. The high frequency operation identifier (the value in the opcode field) may be considerably shorter than for the same operation in a conventional ISA, meaning that there is a saving in terms of bits needed to be processed and instruction processing time, a saving which is highest for the highest frequency operations. Processor performance is improved due to less work being required or performed to process identified operations, in combination with certain tasks being achieved in less time.
[00097] There are also benefits in terms of power consumption. As explained above, shorter instructions, due to shorter opcodes and fewer unused bits, result in bit savings and reduced processing. Energy consumption is reduced, primarily because of the drop in the volume of bit state switching, but also due to earlier completion of tasks, leading to periods of downtime. Furthermore, as explained below, the "distance" over which instructions are fetched is reduced, providing a further significant power saving.
[00098] For high frequency operations, the overall instruction lengths may in this way be shortened, as set out herein. Although there is relatively less shortening (or no shortening) of the instructions with lower frequency operations, the reader will readily understand that, by shortening the instruction lengths of just the high frequency operations, substantial bit savings in terms of program size may be achieved, leading to the code storage efficiency improvements, as set out herein.
[00099] Further advantages of the apparatus and method of the invention will be apparent from the figures, as explained below.
[000100] FIGs 3.1 to 3.14 show example aspects of the invention. FIG. 3.1 illustrates an instruction structure for an exemplary high frequency operation, according to an example aspect of the invention, in which the opcode field of the conventional instruction structure is replaced by a number of dedicated fields, marked here as "HFO", standing for High Frequency Opcode field, with a numerical suffix, optionally followed by an operand or operands, plus any unused bits. In the example at FIG. 3.2, there are 4 bits in the HFO0 field, or 16 permutations, supporting 15 different high frequency instructions, with one permutation reserved, the purpose and function of the reserved value being described later herein. The 15 operations supported by the opcode field HFO0 form a subset of operations from the wider population of operations present in the instruction set architecture: in practice the opcode field will contain a value (replacing the x... bits in FIG. 3.1), and this value uniquely identifies an operation. In the instruction illustrated in FIG. 3.1 there is just one opcode field, HFO0, which is the simplest form of instruction according to an aspect of the invention, and represents the subset of instructions of highest frequency or frequencies. In other figures there is a plurality of opcode fields. The value entered in the opcode field (not shown in FIG. 3.1) will correspond to one of the 15 operations of highest frequency in the population of operations.
[000101] There may be two operands, as illustrated in FIG. 3.2, showing the opcode HFO0 and two operands: the operation represented by HFO0 may, for example, be an instruction to ADD the two operands, which may be, for example, the contents of two registers, e.g. R5 and R2. Based upon the example given in FIG. 3.2, up to 15 different instructions could be represented, along with a low frequency opcode field identifier, in a 4-bit field. Here, the high frequency operation identified is "0100", without any low frequency identifier, and the two operands are "0101" and "0010".
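The field values quoted for FIG. 3.2 can be assembled into a single bit string as a quick check. The helper encode_fields is hypothetical, and the 4-bit operand width is inferred from the quoted operand values "0101" and "0010" rather than stated in the text.

```python
def encode_fields(*fields):
    """Concatenate (value, width) pairs into a bit string, first field leftmost."""
    parts = []
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        parts.append(format(value, f"0{width}b"))
    return "".join(parts)


# HFO0 = "0100" (for example ADD), operands R5 = "0101" and R2 = "0010".
instruction = encode_fields((0b0100, 4), (5, 4), (2, 4))
print(instruction, len(instruction))
```

The whole instruction in this example occupies 12 bits, with no second opcode field because HFO0 holds an operation value rather than a reserved value.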
[000102] FIG. 3.3 shows an aspect of the invention, in which, the example structure for low frequency operations, or "LFO" (low frequency opcode field) is identified. In this case, the HFO field will contain a reserved value, shown here only as x..., which for a 4-bit HF00, as shown in FIG. 3.4, will be one value among the 16 potential values which can be entered in the opcode field. The function of the reserved value is to indicate that after the current opcode field, there is a further opcode field, which in this case is the opcode field for a low frequency opcode, LFO. The operation, corresponding to the value contained in the LEO field, belongs to the subset of operations which occurs infrequently. Accordingly, after the HMI) field, there is a dedicated LFO field, which contains the identity of the low frequency operation. Again, after the opcode fields, there follows, within the instruction, the optional relevant operands and unused bits. Were the FIR/afield not to contain a reserved value, but an operation value instead, then no LFO field would exist and the value would identify that operation and the processor would process the operation.
[000103] An example of an instruction, in accordance with an aspect of the invention, is shown at FIG. 3.4, with exemplary values. Here we see that the HFO0 field has a reserved value "1111", which indicates to the processor that the HFO0 field is followed by a further opcode field. In the example illustrated, the further opcode field is an LFO, meaning the operation in question is not from any subset of high frequency operations, but is a low frequency operation, which is then identified as operation "0110" in the field immediately following the HFO0 field, marked LFO. The value "1111" in the HFO0 field in this way indicates that there is a second opcode field, the LFO. The value "0110" in the LFO field indicates a specific instruction operation, such as "ADD", and that this, in the example illustrated, is to be performed on three operands, which are specified in the next three fields as Operand 1, Operand 2 and Operand 3. The LFO field is shown in FIG. 3.4 to be a 4-bit field. However, the opcode field lengths and the values in the opcode fields are illustrative only.
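By way of illustration only, the interpretation of the FIG. 3.2 and FIG. 3.4 examples can be sketched in Python as follows. The sketch assumes that every field is 4 bits wide and that "1111" is the reserved HFO0 value, as in those examples. The function name, the returned dictionary layout and the operand values used for the FIG. 3.4 case are hypothetical and are not taken from the figures.

```python
RESERVED_HFO0 = "1111"  # reserved HFO0 value used in the FIG. 3.4 example
FIELD_BITS = 4          # assumption: every field in this sketch is 4 bits wide


def decode_instruction(bits: str, operand_count: int) -> dict:
    """Split a bit string into its opcode field(s) and operands.

    If HFO0 holds the reserved value it identifies no operation, and the
    next field is read as the LFO instead.
    """
    fields = [bits[i:i + FIELD_BITS] for i in range(0, len(bits), FIELD_BITS)]
    hfo0 = fields[0]
    if hfo0 == RESERVED_HFO0:
        group, opcode, rest = "LFO", fields[1], fields[2:]
    else:
        group, opcode, rest = "HFO0", hfo0, fields[1:]
    return {"group": group, "opcode": opcode, "operands": rest[:operand_count]}


# FIG. 3.2 values: HFO0 = 0100 with operands 0101 and 0010
high = decode_instruction("0100" + "0101" + "0010", operand_count=2)

# FIG. 3.4 opcode values: HFO0 = 1111 (reserved), LFO = 0110.
# The three operand values below are placeholders.
low = decode_instruction("1111" + "0110" + "0001" + "0010" + "0011", operand_count=3)
```

In the first call the operation is identified by HFO0 alone; in the second, HFO0 only signals that the LFO field follows.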
[000104] In accordance with an aspect of the disclosure, a scheme is illustrated at FIG. 3.5 which shows an example instruction structure with two fields relating to high frequency instructions, HFO0, the highest frequency operations, and HFO1, the next most frequent operations, plus optional associated operands. The presence of both opcode fields means the HFO0 field contains a reserved value and the HFO1 field contains an identifier of the underlying operation. FIG. 3.6 demonstrates this concept, but now with exemplary values for HFO0 and HFO1, with HFO0 being set at "111", a reserved value which signals that the next field, HFO1, which identifies an operation belonging to the set of next most frequent operations, is to be processed. As we saw in the previous figures, the purpose and function of the reserved value is to indicate that there is a further opcode field after the opcode field comprising the reserved value, and that the opcode field containing the reserved value identifies no operation at all. We see in this way that the presence of a reserved value in an opcode field causes the opcode field containing it to be effectively by-passed: HFO0 has no operation identifier and its only purpose is, by means of the reserved value, to indicate the presence of another opcode field, HFO1. Note that HFO0 and HFO1 are of different bit-lengths, meaning the sizes of the subsets of operations which their potential values represent are unequal: after taking reserved values into account, HFO0 and HFO1 can contain up to 7 and 31 different values respectively, each identifying a unique operation.
[000105] The operation, for example "INC", or increment, is identified by the value "01100" in HFO1, and this is applied to Operand 1, a value held, for example, in register R4, denoted by the value "0100".
[000106] We see in the example at FIG. 3.6 that there are only two opcode fields, HFO0 and HFO1. These two opcode fields represent consecutive subsets of operations in the frequency ordering of opcodes, i.e. the HFO1 subset of operations corresponds to operation frequencies immediately following those operation frequencies which correspond to the HFO0 subset of operations. This is an important aspect and is followed in more complex instructions with greater numbers of opcode fields, as will be shown in subsequent figures. We also see in the example in FIG. 3.6 that for this instruction HFO1 is the last opcode field because it contains the value for the relevant operation, which, in this example, is "01100", and that there are therefore no further opcode fields and the opcode field "chain" ends with the opcode field representing the subset of values which contain the relevant operation. This is also true in longer chains, as shown in subsequent figures.
[000107] FIGs 3.7 and 3.8 illustrate further example aspects of the apparatus and method of this disclosure, in which there are two groups of high frequency instructions and one group of low frequency instructions. FIG. 3.7 illustrates an example instruction structure and FIG. 3.8 a specific instruction example using that structure. As we saw above for FIGs 3.3 and 3.4, in order for the LFO value to be decoded correctly, the preceding bits, this time in two high frequency opcode fields, must each contain reserved values which each indicate there is a further opcode field. FIG. 3.8 shows examples of the predetermined reserved values. The field HFO0 contains the value "1111", which, in this example, indicates there is another opcode field, which is HFO1, and HFO1 itself contains the value "0000", which, in this example, indicates that there is another opcode field, the LFO, which here contains the value "0001". In this example the first two opcode fields only serve to "move on" the processing to a later field and they are in effect by-passed; the only opcode field of significance is the third and last opcode field, which in this example is the LFO. In this way, each opcode field, if it contains a predetermined reserved value, signals to the processor that there is another field to follow the opcode field containing the reserved value and that the processor should check that field. Each time a reserved value is determined the activity moves along the frequency ordering: the operations referenced by values in HFO1 are in the subset of operations with frequencies which are the next frequencies after those referenced by values in HFO0; the operations referenced by values in LFO (in this example) are in the subset of operations with frequencies which are the next frequencies after those in HFO1 and are also the subset of operations with lowest frequencies.
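The serial "move on" behaviour described above may be sketched, purely as an illustration, by a small Python routine that walks a chain of opcode fields until it meets a field that does not hold its reserved value. The layout below assumes three 4-bit fields with the reserved values given for FIG. 3.8; the function name and the layout representation are hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry: (field name, width in bits, reserved value or None for the last field)
Layout = List[Tuple[str, int, Optional[str]]]

# Assumed layout for the FIG. 3.8 example: three 4-bit opcode fields
FIG_3_8_LAYOUT: Layout = [
    ("HFO0", 4, "1111"),
    ("HFO1", 4, "0000"),
    ("LFO", 4, None),
]


def walk_opcode_chain(bits: str, layout: Layout) -> Tuple[str, str, int]:
    """Return (field name, field value, opcode bits consumed) for the
    opcode field that actually identifies the operation."""
    pos = 0
    for name, width, reserved in layout:
        value = bits[pos:pos + width]
        if len(value) < width:
            raise ValueError("instruction too short for field " + name)
        pos += width
        if value != reserved:
            # Not a reserved value: this field identifies the operation
            return name, value, pos
    raise ValueError("last opcode field holds a reserved value")


# FIG. 3.8 values: HFO0 = 1111, HFO1 = 0000, LFO = 0001
result = walk_opcode_chain("1111" + "0000" + "0001", FIG_3_8_LAYOUT)
# result == ("LFO", "0001", 12)
```

An instruction whose first field holds a non-reserved value stops at HFO0 after 4 bits, so the most frequent operations are resolved with the shortest opcode.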
[000108] In accordance with a further aspect of the apparatus and method of the disclosure, it is also possible that a predetermined reserved value in the HFO0 field indicates to the processor that both HFO1 and LFO, or any other opcode fields, are present. Rather than a serial signalling of further opcode fields, in this alternative, the reserved values of two or more opcode fields can be handled in parallel, with the relevant operation being identified in one of the opcode fields.
[000109] In the example structure shown in FIG. 3.9, the instruction may contain a plurality of opcode fields: HFO0 represents the subset of highest frequency operations, followed by HFO1, if present, which is the subset of next most frequent operations after the HFO0 referenced operations, followed by HFO2, if present, which represents the next most frequent operations, and so on, with each opcode field representing subsets of successively less frequently occurring operations. For each subset of operations (HFO0, HFO1, HFO2, HFO3 etc.) the opcode field value serves as an identifier of the underlying operation itself. As stated above, the number of bits in each opcode field determines the number of operations in each group: the size in bits of each opcode field is not necessarily the same. The rightmost opcode field in the figures is the opcode field containing the value for the relevant operation and is the last opcode field in the instruction: once the opcode field values, possibly with associated operand(s) if present, have been fetched by the processor, all required bits of the instruction are available, so it can be processed and the next instruction fetched.
[000110] In accordance with a further aspect of the apparatus and method of this disclosure, the instruction may, as an extension of the structure illustrated in FIG. 3.5, comprise yet further fields HFO1, HFO2, HFO3, HFO4, ..., HFON of frequently occurring operations, each of which represents the next most frequent operation group, as shown at FIG. 3.9. FIG. 3.9 illustrates a structure which starts with (on the left-hand side in the figure) the opcode field for the operation group which has the highest frequency, followed by opcode fields for consecutive operation groups of progressively decreasing frequency. The number of operations in each operation group depends on the width (number of bits) of the opcode field for each group. As with the structures shown in FIGs 3.1 to 3.7, the instruction may also optionally contain relevant operands.
[000111] FIG. 3.10 demonstrates another aspect of the apparatus and method of the invention and illustrates a longer instruction structure supporting all the potential operations, as represented by the opcode fields HFO0 ... HFON and LFO (for N+1 higher operation frequency subsets and one subset of low frequency). As explained in respect of previous figures, for an opcode field other than the first to be interpreted, the preceding opcode field has to contain a reserved value. For the contents of the LFO field in FIG. 3.10 to be interpreted as a valid operation, the preceding opcode fields up to and including HFON all need to contain reserved values and will in effect be bypassed, in the sense set out earlier. However, as previously explained, the LFO opcode field represents the operations with the very lowest frequency of all the operations in the population of operations.
[000112] FIG. 3.11 shows another aspect of the apparatus and method in this disclosure, illustrating a practical example of a variation of the structure shown in FIG. 3.10. The underlying operation corresponding to the value "0101" in the LFO field may be, for example, to find the square root of a number in register R7 and place it in R1: SQRT R7, R1. Altogether the ISA contains 4 groups of operations. As before, each operation group supports a maximum number of operations, after reserving one value in each group to indicate that the next opcode field in the chain must be considered. Thus, with an HFO0 of 2 bits, HFO1 of 3 bits, HFO2 of 5 bits and an LFO of 4 bits, the architecture supports an instruction set comprising a maximum of 3+7+31+16=57 operations. HFO0, HFO1 and HFO2 are shown to contain the reserved values, which, in this example, are "00", "000" and "00000", such that each successively indicates that there is another opcode field in the chain to be interpreted: HFO0 contains a value set to "00" which indicates there is HFO1, which contains "000" which indicates there is HFO2, which contains "00000" to indicate that the final opcode field, the LFO, is to be interpreted. In this ISA, the LFO field is shown to be physically positioned after the operands and the value "0101" identifies the instruction as a SQRT.
[000113] FIG. 3.12 shows a further example aspect of the apparatus and method of this disclosure, with exemplary values. This instruction starts with the same structure and data as in FIG. 3.11 but also comprises a 5-bit extension opcode field. (Extension opcode fields are described above.) The LFO value of "1111" had been reserved in anticipation of this need. This new instruction could be, for example, DIV R6, R2, R3 (or: divide the value contained in R6 by the value contained in R2 and store the result in R3). As well as the HFO0 of 2 bits, HFO1 of 3 bits, HFO2 of 5 bits and an LFO of 4 bits, there is now an extension field of 5 bits, such that the structure now supports an instruction set with a maximum of 3+7+31+15+32=88 operations. It is now the LFO which signals there is another opcode: the LFO value is set to "1111" to indicate there is another opcode field, the extension, which in turn has a value set to "00000" to identify the instruction as a DIV with 3 operand fields.
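The two capacity figures quoted above (57 and 88 operations) follow from a simple rule: every opcode field except the last gives up one of its 2^n values as the reserved value. A minimal Python sketch of that arithmetic is given below; the function name is hypothetical.

```python
def max_operations(field_widths):
    """Maximum number of operations identifiable by a chain of opcode fields.

    Every field except the last reserves one value to signal that a
    further opcode field follows; the last field keeps all of its values.
    """
    total = 0
    last = len(field_widths) - 1
    for index, width in enumerate(field_widths):
        reserved = 0 if index == last else 1
        total += 2 ** width - reserved
    return total


# FIG. 3.11: HFO0 (2 bits), HFO1 (3 bits), HFO2 (5 bits), LFO (4 bits)
fig_3_11 = max_operations([2, 3, 5, 4])      # 3 + 7 + 31 + 16 = 57

# FIG. 3.12: the same fields plus a 5-bit extension field, so the LFO
# now also reserves one value ("1111")
fig_3_12 = max_operations([2, 3, 5, 4, 5])   # 3 + 7 + 31 + 15 + 32 = 88
```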
[000114] As stated above, the number of frequently occurring operations which may be supported by any given dedicated opcode field depends on the number of bits: generally, a field of n bits would permit up to 2^n operations to be identified, but one thereof is reserved to indicate the presence of another opcode field, as discussed above. For example, an HFO opcode field of 3 bits will allow 7 such high frequency operations, 5 bits will allow 31 high frequency operations, etc. In principle, in the extreme case where the architecture creates a separate opcode field, HFO0 to HFON, for any single operation, then a 1-bit field is all that is needed per operation, but in each case a separate bit must be used for this purpose. Such a scenario, with just one bit in the HFO0 field, is shown in FIG. 3.13. The instruction may also comprise a mixture of 1-bit and multi-bit opcode fields, as shown in FIG. 3.14. The advantage of 1-bit opcodes is that they are shorter, but multi-bit opcode fields have the advantage of suffering less chance of a loss of efficiency if the predicted operation frequency is not achieved in practice. Clearly, one can choose and implement the size of the opcode fields and the number of opcode fields to achieve optimum efficiency, optimum flexibility or a combination of flexibility and efficiency.
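The trade-off between opcode field sizes and the predicted operation frequencies can be illustrated with a short Python sketch of the expected total opcode length. An operation in a given group costs the combined width of its own field and all preceding (by-passed) fields. The field widths are those of the FIG. 3.11 example; the usage shares per group are invented purely for illustration and are not taken from the disclosure. For comparison, a conventional fixed-width opcode needs 6 bits to distinguish 57 operations.

```python
from itertools import accumulate


def expected_opcode_bits(field_widths, group_shares):
    """Average total opcode length, in bits, over executed instructions.

    group_shares[i] is the (assumed) fraction of executed instructions
    whose operation is identified in field i.
    """
    if len(field_widths) != len(group_shares):
        raise ValueError("one share is needed per opcode field")
    if abs(sum(group_shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    cumulative = accumulate(field_widths)  # 2, 5, 10, 14 for FIG. 3.11
    return sum(share * bits for share, bits in zip(group_shares, cumulative))


widths = [2, 3, 5, 4]  # HFO0, HFO1, HFO2, LFO as in FIG. 3.11

# Hypothetical case where the frequency prediction holds
skewed = expected_opcode_bits(widths, [0.60, 0.25, 0.10, 0.05])

# Hypothetical case where the predicted frequencies are not achieved
flat = expected_opcode_bits(widths, [0.25, 0.25, 0.25, 0.25])
```

Under the first set of assumed shares the average opcode is shorter than the 6-bit fixed alternative; under the second it is longer, which is the loss of efficiency referred to above.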
[000115] Conventional instruction schemes are constrained by the requirement to contain instructions in packets of bits of 8, 16, 32, 64 bits etc. and various schemes have stipulated a particular length for all instructions. The total length is generally fixed or, if not fixed, is constrained to a whole number of bytes. While the conventional advantage of such a restriction is that such instructions are predictable and so decode circuitry is simple, there are also inefficiencies, due to the need to include unused bits within the majority of instructions and the length of some fields is beyond what is optimal. As alluded to previously, these inefficiencies are amplified by the vast number of instructions and by the multiple processing of instructions as they are transferred between different subsystems of the computer system.
[000116] Various aspects of the apparatus and method in this disclosure, which address these drawbacks and provide enhanced performance and efficiency benefits, are discussed herein. As will be apparent from this disclosure (see below), in addition to the shortening of instructions being processed there is also a reduction in the number of instructions being processed. This is due to the greater likelihood of tasks being completed before being interrupted by other tasks in the system. Both of these effects lead to real physical improvements in the internal operation of the computer, with faster instruction fetching and processing times and fewer instructions requiring processing, resulting in enhanced processor performance and reduced energy consumption.
[000117] Further advantages of the invention will be apparent on consideration of the operation and workings of computer systems and in particular how instructions are processed by a computer system. Returning to FIG. 1, it will be appreciated that, in accordance with an aspect of the invention, shorter opcodes and shorter instructions, as described above, will facilitate faster access and quicker fetches of instructions. Short opcodes and short instructions allow a greater number of instructions to be stored in the memory at each stage from bulk memory, on the right of FIG. 1, through to the L1 instruction cache and into the CPU. There is a significant benefit when the CPU requires an instruction, in accordance with an aspect of the invention, if it can be fetched from faster memory, e.g. from L2 cache rather than L3 cache. Because more instructions fit into memory of a given size, there is a higher probability of that occurring. For example, there is a huge time penalty if the CPU attempts to fetch an absent instruction from L1 cache: if the instruction is not there, the processor will then search in the L2 cache; if the instruction is not present in L2, the processor moves on and searches in the next successive storage subsystem, etc., until it reaches the location where the instruction resides, which could be the slow, bulk non-volatile memory.
[000118] Slowest memory devices and subsystems are the most abundant, and efficiency enhancements are greatest in respect of instructions in accordance with the invention located in such subsystems (those on the right in FIG. 1). In accordance with the system and method of this disclosure, the time required for the processor to fetch instructions from where they are stored may be reduced, and the time saving is greatest for those instructions located most remotely from the processor, such as in non-volatile memory. As instructions according to the invention are shorter than the comparable conventional instructions, the same memory subsystem or storage capacity can accommodate a larger number of instructions. A greater number of instructions can be accommodated by the processor itself, if it stores multiple instructions, and by other subsystems in the series of subsystems of a computer system as depicted in FIG. 1. Generally, as each subsystem can itself accommodate more instructions, fewer instructions need to be fetched from any other subsystem. Therefore, instructions in accordance with the system and method of this disclosure require fewer "fetches" from the next subsystem away from the CPU (the successive subsystems in the series), saving on the overall time to fetch instructions into the CPU, as well as the energy consumed due to the localization of fetches.
[000119] A further example aspect of the system and apparatus disclosed herein is the use of bit-alignment for instructions, rather than the byte or multi-byte alignment used in most conventional systems. With bit-alignment, instructions do not need to be a whole number of bytes long, nor to start at addresses which are a whole number of bytes. In addition to the performance enhancements achievable from shortening instruction lengths, further efficiency improvements become achievable by the use of bit-alignment instead of byte-alignment.
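Bit-alignment can be illustrated with a minimal Python sketch in which instructions of arbitrary bit length are stored back to back and read again from arbitrary bit offsets. The function names and the three example bit strings are hypothetical; a hardware implementation would of course use dedicated fetch logic rather than software of this kind.

```python
def pack_bit_aligned(instructions):
    """Store variable-length instructions (bit strings) back to back.

    Returns the packed bytes, the starting bit offset of each
    instruction, and the total number of significant bits. Only the end
    of the whole stream is padded, never the individual instructions.
    """
    offsets = []
    position = 0
    for instruction in instructions:
        offsets.append(position)
        position += len(instruction)
    stream = "".join(instructions)
    padded = stream + "0" * (-len(stream) % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, offsets, len(stream)


def read_bits(data, bit_offset, length):
    """Read `length` bits starting at any bit offset in the packed data."""
    value = int.from_bytes(data, "big")
    shift = len(data) * 8 - bit_offset - length
    return format((value >> shift) & ((1 << length) - 1), "0{}b".format(length))


# Three hypothetical instructions of 8, 2 and 5 bits
packed, starts, used_bits = pack_bit_aligned(["00001010", "01", "10011"])
second = read_bits(packed, starts[1], 2)
third = read_bits(packed, starts[2], 5)
```

The three instructions occupy 15 bits in total, whereas byte-alignment would require 24 bits for the same three instructions.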
[000120] FIG. 4 provides a diagrammatic comparison of a conventional byte-aligned instruction scheme, shown with instructions of 8 and 16 bits (though these could be 32, 64 bits etc.), with the apparatus and method in accordance with an aspect of the invention.
[000121] The same 5 instructions (ADD R1, R2; RTN; CLR R3; DIV R3, R4; and SET R5) are present on both sides of FIG. 4. In conventional practice, on the left of the figure, multiple groups of 8 bits, i.e. 16, 16, 8, 16 and 8 bits, are required by the 5 instructions respectively. In the example depicted, the instruction length may be 8 or 16 bits. Altogether 64 bits are needed for the 5 instructions, and 26 of these are unused bits.
[000122] The right-hand side of FIG. 4 shows the instructions in accordance with an aspect of the invention. Here the reader sees two important differences with the left-hand side. In an example aspect of the invention, the opcode field length in the first 3 instructions is shorter than in conventional schemes, as explained in previous passages. The reader will appreciate that for the first instruction the opcode field HFO0, which contains the value "00", using two bits, represents an instruction of high frequency and takes the place of the instruction opcode "0011", using four bits, shown on the left-hand side. The opcode fields are always in the same relative position in the instruction, which is important for alignment aspects, as described later herein. The two operands, Operand 1 and Operand 2, are shown to be the same length in both schemes. The shortening of the opcode, according to an example aspect of the invention, provides a bit-saving of 2 bits in the instruction.
[000123] In an example aspect of the invention, which is also depicted in FIG. 4, the instructions on the right-hand side have no fixed length. In the absence of any constraint regarding byte-alignment, instructions can be of any length, and may be tailored to the precise length required for the significant bits in the instruction. The need in the conventional schemes for padding out of the instruction with unused bits, as discussed above, is eliminated in this example aspect of the invention. It will be apparent to the reader, on consideration of just the first of the instructions in FIG. 4, i.e. "ADD R1, R2", that in accordance with various aspects of the invention, the overall bit-saving is 8, or 50% of the length. For the second instruction, RTN (with no operands), an even larger bit-saving is achieved, with the total length decreasing from 16 bits to just 2 bits, a saving of 14 bits (or 87.5%). Bit savings are also made with the third instruction, CLR R3.
[000124] Even if the operation in question is not high frequency, there may be a bit saving due to the absence of byte-alignment, as shown in the fourth instruction in FIG. 4. In an example aspect of the apparatus and method of the invention, even where there is little or no shortening of the combined opcode field lengths (because the operation does not occur frequently) there may nevertheless be an overall bit-saving as there is no need to "pad out" the instruction to ensure a length of multiples of 8 bits. It will be apparent to the reader, that on the right-hand side of FIG. 4, instruction DIV R3, R4, which, in this example, is shown as a low frequency operation, has no unused bits, so a saving of 5 bits is achievable, in comparison to the corresponding instruction in the conventional scheme.
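The bit-savings quoted for FIG. 4 can be checked with a few lines of Python. The ADD and RTN lengths are as stated above. The 11-bit length for DIV is not stated directly; it is derived here from the stated saving of 5 bits on a 16-bit conventional instruction. The function name is hypothetical.

```python
def bit_saving(conventional_bits, bit_aligned_bits):
    """Return (bits saved, saving as a percentage of the conventional length)."""
    saved = conventional_bits - bit_aligned_bits
    return saved, 100.0 * saved / conventional_bits


add_saving = bit_saving(16, 8)    # ADD R1, R2: 8 bits saved, 50%
rtn_saving = bit_saving(16, 2)    # RTN: 14 bits saved, 87.5%
div_saving = bit_saving(16, 11)   # DIV R3, R4: 5 bits saved (derived length)
```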
[000125] The values allocated in the opcode field(s) of the instruction may be arbitrarily assigned to operations, as long as they uniquely identify the underlying operation. In cases where the opcode field is a pure proxy for the underlying operation, i.e. it serves only to identify the underlying operation, the value, e.g. "0101", "0111", "1011" etc., allocated to identify the instruction is not important. So, the architect may assign any unique opcode value of the required length to a particular instruction. The architect may instead assign values on a serial basis, based on the frequency-ranking of instructions in the population of instructions (as described previously), or may follow some other logic. However, if the opcode, or some bits of the opcode, also serve as control bits, the value assigned to the opcode or to the control bits is important. For example, an ADD instruction could have an opcode value of "0100" and a SUB instruction the value "0110" because these are similar and may share some control lines. Then an ADDC instruction might have an opcode value of "0101" and a SUBC the value "0111", where the least significant bit controls the "carry" bit. Clearly, if ADDC was a most frequent instruction and SUBC a least frequent instruction this probably couldn't be done, as HFO1 would hold the reserved value for SUBC. In that case some extra gates would be needed, e.g. the gating of bit 3 with bit 7 to control the carry.
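The idea of opcode bits doubling as control bits can be sketched in Python using the four example values above. That the least significant bit controls the carry is as stated in the text. Treating the next bit as an add/subtract select is an assumption made for this sketch, based only on the observation that ADD/ADDC and SUB/SUBC differ in that bit; the names used are hypothetical.

```python
# Example opcode values from the text
OPCODES = {"ADD": 0b0100, "SUB": 0b0110, "ADDC": 0b0101, "SUBC": 0b0111}


def control_lines(opcode):
    """Derive control signals directly from opcode bits, with no lookup."""
    return {
        "use_carry": bool(opcode & 0b0001),  # least significant bit: carry
        "subtract": bool(opcode & 0b0010),   # assumed add/subtract select
    }


signals = {name: control_lines(value) for name, value in OPCODES.items()}
```

With this assignment no decode logic is needed for these two lines, which is why the choice of opcode values matters when bits serve as control bits.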
[000126] To understand how bit-aligned opcode fields will result in processor efficiency enhancements, one should also consider the "stages" undertaken by a processor in processing instructions and how this relates to the clock cycle. Typically, in synchronous systems, there are four such stages: fetching the instruction; decoding the fetched instruction; executing the decoded instruction; and writing the result of the executed instruction to a designated storage location. The four stages typically each take 1 clock cycle to complete, so all four stages, run serially, require altogether 4 clock cycles to complete the instruction. If each stage takes 1ps, then completing the instruction will take 4ps, and as long as each step takes less than 1ps to complete the processor will function correctly. Some processors require 5 stages to process an instruction and others 6 stages, but for simplicity a 4 stage processor will be considered here.
[000127] In accordance with an aspect of the apparatus and method of this disclosure, the combined length of the opcode fields is not constant across all instructions in the instruction set, so the decoding task may take longer or shorter than in conventional schemes, where opcode lengths are fixed. In a pipelined scenario where each of the processor stages is utilised by a different instruction, if decoding is quicker than one clock cycle, then there is no performance improvement due to the decoding task: in such circumstances, one of the other three tasks will determine the overall duration of the four tasks, which will determine the period of the clock cycle. If the decoding is longer than the clock cycle, the clock cycle must be lengthened (i.e. a slower clock frequency, e.g. 1GHz reduced to 0.5GHz) in order to accommodate the decoding task.
[000128] The above analysis of tasks and the clock cycle is based on the assumption that each clock cycle is of fixed duration. However, in accordance with an aspect of the apparatus and method of this disclosure, the clock cycle may be of variable duration. In particular, the clock cycle duration may be determined by the opcode itself or by more than one opcode; or, in a further aspect, in an asynchronous design, there may be no clock cycle at all; or, in a yet further aspect, a longer clock cycle may be used, within which all four stages, rather than just one thereof, are completed.
[000129] In the first of these aspects, the clock cycle duration may be determined by the opcode, the value of which specifies the cycle duration to be applied for that particular processing stage. Alternatively, the cycle duration may be set by interpreting all the opcode values in the instruction. The clock period may, for the specific operation being processed, be set at a duration for all the instruction processing stages, e.g. 4, to be completed within that duration. The next operation, having possibly a different inherent time for all tasks to be completed, may require a longer or a shorter duration, and may determine the clock cycle time accordingly. Alternatively, it may be set at a duration just for the decode stage. Each opcode value, depending on the operation it identifies, may accordingly tailor the clock cycle time to process the corresponding stages, the determination of cycle times being specific to each opcode. A variable frequency clock cycle is typically achieved using a fractional clock frequency divider circuit.
[000130] The total opcode length, in accordance with an aspect of the invention, is determined by the individual opcode field lengths and the number of opcode fields. By means of a variable frequency clock cycle design, the length of particular clock cycles is determined from the opcode, whereby a shorter total opcode length has a short clock cycle period (i.e. higher frequency) and a long total opcode length has a long clock cycle period (i.e. lower frequency). Those opcodes corresponding to high frequency operations may have shorter clock cycle times than those corresponding to lower frequencies. Take the example of instructions processed in a 1GHz system (which has a cycle time of 1ns) with opcode fields HFO0, HFO1 and LFO. The clock cycle times can be modified specific to an operation frequency group, for example, as follows:
* the most frequent instructions (HFO0 opcode field) perform the decode operation in 0.8ns;
* the next most frequent instructions (HFO0 and HFO1 opcode fields) perform the decode operation in 1ns;
* the group of least frequent instructions (HFO0, HFO1 and LFO opcode fields) perform the decode operation in 1.2ns.
[000131] The non-uniform clock cycle time may itself be determined from circuitry in the instruction cache which informs the processor which cycle time to adopt for each fetched instruction. Alternatively, a supplementary task, possibly part of the fetch task, may incorporate downloading of the instruction's specific cycle time. The benefit of variable clock cycle times is that the most frequently occurring instructions can be accelerated, by using shorter cycles specifically for those instructions, thereby reducing the overall processing time.
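The effect of the example decode times of 0.8ns, 1ns and 1.2ns on the average decode time can be illustrated with a short Python calculation. The decode times per group are those of the example above; the share of executed instructions falling in each group is a hypothetical assumption made solely for this sketch, as is the function name.

```python
# Decode cycle times (ns) from the example for a nominal 1GHz system
DECODE_NS = {"HFO0": 0.8, "HFO1": 1.0, "LFO": 1.2}
FIXED_CYCLE_NS = 1.0  # cycle time of the fixed-clock 1GHz system


def mean_decode_ns(group_shares):
    """Average decode time given the fraction of instructions per group."""
    if abs(sum(group_shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(DECODE_NS[group] * share for group, share in group_shares.items())


# Hypothetical instruction mix, not taken from the disclosure
average = mean_decode_ns({"HFO0": 0.7, "HFO1": 0.2, "LFO": 0.1})
```

With this assumed mix the average decode time falls below the fixed 1ns cycle, because the shorter cycle applies to the instructions that occur most often.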
[000132] A further benefit is derived from a variable frequency clock cycle design in other stages of the processor such as instruction fetch, execute or write back if these can be performed in less time than the standard clock period.
[000133] We return now to the instruction implementation in the processor. The execution and result writing steps in accordance with an aspect of the invention are the same as in conventional schemes. However, we analyse the effect of the short opcode in accordance with an aspect of the invention on the two other tasks: decoding and fetching. Let us first consider the decoding task. The decoding task for instructions in accordance with the apparatus and method of the invention is essentially the same as the decoding task for instructions of conventional structure. The processing unit has decode circuitry typically comprising decode mapping ROMs, each of which has a set of input address lines, to which the opcode bits are connected in order to address a particular location in the decode ROMs. The outputs from the ROMs are connected to the control lines of the processing unit's ALU, Sequencer etc., which also receive the operand bits, the control lines causing the specified operation to be performed on the supplied operands. Alternatively, the decode ROMs may be replaced with other memory types such as RAMs, or combinatorial logic may be used (neither alternative is discussed herein).
[000134] We saw above at FIG. 4 (right-hand side) an illustration of how, by eliminating unused bits in instructions, fewer bits require processing and more instructions can be processed in any given time. In that example, five instructions were considered: three (ADD, RTN and CLR) comprise high frequency operations which are identified in the HFO0 opcode field, while two (DIV and SET) are lowest frequency operations identified by the LFO opcode field. As explained below, the processing of these two groups is not the same.
[000135] Unlike instructions in conventional systems, where the implementation of the handling of all instructions is largely identical, in an aspect of the apparatus and method of this disclosure, groups of instructions belonging to different subsets and identified in different opcode fields, are handled in a non-uniform way, as shown in the example provided in FIG. 5, which provides a diagram of the decode memory block containing the decode ROM for the instructions in FIG. 4, with a corresponding circuit representation at FIG. 6.
[000136] Clearly, the circuitry for the input and output of the decode memory is permanently connected to the block, meaning that the number of address lines supplied to the decode memory block, see FIG. 6, is fixed but must be sufficient to handle the longest concatenation of all the opcode fields in any of the operations in the instruction set, which (see above) are of variable bit length. ADD, RTN and CLR are, in this example, all identified in the two-bit HFO0 opcode field, and provide input to the decode memory block via just two address lines. From FIG. 5, the CLR instruction can be seen to address 8 locations in the decode memory, all of which contain the same value of "00000000". This is because it is in the most frequent group and the HFO0 field only contains 2 bits, so the remaining 3 bits required are taken from the operand in this case and can be anything; thus all possibilities must result in the correct value being output. Although the remaining address lines (those corresponding to opcode fields after the HFO0 in lower frequency operations) are not meaningful in terms of opcodes for ADD, RTN and CLR instructions (their values are marked greyed out in FIG. 5), we see that each of the addressed values is replicated at 8 different locations in the decode memory block: the 5 address lines lead to the same content "00110011" at 8 different locations for the ADD instruction; the 5 address lines lead to the same content "01000100" at 8 different locations for the RTN instruction; and the 5 address lines lead to the same content "00000000" at 8 different locations for the CLR instruction.
Unlike decode memory blocks in conventional systems, where the content for ADD, RTN and CLR would each be stored at a single unique location, in this aspect of the apparatus and method of the invention, for operations of high frequency, there is a significant replication of content at different addresses in the decode memory and the memory block is significantly larger than in conventional systems. Instead of just three addressed locations in the decode memory block, as in conventional systems, the content for these three instructions is stored over a total of 24 locations.
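The replication of content in the decode memory block may be sketched in Python as follows, modelling the block as a table indexed by the 5 address lines. The control words and the HFO0 value "00" for ADD are taken from the description of FIGs 4 and 5, as are the 5-bit addresses for SET and DIV. The HFO0 values "01" and "10" for RTN and CLR are assumptions, since the text does not state them; the names used are hypothetical.

```python
ADDRESS_BITS = 5  # address lines into the decode memory block
HFO0_BITS = 2     # width of the HFO0 opcode field

# HFO0 value -> control word. "00" (ADD) is from FIG. 4; "01" (RTN) and
# "10" (CLR) are assumed for this sketch.
HIGH_FREQUENCY = {"00": "00110011", "01": "01000100", "10": "00000000"}

# Full 5-bit address (reserved HFO0 "11" + 3-bit LFO) -> control word
LOW_FREQUENCY = {"11010": "10000000",   # SET
                 "11101": "11001110"}   # DIV


def build_decode_rom():
    """Fill the decode memory, replicating each high frequency entry over
    every value of the address lines that are 'don't care' for it."""
    rom = {}
    dont_care_bits = ADDRESS_BITS - HFO0_BITS
    for hfo0, control_word in HIGH_FREQUENCY.items():
        for tail in range(2 ** dont_care_bits):
            address = hfo0 + format(tail, "0{}b".format(dont_care_bits))
            rom[address] = control_word
    rom.update(LOW_FREQUENCY)  # one location each, no replication
    return rom


rom = build_decode_rom()
add_locations = sum(1 for word in rom.values() if word == "00110011")
```

Each high frequency operation occupies 8 locations (24 in total for ADD, RTN and CLR), whereas SET and DIV occupy one location each.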
[000137] Continuing the analysis of FIG. 5, we see that the other example instructions (SET and DIV) are identified by the LFO opcode field (as they occur with the lowest frequency) and that, unlike the instructions identified solely by the HF00 field, there is no replication of entries in the decode memory block for the corresponding content. As illustrated in FIG. 6, for each of these instructions only one address is required in the decode memory block, as it uniquely contains the corresponding content: one value on the input address lines for SET (11010) identifies the content "10000000" at a single location; and one value on the input address lines for DIV (11101) identifies the content "11001110" at a single location. This non-replicated addressing for the LFO opcode fields is the same as in conventional systems.
[000138] From the foregoing, it will be apparent that, in an aspect of the apparatus and method of the invention, operations identified by different opcode fields (reflecting different operation frequencies) are handled differently, and the corresponding content for operations of different frequency opcodes is replicated in the decode memory block to a differing degree.
[000139] FIG. 6 shows a circuit representation of FIG. 5 but additionally shows the contents of all the entries in the decode memory. It can be seen that the HF00 and LFO opcode bits act as input address lines to the decode memory.
[000140] FIG. 7 shows an alternative arrangement to FIG. 5. In this case two separate decode memory blocks or devices are required. The first is used to decode values from opcode field HF00 and the second to decode those from LFO. The advantage of this approach over that in FIG. 5 is that the total number of memory locations is smaller, because there is only one entry per opcode value regardless of the opcode field. However, when compared to FIG. 6, additional circuitry is required, as shown in FIG. 8. An AND gate plus a NAND gate are required to select which decode memory block operates the control lines.
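A minimal sketch of the two-block arrangement follows, for illustration only. It assumes that the reserved HF00 value is "11" (inferred from the SET and DIV addresses 11010 and 11101 given above), that the AND gate detects this value to enable the LFO block while the NAND gate enables the HF00 block otherwise, and that ADD, RTN and CLR take the hypothetical HF00 codes 00, 01 and 10. The actual wiring is that of FIG. 8, which is not reproduced in this text.

```python
# Illustrative sketch only. ASSUMPTIONS: reserved HF00 value is 0b11; the gate
# wiring and the HF00 codes for ADD/RTN/CLR are hypothetical, not from FIG. 8.
HF00_BLOCK = ["00110011", "01000100", "00000000", None]  # index 3 = reserved value
LFO_BLOCK = [None] * 8           # one entry per 3-bit LFO value
LFO_BLOCK[0b010] = "10000000"    # SET (address 11010 in the single-block scheme)
LFO_BLOCK[0b101] = "11001110"    # DIV (address 11101 in the single-block scheme)


def decode_two_block(hf00, lfo):
    """Return the content selected by the HF00 and LFO opcode field values."""
    b1, b0 = (hf00 >> 1) & 1, hf00 & 1
    select_lfo = b1 & b0          # AND gate: both HF00 bits set (reserved value)
    select_hf00 = 1 - select_lfo  # NAND gate: complement of the AND output
    if select_hf00:
        return HF00_BLOCK[hf00]
    return LFO_BLOCK[lfo]


print(decode_two_block(0b00, 0b111))     # 00110011 (LFO bits ignored)
print(decode_two_block(0b11, 0b010))     # 10000000
print(len(HF00_BLOCK) + len(LFO_BLOCK))  # 12 locations, against 2**5 = 32
```

Under these assumptions the two blocks together hold 12 locations, compared with the 32 locations addressed by 5 lines in the single-block arrangement, at the cost of the two selection gates.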
[000141] However, it is in respect of the fetching task in accordance with the invention that many of the efficiency benefits may be achieved. The "fetch" task initially comprises the simultaneous downloading of a number of bits (e.g. 64 bits) which may or may not contain a whole number of instructions. As we saw above, in conventional schemes the lengths of instructions fetched are a multiple of 8 bits: taking the instructions set out in FIG. 9 (left hand side) as an example, we see that the two conventional 32-bit instructions (ADD; JMPZ) can be fetched in exactly one 64-bit download. However, after the second instruction (JMPZ), the 64 bits fetched are depleted, and the third instruction, in the conventional arrangement, requires further bits, i.e. another read from memory. By contrast, instructions in accordance with an aspect of the apparatus or method of the invention have no fixed length (FIG. 9, right hand side) and are shorter (with no unused bits). The first two instructions together require just 54 bits, meaning that the third instruction (CLR) can be fetched from memory in the same read as the first two instructions: compared to the conventional scheme (FIG. 9, left hand side), three instructions can be read instead of two, demonstrating that, in accordance with an aspect of the apparatus and method of the invention, more instructions can be fetched in a shorter time. More instructions can be obtained per fetch from each memory level than in systems with fixed length instructions or with instructions containing unused bits.
[000142] Both examples assume the interface between CPU and instruction cache is 64 bits wide. In the prior art case, instructions are fixed at 32 bits wide each, so for every fetch from instruction cache, 2 instructions can be downloaded into the CPU. So, the CPU receives the ADD and the JMPZ instructions in the first download. In the second download, the CPU receives the CLR instruction and the subsequent instruction. According to an aspect of the apparatus and method of this disclosure, the ADD instruction, in this example, is 20 bits long, the JMPZ 34 bits and the CLR 10 bits. So, in this case, all 3 of these instructions can be downloaded in the first fetch from instruction cache, showing more efficient use of the fetch operation. This example shows whole instructions being fetched in the first download. Had the CLR instruction been, say, 12 bits rather than 10 bits in length, then only part of the instruction could be fetched in the first download, with the remainder being fetched in the second download. Such an arrangement requires circuitry to concatenate parts of instructions. According to an aspect of the apparatus and method of this disclosure, whether a fetch comprises whole instructions or a mix of whole and part instructions, the arrangement will be more efficient than the prior art.
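The arithmetic of the example above may be checked with the following short sketch, given for illustration only. The function name and its return convention are hypothetical; the fetch width and instruction lengths are those stated in the example.

```python
# Illustrative sketch only: counts how many whole instructions fit in one fetch.
FETCH_WIDTH = 64  # bits per download from instruction cache, as in the example


def whole_instructions_in_first_fetch(lengths, fetch_width=FETCH_WIDTH):
    """Return (whole instructions fetched, bits left over in the fetch).

    Left-over bits hold the first part of the next instruction, if any.
    """
    used = 0
    count = 0
    for length in lengths:
        if used + length > fetch_width:
            break  # this instruction straddles the fetch boundary
        used += length
        count += 1
    return count, fetch_width - used


# Prior art: ADD, JMPZ, CLR fixed at 32 bits each.
print(whole_instructions_in_first_fetch([32, 32, 32]))  # (2, 0)
# Disclosure: ADD = 20 bits, JMPZ = 34 bits, CLR = 10 bits.
print(whole_instructions_in_first_fetch([20, 34, 10]))  # (3, 0)
# Variant with a 12-bit CLR: 10 of its bits arrive in the first fetch.
print(whole_instructions_in_first_fetch([20, 34, 12]))  # (2, 10)
```

In the last case the remaining 2 bits of the CLR instruction arrive in the second download, which is the situation requiring circuitry to concatenate parts of instructions.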
[000143] We also see from the example in FIG. 9 that the addressing of instructions, according to an aspect of the apparatus or method of this disclosure, may be by bit, such that any length of instruction can be contemplated. Instructions may also be addressed by dual bit or quad bit. By contrast, as shown in the example in FIG. 9, instructions in conventional systems, which were fixed in length, were addressed using word addressing (in the earliest arrangements), byte addressing or multi-byte addressing.
[000144] Byte-addressing, as we have seen, is a constraint of conventional systems, which not only dictates instruction length but also means that an instruction in conventional systems often needed to be "padded" with unused bits in order to reach a length equating to a multiple of eight. Bit-addressing, by contrast, eliminates the need for such padding and allows instructions of any length, without supplementary unused bits: in accordance with an aspect of the apparatus and method of this disclosure, this in turn facilitates not only short length instructions but also, for example in the case of instructions which contain multiple opcode fields, long length instructions. Bit-addressing and bit-alignment, in accordance with an aspect of the disclosure, as discussed above in respect of FIG. 9, allow in this way more instructions to be fetched and processed for the same number of bits downloaded. Simply put, bit-alignment allows a greater number of instructions to be packed into the download. As we have seen, the opcodes of high frequency instructions, in accordance with an aspect of the disclosure, may be allocated identifiers (HF00, HF01, HF02, HF0x etc.) which are shorter than the identifiers would be for the underlying instructions, and which provide further scope, in addition to the bit-alignment discussed above, to shorten the instruction length, to reduce bits and to save on bit processing.
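The saving from bit-alignment may be illustrated by the following sketch, which reuses the instruction lengths of the FIG. 9 example (20, 34 and 10 bits). The function names are hypothetical, and the byte-aligned comparison simply assumes that each instruction is padded up to the next multiple of 8 bits.

```python
# Illustrative sketch only: compares bit-aligned packing with byte-padded packing.


def bit_addresses(lengths):
    """Return the starting bit address of each instruction and the total bits used."""
    addresses = []
    offset = 0
    for length in lengths:
        addresses.append(offset)  # next instruction starts where the last ended
        offset += length
    return addresses, offset


def byte_padded_total(lengths):
    """Total bits used if each instruction is padded to a multiple of 8 bits."""
    return sum(-(-length // 8) * 8 for length in lengths)  # ceiling to next byte


lengths = [20, 34, 10]             # ADD, JMPZ, CLR from the example above
print(bit_addresses(lengths))      # ([0, 20, 54], 64)
print(byte_padded_total(lengths))  # 80
```

With byte alignment the same three instructions would occupy 80 bits (24 + 40 + 16), that is, 16 bits of padding, and would no longer fit in a single 64-bit download.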
[000145] Although this disclosure makes reference to several examples of the aspects and embodiments, it will be readily understood that the embodiments are not restricted to those which are herein explicitly referenced: all aspects and embodiments may be modified to comprise any amendments, variations, alterations or substitutions, including those not exemplified or expressly referenced herein. Accordingly, the embodiments of the apparatus and method herein disclosed are to be understood as not limited by the above written description and are only limited by the scope of the claims herein.
[000146] Where some features of various examples or embodiments appear in some examples, embodiments or drawings and not in others, this is only for intelligibility of the disclosure. Any components, features and structures of different aspects and embodiments disclosed herein may be combined as appropriate. Even if such combinations are not illustrated or explicitly referenced herein, in relation to a particular aspect of a particular embodiment, this is merely for brevity of this text and should not be understood to mean that such combinations are excluded or cannot occur: the different features of the various aspects and embodiments may be mixed and combined as appropriate and this disclosure should be construed as covering all combinations and permutations of those features referenced herein.

Claims (25)

  1. A computer system comprising: a processing unit comprising an instruction set architecture comprising a set of machine instructions, wherein the processing unit is configured to process the set of machine instructions, each machine instruction comprising at least one computer operation, each computer operation having an operation frequency, wherein each operation is classified, by its operation frequency, into one of a plurality of subsets of operations, each such subset of operations corresponding to a range of operation frequencies, and each machine instruction comprises one or more opcode fields, each opcode field (HF0x, LFO) corresponding to one of the plurality of subsets of operations and being configured to identify either an operation in the corresponding subset of operations or a reserved value; and one or more memory subsystems comprising one or more memory devices in electronic communication with the processing unit.
  2. A computer system as in any preceding claim, where the number of operations in each subset of operations is determined at least by the number of bits in the corresponding opcode field.
  3. A computer system as in any preceding claim, where the first opcode field (HF00) present in the instruction corresponds to the subset of operations of highest operation frequency in the instruction set.
  4. A computer system as in any preceding claim, wherein the order of logically consecutive opcode fields is the order of decreasing operation frequency, the logical first opcode field (HF00) corresponding to the subset with the highest operation frequency range, with, if present in the instruction, logically consecutive opcode fields (HF0x) corresponding to subsets of successively lower operation frequency ranges.
  5. A computer system as in any preceding claim, further comprising a least frequent opcode field (LFO) which corresponds to the subset of operations of lowest frequency of all operations in the instruction set and is the logically last opcode field in the instruction.
  6. A computer system as in any preceding claim, wherein the logically last opcode field (HF0N) present in the instruction is, apart from the LFO, if present in the instruction, the opcode field corresponding to the subset of operations with the lowest operation frequencies.
  7. A computer system as in any preceding claim, wherein the number (NH) of opcode fields present in the instruction, apart from the LFO if present, is determined at least by the number of bits in the opcode fields present in the instruction.
  8. A computer system as in any preceding claim, wherein the plurality of subsets of operations corresponds collectively to all the operations in the instruction set.
  9. A system as in any preceding claim wherein the instructions each have an instruction length, the instruction lengths including lengths other than a multiple of 8 bits.
  10. A system as in any preceding claim wherein at least one of the instructions is a multi-bit instruction and further comprises one or more operand fields.
  11. A computer system as in any preceding claim, wherein at least one of the instructions comprises only opcode fields and operand fields.
  12. A system as in any preceding claim wherein at least one of the instructions contains no unused bits.
  13. A computer system as in any preceding claim, wherein the opcode field (HF0x, LFO) comprises a value which identifies either the operation in the corresponding subset of operations or the reserved value.
  14. A system as in Claim 13 wherein, if the value comprised in the opcode field identifies the operation, the instruction, if processed, is executed.
  15. A system as in Claim 13 or 14 wherein, if the value comprised in the opcode field identifies the operation, the opcode field is the logically last opcode field in the instruction.
  16. A system as in Claim 13 wherein, if the value comprised in the opcode field is the reserved value, the value indicates that there is a logically further opcode field.
  17. A system as in Claim 13 wherein, if the value comprised in the opcode field is the reserved value, the value is an extension indication value which indicates that there is an extension field.
  18. A system as in any preceding claim wherein the processing unit is configured to process instructions comprising instruction addressing which is not byte or multi-byte addressing.
  19. A system as in any preceding claim wherein the processing unit is configured to process instructions comprising either instruction addressing which is bit-addressing or address offsets for instruction addressing which is bit-addressing.
  20. A system as in any preceding claim wherein the instruction set architecture is hard-wired in the processor.
  21. A system as in any preceding claim wherein the machine instructions are hard-wired in the processor.
  22. A computer system as in any preceding claim, wherein the operation frequency is a static frequency of occurrence of the operation in the instruction set which comprises the machine instruction comprising the operation.
  23. A computer system as in any of Claims 1 to 21, wherein the operation frequency is a dynamic frequency of occurrence of the operation, when executed, in the instruction set.
  24. A computer system as in any preceding claim, each operation having a processing time, wherein the processing time is variable and specific to each operation.
  25. A method implemented on a computer system, comprising a processing unit and one or more electronic devices in electronic communication with the processing unit, the method comprising the steps of processing, by a processing unit comprising an instruction set architecture comprising a set of machine instructions, each machine instruction comprising at least one computer operation, each computer operation having an operation frequency, wherein each operation is classified, by its operation frequency, into one of a plurality of subsets of operations, each such subset of operations corresponding to a range of operation frequencies, and each machine instruction comprises one or more opcode fields, each opcode field (HF0x, LFO) corresponding to one of the plurality of subsets of operations and being configured to identify either an operation in the corresponding subset of operations or a reserved value; and communicating, by the processing unit, with the one or more memory subsystems comprising one or more memory devices.
GB1911667.2A 2019-08-15 2019-08-15 Efficient processor machine instruction handling Pending GB2586258A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1911667.2A GB2586258A (en) 2019-08-15 2019-08-15 Efficient processor machine instruction handling

Publications (2)

Publication Number Publication Date
GB201911667D0 GB201911667D0 (en) 2019-10-02
GB2586258A true GB2586258A (en) 2021-02-17

Family

ID=68072833

Country Status (1)

Country Link
GB (1) GB2586258A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997048041A1 (en) 1996-06-10 1997-12-18 Lsi Logic Corporation An apparatus and method for detecting and decompressing instructions from a variable-length compressed instruction set
US20020013691A1 (en) * 2000-03-15 2002-01-31 Peter Warnes Method and apparatus for processor code optimization using code compression
US20070022271A1 (en) * 2005-06-23 2007-01-25 Fujitsu Limited Processor with changeable correspondences between opcodes and instructions
