MXPA98002291A - Apparatus for detection based on region of interference between reordered memory operations in a processor - Google Patents

Apparatus for detection based on region of interference between reordered memory operations in a processor

Info

Publication number
MXPA98002291A
Authority
MX
Mexico
Prior art keywords
instruction
order
region
location
processing unit
Prior art date
Application number
MXPA/A/1998/002291A
Other languages
Spanish (es)
Inventor
Humberto Moreno Jaime
Moudgill Mayan
Original Assignee
International Business Machines Corporation
Application filed by International Business Machines Corporation

Abstract

The present invention relates to a computer processing system in which sequences of instructions are executed by a processing unit, in which at least one of the instructions is a load instruction that is moved from an original position in the instruction sequence to an earlier position in the instruction sequence, and in which the load instruction is moved over at least one store instruction, thereby becoming an out-of-order load instruction, where the out-of-order load instruction identifies a location in a memory subsystem from which data is read, and the at least one store instruction identifies a location in the memory subsystem in which data is stored. A method for detecting interference between the out-of-order load instruction and the at least one store instruction, and for recovering from such interference, comprises the steps of: storing a plurality of entries in a table, wherein each entry E corresponds to a region R of a plurality of regions of the memory subsystem, and wherein entry E includes at least one field indicating (i) whether the processing unit is processing at least one out-of-order load instruction that loads data from a location within region R, and (ii) whether the processing unit is processing at least one interfering store instruction that stores data to a location within region R, where the interfering store instruction interferes with an out-of-order load instruction that loads data from a location within region R; identifying an entry E1 corresponding to a first out-of-order load instruction being processed by the processing unit, where entry E1 corresponds to a region R1 of the memory subsystem and the first out-of-order load instruction loads data from a location within region R1; and, upon reaching the original position of the first out-of-order load instruction, controlling the processing unit to execute a recovery sequence if the at least one field of entry E1 indicates that the processing unit is processing at least one interfering store instruction that stores data to a location within region R1.

Description

APPARATUS FOR DETECTION BASED ON REGION OF INTERFERENCE BETWEEN REORDERED MEMORY OPERATIONS IN A PROCESSOR

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates in general to reordering memory operations in a processor in order to exploit instruction-level parallelism in programs, and more particularly to an apparatus for detecting the incorrect execution of a memory load operation performed earlier than preceding (in program order) memory store operations. The invention is applicable to operations reordered when the program is generated (static reordering), as well as to operations reordered at execution time (dynamic reordering).

2. Description of the Prior Art

Contemporary high-performance processors rely on superscalar, superpipelined, and/or very long instruction word (VLIW) techniques to exploit instruction-level parallelism in programs; that is, to execute more than one instruction at a time. In general, these processors contain multiple functional units, execute a sequential stream of instructions, are able to fetch more than one instruction per cycle, and are able to dispatch more than one instruction per cycle for execution, subject to dependencies and the availability of resources.

The set of instructions from which the processor chooses those to be dispatched at a given point in time is enlarged by the use of out-of-order execution. Out-of-order execution is a technique by which the operations in a sequential stream of instructions are reordered so that operations appearing later are executed earlier if the resources they require are free, thus reducing the total execution time of a program. Out-of-order execution exploits the availability of multiple functional units by using resources that would otherwise be idle.
Reordering the execution of operations requires reordering the results produced by those operations, so that the functional behavior of the program is the same as would be obtained if the instructions were executed in the original sequential order. In the case of memory-related operations, a memory load operation reads a datum from memory, loads it into a processor register, and frequently starts a sequence of operations that depend on the loaded datum. Thus, in addition to using otherwise idle resources, the early (out-of-order) initiation of memory load operations may hide delays in accessing memory, including potential cache misses.

There are two basic approaches to implementing out-of-order execution and the reordering of results: dynamic reordering and static reordering. In dynamic reordering, instructions are analyzed at execution time, and the instructions and results are reordered in hardware. In static reordering, a compiler/programmer analyzes and reorders the instructions and the results produced by those instructions when the program is generated, so the reordering tasks are carried out in software. The two approaches can also be used together.

One factor that limits the ability to reorder operations is ambiguous memory references. This is the case when a memory load operation appears after a memory store operation in a sequential instruction stream, and it is not possible to determine in advance whether the memory locations accessed by the load and the store are different. For example, consider the following code fragment:

    *X = (a + b + 2) << 4
    r = ((*Y) + c) ^ d

where *X indicates the memory location whose address is contained in X; << indicates a left shift operation; and ^ indicates an exclusive-OR (XOR) operation.
Assuming that a, b, c and d are values held in registers r1 to r4 of a processor, and that X and Y are held in registers r8 and r9, this code fragment can be represented by the following sequence of instructions (where the first register after the instruction name is the target register, and the remaining registers are the operands):

    add         r10, r1, r2      r10 = a + b
    add         r11, r10, 2      r11 = a + b + 2
    shift_left  r12, r11, 4      r12 = (a + b + 2) << 4
    store       r12, (r8)        *X = (a + b + 2) << 4
    load        r20, (r9)        r20 = *Y
    add         r21, r20, r3     r21 = *Y + c
    xor         r22, r21, r4     r22 = (*Y + c) ^ d

If it can be determined that X and Y are different, then the two expressions can be scheduled for execution in parallel, giving a sequence such as the following (where the symbol || denotes parallel execution):

    add         r10, r1, r2   ||  load  r20, (r9)
    add         r11, r10, 2
    shift_left  r12, r11, 4   ||  add   r21, r20, r3
    store       r12, (r8)     ||  xor   r22, r21, r4

On a machine with two execution units, the sequence above would take 4 cycles to complete (assuming that a load takes two cycles and other operations take a single cycle). On the other hand, if it cannot be determined that X and Y are always different, that is, if the addresses are ambiguous, then the two expressions must be scheduled in the original order, taking 8 cycles (again assuming that a load takes two cycles).

The example above is not atypical; ambiguity in memory references severely limits performance by forcing the sequential execution of operations that could otherwise be performed in parallel. However, this serialization can be avoided (that is, the load operation can be performed ahead of the store operation) as long as the store operation does not interfere with the load operation. The operations interfere whenever the memory locations accessed by the store operation and by the out-of-order load operation overlap.
Furthermore, if the store operation and the out-of-order load operation do not interfere, any operation that depends on the data loaded out-of-order can also be performed out-of-order. On the other hand, if the operations interfere, the data loaded out-of-order and any results derived from it are invalid, and it is necessary to re-execute the load operation after the store operation, together with the associated dependent operations.

Several attempts have been made to solve the problem of reordering memory operations with ambiguous references in processors. These schemes detect interference by comparing the address of the memory location accessed by an out-of-order load with the addresses of the memory locations accessed by subsequent store operations, within an execution window determined by the distance over which the load operation was moved. If the addresses overlap, the operations are considered to interfere, so the load operation (and any operations dependent on the load that have already been executed) must be re-executed. That is, the mechanisms check whether the memory location containing a datum loaded out-of-order has been modified, by tracking memory addresses. Detection is performed either by extra instructions (software-based schemes) or by hardware resources (hardware-based schemes), sometimes with software support.

For example, in the case of software-based interference detection, the code fragment given earlier can be modified as follows:

    r = ((*Y) + c) ^ d
    *X = (a + b + 2) << 4
    if (X == Y)              /* compare the addresses */
        r = ((*Y) + c) ^ d
    endif

That is, the program statements can be rearranged so that the load operation implied by *Y is performed before the store operation implied by *X; additional statements are introduced to compare the addresses of the memory locations referenced by the load and store operations, and to re-execute the statement containing the load operation whenever the addresses match.

In the case of static reordering, the sequences of instructions generated by the compiler/programmer differ among the various schemes proposed to deal with ambiguous memory references. Usually, a load instruction that has been moved over a store instruction is replaced by some new instruction or sequence of instructions that performs the load operation and sets up a mechanism to check the addresses used by store instructions; another instruction, or an instruction field in the out-of-order load instruction, is used to indicate the place where the load instruction was originally located, which determines the end of the range over which store operations are checked for interference.

In the case of dynamic reordering, the load and store instructions are presented to the processor in program order, that is, the store instruction followed by the load instruction. The processor reorders the instructions, marks the load instruction as an out-of-order operation, sets up a mechanism to detect interference from store operations (which includes identifying the checking range), and recovers the correct state of the processor when interference is detected.

This invention follows the hardware-based approach to detecting interference between out-of-order load operations and store operations, with a mechanism for recovering from incorrectly reordered memory operations.

A summary of the relevant related art is now given. A method and apparatus for improving the performance of out-of-order operations is described by M.
Kumar, M. Ebcioglu, and E. Kronstadt in their patent application entitled "A method and apparatus for improving performance of out-of-sequence load operations in a computer system", U.S. Serial No. 08/320,111, filed on October 7, 1994, as a continuation of U.S. patent application Serial No. 07/880,102, filed on May 6, 1992, and assigned to the assignee of this application. This method and apparatus uses compiler techniques, four new instructions, and an address compare unit. The compiler statically moves memory load operations over memory store operations, marking them as out-of-order instructions. The addresses of operands loaded out-of-order are saved in an associative memory. Upon request, the address compare unit compares the addresses saved in the associative memory with the addresses generated by store operations. If a conflict is detected, recovery code is executed to correct the problem. The system clears addresses saved in the associative memory when there is no further need to compare them. This approach is hardware intensive, and also requires special instructions to trigger the checking for address conflicts as well as to clear the address of an operand that is no longer needed.

U.S. patent application Serial No. 08/435,411, filed on May 10, 1995, in the name of Ebcioglu et al., and assigned to the assignee of this application, combines the reordering of memory operations with the speculative execution of memory operations. The reordering of memory operations relies on: static reordering of code by the compiler; special hardware support to detect conflicts in memory references and to manipulate data loaded out-of-order; and compiler-generated code to operate on the data loaded out-of-order and to recover when a conflict is detected.
Special hardware support consists of an address register for each register that may be the destination of the result of an out-of-order load operation, a comparator associated with each address register, and special instructions to load a datum out-of-order and to "commit" this datum, as well as any other values derived from it, at in-order points in the program. Each out-of-order load records the memory address and the size of the datum loaded in the corresponding address register; each store operation triggers a comparison of its (address, size) tuple against the contents of all address registers. If any such comparison matches, the corresponding address register is marked as invalid. A special commit instruction is executed at the in-order point of the load instruction, which checks whether the associated address register is valid; if so, the datum loaded out-of-order and the datum in memory are coherent. On the other hand, if the address register is invalid, the datum loaded out-of-order and the memory contents are not coherent, so the load operation, as well as any dependent operations, must be re-executed. A trap is invoked at that point, transferring execution control to recovery code generated by the compiler, which re-executes the load operation as well as the dependent operations.

U.S. Patent No. 5,421,022, entitled "Apparatus and method for speculatively executing instructions in a computer system", granted on May 30, 1995 in the name of P.
McKeen et al., describes an apparatus usable in the case of statically reordered ambiguous memory operations, which is based on content-addressable memories (CAMs) to compare the address of any executed store operation with the address of any out-of-order load instruction. If an overlap is detected, the apparatus treats the out-of-order load as if it had caused an exception, effectively causing the load operation to be re-executed at its in-order point, in its in-order (or precise) state.

Similarly, U.S. Patent No. 5,420,990, entitled "Mechanism for enforcing the correct order of instruction execution", also granted on May 30, 1995 in the name of P. McKeen et al., discloses an apparatus closely related to that proposed in U.S. Patent No. 5,421,022, but usable in the case of memory operations dynamically reordered by the processor; this apparatus is also based on content-addressable memories.

A method and apparatus for reordering load instructions is described in the patent application entitled "Memory processor that permits aggressive execution of load instructions" by F. Amerson, R. Gupta, V. Kathal and M. Schlansker (UK Patent Application GB 2265481A, No. 9302148.3, filed 04/02/1993). This patent application describes a memory processor for a computer system in which a compiler moves long-latency load instructions earlier in the instruction sequence, to reduce the loss of efficiency resulting from the latency of the load. The memory processor saves load instructions in a special register file for a period of time sufficient to determine whether any store instruction over which the load was moved refers to the same address as that specified by the load instruction. If so, the memory processor reinserts the original load into the instruction stream, so that it is executed in-order.
Thus, this system allows loads to be moved ahead of stores under compiler control, and relies on hardware to insert code to recover from a conflict. However, this system does not allow reordering other instructions that depend on the load (the hardware resources are capable of reinserting only the load instruction). In other words, the method and apparatus are limited to hiding the latency of load instructions, whose maximum value must be known at compile time.

The article by K. Diefendorff and M. Allen entitled "Organization of the Motorola 88110 superscalar RISC microprocessor", IEEE Micro, April 1992, pp. 40-63, describes the dynamic scheduler in the Motorola 88110 processor, which dispatches store instructions to a store queue, where store operations can be stalled if the operand to be stored has not yet been produced by another operation. Subsequent load instructions can bypass the stalled store operations and access memory immediately, achieving dynamic reordering of memory accesses. An address comparator detects address hazards and prevents load operations from proceeding ahead of store operations to the same address. The queue holds three pending store operations. This structure does not really move a load earlier in the sequential execution stream; rather, it only allows a load operation not to be delayed as a result of a stalled store operation.

3. Problems with the Prior Art

Software-based techniques for detecting interference between reordered ambiguous memory operations have the disadvantage of a large overhead, in the form of additional instructions that must be executed. Specifically, a load instruction must be checked against every ambiguous store instruction over which it is moved.
For example, consider the case of moving a load instruction over several store instructions, as in the following sequence:

    store  r7, (r21)
    store  r8, (r22)
    store  r9, (r23)
    load   r15, (r25)

In this case, the interference test requires comparing the address in register r25 with the addresses in registers r21, r22 and r23. Thus, the interference test requires at least one comparison per store instruction, and may require many more instructions (depending on the primitives available for performing the comparisons and for merging the results of several compares). Moreover, if the load and store instructions are byte-aligned (that is, load and store instructions access data at any memory byte boundary, and the accessed data is more than one byte long), or if load and store instructions access entities of different sizes (different numbers of memory bytes), then the test is more complicated. Instead of just checking for equality of addresses, the interference test must check for address overlap. For example, if rY contains the address used by an out-of-order load instruction and rX contains the address used by a subsequent store instruction, then the test must verify that rY-rX is less than the number of bytes stored by the store instruction and that rX-rY is less than the number of bytes accessed by the load instruction; the accesses overlap when both conditions hold.

Hardware-assisted or hardware-only schemes for detecting interference between reordered ambiguous memory references avoid the overhead of executing extra instructions by saving the memory addresses used by out-of-order load instructions in special hardware resources (comparison registers) and continuously checking the contents of those registers for overlap against the addresses used by store instructions. The resources required for hardware checking are complex and expensive.
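The overlap test described above can be written out as a minimal sketch. It assumes signed (unbounded) integer arithmetic, so the differences may be negative; a fixed-width unsigned implementation would need an explicit ordering check. The function names `accesses_overlap` and `load_interferes` are illustrative and do not come from the original text.

```python
def accesses_overlap(load_addr, load_size, store_addr, store_size):
    """Return True if a load and a store share at least one byte.

    The load covers [load_addr, load_addr + load_size) and the store covers
    [store_addr, store_addr + store_size). With rY = load_addr and
    rX = store_addr, this is the pair of conditions from the text:
    rY - rX < store_size and rX - rY < load_size.
    """
    return (load_addr - store_addr < store_size) and (store_addr - load_addr < load_size)

def load_interferes(load_addr, load_size, stores):
    """Check one out-of-order load against every store it was moved over.

    `stores` is a list of (address, size) pairs, one per store instruction.
    """
    return any(accesses_overlap(load_addr, load_size, addr, size) for addr, size in stores)
```

For the three-store sequence above, `load_interferes` would be called once with the address in r25 and the three (address, size) pairs taken from r21, r22 and r23, which makes the per-store cost of the software approach explicit.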
In each cycle, these resources must compare the address of each store operation issued in that cycle (considering that one or more operations can be issued simultaneously) with the addresses of all pending out-of-order load operations (that is, those that have not yet reached their in-order point). This functionality can be achieved through the use of content-addressable memories, special register files, or multiple comparators, as illustrated by the prior art examples given above. However, these hardware resources can only save (and compare against) a fixed number of out-of-order load addresses at any time. Usually this is a small number, so only a limited (fixed) number of load operations can be executed out-of-order at any point in time. This fixed limit implies that an out-of-order load instruction cannot be issued as soon as a load unit is available to execute it; rather, the address-checking hardware must also have resources available to save the generated address. This constraint adds complexity to the dispatch mechanism in the case of dynamic reordering, or restricts the number of ambiguous load instructions that can be moved out-of-order in the case of static reordering (that is, the compiler must ensure that, at any given time, no more ambiguous load instructions have been moved over store instructions than the number of comparators available).

SUMMARY OF THE INVENTION

The above-stated problems and the related problems of the prior art are solved with the principles of the present invention, an apparatus for detecting and recovering from the incorrect execution of reordered memory operations in a processor. The invention is a simpler alternative to other mechanisms because it does not require expensive hardware resources to track the addresses referenced by out-of-order load instructions.
The invention is applicable to operations reordered when the program is generated (static reordering) as well as to operations reordered at execution time (dynamic reordering). A computer processing system stores sequences of instructions in a memory for execution by a processing unit. An out-of-order load instruction can be created, either statically or dynamically, by moving a load instruction from its original position in a sequence of instructions to an earlier position in the instruction sequence.

The present invention comprises a method and apparatus that maps the memory address space into a set of regions and checks for interference between reordered memory operations at the granularity of a region. The addresses referenced by the reordered memory operations are not compared directly; instead, those addresses are mapped into regions, and an apparatus detects whether there has been interference within those regions. This approach reduces the hardware cost imposed by interference detection, at the potential cost, in some rare cases, of detecting false interference.

A region mapping table maps the memory address space into a set of regions. The number of memory locations per region is determined by the number of regions supported by the mapping table (the number of entries in the table) and the size of the minimum interference unit. For example, a region mapping table may contain 512 entries, and the minimum interference unit may be a memory word (four bytes); in such a case, a memory address space of 2^32 bytes is mapped into 512 regions with 2^21 memory locations (words) per region. A memory reference is mapped onto a single region, or onto more than one region whenever the reference crosses the alignment of the minimum interference unit. The reordering of operations can be restricted so that out-of-order load operations do not cross the alignment of the minimum interference unit.
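The arithmetic in the example above can be illustrated with a short sketch. The text does not state how an address selects a table entry; the sketch assumes the entry is chosen by the low-order bits of the word address, which is one mapping under which a reference that crosses a word boundary touches two regions. The constants and the names `region_of` and `regions_touched` are illustrative assumptions.

```python
ADDRESS_BITS = 32    # assumed size of the byte address space (2**32 bytes)
UNIT_BYTES = 4       # minimum interference unit: one four-byte word
NUM_REGIONS = 512    # number of entries in the region mapping table

# Number of words that share each table entry: (2**32 / 4) / 512 = 2**21.
WORDS_PER_REGION = (2**ADDRESS_BITS // UNIT_BYTES) // NUM_REGIONS

def region_of(address):
    """Map a byte address to a table entry (assumed: low bits of the word address)."""
    return (address // UNIT_BYTES) % NUM_REGIONS

def regions_touched(address, size):
    """Return every table entry touched by `size` bytes starting at `address`."""
    first_unit = address // UNIT_BYTES
    last_unit = (address + size - 1) // UNIT_BYTES
    return sorted({unit % NUM_REGIONS for unit in range(first_unit, last_unit + 1)})
```

Under this assumed mapping, two addresses that differ by a multiple of 2048 bytes (512 words) share an entry, which is the source of the false interference the text mentions.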
When a load operation is executed out-of-order, the entry in the region mapping table corresponding to the memory address of the out-of-order load operation is marked, thus indicating the presence of the out-of-order load operation. When a store operation is executed, if the entry in the region mapping table corresponding to the memory address of the store operation is marked with an out-of-order load operation, then the entry is marked again, indicating the presence of an interfering store operation in the memory region. When the in-order position of an out-of-order load operation is reached, the entry in the region mapping table corresponding to the memory address of the out-of-order load operation is inspected. If the entry has been marked to indicate the presence of an interfering store operation, then the out-of-order load operation is considered to have been executed incorrectly, and a recovery sequence is invoked. Otherwise, the entry is updated to indicate the end of interference checking for the out-of-order load whose in-order position has been reached.
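The marking scheme just described can be sketched as a small software model. Several details are assumptions made for illustration: each entry is modeled as a count of pending out-of-order loads plus an interference flag, the entry is selected by the low-order bits of the word address, and the flag is cleared once no out-of-order load of the region remains pending. The class and method names are hypothetical.

```python
NUM_REGIONS = 512   # table size taken from the 512-entry example in the text
UNIT_BYTES = 4      # assumed minimum interference unit (one word)

class RegionTable:
    """Model of a region mapping table: pending-load count and interference flag per entry."""

    def __init__(self):
        self.pending_loads = [0] * NUM_REGIONS    # out-of-order loads not yet at their in-order point
        self.interfered = [False] * NUM_REGIONS   # an interfering store has been seen

    @staticmethod
    def region_of(address):
        # Hypothetical mapping: low-order bits of the word address.
        return (address // UNIT_BYTES) % NUM_REGIONS

    def load_out_of_order(self, address):
        """Mark the entry when a load executes ahead of its in-order position."""
        self.pending_loads[self.region_of(address)] += 1

    def store(self, address):
        """Mark interference only if an out-of-order load of the region is pending."""
        entry = self.region_of(address)
        if self.pending_loads[entry] > 0:
            self.interfered[entry] = True

    def commit_load(self, address):
        """Call at the load's in-order position; True means recovery must be invoked."""
        entry = self.region_of(address)
        needs_recovery = self.interfered[entry]
        # End of checking for this load (assumed bookkeeping, see lead-in).
        self.pending_loads[entry] -= 1
        if self.pending_loads[entry] == 0:
            self.interfered[entry] = False
        return needs_recovery
```

A store to a different word that happens to share the entry also sets the flag, so `commit_load` can return True without any true address overlap; that is the false interference the region-based approach accepts in exchange for cheaper hardware.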
An alternate method is as follows. When an out-of-order load operation is executed, the entry in the region mapping table corresponding to the memory address of the out-of-order load operation is marked, thus indicating the presence of the out-of-order load operation. If there is no other out-of-order load of that region whose in-order position has not been reached, the entry is marked as having no interference from store operations. When a store operation is executed, the entry in the region mapping table corresponding to the memory address of the store operation is marked, without first inspecting whether an out-of-order load instruction of the region has been executed. When the in-order position of an out-of-order load operation is reached, the entry in the region mapping table corresponding to the memory address of the out-of-order load operation is inspected. If the entry has been marked to indicate the presence of a store operation to the region, then the out-of-order load operation is considered to have been executed incorrectly, and a recovery sequence is invoked. Otherwise, the entry is updated to indicate the end of interference checking for the out-of-order load whose in-order position has been reached. This alternative can lead to detection of interference in cases where interference has not really occurred, but can lead to simpler embodiments.

The invention can be used in the case of statically reordered operations, that is, when the reordering is performed by the compiler/programmer at the time the program is generated. In this case, an out-of-order load operation is specified by using a special instruction; the execution of this instruction includes marking the corresponding entry in the region mapping table.
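The alternate method described above, in which stores mark the table unconditionally, can be sketched in the same style. As before, the entry layout (a pending-load count and a store flag), the low-order-bit mapping, and all names are assumptions for illustration only.

```python
class SimpleRegionTable:
    """Model of the alternate scheme: stores always mark, loads reset the mark."""

    def __init__(self, num_regions=512, unit_bytes=4):
        self.num_regions = num_regions
        self.unit_bytes = unit_bytes
        self.pending_loads = [0] * num_regions   # out-of-order loads awaiting their in-order point
        self.store_seen = [False] * num_regions  # a store to the region has executed

    def _entry(self, address):
        # Hypothetical mapping: low-order bits of the word address.
        return (address // self.unit_bytes) % self.num_regions

    def load_out_of_order(self, address):
        entry = self._entry(address)
        if self.pending_loads[entry] == 0:
            # No other pending out-of-order load of this region:
            # mark the entry as having no interference from stores.
            self.store_seen[entry] = False
        self.pending_loads[entry] += 1

    def store(self, address):
        # Marked without first inspecting whether a load of the region is pending.
        self.store_seen[self._entry(address)] = True

    def commit_load(self, address):
        """Call at the load's in-order position; True means recovery must be invoked."""
        entry = self._entry(address)
        needs_recovery = self.store_seen[entry]
        self.pending_loads[entry] -= 1
        return needs_recovery
```

The store path needs no read of the table, which is the simplification the text refers to. The price shows up when a second load of the same region executes after the store while the first load is still pending: the mark cannot be reset, so the second load is also reported as interfered even though the store preceded it.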
The compiler/programmer also generates a "commit" operation using another special instruction, which is placed at the in-order (original) position of the load instruction; the execution of the commit instruction includes inspecting the corresponding entry in the region mapping table, and either invoking a recovery sequence or unmarking the entry in the table. These two instructions delimit the range over which store operations are checked for interference. An entry in the region mapping table is marked to indicate the presence of an interfering store operation when a store operation is executed, either with or without checking whether the associated entry is already marked by an out-of-order load operation. A program exception is generated whenever interference is detected, leading to the execution of a recovery sequence generated statically by the compiler/programmer.

The invention can also be used in the case of dynamically reordered instructions, that is, when the reordering of instructions takes place at execution time. In such a case, the generation of out-of-order load operations is performed by the processor. An entry in the region mapping table is marked when a load operation is dispatched for out-of-order execution, thus indicating the presence of an out-of-order load operation in progress. The same entry is inspected when the out-of-order load operation is retired, which occurs at the in-order position of the instruction. These two events delimit the range over which store operations are checked for interference. An entry in the region mapping table is marked to indicate the presence of an interfering store operation when a store operation is dispatched for execution, either with or without checking whether the associated entry is already marked by an out-of-order load instruction. A program exception is generated each time interference is detected.
In that case, a recovery sequence is executed that returns the processor to the in-order state that existed before the execution of the load instruction being recovered. In the recovery sequence, all other instructions executed out-of-order are canceled, and execution proceeds from the load instruction in its original position.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a computer system that supports static reordering of instructions, including memory operations, according to the present invention.

Figure 2 is a flow chart describing the execution of out-of-order memory operations in the computer system illustrated in Figure 1.

Figure 3 is a block diagram of a computer system that supports dynamic reordering of instructions, including memory operations, in accordance with the present invention.

Figure 4 is a flow chart describing the processing of instructions in the computer system illustrated in Figure 3.

Figures 5a and 5b are a graphic illustration of the contents of the region mapping table.

Figure 6 is a flow chart describing the issuing of instructions in the computer system illustrated in Figure 3.
Figure 7 is a flow chart describing the in-order retirement of instructions in the computer system illustrated in Figure 3.

Figure 8a illustrates the input and output signals of a region mapping table that detects incorrectly reordered memory operations.

Figure 8b illustrates the contents of each of the input signals to the region mapping table.

Figure 9 is a block diagram showing the main components of the region mapping table.

Figure 10 is a block diagram illustrating the input and output signals of the generating unit employed in the region mapping table of Figure 8.

Figure 11 is a block diagram showing the structure of each cell in the region mapping table of Figure 9.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings, and more particularly to Figure 1, there is illustrated a processor provided with hardware resources to support statically reordered memory operations in accordance with this invention. The system consists of a memory subsystem 101; a data cache 102; an instruction cache 103; and a processing unit 100. The processing unit 100 includes an instruction queue 104; a region mapping table 105; one or more memory units 106 that perform memory-related operations; one or more functional units 107 that perform integer, logic, and floating-point operations; a branch unit 108; and a register file 109. Instructions are fetched from the instruction cache 103 under the control of the branch unit 108 and placed in the instruction queue 104. Instructions are dispatched from the instruction queue 104 to the memory units 106, functional units 107, and branch unit 108 for execution. These units interact with the register file 109 to access the operands used by the instructions and to save the results produced by the execution of the instructions.
The register file 109 typically includes general purpose registers (GPRs), floating-point registers (FPRs), condition registers (CRs) and special purpose registers (SPRs). In addition, the memory units 106 access the data cache 102 to load the data used by the instructions of the other units, and to store the results produced by those instructions. Some special memory-related instructions access the region mapping table 105.

According to the present invention, a programmer/compiler reorders memory operations to reduce the execution time of programs, following principles well known in the prior art, as discussed for example in the article "Run-time disambiguation: coping with statically unpredictable dependencies" by A. Nicolau, IEEE Transactions on Computers, vol. 38, May 1989, incorporated herein by reference in its entirety. Each time a load operation is moved ahead of a preceding store operation, and the compiler cannot determine whether the addresses of the memory locations accessed by the load and store instructions are disjoint, the load instruction is marked as an out-of-order load operation, preferably by using a different operation code. Furthermore, the original location of the load instruction in the instruction sequence is marked, preferably by using a special "commit" instruction.

The execution of instructions in the computer system illustrated in Figure 1, in accordance with this invention, is illustrated by the flow chart presented in Figure 2. In step 201, a block of instructions is fetched from the instruction cache 103 and placed in the instruction queue 104, under the control of the branch unit 108.
In step 203, a group of concurrently executable instructions is extracted from the instruction queue 104 and dispatched to the memory, functional and branch units for execution. This group corresponds to a sequence of instructions in strict memory order, that is, in the order in which they appear in memory. In step 205, it is determined whether any of the instructions dispatched for execution to a memory unit 106 is an out-of-order load operation. This is determined by examining the operation code of the instructions. If so, in step 207 the entry in the region mapping table associated with the address of the memory location accessed by the out-of-order load instruction is marked, thereby indicating the presence of an out-of-order load instruction. An example of the entries of the region mapping table is described below with reference to Figures 5a and 5b. Similarly, in step 209 it is determined whether any of the instructions dispatched for execution to a memory unit 106 is a store operation, by examining the operation code of the instructions. If so, in step 211 the entry in the region mapping table associated with the address of the memory location accessed by the store instruction is inspected to determine whether an out-of-order load operation from the corresponding memory region has been dispatched or executed, thereby indicating the presence of an out-of-order load instruction. If so, then in step 213 the entry in the region mapping table is marked to indicate that the region has an interfering store operation. More specifically, an interfering store operation is a store operation, in sequential order, that accesses the same memory region as the out-of-order load operation. Further description of the marking of an interfering store operation is given below, in conjunction with Figure 9.
In step 215, it is determined whether any of the instructions dispatched for execution to a memory unit 106 is a commit operation for an out-of-order load, by examining the operation code of the instructions. If so, in step 217 the entry in the region mapping table associated with the address of the memory location referenced by the commit instruction is inspected to determine whether there has been an interfering store operation in the corresponding memory region, thereby indicating interference between the operations. If so, then in step 219 a recovery sequence is invoked that returns the processor to the in-order state that existed prior to the execution of the load instruction. This recovery sequence can be invoked by the branch unit 108, which fetches instructions from a memory location containing the recovery sequence. Otherwise, the entry in the table is updated to reflect the completion of the out-of-order instruction. Further description of the detection of an interfering store operation is given in conjunction with Figure 9.

Referring now to Figure 3, a superscalar processor is illustrated that is provided with hardware resources to support dynamic reordering of instructions, including reordering of memory operations. This exemplary organization is based on that described by M. Moudgill, K. Pingali and S. Vassiliadis in "Register renaming and dynamic speculation: an alternative approach", Proceedings of the 26th Annual International Symposium on Microarchitecture, pp. 202-213, December 1993, incorporated herein by reference in its entirety. The system consists of a memory subsystem 301; a data cache 302; an instruction cache 304; and a processing unit 300.
The processing unit 300 includes an instruction queue 303, which holds instructions fetched from the instruction cache 304 and/or the memory subsystem 301 that have not yet been decoded; several memory units 305 that perform load and/or store operations; several functional units 307 that perform integer, logic and/or floating-point operations; a branch unit 309 that performs branch operations; a register file 311 that contains the operands used and the results produced by the instructions; a register map table 320 containing the mapping from the register names specified in the instructions to the names of registers in the register file; a free register queue 322 containing the unused (available) names of registers in the register file; a dispatch table 324 containing decoded instructions, including their renamed registers, that await the resources required for their execution; an in-order map table 328 containing the mapping from architected register names to the names of registers in the register file, reflecting the effects of the last instruction that has been retired (executed to completion); a retirement queue 326 containing instructions already dispatched for execution, and instructions already executed whose results have not yet modified the in-order state of the processor; and a region mapping table 330 to support out-of-order execution of memory operations.

Figure 4 illustrates a flow chart describing the processing of instructions in the computer system illustrated in Figure 3, including the actions related to out-of-order memory operations. The process is divided into stages. In step 401, a block of instructions is fetched from the instruction cache 304 or the memory subsystem 301 and placed in the instruction queue 303 under control of the branch unit 309. In step 403, a group of instructions is extracted from the instruction queue 303 and decoded.
The register names used by these instructions to specify operands are renamed according to the contents of the register map table 320, which specifies the current mapping from architected register names to physical registers. Similarly, the register names used by these instructions to specify the destinations for the results are assigned physical registers extracted from the free register queue 322, which contains the names of physical registers not currently in use by the processor. The principles of register renaming are well known, as discussed for example in the book "Computer Architecture: A Quantitative Approach, 2nd ed." by J. Hennessy and D. Patterson, Morgan Kaufmann Publishers, Inc., 1996. The register map table 320 is updated with the assignments of physical registers to the destination register names specified by the instructions. The decoded instructions, with all their registers renamed, are placed in the dispatch table 324; in addition, the instructions, including their memory addresses and their physical and architected register names, are placed in the retirement queue 326 in program order.

In step 405, a set of instructions from the dispatch table is selected for execution, potentially out-of-order, and dispatched to the corresponding memory units 305, functional units 307, or branch unit 309. Instructions that are candidates for selection must have all required resources available (the physical registers have been assigned to the expected operands, and the corresponding functional units are free). The operands used by the instructions are read from the register file 311, which typically includes general purpose registers (GPRs), floating-point registers (FPRs) and condition registers (CRs). If any of the instructions selected for execution is an out-of-order load operation, the entry in the region mapping table 330 corresponding to the out-of-order load instruction is marked to reflect the presence of this operation among the dispatched instructions. If any of the selected instructions is a store instruction, and the entry in the region mapping table 330 corresponding to the store operation is marked as indicating that an out-of-order load operation from that region is in progress, then the entry is marked to indicate that the region has an interfering store operation (a store operation, in sequential order, that accesses the same region as the out-of-order load operation). An example of the entries of the region mapping table is described below with reference to Figures 5a and 5b. Further description of the marking of an interfering store operation is given below, in conjunction with Figure 6.

In step 407, the results of instructions that complete execution are stored in the register file 311. The retirement queue 326, and the instructions in the dispatch table 324 that await the physical registers set by the instructions completing execution, are notified. The retirement queue 326 is also notified if any of the instructions completing execution has raised an exception. Finally, in step 409, completed instructions are removed from the retirement queue 326 in program order (from the head of the queue). If no exceptions were raised by the instructions being retired, the in-order map table 328 is updated so that the architected register names used by the instructions point to the physical registers in the register file 311 that contain the results of the retiring instructions; the previous register names from the in-order map table 328 are returned to the free register queue 322.
More specifically, if an instruction being retired specifies architected register Rx as its target and the result of the operation has been placed in physical register Py, then the contents of the entry for Rx in the in-order map table are set to Py. On the other hand, if one or more of the instructions being retired has raised an exception, program control is set to the memory address of the first of these instructions (the address has been saved with the corresponding instruction in the retirement queue 326); the retirement queue 326 is emptied (flushed), thereby canceling all remaining non-retired instructions; the register map table 320 is set to the contents of the in-order map table 328; and any physical register not specified in the in-order map table 328 is added to the free register queue 322.
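By way of illustration only, the retirement and exception bookkeeping just described can be sketched in Python as follows. The function names, the dictionary representation of the in-order map table, and the assumption that the first excepting instruction sits at the head of the retirement queue are hypothetical choices made for this sketch; they are not taken from the specification.

```python
def retire(in_order_map, free_queue, arch_reg, phys_reg):
    """Retire one instruction whose result for arch_reg (Rx) is in phys_reg (Py)."""
    previous = in_order_map.get(arch_reg)
    in_order_map[arch_reg] = phys_reg   # the entry for Rx is set to Py
    if previous is not None:
        free_queue.append(previous)     # the previous physical register is freed


def recover(in_order_map, retirement_queue, num_physical):
    """Handle an exception; return (program_counter, register_map, free_queue).

    Assumption for this sketch: the first excepting instruction is the entry
    at the head of the retirement queue, stored with its memory address.
    """
    program_counter = retirement_queue[0]["address"]
    retirement_queue.clear()            # cancel all non-retired instructions
    register_map = dict(in_order_map)   # register map table := in-order map table
    in_use = set(in_order_map.values())
    # Every physical register not named in the in-order map becomes free.
    free_queue = [p for p in range(num_physical) if p not in in_use]
    return program_counter, register_map, free_queue
```

In this sketch the free register queue is rebuilt from scratch on recovery, which matches the rule that any physical register absent from the in-order map table is returned to the queue.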
Also, in step 409, if any of the instructions being retired is an out-of-order load operation, the region mapping table 330 is inspected to check whether there has been an interfering store operation in the region corresponding to the out-of-order load instruction. If so, a recovery sequence is invoked that returns the processor to the in-order state that existed prior to the execution of the load instruction. This recovery sequence can be invoked by raising an exception, as described above. Otherwise, the entry in the table is updated to reflect the completion of the out-of-order load operation. Further description of the detection of an interfering store operation is given below, in conjunction with Figure 9.

Figure 5a illustrates the mapping of the memory address space into regions. The mapping is done at the granularity of a minimum interference unit. For example, assume a granularity of one double word of memory. In this case, memory double words are assigned to regions in increasing order; that is, double word 0 is assigned to region 0, double word 1 to region 1, and so on, up to double word k-1, which is assigned to region k-1. The assignment wraps around to the beginning of the region mapping table, so that double word k is assigned to region 0, double word k+1 to region 1, and so on. That is, the log2 k least significant bits of the double-word address of a memory location determine its region; all the octets within a memory double word are mapped to the same region. Alternatively, other mapping functions and/or other granularities can be used. In the previous example, memory operations that reference up to two double words are allowed; consequently, memory references are mapped either to a single region or to two regions. An alternative embodiment of the invention can restrict the reordering of operations so that out-of-order load operations are mapped to a single region.
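By way of illustration only, the address-to-region mapping of Figure 5a can be sketched in Python as follows. The function name `regions_for`, the constant name, and the choice to express the operand size in octets are assumptions of this sketch, as is the 8-octet double word (the value used later in the description of Figure 10).

```python
DOUBLE_WORD_OCTETS = 8  # assumed size of one double word (the mapping granularity)


def regions_for(address, size, k):
    """Return the region numbers touched by `size` octets starting at `address`.

    `k` is the number of regions in the table. A reference may touch at most
    two adjacent double words, and therefore one or two regions.
    """
    first = address // DOUBLE_WORD_OCTETS
    last = (address + size - 1) // DOUBLE_WORD_OCTETS
    if last - first > 1:
        raise ValueError("reference spans more than two double words")
    if first == last:
        return [first % k]           # reference contained in a single region
    return [first % k, last % k]     # reference crosses into the next region
```

When k is a power of two, the modulo operation is equivalent to keeping the log2 k least significant bits of the double-word address, as the text describes. For example, with k = 512, a 4-octet reference starting at octet 6 of double word 511 maps to regions 511 and 0, which shows the wrap-around.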
Figure 5b is an example of the contents of the mapping table of the present invention. Each entry in the mapping table preferably consists of a state field 510 and a counter field 520. The counter field 520 indicates how many out-of-order load operations referencing locations within the corresponding region have been dispatched but have not yet reached their original position in the instruction sequence (have not yet been committed).
The state field indicates whether interference has been detected between reordered memory operations that have referenced memory locations within the corresponding region. The possible values for the state field 510 are "clean" and "dirty". A state field set to "clean" indicates that no interference has been detected between reordered memory operations within the region; that is, there has been no store operation to the corresponding region since the dispatch of an out-of-order load operation that is still in progress. A state field set to "dirty", on the other hand, indicates that there has been an interfering store operation to the region.

Figure 6 is a flow chart that describes in more detail the actions performed by the processing unit 300 when dispatching instructions for execution, that is, when selected instructions are sent to the corresponding functional units. This figure includes the actions concerning the modification of the region mapping table 330 and the detection of interfering store operations. In step 601, the instructions to be dispatched for execution are selected from the dispatch table 324. In step 603, it is determined whether any of the selected instructions is a load operation. If not, in step 611 it is determined whether any of the selected instructions is a store operation. If not, the process continues in step 609, where the instructions are dispatched for execution; that is, the instructions are sent to the corresponding units 305, 307 or 309, are removed from the dispatch table 324 and are added to the retirement queue 326. If in step 611 it is determined that there are one or more store operations among the selected instructions, then in step 613 the counter field 520 of the region mapping table 330 entry or entries corresponding to the selected store operations is inspected.
In the preferred embodiment of the present invention, each store operation is mapped to one or two memory regions; the log2 k least significant bits of the double-word address of the memory location accessed by the store operation, together with the size of the operand, determine the corresponding regions, as described later in conjunction with Figure 9. Two adjacent regions are selected by this mapping whenever the starting address of the memory reference plus the size of the accessed operand crosses a double-word memory boundary (the unit of mapping). For each store operation, if the value of its corresponding counter field or fields 520 is zero, indicating that there are no out-of-order load operations in progress associated with the region(s), no action is taken. On the other hand, if the value of the corresponding counter field or fields 520 is not zero, then in step 615 the state field 510 is set to "dirty".

If in step 603 it is determined that there are one or more load operations among the selected instructions, then in step 605 it is determined whether any selected load operation is an out-of-order load operation (that is, there is a sequentially preceding store operation that has not yet completed execution and thus still resides in the retirement queue 326). This function is supported by two store operation counters, whose counting capacity is larger than the number of entries in the retirement queue. The first counter is incremented each time a store operation is dispatched for execution; the second counter is incremented each time a store operation completes execution. At the time a load operation is dispatched for execution, if the values of the two counters do not match, then the load operation is out-of-order (it is dispatched before the completion of a preceding store operation). If the values of the two counters match, then the load instruction is dispatched in-order, and the process continues in step 611.
Alternatively, the ordering of load operations with respect to store operations can be determined on a per-region basis, by augmenting the mapping table with store counters for each region and comparing the store counters of the region corresponding to the load operation. If in step 605 it is determined that an out-of-order load operation is being dispatched, then in step 607 the entry in the retirement queue that holds the load operation is marked as out-of-order, and the log2 k least significant bits of the double-word address and the size of the accessed operand are placed in the entry in the retirement queue. In addition, in step 607, it is checked whether the counter field 520 of the entry in the mapping table 330 is zero. A zero value indicates that there are no other out-of-order load operations in progress. If so, in step 607, the state field 510 of the corresponding entry is set to "clean", and the counter field 520 is incremented (set to 1); then the process continues in step 609. However, if the value of the counter field 520 is not zero, then the state field 510 is not modified, the counter field 520 is incremented, and the process continues in step 609.
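By way of illustration only, the dispatch-time actions of Figure 6 (steps 605, 607, 613 and 615), using the two global store counters rather than the per-region alternative, can be sketched in Python as follows. The class and method names are hypothetical, the regions are assumed to have been computed already from the address and operand size, and the counters are modeled as unbounded integers rather than fixed-width hardware counters.

```python
from dataclasses import dataclass


@dataclass
class RegionEntry:
    counter: int = 0      # counter field 520: out-of-order loads in progress
    dirty: bool = False   # state field 510: False = "clean", True = "dirty"


class Dispatcher:
    def __init__(self, k):
        self.table = [RegionEntry() for _ in range(k)]
        self.stores_dispatched = 0   # first store counter
        self.stores_completed = 0    # second store counter

    def dispatch_store(self, regions):
        # Steps 613/615: mark a region dirty only if it has loads in progress.
        for r in regions:
            if self.table[r].counter != 0:
                self.table[r].dirty = True
        self.stores_dispatched += 1

    def complete_store(self):
        self.stores_completed += 1

    def dispatch_load(self, regions):
        """Return True if the load is dispatched out-of-order (step 605)."""
        out_of_order = self.stores_dispatched != self.stores_completed
        if out_of_order:
            # Step 607: the first load in a region resets the state to clean.
            for r in regions:
                entry = self.table[r]
                if entry.counter == 0:
                    entry.dirty = False
                entry.counter += 1
        return out_of_order
```

A load dispatched while no store is outstanding leaves the table untouched, so only loads that actually pass a pending store consume a counter slot in their region.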
Figure 7 is a flow chart that describes in more detail the actions performed by the processing unit 300 when retiring instructions, including the actions concerning the mapping table 330. In step 701, the instructions to be retired are selected from the retirement queue 326. In step 703, it is determined whether any of the selected instructions was marked as an out-of-order load operation in step 607. If so, in step 705 it is determined whether the state field 510 of the corresponding entry in the mapping table 330 is "clean". As in the previous cases, the corresponding entry is determined by accessing the mapping table, this time using the information (address, size) placed with the out-of-order load instruction in the retirement queue in step 607. If the entry is clean, then in step 717 the corresponding counter field 520 is decremented, and the process continues in step 711, where the instruction is retired normally from the retirement queue 326 along with the other instructions. If in step 705 it is determined that the state field 510 of the corresponding entry in the region mapping table 330 is "dirty", an out-of-order exception is raised in step 707, and the exception is processed in step 709; that is, the effect of all non-retired instructions is canceled by removing all the entries from the retirement queue 326, and execution then resumes at the in-order position of the load instruction in the program, by setting the program counter register to the memory address of the load instruction being retired, which is held in the retirement queue.

Figures 8 and 9 illustrate an exemplary embodiment of the mapping table, for the case of a computer system whose virtual memory address space is 2^32 octets. The address space is divided into 512 regions.
Memory locations are assigned to regions in increasing order, at the granularity of a double word; that is, double word 0 is assigned to region 0, double word 1 to region 1, and so on up to double word 511, which is assigned to region 511. The assignment wraps around to the beginning of the mapping table, so that double word 512 is assigned to region 0, double word 513 to region 1, and so on.

Figure 8a shows the input signals 801 and the output signals 840 of the region mapping table, for the case of a processor system having three memory units dedicated to load operations, two memory units dedicated to store operations, and a retirement (commit) unit able to retire (commit) two instructions simultaneously. The mapping table receives seven inputs, each consisting of 14 bits. The load0, load1 and load2 inputs are received from the three memory units dedicated to load operations. The store0 and store1 inputs are received from the memory units dedicated to store operations. The vrfy0 and vrfy1 inputs are received from the retirement (commit) unit. The format of the inputs to the region mapping table is illustrated in Figure 8b. The 14 bits carried by each input correspond to the 12 least significant bits of the address of the memory reference and two bits indicating the size of the operand accessed: octet, word, double word or quadruple word.

Figure 9 illustrates the main components of the mapping table. Preferably, the table is organized as a two-dimensional array (16 by 32) of cells, where each cell corresponds to an entry in the table. The entries are arranged by columns; that is, adjacent regions in memory are assigned to adjacent cells in the same column, and the region associated with the last cell in a column is adjacent to the region associated with the first cell in the next column.
Likewise, the region associated with the last cell in the last column is adjacent to the region associated with the first cell in the first column. A cell in the table is selected by the coincident decoding of vertical and horizontal selection signals generated from the inputs to the table. That is, a selected cell is located at the intersection of a vertical signal set to 1 and a horizontal signal set to 1. Since there are seven simultaneous inputs to the table, up to seven cells can be selected at the same time; alternatively, fewer cells can be selected at the same time, and some cells can be selected simultaneously by different sources (load units, store units and verification units). The seven input signals 901, of 14 bits each, are split into three sets of seven signals with five, four and five bits, respectively. Each 5-bit signal 902 corresponds to the most significant bits of the address field in Figure 8b and is used to select a column of the two-dimensional table. Each 4-bit signal 913 corresponds to the next four bits of the address field in Figure 8b and is used to select a row of the two-dimensional table. Each 5-bit signal 915 corresponds to the three least significant bits of the address field and the two size bits in Figure 8b, and is used to detect whether the corresponding reference is contained within a single region or crosses into two regions. Each 5-bit signal 902 is fed to a binary decoder 905, which generates a 32-bit signal 907 in a 1-of-32 code. The seven 32-bit signals 907 are interleaved in the perfect mixer 909, generating 32 signals 911 of 7 bits each, which correspond to a decoded representation of all the 5-bit signals 902. Each of these signals selects a column of the two-dimensional array. In addition, the rightmost 7-bit signal is also fed to the first cell of the two-dimensional array, to detect the case of wrap-around from the last cell to the first cell.
Each 4-bit signal 913 is fed to a binary decoder 920, which generates a 16-bit signal in a 1-of-16 code. Each 5-bit signal 915 is fed to a crossing generator 922, which generates a single-bit signal indicating whether the memory reference (the tuple <address, size>) is associated with two adjacent entries in the region mapping table. The seven 16-bit outputs of the binary decoders 920 and the seven 1-bit outputs of the crossing generators 922 are combined in the mixer 924, generating 16 signals 919 of 7 bits each. These signals select either one or two rows of the two-dimensional array, depending on whether the corresponding references access one or two regions. The coincident decoding of signals 911 and 919 determines the selection of cells in the two-dimensional array 950 of 512 cells, which constitute the accessed entries in the region mapping table.

Figure 10 is a block diagram illustrating the input and output signals of the crossing generation unit 922 employed in the region mapping table of Figure 9. This unit receives as inputs the three least significant bits of the address field and the two bits of the size field in Figure 8b. The three address bits correspond to the octet offset within the aligned double word that is assigned to a particular table cell. This offset is added to the size field. If the result of this addition is greater than 8 (the number of octets in a double word, the size of the minimum interference unit), then the reference crosses a region boundary. In that case, the output of the crossing generation unit is set to 1; otherwise it is set to 0.

Figure 11 is a flow diagram showing the functionality of each cell in the region mapping table. The cell consists of a counter and a state field. In step 1101, the number of references of each type to the region is determined. The reference types include: 1) dispatch of an out-of-order load operation; 2) dispatch of a store operation; and 3) retirement (commit) of an out-of-order load operation.
In step 1103, if one or more out-of-order load instructions are being retired, the state field is inspected; if the value of this field is "dirty", then there has been interference between reordered memory operations within the region, so the process proceeds to step 1105, where the corresponding fail0 and/or fail1 outputs are set to 1, and the process ends. On the other hand, if the result of step 1103 is false, then in step 1107 it is determined whether one or more out-of-order load instructions that reference the region are being dispatched and whether the value of the counter is 0, in which case in step 1109 the state field is set to "clean". The process continues in step 1111, where the counter is incremented by the number of out-of-order load operations dispatched. The process then continues in step 1113. In step 1113, it is determined whether one or more store instructions that reference the region are being retired (committed) and whether the value of the counter is different from 0, in which case in step 1115 the state field is set to "dirty". The process then continues in step 1117. In step 1117, the counter field is decremented by the number of out-of-order load instructions being retired (committed), and the process ends.

While the invention has been described above with respect to particular embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
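By way of illustration only, the per-cell behavior of Figure 11 (steps 1103 through 1117) can be sketched in Python as follows. The class name `Cell`, the method name `cycle` and its three count arguments are hypothetical; the sketch models a single evaluation of the flow for one cell and returns a flag in place of the fail0/fail1 outputs.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    counter: int = 0      # out-of-order loads in progress for this region
    dirty: bool = False   # state field: False = "clean", True = "dirty"

    def cycle(self, loads_dispatched, stores, loads_retired):
        """Apply one pass of the Figure 11 flow; return True if failure is signaled."""
        # Steps 1103/1105: a retiring out-of-order load that finds the
        # region dirty signals interference and ends the process.
        if loads_retired > 0 and self.dirty:
            return True
        # Steps 1107/1109: the first out-of-order load resets the state.
        if loads_dispatched > 0 and self.counter == 0:
            self.dirty = False
        # Step 1111: count the newly dispatched out-of-order loads.
        self.counter += loads_dispatched
        # Steps 1113/1115: a store while loads are in progress is interfering.
        if stores > 0 and self.counter != 0:
            self.dirty = True
        # Step 1117: account for the out-of-order loads being retired.
        self.counter -= loads_retired
        return False
```

Because the counter is incremented before the store check, a store that reaches the cell in the same pass as an out-of-order load is treated as interfering, which errs on the side of triggering recovery.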

Claims (30)

1. In a computer processing system, wherein sequences of instructions are executed by a processing unit, wherein at least one of the instructions is a load instruction that is moved from an original position in the instruction sequences to an earlier position in the instruction sequences, and wherein the at least one load instruction is moved over at least one store instruction, thereby becoming an out-of-order load instruction, wherein the out-of-order load instruction identifies a location in a memory subsystem from which data is read, and the at least one store instruction identifies a location in the memory subsystem in which to store data, a method for detecting interference between the out-of-order load instruction and the at least one store instruction, and for recovering from said interference, the method characterized in that it comprises the steps of: storing in a table a plurality of entries, wherein each entry E corresponds to a region R of a plurality of regions of the memory subsystem, wherein the entry E includes at least one field that indicates: (i) whether the processing unit is processing at least one out-of-order load instruction that loads data from a location within region R, and (ii) whether the processing unit is processing at least one interfering store instruction that stores data to a location within region R, wherein the interfering store instruction interferes with an out-of-order load instruction that loads data from a location within region R; identifying an entry E1 that corresponds to a first out-of-order load instruction being processed by the processing unit, wherein the entry E1 corresponds to a region R1 of the memory subsystem and the first out-of-order load instruction loads data from a location within region R1; and, upon reaching the original position of the first out-of-order load instruction, controlling the processing unit to execute a recovery sequence if the at least one field of entry E1 indicates that the processing unit is processing at least one interfering store instruction that stores data to a location within region R1.

2. The method according to claim 1, characterized in that the at least one field of entry E indicates: (i) whether the processing unit has dispatched at least one out-of-order load instruction that loads data from a location within region R, and (ii) whether the processing unit has dispatched at least one interfering store instruction that stores data to a location within region R; and wherein the controlling step controls the processing unit to execute a recovery sequence if the at least one field of entry E1 indicates that the processing unit has dispatched the at least one interfering store instruction.

3. The method according to claim 1, characterized in that the at least one field of entry E indicates: (i) whether the processing unit has executed at least one out-of-order load instruction that loads data from a location within region R, and (ii) whether the processing unit has executed at least one interfering store instruction that stores data to a location within region R; and wherein the controlling step controls the processing unit to execute a recovery sequence if the at least one field of entry E1 indicates that the processing unit has executed the at least one interfering store instruction.

4. The method according to claim 1, characterized in that each region corresponds to a plurality of memory locations.

5. The method according to claim 1, characterized in that it further comprises the steps of: selecting at least one out-of-order load instruction to be executed by the processing unit, wherein the out-of-order load instruction identifies a location LL in the memory subsystem from which data is read, and wherein the location LL is within a region LR of the memory subsystem; and, for each selected out-of-order load instruction, identifying an entry LE of the table corresponding to region LR and updating the at least one field of entry LE to indicate that at least one out-of-order load instruction that loads data from a location within region LR is being processed by the processing unit.

6. The method according to claim 1, characterized in that it further comprises the steps of: selecting at least one store instruction to be executed by the processing unit, wherein the at least one store instruction identifies a location SL in the memory subsystem in which to store data, wherein the location SL is within a region SR of the memory subsystem; and, for each selected store instruction, identifying an entry SE of the table that corresponds to region SR, evaluating the at least one field of entry SE, and, if the at least one field of entry SE indicates that at least one out-of-order load instruction that loads data from a location within region SR is being processed by the processing unit, updating the at least one field of entry SE to indicate that at least one interfering store instruction that stores data to a location within region SR is being processed by the processing unit.

7. The method according to claim 1, characterized in that it further comprises the steps of: selecting at least one out-of-order load instruction to be committed, wherein the at least one out-of-order load instruction identifies a location CLL in the memory subsystem from which data is read, and wherein the location CLL is within a region CLR of the memory subsystem; and, for each selected out-of-order load instruction, identifying an entry CLE of the table that corresponds to region CLR, evaluating the at least one field of entry CLE, and executing the recovery sequence if the at least one field of entry CLE indicates that at least one interfering store instruction that stores data to a location within region CLR is being processed by the processing unit.

8. The method according to claim 1, characterized in that it further comprises the step of: if the at least one field of entry CLE indicates that no interfering store instruction that stores data to a location within region CLR is being processed by the processing unit, updating entry CLE to indicate that the out-of-order load instruction has been completed.

9. The method according to claim 1, characterized in that the at least one out-of-order load instruction and the recovery sequence are generated before executing the program, wherein the out-of-order load instruction is identified by a predetermined code in the out-of-order load instruction, wherein the original position of the out-of-order load instruction is identified by a predetermined instruction that is generated before executing the program, wherein the out-of-order load instruction is identified in response to decoding the predetermined code, wherein the original position of the out-of-order load instruction is identified in response to decoding the predetermined instruction, and wherein control logic transfers execution control to the recovery sequence through a branch instruction and a program trap.

10.
The method according to claim 1, characterized in that at least the out-of-order load instruction is generated during the execution of the program and marked by a predetermined field that is added to the instruction and in which the control generates a program exception that: a) cansela efesto of the load instruction outside -d -order, b) cansela effects of other instructions executed out-of-order after the out-of-order load instruction, and e) resumes execution of the original out-of-order serge instruction position. 11. The security method with claim 10, characterized in that the instrusions are reordered during program execution and executed out-of-order, where complete instrusions are collated in a withdrawal waiting list, withdrawn from the waiting list of withdrawal in program order and withdrawals in order of program, where the original position of the out-of-order loading instruction is identified by out-of-order serge instruction position in the withdrawal waiting list, and where the program exsepsion drops the withdrawal waiting list in order to cancel all remaining non-recalled instructions and resumes the execution of the program position of the out-of-order load instruction. 12. 
In a processor system, where sequences of instructions are executed by a processing unit, in which at least one of the instructions is a load instruction that moves from an original position in the instructions sequencing to a position previous in the sequencing of intrusions, and where at least one load instrussión moves on at least one instruction to store, thus becoming an out-of-order loading instruction, where the serge instrumentation is- The apparatus identifies a location in a memory subsystem of the data is read, an apparatus for detecting interference between out-of-order twig instruction and at least one intrusion to store and to recover from said interference, the apparatus comprises : a table consisting of a plurality of inputs, wherein each input E corresponds to a region R of a plurality of regions of the memory subsystem, wherein the It includes at least one field that indicates: (i) if the processing unit processes at least one out-of-order loading instruction that loads data from a location within the R region, and (ii) the prosting unit At least one instrussion of souls is interferrences, which stores data at a location within the R region, where the instruction of aliasing with interference interferes with an out-of-order loading instruction, which loads data from a location within from the R region; detection logic to identify an entry The one corresponding to a first out-of-order twill instruction that is processed by the processing unit, where the input El corresponds to an Rl region of the memory subsystem and the first load instrustion out-of-order loads data from a location within the Rl region and to detect when the processing unit reaches an original position of the first out-of-order loading instruction; and logic of control, coupled to the logic of detection, to sontrolar the prosesadora unit for axes a sesuensía of resuperasión suando the unit prosesadora reaches original position of the first instruction of serge out-of-order and the 
field at least of an entrance indicates that the processing unit processes at least one instruction to store with interference, which stores data at a location within the region R1. 13. The apparatus according to claim 12, sanitized because at least one sampo of the input E indices: (i) if the prosecuting unit has cleared at least one out-of-order serge instrussion that twists data from a bias within of the R region, and (i) if the prosecuting unit has despassed at least one instrussion of souls, they are interferensia, which stores data at a location within the R region; and where the control logic, the processing unit strokes to execute a recovery sequence when the processing unit reaches the original position of the first out-of-order loading instruction and the minimum sampo of the input indicates that the unit processor has dispatched the instruction to store with interference as a minimum. The apparatus according to claim 12, characterized in that the field at least of the input E indicates: (i) if the processing unit has executed at least one out-of-order loading instruction that loads data from a location within the region R, (ii) if the processing unit has executed at least one instruction to store with interference, which stores data at a location within region R; and wherein the control logic regulates the processing unit to execute a recovery sequence by having the processing unit reach the original position of the first out-of-order loading instruction and the minimum field of the input indicating that the processing unit has run the storage unit with interference at least. 15. The apparatus according to claim 12, characterized in that each region corresponds to a plurality of memory locations. 16. 
The apparatus according to claim 12, characterized in that it also comprises: drag-and-drop logs to select at least one out-of-order loading instruction to be executed by the prosecution unit, wherein the out-of-order loading instruction identifies an LL location in the memory subsystem from which data is read, and where the LL location is within an LR region of the memory subsystem; and where the control logic, for each out-of-order load instruction selected by the dispatch logic, identifies an LE entry of the table corresponding to the LR region, and updates the field at least of the LE entry for indicate that at least one out-of-order loading instruction that loads data from a batch within the LR region is processed by a proctor unit. 17. The sonicity device is claim 12, characterized in that it also comprises: dispatch logic to select at least one souvenir instrussion, to be executed by the processing unit, where the instruction to store as a minimum identifies a location SL in the subsystem of memory in which data is stored, wherein the location SL is within an SR region of the memory subsystem; and where the logic of control, for each instruction of souvenir selessionada by the logic of despasho, identifies an SE entry of the table that corresponds to the SR region, evaluates the field at least of the SE input, and if the field at least of the SE input indicates that an out-of-order load instruction that at least loads data from a location within the SR region is processed by the unit In this case, the operator evaluates the minimum sampo of the SE input to indicate that the minimum storage interfering device that stores data at a location in the SR region is processed by the processing unit. 18. 
The apparatus according to claim 12, which is sarasterized because it also includes: commitment logic to select at least one out-of-order serge instrussion for sompromiso, where the out-of-order charge instrussion at least identifies a CLL location in the memory subsystem from which data is read, and where the CLL location is within a CLR region of the memory subsystem; and where the control logic for each out-of-order load instruction selected by the prompting logic identifies a CLE entry of the table corresponding to the CLR region, evaluates the field at least of the CLE input and executes the recovery sequence if the minimum field of the CLE entry indicates that at least one instruction to store with interference, which stores data at a location within the CSR region, is processed by the processing unit. 19. The apparatus according to claim 12, characterized in that if the field at least of the CLE entry indicates that at least one instruction of masking is interference that stores data at a location within the CSR region, it is not processed by the prosecution unit, the The control logic updates the CLE entry to indicate that the out-of-order serge instruction has been completed. 20. 
The sonicity apparatus is the reinvindisation 12, which is sarasterized because the out-of-order servo instrussion is minimal and the recovery symbol is generated before the execution of the program, where the out-of-order loading instrumentation is identified by a predetermined code in the out-of-order loading instruction, where the original position of the out-of-order loading instruction is identified by a predetermined instruction that is generated before the execution of the program; where the logic of identification identifies the out-of-order loading instruction in response to decoding the predetermined code, where the code of charge identifies the original position of the out-of-order loading instruction, in response to decoding the predetermined instrussión, and where the logic of sontrol transfers axession control to the recovery sequence through a branch control and a program trap. 21. The apparatus according to claim 12, characterized in that the instructions are reordered during program execution and executed out-of-order, where complete instructions are collated in a withdrawal waiting list, removed from the waiting list. of withdrawal in program order and are removed in program order, where at least one out-of-order loading instruction is generated during program execution and identified by a predetermined field that is connected to the instruction, where the position The original of the out-of-order loading instruction is identified by the position of the out-of-order load instruction in the withdrawal waiting list, and where the sontrol logic transfers execution control to the recovery sequence to the generate a program exception that downloads the withdrawal waiting list in order to cancel all remaining non-recalled instructions and resume execution from the original position. ginal of the out-of-order loading instruction. 22. 
The apparatus according to claim 12, characterized in that the processor executes concurrently a plurality of instructions in a channelized manner, wherein the execution process consists of a plurality of stages, wherein a step can be performed by at least one instrussion. out-of-order twister 00L1 and at least one instruction to store Yes, and where this stage may concurrently withdraw at least one out-of-order twill instruction 00L2 and at least one instruction store S2; wherein the table further comprises: means for identifying first and second locations in the memory subsystem, where the first location is read by the minimum twill instrussion OOLl and the second location is stored by the storing of at least Si; means for concurrently identifying third and fourth locations in the memory subsystem, wherein the third location is read by the load intrusion at least 00L2 and the second location is stored by the instruction store at least S2; means to update entries in the table in response to identification of the first, second, third and fourth locations in order to identify the presence of storage operations with interference. 23. 
In a computer processing system, where instruction sequencies are executed by a processing unit, wherein at least one of the instructions is a loading instruction that moves from an original position in instructions sequencing to a position previous in the instruction sequences, and where the minimum serge instrussion moves on at least one instruction to store, thus becoming an out-of-order serge instruction, where the out-of-order load instructive identifies a location in a sual memory subsystem data are read and where at least one instrussion of souls identifies an allocation of the memory subsystem in which to store data, the method for detesting interferensia between out-of-order twill instrussion and at least one instruction of almasenar, and to overcome this interference, the method is characterized in that it comprises: storing in a table a plurality of inputs, wherein each input E corresponds to a region R of a plurality of regions of the memory subsystem, where the input E includes at least one field indicating: (i) if the processing unit processes at least one out-of-order serge instruction that loads data from a location within the reg ion R, and (ii) where the processing unit processes at least one stored instruction that stores data at a location between the R region; identify an entry Which corresponds to a first serge-out-of-order instruction that is processed by the prosecution unit, where the input corresponds to a region R1 of the memory subsystem and the first off-load instruction order loads data from a location within the Rl region; To move the original position to the first out-of-order serge instrussion, to control the prostatic unit to send a recraction sequence if the field of at least one entry indicates that the processing unit processes at least one instruction It stores that stores data at a location within the Rl region. 24. 
The method of soundness With claim 23, characterized in that at least one field of the input E indicates: i) if the processing unit has dispatched at least one out-of-order loading instruction that loads data from a batch within the region R, and ii) if the processing unit has dispatched at least one instrussion almasena that stores data at a location between the R region; and wherein the step of controlling, controls the processing unit to execute a recovery sequence and at least one field of the input. It indicates that the processing unit has dispatched the instruction of minimum weight. 25. The method according to claim 23, characterized in that the minimum sampo of the entry E indicates: i) if the prosecution unit has executed at least one out-of-order loading instruction that loads data from a location within the region R, and ii) if the processing unit has executed at least one store instruction that stores data at a location within the region R; and wherein the step of controlling, controls the processing unit to execute a recovery sequence if the field at least of the input El indicates that the processing unit has executed the instruction to store at least. 26. The method according to claim 23, characterized in that each region corresponds to a plurality of memory locations. 27. The method according to claim 23, characterized in that it also comprises the steps of: selecting at least one out-of-order load instruction to be executed by the processing unit, where the out-of-order loading instruction identifies a location LL in the memory subsystem of the sual data is read, and where the location LL is within a LR region of the memory subsystem; for each out-of-order serge instruction selected, identify an LE entry of the table corresponding to an LR region, and update the field at least of the LE entry to indicate that at least one order-of-order instruction , which loads data from a location within the LR region is processed by the processing unit. 28. 
The method according to claim 23, characterized in that it further comprises the steps of: selecting at least one store instruction to be executed by the processor unit, wherein the instruction to store at least identifies a location SL in the memory subsystem in the sual data is stored, where the location SL is within the SR region of the memory subsystem; and by sada instrussión to store selected, identify an SE entry of the table q? e corresponds to the SR region, update the field at least of the SE entry, to indicate that at least one instruction to store that stores data at a location to A location within the SR region is processed by the Prosecutor unit. 29. The method of sonification with claim 23, sarasterized because the load-out-of-order instruction as a minimum and the resuscitation sequencing is generated before executing the program, where the out-of-order twill instruction is identified by a predetermined code in the out-of-order serge instrussion, where the original position of the out-of-order serge instrussion is identified by a predetermined instruction that is generated before executing the program, where the Out-of-order load instruction is identified in response to decoding the default code, where the original position of the out-of-order serge instrussion is identified in response to the de-modifying predetermined instruction, and where the control logic transfers axes axes control to the retrieval sequence through a ramifission instrussión and a program trap. 30. 
The sonformity method is claim 23, which is sarasterized because the minimum order-of-order twill is generated during the execution of the program and marked by a predetermined field that is connected to the instruction, and where the stage of sontrol generates a program exsepsión that: a) sansela efestos of the serge instrussión out-of-order, b) sansela effects of other instruccions that are executed out-of-order after the instruction serge de-out-of-order , ys) resumes axing of the original position of the off-order load instrussión.
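
For illustration only, the region table recited in the claims (entries updated when a load is dispatched, when a store is dispatched, and when the load reaches its original position) can be sketched in software. This is a minimal model and not the claimed hardware: the names RegionTable, REGION_BITS and TABLE_SIZE, the 64-byte region size, the 16-entry table, and the choice of a counter plus a flag as the "at least one field" are all assumptions made for the example.

```python
from dataclasses import dataclass

REGION_BITS = 6   # assumption: each region R covers 64 consecutive addresses
TABLE_SIZE = 16   # assumption: number of entries E in the table


@dataclass
class Entry:
    ooo_loads: int = 0         # out-of-order loads in flight for this region
    interfering: bool = False  # a store touched the region while a load was in flight


class RegionTable:
    def __init__(self) -> None:
        self.entries = [Entry() for _ in range(TABLE_SIZE)]

    def _entry(self, address: int) -> Entry:
        # Many memory locations share one region; distinct regions may also
        # share an entry, which only makes detection more conservative.
        return self.entries[(address >> REGION_BITS) % TABLE_SIZE]

    def dispatch_load(self, address: int) -> None:
        # Mark the region as having an out-of-order load in flight.
        self._entry(address).ooo_loads += 1

    def dispatch_store(self, address: int) -> None:
        # Record interference only if the region has a pending out-of-order load.
        entry = self._entry(address)
        if entry.ooo_loads > 0:
            entry.interfering = True

    def commit_load(self, address: int) -> bool:
        # Called at the load's original position; True means "run recovery".
        entry = self._entry(address)
        must_recover = entry.interfering
        entry.ooo_loads = max(0, entry.ooo_loads - 1)
        if entry.ooo_loads == 0:
            entry.interfering = False  # assumption: clear once no load is pending
        return must_recover


table = RegionTable()
table.dispatch_load(0x1008)       # load moved above a store
table.dispatch_store(0x1010)      # different address, same 64-byte region
print(table.commit_load(0x1008))  # True: the recovery sequence would run
```

In this sketch the store at 0x1010 does not actually overlap the load at 0x1008, yet recovery is still triggered. That reflects the trade-off of tracking regions rather than exact addresses: detection is conservative and may report interference that did not occur, in exchange for a small fixed-size table.
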
MXPA/A/1998/002291A 1997-03-25 1998-03-24 Apparatus for detection based on region of interference between reordered memory operations in a process MXPA98002291A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08827016 1997-03-25

Publications (1)

Publication Number Publication Date
MXPA98002291A MXPA98002291A (en) 1999-02-24

Similar Documents

Publication Publication Date Title
US5918005A (en) Apparatus region-based detection of interference among reordered memory operations in a processor
US7600221B1 (en) Methods and apparatus of an architecture supporting execution of instructions in parallel
US6330662B1 (en) Apparatus including a fetch unit to include branch history information to increase performance of multi-cylce pipelined branch prediction structures
JP3488162B2 (en) Method and apparatus for reordering loading operations in a computer processing system
US7458069B2 (en) System and method for fusing instructions
US5758051A (en) Method and apparatus for reordering memory operations in a processor
US6631514B1 (en) Emulation system that uses dynamic binary translation and permits the safe speculation of trapping operations
US9104427B2 (en) Computing system with transactional memory using millicode assists
US6697932B1 (en) System and method for early resolution of low confidence branches and safe data cache accesses
US5996060A (en) System and method for concurrent processing
US6721874B1 (en) Method and system for dynamically shared completion table supporting multiple threads in a processing system
US5838988A (en) Computer product for precise architectural update in an out-of-order processor
US7363467B2 (en) Dependence-chain processing using trace descriptors having dependency descriptors
US6029240A (en) Method for processing instructions for parallel execution including storing instruction sequences along with compounding information in cache
JP3093624B2 (en) Method and apparatus for handling speculative exceptions
US7003629B1 (en) System and method of identifying liveness groups within traces stored in a trace cache
US6742111B2 (en) Reservation stations to increase instruction level parallelism
US20060010309A1 (en) Selective execution of deferred instructions in a processor that supports speculative execution
EP0810519A2 (en) Method and system for supporting speculative execution using a speculative look-aside table
JPH1091455A (en) Branch in cache hit/miss
US6640315B1 (en) Method and apparatus for enhancing instruction level parallelism
US6219778B1 (en) Apparatus for generating out-of-order results and out-of-order condition codes in a processor
US20040117606A1 (en) Method and apparatus for dynamically conditioning statically produced load speculation and prefetches using runtime information
CN1124546C (en) Distributed instruction completion logic
JP2001527233A (en) Branch prediction using return select bits to classify the type of branch prediction