HUE035210T2 - Methods and apparatus for cancelling data prefetch requests for a loop - Google Patents

Publication number: HUE035210T2
Authority: HU (Hungary)
Prior art keywords: data, instruction, cycle, loop
Application number: HUE14704714A
Other languages: English (en)
Inventor: Matthew Gilbert
Original Assignee: Qualcomm Inc


Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/0862 — Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch
    • G06F 9/30065 — Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
    • G06F 9/325 — Address formation of the next instruction for non-sequential addresses, for loops, e.g. loop detection or loop counter
    • G06F 9/3455 — Addressing modes of multiple operands or results using stride
    • G06F 9/383 — Operand prefetching
    • G06F 9/3832 — Value prediction for operands; operand history buffers


Description

(12) EUROPEAN PATENT SPECIFICATION

(45) Date of publication and mention of the grant of the patent: 25.10.2017 Bulletin 2017/43
(21) Application number: 14704714.6
(22) Date of filing: 18.01.2014
(51) Int Cl.: G06F 9/38 (2006.01), G06F 9/345 (2006.01)
(86) International application number: PCT/US2014/012152
(87) International publication number: WO 2014/113741 (24.07.2014 Gazette 2014/30)

(54) METHODS AND APPARATUS FOR CANCELLING DATA PREFETCH REQUESTS FOR A LOOP
     VERFAHREN UND VORRICHTUNG ZUR UNTERDRÜCKUNG VON DATENVORABRUFANFRAGEN FÜR EINE SCHLEIFE
     PROCÉDÉS ET APPAREIL POUR ANNULER DES REQUÊTES DE PRÉLECTURE DE DONNÉES POUR UNE BOUCLE

(30) Priority: 21.01.2013 US 201313746000
(43) Date of publication of application: 25.11.2015 Bulletin 2015/48
(73) Proprietor: Qualcomm Incorporated, San Diego, CA 92121 (US)
(72) Inventor: GILBERT, Matthew, M., San Diego, CA 92121 (US)
(74) Representative: Tomkins & Co, 5 Dartmouth Road, Dublin 6 (IE)
(84) Designated Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
(56) References cited:
     WO-A1-00/73897
     US-A1-2008 010 444
     US-B1-6 260 116
     US-B1-6 430 680
     US-B1-6 775 765
     CHEN W. Y. ET AL: "An Efficient Architecture for Loop Based Data Preloading", MICRO 25, Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, OR, USA, 1-4 December 1992, IEEE Computer Society, pages 92-101, XP010094775, DOI: 10.1109/MICRO.1992.697003, ISBN: 978-0-8186-3175-7

Note: Within nine months of the publication of the mention of the grant of the European patent in the European Patent Bulletin, any person may give notice to the European Patent Office of opposition to that patent, in accordance with the Implementing Regulations. Notice of opposition shall not be deemed to have been filed until the opposition fee has been paid. (Art. 99(1) European Patent Convention).
Description
Field of the Disclosure

[0001] The present disclosure relates generally to aspects of processing systems and in particular to methods and apparatus to reduce cache pollution caused by data prefetching.
Background

[0002] Many portable products, such as cell phones, laptop computers, personal data assistants (PDAs) and the like, utilize a processing system that executes programs, such as communication and multimedia programs. A processing system for such products may include multiple processors, complex memory systems including multiple levels of caches for storing instructions and data, controllers, peripheral devices such as communication interfaces, and fixed function logic blocks configured, for example, on a single chip. At the same time, portable products have a limited energy source in the form of batteries that are often required to support high performance operations by the processing system. To increase battery life, it is desirable to perform these operations as efficiently as possible. Many personal computers are also being developed with efficient designs to operate with reduced overall energy consumption.

[0003] In order to provide high performance in the execution of programs, data prefetching may be used; it is based on the concept of spatial locality of memory references and is generally used to improve processor performance. By prefetching multiple data elements from a cache at addresses that are near to a fetched data element or are related by a stride address delta or an indirect pointer, and that are likely to be used in future accesses, cache miss rates may be reduced. Cache designs generally implement a form of prefetching by fetching a cache line of data for an individual data element fetch. Hardware prefetchers may expand on this by speculatively prefetching one or more additional cache lines of data, where the prefetch addressing may be formed based on sequential, stride, or pointer information. Such hardware prefetcher operation for memory intensive workloads, such as processing a large array of data, may significantly reduce memory latency. However, data prefetching is not without its drawbacks.
For example, in a software loop used to process an array of data, a data prefetcher circuit prefetches data to be used in future iterations of the loop, including the last iteration of the loop. However, the data prefetched for the last iteration of the loop will not be used, and cache pollution occurs by storing this unused data in the cache. The cache pollution problem is compounded when loops are unrolled.

[0004] United States Patent No. 6,775,765 relates to a data processing system having instruction folding and a method thereof. United States Patent No. 6,260,116 relates to a system and method for prefetching data. "An Efficient Architecture for Loop Based Data Preloading", Chen et al., proposes a preloading buffer as an architectural support for preloading.
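The last-iteration pollution described above can be illustrated with a minimal sketch. The addresses, 8-byte stride, and prefetch degree of 2 are hypothetical, and the prefetcher is simplified to byte-granular requests; real hardware operates on whole cache lines:

```python
def stride_prefetch_addresses(demand_addr, stride, degree):
    """Addresses a simple stride prefetcher would request ahead of one
    demand access; `degree` is how many requests it runs ahead."""
    return [demand_addr + stride * i for i in range(1, degree + 1)]

# A loop reads a[0..9] at base 0x1000 with an 8-byte stride. On the
# demand access of the last iteration (a[9]), the prefetcher still
# issues requests past the end of the array; those lines are filled
# into the cache but never used -- this is the cache pollution the
# disclosure aims to avoid.
last_demand = 0x1000 + 8 * 9
polluting = stride_prefetch_addresses(last_demand, 8, 2)
```

With loop unrolling, each iteration touches several array elements, so proportionally more useless prefetches are outstanding at loop exit, which is why the text notes the problem is compounded.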
SUMMARY
[0005] Among its several aspects, the present disclosure recognizes that providing more efficient methods and apparatuses for prefetching can improve performance and reduce power requirements in a processor system. To such ends, an embodiment of the invention addresses a method for canceling prefetch requests. A loop exit situation is identified based on an evaluation of program flow information. Pending cache prefetch requests are canceled in response to the identified loop exit situation.
[0006] Another embodiment addresses a method for canceling prefetch requests. Data is speculatively prefetched according to a called function. Pending data prefetch requests are canceled in response to a function exit from the called function.
[0007] Another embodiment addresses an apparatus for canceling prefetch requests. A loop data address monitor is configured to determine a data access stride based on repeated execution of a memory access instruction in a program loop. Data prefetch logic is configured to speculatively issue prefetch requests according to the data access stride. A stop prefetch circuit is configured to cancel pending prefetch requests in response to an identified loop exit.
[0008] Another embodiment addresses a computer readable non-transitory medium encoded with computer readable program data and code. A loop exit situation is identified based on an evaluation of program flow information. Pending cache prefetch requests are canceled in response to the identified loop exit situation.
[0009] A further embodiment addresses an apparatus for canceling prefetch requests. Means is utilized for determining a data access stride based on repeated execution of a memory access instruction in a program loop. Means is utilized for speculatively issuing prefetch requests according to the data access stride. Means is also utilized for canceling pending prefetch requests in response to an identified loop exit.
[0010] It is understood that other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein various embodiments of the invention are shown and described by way of illustration. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modification in various other respects, all without departing from the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Various aspects of the present invention are illustrated by way of example, and not by way of limitation, in the accompanying drawings, wherein: FIG. 1 illustrates an exemplary processor system in which an embodiment of the invention may be advantageously employed; FIG. 2A illustrates a process for canceling pending non-demand data prefetch requests upon detecting a loop-ending branch; FIG. 2B illustrates a process for canceling pending non-demand data prefetch requests upon detecting a function return; and FIG. 3 illustrates a particular embodiment of a portable device having a processor complex that is configured to cancel selected pending data prefetch requests to reduce cache pollution.
DETAILED DESCRIPTION
[0012] The detailed description set forth below in connection with the appended drawings is intended as a description of various exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details. In some instances, well known structures and components are shown in block diagram form in order to avoid obscuring the concepts of the present invention.
[0013] FIG. 1 illustrates an exemplary processor system 100 in which an embodiment of the invention is advantageously employed. The processor system 100 includes a processor 110, a cache system 112, a system memory 114, and an input and output (I/O) system 116. The cache system 112, for example, comprises a level 1 instruction cache (Icache) 124, a memory controller 126, and a level 1 data cache (Dcache) 128. The cache system 112 may also include a level 2 unified cache (not shown) or other cache components as desired for a particular implementation environment. The system memory 114 provides access for instructions and data that are not found in the Icache 124 or Dcache 128. It is noted that the cache system 112 may be integrated with processor 110 and may also include multiple levels of caches in a hierarchical organization. The I/O system 116 comprises a plurality of I/O devices, such as I/O devices 140 and 142, which interface with the processor 110.
[0014] Embodiments of the invention may be suitably employed in a processor having conditional branching instructions. The processor 110 comprises, for example, an instruction pipeline 120, data prefetch logic 121, prediction logic 122, and a stack logic circuit 123. The instruction pipeline 120 is made up of a series of stages, such as a fetch and prefetch stage 130, decode stage 131, instruction issue stage 132, operand fetch stage 133, execute stage 134, such as for execution of load (Ld) and store (St) instructions, and completion stage 135. Those skilled in the art will recognize that each stage 130-135 in the instruction pipeline 120 may comprise a number of additional pipeline stages depending upon the processor's operating frequency and the complexity of operations required in each stage. For example, the execute stage 134 may include one or more pipeline stages corresponding to one or more instruction execution stage circuits, such as an adder, a multiplier, logic operations, load and store operations, shift and rotate operations, and other function circuits of greater or less complexity. For example, when a load instruction is executed, it requests data from the Dcache 128, and if the requested data is not present in the Dcache, a fetch request is issued to the next level of cache or system memory. Such a fetch request is considered a demand request since it is in direct response to execution of an instruction, in this case a load instruction.
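The demand path just described can be sketched as follows. This is an illustrative model, not the patented circuit: the Dcache is modeled as a dictionary and the hypothetical `issue_fill` callback stands in for the request to the next level of cache or system memory:

```python
def execute_load(addr, dcache, issue_fill):
    """Demand path for a load: a hit returns data from the L1 Dcache;
    a miss issues a demand fill request to the next level of cache
    or system memory."""
    if addr in dcache:
        return dcache[addr]
    issue_fill(addr, is_demand=True)  # demand request, not speculative
    return None  # data arrives later from the next level
```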
[0015] A prefetch request is a request that is made in response to program flow information, such as detection of a program loop having one or more load instructions in the loop with load addresses based on a stride, for example. The data prefetch logic 121 utilizes such program flow information which may be based on a number of iterations of the detected loop to more accurately identify a demand use pattern of the operand addresses of the load instructions before issuing a prefetch request. Fill requests are inserted when a pattern is detected. The processor 110 may operate to differentiate a demand request from a prefetch request by use of an extra flag associated with the request that is tracked in the processor pipeline. This flag could also propagate with the request to the cache where each outstanding cache line fill could be identified as either a prefetch or demand fill. Each of the pipeline stages may have varied implementations without departing from the prefetch request canceling methods and apparatus described herein.
[0016] In order to minimize delays that could occur if data required by a program were not in the associated level 1 Dcache 128, the fetch and prefetch stage 130 records program flow information associated with one or more memory access instructions which execute in a detected program loop. Program flow information may include an indication from the decode stage 131 that a load instruction has been received, and operand address information for the load instruction may be available at a pipeline stage prior to execution, such as the operand fetch stage 133 or at the execute stage 134. The data prefetch logic 121 monitors the load addresses as they become available to detect a pattern. After the pattern is determined with an acceptable level of confidence, such as by monitoring load instructions through three or more iterations of a loop, a prefetch request for expected data is issued prior to when the load instruction is encountered again in the loop. This speculative prefetch request ensures the required data is available in the level 1 Dcache when needed by the execute stage 134. The load and store execute stage 134 is then more likely to access the required data directly from the level 1 Dcache without having to wait to access the data from higher levels in the memory hierarchy.
[0017] The data prefetch logic 121 may also include a data cache loop data address monitor to determine a data access stride. The data prefetch logic 121 then speculatively issues prefetch requests with operand addresses set according to the data access stride. For example, the data prefetch logic 121 may include a stride circuit 119 that is configured to monitor repeated executions of a load instruction to determine the difference between the operand addresses of successive executions of the load instruction, which represents a stride value. The stride circuit 119 may also include an add function that is configured to add the determined stride value to the operand address of the most recently executed load instruction to generate the next operand address. In contrast to the stride value as a predicted address, a fetched conditional branch instruction uses branch prediction logic, such as contained in the prediction logic circuit 122, to predict whether the conditional branch will be taken and the branch address. A fetched non-branch instruction proceeds to the decode stage 131 to be decoded, issued for execution in the instruction issue stage 132, executed in execute stage 134, and retired in completion stage 135.
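The stride determination and confidence gating of paragraphs [0016]-[0017] might be modeled as below. The three-iteration confidence threshold follows the text; the class and field names are illustrative, not taken from the patent:

```python
class LoadAddressMonitor:
    """Watches successive operand addresses of one load instruction and
    reports a predicted next address once the same stride has been seen
    for three consecutive iterations (the confidence threshold from the
    text; real implementations may differ)."""
    CONFIDENCE = 3

    def __init__(self):
        self.last_addr = None
        self.stride = None
        self.hits = 0

    def observe(self, addr):
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride:
                self.hits += 1
            else:
                # new or changed stride: restart confidence counting
                self.stride, self.hits = stride, 1
        self.last_addr = addr
        if self.hits >= self.CONFIDENCE:
            # add the stride to the most recent operand address to form
            # the speculative prefetch address for the next iteration
            return addr + self.stride
        return None
```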
[0018] The prediction logic circuit 122 comprises a detection logic circuit 146 for monitoring events, a filter 150, and a conditional history table 152. In one embodiment, it is assumed that a majority of conditional branch instructions generally have their conditions resolved to the same value for most iterations of a software loop.
[0019] The detection logic circuit 146, in one embodiment, acts as a software loop detector that operates based on the dynamic characteristics of conditional branch instructions used in software loops as described with regard to FIG. 2A. The detection logic circuit 146 may also detect exits from called software functions, as described with regard to FIG. 2B.
[0020] In software loops with a single entry and a single exit, a loop ending branch is generally a conditional branch instruction which branches back to the start of the software loop for all iterations of the loop except for the last iteration, which exits the software loop. The detection logic circuit 146 may have multiple embodiments for the detection of software loops, as described in more detail below and in U.S. Patent Application No. 11/066,508, assigned to the assignee of the present application, entitled "Suppressing Update of a Branch History Register by Loop-Ending Branches".
[0021] According to one embodiment, the detection logic circuit 146 identifies conditional branch instructions whose branch target address is less than the conditional branch instruction address; such a branch is considered a backwards branch and is assumed to mark the end of a software loop. Since not all backward branches are loop ending branches, there is some level of inaccuracy which may need to be accounted for by additional monitoring mechanisms, for example.
[0022] Also, as described with regard to FIG. 2B, a function return instruction (commonly named RET) can be detected. According to one embodiment, the detection of a function return is adapted to trigger prefetch cancellations of any non-demand prefetch requests. Cancellation of a prefetch request is also made in response to program flow information, such as detection of a loop exit.

[0023] In another embodiment, a loop ending branch may be detected in simple loops by recognizing repeated execution of the same branch instruction. By storing the program counter value for the last backward branch instruction in a special purpose register, and comparing this stored value with the instruction address of the next backward branch instruction, a loop ending branch may be recognized when the two instruction addresses match. Since code may include conditional branch instructions within a software loop, the determination of the loop ending branch instruction may become more complicated. In such a situation, multiple special purpose registers may be instantiated in hardware to store the instruction addresses of each conditional branch instruction. By comparing against all of the stored values, a match can be determined for the loop ending branch. Typically, loop branches are conditional backward direct branches having a fixed offset from the program counter (PC). These types of branches would not need address comparisons for detection of a loop exit. Instead, once a program loop is detected based on a conditional backward direct branch, the loop exit is determined from resolution of the branch's predicate. For example, if the predicate resolves to a true condition for returning to the loop, then the loop exit would be indicated when the predicate resolves to a false condition. In order for there to be pending prefetches, a program loop would have already executed a few times to trigger the prefetch hardware.
The data prefetch logic 121 requires a few warmup demand loads to recognize a pattern before it starts prefetching.

[0024] Also, a loop ending branch may be statically marked by a compiler or assembler. For example, in one embodiment, a compiler generates a particular type of branch instruction, by use of a unique opcode or by setting a special format bit field, that is only used for loop ending branches. The loop ending branch may then be easily detected during pipeline execution, such as during a decode stage in the pipeline.
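The simple-loop recognition of paragraph [0023] — storing the program counter value of the last backward branch and comparing it against the next backward branch — can be sketched as follows (the class and method names are hypothetical):

```python
class LoopEndingBranchDetector:
    """Models the single special purpose register scheme: a backward
    branch is recognized as a loop ending branch when its instruction
    address matches the stored address of the previous backward branch."""

    def __init__(self):
        self.last_backward_pc = None  # the special purpose register

    def on_backward_branch(self, pc):
        """Called when a backward branch executes at address `pc`;
        returns True if it matches the stored address (same branch
        repeating, i.e. a loop ending branch)."""
        is_loop_branch = (pc == self.last_backward_pc)
        self.last_backward_pc = pc
        return is_loop_branch
```

Handling loops that contain several conditional branches would, as the text notes, require one such register per branch and a comparison against all stored values.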
[0025] The prediction logic circuit 122 comprises a filter 150, a conditional history table (CHT) 152, and associated monitoring logic. In one embodiment, a monitoring process saves state information of pre-specified condition events which have occurred in one or more prior executions of a software loop having a conditional branch instruction that is eligible for prediction. In support of the prediction logic circuit 122, the filter 150 determines whether a fetched conditional branch instruction has been received and the CHT 152 is enabled. An entry in the CHT 152 is selected to provide prediction information that is tracked, for example, by the pipeline stages 132-135 as instructions move through the pipeline.

[0026] The CHT 152 entry records the history of execution for the fetched instruction eligible for predicted execution. For example, each CHT entry may suitably comprise a combination of count values from execution status counters and status bits that are inputs to the prediction logic. The CHT 152 may also comprise index logic to allow a fetched conditional branch instruction to index into an entry in the CHT 152 associated with the fetched instruction, since multiple conditional branch instructions may exist in a software loop. For example, by counting the number of conditional branch instructions from the top of a software loop, the count may be used as an index into the CHT 152. The prediction logic circuit 122 includes loop counters for counting iterations of software loops and ensuring that execution status counters have had the opportunity to saturate at a specified count value that represents, for example, a strongly not-executed status. If an execution status counter has saturated, the prediction logic is enabled to make a prediction for branch direction of the associated fetched conditional branch instruction on the next iteration of the loop.
[0027] The prediction logic circuit 122 generates prediction information that is tracked at the instruction issue stage 132, the operand fetch stage 133, the execute stage 134, and the completion stage 135 in track register issue (TrI) 162, track register operand fetch 163, track register execute (TrE) 164, and track register complete (TrC) 165, respectively. When a conditional backward branch with a failed predicate indicating the end of the loop, or a function return, is detected, such as during the execute stage 134 in the processor pipeline, a cancel pending prefetch requests signal 155 is generated. In another embodiment, pending prefetch requests are canceled based on a conditional branch prediction generated by branch prediction logic. Each conditional branch is generally predicted by the branch prediction logic to take or not take the conditional branch. For example, where the prediction information indicates the conditional branch is taken, which in this example continues a program loop, the instruction fetcher speculatively fetches instructions on the program loop indicated by the prediction. The prediction information is also coupled to a cancel pending prefetch request logic circuit 141 which may reside in the fetch & prefetch circuit 130. The cancel pending prefetch request logic circuit 141 may then speculatively cancel pending prefetch requests based on program flow information indicating the pending prefetch requests are not needed. For example, the processor may be configured to not cancel pending prefetch requests based on a weakly predicted loop exit. By canceling one or more pending data prefetch requests, data cache pollution is reduced and power utilized to address such pollution is reduced in the processor 110. The cancel pending prefetch request signal 155 is coupled to the processor instruction pipeline 120 as shown in FIG. 1 and is accepted by the cancel pending prefetch request logic circuit 141, which causes prefetch requests that are pending, except for demand requests, to be canceled. Also, processor performance is improved by not storing unnecessary data in the data cache, which might otherwise evict data that would later be demanded, generating a miss instead.
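The effect of the cancel pending prefetch requests signal — dropping pending requests except demand requests — might look like this as a sketch. The queue representation and the per-request demand flag layout are assumptions; the flag itself corresponds to the extra flag described in paragraph [0015]:

```python
from dataclasses import dataclass

@dataclass
class FillRequest:
    addr: int
    is_demand: bool  # True: from an executing load; False: speculative prefetch

def on_cancel_signal(pending):
    """Cancel pending non-demand (speculative) prefetch requests while
    leaving demand fill requests outstanding."""
    return [r for r in pending if r.is_demand]
```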
[0028] Upon reaching the execute stage 134, if the execute condition specified for the loop ending conditional branch instruction has evaluated opposite to its prediction, any speculatively executed instructions on the wrong instruction path are corrected, for example by flushing the pipeline, and such a correction may include canceling pending prefetches that are associated with the wrong instruction path. For example, in one embodiment a correction to the pipeline includes flushing the instructions in the pipeline beginning at the stage where the prediction was made. In an alternative embodiment, the pipeline is flushed from the beginning fetch stage where the loop ending conditional branch instruction was initially fetched. Also, the appropriate CHT entry may be corrected after an incorrect prediction.
[0029] The detection circuit 146, acting as a loop detector, operates to detect a loop ending branch. For example, a loop ending branch is generally a conditional branch instruction which branches back to the start of the loop for all iterations of the loop except for the last iteration which exits the loop. Information concerning each identified loop is passed to filter circuit 150 and upon a loop exit situation a cancel pending prefetch request logic circuit 141 cancels pending non-demand prefetch requests in response to each identified loop exit.
[0030] In one embodiment, the filter circuit 150, for example, is a loop counter which provides an indication that a set number of iterations of a software loop has occurred, such as three iterations of a particular loop. For each iteration of the loop, the filter determines if a conditional branch instruction is eligible for prediction. If an eligible conditional branch (CB) instruction is in the loop, the status of executing the CB instruction is recorded in the conditional history table (CHT) circuit 152. For example, an execution status counter may be used to record an execution history of previous attempted executions of an eligible CB instruction. An execution status counter is updated in one direction to indicate the CB instruction conditionally executed and in the opposite direction to indicate the CB instruction conditionally did not execute. For example, a two-bit execution status counter may be used, where a not-executed status causes a decrement of the counter and an executed status causes an increment of the counter. Output states of the execution status counter are, for example, assigned an output of "11" to indicate that previous CB instructions are strongly indicated to have been executed, an output of "10" to indicate that previous CB instructions are weakly indicated to have been executed, an output of "01" to indicate that previous CB instructions are weakly indicated to have been not executed, and an output of "00" to indicate that previous CB instructions are strongly indicated to have been not executed. The execution status counter "11" output and "00" output would be saturated output values. An execution status counter would be associated with or provide status for each CB instruction in a detected software loop. However, a particular implementation may limit the number of execution status counters that are used in the implementation and thus limit the number of CB instructions that are predicted.
The detection circuit 146 generally resets the execution status counters upon the first entry into a software loop.
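The two-bit saturating execution status counter of paragraph [0030] can be sketched as follows. The class name and the reset value of "01" (weakly not-executed) are illustrative assumptions; the patent specifies only the four output states, the increment/decrement rule, and saturation at "00" and "11".

```python
# Sketch of the two-bit execution status counter (paragraph [0030]):
# increment on "executed", decrement on "not executed", saturating at
# 0b00 (strongly not-executed) and 0b11 (strongly executed).
# The 0b01 reset value is an assumption, not stated in the patent.

class ExecStatusCounter:
    def __init__(self):
        self.value = 0b01  # assumed reset state: weakly not-executed

    def update(self, executed):
        if executed:
            self.value = min(self.value + 1, 0b11)  # saturate high
        else:
            self.value = max(self.value - 1, 0b00)  # saturate low

c = ExecStatusCounter()
c.update(False)  # 0b01 -> 0b00
c.update(False)  # already saturated, stays 0b00
```

A "00" output would give high confidence for predicting the CB instruction will not execute.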
[0031] Alternatively, a disable prediction flag may be associated with each CB instruction to be predicted rather than an execution status counter. The disable prediction flag is set active to disable prediction if an associated CB instruction has previously been determined to have executed. Identifying a previous CB instruction that executed implies that the confidence level for predicting a not execute situation for the CB instruction would be lower than an acceptable level.
[0032] An index counter may also be used with the CHT 152 to determine which CB instruction is being counted or evaluated in the software loop. For example, in a loop having five or more CB instructions, the first CB instruction could have an index of "000" and the fourth eligible conditional branch instruction could have an index of "011". The index represents an address into the CHT 152 to access the stored execution status counter values for the corresponding CB instruction.
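The index counter of paragraph [0032] maps the ordinal position of each eligible CB instruction in the loop to a CHT address. A minimal sketch, using the three-bit indices from the text's own example (function name is illustrative):

```python
# Sketch of the CHT index counter (paragraph [0032]): the Nth eligible
# CB instruction encountered in the loop selects CHT entry N, expressed
# here as a three-bit binary index.

def cht_index(position):
    """Three-bit CHT index for the Nth eligible CB instruction (0-based)."""
    return format(position, "03b")

first = cht_index(0)   # first eligible CB instruction
fourth = cht_index(3)  # fourth eligible CB instruction
```

This reproduces the example in the text: the first CB instruction indexes entry "000" and the fourth indexes entry "011".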
[0033] The prediction circuit 122 receives the prediction information for a particular CB instruction, such as execution status counter output values, and predicts, during the decode stage 131 of FIG. 1, for example, that the CB instruction will generally branch back to the beginning of the software loop, rather than predicting that a loop exit situation has been reached. In one embodiment, the prediction circuit 122 may predict that the condition specified by the CB instruction evaluates to a no branch state, in which case the code exits or falls through the loop. The prediction circuit 122 tracks the CB instruction. If a CB instruction is predicted to branch back to the loop beginning, the prediction information indicates such status. If a CB instruction was determined to not branch back, then a tracking circuit generates a cancel pending prefetch request signal and a condition evaluation is made to determine whether an incorrect prediction was made. If an incorrect prediction was made, the pipeline may also be flushed, the appropriate execution status counters in the CHT 152 are updated, and in one embodiment the associated CHT entry is marked to indicate that this particular CB instruction is not to be predicted from this point on. In another embodiment, the prediction logic circuit 122 may also change the pre-specified evaluation criterion upon determining the CB instruction was mispredicted, for example, to make the prediction criterion more conservative from this point on.
[0034] It is further recognized that not all loops have similar characteristics. If a particular loop provides poor prediction results, that loop is marked in the prediction logic circuit 122 to disable prediction. In a similar manner, a particular loop may operate with good prediction under one set of operating scenarios and with poor prediction under a different set of operating scenarios. In such a case, recognition of the operating scenarios allows prediction to be enabled, disabled, or enabled with different evaluation criteria appropriate for the operating scenario.
[0035] FIG. 2A illustrates a process 200 for canceling pending non-demand data prefetch requests upon detecting a loop-ending branch. At block 202, processor code execution is monitored for a software loop. At decision block 204, a determination is made whether a software loop has been detected. A software loop may be determined, for example, by identifying a backward branch to a location representing the start of the software loop on a first pass through the software loop, as described above. If no software loop has been identified, the process 200 returns to block 202. If a software loop has been identified then the process 200 proceeds to block 206. At this point in the code, a first cycle of the software loop has already been executed and the next cycle of the software loop is ready to start.
[0036] In the next cycle of the software loop at block 206, the processor code is monitored for a CB instruction. At decision step 208, a determination is made whether a CB instruction has been detected, for example, during a pipeline decode stage, such as decode stage 131 of FIG. 1. If no CB instruction has been detected, the process 200 returns to block 206. If a CB instruction has been detected, the process 200 proceeds to decision block 210. At decision block 210, a determination is made whether the conditional branch (CB) instruction resolved to end the loop, based on an evaluation of the conditional predicate, for example. There are a number of types of CB instruction evaluations that may have been detected. For example, a first evaluation of the detected CB instruction could resolve that the CB instruction is at the end of the software loop but evaluates to continue loop processing. The backward branching CB instruction that identified the software loop in the first pass through the software loop is tagged by its address location in the processor code, for example. Also, for the case that a number of specified iterations of the software loop have not been completed, the CB instruction resolves to branch the processor back to the beginning of the software loop. A second evaluation of the detected CB instruction could resolve that the CB instruction is at the end of the software loop and evaluates to end the software loop. A third evaluation of the detected CB instruction could resolve that the CB instruction is within the software loop, but when evaluated as taken or not taken, the processor code remains in the software loop. Also, a fourth evaluation of the CB instruction could resolve that the CB instruction is within the software loop, but when evaluated as taken or not taken, the processor code exits the software loop.
In the fourth evaluation, a CB instruction that is within the software loop, but resolves as a forward branch past the address location of the backward branching CB instruction, is considered to have exited the software loop.
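The four evaluation cases of paragraph [0036] can be sketched as a small classifier. The single tagged loop-ending-branch address and the function name are illustrative assumptions; the patent describes the cases in prose, not as code.

```python
# Sketch of the four CB evaluation cases (paragraph [0036]).
# loop_end_pc is the address of the tagged backward loop-ending branch;
# a taken forward branch past it counts as exiting the loop (fourth case).

def evaluate_cb(branch_pc, target_pc, taken, loop_end_pc):
    if branch_pc == loop_end_pc:
        # First case: loop-ending branch taken, loop continues.
        # Second case: loop-ending branch falls through, loop ends.
        return "continue-loop" if taken else "exit-loop"
    if taken and target_pc > loop_end_pc:
        return "exit-loop"      # fourth case: forward branch past loop end
    return "stay-in-loop"       # third case: control remains in the loop

# Loop-ending branch at 0x120; a branch at 0x110 jumping to 0x130 exits.
case4 = evaluate_cb(0x110, 0x130, True, 0x120)
```

Both "exit-loop" outcomes would drive block 214 of process 200, canceling pending non-demand prefetch requests.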
[0037] Returning to decision block 210, if the detected CB instruction did not resolve to exit the software loop, as in the first and third evaluations of the CB instruction, the process 200 proceeds to block 212. At block 212, the process 200 continues with normal branch processing and then returns to block 206. If the detected CB instruction did resolve to exit the software loop, as in the second and fourth evaluations of the CB instruction, the process 200 proceeds to block 214. At block 214, the process 200 cancels pending data prefetch requests except for demand data prefetch requests, processes the CB instruction, and returns to block 202 to begin searching for the next software loop.
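Block 214 above filters the pending request queue, keeping demand requests while discarding speculative prefetches. A minimal sketch, assuming a simple list-of-dicts queue model (the patent does not specify the queue representation):

```python
# Sketch of block 214 of process 200: on a loop exit, cancel pending
# data prefetch requests except demand data prefetch requests.
# The queue model is an illustrative assumption.

def cancel_on_loop_exit(pending):
    """Keep only demand requests; drop speculative (non-demand) prefetches."""
    return [req for req in pending if req["demand"]]

pending = [
    {"addr": 0x1000, "demand": True},   # demand miss: must still complete
    {"addr": 0x1040, "demand": False},  # speculative stride prefetch
    {"addr": 0x1080, "demand": False},  # speculative stride prefetch
]
remaining = cancel_on_loop_exit(pending)
```

Only the demand request survives, so the two speculative lines never pollute the data cache.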
[0038] FIG. 2B illustrates a process 250 for canceling pending non-demand data prefetch requests upon detecting a function return. At block 252, processor code execution is monitored for a software function exit. It is noted that the software function may be speculatively executed. For example, speculative execution may occur for a function call in a software loop. In the case of speculative execution of the software function, the software function exit, such as execution of a RET instruction, may also be speculatively executed. At decision block 254, a determination is made whether a software function exit has been detected, such as by detecting a return instruction in a processor’s execution pipeline. If no software function exit has been detected, the process 250 returns to block 252.
[0039] If a software function exit has been detected, the process 250 proceeds to decision block 256. At decision block 256, a determination is made whether this detected exit situation is a return from an interrupt routine. If the detected exit is a return from an interrupt routine, then the process 250 returns to block 252. If the detected exit is not a return from an interrupt routine, the process 250 proceeds to block 258. At block 258, the process 250 cancels pending data prefetch requests except for demand data prefetch requests, processes the return instruction, and then returns to block 252 to continue monitoring processor code for a software function exit. [0040] Frequently, either by hand or through compiler optimizations, a software loop will be unrolled such that multiple iterations of the loop are executed sequentially. This sequential execution of each unrolled iteration becomes an additional prefetch candidate. On the last iteration of the loop, each unrolled candidate can then generate unneeded prefetch requests compounding the problem of prefetched data cache pollution. An embodiment of the invention also applies to loop unrolling by detecting the exit of the loop, or the return from a function, and cancelling all of the unneeded prefetch requests from each unrolled loop.
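Decision blocks 256 and 258 of process 250 can be sketched the same way: a function return cancels pending non-demand prefetches unless it is a return from an interrupt routine. The queue model and names are illustrative assumptions.

```python
# Sketch of decision blocks 256/258 of process 250 (FIG. 2B): on a
# detected function return, cancel pending non-demand prefetch requests
# unless the return is from an interrupt routine.

def on_function_return(pending, from_interrupt):
    if from_interrupt:
        return pending                              # block 256: no change
    return [r for r in pending if r["demand"]]      # block 258: cancel non-demand

requests = [{"addr": 0x3000, "demand": False}, {"addr": 0x3040, "demand": True}]
after_irq_return = on_function_return(requests, from_interrupt=True)
after_func_return = on_function_return(requests, from_interrupt=False)
```

For an unrolled loop, as paragraph [0040] notes, each unrolled iteration contributes its own speculative requests, so the same filter cancels all of them at once on the exit.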
[0041] FIG. 3 illustrates a particular embodiment of a portable device 300 having a processor complex that is configured to cancel selected pending data prefetch requests to reduce cache pollution. The device 300 may be a wireless electronic device and include the processor complex 310 coupled to a system memory 312 having software instructions 318. The system memory 312 may include the system memory 114 of FIG. 1. The processor complex 310 may include a processor 311, an integrated memory subsystem 314 having a level 1 data cache (L1 Dcache) 322, a level 1 instruction cache (L1 Icache) 326, a cache controller circuit 328, and prediction logic 316. The processor 311 may include the processor 110 of FIG. 1. The integrated memory subsystem 314 may also include a level 2 unified cache (not shown). The L1 Icache 326 may include the L1 Icache 124 of FIG. 1 and the L1 Dcache 322 may include the L1 Dcache 128 of FIG. 1. [0042] The integrated memory subsystem 314 may be included in the processor complex 310 or may be implemented as one or more separate devices or circuitry (not shown) external to the processor complex 310. In an illustrative example, the processor complex 310 operates in accordance with any of the embodiments illustrated in or associated with FIGS. 1 and 2. For example, as shown in FIG. 3, the L1 Icache 326, the L1 Dcache 322, and the cache controller circuit 328 are accessible within the processor complex 310, and the processor 311 is configured to access data or program instructions stored in the memories of the integrated memory subsystem 314 or in the system memory 312.
[0043] A camera interface 334 is coupled to the processor complex 310 and also coupled to a camera, such as a video camera 336. A display controller 340 is coupled to the processor complex 310 and to a display device 342. A coder/decoder (CODEC) 344 may also be coupled to the processor complex 310. A speaker 346 and a microphone 348 may be coupled to the CODEC 344. A wireless interface 350 may be coupled to the processor complex 310 and to a wireless antenna 352 such that wireless data received via the antenna 352 and wireless interface 350 can be provided to the processor 311. [0044] The processor 311 may be configured to execute software instructions 318 stored in a non-transitory computer-readable medium, such as the system memory 312, that are executable to cause a computer, such as the processor 311, to execute a program, such as the program process 200 of FIG. 2. The software instructions 318 are further executable to cause the processor 311 to process instructions that access the memories of the integrated memory subsystem 314 and the system memory 312.
[0045] In a particular embodiment, the processor complex 310, the display controller 340, the system memory 312, the CODEC 344, the wireless interface 350, and the camera interface 334 are included in a system-in-package or system-on-chip device 304. In a particular embodiment, an input device 356 and a power supply 358 are coupled to the system-on-chip device 304. Moreover, in a particular embodiment, as illustrated in FIG. 3, the display device 342, the input device 356, the speaker 346, the microphone 348, the wireless antenna 352, the video camera 336, and the power supply 358 are external to the system-on-chip device 304. However, each of the display device 342, the input device 356, the speaker 346, the microphone 348, the wireless antenna 352, the video camera 336, and the power supply 358 can be coupled to a component of the system-on-chip device 304, such as an interface or a controller.
[0046] The device 300 in accordance with embodiments described herein may be incorporated in a variety of electronic devices, such as a set-top box, an entertainment unit, a navigation device, a communications device, a personal digital assistant (PDA), a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a computer, a portable computer, a tablet, a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a video player, a digital video player, a digital video disc (DVD) player, a portable digital video player, any other device that stores or retrieves data or computer instructions, or any combination thereof.
[0047] The various illustrative logical blocks, modules, circuits, elements, or components described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic components, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing components, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration appropriate for a desired application.
[0048] The methods described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory storage medium known in the art. A non-transitory storage medium may be coupled to the processor such that the processor can read information from, and write information to, the non-transitory storage medium. In the alternative, the non-transitory storage medium may be integral to the processor.
[0049] The processor 110 of FIG. 1 or the processor 311 of FIG. 3, for example, may be configured to execute instructions including conditional non-branch instructions under control of a program stored on a computer readable non-transitory storage medium either directly associated locally with the processor, such as may be available through an instruction cache, or accessible through an I/O device, such as one of the I/O devices 140 or 142 of FIG. 1, for example. The I/O device also may access data residing in a memory device either directly associated locally with the processors, such as the Dcache 128, or accessible from another processor’s memory. The computer readable non-transitory storage medium may include random access memory (RAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), flash memory, read only memory (ROM), programmable read only memory (PROM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), compact disk (CD), digital video disk (DVD), other types of removable disks, or any other suitable non-transitory storage medium.
[0050] While the invention is disclosed in the context of illustrative embodiments for use in processor systems, it will be recognized that a wide variety of implementations may be employed by persons of ordinary skill in the art consistent with the above discussion and the claims which follow below. For example, a fixed function implementation may also utilize various embodiments of the present invention.
Claims

1. A method (200) for canceling non-demand data cache prefetch requests, in a processor system (100) comprising a processor (110) having a cache system (112) comprising a data cache (124), and having an instruction pipeline (120), the method comprising: determining a data access stride based on repeated execution of a memory access instruction in a program loop; speculatively issuing data cache prefetch requests according to the data access stride; identifying (210) a loop exit based on an evaluation of program flow information; and characterized by: canceling (214) the data cache prefetch requests that are pending non-demand data cache prefetch requests in response to the identified loop exit.

2. The method of claim 1, wherein the loop exit is based on identifying a loop ending branch that evaluates to exit the program loop.

3. The method of claim 1, wherein the loop exit is based on an incorrect branch prediction which caused speculative instruction fetch and execution to be canceled.

4. The method of claim 1, wherein identifying the loop exit comprises detecting a conditional branch instruction has resolved to end the program loop.

5. The method of claim 1, further comprising: detecting a conditional branch instruction has not resolved to end the program loop; and monitoring (202) for a loop exit.

6.
An apparatus (110) for canceling non-demand data cache prefetch requests, in a processor system (100) comprising a processor (110) having a cache system (112) comprising a data cache (124), and having an instruction pipeline (120), the apparatus comprising: a loop data address monitor configured to determine a data access stride based on repeated execution of a memory access instruction in a program loop; data prefetch logic (121) configured to speculatively issue data cache prefetch requests according to the data access stride; means for identifying (210) a loop exit based on an evaluation of program flow information; and characterized by: a stop prefetch circuit configured to cancel the data cache prefetch requests that are pending non-demand data cache prefetch requests in response to the identified loop exit.

7. The apparatus of claim 6, wherein the loop data address monitor comprises: a stride circuit (119) configured to monitor repeated execution of the memory access instruction to determine a difference in an operand address for each execution of the memory access instruction, wherein the difference in the operand address is a stride address value; and an add function circuit configured to add the stride address value to the operand address of the most recently executed memory access instruction to determine the next operand address.

8. The apparatus of claim 6, wherein the identified loop exit is based on identifying a loop ending branch that evaluates to exit the program loop.

9. The apparatus of claim 6, wherein the identified loop exit is based on an incorrect branch prediction which cancels speculative instruction fetch and execution.

10. The apparatus of claim 6, wherein the identified loop exit is based on detecting a conditional branch instruction has resolved to end the program loop.

11.
The apparatus of claim 6, wherein the stop prefetch circuit is further configured to detect a conditional branch instruction has not resolved to end the program loop and wherein the program loop continues until the loop exit is identified.

12. The apparatus of claim 6, wherein the stop prefetch circuit is further configured to not cancel pending prefetch requests based on a weakly predicted loop exit.

13. A computer readable non-transitory medium encoded with computer readable program data and code, the program data and code when executed by a processor operable to perform a method according to any of claims 1 to 5.
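The loop data address monitor of claim 7 can be sketched as follows: the stride is the difference between operand addresses of successive executions of a memory access instruction, and the next prefetch address is the most recent address plus that stride. The function name and list-based history are illustrative assumptions, not the claimed circuit.

```python
# Sketch of the stride circuit and add function circuit of claim 7:
# stride = difference of the two most recent operand addresses,
# next prefetch target = most recent address + stride.

def next_prefetch_addr(addresses):
    """Predict the next operand address from repeated executions."""
    stride = addresses[-1] - addresses[-2]
    return addresses[-1] + stride

# A load walking an array of 8-byte elements: stride is 8.
addr = next_prefetch_addr([0x2000, 0x2008, 0x2010])
```

Requests issued this way are the speculative, non-demand prefetches that the stop prefetch circuit of claim 6 cancels on a loop exit.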

Claims (3)

  Method and apparatus for canceling data prefetch requests for a loop

  Patent claims

  1. A method (200) for canceling unrequested data cache prefetch requests in a processor system (100) comprising a processor (110) and having a cache system (112) that includes a data cache (124) and an instruction pipeline (120), the method comprising: determining a data access stride in a program loop based on repeated execution of a memory access instruction; speculatively issuing data cache prefetch requests according to the data access stride; identifying (210) a loop termination based on an evaluation of program flow information; and characterized by: canceling (214) those data cache prefetch requests that are pending unrequested data cache prefetch requests, in response to the identified loop termination.
  2. The method of claim 1, wherein the loop termination is based on identifying a loop-ending branch that evaluates to exit the program loop.
  3. The method of claim 1, wherein the loop termination is based on a mispredicted branch that caused speculative instruction fetching and execution that is to be flushed.
  4. The method of claim 1, wherein identifying the loop termination comprises detecting that a conditional branch instruction resolved the end of the program loop.
  5. The method of claim 1, further comprising: detecting that a conditional branch instruction did not resolve the end of the program loop; and monitoring (282) for a loop termination.
  6. An apparatus (110) for canceling unrequested data cache prefetch requests in a processor system (100) having a cache system (112) that includes a data cache (124) and an instruction pipeline (120), the apparatus comprising: a loop data address monitor configured to determine a data access stride in a program loop based on repeated execution of a memory access instruction; prefetch logic (121) configured to speculatively issue data cache prefetch requests according to the data access stride; means for identifying (210) a loop termination based on an evaluation of program flow information; and characterized in that it comprises: a stop prefetch circuit configured to cancel those data cache prefetch requests that are pending unrequested data cache prefetch requests, in response to the identified loop termination.
  7. The apparatus of claim 6, wherein the loop data address monitor comprises: a stride circuit (118) configured to monitor repeated execution of the memory access instruction to determine a difference in an operand address for each execution of the memory access instruction, wherein the difference in the operand address is a stride address value; and an add function circuit configured to add the stride address value to the operand address of the most recently executed memory access instruction to determine the next operand address.
  8. The apparatus of claim 6, wherein the identified loop termination is based on identifying a loop-ending branch that evaluates to exit the program loop.
  9. The apparatus of claim 6, wherein the loop termination is based on a mispredicted branch that caused speculative instruction fetching and execution that is to be flushed.
  10. The apparatus of claim 6, wherein identifying the loop termination comprises detecting that a conditional branch instruction resolved the end of the program loop.
  11. The apparatus of claim 6, wherein the stop prefetch circuit is further configured to detect that a conditional branch instruction has not resolved the end of the program loop, and wherein the program loop continues until a loop termination is identified.
  12. The apparatus of claim 6, wherein the stop prefetch circuit is further configured not to cancel pending prefetch requests in response to a weakly predicted loop termination.
  13. A non-transitory computer readable medium encoded with computer readable program data and code, the program data and code, when executed by a processor, being operable to carry out a method according to any one of claims 1 to 5.
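The mechanism in the claims above — derive a stride from consecutive operand addresses of a repeated memory access instruction, speculatively issue prefetches along that stride, and cancel pending unrequested prefetches when a loop termination is identified (but not on a weakly predicted one) — can be illustrated with a small software model. This is a hypothetical sketch, not the patented hardware: all names (`StridePrefetcher`, `observe_access`, etc.) are illustrative.

```python
class StridePrefetcher:
    """Software model of a loop data address monitor with a stop-prefetch step."""

    def __init__(self, depth=4):
        self.depth = depth      # how many lines ahead to prefetch
        self.last_addr = None   # operand address of the previous access
        self.stride = None      # detected difference between consecutive accesses
        self.pending = []       # speculatively issued, not-yet-demanded prefetches

    def observe_access(self, addr):
        """On each repeated execution of the memory access instruction:
        the stride is the difference between consecutive operand addresses,
        and prefetches are issued by repeatedly adding the stride."""
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr
        # an address actually demanded is no longer a pending *unrequested* prefetch
        self.pending = [a for a in self.pending if a != addr]
        if self.stride:
            # add-function step: next operand addresses = current + k * stride
            for k in range(1, self.depth + 1):
                nxt = addr + k * self.stride
                if nxt not in self.pending:
                    self.pending.append(nxt)

    def on_loop_termination(self, weakly_predicted=False):
        """Cancel pending unrequested prefetches on an identified loop
        termination; a weakly predicted termination cancels nothing."""
        if weakly_predicted:
            return 0
        cancelled = len(self.pending)
        self.pending.clear()
        return cancelled


# A loop striding through memory 64 bytes at a time, then terminating.
pf = StridePrefetcher(depth=4)
for addr in range(0x1000, 0x1000 + 5 * 64, 64):
    pf.observe_access(addr)
cancelled = pf.on_loop_termination()  # the in-flight requests are dropped
```

In this model the five observed accesses leave four speculative addresses in flight past the final iteration; identifying the loop-ending branch cancels all of them, which is exactly the cache-pollution reduction the claims target.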
HUE14704714A 2013-01-21 2014-01-18 Eljárás és berendezés adat-előtöltési kérelmek törlésére egy ciklushoz HUE035210T2 (hu)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/746,000 US9519586B2 (en) 2013-01-21 2013-01-21 Methods and apparatus to reduce cache pollution caused by data prefetching

Publications (1)

Publication Number Publication Date
HUE035210T2 true HUE035210T2 (hu) 2018-05-02

Family

ID=50113017

Family Applications (1)

Application Number Title Priority Date Filing Date
HUE14704714A HUE035210T2 (hu) 2013-01-21 2014-01-18 Eljárás és berendezés adat-előtöltési kérelmek törlésére egy ciklushoz

Country Status (10)

Country Link
US (1) US9519586B2 (hu)
EP (1) EP2946286B1 (hu)
JP (1) JP6143886B2 (hu)
KR (1) KR101788683B1 (hu)
CN (1) CN105074655B (hu)
BR (1) BR112015017103B1 (hu)
ES (1) ES2655852T3 (hu)
HU (1) HUE035210T2 (hu)
TW (1) TWI521347B (hu)
WO (1) WO2014113741A1 (hu)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9424046B2 (en) * 2012-10-11 2016-08-23 Soft Machines Inc. Systems and methods for load canceling in a processor that is connected to an external interconnect fabric
US9348754B2 (en) 2012-10-11 2016-05-24 Soft Machines Inc. Systems and methods for implementing weak stream software data and instruction prefetching using a hardware data prefetcher
CN104133691B (zh) * 2014-05-05 2016-08-31 腾讯科技(深圳)有限公司 加速启动的方法及装置
US20160283243A1 (en) * 2015-03-28 2016-09-29 Yong-Kyu Jung Branch look-ahead instruction disassembling, assembling, and delivering system apparatus and method for microprocessor system
CN107710153B (zh) * 2015-07-09 2022-03-01 森蒂彼得塞米有限公司 具有有效的存储器访问的处理器
US10275249B1 (en) * 2015-10-15 2019-04-30 Marvell International Ltd. Method and apparatus for predicting end of loop
US10528352B2 (en) 2016-03-08 2020-01-07 International Business Machines Corporation Blocking instruction fetching in a computer processor
US10175987B2 (en) 2016-03-17 2019-01-08 International Business Machines Corporation Instruction prefetching in a computer processor using a prefetch prediction vector
US10474578B2 (en) * 2017-08-30 2019-11-12 Oracle International Corporation Utilization-based throttling of hardware prefetchers
GB2572954B (en) * 2018-04-16 2020-12-30 Advanced Risc Mach Ltd An apparatus and method for prefetching data items
US10649777B2 (en) * 2018-05-14 2020-05-12 International Business Machines Corporation Hardware-based data prefetching based on loop-unrolled instructions
GB2574270B (en) * 2018-06-01 2020-09-09 Advanced Risc Mach Ltd Speculation-restricted memory region type
US11216279B2 (en) * 2018-11-26 2022-01-04 Advanced Micro Devices, Inc. Loop exit predictor
US10884749B2 (en) 2019-03-26 2021-01-05 International Business Machines Corporation Control of speculative demand loads
US10963388B2 (en) * 2019-06-24 2021-03-30 Samsung Electronics Co., Ltd. Prefetching in a lower level exclusive cache hierarchy
CN110442382B (zh) * 2019-07-31 2021-06-15 西安芯海微电子科技有限公司 预取缓存控制方法、装置、芯片以及计算机可读存储介质
US11150812B2 (en) * 2019-08-20 2021-10-19 Micron Technology, Inc. Predictive memory management
CN111541722B (zh) * 2020-05-22 2022-03-18 哈尔滨工程大学 基于密度聚类的信息中心网络缓存污染攻击检测防御方法
US11630654B2 (en) * 2021-08-19 2023-04-18 International Business Machines Corporation Analysis for modeling data cache utilization

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02287828A (ja) * 1989-04-28 1990-11-27 Fujitsu Ltd プリフェッチ制御方式
JPH0439733A (ja) * 1990-06-06 1992-02-10 Fujitsu Ltd 先行制御方式
JPH04344935A (ja) * 1991-05-23 1992-12-01 Nec Corp 情報処理装置
JPH10232775A (ja) * 1997-02-20 1998-09-02 Hitachi Ltd プリフェッチ機構
US5996061A (en) * 1997-06-25 1999-11-30 Sun Microsystems, Inc. Method for invalidating data identified by software compiler
US6430680B1 (en) 1998-03-31 2002-08-06 International Business Machines Corporation Processor and method of prefetching data based upon a detected stride
US6260116B1 (en) 1998-07-01 2001-07-10 International Business Machines Corporation System and method for prefetching data
US6611910B2 (en) * 1998-10-12 2003-08-26 Idea Corporation Method for processing branch operations
US6446143B1 (en) * 1998-11-25 2002-09-03 Compaq Information Technologies Group, L.P. Methods and apparatus for minimizing the impact of excessive instruction retrieval
US6321330B1 (en) 1999-05-28 2001-11-20 Intel Corporation Each iteration array selective loop data prefetch in multiple data width prefetch system using rotating register and parameterization to avoid redundant prefetch
US6799263B1 (en) * 1999-10-28 2004-09-28 Hewlett-Packard Development Company, L.P. Prefetch instruction for an unpredicted path including a flush field for indicating whether earlier prefetches are to be discarded and whether in-progress prefetches are to be aborted
US6775765B1 (en) * 2000-02-07 2004-08-10 Freescale Semiconductor, Inc. Data processing system having instruction folding and method thereof
US20020144054A1 (en) * 2001-03-30 2002-10-03 Fanning Blaise B. Prefetch canceling based on most recent accesses
JP3683248B2 (ja) * 2002-10-22 2005-08-17 富士通株式会社 情報処理装置及び情報処理方法
US7194582B1 (en) 2003-05-30 2007-03-20 Mips Technologies, Inc. Microprocessor with improved data stream prefetching
US7526604B1 (en) 2004-08-09 2009-04-28 Nvidia Corporation Command queueing speculative write prefetch
US7587580B2 (en) * 2005-02-03 2009-09-08 Qualcomm Corporated Power efficient instruction prefetch mechanism
US8589666B2 (en) 2006-07-10 2013-11-19 Src Computers, Inc. Elimination of stream consumer loop overshoot effects
US7917701B2 (en) * 2007-03-12 2011-03-29 Arm Limited Cache circuitry, data processing apparatus and method for prefetching data by selecting one of a first prefetch linefill operation and a second prefetch linefill operation
US7640420B2 (en) * 2007-04-02 2009-12-29 Intel Corporation Pre-fetch apparatus
GB0722707D0 (en) 2007-11-19 2007-12-27 St Microelectronics Res & Dev Cache memory
US8479053B2 (en) 2010-07-28 2013-07-02 Intel Corporation Processor with last branch record register storing transaction indicator
US8661169B2 (en) 2010-09-15 2014-02-25 Lsi Corporation Copying data to a cache using direct memory access
US8977819B2 (en) 2010-09-21 2015-03-10 Texas Instruments Incorporated Prefetch stream filter with FIFO allocation and stream direction prediction

Also Published As

Publication number Publication date
TWI521347B (zh) 2016-02-11
JP2016507836A (ja) 2016-03-10
EP2946286B1 (en) 2017-10-25
CN105074655A (zh) 2015-11-18
KR101788683B1 (ko) 2017-10-20
EP2946286A1 (en) 2015-11-25
US9519586B2 (en) 2016-12-13
WO2014113741A1 (en) 2014-07-24
BR112015017103B1 (pt) 2022-01-11
US20140208039A1 (en) 2014-07-24
KR20150110588A (ko) 2015-10-02
BR112015017103A2 (pt) 2017-07-11
TW201443645A (zh) 2014-11-16
CN105074655B (zh) 2018-04-06
ES2655852T3 (es) 2018-02-21
JP6143886B2 (ja) 2017-06-07

Similar Documents

Publication Publication Date Title
HUE035210T2 (hu) Eljárás és berendezés adat-előtöltési kérelmek törlésére egy ciklushoz
EP2467776B1 (en) Methods and apparatus to predict non-execution of conditional non-branching instructions
JP5357017B2 (ja) 高速で安価なストア−ロード競合スケジューリング及び転送機構
KR101364314B1 (ko) 분배형 프레디킷 예측을 제공하기 위한 방법, 시스템 및 컴퓨터 액세스가능한 매체
US8612944B2 (en) Code evaluation for in-order processing
US7539851B2 (en) Using register readiness to facilitate value prediction
US10261789B2 (en) Data processing apparatus and method for controlling performance of speculative vector operations
US9367317B2 (en) Loop streaming detector for standard and complex instruction types
EP3871093B1 (en) Processor memory reordering hints in a bit-accurate trace
CN112219193A (zh) 一种处理器性能的监测方法及装置
US11048609B2 (en) Commit window move element
US20050144604A1 (en) Methods and apparatus for software value prediction
KR20140111416A (ko) 정적 스케쥴 프로세서의 논블로킹 실행 장치 및 방법
John Effectiveness of SPEC CPU2006 and multimedia applications on Intel's single, dual and quad core processors