CN105354010A - Processor and method for performing hardware data prefetching by a processor - Google Patents

Processor and method for performing hardware data prefetching by a processor

Info

Publication number
CN105354010A
CN105354010A (application CN201510683936.3A)
Authority
CN
China
Prior art keywords
characteristic
processor
prefetching
predetermined program
hardware
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510683936.3A
Other languages
Chinese (zh)
Other versions
CN105354010B (en
Inventor
罗德尼·E·虎克
艾伯特·J·娄坡
约翰·麦可·吉尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 14/625,124 (external priority; see US10514920B2)
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Publication of CN105354010A publication Critical patent/CN105354010A/en
Application granted granted Critical
Publication of CN105354010B publication Critical patent/CN105354010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047Prefetch instructions; cache control instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A processor and a method for performing hardware data prefetching by the processor are provided. The processor comprises a processing core that detects whether a predetermined program is running on the processor and looks up a prefetch characteristic associated with the predetermined program, wherein the prefetch characteristic is exclusive or shared. The processor also includes a hardware data prefetcher that performs hardware prefetching for the predetermined program using the prefetch characteristic. The invention can perform, at run time, the analysis that changes the prefetch characteristic; with software prefetching, by contrast, it is difficult to determine at compile time when other memory access agents will access which memory block.

Description

Processor and method for performing hardware data prefetching by a processor
Technical field
The present invention relates to data prefetching in processors. This application claims priority to U.S. Provisional Application No. 62/066,131, filed October 20, 2014, which is hereby incorporated by reference in its entirety.
Background technology
Because the processor's access time to system memory continues to grow relative to its access time to cache memory, the need for better prefetching approaches is becoming more pronounced. For example, Mowry describes a way of modifying a compiler to prefetch in exclusive mode: when the compiler performs locality analysis on sectioned memory, it refers to "equivalence classes, which may be sets as small as a single reference," and inserts "exclusive-mode prefetches rather than shared-mode prefetches for a given equivalence class if at least one member of the equivalence class is a write." See Mowry, Todd Carl, "Tolerating Latency Through Software-Controlled Data Prefetching," Ph.D. dissertation, Stanford University, 1994, p. 89.
One shortcoming of a software-based prefetching approach such as that described by Mowry is that the prefetch instructions written into the program increase its code size. The larger code size may require more space on the system's mass-storage medium (e.g., a hard disk) to hold the larger program, and the larger program also occupies more space in system memory while it runs. The extra instructions also consume processor resources, such as dispatch slots, reservation station slots, and execution unit slots, all of which may negatively affect processor performance; more particularly, they reduce the effective lookahead distance within the instruction window, and can therefore significantly reduce the instruction-level parallelism the processor is able to exploit. Another shortcoming is that it cannot benefit all programs that run on the processor, but only those compiled with an optimizing compiler and, typically, profiled.
Summary of the invention
The invention provides a processor. The processor comprises a core that detects whether a predetermined program is running on the processor; the core also looks up a prefetch characteristic associated with the predetermined program running on the processor, wherein the prefetch characteristic is exclusive or shared. The processor also comprises a hardware data prefetcher that uses the prefetch characteristic to perform hardware prefetching for the predetermined program.
The invention also provides a method for hardware data prefetching performed by a processor. The method comprises looking up a prefetch characteristic associated with a predetermined program running on the processor, wherein the prefetch characteristic is exclusive or shared. The method also comprises using the prefetch characteristic to perform hardware prefetching for the predetermined program.
The invention further provides a processor. The processor comprises a core that detects whether a predetermined program is running on the processor and, in response to detecting that the predetermined program is running on the processor, loads a respective address range into each of one or more range registers of the processor, each of the one or more range registers having an associated prefetch characteristic, wherein the prefetch characteristic is exclusive or shared. The processor also comprises a hardware data prefetcher that performs hardware prefetching for the predetermined program using the prefetch characteristic associated with the address range loaded into the range register.
The invention further provides a method for hardware data prefetching performed by a processor. The method comprises detecting whether a predetermined program is running on the processor. The method also comprises, in response to detecting that the predetermined program is running on the processor, loading a respective address range into each of one or more range registers of the processor, each of the one or more range registers having an associated prefetch characteristic, wherein the prefetch characteristic is exclusive or shared. The method also comprises performing hardware prefetching for the predetermined program using the prefetch characteristic associated with the address range loaded into the range register.
The invention can observe, at run time, the accesses of the other memory access agents to a memory block, i.e., it performs the analysis that changes the prefetch characteristic as the accesses occur; for software prefetching, by contrast, it is difficult to determine at compile time when other memory access agents will access which memory block.
Accompanying drawing explanation
Fig. 1 is a block diagram of a computer system according to one embodiment of the invention.
Fig. 2 is a detailed block diagram of the hardware data prefetcher of Fig. 1.
Fig. 3 is a flowchart of the operation of the system of Fig. 1.
Figs. 4 to 11 are flowcharts of the operation of the system of Fig. 1 to dynamically update the prefetch characteristic based on analysis of accesses to a memory block by multiple memory access agents.
Fig. 12 is a flowchart of operation that uses offline program analysis to determine the prefetch characteristic used to perform hardware prefetching.
Fig. 13 is a block diagram depicting a plurality of range registers.
Fig. 14 is a flowchart of the operation of the system of Fig. 1 to dynamically update the prefetch characteristic based on analysis of accesses to a memory block by multiple memory access agents.
The reference symbols in the drawings are briefly described as follows:
100: computing system
101: memory access agent
102: core
103: processor
104: graphics processing unit (GPU)
106: direct memory access (DMA) device
108: system memory
112: bus
114: memory block
122: hardware data prefetcher
124: last-level cache (LLC)
132: prefetch characteristic
202: memory access history
204: update module
206: prefetch module
212: part of memory access history 202
208: prefetch request
232: instruction fetches
234: program loads/stores
236: snoops
1302: address range field
1304: prefetch characteristic field
302-312, 402-406, 502-506, 602-608, 702-712, 802-812, 902-912, 1002-1008, 1102-1112, 1202-1208, 1402-1408: steps.
Embodiment
< Terminology >
A memory access agent is a device that accesses system memory. For example, a processing core, a graphics processing unit, and a peripheral device that performs direct memory access (DMA) are all memory access agents.
A hardware data prefetcher reads data from system memory based on a prediction of the data a memory access agent will need in the future. In particular, as described herein, a hardware prefetch is not a software prefetch, i.e., it is not a read of data from system memory performed by the processor because the processor executed an architectural prefetch instruction. Consequently, the processor performs hardware prefetches based on analysis the processor performs at run time (i.e., analysis of memory accesses occurring concurrently with the hardware prefetching). In contrast, the analysis associated with software prefetching, in which architectural prefetch instructions are inserted into the program (e.g., at compile time), is performed before the program runs and therefore cannot occur concurrently with the software prefetches. The data read by a hardware prefetch may be instructions to be executed by the processor, or non-instruction data such as the operands of instructions the processor executes.
A memory block is a sequence of memory locations, for example a page of system memory.
A prefetch characteristic is a property indicating either that exclusive ownership of the cache line implicated by the prefetch is desired (an exclusive prefetch characteristic) or that other memory access agents are permitted to retain a copy of the cache line being prefetched (a shared prefetch characteristic). When a prefetch uses the exclusive prefetch characteristic, the bus transaction includes a command to each other memory access agent to invalidate its local copy of the cache line (and to write back the current data value if the copy has been modified); such a transaction is commonly referred to as a read-invalidate bus transaction, a read-with-intent-to-modify bus transaction, a read-for-ownership bus transaction, or a similar term. Conversely, when a prefetch uses the shared prefetch characteristic, the bus transaction allows each other memory access agent to retain its local copy of the cache line in a shared state; such a transaction is commonly referred to as a simple read bus transaction, a read-shared bus transaction, or a similar term.
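The mapping just described, from the prefetch characteristic to the kind of bus transaction performed, can be sketched as a small behavioral model. This is purely illustrative and not part of the patent disclosure; the transaction names are the generic ones mentioned above.

```python
from enum import Enum

class PrefetchCharacteristic(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"

def bus_transaction_for(characteristic: PrefetchCharacteristic) -> str:
    """Return the kind of bus transaction the prefetch implies."""
    if characteristic is PrefetchCharacteristic.EXCLUSIVE:
        # Each other agent must invalidate its local copy of the line,
        # writing back the current data first if the copy was modified.
        return "read-invalidate"
    # Shared: each other agent may keep its copy in the shared state.
    return "read-shared"
```

The model makes explicit that the characteristic selects between a single ownership-acquiring transaction and a copy-preserving one.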
Fig. 1 illustrates a block diagram of a computer system 100 according to one embodiment of the invention. The computer system 100 comprises memory access agents 101 that share a system memory 108 and access it over a bus 112. The memory access agents 101 may include a peripheral device 106 that performs direct memory access (DMA), a graphics processing unit (GPU) 104, and a processor 103. The processor 103 comprises multiple processing cores 102, a last-level cache (LLC) 124 shared by the cores 102, and a hardware data prefetcher 122; the GPU 104 and the DMA device 106 may also each comprise a hardware data prefetcher 122. Although Fig. 1 shows two cores 102, embodiments with other numbers of cores may also employ the techniques of the present invention.
The hardware data prefetcher 122 comprises a prefetch characteristic 132, which the hardware data prefetcher 122 uses when performing hardware prefetches from a memory block 114 of the system memory 108; the prefetch characteristic 132 has a value of exclusive or shared. The hardware data prefetcher 122 dynamically and selectively updates the prefetch characteristic 132 based on analysis of accesses to the memory block 114 by the memory access agents 101. The hardware data prefetcher 122 is described further with respect to Fig. 2 and the other figures below.
The processor 103 may comprise a bus interface unit that interfaces the processor 103 to the bus 112, and each core 102 comprises an instruction cache, an instruction decoder, an instruction dispatcher, a memory subsystem (e.g., load/store units and memory buffers), other execution units, and a local data cache (e.g., a level-1 data cache).
When the hardware data prefetcher 122 makes a hardware prefetch request to the bus interface unit, it provides the prefetch characteristic 132 (i.e., shared or exclusive) with the request. In response, the bus interface unit performs a transaction on the bus 112 to obtain ownership of the cache line implicated by the hardware prefetch request. If the prefetch characteristic 132 is exclusive, the bus interface unit performs a transaction that instructs the other memory access agents 101 to invalidate their local copies of the cache line and to write back the current data value if the local copy has been modified. If the prefetch characteristic 132 is shared, the bus interface unit performs a bus transaction that allows each other memory access agent to retain its local copy of the cache line.
When a cache line is prefetched into a cache of the processor 103, it may be prefetched in a state that is exclusive with respect to the other cores 102, or with respect to the other memory access agents 101 sharing the system memory 108, or in a state that is shared with them. For example, if a cache line is going to be shared by multiple cores 102, it may be more efficient to prefetch the cache line in a shared state; however, if the core 102 that prefetched the cache line is going to write to it, it may be more efficient to prefetch the cache line exclusively, i.e., unshared.
Referring now to Fig. 2, a block diagram of the hardware data prefetcher 122 of Fig. 1 is shown. The hardware data prefetcher 122 comprises an update module 204 that receives information from a memory access history 202. The memory access history 202 comprises information about accesses to the system memory 108 by the memory access agents 101; specifically, the memory access history 202 includes instruction fetches 232 performed by each core 102 from the system memory 108, program loads/stores 234 performed by the cores 102 to the system memory 108, and snoops 236 generated in response to accesses to the system memory 108 on the bus 112 (i.e., accesses to the system memory 108 generated by one of the memory access agents 101 other than the one comprising the hardware data prefetcher 122). The information may include, but is not limited to, the memory address, the access type (e.g., instruction fetch, load, store), and an identifier of the memory access agent 101 (which may include an identifier of the core 102 that generated the access). Preferably, the hardware data prefetcher 122 maintains a separate prefetch characteristic 132 and a separate memory access history 202 for each active memory block 114 of the system memory 108 accessed by the processor 103. The update module 204 updates the prefetch characteristic 132 based on analysis of the memory access history 202, as described in the embodiments below.
The hardware data prefetcher 122 also comprises a prefetch module 206 that receives the prefetch characteristic 132. The prefetch module 206 analyzes the memory access history of the cores 102 and, based on this analysis, predicts which data the cores 102 will need in the future; to that end, the prefetch module 206 also receives the part 212 of the memory access history 202 associated with the cores 102. The prefetch module 206 performs hardware prefetches of the predicted data by generating prefetch requests 208 that include the prefetch characteristic 132 to the bus interface unit. The prefetch characteristic may have a default value, i.e., shared or exclusive; for example, the default value may be set at manufacturing time by selectively blowing fuses of the core 102, or via a microcode constant of the core 102. The prefetch module 206 may prefetch one or more cache lines considered valuable from the system memory 108 into the last-level cache 124 and/or into a lower-level cache within the cache hierarchy of the processor 103 (e.g., a private cache of a core 102).
Referring now to Fig. 3, a flowchart of the operation of the system of Fig. 1 is shown.
At step 302, the memory access agents 101 access a memory block 114 of the system memory 108; these accesses may include accesses to the memory block 114 by the cores 102, as described at step 306. The hardware data prefetcher 122 accumulates information about the accesses in the memory access history 202 associated with each active memory block 114. Flow proceeds to step 304.
At step 304, the update module 204 analyzes the accesses of the memory access agents 101 to the memory block 114 and dynamically updates the prefetch characteristic 132 associated with the memory block 114 based on the analysis. The update module 204 continues to analyze and update the prefetch characteristic 132 while the prefetch module 206 continues to perform hardware prefetches from the memory block 114 at step 312. The operation shown at steps 304 through 312 of Fig. 3 is explained further in the embodiments of the subsequent figures.
At step 306, the cores 102 execute programs, which includes fetching program instructions from the system memory 108 and performing loads/stores to the system memory 108 in response to executing the fetched program instructions. More specifically, the instruction fetches, loads, and stores access memory blocks 114 (e.g., memory pages) of the system memory 108; generally, the accesses will be to multiple memory blocks 114. The hardware data prefetcher 122 accumulates information about the accesses in the memory access history 202 associated with each active memory block 114. Flow proceeds from step 306 to step 308.
At step 308, the prefetch module 206 predicts which data of the memory block 114 the cores 102 will need, based on the part 212 of the memory access history 202 of accesses by the cores 102 to the memory block 114 accumulated at step 306. Flow proceeds from step 308 to step 312.
At step 312, the prefetch module 206 performs hardware prefetches of the data predicted at step 308, using the prefetch characteristic 132 dynamically updated at step 304. Although steps 302 through 304 show the memory accesses by the memory access agents 101 driving the updating of the prefetch characteristic, it should be noted that the memory accesses performed by the memory access agents 101 at step 302 and the dynamic updating of the prefetch characteristic 132 at step 304 may occur concurrently. Likewise, although steps 306, 308, and 312 show the prediction driven by the memory accesses of the cores 102, and the hardware prefetching driven by that prediction using the dynamically updated prefetch characteristic, it should be noted that the memory accesses performed by the cores 102 at step 306 and the prediction at step 308 may occur concurrently with the hardware prefetching at step 312. As shown in Fig. 3, flow returns from step 312 to steps 302 and 306; because the operations at steps 302 and 304 occur concurrently with those at steps 306, 308, and 312, the prefetching performed at step 312 is hardware prefetching rather than software prefetching.
It should be noted that although the flow above describes operation with respect to a single memory block, the hardware data prefetcher 122 may perform hardware data prefetching from multiple memory blocks 114 concurrently, using the dynamically updated prefetch characteristic 132 of each. Preferably, the hardware data prefetcher 122 maintains an associated, dynamically updated prefetch characteristic 132 for each memory block 114 from which it performs hardware prefetches.
One benefit of prefetching a cache line exclusively rather than shared is that doing so results in a single bus transaction rather than two: instead of a first transaction that requests the data followed by a second transaction that obtains the data exclusively with ownership, an exclusive prefetch is a single transaction that both requests the data and requests it exclusively. This is especially beneficial for multi-chip multi-core processors and for architectures in which each core has its own last-level cache.
A benefit of the hardware prefetching described herein, which dynamically changes the prefetch characteristic between shared and exclusive, relative to a software prefetching solution is that the hardware prefetching solution can observe at run time the accesses of the other memory access agents to the memory block, i.e., it performs the analysis that changes the prefetch characteristic as the accesses occur; whereas with a software prefetching solution it is difficult to determine at compile time when the other memory access agents will access which memory block.
Referring now to Fig. 4, a flowchart is shown of dynamically updating the prefetch characteristic 132 based on analysis of accesses to a memory block 114 by the memory access agents 101 of Fig. 1. Flow begins at step 402.
At step 402, the prefetch characteristic 132 of the memory block 114 initially has the exclusive value, either because the default value is exclusive (as described above) or because the prefetch characteristic 132 of the memory block 114 was initialized to exclusive based on an initial access to the memory block 114 (e.g., as described with respect to Fig. 6 or Fig. 10). Generally speaking, if a core 102 reads data it is also likely to update that data, and, generally speaking, the data within a memory block 114 usually has a similar character. Therefore, as described above, prefetching the cache lines exclusively, so that a single bus transaction is performed rather than multiple bus transactions, can reduce traffic on the bus 112 and reduce latency. Flow proceeds to step 404.
At step 404, the hardware data prefetcher 122 is notified that a cache line in the memory block 114 has been snooped by another memory access agent 101 with intent to write the cache line, which causes an update of the memory access history 202. This may indicate that data in other cache lines of the memory block 114 will also be written by other memory access agents 101. In that case, exclusive prefetching of those cache lines may have a detrimental effect, because ownership of the cache lines may bounce between the core 102 and the other memory access agents 101. Flow proceeds to step 406.
At step 406, the update module 204 updates the prefetch characteristic 132 to shared in response to the snoop of step 404. Flow ends at step 406.
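The Fig. 4 policy can be condensed into a one-rule behavioral model. This is an illustrative sketch, not hardware from the disclosure: the block starts exclusive, and a snooped access from another agent with intent to write downgrades it to shared.

```python
def update_on_snoop(characteristic: str, snooped_write_intent: bool) -> str:
    """Fig. 4 policy sketch: a snoop with write intent from another
    memory access agent downgrades the block's prefetch characteristic
    to "shared"; any other snoop leaves it unchanged."""
    if snooped_write_intent:
        return "shared"
    return characteristic
```

Read-only snoops leave the exclusive characteristic in place; only observed write intent triggers the downgrade.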
Referring now to Fig. 5, a flowchart is shown of dynamically updating the prefetch characteristic 132 based on analysis of accesses to the memory block 114 by the memory access agents 101 of Fig. 1. Flow begins at step 502.
At step 502, the prefetch characteristic 132 of each memory block 114 is initially set to shared, either because the default value is shared (as described above) or because the prefetch characteristic 132 of the memory block 114 was initialized to shared based on an initial access to the memory block 114 (e.g., as described with respect to Fig. 6 or Fig. 10). Flow proceeds to step 504.
At step 504, the hardware data prefetcher 122 keeps track of (e.g., records in the memory access history 202) the number of cache lines of the memory block 114 written by the cores 102, and detects that the number has exceeded a threshold. This may indicate that data in other cache lines of the memory block 114 will also be written by the cores 102, in which case prefetching those cache lines exclusively may be advantageous, for the reasons described above. The threshold may be a predetermined value, a value programmed by system software, or a value dynamically updated by the hardware data prefetcher 122 based on analysis of prefetch effectiveness. In one embodiment, the threshold is one, i.e., the prefetch characteristic 132 is updated to exclusive upon the first write to the memory block 114. Flow proceeds to step 506.
At step 506, the update module 204 updates the prefetch characteristic 132 to exclusive in response to the threshold being exceeded at step 504. Flow ends at step 506.
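The Fig. 5 write-count policy can likewise be modeled in a few lines. This is an illustrative sketch under one reading of the text: with a threshold of one, the first write flips the characteristic, matching the one-embodiment note above.

```python
class WriteCountPolicy:
    """Fig. 5 policy sketch: the block starts shared; once the cores'
    writes to the block reach the threshold, the prefetch characteristic
    becomes exclusive. Illustrative model, not hardware."""
    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.writes = 0
        self.characteristic = "shared"

    def record_write(self) -> str:
        self.writes += 1
        if self.writes >= self.threshold:
            self.characteristic = "exclusive"
        return self.characteristic
```

A larger threshold makes the policy more conservative, tolerating a few scattered writes before committing to exclusive prefetches.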
Referring now to Fig. 6, a flowchart is shown of dynamically updating the prefetch characteristic 132 based on analysis of accesses to the memory block 114 by the memory access agents 101 of Fig. 1. Flow begins at step 602.
At step 602, the update module 204 records an initial access by a core 102 to the memory block 114. Flow proceeds to step 604.
At decision step 604, the update module 204 determines whether the initial access was an instruction fetch or a load/store. If an instruction fetch, flow proceeds to step 606; otherwise, flow proceeds to step 608.
At step 606, the update module 204 updates the prefetch characteristic 132 to shared in response to determining at step 604 that the initial access was an instruction fetch. This is helpful because, when an instruction fetch is performed from a memory block 114, the remaining accesses to the memory block 114 are also likely to be instruction fetches, and memory locations holding instructions, once written to memory, are generally not written again. In one embodiment, the hardware data prefetcher 122 continues to perform hardware prefetches from the memory block 114 using the shared prefetch characteristic 132 dynamically updated at step 606; however, as described in other embodiments of this specification, as the hardware data prefetcher 122 monitors and analyzes accesses to the memory block, the initial prefetch characteristic 132 may subsequently be updated from shared to exclusive (and vice versa). Flow ends at step 606.
At step 608, the update module 204 updates the prefetch characteristic 132 to exclusive in response to determining at step 604 that the initial access was a load/store. In one embodiment, the hardware data prefetcher 122 continues to perform hardware prefetches from the memory block 114 using the exclusive prefetch characteristic 132 dynamically updated at step 608; however, as described in other embodiments of this specification, as the hardware data prefetcher 122 monitors and analyzes accesses to the memory block, the initial prefetch characteristic 132 may subsequently be updated from exclusive to shared (and vice versa). Flow ends at step 608.
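The initial-access heuristic of Fig. 6 reduces to a single dispatch on the access type. The sketch below is illustrative only; the access-type strings are assumptions.

```python
def initial_characteristic(access_type: str) -> str:
    """Fig. 6 policy sketch: an initial instruction fetch suggests the
    block holds code (read-mostly), so initialize to "shared"; an
    initial load/store suggests data that may be written, so
    initialize to "exclusive"."""
    if access_type == "instruction_fetch":
        return "shared"
    if access_type in ("load", "store"):
        return "exclusive"
    raise ValueError(f"unknown access type: {access_type}")
```

Either initial value may later be overridden by the monitoring policies of the other figures.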
Referring now to Fig. 7, a flowchart is shown of dynamically updating the prefetch characteristic 132 based on analysis of accesses to the memory block 114 by the memory access agents 101 of Fig. 1. Flow begins at step 702.
At step 702, the update module 204 maintains (e.g., records in the memory access history 202) a count of instruction fetches performed by the cores 102 from the memory block 114, denoted fetch_cnt, and a count of program loads/stores performed from the memory block 114, denoted load_store_cnt. Flow proceeds to step 704.
At decision step 704, the update module 204 determines whether fetch_cnt is greater than load_store_cnt. If so, flow proceeds to step 706; otherwise, flow proceeds to step 708.
At step 706, the update module 204 updates the prefetch characteristic 132 to shared in response to determining at step 704 that fetch_cnt is greater than load_store_cnt. Flow ends at step 706.
At decision step 708, the update module 204 determines whether fetch_cnt is less than load_store_cnt. If so, flow proceeds to step 712; otherwise, flow ends.
At step 712, the update module 204 updates the prefetch characteristic 132 to exclusive in response to determining at step 708 that fetch_cnt is less than load_store_cnt. Flow ends at step 712.
Referring now to Fig. 8, a flowchart illustrating dynamic updating of the prefetch trait 132 based on analysis of accesses to the memory block 114 of Fig. 1 by the memory access agents 101 is shown. Flow begins at step 802.
At step 802, the hardware data prefetcher 122 maintains a count of instruction fetches performed by the cores 102 from the memory block 114, denoted fetch_cnt, and a count of program loads/stores from the memory block 114, denoted load_store_cnt (e.g., recorded in the memory access history 202). Flow proceeds to step 804.
At decision step 804, the update module 204 determines whether the difference between fetch_cnt and load_store_cnt is greater than a threshold. If so, flow proceeds to step 806; otherwise, flow proceeds to step 808. The threshold may be a predetermined value, a value programmed by system software, or a value dynamically updated by the hardware data prefetcher 122 based on analysis of prefetch effectiveness.
At step 806, the update module 204 updates the prefetch trait 132 to shared in response to determining at step 804 that the difference between fetch_cnt and load_store_cnt is greater than the threshold. Flow ends at step 806.
At decision step 808, the update module 204 determines whether the difference between load_store_cnt and fetch_cnt is greater than a threshold. If so, flow proceeds to step 812; otherwise, flow ends. The threshold used at step 808 may be the same as or different from the threshold used at step 804.
At step 812, the update module 204 updates the prefetch trait 132 to exclusive in response to determining at step 808 that the difference between load_store_cnt and fetch_cnt is greater than the threshold. Flow ends at step 812.
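The Fig. 8 policy amounts to comparing the two counts against thresholds in each direction. The sketch below is illustrative only; the function name, the string trait encoding, and the use of two separate threshold parameters are assumptions (the patent allows the two thresholds to be the same or different).

```python
def update_trait_fig8(fetch_cnt, load_store_cnt, current_trait,
                      shared_threshold, exclusive_threshold):
    """Sketch of Fig. 8: instruction fetches dominating loads/stores by
    more than a threshold selects shared; loads/stores dominating by
    more than a (possibly different) threshold selects exclusive."""
    if fetch_cnt - load_store_cnt > shared_threshold:     # step 804
        return "shared"                                   # step 806
    if load_store_cnt - fetch_cnt > exclusive_threshold:  # step 808
        return "exclusive"                                # step 812
    return current_trait                                  # flow ends


# Fetch-heavy access pattern: trait moves to shared.
update_trait_fig8(10, 2, "exclusive", 5, 5)
```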
Referring now to Fig. 9, a flowchart illustrating dynamic updating of the prefetch trait 132 based on analysis of accesses to the memory block 114 of Fig. 1 by the memory access agents 101 is shown. Flow begins at step 902.
At step 902, the hardware data prefetcher 122 maintains a count of instruction fetches performed by the cores 102 from the memory block 114, denoted fetch_cnt, and a count of program loads/stores from the memory block 114, denoted load_store_cnt (e.g., recorded in the memory access history 202). Flow proceeds to step 904.
At decision step 904, the update module 204 determines whether the difference between fetch_cnt and load_store_cnt is greater than a threshold. If so, flow proceeds to step 906; otherwise, flow proceeds to step 908.
At step 906, the update module 204 updates the prefetch trait 132 to shared in response to determining at step 904 that the difference between fetch_cnt and load_store_cnt is greater than the threshold. Flow ends at step 906.
At decision step 908, the update module 204 determines whether the difference between fetch_cnt and load_store_cnt is less than a threshold. If so, flow proceeds to step 912; otherwise, flow ends. The threshold used at step 908 may be the same as or different from the threshold used at step 904.
At step 912, the update module 204 updates the prefetch trait 132 to exclusive in response to determining at step 908 that the difference between fetch_cnt and load_store_cnt is less than the threshold. Flow ends at step 912.
Referring now to Fig. 10, a flowchart illustrating dynamic updating of the prefetch trait 132 based on analysis of accesses to the memory block 114 of Fig. 1 by the memory access agents 101 is shown. Flow begins at step 1002.
At step 1002, the update module 204 records the initial access by a core 102 to the memory block 114. Flow proceeds to step 1004.
At decision step 1004, the update module 204 determines whether the initial access is a load or a store. If a load, flow proceeds to step 1006; otherwise, flow proceeds to step 1008. As used herein, a load access includes both an instruction fetch and a load performed by a program load instruction.
At step 1006, the update module 204 updates the prefetch trait 132 to shared in response to determining at step 1004 that the initial access is a load. In one embodiment, the hardware data prefetcher 122 continues to perform hardware prefetches from the memory block 114 using the shared prefetch trait 132 dynamically updated at step 1006; however, as described with respect to other embodiments, the initial prefetch trait 132 may subsequently be updated from shared to exclusive (and vice versa) as the hardware data prefetcher 122 monitors and analyzes accesses to the memory block. Flow ends at step 1006.
At step 1008, the update module 204 updates the prefetch trait 132 to exclusive in response to determining at step 1004 that the initial access is a store. This is helpful because when a store to the memory block 114 is observed, it is likely that subsequent accesses to the memory block 114 will also be stores. In one embodiment, the hardware data prefetcher 122 continues to perform hardware prefetches from the memory block 114 using the exclusive prefetch trait 132 dynamically updated at step 1008; however, as described with respect to other embodiments, the initial prefetch trait 132 may subsequently be updated from exclusive to shared (and vice versa) as the hardware data prefetcher 122 monitors and analyzes accesses to the memory block. Flow ends at step 1008.
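The Fig. 10 seeding rule is the simplest of the policies: the very first access to a memory block 114 determines the initial trait, which later monitoring may revise. A sketch, with illustrative names and encodings:

```python
def initial_trait_fig10(initial_access):
    """Sketch of Fig. 10: seed the prefetch trait 132 from the first
    access to a memory block 114. A load (which here includes an
    instruction fetch) seeds shared; a store seeds exclusive. Other
    embodiments may later flip the trait as accesses are analyzed."""
    return "shared" if initial_access == "load" else "exclusive"
```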
Referring now to Fig. 11, a flowchart illustrating dynamic updating of the prefetch trait 132 based on analysis of accesses to the memory block 114 of Fig. 1 by the memory access agents 101 is shown. Flow begins at step 1102.
At step 1102, the hardware data prefetcher 122 maintains a count of program loads by the cores 102 from the memory block 114, denoted load_cnt, and a count of program stores to the memory block 114, denoted store_cnt (e.g., recorded in the memory access history 202). Flow proceeds to step 1104.
At decision step 1104, the update module 204 determines whether the ratio of load_cnt to store_cnt is greater than a threshold. If so, flow proceeds to step 1106; otherwise, flow proceeds to step 1108. The threshold may be a predetermined value, a value programmed by system software, or a value dynamically updated by the hardware data prefetcher 122 based on analysis of prefetch effectiveness.
At step 1106, the update module 204 updates the prefetch trait 132 to shared in response to determining at step 1104 that the ratio of load_cnt to store_cnt is greater than the threshold. Flow ends at step 1106.
At decision step 1108, the update module 204 determines whether the ratio of store_cnt to load_cnt is greater than a threshold. If so, flow proceeds to step 1112; otherwise, flow ends. The threshold used at step 1108 may be the same as or different from the threshold used at step 1104.
At step 1112, the update module 204 updates the prefetch trait 132 to exclusive in response to determining at step 1108 that the ratio of store_cnt to load_cnt is greater than the threshold. Flow ends at step 1112.
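The Fig. 11 policy tests ratios rather than differences. In the sketch below the zero-denominator guards are an assumption added so the sketch is runnable; the names, string encodings, and separate threshold parameters are likewise illustrative.

```python
def update_trait_fig11(load_cnt, store_cnt, current_trait,
                       shared_ratio, exclusive_ratio):
    """Sketch of Fig. 11: loads dominating stores by more than a
    threshold ratio selects shared; stores dominating loads by more
    than a (possibly different) ratio selects exclusive."""
    if store_cnt and load_cnt / store_cnt > shared_ratio:    # step 1104
        return "shared"                                      # step 1106
    if load_cnt and store_cnt / load_cnt > exclusive_ratio:  # step 1108
        return "exclusive"                                   # step 1112
    return current_trait                                     # flow ends
```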
Referring now to Fig. 12, a flowchart illustrating the use of offline program analysis to determine a prefetch trait with which to perform hardware prefetches is shown. Flow begins at step 1202.
At step 1202, a program is analyzed to determine whether the processor performs hardware prefetches more effectively with a shared prefetch trait or with an exclusive prefetch trait. The analysis may be performed for multiple programs of interest, e.g., programs that are frequently executed or that are known to take a long time to execute under typical conditions, such that their performance is important and worth optimizing. Preferably, the program is run multiple times while the processor performs hardware prefetches using the shared prefetch trait and multiple times while the processor performs hardware prefetches using the exclusive prefetch trait, and the performance of each run is recorded; for example, an average of the results of the multiple runs may be computed for each of the shared and exclusive configurations. In another embodiment, the analysis employs experimental values gathered from multiple systems in communication with a server, in which each system provides the server with its current configuration information and performance data, and the server dynamically determines an improved configuration for the system. Such embodiments are described, for example, in U.S. patent application Ser. Nos. 14/474,623 and 14/474,699, both filed September 2, 2014, each of which claims priority to U.S. Provisional Application No. 62/000,808, filed May 20, 2014, all of which are hereby incorporated by reference. In this case, the dynamic system configuration includes dynamically updating the prefetch trait 132. Flow proceeds to step 1204.
At step 1204, a table is compiled that includes an entry for each analyzed program. Preferably, each entry includes identifying characteristics of the program and the prefetch trait that yielded the best performance at step 1202. The identifying characteristics may include a program name (e.g., the name by which the program is known to the operating system), memory access patterns, and/or the quantities of different instruction types used by the program. The table may be included in system software subsequently executed on the processor 103, such as a device driver. Flow proceeds to step 1206.
At step 1206, it is detected that a program in the table is running on the processor 103. In one embodiment, system software detects that the program is running; for example, the operating system may look up the name of the running program by comparing it against the names of the programs in its running-process table. In another embodiment, the operating system downloads the table to the processor 103 at boot time, and the processor 103 itself detects that the program is running. For example, the processor 103 may gather identifying characteristics related to the program as it executes (e.g., memory access patterns and/or the quantities of different instruction types used by the program) and compare them against the entries of the table compiled at step 1204 and downloaded to the processor 103. Flow proceeds to step 1208.
At step 1208, the hardware data prefetcher 122 performs hardware prefetches for the program detected at step 1206 using the prefetch trait associated with the detected program in its table entry. Flow ends at step 1208.
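At run time, the table-driven flow of Fig. 12 reduces to a lookup keyed on the program's identifying characteristics. The sketch below models the table as a dictionary keyed only on program name; the program names, the dictionary representation, and the fallback default trait are all hypothetical, since the patent leaves the table format open.

```python
# Hypothetical form of the table compiled at step 1204: identifying
# characteristic (here just the program name) -> prefetch trait that
# profiled best at step 1202.
trait_table = {
    "dbserver": "exclusive",  # store-heavy workload profiled best as exclusive
    "decoder":  "shared",     # fetch/load-heavy workload profiled best as shared
}

def trait_for_program(name, default="shared"):
    """Steps 1206/1208: if the running program is found in the table,
    prefetch with its recorded trait; otherwise use a default trait."""
    return trait_table.get(name, default)
```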
Referring now to Fig. 13, a block diagram illustrating a plurality of range registers 1300 is shown. The range registers 1300 are included in the hardware data prefetcher 122. In one embodiment, the hardware data prefetcher 122 includes a set of range registers 1300 associated with each core 102. Each range register 1300 includes an address range field 1302 and a prefetch trait field 1304. Each address range field 1302 may be programmed to specify an address range within the address space of the processor 103. The prefetch trait field 1304 specifies a prefetch trait, namely shared or exclusive. When the prefetch module 206 predicts a data address to hardware prefetch, it determines whether the prefetch address falls within an address range specified in one of the range registers 1300. If so, the prefetch module 206 generates the prefetch request 208 with the prefetch trait specified in the associated prefetch trait field 1304; if not, in one embodiment, the prefetch module 206 generates the prefetch request 208 with a default prefetch trait. In one embodiment, the default prefetch trait is shared, so the range registers 1300 need only specify the address ranges for which exclusive hardware prefetches are desired. In another embodiment, the default prefetch trait is exclusive, so the range registers 1300 need only specify the address ranges for which shared hardware prefetches are desired. In these embodiments, because the specified prefetch trait is implied to be the opposite of the default prefetch trait, the prefetch trait field 1304 may be unnecessary.
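The range-register lookup of Fig. 13 can be sketched as a scan of programmed ranges. The tuple representation, the half-open interval convention, and the choice of shared as the default prefetch trait are illustrative assumptions; in hardware the comparison would be done combinationally across all registers.

```python
# Each entry models one range register 1300: ((start, end), trait),
# where (start, end) stands in for field 1302 and trait for field 1304.
DEFAULT_TRAIT = "shared"  # one embodiment: registers then hold only exclusive ranges

def trait_for_prefetch(addr, range_registers):
    """Return the prefetch trait for a predicted prefetch address: the
    trait of a matching range register, else the default trait."""
    for (start, end), trait in range_registers:
        if start <= addr < end:
            return trait
    return DEFAULT_TRAIT


regs = [((0x1000, 0x2000), "exclusive")]
trait_for_prefetch(0x1800, regs)  # falls in the exclusive range
```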
Referring now to Fig. 14, a flowchart illustrating the use of the prefetch traits determined by the range registers 1300 of Fig. 13 to perform hardware prefetches is shown. Flow begins at step 1402.
At step 1402, programs are analyzed to determine which of them perform better on the processor 103 when the processor performs hardware prefetches using a shared prefetch trait versus an exclusive prefetch trait, in a manner similar to that described above with respect to Fig. 12; however, the analysis performed at step 1402 is of a finer granularity than the analysis performed at step 1202. More specifically, the analysis assesses the performance of each program with the shared and exclusive prefetch traits with respect to the address ranges to be programmed into the range registers 1300. For example, an address range whose data is accessed by multiple memory access agents 101 may beneficially be included in the table with a shared prefetch trait, whereas an address range whose data is written by a single core 102 may beneficially be included in the table with an exclusive prefetch trait. Flow proceeds to step 1404.
At step 1404, a table is compiled that includes an entry for each analyzed program, in a manner similar to that described with respect to step 1204; however, the table compiled at step 1404 includes the address ranges and the associated prefetch traits to be loaded into the range registers 1300. Flow proceeds to step 1406.
At step 1406, it is detected that a program in the table is running on the processor 103, in a manner similar to that described with respect to step 1206; additionally, when the program is detected, the range registers 1300 are programmed with the information related to the program in its table entry. In one embodiment, the operating system programs the range registers 1300. In another embodiment, the processor 103 programs its own range registers 1300 in response to detecting execution of the program; for example, microcode of the processor 103 may program the range registers 1300. Flow proceeds to step 1408.
At step 1408, the hardware data prefetcher 122 performs hardware prefetches for the program detected at step 1406 using the prefetch traits in the range registers 1300 in combination with the default prefetch trait. Flow ends at step 1408.
Although many different embodiments of dynamically updating the prefetch trait 132 have been described above, other embodiments are contemplated that do not depart from the spirit of the present invention. For example, in one embodiment, a saturating count value is maintained for each active memory block 114. When one of the memory access agents 101 accesses the memory block 114 in a manner that would tend to benefit from exclusive hardware prefetches (e.g., a store or a load/store), the update module 204 saturatingly increments the count; conversely, when one of the memory access agents 101 accesses the memory block 114 in a manner that would tend to benefit from shared hardware prefetches (e.g., a load or an instruction fetch), the update module 204 saturatingly decrements the count. Preferably, the prefetch trait 132 is the most significant bit of the saturating count. For another example, the update module 204 maintains a queue (e.g., a shift register) that stores information about the most recent N accesses to each memory block 114 (e.g., store, load/store, instruction fetch), where N is greater than one. Based on whether the information stored in the queue indicates that the accesses would benefit from exclusive or shared hardware prefetches, the update module 204 dynamically updates the prefetch trait 132 to exclusive or shared; for example, if a majority of the most recent N accesses are stores, the prefetch trait 132 is updated to exclusive, whereas if a majority of the most recent N accesses are instruction fetches, it is updated to shared. For another example, for each hardware prefetch the prefetch module 206 performs from the memory block 114, the update module 204 maintains an indication of the prefetch trait 132 that was used. For each access to a prefetched cache line, if the memory access agent 101 performing the access writes the cache line, the update module 204 updates the associated indication to exclusive, and if the cache line is snooped away, the update module 204 updates the indication to shared. In this manner, a bitmap of the cache lines of the memory block 114 is maintained that indicates, on a per-cache-line basis, the prefetch trait likely to be closest to optimal for each cache line of the memory block 114. The update module 204 looks for patterns in the bitmap, determines whether the address of the next cache line to be hardware prefetched hits in any of the patterns, and uses the bitmap to dynamically determine the prefetch trait 132 to be used for hardware prefetching the cache line. Finally, although embodiments of a hardware data prefetcher included in a multi-core processor have been described herein, other embodiments in which the hardware data prefetcher is included in a single-core processor do not depart from the spirit of the present invention.
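As one illustration, the saturating-counter embodiment described above might behave as sketched below. The 3-bit width, the class name, and the access-kind strings are assumptions made for the sketch; real hardware would implement the counter and its most-significant-bit readout combinationally.

```python
class SaturatingTraitCounter:
    """Sketch of the saturating-counter embodiment: stores and
    loads/stores push the count up (toward exclusive); loads and
    instruction fetches push it down (toward shared). The prefetch
    trait 132 is read from the count's most significant bit."""

    def __init__(self, bits=3):
        self.max = (1 << bits) - 1     # saturation ceiling
        self.count = self.max // 2     # start mid-range (below the MSB)
        self.msb = 1 << (bits - 1)

    def access(self, kind):
        if kind in ("store", "load_store"):   # benefits from exclusive
            self.count = min(self.count + 1, self.max)
        else:                                 # load / instruction fetch
            self.count = max(self.count - 1, 0)

    @property
    def trait(self):
        return "exclusive" if self.count & self.msb else "shared"
```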
The present invention has been described herein by way of various embodiments, which should be understood as examples of the invention and not as limitations upon it. Those skilled in the relevant art will appreciate that changes in form and detail may be made without departing from the spirit and scope of the invention. For example, software can implement the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein, through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known computer-usable medium, such as magnetic tape, semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), a network, or a wired, wireless, or other communication medium. Embodiments of the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a processor core (e.g., embodied or specified in HDL), and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the scope of the present invention should not be limited by any of the exemplary embodiments described herein, but should be defined only in accordance with the following claims and their equivalents. Specifically, the present invention may be implemented within a processor device that may be used in a general-purpose computer. Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention, all of which fall within the scope of the invention as defined by the appended claims.

Claims (20)

1. A processor, comprising:
a core, configured to detect whether a predetermined program is running on the processor and to look up a prefetch trait associated with the predetermined program detected to be running on the processor, wherein the prefetch trait is exclusive or shared; and
a hardware data prefetcher, configured to perform hardware prefetches for the predetermined program using the prefetch trait.
2. The processor of claim 1, wherein the prefetch trait was determined prior to detecting whether the predetermined program is running on the processor.
3. The processor of claim 2, wherein the prefetch trait was determined prior to detecting whether the predetermined program is running on the processor by:
determining a first performance when the processor executes the program while performing hardware prefetches using a shared prefetch trait;
determining a second performance when the processor executes the program while performing hardware prefetches using an exclusive prefetch trait; and
selecting the prefetch trait as shared when the first performance is better than the second performance, and selecting the prefetch trait as exclusive when the second performance is better than the first performance.
4. The processor of claim 1, wherein detecting whether the predetermined program is running on the processor comprises detecting, by an operating system running on the processor, that an identifier of the predetermined program is present in a running-process table of the operating system.
5. The processor of claim 4, wherein the processor receives the prefetch trait from the operating system.
6. The processor of claim 1, wherein the processor is further configured to:
prior to detecting whether the predetermined program is running on the processor, receive information specifying at least one identifying characteristic and an associated prefetch trait for each of a plurality of programs, wherein the predetermined program is one of the plurality of programs;
wherein to detect whether the predetermined program is running on the processor, the hardware data prefetcher compares the at least one identifying characteristic in the received information against identifying characteristics determined while the current program executes; and
wherein to perform hardware prefetches for the predetermined program using the prefetch trait, the processor performs hardware prefetches for the predetermined program using the prefetch trait associated with the matching at least one identifying characteristic in the received information.
7. A method to be performed by a processor, the method comprising:
detecting whether a predetermined program is running on the processor;
looking up a prefetch trait associated with the predetermined program detected to be running on the processor, wherein the prefetch trait is exclusive or shared; and
performing hardware prefetches for the predetermined program using the prefetch trait.
8. The method of claim 7, wherein the prefetch trait was determined prior to detecting whether the predetermined program is running on the processor.
9. The method of claim 8, wherein the prefetch trait was determined prior to detecting whether the predetermined program is running on the processor by:
determining a first performance when the processor executes the program while performing hardware prefetches using a shared prefetch trait;
determining a second performance when the processor executes the program while performing hardware prefetches using an exclusive prefetch trait; and
selecting the prefetch trait as shared when the first performance is better than the second performance, and selecting the prefetch trait as exclusive when the second performance is better than the first performance.
10. The method of claim 7, wherein detecting whether the predetermined program is running on the processor comprises detecting, by an operating system running on the processor, that an identifier of the predetermined program is present in a running-process table of the operating system.
11. The method of claim 10, further comprising:
providing, by the operating system, the prefetch trait to the processor in response to the predetermined program running on the processor.
12. The method of claim 7, further comprising:
receiving, by the processor, prior to detecting whether the predetermined program is running on the processor, information specifying at least one identifying characteristic and an associated prefetch trait for each of a plurality of programs, wherein the predetermined program is one of the plurality of programs;
wherein detecting whether the predetermined program is running on the processor comprises comparing the at least one identifying characteristic in the received information against identifying characteristics determined while the current program executes; and
wherein performing hardware prefetches for the predetermined program using the prefetch trait comprises performing hardware prefetches for the predetermined program using the prefetch trait associated with the matching at least one identifying characteristic in the received information.
13. A processor, comprising:
a core, configured to:
detect whether a predetermined program is running on the processor; and
in response to detecting that the predetermined program is running on the processor, load a respective address range into each of one or more range registers of the processor, wherein each of the one or more range registers has an associated prefetch trait, and the prefetch trait is exclusive or shared; and
a hardware data prefetcher, configured to perform hardware prefetches for the predetermined program using the prefetch trait associated with the address range loaded into the range register.
14. The processor of claim 13, wherein to perform hardware prefetches for the predetermined program using the prefetch trait associated with the address range loaded into the range register, the processor is configured to:
predict a data address that the program may use in the future;
determine whether the address falls within one of the one or more address ranges;
when the address falls within the one of the one or more address ranges, perform a hardware prefetch of data at the address using the prefetch trait associated with the one of the one or more address ranges; and
when the address does not fall within the one of the one or more address ranges, perform the hardware prefetch of the data at the address using a default prefetch trait.
15. The processor of claim 14, wherein the prefetch trait associated with the one of the one or more address ranges is implied to be the opposite of the default prefetch trait.
16. The processor of claim 14, wherein the prefetch trait associated with the one of the one or more address ranges is held in the range register that holds the one of the one or more address ranges.
17. A method to be performed by a processor, the method comprising:
detecting whether a predetermined program is running on the processor;
in response to detecting that the predetermined program is running on the processor, loading a respective address range into each of one or more range registers of the processor, wherein each of the one or more range registers has an associated prefetch trait, and the prefetch trait is exclusive or shared; and
performing hardware prefetches for the predetermined program using the prefetch trait associated with the address range loaded into the range register.
18. The method of claim 17, wherein performing hardware prefetches for the predetermined program using the prefetch trait associated with the address range loaded into the range register comprises:
predicting a data address that the program may use in the future;
determining whether the address falls within one of the one or more address ranges;
when the address falls within the one of the one or more address ranges, performing a hardware prefetch of data at the address using the prefetch trait associated with the one of the one or more address ranges; and
when the address does not fall within the one of the one or more address ranges, performing the hardware prefetch of the data at the address using a default prefetch trait.
19. The method of claim 18, wherein the prefetch trait associated with the one of the one or more address ranges is implied to be the opposite of the default prefetch trait.
20. The method of claim 18, wherein the prefetch trait associated with the one of the one or more address ranges is held in the range register that holds the one of the one or more address ranges.
CN201510683936.3A 2014-10-20 2015-10-20 Processor and method for executing hardware data by processor Active CN105354010B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201462066131P 2014-10-20 2014-10-20
US62/066,131 2014-10-20
US14/625,124 2015-02-18
US14/625,124 US10514920B2 (en) 2014-10-20 2015-02-18 Dynamically updating hardware prefetch trait to exclusive or shared at program detection
US14/624,981 US9891916B2 (en) 2014-10-20 2015-02-18 Dynamically updating hardware prefetch trait to exclusive or shared in multi-memory access agent system
US14/624,981 2015-02-18

Publications (2)

Publication Number Publication Date
CN105354010A true CN105354010A (en) 2016-02-24
CN105354010B CN105354010B (en) 2018-10-30

Family

ID=55147989

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201510683939.7A Active CN105278919B (en) 2014-10-20 2015-10-20 Hardware data prefetcher and the method for performing hardware data
CN201510683936.3A Active CN105354010B (en) 2014-10-20 2015-10-20 Processor and method for executing hardware data by processor

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201510683939.7A Active CN105278919B (en) 2014-10-20 2015-10-20 Hardware data prefetcher and the method for performing hardware data

Country Status (1)

Country Link
CN (2) CN105278919B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109272594B (en) * 2018-10-17 2020-10-13 重庆扬升信息技术有限公司 Working method for judging check-in of paperless conference under mass data environment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030131218A1 (en) * 2002-01-07 2003-07-10 International Business Machines Corporation Method and apparatus for mapping software prefetch instructions to hardware prefetch logic
CN1487409A (en) * 2003-02-11 2004-04-07 智慧第一公司 Allocation of cache memory data section and initial mechanism
US20040158679A1 (en) * 2002-02-12 2004-08-12 Ip-First Llc Prefetch with intent to store mechanism for block memory
US20050262307A1 (en) * 2004-05-20 2005-11-24 International Business Machines Corporation Runtime selective control of hardware prefetch mechanism
US20100011198A1 (en) * 2008-07-10 2010-01-14 Via Technologies, Inc. Microprocessor with multiple operating modes dynamically configurable by a device driver based on currently running applications
US20100205410A1 (en) * 2009-02-12 2010-08-12 Gzero Limited Data Processing
WO2014108754A1 (en) * 2013-01-11 2014-07-17 Freescale Semiconductor, Inc. A method of establishing pre-fetch control information from an executable code and an associated nvm controller, a device, a processor system and computer program products
US20140297919A1 (en) * 2011-12-21 2014-10-02 Murugasamy K Nachimuthu Apparatus and method for implementing a multi-level memory hierarchy

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7441087B2 (en) * 2004-08-17 2008-10-21 Nvidia Corporation System, apparatus and method for issuing predictions from an inventory to access a memory

Also Published As

Publication number Publication date
CN105278919A (en) 2016-01-27
CN105278919B (en) 2018-01-19
CN105354010B (en) 2018-10-30

Similar Documents

Publication Publication Date Title
US11221834B2 (en) Method and system of intelligent iterative compiler optimizations based on static and dynamic feedback
US7401329B2 (en) Compiling computer programs to exploit parallelism without exceeding available processing resources
US20190317880A1 (en) Methods and apparatus to improve runtime performance of software executing on a heterogeneous system
US9886384B2 (en) Cache control device for prefetching using pattern analysis processor and prefetch instruction and prefetching method using cache control device
CN109886859B (en) Data processing method, system, electronic device and computer readable storage medium
JP2011527788A (en) Efficient parallel computation of dependency problems
CN100447744C (en) Method and system for managing stack
Jin et al. Exploring data staging across deep memory hierarchies for coupled data intensive simulation workflows
CN104937552A (en) Data analytics platform over parallel databases and distributed file systems
CA2503263A1 (en) Compiler with cache utilization optimizations
Ogilvie et al. Fast automatic heuristic construction using active learning
US7480768B2 (en) Apparatus, systems and methods to reduce access to shared data storage
CN112148472A (en) Method and apparatus for improving utilization of heterogeneous system executing software
US20170193055A1 (en) Method and apparatus for data mining from core traces
TW201629775A (en) Dynamically updating hardware prefetch trait to exclusive or shared at program detection
CN106897123B (en) Database operation method and device
JP5773493B2 (en) Information processing device
CA2762563A1 (en) Data prefetching and coalescing for partitioned global address space languages
US20110145503A1 (en) On-line optimization of software instruction cache
US10185659B2 (en) Memory allocation system for multi-tier memory
CN105354010A (en) Processor and method for executing hardware data by processor
JP6763411B2 (en) Design support equipment, design support methods, and design support programs
CN109976905A (en) EMS memory management process, device and electronic equipment
JP2013101563A (en) Program conversion apparatus, program conversion method and conversion program
EP3391192A1 (en) Broadening field specialization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant