CN1698031A - Method of prefetching data/instructions related to externally triggered events - Google Patents
Method of prefetching data/instructions related to externally triggered events
- Publication number
- CN1698031A CN1698031A CNA038012367A CN03801236A CN1698031A CN 1698031 A CN1698031 A CN 1698031A CN A038012367 A CNA038012367 A CN A038012367A CN 03801236 A CN03801236 A CN 03801236A CN 1698031 A CN1698031 A CN 1698031A
- Authority
- CN
- China
- Prior art keywords
- data
- processor
- commands
- scheduler
- cache memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Advance Control (AREA)
- Multi Processors (AREA)
Abstract
Method of prefetching data/instructions related to externally triggered events in a system including an infrastructure (18) having an input interface (20) for receiving data/instructions to be handled by the infrastructure and an output interface (22) for transmitting data after they have been handled, a memory (14) for storing data/instructions when they are received by the input interface, a processor (10) for processing at least some data/instructions, the processor having a cache wherein the data/instructions are stored before being processed, and an external source (26) for assigning sequential tasks to the processor. The method comprises the following steps, which are performed while the processor is performing a previous task: determining the location in the memory of the data/instructions to be processed by the processor, indicating to the cache the addresses of these memory locations, fetching the contents of the memory locations and writing them into the cache, and assigning the task of processing the data/instructions to the processor.
Description
Technical field
The present invention relates generally to systems in which a processor can be interrupted to handle data/instructions unrelated to its previous task, and relates in particular to a method of prefetching the data/instructions associated with events triggered by an external source, such as a scheduler in a network processor.
Background art
The efficiency of modern microprocessor and microcontroller cores depends heavily on the efficiency of the cache, because the instruction cycle time is much shorter than the memory access time. A cache exploits the locality of memory accesses, that is, the fact that a memory access is likely to fall near a previous access.
A cache includes a mechanism, the cache controller, for loading new content into selected regions (cache lines); to make room for this, the cache evicts old entries. Caches whose controller supports prefetch instructions can be driven by software (for example, the Data Cache Block Touch instruction available on all PowerPC-compliant devices). There have also been proposals for cache controllers that recognize regular access patterns, such as linear strides or linked data structures. Unfortunately, existing approaches do not cover externally triggered events, where the memory contents needed have no relation to the preceding processing. In such cases, the only entity with knowledge of the required memory contents is the source of the event, such as an interrupt source, a scheduler, or another processor assigning a task.
In a system where an external source, such as the scheduler in a network processor, can interrupt the processor to handle data unrelated to the data previously processed, the processor generates a cache miss. This means that the processor stalls until the data it needs has been loaded from memory into the cache, which wastes considerable time. With current memory technology and a processor clock speed of 400 MHz, each cache miss costs about 36 processor clock cycles, which corresponds to about 40 instructions. And since current technology trends show processor instruction rates growing faster than memory latency improves, the number of instructions lost per cache miss keeps increasing.
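These figures are mutually consistent; as a quick check (the 90 ns memory latency and the issue rate of slightly more than one instruction per cycle are inferred from the numbers above, not stated in the text):

$$
t_{\text{miss}} = \frac{36\ \text{cycles}}{400\ \text{MHz}} = 90\ \text{ns},
\qquad
N_{\text{lost}} \approx 36\ \text{cycles} \times 1.1\ \tfrac{\text{instructions}}{\text{cycle}} \approx 40\ \text{instructions}.
$$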
Summary of the invention
It is therefore a principal object of the present invention to provide a method of prefetching the data/instructions associated with an external trigger, so as to avoid cache misses on data whose addresses can easily be determined.
The present invention therefore relates to a method of prefetching the data/instructions associated with an external trigger in a system comprising: an infrastructure having an input interface for receiving the data/instructions to be handled by the infrastructure and an output interface for transmitting the data after they have been processed; a memory for storing the data/instructions when they are received by the input interface; a processor for processing at least some of the data/instructions, the processor having a cache in which the data/instructions are stored before being processed; and an external source for assigning sequential tasks to the processor. The method comprises the following steps, which are performed while the processor is carrying out a previous task: determining the location in the memory of the data/instructions to be processed by the processor; indicating the addresses of these memory locations to the cache; fetching the contents of the memory locations and writing them into the cache; and assigning to the processor the task of processing the data/instructions.
Description of drawings
The above and other objects, features and advantages of the invention will be better understood by reading the following more detailed description in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram of a network processing system in which the method according to the invention is implemented; and

Fig. 2 is a flowchart representing the steps of the method according to the invention.
Detailed description
Such a system comprises a processor core 10, for example a PowerPC processor core equipped with a data/instruction cache. The system is built around a high-performance bus 12, such as a processor local bus (PLB), which provides the connection to an external memory 14 (for example an SDRAM) containing the data and instructions, through the intermediary of a memory controller 16. The memory controller 16 makes the bus structure independent of the particular memory, for example by generating the necessary timing and refresh signals.
Packets received on the input interface 20 of the infrastructure 18 fall into two categories. Some packets do not need to be processed and are sent out directly on the network through the output interface 22. Other packets need to be processed by the processor 10. A lookup and classification unit 24 determines whether a packet needs processing and what kind of processing is to be performed. To process a data packet, the processor 10 needs several pieces of information. In particular, it needs access to the packet header and to additional information generated within the infrastructure 18. For example, the infrastructure may have several ports on which packets can arrive, and the processor needs to know on which port the packet arrived.
In any case, the scheduler 26 knows the next task that the processor 10 will handle. The selected task determines which data will be accessed. In the case of the network processor described here, the relation between a task (an entry in a queue) and the addresses accessed first, namely the packet header and the additional information, is very simple. The translation from a queue entry to a set of addresses is performed by an address calculation.
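The patent gives no code, but this translation is simple enough to sketch. The following minimal C illustration assumes fixed-size packet buffers and a parallel array of per-packet information records; every name, base address and size in it (packet_index, PKT_BUF_BASE, and so on) is invented for the example.

```c
#include <stdint.h>

/* Hypothetical queue entry produced by the lookup and classification unit. */
typedef struct {
    uint32_t packet_index;   /* slot of the packet in the buffer pool   */
    uint32_t priority;       /* priority assigned at classification     */
} queue_entry_t;

/* Assumed layout: fixed-size packet buffers plus a parallel array of
 * per-packet additional information (classifier result, input port...). */
#define PKT_BUF_BASE   0x10000000u
#define PKT_BUF_SIZE   0x0800u       /* 2 KiB per packet buffer        */
#define PKT_INFO_BASE  0x14000000u
#define PKT_INFO_SIZE  0x0040u       /* 64 B of additional information */

/* Translate a queue entry into the two addresses accessed first:
 * the packet header and the additional information (step 36 below). */
static void queue_entry_to_addresses(const queue_entry_t *e,
                                     uint32_t *header_addr,
                                     uint32_t *info_addr)
{
    *header_addr = PKT_BUF_BASE  + e->packet_index * PKT_BUF_SIZE;
    *info_addr   = PKT_INFO_BASE + e->packet_index * PKT_INFO_SIZE;
}
```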
When the processor 10 handles the data of a new packet and accesses, for example, the packet header, it normally produces a cache miss if the present invention is not used. The processor then stalls until the needed data has been loaded from the external memory 14 into the cache which, as mentioned above, wastes a great deal of time. The cache prefetch according to the invention avoids cache misses on data accesses that are certain to occur and whose addresses can easily be determined.
For the prefetch to work, the needed data must be loaded into the cache before the access takes place. This behavior is initiated by the scheduler, which uses a direct connection 28 to the cache. After determining the locations in memory where the packet header and the additional information are stored, the scheduler issues the addresses to be fetched into the cache, and the cache controller fetches the data from memory into the cache. Once this write has completed, the scheduler either interrupts the processor and assigns the new packet for processing, when the new task has a higher priority than the previous one, or waits for the previous task to finish before dispatching the new packet.
The method according to the invention is represented by the flowchart of Fig. 2. First, the infrastructure waits to receive new data, in this example a data packet (step 30). The packet header is used for classification, and the header, together with the additional information obtained from the lookup and classification processing, is stored in the external memory (step 32). The lookup and classification unit determines whether the packet needs to be processed by software and determines its priority (step 34). If the packet does not need to be processed, the processing loop returns to waiting for new data (step 30).
When a data packet needs to be processed, the scheduler calculates the addresses in memory of the data that the processor will access. In this example, these are the address of the packet header and the addresses of additional information such as the classifier result and the input port (step 36). These addresses are then sent to the data cache controller of the processor (step 38). The data cache controller writes the corresponding data into the data cache (step 40). This is done through memory accesses interleaved with those of the packet processing currently under way.
At this stage, the processing depends on whether the packet that has just arrived has a higher priority than the previous one (step 42). If so, the scheduler interrupts the previous task currently being executed by the processor (step 44) and assigns the new packet for processing; the processor begins processing and finds the relevant data already in the cache (step 46). If the new packet does not have a higher priority than the previous one, the processor must complete the previous processing (step 46) before processing the new packet (step 48).
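Putting the steps of Fig. 2 together, the scheduler-side flow can be sketched as below. This is an illustration only: every type and function (receive_packet, classify_and_store, prefetch_into_dcache and the rest) is an invented stand-in for the patent's functional blocks, not a real API, and compute_prefetch_addresses plays the role of the address calculation sketched earlier.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct packet packet_t;          /* opaque packet handle        */
typedef struct {
    bool     needs_sw_processing;        /* outcome of step 34          */
    uint32_t priority;
    uint32_t packet_index;
} class_result_t;

/* Invented prototypes for the hardware/firmware primitives. */
packet_t      *receive_packet(void);                            /* step 30     */
class_result_t classify_and_store(packet_t *p);                 /* steps 32-34 */
void           compute_prefetch_addresses(uint32_t idx,
                                          uint32_t *header,
                                          uint32_t *info);      /* step 36     */
void           prefetch_into_dcache(uint32_t addr);             /* steps 38-40 */
void           wait_for_prefetch_complete(void);                /* see below   */
uint32_t       current_task_priority(void);
void           interrupt_and_dispatch(packet_t *p);             /* steps 44-46 */
void           dispatch_after_current_task(packet_t *p);        /* steps 46-48 */

void scheduler_loop(void)
{
    for (;;) {
        packet_t *pkt = receive_packet();                       /* step 30     */
        class_result_t cls = classify_and_store(pkt);           /* steps 32-34 */
        if (!cls.needs_sw_processing)
            continue;                    /* forwarded directly; wait again     */

        uint32_t header_addr, info_addr;
        compute_prefetch_addresses(cls.packet_index,
                                   &header_addr, &info_addr);   /* step 36     */
        prefetch_into_dcache(header_addr);                      /* steps 38-40 */
        prefetch_into_dcache(info_addr);

        if (cls.priority > current_task_priority()) {           /* step 42     */
            wait_for_prefetch_complete();  /* must finish before interrupting  */
            interrupt_and_dispatch(pkt);                        /* steps 44-46 */
        } else {
            dispatch_after_current_task(pkt);                   /* steps 46-48 */
        }
    }
}
```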
Note that when the new packet has a higher priority, the scheduler must wait for the data cache fetch to complete before interrupting the processor. To this end, the scheduler can observe the activity on the bus and wait until all the accesses it issued have completed. Alternatively, the scheduler can wait for a fixed amount of time, or a direct feedback from the cache controller to the scheduler can be used.
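These three strategies can be sketched as alternative implementations of the wait_for_prefetch_complete() step used in the loop above; the bus monitor, status reads and cycle bound are all assumed names, not real hardware interfaces.

```c
#include <stdint.h>
#include <stdbool.h>

/* Invented status primitives. */
bool bus_observed_fill(void);           /* snoops the PLB for returning data  */
void delay_cycles(uint32_t n);          /* busy-wait for n processor cycles   */
bool cache_controller_done(void);       /* feedback over direct connection 28 */

#define PREFETCH_WORST_CASE_CYCLES 64u  /* illustrative upper bound           */

/* (a) Observe the bus until all issued accesses have completed. */
void wait_by_snooping(uint32_t pending_accesses)
{
    while (pending_accesses > 0)
        if (bus_observed_fill())
            pending_accesses--;
}

/* (b) Wait a fixed, conservatively chosen amount of time. */
void wait_fixed_time(void)
{
    delay_cycles(PREFETCH_WORST_CASE_CYCLES);
}

/* (c) Poll a completion flag raised by the cache controller. */
void wait_on_feedback(void)
{
    while (!cache_controller_done())
        ;                               /* spin until the fetch is complete   */
}
```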
It must also be noted that when, as described above, the processing of a first packet is interrupted in order to process a second, higher-priority packet, the data of the two packets should occupy disjoint parts of the cache. Otherwise, the prefetched data could be evicted before it is accessed. This can be achieved in the processor through the mapping from virtual addresses to physical addresses, since the cache is usually indexed with virtual addresses.
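As an illustration of the control the virtual mapping provides, here is a minimal C sketch of a disjointness check for a virtually indexed cache; the geometry (32-byte lines, 512 sets) is illustrative and not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE  32u                  /* bytes per cache line (assumed)   */
#define NUM_SETS   512u                 /* number of cache sets (assumed)   */

/* Set index of a virtual address in a virtually indexed cache. */
static uint32_t cache_set_of(uint32_t vaddr)
{
    return (vaddr / LINE_SIZE) % NUM_SETS;
}

/* True if two buffers of len bytes touch no common cache set, so that
 * prefetching one packet's data cannot evict the other's before use. */
static bool cache_disjoint(uint32_t va, uint32_t vb, uint32_t len)
{
    uint32_t lines = (len + LINE_SIZE - 1) / LINE_SIZE;
    for (uint32_t i = 0; i < lines; i++)
        for (uint32_t j = 0; j < lines; j++)
            if (cache_set_of(va + i * LINE_SIZE) ==
                cache_set_of(vb + j * LINE_SIZE))
                return false;
    return true;
}
```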
Although the method of the invention has been described in a network processor environment, it will be clear to those skilled in the art that the method can be used in any system in which accesses by the processor to certain data are certain to occur and the addresses of those data can easily be determined. In every such case, an external event is associated with data to be processed later. Suppose, for example, a robot that uses a camera for navigation, with new images arriving at regular intervals: the arrival of an image is the event, and the image data itself is the associated data to be prefetched.
It must be noted that with a standard microprocessor the address bus can be used as the external source, since it is already observed for cache coherency purposes. In this case, only one external pin is needed to signal a prefetch request.
Claims (8)
1. the method for the data/commands that is associated with external trigger of in a system, looking ahead, described system comprises: foundation structure (18), and it has the input interface (20) that is used to receive the data/commands that will be handled by described foundation structure, the output interface (22) that is used for sending them after data are processed; Storer (14) is used for storage data/commands when data/commands is received by described input interface; Processor (10) is used to handle at least some described data/commands, and described processor has cache memory, and wherein data/commands was stored before processed; External source (26) is used for to described processor distribution sequence task;
Described method is characterised in that, it comprises the following steps, these steps are performed when carrying out previous task when processor:
Determining will be by the position of data/commands in described storer of described processor processing;
Indicate the address of described memory location to described cache memory;
Obtain the content of described memory location and they are write in the described cache memory;
Task to described processor distribution processing said data/instruction.
2. The method of claim 1, wherein said processor (10) is a network processor and the data to be processed are in the headers of data packets received by said infrastructure (18).
3. The method of claim 2, wherein said external source is a scheduler (26) directly connected to the cache of said processor (10), said scheduler determining the location in said memory (14) of the data/instructions to be processed and directly indicating said addresses to said cache.
4. The method of claim 3, wherein said scheduler (26) determines the location of said data/instructions in said memory (14) by calculating said addresses.
5. The method of any one of claims 2-4, wherein the step of assigning the task of processing said prefetched data/instructions comprises interrupting the processing of a previous packet and beginning the processing of a new packet having a higher priority than said previous packet.
6. The method of any one of claims 3-5, wherein said cache is associated with a cache controller responsible for fetching the contents of the memory locations whose addresses are determined by said scheduler (26) and writing them into said cache.
7. The method of claim 5, wherein said processor (10) and said scheduler (26) use a processor local bus (PLB), said scheduler interrupting said processor after determining that the data cache fetch is complete, the completion of the data cache fetch being observed exactly by monitoring said bus as the data are returned from said memory (14).
8. A system comprising means adapted to carry out the steps of the method according to any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02368022 | 2002-03-05 | ||
EP02368022.6 | 2002-03-05 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1698031A true CN1698031A (en) | 2005-11-16 |
CN100345103C CN100345103C (en) | 2007-10-24 |
Family
ID=27771964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB038012367A Expired - Fee Related CN100345103C (en) | 2002-03-05 | 2003-02-27 | Method of prefetching data/instructions related to externally triggered events |
Country Status (8)
Country | Link |
---|---|
JP (1) | JP2005519389A (en) |
KR (1) | KR20040101231A (en) |
CN (1) | CN100345103C (en) |
AU (1) | AU2003221510A1 (en) |
BR (1) | BR0308268A (en) |
CA (1) | CA2478007A1 (en) |
MX (1) | MXPA04008502A (en) |
WO (1) | WO2003075154A2 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4837247B2 (en) * | 2003-09-24 | 2011-12-14 | パナソニック株式会社 | Processor |
US8224937B2 (en) * | 2004-03-04 | 2012-07-17 | International Business Machines Corporation | Event ownership assigner with failover for multiple event server system |
JP2008523490A (en) | 2004-12-10 | 2008-07-03 | エヌエックスピー ビー ヴィ | Data processing system and method for cache replacement |
US7721071B2 (en) * | 2006-02-28 | 2010-05-18 | Mips Technologies, Inc. | System and method for propagating operand availability prediction bits with instructions through a pipeline in an out-of-order processor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5619663A (en) * | 1994-09-16 | 1997-04-08 | Philips Electronics North America Corp. | Computer instruction prefetch system |
US5854911A (en) * | 1996-07-01 | 1998-12-29 | Sun Microsystems, Inc. | Data buffer prefetch apparatus and method |
US5761506A (en) * | 1996-09-20 | 1998-06-02 | Bay Networks, Inc. | Method and apparatus for handling cache misses in a computer system |
US6092149A (en) * | 1997-05-28 | 2000-07-18 | Western Digital Corporation | Disk drive cache system using a dynamic priority sequential stream of data segments continuously adapted according to prefetched sequential random, and repeating types of accesses |
US6625654B1 (en) * | 1999-12-28 | 2003-09-23 | Intel Corporation | Thread signaling in multi-threaded network processor |
-
2003
- 2003-02-27 AU AU2003221510A patent/AU2003221510A1/en not_active Abandoned
- 2003-02-27 MX MXPA04008502A patent/MXPA04008502A/en unknown
- 2003-02-27 JP JP2003573543A patent/JP2005519389A/en active Pending
- 2003-02-27 BR BR0308268-7A patent/BR0308268A/en not_active IP Right Cessation
- 2003-02-27 CN CNB038012367A patent/CN100345103C/en not_active Expired - Fee Related
- 2003-02-27 KR KR10-2004-7012736A patent/KR20040101231A/en not_active Application Discontinuation
- 2003-02-27 CA CA002478007A patent/CA2478007A1/en not_active Abandoned
- 2003-02-27 WO PCT/EP2003/002923 patent/WO2003075154A2/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2003075154A3 (en) | 2004-09-02 |
BR0308268A (en) | 2005-01-04 |
CN100345103C (en) | 2007-10-24 |
JP2005519389A (en) | 2005-06-30 |
AU2003221510A1 (en) | 2003-09-16 |
AU2003221510A8 (en) | 2003-09-16 |
WO2003075154A2 (en) | 2003-09-12 |
KR20040101231A (en) | 2004-12-02 |
CA2478007A1 (en) | 2003-09-12 |
MXPA04008502A (en) | 2004-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6779084B2 (en) | Enqueue operations for multi-buffer packets | |
US20160062894A1 (en) | System and Method for Performing Message Driven Prefetching at the Network Interface | |
US8850125B2 (en) | System and method to provide non-coherent access to a coherent memory system | |
US7337275B2 (en) | Free list and ring data structure management | |
US9727469B2 (en) | Performance-driven cache line memory access | |
US7073030B2 (en) | Method and apparatus providing non level one information caching using prefetch to increase a hit ratio | |
US7234004B2 (en) | Method, apparatus and program product for low latency I/O adapter queuing in a computer system | |
US20040024971A1 (en) | Method and apparatus for write cache flush and fill mechanisms | |
US8352712B2 (en) | Method and system for specualtively sending processor-issued store operations to a store queue with full signal asserted | |
CN1382276A (en) | Prioritized bus request scheduling mechanism for processing devices | |
US8578069B2 (en) | Prefetching for a shared direct memory access (DMA) engine | |
US7370152B2 (en) | Memory controller with prefetching capability | |
CN101013402A (en) | Methods and systems for processing multiple translation cache misses | |
US6567901B1 (en) | Read around speculative load | |
US8095617B2 (en) | Caching data in a cluster computing system which avoids false-sharing conflicts | |
US20040059854A1 (en) | Dynamic priority external transaction system | |
US7334056B2 (en) | Scalable architecture for context execution | |
CN100345103C (en) | Method of prefetching data/instructions related to externally triggered events | |
JP2009521054A (en) | Dynamic cache management apparatus and method | |
US20050044321A1 (en) | Method and system for multiprocess cache management | |
US8719542B2 (en) | Data transfer apparatus, data transfer method and processor | |
CN101059787A (en) | Entity area description element pre-access method of direct EMS memory for process unit access | |
US20240330213A1 (en) | Variable buffer size descriptor fetching for a multi-queue direct memory access system | |
US12099456B2 (en) | Command processing circuitry maintaining a linked list defining entries for one or more command queues and executing synchronization commands at the queue head of the one or more command queues in list order based on completion criteria of the synchronization command at the head of a given command queue | |
CN118276990A (en) | Multithreading task management system and method, network processor and chip thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20071024 |