CN1698031A - Method of prefetching data/instructions related to externally triggered events - Google Patents


Info

Publication number
CN1698031A
Authority
CN
China
Prior art keywords
data
processor
instructions
scheduler
cache memory
Prior art date
Legal status
Granted
Application number
CNA038012367A
Other languages
Chinese (zh)
Other versions
CN100345103C (en)
Inventor
Andreas Doering (安德烈亚斯·多林)
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN1698031A
Application granted
Publication of CN100345103C
Anticipated expiration
Current legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3802 Instruction prefetching
    • G06F 9/3824 Operand accessing
    • G06F 9/383 Operand prefetching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)
  • Multi Processors (AREA)

Abstract

A method of prefetching data/instructions related to externally triggered events in a system including an infrastructure (18) having an input interface (20) for receiving data/instructions to be handled by the infrastructure and an output interface (22) for transmitting data after they have been handled, a memory (14) for storing data/instructions when they are received by the input interface, a processor (10) for processing at least some data/instructions, the processor having a cache wherein the data/instructions are stored before being processed, and an external source (26) for assigning sequential tasks to the processor. The method comprises the following steps, which are performed while the processor is performing a previous task: determining the locations in the memory of the data/instructions to be processed by the processor, indicating the addresses of these memory locations to the cache, fetching the contents of the memory locations and writing them into the cache, and assigning the task of processing the data/instructions to the processor.

Description

Method of prefetching data/instructions related to externally triggered events
Technical field
The present invention relates generally to systems in which a processor can be interrupted to handle data/instructions unrelated to its previous task, and in particular to a method of prefetching the data/instructions associated with events triggered by an external source, such as the scheduler in a network processor.
Background technology
The efficiency of modern microprocessor and microcontroller cores depends heavily on the efficiency of the cache memory, because the instruction cycle time is much shorter than the memory access time. Cache memories exploit the locality of memory accesses, that is, the fact that a memory access is likely to be close to a previous access.
A cache memory includes a mechanism, the cache controller, for loading new content into a selected region (a cache line); to make room for it, the cache memory discards old entries. Existing cache controllers can be activated by software through cache prefetch instructions (for example, the Data Cache Block Touch instruction available on all PowerPC-compliant devices). There are also proposals for cache controllers that recognize regular access patterns, such as linear strides or linked data structures. Unfortunately, the existing methods do not cover externally triggered events, where the required memory contents are unrelated to the preceding processing. In these cases, the only entity with knowledge of the required memory contents is the event source that assigns the task, such as an interrupt source, a scheduler, or another processor.
In systems where an external source, such as the scheduler in a network processor, can interrupt the processor to handle data unrelated to the previously processed data, the processor incurs a cache miss. This means that the processor stops processing until the data it needs has been loaded from memory into the cache, which wastes considerable time. With current memory technology and a 400 MHz processor clock, each cache miss costs about 36 processor clock cycles, which corresponds to roughly 40 instructions. Since current technology trends show processor instruction rates increasing faster than memory latency improves, the number of instructions lost per cache miss keeps growing.
Summary of the invention
The main object of the present invention is therefore to provide a method of prefetching the data/instructions associated with an external trigger, so as to avoid cache misses for data whose addresses can easily be determined.
The invention accordingly relates to a method of prefetching data/instructions related to externally triggered events in a system comprising: an infrastructure having an input interface for receiving the data/instructions to be handled by the infrastructure and an output interface for transmitting the data after they have been processed; a memory for storing the data/instructions when they are received by the input interface; a processor for processing at least some of the data/instructions, the processor having a cache in which the data/instructions are stored before being processed; and an external source for assigning sequential tasks to the processor. The method comprises the following steps, which are performed while the processor is executing a previous task: determining the locations in the memory of the data/instructions to be processed by the processor; indicating the addresses of these memory locations to the cache; fetching the contents of the memory locations and writing them into the cache; and assigning the task of processing the data/instructions to the processor.
Description of drawings
The above and other objects, features and advantages of the invention will be better understood by reading the following more detailed description in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a network processing system in which the method according to the invention is implemented; and
Fig. 2 is a flow chart representing the steps of the method according to the invention.
Detailed description of the embodiment
Such a system includes a processor core 10, for example a PowerPC processor core equipped with a data/instruction cache. The system is built around a high-performance bus 12, such as a Processor Local Bus (PLB), which provides the connection to an external memory 14 (for example an SDRAM) containing the data and instructions, through the intermediary of a memory controller 16. The memory controller 16 decouples the bus structure from the memory, for example by generating the necessary timing and refresh signals.
The bus 12 and the memory 14 are also used by an infrastructure 18, which handles the data packets received from the network on an input interface 20. The infrastructure 18 manages packet assembly, memory allocation and release, insertion into and removal from the packet queues, and the reception and transmission of packets.
Some packets do not need to be processed and are sent back out to the network directly through an output interface 22. Other packets need to be handled by the processor 10. A lookup and classification unit 24 determines whether a packet needs to be processed and what kind of processing is to be performed. To process a data packet, the processor 10 needs several pieces of information. To this end, it needs access to the packet header and to additional information generated in the infrastructure 18. For example, the infrastructure may have several ports on which packets can arrive, and the processor needs to know from which port a packet came.
A scheduler 26 holds all the packets that need to be processed by the processor in one or several queues. These queues need not physically reside in the scheduler, but at least the front entries of each queue need to be stored on-chip. The scheduler keeps track of the processor's activity. When the processor finishes processing a packet, it requests a new task from the scheduler. However, if the scheduler 26 manages several queues with different priorities, it can also interrupt the processor's handling of a low-priority task in order to have a higher-priority task processed.
In any case, the scheduler 26 knows the next task that the processor 10 will handle, and the selected task determines which data will be accessed. In the network processor described here, the relation between a task (a queue entry) and the addresses accessed first, namely the packet header and the additional information, is very simple: the translation from a queue entry to a group of addresses is carried out by an address calculation.
When the processor 10 starts processing a new packet and accesses data such as the packet header, it would normally, without the present invention, incur a cache miss. The processor would then stop processing until the required data had been loaded from the external memory 14 into the cache, which, as mentioned above, wastes a great deal of time. The cache prefetching according to the invention avoids cache misses for data accesses that are certain to occur and whose addresses can easily be determined.
For this to work, the required data must be loaded into the cache by the cache controller before the access takes place. This behavior is initiated by the scheduler, which uses a direct connection 28 to the cache. After determining the locations in memory where the packet header and the additional information are stored, the scheduler issues the addresses to be fetched into the cache, and the cache controller fetches the corresponding data from memory into the cache. Once this write has completed, the scheduler either interrupts the processor and assigns the new packet for processing, when the new task has a higher priority than the previous one, or waits for the previous task to finish before dispatching the new packet.
The method according to the invention is represented by the flow chart shown in Fig. 2. First, the infrastructure waits for new data to be received, i.e. a data packet in the example described (step 30). The packet header is used for classification, and the header together with the additional information obtained from the lookup and classification processing is stored in the external memory (step 32). The lookup and classification unit determines whether the packet needs to be processed by software, and determines its priority (step 34). If the packet does not need processing, the processing loop returns to waiting for new data (step 30).
When a data packet needs to be processed, the scheduler computes the addresses in memory corresponding to the data the processor will access. In the example described, these are the address of the packet header and the addresses of additional information such as the classifier result and the input port (step 36). These addresses are then sent to the data cache controller of the processor (step 38), and the data cache controller writes the corresponding data into the data cache (step 40). This is done through memory accesses interleaved with those generated by the current packet processing.
At this stage, the processing depends on whether the packet that has just arrived has a higher priority than the previous one (step 42). If so, the scheduler interrupts the previous task currently being executed by the processor (step 44) and assigns the new packet for processing; the processor starts processing and finds the relevant data in the cache (step 46). If the new packet does not have a higher priority than the previous one, the processor must complete the previous processing (step 46) before handling the new packet (step 48).
Note that when the packet has a higher priority, the scheduler needs to wait for the data cache fetch to complete before interrupting the processor. To this end, the scheduler can observe the activity on the bus and wait until all the assigned accesses have been performed. Alternatively, the scheduler can wait for a fixed amount of time, or a direct feedback from the cache controller to the scheduler can be used.
It must also be noted that when, as described above, the processing of a first packet is interrupted in order to process a second, higher-priority packet, the data of the two packets should occupy disjoint parts of the cache. Otherwise, the prefetched data could be evicted before it is accessed. This can be achieved by using the virtual-to-real address mapping in the processor, since the cache is usually indexed with virtual addresses.
Although the method of the invention has been described in a network processor environment, it will be clear to those skilled in the art that the method can be used in any system in which accesses by the processor to certain data are sure to occur and their addresses can easily be determined. In all such cases, an external event is associated with data to be processed. For example, suppose that in a robot using a camera for navigation, new images arrive at regular intervals. The arrival of an image is the event, and the image data itself is the associated data to be prefetched.
It must also be noted that for standard microprocessors, the address bus can be used as the external source, since it is observed anyway for cache coherency. In this case, only one external connector is needed to indicate a prefetch request.

Claims (8)

1. the method for the data/commands that is associated with external trigger of in a system, looking ahead, described system comprises: foundation structure (18), and it has the input interface (20) that is used to receive the data/commands that will be handled by described foundation structure, the output interface (22) that is used for sending them after data are processed; Storer (14) is used for storage data/commands when data/commands is received by described input interface; Processor (10) is used to handle at least some described data/commands, and described processor has cache memory, and wherein data/commands was stored before processed; External source (26) is used for to described processor distribution sequence task;
Described method is characterised in that, it comprises the following steps, these steps are performed when carrying out previous task when processor:
Determining will be by the position of data/commands in described storer of described processor processing;
Indicate the address of described memory location to described cache memory;
Obtain the content of described memory location and they are write in the described cache memory;
Task to described processor distribution processing said data/instruction.
2. the method for claim 1, wherein said processor (10) is a network processing unit, and data to be processed are in the head of the packet that is received by described foundation structure (18).
3. method as claimed in claim 2, wherein said external source is a scheduler (26) that is directly connected to the described cache memory in the described processor (10), described scheduler is determined the position of data/commands to be processed in described storer (14), and directly indicates described address to described cache memory.
4. method as claimed in claim 3, wherein said scheduler (26) is determined the position of the described data/commands in described storer (14) by calculating described address.
5. as any one described method of claim 2-4, wherein the step of the task of the described prefetch data of allocation process/instruction comprises: interrupt the processing of previous grouping, begin to have than described previous grouping the processing of the new grouping of higher priority.
6. as any one described method of claim 3-5, wherein said cache memory is associated with a cache controller, described cache controller is responsible for obtaining its address by the content of the definite described memory location of described scheduler (26), and they are write in the described cache memory.
7. method as claimed in claim 5, wherein said processor (10) and described scheduler (26) use a processor local bus (PLB), described scheduler obtains the back and interrupts described processor determining to finish data caching, wherein by when data from described storer (14) monitor when returning described bus and accurately observation finish data caching and obtain.
8. system comprises by the adaptive device of realizing according to the step of the method for claim 1-7.
CNB038012367A 2002-03-05 2003-02-27 Method of prefetching data/instructions related to externally triggered events Expired - Fee Related CN100345103C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP02368022 2002-03-05
EP02368022.6 2002-03-05

Publications (2)

Publication Number Publication Date
CN1698031A true CN1698031A (en) 2005-11-16
CN100345103C CN100345103C (en) 2007-10-24

Family

ID=27771964

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB038012367A Expired - Fee Related CN100345103C (en) 2002-03-05 2003-02-27 Method of prefetching data/instructions related to externally triggered events

Country Status (8)

Country Link
JP (1) JP2005519389A (en)
KR (1) KR20040101231A (en)
CN (1) CN100345103C (en)
AU (1) AU2003221510A1 (en)
BR (1) BR0308268A (en)
CA (1) CA2478007A1 (en)
MX (1) MXPA04008502A (en)
WO (1) WO2003075154A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4837247B2 (en) * 2003-09-24 2011-12-14 パナソニック株式会社 Processor
US8224937B2 (en) * 2004-03-04 2012-07-17 International Business Machines Corporation Event ownership assigner with failover for multiple event server system
JP2008523490A (en) 2004-12-10 2008-07-03 エヌエックスピー ビー ヴィ Data processing system and method for cache replacement
US7721071B2 (en) * 2006-02-28 2010-05-18 Mips Technologies, Inc. System and method for propagating operand availability prediction bits with instructions through a pipeline in an out-of-order processor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619663A (en) * 1994-09-16 1997-04-08 Philips Electronics North America Corp. Computer instruction prefetch system
US5854911A (en) * 1996-07-01 1998-12-29 Sun Microsystems, Inc. Data buffer prefetch apparatus and method
US5761506A (en) * 1996-09-20 1998-06-02 Bay Networks, Inc. Method and apparatus for handling cache misses in a computer system
US6092149A (en) * 1997-05-28 2000-07-18 Western Digital Corporation Disk drive cache system using a dynamic priority sequential stream of data segments continuously adapted according to prefetched sequential random, and repeating types of accesses
US6625654B1 (en) * 1999-12-28 2003-09-23 Intel Corporation Thread signaling in multi-threaded network processor

Also Published As

Publication number Publication date
WO2003075154A3 (en) 2004-09-02
BR0308268A (en) 2005-01-04
CN100345103C (en) 2007-10-24
JP2005519389A (en) 2005-06-30
AU2003221510A1 (en) 2003-09-16
AU2003221510A8 (en) 2003-09-16
WO2003075154A2 (en) 2003-09-12
KR20040101231A (en) 2004-12-02
CA2478007A1 (en) 2003-09-12
MXPA04008502A (en) 2004-12-06

Similar Documents

Publication Publication Date Title
US6779084B2 (en) Enqueue operations for multi-buffer packets
US20160062894A1 (en) System and Method for Performing Message Driven Prefetching at the Network Interface
US8850125B2 (en) System and method to provide non-coherent access to a coherent memory system
US7337275B2 (en) Free list and ring data structure management
US9727469B2 (en) Performance-driven cache line memory access
US7073030B2 (en) Method and apparatus providing non level one information caching using prefetch to increase a hit ratio
US7234004B2 (en) Method, apparatus and program product for low latency I/O adapter queuing in a computer system
US20040024971A1 (en) Method and apparatus for write cache flush and fill mechanisms
US8352712B2 (en) Method and system for specualtively sending processor-issued store operations to a store queue with full signal asserted
CN1382276A (en) Prioritized bus request scheduling mechanism for processing devices
US8578069B2 (en) Prefetching for a shared direct memory access (DMA) engine
US7370152B2 (en) Memory controller with prefetching capability
CN101013402A (en) Methods and systems for processing multiple translation cache misses
US6567901B1 (en) Read around speculative load
US8095617B2 (en) Caching data in a cluster computing system which avoids false-sharing conflicts
US20040059854A1 (en) Dynamic priority external transaction system
US7334056B2 (en) Scalable architecture for context execution
CN100345103C (en) Method of prefetching data/instructions related to externally triggered events
JP2009521054A (en) Dynamic cache management apparatus and method
US20050044321A1 (en) Method and system for multiprocess cache management
US8719542B2 (en) Data transfer apparatus, data transfer method and processor
CN101059787A (en) Entity area description element pre-access method of direct EMS memory for process unit access
US20240330213A1 (en) Variable buffer size descriptor fetching for a multi-queue direct memory access system
US12099456B2 (en) Command processing circuitry maintaining a linked list defining entries for one or more command queues and executing synchronization commands at the queue head of the one or more command queues in list order based on completion criteria of the synchronization command at the head of a given command queue
CN118276990A (en) Multithreading task management system and method, network processor and chip thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071024