GB2528842A - A data processing apparatus, and a method of handling address translation within a data processing apparatus - Google Patents

A data processing apparatus, and a method of handling address translation within a data processing apparatus

Publication number
GB2528842A
Authority
GB
United Kingdom
Prior art keywords
page table
circuitry
request
walk
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1413397.9A
Other versions
GB201413397D0 (en)
GB2528842B (en)
Inventor
Andreas Hansson
Ali Ghassan Saidi
Nagendran Udipi Aniruddha
Stephan Diestelhorst
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Ltd, Advanced Risc Machines Ltd filed Critical ARM Ltd
Priority to GB1413397.9A priority Critical patent/GB2528842B/en
Publication of GB201413397D0 publication Critical patent/GB201413397D0/en
Priority to PCT/GB2015/051809 priority patent/WO2016016605A1/en
Priority to US15/325,250 priority patent/US10133675B2/en
Priority to CN201580039538.6A priority patent/CN106537362B/en
Publication of GB2528842A publication Critical patent/GB2528842A/en
Application granted granted Critical
Publication of GB2528842B publication Critical patent/GB2528842B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1036Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6022Using a prefetch buffer or dedicated prefetch cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6028Prefetching based on hints or prefetch instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65Details of virtual memory and virtual address translation
    • G06F2212/651Multi-level translation tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65Details of virtual memory and virtual address translation
    • G06F2212/654Look-ahead translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65Details of virtual memory and virtual address translation
    • G06F2212/657Virtual address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/68Details of translation look-aside buffer [TLB]
    • G06F2212/681Multi-level TLB, e.g. microTLB and main TLB
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/68Details of translation look-aside buffer [TLB]
    • G06F2212/684TLB miss handling
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A memory access request is issued by processing circuitry 12, 50, 55 of a data processing apparatus and specifies a virtual address for a data item. Address translation circuitry 14, 60 performs address translation with reference to a descriptor provided by a page table and produces a modified memory access request specifying a physical address for the data item. The address translation circuitry includes page table walk circuitry 18, 64 which generates a page table walk request to retrieve the descriptor. Walk ahead circuitry 35, located between the address translation circuitry and a memory device 40 containing the page table 45, comprises detection circuitry used to detect a memory page table walk request generated by the page table walk circuitry for a descriptor in a page table. The walk ahead circuitry has request generation circuitry which generates a prefetch memory request for data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request. This prefetched data may be another descriptor required as part of the address translation process, or may be the actual data item being requested by the processing circuitry.

Description

A DATA PROCESSING APPARATUS, AND A METHOD OF HANDLING
ADDRESS TRANSLATION WITHIN A DATA PROCESSING APPARATUS
FIELD OF THE INVENTION
The present invention relates to a data processing apparatus, and to a method of handling address translation within such a data processing apparatus.
BACKGROUND OF THE INVENTION
Within a data processing system, when a master device wishes to perform read or write operations, the master device will typically issue an access request specifying a virtual address for the data item to be read or written. This virtual address then needs to be translated into a physical address within a memory device in order to identify the actual physical location in memory from which the data item is to be read or to which the data item is to be written.
There will typically be various components residing in the path between the master device and the memory device, for example various levels of cache, various interconnect structures, etc., and typically the address translation is performed by a memory management unit residing in close proximity to the master device along the path between the master device and the memory device.
Such a memory management unit (MMU) will typically include a translation lookaside buffer (TLB) structure for holding descriptor information obtained from page tables residing in the memory device, each descriptor providing information used to translate a portion of the virtual address to a corresponding portion of the physical address. If, for a particular portion of a virtual address under consideration, there is no corresponding descriptor stored within the TLB, then page table walk circuitry within the MMU is typically used to perform a page table walk process in order to obtain the required descriptor from the memory device to enable the address translation process to be performed.
In association with a master device's MMU, it is known to implement prefetching mechanisms that seek to detect patterns between the various different access requests being issued by the master device, and based on those patterns to prefetch descriptor information into the TLB to thereby seek to avoid the latency/performance issues that occur when a descriptor is not available in the TLB for a future access request, and hence needs to be retrieved via the page table walk process. However, whilst such pattern recognition based prefetching mechanisms are useful, and can help to reduce latency, there are still other aspects of the virtual to physical address translation process that can introduce latency issues when seeking to process any individual access request.
In particular, considering an individual access request, a portion of the specified virtual address will typically be used in combination with a page table base address to identify a physical address for a descriptor that will be needed as part of the address translation process. At a minimum, once that descriptor has been obtained (via a page table walk process if necessary), then that descriptor will need to be used in combination with another portion of the virtual address to identify the actual physical address of the data item that is to be read or written. Accordingly, even in this simple case, there may be a need to access the memory device twice in order to process the read or write operation, once to retrieve the descriptor via a page table walk process, and once to actually access the data item.
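By way of illustration only, the simple case above can be sketched in Python. The 4 KiB page size, the 4-byte descriptor size, the use of a dictionary as the memory device, and all function and variable names are assumptions made for this sketch and are not taken from the described apparatus.

```python
# Assumed toy layout (not from the source): 4 KiB pages, 4-byte descriptors,
# and a descriptor that simply holds the physical base address of a page.
PAGE_SHIFT = 12
DESC_SIZE = 4
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

def descriptor_address(table_base, vaddr):
    """Combine the page table base address with the upper virtual address portion."""
    return table_base + (vaddr >> PAGE_SHIFT) * DESC_SIZE

def translate(memory, table_base, vaddr):
    """Return (physical address, number of memory accesses used by the walk)."""
    descriptor = memory[descriptor_address(table_base, vaddr)]  # access 1: page table walk
    return descriptor | (vaddr & OFFSET_MASK), 1

# Hypothetical memory contents: one descriptor and one data item.
memory = {0x1014: 0x80000, 0x80ABC: "data item"}
paddr, walk_accesses = translate(memory, 0x1000, 0x5ABC)
data = memory[paddr]                                            # access 2: the data item
total_accesses = walk_accesses + 1
```

In this sketch the memory is read twice for a single access request, once for the descriptor and once for the data item, which is the latency cost the passage describes.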
In modern data processing systems, the number of accesses to memory that may be required when processing a single access request can increase significantly over the simple case referred to above. In particular, in modern data processing systems, where the size of the memory device is getting larger and larger, it is known to use multiple levels of page tables when performing the address translation process.
In particular, at a first page table level, a portion of the virtual address may be combined with a page table base address to identify a physical address of a descriptor that is required as part of the address translation process. However, once that descriptor has been obtained, then that descriptor is used in combination with another portion of the virtual address to identify a descriptor in an additional page table at a further page table level. This process can be repeated multiple times before a final level of the page table hierarchy is reached, with the descriptor obtained from that final page table level then being combined with another virtual address portion in order to identify the physical address of the data item to be accessed.
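As a hedged sketch of such a multi-level process, the following Python models a walk in which each descriptor supplies the base address of the next table. The 9-bit index per level, 8-byte descriptors, 4 KiB pages, and the three-level example table are assumptions chosen for illustration only.

```python
# Assumed parameters for illustration: 9 index bits per level, 8-byte descriptors.
LEVEL_BITS = 9
PAGE_SHIFT = 12
DESC_SIZE = 8
INDEX_MASK = (1 << LEVEL_BITS) - 1

def walk(memory, table_base, vaddr, levels):
    """Walk `levels` page tables; return (physical address, descriptor fetches)."""
    base, fetches = table_base, 0
    for level in range(levels):
        # Each level consumes a different portion of the virtual address.
        shift = PAGE_SHIFT + LEVEL_BITS * (levels - 1 - level)
        index = (vaddr >> shift) & INDEX_MASK
        base = memory[base + index * DESC_SIZE]  # one memory access per level
        fetches += 1
    return base | (vaddr & ((1 << PAGE_SHIFT) - 1)), fetches

# Hypothetical three-level table: indices 1, 2 and 3, page offset 0x45.
vaddr = (1 << 30) | (2 << 21) | (3 << 12) | 0x45
memory = {0x1008: 0x2000, 0x2010: 0x3000, 0x3018: 0x9000}
paddr, fetches = walk(memory, 0x1000, vaddr, levels=3)
```

With three levels, this model performs three descriptor fetches before the data item itself can be accessed, so a single request costs four memory accesses when nothing is cached.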
Thus, it will be appreciated that even when considering a single access request, the address translation process may require the memory device to be accessed multiple times, and this can give rise to significant latency issues. Accordingly, it would be desirable to provide a mechanism that can alleviate the latency issues associated with the multiple stages of address translation required when processing each individual memory access request.
SUMMARY OF THE INVENTION
Viewed from a first aspect, the present invention provides a data processing apparatus comprising: processing circuitry configured to issue a memory access request specifying a virtual address for a data item; address translation circuitry configured to perform an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, the address translation circuitry including page table walk circuitry configured to generate at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process; walk ahead circuitry located in a path between the address translation circuitry and a memory device containing the at least one page table, the walk ahead circuitry comprising: detection circuitry configured to detect a memory page table walk request generated by the page table walk circuitry of the address translation circuitry for a descriptor in a page table, and further request generation circuitry configured to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
In accordance with the present invention, walk ahead circuitry is provided that is located in a path between the address translation circuitry and the memory device. When the page table walk circuitry of the address translation circuitry issues a memory page table walk request in order to retrieve a descriptor in a page table, the walk ahead circuitry detects that memory page table walk request. Then, once the descriptor being requested is available (for example by virtue of it being retrieved from the memory device or being buffered in some storage structure accessible by the walk ahead circuitry), the walk ahead circuitry is configured to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to that descriptor. Hence, the walk ahead circuitry speculatively performs at least one additional stage of the address translation process in order to prefetch the data at that next stage of the address translation process. That prefetched data may in fact be the actual data item that is the subject of the original memory access request, or may be another descriptor required by the address translation process.
By such a process, once the page table walk circuitry within the address translation circuitry receives the descriptor that it had requested via the memory page table walk request, then when it issues a subsequent request based on that descriptor (whether that be another memory page table walk request for a descriptor at the next level of the address translation process, or a request for the actual data item), then that descriptor or data item will be available with less latency, due to the fact that it has already been prefetched from the memory device by the walk ahead circuitry. This can hence significantly reduce the latency of the address translation process for each individual memory access request.
There are a number of ways in which the further request generation circuitry within the walk ahead circuitry can be configured to generate a prefetch memory request.
In one embodiment, the page table walk circuitry is configured to include, within the detected memory page table walk request, additional information not required to retrieve the descriptor requested by that detected memory page table walk request, and the further request generation circuitry is configured to use that additional information when generating the prefetch memory request.
The additional information that the page table walk circuitry includes within the memory page table walk request can take a variety of forms. However, in one embodiment, the page table walk circuitry is configured to use a portion of the virtual address in order to determine a descriptor address, and to include within the detected page table walk request that descriptor address. In addition, the page table walk circuitry is further configured to include, as said additional information, a further portion of the virtual address. Hence, in such embodiments, once the descriptor being requested by the detected memory page table walk request is available, the further request generation circuitry can use that descriptor in combination with the further portion of the virtual address in order to determine the address to be specified in association with the prefetch memory request, and thus identify the data to be prefetched from the memory device in response to that prefetch memory request.
Whilst the address translation process can take a variety of forms, in one embodiment the address translation circuitry is configured to perform, as the address translation process, a multi-level address translation process with reference to descriptors provided by a plurality of page tables configured in multiple hierarchical levels, and the page table walk circuitry is configured to generate memory page table walk requests in order to retrieve the descriptors required for the multi-level address translation process.
The memory page table walk request detected by the detection circuitry is for a descriptor in a page table at one hierarchical level, and the further request generation circuitry is configured to generate as the prefetch memory request, for each of at least one subsequent hierarchical level, a prefetch memory page table walk request in order to prefetch an associated descriptor in a page table at that subsequent hierarchical level.
Hence, in such embodiments the walk ahead circuitry is used to prefetch one or more descriptors at subsequent hierarchical levels of the page table hierarchy, so that if the address translation circuitry subsequently issues memory page table walk requests for those descriptors, they will have been prefetched from the memory device and accordingly can be provided with significantly reduced latency back to the address translation circuitry, hence speeding up the address translation process.
In one embodiment, the further request generation circuitry is configured to determine a descriptor address for the associated descriptor in a page table at a first subsequent hierarchical level with reference to said further portion of the virtual address and the descriptor retrieved as a result of the memory device processing the detected memory page table walk request. The further request generation circuitry is then further configured to include the determined descriptor address within the generated prefetch memory page table walk request for said first subsequent hierarchical level.
Furthermore, in one embodiment, for each additional subsequent hierarchical level, the further request generation circuitry is configured to determine a descriptor address for the associated descriptor in a page table at that additional subsequent hierarchical level with reference to said further portion of the virtual address and the descriptor obtained as a result of the memory device processing the prefetch memory page table walk request for a preceding subsequent hierarchical level.
Hence, the operation of the further request generation circuitry can be repeated iteratively for each subsequent hierarchical level, at each level the further request generation circuitry using a further portion of the virtual address and the descriptor obtained for the previous hierarchical level.
In embodiments where the further request generation circuitry generates a prefetch memory page table walk request for each of the multiple subsequent hierarchical levels, the page table walk circuitry may further be configured to include within the detected page table walk request level indication data, used by the further request circuitry to determine which bits of the further portion of the virtual address to use when generating the prefetch memory page table walk request at each of the multiple subsequent hierarchical levels. In particular, it will typically be the case that different bits of the further portion of the virtual address are used at each different hierarchical level.
In one embodiment, for a final hierarchical level, the further request generation circuitry may further be configured to generate a prefetch modified memory access request specifying a physical address for the data item in order to prefetch the data item.
Hence, in such embodiments, the walk ahead circuitry can be used not only to prefetch descriptors at subsequent hierarchical levels of the page table hierarchy, but can also be used to prefetch the actual data item that the processing circuitry is seeking to access.
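Purely as an illustrative sketch, the iterative behaviour described above can be modelled in Python as follows. The `WalkRequest` fields, the 9-bit index width, the 8-byte descriptor size, and the use of a remaining-level count as the level indication data are all hypothetical choices for this sketch, not a definitive encoding of the request.

```python
from dataclasses import dataclass

LEVEL_BITS = 9     # assumed index width per level
PAGE_SHIFT = 12    # assumed 4 KiB pages
DESC_SIZE = 8      # assumed descriptor size in bytes

@dataclass
class WalkRequest:
    desc_addr: int    # physical address of the requested descriptor
    va_rest: int      # further portion of the virtual address (additional information)
    levels_left: int  # assumed level indication: page table levels after this one

def walk_ahead(memory, req):
    """Return {address: value} for everything prefetched after the detected request."""
    prefetched = {}
    desc = memory[req.desc_addr]  # descriptor requested by the detected walk request
    for remaining in range(req.levels_left, 0, -1):
        # The level indication selects which bits of va_rest index this level.
        shift = PAGE_SHIFT + LEVEL_BITS * (remaining - 1)
        addr = desc + ((req.va_rest >> shift) & ((1 << LEVEL_BITS) - 1)) * DESC_SIZE
        desc = memory[addr]       # prefetch memory page table walk request
        prefetched[addr] = desc
    data_addr = desc | (req.va_rest & ((1 << PAGE_SHIFT) - 1))
    prefetched[data_addr] = memory[data_addr]  # prefetch of the data item itself
    return prefetched

# Hypothetical three-level hierarchy; the detected request targets the first level.
memory = {0x1008: 0x2000, 0x2010: 0x3000, 0x3018: 0x9000, 0x9045: 0xAB}
req = WalkRequest(desc_addr=0x1008, va_rest=(2 << 21) | (3 << 12) | 0x45, levels_left=2)
prefetched = walk_ahead(memory, req)
```

In this model a single detected walk request for the first-level descriptor leads to the two later descriptors and the data item all being fetched ahead of the corresponding requests from the address translation circuitry.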
In one embodiment, the walk ahead circuitry further includes a walk ahead storage structure configured to store the associated descriptor retrieved from the memory device as a result of each prefetch memory page table walk request. In embodiments where the ultimate data item that the processing circuitry is seeking to access is also prefetched, then the walk ahead storage structure may also be used to store that prefetched data item as retrieved from the memory device.
The walk ahead storage structure can take a variety of forms, but in one embodiment is configured as a cache. For each memory page table walk request and/or modified memory access request issued by the address translation circuitry, a lookup can then be performed in the cache to determine whether the required descriptor or data item is present in the cache, i.e. whether it has been prefetched. If it has, then that descriptor or data item can be returned to the address translation circuitry directly from the cache without the requirement for any further memory device access.
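A minimal sketch of such a lookup, assuming a dictionary-backed store keyed by physical address, is given below. The class name `WalkAheadStore`, its methods, and the read counter are hypothetical and serve only to show that a hit avoids a further memory device access.

```python
class WalkAheadStore:
    """Toy model of a walk ahead storage structure organised as a cache."""

    def __init__(self, memory):
        self.memory = memory      # stand-in for the memory device
        self.lines = {}           # prefetched descriptors / data items by address
        self.memory_reads = 0     # counts accesses that reach the memory device

    def _read_memory(self, addr):
        self.memory_reads += 1
        return self.memory[addr]

    def prefetch(self, addr):
        """Fill the store in response to a prefetch memory request."""
        self.lines[addr] = self._read_memory(addr)

    def request(self, addr):
        """Serve a request from the address translation circuitry."""
        if addr in self.lines:            # hit: already prefetched
            return self.lines[addr]
        return self._read_memory(addr)    # miss: go to the memory device

# Hypothetical contents: one descriptor is prefetched, the data item is not.
store = WalkAheadStore({0x2010: 0x3000, 0x9045: 0xAB})
store.prefetch(0x2010)
hit_value = store.request(0x2010)
reads_after_hit = store.memory_reads
miss_value = store.request(0x9045)
reads_after_miss = store.memory_reads
```

Here the request for the prefetched descriptor is served without increasing the memory read count, whereas the request for the item that was not prefetched causes one further memory access.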
Whilst in one embodiment the walk ahead circuitry may be configured to prefetch descriptors for each of the subsequent hierarchical levels of the page table hierarchy, in an alternative embodiment the walk ahead circuitry may be configured to be responsive to control information to determine the number of subsequent hierarchical levels for which associated descriptors are prefetched ahead of a current hierarchical level for which the page table walk circuitry has generated a memory page table walk request.
The control information can take a variety of forms, and in one embodiment can be a simple count value identifying the number of hierarchical levels for which associated descriptors should be prefetched.
Whilst the walk ahead circuitry is merely prefetching information, and if it prefetches more information than is actually needed there is no adverse consequence on the correct operation of the system, there is power consumed in performing the prefetching and accordingly in certain situations, for example where there are many different hierarchical levels, it may be appropriate to not allow the prefetching to get too many stages ahead of the operation of the page table walk circuitry within the address translation circuitry itself. For example, it may be the case that certain descriptor information, and/or the requested data item, may be cached at other places within the system closer to the processing circuitry than the walk ahead circuitry, such that at one or more subsequent levels the associated memory page table walk request or the modified memory access request is not propagated as far as the walk ahead circuitry and accordingly does not require any action by the walk ahead circuitry. Accordingly, the control information can be configured so as to qualify how many hierarchical levels are prefetched ahead of the actual hierarchical level being considered by the page table walk circuitry, to reduce the possibility of power being consumed unnecessarily during the prefetching process, by seeking to reduce the prospect of prefetching information that is not actually required.
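As one possible reading of a count-based control value, the following sketch computes which levels would be prefetched ahead of the current one. The level numbering (increasing towards the final level) and the function name are assumptions made for illustration.

```python
def levels_to_prefetch(current_level, final_level, count):
    """List the levels prefetched ahead of current_level, limited by the count value."""
    first = current_level + 1
    last = min(final_level, current_level + count)
    return list(range(first, last + 1))

# Hypothetical four-level hierarchy with a count value of 2.
ahead_from_level_1 = levels_to_prefetch(1, 4, 2)
ahead_from_level_3 = levels_to_prefetch(3, 4, 2)
ahead_from_level_4 = levels_to_prefetch(4, 4, 2)
```

With a count of 2, a walk request at level 1 triggers prefetches for levels 2 and 3 only, and a request at the final level triggers no further descriptor prefetch, which bounds the power spent on speculative accesses.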
The walk ahead circuitry may be located in a variety of places within the data processing apparatus, but in one embodiment is provided within a memory controller associated with the memory device. Hence, in such embodiments the walk ahead circuitry is provided in close proximity to the memory device itself.
In one such embodiment, the walk ahead circuitry may be configured to reuse at least one existing component of the memory controller, thereby reducing the cost associated with providing the walk ahead circuitry. In one particular embodiment the walk ahead storage structure of the walk ahead circuitry is provided by a read data queue within the memory controller, hence avoiding the need for the provision of a separate walk ahead storage structure.
In one embodiment, a descriptor provided in a page table at one hierarchical level provides a base address for a page table in a subsequent hierarchical level. Further, in one embodiment, a descriptor provided in a page table at a final hierarchical level provides a base address for a memory page containing the data item associated with the virtual address specified in the memory access request.
There are a number of ways in which the detection circuitry of the walk ahead circuitry can be arranged to detect a memory page table walk request. For example, where that memory page table walk request includes additional information not required to retrieve the descriptor requested by that memory page table walk request, the presence of that additional information may itself be used to detect the request as being a memory page table walk request for which some prefetching should be performed. However, in an alternative embodiment, the page table walk circuitry is configured to include, within the detected memory page table walk request, a flag field set to identify the request as a memory page table walk request. This provides a simple mechanism for detecting such memory page table walk requests.
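The two detection options can be sketched as follows. The `ReadRequest` structure and its field names are hypothetical; the source does not specify a request encoding at this point.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReadRequest:
    addr: int                      # physical address being read
    is_walk: bool = False          # hypothetical flag field
    va_rest: Optional[int] = None  # hypothetical additional information

def detect_by_extra_info(req):
    """Treat the presence of additional information as marking a walk request."""
    return req.va_rest is not None

def detect_by_flag(req):
    """Treat a set flag field as marking a walk request."""
    return req.is_walk

plain_read = ReadRequest(addr=0x9045)
walk_with_info = ReadRequest(addr=0x1008, va_rest=0x403045)
walk_with_flag = ReadRequest(addr=0x1008, is_walk=True)
```

In this sketch, an ordinary read request is ignored by both detectors, while a walk request is recognised either from its additional information or from its flag field, depending on which scheme is used.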
Viewed from a second aspect, the present invention provides walk ahead circuitry for use in a data processing apparatus having processing circuitry for issuing a memory access request specifying a virtual address for a data item, and address translation circuitry for performing an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, the address translation circuitry generating at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process, the walk ahead circuitry being configured for locating in a path between the address translation circuitry and a memory device containing the at least one page table, and comprising: detection circuitry configured to detect a memory page table walk request generated by the address translation circuitry for a descriptor in a page table; and further request generation circuitry configured to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
Viewed from a third aspect, the present invention provides a walk ahead circuit comprising: detection circuitry configured to detect a memory page table walk request generated by page table walk circuitry of an address translation circuit for a descriptor in a page table; and further request generation circuitry configured to generate a prefetch memory request in order to prefetch data from a memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
Viewed from a fourth aspect, the present invention provides a method of handling address translation within a data processing apparatus, comprising: issuing from processing circuitry a memory access request specifying a virtual address for a data item; employing address translation circuitry to perform an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, including generating at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process; employing walk ahead circuitry located in a path between the address translation circuitry and a memory device containing the at least one page table, to: detect a memory page table walk request generated by the page table walk circuitry of the address translation circuitry for a descriptor in a page table; and to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
Viewed from a fifth aspect, the present invention provides a data processing apparatus comprising: processing means for issuing a memory access request specifying a virtual address for a data item; address translation means for performing an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, the address translation means including page table walk means for generating at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process; walk ahead means for locating in a path between the address translation means and a memory device containing the at least one page table, the walk ahead means comprising: detection means for detecting a memory page table walk request generated by the page table walk means of the address translation means for a descriptor in a page table; and further request generation means for generating a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which: Figure 1 is a block diagram of a data processing apparatus in accordance with one embodiment; Figure 2 illustrates a multi-level address translation process performed in accordance with one embodiment; Figure 3 is a block diagram illustrating in more detail elements provided within the walk ahead circuitry of Figure 1 in accordance with one embodiment; Figures 4A to 4C illustrate various formats of page table walk read requests that may be used in accordance with one embodiment; Figure 5 is a flow diagram illustrating the operation of the walk ahead circuitry of Figure 1 in accordance with one embodiment; Figure 6 illustrates how the walk ahead circuitry may be incorporated within a memory controller in accordance with one embodiment; and Figure 7 illustrates how the walk ahead circuitry may be incorporated within a memory controller in accordance with an alternative embodiment.
DESCRIPTION OF EMBODIMENTS
Figure 1 is a block diagram of a data processing apparatus in accordance with one embodiment. Within a data processing apparatus, there will typically be at least one instance, but often multiple instances, of processing circuitry that can issue memory access requests in order to perform read and write operations, each memory access request specifying a virtual address for the data item to be read or written.
Three examples of such processing circuitry are shown in Figure 1. In particular, processing circuitry 12 may be provided as part of a processor 10 that also includes an associated memory management unit (MMU) 14 and one or more levels of cache 20, 22. The processor 10 may form a central processing unit (CPU) of the data processing apparatus.
As also shown in Figure 1, devices 50, 55 may provide additional instances of such processing circuitry. These devices can take a variety of forms, for example a graphics processing unit (GPU), a network device, etc. Such devices will also have an associated MMU, and by way of example the devices 50, 55 are assumed to share a system MMU 60.
As shown in Figure 1, the various processing circuits are connected via an interconnect structure 25 with a memory device 40, which may for example be a Dynamic Random Access Memory (DRAM) device. One or more further levels of cache may reside in the path between the interconnect 25 and the memory device 40, such as the level 3 cache 30 shown in Figure 1.
The memory management units 14, 60 are used to perform an address translation process in order to translate the virtual address specified by a memory access request from the associated processing circuitry into a physical address identifying a location within the memory device containing the data item that is the subject of the memory access request. The address translation process is performed with reference to at least one descriptor provided by at least one page table, the page tables typically residing in the memory device, as illustrated by the page tables 45 shown in Figure 1. Page table walk circuitry 18, 64 is used to issue memory page table walk requests to the memory device 40 in order to retrieve individual descriptors required to perform the address translation process for a received memory access request from the associated processing circuitry. These descriptors can be buffered locally within the translation lookaside buffer (TLB) structures 16, 62. If the required descriptor already resides in the TLB, then the portion of the address translation processing using that descriptor can be performed without delay. However, if the descriptor that is required is not present in the TLB, then it first must be retrieved by the page table walk circuitry issuing an appropriate page table walk request.
Sometimes the descriptors required by a particular page table walk request may be cached in one of the levels of cache 20, 22, 30 within the system, and accordingly it is not necessary to access the memory device. However, in other instances it will be necessary for the page table walk request to be propagated all the way through to the memory device in order to retrieve the required descriptor, which is then returned to the MMU for storing in the associated TLB, and for use in the address translation process.
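By way of illustration only, the TLB behaviour just described can be sketched in Python. This is a minimal model under stated assumptions, not the circuitry of any embodiment: the class name `Mmu`, the `walk` callable and the example table contents are all hypothetical.

```python
# Hypothetical software model of a TLB backed by page table walks.
# All names and values here are illustrative assumptions.
class Mmu:
    def __init__(self, walk):
        self.tlb = {}            # virtual page number -> buffered descriptor
        self.walk = walk         # callable standing in for a page table walk request
        self.walks_issued = 0

    def descriptor_for(self, vpn):
        if vpn in self.tlb:      # TLB hit: translation proceeds without delay
            return self.tlb[vpn]
        self.walks_issued += 1   # TLB miss: a page table walk request is issued
        descriptor = self.walk(vpn)
        self.tlb[vpn] = descriptor   # returned descriptor is stored in the TLB
        return descriptor


page_table = {0x1: 0x8000, 0x2: 0x9000}
mmu = Mmu(walk=page_table.__getitem__)
first = mmu.descriptor_for(0x1)    # miss, triggers one walk
second = mmu.descriptor_for(0x1)   # hit, served from the TLB
```

Only the first lookup reaches the page table; the second is served locally, which is the latency saving a TLB provides when the descriptor is already buffered.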
It is known to provide certain prefetching mechanisms with the MMUs 14, 60, in order to seek to identify patterns of access requests issued by the associated processing circuits. This can be used to seek to retrieve into the TLB a descriptor that may subsequently be needed by a future access request that has not yet been issued by the processing circuitry. However, whilst this can assist in reducing latency depending on the accuracy of the pattern detection mechanisms, another issue that gives rise to significant latency results from the address translation process itself required in connection with each individual memory access request. In particular, the address translation process is often performed in multiple stages. A first portion of the virtual address may be used in combination with a page table base address to identify the physical address of a descriptor in that page table. That descriptor then needs to be retrieved, whereafter some address information specified by that descriptor is combined with another portion of the virtual address in order to identify a further address that needs to be accessed as part of the address translation process. In a simple case where only a single level of page table is used, this latter address may itself identify the data item that needs to be retrieved. However, even in that case, it will be appreciated that there are potentially two separate accesses that need to be made to the memory device in order to access the required data item.
Furthermore, in modern systems it is often the case that the page tables are arranged in multiple hierarchical levels, such that a multi-level address translation process is performed with reference to multiple different page tables. In particular, the descriptor retrieved from memory for one or more levels of the page table may itself identify a base address for another page table, with that base address being combined with another portion of the virtual address in order to identify a location of a further descriptor required as part of the address translation process. When that descriptor is returned, it may again be combined with another portion of the virtual address in order to identify a further descriptor at another level of the page table hierarchy that is also required as part of the address translation process. This process can iterate through multiple page table levels before the final level is reached, where the descriptor retrieved is combined with another portion of the virtual address in order to identify the actual address of the data item requiring access by the processing circuitry. Hence, it will be appreciated that when processing each individual memory access request, it may be necessary to perform multiple accesses to the memory device 40, and this can give rise to significant latency issues.
In accordance with one embodiment, as will be described in more detail below, walk ahead circuitry 35 is provided in a path between the address translation circuitry (provided by the MMUs 14, 60) and the memory device 40, which is arranged to detect a memory page table walk request generated by the page table walk circuitry 18, 64. Once the descriptor specified by that memory page table walk request is available to the walk ahead circuitry, for example once it has been retrieved from the memory device 40, or if it is already cached within some structure available to the walk ahead circuitry 35, the walk ahead circuitry is then arranged to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to that descriptor. This data may itself be another descriptor required as part of the address translation process from another hierarchical level of the page table hierarchy, or may be the actual data item that the processing circuitry is seeking to access.
As a result, once the originally requested descriptor has been returned to the relevant MMU, if that descriptor information is then used to generate a further request, whether that be a further memory page table walk request, or a modified memory access request specifying the physical address of the data item, the walk ahead circuitry may be able to intercept that request and provide the required data directly, without the need to perform a further access to the memory device, hence significantly reducing latency.
Figure 2 schematically illustrates the multi-level address translation process adopted in one embodiment. The incoming virtual address 100 specified as part of a memory access request issued by the processing circuitry can consist of virtual address portions 102, 104, 106. The first portion of the virtual address 102 is combined with a base address for a level one page table 110 (typically that base address being stored in a register accessible to the page table walk circuitry of the MMU) using the combinatorial circuitry 115 in order to generate an address 120 identifying a particular descriptor 130 within the level one page table 125. The combinatorial circuit 115 can take a variety of forms, but in one embodiment merely serves to add the virtual address as an offset to the base address in order to identify the relevant descriptor 130. The identified descriptor 130 actually provides a base address for a level 2 page table.
Accordingly, once that descriptor has been obtained, a further page table walk request can be issued specifying an address 137 determined by the combinatorial circuitry 135 based on that base address retrieved from the descriptor 130 and a second virtual address portion 104 of the virtual address 100. In the example shown in Figure 2, the identified descriptor 145 in the level two page table 140 then specifies a base address for the actual data page containing the requested data item. Accordingly, once that descriptor has been obtained, then a final stage of the address translation can be performed, where an access request is issued specifying address 152 generated by the combinatorial circuitry 150 based on the base address identified in the descriptor 145 and the third virtual address portion 106 of the virtual address 100. Performance of this final access request will cause the data item 160 to be accessed within the relevant data page 155.
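The two-level translation of Figure 2 can be sketched as follows. This is a hedged illustration only: the bit widths of portions 102, 104 and 106, the word-addressed `memory` dictionary and the function names are assumptions chosen for the example, and the combinatorial circuitry is modelled as a simple addition, as in the embodiment mentioned above.

```python
# Illustrative model of the Figure 2 translation. The 4/4/8-bit split of the
# virtual address is an assumption; the text does not specify widths.
def split_virtual_address(va):
    portion_106 = va & 0xFF          # offset within the data page
    portion_104 = (va >> 8) & 0xF    # index into the level two page table
    portion_102 = (va >> 12) & 0xF   # index into the level one page table
    return portion_102, portion_104, portion_106


def translate(memory, level_one_base, va):
    p102, p104, p106 = split_virtual_address(va)
    addr_120 = level_one_base + p102   # combinatorial circuitry 115
    desc_130 = memory[addr_120]        # base address of the level two table
    addr_137 = desc_130 + p104         # combinatorial circuitry 135
    desc_145 = memory[addr_137]        # base address of the data page
    addr_152 = desc_145 + p106         # combinatorial circuitry 150
    return addr_152, [addr_120, addr_137, addr_152]


# Hypothetical memory contents keyed by physical address.
memory = {0x103: 0x200, 0x205: 0x4000, 0x402A: "data item"}
physical, accesses = translate(memory, level_one_base=0x100, va=0x352A)
data_item = memory[physical]
```

The `accesses` list makes the latency problem visible: three dependent memory accesses are needed before the data item is obtained, each waiting on the result of the previous one.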
Whilst all of these stages will need to be performed by the MMU 14, 60, and in some instances the relevant descriptors will already be cached within the relevant TLB 16, 62, in other instances one or more of these stages may require the page table walk circuitry 18, 64 to issue a memory page table walk request to retrieve at least one descriptor, and ultimately a modified memory access request in order to obtain the data item required. If, by way of example, the page table walk circuitry 18, 64 issues a memory page table walk request specifying the address 120 in order to retrieve a descriptor 130 within the level one page table 125, the walk ahead circuitry 35 can detect that situation, and speculatively prefetch the additional descriptor 145 once the descriptor 130 is available. Furthermore, if desired, it can go on to speculatively prefetch the data item 160 once the descriptor 145 is available as a result of the first prefetch operation. Assuming the page table walk circuitry 18, 64 in due course goes on to issue a further page table walk request specifying the address 137, in order to obtain the descriptor 145, and thereafter a modified memory access request specifying the address 152 in order to access the data item, both of those follow on requests can be processed much more quickly due to the data having already been retrieved from the memory device by the walk ahead circuitry 35.
From Figure 2, it will be appreciated that in one embodiment, for the walk ahead circuitry to be able to perform the required prefetching operations, it needs access to the various virtual address portions 102, 104, 106 of the virtual address 100.
In one embodiment, this additional virtual address information is provided in association with the original memory page table walk request issued by the page table walk circuitry 18, 64, to enable the walk ahead circuitry to have the required information necessary to perform the prefetching operations.
Figure 3 is a block diagram illustrating in more detail components provided within the walk ahead circuitry 35 of Figure 1 in accordance with one embodiment.
The walk ahead circuitry includes a page table walk request detection circuit 200 for detecting, from the various access requests being routed over path 215 from the processing circuit components to the memory device, a memory page table walk request. On detecting such a memory page table walk request, a lookup is performed in the walk ahead cache 210 in order to determine whether the requested descriptor is already stored in the walk ahead cache, as will be the case if it has already been prefetched by a previous prefetch memory request issued by the walk ahead circuitry.
If it is in the walk ahead cache 210, then the required descriptor can be returned directly over the read data path 220 back to the relevant MMU 14, 60. If the required descriptor is not in the walk ahead cache 210, then the memory page table walk request can be processed in the standard manner by the memory device 40, with the associated descriptor subsequently being returned over the read data path 220 to the MMU. At the same time, that returning descriptor information can be stored in the walk ahead cache 210 so that it is available subsequently for any memory page table walk requests seeking that descriptor.
In addition, when the detection circuitry 200 detects a page table walk request, the further request generation circuitry 205 can be used to generate one or more prefetch memory requests in order to prefetch additional descriptors and/or the data item from the memory device, to speed up the operation of the further stages of the address translation process illustrated in Figure 2. It will be appreciated from Figure 2 that at each level, the descriptor from the previous level needs to have first been obtained, and accordingly the further request generation circuitry 205 will access the walk ahead cache 210 in order to determine when the required descriptor is available.
Considering again the example of Figure 2, when the descriptor 130 is available in the walk ahead cache 210, the further request generation circuitry 205 can generate a prefetch memory request in order to access the level 2 page table 140 in order to retrieve the descriptor 145. Further, once the descriptor 145 is available in the walk ahead cache 210, the further request generation circuitry 205 can issue a prefetch modified memory access request specifying the physical address for the required data item in the data page 155, in order to cause the required data item 160 to be prefetched from the memory device. In one embodiment, the walk ahead cache 210 can also be used to cache such prefetched data items, in addition to any prefetched descriptors, and in that embodiment the page table walk request detection circuitry 200 can also be used to perform a lookup in the walk ahead cache 210 for any modified memory access requests issued from the MMUs/associated processing circuits. Again, this can significantly reduce latency by avoiding the need for the memory device to be accessed at that point, given that the required data item has already been prefetched into the walk ahead cache 210.
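The interplay of the detection circuitry 200, the further request generation circuitry 205 and the walk ahead cache 210 can be approximated in software as below. This is a simplified sketch under several assumptions: memory reads complete immediately, the additional virtual address portions arrive as a tuple, and the class and method names (`WalkAhead`, `read`) are invented for the example.

```python
# Simplified, assumed model of the walk ahead circuitry of Figure 3.
class WalkAhead:
    def __init__(self, memory):
        self.memory = memory      # physical address -> stored value
        self.cache = {}           # stands in for the walk ahead cache 210
        self.memory_reads = 0

    def _read_memory(self, addr):
        self.memory_reads += 1
        value = self.memory[addr]
        self.cache[addr] = value  # returning data is also kept in the cache
        return value

    def read(self, addr, extra_va_portions=()):
        """Serve a read request; extra_va_portions marks a page table walk request."""
        hit = addr in self.cache
        value = self.cache[addr] if hit else self._read_memory(addr)
        # Further request generation: each prefetch needs the previous descriptor.
        base = value
        for portion in extra_va_portions:
            next_addr = base + portion
            if next_addr in self.cache:
                base = self.cache[next_addr]
            else:
                base = self._read_memory(next_addr)   # prefetch memory request
        return value, hit


memory = {0x103: 0x200, 0x205: 0x4000, 0x402A: "data item"}
walk_ahead = WalkAhead(memory)
first = walk_ahead.read(0x103, extra_va_portions=(0x5, 0x2A))  # level one walk
second = walk_ahead.read(0x205)      # follow-on level two walk
third = walk_ahead.read(0x402A)      # modified memory access request
```

In this model the first request costs three memory reads (one demand read and two prefetches), after which both follow-on requests hit in the cache and need no further access to memory.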
Whilst in Figure 2 a two stage page table hierarchy is shown, it will be appreciated that the page table hierarchy may include more than two levels, with the process illustrated in Figure 2 being repeated for each level of the page table hierarchy until the final hierarchical level is reached, whereafter the required data item in the relevant data page can then be accessed using the descriptor obtained from the final level of the page table hierarchy. The use of the prefetching mechanism of the described embodiments can significantly reduce latency associated with each of the subsequent levels of the page table translation process, and in association with the retrieval of the ultimate data item to be accessed.
The page table walk read request issued by the page table walk circuitry 18, 64 can take a variety of forms, with Figures 4A to 4C schematically illustrating three alternative forms. In each case, there will typically be a field 255 identifying the access request as a read access request, and there will also be an address portion field 260 identifying the address to be read from. It will be within this portion that the actual physical address for the descriptor to be read will be specified, for example the address for the descriptor 130 in the level 1 page table shown in Figure 2 or the address 137 for the descriptor 145 shown in the level 2 page table 140 of Figure 2.
As shown in Figure 4A, in one embodiment the page table walk read request will also specify additional virtual address bits 265 within the request 250, these additional virtual address bits being usable by the walk ahead circuitry 35 in order to construct the required physical address for a subsequent prefetch memory request to be issued by the walk ahead circuitry.
As shown in Figure 4B, in an alternative embodiment the page table walk read request 270 may additionally include a field 275 including a flag which is set to identify that the request is a page table walk request for which the walk ahead circuitry should perform some associated prefetching. Alternatively, this flag may in some embodiments not be required, and the detection circuitry of the walk ahead circuitry may be able to determine that there is a page table walk request for which it should perform prefetching based on the presence of data in the additional virtual address bits field 265.
If the walk ahead circuitry is configured to perform only a single level of prefetching, then the additional virtual address bits field 265 will only need to specify the required virtual address bits for that single prefetch. For example, if the page table walk read request issued by the page table walk circuitry 18, 64 relates to accessing a descriptor in the level one page table, and the prefetch performed by the walk ahead circuitry is restricted to only prefetch the next descriptor from the level 2 page table, then the additional virtual address bits field 265 would only need to specify the virtual address bits 104 of Figure 2. However, if the walk ahead circuitry is instead to prefetch from multiple subsequent levels, hence requiring multiple additional virtual address portions to be specified within the field 265, in one embodiment the page table walk read request 280 can take the form shown in Figure 4C, where a field 285 is used to identify the current page table walk level of the request, and provide sufficient information to enable the walk ahead circuitry to interpret the additional virtual address bits provided within the field 265', and in particular to identify which bits to use in association with each prefetch memory request. Considering the example of Figure 2, the information in the field 285 will hence identify that the virtual address bits 104 should be used for the first prefetch memory request issued, and that the virtual address bits 106 should be used for the next prefetch memory request issued.
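One possible way to picture the request formats of Figures 4A to 4C is as a single record with optional fields. The following sketch is an assumption-laden illustration: the field encodings, the 4/4/8-bit portion widths, and the names `WalkReadRequest`, `wants_prefetch` and `split_extra_bits` are hypothetical and are not defined by the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed widths of virtual address portions 102, 104 and 106 (illustrative).
PORTION_WIDTHS = (4, 4, 8)


@dataclass(frozen=True)
class WalkReadRequest:
    address: int                          # field 260: physical address to read
    extra_va_bits: Optional[int] = None   # field 265 / 265'
    prefetch_flag: Optional[bool] = None  # field 275 (Figure 4B), when present
    level: int = 1                        # field 285 (Figure 4C): current walk level


def wants_prefetch(req: WalkReadRequest) -> bool:
    """Use the flag when the format carries one, else the presence of extra bits."""
    if req.prefetch_flag is not None:
        return req.prefetch_flag
    return req.extra_va_bits is not None


def split_extra_bits(req: WalkReadRequest) -> List[int]:
    """Split field 265' into one portion per subsequent prefetch, in issue order."""
    if req.extra_va_bits is None:
        return []
    remaining = PORTION_WIDTHS[req.level:]   # portions not yet consumed
    shift = sum(remaining)
    portions = []
    for width in remaining:
        shift -= width
        portions.append((req.extra_va_bits >> shift) & ((1 << width) - 1))
    return portions


level_one = WalkReadRequest(address=0x103, extra_va_bits=0x52A, level=1)
level_two = WalkReadRequest(address=0x205, extra_va_bits=0x2A, level=2)
plain_read = WalkReadRequest(address=0x402A)
```

For the level one request, the level field tells the receiver that the extra bits hold portion 104 followed by portion 106, so `split_extra_bits(level_one)` yields the two offsets in the order the prefetches would be issued.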
Figure 5 is a flow diagram illustrating the operation of the walk ahead circuitry in accordance with one embodiment. At step 300, the detection circuitry 200 determines whether a page table walk request has been issued which requires some prefetching to be performed. Once such a page table walk request has been detected, then at step 305 a lookup is performed in the walk ahead cache 210 using the address specified in that page table walk request, in order to determine whether the requested descriptor is present in the cache. Hence, at step 310 it is determined whether there was a hit in the cache, and if there was, the required data is output to the read data path at step 315. Irrespective of whether there is a hit or not, the process then proceeds to step 320 where it is determined whether there are any other additional virtual address bits to process, i.e. whether any of the virtual address bits specified in the field 265, 265' have not yet been utilised. On the first iteration through the process of Figure 5, this will be the case, and accordingly the process will proceed to step 325.
At step 325, it is then determined whether the data required for the current request level is available in the walk ahead cache. In particular, as discussed earlier, the further request generation circuitry 205 will only be able to issue a prefetch memory request once the descriptor at the current request level (i.e. that descriptor that is the subject of the memory page table walk request for the first iteration) is available within the walk ahead cache 210. When that descriptor information is available in the walk ahead cache 210, then the process will proceed to step 330, where a new prefetch memory request will be created for the next level using the data returned for the current request level and at least some of the virtual address bits specified in the page table walk read request. That new prefetch memory request will then be issued to the read requests path 215 for subsequent processing by the memory device. At this point, the next level (i.e. the level for which the prefetch memory request has just been issued) is then set as the current request level at step 335, whereafter it is determined at step 340 whether a condition for further walk ahead processing is met.
In one embodiment, the walk ahead circuitry may merely be arranged to perform prefetching of all of the subsequent levels of the address translation up to and including the specified data item, with all of the associated descriptors and associated data item then being stored in the walk ahead cache 210. However, in an alternative embodiment the prefetching may be throttled in some manner, so that the prefetching does not get too far ahead of the current level being considered by the MMU 14, 60.
For example, the walk ahead circuitry may be configured to only prefetch a maximum of two levels ahead of the current level being considered by the MMU. Accordingly, at step 340 it will be determined for which level the MMU is currently making a page table walk request, before determining whether it is appropriate to continue further prefetching.
If the condition is not met, then a timeout mechanism may be employed to wait for a predetermined period of time, in the hope that the condition will be met before the timeout threshold is reached. If it is, then the process will branch back to step 320; however, if the timeout threshold expires without the condition for further walk ahead being met, then the process may end at step 350. The process will also end at step 350 if at step 320 it is determined that all of the additional virtual address bits have been processed.
If the condition for further walk ahead processing is met at step 340, then steps 320, 325, 330 and 335 are repeated until either all of the additional virtual address bits have been processed, or the condition for further walk ahead processing is not met within the timeout threshold period.
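The flow of Figure 5, including the throttling condition of step 340, can be sketched as a loop. This is an approximate model only: memory reads are assumed to complete immediately, the timeout is modelled as a fixed number of polls rather than a period of time, and the parameter names (`mmu_level`, `max_ahead`, `timeout_polls`) are assumptions introduced for the example.

```python
# Assumed software rendering of the Figure 5 flow; step numbers are in comments.
def walk_ahead(memory, cache, walk_addr, va_portions, start_level,
               mmu_level, max_ahead=2, timeout_polls=3):
    if walk_addr not in cache:              # steps 305/310: lookup in the cache
        cache[walk_addr] = memory[walk_addr]
    current_addr, current_level = walk_addr, start_level
    remaining = list(va_portions)
    issued = []
    while remaining:                        # step 320: any bits left to process?
        descriptor = cache[current_addr]    # step 325: descriptor is available
        next_addr = descriptor + remaining.pop(0)   # step 330: build the prefetch
        cache[next_addr] = memory[next_addr]
        issued.append(next_addr)
        current_addr, current_level = next_addr, current_level + 1   # step 335
        for _ in range(timeout_polls):      # step 340: condition, with timeout
            if (current_level + 1) - mmu_level() <= max_ahead:
                break                       # condition met: back to step 320
        else:
            break                           # timeout expired: end at step 350
    return issued


memory = {0x103: 0x200, 0x205: 0x4000, 0x402A: "data item"}
unthrottled = walk_ahead(memory, {}, 0x103, [0x5, 0x2A], 1, lambda: 1, max_ahead=2)
throttled = walk_ahead(memory, {}, 0x103, [0x5, 0x2A], 1, lambda: 1, max_ahead=1)
```

With the MMU still at level one, a limit of two levels ahead allows both prefetches to be issued, whereas a limit of one stops after the level two descriptor has been prefetched.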
Whilst the walk ahead circuitry is merely speculatively prefetching information that the MMU and associated processing circuitry may subsequently require, and hence there are no adverse consequences on the correct operation of the system by prefetching information that may in fact not later be needed, there will be a power consumption effect associated with the prefetching operations, and this can be one reason for introducing a condition at step 340 to throttle the degree to which the walk ahead circuitry prefetches ahead of the actual requirements of the MMU/associated processing circuitry. For example, in some situations it may be that the descriptors associated with one or more subsequent levels of the page hierarchy, or the actual data item ultimately requested, may already be cached in one of the levels of cache 20, 22 within the system, and accordingly the subsequent page table walk requests and/or the modified memory access request may not ever be propagated as far as the walk ahead circuitry. Accordingly, the prefetched information held in the walk ahead cache of the walk ahead circuitry would not be utilised in that instance, and the prefetching would have consumed power unnecessarily. However, by specifying a condition for further walk ahead processing at step 340, the desire to reduce latency by performing the prefetching can be balanced against the power consumed in doing so, dependent on implementation requirements.
The walk ahead circuitry can be located at a variety of positions within the apparatus. In some embodiments the walk ahead circuitry may reside between the MMU 14, 60 and the memory device 40. In one embodiment, the walk ahead circuitry is incorporated within a memory controller 400 associated with the memory device, as for example shown in Figure 6. In the example, the memory controller 400 is associated with a DRAM memory device 415. Read memory access requests issued by the various processing circuits are buffered within the read request queue 405 prior to being forwarded to the memory device 415. Typically, read request lookup circuitry 425 will monitor the contents of the read request queue and perform a lookup in the read data queue 420 in order to detect situations where the read data being requested has already been retrieved from the memory device. In those situations, read data can be provided directly from the read data queue, without having to perform an access to the memory device. However, for read requests that cannot be serviced directly from the contents of the read data queue, the scheduler 410 will schedule the various read requests for processing by the memory device, and will issue appropriate control commands to the memory device to cause the memory device to process the read access requests. The retrieved read data will then be returned to the read data queue 420 from where it can be returned to the requesting processing circuit/MMU.
In one embodiment, the walk ahead circuitry takes the form of the walk ahead circuitry 430 shown in Figure 6, consisting of the detection circuitry 435, further request generation circuitry 440 and the walk ahead cache 445. These components correspond directly to the components 200, 205, 210 discussed in relation to the walk ahead circuitry 35 in Figure 3. The additional prefetch requests generated by the further request generation circuitry 440 are inserted into the read request queue 405 for processing by the scheduler 410 and memory device 415 in due course. The returned descriptors and/or data items obtained as a result of processing those prefetch requests are then routed from the read data queue into the walk ahead cache 445.
In an alternative embodiment as shown in Figure 7, the walk ahead cache can be subsumed into the read data queue 420, avoiding the need for a separate walk ahead cache. In that instance the detection circuitry 435 and the further request generation circuitry 440 will use the read request lookup circuitry 425 to perform the necessary lookups within the walk ahead cache provided as part of the read data queue 420. Such an approach can reduce the hardware cost associated with providing the walk ahead circuitry within the memory controller.
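A queue-based sketch of the Figure 7 arrangement, in which the read data queue 420 doubles as the walk ahead cache, might look as follows. It is a hypothetical model: the scheduler is reduced to first-in first-out order, the DRAM is a dictionary, and `MemoryController`, `submit` and `run` are names invented for illustration.

```python
from collections import deque


# Assumed model of a memory controller whose read data queue also serves
# as the walk ahead cache (Figure 7 style). Names are illustrative.
class MemoryController:
    def __init__(self, dram):
        self.dram = dram
        self.read_request_queue = deque()   # stands in for queue 405
        self.read_data_queue = {}           # stands in for queue 420
        self.dram_accesses = 0

    def submit(self, addr, va_portions=()):
        self.read_request_queue.append((addr, tuple(va_portions)))

    def run(self):
        """Drain the request queue; return (address, hit) for each request processed."""
        processed = []
        while self.read_request_queue:
            addr, portions = self.read_request_queue.popleft()
            hit = addr in self.read_data_queue     # read request lookup circuitry 425
            if not hit:
                self.dram_accesses += 1            # scheduler 410 issues a DRAM read
                self.read_data_queue[addr] = self.dram[addr]
            processed.append((addr, hit))
            if portions:                           # further request generation 440
                next_addr = self.read_data_queue[addr] + portions[0]
                self.read_request_queue.append((next_addr, portions[1:]))
        return processed


dram = {0x103: 0x200, 0x205: 0x4000, 0x402A: "data item"}
controller = MemoryController(dram)
controller.submit(0x103, (0x5, 0x2A))   # level one walk carrying extra VA portions
first_pass = controller.run()
controller.submit(0x205)                # follow-on walk from the MMU
controller.submit(0x402A)               # modified memory access request
second_pass = controller.run()
```

Because the prefetch requests are simply appended to the existing read request queue and their results land in the existing read data queue, no separate cache structure is needed in this model, which mirrors the hardware saving described for Figure 7.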
Through use of the techniques of the described embodiments, the load-to-use latency associated with the address translation process can be significantly reduced, thereby improving application performance within the data processing apparatus. The walk ahead circuitry of the described embodiments can be operated opportunistically, in order to walk ahead and use less busy periods of operation of the memory device to prefetch descriptors that may be required by an MMU and data items that may be required by the processing circuitry, by resolving the table walks ahead of time and prior to those table walks actually being issued by the associated MMU. This lowers the load-to-use latency for the processing circuitry.
Although particular embodiments have been described herein, it will be appreciated that the invention is not limited thereto and that many modifications and additions thereto may be made within the scope of the invention. For example, various combinations of the features of the following dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.

Claims (25)

  1. A data processing apparatus comprising: processing circuitry configured to issue a memory access request specifying a virtual address for a data item; address translation circuitry configured to perform an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, the address translation circuitry including page table walk circuitry configured to generate at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process; walk ahead circuitry located in a path between the address translation circuitry and a memory device containing the at least one page table, the walk ahead circuitry comprising: detection circuitry configured to detect a memory page table walk request generated by the page table walk circuitry of the address translation circuitry for a descriptor in a page table; and further request generation circuitry configured to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
  2. A data processing apparatus as claimed in Claim 1, wherein the data prefetched from the memory device in response to the prefetch memory request is one of the data item required by the modified memory access request and a further descriptor required by the address translation process.
  3. A data processing apparatus as claimed in Claim 1 or Claim 2, wherein: the page table walk circuitry is configured to include, within the detected memory page table walk request, additional information not required to retrieve the descriptor requested by that detected memory page table walk request; and the further request generation circuitry is configured to use said additional information when generating said prefetch memory request.
  4. A data processing apparatus as claimed in Claim 3, wherein: the page table walk circuitry is configured to use a portion of the virtual address in order to determine a descriptor address, and to include within the detected page table walk request said descriptor address; and the page table walk circuitry is further configured to include, as said additional information, a further portion of the virtual address.
  5. A data processing apparatus as claimed in any preceding claim, wherein: the address translation circuitry is configured to perform, as the address translation process, a multi-level address translation process with reference to descriptors provided by a plurality of page tables configured in multiple hierarchical levels, and the page table walk circuitry is configured to generate memory page table walk requests in order to retrieve the descriptors required for the multi-level address translation process; the memory page table walk request detected by the detection circuitry is for a descriptor in a page table at one hierarchical level; and the further request generation circuitry is configured to generate as the prefetch memory request, for each of at least one subsequent hierarchical level, a prefetch memory page table walk request in order to prefetch an associated descriptor in a page table at that subsequent hierarchical level.
  6. 6. A data processing apparatus as claimed in Cairn 5 when dependent on Cairn 4, wherein: the Ibriher request generation circuitry is configured to determine a. descriptor address for the associated descriptor in a page table at a first subsequent hierarchical level with reference tc said further portion of the virtual address and the descriptor retrieved as a result of the memory device processing the detected memory page table walk request; and tile further request generation circuitry is further configured to include the determined descriptor address within the generated prefetch memory page table walk request for said first subsequent hierarchical level.
  7. 7. A data processing apparatus as claimed in Claim 6, wherein for each additional subsequent hierarchical level, the further request generation circuitry is configured to determine a descriptor address fix the associated descriptor in a page table at that additional subsequent hierarchical level with reference to said further portion of the virtual address and the descriptor obtained as a result of the memory device processing the prefetch memory page table walk request for a preceding subsequent hierarchical level.
  8. 8. A data processing apparatus as claimed in any of claims 5 to 7 when dependent on Cairn 4, wherein: the further request generation circuitry is configured to generate, fur each of multiple subsequent hierarchical levels, a prefetch memory page table walk request; and the page table walk circuItry is thither configured to include within the detected page table walk request, level indication data used by the further request generation circuitry to determine which bits of the further portion cii the virtual address Lo use when generating the prefetch memory page table walk request at each of the multiple subsequent hierarchical levels.
  9. 9. A data processing apparatus as claimed in any of claims S to 8, wherein said at least one subsequent hierarchical level includes a final hierarchical level, and the further request generation circuitry is further configured to generate a prefetch modified memory access request specifying a physical address fbr the data item in order to prefetch the data item,
  10. 10. A data processing apparatus as claimed in any of claims 5 to 9, wherein the walk ahead circuitry further includes a walk ahead storage structure configured to store the associated descriptor retrieved from the memory device as a result of each prefetch memory page table walk request.
  11. 11. A data processing apparatus as claimed in Claim 10 when dependent on Claim 9, wherein the walk ahead storage structure is further configured to store the prefetched data item retrieved from the memory device as a result of the prefetch modified memory access request.
  12. 12. A data processing apparatus as claimed in Claim 10 or Claim 11, wherein the walk ahead storage structure is configured as a cache.
  13. 13. Adataprocessing apparatus as claimedinany of claims5to lZwhereinthewalk ahead circuitry is configured to be responsive to control information to determine the number of subsequent hierarchical levels for which associated descriptors are prefetched ahead of a cuitent hierarchical level for which the page table walk circuitry has generated a memory page table walk request.
  14. 14. A data processing apparatus as claimed in any preceding claim, wherein the walk ahead circuitry is provided within a memory controller associated with the memory device.
  15. 15. A data processing apparatus as claimed in Claim 14, wherein the walk ahead circuitry is configured to re-use at least one existing component of the memory controller.
  16. 16. A data processing apparatus as claimed in Claim 15 when dependent on claim 10, wherein the walk ahead storage structure is provided by a read data queue within the memory controller.
  17. 17. A data processing apparatus as claimed in any of claims 5 to 16, wherein a descriptor provided in a page table at one hierarchical level provides a base address for a page table in a subsequent hierarchical level.
  18. 18. A data processing apparatus as claimed in any of claims 5 to 17, wherein a descriptor provided in a page table at a final hierarchical level provides a base address for a memory page containing the data item associated with tte virtual address specified in the memory access request.
  19. I 9. A data processing apparatus as claimed in any preceding clajm wherein the page table walk circuitry is configured to include, within the detec.ted memory page table walk request afiag field set to identify the request as a memory page table walk request.
  20. 20. Walk ahead circuitry for use in a data processing apparatus having processing circuitry for issuing a memory access request specifying a virtual address for a data item.1 0 and address translation circuitry for performing an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, the address translation circuitry generating at least one memory page table walk request in order to retrieve the at least one descriptor required for the address translation process, the walk ahead circuitry being configured for locating in a path between the address translation circuitry and a memory device containing the at least one page table, and comprising: detection circuitry configured to detect a memory page table walk request generated by We address translation circuitry for a descnptor in a page table; aiid further request generation circuitry configured to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
  21. 21. A walk ahead circuit comprising: detection circuitry conhgured to detect a memory page table walk request.generated by page table walk circuitry of an address translation circuit for a descriptor ina page table; andfurther request generation circuitry configured to generate a prefetch memory request in order to prefetch data from a memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
  22. 22. A method of handling address translation within a data processing apparatus, comprising: issuing from processing circuitry a memory access request specifying a virtual address for a data item; emoloyng address translation circuitry to perform an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address for the data item, including generating at least one memory page table walk request in order to 1 0 retrieve the at least one descriptor required for the address translation process; employing walk ahead circuitry located in a path between the address translation circuitry and a memory device containing the at least one page LaNe, to: detect a memory page table walk request generated by the page table walk circuitry of the address translation circuitry for a descriptor in a page table; and 1 5 to generate a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request
  23. 23. A data processing apparatus comprising: processing means for issuing a memory access request specifying a virtual address for a daia item; address translation means for perfonning an address translation process with reference to at least one descriptor provided by at least one page table, in order to produce a modified memory access request specifying a physical address f'or the data item, the address translation means including page table walk means for generating at least one memory page Lable walk request in order to retrieve the at least one descriptor required for the address translation process; walk ahead means for Locating in a path between the address translation means and a memory device containing the at least one page table, the walk ahead means comprising: detecbon means for detecting a memory page table walk request generated by the page table walk means of the address translation means for a descriptor in apage table; andfurther request generation neans-* for generating a prefetch memory request in order to prefetch data from the memory device at a physical address determined with reference to the descriptor requested by the detected memory page table walk request.
  24. 24. A data processing apparatus for performing address translation, substantially as hereinhefore described with reference to any of the accompanying figures.
  25. 25, A method of handling address translation within a data processing apparatus, substantially as hereinbefore described with reference to any of the accompanying figures. I)
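
By way of illustration only, the walk ahead behaviour recited in claims 4 to 9, 13 and 19 may be modelled in software as in the following minimal sketch. It is not taken from the specification. The four-level layout, 4 KB pages, 9 index bits per level and 8-byte descriptors are assumed values. The names `walk_ahead`, `descriptor_address` and `index_for_level`, and the request fields, are hypothetical. Each descriptor is simplified to a bare base address with no attribute bits, and memory is modelled as a dictionary from physical address to descriptor value.

```python
# Assumed translation regime (hypothetical, not from the specification):
# 4 levels, 4 KB pages, 9 virtual-address bits per level, 8-byte descriptors.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
DESCRIPTOR_SIZE = 8
LEVELS = 4


def index_for_level(virtual_address, level):
    """Select the virtual-address bits that index the page table at `level`.

    This plays the role of the level indication data of claim 8: the level
    determines which bits of the virtual address are used.
    """
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
    return (virtual_address >> shift) & ((1 << BITS_PER_LEVEL) - 1)


def descriptor_address(table_base, virtual_address, level):
    """Combine a table base address with the index bits for `level`."""
    return table_base + index_for_level(virtual_address, level) * DESCRIPTOR_SIZE


def walk_ahead(memory, request, levels_ahead):
    """Return the prefetches generated for one detected request.

    `request` is a dict with the hypothetical fields
    `is_page_table_walk` (flag field, claim 19), `descriptor_address`,
    `level` and `virtual_address` (the additional information of claim 4).
    `levels_ahead` models the control information of claim 13.
    """
    if not request["is_page_table_walk"]:
        return []  # detection step: ordinary accesses are ignored

    prefetches = []
    virtual_address = request["virtual_address"]
    level = request["level"]
    # Descriptor returned by the memory device for the detected request.
    descriptor = memory[request["descriptor_address"]]

    for _ in range(levels_ahead):
        if level == LEVELS - 1:
            break  # no further page table levels to walk
        level += 1
        address = descriptor_address(descriptor, virtual_address, level)
        prefetches.append(("descriptor", address))
        descriptor = memory[address]  # feeds the next level (claim 7)

    if level == LEVELS - 1:
        # Final-level descriptor gives the page base; prefetch the data item.
        offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
        prefetches.append(("data", descriptor + offset))
    return prefetches
```

In this model a request detected at the first level with `levels_ahead` set to 3 yields three descriptor prefetches followed by one data prefetch, whereas a smaller `levels_ahead` stops the walk early and prefetches no data item.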
GB1413397.9A 2014-07-29 2014-07-29 A data processing apparatus, and a method of handling address translation within a data processing apparatus Active GB2528842B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
GB1413397.9A GB2528842B (en) 2014-07-29 2014-07-29 A data processing apparatus, and a method of handling address translation within a data processing apparatus
PCT/GB2015/051809 WO2016016605A1 (en) 2014-07-29 2015-06-22 A data processing apparatus, and a method of handling address translation within a data processing apparatus
US15/325,250 US10133675B2 (en) 2014-07-29 2015-06-22 Data processing apparatus, and a method of handling address translation within a data processing apparatus
CN201580039538.6A CN106537362B (en) 2014-07-29 2015-06-22 Data processing apparatus and method of processing address conversion in data processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1413397.9A GB2528842B (en) 2014-07-29 2014-07-29 A data processing apparatus, and a method of handling address translation within a data processing apparatus

Publications (3)

Publication Number Publication Date
GB201413397D0 GB201413397D0 (en) 2014-09-10
GB2528842A GB2528842A (en) 2016-02-10
GB2528842B GB2528842B (en) 2021-06-02

Family

ID=51587387

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1413397.9A Active GB2528842B (en) 2014-07-29 2014-07-29 A data processing apparatus, and a method of handling address translation within a data processing apparatus

Country Status (4)

Country Link
US (1) US10133675B2 (en)
CN (1) CN106537362B (en)
GB (1) GB2528842B (en)
WO (1) WO2016016605A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120812B2 (en) * 2016-02-03 2018-11-06 Nutanix, Inc. Manipulation of virtual memory page table entries to form virtually-contiguous memory corresponding to non-contiguous real memory allocations
US10296465B2 (en) * 2016-11-29 2019-05-21 Board Of Regents, The University Of Texas System Processor using a level 3 translation lookaside buffer implemented in off-chip or die-stacked dynamic random-access memory
US10402340B2 (en) 2017-02-21 2019-09-03 Micron Technology, Inc. Memory array page table walk
US10380034B2 (en) * 2017-07-14 2019-08-13 International Business Machines Corporation Cache return order optimization
US10467159B2 (en) 2017-07-14 2019-11-05 Arm Limited Memory node controller
US10592424B2 (en) 2017-07-14 2020-03-17 Arm Limited Range-based memory system
US10489304B2 (en) * 2017-07-14 2019-11-26 Arm Limited Memory address translation
US10353826B2 (en) 2017-07-14 2019-07-16 Arm Limited Method and apparatus for fast context cloning in a data processing system
US10613989B2 (en) 2017-07-14 2020-04-07 Arm Limited Fast address translation for virtual machines
US10565126B2 (en) 2017-07-14 2020-02-18 Arm Limited Method and apparatus for two-layer copy-on-write
US10534719B2 (en) 2017-07-14 2020-01-14 Arm Limited Memory system for a data processing network
US10528480B2 (en) * 2017-08-24 2020-01-07 Arm Limited Apparatus and method for efficient utilisation of an address translation cache
US10884850B2 (en) 2018-07-24 2021-01-05 Arm Limited Fault tolerant memory system
JP7350053B2 (en) 2018-07-25 2023-09-25 Compagnie Generale des Etablissements Michelin 2 modulus metal cord
CN110941565B (en) * 2018-09-25 2022-04-15 北京算能科技有限公司 Memory management method and device for chip storage access
CN111198827B (en) * 2018-11-16 2022-10-28 展讯通信(上海)有限公司 Page table prefetching method and device
US11210232B2 (en) 2019-02-08 2021-12-28 Samsung Electronics Co., Ltd. Processor to detect redundancy of page table walk
US11816037B2 (en) * 2019-12-12 2023-11-14 Advanced Micro Devices, Inc. Enhanced page information co-processor
CN111367831B (en) * 2020-03-26 2022-11-11 超睿科技(长沙)有限公司 Deep prefetching method and component for translation page table, microprocessor and computer equipment
CN118103824A (en) * 2021-06-09 2024-05-28 安法布里卡公司 Transparent remote memory access over network protocol
CN114218132B (en) * 2021-12-14 2023-03-24 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment
CN114238167B (en) * 2021-12-14 2022-09-09 海光信息技术股份有限公司 Information prefetching method, processor and electronic equipment
CN117785738B (en) * 2024-02-23 2024-05-14 超睿科技(长沙)有限公司 Page table prefetching method, device, chip and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050071601A1 (en) * 2003-09-30 2005-03-31 International Business Machines Corporation Apparatus and method for pre-fetching page data using segment table data
US20100250859A1 (en) * 2009-03-30 2010-09-30 Via Technologies, Inc. Prefetching of next physically sequential cache line after cache line that includes loaded page table entry
US20110010521A1 (en) * 2009-07-13 2011-01-13 James Wang TLB Prefetching
US20120226888A1 (en) * 2011-03-03 2012-09-06 Qualcomm Incorporated Memory Management Unit With Pre-Filling Capability
US20140052917A1 (en) * 2012-05-10 2014-02-20 Oracle International Corporation Using a shared last-level tlb to reduce address-translation latency
US20140181460A1 (en) * 2012-12-21 2014-06-26 Advanced Micro Devices, Inc. Processing device with address translation probing and methods

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7313658B2 (en) * 2001-10-23 2007-12-25 Via Technologies, Inc. Microprocessor and method for utilizing disparity between bus clock and core clock frequencies to prioritize cache line fill bus access requests
US20060136696A1 (en) * 2004-12-16 2006-06-22 Grayson Brian C Method and apparatus for address translation
US7886112B2 (en) * 2006-05-24 2011-02-08 Sony Computer Entertainment Inc. Methods and apparatus for providing simultaneous software/hardware cache fill
US8806177B2 (en) * 2006-07-07 2014-08-12 International Business Machines Corporation Prefetch engine based translation prefetching
US7685355B2 (en) * 2007-05-07 2010-03-23 Microsoft Corporation Hardware memory management unit simulation using concurrent lookups for address translation data
US8161243B1 (en) * 2007-09-28 2012-04-17 Intel Corporation Address translation caching and I/O cache performance improvement in virtualized environments
GB2478727B (en) * 2010-03-15 2013-07-17 Advanced Risc Mach Ltd Translation table control
US8996840B2 (en) 2011-12-23 2015-03-31 International Business Machines Corporation I/O controller and method for operating an I/O controller
US9378150B2 (en) * 2012-02-28 2016-06-28 Apple Inc. Memory management unit with prefetch ability
US20140108766A1 (en) * 2012-10-17 2014-04-17 Advanced Micro Devices, Inc. Prefetching tablewalk address translations
US9563562B2 (en) * 2012-11-27 2017-02-07 Nvidia Corporation Page crossing prefetches
US9047198B2 (en) * 2012-11-29 2015-06-02 Apple Inc. Prefetching across page boundaries in hierarchically cached processors
US10380030B2 (en) * 2012-12-05 2019-08-13 Arm Limited Caching of virtual to physical address translations
US9244840B2 (en) * 2012-12-12 2016-01-26 International Business Machines Corporation Cache swizzle with inline transposition
US9158705B2 (en) * 2013-03-13 2015-10-13 Intel Corporation Stride-based translation lookaside buffer (TLB) prefetching with adaptive offset
CN105378683B (en) * 2013-03-15 2020-03-17 英特尔公司 Mechanism for facilitating dynamic and efficient management of translation buffer prefetching in a software program at a computing system
EP2840504A1 (en) * 2013-08-23 2015-02-25 ST-Ericsson SA Enhanced pre-fetch in a memory management system
US9569361B2 (en) * 2014-01-10 2017-02-14 Samsung Electronics Co., Ltd. Pre-fetch chaining

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002)", published 25/05/2002, IEEE, pp 195-206, Kandiraju & Sivasubramaniam , "Going the distance for TLB prefetching: an application-driven study" *

Also Published As

Publication number Publication date
CN106537362A (en) 2017-03-22
GB201413397D0 (en) 2014-09-10
GB2528842B (en) 2021-06-02
CN106537362B (en) 2020-10-30
US10133675B2 (en) 2018-11-20
US20170185528A1 (en) 2017-06-29
WO2016016605A1 (en) 2016-02-04

Similar Documents

Publication Publication Date Title
US10133675B2 (en) Data processing apparatus, and a method of handling address translation within a data processing apparatus
US11237728B2 (en) Method for accessing extended memory, device, and system
CN107066396B (en) Apparatus and method for operating caching of physical tags of virtual index
JP6696987B2 (en) A cache accessed using a virtual address
KR101667772B1 (en) Translation look-aside buffer with prefetching
US7558920B2 (en) Apparatus and method for partitioning a shared cache of a chip multi-processor
US20140258622A1 (en) Prefetching of data and instructions in a data processing apparatus
US8683125B2 (en) Tier identification (TID) for tiered memory characteristics
US11061572B2 (en) Memory object tagged memory monitoring method and system
CN104769560B (en) Prefetching to a cache based on buffer fullness
CN112416817B (en) Prefetching method, information processing apparatus, device, and storage medium
US9251048B2 (en) Memory page management
KR102482516B1 (en) memory address conversion
CN112527395B (en) Data prefetching method and data processing apparatus
KR102478766B1 (en) Descriptor ring management
WO2013101138A1 (en) Identifying and prioritizing critical instructions within processor circuitry
CN111367831A (en) Deep prefetching method and component for translation page table, microprocessor and computer equipment
CN108874691B (en) Data prefetching method and memory controller
CN114238167B (en) Information prefetching method, processor and electronic equipment
CN106339330A (en) Method and system for flushing caches
US9146870B2 (en) Performance of accesses from multiple processors to a same memory location
CN114281720B (en) Processor, address translation method for processor and electronic equipment
CN114218132B (en) Information prefetching method, processor and electronic equipment
EP3283966B1 (en) Virtualization-aware prefetching
US12026401B2 (en) DRAM row management for processing in memory