WO2012070291A1 - Method, system, and program for cache coherency control - Google Patents
- Publication number
- WO2012070291A1 (application PCT/JP2011/070116)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tlb
- processor
- access
- memory
- physical page
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/084—Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F12/1036—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
Definitions
- the present invention relates to cache coherency control, and more particularly to a method, system, and program for controlling cache coherency of a shared memory multiprocessor.
- A multiprocessor system executes multiple tasks or processes (hereinafter referred to as "processes") simultaneously.
- Each process typically has a virtual address space that is used to execute the process.
- Each location in the virtual address space contains an address that is mapped to a physical address in system memory. It is not uncommon for a single region of system memory to be mapped to multiple virtual addresses in a multiprocessor system. When multiple processes each use virtual addresses, these addresses are translated to physical addresses in system memory, and if the cache of the processor executing the process does not hold the required instructions or data, they are fetched from system memory and stored in the cache.
- TLB: Translation Look-aside Buffer
- SMP: symmetric multiprocessing
- Each processor in such a multiprocessor system typically includes a TLB for address translation associated with its cache. Because memory is shared in such a system, a change to one processor's TLB must be reflected carefully and consistently in the TLB of every other processor in order to maintain coherency.
- Each TLB in the multiprocessor system usually reflects the cache-related portion of the contents of the page table that the paged memory system maintains in system memory.
- the page table is generally a memory map table that includes virtual addresses or their segments and physical addresses associated with those addresses.
- Various other management data including page protection bits, valid entry bits, and various access control bits are also typically included in such page tables.
- a bit memory coherence required attribute
- Effective use of this prior bit-setting method is limited to certain special programs that may be rewritten so that the cache is controlled by software.
- The scalability of an SMP system is low compared with that of a cluster based on message passing, because the cost of the hardware that supports cache coherency rises sharply as the number of processors in the SMP system is increased.
- Hardware support for cache coherency in SMP systems ranges from the MESI (Modified, Exclusive, Shared, Invalid) snoop protocol, implemented with inexpensive hardware on the shared bus of desktop PCs, to the directory-based protocols used in large-scale distributed shared memory (DSM) systems with cache-coherent non-uniform memory access (hereinafter "CC-NUMA"), implemented with expensive hardware that integrates special node-to-node connections, protocol processors, and directory memory.
- Non-Patent Document 1 describes a virtual memory (VM)-based shared memory technique that uses the hardware of the processor's memory management unit (hereinafter "MMU") to improve the scalability and cost performance of SMP systems.
- NCC: non-cache-coherent
- The VM-based shared memory technique handles cache coherency within the same process, but cannot handle cache coherency between different processes.
- The data to which the VM-based shared memory technique can be applied is limited to data that the application program guarantees is not shared with other processes, so cache coherency that is transparent to the application program cannot be implemented.
- The VM-based shared memory technique is therefore not suited to general-purpose computers; its application is limited to specific uses and to scientific and technical computing for which new programs can be designed.
- Patent Document 1 describes that providing a physical page map table in a shared-main-memory multiprocessor makes it possible, with a small amount of added hardware, to eliminate or greatly reduce the broadcasting of TLB purge transactions for TLB consistency control when a page table is rewritten, the bus stalls in the network and within nodes, and the processor pipeline stalls that accompany TLB purge processing.
- An associative memory such as a cache memory (CACHE-M) or an address translation buffer (TLB) is accessed by a data transfer instruction such as a MOV instruction, so that operations such as entry invalidation can be performed.
- Patent Document 3 introduces a set of software instructions that allow software to insert translation information, such as address translation pairs, directly into the TLB.
- It describes that the page fault handler can insert translation information not only into the page directory but also into the TLB, so that once the page fault handler routine has finished executing, the next access to the virtual address is guaranteed to be a TLB hit rather than a TLB miss.
- An object of the present invention is to realize cache coherency control that improves the scalability of a shared memory multiprocessor system and improves the cost performance by reducing the cost of hardware and software.
- the object of the present invention includes providing a method, system and program for realizing such cache coherency control.
- the object of the present invention includes realizing cache coherency control by software with an inexpensive hardware configuration.
- an object of the present invention includes realizing cache coherency control by software transparently from an application program, that is, without rewriting the application program.
- The cache coherency control method controls cache coherency of a multiprocessor system in which a plurality of processors, each having a cache and a TLB, share system memory. When a processor determines that a TLB interrupt raised in a TLB search is not a page fault, the method includes executing either a TLB miss exception handling step, which handles a TLB miss interrupt raised when the TLB holds no registration information with a matching address, or a storage exception handling step, which handles a storage interrupt raised when the TLB holds registration information with a matching address but the access violates the access authority.
- The TLB miss exception handling step includes a step of flushing the data cache lines that belong to the physical page covered by the victim TLB entry that is evicted and discarded when TLB replacement is executed.
- The TLB miss exception handling step or the storage exception handling step includes a step of determining whether the memory access that caused the TLB miss interrupt or storage interrupt is a data access or an instruction access and, when it is determined to be a data access, a step of imposing on the write, read, and execute authority for the physical page covered by the TLB entry being replaced or updated for that access a constraint that makes it exclusive with respect to the access authority for that physical page held in the TLB of another processor.
- The step of imposing the exclusive constraint includes a step of imposing a write-invalidate constraint.
- The step of imposing the write-invalidate constraint includes a MESI emulation step that imposes the constraints of the MESI protocol.
- The MESI emulation step includes a step of determining whether the memory access is a data write or a data read. When it is determined to be a read, the step includes turning on the read attribute for the physical page of the access in the TLB of the processor and in a TLB directory memory that holds the registration information of the TLBs of the plurality of processors; searching the TLB directory memory for the physical page of the access and determining whether the TLB of another processor has write authority for that physical page; and, when it does, notifying the other processor of a clean command by an inter-processor interrupt and causing the other processor to clear its write authority for the physical page.
- The step of causing the other processor to clear its write authority for the physical page includes the other processor copying back its data cache and disabling the write attribute for the physical page in its TLB.
- When the access is determined to be a write, the MESI emulation step includes turning on the write attribute for the physical page of the access in the TLB of the processor and in the TLB directory memory; searching the TLB directory memory for the physical page of the access and determining whether the TLB of another processor has read, write, or execute authority for that physical page; when it does, notifying the other processor of a flush command by an inter-processor interrupt and causing the other processor to clear its read, write, and execute authority for the physical page; and clearing the other processor's read, write, and execute attributes for the physical page in the TLB directory memory.
- The step of causing the other processor to clear its read, write, and execute authority for the physical page includes the other processor copying back and invalidating its data cache and disabling the read, write, and execute attributes for the physical page in its TLB.
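The read and write paths of the MESI emulation step described above can be sketched in a few lines. This is only an illustrative model, not the patented implementation: the names `Processor`, `mesi_emulate`, `on_clean`, and `on_flush`, the bit encoding of the attributes, and the directory keyed by (processor ID, physical page number) are all assumptions introduced here, and the inter-processor interrupt is modeled as a direct method call.

```python
from dataclasses import dataclass, field

R, W, X = 1, 2, 4  # hypothetical bit encoding of the read/write/execute attributes


@dataclass
class Processor:
    pid: int
    tlb: dict = field(default_factory=dict)  # physical page number -> attribute bits
    log: list = field(default_factory=list)  # commands received by (simulated) IPI

    def on_clean(self, ppn):
        # Clean command: the data cache copy-back is not modeled; only W is disabled.
        self.log.append(("clean", ppn))
        self.tlb[ppn] = self.tlb.get(ppn, 0) & ~W

    def on_flush(self, ppn):
        # Flush command: copy-back and invalidation are not modeled; R/W/X are disabled.
        self.log.append(("flush", ppn))
        self.tlb[ppn] = 0


def mesi_emulate(local, others, directory, ppn, is_write):
    """directory maps (pid, ppn) -> attribute bits mirrored from every TLB (assumed layout)."""
    want = W if is_write else R
    # Turn on the attribute in the local TLB and in the TLB directory.
    local.tlb[ppn] = local.tlb.get(ppn, 0) | want
    directory[(local.pid, ppn)] = local.tlb[ppn]
    # Search the directory for conflicting authority held by other processors.
    for remote in others:
        held = directory.get((remote.pid, ppn), 0)
        if is_write and held & (R | W | X):
            remote.on_flush(ppn)
            directory[(remote.pid, ppn)] = 0
        elif not is_write and held & W:
            remote.on_clean(ppn)
            directory[(remote.pid, ppn)] = held & ~W
```

In this sketch a write by one processor removes every other processor's authority for the page, while a read only removes another processor's write authority, which is the write-invalidate behavior the text describes.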
- The TLB miss exception handling step or the storage exception handling step includes a step of determining whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access. When it is determined to be an instruction access, the step includes determining whether the page table entry in system memory grants user write permission for the physical page that caused the TLB miss interrupt on the instruction fetch; when the page table entry grants user write permission, determining whether the TLB of another processor has user write permission for that physical page; and, when it does, notifying the other processor of a clean command by an inter-processor interrupt and causing the other processor to clear the user write permission.
- The TLB miss exception handling step or the storage exception handling step further includes a step of invalidating the instruction cache of the accessing processor, either when the TLB of no other processor has the user write permission or after the step of causing the other processor to clear the user write permission.
- The TLB miss exception handling step or the storage exception handling step further includes, when the page table entry does not grant user write permission or after the step of invalidating the instruction cache of the accessing processor, a step of turning on the execute attribute for the physical page that caused the TLB miss interrupt on the instruction fetch, both in the TLB of the accessing processor and in the TLB directory memory that holds the TLB registration information of the plurality of processors.
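The instruction-access path described above can be illustrated with a small model. It is a sketch under stated assumptions: the function name `on_instruction_fetch_miss`, the attribute encoding, the per-processor TLBs modeled as dictionaries, and the directory keyed by (processor ID, physical page number) are hypothetical, and the clean command and instruction cache invalidation are only recorded as actions rather than performed.

```python
R, W, X = 1, 2, 4  # hypothetical bit encoding of the read/write/execute attributes


def on_instruction_fetch_miss(pid, ppn, pte_user_write, tlbs, directory):
    """Model the instruction cache coherency steps for one instruction fetch miss.

    tlbs:      pid -> {ppn: attribute bits}          (assumed layout)
    directory: (pid, ppn) -> attribute bits mirror   (assumed layout)
    Returns the list of actions that would be carried out.
    """
    actions = []
    if pte_user_write:
        # The page may have been written: look for other TLBs that still hold W.
        for other, tlb in tlbs.items():
            if other != pid and directory.get((other, ppn), 0) & W:
                actions.append(("clean_ipi", other))  # clean command by IPI
                tlb[ppn] = tlb.get(ppn, 0) & ~W
                directory[(other, ppn)] &= ~W
        # Stale instructions must not be executed on the accessing processor.
        actions.append(("invalidate_icache", pid))
    # Finally turn on the execute attribute locally and in the directory.
    tlbs[pid][ppn] = tlbs[pid].get(ppn, 0) | X
    directory[(pid, ppn)] = tlbs[pid][ppn]
    return actions
```

When the page table entry does not grant user write permission, the sketch skips straight to turning on the execute attribute, mirroring the shortcut described in the text.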
- the MESI emulation processing step includes a step of performing sequential access using a semaphore when searching the TLB directory memory for a physical page to be accessed.
- a computer program for cache coherency control that causes a processor to execute each step of the above method.
- the cache coherency control system controls cache coherency of a multiprocessor system in which a plurality of processors having a cache and a TLB share a system memory.
- Each processor includes a TLB control unit that has a TLB search unit, which executes the TLB search, and a coherency handler, which processes TLB registration information when the TLB search does not hit and a TLB interrupt is raised.
- The coherency handler includes a TLB replacement handler, which searches the page table in system memory and replaces TLB registration information; a TLB miss exception processing unit, which, when the TLB interrupt is not a page fault, handles a TLB miss interrupt raised when the TLB holds no registration information with a matching address; and a storage exception processing unit, which handles a storage interrupt raised when the TLB holds registration information with a matching address but the access violates the access authority.
- The TLB miss exception processing unit flushes the data cache lines that belong to the physical page covered by the victim TLB entry that is evicted and discarded.
- The TLB miss exception processing unit and the storage exception processing unit each determine whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access. When it is a data access, they impose on the write, read, and execute authority for the physical page covered by the TLB entry being replaced or updated for that access a constraint that makes it exclusive with respect to the access authority for that physical page held in the TLB of another processor.
- When the memory access is an instruction access, they determine whether the page table entry in system memory grants user write permission for the physical page that caused the TLB miss interrupt on the instruction fetch; when it does, they determine whether the TLB of another processor has user write permission for that physical page; and when it does, they notify the other processor of a clean command by an inter-processor interrupt and cause it to clear the user write permission.
- The cache coherency control system further includes a TLB directory memory that holds the TLB registration information of the plurality of processors and that the plurality of processors can search for physical pages.
- The multiprocessor system includes a plurality of nodes. Each node includes a plurality of processors, a system memory connected to the processors by a coherent shared bus, and a TLB directory memory and a semaphore handler connected to the coherent shared bus by a bridge mechanism. The semaphore handler uses a semaphore to serialize the processors' accesses to the TLB directory memory, and the nodes are interconnected by an NCC-NUMA mechanism.
- Cache coherency control is realized that improves the scalability of a shared memory multiprocessor system and improves cost performance by keeping down the cost of hardware and software.
- A method, a system, and a program for realizing such cache coherency control are provided. The cache coherency control can be realized by software on an inexpensive hardware configuration, and without rewriting application programs.
- FIG. 1 is a block diagram schematically illustrating a multiprocessor system that can be used to implement cache coherency control according to the present invention.
- FIG. 2 is a block diagram schematically illustrating a processor having a system for cache coherency control according to an embodiment of the present invention.
- FIG. 3 is a schematic block diagram of a TLB directory memory.
- FIG. 4 is a flowchart schematically showing the method of cache coherency control according to an embodiment of the present invention.
- FIG. 5 is a flowchart showing the eviction process for a victim TLB entry in the TLB miss exception handling and storage exception handling subroutines of a coherency handler.
- The remaining figures are a flowchart showing instruction cache coherency processing in the TLB miss exception handling and storage exception handling subroutines of the coherency handler; a flow diagram of the entry and exit of the coherency handler showing the use of a semaphore; a schematic block diagram of a coherent shared memory multiprocessor system extended to a hybrid of SMP and NCC-NUMA; and a schematic block diagram of the local TLB directory memory for LSM.
- FIG. 1 is a block diagram schematically showing a multiprocessor system 100 that can be used to implement cache coherency control according to the present invention.
- A plurality of processors 101 are each coupled to a system memory 103 by a memory bus 102.
- Each processor 101 includes a CPU 104, an MMU 105, and a cache 106, and the MMU 105 includes a TLB 107.
- Part of the contents of the system memory 103 is held in the cache 106 of each processor 101.
- Because each processor 101 can read from and write to the system memory 103, the data and instructions in the system memory 103 and the caches 106 need to be kept coherent.
- A page table 108 is preferably provided in the system memory 103, and its entries, that is, its registration information, can be used to map virtual addresses efficiently to physical addresses in the system memory 103.
- The system memory 103 includes a memory controller 109, which exchanges stored information with, that is, reads from and writes to, the external storage device 120 connected to it.
- Each processor 101 can convert a virtual address of an instruction or data into a physical address in the system memory 103 by duplicating information included in each entry of the page table 108 using the TLB 107. Since the TLB 107 provides memory space address information, it is important to control the TLB 107 in the multiprocessor system 100 so as to maintain coherency in order to ensure accurate operation of the TLB 107.
- FIG. 2 is a block diagram schematically showing a processor 101 having a cache coherency control system according to an embodiment of the present invention.
- The cache 106 of the processor 101 includes an instruction cache 106' and a data cache 106''.
- The processor 101 is connected to a TLB directory memory 121, which is provided so that all the processors 101 can access it.
- the TLB directory memory 121 is used to allow the local processor 101 to examine the contents of the TLB 107 of the remote processor 101 without interrupting the remote processor 101 using an inter-processor interrupt.
- the CPU 104 has an operation mode (user mode) for executing an AP (Application Program) process 122, an operation mode (supervisor mode) for executing the OS kernel process 124, and an operation mode for executing an interrupt handler.
- the coherency handler 126 is executed in the third operation mode.
- The TLB control unit 123 includes a TLB search unit 125, which performs a TLB search when the AP process 122 or the OS kernel process 124 accesses the cache 106, and a coherency handler 126, which processes the registration information of the TLB 107 when the TLB search does not hit and a TLB interrupt occurs.
- the coherency handler 126 is located outside the OS kernel process 124 that handles page faults, as shown in FIG.
- The TLB search unit 125 includes a cache tag search unit 127 that searches the cache tags when the TLB search hits.
- When the cache tag search hits, the cache tag search unit 127 directs the AP process 122 to access the cache 106.
- When the cache tag search misses, the cache tag search unit 127 directs the AP process 122 to access the system memory 103 instead of the cache 106.
- the coherency handler 126 includes a TLB replacement handler 128, a TLB miss exception processing unit 129, and a storage exception processing unit 130.
- the TLB replacement handler 128 includes a page table search unit 131 and a page fault determination unit 132.
- the page table search unit 131 processes the search of the page table 108 in the system memory 103 when the TLB search unit 125 causes a TLB interrupt.
- the page fault determination unit 132 determines whether or not a page fault has occurred from the search processing of the page table search unit 131.
- When the page fault determination unit 132 determines from the search by the page table search unit 131 that there is no page fault, that is, that the page of the TLB entry exists in the page table 108, the TLB miss exception processing unit 129 or the storage exception processing unit 130 executes coherency control.
- A case in which no entry with a matching address, that is, no matching registration information, exists in the TLB is referred to as a "TLB miss".
- The TLB miss exception processing unit 129 processes a TLB miss interrupt, and the storage exception processing unit 130 processes a storage interrupt. Because the coherency handler 126 performs coherency control when the interrupt is not a page fault, it differs from the VM-based shared memory technique, which performs coherency processing on a page fault.
- the OS kernel process 124 includes a memory management unit 133.
- When the page fault determination unit 132 determines from the search by the page table search unit 131 that a page fault has occurred, the TLB replacement handler 128 generates a page fault interrupt, and the memory management unit 133 of the OS kernel process 124 handles the page fault.
- The TLB miss exception processing unit 129 and the storage exception processing unit 130 of the coherency handler 126 execute coherency control so that only physical pages registered in entries of the TLB 107 of the local processor 101 are held in the cache 106. Therefore, when the coherency handler 126 executes TLB replacement, the physical page covered by the TLB entry, that is, the registration information, that is evicted and discarded as the victim is flushed from the cache, that is, copied back and invalidated. In addition, the write, read, and execute authority for the physical page covered by the newly added TLB entry is made subject to an exclusive constraint with respect to the access authority for that physical page held in the TLB 107 of the remote processor 101.
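The flush of a victim page (copy back, then invalidate) can be sketched as follows. The page size, the line size, the function name `flush_page`, and the representation of the data cache as a dictionary from line address to a (data, dirty) pair are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


def flush_page(cache, memory, ppn):
    """Flush every cache line that belongs to physical page `ppn`.

    cache:  line address -> (data, dirty)   (assumed model of the data cache)
    memory: line address -> data            (assumed model of system memory)
    """
    base = ppn * PAGE_SIZE
    for addr in range(base, base + PAGE_SIZE, LINE_SIZE):
        line = cache.pop(addr, None)  # invalidate the line if it is cached
        if line is not None and line[1]:
            memory[addr] = line[0]    # copy back only dirty data
```

After the call no line of the victim page remains in the cache, so the invariant that only pages registered in the local TLB are cached is preserved.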
- the MESI protocol is a coherency protocol classified as a write-invalidate type, but in addition to this, there is a write-update type, either of which may be adopted.
- The constraints of the MESI protocol are described later. Once such a constraint is imposed, coherency needs no processing unless a TLB miss occurs.
- In the VM-based shared memory technique, the logical pages held in the page table are cached exclusively, so coherency cannot be handled when the same physical page is mapped to several different logical pages.
- When a TLB entry, that is, registration information, is replaced or updated at a TLB miss interrupt or a storage interrupt, read, write, and execute authority is granted in accordance with the constraints of the MESI protocol.
- the TLB is merely a cache of a part of the page table, but the cache coherency control shown in FIG. 2 uses a TLB controlled by software.
- Of the access authority recorded in the page table, only the authority that satisfies the exclusive constraint of the MESI protocol is set in the TLB 107. Accordingly, the access authority recorded in the TLB 107 is the same as, or more restricted than, the access authority recorded in the page table 108 of the system memory 103.
- By referring to the TLB directory memory 121, the local processor 101 searches for the TLB entries, that is, the registration information, of the remote processor 101 that need to be updated to satisfy the exclusive constraint of the MESI protocol.
- the access to the TLB directory memory 121 may be performed sequentially using a semaphore.
- the TLB directory memory 121 may be implemented by an associative memory (CAM: Content Addressable Memory).
- When a CAM is used, the search word includes a physical page number and the write, read, and execute access permission bits, and the concatenation of the processor ID and the TLB entry number is used as the CAM address input. The bus used for CAM access is desirably independent of the memory bus and dedicated to the CPU.
- An example of such a bus is a device control register (DCR: Device Control Register) bus.
- FIG. 3 schematically shows the configuration of the TLB directory memory 121.
- The TLB directory memory 121 holds the entries, that is, the registration information, of the TLB 107 of each processor 101, so that each processor 101 can track the entries of the TLBs 107 of the other processors 101 without using an inter-processor interrupt. In addition, because caching is controlled so that only pages of entries registered in the TLB 107 of each processor 101 may be cached, the usage status of each page in the caches can be determined by searching those TLBs 107.
- The TLB directory memory 121 is mapped to the global address space so that all processors 101 can access it.
- Each entry in the TLB directory memory 121 includes valid status information 300, denoted VS (Valid Status); physical page number information 301, denoted PPN (Physical Page Number); and read/write/execute access authority protection information 302, denoted R/W/E P (Read/Write/Execute Protection). These are duplicated from the information held by the TLBs 107 of all the processors 101.
- In FIG. 3, the left side shows that an address of the TLB directory memory 121 can be formed by combining a processor ID and a TLB entry number, and the right side shows that the groups of entries correspond to processor 0 through processor N.
- The physical page numbers in the TLB 107 of each processor 101 are searched using the TLB directory memory 121, so that coherency between different processes can be handled.
- To make this fast, the TLB directory memory 121 is implemented by a CAM so that the following two search operations can be performed: a search for a page with a matching physical page number and write permission, and a search for a page with a matching physical page number and read, write, or execute permission.
- The search is performed by supplying the physical page number and the access permission for the page as the search word input of the CAM and the concatenation of the processor ID and the TLB entry number as the CAM address input.
- A bus dedicated to the processor, such as a DCR bus, is suitable for this access.
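The directory layout and its two search operations can be modeled in software as follows. This is a sketch, not a description of the CAM hardware: the class names `DirEntry` and `TLBDirectory`, the number of TLB entries per processor, and the attribute bit encoding are assumptions, and the associative search is emulated with a linear scan.

```python
from dataclasses import dataclass

R, W, X = 1, 2, 4   # hypothetical bit encoding of the R/W/E protection field
TLB_ENTRIES = 4     # assumed number of TLB entries per processor


@dataclass
class DirEntry:
    valid: bool = False  # VS  (valid status)
    ppn: int = 0         # PPN (physical page number)
    prot: int = 0        # R/W/E P (protection bits)


class TLBDirectory:
    def __init__(self, n_processors):
        self.entries = [DirEntry() for _ in range(n_processors * TLB_ENTRIES)]

    @staticmethod
    def address(pid, entry_no):
        # Directory address = concatenation of processor ID and TLB entry number.
        return pid * TLB_ENTRIES + entry_no

    def update(self, pid, entry_no, ppn, prot):
        self.entries[self.address(pid, entry_no)] = DirEntry(True, ppn, prot)

    def search(self, ppn, mask):
        """Return (pid, entry_no) of valid entries for `ppn` holding any bit in `mask`."""
        return [divmod(addr, TLB_ENTRIES)
                for addr, e in enumerate(self.entries)
                if e.valid and e.ppn == ppn and e.prot & mask]

    def writers(self, ppn):
        # Search 1: matching physical page number with write permission.
        return self.search(ppn, W)

    def holders(self, ppn):
        # Search 2: matching physical page number with read, write, or execute permission.
        return self.search(ppn, R | W | X)
```

A reader would call `writers(ppn)` to decide whether a clean command is needed, and a writer would call `holders(ppn)` to decide which processors must receive a flush command.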
- FIG. 4 is a flowchart (400) schematically illustrating a method of cache coherency control according to an embodiment of the present invention.
- This method can be implemented by the processor 101 in which the TLB as shown in FIG. 2 is controlled by software.
- The processor 101 executes a TLB search (step 402).
- If the TLB search hits, the processor 101 searches the cache tags for the hit TLB entry (step 403). If the cache tag search hits, the processor 101 directs the access to the cache and accesses the cache (step 404).
- If the cache tag search misses, the processor 101 directs the access to the system memory and accesses the system memory (step 405).
- If the TLB search does not hit and a TLB interrupt occurs, the processor 101 determines whether the TLB interrupt is a page fault (step 406).
- If it is not a page fault, the processor 101 executes the TLB miss exception handling or storage exception handling subroutine of the coherency handler (step 407).
- If it is a page fault ("Yes"), the processor 101 generates a page fault interrupt, and the memory management unit of the OS kernel process executes the page fault handling subroutine (step 408).
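The decision flow of FIG. 4 can be summarized as a small dispatch function. The sketch makes several simplifying assumptions that are not in the source: a page fault is modeled as the absence of a page table entry, cache tags are tracked per physical page rather than per line, and the name `handle_access` and the attribute encoding are hypothetical.

```python
R, W, X = 1, 2, 4  # hypothetical bit encoding of the read/write/execute attributes


def handle_access(vpn, need, tlb, page_table, cache_tags):
    """Return the path taken for one access, following steps 402 to 408.

    tlb, page_table: virtual page number -> (physical page number, attribute bits)
    cache_tags:      set of physical page numbers present in the cache (simplified)
    """
    entry = tlb.get(vpn)                          # step 402: TLB search
    if entry is not None and entry[1] & need:     # TLB hit with sufficient authority
        ppn = entry[0]
        if ppn in cache_tags:                     # step 403: cache tag search
            return "cache"                        # step 404
        return "system_memory"                    # step 405
    # TLB interrupt: decide whether it is a page fault (step 406).
    if page_table.get(vpn) is None:
        return "page_fault"                       # step 408: handled by the OS kernel
    # Not a page fault: the coherency handler runs (step 407).
    return "tlb_miss_exception" if entry is None else "storage_exception"
```

The last line reflects the distinction drawn in the text: a TLB miss exception when no matching entry exists, and a storage exception when an entry matches but the access authority is violated.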
- FIG. 5 is a flowchart (500) showing an eviction process of victim TLB entry, that is, registration information, in a subroutine (see step 407 in FIG. 4) of TLB miss exception processing and storage exception processing of the coherency handler.
- the TLB miss exception handling and storage exception handling subroutines of the coherency handler start at the TLB miss exception handling entry (step 501) and the storage exception handling entry (step 502), respectively.
- The processor 101 executes TLB replacement, which fetches the matching entry, that is, the registration information, from the page table 108 into the TLB 107 (step 503).
- The entry, that is, the registration information, is also updated in the TLB directory memory 121.
- The processor 101 flushes (copies back and invalidates) the local data cache lines that belong to the physical page covered by the victim TLB entry that is evicted and discarded (step 504).
- As a result, whether a physical page may be cached by a remote processor at the time of a TLB miss interrupt or storage interrupt can be determined simply by examining that processor's TLB.
- the processor 101 determines whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access (step 505).
- the processor 101 proceeds to a subroutine 506 of MESI emulation processing when it is data access, and proceeds to a subroutine 507 of instruction cache coherency processing when it is instruction access.
- In both TLB miss interrupts and storage interrupts, when a TLB entry, that is, registration information, is replaced or updated, read, write, and execute authority is set under an exclusive constraint between the local TLB and the remote TLBs, for example in accordance with the constraints of the MESI protocol, which is a write-invalidate protocol.
- Multiple processors can share the authority to read and execute the same physical page. If a TLB interrupt occurs on a data read or instruction fetch and a remote processor has the authority to write to the physical page, a clean (CLEAN) command is sent to the remote processor by an inter-processor interrupt (IPI), causing the remote processor to clear its authority to write to the physical page.
- FIG. 6 shows a flowchart (600) of the MESI emulation process as an example of providing MESI protocol restrictions by software control.
- When the processor 101 determines in FIG. 5 that the access is a data access (step 505), it proceeds to the MESI emulation subroutine 506 and starts processing (step 601).
- The processor 101 determines whether the faulting access that caused the TLB interrupt is a data write or a data read (step 602).
- For a read, the processor 101 turns on the R (Read) attribute of the entry (registration information) corresponding to the physical page of the faulting access in the local TLB 107 and the TLB directory memory 121, as permitted by the page table.
- The processor 101 searches the TLB directory memory 121 for the physical page of the faulting access and determines whether a remote TLB has W (Write) permission for the physical page (step 604). If no remote TLB has W permission (“No”), the process ends (step 605). If one does (“Yes”), the processor 101 sends a clean command to the remote processor by an inter-processor interrupt (IPI), causing the remote processor to clear its write permission for the physical page.
- The remote processor copies back its data cache and disables the W attribute of the entry (registration information) corresponding to the physical page in the remote TLB (step 606).
- The logical-to-physical address translation remains in that entry of the remote TLB.
- The processor 101 clears the W attribute of the entry (registration information) corresponding to the physical page for the remote TLB in the TLB directory memory 121 (step 607), and ends (step 608).
- For a write, the processor 101 turns on the W attribute of the entry (registration information) corresponding to the physical page of the faulting access in the local TLB 107 and the TLB directory memory 121, masked by the UW (User Write) and SW (Supervisor Write) bits of the PTE in the page table 108 (step 609).
- The processor 101 searches the TLB directory memory 121 for the physical page of the faulting access and determines whether a remote TLB has R, W, or X (eXecute) permission for the physical page (step 610).
- If no remote TLB has any of these permissions, the process ends (step 605).
- Otherwise, the processor 101 sends a flush command to the remote processor by an inter-processor interrupt (IPI), causing the remote processor to clear its read, write, and execute permissions for the physical page.
- The remote processor copies back and invalidates its data cache, and disables the R, W, and X attributes of the entry (registration information) corresponding to the physical page in the remote TLB (step 611).
- The logical-to-physical address translation remains in that entry of the remote TLB.
- The processor 101 clears the R, W, and X attributes of the entry (registration information) corresponding to the physical page for the remote TLB in the TLB directory memory 121 (step 612), and ends (step 608).
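The read and write paths of the MESI emulation in FIG. 6 can be sketched as below. This is a simplified model under stated assumptions: the TLB directory is a plain dictionary (`cpu_id -> page -> set of "R"/"W"/"X"`), the local TLB and its directory copy are treated as one structure, and `send_ipi` is a hypothetical callback standing in for the inter-processor interrupt.

```python
def mesi_emulate(directory, local, page, is_write, send_ipi):
    """Software MESI emulation for one faulting data access (sketch of FIG. 6)."""
    mine = directory.setdefault(local, {}).setdefault(page, set())
    if not is_write:
        mine.add("R")                                      # read attribute turned on
        for cpu, tlb in directory.items():
            if cpu != local and "W" in tlb.get(page, ()):  # step 604: remote writer?
                send_ipi(cpu, "CLEAN", page)               # remote copies back, drops W (step 606)
                tlb[page].discard("W")                     # directory updated (step 607)
    else:
        mine.add("W")                                      # step 609
        for cpu, tlb in directory.items():
            if cpu != local and tlb.get(page):             # step 610: any of R, W, X
                send_ipi(cpu, "FLUSH", page)               # remote copies back and invalidates (step 611)
                tlb[page].clear()                          # directory updated (step 612)
```

Note that no IPI is sent at all when the directory shows no conflicting remote permission, which is the snoop-filtering effect the text describes.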
- The TLB read, write, and execute permissions provide snoop filtering, that is, snoop reduction: a determination step is added so that the broadcast of snoop requests, which is a problem when the MESI protocol is implemented in hardware, is limited to the case where the physical page covering the data is also registered in a remote TLB. MESI emulation processing, in which the constraints of the MESI protocol are enforced by software, can therefore scale better than a hardware implementation of the MESI protocol.
- The coherency handler for cache coherency control can control not only coherency between data caches but also coherency between the instruction cache and the data cache. This is realized by invalidating instruction cache lines when a TLB miss interrupt is caused by an instruction fetch from a page that has write permission.
- The instruction cache needs to be coherent with the data cache, but in Linux it is sufficient to invalidate the instruction cache only when fetching instructions from pages that are writable in user space.
- FIG. 7 shows a flowchart (700) of instruction cache coherency processing under software control.
- When the processor 101 determines in FIG. 5 that the access is an instruction access (step 505), it proceeds to the instruction cache coherency subroutine 507 and starts processing (step 701).
- The processor 101 determines whether the PTE in the page table 108 grants user-write permission for the physical page that caused the TLB miss interrupt by instruction fetch (step 702).
- The processor 101 determines whether a remote TLB has user-write permission for the physical page (step 703).
- When a remote TLB has user-write permission (“Yes”), the processor 101 sends a clean command to the remote processor by an inter-processor interrupt (IPI), causing the remote processor to clear the user-write permission. That is, the remote processor issues a dcbst (data
- After step 705, in the same way as when the PTE does not grant user-write permission (“No” in step 702), the processor 101 turns on the X attribute of the entry (registration information) corresponding to the physical page that caused the TLB miss interrupt by instruction fetch, in the local TLB 107 and the TLB directory memory 121, masked by the UX (User eXecute) and SX (Supervisor eXecute) bits of the PTE (step 706), and ends (step 707).
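The instruction cache coherency flow of FIG. 7 can be sketched in the same toy model. The dictionary-based directory and the `send_ipi` and `invalidate_icache` callbacks are assumptions made for illustration; the sketch also assumes that the directory copy of the remote W attribute is cleared together with the remote TLB, and, following claim 11, that step 705 is the local instruction cache invalidation.

```python
def icache_coherency(pte_user_write, directory, local, page, send_ipi, invalidate_icache):
    """Instruction fetch that missed in the TLB (sketch of FIG. 7)."""
    if pte_user_write:                                     # step 702
        for cpu, tlb in directory.items():
            if cpu != local and "W" in tlb.get(page, ()):  # step 703
                send_ipi(cpu, "CLEAN", page)               # remote writes its dirty lines back
                tlb[page].discard("W")                     # assumed directory update
        invalidate_icache(local)                           # local instruction cache invalidated
    # Step 706: execute attribute turned on in the local TLB and the directory.
    directory.setdefault(local, {}).setdefault(page, set()).add("X")
```

Pages that are not user-writable skip both the IPI and the invalidation, so ordinary read-only code pages pay only the cost of setting the X attribute.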
- FIG. 8 shows the flow (800) of semaphore use at the entry and exit of the coherency handler.
- Entry: start (step 801), acquire the semaphore (step 802), end (step 803).
- Exit: start (step 804), notify the semaphore (step 805), end (step 806).
- The TLB directory memory 121 can be accessed exclusively by means of a single semaphore.
- A more preferable implementation, which improves scalability by letting multiple processors access the TLB directory memory 121 simultaneously, divides the physical pages into a plurality of groups and assigns a separate semaphore to each group.
- For example, S semaphores are created, the remainder of the physical page number divided by S is used as the semaphore ID, and the physical pages of each group are protected independently.
- Semaphore ID = mod(physical page number, S), where mod(a, b) denotes the remainder when a is divided by b.
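The grouping rule above, semaphore ID = mod(physical page number, S), can be illustrated with a small sketch. The class name `StripedDirectoryLocks` and its methods are hypothetical, and Python's `threading.Semaphore` merely stands in for whatever semaphore mechanism the system provides.

```python
import threading


class StripedDirectoryLocks:
    """S binary semaphores, one per group of physical pages (illustrative only)."""

    def __init__(self, s):
        self._semaphores = [threading.Semaphore(1) for _ in range(s)]

    def semaphore_id(self, physical_page_number):
        # Semaphore ID = mod(physical page number, S)
        return physical_page_number % len(self._semaphores)

    def guard(self, physical_page_number):
        # Semaphore protecting the directory entries for this page's group.
        return self._semaphores[self.semaphore_id(physical_page_number)]
```

A handler would wrap its directory lookup in `with locks.guard(ppn): ...`, so that handlers working on pages in different groups do not serialize on one another.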
- If this idea is applied to NUMA, which is a distributed shared memory system, a different semaphore can be assigned to each NUMA node. The remote TLB directory memory is then referenced and its semaphore acquired only when a remote access is performed; otherwise only the local TLB directory memory is referenced and its semaphore acquired.
- Both the TLB directory memory and the semaphores are distributed across the NUMA nodes.
- Each distributed TLB directory memory records the physical page numbers of the local system memory and the IDs of the processors that cache them, and each distributed semaphore protects the corresponding distributed TLB directory memory.
- The remote TLB directory memory and remote semaphores are referenced only when a remote access occurs. All other, local accesses can be handled using only the local TLB directory memory and local semaphores.
- FIG. 9 shows a coherent shared memory multiprocessor system 900 as an example of a configuration extended to such a hybrid system of SMP and NCC-NUMA.
- Each node includes a plurality of processors 901, a system memory 903 connected to each processor 901 by a coherent shared bus, that is, a shared bus coherent SMP 902, and a TLB directory memory 905 and a semaphore handler 906 connected to the shared bus coherent SMP 902 by a bridge mechanism 904. The semaphore handler 906 is provided so that the plurality of processors 901 access the TLB directory memory 905 sequentially by means of semaphores.
- The nodes are connected to each other by the NCC-NUMA mechanism 907. Because the NCC-NUMA mechanism 907 is inexpensive, the shared memory multiprocessor system 900 can increase the number of nodes while keeping hardware cost down, that is, it can improve scalability.
- The size of the TLB directory memory is 4 Mbytes. To save TLB directory memory, for example when the scheme is applied to a NUMA system, as shown in FIG. 10, the number of RSM TLB entries 1002 that each processor 1001 assigns to remote system memory (RSM) is constrained, and the remainder are used as LSM TLB entries 1003 assigned to local system memory (LSM). The local TLB directory memory 1000 for the LSM then consists of entries duplicating the TLB directories of the remote processors (RP), whose number of TLB entries is constrained, and entries duplicating the TLB directories of the local processors (LP), which are assigned the remaining TLB entries, so its capacity can be reduced. In particular, if the number of NUMA nodes is N, the number of TLB entries per CPU is E, and the number of entries assigned to remote system memory is R, then the number of TLB entries assigned to local system memory is E - R.
- The number of entries in the TLB directory memory per node is therefore reduced from E * N to (N - 1) * R + 1 * (E - R).
- When it is implemented in a CAM using 45 nm semiconductor technology, the area required for the TLB directory memory is only 1 mm².
- A shared memory multiprocessor system can be built from inexpensive components such as general-purpose parts. Because physical pages are looked up in a small TLB directory memory that manages only the TLB information of each processor, multiple processes can be handled and application programs need not be changed, so scalability can be improved without incurring additional cost.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
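・Sharing of read-only data
Multiple processors can share read and execute permissions for the same physical page. If a TLB interrupt occurs on a data read or an instruction fetch and a remote processor has write permission for the physical page, a clean (CLEAN) command is sent to that remote processor by an inter-processor interrupt (IPI), causing it to clear its write permission for the physical page.
・Exclusive control of write data
No other processor may hold any access permission for a physical page for which one processor has write permission. In other words, a physical page for which the local TLB holds write permission is given no access permission of any kind in a remote TLB. Therefore, when a write access causes a TLB miss interrupt or a storage interrupt, the remote TLBs are examined to see whether a remote processor has access permission for the physical page; if it does, an inter-processor interrupt (IPI) is used to make the remote processor flush (FLUSH) the data of that physical page from the remote cache.
Semaphore ID = mod(physical page number, S) (mod(a, b) denotes the remainder when a is divided by b)
If this idea is applied to NUMA, which is a distributed shared memory system, a different semaphore can be assigned to each NUMA node. The remote TLB directory memory is then referenced and its semaphore acquired only when a remote access is performed; otherwise only the local TLB directory memory needs to be referenced and its semaphore acquired.
(1024 processors) * (1024 entries) * (4 bytes) = 4 Mbytes
The size of the TLB directory memory is 4 Mbytes.
To save TLB directory memory, for example when the scheme is applied to a NUMA system, as shown in FIG. 10, the number of RSM TLB entries 1002 that each processor 1001 assigns to remote system memory (RSM) is constrained, and the remainder are used as LSM TLB entries 1003 assigned to local system memory (LSM). The local TLB directory memory 1000 for the LSM then consists of entries duplicating the TLB directories of the remote processors (RP), whose number of TLB entries is constrained, and entries duplicating the TLB directories of the local processors (LP), which are assigned the remaining TLB entries, so its capacity can be reduced. In particular, if the number of NUMA nodes is N, the number of TLB entries per CPU is E, and the number of entries assigned to remote system memory is R, then the number of TLB entries assigned to local system memory is E - R, so the number of entries in the TLB directory memory per node is reduced from E * N to (N - 1) * R + 1 * (E - R). In the example above, with 1024 processors distributed over 256 NUMA nodes, each node being a 4-way SMP, and the number of TLB entries assigned to remote TLBs constrained to 16, the size of the TLB directory memory becomes 81.4 Kbytes by the following calculation.
(1020 processors) * (16 entries) * (4 bytes) + (4 processors) * (1008 entries) * (4 bytes) = 81.4 Kbytes
If it is implemented in a CAM, then in 45 nm semiconductor technology the area required for the TLB directory memory is only 1 mm².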
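The two size calculations above can be checked with a short sketch. The function names and the `bytes_per_entry` default are assumptions for illustration; the figures (1024 processors, 1024 entries, 4-way nodes, 16 remote entries, 4 bytes per entry) are the ones given in the text.

```python
def directory_bytes(processors, entries_per_tlb, bytes_per_entry=4):
    """Directory that duplicates every TLB entry of every processor."""
    return processors * entries_per_tlb * bytes_per_entry


def numa_directory_bytes(processors, cpus_per_node, entries_per_tlb,
                         remote_entries, bytes_per_entry=4):
    """Per-node directory when each CPU may map at most `remote_entries` remote pages."""
    remote_cpus = processors - cpus_per_node            # CPUs on other nodes
    local_entries = entries_per_tlb - remote_entries    # E - R
    entries = remote_cpus * remote_entries + cpus_per_node * local_entries
    return entries * bytes_per_entry


full = directory_bytes(1024, 1024)                  # 4,194,304 bytes (4 Mbytes)
reduced = numa_directory_bytes(1024, 4, 1024, 16)   # 81,408 bytes (about 81.4 Kbytes)
```

With these figures the constrained directory is roughly one fiftieth of the unconstrained one.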
Claims (20)
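- A method for controlling cache coherency of a multiprocessor system in which a plurality of processors each having a cache and a TLB share a system memory, the method comprising a processor executing,
when it determines that a TLB interrupt raised by a TLB search is not a page fault, a TLB miss exception handling step of handling a TLB miss interrupt, which occurs when no registration information with a matching address exists in the TLB, or a storage exception handling step of handling a storage interrupt, which occurs when registration information with a matching address exists in the TLB but the access permission is violated.
- The method according to claim 1, wherein the TLB miss exception handling step includes a step of flushing, when TLB replacement is executed, the data cache lines of the cache that belong to the physical page covered by the victim TLB entry that is evicted and discarded.
- The method according to claim 2, wherein the TLB miss exception handling step or the storage exception handling step includes:
a step of determining whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access; and
a processing step of, when the memory access is determined to be a data access, imposing on the write, read, and execute permissions for the physical page covered by the TLB entry that is replaced or updated for the access a constraint that makes them exclusive with respect to the access permissions for that physical page held in the TLBs of other processors.
- The method according to claim 3, wherein the processing step of imposing the exclusive constraint includes a processing step of imposing the constraint of a write-invalidate scheme.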
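- The method according to claim 4, wherein the processing step of imposing the constraint of the write-invalidate scheme includes a MESI emulation processing step of imposing the constraints of the MESI protocol.
- The method according to claim 5, wherein the MESI emulation processing step includes:
a step of determining whether the memory access is a data write or a data read;
a step of, when the access is determined to be a read, turning on the read attribute for the physical page of the access in the TLB of the processor and in a TLB directory memory that holds the registration information of the TLBs of the plurality of processors;
a step of searching the TLB directory memory for the physical page of the access and determining whether the TLB of another processor has write permission for the physical page of the access;
a step of, when the TLB of the other processor has the write permission, notifying the other processor of a clean command by an inter-processor interrupt, causing the other processor to clear its write permission for the physical page of the access; and
a step of clearing the write attribute for the physical page of the access relating to the TLB of the other processor in the TLB directory memory.
- The method according to claim 6, wherein the step of causing the other processor to clear its write permission for the physical page of the access includes a step in which the other processor copies back its data cache and disables the write attribute for the physical page of the access in its own TLB.
- The method according to claim 6 or 7, wherein the MESI emulation processing step includes:
a step of, when the access is determined to be a write, turning on the write attribute for the physical page of the access in the TLB of the processor and in the TLB directory memory;
a step of searching the TLB directory memory for the physical page of the access and determining whether the TLB of the other processor has read, write, or execute permission for the physical page of the access;
a step of, when the TLB of the other processor has the read, write, or execute permission, notifying the other processor of a flush command by an inter-processor interrupt, causing the other processor to clear its read, write, and execute permissions for the physical page of the access; and
a step of clearing the read, write, and execute attributes for the physical page of the access relating to the TLB of the other processor in the TLB directory memory.
- The method according to claim 8, wherein the step of causing the other processor to clear its read, write, and execute permissions for the physical page of the access includes a step in which the other processor copies back and invalidates its data cache and disables the read, write, and execute attributes for the physical page of the access in its own TLB.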
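- The method according to claim 2, wherein the TLB miss exception handling step or the storage exception handling step includes:
a step of determining whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access; and,
when the memory access is determined to be an instruction access,
a step of determining whether the entry of the page table in the system memory has user-write permission for the physical page that caused the TLB miss interrupt by instruction fetch;
a step of, when the page table entry has user-write permission, determining whether the TLB of another processor has user-write permission for the physical page; and
a step of, when the TLB of the other processor has user-write permission, notifying the other processor of a clean command by an inter-processor interrupt, causing the other processor to clear the user-write permission.
- The method according to claim 10, wherein the TLB miss exception handling step or the storage exception handling step includes a step of invalidating the instruction cache of the processor that made the access, when the TLB of the other processor does not have user-write permission or after the step of causing the other processor to clear the user-write permission.
- The method according to claim 10 or 11, wherein the TLB miss exception handling step or the storage exception handling step includes a step of turning on the execute attribute for the physical page that caused the TLB miss interrupt by the instruction fetch, in the TLB of the processor that made the access and in a TLB directory memory that holds the registration information of the TLBs of the plurality of processors, when the page table entry does not have user-write permission or after the step of invalidating the instruction cache of the processor that made the access.
- The method according to claim 6 or 8, wherein the MESI emulation processing step includes a step of performing sequential access using a semaphore when searching the TLB directory memory for the physical page of the access.
- A computer program that causes a processor to execute each step of the method according to any one of claims 1 to 13.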
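- A system for controlling cache coherency of a multiprocessor system in which a plurality of processors each having a cache and a TLB share a system memory, wherein
each processor includes a TLB control unit having a TLB search unit that executes a TLB search and a coherency handler that executes TLB registration information processing when the TLB search does not hit and a TLB interrupt occurs, and
the coherency handler includes a TLB replacement handler that searches the page table in the system memory and replaces TLB registration information, a TLB miss exception handling unit that, when the TLB interrupt is not a page fault, handles a TLB miss interrupt, which occurs when no registration information with a matching address exists in the TLB, and a storage exception handling unit that handles a storage interrupt, which occurs when registration information with a matching address exists in the TLB but the access permission is violated.
- The system according to claim 15, wherein the TLB miss exception handling unit flushes, when TLB replacement is executed, the data cache lines of the cache that belong to the physical page covered by the victim TLB entry that is evicted and discarded.
- The system according to claim 16, wherein the TLB miss exception handling unit and the storage exception handling unit each
determine whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access, and,
when the memory access is determined to be a data access, execute processing that imposes on the write, read, and execute permissions for the physical page covered by the TLB entry that is replaced or updated for the access a constraint that makes them exclusive with respect to the access permissions for that physical page held in the TLBs of other processors.
- The system according to claim 16, wherein the TLB miss exception handling unit and the storage exception handling unit each
determine whether the memory access that caused the TLB miss interrupt or the storage interrupt is a data access or an instruction access, and,
when the memory access is determined to be an instruction access,
determine whether the entry of the page table in the system memory has user-write permission for the physical page that caused the TLB miss interrupt by instruction fetch,
when the page table entry has user-write permission, determine whether the TLB of another processor has user-write permission for the physical page, and,
when the TLB of the other processor has user-write permission, notify the other processor of a clean command by an inter-processor interrupt, causing the other processor to clear the user-write permission.
- The system according to any one of claims 15 to 18, further comprising a TLB directory memory that holds the registration information of the TLBs of the plurality of processors and is searched for physical pages by the plurality of processors.
- The system according to claim 19, wherein the multiprocessor system consists of a plurality of nodes, each node including the plurality of processors, the system memory connected to the plurality of processors by a coherent shared bus, and, connected to the coherent shared bus by a bridge mechanism, the TLB directory memory and a semaphore handler for sequential access by the plurality of processors to the TLB directory memory by means of semaphores, the nodes being interconnected by an NCC-NUMA mechanism.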
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1310002.9A GB2499168B (en) | 2010-11-26 | 2011-09-05 | Cache coherency control method, system, and program |
CN201180056650.2A CN103229152B (zh) | 2010-11-26 | 2011-09-05 | 高速缓存一致性控制方法、系统和程序 |
DE112011103433.4T DE112011103433B4 (de) | 2010-11-26 | 2011-09-05 | Verfahren, System und Programm zum Steuern von Cache-Kohärenz |
JP2012545639A JP5414912B2 (ja) | 2010-11-26 | 2011-09-05 | キャッシュコヒーレンシ制御の方法、システムおよびプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010263714 | 2010-11-26 | ||
JP2010-263714 | 2010-11-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012070291A1 (ja) | 2012-05-31 |
Family
ID=46145651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/070116 WO2012070291A1 (ja) | 2010-11-26 | 2011-09-05 | キャッシュコヒーレンシ制御の方法、システムおよびプログラム |
Country Status (6)
Country | Link |
---|---|
JP (1) | JP5414912B2 (ja) |
CN (1) | CN103229152B (ja) |
DE (1) | DE112011103433B4 (ja) |
GB (1) | GB2499168B (ja) |
TW (1) | TW201234180A (ja) |
WO (1) | WO2012070291A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015170313A (ja) * | 2014-03-10 | 2015-09-28 | 富士通株式会社 | 演算処理装置および演算処理装置の制御方法 |
JP2017004554A (ja) * | 2016-08-24 | 2017-01-05 | ルネサスエレクトロニクス株式会社 | 情報処理装置 |
JP2017520869A (ja) * | 2014-05-30 | 2017-07-27 | インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation | マルチプロセッシング環境におけるページ・テーブルの状態インジケータの更新の同期させる方法 |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9514058B2 (en) * | 2014-12-22 | 2016-12-06 | Texas Instruments Incorporated | Local page translation and permissions storage for the page window in program memory controller |
CN104750561B (zh) * | 2015-04-15 | 2019-09-10 | 苏州中晟宏芯信息科技有限公司 | 寄存器堆缓存资源的动态释放方法、系统及一种处理器 |
GB2539383B (en) * | 2015-06-01 | 2017-08-16 | Advanced Risc Mach Ltd | Cache coherency |
CN106326146B (zh) * | 2015-06-29 | 2019-05-14 | 上海华虹集成电路有限责任公司 | 检查高速缓存是否命中的方法 |
US10262721B2 (en) * | 2016-03-10 | 2019-04-16 | Micron Technology, Inc. | Apparatuses and methods for cache invalidate |
US10503641B2 (en) * | 2016-05-31 | 2019-12-10 | Advanced Micro Devices, Inc. | Cache coherence for processing in memory |
CN108536473B (zh) * | 2017-03-03 | 2021-02-23 | 华为技术有限公司 | 读取数据的方法和装置 |
TWI622881B (zh) * | 2017-04-25 | 2018-05-01 | Chunghwa Telecom Co Ltd | Cache replacement system and method thereof for memory computing cluster |
KR20190009580A (ko) * | 2017-07-19 | 2019-01-29 | 에스케이하이닉스 주식회사 | 컨트롤러 및 컨트롤러의 동작방법 |
GB2570474B (en) * | 2018-01-26 | 2020-04-15 | Advanced Risc Mach Ltd | Region fusing |
CN112612726B (zh) * | 2020-12-08 | 2022-09-27 | 海光信息技术股份有限公司 | 基于缓存一致性的数据存储方法、装置、处理芯片及服务器 |
CN113779649B (zh) * | 2021-09-08 | 2023-07-14 | 中国科学院上海高等研究院 | 一种针对投机执行攻击的防御方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04306750A (ja) * | 1991-04-03 | 1992-10-29 | Agency Of Ind Science & Technol | マルチプロセッサシステム |
JPH06187241A (ja) * | 1992-10-09 | 1994-07-08 | Internatl Business Mach Corp <Ibm> | 変換索引バッファのコヒーレンス維持方法及びシステム |
JPH06236353A (ja) * | 1993-01-08 | 1994-08-23 | Internatl Business Mach Corp <Ibm> | マルチプロセッサ・コンピュータ・システムのシステム・メモリの並行性を増大する方法およびシステム |
JP2001142780A (ja) * | 1999-10-01 | 2001-05-25 | Hitachi Ltd | 改善されたメモリ管理ユニット及びキャッシュメモリを有するマイクロプロセッサを用いたデータ処理方法 |
JP2004326798A (ja) * | 2003-04-28 | 2004-11-18 | Internatl Business Mach Corp <Ibm> | マルチプロセッサ・データ処理システム |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH083805B2 (ja) | 1985-06-28 | 1996-01-17 | ヒューレット・パッカード・カンパニー | Tlb制御方法 |
JP3713312B2 (ja) | 1994-09-09 | 2005-11-09 | 株式会社ルネサステクノロジ | データ処理装置 |
JP2000067009A (ja) | 1998-08-20 | 2000-03-03 | Hitachi Ltd | 主記憶共有型マルチプロセッサ |
US6829683B1 (en) * | 2000-07-20 | 2004-12-07 | Silicon Graphics, Inc. | System and method for transferring ownership of data in a distributed shared memory system |
US6633967B1 (en) * | 2000-08-31 | 2003-10-14 | Hewlett-Packard Development Company, L.P. | Coherent translation look-aside buffer |
US6854046B1 (en) * | 2001-08-03 | 2005-02-08 | Tensilica, Inc. | Configurable memory management unit |
US7082500B2 (en) * | 2003-02-18 | 2006-07-25 | Cray, Inc. | Optimized high bandwidth cache coherence mechanism |
US20080191920A1 (en) * | 2007-02-12 | 2008-08-14 | Sangbeom Park | Low-voltage drop reference generation circuit for A/D converter |
US7809922B2 (en) * | 2007-10-21 | 2010-10-05 | International Business Machines Corporation | Translation lookaside buffer snooping within memory coherent system |
US20100325374A1 (en) * | 2009-06-17 | 2010-12-23 | Sun Microsystems, Inc. | Dynamically configuring memory interleaving for locality and performance isolation |
-
2011
- 2011-09-05 GB GB1310002.9A patent/GB2499168B/en active Active
- 2011-09-05 WO PCT/JP2011/070116 patent/WO2012070291A1/ja active Application Filing
- 2011-09-05 DE DE112011103433.4T patent/DE112011103433B4/de active Active
- 2011-09-05 CN CN201180056650.2A patent/CN103229152B/zh not_active Expired - Fee Related
- 2011-09-05 JP JP2012545639A patent/JP5414912B2/ja not_active Expired - Fee Related
- 2011-11-07 TW TW100140540A patent/TW201234180A/zh unknown
Non-Patent Citations (2)
Title |
---|
AKIRA FUKUDA, HEIRETSU OPERATING SYSTEM, 20 May 1997 (1997-05-20), pages 105 - 165 * |
BRYAN S. ROSENBURG: "Low-Synchronization Translation Lookaside Buffer Consistency in Large-Scale Shared-Memory Multiprocessors", SOSP'89 PROCEEDINGS OF THE TWELFTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, 1989, pages 137 - 146 * |
Also Published As
Publication number | Publication date |
---|---|
CN103229152B (zh) | 2016-10-19 |
JP5414912B2 (ja) | 2014-02-12 |
GB201310002D0 (en) | 2013-07-17 |
DE112011103433B4 (de) | 2019-10-31 |
CN103229152A (zh) | 2013-07-31 |
GB2499168A (en) | 2013-08-07 |
DE112011103433T5 (de) | 2013-07-25 |
JPWO2012070291A1 (ja) | 2014-05-19 |
TW201234180A (en) | 2012-08-16 |
GB2499168B (en) | 2014-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5414912B2 (ja) | キャッシュコヒーレンシ制御の方法、システムおよびプログラム | |
US20120137079A1 (en) | Cache coherency control method, system, and program | |
US9081711B2 (en) | Virtual address cache memory, processor and multiprocessor | |
US10552339B2 (en) | Dynamically adapting mechanism for translation lookaside buffer shootdowns | |
RU2443011C2 (ru) | Фильтрация отслеживания с использованием кэша запросов отслеживания | |
US7941631B2 (en) | Providing metadata in a translation lookaside buffer (TLB) | |
US8949572B2 (en) | Effective address cache memory, processor and effective address caching method | |
US9513904B2 (en) | Computer processor employing cache memory with per-byte valid bits | |
US7539823B2 (en) | Multiprocessing apparatus having reduced cache miss occurrences | |
EP3265917B1 (en) | Cache maintenance instruction | |
US8706965B2 (en) | Apparatus and method for handling access operations issued to local cache structures within a data processing apparatus | |
US10929295B2 (en) | Accelerating replication of page tables for multi-socket machines | |
US11392508B2 (en) | Lightweight address translation for page migration and duplication | |
WO2020243053A1 (en) | Multi-level cache security | |
GB2427715A (en) | Managing snoop operations in a multiprocessor system | |
KR910004263B1 (ko) | 컴퓨터 시스템 | |
Lai et al. | A memory management unit and cache controller for the MARS systeiil | |
Lai et al. | A memory management unit and cache controller for the MARS system | |
Goel et al. | E-cache memory becoming a boon towards memory management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11843926 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2012545639 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 112011103433 Country of ref document: DE Ref document number: 1120111034334 Country of ref document: DE |
|
ENP | Entry into the national phase |
Ref document number: 1310002 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20110905 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1310002.9 Country of ref document: GB |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11843926 Country of ref document: EP Kind code of ref document: A1 |