WO2022048187A1 - Method and apparatus for sending a clear message - Google Patents

Method and apparatus for sending a clear message

Info

Publication number
WO2022048187A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
physical address
memory
clear
message
Prior art date
Application number
PCT/CN2021/093977
Other languages
English (en)
French (fr)
Inventor
潘伟
吴峰光
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21863257.8A (published as EP4156565A4)
Publication of WO2022048187A1
Priority to US18/177,140 (published as US20230205691A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • G06F12/126Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0804Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9047Buffering arrangements including multiple buffers, e.g. buffer pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0813Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284Multiple user address space allocation, e.g. using different base addresses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/20Employing a main memory using a specific memory technology
    • G06F2212/206Memory mapped I/O
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/25Using a specific main memory architecture
    • G06F2212/254Distributed memory
    • G06F2212/2542Non-uniform memory access [NUMA] architecture

Definitions

  • the embodiments of the present application relate to the field of computers, and in particular, to a method and apparatus for sending a clear message.
  • when the processor core flushes data in the L3 cache, it can clear only one cache line at a time. If the processor core needs to clear multiple cache lines, the clear messages cannot be sent concurrently, so the clearing takes too long and consumes more processor resources. For example, assuming a main-memory page size of 4 kilobytes (KB) and a cache line size of 64 bytes (B), a page consists of 64 cache lines of 64 B each. To clear the data of one page of main memory, the processor core needs to execute 64 consecutive clear instructions and send 64 consecutive clear messages to the L3 cache. In addition, the processor core also provides an instruction to clear all cache lines in the L3 cache at once.
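The arithmetic in the example above can be checked with a short sketch. The constants are the assumptions stated in the example, not values fixed by the application:

```python
# Page and cache line sizes assumed in the example above.
PAGE_SIZE = 4 * 1024  # 4 KB main-memory page
LINE_SIZE = 64        # 64 B cache line

# With one clear message per cache line, flushing one page takes
# PAGE_SIZE / LINE_SIZE consecutive clear messages.
clear_messages_per_page = PAGE_SIZE // LINE_SIZE
print(clear_messages_per_page)  # 64
```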
  • the present application provides a method and device for sending a clear message, which address how to improve the efficiency of clearing cache lines and reduce the processor resources occupied when the processor core clears cache lines in the L3 cache.
  • the present application provides a method for sending a clear message, including: a requesting device sends a clear message to a group of cache memories in a broadcast manner, and receives clear completion messages from all cache memories in the group of cache memories.
  • the physical address information carried in the clear message is used to indicate clearing of at least two cache lines.
  • a set of cache memories includes at least one cache memory.
  • the requesting devices include but are not limited to the following devices: processor cores, peripheral devices, and other devices connected to the bus.
  • in the method for sending a clear message, the requesting device sends one clear message to clear at least two cache lines.
  • the clear message is sent by broadcasting, which reduces the number of clear messages sent, so that clearing tasks originally processed serially are processed in parallel; this improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
  • the clear message sent by the requesting device can indicate clearing of any number of cache lines. This avoids clearing data that is being used by other cores or devices, prevents the performance of other processes from being affected, and improves the accuracy of clearing cache lines.
  • the clear message is used to indicate clearing of at least two consecutive cache lines.
  • the clear message includes information indicating the physical addresses of a contiguous physical address space, and the at least two cache lines range from the cache line containing the starting physical address of the contiguous physical address space to the cache line containing its ending physical address.
  • the contiguous physical address space is the physical address space mapped to the main memory
  • the physical address of the cleared contiguous physical address space is aligned with the physical address of the cache line
  • the size of the cleared contiguous physical address space is a multiple of the cache line size.
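As a sketch of the bullets above (the function and addresses are illustrative, not from the application), a line-aligned contiguous physical address range can be expanded into the base addresses of the cache lines it covers:

```python
LINE_SIZE = 64  # assumed cache line size

def lines_in_range(start: int, end: int, line_size: int = LINE_SIZE):
    """Return the base addresses of the cache lines covering [start, end)."""
    # The embodiment requires the range to be aligned to the cache line size.
    assert start % line_size == 0 and end % line_size == 0
    return list(range(start, end, line_size))

# A 256 B contiguous range starting at 0x1000 covers four 64 B cache lines.
print([hex(a) for a in lines_in_range(0x1000, 0x1100)])
# ['0x1000', '0x1040', '0x1080', '0x10c0']
```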
  • the requesting device clears cache lines in the cache memory at page size granularity.
  • the flush message may instruct to flush all contiguous cache lines within the page, whereby one or more cache memories flush the cache lines contained in a page according to the instructions of the flush message.
  • the physical address information of a continuous physical address space includes the starting physical address of the page, and the clear message also includes the page type, which is used to describe the size of the page.
  • the physical address information of a segment of a continuous physical address space includes the starting physical address of the page and the ending physical address of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of the cache line in at least one page, and the clear message also includes the page type of each page, and the page type is used to describe the size of the page.
  • the requesting device clears cache lines in the cache memory at the granularity of an arbitrary number of cache lines.
  • the flush message may instruct to flush consecutive cache lines within a page or consecutive cache lines within multiple pages. Therefore, one or more cache memories clear consecutive cache lines in one page or consecutive cache lines in multiple pages according to the instruction of the flush message.
  • the physical address information of a segment of continuous physical address space includes the physical address of a cache line and the number of cache lines.
  • the physical address information of a segment of continuous physical address space includes a physical address and an immediate value
  • the immediate value indicates the number of low-order bits in a physical address
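The "physical address plus immediate" encoding in the last two bullets can be sketched as follows, under the assumption that the immediate n marks the low n bits of the address as covered, so one message denotes a 2^n-byte naturally aligned region (the function name and addresses are hypothetical):

```python
def decode_range(phys_addr: int, imm: int):
    """Decode a (physical address, immediate) pair into a cleared range.

    Assumption: the immediate gives the number of low-order address bits
    covered by the message, i.e. the message clears the 2**imm-byte
    naturally aligned region containing phys_addr.
    """
    size = 1 << imm
    base = phys_addr & ~(size - 1)  # zero the low `imm` bits
    return base, base + size

# imm = 12 -> a 4 KB aligned region (one 4 KB page) around the address.
base, end = decode_range(0x12345, 12)
print(hex(base), hex(end))  # 0x12000 0x13000
```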
  • the clear message is used to indicate clearing of at least two non-consecutive cache lines.
  • the present application provides a method for clearing a cache, comprising: the cache memory receives a clear message from a requesting device, clears those of the at least two cache lines indicated by the clear message that fall within the cache's jurisdiction, and sends a clear completion message to the requesting device.
  • the flush message is used to indicate that at least two cache lines are flushed.
  • the requesting devices include but are not limited to the following devices: processor cores, peripheral devices, and other devices connected to the bus.
  • the cache memory clears one cache line at a time
  • the method for clearing the cache provided by this embodiment clears any number of cache lines according to the clear instruction. Therefore, clearing tasks originally processed serially are processed in parallel, clearing efficiency is improved, and the internal processor resources occupied by clearing cache lines are effectively reduced.
  • the cache memory clears the cache lines in all L3 caches
  • the cache clearing method provided by this embodiment clears any number of cache lines according to the clear instruction. This avoids clearing data that is being used by other cores or devices, prevents the performance of other processes from being affected, and improves the accuracy of clearing cache lines.
  • the clear message is used to indicate clearing of at least two consecutive cache lines.
  • the clear message includes information indicating the physical addresses of a contiguous physical address space, and the at least two cache lines range from the cache line containing the starting physical address of the contiguous physical address space to the cache line containing its ending physical address.
  • the contiguous physical address space is the physical address space mapped to the main memory
  • the physical address of the cleared contiguous physical address space is aligned with the physical address of the cache line
  • the size of the cleared contiguous physical address space is a multiple of the cache line size.
  • the physical address information of a continuous physical address space includes the starting physical address of the page, and the clear message also includes the page type, which is used to describe the size of the page.
  • the physical address information of a segment of continuous physical address space includes the starting physical address of the page and the ending physical address of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of the cache line in at least one page, and the clear message also includes the page type of each page, and the page type is used to describe the size of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of a cache line and the number of cache lines.
  • the physical address information of a segment of continuous physical address space includes a physical address and an immediate value
  • the immediate value indicates the number of low-order bits in a physical address
  • the clear message is used to indicate clearing of at least two non-consecutive cache lines.
  • all cache lines cleared by the clear message are stored in a set of cache memories
  • a set of cache memories includes at least one cache memory
  • the cache memory is any one cache memory in a set of cache memories.
  • the cache memories included in a group of cache memories belong to one or more NUMA nodes in a non-uniform memory access (NUMA) system.
  • the present application provides an apparatus for sending a clear message; for beneficial effects, refer to the description of the first aspect, which is not repeated here.
  • the device for sending a clear message has the function of implementing the behavior in the method example of the first aspect above.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the device for sending a clear message includes a sending unit and a receiving unit.
  • the sending unit is used to send a clear message to a group of cache memories in a broadcast manner; the physical address information carried in the clear message is used to indicate clearing of at least two cache lines, and the group of cache memories includes at least one cache memory.
  • the receiving unit is configured to receive clearing completion messages from all cache memories in a group of cache memories.
  • the clear message is used to indicate clearing of at least two consecutive cache lines.
  • the clear message includes information indicating the physical addresses of a contiguous physical address space, and the at least two cache lines range from the cache line containing the starting physical address of the contiguous physical address space to the cache line containing its ending physical address.
  • the contiguous physical address space is the physical address space mapped to the main memory
  • the physical address of the cleared contiguous physical address space is aligned with the physical address of the cache line
  • the size of the cleared contiguous physical address space is a multiple of the cache line size.
  • the physical address information of a continuous physical address space includes the starting physical address of the page, and the clear message also includes the page type, which is used to describe the size of the page.
  • the physical address information of a segment of a continuous physical address space includes the starting physical address of the page and the ending physical address of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of the cache line in at least one page, and the clear message also includes the page type of each page, and the page type is used to describe the size of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of a cache line and the number of cache lines.
  • the physical address information of a segment of continuous physical address space includes a physical address and an immediate value
  • the immediate value indicates the number of low-order bits in a physical address
  • the clear message is used to indicate clearing of at least two non-consecutive cache lines.
  • the present application provides an apparatus for clearing a cache; for beneficial effects, refer to the description of the second aspect, which is not repeated here.
  • the apparatus for clearing a cache has the function of implementing the behavior in the method example of the second aspect above.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the apparatus for clearing a cache includes a sending unit, a processing unit and a receiving unit.
  • the receiving unit is configured to receive a clearing message from the requesting device, where the clearing message is used to instruct to clear at least two cache lines.
  • the processing unit is configured to clear the cache lines within the jurisdiction of the cache memory among the at least two cache lines indicated by the clear message.
  • the sending unit is configured to send a clearing completion message to the requesting device.
  • the clear message is used to indicate clearing of at least two consecutive cache lines.
  • the clear message includes information indicating the physical addresses of a contiguous physical address space, and the at least two cache lines range from the cache line containing the starting physical address of the contiguous physical address space to the cache line containing its ending physical address.
  • the contiguous physical address space is the physical address space mapped to the main memory
  • the physical address of the cleared contiguous physical address space is aligned with the physical address of the cache line
  • the size of the cleared contiguous physical address space is a multiple of the cache line size.
  • the physical address information of a continuous physical address space includes the starting physical address of the page, and the clear message also includes the page type, which is used to describe the size of the page.
  • the physical address information of a segment of a continuous physical address space includes the starting physical address of the page and the ending physical address of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of the cache line in at least one page, and the clear message also includes the page type of each page, and the page type is used to describe the size of the page.
  • the physical address information of a segment of continuous physical address space includes the physical address of a cache line and the number of cache lines.
  • the physical address information of a segment of continuous physical address space includes a physical address and an immediate value
  • the immediate value indicates the number of low-order bits in a physical address
  • the clear message is used to indicate clearing of at least two non-consecutive cache lines.
  • the present application provides a processor, the processor including at least one processor core and at least one cache memory; when the processor core executes a set of computer instructions, the method described in the first aspect or any possible implementation of the first aspect is implemented.
  • the processor further includes a ring bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the ring bus .
  • the processor further includes a mesh bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the mesh bus.
  • the present application provides a computing device that may include a processor and a peripheral, the processor including at least one processor core and a cache memory. When the processor core or the peripheral executes a set of computer instructions, the method described in the first aspect or any possible implementation of the first aspect is implemented; when the cache memory executes a set of computer instructions, the method described in the second aspect or any possible implementation of the second aspect is implemented.
  • the processor further includes a ring bus, a peripheral device management module and a memory manager, and the processor core, the cache memory, the peripheral device management module and the memory manager are connected through the ring bus.
  • the processor further includes a mesh bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the mesh bus.
  • the present application provides a computer-readable storage medium comprising computer software instructions; when the computer software instructions are executed in a computing device, the computing device is enabled to perform the method described in any one of the first aspect or the possible implementations of the first aspect, or the second aspect or the possible implementations of the second aspect.
  • the present application provides a computer program product; when the computer program product runs on a computing device, the computing device is enabled to perform the method described in any one of the first aspect or the possible implementations of the first aspect, or the second aspect or the possible implementations of the second aspect.
  • FIG. 1 is a schematic diagram of the composition of a computing device according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for sending a clear message and clearing a cache provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of the composition of a continuous physical address space provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a cleared cache line according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a cleared cache line according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a cleared cache line provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a cleared cache line according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a cache line provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the composition of a device for sending a clear message according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the composition of a device for clearing cache provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the composition of a computing device according to an embodiment of the application.
  • FIG. 12 is a schematic diagram of the composition of a computing device according to an embodiment of the present application.
  • words such as "exemplary" or "for example" are used to represent examples, illustrations, or explanations. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present the related concepts in a concrete manner.
  • FIG. 1 is a schematic diagram of the composition of a computing device provided by an embodiment of the application.
  • the computing device 100 includes a processor 110, a main memory 120, and an external device 130.
  • the processor 110 is the control center of the computing device 100 .
  • the processor 110 is a central processing unit (CPU), including one processor core (core) or multiple processor cores.
  • the processor 110 shown in FIG. 1 includes N processor cores.
  • memory is divided into: registers, cache memory (cache), main memory, disk.
  • Cache memory is a high-speed small-capacity memory between the CPU and main memory.
  • the cache memory includes a first-level cache (L1cache), a second-level cache (L2cache), and a third-level cache (L3cache).
  • L1 cache is located inside the processor core.
  • L2 cache may be located inside the processor core or outside the processor core.
  • Level 1 and Level 2 caches are typically exclusive to the processor core in which they reside.
  • the L3 cache is generally located outside the processor core and is shared by multiple processor cores.
  • a processor may contain multiple L3 caches.
  • the processor 110 shown in FIG. 1 includes a plurality of cache memories 111 .
  • the cache memory 111 is a third-level cache memory in the processor 110 .
  • the cache memory 111 is used to store instructions or data that may be accessed multiple times by the processor cores in the processor 110. This improves the speed at which the processor processes data and avoids frequent accesses to the main memory by the processor.
  • the cache memory 111 includes a cache manager 1111 connected to a cache slice 1112.
  • a cache line is the smallest cache unit in the cache slice 1112.
  • the size of a cache line (cache line size) can be 32 bytes (B), 64 bytes, 128 bytes, 256 bytes, etc. Assuming that the storage capacity of the cache slice 1112 is 512 bytes and the cache line size is 64 bytes, the storage capacity of the cache slice 1112 is divided into 8 cache lines.
  • the processor 110 is connected to the main memory 120 through a memory controller (MC) 113.
  • the cache manager 1111 is used to manage the cache lines in the cache slice 1112 according to the instructions of the processor core. For example, according to a cache line read instruction from the processor core and the state of the cache line in the cache slice 1112, the cache manager 1111 determines whether to fetch a new cache line from the main memory 120 or feed the existing cache line back to the processor core. For another example, the cache manager 1111 clears the cache lines in the cache slice 1112 according to the instruction of the processor core. If a cache line in the cache slice 1112 is in the modified state, the cache manager 1111 may write the cache line back to the main memory 120 or discard it directly, depending on the type of the clear instruction.
  • the cache manager 1111 may discard the cache lines in the cache slice 1112 .
  • the unmodified state includes an exclusive state (exclusive, E), a shared state (shared, S), and an invalid state (invalid, I).
  • the exclusive state means that the data in the cache line 1112 is consistent with the content of the corresponding cache line in the main memory, but the cache line 1112 is only stored in a cache memory of a NUMA domain.
  • the shared state means that the data in the cache line 1112 is consistent with the content of the corresponding cache in the main memory, but the cache line 1112 may be stored in one cache memory of multiple NUMA domains.
  • the invalid state means that the cache line is not cached in this cache slice 1112 . In this embodiment, clearing the cache line may also be described as flushing the cache line instead.
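The state-dependent clearing behavior described above can be modeled with a short hedged sketch (the names `State` and `flush_line` are illustrative, and the `write_back` flag stands for the two types of clear instruction mentioned in the text):

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"   # data differs from main memory
    EXCLUSIVE = "E"  # consistent, cached in only one NUMA domain
    SHARED = "S"     # consistent, may be cached in several NUMA domains
    INVALID = "I"    # not cached in this cache slice

def flush_line(state: State, write_back: bool) -> str:
    """Action a cache manager takes for one line on a clear instruction.

    `write_back` distinguishes the two kinds of clear instruction in the
    text: one writes modified data back to main memory, one discards it.
    """
    if state is State.INVALID:
        return "not cached"    # nothing to clear in this slice
    if state is State.MODIFIED and write_back:
        return "write back to main memory"
    return "discard"           # unmodified lines are simply dropped
```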
  • the processor core is connected to the cache memory 111 through the bus 112 , and accesses the cache memory 111 through the bus 112 .
  • the bus 112 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, a proprietary non-standard bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is used in FIG. 1, but it does not mean that there is only one bus or one type of bus.
  • the processor 110 may perform various functions of the computing device 100 by running or executing software programs stored in the main memory 120 and calling data stored in the main memory 120 .
  • the processor core sends a clear message to a group of cache memories 111 in a broadcast manner. Understandably, all caches within the broadcast address range can receive the clear message.
  • the information of the physical address carried in the clear packet is used to indicate clearing of at least two cache lines.
  • a set of cache memories includes at least one cache memory.
  • the processor core receives a flush complete message from all caches in a set of caches.
  • in the traditional technique, the processor core sends one clear message to clear one cache line each time
  • the processor core sends one clear message to clear at least two cache lines.
  • the clear message is sent by broadcasting, which reduces the number of clear messages sent, so that the serially processed clearing task becomes a parallel one. This improves the clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
  • Compared with the traditional technique in which the processor core clears the cache lines in all L3 caches, in the method for sending a clear message provided by this embodiment, the clear message sent by the processor core can instruct clearing of any number of cache lines. This avoids clearing data that is being used by other processor cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
  • the cache memory 111 receives the clear message, clears the cache lines within the jurisdiction of the cache memory among at least two cache lines indicated by the clear message, and sends a clear complete message to the processor core. It should be noted that, if the cache line indicated by the clear message is not within the jurisdiction of the cache memory 111, the cache memory 111 also sends a clear complete message to the processor core.
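A hedged simulation of this broadcast flow (caches modeled as sets of cache-line addresses; all names are illustrative): one message reaches every cache in the group, and every cache replies with a clear-complete message even when none of the indicated lines fall within its jurisdiction.

```python
def broadcast_clear(caches, lines_to_clear):
    """Deliver one broadcast clear message to every cache in the group.

    Each cache drops the indicated lines it happens to hold and always
    feeds back a clear-complete message, even when none of the indicated
    lines fall within its jurisdiction.
    """
    complete_messages = 0
    for cache in caches:             # every cache in the broadcast range
        cache -= lines_to_clear      # clear only the lines this cache holds
        complete_messages += 1       # every cache replies "clear complete"
    return complete_messages

# One message clears lines 0x000 and 0x1040 across three caches at once.
group = [{0x000, 0xFC0}, {0x1040}, set()]
done = broadcast_clear(group, {0x000, 0x1040})
```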
  • the cache memory 111 may be a random access memory (RAM), a static random-access memory (SRAM), a dynamic RAM (DRAM), or another type of storage device that can store information and instructions.
  • the cache memory 111 can be an L1 cache, an L2 cache, an L3 cache, or a cache device at any level, as long as it is a distributed cache device in which the cache lines are distributed across multiple cache devices at the same level.
  • Main memory 120 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), and the like.
  • the main memory 120 is also used to store programs related to this embodiment.
  • the processor 110 may also include a peripheral management module 114 .
  • the peripheral device management module 114 is respectively connected to the bus 112 and the peripheral device 130 .
  • the peripheral 130 may be an application specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA), a graphics processing unit (GPU), a neural-network processing unit (NPU), or the like.
  • Peripherals 130 may also broadcast clear messages to a group of caches. For a specific explanation of the clear message sent by the peripheral device 130, reference may be made to the above description of the processor core sending the clear message.
  • the processor 110 may also contain internal devices 115 .
  • the internal device 115 includes a logic IP (intellectual property) unit connected to the bus 112 and a module with certain logical management functions.
  • logical management function modules include but are not limited to: an interrupt management module, a NUMA management module (or NUMA node manager), an internal integrated input/output (IO) module, an internal encryption/decryption module, an internal direct memory access (DMA) module, etc.
  • the internal device 115 may also broadcast a clear message to a group of cache memories as needed. It should be noted that all modules or devices connected to the bus can send clear messages to a group of cache memories by broadcasting.
  • the device structure shown in FIG. 1 does not constitute a limitation on the computing device, which may include more or fewer components than shown, combine some components, or arrange components differently.
  • the requesting device sends a clear message to a group of cache memories in a broadcast manner.
  • Requesting devices include but are not limited to the following devices: processor cores, peripherals, and other devices connected to the bus.
  • For example, the processor core may be processor core 0 or processor core 1, and the peripheral may be peripheral 130, as shown in FIG. 1 .
  • Other devices connected to the bus include internal devices 115 as shown in FIG. 1 .
  • After obtaining the clearing instruction, the device connected to the bus (eg, a processor core) generates a clear message according to the clearing instruction and transmits the clear message to a group of cache memories by broadcasting.
  • the processing module in the peripheral device controls the peripheral device to send a clear command or a clear message to the peripheral device management module by setting the register of the peripheral device.
  • the peripheral management module converts the clear command or clear message into a bus type clear message, and transmits the clear message to a group of cache memories by broadcasting.
  • the so-called broadcast mode refers to a one-to-many mode.
  • the requesting device sends a clear message, the clear message contains the broadcast address, and all caches within the broadcast address range can receive the clear message. That is, the number of clear messages sent by the requesting device is less than the number of caches that receive clear messages.
  • a requesting device broadcasts a clear message to a group of cache memories, and each cache memory in a group of cache memories can receive a clear message from the requesting device.
  • the requesting device sends two clear messages to a group of cache memories in a broadcast manner, and each cache memory in a group of cache memories can receive the clear message from the requesting device.
  • a set of cache memories includes at least one cache memory.
  • any one of the processor cores in the processor may divide the cache memory in the processor into multiple NUMA nodes according to a hash algorithm.
  • the caches contained in a set of caches may belong to one or more NUMA nodes in the NUMA system.
  • the plurality of caches 111 in Figure 1 may belong to one or more NUMA nodes.
  • the processor core or peripheral 130 may send a flush message to the cache memory 111 within one or more NUMA nodes.
  • each NUMA node is managed by a NUMA node manager, which is connected to bus 112 . If the caches included in a set of cache memories belong to the first NUMA node, the processor core or peripheral 130 may send a clear message to the NUMA node manager of the first NUMA node, and the NUMA node manager sends the clear message to each cache memory. Furthermore, all cache memories in the group, after clearing the cache lines within their own jurisdiction among the at least two cache lines indicated by the clear message, send the clear complete message to the NUMA node manager, which forwards it to the processor core or peripheral 130.
  • a set of cache memories includes cache memories belonging to multiple NUMA nodes (eg, a first NUMA node, a second NUMA node, and a third NUMA node).
  • the first NUMA node is managed by the first NUMA node manager.
  • the second NUMA node is managed by the second NUMA node manager.
  • the third NUMA node is managed by the third NUMA node manager.
  • the processor core or peripheral device 130 may send a clear message to the first NUMA node manager; because it accepts the request, the first NUMA node manager becomes the master manager for this clear message.
  • the first NUMA node manager sends a clear message to the second NUMA node manager and the third NUMA node manager.
  • a flush message is sent by the first NUMA node manager to each cache within the first NUMA node.
  • a flush message is sent by the second NUMA node manager to each cache within the second NUMA node.
  • a flush message is sent by the third NUMA node manager to each cache within the third NUMA node.
  • each cache memory in the second NUMA node sends a purge complete message to the second NUMA node manager, and the second NUMA node manager sends a purge complete message to the first NUMA node manager.
  • Each cache memory in the third NUMA node sends a purge complete message to the third NUMA node manager, and the third NUMA node manager sends a purge complete message to the first NUMA node manager.
  • the first NUMA node manager sends the flush complete messages fed back by each cache memory in the second NUMA node and the third NUMA node, together with the flush complete messages fed back by each cache memory in the first NUMA node, to the processor core or peripheral 130.
  • the counting function of the feedback clearing completion message may be implemented in the first NUMA node manager, the second NUMA node manager, or the third NUMA node manager. For example, after receiving all clearing complete messages within its jurisdiction, the second NUMA node manager feeds back a clearing complete message to the first NUMA node manager.
  • the counting may also be implemented inside the processor core or peripheral 130; in that case, the first NUMA node manager forwards all fed-back clearing-completion messages to the processor core or peripheral 130, which feeds back a single clearing-completion result.
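The completion counting can be modeled as below (an illustrative sketch, not the patent's implementation): a subordinate NUMA node manager counts the completions within its jurisdiction and feeds back one aggregated completion message upward only once all its caches have reported.

```python
class NodeManager:
    """Counts clear-complete messages from the caches it manages and
    reports one aggregated completion upward once all have arrived."""

    def __init__(self, num_caches: int):
        self.expected = num_caches
        self.received = 0

    def on_cache_complete(self) -> bool:
        """True exactly when the last cache in this node reports."""
        self.received += 1
        return self.received == self.expected

# A subordinate node with four caches feeds back a single completion
# to the master manager only after the fourth cache reports.
second_node = NodeManager(num_caches=4)
reports = [second_node.on_cache_complete() for _ in range(4)]
```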
  • the requesting device is an external device
  • the memory on the device can be used as main memory.
  • the external device needs to send a read request through the peripheral device management module 114 and the bus 112 to a cache memory in the NUMA domain in order to access a cache line in the on-device memory. If the cache line misses in the cache memory, the cache memory determines by address resolution that it needs to send the read request to the memory controller of the external device. After the memory controller of the external device reads the corresponding cache line, it feeds the line back to the cache memory; the cache memory caches the line in the cache slice 1112 and then feeds it back to the external device.
  • If the external device has full ownership of all cache lines of a page, it can read its own memory directly without going through the cache.
  • the external device can send a clear message to all cache memories in one or more NUMA domains according to the method for clearing the cache provided in this embodiment, instructing the cache memories to clear all cache lines in a page.
  • the cache lines of the above page can then be read and written directly, until another processor core or device requests a cache line in the above page.
  • the trigger condition for the requesting device to send the clear message may also be cache-line management by the cache itself; for example, the cache memory actively flushes cache lines.
  • the information of the physical address carried in the clearing packet is used to indicate clearing at least two cache lines.
  • the physical address may be obtained by translating the logical address indicated by the clear instruction by an address translator module in the processor core.
  • the flush message is used to indicate flushing of at least two consecutive cache lines.
  • the clear packet includes information used to indicate a physical address of a segment of continuous physical address space.
  • the contiguous physical address space may be a physical address space mapped to main memory.
  • the continuous physical address space is a part of the physical address space in the physical address space mapped to the main memory.
  • the contiguous physical address space is the entire physical address space mapped to main memory. It should be understood that the operating system manages the physical address space mapped to the main memory in units of pages, so that the processor core can read and write data in that space. Therefore, the physical address space mapped to main memory can be divided into multiple pages, and each page is further divided into units of the cache line size.
  • the system physical address space in a computer system represents the memory size occupied by a computer entity.
  • the system physical address space includes the physical address space mapped to the main memory and the physical address space of memory mapped I/O (memory mapped I/O, MMIO).
  • the physical address space mapped to main memory is a portion of the system's physical address space in a computer system.
  • the other physical address space is a portion of the system's physical address space in a computer system.
  • the continuous physical address space may be other physical address spaces in the system physical address space except the physical address space mapped to the main memory.
  • the other physical address space is the physical address space of MMIO.
  • the continuous physical address space is a partial physical address space in other physical address spaces.
  • the contiguous physical address space is the entire physical address space of other physical address spaces.
  • FIG. 3 is a schematic diagram of a contiguous physical address space. As shown in (a) of FIG. 3, it is assumed that the size of the system physical address space is 2^46 bytes. Wherein, a segment of continuous physical address space is a part of the physical address space in the physical address space mapped to the main memory.
  • the size of a cache line is 64 bytes, and the size of a page is 4KB.
  • a 4KB page contains 64 consecutive cache lines.
  • Physical address 0x000 represents the start address of the first page.
  • Physical address 0x1000 represents the start address of the second page.
  • the first page contains 64 consecutive cache lines from physical addresses 0x000 to 0x1000.
  • the second page contains 64 consecutive cache lines from physical address 0x1000 to physical address 0x2000.
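Under the assumptions of this example (64-byte cache lines, 4 KB pages), the page and the in-page cache-line index of any physical address follow from simple masking; a hedged sketch with illustrative names:

```python
LINE_SIZE = 64        # bytes per cache line, from the example
PAGE_SIZE = 0x1000    # 4 KB page

def page_start(pa: int) -> int:
    """Starting physical address of the page containing pa."""
    return pa & ~(PAGE_SIZE - 1)

def line_index_in_page(pa: int) -> int:
    """1-based index of the cache line containing pa within its page."""
    return (pa & (PAGE_SIZE - 1)) // LINE_SIZE + 1
```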
  • a segment of continuous physical address space is the entire physical address space of MMIO.
  • the physical address 0x2_0000_0000 represents the starting address of the entire physical address space of MMIO.
  • a segment of continuous physical address space includes a physical address space mapped to the main memory and a physical address space not mapped to the main memory.
  • a segment of contiguous physical address space includes a portion of the physical address space that is mapped to the main memory and a portion of the physical address space that is not mapped to the main memory.
  • the at least two cache lines described in this embodiment range from the cache line containing the starting physical address of the segment of continuous physical address space to the cache line containing its ending physical address.
  • the physical addresses of the cleared contiguous physical address space are aligned to cache-line boundaries, and the cleared contiguous physical address space is measured in units of the cache line size.
  • the starting physical address of a continuous physical address space can be any physical address in the physical address space of MMIO.
  • the end physical address of a continuous physical address space can be any physical address in the physical address space of MMIO.
  • the starting physical address of a continuous physical address space can be any physical address in the physical address space mapped to the main memory.
  • the ending physical address of a continuous physical address space can be any physical address in the physical address space mapped to the main memory.
  • the continuous physical address space may be the physical address space mapped to the main memory, and the at least two cache lines indicated by the clear message are cache lines mapped to the physical address space of the main memory.
  • the starting physical address and ending physical address of a continuous physical address space are the physical addresses in the physical address space mapped to the main memory.
  • the starting physical address of a segment of contiguous physical address space may be any physical address between two cache lines in the physical address space mapped to main memory.
  • the ending physical address of a continuous physical address space may be any physical address between two cache lines in the physical address space mapped to the main memory.
  • a segment of continuous physical address space is a continuous physical address space between the physical address at point A and the physical address at point B.
  • the clear message indicates to clear the cache line where the physical address at point A is located to the cache line where the physical address at point B is located.
  • the cache line where the physical address at point A is located is the 63rd cache line of the first page
  • the cache line where the physical address at point B is located is the first cache line of the second page.
  • the clear message instructs to clear the 63rd cache line of the first page, the 64th cache line of the first page, and the 1st cache line of the second page.
  • the starting physical address or ending physical address of a segment of contiguous physical address space may be a physical address of a cache line in the physical address space mapped to main memory.
  • the cache memory may clear the cache line indicated by the physical address of the cache line, or the cache memory may not clear the cache line indicated by the physical address of the cache line.
  • the flush message can instruct to flush the 63rd cache line of the first page and the 64th cache line of the first page, but not the first cache line of the second page.
  • the clear message instructs to clear the 63rd cache line of the first page, the 64th cache line of the first page, and the 1st cache line of the second page.
  • the requesting device clears cache lines in the cache memory at page size granularity.
  • a flush message may instruct to flush all consecutive cache lines within the page.
  • the physical address information of a segment of a continuous physical address space includes the starting physical address of the page.
  • the starting physical address of a page is an address where bits 0 to 11 of any physical address are 0.
  • the clear message also includes the page type, which is used to describe the size of the page. For example, page sizes specified in the x86 architecture include 4K, 2M, and 1G.
  • the starting physical address of the page included in the clear message is 0x000
  • the size of the page is 4K
  • the size of one cache line is 64 bytes.
  • the clear message instructs to clear the 64 cache lines in the first page, that is, the 64 cache lines between the physical address 0x000 and the physical address 0x1000.
  • the starting physical address of the page is 0x040
  • the size of the page is 4K
  • the size of one cache line is 64 bytes.
  • the clear message instructs to clear 64 cache lines from physical address 0x040 to physical address 0x1040.
  • the starting physical address of the page is 0x000
  • the size of the page is 8K
  • the size of one cache line is 64 bytes.
  • the clear message instructs to clear 128 cache lines from physical address 0x000 to physical address 0x2000.
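Decoding a clear message that carries a page starting address and a page type can be sketched as follows (illustrative names; a hedged model of the examples above, assuming 64-byte cache lines):

```python
LINE_SIZE = 64  # bytes per cache line, as in the examples

def lines_in_page(start: int, page_size: int):
    """Physical addresses of every cache line covered by a clear
    message carrying a page starting address and a page size."""
    return [start + i * LINE_SIZE for i in range(page_size // LINE_SIZE)]

# First example: a 4K page at 0x000 -> 64 lines from 0x000 to 0xFC0.
lines = lines_in_page(0x000, 0x1000)
```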
  • the clearing instruction may be preset in the computing device.
  • the PKFLUSH instruction is used to clear 4K pages.
  • the PMFLUSH instruction is used to clear 2M pages.
  • the PGFLUSH instruction is used to clear 1G pages.
  • the operand of PKFLUSH is a general-purpose register used to store the logical address.
  • the requesting device may send a flush message to a group of cache memories according to the flush command PKFLUSH, instructing to flush 4K pages.
  • the information on the physical addresses of a segment of continuous physical address space includes the starting physical address of the page and the ending physical address of the page.
  • the start physical address of the page included in the clear message is 0x000
  • the end physical address of the page is 0x1000.
  • the clear message instructs to clear the 64 cache lines in the first page, that is, the 64 cache lines between the physical address 0x000 and the physical address 0x1000.
  • the starting physical address of the page included in the clear message is 0x000
  • the ending physical address of the page is 0x2000.
  • the clear message instructs to clear 64 cache lines in the first page and 64 cache lines in the second page, that is, 128 cache lines between physical addresses 0x000 and 0x2000.
  • the information of the physical addresses of a segment of a continuous physical address space includes the physical addresses of the cache lines in the page.
  • the clear message also includes the page type, which is used to describe the size of the page.
  • the physical address of a cache line in a page can be the physical address of any one of the cache lines in the page.
  • the physical address 0xFC0 of the cache line in the page included in the clear message and the size of the page is 4K.
  • the clear message instructs to clear the 64 cache lines in the first page, that is, the 64 cache lines between the physical address 0x000 and the physical address 0x1000.
  • the physical addresses 0xFC0 and 0x1040 of the cache lines in the page included in the clear message and the size of the page is 4K.
  • the clear message instructs to clear 64 cache lines in the first page and 64 cache lines in the second page, that is, 128 cache lines between physical addresses 0x000 and 0x2000.
  • the requesting device clears cache lines in the cache memory at the granularity of any number of cache lines.
  • the flush message may instruct to flush consecutive cache lines within a page or consecutive cache lines within multiple pages.
  • the physical address information of a segment of continuous physical address space includes the physical address of a cache line and the number of cache lines. Assuming that the number of cache lines to be cleared by the clear packet is N, the clear packet instructs to clear N consecutive cache lines starting from the physical address of the cache line.
  • N cache lines may be cleared starting from the physical address of the cache line and in a decreasing physical address direction.
  • the physical address of the cache line included in the clear message is 0x1000, and the number of cache lines is 2.
  • the clear message instructs to clear the 64th cache line and the 63rd cache line in the first page.
  • N cache lines may be cleared starting from the physical address of the cache line and in the direction of increasing physical addresses.
  • the physical address of the cache line included in the clear message is 0x1000, and the number of cache lines is 2.
  • the clear message instructs to clear the 64th cache line in the first page and the 1st cache line in the second page.
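A hedged sketch of the physical-address-plus-count form: N consecutive cache-line addresses are generated from the carried physical address, in either the decreasing or the increasing direction described above (names are illustrative):

```python
LINE_SIZE = 64  # bytes per cache line

def lines_from(pa: int, n: int, increasing: bool = True):
    """Addresses of N consecutive cache lines starting at pa, in the
    increasing or decreasing physical-address direction."""
    step = LINE_SIZE if increasing else -LINE_SIZE
    return [pa + i * step for i in range(n)]

# The two examples above: from physical address 0x1000 with N = 2.
down = lines_from(0x1000, 2, increasing=False)  # toward the first page
up = lines_from(0x1000, 2, increasing=True)     # into the second page
```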
  • a segment of continuous physical address space is the physical address space of MMIO.
  • the clear message instructs to clear N consecutive cache lines starting from the physical address of the cache line in the physical address space of the MMIO. It should be noted that, after the clear message instructs to clear the cache line in the physical address space of the MMIO, it is not allowed to write the cleared cache line back to the MMIO.
  • the physical address information of a segment of continuous physical address space includes a physical address and an immediate value.
  • the immediate value indicates the number of low-order bits in a physical address.
  • the cache memory may generate a mask based on the immediate value, use the mask and the physical address to generate a new physical address, and use the new physical address to clear the cache line. For the specific method of clearing the cache line by using the new physical address, reference may be made to the various possible implementation manners described above, which will not be repeated.
  • a clear message instructs to clear a power-of-two-aligned block.
  • the requesting device may construct flush instructions that clear cache lines of arbitrary granularity, eg, RFLUSH instructions.
  • the instruction format can be written as: RFLUSH es:[esi], imm8.
  • the set value imm8 is an integer from 6 up to the highest physical address bit, eg 6, 7, 8 ... 51.
  • the RFLUSH instruction instructs to clear all cache lines in the physical address space mapped to main memory.
  • the RFLUSH instruction instructs to clear a cache line.
  • the RFLUSH instruction instructs to clear any power-of-2 cache lines.
  • the setting value is equal to 12
  • it is equivalent to clearing the low 12 bits of the input physical address (AND with the complement of 0xFFF) to obtain the starting physical address, and setting the low 12 bits (OR with 0xFFF) to obtain the ending physical address.
  • the input physical address is 0xABCDE010
  • the clear message indicates to clear all cache lines in the physical address space from physical address 0xABCDE000 to physical address 0xABCDEFFF
  • the set value is equal to 16
  • the clear message indicates to clear all cache lines in the physical address space from physical address 0xABCD0000 to physical address 0xABCDFFFF.
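The range derivation from a physical address and the set value imm8 can be sketched as below (the function name is illustrative; the assertions reproduce the 12-bit and 16-bit examples above):

```python
def rflush_range(pa: int, imm: int):
    """Start and end physical addresses of the power-of-two block
    containing pa, where imm is the number of low-order address bits
    the block spans (the imm8 of the illustrative RFLUSH)."""
    low = (1 << imm) - 1
    return pa & ~low, pa | low
```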
  • the bits of the physical address that indicate the cache line within the page may be deleted, while the bits that indicate the page, or a part of them, are reserved.
  • the clear message may contain the page-indicating address, or a part of the page-indicating address.
  • the physical address information of a segment of continuous physical address space includes a physical address and a mask.
  • the mask is ANDed with the physical address to clear the low-order bits of the physical address, obtaining a physical address in which only some bits are reserved.
  • the physical address is 0xABCDE010
  • the mask is 0xFFFF FFFF FFFF F000
  • the resulting physical address with only the reserved high-order bits is 0xABCDE000, ie the reserved page bits are 0xABCDE.
  • the flush message is used to indicate flushing of at least two non-consecutive cache lines.
  • the clear packet includes physical address information used to indicate multiple non-consecutive cache lines.
  • the physical addresses of multiple non-consecutive cache lines exist within a range of a continuous physical address space.
  • a continuous physical address space can be a physical address space mapped to main memory, or a physical address space mapped to other storage spaces. Understandably, a segment of continuous physical address space may be a part of the physical address space mapped into the physical address space of the main memory.
  • the physical addresses of a segment of the contiguous physical address space are aligned with the addresses of the cache lines, and the segment of the contiguous physical address space is the space in units of the size of the cache line.
  • a flushed non-contiguous cache line is a cache line in a contiguous physical address space.
  • the physical address information of a plurality of non-contiguous cache lines may include physical addresses of odd-numbered cache lines in a segment of contiguous address space. For example, suppose a cache line is 64 bytes in size and a page is 4KB in size. A 4KB page contains 64 contiguous cache lines.
  • the clear message is used to instruct to clear the first cache line, the third cache line, the fifth cache line, and the 63rd cache line.
  • the physical address information of a plurality of non-contiguous cache lines may include the physical addresses of cache lines with even bits in a segment of contiguous address space. For example, suppose a cache line is 64 bytes in size and a page is 4KB in size. A 4KB page contains 64 contiguous cache lines.
  • the clear message is used to instruct to clear the second cache line, the fourth cache line, the sixth cache line, and the 64th cache line.
  • the physical address information of a plurality of non-consecutive cache lines may include physical addresses of cache lines arranged at equal intervals in a segment of contiguous address space.
  • the physical address information of a plurality of non-contiguous cache lines may include physical addresses of other non-contiguous sequences of cache lines.
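Generating the physical addresses of such non-consecutive cache lines (odd-numbered, even-numbered, or equally spaced within a page) can be sketched as follows, assuming 64-byte cache lines; all names are illustrative:

```python
LINE_SIZE = 64  # bytes per cache line

def strided_lines(page_base: int, page_size: int, first_index: int, stride: int):
    """Addresses of non-consecutive cache lines: every `stride`-th line
    starting from the 1-based line `first_index` within one page."""
    total = page_size // LINE_SIZE
    return [page_base + (i - 1) * LINE_SIZE
            for i in range(first_index, total + 1, stride)]

odd = strided_lines(0x000, 0x1000, first_index=1, stride=2)   # lines 1, 3, ..., 63
even = strided_lines(0x000, 0x1000, first_index=2, stride=2)  # lines 2, 4, ..., 64
```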
  • the above description takes the at least two cache lines indicated by the clear message as cache lines mapped to the physical address space of the main memory as an example. If the continuous physical address space is another physical address space, the description also applies to the scenario where the at least two cache lines are cache lines in other physical address spaces.
  • the cache memory receives a clear message from the requesting device.
  • the cache memory clears the cache lines within the jurisdiction of the cache memory among the at least two cache lines indicated by the clear message.
  • All cache lines indicated by the flush message are stored in a set of cache memories, and a set of cache memories includes at least one cache memory.
  • the cache is any one of a group of caches.
  • the cache line indicated by the flush message may be located in multiple cache memories.
  • Each cache memory clears a cache line within the jurisdiction of the cache memory among the at least two cache lines indicated by the clear message.
  • each cache line contains a tag field (tag), a status field (status), and a data field (data).
  • the tag field is used to indicate the physical address.
  • the status field is used to indicate the state of the cache line.
  • Data fields are used to store data.
  • the clear message instructs to clear 64 cache lines in the first page, ie 64 cache lines between physical address 0x000 and physical address 0x1000.
  • the cache lines held in the cache memory include the cache lines at physical address 0x000, physical address 0xFC0, physical address 0x1040 and physical address 0x2000. Since the cache lines indicated by the clear message include the cache line at physical address 0x000 and the cache line at physical address 0xFC0, the cache memory clears the cache line containing 0x000 and the cache line containing 0xFC0.
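The membership test in the example above (the lines at 0x000 and 0xFC0 fall inside the flushed page, while 0x1040 and 0x2000 do not) can be sketched as a small C helper. The function name and the half-open range convention are illustrative assumptions, not part of the described design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed 64-byte line, as in the example */

/* Returns true when the cache line starting at line_addr lies entirely
 * inside the half-open flush range [range_start, range_end). Line
 * addresses are assumed to be aligned to CACHE_LINE_SIZE. */
static bool line_in_flush_range(uint64_t line_addr,
                                uint64_t range_start, uint64_t range_end)
{
    return line_addr >= range_start &&
           line_addr + CACHE_LINE_SIZE <= range_end;
}
```

With the page range [0x000, 0x1000), only the lines at 0x000 and 0xFC0 pass the test, matching the example.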
  • when a cache line is in the modified state (M), the cache manager can either write the cache line in the cache slice back to main memory or discard it directly, depending on the type of clear instruction.
  • when a cache line is in an unmodified state, the cache manager can discard the cache line in the cache slice.
  • the state of the cache line containing 0x000 is the modified state, and the cache manager writes back the cache line containing 0x000 to the main memory or directly discards it.
  • the state of the cache line containing 0xFC0 is the unmodified state (E/S/I), and the cache manager discards the cache line containing 0xFC0.
  • the cache memory that has received the clear message clears the cache lines within the jurisdiction of the cache memory among the 64 cache lines in the page indicated by the clear message.
  • the cache memory clears those of the at least two cache lines indicated by the clear message that fall within its jurisdiction.
  • the cache memory sends a clearing completion message to the requesting device.
  • the requesting device receives a clearing completion message from all cache memories in a group of cache memories.
  • the processor core sends one clear message to clear at least two cache lines.
  • the clear message is sent by broadcast, which reduces the number of clear messages sent, turning the serially processed clearing task into parallel processing; this improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
  • the clear message sent by the processor core can instruct clearing of any number of cache lines. This prevents clearing data being used by other processor cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
  • the clearing described in this embodiment may also be described as a refresh instead, and the clear message may also be described as a refresh message instead.
  • the processor core and the cache memory include corresponding hardware structures and/or software modules for executing each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software-driven hardware depends on the specific application scenarios and design constraints of the technical solution.
  • FIG. 9 is a schematic structural diagram of a possible apparatus for sending a clear message according to an embodiment of the present application.
  • These apparatuses for sending a clear message can be used to implement the functions of the processor core in the above method embodiments, and thus can also achieve the beneficial effects of the above method embodiments.
  • the device for sending the clear message may be any one of processor core 0 to processor core N as shown in FIG. 1, or the peripheral device 130 as shown in FIG. 1, or a module (such as a chip) applied to a processor core or a peripheral.
  • the apparatus 900 for sending a clear message includes a sending unit 910 and a receiving unit 920 .
  • the apparatus 900 for sending a clear message is configured to implement the function of the requesting device in the method embodiment shown in FIG. 2 above.
  • the apparatus 900 for sending a clear message is configured to implement the function of the requesting device in the method embodiment shown in FIG. 2 : the sending unit 910 is configured to execute S201 ; the receiving unit 920 is configured to execute S205 .
  • more detailed descriptions of the sending unit 910 and the receiving unit 920 can be obtained directly by referring to the relevant descriptions in the method embodiment shown in FIG. 2, and are not repeated here. It is understandable that the functions of the sending unit 910 and the receiving unit 920 may also be implemented by the processor cores or peripherals in FIG. 1.
  • the apparatus 1000 for clearing the cache includes a sending unit 1010 , a processing unit 1020 and a receiving unit 1030 .
  • the apparatus 1000 for clearing the cache is used to implement the function of the cache memory in the method embodiment shown in FIG. 2 above.
  • the apparatus 1000 for clearing the cache is used to implement the function of the cache memory in the method embodiment shown in FIG. 2 : the receiving unit 1030 is used to execute S202; the processing unit 1020 is used to execute S203; and the sending unit 1010 is used to execute S204.
  • more detailed descriptions of the sending unit 1010, the processing unit 1020, and the receiving unit 1030 can be obtained directly by referring to the relevant descriptions in the method embodiment shown in FIG. 2, and are not repeated here. It can be understood that the functions of the sending unit 1010, the processing unit 1020 and the receiving unit 1030 can also be implemented by the cache memory in FIG. 1.
  • connection manner of the processor core and the cache memory 111 shown in FIG. 1 is only a schematic illustration. In a possible implementation manner, the processor core is connected to the cache memory 111 through a ring bus, and the cache memory 111 is accessed through the ring bus.
  • computing device 100 includes processor 110 , main memory 120 and peripherals 130 .
  • the processor 110 includes a processor core, cache memory 111 and a ring bus 116 .
  • a cache memory 111 is provided in the ring of the ring bus 116 , and the cache memory 111 is connected to the ring bus 116 .
  • a processor core is disposed outside the ring of the ring bus 116 , and the processor core is connected to the ring bus 116 .
  • the processor core is arranged in the ring of the ring bus 116 , and the processor core is connected to the ring bus 116 .
  • the cache memory 111 is provided outside the ring of the ring bus 116 , and the cache memory 111 is connected to the ring bus 116 .
  • the memory manager 113 , the peripheral device management module 114 and the internal device 115 included in the processor 110 are respectively connected to the ring bus 116 .
  • the memory manager 113 is connected to the main memory 120 .
  • the peripheral device management module 114 is connected to the peripheral device 130 .
  • the processor core sends a clear message to a group of cache memories 111 through the ring bus 116 in a broadcast manner. It will be appreciated that all caches within the broadcast address range can receive the clear message.
  • the information of the physical address carried in the clear packet is used to indicate clearing of at least two cache lines.
  • a set of cache memories includes at least one cache memory.
  • the processor core receives a flush complete message from all caches in a set of caches.
  • Compared with the traditional approach in which the processor core sends one clear message to clear one cache line each time, here the processor core sends one clear message to clear at least two cache lines.
  • the clear message is sent by broadcast, which reduces the number of clear messages sent, turning the serially processed clearing task into parallel processing; this improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
  • Compared with the traditional technology in which the processor core clears the cache lines in all L3 caches, in the method for sending a clear message provided by this embodiment the clear message sent by the processor core can instruct clearing of any number of cache lines. This avoids clearing data being used by other processor cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
  • the cache memory 111 receives the clear message, clears those of the at least two cache lines indicated by the clear message that fall within its jurisdiction, and sends a clear-complete message to the processor core. It should be noted that, if no cache line indicated by the clear message is within the jurisdiction of the cache memory 111, the cache memory 111 still sends a clear-complete message to the processor core.
  • computing device 100 includes processor 110 , main memory 120 and peripherals 130 .
  • the processor 110 includes a processor core, a cache memory 111 , a memory manager 113 , a peripheral management module 114 , an internal device 115 , a mesh bus 117 and a relay module 118 .
  • the processor core, the cache memory 111 , the memory manager 113 , the peripheral management module 114 and the internal device 115 are connected to the mesh bus 117 through the relay module 118 .
  • the processor core, the cache memory 111 , the memory manager 113 , the peripheral device management module 114 and the internal device 115 perform instruction or data transmission through the relay module 118 .
  • the relay module 118 is connected to the horizontal bus and the vertical bus of the mesh bus 117 .
  • the relay module 118 is used to transmit the data of the horizontal bus to the vertical bus.
  • the relay module 118 is also used to transmit the data of the vertical bus to the horizontal bus.
  • the memory manager 113 , the peripheral device management module 114 and the internal device 115 included in the processor 110 are respectively connected to the mesh bus 117 .
  • the memory manager 113 is connected to the main memory 120 .
  • the peripheral device management module 114 is connected to the peripheral device 130 .
  • the processor core sends a clear message to a group of cache memories 111 through the mesh bus 117 in a broadcast manner. It will be appreciated that all caches within the broadcast address range can receive the clear message.
  • the information of the physical address carried in the clear packet is used to indicate clearing of at least two cache lines.
  • a set of cache memories includes at least one cache memory.
  • the processor core receives a flush complete message from all caches in a set of caches.
  • Compared with the traditional approach in which the processor core sends one clear message to clear one cache line each time, here the processor core sends one clear message to clear at least two cache lines.
  • the clear message is sent by broadcast, which reduces the number of clear messages sent, turning the serially processed clearing task into parallel processing; this improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
  • Compared with the traditional technology in which the processor core clears the cache lines in all L3 caches, in the method for sending a clear message provided by this embodiment the clear message sent by the processor core can instruct clearing of any number of cache lines. This avoids clearing data being used by other processor cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
  • the cache memory 111 receives the clear message, clears those of the at least two cache lines indicated by the clear message that fall within its jurisdiction, and sends a clear-complete message to the processor core. It should be noted that, if no cache line indicated by the clear message is within the jurisdiction of the cache memory 111, the cache memory 111 still sends a clear-complete message to the processor core.
  • the method steps in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage medium may reside in an ASIC.
  • the ASIC may be located in a network device or in an end device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer programs or instructions.
  • When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable apparatus.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server, data center, or the like that integrates one or more available media.
  • the usable medium can be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).
  • “at least one” means one or more, and “plurality” means two or more.
  • “And/or”, which describes the relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, it can indicate that A exists alone, A and B exist at the same time, and B exists alone, where A, B can be singular or plural.
  • the character "/" generally indicates that the related objects are in an "or" relationship; in the formulas of this application, the character "/" indicates that the related objects are in a "division" relationship.


Abstract

This application discloses a method and apparatus for sending a clear message, relating to the field of computers, and solves the problem of how a computer system can efficiently clear cache lines in the level-3 cache while reducing the consumption of internal processor resources. The method includes: a requesting device sends a clear message to a group of cache memories by broadcast, and receives clear-complete messages from all cache memories in the group. The physical address information carried in the clear message indicates that at least two cache lines are to be cleared. After receiving the clear message from the requesting device, a cache memory clears those of the at least two cache lines indicated by the clear message that fall within its jurisdiction, and sends a clear-complete message to the requesting device. The group of cache memories includes at least one cache memory.

Description

A method and apparatus for sending a clear message
This application claims priority to the Chinese patent application No. 202010930707.8, filed with the China National Intellectual Property Administration on September 7, 2020 and entitled "A method and apparatus for sending a clear message", and to the Chinese patent application No. 202011040732.5, filed with the China National Intellectual Property Administration on September 28, 2020 and entitled "A method and apparatus for sending a clear message", both of which are incorporated herein by reference in their entirety.
Technical Field
The embodiments of this application relate to the field of computers, and in particular to a method and apparatus for sending a clear message.
Background
At present, when a processor core flushes data in the level-3 cache, it can clear one cache line at a time. If the processor core clears multiple cache lines, the clearing takes too long because the processor core cannot issue clear messages concurrently, and it consumes considerable processor resources. For example, assume the page size of main memory is 4 kilobytes (KB) and the size of a cache line is 64 bytes (B), so one page consists of 64 cache lines of 64 B each. When the processor core clears the data of one page of main memory, it has to execute 64 consecutive clear instructions and send 64 consecutive clear messages to the level-3 cache. In addition, the processor core also provides an instruction that clears the cache lines in all level-3 caches at once. However, clearing the cache lines in all level-3 caches may clear data needed by other processor cores and affect the performance of other processes. Therefore, how to improve the efficiency of clearing cache lines and reduce the processor-core resources occupied when the processor core clears cache lines in the level-3 cache is a problem that urgently needs to be solved.
Summary
This application provides a method and apparatus for sending a clear message, which solves the problem of how to improve the efficiency of clearing cache lines and reduce the processor memory resources occupied when a processor core clears cache lines in the level-3 cache.
In a first aspect, this application provides a method for sending a clear message, including: a requesting device sends a clear message to a group of cache memories by broadcast, and receives clear-complete messages from all cache memories in the group. The physical address information carried in the clear message indicates that at least two cache lines are to be cleared. The group of cache memories includes at least one cache memory. The requesting device includes, but is not limited to, the following devices: a processor core, a peripheral, or another device connected to the bus.
Compared with the traditional solution in which the requesting device sends one clear message to clear one cache line each time, in the method for sending a clear message provided by this embodiment the requesting device sends one clear message to clear at least two cache lines. When the same number of cache lines are cleared, sending the clear message by broadcast reduces the number of clear messages sent, turning the originally serial clearing task into parallel processing, which improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
Compared with the traditional solution in which the requesting device clears the cache lines in all level-3 caches, in the method for sending a clear message provided by this embodiment the clear message sent by the requesting device can indicate that any number of cache lines be cleared. This avoids clearing data being used by other cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
In a possible implementation, the clear message indicates that at least two consecutive cache lines are to be cleared.
The clear message includes physical address information indicating a segment of contiguous physical address space, and the at least two cache lines range from the cache line containing the start physical address of the segment to the cache line containing its end physical address. Optionally, the contiguous physical address space is physical address space mapped to main memory, the physical addresses of the cleared contiguous physical address space are aligned with cache-line physical addresses, and the cleared contiguous physical address space is measured in units of the cache line size.
In another possible implementation, the requesting device clears cache lines in the cache memory at page-size granularity. The clear message may indicate that all consecutive cache lines within a page be cleared, so that one or more cache memories clear the cache lines contained in a page as indicated by the clear message.
For example, the physical address information of the segment of contiguous physical address space includes the start physical address of a page, and the clear message further includes a page type, the page type describing the page size.
As another example, the physical address information of the segment includes the start physical address and the end physical address of a page.
As another example, the physical address information of the segment includes the physical addresses of cache lines in at least one page, and the clear message further includes the page type of each page, the page type describing the page size.
In another possible implementation, the requesting device clears cache lines in the cache memory at the granularity of any number of cache lines. The clear message may indicate that consecutive cache lines within one page or across multiple pages be cleared, so that one or more cache memories clear consecutive cache lines within one page or across multiple pages as indicated by the clear message.
As another example, the physical address information of the segment includes the physical address of one cache line and a number of cache lines.
As another example, the physical address information of the segment includes a physical address and an immediate value, the immediate value indicating the number of low-order bits of the physical address.
In another possible implementation, the clear message indicates that at least two non-consecutive cache lines are to be cleared.
In a second aspect, this application provides a method for clearing a cache, including: a cache memory receives a clear message from a requesting device, clears those of the at least two cache lines indicated by the clear message that fall within the jurisdiction of the cache memory, and sends a clear-complete message to the requesting device. The clear message indicates that at least two cache lines are to be cleared. The requesting device includes, but is not limited to, the following devices: a processor core, a peripheral, or another device connected to the bus.
Compared with the traditional solution in which the cache memory clears one cache line at a time, the method for clearing a cache provided by this embodiment clears any number of cache lines according to the clear instruction. The originally serial clearing task thus becomes parallel processing, which improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
Compared with the traditional solution in which the cache memory clears the cache lines in all level-3 caches, the method for clearing a cache provided by this embodiment clears any number of cache lines according to the clear instruction. This avoids clearing data being used by other cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
In a possible implementation, the clear message indicates that at least two consecutive cache lines are to be cleared.
The clear message includes physical address information indicating a segment of contiguous physical address space, and the at least two cache lines range from the cache line containing the start physical address of the segment to the cache line containing its end physical address. Optionally, the contiguous physical address space is physical address space mapped to main memory, the physical addresses of the cleared contiguous physical address space are aligned with cache-line physical addresses, and the cleared contiguous physical address space is measured in units of the cache line size.
For example, the physical address information of the segment includes the start physical address of a page, and the clear message further includes a page type, the page type describing the page size.
As another example, the physical address information of the segment includes the start physical address and the end physical address of a page.
As another example, the physical address information of the segment includes the physical addresses of cache lines in at least one page, and the clear message further includes the page type of each page, the page type describing the page size.
As another example, the physical address information of the segment includes the physical address of one cache line and a number of cache lines.
As another example, the physical address information of the segment includes a physical address and an immediate value, the immediate value indicating the number of low-order bits of the physical address.
In another possible implementation, the clear message indicates that at least two non-consecutive cache lines are to be cleared.
All cache lines cleared by the clear message are stored in a group of cache memories; the group includes at least one cache memory, and the cache memory is any one of the cache memories in the group. Optionally, the cache memories in the group belong to one or more NUMA nodes in a non-uniform memory access (NUMA) system.
In a third aspect, this application provides an apparatus for sending a clear message; for the beneficial effects, refer to the description of the first aspect, which is not repeated here. The apparatus for sending a clear message has the function of implementing the behavior in the method examples of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In a possible design, the apparatus for sending a clear message includes a sending unit and a receiving unit. The sending unit is configured to send a clear message to a group of cache memories by broadcast, where the physical address information carried in the clear message indicates that at least two cache lines are to be cleared, and the group of cache memories includes at least one cache memory. The receiving unit is configured to receive clear-complete messages from all cache memories in the group.
In a possible implementation, the clear message indicates that at least two consecutive cache lines are to be cleared.
The clear message includes physical address information indicating a segment of contiguous physical address space, and the at least two cache lines range from the cache line containing the start physical address of the segment to the cache line containing its end physical address. Optionally, the contiguous physical address space is physical address space mapped to main memory, the physical addresses of the cleared contiguous physical address space are aligned with cache-line physical addresses, and the cleared contiguous physical address space is measured in units of the cache line size.
For example, the physical address information of the segment includes the start physical address of a page, and the clear message further includes a page type, the page type describing the page size.
As another example, the physical address information of the segment includes the start physical address and the end physical address of a page.
As another example, the physical address information of the segment includes the physical addresses of cache lines in at least one page, and the clear message further includes the page type of each page, the page type describing the page size.
As another example, the physical address information of the segment includes the physical address of one cache line and a number of cache lines.
As another example, the physical address information of the segment includes a physical address and an immediate value, the immediate value indicating the number of low-order bits of the physical address.
In another possible implementation, the clear message indicates that at least two non-consecutive cache lines are to be cleared.
These units may perform the corresponding functions in the method examples of the first aspect; for details, refer to the detailed description in the method examples, which is not repeated here.
In a fourth aspect, this application provides an apparatus for clearing a cache; for the beneficial effects, refer to the description of the second aspect, which is not repeated here. The apparatus for clearing a cache has the function of implementing the behavior in the method examples of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In a possible design, the apparatus for clearing a cache includes a sending unit, a processing unit and a receiving unit. The receiving unit is configured to receive a clear message from a requesting device, where the clear message indicates that at least two cache lines are to be cleared. The processing unit is configured to clear those of the at least two cache lines indicated by the clear message that fall within the jurisdiction of the cache memory. The sending unit is configured to send a clear-complete message to the requesting device.
In a possible implementation, the clear message indicates that at least two consecutive cache lines are to be cleared.
The clear message includes physical address information indicating a segment of contiguous physical address space, and the at least two cache lines range from the cache line containing the start physical address of the segment to the cache line containing its end physical address. Optionally, the contiguous physical address space is physical address space mapped to main memory, the physical addresses of the cleared contiguous physical address space are aligned with cache-line physical addresses, and the cleared contiguous physical address space is measured in units of the cache line size.
For example, the physical address information of the segment includes the start physical address of a page, and the clear message further includes a page type, the page type describing the page size.
As another example, the physical address information of the segment includes the start physical address and the end physical address of a page.
As another example, the physical address information of the segment includes the physical addresses of cache lines in at least one page, and the clear message further includes the page type of each page, the page type describing the page size.
As another example, the physical address information of the segment includes the physical address of one cache line and a number of cache lines.
As another example, the physical address information of the segment includes a physical address and an immediate value, the immediate value indicating the number of low-order bits of the physical address.
In another possible implementation, the clear message indicates that at least two non-consecutive cache lines are to be cleared.
These units may perform the corresponding functions in the method examples of the second aspect; for details, refer to the detailed description in the method examples, which is not repeated here.
In a fifth aspect, this application provides a processor. The processor includes at least one processor core and at least one cache memory. When the processor core executes a set of computer instructions, the method described in the first aspect or any possible implementation of the first aspect is implemented; when the cache memory executes a set of computer instructions, the method described in the second aspect or any possible implementation of the second aspect is implemented.
In a possible implementation, the processor further includes a ring bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the ring bus.
In a possible implementation, the processor further includes a mesh bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the mesh bus.
In a sixth aspect, this application provides a computing device. The computing device may include a processor and a peripheral, and the processor includes at least one processor core and a cache memory. When the processor core or the peripheral executes a set of computer instructions, the method described in the first aspect or any possible implementation of the first aspect is implemented; when the cache memory executes a set of computer instructions, the method described in the second aspect or any possible implementation of the second aspect is implemented.
In a possible implementation, the processor further includes a ring bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the ring bus.
In a possible implementation, the processor further includes a mesh bus, a peripheral management module and a memory manager, and the processor core, the cache memory, the peripheral management module and the memory manager are connected through the mesh bus.
In a seventh aspect, this application provides a computer-readable storage medium, including computer software instructions; when the computer software instructions run on a computing device, the computing device is caused to perform the method described in the first aspect or a possible implementation of the first aspect, or the second aspect or any possible implementation of the second aspect.
In an eighth aspect, this application provides a computer program product; when the computer program product runs on a computing device, the computing device is caused to perform the method described in the first aspect or a possible implementation of the first aspect, or the second aspect or a possible implementation of the second aspect.
It should be understood that descriptions of technical features, technical solutions, beneficial effects or similar language in this application do not imply that all of the features and advantages can be realized in any single embodiment. Rather, it should be understood that a description of a feature or beneficial effect means that at least one embodiment includes that specific technical feature, technical solution or beneficial effect. Therefore, descriptions of technical features, technical solutions or beneficial effects in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and beneficial effects described in the embodiments may be combined in any appropriate manner. Those skilled in the art will understand that an embodiment may be implemented without one or more of the specific technical features, technical solutions or beneficial effects of a particular embodiment. In other embodiments, additional technical features and beneficial effects may be identified in specific embodiments that do not embody all of the embodiments.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the composition of a computing device according to an embodiment of this application;
FIG. 2 is a flowchart of a method for sending a clear message and clearing a cache according to an embodiment of this application;
FIG. 3 is a schematic diagram of the composition of a contiguous physical address space according to an embodiment of this application;
FIG. 4 is a schematic diagram of cleared cache lines according to an embodiment of this application;
FIG. 5 is a schematic diagram of cleared cache lines according to an embodiment of this application;
FIG. 6 is a schematic diagram of cleared cache lines according to an embodiment of this application;
FIG. 7 is a schematic diagram of cleared cache lines according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a cache line according to an embodiment of this application;
FIG. 9 is a schematic diagram of the composition of an apparatus for sending a clear message according to an embodiment of this application;
FIG. 10 is a schematic diagram of the composition of an apparatus for clearing a cache according to an embodiment of this application;
FIG. 11 is a schematic diagram of the composition of a computing device according to an embodiment of this application;
FIG. 12 is a schematic diagram of the composition of a computing device according to an embodiment of this application.
Detailed Description
The terms "first", "second" and "third" in the specification, claims and drawings of this application are used to distinguish different objects, not to define a particular order.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as preferred or more advantageous than other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete manner.
Implementations of the embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the composition of a computing device provided by an embodiment of this application. As shown in FIG. 1, the computing device 100 includes a processor 110, a main memory 120 and a peripheral (external device) 130.
The components of the computing device 100 are described in detail below with reference to FIG. 1:
The processor 110 is the control center of the computing device 100. Typically, the processor 110 is a central processing unit (CPU) that includes one processor core (core) or multiple processor cores. For example, the processor 110 shown in FIG. 1 contains N processor cores.
In the hierarchy of a computer storage system, the closer a memory is to the CPU, the faster its read/write speed and the smaller its capacity. In order of increasing distance from the CPU, memory is divided into: registers, cache memory (cache), main memory, and disk. Cache memory is a small-capacity, high-speed memory between the CPU and main memory. Cache memory includes the level-1 cache (L1 cache), the level-2 cache (L2 cache) and the level-3 cache (L3 cache). Typically, the L1 cache is located inside the processor core. The L2 cache may be located inside or outside the processor core. The L1 and L2 caches are usually exclusive to the processor core where they reside. The L3 cache is generally located outside the processor core and is shared by multiple processor cores. Usually, a processor may contain multiple L3 caches. For example, the processor 110 shown in FIG. 1 contains multiple cache memories 111. The cache memory 111 is an L3 cache in the processor 110 and is used to store instructions or data that the processor cores in the processor 110 may access multiple times, thereby increasing the speed at which the processor processes data and avoiding frequent accesses to main memory.
The cache memory 111 includes a cache manager (cache unit) 1111, and the cache manager 1111 is connected to a cache slice 1112. A cache line is the smallest caching unit in the cache slice 1112. The cache line size may be 32 bytes (B), 64 bytes, 128 bytes, 256 bytes, and so on. Assuming the storage capacity of the cache slice 1112 is 512 bytes and the cache line size is 64 bytes, the storage capacity of the cache slice 1112 is divided into 8 cache lines. The processor 110 is connected to the main memory 120 through a memory manager (memory controller, MC) 113.
The cache manager 1111 is used to manage the cache lines in the cache slice 1112 according to instructions from the processor core. For example, based on a cache-line read instruction from the processor core and the state of the cache lines in the cache slice 1112, the cache manager 1111 decides whether to fetch a new cache line from the main memory 120 or return an already present cache line to the processor core. As another example, the cache manager 1111 clears cache lines in the cache slice 1112 according to instructions from the processor core. If a cache line in the cache slice 1112 is in the modified state (modified), the cache manager 1111 may, depending on the type of clear instruction, write the cache line back to the main memory 120 or discard it directly. If a cache line in the cache slice 1112 is in an unmodified state, the cache manager 1111 may discard it. For example, unmodified states include the exclusive state (exclusive, E), the shared state (shared, S) and the invalid state (invalid, I). The exclusive state means that the data in the cache line is consistent with the content of the corresponding cache line in main memory, but the cache line is stored in only one cache memory of one NUMA domain. The shared state means that the data in the cache line is consistent with the content of the corresponding cache in main memory, but the cache line may be stored in a cache memory of multiple NUMA domains. The invalid state means that the cache line is not cached in this cache slice 1112. In this embodiment, clearing a cache line may also be described as flushing a cache line.
The processor core is connected to the cache memory 111 through a bus 112 and accesses the cache memory 111 through the bus 112. The bus 112 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or a proprietary non-standard bus, and so on. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in FIG. 1, but this does not mean there is only one bus or one type of bus.
The processor 110 may perform various functions of the computing device 100 by running or executing software programs stored in the main memory 120 and calling data stored in the main memory 120.
In this embodiment, the processor core sends a clear message to a group of cache memories 111 by broadcast. It will be understood that all cache memories within the broadcast address range can receive the clear message. The physical address information carried in the clear message indicates that at least two cache lines are to be cleared. A group of cache memories includes at least one cache memory. The processor core then receives clear-complete messages from all cache memories in the group.
Compared with the traditional solution in which the processor core sends one clear message to clear one cache line each time, in the method for sending a clear message provided by this embodiment the processor core sends one clear message to clear at least two cache lines. When the same number of cache lines are cleared, sending the clear message by broadcast reduces the number of clear messages sent, turning the serial clearing task into parallel processing, which improves clearing efficiency and effectively reduces the internal processor resources occupied by clearing cache lines.
Compared with the traditional solution in which the processor core clears the cache lines in all level-3 caches, in the method for sending a clear message provided by this embodiment the clear message sent by the processor core can indicate that any number of cache lines be cleared. This avoids clearing data being used by other processor cores or devices, avoids affecting the performance of other processes, and improves the accuracy of clearing cache lines.
Upon receiving the clear message, the cache memory 111 clears those of the at least two cache lines indicated by the clear message that fall within its jurisdiction, and sends a clear-complete message to the processor core. It should be noted that if no cache line indicated by the clear message is within the jurisdiction of the cache memory 111, the cache memory 111 still sends a clear-complete message to the processor core.
In physical form, the cache memory 111 may be random access memory (RAM), static random-access memory (SRAM), dynamic RAM (DRAM), or another type of storage device that can store information and instructions.
In logical form, the cache memory 111 may be a level-3 cache, a level-1 cache, a level-2 cache, or a cache device of any level, as long as the cache device is distributed, that is, cache lines are stored in a distributed manner across multiple cache devices of the same level.
The main memory 120 may be read-only memory (ROM) or another type of static storage device that can store static information and instructions, random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or electrically erasable programmable read-only memory (EEPROM), etc. The main memory 120 is also used to store programs related to this embodiment.
The processor 110 may further include a peripheral management module 114. The peripheral management module 114 is connected to the bus 112 and the peripheral 130 respectively. The peripheral 130 may be an application specific integrated circuit (ASIC), for example: a digital signal processor (DSP), one or more field programmable gate arrays (FPGAs), a graphics processing unit (GPU), or a neural-network processing unit (NPU). The peripheral 130 may also send a clear message to a group of cache memories by broadcast. For a detailed explanation of the peripheral 130 sending a clear message, refer to the above description of the processor core sending a clear message.
The processor 110 may further include an internal device 115. The internal device 115 includes a logical internet protocol (IP) unit connected to the bus 112 and modules with certain logic management functions. For example, the logic management function modules include, but are not limited to: an interrupt management module, a NUMA management module (or NUMA node manager), an internal integrated input/output (IO) module, an internal encryption/decryption module, an internal direct memory access (DMA) module, and so on. The internal device 115 may also send a clear message to a group of cache memories by broadcast as needed. It should be noted that any module or device connected to the bus can send a clear message to a group of cache memories by broadcast.
The device structure shown in FIG. 1 does not constitute a limitation on the computing device; the computing device may include more or fewer components than shown, combine certain components, or arrange the components differently.
Next, with reference to FIG. 2, the method for sending a clear message and clearing a cache provided by this embodiment is described in detail.
S201. The requesting device sends a clear message to a group of cache memories by broadcast.
The requesting device includes, but is not limited to, the following devices: a processor core, a peripheral, or another device connected to the bus; for example, processor core 0, processor core 1, or the peripheral 130 shown in FIG. 1. Other devices connected to the bus include the internal device 115 shown in FIG. 1.
After obtaining a clear instruction, a device connected to the bus (e.g., a processor core) generates a clear message according to the clear instruction and sends the clear message to a group of cache memories by broadcast.
If the requesting device is a peripheral, the processing module in the peripheral controls the peripheral, by setting the peripheral's registers, to send a clear instruction or clear message to the peripheral management module. After obtaining the clear instruction or clear message, the peripheral management module converts it into a bus-type clear message and sends the clear message to a group of cache memories by broadcast.
It should be understood that broadcast is a one-to-many mode. The requesting device sends a clear message containing a broadcast address, and all cache memories within the broadcast address range can receive the clear message. In other words, the number of clear messages sent by the requesting device is smaller than the number of cache memories that receive them. For example, the requesting device broadcasts one clear message to a group of cache memories, and every cache memory in the group can receive the clear message from the requesting device. As another example, the requesting device broadcasts two clear messages to a group of cache memories, and every cache memory in the group can receive the clear messages from the requesting device. A group of cache memories includes at least one cache memory.
Optionally, any processor core in the processor may divide the cache memories in the processor into multiple NUMA nodes according to a hash algorithm. In this embodiment, the cache memories in a group may belong to one or more NUMA nodes in a NUMA system. For example, the multiple cache memories 111 in FIG. 1 may belong to one or more NUMA nodes. The processor core or the peripheral 130 may send a clear message to the cache memories 111 in one or more NUMA nodes.
In some embodiments, each NUMA node is managed by a NUMA node manager, and the NUMA node manager is connected to the bus 112. If the cache memories in a group belong to a first NUMA node, the processor core or the peripheral 130 may send a clear message to the NUMA node manager of the first NUMA node, which forwards the clear message to each cache memory. Then, after all cache memories in the group have cleared those of the at least two cache lines indicated by the clear message that fall within their own jurisdictions, they send clear-complete messages to the NUMA node manager, which forwards the clear-complete messages to the processor core or the peripheral 130.
If the cache memories in a group belong to multiple NUMA nodes (e.g., a first NUMA node, a second NUMA node and a third NUMA node), the first NUMA node is managed by a first NUMA node manager, the second NUMA node by a second NUMA node manager, and the third NUMA node by a third NUMA node manager. The processor core or the peripheral 130 may send a clear message to the first NUMA node manager, which, by accepting the request, is promoted to the master manager for the clear message. The first NUMA node manager sends the clear message to the second NUMA node manager and the third NUMA node manager. The first NUMA node manager sends the clear message to each cache memory in the first NUMA node; the second NUMA node manager sends the clear message to each cache memory in the second NUMA node; and the third NUMA node manager sends the clear message to each cache memory in the third NUMA node. Then, each cache memory in the second NUMA node sends a clear-complete message to the second NUMA node manager, which forwards it to the first NUMA node manager. Each cache memory in the third NUMA node sends a clear-complete message to the third NUMA node manager, which forwards it to the first NUMA node manager. The first NUMA node manager sends the clear-complete messages fed back by each cache memory in the second and third NUMA nodes, together with those fed back by each cache memory in the first NUMA node, to the processor core or the peripheral 130. Counting of the fed-back clear-complete messages may be implemented inside the first, second or third NUMA node manager. For example, after receiving the clear-complete messages from all caches within its jurisdiction, the second NUMA node manager feeds back a single clear-complete message to the first NUMA node manager, and the first NUMA node manager sends only one aggregated clear-complete message to the processor core or the peripheral 130. Alternatively, the counting may be implemented inside the processor core or the peripheral 130, in which case the first NUMA node manager forwards all fed-back clear-complete messages to the processor core or the peripheral 130.
In current technology, if the requesting device is an external device whose on-device memory can be used as main memory, then to access one cache line of the on-device memory the external device must send a read request through the peripheral management module 114 and the bus 112 to a cache memory in its NUMA domain. If the cache line misses in that cache memory, the cache memory, after address resolution, must forward the read request to the external device's memory controller. The external device's memory controller reads the corresponding cache line and returns it to the cache memory, which caches the cache line in the cache slice 1112 and then returns it to the external device. If the external device fully owns all cache lines of a page, it can read its own memory directly without going through the cache. According to the method for clearing a cache provided by this embodiment, the external device can send a clear message to all cache memories in one or more NUMA domains, instructing the cache memories to clear all cache lines of a page. From then on, the external device exclusively owns the cache lines of that page and can read and write them directly, until another processor core or device requests a cache line in that page.
The trigger condition for the requesting device to send a clear message may also be cache-line management by the cache memory, for example, the cache memory proactively clearing cache lines.
The physical address information carried in the clear message indicates that at least two cache lines are to be cleared. The physical address may be obtained by an address translator module in the processor core translating the logical address indicated by the clear instruction.
In a possible design, the clear message indicates that at least two consecutive cache lines are to be cleared.
Specifically, the clear message includes physical address information indicating a segment of contiguous physical address space.
In some embodiments, the contiguous physical address space may be physical address space mapped to main memory. Optionally, the contiguous physical address space is part of the physical address space mapped to main memory, or the whole of the physical address space mapped to main memory. It should be understood that the operating system manages the physical address space mapped to main memory in units of pages, so that the processor core can read and write the data in that address space. Therefore, the physical address space mapped to main memory can be divided into multiple pages, and each page is divided in units of the cache line size.
The system physical address space in a computer system represents the memory occupied by a computer entity. The system physical address space includes the physical address space mapped to main memory, the physical address space of memory-mapped I/O (MMIO), and so on. The physical address space mapped to main memory is one part of the system physical address space, and the other physical address spaces are other parts of the system physical address space.
In other embodiments, the contiguous physical address space may be physical address space in the system physical address space other than the physical address space mapped to main memory. For example, the other physical address space is the MMIO physical address space. Optionally, the contiguous physical address space is part of the other physical address space, or all of the other physical address space.
FIG. 3 is a schematic diagram of a contiguous physical address space. As shown in (a) of FIG. 3, assume the size of the system physical address space is 2^46. A segment of contiguous physical address space is part of the physical address space mapped to main memory. The cache line size is 64 bytes and the page size is 4 KB, so a 4 KB page contains 64 consecutive cache lines. Physical address 0x000 is the start address of the first page, and physical address 0x1000 is the start address of the second page. The first page contains the 64 consecutive cache lines between physical address 0x000 and physical address 0x1000, and the second page contains the 64 consecutive cache lines between physical address 0x1000 and physical address 0x2000.
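The page and cache-line layout of FIG. 3(a) can be expressed with two small helper functions. This is a minimal C sketch assuming the 4 KB page size and 64-byte line size used in the figure; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u    /* 4 KB page, as in FIG. 3(a) */
#define CACHE_LINE_SIZE 64u  /* 64-byte cache line */

/* Index of the page containing physical address pa (0 = first page). */
static uint64_t page_index(uint64_t pa)
{
    return pa / PAGE_SIZE;
}

/* Index of the cache line within its page (0..63 for a 4 KB page). */
static uint64_t line_index_in_page(uint64_t pa)
{
    return (pa % PAGE_SIZE) / CACHE_LINE_SIZE;
}
```

Under this layout, address 0xFC0 falls in the last (64th) line of the first page and 0x1040 in the second line of the second page, consistent with the examples that follow.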
As shown in (b) of FIG. 3, a segment of contiguous physical address space is the entire MMIO physical address space. Physical address 0x2_0000_0000 is the start address of the entire MMIO physical address space.
Optionally, a segment of contiguous physical address space includes physical address space mapped to main memory and physical address space not mapped to main memory. For example, a segment of contiguous physical address space includes part of the physical address space mapped to main memory and part of the physical address space not mapped to main memory.
The at least two cache lines described in this embodiment range from the cache line containing the start physical address of the segment of contiguous physical address space to the cache line containing the end physical address of the segment. The physical addresses of the cleared contiguous physical address space are aligned with cache-line physical addresses, and the cleared contiguous physical address space is measured in units of the cache line size.
The start physical address of a segment of contiguous physical address space may be any physical address in the MMIO physical address space, and so may the end physical address.
The start physical address of a segment of contiguous physical address space may be any physical address in the physical address space mapped to main memory, and so may the end physical address.
The following takes as an example the case where the contiguous physical address space is the physical address space mapped to main memory and the at least two cache lines indicated by the clear message are cache lines in the physical address space mapped to main memory. The start and end physical addresses of the segment are physical addresses in the physical address space mapped to main memory.
In some embodiments, the start physical address of the segment may be any physical address between two cache lines in the physical address space mapped to main memory, and so may the end physical address.
For example, as shown in (a) of FIG. 4, if the segment of contiguous physical address space is the contiguous physical address space between the physical address at point A and the physical address at point B, the clear message indicates clearing from the cache line containing the physical address at point A to the cache line containing the physical address at point B. The cache line containing the physical address at point A is the 63rd cache line of the first page, and the cache line containing the physical address at point B is the 1st cache line of the second page. The clear message indicates clearing the 63rd cache line of the first page, the 64th cache line of the first page, and the 1st cache line of the second page.
In other embodiments, the start or end physical address of the segment may be the physical address of a cache line in the physical address space mapped to main memory. The cache memory may clear the cache line indicated by that physical address, or the cache memory may choose not to clear it.
As shown in (b) of FIG. 4, if the segment of contiguous physical address space is the contiguous physical address space between the physical address at point A and the physical address at point B', then since the physical address at point B' is the physical address of the 1st cache line of the second page, the clear message may indicate clearing the 63rd and 64th cache lines of the first page without clearing the 1st cache line of the second page. Alternatively, the clear message may indicate clearing the 63rd cache line of the first page, the 64th cache line of the first page, and the 1st cache line of the second page.
In some embodiments, the requesting device clears cache lines in the cache memory at page-size granularity. The clear message may indicate that all consecutive cache lines within a page be cleared.
In a first possible implementation, the physical address information of the segment of contiguous physical address space includes the start physical address of a page. The start physical address of a page is any physical address whose bits 0 through 11 are 0. The clear message further includes a page type, the page type describing the page size. For example, the page sizes defined in the x86 architecture include 4K, 2M and 1G.
For example, the start physical address of the page included in the clear message is 0x000, the page size is 4K, and the cache line size is 64 bytes. As shown in FIG. 5, the clear message indicates clearing the 64 cache lines of the first page, i.e., the 64 cache lines between physical address 0x000 and physical address 0x1000.
As another example, assume the start physical address of the page is 0x040, the page size is 4K, and the cache line size is 64 bytes. The clear message indicates clearing the 64 cache lines between physical address 0x040 and physical address 0x1040.
As another example, as shown in FIG. 6, assume the start physical address of the page is 0x000, the page size is 8K, and the cache line size is 64 bytes. The clear message indicates clearing the 128 cache lines between physical address 0x000 and physical address 0x2000.
By way of example, clear instructions may be preconfigured in the computing device. For example, a PKFLUSH instruction is used to clear a 4K page, a PMFLUSH instruction is used to clear a 2M page, and a PGFLUSH instruction is used to clear a 1G page.
Taking PKFLUSH as an example, the instruction format may be written as, but is not limited to: PKFLUSH es:[esi] (32-bit mode), or PKFLUSH [rsi] (64-bit mode), where es is the segment register of the data segment, and esi and rsi are general-purpose registers used to store the logical address.
For example, the requesting device may send a clear message to a group of cache memories according to the clear instruction PKFLUSH, indicating that a 4K page be cleared.
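A page-granularity clear such as the PKFLUSH example targets the 64 line-aligned addresses of a 4 KB page. The enumeration those addresses imply can be sketched in C as follows; the function name is hypothetical and the 64-byte line size is the assumption used throughout this description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed 64-byte line */

/* Fills out[] with the start addresses of the n_lines cache lines of a
 * page beginning at page_start; this is the set of lines a page-granularity
 * clear message would cover. */
static void page_line_addrs(uint64_t page_start, size_t n_lines, uint64_t *out)
{
    for (size_t i = 0; i < n_lines; i++)
        out[i] = page_start + (uint64_t)i * CACHE_LINE_SIZE;
}
```

For a 4 KB page at 0x000 this yields the 64 addresses 0x000, 0x040, ..., 0xFC0, matching FIG. 5.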
在第二种可能的实现方式中,一段连续物理地址空间的物理地址的信息包括页的起始物理地址和页的结束物理地址。
例如,清除报文包括的页的起始物理地址为0x000,且页的结束物理地址为0x1000。如图5,清除报文指示清除第一个页中的64个缓存线,即物理地址0x000到物理地址0x1000之间的64个缓存线。
又如,清除报文包括的页的起始物理地址为0x000,且页的结束物理地址为0x2000。如图6,清除报文指示清除第一个页中的64个缓存线和第二个页中的64个缓存线,即物理地址0x000到物理地址0x2000之间的128个缓存线。
在第三种可能的实现方式中,一段连续物理地址空间的物理地址的信息包括页中的缓存线的物理地址。清除报文还包括页类型,页类型用于描述页的大小。页中的缓存线的物理地址可以是页中的任意一个缓存线的物理地址。
例如,清除报文包括的页中的缓存线的物理地址0xFC0,且页的大小为4K。如图5,清除报文指示清除第一个页中的64个缓存线,即物理地址0x000到物理地址0x1000之间的64个缓存线。
又如,清除报文包括的页中的缓存线的物理地址0xFC0和0x1040,且页的大小为4K。如图6,清除报文指示清除第一个页中的64个缓存线和第二个页中的64个缓存线,即物理地址0x000到物理地址0x2000之间的128个缓存线。
在另一些实施例中,请求设备以任意个数的缓存线的粒度清除高速缓冲存储器中的缓存线。清除报文可以指示清除一个页内的连续的缓存线或者多个页内的连续的缓存线。
在第四种可能的实现方式中,一段连续物理地址空间的物理地址的信息包括一个缓存线的物理地址和缓存线的个数。假设清除报文指示清除的缓存线的个数为N,清除报文指示从缓存线的物理地址开始清除连续的N个缓存线。
在一些实施例中,可以从缓存线的物理地址开始沿着物理地址减小的方向,清除N个缓存线。如图7中的(a)所示,清除报文包括的缓存线的物理地址为0x1000,且缓存线的个数为2。清除报文指示清除第一个页中的第64个缓存线和第63个缓存线。
在另一些实施例中,可以从缓存线的物理地址开始沿着物理地址增加的方向,清除N个缓存线。如图7中的(b)所示,清除报文包括的缓存线的物理地址为0x1000,且缓存线的个数为2。清除报文指示清除第一个页中的第64个缓存线和第二个页中的第1个缓存线。
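以缓存线的物理地址加缓存线个数N指示清除范围时,沿地址减小或增大方向的两种清除方式可以用如下Python草图示意。此处约定起始缓存线计入N个之内,这只是本文举例的一种假设:

```python
CACHE_LINE_SIZE = 64  # 假设一个缓存线的大小是64字节

def flush_n_lines(line_pa, n, ascending=True):
    """从缓存线物理地址line_pa开始,沿地址增大或减小方向确定连续的n个缓存线。"""
    step = CACHE_LINE_SIZE if ascending else -CACHE_LINE_SIZE
    return [line_pa + i * step for i in range(n)]

down = flush_n_lines(0x1000, 2, ascending=False)  # 沿地址减小方向:[0x1000, 0xFC0]
up = flush_n_lines(0x1000, 2)                     # 沿地址增大方向:[0x1000, 0x1040]
```

是否将起始缓存线本身计入N个之内,取决于具体实现的约定;两种约定下清除的缓存线集合相差一个缓存线。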
可选的,一段连续物理地址空间为MMIO的物理地址空间。清除报文指示在MMIO的物理地址空间中,从缓存线的物理地址开始清除连续的N个缓存线。需要说明的是,清除报文指示清除MMIO的物理地址空间中的缓存线后,不允许将清除的缓存线写回到MMIO。
在第五种可能的实现方式中,一段连续物理地址空间的物理地址的信息包括一个物理地址和立即数。立即数指示一个物理地址中低位的个数。应理解,高速缓冲存储器可以根据立即数生成掩码,利用掩码和物理地址生成一个新的物理地址,利用新的物理地址清除缓存线。利用新的物理地址清除缓存线的具体方法可以参考上述阐述的各种可能的实现方式,不予赘述。
例如,清除报文可以指示清除大小为2的幂的区块。请求设备可以构建清除任意粒度的缓存线的清除指令,例如,RFLUSH指令。指令格式可以写为:RFLUSH es:[esi],imm8。
其中,设置值imm8是从6到最大物理地址位数的整数,例如6,7,8…51。当设置值超过最大物理地址位数时,RFLUSH指令指示清除映射到主存的物理地址空间中所有的缓存线。当设置值小于6时,RFLUSH指令指示清除一个缓存线。
当imm8为6到最大物理地址位数之间的一个数值时,RFLUSH指令指示清除2的任意次幂个数的缓存线。例如,设置值等于12时,相当于将输入的物理地址的低12位清零作为起始物理地址,将输入的物理地址的低12位置1(与0xFFF进行或操作)作为结束物理地址。例如输入的物理地址是0xABCDE010,设置值等于12时,清除报文指示清除物理地址0xABCDE000到物理地址0xABCDEFFF的物理地址空间内的所有的缓存线;设置值等于16时,清除报文指示清除物理地址0xABCD0000到物理地址0xABCDFFFF的物理地址空间内的所有的缓存线。
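依据立即数imm8计算起始物理地址和结束物理地址的过程,可以用如下Python草图示意。RFLUSH是上文构建的指令名,这里仅模拟其地址计算逻辑,并非真实指令的实现:

```python
def rflush_range(pa, imm):
    """模拟RFLUSH的地址计算:imm指示物理地址中低位的个数。
    将低imm位清零作为起始物理地址,将低imm位置1作为结束物理地址。"""
    mask = (1 << imm) - 1
    return pa & ~mask, pa | mask

start, end = rflush_range(0xABCDE010, 12)  # (0xABCDE000, 0xABCDEFFF)
```

设置值每增加1,清除的物理地址区间翻倍,因此可以清除2的任意次幂个数的缓存线。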
依据立即数可以将物理地址中指示缓存线的低位比特删除,保留指示页的地址,或者,保留指示页的地址中的部分地址。清除报文可以包含指示页的地址或指示页的地址中的部分地址。
可选的,一段连续物理地址空间的物理地址的信息包括一个物理地址和掩码。将掩码和物理地址进行与操作,清零了物理地址中低位的比特位,得到一个保留高位比特位的物理地址。例如,物理地址是0xABCDE010,掩码为0xFFFF FFFF FFFF F000,与操作后得到的物理地址为0xABCDE000,其高位部分0xABCDE即指示页的地址。
在另一种可能的设计中,清除报文用于指示清除非连续的至少两个缓存线。
具体的,清除报文包括用于指示多个非连续缓存线的物理地址信息。其中,多个非连续缓存线的物理地址存在于一段连续的物理地址空间的范围内。一段连续的物理地址空间可以为映射到主存的物理地址空间,也可以为映射到其他存储空间的物理地址空间。可理解的,一段连续的物理地址空间可以为映射到主存的物理地址空间中的部分物理地址空间。一段连续物理地址空间的物理地址与缓存线的地址对齐,并且一段连续物理地址空间为以缓存线的大小为单位的空间。清除的非连续缓存线为一段连续物理地址空间中的缓存线。关于一段连续的物理地址空间的解释可以参考上述阐述。
多个非连续缓存线的物理地址信息可以包括一段连续地址空间中奇数位的缓存线的物理地址。例如,假设一个缓存线的大小是64字节,一个页的大小是4KB。4KB的页包含64个连续的缓存线。清除报文用于指示清除第1个缓存线、第3个缓存线、第5个缓存线,至第63个缓存线。
多个非连续缓存线的物理地址信息可以包括一段连续地址空间中偶数位的缓存线的物理地址。例如,假设一个缓存线的大小是64字节,一个页的大小是4KB。4KB的页包含64个连续的缓存线。清除报文用于指示清除第2个缓存线、第4个缓存线、第6个缓存线,至第64个缓存线。
多个非连续缓存线的物理地址信息可以包括一段连续地址空间中以等差间隔排列的缓存线的物理地址。多个非连续缓存线的物理地址信息也可以包括其他非连续序列的缓存线的物理地址。
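上述奇数位、偶数位或等差间隔的非连续缓存线的物理地址,均可由页的起始地址按固定步长生成。以下Python草图为示意,假设缓存线大小64字节、页大小4KB,函数名为本文举例:

```python
CACHE_LINE_SIZE = 64
PAGE_SIZE = 4 * 1024  # 4KB的页包含64个连续的缓存线

def strided_lines(page_start, first_index, stride):
    """返回页内从第first_index个缓存线开始、按等差间隔stride选取的缓存线物理地址。
    缓存线按1至64编号,第k个缓存线的物理地址为page_start + (k-1)*64。"""
    count = PAGE_SIZE // CACHE_LINE_SIZE
    return [page_start + (i - 1) * CACHE_LINE_SIZE
            for i in range(first_index, count + 1, stride)]

odd = strided_lines(0x0, 1, 2)   # 第1、3、5…63个缓存线,共32个
even = strided_lines(0x0, 2, 2)  # 第2、4、6…64个缓存线,共32个
```

取不同的first_index和stride,即可表示其他等差间隔的非连续序列。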
应理解,上述实施例是以清除报文指示的至少两个缓存线为映射到主存的物理地址空间中的缓存线进行举例说明。如果连续物理地址空间为其他物理地址空间,上述清除报文指示的至少两个缓存线的说明同样适用于至少两个缓存线为其他物理地址空间中的缓存线的场景。
S202、高速缓冲存储器接收来自请求设备的清除报文。
S203、高速缓冲存储器清除清除报文指示的至少两个缓存线中属于高速缓冲存储器管辖范围内的缓存线。
清除报文指示清除的所有缓存线存储于一组高速缓冲存储器,一组高速缓冲存储器包含至少一个高速缓冲存储器。该高速缓冲存储器是一组高速缓冲存储器中任意一个高速缓冲存储器。
应理解,由于高速缓冲存储器中的缓存线包含不连续的物理地址,因此,清除报文指示清除的缓存线可能位于多个高速缓冲存储器中。每个高速缓冲存储器清除该清除报文指示的至少两个缓存线中属于高速缓冲存储器管辖范围内的缓存线。
通常,每个缓存线包含标签域(tag)、状态域(status)和数据域(data)。标签域用于指示物理地址。状态域用于指示缓存线的状态。数据域用于存储数据。
例如,清除报文指示清除第一个页中的64个缓存线,即物理地址0x000到物理地址0x1000之间的64个缓存线。如图8所示,若高速缓冲存储器中缓存线包含物理地址0x000、物理地址0xFC0、物理地址0x1040和物理地址0x2000。由于清除报文指示的缓存线包含了物理地址0x000的缓存线和物理地址0xFC0的缓存线,该高速缓冲存储器清除包含0x000的缓存线和包含0xFC0的缓存线。
其中,当缓存线处于已修改状态(modified,M),缓存管理器可以根据不同类型的清除指令将缓存片中的缓存线回写到主存储器或直接丢弃。当缓存线处于未修改状态,缓存管理器可以将缓存片中的缓存线丢弃。如图8,包含0x000的缓存线的状态为已修改状态,缓存管理器将包含0x000的缓存线回写到主存储器或直接丢弃。包含0xFC0的缓存线的状态为未修改状态(E/S/I),缓存管理器将包含0xFC0的缓存线丢弃。
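高速缓冲存储器对单个缓存线的清除动作取决于其状态域,可以用如下Python草图示意。缓存线用字典表示,仅为本文举例的数据结构,并非真实硬件的实现:

```python
def flush_line(line, writeback=True):
    """清除一个缓存线:处于已修改(M)状态时,可按清除指令的类型回写主存或直接丢弃;
    处于未修改(E/S/I)状态时,直接丢弃。返回执行的动作序列。"""
    actions = []
    if line["state"] == "M" and writeback:
        actions.append(("writeback", line["tag"]))  # 回写到主存储器
    actions.append(("invalidate", line["tag"]))     # 丢弃该缓存线
    return actions

# 如图8:包含0x000的缓存线处于M状态,先回写再丢弃;
# 包含0xFC0的缓存线处于未修改状态,直接丢弃
```

writeback参数对应不同类型的清除指令(回写后失效或直接失效)。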
因此,接收到该清除报文的高速缓冲存储器,清除清除报文指示的页中的64个缓存线中属于高速缓冲存储器管辖范围内的缓存线。
关于清除报文指示清除连续的至少两个缓存线的具体指示方式可以参考上述各种可能的实现方式的阐述,不予赘述。高速缓冲存储器清除清除报文指示的至少两个缓存线中属于高速缓冲存储器管辖范围内的缓存线。
S204、高速缓冲存储器向请求设备发送清除完成报文。
S205、请求设备接收来自一组高速缓冲存储器中所有高速缓冲存储器的清除完成报文。
相对于传统技术,处理器核每次发送一个清除报文清除一个缓存线的方案,本实施例提供的清除缓存的方法,处理器核发送一个清除报文清除至少两个缓存线。在清除相同数量的缓存线的情况下,采用广播方式发送清除报文,减少了发送清除报文的次数,从而将串行处理的清除任务变成了并行处理,提高了清除效率,有效地降低了清除缓存线所占用处理器内部的资源。
相对于传统技术,处理器核清除所有三级缓存中的缓存线的方案,本实施例提供的清除缓存的方法,处理器核发送的清除报文可以指示清除任意数量的缓存线。从而,避免处理器核和其他设备清除其他核或设备正在使用的数据,避免影响其他进程的性能,提高了清除缓存线的准确度。
此外,本实施例所述的清除也可以替换描述为刷新,清除报文也可以替换描述为刷新报文。
可以理解的是,为了实现上述实施例中功能,处理器核和高速缓冲存储器包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
图9为本申请的实施例提供的可能的发送清除报文的装置的结构示意图。这些发送清除报文的装置可以用于实现上述方法实施例中处理器核的功能,因此也能实现上述方法实施例所具备的有益效果。在本申请的实施例中,该发送清除报文的装置可以是如图1所示的处理器核0至处理器核N中任意一个,也可以是如图1所示的外设130,还可以是应用于处理器核或外设的模块(如芯片)。
如图9所示,发送清除报文的装置900包括发送单元910和接收单元920。发送清除报文的装置900用于实现上述图2中所示的方法实施例中请求设备的功能。
当发送清除报文的装置900用于实现图2所示的方法实施例中请求设备的功能时:发送单元910用于执行S201;接收单元920用于执行S205。
有关上述发送单元910和接收单元920更详细的描述,可以直接参考图2所示的方法实施例中的相关描述得到,这里不再赘述。可理解的,发送单元910和接收单元920的功能也可以由图1中的处理器核或外设实现。
如图10所示,清除缓存的装置1000包括发送单元1010、处理单元1020和接收单元1030。清除缓存的装置1000用于实现上述图2中所示的方法实施例中高速缓冲存储器的功能。
当清除缓存的装置1000用于实现图2所示的方法实施例中高速缓冲存储器的功能时:接收单元1030用于执行S202;处理单元1020用于执行S203;发送单元1010用于执行S204。
有关上述发送单元1010、处理单元1020和接收单元1030更详细的描述,可以直接参考图2所示的方法实施例中的相关描述得到,这里不再赘述。可理解的,发送单元1010、处理单元1020和接收单元1030的功能也可以由图1中的高速缓冲存储器实现。
上述图1中所示的处理器核和高速缓冲存储器111的连接方式只是一种示意性说明。在一种可能的实现方式中,处理器核通过环形总线(ring bus)与高速缓冲存储器111连接,通过环形总线访问高速缓冲存储器111。如图11所示,计算设备100包括处理器110、主存储器120和外设130。处理器110包含处理器核、高速缓冲存储器111和环形总线116。环形总线116的环内设置有高速缓冲存储器111,高速缓冲存储器111连接环形总线116。环形总线116的环外设置有处理器核,处理器核连接环形总线116。可选的,处理器核设置在环形总线116的环内,处理器核连接环形总线116。高速缓冲存储器111设置在环形总线116的环外,高速缓冲存储器111连接环形总线116。
另外,处理器110包含的内存管理器113、外设管理模块114和内部设备115分别与环形总线116连接。内存管理器113连接主存储器120。外设管理模块114连接外设130。
在本实施例中,处理器核采用广播方式通过环形总线116向一组高速缓冲存储器111发送清除报文。可理解的,在广播地址范围内的所有高速缓冲存储器均可以接收到清除报文。清除报文携带的物理地址的信息用于指示清除至少两个缓存线。一组高速缓冲存储器包括至少一个高速缓冲存储器。进而,处理器核接收来自一组高速缓冲存储器中所有高速缓冲存储器的清除完成报文。
相对于传统技术,处理器核每次发送一个清除报文清除一个缓存线的方案,本实施例提供的发送清除报文的方法,处理器核发送一个清除报文清除至少两个缓存线。在清除相同数量的缓存线的情况下,采用广播方式发送清除报文,减少了发送清除报文的次数,从而将串行处理的清除任务变成了并行处理,提高了清除效率,有效地降低了清除缓存线所占用处理器内部的资源。
相对于传统技术,处理器核清除所有三级缓存中的缓存线的方案,本实施例提供的发送清除报文的方法,处理器核发送的清除报文可以指示清除任意数量的缓存线。从而,避免清除其他处理器核或设备正在使用的数据,避免影响其他进程的性能,提高了清除缓存线的准确度。
高速缓冲存储器111接收到清除报文,清除该清除报文指示的至少两个缓存线中属于该高速缓冲存储器管辖范围内的缓存线,并向处理器核发送清除完成报文。需要说明的是,若高速缓冲存储器111管辖范围内没有清除报文指示的缓存线,高速缓冲存储器111也向处理器核发送清除完成报文。
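处理器核广播清除报文并收齐清除完成报文的流程,可以用如下Python草图示意。Cache类及其flush方法为本文举例的假设接口,并非真实硬件API:

```python
class Cache:
    """示意性的高速缓冲存储器。"""
    def __init__(self, name):
        self.name = name

    def flush(self, flush_msg):
        # 清除管辖范围内由flush_msg指示的缓存线(细节省略);
        # 即使管辖范围内没有被指示的缓存线,也返回清除完成报文
        return ("flush_done", self.name)

def broadcast_flush(caches, flush_msg):
    """请求设备采用广播方式向一组高速缓冲存储器发送清除报文,
    并等待收齐所有高速缓冲存储器的清除完成报文。"""
    done = [cache.flush(flush_msg) for cache in caches]  # 广播到每个高速缓冲存储器
    assert len(done) == len(caches)  # 收齐全部清除完成报文后,本次清除流程结束
    return done
```

该草图体现了本实施例的要点:一个清除报文并行分发到一组高速缓冲存储器,请求设备只需等待与高速缓冲存储器个数相同的清除完成报文。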
关于计算设备100包含的各个模块的解释,以及处理器核采用广播方式向一组高速缓冲存储器111发送清除报文的解释,可以参考上述各个实施例的阐述,不予赘述。
在另一种可能的实现方式中,处理器核和高速缓冲存储器111通过网形总线(mesh bus)连接,处理器核通过网形总线访问高速缓冲存储器111。如图12所示,计算设备100包括处理器110、主存储器120和外设130。处理器110包含处理器核、高速缓冲存储器111、内存管理器113、外设管理模块114、内部设备115、网形总线117和中转模块118。处理器核、高速缓冲存储器111、内存管理器113、外设管理模块114和内部设备115通过中转模块118与网形总线117连接。处理器核、高速缓冲存储器111、内存管理器113、外设管理模块114和内部设备115通过中转模块118进行指令或数据传输。中转模块118与网形总线117的横向总线和纵向总线连接。中转模块118用于将横向总线的数据传输到纵向总线。中转模块118还用于将纵向总线的数据传输到横向总线。
另外,处理器110包含的内存管理器113、外设管理模块114和内部设备115分别与网形总线117连接。内存管理器113连接主存储器120。外设管理模块114连接外设130。
在本实施例中,处理器核采用广播方式通过网形总线117向一组高速缓冲存储器111发送清除报文。可理解的,在广播地址范围内的所有高速缓冲存储器均可以接收到清除报文。清除报文携带的物理地址的信息用于指示清除至少两个缓存线。一组高速缓冲存储器包括至少一个高速缓冲存储器。进而,处理器核接收来自一组高速缓冲存储器中所有高速缓冲存储器的清除完成报文。
相对于传统技术,处理器核每次发送一个清除报文清除一个缓存线的方案,本实施例提供的发送清除报文的方法,处理器核发送一个清除报文清除至少两个缓存线。在清除相同数量的缓存线的情况下,采用广播方式发送清除报文,减少了发送清除报文的次数,从而将串行处理的清除任务变成了并行处理,提高了清除效率,有效地降低了清除缓存线所占用处理器内部的资源。
相对于传统技术,处理器核清除所有三级缓存中的缓存线的方案,本实施例提供的发送清除报文的方法,处理器核发送的清除报文可以指示清除任意数量的缓存线。从而,避免清除其他处理器核或设备正在使用的数据,避免影响其他进程的性能,提高了清除缓存线的准确度。
高速缓冲存储器111接收到清除报文,清除该清除报文指示的至少两个缓存线中属于该高速缓冲存储器管辖范围内的缓存线,并向处理器核发送清除完成报文。需要说明的是,若高速缓冲存储器111管辖范围内没有清除报文指示的缓存线,高速缓冲存储器111也向处理器核发送清除完成报文。
关于计算设备100包含的各个模块的解释,以及处理器核采用广播方式向一组高速缓冲存储器111发送清除报文的解释,可以参考上述各个实施例的阐述,不予赘述。
本申请的实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于网络设备或终端设备中。当然,处理器和存储介质也可以作为分立组件存在于网络设备或终端设备中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘(digital video disc,DVD);还可以是半导体介质,例如,固态硬盘(solid state drive,SSD)。
在本申请的各个实施例中,如果没有特殊说明以及逻辑冲突,不同的实施例之间的术语和/或描述具有一致性、且可以相互引用,不同的实施例中的技术特征根据其内在的逻辑关系可以组合形成新的实施例。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。在本申请的文字描述中,字符“/”,一般表示前后关联对象是一种“或”的关系;在本申请的公式中,字符“/”,表示前后关联对象是一种“相除”的关系。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定。

Claims (29)

  1. 一种发送清除报文的方法,其特征在于,包括:
    请求设备采用广播方式向一组高速缓冲存储器发送清除报文,所述清除报文携带的物理地址的信息用于指示清除至少两个缓存线,所述一组高速缓冲存储器包括至少一个高速缓冲存储器;
    所述请求设备接收来自所述一组高速缓冲存储器中所有高速缓冲存储器的清除完成报文。
  2. 根据权利要求1所述的方法,其特征在于,所述清除报文用于指示清除连续的所述至少两个缓存线。
  3. 根据权利要求2所述的方法,其特征在于,所述清除报文包括用于指示一段连续物理地址空间的物理地址的信息,所述至少两个缓存线包括所述一段连续物理地址空间的起始物理地址所在的缓存线到所述一段连续物理地址空间的结束物理地址所在的缓存线。
  4. 根据权利要求3所述的方法,其特征在于,所述连续物理地址空间为映射到主存的物理地址空间,清除的连续物理地址空间的物理地址与缓存线的物理地址对齐,并且所述清除的连续物理地址空间为以缓存线的大小为单位的空间。
  5. 根据权利要求3或4所述的方法,其特征在于,所述一段连续物理地址空间的物理地址的信息包括页的起始物理地址,所述清除报文还包括页类型,所述页类型用于描述所述页的大小。
  6. 根据权利要求3或4所述的方法,其特征在于,所述一段连续物理地址空间的物理地址的信息包括页的起始物理地址和所述页的结束物理地址。
  7. 根据权利要求3或4所述的方法,其特征在于,所述一段连续物理地址空间的物理地址的信息包括至少一个页中的缓存线的物理地址,所述清除报文还包括每个页的页类型,所述页类型用于描述所述页的大小。
  8. 根据权利要求3或4所述的方法,其特征在于,所述一段连续物理地址空间的物理地址的信息包括一个缓存线的物理地址和缓存线的个数。
  9. 根据权利要求3或4所述的方法,其特征在于,所述一段连续物理地址空间的物理地址的信息包括一个物理地址和立即数,所述立即数指示所述一个物理地址中低位的个数。
  10. 根据权利要求1所述的方法,其特征在于,所述清除报文用于指示清除非连续的所述至少两个缓存线。
  11. 根据权利要求1至10中任一项所述的方法,其特征在于,所述请求设备包含但不限于以下设备:处理器核,外设,其他与总线相连接的设备。
  12. 一种清除缓存的方法,其特征在于,包括:
    高速缓冲存储器接收来自请求设备的清除报文,所述清除报文用于指示清除至少两个缓存线;
    所述高速缓冲存储器清除所述清除报文指示的至少两个缓存线中属于所述高速缓冲存储器管辖范围内的缓存线;
    所述高速缓冲存储器向所述请求设备发送清除完成报文。
  13. 根据权利要求12所述的方法,其特征在于,所述清除报文用于指示清除连续的所述至少两个缓存线。
  14. 根据权利要求12所述的方法,其特征在于,所述清除报文用于指示清除非连续的所述至少两个缓存线。
  15. 根据权利要求12至14中任一项所述的方法,其特征在于,所述请求设备包含但不限于以下设备:处理器核,外设,其他与总线相连接的设备。
  16. 根据权利要求12至15中任一项所述的方法,其特征在于,所述清除报文清除的所有缓存线存储于一组高速缓冲存储器,所述一组高速缓冲存储器包含至少一个高速缓冲存储器,所述高速缓冲存储器是所述一组高速缓冲存储器中任意一个高速缓冲存储器。
  17. 根据权利要求16所述的方法,其特征在于,所述一组高速缓冲存储器包含的高速缓冲存储器属于非一致性内存访问NUMA系统中一个或多个NUMA节点。
  18. 一种处理器,其特征在于,包括至少一个处理器核和至少一个高速缓冲存储器,其中,
    所述处理器核,用于采用广播方式向一组高速缓冲存储器发送清除报文,所述清除报文携带的物理地址的信息用于指示清除至少两个缓存线,所述一组高速缓冲存储器包括至少一个高速缓冲存储器;
    所述高速缓冲存储器,用于接收来自所述处理器核的清除报文,所述清除报文用于指示清除至少两个缓存线;
    所述高速缓冲存储器,还用于清除所述清除报文指示的至少两个缓存线中属于所述高速缓冲存储器管辖范围内的缓存线;
    所述高速缓冲存储器,还用于向所述处理器核发送清除完成报文;
    所述处理器核,还用于接收来自所述一组高速缓冲存储器中所有高速缓冲存储器的清除完成报文。
  19. 根据权利要求18所述的处理器,其特征在于,所述清除报文用于指示清除连续的所述至少两个缓存线。
  20. 根据权利要求19所述的处理器,其特征在于,所述清除报文包括用于指示一段连续物理地址空间的物理地址的信息,所述至少两个缓存线包括所述一段连续物理地址空间的起始物理地址所在的缓存线到所述一段连续物理地址空间的结束物理地址所在的缓存线。
  21. 根据权利要求20所述的处理器,其特征在于,所述连续物理地址空间为映射到主存的物理地址空间,清除的连续物理地址空间的物理地址与缓存线的物理地址对齐,并且所述清除的连续物理地址空间为以缓存线的大小为单位的空间。
  22. 根据权利要求18所述的处理器,其特征在于,所述清除报文用于指示清除非连续的所述至少两个缓存线。
  23. 根据权利要求18至22中任一项所述的处理器,其特征在于,所述清除报文清除的所有缓存线存储于一组高速缓冲存储器,所述高速缓冲存储器是所述一组高速缓冲存储器中任意一个高速缓冲存储器。
  24. 根据权利要求18至23中任一项所述的处理器,其特征在于,所述一组高速缓冲存储器包含的高速缓冲存储器属于非一致性内存访问NUMA系统中一个或多个NUMA节点。
  25. 一种计算设备,其特征在于,所述计算设备包括处理器和外设,所述处理器包括至少一个处理器核和高速缓冲存储器,当所述处理器核或外设执行一组计算机指令时,实现上述权利要求1至11中任一项所述的方法,当所述高速缓冲存储器执行所述一组计算机指令时,实现上述权利要求12至17中任一项所述的方法。
  26. 根据权利要求25所述的计算设备,其特征在于,所述处理器还包含环形总线、外设管理模块和内存管理器,所述处理器核、所述高速缓冲存储器、所述外设管理模块和所述内存管理器通过所述环形总线连接。
  27. 根据权利要求25所述的计算设备,其特征在于,所述处理器还包含网形总线、外设管理模块和内存管理器,所述处理器核、所述高速缓冲存储器、所述外设管理模块和所述内存管理器通过所述网形总线连接。
  28. 一种处理器,其特征在于,所述处理器包括至少一个处理器核、至少一个高速缓冲存储器、环形总线、外设管理模块和内存管理器,所述处理器核、所述高速缓冲存储器、所述外设管理模块和所述内存管理器通过所述环形总线连接,当所述处理器核执行一组计算机指令时,实现上述权利要求1至11中任一项所述的方法,当所述高速缓冲存储器执行所述一组计算机指令时,实现上述权利要求12至17中任一项所述的方法。
  29. 一种处理器,其特征在于,所述处理器包括至少一个处理器核、至少一个高速缓冲存储器、网形总线、外设管理模块和内存管理器,所述处理器核、所述高速缓冲存储器、所述外设管理模块和所述内存管理器通过所述网形总线连接,当所述处理器核执行一组计算机指令时,实现上述权利要求1至11中任一项所述的方法,当所述高速缓冲存储器执行所述一组计算机指令时,实现上述权利要求12至17中任一项所述的方法。
PCT/CN2021/093977 2020-09-07 2021-05-15 一种发送清除报文的方法及装置 WO2022048187A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21863257.8A EP4156565A4 (en) 2020-09-07 2021-05-15 METHOD AND APPARATUS FOR SENDING A RINSE MESSAGE
US18/177,140 US20230205691A1 (en) 2020-09-07 2023-03-02 Flush packet sending method and apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010930707 2020-09-07
CN202010930707.8 2020-09-07
CN202011040732.5 2020-09-28
CN202011040732.5A CN114157621A (zh) 2020-09-07 2020-09-28 一种发送清除报文的方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/177,140 Continuation US20230205691A1 (en) 2020-09-07 2023-03-02 Flush packet sending method and apparatus

Publications (1)

Publication Number Publication Date
WO2022048187A1 true WO2022048187A1 (zh) 2022-03-10

Family

ID=80462172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093977 WO2022048187A1 (zh) 2020-09-07 2021-05-15 一种发送清除报文的方法及装置

Country Status (4)

Country Link
US (1) US20230205691A1 (zh)
EP (1) EP4156565A4 (zh)
CN (1) CN114157621A (zh)
WO (1) WO2022048187A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1268695A (zh) * 1999-03-31 2000-10-04 国际商业机器公司 用于改进超高速缓存性能的输入/输出页面删除确定
CN101617298A (zh) * 2004-06-08 2009-12-30 飞思卡尔半导体公司 用于dma、任务终止和同步操作的缓存一致保持
WO2017117734A1 (zh) * 2016-01-06 2017-07-13 华为技术有限公司 一种缓存管理方法、缓存控制器以及计算机系统
CN110209601A (zh) * 2018-02-28 2019-09-06 畅想科技有限公司 存储器接口
US20190317895A1 (en) * 2018-04-11 2019-10-17 MemRay Corporation Memory controlling device and memory system including the same
EP3588306A1 (en) * 2018-06-27 2020-01-01 INTEL Corporation Hardware-assisted paging mechanisms

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6711652B2 (en) * 2001-06-21 2004-03-23 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system that provides precise notification of remote deallocation of modified data
US20180143903A1 (en) * 2016-11-22 2018-05-24 Mediatek Inc. Hardware assisted cache flushing mechanism


Also Published As

Publication number Publication date
EP4156565A4 (en) 2023-11-08
EP4156565A1 (en) 2023-03-29
CN114157621A (zh) 2022-03-08
US20230205691A1 (en) 2023-06-29


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021863257

Country of ref document: EP

Effective date: 20221222

NENP Non-entry into the national phase

Ref country code: DE