WO2023093380A1 - Method for maintaining a translation lookaside buffer and related device - Google Patents

Method for maintaining a translation lookaside buffer and related device (一种转址旁路缓存的维护方法及相关设备)

Info

Publication number
WO2023093380A1
Authority
WO
WIPO (PCT)
Prior art keywords
physical
cpu
range
physical cpu
cpus
Prior art date
Application number
PCT/CN2022/126013
Other languages
English (en)
French (fr)
Inventor
万波
蒋毅飞
范恒龙
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023093380A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0888: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, using selective caching, e.g. bypass
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures
    • G06F 12/1027: Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the field of computer technology, and in particular to a method for maintaining a translation lookaside buffer (TLB) and a related device.
  • Cloud computing services are usually provided by large numbers of servers, and as demand for cloud services continues to rise, the number of computing cores on a single server can exceed 200.
  • Software programs usually run in parallel on multiple computing cores; examples include MapReduce (a programming model), software transactional memory, and concurrent garbage collection. Each computing core caches virtual-to-physical address translations in its translation lookaside buffer (TLB).
  • Different threads of the same process can run on different computing cores. When one of the threads modifies the page table, it must not only update the TLB information of its own computing core but also notify the other computing cores to update their corresponding TLB information, so that the TLB information of the different computing cores stays consistent.
  • Embodiments of the present application provide a method for maintaining a translation lookaside buffer and a related device, which can greatly reduce the latency of TLB consistency maintenance.
  • The TLB maintenance method may be executed by an electronic device or the like.
  • An electronic device is a device that can be abstracted as a computer system; an electronic device that supports the TLB maintenance function may also be called a TLB maintenance device.
  • The TLB maintenance device may be a complete electronic device, for example a smart wearable device, smartphone, tablet computer, notebook computer, desktop computer, in-vehicle computer, or server; it may be a system composed of multiple complete devices; or it may be a part of an electronic device, for example a chip related to the TLB maintenance function, such as a system on a chip (SoC). The embodiments of this application do not specifically limit this.
  • An embodiment of the present application provides a method for maintaining a translation lookaside buffer, applied to an electronic device that includes multiple physical central processing units (CPUs). A first process runs on the electronic device and currently includes M first threads; the M first threads currently run on M physical CPUs among the multiple physical CPUs; M is an integer greater than or equal to 1. The method includes: determining the physical CPU range S1 currently corresponding to the first process, where S1 includes the M physical CPUs currently running the first threads of the first process; and, based on the page table information maintained by the first process, updating the TLB information maintained by each of the physical CPUs in range S1.
  • With the method provided in the first aspect, in a non-virtualization scenario, the physical CPU range currently corresponding to any process (for example, range S1 for the first process) is first determined from which physical CPUs the process's threads (for example, the first threads) are currently running on. Then, when the page table information maintained by the process is modified, the TLB information maintained by every physical CPU within that range can be updated synchronously according to the modified page table information, preventing TLB access errors while those physical CPUs run the process's threads.
  • By contrast, in existing approaches TLB refresh requests can only be sent indiscriminately to all physical CPUs in the device or system, and the sender then waits for all physical CPUs to complete the TLB refresh (that is, the TLB information update), so maintaining consistency of the entire TLB takes a long time and incurs a large cost.
  • The embodiments of the present application instead maintain TLB consistency within a well-defined, smaller range, greatly reducing maintenance overhead and latency and effectively improving the memory-access performance of the entire system. It should be understood that, because each process maintains its own page table information, TLB consistency maintenance only needs to be performed for the physical CPUs currently running that process. In this way, on the premise of effectively avoiding TLB access errors, the embodiments greatly reduce the range of physical CPUs involved in each round of TLB consistency maintenance, achieving efficient, convenient, and accurate maintenance.
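The per-process range idea can be sketched as a small simulation. This is a hypothetical illustration, not the patented implementation: the class and method names (`PhysicalCpu`, `Process`, `modify_page_table`) are invented, and TLB state is reduced to a version counter.

```python
# Hypothetical sketch: each process tracks the set of physical CPUs
# currently running its threads (its "physical CPU range"), so a page
# table change triggers TLB updates only within that range.

class PhysicalCpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb_version = 0          # stand-in for cached TLB state

    def refresh_tlb(self, page_table_version):
        # Real hardware would invalidate or refill TLB entries;
        # here we just record which page-table version the TLB reflects.
        self.tlb_version = page_table_version

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.page_table_version = 0
        self.cpu_range = set()        # physical CPU range, e.g. S1

    def modify_page_table(self, all_cpus):
        """A thread modified the page table: refresh TLBs only in cpu_range."""
        self.page_table_version += 1
        targets = [c for c in all_cpus if c.cpu_id in self.cpu_range]
        for cpu in targets:
            cpu.refresh_tlb(self.page_table_version)
        return len(targets)           # CPUs touched, not all CPUs

cpus = [PhysicalCpu(i) for i in range(8)]
p1 = Process(pid=100)
p1.cpu_range = {0, 1, 2}              # the first process runs on CPUs 0..2
touched = p1.modify_page_table(cpus)
print(touched)                        # 3: only the range is refreshed
```

With eight CPUs in the system, only the three in the process's range are refreshed; the other five are never interrupted, which is the core of the latency saving described above.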
  • In a possible implementation, the M physical CPUs include a first physical CPU and M-1 second physical CPUs. Before the first thread runs on the first physical CPU, a second thread runs on the first physical CPU, and M-1 of the M first threads run respectively on the M-1 second physical CPUs. The method further includes: after the thread on the first physical CPU is switched from the second thread to a first thread of the first process, judging whether the second thread belongs to the first process; if the second thread does not belong to the first process, updating the physical CPU range S2 corresponding to the first process to obtain the current physical CPU range S1, where S2 includes the M-1 second physical CPUs that were running the first threads of the first process before the update.
  • That is, the physical CPU range needs to be updated in real time. When the thread running on any physical CPU (such as the first physical CPU) is switched, and the thread after the switch (such as the first thread) belongs to a different process than the thread before the switch (such as the second thread), a process switch has occurred on that physical CPU: one process is no longer running on the first physical CPU and another process is about to run on it. The physical CPU range corresponding to the switched-in process can then be updated to add the current physical CPU (such as the first physical CPU), yielding a new physical CPU range (such as range S1) that includes it. This ensures that subsequent TLB consistency maintenance is performed within an accurate physical CPU range, guaranteeing its accuracy and effectiveness.
  • In a possible implementation, the second thread belongs to a second process. Before the first thread runs on the first physical CPU, the N second threads of the second process run respectively on the first physical CPU and on N-1 third physical CPUs among the multiple physical CPUs; N is an integer greater than or equal to 1. The method further includes: after the thread on the first physical CPU is switched from the second thread to a first thread of the first process, updating the physical CPU range S3 corresponding to the second process to obtain the physical CPU range S4, where S3 includes the first physical CPU and the N-1 third physical CPUs that were running the second threads of the second process before the update, and S4 includes the N-1 third physical CPUs currently running the second threads of the second process.
  • Correspondingly, the physical CPU range corresponding to the process running before the switch (for example, the second process) also needs to be updated in real time: the first physical CPU is deleted from that range. In this way, the physical CPU range currently corresponding to any process is kept equal to the set of physical CPUs currently running that process. This ensures that whenever the page table information maintained by any process is modified, efficient and convenient TLB consistency maintenance can be performed within an accurate physical CPU range, improving maintenance efficiency.
  • In a possible implementation, the method further includes: based on the updates to the physical CPU ranges corresponding to the first process and the second process, updating the physical CPU range corresponding to the first physical CPU from range S3 to range S1; updating the physical CPU range corresponding to each of the M-1 second physical CPUs from range S2 to range S1; and updating the physical CPU range corresponding to each of the N-1 third physical CPUs from range S3 to range S4.
  • That is, the physical CPU range corresponding to the process currently running on a physical CPU can also serve as the physical CPU range corresponding to that physical CPU, providing an accurate target range when the physical CPU subsequently sends TLB refresh requests, and thereby enabling efficient and convenient TLB consistency maintenance.
  • In a possible implementation, physical CPU range information is stored in the electronic device; the physical CPU range information currently includes at least the physical CPU range S1 corresponding to each of the M physical CPUs and the physical CPU range S4 corresponding to each of the N-1 third physical CPUs.
  • That is, the current physical CPU range corresponding to each physical CPU may also be stored in the electronic device, constituting physical CPU range information that is globally visible to software and hardware. This range information provides an accurate target range for any physical CPU that subsequently sends a TLB refresh request, enabling efficient and convenient TLB consistency maintenance. Storage mechanisms including but not limited to register banks, memory, and caches may be used to hold the globally visible physical CPU range information.
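A minimal sketch of such a globally visible table, under the assumption that it can be modeled as a per-CPU mapping (the names `cpu_range_info` and `publish_range` are invented for illustration):

```python
# Hypothetical sketch: alongside the per-process ranges, keep a globally
# visible table mapping each physical CPU to the range of the process it
# is currently running (in hardware: register bank, memory, or cache).

cpu_range_info = {}                   # cpu_id -> frozenset of cpu_ids

def publish_range(process_range):
    """Make a process's current range visible from every CPU in it."""
    snapshot = frozenset(process_range)
    for cpu_id in snapshot:
        cpu_range_info[cpu_id] = snapshot

publish_range({0, 1, 2})              # e.g. physical CPU range S1
publish_range({3, 4})                 # e.g. physical CPU range S4
print(cpu_range_info[1])              # frozenset({0, 1, 2})
```

Any CPU (or the interconnect) can then answer "which CPUs must this refresh reach?" with a single lookup keyed by the requesting CPU's id.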
  • In a possible implementation, updating the TLB information maintained by all physical CPUs in the physical CPU range S1 based on the page table information maintained by the first process includes: after the page table information maintained by the first process is modified by the first thread currently running on a target physical CPU among the M physical CPUs, updating the TLB information maintained by the target physical CPU based on the modified page table information; and sending, by the target physical CPU, a TLB refresh request to the remaining physical CPUs in range S1. The TLB refresh request is used for the remaining physical CPUs in the range to synchronously update the TLB information they maintain, so that the TLB information maintained by all physical CPUs in range S1 is consistent.
  • That is, when a thread running on a physical CPU (such as the first thread on the target physical CPU) modifies the corresponding page table information, the target physical CPU can update the TLB information it maintains itself, and send TLB refresh requests to the other physical CPUs within the current physical CPU range, so that those physical CPUs also update the TLB information they maintain according to the modified page table information. This ensures that the TLB information maintained by all physical CPUs in the range is consistent, avoiding TLB access errors when those physical CPUs run the threads of the process.
  • In this way, the embodiments of the present application can quickly and conveniently complete TLB consistency maintenance within a small range, greatly reducing TLB maintenance delay.
  • The embodiments of the present application do not specifically limit the order in which the target physical CPU updates the TLB information it maintains and sends TLB refresh requests to the other physical CPUs in the range.
  • In a possible implementation, sending a TLB refresh request to the other physical CPUs in the physical CPU range S1 through the target physical CPU includes: sending, by the target physical CPU, a TLB refresh request to an inter-core interconnection network, where the inter-core interconnection network is a bus or a network on chip (NoC); and receiving the TLB refresh request through the inter-core interconnection network, determining the physical CPU range S1 corresponding to the target physical CPU, and sending the TLB refresh request to the other physical CPUs in that range.
  • That is, the target physical CPU can send a TLB refresh request to the inter-core interconnection network (that is, the communication medium, such as a bus or an on-chip network), and the inter-core interconnection network then looks up, in the physical CPU range information, the physical CPU range currently corresponding to the target physical CPU (for example, range S1), thereby identifying the exact range within which TLB consistency must be maintained. The inter-core interconnection network can then send the TLB refresh request to the other physical CPUs in that range, excluding the target physical CPU, so that they synchronously update the TLB information they maintain. In this way, on the premise that TLB memory accesses do not go wrong, TLB consistency is maintained within a necessary and small range, greatly reducing the TLB consistency-maintenance delay.
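The interconnect-side routing can be sketched as a lookup plus a fan-out. This is a software model of hardware behavior; `route_tlb_refresh` and the `deliver` callback are illustrative names, not the patent's interfaces.

```python
# Hypothetical sketch: the inter-core interconnect (bus or NoC) receives
# a TLB refresh request, looks up the sender's current physical CPU range
# in the globally visible range information, and forwards the request to
# every other CPU in that range.

def route_tlb_refresh(sender_id, cpu_range_info, deliver):
    """deliver(cpu_id) models placing the request on a CPU's input queue."""
    cpu_range = cpu_range_info[sender_id]          # e.g. range S1
    targets = [c for c in cpu_range if c != sender_id]
    for cpu_id in targets:
        deliver(cpu_id)
    return targets

delivered = []
info = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 1, 2}, 3: {3, 4}}
route_tlb_refresh(sender_id=0, cpu_range_info=info,
                  deliver=delivered.append)
print(sorted(delivered))      # [1, 2]; CPUs 3 and 4 are never disturbed
```

Only the two peers in the sender's range receive the request; CPUs running other processes are untouched, which is exactly the selectivity claimed above.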
  • In a possible implementation, sending a TLB refresh request to the other physical CPUs in the physical CPU range S1 through the target physical CPU includes: obtaining, by the target physical CPU, the physical CPU range S1 corresponding to itself, and sending a TLB refresh request carrying indication information related to range S1 to the inter-core interconnection network, where the inter-core interconnection network is a bus or a network on chip (NoC); and receiving the TLB refresh request through the inter-core interconnection network, determining the physical CPU range S1 according to the TLB refresh request, and sending the TLB refresh request to the other physical CPUs in range S1.
  • That is, the target physical CPU can also look up its currently corresponding physical CPU range (for example, range S1) in the physical CPU range information itself, identifying the current range within which TLB consistency must be maintained. The target physical CPU then sends a TLB refresh request to the inter-core interconnection network, and the inter-core interconnection network, according to the indication information carried in the request, forwards the TLB refresh request to the other physical CPUs in the corresponding range, excluding the target physical CPU, so that they synchronously update the TLB information they maintain. In this way, on the premise that TLB memory accesses do not go wrong, TLB consistency is maintained within a necessary and small range, greatly reducing the TLB consistency-maintenance delay.
  • In a possible implementation, the method further includes: receiving feedback signals sent by each of the other M-1 physical CPUs in the physical CPU range S1, and determining, based on the feedback signals, that the TLB information maintained by all physical CPUs in range S1 is consistent.
  • That is, any physical CPU that executes the TLB refresh instruction (for example, the target physical CPU) can continue executing subsequent instructions only after receiving feedback signals from the other physical CPUs in the current physical CPU range; it must remain blocked until TLB consistency maintenance completes, so that subsequent TLB memory accesses do not go wrong. Because the embodiments of the present application maintain TLB consistency efficiently and conveniently within a necessary and small range while guaranteeing error-free TLB memory access, the blocking time of the target physical CPU is greatly shortened and the memory-access performance of the entire system is improved.
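The blocking-until-acknowledged step can be sketched with a queue of feedback signals. This is a toy model (the function name and queue are assumptions); real hardware would use interconnect acknowledgements rather than a software queue.

```python
# Hypothetical sketch: the CPU that issued the TLB refresh blocks until a
# feedback signal has arrived from every other CPU in the range, and only
# then retires subsequent instructions.

from queue import Queue

def wait_for_feedback(expected_cpus, feedback_queue):
    """Block until each CPU in expected_cpus has acknowledged the refresh."""
    pending = set(expected_cpus)
    while pending:
        cpu_id = feedback_queue.get()   # one feedback signal per remote CPU
        pending.discard(cpu_id)
    return True                         # safe to run subsequent accesses

q = Queue()
for cpu_id in (1, 2):                   # the M-1 = 2 remote CPUs respond
    q.put(cpu_id)
ok = wait_for_feedback({1, 2}, q)
print(ok)                               # True once both acks are in
```

The smaller the range, the fewer acknowledgements are awaited, so the issuing CPU's blocking time shrinks with the range size.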
  • In a possible implementation, the TLB refresh request carries corresponding TLB refresh information; the TLB refresh information includes one or more of the process identifier corresponding to the first process, a virtual address, and a virtual address range. The TLB refresh request is specifically used for the other physical CPUs in the physical CPU range S1 to update, in hardware and based on the TLB refresh information, the TLB information they each maintain while continuing to run their respective threads.
  • That is, the TLB refresh request also carries corresponding TLB refresh information, which may include but is not limited to the process identifier corresponding to the current process (such as the first process) and the virtual address or virtual address range corresponding to the modified page table information.
  • In this way, the physical CPUs in the current physical CPU range can quickly and accurately complete the TLB refresh according to the TLB refresh information carried in the request, ensuring that the TLB information maintained by each physical CPU in the range is consistent.
  • Moreover, the TLB refresh process involved in the embodiments of the present application can be completed directly by hardware without interrupting the software (such as the first thread) running on each physical CPU, further improving the efficiency and convenience of TLB consistency maintenance.
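Selective invalidation driven by the refresh information can be sketched as follows. The TLB is modeled as a dictionary keyed by (process id, virtual page); this layout and the function name are illustrative assumptions.

```python
# Hypothetical sketch: a refresh request carries the process identifier
# and the virtual address range whose page table entries changed; each
# receiving CPU drops only the matching TLB entries, without interrupting
# the thread it is running.

def apply_tlb_refresh(tlb, pid, vaddr_range):
    """tlb maps (pid, virtual_page) -> physical_page; drop stale entries."""
    lo, hi = vaddr_range
    stale = [key for key in tlb
             if key[0] == pid and lo <= key[1] < hi]
    for key in stale:
        del tlb[key]
    return len(stale)

tlb = {(100, 0x1000): 0xA000, (100, 0x2000): 0xB000, (200, 0x1000): 0xC000}
removed = apply_tlb_refresh(tlb, pid=100, vaddr_range=(0x0, 0x1800))
print(removed)                     # 1: only process 100's page at 0x1000
print((200, 0x1000) in tlb)        # True: other processes are untouched
```

Entries belonging to other processes, and entries of the same process outside the given address range, survive the refresh, so each CPU invalidates the minimum necessary state.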
  • An embodiment of the present application provides a method for maintaining a translation lookaside buffer, applied to an electronic device that includes multiple physical central processing units (CPUs). A first virtual machine runs on the electronic device and currently includes M first virtual CPUs; the M first virtual CPUs currently run on M physical CPUs among the multiple physical CPUs; M is an integer greater than or equal to 1. The method includes: determining the physical CPU range S1 currently corresponding to the first virtual machine, where S1 includes the M physical CPUs currently running the first virtual CPUs of the first virtual machine; and, based on the page table information maintained by the first virtual machine, updating the TLB information maintained by each of the physical CPUs in range S1.
  • With the method provided in the second aspect, in a virtualization scenario, the physical CPU range corresponding to any virtual machine (for example, range S1 for the first virtual machine) is first determined from which physical CPUs the virtual machine's virtual CPUs (for example, the first virtual CPUs) are currently running on. Then, when the page table information maintained by the virtual machine is modified, the TLB information maintained by every physical CPU within that range can be updated synchronously according to the modified page table information, preventing TLB access errors while those physical CPUs run the virtual machine's virtual CPUs.
  • By contrast, in existing approaches TLB refresh requests can only be sent indiscriminately to all physical CPUs in the device or system, and the sender then waits for all physical CPUs to complete the TLB refresh (or TLB information update), so maintaining consistency of the entire TLB takes a long time and incurs a large overhead.
  • The embodiments of the present application instead maintain TLB consistency within a well-defined, smaller range, greatly reducing maintenance overhead and latency and effectively improving the memory-access performance of the entire device or system. It should be understood that, because each virtual machine maintains its own page table information, TLB consistency maintenance only needs to be performed for the physical CPUs currently running that virtual machine. In this way, on the premise of effectively avoiding TLB access errors, the embodiments greatly reduce the range of physical CPUs involved in each round of TLB consistency maintenance, achieving efficient, convenient, and accurate maintenance.
  • In summary, the embodiments of the present application maintain the physical CPU range corresponding to each process or virtual machine, so that efficient and convenient TLB consistency maintenance is performed within a necessary and smaller physical CPU range in both virtualization and non-virtualization scenarios.
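In the virtualization case the range is derived from where the virtual CPUs are scheduled. A minimal sketch, with an invented `vm_cpu_range` helper and a plain vCPU-to-pCPU mapping as the assumed scheduler state:

```python
# Hypothetical sketch: a virtual machine's physical CPU range is the set
# of physical CPUs its virtual CPUs are currently scheduled on, and it is
# maintained the same way as the per-process range.

def vm_cpu_range(vcpu_to_pcpu):
    """vcpu_to_pcpu maps each of the VM's vCPU ids to a physical CPU id."""
    return set(vcpu_to_pcpu.values())

vm1 = {0: 2, 1: 5, 2: 7}           # first VM: 3 vCPUs on pCPUs 2, 5, 7
print(sorted(vm_cpu_range(vm1)))   # [2, 5, 7], the range S1 for this VM
```

When the hypervisor migrates a vCPU to a different physical CPU, recomputing (or incrementally updating) this set plays the same role as the thread-switch bookkeeping in the process case.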
  • In a possible implementation, the M physical CPUs include a first physical CPU and M-1 second physical CPUs. Before the first virtual CPU runs on the first physical CPU, a second virtual CPU runs on the first physical CPU, and M-1 of the M first virtual CPUs run respectively on the M-1 second physical CPUs. The method further includes: after the virtual CPU on the first physical CPU is switched from the second virtual CPU to a first virtual CPU of the first virtual machine, judging whether the second virtual CPU belongs to the first virtual machine; if the second virtual CPU does not belong to the first virtual machine, updating the physical CPU range S2 corresponding to the first virtual machine to obtain the current physical CPU range S1, where S2 includes the M-1 second physical CPUs that were running the first virtual CPUs of the first virtual machine before the update.
  • In a possible implementation, the second virtual CPU belongs to a second virtual machine. Before the first virtual CPU runs on the first physical CPU, the N second virtual CPUs of the second virtual machine run respectively on the first physical CPU and on N-1 third physical CPUs among the multiple physical CPUs; N is an integer greater than or equal to 1. The method further includes: after the virtual CPU on the first physical CPU is switched from the second virtual CPU to a first virtual CPU of the first virtual machine, updating the physical CPU range S3 corresponding to the second virtual machine to obtain the physical CPU range S4, where S3 includes the first physical CPU and the N-1 third physical CPUs that were running the second virtual CPUs of the second virtual machine before the update, and S4 includes the N-1 third physical CPUs currently running the second virtual CPUs of the second virtual machine.
  • In a possible implementation, the method further includes: based on the updates to the physical CPU ranges corresponding to the first virtual machine and the second virtual machine, updating the physical CPU range corresponding to the first physical CPU from range S3 to range S1; updating the physical CPU range corresponding to each of the M-1 second physical CPUs from range S2 to range S1; and updating the physical CPU range corresponding to each of the N-1 third physical CPUs from range S3 to range S4.
  • In a possible implementation, physical CPU range information is stored in the electronic device; the physical CPU range information currently includes at least the physical CPU range S1 corresponding to each of the M physical CPUs and the physical CPU range S4 corresponding to each of the N-1 third physical CPUs.
  • In a possible implementation, updating the TLB information maintained by all physical CPUs in the physical CPU range S1 based on the page table information maintained by the first virtual machine includes: after the page table information maintained by the first virtual machine is modified by the first virtual CPU currently running on a target physical CPU among the M physical CPUs, updating the TLB information maintained by the target physical CPU based on the modified page table information; and sending, by the target physical CPU, a TLB refresh request to the remaining physical CPUs in range S1. The TLB refresh request is used to synchronously update the TLB information maintained by the remaining physical CPUs in the range, so that the TLB information maintained by all physical CPUs in range S1 is consistent.
  • In a possible implementation, sending a TLB refresh request to the other physical CPUs in the physical CPU range S1 through the target physical CPU includes: sending, by the target physical CPU, a TLB refresh request to an inter-core interconnection network, where the inter-core interconnection network is a bus or a network on chip (NoC); and receiving the TLB refresh request through the inter-core interconnection network, determining the physical CPU range S1 corresponding to the target physical CPU, and sending the TLB refresh request to the other physical CPUs in that range.
  • In a possible implementation, sending a TLB refresh request to the other physical CPUs in the physical CPU range S1 through the target physical CPU includes: obtaining, by the target physical CPU, the physical CPU range S1 corresponding to itself, and sending a TLB refresh request carrying indication information related to range S1 to the inter-core interconnection network, where the inter-core interconnection network is a bus or a network on chip (NoC); and receiving the TLB refresh request through the inter-core interconnection network, determining the physical CPU range S1 according to the TLB refresh request, and sending the TLB refresh request to the other physical CPUs in range S1.
  • In a possible implementation, the method further includes: receiving feedback signals sent by each of the other M-1 physical CPUs in the physical CPU range S1, determining, based on the feedback signals, that the TLB information maintained by all physical CPUs in range S1 is consistent, and then executing subsequent instructions.
  • In a possible implementation, the TLB refresh request carries corresponding TLB refresh information; the TLB refresh information includes one or more of the virtual machine identifier corresponding to the first virtual machine, the in-virtual-machine virtual address corresponding to the modified page table information, and a virtual address range. The TLB refresh request is specifically used for the other physical CPUs in the physical CPU range S1 to update, in hardware and based on the TLB refresh information, the TLB information they each maintain while keeping their respective virtual CPUs running.
  • An embodiment of the present application provides an electronic device that includes multiple physical CPUs. A first process runs on the electronic device and currently includes M first threads; the M first threads currently run on M physical CPUs among the multiple physical CPUs; M is an integer greater than 1. The M physical CPUs are used to determine the physical CPU range S1 corresponding to the first process, where S1 includes the M physical CPUs, and, when the page table information maintained by the first process is modified, to synchronously update, based on the modified page table information, the translation lookaside buffer (TLB) information maintained by each of the physical CPUs in range S1.
  • An embodiment of the present application provides an electronic device that includes multiple physical CPUs. A first virtual machine runs on the electronic device and currently includes M first virtual CPUs; the M first virtual CPUs currently run on M physical CPUs among the multiple physical CPUs; M is an integer greater than 1. The M physical CPUs are used to determine the physical CPU range S1 corresponding to the first virtual machine, where S1 includes the M physical CPUs, and, when the page table information maintained by the first virtual machine is modified, to synchronously update, based on the modified page table information, the TLB information maintained by each of the physical CPUs in range S1.
• the embodiment of the present application provides an electronic device, the electronic device includes a processor, and the processor is configured to support the electronic device in implementing the corresponding functions of any one of the translation lookaside buffer maintenance methods provided in the first aspect or the second aspect.
• the electronic device may also include a memory, which is coupled with the processor and stores the necessary program instructions and data of the electronic device.
  • the electronic device may also include a communication interface for the electronic device to communicate with other devices or a communication network.
• the embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the flow of any one of the translation lookaside buffer maintenance methods of the above-mentioned first aspect or second aspect is implemented.
• the embodiment of the present application provides a computer program, the computer program includes instructions, and when the computer program is executed by a computer, the computer can execute the flow of any one of the translation lookaside buffer maintenance methods provided in the first aspect or the second aspect.
• the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the processor is used to call and run instructions from the communication interface, and when the processor executes the instructions, the chip executes the flow of any one of the translation lookaside buffer maintenance methods provided in the first aspect or the second aspect.
• the embodiment of the present application provides a chip system; the chip system includes the electronic device described in any one of the above-mentioned third aspect or fourth aspect, and is used to implement the functions involved in the above-mentioned first aspect or second aspect.
• the system-on-a-chip further includes a memory, and the memory is used for storing the program instructions and data necessary for the maintenance method of the translation lookaside buffer.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
• FIG. 3 is a schematic flowchart of a method for maintaining a translation lookaside buffer provided by an embodiment of the present application.
  • FIG. 4a is a schematic diagram of an update flow of a physical CPU range provided by an embodiment of the present application.
  • FIG. 4b is a schematic diagram of physical CPU range information provided by an embodiment of the present application.
  • FIG. 4c is a schematic diagram of an update flow of physical CPU range information provided by an embodiment of the present application.
  • Fig. 5a is a schematic diagram of another updating process of a physical CPU range provided by an embodiment of the present application.
  • Fig. 5b is a schematic diagram of another kind of physical CPU range information provided by the embodiment of the present application.
  • FIG. 5c is a schematic diagram of another update flow of physical CPU range information provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram of a TLB refresh process provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another TLB refresh process provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of another TLB refresh process provided by the embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
• "At least one (item)" means one or more, and "multiple" means two or more.
• "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
• The character "/" generally indicates that the contextual objects are in an "or" relationship.
• "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
• For example, at least one item (piece) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a processor and a processor may be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
• these components may communicate by way of local and/or remote processes, for example, in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet with other systems by way of the signal).
• CPU: central processing unit. The number of physical CPUs usually refers to the number of CPUs actually configured on the computer.
  • CPUs can generally be divided into single-core CPUs and multi-core CPUs, wherein a single-core CPU generally only includes a single CPU core (or called a physical core, that is, the above-mentioned computing core), and a multi-core CPU may include multiple CPU cores.
  • the physical CPU involved in the embodiment of the present application may be a single-core CPU, may also be a multi-core CPU, and may also be a CPU core in a multi-core CPU, and no further explanation will be given later.
  • multiple physical CPUs may run multiple threads in parallel, and the multiple threads may belong to different processes.
  • multiple physical CPUs may run multiple virtual CPUs (virtual central process unit, vCPU) in parallel, and the multiple virtual CPUs may belong to different virtual machines.
• vCPU: virtual central processing unit.
  • a process is a running activity of a program with certain independent functions on a certain data set, which is equivalent to the execution process of the program.
  • a process can often include multiple threads.
  • Virtual machine refers to a complete computer system that is simulated by software and has complete hardware system functions and runs in a completely isolated environment.
  • a virtual machine can often include multiple virtual CPUs.
  • the page table is a special data structure used to store the correspondence between logical (virtual) addresses and physical addresses.
  • Each process (or virtual machine) has its own page table, and multiple threads in the process (or multiple virtual CPUs in the virtual machine) share this page table.
• when threads (or virtual CPUs) need to fetch data, they can query the page table to obtain the physical address and then fetch the data.
• the translation lookaside buffer (TLB) is a small, virtually addressed cache in which each row holds a block consisting of a single page table entry (PTE).
• the TLB mediates the interaction between virtual addresses and physical addresses and provides a cache area for finding physical addresses, which can effectively reduce the time spent finding a physical address. Without a TLB, each data fetch requires two memory accesses: looking up the page table to obtain the physical address, and then fetching the data. Simply put, the TLB is the cache of the page table: it stores the page table entries most likely to be accessed at present, and its content is a copy of some page table entries.
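To make the role of the TLB concrete, the following is a minimal software model of a TLB as a page-table cache. This is a hedged sketch for illustration only; names such as `tlb_lookup` and the direct-mapped layout are invented here and are not part of the patent. A hit avoids the page-table walk, a miss requires it, and an invalidation models a TLB refresh.

```c
#include <stdint.h>

/* Minimal, illustrative software model of a TLB: a small, direct-mapped
 * cache of page-table entries (PTEs), indexed by virtual page number. */
#define TLB_ENTRIES 16
#define PAGE_SHIFT  12

typedef struct {
    uint64_t vpn;   /* virtual page number cached in this slot */
    uint64_t pfn;   /* physical frame number from the page table */
    int      valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Look up a virtual address; returns 1 on a hit and fills *paddr. */
int tlb_lookup(uint64_t vaddr, uint64_t *paddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        *paddr = (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
        return 1;               /* hit: no page-table walk needed */
    }
    return 0;                   /* miss: caller must walk the page table */
}

/* Install a translation after a page-table walk. */
void tlb_fill(uint64_t vaddr, uint64_t pfn) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn = vpn; e->pfn = pfn; e->valid = 1;
}

/* Invalidate (flush) a single entry, as a TLB refresh would. */
void tlb_invalidate(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) e->valid = 0;
}
```

On a hit, only the data fetch itself remains; on a miss, the page table must be walked first, which is exactly the two-access cost described above.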
  • each process maintains its own page table, and multiple threads in the process (or multiple virtual CPUs in the virtual machine) share this page table.
• when a thread in a process running on the local physical CPU modifies the page table information, the local physical CPU needs to update its own TLB information accordingly and notify the other physical CPUs to update their TLB information synchronously, so as to maintain the consistency of the TLB information and avoid TLB access errors.
• TLB shootdown: a solution based on inter-processor interrupts (IPIs) to maintain TLB consistency.
• TLB broadcast: a solution based on hardware broadcast instructions to maintain TLB consistency.
  • the physical CPUs where multiple threads sharing page tables in a process reside maintain and update their respective TLB information through software.
  • the physical CPU where the vCPU resides also maintains and updates its TLB information through software.
• when any thread in a process running on a physical CPU (such as the local physical CPU), or any vCPU in a virtual machine, modifies the page table information shared between multiple cores, the other physical CPUs can be notified by an inter-core interrupt to refresh (or invalidate) the corresponding TLB entries, so that the TLB information on each physical CPU remains the latest valid information.
  • the overhead of software maintenance is large, causing a large delay.
• the CPU that generates the IPI needs to remain blocked from sending the interrupt notifying the other physical CPUs until the other physical CPUs respond and update their respective TLB information.
• in a virtualization scenario, in order to maintain TLB consistency, the virtual machine needs to trap to the hypervisor (virtual machine monitor) to send an IPI notifying the remote virtual CPU to refresh its TLB. This operation causes the virtual machine to repeatedly trap in and out, so the software path is longer and the cost of consistency maintenance is greater.
• in the TLB broadcast solution, in a non-virtualized scenario, when any thread in a process running on a physical CPU (or any vCPU in a virtual machine) modifies the page table information shared between multiple cores, it can directly notify, through a hardware broadcast, all other physical CPUs in the stand-alone system (such as a server or other device) to refresh their TLB information.
• the physical CPU that executes the hardware broadcast instruction must remain blocked until it receives signals indicating that TLB flushing has been completed by all other physical CPUs in the system, before continuing to execute subsequent instructions. The same is true for virtualization scenarios, and will not be described in detail here.
• since the TLB broadcast solution broadcasts through hardware, it does not interrupt the services being executed on other physical CPUs, incurs no software process overhead, and also avoids virtual machine traps in virtualization scenarios.
• however, the indiscriminate notification of physical CPUs in the broadcast mechanism causes the bus to be occupied for a long time, resulting in a large amount of bus contention, which increases the TLB consistency maintenance delay and lacks good scalability.
• the actual technical problems to be solved in this application include the following aspects: updating, through software, the physical CPU range corresponding to the currently running process or virtual machine; and, when the page table information is modified, maintaining TLB consistency within that physical CPU range through hardware. This greatly reduces the time spent on TLB consistency maintenance, thereby improving the memory access performance and efficiency of the entire system.
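The benefit of restricting maintenance to a physical CPU range can be illustrated with a small counting sketch. This is only an illustration under the assumption of a 64-bit range bitmap; the names `flush_targets_broadcast` and `flush_targets_directed` are invented and do not come from the patent.

```c
#include <stdint.h>

/* Compare how many CPUs a TLB flush must touch under indiscriminate
 * broadcast versus a range-directed notification. */
#define NCPUS 64

/* Broadcast: every CPU except the initiator is notified. */
int flush_targets_broadcast(int initiator) {
    (void)initiator;
    return NCPUS - 1;
}

/* Directed: only the CPUs set in the process's range bitmap,
 * excluding the initiator itself, are notified. */
int flush_targets_directed(uint64_t cpu_range, int initiator) {
    cpu_range &= ~(1ULL << initiator);   /* initiator flushes locally */
    int n = 0;
    for (uint64_t b = cpu_range; b; b &= b - 1) n++;  /* popcount */
    return n;
}
```

For a process whose threads occupy only two CPUs, the directed scheme notifies one remote CPU where a broadcast would touch all sixty-three others; this is the delay and bus-contention reduction the passage above describes.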
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the technical solutions of the embodiments of the present application may be specifically implemented in the system architecture shown in FIG. 1 or a similar system architecture.
• the system architecture may include multiple physical CPUs and a physical CPU range information module, where the multiple physical CPUs may specifically include CPU-1, CPU-2, and CPU-3.
• CPU-1, CPU-2, and CPU-3 are each provided with a directional-CPU TLB refresh module.
• a physical CPU range maintenance module is also deployed in the monitor.
• further, as shown in FIG. 1, the system architecture also includes an inter-core interconnection network, and CPU-1, CPU-2, and CPU-3 can communicate through the inter-core interconnection network.
  • the inter-core interconnection network may include but not limited to a bus or a network on a chip (network on a chip, NoC) and other technical implementations.
• the physical CPU range maintenance module is used to maintain the physical CPU range corresponding to the process or virtual machine currently running on each physical CPU; when the process or virtual machine running on a physical CPU is switched, the corresponding physical CPU ranges of the processes or virtual machines are updated.
  • the physical CPU range maintenance module may be a software module, that is, in this embodiment of the present application, the physical CPU range corresponding to each process or virtual machine may be maintained by software.
  • the virtual machine monitor is responsible for maintaining the physical CPU range corresponding to the virtual machine, and the maintenance strategy is as follows:
  • the range of physical CPUs corresponding to virtual machine 1 before the switchover includes CPU-1 and CPU-2.
• after the switchover, virtual machine 1 is no longer running on CPU-1 (specifically, vCPU1 in virtual machine 1 is no longer running), so CPU-1 needs to be deleted from the original physical CPU range to update the current physical CPU range corresponding to virtual machine 1.
• the updated physical CPU range includes CPU-2.
• virtual machine 2 runs on CPU-3, so before the switchover the physical CPU range corresponding to virtual machine 2 includes CPU-3; after the switchover, virtual machine 2 starts running on CPU-1 (specifically, vCPU4 in virtual machine 2 starts running), so CPU-1 needs to be added to the original physical CPU range to update the current physical CPU range corresponding to virtual machine 2.
• the updated physical CPU range includes CPU-1 and CPU-3.
• in some cases, the physical CPU range corresponding to the virtual machine may not need to be updated, so as to reduce the update frequency of the physical CPU range and reduce software maintenance overhead.
• for example, vCPU1 currently running on CPU-1 belongs to virtual machine 1; if vCPU5 (not shown in the figure) in virtual machine 1 is switched online on CPU-1 at the next moment, that is, if the virtual machine running on CPU-1 is still virtual machine 1, the original physical CPU range corresponding to virtual machine 1 (including CPU-1 and CPU-2) is maintained.
• when a virtual machine switch occurs on a physical CPU, the physical CPU refreshes the TLB information related to the newly online virtual machine, so that the TLB information currently maintained on the physical CPU corresponds to the page table information maintained by the currently running virtual machine. It can be understood that, in the case described in (2), if the physical CPU range does not need to be updated, that is, no virtual machine switching has occurred on the physical CPU, there is correspondingly no need to refresh the TLB information on the physical CPU.
  • the kernel is responsible for maintaining the physical CPU range corresponding to the process, and the maintenance strategy is as follows:
• process 1 runs on CPU-1 and CPU-2, so the physical CPU range corresponding to process 1 before the switch includes CPU-1 and CPU-2; after the switch, CPU-1 no longer runs process 1 (specifically, the thread in process 1 is no longer running on it), so CPU-1 needs to be deleted from the original physical CPU range to update the current physical CPU range corresponding to process 1, and the updated physical CPU range includes CPU-2.
• process 2 runs on CPU-3, so the physical CPU range corresponding to process 2 before the switch includes CPU-3; after the switch, process 2 starts to run on CPU-1 (specifically, thread 4 in process 2 starts running), so CPU-1 needs to be added to the original physical CPU range to update the current physical CPU range corresponding to process 2.
• the updated physical CPU range includes CPU-1 and CPU-3.
• in some cases, the physical CPU range corresponding to the process may not need to be updated, so as to reduce the update frequency of the physical CPU range and reduce software maintenance overhead. For example, as shown in Figure 1, thread 1 currently running on CPU-1 belongs to process 1; if thread 5 (not shown in the figure) in process 1 is switched online on CPU-1 at the next moment, that is, if the process running on CPU-1 is still process 1, the original physical CPU range corresponding to process 1 (including CPU-1 and CPU-2) is maintained.
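The maintenance strategy above can be sketched as a scheduler hook. This is an illustrative sketch under the assumption that each process keeps its range as a 64-bit bitmap; `process_t` and `on_switch` are invented names, not the patent's concrete data structures.

```c
#include <stdint.h>

/* Each process owns a bitmap of the physical CPUs its threads occupy. */
typedef struct { uint64_t cpu_range; } process_t;

/* Called when the scheduler brings a thread of `next` online on `cpu`,
 * replacing a thread of `prev`. If both threads belong to the same
 * process, the ranges are left untouched to save maintenance overhead. */
void on_switch(process_t *prev, process_t *next, int cpu) {
    if (prev == next)
        return;                          /* same process: keep the range */
    prev->cpu_range &= ~(1ULL << cpu);   /* prev no longer runs on cpu */
    next->cpu_range |=  (1ULL << cpu);   /* next now runs on cpu */
}
```

This mirrors the Figure-1 example: switching CPU-1 from process 1 to process 2 removes CPU-1 from process 1's range and adds it to process 2's, while a same-process switch (thread 1 to thread 5) leaves both ranges untouched.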
• the kernel-mode thread does not belong to the scope of the above-mentioned process; therefore, switching between a process and a kernel-mode thread does not affect the physical CPU range of the process.
• the physical CPU range information module is used to store, for each physical CPU, a physical CPU range that is globally visible to both software and hardware (that is, the physical CPU range corresponding to the process or virtual machine currently running on each physical CPU).
• in a non-virtualization scenario, the physical CPU range information includes the physical CPU where each thread sharing the page table in each process resides; in a virtualization scenario, the physical CPU range information includes the physical CPU where each vCPU in the virtual machine resides.
• the physical CPU range information module can be accessed by original or newly added software modules (such as kernel software) and hardware modules, to update or obtain the physical CPU range currently corresponding to each physical CPU (or to the process or virtual machine running on each physical CPU).
  • the physical CPU range in the embodiment of the present application is used to indicate which physical CPUs the process or virtual machine is currently running on, and further, it also indicates the physical CPU range where the process or virtual machine needs to maintain TLB consistency.
  • the implementation manner of the physical CPU range includes but is not limited to a description manner such as a bitmap.
• the physical CPU range information module is globally visible, and the ways of making the physical CPU range information globally visible include but are not limited to technical means such as kernel address mapping.
• the physical CPU range information module allocates a corresponding storage space for each physical CPU, which is dedicated to storing the physical CPU range maintained for the process or virtual machine running on that CPU.
  • the physical CPU range information module may be composed of a register set, and may also be a part in the memory address space or a cache memory, etc., which is not specifically limited in this embodiment of the present application.
• the physical CPU range information module may be located in a public location in the device or system, or in a physical CPU (such as CPU-1, CPU-2, or CPU-3).
• the embodiment of the present application can also use different physical address spaces to record the physical CPU range information for the non-virtualization scenario and the virtualization scenario, so that there is no need to update the physical CPU range information stored on the CPU when the CPU switches between the virtualization mode and the non-virtualization mode.
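A possible shape for such a range information module, keeping separate per-CPU slots for the virtualized and non-virtualized modes as suggested above, is sketched below. The layout and the names `range_info_set`/`range_info_get` are assumptions for illustration, not the patent's concrete design.

```c
#include <stdint.h>

/* One slot per physical CPU, holding the range bitmap of the process or
 * virtual machine that CPU is currently running. Separate tables model
 * the "different physical address spaces" for the two modes. */
#define NCPUS 64

static uint64_t range_info_process[NCPUS]; /* non-virtualized mode */
static uint64_t range_info_vm[NCPUS];      /* virtualized mode */

void range_info_set(int cpu, uint64_t range, int virtualized) {
    (virtualized ? range_info_vm : range_info_process)[cpu] = range;
}

uint64_t range_info_get(int cpu, int virtualized) {
    return (virtualized ? range_info_vm : range_info_process)[cpu];
}
```

Because each mode has its own table, switching a CPU between virtualized and non-virtualized execution does not disturb the range recorded for the other mode.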
• the directional-CPU TLB refresh module is used to, when a thread running on the local physical CPU (such as thread 1 running on CPU-1) modifies the page table information maintained by the current process (such as process 1), update the TLB information maintained by the local physical CPU according to the modified page table information (for example, by refreshing or invalidating the corresponding TLB entries), and, according to the physical CPU range corresponding to the current process (for example, including CPU-1 and CPU-2), send TLB refresh requests to all other physical CPUs in that range (for example, CPU-2), which greatly reduces the range of physical CPUs whose TLBs need to be maintained by hardware.
• the scope of the directional-CPU TLB refresh module includes, but is not limited to, modules on the CPU side and the inter-core interconnection network connected to the CPU.
• for example, the inter-core interconnection network shown in Figure 1 can obtain the physical CPU range corresponding to the current physical CPU from the physical CPU range information module, so as to send a TLB refresh request to the specified physical CPUs (that is, all physical CPUs in the range except the current physical CPU), and receive feedback signals from the specified physical CPUs to confirm that their TLB refresh actions have ended, that is, to determine that TLB consistency maintenance is complete.
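The request/feedback handshake just described can be modeled in software as follows. This is a toy model only: real hardware performs the request and feedback in the interconnect, and all names here are invented for illustration.

```c
#include <stdint.h>

/* Toy model: the initiator sends a refresh request to every other CPU in
 * the range, each target "flushes" and acknowledges, and the initiator
 * waits until all targets have responded. */
#define NCPUS 8

static int tlb_dirty[NCPUS];   /* 1 = stale TLB entry present */

static void target_handle_refresh(int cpu, int *acks) {
    tlb_dirty[cpu] = 0;        /* target flushes its TLB ... */
    (*acks)++;                 /* ... and signals completion */
}

/* Returns the number of targets notified; conceptually blocks until
 * every target has acknowledged the refresh. */
int directed_tlb_refresh(uint64_t range, int initiator) {
    tlb_dirty[initiator] = 0;              /* local update first */
    int targets = 0, acks = 0;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == initiator || !((range >> cpu) & 1))
            continue;
        targets++;
        target_handle_refresh(cpu, &acks); /* request + ack, modeled inline */
    }
    while (acks < targets) { /* spin until all feedback signals arrive */ }
    return targets;
}
```

Only the CPUs named in the range receive a request, and the initiator's wait-for-acknowledgement corresponds to the feedback signals that confirm TLB consistency maintenance is complete.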
• each part of the above system architecture can be located in an electronic device, and the electronic device can be a smart wearable device, a smart phone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, or a server, etc.
• the server can be a single server, or multiple servers can form a server cluster or a cloud computing service center.
  • the electronic device may also be a part of the above-mentioned device, such as a chip having the above-mentioned function, which is not specifically limited in this embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • the embodiment of the present application can be applied to non-virtualized scenarios under private clouds and virtualized scenarios under public clouds, and can be oriented to server types such as virtual servers, functional computing servers, bare metal servers, and container servers.
• the software process of the embodiment of the present application can be applied to the host operating system (i.e., the kernel) or the virtual machine monitor layer, and the software is deployed on a physical server that supports the directional-CPU TLB refresh module and the physical CPU range information module.
• the products of multiple companies are deployed together on the public cloud; they may affect each other, and may even be infected by harmful viruses.
  • Virtualization technology can be used to isolate the business deployed by each company to ensure its security.
  • the virtual server and function computing server shown in Figure 2 are common servers in a virtualization scenario.
• the virtual machine operating system and security container in Figure 2 can, based on the maintenance method provided by the embodiment of this application, efficiently and conveniently maintain, within a small range, the TLB consistency of the virtual machine under a multi-core architecture (that is, multiple physical CPUs) by calling the TLB refresh instruction.
• similarly, the user-mode process in the bare metal server and the container in the container server shown in Figure 2 can also, based on the maintenance method provided by the embodiment of this application, efficiently and conveniently maintain, within a smaller range, the TLB consistency of a process or container under a multi-core architecture by calling the TLB refresh instruction.
  • the embodiments of the present application can be applied to virtualization scenarios and non-virtualization scenarios, to maintain TLB consistency in a small range, and greatly reduce TLB maintenance delays in virtualization scenarios and non-virtualization scenarios. Effectively improve the memory access performance of the entire system.
• FIG. 3 is a schematic flowchart of a method for maintaining a translation lookaside buffer provided by an embodiment of the present application.
  • the method can be applied to the system architecture shown in FIG. 1 or the application scenario shown in FIG. 2 .
  • the method can be applied to electronic equipment, and the electronic equipment can include multiple physical CPUs.
• a first process may run on the electronic device, and the first process may currently include M first threads; the M first threads are currently running on M physical CPUs among the multiple physical CPUs; M is an integer greater than or equal to 1.
• alternatively, a first virtual machine may run on the electronic device, and the first virtual machine may currently include M first virtual CPUs; the M first virtual CPUs are currently running on M physical CPUs among the multiple physical CPUs.
• the method may include the following steps S301-S302.
  • Step S301 determine the physical CPU range S1 corresponding to the first process or the first virtual machine, and the physical CPU range S1 includes M physical CPUs.
• in a non-virtualization scenario, if there are currently M first threads in the first process running on M physical CPUs, it can be determined that the physical CPU range S1 corresponding to the first process currently includes the M physical CPUs.
• in a virtualization scenario, if there are currently M first virtual CPUs in the first virtual machine running on M physical CPUs, it can be determined that the physical CPU range S1 corresponding to the first virtual machine currently includes the M physical CPUs.
• for example, the first process may include 20 first threads, and currently 5 of the 20 first threads may run on 5 physical CPUs respectively, while the remaining 15 first threads are not currently running on any physical CPU (they may have finished running, or may not have started running yet); then the physical CPU range (for example, physical CPU range S1) corresponding to the first process may include those 5 physical CPUs.
• for another example, the first virtual machine may include 10 first virtual CPUs, and currently 8 of the 10 first virtual CPUs may run on 8 physical CPUs respectively, while the other two first virtual CPUs are not currently running on any physical CPU; the current physical CPU range (for example, physical CPU range S1) corresponding to the first virtual machine may include those 8 physical CPUs.
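Step S301 amounts to collecting the set of physical CPUs on which the process's threads (or the virtual machine's vCPUs) are currently running. A hedged sketch, assuming a 64-bit bitmap range and a per-thread CPU field where -1 means "not running"; the names `determine_range_s1` and `range_size` are invented here:

```c
#include <stdint.h>

/* Derive range S1 from where the process's threads currently run.
 * thread_cpu[i] holds the CPU of thread i, or -1 if it is not running. */
uint64_t determine_range_s1(const int *thread_cpu, int nthreads) {
    uint64_t range = 0;
    for (int i = 0; i < nthreads; i++)
        if (thread_cpu[i] >= 0)
            range |= 1ULL << thread_cpu[i];
    return range;
}

/* Count the CPUs in a range (population count). */
int range_size(uint64_t range) {
    int n = 0;
    for (; range; range &= range - 1) n++;  /* clear lowest set bit */
    return n;
}
```

For the 20-thread example above, only the 5 running threads contribute CPUs, so the resulting range S1 contains exactly 5 physical CPUs.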
• Step S302: when the page table information maintained by the first process or the first virtual machine is modified, based on the modified page table information, synchronously update the TLB information maintained by all the physical CPUs in the physical CPU range S1.
• as described above, the TLB can be considered a cache of the page table, which stores the page table entries currently most likely to be accessed; its content is a copy of some page table entries. That is to say, when the M first threads of the first process, or the M first virtual CPUs of the first virtual machine, run on the M physical CPUs, the TLB information maintained on the M physical CPUs corresponds to the page table information maintained by the first process or the first virtual machine. Therefore, when the page table information maintained by the first process or the first virtual machine is modified (for example, a first thread or a first virtual CPU modifies the page table information), the TLB information maintained on each of the M physical CPUs needs to be updated synchronously, so that the TLB information on each of the M physical CPUs remains the latest valid information and TLB access errors are avoided.
  • the embodiment of the present application considers that each process (or virtual machine) maintains its own page table information, and multiple threads (or virtual CPUs) in each process (or virtual machine) share the page table information. Different processes (or virtual machines) are independent of each other. Therefore, when the page table information is modified, it is only necessary to perform corresponding TLB consistency maintenance on the physical CPU that is currently running the thread (or virtual CPU) in the process (or virtual machine), so as to effectively avoid TLB access errors. In this way, the embodiment of the present application can minimize the range of physical CPUs that need to maintain TLB consistency each time on the premise of effectively avoiding TLB access errors, thereby achieving efficient, convenient and accurate TLB consistency maintenance.
• in a specific implementation, each process can use software variables to record the information of the physical CPUs where each thread in the process is currently located (for example, the M physical CPUs where the M first threads in the first process are located), and this information can be shared between threads.
• when the process running on a physical CPU is switched (that is, the thread running after the switch and the thread running before the switch belong to different processes), the physical CPU ranges corresponding to the two processes before and after the switch need to be updated in real time, so that the physical CPU range corresponding to each process always contains the accurate information of the physical CPUs on which the process is currently running, providing an accurate range for subsequent TLB consistency maintenance and thereby ensuring its effectiveness.
  • FIG. 4a is a schematic diagram of an update process of a physical CPU range provided by an embodiment of the present application.
• the update process of the physical CPU range corresponding to a process may include the following steps S401-S405.
  • step S401 the kernel scheduler schedules the first thread to go online on the first physical CPU.
• the M physical CPUs may include a first physical CPU and M-1 second physical CPUs, where the first thread in the first process is to run on the first physical CPU. Before the first thread runs on the first physical CPU, the first physical CPU may be running a second thread, while the M-1 second physical CPUs may already be running first threads. At this time, the physical CPU range S2 corresponding to the first process may only include the aforementioned M-1 second physical CPUs. Then, after the second thread on the first physical CPU finishes running, the kernel scheduler may schedule the first thread to go online on the first physical CPU. It should be noted that the first thread here does not run immediately after it goes online, but runs after the subsequent update of the physical CPU range is completed.
• Step S402: judge whether the second thread last run by the first physical CPU belongs to the same process as the first thread.
• specifically, the first physical CPU judges whether the second thread that it ran last time belongs to the same process (for example, the first process) as the current first thread, that is, judges whether a process switch has occurred on the first physical CPU. If the second thread does not belong to the first process, execute step S403; otherwise, execute step S405.
  • Step S403 updating the physical CPU ranges of the processes to which the second thread and the first thread respectively belong.
  • For example, the second thread may belong to a second process. Before the first thread in the first process runs on the first physical CPU, N second threads in the second process may be running on the first physical CPU and on N-1 third physical CPUs, where N is an integer greater than 1.
  • Correspondingly, the physical CPU range S3 corresponding to the second process includes the first physical CPU and the N-1 third physical CPUs.
  • When the process switch occurs, the physical CPU range S3 corresponding to the second process is updated to the physical CPU range S4 (including only the N-1 third physical CPUs), and the physical CPU range S2 corresponding to the first process is updated to the physical CPU range S1 (including the first physical CPU and the M-1 second physical CPUs, that is, the above M physical CPUs).
  • In a specific implementation, the physical CPU on which each page-table-sharing thread of a process runs may be recorded by adding a new variable or data structure in the kernel.
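  • As an illustration of such a kernel-side record, the following sketch models the per-process range with a plain set; the class and method names (ProcessCpuRange, thread_online) are hypothetical, not part of any existing kernel API:

```python
# Illustrative sketch: tracking the physical-CPU range of a process by
# recording, for each process, the set of physical CPUs currently running
# one of its page-table-sharing threads. All names are hypothetical.

class ProcessCpuRange:
    """Maintains the physical-CPU range (e.g. S1/S2) for one process."""

    def __init__(self):
        self.cpus = set()  # physical CPUs currently running threads of this process

    def thread_online(self, cpu):
        # Called when the kernel scheduler puts one of this process's
        # threads online on a physical CPU.
        self.cpus.add(cpu)

    def thread_offline(self, cpu):
        # Called when this process no longer has a running thread on `cpu`.
        self.cpus.discard(cpu)

# Example mirroring the text: the first process runs on M-1 second
# physical CPUs (say CPUs 2..M), then its first thread goes online on the
# first physical CPU (CPU 1).
M = 4
first_process = ProcessCpuRange()
for cpu in range(2, M + 1):       # range S2: the M-1 second physical CPUs
    first_process.thread_online(cpu)
first_process.thread_online(1)    # range S1: now all M physical CPUs
```

After the final call, the range covers all M physical CPUs, matching range S1 in the text.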
  • It can be understood that when the first thread goes online on the first physical CPU, the page table information used by that CPU is also switched to the page table information maintained by the first process. The first physical CPU therefore needs to flush the stale TLB information corresponding to the second process and switch to (or update to) the TLB information corresponding to the page table information of the current first process.
  • Step S404 updating the physical CPU range information globally visible to software and hardware.
  • the physical CPU range information globally visible to software and hardware is updated synchronously.
  • the globally visible physical CPU range information of software and hardware includes the globally visible physical CPU range of software and hardware corresponding to each physical CPU.
  • the physical CPU range corresponding to the process currently running on each physical CPU may be used as the physical CPU range currently corresponding to each physical CPU.
  • For example, the physical CPU range corresponding to the first physical CPU can be updated from the physical CPU range S3 to the physical CPU range S1; the physical CPU ranges corresponding to the M-1 second physical CPUs are updated from the physical CPU range S2 to the physical CPU range S1; and the physical CPU ranges corresponding to the N-1 third physical CPUs are updated from the physical CPU range S3 to the physical CPU range S4.
  • After the update, the globally visible physical CPU range information includes the physical CPU range S1 corresponding to each of the above first physical CPU and M-1 second physical CPUs (that is, the M physical CPUs), and the physical CPU range S4 corresponding to each of the above N-1 third physical CPUs.
  • Optionally, step S404 may update only the physical CPU ranges corresponding to the physical CPUs other than the first physical CPU within the ranges corresponding to the first process and the second process; that is, the range corresponding to the first physical CPU itself is not updated here, which is not specifically limited in this embodiment of the present application. For example, only the physical CPU ranges corresponding to the M-1 second physical CPUs are updated from the physical CPU range S2 to the physical CPU range S1, and the physical CPU ranges corresponding to the N-1 third physical CPUs are updated from the physical CPU range S3 to the physical CPU range S4.
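  • The update logic of steps S401-S404 can be sketched as follows; the function and variable names are illustrative, and Python sets stand in for the kernel's actual range structures:

```python
# A minimal sketch of steps S401-S404: when the thread going online on a
# physical CPU belongs to a different process than the thread that ran
# there last, the ranges of both processes are updated, and the globally
# visible per-CPU range information is refreshed. All names are illustrative.

def switch_process(cpu, old_range, new_range, visible_ranges):
    """old_range/new_range are sets of CPUs (e.g. S3 and S2); visible_ranges
    maps each physical CPU to the range of the process it currently runs."""
    old_range.discard(cpu)        # S3 -> S4: the CPU leaves the old process
    new_range.add(cpu)            # S2 -> S1: the CPU joins the new process
    for c in old_range:           # remaining CPUs of the old process see S4
        visible_ranges[c] = frozenset(old_range)
    for c in new_range:           # all CPUs of the new process see S1
        visible_ranges[c] = frozenset(new_range)
    return old_range, new_range

# Example: process-2 ran on CPUs {1, 3}; process-1 ran on {2}; then a
# process-1 thread goes online on CPU 1.
visible = {1: frozenset({1, 3}), 2: frozenset({2}), 3: frozenset({1, 3})}
s4, s1 = switch_process(1, {1, 3}, {2}, visible)
```

After the call, CPU 1's visible range has moved from the old process's range to the new one, and the remaining CPU of the old process sees the shrunken range S4.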
  • FIG. 4b is a schematic diagram of physical CPU range information provided by an embodiment of the present application.
  • the embodiment of the present application may record (or store) the above-mentioned physical CPU range information globally visible to software and hardware based on a globally visible register group.
  • the embodiment of the present application adds two sets of registers that can only be accessed by the host machine in a privileged state for each physical CPU, namely cpu_bitmap and vcpu_bitmap.
  • In a non-virtualization scenario, cpu_bitmap can be used to record the physical CPU range corresponding to the currently running process (that is, to record the physical CPUs that are running the page-table-sharing threads of the process); in a virtualization scenario, vcpu_bitmap can be used to record the physical CPU range corresponding to the currently running virtual machine (that is, to record the physical CPUs that are running the vCPUs of the virtual machine).
  • the newly added register set shown in FIG. 4b is globally visible, allowing the kernel and the virtual machine monitor to access the cpu_bitmap and vcpu_bitmap register sets corresponding to all physical CPUs.
  • The access may be achieved by means including but not limited to memory mapping technology.
  • FIG. 4c is a schematic diagram of an update flow of physical CPU range information provided by an embodiment of the present application.
  • As shown in FIG. 4c, cpu_bitmap can include 128 bits, each bit representing one physical CPU in order from front to back: the first bit (bit[0]) indicates CPU-1, the second bit (bit[1]) indicates CPU-2, the third bit (bit[2]) indicates CPU-3, and so on.
  • Before the update, thread-11 in process-1 is running on CPU-1, thread-12 in process-1 is running on CPU-2, and thread-21 in process-2 is running on CPU-3.
  • Correspondingly, the physical CPU range corresponding to process-1 includes CPU-1 and CPU-2, and the physical CPU range corresponding to process-2 includes CPU-3.
  • At this time, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2; the globally visible physical CPU range corresponding to CPU-2 includes CPU-1 and CPU-2; and the globally visible physical CPU range corresponding to CPU-3 includes only CPU-3.
  • Accordingly, in the cpu_bitmap of CPU-1 and of CPU-2, the first two bits are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to each of them includes CPU-1 and CPU-2; in the cpu_bitmap of CPU-3, only the third bit is 1, indicating that its globally visible physical CPU range includes only CPU-3.
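  • The bit encoding used in the example above can be modeled with plain integers (the real cpu_bitmap is a per-CPU register); encode_range and decode_range are illustrative helper names:

```python
# A sketch of the cpu_bitmap encoding described above: a 128-bit value in
# which bit[i] stands for CPU-(i+1). Python integers stand in for the
# per-CPU register here; the register itself is hardware.

CPU_BITMAP_BITS = 128

def encode_range(cpus):
    """Encode a set of physical CPU numbers (1-based) as a cpu_bitmap value."""
    bitmap = 0
    for cpu in cpus:
        assert 1 <= cpu <= CPU_BITMAP_BITS
        bitmap |= 1 << (cpu - 1)   # bit[0] is CPU-1, bit[1] is CPU-2, ...
    return bitmap

def decode_range(bitmap):
    """Recover the set of CPU numbers from a cpu_bitmap value."""
    return {i + 1 for i in range(CPU_BITMAP_BITS) if bitmap >> i & 1}

# The example above: CPU-1 and CPU-2 run process-1's threads, so their
# cpu_bitmap has the first two bits set (0b011); CPU-3 runs process-2's
# thread, so its cpu_bitmap has only the third bit set (0b100).
bitmap_cpu1 = encode_range({1, 2})
bitmap_cpu3 = encode_range({3})
```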
  • After the process switch, the physical CPU range corresponding to each physical CPU is updated. Specifically, thread-22 in process-2 goes online on CPU-1, thread-12 in process-1 continues running on CPU-2, and thread-21 in process-2 continues running on CPU-3.
  • Correspondingly, the physical CPU range corresponding to process-1 is updated to include only CPU-2, and the physical CPU range corresponding to process-2 is updated to include CPU-1 and CPU-3.
  • At this time, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-3; the globally visible physical CPU range corresponding to CPU-2 includes CPU-2; and the globally visible physical CPU range corresponding to CPU-3 includes CPU-1 and CPU-3.
  • It can be understood that since each bit represents one physical CPU, any single bit of cpu_bitmap is only ever updated by one CPU at a time (that is, there is no concurrent writing of the same bit), so the entire update process does not need to hold a lock on cpu_bitmap (that is, a lock-free update is achieved).
  • Optionally, if a bit represents a set of physical CPUs (for example, a cluster containing multiple physical CPUs), the corresponding physical CPU range can be updated under a lock to prevent concurrent read/write errors; the same applies to the virtualization scenario described later and will not be repeated. The lock-holding granularity may be a bit, a byte, a halfword, a word, a doubleword, and so on.
  • The types of locks include but are not limited to read-write locks, mutex locks, and atomic operation instructions (such as compare-and-swap, CAS), which are not specifically limited in this embodiment of the present application.
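  • The two update modes can be contrasted in a single-threaded sketch (it illustrates the policy only, not a faithful hardware concurrency model); threading.Lock stands in for whichever lock type is chosen:

```python
# Sketch contrasting the two update modes discussed above. When each bit
# is written by only one CPU, a plain read-modify-write per bit suffices
# (lock-free in the sense described); when one bit covers a CPU cluster,
# several CPUs may touch the same bit, so the update is wrapped in a lock.
# This is a single-threaded illustration of the policy, not a faithful
# concurrency model of the hardware.

import threading

class BitmapWithLock:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()  # needed only in the cluster-granular case

    def set_bit_lock_free(self, bit):
        # Safe without a lock only under the per-CPU-bit invariant that no
        # two CPUs ever write the same bit concurrently.
        self.value |= 1 << bit

    def set_bit_locked(self, bit):
        with self.lock:               # cluster case: serialize writers
            self.value |= 1 << bit

    def clear_bit_locked(self, bit):
        with self.lock:
            self.value &= ~(1 << bit)

bm = BitmapWithLock()
bm.set_bit_lock_free(0)   # CPU-1 marks itself
bm.set_bit_locked(2)      # a cluster bit, updated under the lock
bm.clear_bit_locked(2)    # the cluster no longer runs this process
```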
  • this embodiment of the present application may also use a globally visible memory address space to record (or store) the aforementioned physical CPU range information globally visible to software and hardware.
  • the physical memory address space can be used instead of the register set, and two fixed memory areas are opened up in the memory to respectively store the ranges of physical CPUs corresponding to virtual machines or processes running on each physical CPU. Each time the software and hardware obtain the physical CPU range corresponding to each physical CPU, they are all implemented by accessing the memory in the physical address space.
  • this embodiment of the present application may also use a cache memory instead of a register set to record (or store) the above-mentioned physical CPU range information globally visible to software and hardware, etc., which is not specifically limited in this embodiment of the present application.
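  • A possible layout for the two memory regions mentioned above is sketched below; the base addresses and the 16-byte-per-CPU stride are assumptions for illustration only:

```python
# A sketch of the memory-mapped alternative described above: two fixed
# regions hold, for every physical CPU, the cpu_bitmap (process case) and
# vcpu_bitmap (virtual-machine case). Offsets and sizes are illustrative.

BITMAP_BYTES = 16  # 128 bits per physical CPU

CPU_BITMAP_BASE = 0x0000   # hypothetical base of the process-range region
VCPU_BITMAP_BASE = 0x1000  # hypothetical base of the VM-range region

def bitmap_offset(base, cpu):
    """Byte offset of the 128-bit bitmap belonging to a physical CPU (1-based)."""
    return base + (cpu - 1) * BITMAP_BYTES

def write_bitmap(memory, base, cpu, bitmap):
    # Store the 128-bit bitmap little-endian into the backing region.
    memory[bitmap_offset(base, cpu)] = bitmap.to_bytes(BITMAP_BYTES, "little")

def read_bitmap(memory, base, cpu):
    return int.from_bytes(memory[bitmap_offset(base, cpu)], "little")

memory = {}  # dict stands in for the physical address space
write_bitmap(memory, CPU_BITMAP_BASE, 1, 0b11)   # CPU-1's range: CPU-1, CPU-2
write_bitmap(memory, VCPU_BITMAP_BASE, 3, 0b100) # CPU-3's VM range: CPU-3
```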
  • Step S405 according to the physical CPU range corresponding to the process to which the first thread belongs, update the globally visible physical CPU range of software and hardware corresponding to the first physical CPU.
  • Specifically, if the second thread and the first thread belong to the same process, the globally visible physical CPU range corresponding to the first physical CPU can be updated according to the current physical CPU range corresponding to the first process. It can be understood that although no process switch has occurred on the first physical CPU, a process switch may have occurred on other CPUs, updating the physical CPU range corresponding to the relevant process (such as the first process); the first physical CPU therefore still needs to update its globally visible physical CPU range according to the current physical CPU range corresponding to the first process.
  • For example, as shown in FIG. 4c, CPU-2 may be the first physical CPU, and CPU-2 may schedule thread-13 (not shown in the figure) in process-1 to go online. If a process switch has meanwhile occurred on CPU-1 and the physical CPU ranges of process-1 and process-2 have been updated, CPU-2 needs to update the globally visible physical CPU range corresponding to CPU-2 according to the current physical CPU range of process-1 (including CPU-2).
  • Similarly, in a virtualization scenario, each virtual machine can use software variables to record the physical CPU on which each of its virtual CPUs currently runs (for example, the M physical CPUs on which the M first virtual CPUs of the first virtual machine run), and this information can be shared between the virtual CPUs.
  • FIG. 5a is a schematic diagram of an update flow of another physical CPU range provided by an embodiment of the present application.
  • the process of updating the physical CPU range corresponding to the virtual machine may include the following steps S501 - S505 .
  • step S501 the virtual machine monitor schedules the first virtual CPU to go online on the first physical CPU.
  • step S501 reference may be made to step S401 in the above-mentioned embodiment corresponding to FIG. 4a , which will not be repeated here.
  • Step S502 judging whether the second virtual CPU that the first physical CPU ran last time belongs to the same virtual machine as the first virtual CPU.
  • step S502 reference may be made to step S402 in the above-mentioned embodiment corresponding to FIG. 4a , which will not be repeated here.
  • Step S503 updating the physical CPU ranges of the virtual machines to which the second virtual CPU and the first virtual CPU respectively belong.
  • step S503 reference may be made to step S403 in the above-mentioned embodiment corresponding to FIG. 4a , and details are not repeated here.
  • Step S504 updating the physical CPU range information globally visible to software and hardware.
  • step S504 reference may be made to step S404 in the above-mentioned embodiment corresponding to FIG. 4a , which will not be repeated here.
  • FIG. 5b is a schematic diagram of another physical CPU range information provided by an embodiment of the present application.
  • FIG. 5b reference may be made to the description of the embodiment corresponding to FIG. 4b above, and details are not repeated here.
  • FIG. 5c is a schematic diagram of another update flow of physical CPU range information provided by the embodiment of the present application.
  • vcpu_bitmap can include 128 bits, and each bit represents a physical CPU sequentially from front to back, for example, the first bit (bit[0] ) indicates CPU-1, the second bit (bit[1]) indicates CPU-2, the third bit (bit[2]) indicates CPU-3, and so on.
  • Before the update, vCPU-11 in VM-1 is running on CPU-1, vCPU-12 in VM-1 is running on CPU-2, and vCPU-21 in VM-2 is running on CPU-3.
  • Correspondingly, the physical CPU range corresponding to VM-1 includes CPU-1 and CPU-2, and the physical CPU range corresponding to VM-2 includes CPU-3.
  • At this time, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2; the globally visible physical CPU range corresponding to CPU-2 includes CPU-1 and CPU-2; and the globally visible physical CPU range corresponding to CPU-3 includes only CPU-3.
  • Accordingly, in the vcpu_bitmap of CPU-1 and of CPU-2, the first two bits are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to each of them includes CPU-1 and CPU-2; in the vcpu_bitmap of CPU-3, only the third bit is 1, indicating that its globally visible physical CPU range includes only CPU-3.
  • After the virtual CPU switch, the physical CPU range corresponding to each physical CPU is updated. Specifically, the physical CPU range corresponding to VM-1 is updated to include only CPU-1, and the physical CPU range corresponding to VM-2 is updated to include CPU-2 and CPU-3.
  • Correspondingly, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1; the globally visible physical CPU range corresponding to CPU-2 includes CPU-2 and CPU-3; and the globally visible physical CPU range corresponding to CPU-3 includes CPU-2 and CPU-3.
  • Accordingly, in the vcpu_bitmap of CPU-1, only the first bit is 1, indicating that its globally visible physical CPU range includes CPU-1; in the vcpu_bitmap of CPU-2 and of CPU-3, the second and third bits are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to each of them includes CPU-2 and CPU-3.
  • Step S505 according to the physical CPU range corresponding to the virtual machine to which the first virtual CPU belongs, update the globally visible physical CPU range of software and hardware corresponding to the first physical CPU.
  • step S505 reference may be made to step S405 in the above-mentioned embodiment corresponding to FIG. 4a , which will not be repeated here.
  • Based on the above solution, within the physical CPU range S1, any thread (or first virtual CPU) may modify the page table information shared by the M threads or M first virtual CPUs (that is, the page table information maintained by the first process or the first virtual machine). At this time, the TLB information maintained on each of the M physical CPUs needs to be synchronously updated according to the modified page table information, so as to maintain TLB consistency within the physical CPU range S1 and avoid subsequent TLB access errors.
  • FIG. 6 is a schematic diagram of a TLB refresh process provided by an embodiment of the present application.
  • As shown in FIG. 6, the TLB refresh process may specifically include the following steps S601-S610.
  • step S601 the target physical CPU acquires a first process identification number or a first virtual machine identification number currently running on the target physical CPU.
  • Specifically, the target physical CPU may start to execute a TLB refresh instruction.
  • The hardware module corresponding to this instruction can obtain, from a register on the CPU side (such as a control status register (CSR)), the process identification number (such as the first process identification number) or the virtual machine identification number (such as the first virtual machine identification number) currently running on the physical CPU. It can be understood that the process identification number or virtual machine identification number can also identify whether the current physical CPU is in a non-virtualization scenario or a virtualization scenario.
  • step S602 the target physical CPU sends a TLB refresh request and corresponding TLB refresh information to the communication medium.
  • the target physical CPU sends a TLB refresh request and corresponding TLB refresh information to a communication medium (such as an inter-core interconnection network, such as a bus or an on-chip network), or in other words, the TLB refresh request carries corresponding TLB refresh information.
  • In the non-virtualization scenario, the TLB refresh information may include, but is not limited to, one or more of the process identification number corresponding to the first process (for example, the first process identification number), the virtual address corresponding to the modified page table information, and the virtual address range, which is not specifically limited in this embodiment of the present application.
  • In the virtualization scenario, the TLB refresh information may include, but is not limited to, one or more of the virtual machine identification number corresponding to the first virtual machine (for example, the first virtual machine identification number), the virtual address corresponding to the modified page table information, and the virtual address range, which is not specifically limited in this embodiment of the present application.
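  • One hypothetical shape for the TLB refresh information carried by the request is sketched below; the field names are illustrative, and either the process or the virtual machine identifier would be populated depending on the scenario:

```python
# A hypothetical shape for the TLB refresh information carried by the
# request (step S602). In the non-virtualized case it names a process, in
# the virtualized case a virtual machine; either way it may carry the
# virtual address or address range whose page-table entries changed.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TlbFlushInfo:
    process_id: Optional[int] = None          # first process identification number
    vm_id: Optional[int] = None               # first virtual machine identification number
    vaddr: Optional[int] = None               # a single modified virtual address
    vaddr_range: Optional[Tuple[int, int]] = None  # or a [start, end) range

    def is_virtualized(self) -> bool:
        # The identification number also tells whether the requesting CPU
        # is in a virtualization scenario.
        return self.vm_id is not None

info = TlbFlushInfo(process_id=7, vaddr_range=(0x1000, 0x3000))
```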
  • step S603 the communication medium acquires the physical CPU range S1 currently corresponding to the target physical CPU, and sends a TLB refresh request to all other physical CPUs in the physical CPU range S1.
  • Specifically, the communication medium can obtain the physical CPU range S1 corresponding to the target physical CPU from the above-mentioned globally visible physical CPU range information, and send the TLB refresh request to all remaining physical CPUs in the physical CPU range S1.
  • It can be understood that, compared with broadcasting the TLB refresh request to all physical CPUs, this embodiment of the present application determines the exact range of physical CPUs that need to be notified, reducing the notification range (that is, the sending range of the TLB refresh request) and thus the TLB maintenance range: TLB consistency maintenance only needs to be performed on the physical CPUs currently running the process (or virtual machine). This effectively avoids TLB access errors and achieves efficient, convenient, and accurate TLB consistency maintenance.
  • step S604 the target physical CPU updates the locally maintained TLB information.
  • the target physical CPU updates locally maintained TLB information according to the modified page table information, for example, refreshing or invalidating corresponding TLB entries and the like.
  • the target physical CPU may execute step S609 to wait for feedback signals from all other physical CPUs.
  • step S605 all other physical CPUs in the physical CPU range S1 receive TLB refresh requests.
  • all other physical CPUs in the physical CPU range S1 can receive the TLB refresh request sent by the communication medium through the TLB refresh hardware logic circuit therein.
  • step S606 all other physical CPUs in the physical CPU range S1 parse the TLB refresh information through hardware.
  • all other physical CPUs within the physical CPU range S1 can parse the TLB refresh information corresponding to the request through hardware.
  • step S607 all other physical CPUs in the physical CPU range S1 update TLB information without interrupting their respective software execution processes.
  • all other physical CPUs in the physical CPU range S1 update the TLB information maintained by them through hardware based on the above TLB refresh information.
  • the TLB updating process is completed by hardware without interrupting the respective software execution processes of all other physical CPUs.
  • It can be understood that the embodiment of the present application can obtain the physical CPU range through hardware, and perform TLB information parsing and TLB information updating through hardware. In this way, most of the software overhead can be eliminated, with no need for virtual machine traps, interrupt sending, interrupt response, or other software processes, thereby further reducing the TLB consistency maintenance delay and improving the efficiency of TLB consistency maintenance.
  • step S608 all other physical CPUs in the physical CPU range S1 send feedback messages to the target physical CPU.
  • all other physical CPUs in the physical CPU range S1 may send a feedback signal to the target physical CPU after updating the TLB information maintained by them.
  • all other physical CPUs may send feedback signals to the communication medium, and then the communication medium forwards the feedback message to the target physical CPU.
  • the feedback signal sent by any physical CPU among all other physical CPUs may be used to indicate that its TLB update has been completed.
  • Step S609 the target physical CPU waits for receiving the feedback signal.
  • the target physical CPU remains blocked and waits for feedback signals from all other physical CPUs.
  • Step S610 whether the target physical CPU has received the feedback signals sent by all other physical CPUs within the physical CPU range S1.
  • If the target physical CPU has received the feedback signals sent by all other physical CPUs in the physical CPU range S1, the TLB consistency maintenance has been completed and the execution of the TLB refresh instruction is finished, after which the target physical CPU can execute subsequent instructions; otherwise, the target physical CPU continues to block until it receives the feedback signals from all other physical CPUs in the physical CPU range S1.
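  • Steps S601-S610 can be summarized in a single-threaded simulation; function calls stand in for the hardware request/ACK handshake, and all structures are illustrative:

```python
# End-to-end sketch of steps S601-S610: the target CPU issues a flush, the
# interconnect looks up the target's globally visible range and forwards
# the request only to the other CPUs in that range, each of which flushes
# its TLB and acknowledges. The hardware handshake is modeled as plain
# function calls; all structures are illustrative.

def tlb_flush(target_cpu, visible_ranges, tlbs, info):
    """visible_ranges: cpu -> set of CPUs (range S1); tlbs: cpu -> dict of
    cached translations keyed by (owner_id, vaddr)."""
    flushed = []
    # S604: the target updates its own TLB first.
    tlbs[target_cpu].pop((info["owner"], info["vaddr"]), None)
    # S603: the interconnect narrows the broadcast to range S1 minus the target.
    recipients = visible_ranges[target_cpu] - {target_cpu}
    acks = 0
    for cpu in sorted(recipients):
        # S605-S607: each remote CPU flushes in hardware, without interrupting
        # its software execution.
        tlbs[cpu].pop((info["owner"], info["vaddr"]), None)
        flushed.append(cpu)
        acks += 1  # S608: feedback signal
    # S609/S610: the target blocks until every recipient has acknowledged.
    assert acks == len(recipients)
    return flushed

visible = {1: {1, 2}, 2: {1, 2}, 3: {3}}
tlbs = {1: {(7, 0x1000): "pte-a"}, 2: {(7, 0x1000): "pte-a"}, 3: {(9, 0x1000): "pte-b"}}
flushed = tlb_flush(1, visible, tlbs, {"owner": 7, "vaddr": 0x1000})
```

Note how CPU-3, which is outside range S1, keeps its TLB entry untouched: this is the reduced maintenance range the embodiment describes.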
  • FIG. 7 is a schematic diagram of another TLB refresh process provided by the embodiment of the present application.
  • As shown in FIG. 7, the method can be applied to both non-virtualized and virtualized scenarios.
  • the TLB refresh instruction executed by CPU-1 corresponds to hardware modules such as a feedback signal (acknowledges, ACKs) statistics module 1001 , a sending module 1002 and a TLB refresh module 1003 .
  • the coverage of the entire hardware circuit for refreshing the TLB may include, but not limited to, multiple hardware modules on the CPU side and an inter-core interconnection network.
  • the TLB refresh process is as follows:
  • the CPU-1 executes a TLB refresh instruction, and sends a TLB refresh request and related TLB refresh information to the inter-core interconnection network through the sending module 1002 .
  • the CPU-1 performs local TLB refresh through the TLB refresh module 1003 , that is, updates local TLB information.
  • the inter-core interconnection network receives the TLB refresh request, determines the CPU-1 corresponding to the request, and obtains the current physical CPU range corresponding to the CPU-1 from the globally visible physical CPU range information of the software and hardware (for example, the physical CPU range S1, including CPU-2).
  • the inter-core interconnection network sends a TLB refresh request to all other physical CPUs (such as CPU-2) within the range of the physical CPU currently corresponding to the CPU-1.
  • the CPU-2 receives the TLB refresh request through the TLB refresh module 2003 therein, and updates the TLB information maintained by the CPU-2.
  • the CPU-2 feeds back an ACK to the inter-core interconnection network through the TLB refresh module 2003 .
  • the inter-core interconnection network feeds back the ACK to the CPU-1, and correspondingly, the CPU-1 receives the feedback signal through the ACKs statistics module 1001 therein.
  • FIG. 8 is a schematic diagram of another TLB refresh process provided by an embodiment of the present application.
  • Optionally, as shown in FIG. 8, CPU-1 may directly acquire the current physical CPU range corresponding to CPU-1 from the globally visible physical CPU range information, instead of having the inter-core interconnection network perform the lookup.
  • the TLB refresh process is as follows:
  • CPU-1 executes the TLB refresh instruction, and obtains the current physical CPU range corresponding to the CPU-1 (such as physical CPU range S1, including CPU-2) from the physical CPU range information that is globally visible to the software and hardware through the sending module 1002 .
  • the CPU-1 sends the TLB refresh request and related TLB refresh information to the inter-core interconnection network through the sending module 1002 .
  • the TLB refresh request may carry indication information related to the physical CPU range S1.
  • the CPU-1 performs local TLB refresh through the TLB refresh module 1003 , that is, updates local TLB information.
  • the inter-core interconnection network receives the TLB refresh request, determines all remaining physical CPUs (such as CPU-2) within the physical CPU range corresponding to the CPU-1 based on the request, and sends a TLB refresh request to the CPU-2.
  • the CPU-2 receives the TLB refresh request through the TLB refresh module 2003 therein, and updates the TLB information maintained by the CPU-2.
  • the CPU-2 feeds back an ACK to the inter-core interconnection network through the TLB refresh module 2003 .
  • the inter-core interconnection network feeds back the ACK to the CPU-1, and correspondingly, the CPU-1 receives the feedback signal through the ACKs statistics module 1001 therein.
  • the embodiment of the present application can maintain and update the physical CPU range corresponding to the process or virtual machine through software, and obtain the current physical CPU range through hardware and refresh TLB information based on this range, that is, perform TLB consistency maintenance within this range.
  • the number of physical CPUs that need to refresh the TLB is greatly reduced, the TLB consistency maintenance delay is reduced, and the overall memory access performance of the system is effectively improved.
  • It should be noted that each method procedure in the method for maintaining a translation lookaside buffer (TLB) described in the embodiments of this application may be implemented based on software, hardware, or a combination thereof.
  • the way of implementing by hardware may include logic circuit, arithmetic circuit or analog circuit and so on.
  • a software implementation may include program instructions, which may be regarded as a software product, which is stored in a memory and can be executed by a processor to implement related functions.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device at least includes a processor 1101 , an input device 1102 , an output device 1103 and a computer-readable storage medium 1104 , and the electronic device may also include other common components, which will not be described in detail here.
  • the processor 1101 in the electronic device, the input device 1102, the output device 1103 and the computer-readable storage medium 1104 may be connected through a bus or other means.
  • the processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in the above solutions.
  • the processor 1101 may include multiple physical CPUs.
  • the electronic device may run a first process, and the first process currently includes M first threads, and the M first threads currently run on M physical CPUs among the plurality of physical CPUs respectively.
  • the M physical CPUs may be used to determine the physical CPU range S1 corresponding to the first process, and the physical CPU range S1 includes the M physical CPUs.
  • The M physical CPUs are also used to, when the page table information maintained by the first process is modified, synchronously update the translation lookaside buffer (TLB) information maintained by all the physical CPUs in the physical CPU range S1 based on the modified page table information.
  • The memory in the electronic device may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and be connected to the processor through the bus. Memory can also be integrated with the processor.
  • The computer-readable storage medium 1104 may be stored in the memory of the electronic device. The computer-readable storage medium 1104 is used to store a computer program, the computer program includes program instructions, and the processor 1101 is used to execute the program instructions stored in the computer-readable storage medium 1104.
  • The processor 1101 (or CPU, central processing unit) is the computing core and control core of the electronic device, and is suitable for implementing one or more instructions, specifically for loading and executing one or more instructions to realize the corresponding method flow or function. In one embodiment, the processor 1101 described in the embodiments of the present application can be used to perform a series of processes in the method for maintaining the translation lookaside buffer, including: determining the physical CPU range S1 corresponding to the first process, where the physical CPU range S1 includes M physical CPUs; and, when the page table information maintained by the first process is modified, synchronously updating the TLB information maintained by all the physical CPUs in the physical CPU range S1 based on the modified page table information.
  • An embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium can store a program, and when the program is executed by a processor, the processor can perform some or all of the steps of any one of the methods described in the above method embodiments.
  • the embodiment of the present application also provides a computer program, the computer program includes instructions, when the computer program is executed by a multi-core processor, the processor can perform some or all of the steps described in any one of the above method embodiments .
  • The disclosed device may be implemented in other ways.
  • The device embodiments described above are merely illustrative.
  • The division of the above units is only a division by logical function.
  • In actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through interfaces, and the indirect coupling or communication connection between devices or units may be electrical or in other forms.
  • The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in each embodiment of this application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The above integrated units may be implemented in the form of hardware or in the form of software functional units.
  • If the above integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The part of the technical solution of this application that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, and specifically a processor in the computer device) to execute all or part of the steps of the methods in the various embodiments of this application.
  • The aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a double data rate synchronous dynamic random access memory (DDR), a flash memory, a random access memory (RAM), or other media that can store program code.


Abstract

The embodiments of this application disclose a method for maintaining a translation lookaside buffer (TLB) and a related device. The method is applied to an electronic device comprising multiple physical central processing units (CPUs). A first process runs on the electronic device, the first process currently comprises M first threads, and the M first threads currently run on M of the multiple physical CPUs respectively, M being an integer greater than or equal to 1. The method comprises: determining a physical CPU range S1 currently corresponding to the first process, the physical CPU range S1 comprising the M physical CPUs on which the first threads of the first process currently run; and updating, based on the page table information maintained by the first process, the TLB information maintained by each of the physical CPUs within the physical CPU range S1. The embodiments of this application can reduce TLB maintenance latency.

Description

Method for maintaining a translation lookaside buffer, and related device
This application claims priority to Chinese Patent Application No. 2021114388050, entitled "Method for maintaining a translation lookaside buffer, and related device", filed with the Chinese Patent Office on November 27, 2021, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a method for maintaining a translation lookaside buffer and a related device.
Background
In the computing industry, cloud computing services are usually provided by a large number of servers, and the number of compute cores on a single server keeps climbing with the demand for cloud services, even exceeding 200. To fully exploit multi-core parallelism, software programs usually run in parallel on multiple compute cores, for example MapReduce (a programming model), software transactional memory, and concurrent garbage collection. Further, during memory accesses by a software program, a translation lookaside buffer (TLB) can effectively improve memory access performance. Correspondingly, TLB consistency must also be maintained across the compute cores to avoid TLB access errors. Different threads of the same process may run on different compute cores; when one of the threads modifies the page table, not only must the TLB information of the core on which that thread runs be modified, but the other cores must also be notified to update their corresponding TLB information, so as to ensure that the TLB information of the different cores is consistent.
However, current TLB consistency maintenance procedures are complex and slow, substantially degrading the overall memory access performance of multi-core devices such as servers.
Therefore, how to reduce the latency of TLB consistency maintenance is an urgent problem to be solved.
Summary
The embodiments of this application provide a method for maintaining a translation lookaside buffer and a related device, which can greatly reduce the latency of TLB consistency maintenance.
The method for maintaining a translation lookaside buffer provided in the embodiments of this application may be executed by an electronic device or the like. An electronic device is a device that can be abstracted as a computer system; an electronic device that supports the TLB maintenance function may also be called a TLB maintenance apparatus. The TLB maintenance apparatus may be the complete electronic device, for example a smart wearable device, a smartphone, a tablet, a laptop, a desktop computer, an in-vehicle computer, or a server; it may also be a system/apparatus composed of multiple complete devices; or it may be a component of the electronic device, for example a chip related to the TLB maintenance function, such as a system on a chip (SoC). This is not specifically limited in the embodiments of this application. A system chip is also called a system on a chip.
According to a first aspect, an embodiment of this application provides a method for maintaining a translation lookaside buffer, applied to an electronic device comprising multiple physical central processing units (CPUs). A first process runs on the electronic device; the first process currently comprises M first threads; the M first threads currently run on M of the multiple physical CPUs respectively; M is an integer greater than or equal to 1. The method comprises: determining a physical CPU range S1 currently corresponding to the first process, the physical CPU range S1 comprising the M physical CPUs on which the first threads of the first process currently run; and updating, based on the page table information maintained by the first process, the TLB information maintained by each physical CPU within the physical CPU range S1.
With the method provided in the first aspect, in a non-virtualized scenario, the physical CPU range currently corresponding to any process (for example, the first process) can first be determined from the physical CPUs on which the threads of that process (for example, the first threads) are currently running (for example, physical CPU range S1). Then, when the page table information maintained by the process is modified, the TLB information maintained by each physical CPU within that range can be synchronously updated based on the modified page table information, avoiding TLB access errors when those physical CPUs run the threads of the process. In the prior art, after the page table information of a process is modified, a TLB flush request can only be sent indiscriminately to all physical CPUs in the device or system, and the sender must then wait for all of them to complete the TLB flush (that is, the TLB information update), making the whole TLB consistency maintenance long and costly. By contrast, the embodiments of this application maintain TLB consistency within a well-defined, smaller range, greatly reducing maintenance overhead and latency and effectively improving the memory access performance of the whole system. It should be understood that since each process maintains its own page table information, TLB consistency maintenance only needs to be performed on the physical CPUs currently running that process. Thus, on the premise of effectively avoiding TLB access errors, the embodiments of this application greatly narrow the physical CPU range over which TLB consistency must be maintained each time, achieving efficient, convenient, and accurate TLB consistency maintenance.
In a possible implementation, the M physical CPUs comprise a first physical CPU and M-1 second physical CPUs, where, before the first thread runs on the first physical CPU, a second thread runs on the first physical CPU, and M-1 of the M first threads run on the M-1 second physical CPUs respectively. The method further comprises: after the thread on the first physical CPU is switched from the second thread to the first thread of the first process, determining whether the second thread belongs to the first process; and if the second thread does not belong to the first process, updating a physical CPU range S2 corresponding to the first process to obtain the current physical CPU range S1, where the physical CPU range S2 comprises the M-1 second physical CPUs on which the first threads of the first process ran before the update.
In the embodiments of this application, to ensure that the physical CPU range corresponding to any process is accurate at every moment during execution, and thus that TLB consistency maintenance is accurate and effective, the physical CPU range must be updated in real time. Specifically, when the thread running on any physical CPU (for example, the first physical CPU) is switched, if the new thread (for example, the first thread) and the old thread (for example, the second thread) belong to different processes (that is, a process switch occurred on the first physical CPU: one process stops running there while another is about to run there), the physical CPU range corresponding to the incoming process can be updated by adding the current physical CPU (for example, the first physical CPU), yielding a new physical CPU range that contains it (for example, physical CPU range S1). This ensures that subsequent TLB consistency maintenance is performed over an accurate physical CPU range, that is, it guarantees the accuracy and effectiveness of TLB consistency maintenance.
In a possible implementation, the second thread belongs to a second process; before the first thread runs on the first physical CPU, the N second threads of the second process run respectively on the first physical CPU and on N-1 third physical CPUs among the multiple physical CPUs; N is an integer greater than or equal to 1. The method further comprises: after the thread on the first physical CPU is switched from the second thread to the first thread of the first process, updating a physical CPU range S3 corresponding to the second process to obtain a physical CPU range S4, where the physical CPU range S3 comprises the first physical CPU and the N-1 third physical CPUs on which the second threads of the second process ran before the update, and the physical CPU range S4 comprises the N-1 third physical CPUs on which the second threads of the second process currently run.
In the embodiments of this application, as described above, when a process switch occurs on any physical CPU (for example, the first physical CPU), not only must the physical CPU range of the incoming process (for example, the first process) be updated in real time, but so must that of the outgoing process (for example, the second process), specifically by deleting the first physical CPU from the outgoing process's physical CPU range. The physical CPU range currently corresponding to any process thus always contains exactly the physical CPUs currently running that process, ensuring that whenever the page table information maintained by any process is modified, efficient and convenient TLB consistency maintenance can be performed over an accurate physical CPU range, improving maintenance efficiency.
In a possible implementation, the method further comprises: based on the updates of the physical CPU ranges corresponding to the first process and the second process, updating the physical CPU range corresponding to the first physical CPU from the physical CPU range S3 to the physical CPU range S1; updating the physical CPU range corresponding to each of the M-1 second physical CPUs from the physical CPU range S2 to the physical CPU range S1; and updating the physical CPU range corresponding to each of the N-1 third physical CPUs from the physical CPU range S3 to the physical CPU range S4.
In the embodiments of this application, the physical CPU range corresponding to the process currently running on a physical CPU can also be taken as the physical CPU range corresponding to that physical CPU, providing an accurate range for TLB flush requests subsequently sent by that physical CPU and enabling efficient and convenient TLB consistency maintenance.
In a possible implementation, physical CPU range information is stored in the electronic device; the physical CPU range information currently comprises at least the physical CPU range S1 corresponding to each of the M physical CPUs, and the physical CPU range S4 corresponding to each of the N-1 third physical CPUs.
In the embodiments of this application, the electronic device may also store the physical CPU range currently corresponding to each physical CPU, forming physical CPU range information that is globally visible to software and hardware. This range information can provide an accurate range for TLB flush requests subsequently sent by any physical CPU, enabling efficient and convenient TLB consistency maintenance. Optionally, the globally visible physical CPU range information may be stored in, but is not limited to, a register file, memory, or a cache.
In a possible implementation, updating, based on the page table information maintained by the first process, the TLB information maintained by each physical CPU within the physical CPU range S1 comprises: after the page table information maintained by the first process is modified by a first thread currently running on a target physical CPU among the M physical CPUs, updating the TLB information maintained by the target physical CPU based on the modified page table information; and sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1, the TLB flush request being used by the remaining physical CPUs within the physical CPU range S1 to synchronously update their respective TLB information so that the TLB information maintained by each physical CPU within the physical CPU range S1 is consistent.
In the embodiments of this application, after a thread (for example, the first thread) running on a physical CPU (for example, the target physical CPU) modifies the corresponding page table information, the target physical CPU can update its own TLB information based on the modified page table information, and send a TLB flush request to the other physical CPUs within the current physical CPU range so that they synchronously update their TLB information based on the modified page table information. This keeps the TLB information maintained by each physical CPU within that range consistent, avoiding TLB access errors when those CPUs run the threads of the process. The embodiments of this application can thus complete TLB consistency maintenance quickly and conveniently within a small range, greatly reducing TLB maintenance latency. In addition, it should be noted that the embodiments of this application place no specific restriction on the order in which the target physical CPU modifies its own TLB information and sends the TLB flush request to the remaining physical CPUs in the range.
In a possible implementation, sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1 comprises: sending, by the target physical CPU, the TLB flush request to an inter-core interconnect, the inter-core interconnect being a bus or a network on chip (NoC); receiving, by the inter-core interconnect, the TLB flush request, determining that the TLB flush request corresponds to the target physical CPU, and obtaining, from the physical CPU range information, the physical CPU range S1 corresponding to the target physical CPU; and sending, by the inter-core interconnect, the TLB flush request to the remaining physical CPUs within the physical CPU range S1.
In the embodiments of this application, the target physical CPU may send a TLB flush request to the inter-core interconnect (that is, the communication medium, such as a bus or a network on chip), and the interconnect then looks up, in the physical CPU range information, the physical CPU range currently corresponding to the target physical CPU (for example, physical CPU range S1), thereby identifying the physical CPU range over which TLB consistency currently needs to be maintained. The interconnect can then send the TLB flush request to the physical CPUs in that range other than the target physical CPU, so that they synchronously update their TLB information. In this way, TLB consistency is maintained within the necessary, smaller range while guaranteeing correct TLB accesses, greatly reducing TLB consistency maintenance latency.
In a possible implementation, sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1 comprises: obtaining, by the target physical CPU, from the physical CPU range information, the physical CPU range S1 corresponding to the target physical CPU, and sending a TLB flush request to an inter-core interconnect, the TLB flush request carrying indication information related to the physical CPU range S1, the inter-core interconnect being a bus or a network on chip (NoC); receiving, by the inter-core interconnect, the TLB flush request and determining the physical CPU range S1 from it; and sending, by the inter-core interconnect, the TLB flush request to the remaining physical CPUs within the physical CPU range S1.
In the embodiments of this application, the target physical CPU itself may also look up, in the physical CPU range information, the physical CPU range currently corresponding to it (for example, physical CPU range S1), thereby identifying the physical CPU range over which TLB consistency currently needs to be maintained. The target physical CPU then sends a TLB flush request to the inter-core interconnect, and the interconnect, based on the indication information carried in the request, sends the TLB flush request to the physical CPUs in the corresponding range other than the target physical CPU, so that they synchronously update their TLB information. In this way, TLB consistency is maintained within the necessary, smaller range while guaranteeing correct TLB accesses, greatly reducing TLB consistency maintenance latency.
In a possible implementation, the method further comprises: receiving feedback signals sent by each of the M-1 physical CPUs within the physical CPU range S1, and determining, based on the feedback signals, that the TLB information maintained by each physical CPU within the physical CPU range S1 is consistent.
In the embodiments of this application, a physical CPU executing a TLB flush instruction (for example, the target physical CPU) may continue executing subsequent instructions only after receiving the feedback signals of the other physical CPUs within the current physical CPU range; that is, the target physical CPU must block until this round of TLB consistency maintenance completes, ensuring that subsequent TLB accesses do not go wrong. Compared with prior-art schemes in which all physical CPUs in the device or system must complete the TLB flush, leading to large maintenance overhead, long maintenance time, and long blocking of the target physical CPU, the embodiments of this application can maintain TLB consistency efficiently and conveniently within the necessary, smaller range while guaranteeing correct TLB accesses, greatly shortening the blocking time of the target physical CPU and improving the memory access performance of the whole system.
In a possible implementation, the TLB flush request carries corresponding TLB flush information; the TLB flush information comprises one or more of a process identifier corresponding to the first process, a virtual address corresponding to the modified page table information, and a virtual address range. The TLB flush request is specifically used by the remaining physical CPUs within the physical CPU range S1 to update, in hardware and based on the TLB flush information, their respective TLB information while continuing to run their respective threads.
In the embodiments of this application, the TLB flush request also carries corresponding TLB flush information, which may include, but is not limited to, the process identifier of the current process (for example, the first process), and the virtual address and virtual address range corresponding to the modified page table information. After the physical CPUs within the current physical CPU range receive the TLB flush request, they can complete the TLB flush quickly and accurately based on the TLB flush information carried in the request, keeping the TLB information maintained by each physical CPU in the range consistent. Furthermore, the TLB flush process in the embodiments of this application can be completed directly in hardware, without interrupting the software flows (for example, the first threads) running on the physical CPUs, further improving the efficiency and convenience of TLB consistency maintenance.
According to a second aspect, an embodiment of this application provides a method for maintaining a translation lookaside buffer, applied to an electronic device comprising multiple physical central processing units (CPUs). A first virtual machine runs on the electronic device; the first virtual machine currently comprises M first virtual CPUs; the M first virtual CPUs currently run on M of the multiple physical CPUs respectively; M is an integer greater than or equal to 1. The method comprises: determining a physical CPU range S1 currently corresponding to the first virtual machine, the physical CPU range S1 comprising the M physical CPUs on which the first virtual CPUs of the first virtual machine currently run; and updating, based on the page table information maintained by the first virtual machine, the TLB information maintained by each physical CPU within the physical CPU range S1.
With the method provided in the second aspect, in a virtualized scenario, the physical CPU range currently corresponding to any virtual machine (for example, the first virtual machine) can first be determined from the physical CPUs on which the virtual CPUs of that virtual machine (for example, the first virtual CPUs) are currently running (for example, physical CPU range S1). Then, when the page table information maintained by the virtual machine is modified, the TLB information maintained by each physical CPU within that range can be synchronously updated based on the modified page table information, avoiding TLB access errors when those physical CPUs run the virtual CPUs of the virtual machine. In the prior art, after the page table information of a virtual machine is modified, a TLB flush request can only be sent indiscriminately to all physical CPUs in the device or system, and the sender must then wait for all of them to complete the TLB flush (that is, the TLB information update), making the whole TLB consistency maintenance long and costly. By contrast, the embodiments of this application maintain TLB consistency within a well-defined, smaller range, greatly reducing maintenance overhead and latency and effectively improving the memory access performance of the whole device or system. It should be understood that since each virtual machine maintains its own page table information, TLB consistency maintenance only needs to be performed on the physical CPUs currently running that virtual machine. Thus, on the premise of effectively avoiding TLB access errors, the embodiments of this application greatly narrow the physical CPU range over which TLB consistency must be maintained each time, achieving efficient, convenient, and accurate TLB consistency maintenance.
It can be understood that the scheme of the embodiments of this application is the same in virtualized and non-virtualized scenarios; for the beneficial effects in the virtualized scenario, refer to the description of the non-virtualized scenario above, which is not repeated here.
In combination with the method provided in the first aspect, the embodiments of this application can, by maintaining the physical CPU range corresponding to each process, perform efficient and convenient TLB consistency maintenance over the necessary, smaller physical CPU range in both virtualized and non-virtualized scenarios.
In a possible implementation, the M physical CPUs comprise a first physical CPU and M-1 second physical CPUs, where, before the first virtual CPU runs on the first physical CPU, a second virtual CPU runs on the first physical CPU, and M-1 of the M first virtual CPUs run on the M-1 second physical CPUs respectively. The method further comprises: after the virtual CPU on the first physical CPU is switched from the second virtual CPU to the first virtual CPU of the first virtual machine, determining whether the second virtual CPU belongs to the first virtual machine; and if the second virtual CPU does not belong to the first virtual machine, updating a physical CPU range S2 corresponding to the first virtual machine to obtain the current physical CPU range S1, where the physical CPU range S2 comprises the M-1 second physical CPUs on which the first virtual CPUs of the first virtual machine ran before the update.
In a possible implementation, the second virtual CPU belongs to a second virtual machine; before the first virtual CPU runs on the first physical CPU, the N second virtual CPUs of the second virtual machine run respectively on the first physical CPU and on N-1 third physical CPUs among the multiple physical CPUs; N is an integer greater than or equal to 1. The method further comprises: after the virtual CPU on the first physical CPU is switched from the second virtual CPU to the first virtual CPU of the first virtual machine, updating a physical CPU range S3 corresponding to the second virtual machine to obtain a physical CPU range S4, where the physical CPU range S3 comprises the first physical CPU and the N-1 third physical CPUs on which the second virtual CPUs of the second virtual machine ran before the update, and the physical CPU range S4 comprises the N-1 third physical CPUs on which the second virtual CPUs of the second virtual machine currently run.
In a possible implementation, the method further comprises: based on the updates of the physical CPU ranges corresponding to the first virtual machine and the second virtual machine, updating the physical CPU range corresponding to the first physical CPU from the physical CPU range S3 to the physical CPU range S1; updating the physical CPU range corresponding to each of the M-1 second physical CPUs from the physical CPU range S2 to the physical CPU range S1; and updating the physical CPU range corresponding to each of the N-1 third physical CPUs from the physical CPU range S3 to the physical CPU range S4.
In a possible implementation, physical CPU range information is stored in the electronic device; the physical CPU range information currently comprises at least the physical CPU range S1 corresponding to each of the M physical CPUs, and the physical CPU range S4 corresponding to each of the N-1 third physical CPUs.
In a possible implementation, updating, based on the page table information maintained by the first virtual machine, the TLB information maintained by each physical CPU within the physical CPU range S1 comprises: after the page table information maintained by the first virtual machine is modified by a first virtual CPU currently running on a target physical CPU among the M physical CPUs, updating the TLB information maintained by the target physical CPU based on the modified page table information; and sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1, the TLB flush request being used by the remaining physical CPUs within the physical CPU range S1 to synchronously update their respective TLB information so that the TLB information maintained by each physical CPU within the physical CPU range S1 is consistent.
In a possible implementation, sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1 comprises: sending, by the target physical CPU, the TLB flush request to an inter-core interconnect, the inter-core interconnect being a bus or a network on chip (NoC); receiving, by the inter-core interconnect, the TLB flush request, determining that the TLB flush request corresponds to the target physical CPU, and obtaining, from the physical CPU range information, the physical CPU range S1 corresponding to the target physical CPU; and sending, by the inter-core interconnect, the TLB flush request to the remaining physical CPUs within the physical CPU range S1.
In a possible implementation, sending, by the target physical CPU, a TLB flush request to the remaining physical CPUs within the physical CPU range S1 comprises: obtaining, by the target physical CPU, from the physical CPU range information, the physical CPU range S1 corresponding to the target physical CPU, and sending a TLB flush request to an inter-core interconnect, the TLB flush request carrying indication information related to the physical CPU range S1, the inter-core interconnect being a bus or a network on chip (NoC); receiving, by the inter-core interconnect, the TLB flush request and determining the physical CPU range S1 from it; and sending, by the inter-core interconnect, the TLB flush request to the remaining physical CPUs within the physical CPU range S1.
In a possible implementation, the method further comprises: receiving feedback signals sent by each of the M-1 physical CPUs within the physical CPU range S1, determining, based on the feedback signals, that the TLB information maintained by each physical CPU within the physical CPU range S1 is consistent, and executing subsequent instructions.
In a possible implementation, the TLB flush request carries corresponding TLB flush information; the TLB flush information comprises one or more of a virtual machine identifier corresponding to the first virtual machine, a virtual address within the virtual machine corresponding to the modified page table information, and a virtual address range. The TLB flush request is specifically used by the remaining physical CPUs within the physical CPU range S1 to update, in hardware and based on the TLB flush information, their respective TLB information while continuing to run their respective virtual CPUs.
According to a third aspect, an embodiment of this application provides an electronic device comprising multiple physical central processing units (CPUs). A first process runs on the electronic device; the first process currently comprises M first threads, and the M first threads currently run on M of the multiple physical CPUs respectively; M is an integer greater than 1. The M physical CPUs are configured to determine a physical CPU range S1 corresponding to the first process, the physical CPU range S1 comprising the M physical CPUs; and the M physical CPUs are configured to, when the page table information maintained by the first process is modified, synchronously update, based on the modified page table information, the TLB information maintained by each physical CPU within the physical CPU range S1.
Optionally, for the specific functions of the electronic device in the third aspect, refer to the method flow provided in the first aspect above; details are not repeated here.
According to a fourth aspect, an embodiment of this application provides an electronic device comprising multiple physical central processing units (CPUs). A first virtual machine runs on the electronic device; the first virtual machine currently comprises M first virtual CPUs, and the M first virtual CPUs currently run on M of the multiple physical CPUs respectively; M is an integer greater than 1. The M physical CPUs are configured to determine a physical CPU range S1 corresponding to the first virtual machine, the physical CPU range S1 comprising the M physical CPUs; and the M physical CPUs are configured to, when the page table information maintained by the first virtual machine is modified, synchronously update, based on the modified page table information, the TLB information maintained by each physical CPU within the physical CPU range S1.
Optionally, for the specific functions of the electronic device in the fourth aspect, refer to the method flow provided in the second aspect above; details are not repeated here.
According to a fifth aspect, an embodiment of this application provides an electronic device comprising a processor, the processor being configured to support the electronic device in performing the corresponding functions of any method for maintaining a translation lookaside buffer provided in the first or second aspect. The electronic device may further comprise a memory coupled to the processor and storing the program instructions and data necessary for the electronic device. The electronic device may further comprise a communication interface for communication between the electronic device and other devices or a communication network.
According to a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any method flow for maintaining a translation lookaside buffer provided in the first or second aspect.
According to a seventh aspect, an embodiment of this application provides a computer program comprising instructions which, when executed by a computer, cause the computer to perform any method flow for maintaining a translation lookaside buffer provided in the first or second aspect.
According to an eighth aspect, an embodiment of this application provides a chip comprising a processor and a communication interface, the processor being configured to call and run instructions from the communication interface; when the processor executes the instructions, the chip performs any method flow for maintaining a translation lookaside buffer provided in the first or second aspect.
According to a ninth aspect, an embodiment of this application provides a chip system comprising the electronic device of any one of the third or fourth aspects, configured to implement the functions involved in any method flow for maintaining a translation lookaside buffer provided in the first or second aspect. In a possible design, the chip system further comprises a memory configured to store the program instructions and data necessary for the method for maintaining a translation lookaside buffer. The chip system may consist of a chip, or may comprise a chip and other discrete components.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application.
FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of a method for maintaining a translation lookaside buffer provided by an embodiment of this application.
FIG. 4a is a schematic flowchart of updating a physical CPU range provided by an embodiment of this application.
FIG. 4b is a schematic diagram of physical CPU range information provided by an embodiment of this application.
FIG. 4c is a schematic flowchart of updating physical CPU range information provided by an embodiment of this application.
FIG. 5a is a schematic flowchart of updating a physical CPU range provided by another embodiment of this application.
FIG. 5b is a schematic diagram of physical CPU range information provided by another embodiment of this application.
FIG. 5c is a schematic flowchart of updating physical CPU range information provided by another embodiment of this application.
FIG. 6 is a schematic flowchart of a TLB flush provided by an embodiment of this application.
FIG. 7 is a schematic flowchart of another TLB flush provided by an embodiment of this application.
FIG. 8 is a schematic flowchart of yet another TLB flush provided by an embodiment of this application.
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
The embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", and so on in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. Moreover, the terms "comprise" and "have", and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further comprises unlisted steps or units, or optionally further comprises other steps or units inherent to such a process, method, product, or device. It should be noted that when an element is described as being "coupled" or "connected" to one or more other elements, it may be directly connected to the other element(s) or indirectly connected to them.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refers to any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c may denote: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a processor and the processor itself may be components. One or more components may reside within a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers. Furthermore, these components may execute from various computer-readable media on which various data structures are stored. The components may communicate by way of local and/or remote processes, for example, according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, a distributed system, and/or across a network, such as the Internet interacting with other systems by way of the signal).
First, some terms used in this application are explained to facilitate understanding by those skilled in the art.
(1) A physical central processing unit (CPU) usually refers to a CPU actually installed in a computer. CPUs can generally be divided into single-core CPUs and multi-core CPUs, where a single-core CPU generally includes a single CPU core (also called a physical core, that is, the compute core mentioned above), and a multi-core CPU may include multiple CPU cores. It should be noted that a physical CPU in the embodiments of this application may be a single-core CPU, a multi-core CPU, or one CPU core of a multi-core CPU; this is not explained again below. In some non-virtualized scenarios of the embodiments of this application, multiple physical CPUs may run multiple threads in parallel, and the multiple threads may belong to different processes. In some virtualized scenarios of the embodiments of this application, multiple physical CPUs may run multiple virtual CPUs (vCPUs) in parallel, and the multiple virtual CPUs may belong to different virtual machines.
(2) A process is a single run of a program with certain independent functionality over some data set, and corresponds to the execution of the program. A process can often include multiple threads.
(3) A virtual machine is a complete computer system, simulated by software, that has complete hardware system functionality and runs in a fully isolated environment. A virtual machine can often include multiple virtual CPUs.
(4) A page table is a special data structure that stores the correspondence between logical (virtual) addresses and physical addresses. Each process (or virtual machine) owns its own page table, which is shared by the multiple threads of the process (or the multiple virtual CPUs of the virtual machine); when a thread (or virtual CPU) needs to fetch data, it can look up the page table to obtain the physical address from which to fetch.
(5) A translation lookaside buffer (TLB) is a small, virtually addressed cache in which each line holds a block consisting of a single page table entry (PTE). The TLB mediates between virtual and physical addresses, providing a cache for physical-address lookup that can effectively reduce the time spent finding physical addresses. Without a TLB, every data fetch would require two memory accesses: one to look up the page table for the physical address, and one to fetch the data. Simply put, the TLB is a cache of the page table, storing the page table entries most likely to be accessed; its content is a copy of part of the page table. In the embodiments of this application, each process (or virtual machine) maintains its own page table, shared by the threads of the process (or the virtual CPUs of the virtual machine). When a thread of a process (or a virtual CPU of a virtual machine) running on the local physical CPU modifies the page table information, the local physical CPU must update its own TLB information accordingly, and must also notify the other physical CPUs to synchronously update their TLB information, so as to keep the TLB information consistent and avoid TLB access errors.
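The page-table-cache behavior in (5), and why a flush is needed after a page table modification, can be illustrated with a minimal software model (our own simplified sketch; a real TLB is hardware with limited capacity and replacement policies, which this omits):

```python
# Minimal illustrative model of a TLB as a cache over a page table.
class SimpleTLB:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page -> physical frame
        self.entries = {}              # cached subset of the page table
        self.hits = 0
        self.misses = 0

    def translate(self, vpage):
        if vpage in self.entries:      # TLB hit: one lookup
            self.hits += 1
            return self.entries[vpage]
        self.misses += 1               # TLB miss: walk the page table
        frame = self.page_table[vpage]
        self.entries[vpage] = frame    # cache the entry for next time
        return frame

    def flush(self, vpage=None):
        """Invalidate one cached entry, or all of them (as a TLB flush does)."""
        if vpage is None:
            self.entries.clear()
        else:
            self.entries.pop(vpage, None)
```

The key point: once the page table changes, a cached translation is stale until the TLB is flushed, which is exactly the consistency problem the rest of this description addresses.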
As described above, when multiple physical CPUs run multiple threads or multiple virtual CPUs in parallel, TLB consistency must be maintained across the physical CPUs to avoid TLB access errors. Several techniques exist for this, for example the common TLB shootdown scheme, which maintains TLB consistency via inter-processor interrupts (IPIs), and the TLB broadcast scheme, which maintains TLB consistency via hardware broadcast instructions.
Taking the TLB shootdown scheme as an example: in a non-virtualized scenario, the physical CPUs on which the page-table-sharing threads of a process run maintain and update their respective TLB information in software. In a virtualized scenario, the physical CPUs on which the vCPUs run likewise maintain and update their respective TLB information in software. Specifically, when any thread of a process (or any vCPU of a virtual machine) running on some physical CPU (for example, the local physical CPU) modifies the page table information shared among multiple cores, the other physical CPUs can be notified via inter-processor interrupts to flush (or invalidate) the corresponding TLB entries, keeping the TLB information on each physical CPU up to date and valid.
However, software maintenance is expensive and causes considerable delay. Specifically, in a non-virtualized scenario, the CPU generating the IPI must remain blocked from the moment it sends the interrupt notifying the other physical CPUs until all of them have responded and finished updating their TLB information. Moreover, in a virtualized scenario, to maintain TLB consistency the virtual machine must trap out to the hypervisor to send the IPI notifying remote virtual CPUs to flush their TLBs; this causes the virtual machine to trap in and out, lengthening the software path and increasing consistency maintenance overhead. Furthermore, if one of the remote vCPUs is offline, the TLB flush request can only be serviced once that vCPU is scheduled to run again, raising the blocking latency of the virtual machine by another order of magnitude and further increasing TLB consistency maintenance latency in virtualized scenarios.
Taking the TLB broadcast scheme as an example: in a non-virtualized scenario, when any thread of a process (or any vCPU of a virtual machine) running on some physical CPU modifies the page table information shared among multiple cores, all the other physical CPUs in the single-machine system (for example, a server) can be notified directly via hardware broadcast to flush their TLB information. Likewise, the physical CPU executing the hardware broadcast instruction must remain blocked until it receives signals from all the other physical CPUs in the system indicating that their TLB flushes are complete before it can continue executing subsequent instructions. The virtualized scenario is analogous and is not detailed here.
Compared with the TLB shootdown scheme above, the TLB broadcast scheme uses hardware broadcast and therefore does not interrupt the work running on the other physical CPUs, has no software-path overhead, and avoids the lengthened TLB consistency maintenance latency caused by virtual machine traps in virtualized scenarios, so its performance is better. However, as the number of physical CPUs in a system grows, the broadcast mechanism's indiscriminate notification of physical CPUs occupies the bus for long periods, causing heavy bus contention; TLB consistency maintenance latency therefore keeps increasing, and the scheme lacks good scalability.
Therefore, to solve the long-latency problem of current TLB consistency maintenance techniques, the technical problems actually addressed by this application include: updating, in software, the physical CPU range corresponding to a currently running process or virtual machine; and, when the page table information maintained by any process is modified, maintaining, in hardware, TLB consistency within that physical CPU range. This greatly reduces the time consumed by TLB consistency maintenance and improves the memory access performance and efficiency of the whole system.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application. The technical solution of the embodiments of this application can be implemented in the system architecture illustrated in FIG. 1 or a similar one. As shown in FIG. 1, the system architecture may include multiple physical CPUs and a physical CPU range information module, where the multiple physical CPUs may specifically include CPU-1, CPU-2, and CPU-3. As shown in FIG. 1, CPU-1, CPU-2, and CPU-3 each contain a CPU-directed TLB flush module, while the kernel or hypervisor corresponding to each of CPU-1, CPU-2, and CPU-3 also deploys a physical CPU range maintenance module. Further, as shown in FIG. 1, in the non-virtualized scenario, CPU-1, CPU-2, and CPU-3 may respectively run thread 1 of process 1, thread 2 of process 1, and thread 3 of process 2. As shown in FIG. 1, in the virtualized scenario, CPU-1, CPU-2, and CPU-3 may respectively run vCPU1 of virtual machine 1, vCPU2 of virtual machine 1, and vCPU3 of virtual machine 2. As shown in FIG. 1, the system architecture also includes an inter-core interconnect, through which CPU-1, CPU-2, and CPU-3 can communicate. Optionally, the inter-core interconnect may be implemented by, but is not limited to, technologies such as a bus or a network on chip (NoC).
The physical CPU range maintenance module is used to maintain the physical CPU range corresponding to the process or virtual machine currently running on a physical CPU; when the process or virtual machine running on a physical CPU is switched, it applies a specific policy to update the physical CPU ranges corresponding to the processes or virtual machines before and after the switch. Optionally, the physical CPU range maintenance module may be a software module; that is, the embodiments of this application may maintain the physical CPU range corresponding to each process or virtual machine in software.
Optionally, in a virtualized scenario, the hypervisor is responsible for maintaining the physical CPU range corresponding to each virtual machine, with the following maintenance policy:
(1) When a vCPU of a different virtual machine is switched online on a given physical CPU (that is, the virtual machine coming online differs from the virtual machine most recently run on that physical CPU), the hypervisor can update the physical CPU ranges corresponding to the two virtual machines before and after the switch. For example, as shown in FIG. 1, vCPU1 currently running on CPU-1 belongs to virtual machine 1; if at the next moment vCPU4 of virtual machine 2 (not shown in the figure) is switched online on CPU-1, that is, the virtual machine running on CPU-1 is switched from virtual machine 1 to virtual machine 2, the physical CPU ranges corresponding to virtual machine 1 and virtual machine 2 can each be updated. As shown in FIG. 1, before the switch, virtual machine 1 runs on CPU-1 and CPU-2, so its physical CPU range before the switch includes CPU-1 and CPU-2; after the switch, CPU-1 no longer runs virtual machine 1 (specifically, no longer runs its vCPU1), so CPU-1 must be deleted from the original physical CPU range to obtain the physical CPU range currently corresponding to virtual machine 1, which after the update includes CPU-2. Correspondingly, as shown in FIG. 1, before the switch, virtual machine 2 runs on CPU-3, so its physical CPU range before the switch includes CPU-3; after the switch, CPU-1 starts running virtual machine 2 (specifically, starts running its vCPU4), so CPU-1 must be added to the original physical CPU range to obtain the physical CPU range currently corresponding to virtual machine 2, which after the update includes CPU-1 and CPU-3. As described above, when updating a physical CPU range, only the variable part (that is, the physical CPU on which the virtual machine switch occurred, for example CPU-1) needs to be updated, without erasing and rewriting the entire original physical CPU range, improving the update efficiency of the physical CPU range and reducing software maintenance overhead. The same applies to the non-virtualized scenario below and is not repeated.
(2) When a vCPU of the same virtual machine is switched online on a given physical CPU (that is, the virtual machine coming online is the same as the virtual machine most recently run on that physical CPU), the physical CPU range corresponding to that virtual machine need not be updated, reducing the update frequency of the physical CPU range and the software maintenance overhead. For example, as shown in FIG. 1, vCPU1 currently running on CPU-1 belongs to virtual machine 1; if at the next moment vCPU5 of virtual machine 1 (not shown in the figure) is switched online on CPU-1, that is, the virtual machine running on CPU-1 is still virtual machine 1, the original physical CPU range of virtual machine 1 (including CPU-1 and CPU-2) is kept.
Further, in the situation described in (1), if a virtual machine switch occurred on the physical CPU, then when the physical CPU ranges of the two virtual machines are updated, the TLB information on that physical CPU related to the virtual machine about to come online must also be flushed accordingly, so that the TLB information currently maintained on that physical CPU corresponds to the page table information maintained by the currently running virtual machine. It can be understood that in the situation described in (2), if the physical CPU range need not be updated, that is, no virtual machine switch occurred on the physical CPU, the TLB information on that physical CPU likewise need not be flushed.
Optionally, in a non-virtualized scenario, the kernel is responsible for maintaining the physical CPU range corresponding to each process, with the following maintenance policy:
(1) When a thread of a different process is switched online on a given physical CPU (that is, the process coming online differs from the process most recently run on that physical CPU), the kernel can update the physical CPU ranges corresponding to the two processes before and after the switch. For example, as shown in FIG. 1, thread 1 currently running on CPU-1 belongs to process 1; if at the next moment thread 4 of process 2 (not shown in the figure) is switched online on CPU-1, that is, the process running on CPU-1 is switched from process 1 to process 2, the physical CPU ranges corresponding to process 1 and process 2 can each be updated. As shown in FIG. 1, before the switch, process 1 runs on CPU-1 and CPU-2, so its physical CPU range before the switch includes CPU-1 and CPU-2; after the switch, CPU-1 no longer runs process 1 (specifically, no longer runs a thread of process 1), so CPU-1 must be deleted from the original physical CPU range to obtain the physical CPU range currently corresponding to process 1, which after the update includes CPU-2. Correspondingly, as shown in FIG. 1, before the switch, process 2 runs on CPU-3, so its physical CPU range before the switch includes CPU-3; after the switch, CPU-1 starts running process 2 (specifically, starts running thread 4 of process 2), so CPU-1 must be added to the original physical CPU range to obtain the physical CPU range currently corresponding to process 2, which after the update includes CPU-1 and CPU-3.
(2) When a thread of the same process is switched online on a given physical CPU (that is, the process coming online is the same as the process most recently run on that physical CPU), the physical CPU range corresponding to that process need not be updated, reducing the update frequency of the physical CPU range and the software maintenance overhead. For example, as shown in FIG. 1, thread 1 currently running on CPU-1 belongs to process 1; if at the next moment thread 5 of process 1 (not shown in the figure) is switched online on CPU-1, that is, the process running on CPU-1 is still process 1, the original physical CPU range of process 1 (including CPU-1 and CPU-2) is kept.
In addition, it should be noted that kernel-mode threads do not belong to the scope of the above processes. Therefore, switches between a process and a kernel-mode thread do not affect the process's physical CPU range.
Further, in the situation described in (1), if a process switch occurred on the physical CPU, then when the physical CPU ranges of the two processes are updated, the TLB information on that physical CPU related to the process about to come online must also be flushed accordingly, so that the TLB information currently maintained on that physical CPU corresponds to the page table information maintained by the currently running process. It can be understood that in the situation described in (2), if the physical CPU range need not be updated, that is, no process switch occurred on the physical CPU, the TLB information on that physical CPU likewise need not be flushed.
The physical CPU range information module is used to store, for each physical CPU, the physical CPU range that is globally visible to software and hardware (that is, the physical CPU range corresponding to the process or virtual machine currently running on each physical CPU). Specifically, in a non-virtualized scenario, the physical CPU range information contains the physical CPUs on which the page-table-sharing threads of each process are located; in a virtualized scenario, it contains the physical CPUs on which the vCPUs of each virtual machine are located. Optionally, the physical CPU range information module can be accessed by existing or newly added software modules (for example, kernel software) and hardware modules, to update or obtain the physical CPU range currently corresponding to each physical CPU (or, in other words, to the process or virtual machine running on it).
To emphasize again, the physical CPU range in the embodiments of this application indicates which physical CPUs a process or virtual machine is currently running on, and further indicates the physical CPU range over which the process or virtual machine needs to maintain TLB consistency. Optionally, a physical CPU range may be implemented by, but is not limited to, descriptions such as a bitmap. As described above, the physical CPU range information module is globally visible; ways of making its physical CPU range information globally visible include, but are not limited to, technical means such as kernel address mapping.
Further, the physical CPU range information module allocates a corresponding storage space for each physical CPU, dedicated to storing the physical CPU range maintained by the process or virtual machine running on that CPU. Optionally, the physical CPU range information module may consist of a register file, or may be part of the memory address space, or a cache, and so on; this is not specifically limited in the embodiments of this application. Optionally, the physical CPU range information module may be located at a shared location in the device or system, or may be located in the physical CPUs (for example, CPU-1, CPU-2, and CPU-3). In addition, within the physical CPU range information module, the embodiments of this application may also record the physical CPU range information in separate physical address spaces for the non-virtualized and virtualized scenarios, so that when a CPU switches between virtualized and non-virtualized modes, the physical CPU range information stored on that CPU need not be updated.
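The bitmap encoding of a physical CPU range mentioned above can be sketched as follows (a simplified software model with our own helper names; a real implementation would use the per-CPU registers or mapped memory described here, and the patent only requires a bitmap-like description):

```python
# Illustrative model of a physical CPU range stored as a bitmap,
# one bit per physical CPU (bit 0 = CPU-1, bit 1 = CPU-2, ...).
NCPU = 128

def range_add(bitmap: int, cpu: int) -> int:
    """Mark a physical CPU as part of the range."""
    assert 0 <= cpu < NCPU
    return bitmap | (1 << cpu)

def range_del(bitmap: int, cpu: int) -> int:
    """Remove a physical CPU from the range."""
    return bitmap & ~(1 << cpu)

def range_cpus(bitmap: int):
    """List the physical CPUs (0-based bit indices) in the range."""
    return [cpu for cpu in range(NCPU) if bitmap & (1 << cpu)]
```

With 64-bit words, such a 128-bit map occupies two registers per CPU, matching the LEN=64, N=128 sizing used in the register-file example below.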
The CPU-directed TLB flush module is used so that, after a thread running on the local physical CPU (for example, thread 1 running on CPU-1) modifies the page table information maintained by the current process (for example, process 1), the local physical CPU can update the TLB information it maintains based on the modified page table information (for example, flushing or invalidating the corresponding TLB entries), and, based on the physical CPU range currently corresponding to the process (for example, including CPU-1 and CPU-2), send a TLB flush request to all the other physical CPUs in that range (for example, CPU-2), greatly narrowing the physical CPU range of the TLBs that hardware must maintain.
Further, the scope of the CPU-directed TLB flush module includes, but is not limited to, a module on the CPU side and the inter-core interconnect connected to the CPU. Optionally, the inter-core interconnect shown in FIG. 1 can obtain, from the physical CPU range information module, the physical CPU range corresponding to the current physical CPU, so as to send the TLB flush request to the designated physical CPUs (that is, all the physical CPUs in that range other than the current physical CPU), and receive the feedback signals from those designated physical CPUs, confirming that their TLB flush actions have finished, that is, determining that this round of TLB consistency maintenance is complete.
In summary, the parts of this system architecture may be located in an electronic device, which may be a smart wearable device, a smartphone, a tablet, a laptop, a desktop computer, an in-vehicle computer, or a server; optionally, it may be one server, or multiple servers forming a server cluster or a cloud computing service center. The electronic device may also be a component of the above devices, for example a chip with the above functions; this is not specifically limited in the embodiments of this application.
Referring to FIG. 2, FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of this application. As shown in FIG. 2, the embodiments of this application can be applied to non-virtualized scenarios in private clouds and virtualized scenarios in public clouds, and can serve server types such as virtual servers, function compute servers, bare-metal servers, and container servers.
First, as shown in FIG. 2, at the software level, the software flow of the embodiments of this application can be applied in the host operating system (that is, the kernel) or the hypervisor layer, with the software deployed on a physical server equipped with a CPU-directed TLB flush module and a physical CPU range information module.
It should be noted that in a public cloud scenario, the products of multiple companies (that is, customers) are all deployed on the public cloud. To achieve full resource isolation between companies, and to ensure that the services deployed by different companies do not affect one another or even spread harmful viruses, virtualization technology can be used to isolate the services deployed by each company and guarantee their security. The virtual servers and function compute servers shown in FIG. 2 are common servers in virtualized scenarios; in such scenarios, the virtual machine operating systems and secure containers in FIG. 2 can, based on the maintenance method provided by the embodiments of this application and by invoking TLB flush instructions, maintain the TLB consistency of virtual machines under a multi-core architecture (that is, multiple physical CPUs) efficiently and conveniently within a smaller range.
In a private cloud scenario, since each company deploys its own services locally only through its own servers (for example, the bare-metal servers shown in FIG. 2), there are no servers of other companies, that is, no services of other companies are deployed, so the services are not affected by other companies' services and virtualization is not needed for isolation. In this non-virtualized scenario, the user-mode processes on the bare-metal servers and the containers on the container servers shown in FIG. 2 can likewise, based on the maintenance method provided by the embodiments of this application and by invoking TLB flush instructions, maintain the TLB consistency of processes or containers under a multi-core architecture efficiently and conveniently within a smaller range.
In summary, the embodiments of this application are applicable to both virtualized and non-virtualized scenarios, maintaining TLB consistency within a smaller range, greatly reducing TLB maintenance latency in both scenarios, and effectively improving the memory access performance of the whole system.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a method for maintaining a translation lookaside buffer provided by an embodiment of this application. The method can be applied in the system architecture described in FIG. 1 or the application scenario shown in FIG. 2. The method can be applied to an electronic device, which may include multiple physical CPUs. In a non-virtualized scenario, a first process may run on the electronic device; the first process may currently include M first threads, which currently run on M of the multiple physical CPUs respectively; M is an integer greater than or equal to 1. Likewise, in a virtualized scenario, a first virtual machine may run on the electronic device; the first virtual machine may currently include M first virtual CPUs, which currently run on M of the multiple physical CPUs respectively. The method may include the following steps S301 and S302.
Step S301: determine the physical CPU range S1 corresponding to the first process or the first virtual machine, the physical CPU range S1 including the M physical CPUs.
Specifically, in a non-virtualized scenario, if M first threads of the first process currently run on M physical CPUs respectively, it can be determined that the physical CPU range S1 currently corresponding to the first process includes those M physical CPUs. In a virtualized scenario, if M first virtual CPUs of the first virtual machine currently run on M physical CPUs respectively, it can be determined that the physical CPU range S1 currently corresponding to the first virtual machine includes those M physical CPUs.
For example, in a non-virtualized scenario, the first process may include 20 first threads; 5 of the 20 first threads may currently run on 5 physical CPUs, while the other 15 first threads are not currently running on any physical CPU (they may have finished running, or may not yet have run). Then the physical CPU range currently corresponding to the first process (for example, physical CPU range S1) may include those 5 physical CPUs.
As another example, in a virtualized scenario, the first virtual machine may include 10 first virtual CPUs; 8 of the 10 first virtual CPUs may currently run on 8 physical CPUs, while the other 2 first virtual CPUs are not currently running on any physical CPU. Then the physical CPU range currently corresponding to the first virtual machine (for example, physical CPU range S1) may include those 8 physical CPUs.
Step S302: update, based on the page table information maintained by the first process or the first virtual machine, the TLB information maintained by each physical CPU within the physical CPU range S1.
Specifically, it should first be noted that, as described above, the TLB can be viewed as a cache of the page table, storing the page table entries most likely to be accessed; its content is a copy of part of the page table. That is, when the M first threads or first virtual CPUs of the first process or first virtual machine run on the M physical CPUs, the TLB information maintained on those M physical CPUs corresponds to the page table information maintained by the first process or first virtual machine. Therefore, when the page table information maintained by the first process or first virtual machine is modified (for example, one of the first threads or first virtual CPUs modifies it), the TLB information maintained on each of the M physical CPUs must be synchronously updated, keeping it up to date and valid on each of the M physical CPUs and avoiding TLB access errors.
In summary, the embodiments of this application take into account that each process (or virtual machine) maintains its own page table information, that the multiple threads (or virtual CPUs) within a process (or virtual machine) share that page table information, and that different processes (or virtual machines) are independent of one another. Therefore, when page table information is modified, TLB consistency maintenance only needs to be performed on the physical CPUs currently running threads (or virtual CPUs) of that process (or virtual machine) to effectively avoid TLB access errors. Thus, on the premise of effectively avoiding TLB access errors, the embodiments of this application narrow, to the greatest extent, the physical CPU range over which TLB consistency must be maintained each time, achieving efficient, convenient, and accurate TLB consistency maintenance.
Further, as described above, in a non-virtualized scenario, after any page-table-sharing thread of a process modifies the page table, the TLB information on the physical CPUs where the threads of that process run must be updated to guarantee correct memory accesses. To reduce the physical CPU range of the TLBs that hardware must maintain, in the embodiments of this application each process can use a software variable to record the physical CPUs on which its threads currently run (for example, the M physical CPUs on which the M first threads of the first process run), and the threads can share this information. Moreover, when the process running on a physical CPU is switched (that is, the thread running after the switch and the thread running before it belong to different processes), the physical CPU ranges of the two processes must be updated in real time, so that the physical CPU range corresponding to each process always contains accurate information about the physical CPUs currently running that process, providing an accurate range for subsequent TLB consistency maintenance and thus guaranteeing its effectiveness.
Referring to FIG. 4a, FIG. 4a is a schematic flowchart of updating a physical CPU range provided by an embodiment of this application. As shown in FIG. 4a, in a non-virtualized scenario, the flow for updating the physical CPU range corresponding to a process may include the following steps S401 to S405.
Step S401: the kernel scheduler schedules the first thread online on the first physical CPU.
Specifically, with reference to the embodiment corresponding to FIG. 3, the M physical CPUs may include a first physical CPU and M-1 second physical CPUs, where, before the first thread of the first process runs on the first physical CPU, a second thread may be running on the first physical CPU, while the M-1 second physical CPUs may already be running first threads. At this point, the physical CPU range S2 corresponding to the first process may include only the above M-1 second physical CPUs. Then, after the second thread on the first physical CPU finishes running, the kernel scheduler can schedule the first thread online on the first physical CPU. It should be noted that the first thread does not run immediately after coming online; it may run after the subsequent physical CPU range update is complete.
Step S402: determine whether the second thread, which last ran on the first physical CPU, belongs to the same process as the first thread.
Specifically, after the first thread comes online, the first physical CPU determines whether the second thread it last ran belongs to the same process (for example, the first process) as the current first thread, that is, whether a process switch has occurred on the first physical CPU. If the second thread does not belong to the first process, step S403 is executed; otherwise, step S405 is executed.
Step S403: update the physical CPU ranges of the processes to which the second thread and the first thread respectively belong.
Specifically, the second thread may belong to a second process; before the first thread of the first process runs on the first physical CPU, the N second threads of the second process may run on the first physical CPU and on the other N-1 third physical CPUs, N being an integer greater than 1. At this point, the physical CPU range S3 corresponding to the second process may include the first physical CPU and the N-1 third physical CPUs. Then, as described above, after the first thread of the first process comes online on the first physical CPU, the first physical CPU no longer runs the second thread of the second process; therefore, the physical CPU range S3 corresponding to the second process can be updated to physical CPU range S4 (including the N-1 third physical CPUs), and the physical CPU range S2 corresponding to the first process can be updated to physical CPU range S1 (including the first physical CPU and the M-1 second physical CPUs, that is, the above M physical CPUs).
Optionally, the physical CPUs on which the page-table-sharing threads of a process are located can be recorded by adding a variable or data structure in the kernel.
Optionally, it should be noted that when a new process (that is, the first process) is switched online on the first physical CPU, the page table information is simultaneously switched to the page table information maintained by the first process; the first physical CPU therefore needs to flush the TLB information that corresponded to the original second process and switch to (that is, update to) TLB information corresponding to the page table information of the current first process.
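Using a bitmap encoding, the range update in step S403 can be sketched as follows (helper names are ours; bit k denotes physical CPU k):

```python
# Sketch of step S403: on a process switch on `cpu`, remove the CPU from
# the outgoing process's range and add it to the incoming process's range.
def update_ranges_on_switch(prev_range: int, next_range: int, cpu: int):
    prev_range &= ~(1 << cpu)   # outgoing process: S3 -> S4
    next_range |= (1 << cpu)    # incoming process: S2 -> S1
    return prev_range, next_range
```

Only the bit of the switching CPU changes; the rest of both ranges is untouched, which is why the whole range never needs to be erased and rewritten.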
Step S404: update the physical CPU range information globally visible to software and hardware.
Specifically, according to the above updates of the physical CPU ranges corresponding to the first process and the second process, the physical CPU range information globally visible to software and hardware is synchronously updated.
It should be noted that this globally visible physical CPU range information includes the globally visible physical CPU range currently corresponding to each physical CPU. Optionally, in the embodiments of this application, the physical CPU range corresponding to the process currently running on each physical CPU can be taken as the physical CPU range currently corresponding to that physical CPU. Thus, based on the above updates of the physical CPU ranges corresponding to the first process and the second process, the physical CPU range corresponding to the first physical CPU can be updated from physical CPU range S3 to physical CPU range S1; the physical CPU range corresponding to each of the M-1 second physical CPUs can be updated from physical CPU range S2 to physical CPU range S1; and the physical CPU range corresponding to each of the N-1 third physical CPUs can be updated from physical CPU range S3 to physical CPU range S4. After the update, the globally visible physical CPU range information can include the physical CPU range S1 corresponding to each of the above first physical CPU and M-1 second physical CPUs (that is, the M physical CPUs), and the physical CPU range S4 corresponding to each of the above N-1 third physical CPUs.
Optionally, step S404 may update only the physical CPU ranges corresponding to the physical CPUs, other than the first physical CPU, within the physical CPU ranges of the first and second processes; that is, the physical CPU range corresponding to the local physical CPU may be left un-updated for now. This is not specifically limited in the embodiments of this application. For example, only the physical CPU range corresponding to each of the above M-1 second physical CPUs is updated from physical CPU range S2 to physical CPU range S1, and the physical CPU range corresponding to each of the N-1 third physical CPUs is updated from physical CPU range S3 to physical CPU range S4.
Optionally, referring to FIG. 4b, FIG. 4b is a schematic diagram of physical CPU range information provided by an embodiment of this application. As shown in FIG. 4b, the embodiments of this application can record (that is, store) the above globally visible physical CPU range information in a globally visible register file. As shown in FIG. 4b, the embodiments of this application add, for each physical CPU, two groups of registers accessible only in the host privileged state, namely cpu_bitmap and vcpu_bitmap. For example, in a 64-bit system supporting 128 cores, the system adds 2 cpu_bitmap and 2 vcpu_bitmap registers per physical CPU (LEN=64, N=128). As another example, in a 1024-core 64-bit system, the system adds 16 cpu_bitmap and 16 vcpu_bitmap registers per physical CPU (LEN=64, N=1024).
Optionally, in a non-virtualized scenario (that is, non-virtualized mode), cpu_bitmap can record the physical CPU range corresponding to the currently running process (that is, record the physical CPUs running the page-table-sharing threads of that process); in virtualized mode, vcpu_bitmap can record the physical CPU range corresponding to the currently running virtual machine (that is, record the physical CPUs running the vCPUs of that virtual machine).
Optionally, the added register file shown in FIG. 4b is globally visible, allowing the kernel and the hypervisor to access the cpu_bitmap and vcpu_bitmap register groups corresponding to all physical CPUs; access can be implemented by, but is not limited to, memory mapping techniques.
As shown in FIG. 4b, when the thread on a physical CPU is switched, if the process of the thread coming online differs from the process of the thread most recently run on that CPU (for example, CPU-2 in FIG. 4b: process-1 (thread-11) → process-2 (thread-21), or process-1 (thread-11) → kernel thread → process-2 (thread-21)), the variables storing the physical CPU ranges of the processes before and after the switch are updated (with respect to CPU-2), and based on these variables the cpu_bitmap information of the other physical CPUs within the physical CPU ranges of the two processes is updated. Specifically, the current physical CPU (that is, CPU-2) is removed from the cpu_bitmap information of the previous process, and added to the cpu_bitmap information of the next process.
Below, the update flow of the physical CPU range information globally visible to software and hardware in a non-virtualized scenario is further illustrated by example.
Referring to FIG. 4c, FIG. 4c is a schematic flowchart of updating physical CPU range information provided by an embodiment of this application. As shown in FIG. 4c, taking a 64-bit, 128-core non-virtualized scenario as an example, cpu_bitmap may include 128 bits, each bit denoting one physical CPU in order: for example, the first bit (bit[0]) denotes CPU-1, the second bit (bit[1]) denotes CPU-2, the third bit (bit[2]) denotes CPU-3, and so on. As shown in FIG. 4c, CPU-1 currently runs thread-11 of process-1, CPU-2 currently runs thread-12 of process-1, and CPU-3 currently runs thread-21 of process-2. At this point, the physical CPU range corresponding to process-1 includes CPU-1 and CPU-2, and that corresponding to process-2 includes CPU-3. Correspondingly, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2, that corresponding to CPU-2 includes CPU-1 and CPU-2, and that corresponding to CPU-3 includes CPU-3.
As shown in FIG. 4c, for CPU-1, the first two bits of cpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2. For CPU-2, the first two bits of cpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-2 includes CPU-1 and CPU-2. For CPU-3, the third bit of cpu_bitmap is 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-3 includes only CPU-3.
Further, as shown in FIG. 4c, when the process running on CPU-1 is switched from process-1 to process-2, the physical CPU range corresponding to each physical CPU is updated. As shown in FIG. 4c, after the switch on CPU-1, thread-22 of process-2 comes online on CPU-1, thread-12 of process-1 runs on CPU-2, and thread-21 of process-2 runs on CPU-3. Correspondingly, the physical CPU range corresponding to process-1 now includes CPU-2, and that corresponding to process-2 includes CPU-1 and CPU-3. Correspondingly, the globally visible physical CPU range corresponding to CPU-1 now includes CPU-1 and CPU-3, that corresponding to CPU-2 includes CPU-2, and that corresponding to CPU-3 includes CPU-1 and CPU-3. On this basis, as shown in FIG. 4c, the update flow may specifically include the following steps:
(1) Delete CPU-1 from the physical CPU range corresponding to process-1, and update the physical CPU range corresponding to CPU-2 accordingly. For example: CPU-2 cpu_bitmap bit[0]=0 // shrink the hardware-visible physical CPU range of process-1.
(2) Add CPU-1 to the physical CPU range corresponding to process-2, and update the physical CPU range corresponding to CPU-3 accordingly. For example: CPU-3 cpu_bitmap bit[0]=1 // extend the hardware-visible physical CPU range of process-2.
(3) Before thread-22 of process-2 starts running on CPU-1, overwrite the physical CPU range corresponding to CPU-1 with the physical CPU range currently corresponding to process-2 (including CPU-1 and CPU-3). For example: CPU-1 cpu_bitmap = physical CPU range.
In summary, as shown in FIG. 4c, after the physical CPU range information update completes, for CPU-1 the first and third bits of cpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-3. For CPU-2, the second bit of cpu_bitmap is 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-2 includes CPU-2. For CPU-3, the first and third bits of cpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-3 includes CPU-1 and CPU-3.
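The three steps of this FIG. 4c example can be simulated end to end (a software sketch with our own variable names; bit 0 denotes CPU-1, bit 1 CPU-2, bit 2 CPU-3, and each per-CPU cpu_bitmap holds the range of the process running on that CPU):

```python
# Sketch of the FIG. 4c update: process-1 leaves CPU-1, process-2 arrives.
def bit(cpu_index):           # bit 0 = CPU-1, bit 1 = CPU-2, bit 2 = CPU-3
    return 1 << cpu_index

# Initial state: process-1 on {CPU-1, CPU-2}, process-2 on {CPU-3}.
proc1_range = bit(0) | bit(1)
proc2_range = bit(2)
cpu_bitmap = {1: proc1_range, 2: proc1_range, 3: proc2_range}

# Switch on CPU-1: process-1 -> process-2.
proc1_range &= ~bit(0)        # (1) drop CPU-1 from process-1's range
cpu_bitmap[2] = proc1_range   #     propagate to CPU-2
proc2_range |= bit(0)         # (2) add CPU-1 to process-2's range
cpu_bitmap[3] = proc2_range   #     propagate to CPU-3
cpu_bitmap[1] = proc2_range   # (3) overwrite CPU-1 before thread-22 runs
```

Only two remote bitmaps receive a one-bit update; CPU-1's own bitmap is overwritten wholesale just before the incoming thread starts.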
It should be noted that, as shown in FIG. 4c, each bit denotes one physical CPU, so at any moment only one CPU will update a given bit of cpu_bitmap (that is, there is no concurrent write to the same bit), and the whole update process therefore needs no lock on cpu_bitmap (that is, lock-free update). Optionally, when one bit denotes a set of physical CPUs (for example, a cluster comprising multiple physical CPU groups), the corresponding physical CPU range can be updated while holding a lock, to prevent concurrent read/write errors; the same applies to the virtualized scenario below and is not repeated. Optionally, the locking granularity may be a bit, a byte, a halfword, a word, a doubleword, and so on. Optionally, the lock types include, but are not limited to, read-write locks, mutexes, and atomic compare-and-swap (CAS) instructions; this is not specifically limited in the embodiments of this application.
Optionally, the embodiments of this application may also record (that is, store) the above globally visible physical CPU range information in a globally visible memory address space. Specifically, physical memory address space can replace the register file: two fixed memory regions are set aside, respectively storing the physical CPU ranges corresponding to the virtual machines or processes running on each physical CPU. Whenever software or hardware obtains the physical CPU range corresponding to a physical CPU, it does so by accessing memory in that physical address space. Optionally, the embodiments of this application may also use a cache instead of a register file to record (that is, store) the above globally visible physical CPU range information, and so on; this is not specifically limited in the embodiments of this application.
Step S405: update the globally visible physical CPU range corresponding to the first physical CPU according to the physical CPU range corresponding to the process to which the first thread belongs.
Specifically, if the first thread coming online and the previously run second thread belong to the same process, that is, no process switch occurred on the physical CPU, the globally visible physical CPU range corresponding to the first physical CPU can be updated according to the physical CPU range currently corresponding to the first process. It can be understood that although no process switch occurred on the first physical CPU, a process switch may have occurred on another CPU, in which case the physical CPU range corresponding to the relevant process (for example, the first process) has been updated; the first physical CPU then still needs to update its globally visible physical CPU range according to the physical CPU range currently corresponding to the first process. For example, as shown in FIG. 4c, CPU-2 may be the first physical CPU: CPU-2 may switch thread-13 of process-1 (not shown in the figure) online, but at this moment a process switch has occurred on CPU-1 and the physical CPU ranges of process-1 and process-2 have been updated; CPU-2 therefore needs to update its globally visible physical CPU range according to the current physical CPU range of process-1 (including CPU-2).
Correspondingly, as described above, in a virtualized scenario, after any page-table-sharing virtual CPU of a virtual machine modifies the page table, the TLB information on the physical CPUs where the virtual CPUs of that virtual machine run must be updated to guarantee correct memory accesses. To reduce the physical CPU range of the TLBs that hardware must maintain, in the embodiments of this application each virtual machine can use a software variable to record the physical CPUs on which its virtual CPUs currently run (for example, the M physical CPUs on which the M first virtual CPUs of the first virtual machine run), and the virtual CPUs can share this information. Moreover, when the virtual machine running on a physical CPU is switched (that is, the virtual CPU running after the switch and the virtual CPU running before it belong to different virtual machines), the physical CPU ranges of the two virtual machines must be updated in real time, so that the physical CPU range corresponding to each virtual machine always contains accurate information about the physical CPUs currently running that virtual machine, providing an accurate range for subsequent TLB consistency maintenance and thus guaranteeing its effectiveness.
Referring to FIG. 5a, FIG. 5a is a schematic flowchart of updating a physical CPU range provided by another embodiment of this application. As shown in FIG. 5a, in a virtualized scenario, the flow for updating the physical CPU range corresponding to a virtual machine may include the following steps S501 to S505.
Step S501: the hypervisor schedules the first virtual CPU online on the first physical CPU.
Specifically, for step S501 refer to step S401 in the embodiment corresponding to FIG. 4a above; details are not repeated here.
Step S502: determine whether the second virtual CPU, which last ran on the first physical CPU, belongs to the same virtual machine as the first virtual CPU.
Specifically, for step S502 refer to step S402 in the embodiment corresponding to FIG. 4a above; details are not repeated here.
Step S503: update the physical CPU ranges of the virtual machines to which the second virtual CPU and the first virtual CPU respectively belong.
Specifically, for step S503 refer to step S403 in the embodiment corresponding to FIG. 4a above; details are not repeated here.
Step S504: update the physical CPU range information globally visible to software and hardware.
Specifically, for step S504 refer to step S404 in the embodiment corresponding to FIG. 4a above; details are not repeated here.
Optionally, referring to FIG. 5b, FIG. 5b is a schematic diagram of physical CPU range information provided by another embodiment of this application. For the parts of FIG. 5b, refer to the description of the embodiment corresponding to FIG. 4b above; details are not repeated here. As shown in FIG. 5b, when the vCPU on a physical CPU is switched, if the virtual machine (VM) of the vCPU coming online differs from the VM of the vCPU most recently run on that CPU (for example, CPU-2 in FIG. 5b: VM-1 (vCPU-11) → VM-2 (vCPU-21), or VM-1 (vCPU-11) → host process → VM-2 (vCPU-21)), the variables storing the physical CPU ranges of the VMs before and after the switch are updated (with respect to CPU-2), and based on these variables the bitmap information of the other physical CPUs within the physical CPU ranges of the two VMs is updated. Specifically, the current physical CPU (that is, CPU-2) is removed from the bitmap information of the previous VM, and added to the bitmap information of the next VM.
Below, the update flow of the physical CPU range information globally visible to software and hardware in a virtualized scenario is further illustrated by example.
Optionally, referring to FIG. 5c, FIG. 5c is a schematic flowchart of updating physical CPU range information provided by another embodiment of this application. As shown in FIG. 5c, taking a 64-bit, 128-core virtualized scenario as an example, vcpu_bitmap may include 128 bits, each bit denoting one physical CPU in order: for example, the first bit (bit[0]) denotes CPU-1, the second bit (bit[1]) denotes CPU-2, the third bit (bit[2]) denotes CPU-3, and so on. As shown in FIG. 5c, CPU-1 currently runs vCPU-11 of VM-1, CPU-2 currently runs vCPU-12 of VM-1, and CPU-3 currently runs vCPU-21 of VM-2. At this point, the physical CPU range corresponding to VM-1 includes CPU-1 and CPU-2, and that corresponding to VM-2 includes CPU-3. Correspondingly, the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2, that corresponding to CPU-2 includes CPU-1 and CPU-2, and that corresponding to CPU-3 includes CPU-3.
As shown in FIG. 5c, for CPU-1, the first two bits of vcpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-1 includes CPU-1 and CPU-2. For CPU-2, the first two bits of vcpu_bitmap are 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-2 includes CPU-1 and CPU-2. For CPU-3, the third bit of vcpu_bitmap is 1 and the remaining bits are 0, indicating that the globally visible physical CPU range corresponding to CPU-3 includes only CPU-3.
进一步地,如图5c所示,当CPU-2上运行的VM发生切换,由VM-1切换至VM-2时,更新各个物理CPU对应的物理CPU范围。如图5c所示,当CPU-2上运行的VM发生切换后,此时CPU-2上线VM-2中的vCPU-22,CPU-1上运行有VM-1中的vCPU-11,CPU-3上运行有VM-2中的vCPU-21。相应的,此时VM-1对应的物理CPU范围包括CPU-1,VM-2对应的物理CPU范围包括CPU-2和CPU-3。相应的,此时CPU-1对应的软、硬件全局可见的物理CPU范围包括CPU-1,此时CPU-2对应的软、硬件全局可见的物理CPU范围包括CPU-2和CPU-3,此时CPU-3对应的软、硬件全局可见的物理CPU范围包括CPU-1和CPU-3。基于此,如图5c所示,该更新流程具体可包括以下步骤:
(1)在VM-1对应的物理CPU范围内删除CPU-2,并基于此更新CPU-1对应的物理CPU范围。例如CPU-1cpu_bitmap bit[1]=0//减少硬件可见的VM-1的物理CPU范围。
(2)在VM-2对应的物理CPU范围内增加CPU-1,并基于此更新CPU-3对应的物理CPU范围。例如CPU-3cpu_bitmap bit[1]=1//增加硬件可见的VM-2的物理CPU范围。
(3)在VM-2中的vCPU-22在CPU-2上启动运行前,使用VM-2当前对应的物理CPU范围(包括CPU-2和CPU-3)覆盖更新CPU-2对应的物理CPU范围。例如CPU-2的cpu_bitmap=该物理CPU范围。
综上,如图5c所示,在物理CPU范围信息更新完成后,对于CPU-1,vcpu_bitmap的第一个bit位为1,而其余bit位均为0,则表示此时CPU-1对应的软、硬件全局可见的物理CPU范围包括CPU-1。对于CPU-2,vcpu_bitmap的第二个bit位和第三个bit位均为1,而其余bit位均为0,则表示此时CPU-2对应的软、硬件全局可见的物理CPU范围包括CPU-2和CPU-3。对于CPU-3,vcpu_bitmap的第二个bit位和第三个bit位均为1,而其余bit位均为0,则表示此时CPU-3对应的软、硬件全局可见的物理CPU范围包括CPU-2和CPU-3。
步骤S505,根据第一虚拟CPU所属虚拟机对应的物理CPU范围,更新第一物理CPU对应的软、硬件全局可见的物理CPU范围。
具体地,步骤S505可参考上述图4a对应实施例中的步骤S405,此处不再进行赘述。
进一步地,结合上述实施例,当M个第一线程或第一虚拟CPU在M个物理CPU上运行时,任意线程均有可能修改该M个线程或第一虚拟CPU共享的页表信息(即第一进程或第一虚拟机维护的页表信息),此时需要根据修改的页表信息,同步更新该M个物理CPU上各自维护的TLB信息,维持物理CPU范围S1内的TLB一致性,避免后续TLB访问错误。
可选地,请参阅图6,图6是本申请实施例提供的一种TLB刷新流程示意图。如图6所示,在非虚拟化场景以及虚拟化场景下,TLB的刷新流程具体可以包括如下步骤S601-步骤S610。
步骤S601,目标物理CPU获取目标物理CPU上当前运行的第一进程标识号或第一虚拟机标识号。
具体地,若M个物理CPU中的目标物理CPU上运行的第一线程或第一虚拟CPU修改了第一进程或第一虚拟机维护的页表信息,则该目标物理CPU可以开始执行TLB刷新指令。该指令对应的硬件模块可以从CPU侧的寄存器(例如控制状态寄存器(control status register,CSR))中获取当前物理CPU上正在运行的进程标识号(例如第一进程标识号)或者虚拟机标识号(例如第一虚拟机标识号)。可以理解的是,根据获取到的是进程标识号还是虚拟机标识号,同时也可以识别出当前物理CPU处于非虚拟化场景还是虚拟化场景。
步骤S602,目标物理CPU向通信媒介发送TLB刷新请求以及相应的TLB刷新信息。
具体地,目标物理CPU向通信媒介(例如核间互联网络,比如总线或者片上网络)发送TLB刷新请求以及相应的TLB刷新信息,或者说,该TLB刷新请求携带有相应的TLB刷新信息。
可选地,在非虚拟化场景下,该TLB刷新信息可以包括但不限于第一进程对应的进程标识号(例如第一进程标识号)、修改后的页表信息对应的虚拟地址以及虚拟地址范围中的一种或多种,本申请实施例对此不作具体限定。
可选地,在虚拟化场景下,该TLB刷新信息可以包括但不限于第一虚拟机对应的虚拟机标识号(例如第一虚拟机标识号)、修改后的页表信息对应的虚拟机中的虚拟地址以及虚拟地址范围中的一种或多种,本申请实施例对此不作具体限定。
步骤S603,通信媒介获取该目标物理CPU当前对应的物理CPU范围S1,并向物理CPU范围S1内的其余所有物理CPU发送TLB刷新请求。
具体地,通信媒介在接收到目标物理CPU发送的TLB刷新请求后,可以从上述软、硬件全局可见的物理CPU范围信息中获取与该目标物理CPU对应的物理CPU范围S1,并向该物理CPU范围S1内的其余所有物理CPU发送TLB刷新请求。
由此,相较于以往需要无差别地向系统内的所有物理CPU发送TLB刷新请求的方案而言,本申请实施例可以通过明确一个必要的物理CPU范围,将通知范围(即TLB刷新请求的发送范围)缩小,即缩小了TLB维护范围,只需针对当前正在运行该进程(或虚拟机)的物理CPU进行相应的TLB一致性维护,继而可以在有效避免TLB访问错误的前提下,实现高效、便捷且准确的TLB一致性维护。
步骤S604,目标物理CPU更新本地维护的TLB信息。
具体地,目标物理CPU根据修改后的页表信息,更新本地维护的TLB信息,例如刷新或者无效掉相应的TLB条目等等。
可选地,目标物理CPU在完成本地的TLB信息更新后,可以执行步骤S609,等待其余所有物理CPU的反馈信号。
步骤S605,物理CPU范围S1内的其余所有物理CPU接收TLB刷新请求。
具体地,物理CPU范围S1内的其余所有物理CPU可以通过其中的TLB刷新硬件逻辑电路接收通信媒介发送的TLB刷新请求。
步骤S606,物理CPU范围S1内的其余所有物理CPU通过硬件解析TLB刷新信息。
具体地,在接收到TLB刷新请求后,物理CPU范围S1内的其余所有物理CPU可以通过硬件解析该请求相应的TLB刷新信息。
步骤S607,物理CPU范围S1内的其余所有物理CPU更新TLB信息,且不打断各自的软件执行流程。
具体地,物理CPU范围S1内的其余所有物理CPU基于上述TLB刷新信息,通过硬件更新各自维护的TLB信息。可选地,该TLB更新过程由硬件完成,不打断其余所有物理CPU各自的软件执行流程。
如上所述,本申请实施例可以通过硬件获取物理CPU范围,并进行TLB信息解析、TLB信息更新,这样一来,可以消除绝大部分软件开销,无需虚拟机陷入陷出、中断发送和中断响应等软件流程,从而进一步减少了TLB一致性维护时延,提升了TLB一致性维护的效率。
步骤S608,物理CPU范围S1内的其余所有物理CPU发送反馈信号至目标物理CPU。
具体地,物理CPU范围S1内的其余所有物理CPU在完成各自维护的TLB信息更新后,可以发送反馈信号至目标物理CPU。可选地,其余所有物理CPU可以发送反馈信号至上述通信媒介,然后由通信媒介转发该反馈信号至目标物理CPU。其中,其余所有物理CPU中的任意一个物理CPU发送的反馈信号可以用于指示其TLB更新已完成。
步骤S609,目标物理CPU等待接收反馈信号。
具体地,目标物理CPU在完成本地的TLB信息更新后,保持阻塞并等待其余所有物理CPU的反馈信号。
步骤S610,目标物理CPU判断是否接收到物理CPU范围S1内的其余所有物理CPU发送的反馈信号。
具体地,若目标物理CPU接收到物理CPU范围S1内的其余所有物理CPU发送的反馈信号,则可以确定该物理CPU范围S1内的所有物理CPU各自维护的TLB信息一致,即确定本次TLB一致性维护已完成,TLB刷新指令执行结束,进而该目标物理CPU可以执行后续指令;否则,该目标物理CPU继续保持阻塞直至接收到物理CPU范围S1内的其余所有物理CPU发送的反馈信号。
请参阅图7,图7是本申请实施例提供的另一种TLB刷新流程示意图。该流程可以应用于非虚拟化场景和虚拟化场景。如图7所示,CPU-1(例如上述目标物理CPU)执行的TLB刷新指令对应有反馈信号(acknowledgements,ACKs)统计模块1001、发送模块1002和TLB刷新模块1003等硬件模块。整个用于TLB刷新的硬件电路的涵盖范围可以包括但不限于CPU侧的多个硬件模块以及核间互联网络。如图7所示,以通过核间互联网络获取物理CPU范围为例,该TLB刷新流程具体如下:
1.CPU-1执行TLB刷新指令,通过发送模块1002向核间互联网络发送TLB刷新请求和相关TLB刷新信息。
2.CPU-1通过TLB刷新模块1003执行本地TLB刷新,即更新本地的TLB信息。
3.核间互联网络接收TLB刷新请求,确定该请求对应的CPU-1,并从软、硬件全局可见的物理CPU范围信息中获取当前与该CPU-1对应的物理CPU范围(例如物理CPU范围S1,包括CPU-2)。
4.核间互联网络向当前与该CPU-1对应的物理CPU范围内的其余所有物理CPU(例如CPU-2)发送TLB刷新请求。
5.CPU-2通过其中的TLB刷新模块2003接收TLB刷新请求,并更新该CPU-2维护的TLB信息。
6.CPU-2通过TLB刷新模块2003,向核间互联网络反馈ACK。
7.核间互联网络反馈ACK至CPU-1,相应的,CPU-1通过其中的ACKs统计模块1001接收反馈信号。
8.CPU-1通过其中的ACKs统计模块1001确定已接收到当前物理CPU范围内的其余所有物理CPU的反馈信号后,TLB刷新指令执行结束,CPU-1可以执行后续指令。
请参阅图8,图8是本申请实施例提供的又一种TLB刷新流程示意图。图8中的各部分的介绍具体可以参考图7对应实施例中的描述,此处不再进行赘述。如图8所示,本申请实施例还可以不通过核间互联网络,而是由CPU-1直接从软、硬件全局可见的物理CPU范围信息中获取当前与该CPU-1对应的物理CPU范围。如图8所示,以通过物理CPU自己获取物理CPU范围为例,该TLB刷新流程具体如下:
1.CPU-1执行TLB刷新指令,通过发送模块1002从软、硬件全局可见的物理CPU范围信息中获取当前与该CPU-1对应的物理CPU范围(例如物理CPU范围S1,包括CPU-2)。
2.CPU-1通过发送模块1002向核间互联网络发送TLB刷新请求和相关TLB刷新信息。可选地,该TLB刷新请求可以携带有与该物理CPU范围S1相关的指示信息。
3.CPU-1通过TLB刷新模块1003执行本地TLB刷新,即更新本地的TLB信息。
4.核间互联网络接收TLB刷新请求,基于该请求确定当前与该CPU-1对应的物理CPU范围内的其余所有物理CPU(例如CPU-2),并向该CPU-2发送TLB刷新请求。
5.CPU-2通过其中的TLB刷新模块2003接收TLB刷新请求,并更新该CPU-2维护的TLB信息。
6.CPU-2通过TLB刷新模块2003,向核间互联网络反馈ACK。
7.核间互联网络反馈ACK至CPU-1,相应的,CPU-1通过其中的ACKs统计模块1001接收反馈信号。
8.CPU-1通过其中的ACKs统计模块1001确定已接收到当前物理CPU范围内的其余所有物理CPU的反馈信号后,TLB刷新指令执行结束,CPU-1可以执行后续指令。
综上,本申请实施例可以通过软件维护并更新进程或虚拟机对应的物理CPU范围,以及通过硬件获取当前物理CPU范围并基于该范围刷新TLB信息,即在该范围内进行TLB一致性维护,如此,大大减少了需要进行TLB刷新的物理CPU数量,降低了TLB一致性维护时延,有效提升了系统的整体访存性能。
可选地,本申请实施例中所描述的转址旁路缓存的维护方法中的各方法流程具体可以基 于软件、硬件、或其结合的方式实现。其中,以硬件实现的方式可以包括逻辑电路、算法电路或模拟电路等。以软件实现的方式可以包括程序指令,可以被视为是一种软件产品,被存储于存储器中,并可以被处理器运行以实现相关功能。
基于上述方法实施例的描述,本申请实施例还提供一种电子设备。请参阅图9,图9是本申请实施例提供的一种电子设备的结构示意图。如图9所示,该电子设备至少包括处理器1101、输入设备1102、输出设备1103和计算机可读存储介质1104,该电子设备还可以包括其他通用部件,在此不再详述。其中,电子设备内的处理器1101、输入设备1102、输出设备1103和计算机可读存储介质1104可通过总线或其他方式连接。
处理器1101可以是通用中央处理器(CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制以上方案程序执行的集成电路。在本申请实施例中,该处理器1101可以包括多个物理CPU。电子设备可以运行有第一进程,第一进程当前包括M个第一线程,M个第一线程当前分别运行在该多个物理CPU中的M个物理CPU上。其中,M个物理CPU,可以用于确定第一进程对应的物理CPU范围S1,物理CPU范围S1包括该M个物理CPU。M个物理CPU,还用于当第一进程维护的页表信息发生修改时,基于修改后的页表信息,同步更新物理CPU范围S1内的所有物理CPU各自维护的转址旁路缓存TLB信息。
该电子设备内的存储器可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
计算机可读存储介质1104可以存储在电子设备的存储器中,所述计算机可读存储介质1104用于存储计算机程序,所述计算机程序包括程序指令,所述处理器1101用于执行所述计算机可读存储介质1104存储的程序指令。处理器1101(或称CPU(Central Processing Unit,中央处理器))是电子设备的计算核心以及控制核心,其适于实现一条或一条以上指令,具体适于加载并执行一条或一条以上指令从而实现相应方法流程或相应功能;在一个实施例中,本申请实施例所述的处理器1101可以用于进行转址旁路缓存的维护方法的一系列处理,包括:确定第一进程对应的物理CPU范围S1,物理CPU范围S1包括M个物理CPU;当第一进程维护的页表信息发生修改时,基于修改后的页表信息,同步更新物理CPU范围S1内的所有物理CPU各自维护的转址旁路缓存TLB信息,等等。
需要说明的是,本申请实施例中所描述的电子设备中各功能单元的功能可参见上述图3-图8所述实施例中的相关描述,此处不再赘述。
本申请实施例还提供一种计算机可读存储介质,其中,该计算机可读存储介质可存储有程序,该程序被处理器执行时,使得所述处理器可以执行上述方法实施例中记载的任意一种的部分或全部步骤。
本申请实施例还提供一种计算机程序,该计算机程序包括指令,当该计算机程序被多核 处理器执行时,使得所述处理器可以执行上述方法实施例中记载的任意一种的部分或全部步骤。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可能可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等,具体可以是计算机设备中的处理器)执行本申请各个实施例上述方法的全部或部分步骤。其中,前述的存储介质可包括:U盘、移动硬盘、磁碟、光盘、只读存储器(read-only memory,ROM)、双倍速率同步动态随机存储器(double data rate,DDR)、闪存(flash)或者随机存取存储器(random access memory,RAM)等各种可以存储程序代码的介质。
以上所述,上述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,但本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (24)

  1. 一种转址旁路缓存的维护方法,其特征在于,应用于电子设备,所述电子设备包括多个物理中央处理器CPU;所述电子设备上运行有第一进程,所述第一进程当前包括M个第一线程;M个所述第一线程当前分别运行在多个所述物理CPU中的M个物理CPU上;M为大于或者等于1的整数;所述方法包括:
    确定所述第一进程当前对应的物理CPU范围S1,所述物理CPU范围S1包括当前运行有所述第一进程内的所述第一线程的M个所述物理CPU;
    基于所述第一进程维护的页表信息,更新所述物理CPU范围S1内的所有物理CPU各自维护的转址旁路缓存TLB信息。
  2. 根据权利要求1所述的方法,其特征在于,M个所述物理CPU包括第一物理CPU和M-1个第二物理CPU;其中,在所述第一线程运行在所述第一物理CPU上之前,第二线程运行在所述第一物理CPU上,并且M个所述第一线程中的M-1个所述第一线程分别运行在M-1个所述第二物理CPU上;所述方法还包括:
    在所述第一物理CPU上的线程由所述第二线程切换至所述第一进程中的所述第一线程后,判断所述第二线程是否属于所述第一进程;
    若所述第二线程不属于所述第一进程,则更新所述第一进程对应的物理CPU范围S2,得到当前的所述物理CPU范围S1;所述物理CPU范围S2包括更新前运行有所述第一进程内的所述第一线程的M-1个所述第二物理CPU。
  3. 根据权利要求2所述的方法,其特征在于,所述第二线程属于第二进程;在所述第一线程运行在所述第一物理CPU上之前,所述第二进程中的N个所述第二线程分别运行在所述第一物理CPU和多个所述物理CPU中的N-1个第三物理CPU上;N为大于或者等于1的整数;所述方法还包括:
    在所述第一物理CPU上的线程由所述第二线程切换至所述第一进程中的所述第一线程后,更新所述第二进程对应的物理CPU范围S3,得到物理CPU范围S4;所述物理CPU范围S3包括更新前运行有所述第二进程内的所述第二线程的所述第一物理CPU和N-1个所述第三物理CPU;所述物理CPU范围S4包括当前运行有所述第二进程内的所述第二线程的N-1个所述第三物理CPU。
  4. 根据权利要求3所述的方法,其特征在于,所述方法还包括:
    基于所述第一进程和所述第二进程各自对应的物理CPU范围的更新,将所述第一物理CPU对应的物理CPU范围由所述物理CPU范围S3更新为所述物理CPU范围S1;以及,将M-1个所述第二物理CPU各自对应的物理CPU范围由所述物理CPU范围S2更新为所述物理CPU范围S1;以及,将N-1个所述第三物理CPU各自对应的物理CPU范围由所述物理CPU范围S3更新为所述物理CPU范围S4。
  5. 根据权利要求4所述的方法,其特征在于,所述电子设备中存储有物理CPU范围信息;所述物理CPU范围信息当前至少包括M个所述物理CPU各自对应的所述物理CPU范围S1,以及N-1个所述第三物理CPU各自对应的所述物理CPU范围S4。
  6. 根据权利要求5所述的方法,其特征在于,所述基于所述第一进程维护的页表信息,更新所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息,包括:
    在所述第一进程维护的页表信息被M个所述物理CPU中的目标物理CPU当前正在运行的第一线程修改后,基于修改后的所述页表信息,更新所述目标物理CPU维护的TLB信息;
    通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求;所述TLB刷新请求用于所述物理CPU范围S1内的其余物理CPU同步更新各自维护的TLB信息,以使得所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息一致。
  7. 根据权利要求6所述的方法,其特征在于,所述通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求,包括:
    通过所述目标物理CPU向核间互联网络发送所述TLB刷新请求;所述核间互联网络为总线或者片上网络NOC;
    通过所述核间互联网络接收所述TLB刷新请求,确定所述TLB刷新请求对应所述目标物理CPU,并从所述物理CPU范围信息获取所述目标物理CPU对应的所述物理CPU范围S1;
    通过所述核间互联网络向所述物理CPU范围S1内的其余物理CPU发送所述TLB刷新请求。
  8. 根据权利要求6所述的方法,其特征在于,所述通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求,包括:
    通过所述目标物理CPU从所述物理CPU范围信息获取所述目标物理CPU对应的所述物理CPU范围S1,并向核间互联网络发送TLB刷新请求;所述TLB刷新请求携带有与所述物理CPU范围S1相关的指示信息;所述核间互联网络为总线或者片上网络NOC;
    通过所述核间互联网络接收所述TLB刷新请求,并根据所述TLB刷新请求确定所述物理CPU范围S1;
    通过所述核间互联网络向所述物理CPU范围S1内的其余物理CPU发送所述TLB刷新请求。
  9. 根据权利要求1-8任意一项所述的方法,其特征在于,所述方法还包括:
    接收所述物理CPU范围S1内的M-1个所述物理CPU各自发送的反馈信号,基于所述反馈信号确定所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息一致。
  10. 根据权利要求6-9任意一项所述的方法,其特征在于,所述TLB刷新请求携带有对应的TLB刷新信息;所述TLB刷新信息包括所述第一进程对应的进程标识符、修改后的所述页表信息对应的虚拟地址和虚拟地址范围中的一种或多种;所述TLB刷新请求,具体用于所述物理CPU范围S1内的其余物理CPU基于所述TLB刷新信息,在保持运行各自线程的情况下,通过硬件更新各自维护的TLB信息。
  11. 一种转址旁路缓存的维护方法,其特征在于,应用于电子设备,所述电子设备包括多 个物理中央处理器CPU;所述电子设备上运行有第一虚拟机,所述第一虚拟机当前包括M个第一虚拟CPU;M个所述第一虚拟CPU当前分别运行在多个所述物理CPU中的M个物理CPU上;M为大于或者等于1的整数;所述方法包括:
    确定所述第一虚拟机对应的物理CPU范围S1,所述物理CPU范围S1包括当前运行有所述第一虚拟机内的所述第一虚拟CPU的M个所述物理CPU;
    基于所述第一虚拟机维护的页表信息,更新所述物理CPU范围S1内的所有物理CPU各自维护的转址旁路缓存TLB信息。
  12. 根据权利要求11所述的方法,其特征在于,M个所述物理CPU包括第一物理CPU和M-1个第二物理CPU;其中,在所述第一虚拟CPU运行在所述第一物理CPU上之前,第二虚拟CPU运行在所述第一物理CPU上,并且M个所述第一虚拟CPU中的M-1个所述第一虚拟CPU分别运行在M-1个所述第二物理CPU上;所述方法还包括:
    在所述第一物理CPU上的虚拟CPU由所述第二虚拟CPU切换至所述第一虚拟机中的所述第一虚拟CPU后,判断所述第二虚拟CPU是否属于所述第一虚拟机;
    若所述第二虚拟CPU不属于所述第一虚拟机,则更新所述第一虚拟机对应的物理CPU范围S2,得到当前的所述物理CPU范围S1;所述物理CPU范围S2包括更新前运行有所述第一虚拟机内的所述第一虚拟CPU的M-1个所述第二物理CPU。
  13. 根据权利要求12所述的方法,其特征在于,所述第二虚拟CPU属于第二虚拟机;在所述第一虚拟CPU运行在所述第一物理CPU上之前,所述第二虚拟机中的N个所述第二虚拟CPU分别运行在所述第一物理CPU和多个所述物理CPU中的N-1个第三物理CPU上;N为大于或者等于1的整数;所述方法还包括:
    在所述第一物理CPU上的虚拟CPU由所述第二虚拟CPU切换至所述第一虚拟机中的所述第一虚拟CPU后,更新所述第二虚拟机对应的物理CPU范围S3,得到物理CPU范围S4;所述物理CPU范围S3包括更新前运行有所述第二虚拟机内的所述第二虚拟CPU的所述第一物理CPU和N-1个所述第三物理CPU;所述物理CPU范围S4包括当前运行有所述第二虚拟机内的所述第二虚拟CPU的N-1个所述第三物理CPU。
  14. 根据权利要求13所述的方法,其特征在于,所述方法还包括:
    基于所述第一虚拟机和所述第二虚拟机各自对应的物理CPU范围的更新,将所述第一物理CPU对应的物理CPU范围由所述物理CPU范围S3更新为所述物理CPU范围S1;以及,将M-1个所述第二物理CPU各自对应的物理CPU范围由所述物理CPU范围S2更新为所述物理CPU范围S1;以及,将N-1个所述第三物理CPU各自对应的物理CPU范围由所述物理CPU范围S3更新为所述物理CPU范围S4。
  15. 根据权利要求14所述的方法,其特征在于,所述电子设备中存储有物理CPU范围信息;所述物理CPU范围信息当前至少包括M个所述物理CPU各自对应的所述物理CPU范围S1,以及N-1个所述第三物理CPU各自对应的所述物理CPU范围S4。
  16. 根据权利要求15所述的方法,其特征在于,所述基于所述第一虚拟机维护的页表信息,更新所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息,包括:
    在所述第一虚拟机维护的页表信息被M个所述物理CPU中的目标物理CPU当前正在运行的第一虚拟CPU修改后,基于修改后的所述页表信息,更新所述目标物理CPU维护的TLB信息;
    通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求;所述TLB刷新请求用于所述物理CPU范围S1内的其余物理CPU同步更新各自维护的TLB信息,以使得所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息一致。
  17. 根据权利要求16所述的方法,其特征在于,所述通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求,包括:
    通过所述目标物理CPU向核间互联网络发送所述TLB刷新请求;所述核间互联网络为总线或者片上网络NOC;
    通过所述核间互联网络接收所述TLB刷新请求,确定所述TLB刷新请求对应所述目标物理CPU,并从所述物理CPU范围信息中获取所述目标物理CPU对应的所述物理CPU范围S1;
    通过所述核间互联网络向所述物理CPU范围S1内的其余物理CPU发送所述TLB刷新请求。
  18. 根据权利要求16所述的方法,其特征在于,所述通过所述目标物理CPU向所述物理CPU范围S1内的其余物理CPU发送TLB刷新请求,包括:
    通过所述目标物理CPU从所述物理CPU范围信息中获取所述目标物理CPU对应的所述物理CPU范围S1,并向核间互联网络发送TLB刷新请求;所述TLB刷新请求携带有与所述物理CPU范围S1相关的指示信息;所述核间互联网络为总线或者片上网络NOC;
    通过所述核间互联网络接收所述TLB刷新请求,并根据所述TLB刷新请求确定所述物理CPU范围S1;
    通过所述核间互联网络向所述物理CPU范围S1内的其余物理CPU发送所述TLB刷新请求。
  19. 根据权利要求11-18任意一项所述的方法,其特征在于,所述方法还包括:
    接收所述物理CPU范围S1内的M-1个所述物理CPU各自发送的反馈信号,基于所述反馈信号确定所述物理CPU范围S1内的所有物理CPU各自维护的TLB信息一致。
  20. 根据权利要求16-19任意一项所述的方法,其特征在于,所述TLB刷新请求携带有对应的TLB刷新信息;所述TLB刷新信息包括所述第一虚拟机对应的虚拟机标识符、修改后的所述页表信息对应的虚拟机中的虚拟地址和虚拟地址范围中的一种或多种;所述TLB刷新请求,具体用于所述物理CPU范围S1内的其余物理CPU基于所述TLB刷新信息,在保持运行各自虚拟CPU的情况下,通过硬件更新各自维护的TLB信息。
  21. 一种电子设备,其特征在于,所述电子设备包括多个物理中央处理器CPU;所述电子设备上运行有第一进程,所述第一进程当前包括M个第一线程,M个所述第一线程当前分别运行在所述多个物理CPU中的M个物理CPU上;M为大于1的整数;其中,M个所述物理CPU,用于实现如权利要求1-10所述的方法。
  22. 一种电子设备,其特征在于,所述电子设备包括多个物理中央处理器CPU;所述电子设备上运行有第一虚拟机,所述第一虚拟机当前包括M个第一虚拟CPU,M个所述第一虚拟CPU当前分别运行在所述多个物理CPU中的M个物理CPU上;M为大于1的整数;其中,M个所述物理CPU,用于实现如权利要求11-20所述的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序被计算机或处理器执行时实现上述权利要求1-10或权利要求11-20所述的方法。
  24. 一种计算机程序,其特征在于,所述计算机程序包括指令,当所述计算机程序被计算机或处理器执行时,使得所述计算机或所述处理器执行如权利要求1-10或权利要求11-20所述的方法。
PCT/CN2022/126013 2021-11-27 2022-10-18 一种转址旁路缓存的维护方法及相关设备 WO2023093380A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111438805.0A CN116185899A (zh) 2021-11-27 2021-11-27 一种转址旁路缓存的维护方法及相关设备
CN202111438805.0 2021-11-27

Publications (1)

Publication Number Publication Date
WO2023093380A1 true WO2023093380A1 (zh) 2023-06-01

Family

ID=86444745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126013 WO2023093380A1 (zh) 2021-11-27 2022-10-18 一种转址旁路缓存的维护方法及相关设备

Country Status (2)

Country Link
CN (1) CN116185899A (zh)
WO (1) WO2023093380A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172099A1 (en) * 2004-01-17 2005-08-04 Sun Microsystems, Inc. Method and apparatus for memory management in a multi-processor computer system
US20110161620A1 (en) * 2009-12-29 2011-06-30 Advanced Micro Devices, Inc. Systems and methods implementing shared page tables for sharing memory resources managed by a main operating system with accelerator devices
CN104021344A (zh) * 2014-05-14 2014-09-03 南京大学 一种用于收集和截获计算机内存行为的蜜罐机制及其方法
CN104021063A (zh) * 2014-05-14 2014-09-03 南京大学 一种基于硬件虚拟化的模块化计算机取证系统及其方法
CN113032101A (zh) * 2021-03-31 2021-06-25 深信服科技股份有限公司 虚拟机的资源分配方法、服务器及计算机可读存储介质


Also Published As

Publication number Publication date
CN116185899A (zh) 2023-05-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897463

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022897463

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022897463

Country of ref document: EP

Effective date: 20240605