WO2022170452A1 - System and method for accessing remote resources - Google Patents

System and method for accessing remote resources

Info

Publication number
WO2022170452A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
address
remote node
remote
local node
Prior art date
Application number
PCT/CN2021/076161
Other languages
English (en)
French (fr)
Inventor
程传宁
程中武
石伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21925135.2A priority Critical patent/EP4276638A4/en
Priority to PCT/CN2021/076161 priority patent/WO2022170452A1/zh
Priority to CN202180091402.5A priority patent/CN116745754A/zh
Publication of WO2022170452A1 publication Critical patent/WO2022170452A1/zh
Priority to US18/366,889 priority patent/US20230388371A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/12 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
    • G06F13/124 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
    • G06F13/128 Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine for dedicated transfers to a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F13/1626 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by reordering requests
    • G06F13/1631 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by reordering requests through address comparison
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses

Definitions

  • The present application relates to the field of data access, and in particular to a system and method for accessing remote resources.
  • An entire scale-out system consists of multiple nodes.
  • Each node is configured with an independent processor, memory, hard disk, and other devices.
  • The nodes in the system usually need to cooperate closely to complete tasks, so nodes often need to access each other's resources.
  • Take the local node reading data of the remote node as an example. When the remote node receives the read instruction, it first copies its local data to the socket buffer of its processor and then encapsulates the data into packets in the processor. After processing through multiple layers of network protocols, the remote node sends the processed packets to the buffer of its network card, which sends them over the network to the network card of the local node. The packets then pass through multiple layers of network parsing before the local node obtains the data of the remote node, which completes the read. Because this technique requires the processors of both the remote node and the local node to participate, the process is complicated and performance is low compared with directly accessing local memory.
  • The memory of the remote node can also be accessed through remote direct memory access (RDMA). However, if the hard disk of the local node needs to access the memory of the remote node, the processor of the local node must execute the access command, issue the work request, write the doorbell, and perform other operations, so the steps are still cumbersome. RDMA also requires the processor of the local node to execute the software transmission interface and perform asynchronous scheduling and other operations to complete the memory access, which increases latency and memory bandwidth consumption. Processing efficiency is therefore also low.
  • The present application provides a system and method for accessing remote resources. With them, the local node can directly access resources in the memory, hard disk, and other devices of the remote node through an issued access resource request, avoiding the problems of large remote resource access delay, a cumbersome process, and low processing efficiency.
  • The present application provides a system for accessing remote resources. The system includes a local node and a remote node. The local node is configured to: determine an access resource request, where the access resource request includes an access resource operation and the address of the local node pointed to by the access resource operation; determine the address of the remote node corresponding to the address of the local node; send an operation message to the remote node, where the operation message includes the access resource operation and the address of the remote node; and receive the operation result message sent by the remote node and determine the operation result according to the operation result message. The remote node is configured to: receive the operation message; execute the access resource operation included in the operation message to obtain the operation result; and send the operation result message to the local node, where the operation result message includes the operation result.
  • After the local node determines the access resource request, it determines the access resource operation carried in the request and the address of the local node, and determines the address of the remote node according to the address of the local node. The access resource operation and the address of the remote node are converted into a packet that can be transmitted on the network, and the packet is sent to the remote node. The remote node can directly perform the resource operation on the address of the remote node and return the operation result to the local node as a message. In this way, the memory, hard disk, and other resources on the remote node can be accessed by directly delivering resource operations, which reduces access delay, simplifies the access steps, and improves processing efficiency.
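The request/response exchange described above can be sketched as a small in-process simulation. This is only an illustrative sketch: the class names (`OperationMessage`, `OperationResult`, `LocalNode`, `RemoteNode`), the dictionary used as remote memory, and the direct method call standing in for the network are all assumptions, not the implementation described in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OperationMessage:
    """Hypothetical operation message: the operation plus the remote address."""
    op: str                       # "load" or "store"
    remote_addr: int
    value: Optional[int] = None   # payload for a store


@dataclass
class OperationResult:
    """Hypothetical operation result message returned to the local node."""
    ok: bool
    value: Optional[int] = None


@dataclass
class RemoteNode:
    # Assumption: remote resources are modeled as a simple address -> value dict.
    memory: Dict[int, int] = field(default_factory=dict)

    def handle(self, msg: OperationMessage) -> OperationResult:
        # Execute the operation directly on the address carried in the message.
        if msg.op == "store":
            self.memory[msg.remote_addr] = msg.value
            return OperationResult(ok=True)
        if msg.op == "load":
            return OperationResult(ok=True, value=self.memory.get(msg.remote_addr, 0))
        return OperationResult(ok=False)


@dataclass
class LocalNode:
    remote: RemoteNode
    addr_map: Dict[int, int]      # local address -> remote address

    def access(self, op: str, local_addr: int, value: Optional[int] = None) -> OperationResult:
        # 1. Determine the remote address that corresponds to the local address.
        remote_addr = self.addr_map[local_addr]
        # 2. Build the operation message.
        msg = OperationMessage(op, remote_addr, value)
        # 3. "Send" it; a direct call stands in for the network here.
        return self.remote.handle(msg)
```

In this sketch the local node never copies data through socket buffers; it only forwards the operation and the translated address, which is the point the paragraph above makes.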
  • The resource access request further includes a remote node number, which indicates the remote node to which the address of the remote node belongs. When sending the operation message to the remote node, the local node is specifically configured to send the operation message to the remote node indicated by the remote node number.
  • When the system for accessing remote resources contains multiple remote nodes, the target remote node can be determined according to the remote node number. This avoids sending the operation message to multiple remote nodes and improves access efficiency.
  • The address of the local node is the virtual address of the local node, and the address of the remote node is the virtual address of the remote node. The local node is further configured to: establish a first mapping relationship between the virtual address of the local node and the physical address of the local node, and establish a second mapping relationship between the physical address of the local node and the virtual address of the remote node. When determining the address of the remote node corresponding to the address of the local node, the local node is specifically configured to: determine the physical address of the local node corresponding to the virtual address of the local node according to the first mapping relationship; and determine the virtual address of the remote node corresponding to the physical address of the local node according to the second mapping relationship.
  • In this way, the user space in the operating system of the local node can be associated with the physical address space of the local node, and the physical address space of the local node can be associated with the user space in the operating system of the remote node. When the local node accesses the remote node, the memory, hard disk, and other resources on the remote node can be accessed by directly issuing resource operations, which reduces access delay and simplifies the access steps.
  • The remote node is further configured to establish a third mapping relationship between the virtual address of the remote node and the physical address of the remote node. When executing the access resource operation included in the operation message, the remote node is specifically configured to: determine the physical address of the remote node corresponding to the virtual address of the remote node according to the third mapping relationship; and perform, according to the physical address of the remote node, the access resource operation included in the operation message.
  • The remote node can thus also associate the user space in its operating system with its physical address space, improving the efficiency of processing resource access operations.
  • The virtual address of the local node includes the virtual page number and offset of the local node, and the virtual address of the remote node includes the virtual page number and offset of the remote node. The offset contained in the virtual address of the local node is the same as the offset contained in the virtual address of the remote node. When determining the address of the remote node corresponding to the address of the local node, the local node is specifically configured to: determine the physical page number of the local node corresponding to the virtual page number of the local node according to the first mapping relationship; determine the virtual page number of the remote node corresponding to the physical page number of the local node according to the second mapping relationship; and determine the virtual address of the remote node according to the offset in the address of the local node and the virtual page number of the remote node.
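The page-number translation above amounts to splitting the local virtual address, looking up two tables, and reattaching the unchanged offset. A minimal sketch follows, assuming 4 KiB pages (`PAGE_SHIFT = 12`) and plain dictionaries for the first and second mapping relationships; neither the page size nor the table representation is stated in the text.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate_to_remote_va(local_va: int, first_map: dict, second_map: dict) -> int:
    """Translate a local virtual address into a remote virtual address.

    first_map:  local virtual page number  -> local physical page number
    second_map: local physical page number -> remote virtual page number
    (both are hypothetical stand-ins for the first and second page tables)
    """
    local_vpn = local_va >> PAGE_SHIFT
    offset = local_va & PAGE_MASK            # the offset is carried over unchanged
    local_ppn = first_map[local_vpn]         # first mapping relationship
    remote_vpn = second_map[local_ppn]       # second mapping relationship
    return (remote_vpn << PAGE_SHIFT) | offset
```

For example, with `first_map = {0x12: 0x7A}` and `second_map = {0x7A: 0x300}`, the local virtual address `0x12345` splits into page `0x12` and offset `0x345`, and the resulting remote virtual address is `0x300345`.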
  • The resource operation includes at least one of the following: a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a DMA access.
  • The system for accessing remote resources provided by the present application may be a storage system supporting the CHI protocol.
  • The first mapping relationship is stored in a first page table, and the second mapping relationship is stored in a second page table. The storage format of the first page table and the second page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The third mapping relationship is stored in a third page table, and the storage format of the third page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The first, second, and third mapping relationships are stored in page tables in the PTE or PTP format. This format allows the mapping relationships to be stored discretely in the page table, which saves page table storage space.
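One way to picture why PTE and PTP entries save space is a two-level table in which a PTP points to a second-level table and a PTE holds the final mapping, so second-level tables exist only for regions that are actually mapped. The sketch below is an assumption about the layout (the text does not specify the number of levels, the index width `LEVEL_BITS`, or the entry contents), and the names `PTE`, `PTP`, `map_page`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

LEVEL_BITS = 9                       # assumption: 512 entries per second-level table
LEVEL_MASK = (1 << LEVEL_BITS) - 1


@dataclass
class PTE:
    """Leaf entry: holds the target page number of one mapping."""
    target_page: int


@dataclass
class PTP:
    """Pointer entry: refers to a next-level table, created only on demand."""
    table: Dict[int, PTE] = field(default_factory=dict)


def map_page(root: Dict[int, PTP], page: int, target_page: int) -> None:
    hi, lo = page >> LEVEL_BITS, page & LEVEL_MASK
    ptp = root.setdefault(hi, PTP())          # allocate the second level lazily
    ptp.table[lo] = PTE(target_page)


def lookup(root: Dict[int, PTP], page: int) -> Optional[int]:
    hi, lo = page >> LEVEL_BITS, page & LEVEL_MASK
    ptp = root.get(hi)
    if ptp is None:
        return None                            # whole region unmapped, no table stored
    pte = ptp.table.get(lo)
    return pte.target_page if pte is not None else None
```

Mapping two pages that are far apart creates only two small second-level tables rather than one large flat table covering the whole range.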
  • The present application provides a method for accessing remote resources, the method comprising: determining an access resource request, where the access resource request includes an access resource operation and the address of the local node pointed to by the access resource operation; determining the address of the remote node corresponding to the address of the local node; sending an operation message to the remote node, where the operation message includes the access resource operation and the address of the remote node; and receiving an operation result message that is sent by the remote node and includes an operation result, where the operation result is determined by the remote node executing the access resource operation included in the operation message.
  • The resource access request further includes a remote node number, which indicates the remote node to which the address of the remote node belongs. Sending the operation message to the remote node specifically includes: sending the operation message to the remote node indicated by the remote node number.
  • The address of the local node is the virtual address of the local node, and the address of the remote node is the virtual address of the remote node. The method further includes: establishing a first mapping relationship between the virtual address of the local node and the physical address of the local node, and establishing a second mapping relationship between the physical address of the local node and the virtual address of the remote node. Determining the address of the remote node corresponding to the address of the local node includes: determining the physical address of the local node corresponding to the virtual address of the local node according to the first mapping relationship; and determining the virtual address of the remote node corresponding to the physical address of the local node according to the second mapping relationship.
  • The virtual address of the local node includes the virtual page number and offset of the local node, and the virtual address of the remote node includes the virtual page number and offset of the remote node.
  • The resource accessing operation includes at least one of the following: a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • The method further includes: storing the first mapping relationship in a first page table and storing the second mapping relationship in a second page table, where the storage format of the first page table and the second page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The present application provides a method for accessing remote resources, the method comprising:
  • The address of the remote node is a virtual address of the remote node, and the method further includes: establishing a third mapping relationship between the virtual address of the remote node and the physical address of the remote node;
  • the access resource operation included in the operation packet is performed.
  • The resource accessing operation includes at least one of the following: a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • The method further includes: storing the third mapping relationship in a third page table, where the storage format of the third page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The present application provides a local node, where the local node includes:
  • an operation generating unit, configured to determine an access resource request, where the access resource request includes an access resource operation and an address of the local node pointed to by the access resource operation;
  • an address determination unit, configured to determine the address of the remote node corresponding to the address of the local node;
  • an operation sending unit, configured to send an operation message to the remote node, where the operation message includes the access resource operation and the address of the remote node; and
  • a result determination unit, configured to receive an operation result message that is sent by the remote node and includes an operation result, where the operation result is determined by the remote node performing the access resource operation included in the operation message.
  • The resource access request further includes a remote node number, which indicates the remote node to which the address of the remote node belongs;
  • the operation sending unit is specifically used for:
  • The address of the local node is the virtual address of the local node, and the address of the remote node is the virtual address of the remote node.
  • The local node further includes a local mapping determination unit, which is configured to establish a first mapping relationship between the virtual address of the local node and the physical address of the local node, and to establish a second mapping relationship between the physical address of the local node and the virtual address of the remote node.
  • The address determination unit is further configured to determine the physical address of the local node corresponding to the virtual address of the local node according to the first mapping relationship, and to determine the virtual address of the remote node corresponding to the physical address of the local node according to the second mapping relationship.
  • The virtual address of the local node includes the virtual page number and offset of the local node, and the virtual address of the remote node includes the virtual page number and offset of the remote node; the offset contained in the virtual address of the local node is the same as the offset contained in the virtual address of the remote node.
  • The address determination unit is further configured to determine the physical page number of the local node corresponding to the virtual page number of the local node according to the first mapping relationship;
  • the virtual address of the remote node is determined according to the offset in the address of the local node and the virtual page number of the remote node.
  • The resource accessing operation includes at least one of the following: a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • The local node further includes a local mapping storage unit, which is configured to store the first mapping relationship in the first page table and store the second mapping relationship in the second page table. The storage format of the first page table and the second page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The present application provides a remote node, where the remote node includes: an operation receiving unit, configured to receive an operation message sent by the local node, where the operation message includes an access resource operation and an address of the remote node; and an execution sending unit, configured to execute the access resource operation to obtain an operation result and send an operation result message including the operation result to the local node.
  • The address of the remote node is a virtual address of the remote node, and the remote node further includes a remote mapping determination unit, which is configured to establish a third mapping relationship between the virtual address of the remote node and the physical address of the remote node;
  • the execution sending unit is configured to determine the physical address of the remote node corresponding to the virtual address of the remote node according to the third mapping relationship;
  • the access resource operation included in the operation packet is performed.
  • The resource accessing operation includes at least one of the following: a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • The remote node further includes a remote mapping storage unit, which is configured to store the third mapping relationship in a third page table. The storage format of the third page table includes any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  • The present application further provides a computer program product comprising computer program code. When the computer program code is run on a computer, the computer executes the method for accessing remote resources according to any one of the second aspect or the third aspect.
  • The present application further provides a computer-readable medium in which program code is stored. When the program code is executed on a computer, the computer executes the method for accessing remote resources according to any one of the second aspect or the third aspect.
  • The present application provides a chip, including at least one processor and an interface. The interface is configured to provide program instructions or data for the at least one processor, and the at least one processor is configured to execute the program instructions to implement the method for accessing a remote resource according to any one of the second aspect or the third aspect.
  • FIG. 1A is a schematic diagram of the architecture of a system for accessing remote resources;
  • FIG. 1B is a schematic diagram of the mapping between the address of a local node and the address of a remote node;
  • FIG. 1C is a schematic diagram of a specific architecture of a system for accessing remote resources;
  • FIG. 2 is a schematic flowchart of a local node delivering an access resource operation to a remote node;
  • FIG. 3A is a schematic structural diagram of a first operating device;
  • FIG. 3B is a schematic structural diagram of a second operating device;
  • FIG. 4 is a schematic flowchart of an example applied to a system for accessing remote resources;
  • FIG. 5 is a schematic flowchart of a method for accessing remote resources;
  • FIG. 6 is a schematic structural diagram of a local node in a system for accessing remote resources;
  • FIG. 7 is a schematic structural diagram of a remote node in a system for accessing remote resources.
  • Atomic operation: an independent and indivisible operation. In a single-core environment, threads are not switched during an atomic operation in the general sense; a thread switch happens either before the atomic operation starts or after it completes. In a broader sense, an atomic operation is a series of operation steps that must be completed as a whole. If any step is not completed, all completed steps must be rolled back, so that either none of the steps or all of the steps are completed.
  • In a single-core system, a single machine instruction can be regarded as an atomic operation. In a multi-core system, a single machine instruction is not an atomic operation, because multiple instruction streams run in parallel: while one core is executing an instruction, instructions executed by other cores at the same time may operate on the same memory area, resulting in a data race. All operations performed in a single instruction can be considered atomic operations.
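The read-modify-write hazard described above can be illustrated in software. The sketch below is not the hardware mechanism the application refers to; it uses a lock (an assumption chosen for illustration) to make a fetch-and-add indivisible across threads, and the names `AtomicCounter` and `run_workers` are hypothetical.

```python
import threading


class AtomicCounter:
    """Counter whose fetch_add is indivisible with respect to other threads."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n: int = 1) -> int:
        # The read and the write happen while holding the lock, so no other
        # thread can observe or modify the value in between.
        with self._lock:
            old = self._value
            self._value = old + n
            return old

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(num_threads: int, iterations: int) -> int:
    counter = AtomicCounter()

    def worker() -> None:
        for _ in range(iterations):
            counter.fetch_add()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every increment completes as a whole, the final count equals `num_threads * iterations` regardless of how the threads are interleaved.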
  • Memory mapping refers to the one-to-one correspondence between the location of the file on the hard disk and an area of the same size in the logical address space of the process. The purpose of this method is to reduce the copy operation of data between user space and kernel space. When a large amount of data needs to be transmitted, it is more efficient to use memory mapping to access files.
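As a small illustration of memory-mapped file access, the following sketch uses Python's standard `mmap` module to modify a file in place through a mapping instead of explicit read and write calls. The helper names `patch_file_via_mmap` and `demo` and the temporary file are assumptions made for the example.

```python
import mmap
import os
import tempfile


def patch_file_via_mmap(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset` through a memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(data)] = data   # same-length slice assignment
            mm.flush()                             # push changes back to the file


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello world")
        patch_file_via_mmap(path, 6, b"WORLD")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The slice assignment writes through the mapping directly, so the program does not copy the data through an intermediate user-space buffer with `read()` and `write()`.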
  • MMU memory management unit
  • PMMU paged memory management unit
  • PCIe is a general-purpose bus specification whose design purpose is to replace the bus transmission interfaces inside existing computer systems.
  • Non-volatile memory express (NVMe): NVMe is a specification for solid state disks (SSDs) that use the PCI-e channel. From the beginning of its design, NVMe has made full use of the low latency and parallelism of PCI-e SSDs, as well as the parallelism of processors, platforms, and applications, so that the parallelism of the SSD can be fully exploited by the hardware and software of the host. Compared with the advanced host controller interface (AHCI) standard, NVMe brings a range of performance improvements.
  • AHCI advanced host controller interface
  • Memory-mapped I/O (MMIO)
  • MMIO memory-mapped I/O
  • With MMIO, I/O devices are placed in memory space instead of I/O space. From the processor's point of view, system devices are accessed like memory once they are memory-mapped.
  • VA Virtual address
  • PTE Page table entry
  • Coherent hub interface (CHI)
  • A system architecture based on the CHI protocol can include independent CPUs, processor clusters, graphics processors, memory controllers, I/O bridges, PCIe subsystems, and CHI interconnect lines.
  • The components can be classified and named as follows: the request node (RN) is responsible for generating protocol operations (transactions), including reads and writes; the home node (HN) receives the protocol operations generated by the RN; the slave node (SN) receives requests from the HN, completes the corresponding operations, and returns a response.
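The division of roles between RN, HN, and SN can be pictured with a toy model. This is a loose sketch of the request flow only, under the assumption that the HN simply forwards each transaction to one SN; it does not model coherence, snooping, or any actual CHI message format, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class SlaveNode:
    """SN: receives requests from the HN, performs them, returns a response."""

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}

    def serve(self, kind: str, addr: int, data: Optional[int] = None):
        if kind == "write":
            self.memory[addr] = data
            return "ok"
        return self.memory.get(addr)


class HomeNode:
    """HN: receives the transactions generated by the RN."""

    def __init__(self, sn: SlaveNode) -> None:
        self.sn = sn

    def receive(self, kind: str, addr: int, data: Optional[int] = None):
        # Assumption: the HN just forwards the request to a single SN.
        return self.sn.serve(kind, addr, data)


class RequestNode:
    """RN: generates read and write transactions."""

    def __init__(self, hn: HomeNode) -> None:
        self.hn = hn

    def write(self, addr: int, data: int):
        return self.hn.receive("write", addr, data)

    def read(self, addr: int):
        return self.hn.receive("read", addr)
```

A write issued by the RN travels RN to HN to SN, and the response returns along the same path.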
  • TLB Translation lookaside buffer
  • Remote direct memory access (RDMA) technology can be used to access and call the memory of a remote node.
  • The local node can directly access remote memory through an RDMA-aware network interface controller (RNIC). However, when Ethernet or InfiniBand (IB) networks are used to implement remote memory calls, additional switches and adapters are needed, along with complex conversion between the network protocol stack and the PCI-e (peripheral component interconnect express) protocol.
  • RNIC RDMA-aware network interface controller
  • When the local node needs to access the remote node, it creates a channel connection.
  • The two endpoints of each channel are two queue pairs (QPs).
  • The local node accesses the RNIC directly through the network card and, after the data request has been processed, fetches the data with the help of the completion queue polling (CQ poll) mechanism or an interrupt mechanism.
  • RDMA technology provides a software transport interface (Verbs) so that the local node can send a work request (WR).
  • The WR describes the content of the message that the local node wants to transmit to the remote node, and is posted to a work queue (WQ) in the QP. In the WQ, the WR of the local node is converted into the work queue element (WQE) format and waits for asynchronous scheduling and parsing by the RNIC; finally, the actual message is fetched from the buffer pointed to by the WQE and returned to the local node. RDMA therefore still requires the processor to execute the software transport interface, asynchronous scheduling, and other operations, and once the processor is involved, latency and bandwidth consumption increase and efficiency is low.
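The extra software steps in the RDMA path (posting a WR, converting it to a WQE, waiting for asynchronous scheduling, then polling for completion) can be sketched with two queues. This is a simplified model for illustration and not the Verbs API; `WorkRequest`, `WQE`, `QueuePairSketch`, and its methods are hypothetical names, and remote memory is modeled as a `bytes` object.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WorkRequest:
    remote_addr: int
    length: int


@dataclass
class WQE:
    wr_id: int
    remote_addr: int
    length: int


class QueuePairSketch:
    def __init__(self, remote_memory: bytes) -> None:
        self.remote_memory = remote_memory   # stand-in for the remote node's memory
        self.work_queue: deque = deque()
        self.completion_queue: deque = deque()
        self._next_id = 0

    def post(self, wr: WorkRequest) -> int:
        # Step 1: software converts the WR into a WQE and enqueues it.
        wqe = WQE(self._next_id, wr.remote_addr, wr.length)
        self._next_id += 1
        self.work_queue.append(wqe)
        return wqe.wr_id

    def rnic_schedule(self) -> None:
        # Step 2: the RNIC processes queued WQEs at some later time.
        while self.work_queue:
            wqe = self.work_queue.popleft()
            data = self.remote_memory[wqe.remote_addr:wqe.remote_addr + wqe.length]
            self.completion_queue.append((wqe.wr_id, data))

    def poll_cq(self) -> Optional[Tuple[int, bytes]]:
        # Step 3: software polls the completion queue to fetch the result.
        return self.completion_queue.popleft() if self.completion_queue else None
```

Each read costs three separate software-visible steps in this model, and a poll issued before the RNIC has scheduled the WQE returns nothing, which is the latency overhead the paragraph above points at.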
  • WQ work queue
  • WQE Work Queue Element
  • RDMA is a point-to-point protocol. It requires a dedicated network card (such as an Ethernet card or an IB network card) to be installed on each node, which makes the RDMA function costly to implement, and even with RDMA the resources of each node still cannot be allocated reasonably and efficiently.
  • The embodiments of the present application provide a system and method for accessing remote resources, which can directly and quickly access the memory, hard disk, and other resources on the remote node by issuing access resource requests, thereby avoiding the problems of large access delay, a cumbersome process, and low processing efficiency.
  • each node of the storage system can have an independent hardware structure to independently implement the same or different services.
  • each node may have certain resources, and the resource size of each node may be the same or different, which is not specifically limited in this application.
  • a node that requests access to other, remote nodes is named a local node
  • a node whose memory, processor, hard disk, and other resources can be accessed by other nodes is named a remote node.
  • a node may function as a local node in one period of time and as a remote node in another period of time, or a node may function at the same time as a local node for one node and as a remote node for another node.
  • FIG. 1A shows a schematic diagram of the architecture of a system for accessing remote resources.
  • the system includes: a local node 100 and a remote node 110 .
  • the local node 100 is used to: determine an access resource request, where the access resource request includes an access resource operation and the address of the local node 100 pointed to by the access resource operation; determine the address of the remote node 110 corresponding to the address of the local node 100; send an operation message to the remote node 110, where the operation message includes the access resource operation and the address of the remote node 110; and receive the operation result message sent by the remote node 110 and determine the operation result according to the operation result message;
  • the remote node 110 is configured to: receive the operation message; execute the resource access operation included in the operation message to obtain the operation result; and send the operation result message to the local node 100, where the operation result message includes the operation result.
  • the local node 100 and the remote node 110 can both be regarded as nodes in the storage system.
  • the local node 100 can access the resources of the remote node 110 by delivering an operation message that carries an access resource operation.
  • the local node 100 can send to the remote node 110 an operation message for obtaining the computing data, thereby obtaining the computing data on the remote node and then completing the computing task.
  • a request to access a resource may include the following types of operations: a processor read (load) operation, a processor write (store) operation, a processor atomic (atomic) operation, and a direct memory access (DMA) access; through the address of the local node 100 contained in the access resource request, the resource access operation can be performed on the address of the remote node 110 corresponding to the address of the local node 100.
  • the address of the local node 100 may be either the virtual address of the local node 100 or the physical address of the local node 100, and the address of the remote node 110 may be either the virtual address of the remote node 110 or the physical address of the remote node 110; the operation result message is a message that can be transmitted over the network.
  • taking an operation message that is an RDMA-type network message as an example: an RDMA read message can be generated according to a load operation, an RDMA write message can be generated according to a store operation, and an RDMA atomic message can be generated according to an atomic operation;
  • the operation result message sent to the local node 100 is also a message that can be transmitted over the network.
  • the specific message generation method will not be repeated here, and those skilled in the art should know it.
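  • by way of illustration only, the correspondence stated above between load/store/atomic operations and RDMA read/write/atomic messages can be sketched as a lookup table. The names `RDMA_MESSAGE_TYPE`, `OperationMessage` and `build_operation_message` are hypothetical, the message carries only the fields named in the text, and no real wire format is implied.

```python
from dataclasses import dataclass
from typing import Optional

# Correspondence stated in the text; the string labels are illustrative only.
RDMA_MESSAGE_TYPE = {
    "load": "RDMA read",
    "store": "RDMA write",
    "atomic": "RDMA atomic",
}


@dataclass(frozen=True)
class OperationMessage:
    """Hypothetical operation message: message type plus remote node address."""
    message_type: str
    remote_address: int
    payload: Optional[bytes] = None   # write data for a store, otherwise None


def build_operation_message(operation, remote_address, payload=None):
    """Pick the RDMA message type that corresponds to the access resource operation."""
    if operation not in RDMA_MESSAGE_TYPE:
        raise ValueError(f"no RDMA message type listed for operation: {operation}")
    return OperationMessage(RDMA_MESSAGE_TYPE[operation], remote_address, payload)


read_msg = build_operation_message("load", 0x7F000ABC)
write_msg = build_operation_message("store", 0x7F000ABC, b"\x01\x02")
```

  • operations for which the text gives no RDMA counterpart are rejected here rather than guessed at.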
  • the resource access request further includes a remote node number, where the remote node number is used to indicate the remote node 110 to which the address of the remote node 110 belongs;
  • when sending the operation message, the local node 100 is specifically configured to send the operation message to the remote node 110 indicated by the remote node number; when there is more than one remote node 110, the remote node 110 to which the resource access request is to be delivered is determined according to the remote node number.
  • the address of the local node 100 is the virtual address of the local node 100
  • the address of the remote node 110 is the virtual address of the remote node 110
  • the local node 100 is also used for: establishing a first mapping relationship between the virtual address of the local node 100 and the physical address of the local node 100, and establishing a second mapping relationship between the physical address of the local node 100 and the virtual address of the remote node 110; when determining the address of the remote node 110 corresponding to the address of the local node 100, the local node 100 is specifically configured to: determine, according to the first mapping relationship, the physical address of the local node 100 corresponding to the virtual address of the local node 100; and determine, according to the second mapping relationship, the virtual address of the remote node 110 corresponding to the physical address of the local node 100.
  • the virtual address of the local node 100 is located in the virtual address space of the local node 100, and this virtual address space is a part of the user space of the operating system of the local node 100; the physical address of the local node 100 is located in the physical address space of the local node 100, and this physical address space is a part of the resource space of the local node 100.
  • the virtual address of the remote node 110 exists in the virtual address space of the remote node 110 , and the virtual address space is a part of the user space of the operating system of the remote node 110 .
  • the local node 100 may establish a first mapping relationship between the virtual address of the local node 100 and the physical address of the local node 100, and establish a second mapping relationship between the physical address of the local node 100 and the virtual address of the remote node 110; once the mappings are established, the user space in the operating system of the local node 100 is associated with the physical address space of the local node 100, and the physical address space of the local node 100 is associated with the user space in the operating system of the remote node 110.
  • the remote node 110 is further configured to: establish a third mapping relationship between the virtual address of the remote node 110 and the physical address of the remote node 110; when executing the access resource operation included in the operation message, the remote node 110 is specifically configured to: determine, according to the third mapping relationship, the physical address of the remote node 110 corresponding to the virtual address of the remote node 110; and execute, according to the physical address of the remote node 110, the access resource operation included in the operation message.
  • the remote node 110 may also associate the user space in the operating system of the remote node 110 with the physical address space of the remote node 110, and the specific mapping method should be known by those skilled in the art.
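  • as a non-limiting sketch of the third mapping relationship, the remote node could resolve the virtual address carried in the operation message as follows. Page-granular mapping with 4 KiB pages, the table contents and the function name `remote_physical_address` are assumptions made for this illustration and are not taken from the application.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Third mapping relationship (illustrative contents):
# remote virtual page number -> remote physical page number.
third_page_table = {0x7F000: 0x00042}


def remote_physical_address(remote_virtual_address):
    """Resolve a remote virtual address through the third mapping relationship."""
    virtual_page = remote_virtual_address >> PAGE_SHIFT
    offset = remote_virtual_address & PAGE_MASK
    if virtual_page not in third_page_table:
        raise KeyError(f"no third-mapping entry for page {virtual_page:#x}")
    return (third_page_table[virtual_page] << PAGE_SHIFT) | offset


physical = remote_physical_address(0x7F000ABC)
```

  • the access resource operation would then be executed at `physical`; an unmapped page is reported instead of being silently translated.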
  • the virtual address of the local node 100 includes: the virtual page number and offset of the local node 100
  • the virtual address of the remote node 110 includes the virtual page number and offset of the remote node 110, and the offset contained in the virtual address of the local node 100 is the same as the offset contained in the virtual address of the remote node 110;
  • when determining the address of the remote node 110 corresponding to the address of the local node 100, the local node 100 is specifically used to: determine, according to the first mapping relationship, the physical page number of the local node 100 corresponding to the virtual page number of the local node 100;
  • determine, according to the second mapping relationship, the virtual page number of the remote node 110 corresponding to the physical page number of the local node 100; and determine the virtual address of the remote node 110 according to the offset in the address of the local node 100 and the virtual page number of the remote node 110.
  • FIG. 1B is a schematic diagram of the mapping from the address of the local node 100 to the address of the remote node 110; the access resource request carries the address of the local node 100
  • the local node 100 determines the physical page number of the local node 100 according to the virtual page number of the local node 100, and queries the page table for the remote node number and the virtual page number of the remote node 110 that correspond to the physical page number of the local node 100. The address of the remote node 110 is obtained from the virtual page number of the remote node 110 and the offset.
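  • the page-number translation described above can be sketched as follows, by way of example only. The 4 KiB page size, the dictionary-based page tables, their contents and the name `translate_to_remote` are assumptions introduced for illustration; the application does not fix any of them.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages, 12-bit offset
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# First mapping relationship: local virtual page number -> local physical page number.
first_page_table = {0x00010: 0x00A00}

# Second mapping relationship: local physical page number ->
# (remote node number, remote virtual page number).
second_page_table = {0x00A00: (3, 0x7F000)}


def translate_to_remote(local_virtual_address):
    """Return (remote node number, remote virtual address) for a local virtual address."""
    virtual_page = local_virtual_address >> PAGE_SHIFT
    offset = local_virtual_address & PAGE_MASK          # carried over unchanged
    physical_page = first_page_table[virtual_page]
    remote_node, remote_virtual_page = second_page_table[physical_page]
    return remote_node, (remote_virtual_page << PAGE_SHIFT) | offset


node_number, remote_address = translate_to_remote(0x00010ABC)
```

  • because the offset is the same on both sides, only the page number passes through the two tables; the returned node number selects the remote node when there is more than one.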
  • the local node is further configured to: store the first mapping relationship in a first page table and store the second mapping relationship in a second page table, where the storage formats of the first page table and the second page table include any one of the following: a page table entry PTE and a page table pointer PTP.
  • the remote node is further configured to: store the third mapping relationship in a third page table, where the storage format of the third page table includes any one of the following: a page table entry PTE and a page table pointer PTP.
  • one PTE corresponds to one page table entry, for example, one PTE has a 32-bit address, where the 20th to 31st bits may store the address corresponding to the PTE.
  • the page table entry can store the address of the remote node 110, and can also store other information of the remote node. For example, when the type of the operation message is RDMA, the page table entry can also include a queue pair number (QP number) and the like.
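  • as an illustrative sketch only, reading the example above as a 32-bit entry whose bits 20 to 31 hold the address field, an entry could be packed and unpacked as shown below. Treating bit 0 as the least significant bit and using the low 20 bits for other information (such as a QP number) are assumptions of this sketch, and `pack_pte`/`unpack_pte` are hypothetical helper names.

```python
ADDRESS_SHIFT = 20                       # address field starts at bit 20
ADDRESS_MASK = 0xFFF                     # bits 20..31 inclusive: 12 bits
INFO_MASK = (1 << ADDRESS_SHIFT) - 1     # assumption: low 20 bits carry other information


def pack_pte(address_field, info=0):
    """Build a 32-bit entry from a 12-bit address field and 20 bits of other information."""
    if not 0 <= address_field <= ADDRESS_MASK:
        raise ValueError("address field does not fit in bits 20..31")
    if not 0 <= info <= INFO_MASK:
        raise ValueError("info does not fit in bits 0..19")
    return (address_field << ADDRESS_SHIFT) | info


def unpack_pte(pte):
    """Split a 32-bit entry back into (address field, other information)."""
    return (pte >> ADDRESS_SHIFT) & ADDRESS_MASK, pte & INFO_MASK


entry = pack_pte(0xABC, info=0x17)
address_field, info = unpack_pte(entry)
```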
  • the embodiment of the present application provides a system for accessing remote resources, which is used to solve the problem that resources in remote memory, hard disks and other devices currently cannot be accessed directly.
  • the access resource operation in the access resource request issued by the local node is carried in an operation message, and the operation message is sent to the remote node, so that the local node can directly access the resources of the remote node.
  • direct access to resources can reduce access delay and simplify access steps when accessing resources of remote nodes, thereby improving processing efficiency.
  • FIG. 1C shows a schematic diagram of a specific architecture of a system for accessing remote resources.
  • the local node 100 may include a first memory 102 and a processor 103
  • the remote node 110 includes a second memory 112 and a processor.
  • the local node 100 is provided with the first operating device 101
  • the remote node 110 is provided with the second operating device 111, and the first operating device 101 and the second operating device 111 can communicate with each other.
  • the first operation device 101 transmits the operation message carrying the resource access operation to the second operation device 111 , so that the resource access operation issued by the first memory 102 or the processor 103 can be executed on the second memory 112 .
  • the processor 103 is the control center of the local node 100; it uses various interfaces and lines to connect the parts of the entire node, runs or executes the software programs and/or modules stored in the first memory 102, and calls the data stored in the first memory 102, so as to perform various functions of the computer system and/or process data.
  • the processor 103 may be composed of an integrated circuit (IC), for example, may be composed of a single packaged IC, or may be composed of a plurality of packaged ICs connected with the same function or different functions.
  • the processor 103 may be at least one central processing unit (central processing unit, CPU for short). It can also be the processor of a virtual machine.
  • first memory 102 and second memory 112 are used to store program instructions, data, and the like. It can be understood that the first memory 102 and the second memory 112 in the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory. Volatile memory may be random access memory (RAM), which acts as an external cache.
  • many forms of RAM may be used, for example dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM) and direct rambus RAM (DR RAM).
  • the first memory 102 and the second memory 112 in this application may also be various types of magnetic disks, hard disks, U disks, mobile hard disks, optical disks, solid state disks (SSD) or other non-volatile memories.
  • the first memory 102 and the second memory 112 of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memory.
  • FIG. 2 shows a schematic flow chart, described from the perspective of the local node 100 issuing the access resource request to the remote node 110 .
  • the method includes steps S201-S207.
  • the processor 103/the first memory 102 sends an access resource request to the first operating device 101, where the resource access request includes an access resource operation and an address of the local node 100 pointed to by the access resource operation.
  • the resource access request may be issued by the first memory 102 in the local node 100 or by the processor 103 in the local node 100 .
  • the first operating device 101 determines, according to the address of the local node 100, the address of the remote node 110 corresponding to the address of the local node 100.
  • the first operating device 101 may determine, according to the address of the local node 100, the address of the remote node 110 corresponding to the address of the local node 100, wherein the first operating device 101 determines that the address of the local node 100 corresponds to the address of the local node 100.
  • the access resource request further includes: a remote node number, and when there are multiple remote nodes 110, the remote node to which the access resource request is delivered is determined according to the remote node number.
  • the first operation device 101 sends the operation message to the second operation device 111 on the remote node 110 .
  • the first operation device 101 generates an operation packet that can be transmitted over the network according to the load/store/atomic operation issued by the local node 100 .
  • taking an operation message that is a network message of an RDMA type as an example: an RDMA read message is generated according to a load operation issued by the first memory 102, an RDMA write message is generated according to a store operation issued by the first memory 102, and an RDMA atomic message is generated according to an atomic operation issued by the first memory 102.
  • the second operation device 111 receives the operation message, and sends the resource access operation to the second memory 112 .
  • the second operation device 111 receives the operation message, obtains from it the access resource request issued by the first memory 102, that is, the access resource operation and the address of the remote node 110, and sends the access resource operation to the second memory 112. In this way, through the first operating device 101 and the second operating device 111, the local node 100 can issue an access resource request directly to the second memory 112 of the remote node 110, thereby realizing mutual access between nodes.
  • the second memory 112 performs the resource access operation included in the operation message to obtain the operation result, and sends the operation result to the second operation device 111 .
  • the second memory 112 on the remote node 110 After the second memory 112 on the remote node 110 receives the resource access operation and the address of the remote node 110 , executes the resource access operation, and sends the operation result of the resource access request to the second operating device 111 .
  • the access resource request is a load operation
  • the corresponding data is read from the address of the remote node 110 and returned;
  • the access resource request is a store operation, the write data carried in the access resource request is written. Data is written into the remote node 110 address.
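  • a minimal sketch of how a memory on the remote node might execute a load or a store is given below, for illustration only. The byte-array model, the bounds check, the empty result returned for a store and the class name `RemoteMemoryModel` are assumptions of this sketch rather than features recited in the application.

```python
class RemoteMemoryModel:
    """Hypothetical byte-addressable model of the memory on the remote node."""

    def __init__(self, size):
        self._cells = bytearray(size)

    def execute(self, operation, address, length=0, data=b""):
        """Execute one access resource operation and return the operation result."""
        span = length if operation == "load" else len(data)
        if address < 0 or address + span > len(self._cells):
            raise IndexError("access falls outside the modelled memory")
        if operation == "load":
            # Read the data at the remote node address and return it.
            return bytes(self._cells[address:address + length])
        if operation == "store":
            # Write the data carried in the request to the remote node address.
            self._cells[address:address + len(data)] = data
            return b""   # assumption: a store yields an empty result
        raise ValueError(f"unsupported operation: {operation}")


memory = RemoteMemoryModel(16)
memory.execute("store", 4, data=b"abc")
loaded = memory.execute("load", 4, length=3)
```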
  • the second operation device 111 sends the operation result message to the first operation device 101 .
  • an operation result message for network transmission can also be generated.
  • the specific message generation method will not be repeated here, and those skilled in the art should know.
  • the first operation device 101 receives the operation result message sent by the remote node, determines the operation result according to the operation result message, and returns the operation result to the first memory 102/processor 103 that issued the resource access request.
  • if the resource access request was issued by the first memory 102 in the local node 100, the operation result is returned to the first memory 102 of the local node 100; if it was issued by the processor 103 in the local node 100, the operation result is returned to the processor 103 of the local node 100 .
  • the resource access request issued by the local node is converted by the first operating device into a message that can be transmitted over the network, and is sent by the first operating device to the second operating device.
  • the second operating device parses the message transmitted over the network and sends the resource access request directly to the memory of the remote node, so that the local node can directly access the remote node, which simplifies the access steps and improves processing efficiency.
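  • the round trip summarised above can be sketched end to end as follows. This is an illustrative model under stated assumptions: a direct method call stands in for the network, the address correspondence is a plain dictionary, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationMessage:
    operation: str                 # "load" or "store"
    remote_address: int
    data: Optional[bytes] = None


@dataclass
class OperationResultMessage:
    result: Optional[bytes]


class SecondOperatingDeviceModel:
    """Remote side: parses the operation message and applies it to remote memory."""

    def __init__(self, memory):
        self.memory = memory       # remote address -> stored bytes

    def handle(self, message):
        if message.operation == "store":
            self.memory[message.remote_address] = message.data
            return OperationResultMessage(None)
        return OperationResultMessage(self.memory.get(message.remote_address))


class FirstOperatingDeviceModel:
    """Local side: maps the local address and forwards the operation."""

    def __init__(self, address_map, peer):
        self.address_map = address_map   # local address -> remote address
        self.peer = peer                 # stands in for the network link

    def access(self, operation, local_address, data=None):
        remote_address = self.address_map[local_address]
        reply = self.peer.handle(OperationMessage(operation, remote_address, data))
        return reply.result              # handed back to the requesting memory/processor


remote = SecondOperatingDeviceModel({})
local = FirstOperatingDeviceModel({0x1000: 0x9000}, remote)
local.access("store", 0x1000, b"hello")
value = local.access("load", 0x1000)
```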
  • FIG. 3A shows a schematic diagram of the architecture of the first operating device 300 .
  • the first operating device 300 includes: a first instruction processing unit 301 , a remote memory management unit 302 and a first message transmission unit 303.
  • the first instruction processing unit 301 is configured to: receive an access resource request sent by the local node 100, where the resource access request includes an access resource operation and the address of the local node 100; send the address of the local node 100 to the remote memory management unit 302; and receive the operation result message sent by the first message transmission unit 303 , determine the operation result according to the operation result message, and send the operation result to the local node 100 .
  • the remote memory management unit 302 is configured to determine the address of the remote node 110 corresponding to the address of the local node 100 according to the address of the local node 100;
  • the first message transmission unit 303 is configured to send the operation message to the remote node 110; receive the operation result message sent by the remote node 110, and determine the operation result according to the operation result message.
  • the first instruction processing unit 301 can receive not only the access resource request issued by the first memory 102 in the local node 100, but also the access resource request issued by the processor 103 of the local node 100.
  • the resource access request further includes: a remote node number, where the remote node number is used to indicate the remote node 110 to which the address of the remote node 110 belongs; the first instruction processing unit 301 is further configured to send the operation message to the remote node 110 indicated by the remote node number, and when there are multiple remote nodes 110, the remote node 110 to which the resource access request is delivered is determined according to the remote node number.
  • the address of the local node 100 is the virtual address of the local node 100
  • the address of the remote node 110 is the virtual address of the remote node 110
  • a first mapping relationship between the virtual address of the local node 100 and the physical address of the local node 100 and a second mapping relationship between the physical address of the local node 100 and the virtual address of the remote node 110 are established.
  • the remote memory management unit 302 is specifically configured to: determine, according to the first mapping relationship, the physical address of the local node 100 corresponding to the virtual address of the local node 100; and determine, according to the second mapping relationship, the virtual address of the remote node 110 corresponding to the physical address of the local node 100.
  • the virtual address of the local node 100 includes: the virtual page number and offset of the local node 100
  • the virtual address of the remote node 110 includes the virtual page number and offset of the remote node 110, and the offset contained in the virtual address of the local node 100 is the same as the offset contained in the virtual address of the remote node 110
  • the remote memory management unit 302 is specifically used for: determining, according to the first mapping relationship, the physical page number of the local node 100 corresponding to the virtual page number of the local node 100; determining, according to the second mapping relationship, the virtual page number of the remote node 110 corresponding to the physical page number of the local node 100;
  • and determining the virtual address of the remote node 110 according to the offset in the address of the local node 100 and the virtual page number of the remote node 110 .
  • the remote memory management unit 302 is further configured to store the first mapping relationship in the first page table and store the second mapping relationship in the second page table, where the storage formats of the first page table and the second page table include any one of the following: a page table entry PTE and a page table pointer PTP.
  • the remote node is further configured to: store the third mapping relationship in a third page table, where the storage format of the third page table includes any one of the following: a page table entry PTE and a page table pointer PTP.
  • the first operating device 300 further includes a translation look-aside buffer unit 304, and the translation look-aside buffer unit 304 is configured to cache frequently used page table entries of the first page table and the second page table.
  • the translation look-aside buffer unit 304 stores frequently used page table entries, and the frequently used page table entries are a subset of the first page table or a subset of the second page table. In this way, a frequently used page table entry can first be looked up in the translation look-aside buffer unit 304 to perform address translation, thereby improving the speed of address translation.
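  • the caching behaviour described above can be illustrated with a small software model. This is a sketch only: the application does not state a replacement policy, so the least-recently-used eviction, the capacity and the class name `TranslationLookasideBufferModel` are assumptions.

```python
from collections import OrderedDict


class TranslationLookasideBufferModel:
    """Caches a subset of a page table; falls back to the full table on a miss."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table     # full page table (page number -> entry)
        self.capacity = capacity
        self.entries = OrderedDict()     # cached subset, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, page_number):
        if page_number in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_number)   # mark as recently used
            return self.entries[page_number]
        self.misses += 1
        entry = self.page_table[page_number]        # slower full-table lookup
        self.entries[page_number] = entry
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)        # assumption: evict least recently used
        return entry


tlb = TranslationLookasideBufferModel({1: 10, 2: 20, 3: 30}, capacity=2)
results = [tlb.lookup(page) for page in (1, 1, 2, 3)]
```

  • the second lookup of page 1 is served from the cached subset; the full table is consulted only on a miss.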
  • the present application further provides a second operation device 310 .
  • FIG. 3B shows a schematic structural diagram of the second operation device 310 .
  • the second operation device 310 includes a second message transmission unit 311 and a second instruction processing unit 312 .
  • the second message transmission unit 311 is configured to receive the operation message sent by the local node, the operation message includes the access resource operation and the address of the remote node, and send the access resource operation to the second instruction processing unit 312 ; Receive the operation result, and send the operation result message containing the operation result to the local node 100;
  • the second instruction processing unit 312 is configured to receive the resource access operation, execute the resource access operation to obtain the operation result, and send the operation result to the second message transmission unit 311;
  • the address of the remote node is the virtual address of the remote node
  • the second operating device 310 further includes a memory management unit 313, which is configured to establish a third mapping relationship between the virtual address of the remote node and the physical address of the remote node;
  • the second instruction processing unit 312 is configured to determine, according to the third mapping relationship, the physical address of the remote node corresponding to the virtual address of the remote node, and to execute, according to the physical address of the remote node, the access resource operation contained in the operation message.
  • the memory management unit 313 is configured to store the third mapping relationship in a third page table, and the storage format of the third page table includes any one of the following: a page table entry PTE and a page table pointer PTP.
  • this embodiment provides an example applied to the system for accessing remote resources, in which the local node of the system issues a load operation to the remote node
  • the system is based on the CHI bus architecture, and the issued access resource request is converted into an RDMA-type message
  • Figure 4 shows a schematic flow diagram of an example applied to a system for accessing remote resources, wherein:
  • the processor or memory on the local node sends a non-snoop read request readnosnp (a load operation) to the first operating device; the first operating device determines the address of the local node and queries the page table directory on the remote memory unit to obtain the queue pair number (QPN) and the address of the remote node corresponding to the readnosnp request; the first operating device converts the readnosnp request into an RDMA read operation and sends it to the second operating device; the second operating device converts the RDMA read into a read operation readonce and sends the readonce to the memory of the remote node to obtain the data at the address of the remote node; the second operating device receives the completion data compdata sent by the remote node, converts the compdata into an RDMA read response and sends it to the first operating device; after receiving the RDMA read response, the first operating device extracts the compdata and returns the compdata to the processor or memory on the local node that issued the readnosnp request.
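  • the sequence of conversions in this example can be traced with the following sketch, offered for illustration only. The dictionaries are simplified stand-ins and do not reproduce the CHI or RDMA message formats; `run_readnosnp` and the field names are hypothetical.

```python
def run_readnosnp(local_address, page_table_directory, remote_memory):
    """Trace readnosnp -> RDMA read -> readonce -> compdata -> RDMA read response."""
    trace = ["readnosnp"]

    # First operating device: look up the QPN and the remote node address.
    qpn, remote_address = page_table_directory[local_address]
    rdma_read = {"type": "RDMA read", "qpn": qpn, "address": remote_address}
    trace.append(rdma_read["type"])

    # Second operating device: convert the RDMA read into a readonce.
    readonce = {"op": "readonce", "address": rdma_read["address"]}
    trace.append(readonce["op"])

    # Memory of the remote node: return the data as compdata.
    compdata = {"op": "compdata", "data": remote_memory[readonce["address"]]}
    trace.append(compdata["op"])

    # Second operating device: wrap the compdata in an RDMA read response.
    response = {"type": "RDMA read response", "qpn": qpn, "compdata": compdata}
    trace.append(response["type"])

    # First operating device: extract the compdata for the original requester.
    return response["compdata"]["data"], trace


data, trace = run_readnosnp(0x1000, {0x1000: (5, 0x9000)}, {0x9000: b"\x2a"})
```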
  • the present application also provides a method for accessing remote resources.
  • the method is applied to the local node 100 and the remote node 110 described in the above embodiments. Referring to FIG. 5 , the method includes the following steps:
  • S501 The local node 100 determines an access resource request, where the access resource request includes an access resource operation and an address of the local node pointed to by the access resource operation;
  • S502 The local node 100 determines the address of the remote node corresponding to the address of the local node;
  • S503 The local node 100 sends an operation message to the remote node, where the operation message includes the resource access operation and the address of the remote node;
  • S504 The remote node 110 receives the operation message sent by the local node, where the operation message includes the access resource operation and the address of the remote node;
  • S505 The remote node 110 performs the resource access operation to obtain an operation result, and sends an operation result message including the operation result to the local node;
  • S506 The local node 100 receives the operation result message including the operation result sent by the remote node, where the operation result is determined by the remote node performing the resource access operation included in the operation message.
  • the resource access request further includes: a remote node number, where the remote node number is used to indicate the remote node to which the address of the remote node belongs;
  • the sending the operation message to the remote node specifically includes: sending the operation message to the remote node indicated by the remote node number.
  • the address of the local node is the virtual address of the local node
  • the address of the remote node is the virtual address of the remote node
  • the method further includes: establishing a first mapping relationship between the virtual address of the local node and the physical address of the local node, and establishing a second mapping relationship between the physical address of the local node and the virtual address of the remote node;
  • the determining the address of the remote node corresponding to the address of the local node includes: determining, according to the first mapping relationship, the physical address of the local node corresponding to the virtual address of the local node; and determining, according to the second mapping relationship, the virtual address of the remote node corresponding to the physical address of the local node.
  • the virtual address of the local node includes: the virtual page number and offset of the local node, and the virtual address of the remote node includes the virtual page number and offset of the remote node
  • the address of the remote node is a virtual address of the remote node
  • the method further includes: establishing a third mapping relationship between the virtual address of the remote node and the physical address of the remote node; the executing the resource access operation specifically includes: determining, according to the third mapping relationship, the physical address of the remote node corresponding to the virtual address of the remote node; and executing, according to the physical address of the remote node, the access resource operation contained in the operation message.
  • the resource accessing operation includes at least one of the following: a processor read operation load, a processor write operation store, a processor atomic operation atomic, and a direct memory access (DMA) access.
  • the method further includes: storing the first mapping relationship in a first page table and storing the second mapping relationship in a second page table, where the storage formats of the first page table and the second page table include any one of the following: a page table entry PTE and a page table pointer PTP.
  • the method further includes: storing the third mapping relationship in a third page table, where the storage format of the third page table includes any one of the following: a page table entry PTE and a page table pointer PTP.
  • the present application also provides a local node, as shown in FIG. 6 , the local node 600 includes:
  • an operation generating unit 601 configured to determine an access resource request, where the access resource request includes an access resource operation and an address of a local node pointed to by the access resource operation;
  • an address determination unit 602, configured to determine the address of the remote node corresponding to the address of the local node;
  • an operation sending unit 603, configured to send an operation message to the remote node, where the operation message includes the access resource operation and the address of the remote node;
  • a result determination unit 604, configured to receive an operation result message including an operation result sent by the remote node, where the operation result is determined by the remote node performing the access resource operation included in the operation message.
  • the resource access request further includes: a remote node number, where the remote node number is used to indicate the remote node to which the address of the remote node belongs; the operation sending unit 603 is specifically configured to send the operation message to the remote node indicated by the remote node number.
  • the address of the local node is the virtual address of the local node
  • the address of the remote node is the virtual address of the remote node
  • the local node further includes a local mapping determination unit 605, the local mapping determination unit 605 is configured to establish a first mapping relationship between the virtual address of the local node and the physical address of the local node, and to establish the local node.
  • the address determination unit 602 is further configured to determine the physical address of the local node corresponding to the virtual address of the local node according to the first mapping relationship;
  • the virtual address of the remote node corresponding to the physical address of the local node is determined according to the second mapping relationship.
  • the virtual address of the local node includes the virtual page number and offset of the local node, and the virtual address of the remote node includes the virtual page number and offset of the remote node;
  • the offset contained in the virtual address of the local node is the same as the offset contained in the virtual address of the remote node;
  • the address determination unit 602 is further configured to determine the physical page number of the local node corresponding to the virtual page number of the local node according to the first mapping relationship, and to determine the virtual page number of the remote node corresponding to the physical page number of the local node according to the second mapping relationship;
  • the virtual address of the remote node is determined according to the offset in the address of the local node and the virtual page number of the remote node.
  • the resource access operation includes at least one of the following:
  • a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • the local node further includes a local mapping storage unit 606, which is configured to store the first mapping relationship in the first page table and to store the second mapping relationship in the second page table;
  • the storage format of the first page table and the second page table includes either of the following: page table entry (PTE) and page table pointer (PTP).
  • the present application also provides a remote node. As shown in FIG. 7, the remote node 700 includes:
  • an operation receiving unit 701, configured to receive an operation message sent by a local node, where the operation message includes a resource access operation and an address of the remote node;
  • an execution sending unit 702, configured to execute the resource access operation to obtain an operation result, and to send an operation result message containing the operation result to the local node.
  • the address of the remote node is a virtual address of the remote node
  • the remote node further includes a remote mapping determination unit 703, configured to establish a third mapping relationship between the virtual address of the remote node and the physical address of the remote node;
  • the execution sending unit 702 is configured to determine the physical address of the remote node corresponding to the virtual address of the remote node according to the third mapping relationship;
  • the resource access operation included in the operation message is performed according to the physical address of the remote node.
  • the resource access operation includes at least one of the following:
  • a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and a direct memory access (DMA) access.
  • the remote node further includes a remote mapping storage unit 704, configured to store the third mapping relationship in the third page table;
  • the storage format of the third page table includes either of the following: page table entry (PTE) and page table pointer (PTP).
  • the present application also provides a computer program product that includes computer program code; when the computer program code is run on a computer, the computer is caused to execute the method for accessing remote resources in the embodiment shown in FIG. 5.
  • the present application also provides a computer-readable medium in which program code is stored; when the program code is run on a computer, the computer is caused to execute the method for accessing remote resources in the embodiment shown in FIG. 5.
  • the present application provides a chip including at least one processor and an interface; the interface is used to provide program instructions or data for the at least one processor; the at least one processor is used to execute
  • the program instructions to implement the method for accessing remote resources in the embodiment shown in FIG. 5.
  • the embodiments of the present application provide a system and method for accessing remote resources, which address the current inability to directly access resources in remote memory, hard disks, and other devices through resource access requests issued directly by a local node.
  • the resource access request issued by the local node can be converted into a message that can be transmitted over the network; after the remote node parses that message, the resource access request is delivered directly to the remote node, so that the local node can directly access the remote node.
  • the access delay can be reduced, the access steps can be simplified, and the processing efficiency can be improved.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

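The present application provides a system and method for accessing remote resources. The system includes a local node and a remote node. The local node is configured to: determine a resource access request, where the resource access request contains a resource access operation and the address of the local node to which the resource access operation points; determine the corresponding address of the remote node; send an operation message to the remote node, where the operation message contains the resource access operation and the address of the remote node; and receive an operation result message sent by the remote node and determine the operation result from it. The remote node is configured to: receive the operation message; execute the resource access operation to obtain the operation result; and send the operation result message to the local node. With the system and method provided by the present application, memory, hard disks, and other resources on the remote node can be accessed directly and quickly by issuing resource access requests, which avoids the problems of high access latency, cumbersome procedures, and low processing efficiency.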

Description

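A System and Method for Accessing Remote Resources
Technical Field
The present application relates to the field of data access, and in particular to a system and method for accessing remote resources.
Background
With the development of big data, scale-out system architectures are widely used in fields such as storage and high-performance computing (HPC). An entire scale-out system consists of multiple nodes, each configured with its own processor, memory, hard disk, and other devices. The nodes in the system usually need to cooperate closely to complete a task, so they frequently need to access each other's resources.
In the prior art, a local node must go through a cumbersome process to access the resources of a remote node. Take the local node reading data from the remote node as an example. On receiving a read instruction, the remote node first copies its local data into the socket buffer of its processor, and its processor then encapsulates the data into packets. After processing by a series of layered network protocols, the remote node places the processed packets in the buffer of its network interface card and sends them over the network to the network interface card of the local node. The local node then performs multi-layer network parsing to recover the remote node's data, and only then is the read complete. Because this technique requires the processors of both the remote node and the local node to take part, it is complex and performs poorly compared with directly accessing local memory.
In some other existing approaches, local memory can also access the memory of a remote node by remote direct memory access (RDMA). However, if the hard disk of the local node needs to access the memory of the remote node, the processor of the local node must first execute the access command, post a work request, ring the doorbell, and perform a series of further operations, so the steps remain cumbersome. Using RDMA also requires the processor of the local node to execute the software transport interface and perform operations such as asynchronous scheduling to complete the memory access, which increases latency and consumes more memory bandwidth. Processing efficiency is therefore also low.
In view of this, an effective method is needed for accessing memory, hard disks, and other resources on a remote node by directly issuing resource access operations, so as to avoid the problems of high remote-resource access latency, cumbersome procedures, and low processing efficiency.
Summary
In view of this, the present application provides a system and method for accessing remote resources. With this system and method, a local node can directly access resources in remote memory, hard disks, and other devices by issuing resource access requests, which avoids the problems of high remote-resource access latency, cumbersome procedures, and low processing efficiency.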
第一方面,本申请提供一种访问远端资源的系统,所述系统包括本地节点以及远端节点;所述本地节点用于:确定访问资源请求,所述访问资源请求包含访问资源操作以及所述访问资源操作指向的所述本地节点的地址;确定与所述本地节点的地址对应的所述远端节点的地址;将操作报文发送给所述远端节点,所述操作报文包含所述访问资源操作和所述远端节点的地址;接收所述远端节点发送的操作结果报文,并根据所述操作结果报文确定操作结果;所述远端节点用于:接收所述操作报文;执行所述操作报文所包含的所述访问资源操作以得到所述操作结果,并将所述操作结果报文发送给所述本地节点,所述操作 结果报文包含所述操作结果。
在上述技术方案中,本地节点在确定访问资源请求后,确定访问资源请求中携带的访问资源操作以及本地节点的地址,并根据本地节点的地址确定远端节点的地址,将访问资源操作以及远端节点的地址转化为可进行网络传输的报文,将该报文发送到远端节点上,远端节点在接收报文后,可以直接在远端节点的地址上执行资源操作,并将操作结果以报文返回给本地节点,从而可以通过直接下发资源操作的方式访问远端节点上的内存、硬盘以及其他资源,降低访问延时、简化访问步骤,从而提升处理效率。
在一些可能的实施方式中,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;所述本地节点,在将操作报文发送给所述远端节点时,具体用于:将所述操作报文发送给所述远端节点号指示的所述远端节点。
在上述技术方案中,在访问远端资源的系统中存在多个远端节点时,可以根据远端节点号确定将所述访问资源请求下发到哪个远端节点,避免将操作报文发送到多个远端节点,从而提升访问效率。
在一些可能的实施方式中,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;所述本地节点还用于:建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;所述本地节点,在确定与所述本地节点的地址对应的所述远端节点的地址时,具体用于:根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理地址;根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟地址。
在上述技术方案中,可以将本地节点的操作系统中的用户空间与本地节点的物理地址空间,以及本地节点的物理地址空间与远端节点的操作系统中的用户空间关联起来,当本地节点访问本地节点的操作系统中的用户空间的虚拟地址时,实际上会转换为对远端节点的操作系统中的用户空间的虚拟地址的访问,从而可以实现通过直接下发资源操作的方式访问远端节点上的内存、硬盘以及其他资源,降低访问延时、简化访问步骤。
在一些可能的实施方式中,所述远端节点还用于:建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;所述远端节点,在执行所述操作报文所包含的所述资源操作时,具体用于:根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
同理,远端节点也可以将远端节点的操作系统中的用户空间与远端节点的物理地址空间关联起来,从而提升处理访问资源操作的效率。
在一些可能的实施方式中,所述本地节点的虚拟地址包含:所述本地节点的虚拟页号以及偏移量,所述远端节点的虚拟地址包含所述远端节点的虚拟页号以及偏移量,所述本地节点的虚拟地址包含的偏移量与所述远端节点的虚拟地址包含的偏移量相同;所述本地节点,在确定与所述本地节点的地址对应的所述远端节点的地址时,具体用于:根据所述第一映射关系确定与所述本地节点的虚拟页号对应的所述本地节点的物理页号;根据所述第二映射关系确定与所述本地节点的物理页号对应的所述远端节点的虚拟页号;根据所述本地节点的地址中的偏移量以及所述远端节点的虚拟页号,确定所述远端节点的虚拟地址。
在一些可能的实施方式中,所述资源操作包括以下至少一种:处理器读操作load、处 理器写操作store、处理器原子操作atomic以及DMA访问。在上述技术方案中,本申请提供的访问远程资源的系统可以为支持CHI协议的存储系统。
在一些可能的实施方式中,将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。在一些可能的实施方式中,将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
在上述技术方案中,第一映射关系、第二映射关系以及第三映射关系以页表条目PTE或页表指针PTP的格式存储在页表中,利用该格式存储,可以离散的将映射关系存储在页表中,从而节省页表的存储空间。
第二方面,本申请提供一种访问远端资源的方法,所述方法包括:
确定访问资源请求,所述访问资源请求包含访问资源操作以及所述访问资源操作所指向的本地节点的地址;确定与所述本地节点的地址对应的远端节点的地址;将操作报文发送给所述远端节点,所述操作报文包含所述访问资源操作以及所述远端节点的地址;接收所述远端节点发送的包含操作结果的操作结果报文,所述操作结果为所述远端节点执行所述操作报文所包含的所述访问资源操作所确定的。所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;所述将操作报文发送给所述远端节点,具体包括:将所述操作报文发送给所述远端节点号指示的所述远端节点。
在一些可能的实施方式中,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;所述方法还包括:建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;所述确定与本地节点的地址对应的远端节点的地址,包括:根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理地址;根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟地址。
在一些可能的实施方式中,所述本地节点的虚拟地址包含:所述本地节点的虚拟页号以及偏移量,所述远端节点的虚拟地址包含所述远端节点的虚拟页号以及偏移量,所述本地节点的虚拟地址包含的偏移量与所述远端节点的虚拟地址包含的偏移量相同;所述确定与本地节点的地址对应的远端节点的地址,具体包括:根据所述第一映射关系确定与所述本地节点的虚拟页号对应的所述本地节点的物理页号;根据所述第二映射关系确定与所述本地节点的物理页号对应的所述远端节点的虚拟页号;根据所述本地节点的地址中的偏移量以及所述远端节点的虚拟页号,确定所述远端节点的虚拟地址。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述方法还包括:将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
第三方面,本申请提供一种访问远端资源的方法,所述方法包括:
接收本地节点发送的操作报文,所述操作报文包含访问资源操作及远端节点的地址;
执行所述访问资源操作以得到操作结果,并将包含所述操作结果的操作结果报文发送 给所述本地节点。
在一些可能的实施方式中,所述远端节点的地址为所述远端节点的虚拟地址,所述方法还包括:建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;
所述执行所述资源操作时,具体包括:
根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;
根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:
处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述方法还包括:
将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
第四方面,本申请提供一种本地节点,所述本地节点包括:
操作生成单元,用于确定访问资源请求,所述访问资源请求包含访问资源操作以及所述访问资源操作所指向的本地节点的地址;
地址确定单元,用于确定与所述本地节点的地址对应的远端节点的地址;
操作发送单元,用于将操作报文发送给所述远端节点,所述操作报文包含所述访问资源操作以及所述远端节点的地址;
结果确定单元,用于接收所述远端节点发送的包含操作结果的操作结果报文,所述操作结果为所述远端节点执行所述操作报文所包含的所述访问资源操作所确定的。
在一些可能的实施方式中,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;
所述操作发送单元,具体用于:
将所述操作报文发送给所述远端节点号指示的所述远端节点。
在一些可能的实施方式中,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;
所述本地节点还包括本地映射确定单元,所述本地映射确定单元,用于建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;
所述地址确定单元,还用于根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理地址;
根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟地址。
在一些可能的实施方式中,所述本地节点的虚拟地址包含:所述本地节点的虚拟页号以及偏移量,所述远端节点的虚拟地址包含所述远端节点的虚拟页号以及偏移量,所述本地节点的虚拟地址包含的偏移量与所述远端节点的虚拟地址包含的偏移量相同;
所述地址确定单元,还用于根据所述第一映射关系确定与所述本地节点的虚拟页号对应的所述本地节点的物理页号;
根据所述第二映射关系确定与所述本地节点的物理页号对应的所述远端节点的虚拟 页号;
根据所述本地节点的地址中的偏移量以及所述远端节点的虚拟页号,确定所述远端节点的虚拟地址。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:
处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述本地节点还包括:本地映射存储单元,所述本地映射存储单元,用于将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
第五方面,本申请提供一种远端节点,所述远端节点包括:操作接收单元,用于接收本地节点发送的操作报文,所述操作报文包含访问资源操作及远端节点的地址;执行发送单元,用于执行所述访问资源操作以得到操作结果,并将包含所述操作结果的操作结果报文发送给所述本地节点。
在一些可能的实施方式中,所述远端节点的地址为所述远端节点的虚拟地址,所述远端节点还包括远端映射确定单元,所述远端映射确定单元,用于建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;
所述执行发送单元,用于根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;
根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:
处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述远端节点还包括:远端映射存储单元,所述远端映射存储单元,用于将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
第六方面,本申请还提供一种计算机程序产品,该计算机程序产品包括:计算机程序代码,当该计算机程序代码在计算机上运行时,使得该计算机执行第二方面或第三方面任一所述的访问远程资源的方法。
第七方面,本申请还提供一种计算机可读介质,该计算机可读介质存储有程序代码,当该程序代码在计算机上运行时,使得该计算机执行第二方面或第三方面任一所述的访问远程资源的方法。
第八方面,本申请提供一种芯片,包括至少一个处理器和接口;所述接口,用于为所述至少一个处理器提供程序指令或者数据;所述至少一个处理器用于执行所述程序行指令,实现第二方面或第三方面任一所述的访问远程资源的方法。
上述第二方面至第八方面中任一方面可以达到的技术效果可以参照上述第一方面中有益效果的描述,此处不再重复赘述。
本申请的这些方面或其它方面在以下实施例的描述中会更加简明易懂。
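Brief Description of the Drawings
FIG. 1A is a schematic diagram of the architecture of a system for accessing remote resources;
FIG. 1B is a schematic diagram of the mapping between the address of a local node and the address of a remote node;
FIG. 1C is a schematic diagram of a specific architecture of a system for accessing remote resources;
FIG. 2 is a schematic flowchart of a local node issuing a resource access operation to a remote node;
FIG. 3A is a schematic structural diagram of a first operation apparatus;
FIG. 3B is a schematic structural diagram of a second operation apparatus;
FIG. 4 is a schematic flowchart of an example applied to the system for accessing remote resources;
FIG. 5 is a schematic flowchart of a method for accessing remote resources;
FIG. 6 is a schematic structural diagram of the local node in the system for accessing remote resources;
FIG. 7 is a schematic structural diagram of the remote node in the system for accessing remote resources.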
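Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. Specific operation methods in the method embodiments may also be applied to the apparatus or system embodiments. Note that, in the description of the present application, "at least one" means one or more, and "multiple" means two or more; accordingly, "multiple" may also be understood as "at least two" in the embodiments. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean A alone, both A and B, or B alone. Unless otherwise specified, the character "/" generally indicates an "or" relationship between the objects before and after it. In addition, terms such as "first" and "second" are used only to distinguish between descriptions and are not to be understood as indicating or implying relative importance or order.
Some terms used in the embodiments of the present application are first explained below for ease of understanding by those skilled in the art.
(1) Atomic operation: an atomic operation is an independent and indivisible operation. In a single-core environment, in the usual sense, a thread is not switched out during an atomic operation; a thread switch happens either before the atomic operation or after it completes. In a broader sense, an atomic operation is a series of steps that must complete as a whole: if any step fails to complete, all completed steps must be rolled back, which guarantees that either none of the steps take effect or all of them do. For example, in a single-core system a single machine instruction can be regarded as an atomic operation, whereas in a multi-core system a single machine instruction is not an atomic operation, because multiple instruction streams run in parallel and, while one core executes an instruction, instructions executing concurrently on other cores may operate on the same memory region, causing a data race. Operations completed within a single instruction can all be regarded as atomic operations.
(2) Memory mapping: memory mapping establishes a one-to-one correspondence between the location of a file on the hard disk and a region of the same size in the logical address space of a process, so that an access to a piece of data in memory is converted into an access to the corresponding piece of data in the file. The purpose is to reduce copying of data between user space and kernel space. When a large amount of data needs to be transferred, accessing a file through memory mapping is comparatively efficient.
(3) Memory management unit (MMU): the MMU is sometimes called a paged memory management unit (PMMU). It is computer hardware responsible for handling memory access requests from the central processing unit. Its functions include translation from virtual page tables to physical page tables (that is, virtual memory management), memory protection, and control of the CPU cache; in simpler computer architectures it is also responsible for bus arbitration and bank switching.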
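(4) High-speed serial computer expansion bus standard (peripheral component interconnect-express, PCI-e): PCIe is a general-purpose bus specification, promoted and popularized by [Figure PCTCN2021076161-appb-000001], and designed to replace the bus transmission interfaces inside existing computer systems.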
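(5) Non-Volatile Memory Express (NVMe): NVMe is a specification for solid state disks (SSDs) that use PCI-e lanes. From the outset, NVMe was designed to take full advantage of the low latency and parallelism of PCI-e SSDs, as well as the parallelism of processors, platforms, and applications. The parallelism of an SSD can be fully exploited by the host's hardware and software. Compared with the advanced host controller interface (AHCI) standard, NVMe brings performance improvements in several respects.
(6) Memory-mapped I/O (MMIO): MMIO is part of the PCI specification; I/O devices are placed in the memory space rather than the I/O space. From the processor's point of view, once I/O is memory-mapped, accessing a system device is the same as accessing memory.
(7) Virtual address (VA): a VA does not physically exist in the computer. Each process is allocated its own virtual space and can access only the space allocated to it.
(8) Page table entry (PTE): the PTE is the lowest level of the page table and deals directly with pages. Its value contains the physical address of a page, as well as bits indicating whether the entry is valid and whether the page is present in physical memory.
(9) Coherent hub interface (CHI): a system architecture based on the CHI protocol may contain standalone CPUs, processor clusters, graphics processors, memory controllers, I/O bridges, PCIe subsystems, and the CHI interconnect. By CHI protocol node type, components are classified and named as follows: a request node (RN) generates protocol transactions, including reads and writes; a home node (HN) receives the protocol transactions generated by RNs; and a slave node (SN) receives requests from an HN, completes the corresponding operation, and returns a response.
(10) Translation lookaside buffer (TLB): the TLB is a cache in the processor that the memory management unit uses to speed up translation of virtual addresses to physical addresses. A TLB has a fixed number of slots that hold tagged page table entries mapping virtual addresses to physical addresses.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.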
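In the prior art, remote direct memory access (RDMA) can be implemented on top of remote memory invocation over Ethernet and InfiniBand (IB) networks in order to access the memory of a remote node. Specifically, the local node can access remote memory directly through an RDMA-aware network interface controller (RNIC). However, when Ethernet and IB networks are used for remote memory invocation, additional switches and adapters are required, as is a complex conversion between the protocol stack and the PCI-e (peripheral component interconnect-express) protocol.
Specifically, when the local node needs to access the remote node, it creates a channel connection; the two endpoints of each channel are two queue pairs (QPs). The local node accesses the RNIC directly through the network card and, after completing the data processing request, uses a completion queue polling (CQ poll) mechanism or an interrupt mechanism to access the data. RDMA also provides a software transport interface (Verbs) through which the local node posts work requests (WRs). A WR describes the message content to be transferred to the remote node and is posted to one of the work queues (WQs) in the QP. In the WQ, the local node's WR is converted into a Work Queue Element (WQE), which waits for asynchronous scheduling and parsing by the RNIC; finally, the actual message is fetched from the buffer that the WQE points to and returned to the local node. Using RDMA therefore still requires the processor to execute the software transport interface, perform asynchronous scheduling, and so on, and any step that passes through the processor increases latency, consumes more bandwidth, and lowers efficiency. In addition, RDMA is a point-to-point protocol. It requires a dedicated network card (such as an Ethernet card or an IB card) on every node, which makes RDMA costly to implement, and even with RDMA the resources of the nodes still cannot be allocated reasonably and efficiently.
In view of this, the embodiments of the present application provide a system and method for accessing remote resources that can directly and quickly access memory, hard disks, and other resources on a remote node by issuing resource access requests, thereby avoiding the problems of high access latency, cumbersome procedures, and low processing efficiency.
The technical solutions provided by the present application can be applied to a storage system with two or more nodes, where each node may have an independent hardware structure so as to independently implement the same or different services. Note also that, in the embodiments of the present application, each node may have certain resources, and the amounts of resources on the nodes may be the same or different; the present application imposes no specific limitation.
For ease of understanding, in the following description a node that requests access to another node is called the local node, and a node whose memory, processor, hard disk, and other resources can be accessed by other nodes is called the remote node. Note that, in the embodiments of the present application, a node may act as a local node in one period and as a remote node in another, or a node may simultaneously be a local node and a remote node with respect to another node.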
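FIG. 1A shows a schematic diagram of the architecture of a system for accessing remote resources. The system includes a local node 100 and a remote node 110. The local node 100 is configured to: determine a resource access request, where the resource access request contains a resource access operation and the address of the local node 100 to which the resource access operation points; determine the address of the remote node 110 corresponding to the address of the local node 100; send an operation message to the remote node 110, where the operation message contains the resource access operation and the address of the remote node 110; and receive an operation result message sent by the remote node 110 and determine the operation result from the operation result message.
The remote node 110 is configured to: receive the operation message; execute the resource access operation contained in the operation message to obtain the operation result; and send the operation result message, which contains the operation result, to the local node 100.
Both the local node 100 and the remote node 110 can be regarded as nodes in a storage system. When data needs to be exchanged between nodes, the local node 100 can access the resources of the remote node 110 by issuing an operation message carrying a resource access operation. For example, when a computing task on the local node 100 needs the resources of both the local node 100 and the remote node 110 for cooperative processing, the local node 100 can send the remote node 110 an operation message for obtaining the computation data, thereby obtaining the computation data on the remote node and completing the computing task.
Specifically, the resource access request may include the following types of operation: processor read (load), processor write (store), processor atomic (atomic), and direct memory access (DMA). Using the address of the local node 100 contained in the resource access request, the resource access operation can be executed at the address of the remote node 110 that corresponds to the address of the local node 100.
The address of the local node 100 may be either a virtual address or a physical address of the local node 100, and the address of the remote node 110 may be either a virtual address or a physical address of the remote node 110. The operation message is a message that can be transmitted over a network. As an example, take operation messages generated as RDMA-type network messages: an RDMA read message can be generated from a load operation, an RDMA write message from a store operation, and an RDMA atomic message from an atomic operation. Likewise, the operation result message sent to the local node 100 is also a message that can be transmitted over a network. The specific way the messages are generated is known to those skilled in the art and is not described here.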
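In some possible implementations, the resource access request further contains a remote node number, which indicates the remote node 110 to which the address of the remote node 110 belongs. When sending the operation message to the remote node 110, the local node 100 is specifically configured to send the operation message to the remote node 110 indicated by the remote node number. When there are multiple remote nodes 110, the remote node number determines which remote node 110 the resource access request is issued to.
In some possible implementations, the address of the local node 100 is a virtual address of the local node 100, and the address of the remote node 110 is a virtual address of the remote node 110. The local node 100 is further configured to establish a first mapping relationship between the virtual address of the local node 100 and the physical address of the local node 100, and a second mapping relationship between the physical address of the local node 100 and the virtual address of the remote node 110. When determining the address of the remote node 110 corresponding to the address of the local node 100, the local node 100 is specifically configured to: determine, according to the first mapping relationship, the physical address of the local node 100 corresponding to the virtual address of the local node 100; and determine, according to the second mapping relationship, the virtual address of the remote node 110 corresponding to the physical address of the local node 100.
Specifically, the virtual address of the local node 100 lies in the virtual address space of the local node 100, which is part of the user space of the operating system of the local node 100; the physical address of the local node 100 lies in the physical address space of the local node 100, which is part of the resource space of the local node 100. The virtual address of the remote node 110 lies in the virtual address space of the remote node 110, which is part of the user space of the operating system of the remote node 110.
In the present application, the local node 100 can establish the first mapping relationship between its virtual address and its physical address, and the second mapping relationship between its physical address and the virtual address of the remote node 110. Once the mappings are established, the user space of the operating system of the local node 100 is associated with the physical address space of the local node 100, and the physical address space of the local node 100 is associated with the user space of the operating system of the remote node 110. When the local node 100 accesses a virtual address in the user space of its own operating system, the access is in effect converted into an access to a virtual address in the user space of the operating system of the remote node 110. The specific address mapping method is known to those skilled in the art and is not described here.
In some possible implementations, the remote node 110 is further configured to establish a third mapping relationship between the virtual address of the remote node 110 and the physical address of the remote node 110. When executing the resource access operation contained in the operation message, the remote node 110 is specifically configured to: determine, according to the third mapping relationship, the physical address of the remote node 110 corresponding to the virtual address of the remote node 110; and execute, according to the physical address of the remote node 110, the resource access operation contained in the operation message.
Similarly to the above embodiment, the remote node 110 can also associate the user space of its operating system with its physical address space; the specific mapping method is known to those skilled in the art.
In some possible implementations, the virtual address of the local node 100 contains the virtual page number of the local node 100 and an offset, the virtual address of the remote node 110 contains the virtual page number of the remote node 110 and an offset, and the two offsets are the same. When determining the address of the remote node 110 corresponding to the address of the local node 100, the local node 100 is specifically configured to: determine, according to the first mapping relationship, the physical page number of the local node 100 corresponding to the virtual page number of the local node 100; determine, according to the second mapping relationship, the virtual page number of the remote node 110 corresponding to the physical page number of the local node 100; and determine the virtual address of the remote node 110 from the offset in the address of the local node 100 and the virtual page number of the remote node 110.
As an example, FIG. 1B is a schematic diagram of mapping from the address of the local node 100 to the address of the remote node 110. The resource access request carries the address of the local node 100. The local node 100 determines its physical page number from its virtual page number, looks up in the page table the remote node number and the virtual page number of the remote node 110 that correspond to that physical page number, and obtains the address of the remote node 110 from the virtual page number of the remote node 110 and the offset. In some possible implementations, the local node is further configured to store the first mapping relationship in a first page table and the second mapping relationship in a second page table, where the storage format of the first page table and the second page table includes either of the following: page table entry (PTE) and page table pointer (PTP). The remote node is further configured to store the third mapping relationship in a third page table, where the storage format of the third page table includes either of the following: page table entry (PTE) and page table pointer (PTP).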
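The two-stage lookup of FIG. 1B can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the application: the 4 KiB page size, the dictionary-based page tables, and the names `first_page_table`, `second_page_table`, and `translate`.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages; the source fixes no page size
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# First mapping (hypothetical contents): local virtual page number -> local physical page number.
first_page_table = {0x00042: 0x00A10}
# Second mapping (hypothetical contents): local physical page number
# -> (remote node number, remote virtual page number).
second_page_table = {0x00A10: (3, 0x7F001)}


def translate(local_va):
    """Return (remote node number, remote virtual address) for a local virtual address."""
    vpn = local_va >> PAGE_SHIFT
    offset = local_va & PAGE_MASK
    ppn = first_page_table[vpn]                    # first mapping relationship
    node, remote_vpn = second_page_table[ppn]      # second mapping relationship
    # The offset is carried over unchanged, as the description requires.
    return node, (remote_vpn << PAGE_SHIFT) | offset


node, remote_va = translate(0x00042ABC)
print(node, hex(remote_va))
```

Only the page numbers are translated; the low-order offset bits are copied from the local address to the remote address, which is why a single table entry per page is enough.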
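As an example, one PTE corresponds to one page table entry. For instance, a PTE has a 32-bit address, of which bits 20 to 31 may store the address corresponding to the PTE. The page table entry may store the address of the remote node 110 and may also store other information about the remote node; for example, when the operation message is of the RDMA type, the page table entry may also include a queue pair number (QP number).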
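A short Python sketch of such an entry follows. The description only states that bits 20 to 31 of a 32-bit entry may hold the address; placing a QP number in the low 20 bits is purely a hypothetical layout chosen for illustration, as are the names `pack_entry` and `unpack_entry`.

```python
ADDR_SHIFT = 20        # the address field starts at bit 20
ADDR_MASK = 0xFFF      # bits 20..31 are 12 bits wide
QPN_MASK = 0xFFFFF     # hypothetical: low 20 bits reused for a QP number


def pack_entry(addr_field, qpn):
    """Build a 32-bit entry from a 12-bit address field and a QP number."""
    return ((addr_field & ADDR_MASK) << ADDR_SHIFT) | (qpn & QPN_MASK)


def unpack_entry(entry):
    """Split a 32-bit entry back into (address field, QP number)."""
    return (entry >> ADDR_SHIFT) & ADDR_MASK, entry & QPN_MASK


entry = pack_entry(0xABC, 0x17)
print(hex(entry), unpack_entry(entry))
```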
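An embodiment of the present application provides a system for accessing remote resources, which addresses the current inability to directly access resources in remote memory, hard disks, and other devices through resource access requests issued directly by a local node. The resource access request issued by the local node is converted into a message that can be transmitted over a network, and the operation message carries the resource access operation to the remote node. This lets the local node access the resources of the remote node directly, which reduces access latency, simplifies the access steps, and thereby improves processing efficiency.
FIG. 1C shows a schematic diagram of a specific architecture of a system for accessing remote resources. Optionally, the local node 100 may include a first memory 102 and a processor 103, and the remote node 110 includes a second memory 112. So that the first memory 102 and the processor 103 of the local node 100 can issue resource access operations to the second memory 112 of the remote node 110, a first operation apparatus 101 is provided on the local node 100 and a second operation apparatus 111 is provided on the remote node 110; the two apparatuses can communicate with each other. The first operation apparatus 101 transmits an operation message carrying a resource access operation to the second operation apparatus 111, so that the resource access operation issued by the first memory 102 or the processor 103 can be executed on the second memory 112.
The processor 103 is the control center of the local node 100. It connects the parts of the whole node through various interfaces and lines, and performs the functions of the computer system and/or processes data by running or executing software programs and/or modules stored in the first memory 102 and invoking data stored in the first memory 102. The processor 103 may consist of integrated circuits (ICs), for example a single packaged IC, or multiple connected packaged ICs with the same or different functions. In the embodiments of the present application, the processor 103 may be at least one central processing unit (CPU); the CPU may have a single computing core or multiple computing cores, and may be the processor of a physical machine or of a virtual machine.
The first memory 102 and the second memory 112 store program instructions, data, and the like. The first memory 102 and the second memory 112 in the present application may be volatile memory or non-volatile memory, or may include both. Non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. Volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). In addition, the first memory 102 and the second memory 112 may be any non-transitory machine-readable medium capable of storing program code, such as a magnetic disk, hard disk, USB flash drive, removable hard disk, optical disc, solid state disk (SSD), or other non-volatile memory. Note that the first memory 102 and the second memory 112 of the systems and methods described herein are intended to include, without being limited to, these and any other suitable types of memory.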
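With reference to the system architecture shown in FIG. 1C, FIG. 2 shows a schematic flowchart described from the perspective of the local node 100 issuing a resource access request to the remote node 110. As shown in FIG. 2, the method includes steps S201 to S207.
S201: The processor 103 or the first memory 102 sends a resource access request to the first operation apparatus 101. The resource access request contains a resource access operation and the address of the local node 100 to which the resource access operation points.
The resource access request may be issued either by the first memory 102 or by the processor 103 of the local node 100.
S202: The first operation apparatus 101 determines, from the address of the local node 100, the address of the remote node 110 corresponding to the address of the local node 100. Optionally, the first operation apparatus 101 determines the page directory address corresponding to the address of the local node 100; uses the address of the local node 100 to find, in the page directory pointed to by that page directory address, the page table address that the address of the local node 100 points to; uses the address of the local node 100 to look up the corresponding page table entry in the page table pointed to by that page table address; and determines the address of the remote node 110 from that page table entry. The resource access request further contains a remote node number; when there are multiple remote nodes 110, the remote node number determines which remote node the resource access request is issued to.
S203: The first operation apparatus 101 sends the operation message to the second operation apparatus 111 on the remote node 110.
Specifically, the first operation apparatus 101 generates, from the load/store/atomic operation issued by the local node 100, an operation message that can be transmitted over a network. As an example, take operation messages generated as RDMA-type network messages: an RDMA read message is generated from a load operation issued by the first memory 102, an RDMA write message from a store operation issued by the first memory 102, and an RDMA atomic message from an atomic operation issued by the first memory 102.
S204: The second operation apparatus 111 receives the operation message and sends the resource access operation to the second memory 112.
Specifically, the second operation apparatus 111 receives the operation message, recovers the resource access request issued by the first memory 102, and obtains the resource access operation and the address of the remote node 110; it then sends the resource access operation to the second memory 112. In this way, through the first operation apparatus 101 and the second operation apparatus 111, the local node 100 can issue resource access requests directly to the second memory 112 of the remote node 110, which enables mutual access between nodes.
S205: The second memory 112 executes the resource access operation contained in the operation message to obtain the operation result, and sends the operation result to the second operation apparatus 111.
After the second memory 112 on the remote node 110 receives the resource access operation and the address of the remote node 110, it executes the resource access operation and sends the operation result of the resource access request to the second operation apparatus 111.
As an example, if the resource access request is a load operation, the corresponding data is read from the address of the remote node 110 and returned; if it is a store operation, the write data carried in the resource access request is written to the address of the remote node 110.
S206: The second operation apparatus 111 sends the operation result message to the first operation apparatus 101.
Likewise, an operation result message for network transmission can be generated from the operation result. The specific way the message is generated is known to those skilled in the art and is not described here.
S207: The first operation apparatus 101 receives the operation result message sent by the remote node, determines the operation result from the operation result message, and returns the operation result to the first memory 102 or the processor 103 that issued the resource access request.
When the resource access request was issued by the first memory 102 of the local node 100, the operation result is returned to the first memory 102 of the local node 100; when it was issued by the processor 103 of the local node 100, the operation result is returned to the processor 103 of the local node 100.
In this embodiment, the first operation apparatus converts the resource access request issued by the local node into a message that can be transmitted over a network, and the second operation apparatus parses that message and delivers the resource access request directly to the memory of the remote node. The local node can therefore access the remote node directly, which reduces access latency and simplifies the access steps when accessing remote resources, thereby improving processing efficiency.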
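Steps S201 to S207 can be walked through with a small in-process Python simulation. This is only a sketch under stated assumptions: the network is replaced by a direct method call, the second memory is a dictionary, atomic operations are left out, and the class and field names (`OperationMessage`, `FirstOperationApparatus`, `SecondOperationApparatus`, `submit`, `handle`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Message types named in the description for load and store operations.
RDMA_TYPE = {"load": "RDMA read", "store": "RDMA write"}


@dataclass
class OperationMessage:
    kind: str                    # network message type
    op: str                      # resource access operation
    remote_addr: int             # address of the remote node
    data: Optional[int] = None   # write data, for store only


class SecondOperationApparatus:
    def __init__(self, memory):
        self.memory = memory     # stands in for the second memory 112

    def handle(self, msg):
        # S204/S205: unpack the message and execute the operation at the remote address.
        if msg.op == "load":
            result = self.memory.get(msg.remote_addr)
        elif msg.op == "store":
            self.memory[msg.remote_addr] = msg.data
            result = "ok"
        else:
            raise ValueError("operation not covered by this sketch: " + msg.op)
        # S206: wrap the operation result in an operation result message.
        return {"kind": msg.kind + " response", "result": result}


class FirstOperationApparatus:
    def __init__(self, address_map, remote):
        self.address_map = address_map   # local address -> remote address
        self.remote = remote

    def submit(self, op, local_addr, data=None):
        remote_addr = self.address_map[local_addr]                     # S202
        msg = OperationMessage(RDMA_TYPE[op], op, remote_addr, data)   # S203
        reply = self.remote.handle(msg)                                # S204 to S206
        return reply["result"]                                         # S207


remote = SecondOperationApparatus({0x9000: 7})
local = FirstOperationApparatus({0x1000: 0x9000}, remote)
print(local.submit("load", 0x1000))
local.submit("store", 0x1000, 42)
print(local.submit("load", 0x1000))
```

The caller only ever names a local address; the remote address and the message type are filled in by the first operation apparatus, which mirrors how the description keeps the processor and first memory unaware of the network.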
本申请还提供一种第一操作装置,图3A示出了第一操作装置300架构示意图,所述第一操作装置300包括:第一指令处理单元301、远程内存管理单元302以及第一报文传输单元303。
所述第一指令处理单元301用于接收所述本地节点100发送的访问资源请求,所述访问资源请求包含访问资源操作以及本地节点100的地址;将所述本地节点100的地址发送到远程内存管理单元302;接收所述第一报文传输单元303发送的操作结果报文,并根据所述操作结果报文确定操作结果,将所述操作结果发送至所述本地节点100。
所述远程内存管理单元302用于根据所述本地节点100的地址,确定与所述本地节点100的地址对应远端节点110的地址;
所述第一报文传输单元303用于将所述将操作报文发送到远端节点110;接收所述远端节点110发送的操作结果报文,根据操作结果报文确定操作结果。
其中,所述第一指令处理单元301既可以接收所述本地节点100中的第一存储器102下发的所述访问资源请求,所述第一指令处理单元301还可以接收所述本地节点100中的处理器103下发的访问资源请求。
在一些可能的实施方式中,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点110的地址所属的远端节点110;所述第一指令处理单元301还 用于将所述操作报文发送给所述远端节点号指示的所述远端节点110,当所述远端节点110存在多个时,根据所述远端节点号确定将所述访问资源请求下发到哪个远端节点110。
在一些可能的实施方式中,所述本地节点100的地址为所述本地节点100的虚拟地址,所述远端节点110的地址为所述远端节点110的虚拟地址,建立所述本地节点100的虚拟地址与所述本地节点100的物理地址的第一映射关系,以及建立所述本地节点100的物理地址与所述远端节点110的虚拟地址的第二映射关系;所述远程内存管理单元302具体用于:根据所述第一映射关系确定与所述本地节点100的虚拟地址对应的所述本地节点100的物理地址;根据所述第二映射关系确定与所述本地节点100的物理地址对应的所述远端节点110的虚拟地址。
在一些可能的实施方式中,所述本地节点100的虚拟地址包含:所述本地节点100的虚拟页号以及偏移量,所述远端节点110的虚拟地址包含所述远端节点110的虚拟页号以及偏移量,所述本地节点100的虚拟地址包含的偏移量与所述远端节点110的虚拟地址包含的偏移量相同;所述远程内存管理单元302,具体用于:根据所述第一映射关系确定与所述本地节点100的虚拟页号对应的所述本地节点100的物理页号;根据所述第二映射关系确定与所述本地节点100的物理页号对应的所述远端节点110的虚拟页号;根据所述本地节点100的地址中的偏移量以及所述远端节点110的虚拟页号,确定所述远端节点110的虚拟地址。
在一些可能的实施方式中,所述远程内存管理单元302还用于将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。所述远端节点还用于:将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
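In some possible implementations, the first operation apparatus 300 further includes a translation lookaside buffer unit 304, which caches frequently used page table entries from the first page table and the second page table. If every translation from an address of the local node 100 to an address of the remote node 110 had to access the first page table and the second page table in the remote memory management unit 302, it would take a great deal of time. Providing the translation lookaside buffer unit 304 as a fast cache of frequently used page table entries therefore speeds up address translation. As an example, the frequently used page table entries stored in the translation lookaside buffer unit 304 are a subset of the first page table or a subset of the second page table. Address translation can then look up these entries in the translation lookaside buffer unit 304 first, which speeds it up.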
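The idea of keeping a small subset of page table entries in a fast cache can be sketched in Python as follows. The description does not specify a replacement policy; least-recently-used eviction, the capacity of two entries, and the name `TranslationCache` are assumptions made for this illustration.

```python
from collections import OrderedDict


class TranslationCache:
    """Small cache holding a subset of a page table (LRU eviction is an assumption)."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table     # full table: the slow path
        self.capacity = capacity
        self.entries = OrderedDict()     # cached subset of the page table
        self.hits = 0
        self.misses = 0

    def lookup(self, page):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)       # mark as most recently used
            return self.entries[page]
        self.misses += 1
        value = self.page_table[page]            # fall back to the full page table
        self.entries[page] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used entry
        return value


cache = TranslationCache({1: 10, 2: 20, 3: 30})
for page in (1, 1, 2, 3, 1):
    cache.lookup(page)
print(cache.hits, cache.misses)
```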
本申请还提供一种第二操作装置310,图3B示出了第二操作装置310的架构示意图,所述操作装置310包括:第二报文传输单元311以及第二指令处理单元312。
所述第二报文传输单元311用于接收本地节点发送的操作报文,所述操作报文包含访问资源操作及远端节点的地址,将所述访问资源操作发送到第二指令处理单元312;接收所述操作结果,并将包含所述操作结果的操作结果报文发送给所述本地节点100;
所述第二指令处理单元312用于接收所述访问资源操作,执行所述访问资源操作以得到操作结果,得到操作结果,将所述操作结果发送到所述第二报文传输单元311;
在一些可能的实施方式中,所述远端节点的地址为所述远端节点的虚拟地址,所述第二操作装置310还包括内存管理单元313,所述内存管理单元313用于建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;
所述第二指令处理单元312用于根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
在一些可能的实施方式中,所述内存管理单元313用于将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
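Based on the system for accessing remote resources provided in the above embodiments, this embodiment provides an example applied to that system, taking as the scenario a local node issuing a load operation to a remote node. The system is based on a CHI bus architecture, and the issued resource access request is converted into an RDMA-type message. FIG. 4 is a schematic flowchart of the example:
The processor or memory on the local node sends a non-snooping read request, readnosnp (a load operation), to the first operation apparatus. The first operation apparatus determines the address of the local node and queries the page directory on the remote memory unit to obtain the queue pair number (QPN) and the address of the remote node that correspond to the readnosnp request. The first operation apparatus converts the readnosnp request into an RDMA read operation and sends it to the second operation apparatus. The second operation apparatus converts the RDMA read into a read-once operation, readonce, and sends the readonce to the memory of the remote node to fetch the data at the address of the remote node. It then receives the completion data, compdata, sent by the remote node, converts the compdata into an RDMA read response, and sends it to the first operation apparatus. On receiving the RDMA read response, the first operation apparatus extracts the compdata and returns it to the processor or memory on the local node that issued the readnosnp request.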
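The chain of conversions in this example can be traced with a short Python sketch. Messages are modelled as plain dictionaries and the network hop as a function call; the field names, the helper names `make_second_apparatus` and `first_apparatus`, and the table contents are all hypothetical and are not taken from the CHI or RDMA specifications.

```python
def make_second_apparatus(remote_memory):
    """Return a handler that plays the role of the second operation apparatus."""
    def second(rdma_read):
        # RDMA read -> readonce issued to the remote node's memory.
        read_once = {"op": "readonce", "addr": rdma_read["addr"]}
        # The remote memory answers with compdata.
        comp_data = {"op": "compdata", "data": remote_memory[read_once["addr"]]}
        # compdata -> RDMA read response sent back to the first operation apparatus.
        return {"op": "RDMA read response", "qpn": rdma_read["qpn"],
                "data": comp_data["data"]}
    return second


def first_apparatus(request, page_directory, second):
    """Play the role of the first operation apparatus for one readnosnp request."""
    if request["op"] != "readnosnp":
        raise ValueError("this sketch only covers readnosnp")
    # Look up the QPN and the remote address for the local address.
    qpn, remote_addr = page_directory[request["addr"]]
    # readnosnp -> RDMA read sent to the second operation apparatus.
    response = second({"op": "RDMA read", "qpn": qpn, "addr": remote_addr})
    # Extract compdata from the RDMA read response and hand it to the requester.
    return {"op": "compdata", "data": response["data"]}


second = make_second_apparatus({0x9000: b"payload"})
result = first_apparatus({"op": "readnosnp", "addr": 0x1000},
                         {0x1000: (5, 0x9000)}, second)
print(result)
```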
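The present application also provides a method for accessing remote resources, applied to the local node 100 and the remote node 110 described in the above embodiments. As shown in FIG. 5, the method includes the following steps:
S501: The local node 100 determines a resource access request, where the resource access request contains a resource access operation and the address of the local node to which the resource access operation points;
S502: The local node 100 determines the address of the remote node corresponding to the address of the local node;
S503: The local node 100 sends an operation message to the remote node, where the operation message contains the resource access operation and the address of the remote node;
S504: The remote node 110 receives the operation message sent by the local node, where the operation message contains the resource access operation and the address of the remote node;
S505: The remote node 110 executes the resource access operation to obtain an operation result, and sends an operation result message containing the operation result to the local node;
S506: The local node 100 receives the operation result message that is sent by the remote node and contains the operation result, where the operation result is obtained by the remote node executing the resource access operation contained in the operation message.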
在一些可能的实施方式中,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;
所述将操作报文发送给所述远端节点,具体包括:将所述操作报文发送给所述远端节点号指示的所述远端节点。
在一些可能的实施方式中,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;所述方法还包括:建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;所述确定与本地节点的地址对应的远端节点的地址,包括:根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理 地址;根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟地址。
在一些可能的实施方式中,所述本地节点的虚拟地址包含:所述本地节点的虚拟页号以及偏移量,所述远端节点的虚拟地址包含所述远端节点的虚拟页号以及偏移量,所述本地节点的虚拟地址包含的偏移量与所述远端节点的虚拟地址包含的偏移量相同;所述确定与本地节点的地址对应的远端节点的地址,具体包括:根据所述第一映射关系确定与所述本地节点的虚拟页号对应的所述本地节点的物理页号;根据所述第二映射关系确定与所述本地节点的物理页号对应的所述远端节点的虚拟页号;根据所述本地节点的地址中的偏移量以及所述远端节点的虚拟页号,确定所述远端节点的虚拟地址。
在一些可能的实施方式中,所述远端节点的地址为所述远端节点的虚拟地址,所述方法还包括:建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;所述执行所述资源操作时,具体包括:根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。在一些可能的实施方式中,所述方法还包括:将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
在一些可能的实施方式中,所述方法还包括:将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
本申请还提供一种本地节点,参阅图6所示,所述本地节点600包括:
操作生成单元601,用于确定访问资源请求,所述访问资源请求包含访问资源操作以及所述访问资源操作所指向的本地节点的地址;
地址确定单元602,用于确定与所述本地节点的地址对应的远端节点的地址;
操作发送单元603,用于将操作报文发送给所述远端节点,所述操作报文包含所述访问资源操作以及所述远端节点的地址;
结果确定单元604,用于接收所述远端节点发送的包含操作结果的操作结果报文,所述操作结果为所述远端节点执行所述操作报文所包含的所述访问资源操作所确定的。
在一些可能的实施方式中,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;所述操作发送单元603,具体用于:
将所述操作报文发送给所述远端节点号指示的所述远端节点。
在一些可能的实施方式中,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;
所述本地节点还包括本地映射确定单元605,所述本地映射确定单元605,用于建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;
所述地址确定单元602,还用于根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理地址;
根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟 地址。
在一些可能的实施方式,所述本地节点的虚拟地址包含:所述本地节点的虚拟页号以及偏移量,所述远端节点的虚拟地址包含所述远端节点的虚拟页号以及偏移量,所述本地节点的虚拟地址包含的偏移量与所述远端节点的虚拟地址包含的偏移量相同;
所述地址确定单元602,还用于根据所述第一映射关系确定与所述本地节点的虚拟页号对应的所述本地节点的物理页号;
根据所述第二映射关系确定与所述本地节点的物理页号对应的所述远端节点的虚拟页号;
根据所述本地节点的地址中的偏移量以及所述远端节点的虚拟页号,确定所述远端节点的虚拟地址。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:
处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述本地节点还包括:本地映射存储单元606,所述本地映射存储单元606,用于将所述第一映射关系存储在第一页表中,将所述第二映射关系存储在第二页表中,所述第一页表以及所述第二页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
本申请还提供一种远端节点,参阅图7所示,所述远端节点700包括:
操作接收单元701,用于接收本地节点发送的操作报文,所述操作报文包含访问资源操作及远端节点的地址;
执行发送单元702,用于执行所述访问资源操作以得到操作结果,并将包含所述操作结果的操作结果报文发送给所述本地节点。
在一些可能的实施方式中,所述远端节点的地址为所述远端节点的虚拟地址,所述远端节点还包括远端映射确定单元703,所述远端映射确定单元703,用于建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;
所述执行发送单元702,用于根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;
根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
在一些可能的实施方式中,所述访问资源操作包括以下至少一种:
处理器读操作load、处理器写操作store、处理器原子操作atomic以及直接数据存取DMA访问。
在一些可能的实施方式中,所述远端节点还包括:远端映射存储单元704,所述远端映射存储单元704,用于将所述第三映射关系存储在第三页表中,所述第三页表的存储格式包括以下任意一种:页表条目PTE以及页表指针PTP。
基于上述内容和相同构思,本申请还提供一种计算机程序产品,该计算机程序产品包括:计算机程序代码,当该计算机程序代码在计算机上运行时,使得该计算机执行图5所示实施例中的访问远程资源的方法。
基于上述内容和相同构思,本申请还提供一种计算机可读介质,该计算机可读介质存储有程序代码,当该程序代码在计算机上运行时,使得该计算机执行图5所示实施例中的访问远程资源的方法。
基于上述内容和相同构思,本申请提供一种芯片,包括至少一个处理器和接口;所述接口,用于为所述至少一个处理器提供程序指令或者数据;所述至少一个处理器用于执行所述程序行指令,实现图5所示实施例中的访问远程资源的方法。
本申请实施例提供了一种访问远端资源的系统及方法,用于解决目前不能通过本地节点直接下发的访问资源请求来直接访问远程的内存、硬盘以及其他设备中的资源的问题。利用本申请提供的访问远端资源的系统及方法,可以将本地节点下发的访问资源请求,转换为可进行网络传输的报文,并由远端节点解析网络传输的报文后,将访问资源请求直接发送到远端节点上,从而能实现本地节点直接向远端节点进行访问,在访问远端资源时,能够降低访问延时、简化访问步骤,从而提升处理效率。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (28)

  1. 一种访问远端资源的系统,其特征在于,所述系统包括本地节点以及远端节点;
    所述本地节点用于:确定访问资源请求,所述访问资源请求包含访问资源操作以及所述访问资源操作指向的所述本地节点的地址;确定与所述本地节点的地址对应的所述远端节点的地址;将操作报文发送给所述远端节点,所述操作报文包含所述访问资源操作和所述远端节点的地址;接收所述远端节点发送的操作结果报文,并根据所述操作结果报文确定操作结果;
    所述远端节点用于:接收所述操作报文;执行所述操作报文所包含的所述访问资源操作以得到所述操作结果,并将所述操作结果报文发送给所述本地节点,所述操作结果报文包含所述操作结果。
  2. 根据权利要求1所述的系统,其特征在于,所述访问资源请求中还包含:远端节点号,所述远端节点号用于指示所述远端节点的地址所属的远端节点;
    所述本地节点,在将操作报文发送给所述远端节点时,具体用于:
    将所述操作报文发送给所述远端节点号指示的所述远端节点。
  3. 根据权利要求1或2所述的系统,其特征在于,所述本地节点的地址为所述本地节点的虚拟地址,所述远端节点的地址为所述远端节点的虚拟地址;
    所述本地节点还用于:
    建立所述本地节点的虚拟地址与所述本地节点的物理地址的第一映射关系,以及建立所述本地节点的物理地址与所述远端节点的虚拟地址的第二映射关系;
    所述本地节点,在确定与所述本地节点的地址对应的所述远端节点的地址时,具体用于:
    根据所述第一映射关系确定与所述本地节点的虚拟地址对应的所述本地节点的物理地址;
    根据所述第二映射关系确定与所述本地节点的物理地址对应的所述远端节点的虚拟地址。
  4. 根据权利要求3所述的系统,其特征在于,所述远端节点还用于:
    建立所述远端节点的虚拟地址与所述远端节点的物理地址的第三映射关系;
    所述远端节点,在执行所述操作报文所包含的所述资源操作时,具体用于:
    根据所述第三映射关系确定与所述远端节点的虚拟地址对应的所述远端节点的物理地址;
    根据所述远端节点的物理地址,执行所述操作报文所包含的所述访问资源操作。
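  5. The system according to claim 3, wherein the virtual address of the local node comprises a virtual page number of the local node and an offset, the virtual address of the remote node comprises a virtual page number of the remote node and an offset, and the offset comprised in the virtual address of the local node is the same as the offset comprised in the virtual address of the remote node; and
    when determining the address of the remote node corresponding to the address of the local node, the local node is specifically configured to:
    determine, based on the first mapping relationship, a physical page number of the local node corresponding to the virtual page number of the local node;
    determine, based on the second mapping relationship, the virtual page number of the remote node corresponding to the physical page number of the local node; and
    determine the virtual address of the remote node based on the offset in the address of the local node and the virtual page number of the remote node.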
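  6. The system according to any one of claims 1 to 5, wherein the resource access operation comprises at least one of the following:
    a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and direct memory access (DMA) access.
  7. The system according to claim 3, wherein the local node is further configured to: store the first mapping relationship in a first page table, and store the second mapping relationship in a second page table, wherein a storage format of the first page table and the second page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).
  8. The system according to claim 4, wherein the remote node is further configured to: store the third mapping relationship in a third page table, wherein a storage format of the third page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).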
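  9. A method for accessing a remote resource, wherein the method comprises:
    determining a resource access request, wherein the resource access request comprises a resource access operation and an address of a local node to which the resource access operation points;
    determining an address of a remote node corresponding to the address of the local node;
    sending an operation packet to the remote node, wherein the operation packet comprises the resource access operation and the address of the remote node; and
    receiving an operation result packet that is sent by the remote node and that comprises an operation result, wherein the operation result is determined by the remote node by performing the resource access operation comprised in the operation packet.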
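  10. The method according to claim 9, wherein the resource access request further comprises a remote node number, and the remote node number indicates the remote node to which the address of the remote node belongs; and
    the sending an operation packet to the remote node specifically comprises:
    sending the operation packet to the remote node indicated by the remote node number.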
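  11. The method according to claim 9 or 10, wherein the address of the local node is a virtual address of the local node, and the address of the remote node is a virtual address of the remote node;
    the method further comprises: establishing a first mapping relationship between the virtual address of the local node and a physical address of the local node, and establishing a second mapping relationship between the physical address of the local node and the virtual address of the remote node; and
    the determining an address of a remote node corresponding to the address of the local node comprises:
    determining, based on the first mapping relationship, the physical address of the local node corresponding to the virtual address of the local node; and
    determining, based on the second mapping relationship, the virtual address of the remote node corresponding to the physical address of the local node.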
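  12. The method according to claim 11, wherein the virtual address of the local node comprises a virtual page number of the local node and an offset, the virtual address of the remote node comprises a virtual page number of the remote node and an offset, and the offset comprised in the virtual address of the local node is the same as the offset comprised in the virtual address of the remote node; and
    the determining an address of a remote node corresponding to the address of the local node specifically comprises:
    determining, based on the first mapping relationship, a physical page number of the local node corresponding to the virtual page number of the local node;
    determining, based on the second mapping relationship, the virtual page number of the remote node corresponding to the physical page number of the local node; and
    determining the virtual address of the remote node based on the offset in the address of the local node and the virtual page number of the remote node.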
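  13. The method according to any one of claims 9 to 12, wherein the resource access operation comprises at least one of the following:
    a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and direct memory access (DMA) access.
  14. The method according to claim 11, wherein the method further comprises: storing the first mapping relationship in a first page table, and storing the second mapping relationship in a second page table, wherein a storage format of the first page table and the second page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).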
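  15. A method for accessing a remote resource, comprising:
    receiving an operation packet sent by a local node, wherein the operation packet comprises a resource access operation and an address of a remote node; and
    performing the resource access operation to obtain an operation result, and sending an operation result packet comprising the operation result to the local node.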
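  16. The method according to claim 15, wherein the address of the remote node is a virtual address of the remote node, and the method further comprises: establishing a third mapping relationship between the virtual address of the remote node and a physical address of the remote node; and
    the performing the resource access operation specifically comprises:
    determining, based on the third mapping relationship, the physical address of the remote node corresponding to the virtual address of the remote node; and
    performing, based on the physical address of the remote node, the resource access operation comprised in the operation packet.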
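  17. The method according to claim 15 or 16, wherein the resource access operation comprises at least one of the following:
    a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and direct memory access (DMA) access.
  18. The method according to claim 16, wherein the method further comprises:
    storing the third mapping relationship in a third page table, wherein a storage format of the third page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).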
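  19. A local node, wherein the local node comprises:
    an operation generation unit, configured to determine a resource access request, wherein the resource access request comprises a resource access operation and an address of the local node to which the resource access operation points;
    an address determining unit, configured to determine an address of a remote node corresponding to the address of the local node;
    an operation sending unit, configured to send an operation packet to the remote node, wherein the operation packet comprises the resource access operation and the address of the remote node; and
    a result determining unit, configured to receive an operation result packet that is sent by the remote node and that comprises an operation result, wherein the operation result is determined by the remote node by performing the resource access operation comprised in the operation packet.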
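  20. The local node according to claim 19, wherein the resource access request further comprises a remote node number, and the remote node number indicates the remote node to which the address of the remote node belongs; and
    the operation sending unit is specifically configured to:
    send the operation packet to the remote node indicated by the remote node number.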
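  21. The local node according to claim 19 or 20, wherein the address of the local node is a virtual address of the local node, and the address of the remote node is a virtual address of the remote node;
    the local node further comprises a local mapping determining unit, and the local mapping determining unit is configured to establish a first mapping relationship between the virtual address of the local node and a physical address of the local node, and establish a second mapping relationship between the physical address of the local node and the virtual address of the remote node; and
    the address determining unit is further configured to: determine, based on the first mapping relationship, the physical address of the local node corresponding to the virtual address of the local node; and
    determine, based on the second mapping relationship, the virtual address of the remote node corresponding to the physical address of the local node.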
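  22. The local node according to claim 21, wherein the virtual address of the local node comprises a virtual page number of the local node and an offset, the virtual address of the remote node comprises a virtual page number of the remote node and an offset, and the offset comprised in the virtual address of the local node is the same as the offset comprised in the virtual address of the remote node; and
    the address determining unit is further configured to: determine, based on the first mapping relationship, a physical page number of the local node corresponding to the virtual page number of the local node;
    determine, based on the second mapping relationship, the virtual page number of the remote node corresponding to the physical page number of the local node; and
    determine the virtual address of the remote node based on the offset in the address of the local node and the virtual page number of the remote node.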
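  23. The local node according to any one of claims 19 to 22, wherein the resource access operation comprises at least one of the following:
    a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and direct memory access (DMA) access.
  24. The local node according to claim 21, wherein the local node further comprises a local mapping storage unit, and the local mapping storage unit is configured to store the first mapping relationship in a first page table and store the second mapping relationship in a second page table, wherein a storage format of the first page table and the second page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).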
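  25. A remote node, wherein the remote node comprises:
    an operation receiving unit, configured to receive an operation packet sent by a local node, wherein the operation packet comprises a resource access operation and an address of the remote node; and
    an execution and sending unit, configured to perform the resource access operation to obtain an operation result, and send an operation result packet comprising the operation result to the local node.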
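  26. The remote node according to claim 25, wherein the address of the remote node is a virtual address of the remote node, the remote node further comprises a remote mapping determining unit, and the remote mapping determining unit is configured to establish a third mapping relationship between the virtual address of the remote node and a physical address of the remote node; and
    the execution and sending unit is configured to: determine, based on the third mapping relationship, the physical address of the remote node corresponding to the virtual address of the remote node; and
    perform, based on the physical address of the remote node, the resource access operation comprised in the operation packet.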
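  27. The remote node according to claim 25 or 26, wherein the resource access operation comprises at least one of the following:
    a processor read operation (load), a processor write operation (store), a processor atomic operation (atomic), and direct memory access (DMA) access.
  28. The remote node according to claim 26, wherein the remote node further comprises a remote mapping storage unit, and the remote mapping storage unit is configured to store the third mapping relationship in a third page table, wherein a storage format of the third page table comprises any one of the following: a page table entry (PTE) and a page table pointer (PTP).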
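
The address translation recited in claims 3 to 5 (and mirrored in the method and node claims) can be illustrated with a minimal Python sketch. The claims specify only the three mapping relationships and the shared offset; the 4 KiB page size, the dictionary-based page tables, the function names, and the page-granular handling of the third mapping on the remote side are all assumptions made for illustration.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages; the claims fix no page size
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def split(addr):
    """Split an address into (page number, offset)."""
    return addr >> PAGE_SHIFT, addr & PAGE_MASK


def join(page, offset):
    """Combine a page number and an offset into an address."""
    return (page << PAGE_SHIFT) | offset


def local_to_remote_virtual(local_va, first_map, second_map):
    """Local node side: local virtual address -> remote virtual address."""
    local_vpn, offset = split(local_va)
    local_ppn = first_map[local_vpn]     # first mapping: local VPN -> local PPN
    remote_vpn = second_map[local_ppn]   # second mapping: local PPN -> remote VPN
    return join(remote_vpn, offset)      # the offset is carried over unchanged


def remote_virtual_to_physical(remote_va, third_map):
    """Remote node side: remote virtual address -> remote physical address.

    Treating the third mapping as page-granular is an assumption of this sketch.
    """
    remote_vpn, offset = split(remote_va)
    return join(third_map[remote_vpn], offset)
```

With `first_map = {0x10: 0x20}`, `second_map = {0x20: 0x30}`, and `third_map = {0x30: 0x40}`, the local virtual address `0x10ABC` becomes the remote virtual address `0x30ABC` on the local node and the remote physical address `0x40ABC` on the remote node; only the page number changes at each step.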
PCT/CN2021/076161 2021-02-09 2021-02-09 System and method for accessing remote resource WO2022170452A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21925135.2A EP4276638A4 (en) 2021-02-09 2021-02-09 SYSTEM AND METHOD FOR ACCESSING A REMOTE RESOURCE
PCT/CN2021/076161 WO2022170452A1 (zh) 2021-02-09 2021-02-09 System and method for accessing remote resource
CN202180091402.5A CN116745754A (zh) 2021-02-09 2021-02-09 System and method for accessing remote resource
US18/366,889 US20230388371A1 (en) 2021-02-09 2023-08-08 System and method for accessing remote resource

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/076161 WO2022170452A1 (zh) 2021-02-09 2021-02-09 System and method for accessing remote resource

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/366,889 Continuation US20230388371A1 (en) 2021-02-09 2023-08-08 System and method for accessing remote resource

Publications (1)

Publication Number Publication Date
WO2022170452A1 true WO2022170452A1 (zh) 2022-08-18

Family

ID=82838108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076161 WO2022170452A1 (zh) 2021-02-09 2021-02-09 System and method for accessing remote resource

Country Status (4)

Country Link
US (1) US20230388371A1 (zh)
EP (1) EP4276638A4 (zh)
CN (1) CN116745754A (zh)
WO (1) WO2022170452A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230251780A1 (en) * 2022-01-25 2023-08-10 Samsung Electronics Co., Ltd. Appratus and method with data processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105450588A (zh) * 2014-07-31 2016-03-30 华为技术有限公司 一种基于rdma的数据传输方法及rdma网卡
US20160378713A1 (en) * 2015-06-24 2016-12-29 Oracle International Corporation System and method for persistence of application data using replication over remote direct memory access
US20170255590A1 (en) * 2016-03-07 2017-09-07 Mellanox Technologies, Ltd. Atomic Access to Object Pool over RDMA Transport Network
US20180267741A1 (en) * 2017-03-16 2018-09-20 Arm Limited Memory access monitoring
WO2020037201A1 (en) * 2018-08-17 2020-02-20 Oracle International Corporation Remote direct memory operations (rdmos) for transactional processing systems

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5887134A (en) * 1997-06-30 1999-03-23 Sun Microsystems System and method for preserving message order while employing both programmed I/O and DMA operations
US20090089537A1 (en) * 2007-09-28 2009-04-02 Sun Microsystems, Inc. Apparatus and method for memory address translation across multiple nodes
US10769076B2 (en) * 2018-11-21 2020-09-08 Nvidia Corporation Distributed address translation in a multi-node interconnect fabric

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4276638A4 *

Also Published As

Publication number Publication date
EP4276638A4 (en) 2024-02-21
US20230388371A1 (en) 2023-11-30
CN116745754A (zh) 2023-09-12
EP4276638A1 (en) 2023-11-15

Similar Documents

Publication Publication Date Title
US10216419B2 (en) Direct interface between graphics processing unit and data storage unit
US8866831B2 (en) Shared virtual memory between a host and discrete graphics device in a computing system
US8250254B2 (en) Offloading input/output (I/O) virtualization operations to a processor
US10210105B2 (en) Inline PCI-IOV adapter
US9280290B2 (en) Method for steering DMA write requests to cache memory
US20150261434A1 (en) Storage system and server
US20150261720A1 (en) Accessing remote storage devices using a local bus protocol
US8862801B2 (en) Handling atomic operations for a non-coherent device
RU2491616C2 (ru) Device, method and system for managing matrices
US20230195633A1 (en) Memory management device
US10866755B2 (en) Two stage command buffers to overlap IOMMU map and second tier memory reads
CN113760560A (zh) Inter-process communication method and inter-process communication apparatus
US11055220B2 (en) Hybrid memory systems with cache management
EP4123649A1 (en) Memory module, system including the same, and operation method of memory module
US20230388371A1 (en) System and method for accessing remote resource
US11526441B2 (en) Hybrid memory systems with cache management
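CN115269457A (zh) Method and apparatus for enabling a cache to store process-specific information within devices that support address translation services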
WO2020247240A1 (en) Extended memory interface
CN114080587A (zh) Access by an input-output memory management unit to guest operating system buffers and logs
US11579882B2 (en) Extended memory operations
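WO2022133656A1 (zh) Data processing apparatus and method, and related device
CN117389685B (zh) Dirty-marking method for virtual machine live migration and apparatus thereof, back-end device, and chip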
US20230350812A1 (en) Architectural interface for address translation cache (atc) in xpu to submit command directly to guest software
Khalil et al. FPGA-Accelerated Non-Volatile Memory Access
JP2008123333A (ja) Semiconductor integrated circuit device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21925135; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 202180091402.5; Country of ref document: CN
ENP Entry into the national phase. Ref document number: 2021925135; Country of ref document: EP; Effective date: 20230812
NENP Non-entry into the national phase. Ref country code: DE