WO2024045643A1 - Data access device, method and system, data processing unit and network interface card - Google Patents

Data access device, method and system, data processing unit and network interface card

Info

Publication number
WO2024045643A1
WO2024045643A1 (PCT/CN2023/089442, CN2023089442W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
data access
storage server
address
access device
Application number
PCT/CN2023/089442
Other languages
English (en)
Chinese (zh)
Inventor
钟刊
崔文林
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024045643A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; relocation
    • G06F 12/08 Addressing or allocation; relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0808 Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • G06F 12/0815 Cache consistency protocols
    • G06F 12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F 12/0877 Cache access modes
    • G06F 12/0882 Page mode
    • G06F 12/10 Address translation
    • G06F 12/1009 Address translation using page tables, e.g. page table structures
    • G06F 12/1072 Decentralised address translation, e.g. in distributed shared memory systems
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/52 Program synchronisation; mutual exclusion, e.g. by means of semaphores

Definitions

  • This application relates to the field of data storage, and in particular to a data access device, method, system, data processing unit and network card.
  • Shuffle describes the process of shuffling data and then aggregating it at different nodes. Take, as an example, applications in which a distributed system runs storage-intensive tasks.
  • Mapper: map node
  • Reducer: reduce node
  • RPC: remote procedure call
  • Data transmission based on RPC requests requires cooperation between local nodes (such as reduce nodes) and remote nodes (such as map nodes), which consumes a large amount of network resources and memory; and if the local and remote nodes perform too much disk IO, data access efficiency is degraded.
  • This application provides a data access device, method, system, data processing unit and network card, which address the problem of low data access efficiency between local nodes and remote nodes.
  • In a first aspect, a data access device is provided, including a processor, a memory, and a data processing unit (DPU). The DPU may be a pluggable accelerator card, such as a DPU card; in that case, the data access device may include a server (or host) containing the aforementioned processor and memory, together with the DPU card.
  • The processor is configured to write data into the memory and send a data synchronization request to the DPU; the data synchronization request is used to instruct that the data be stored in the storage server.
  • The DPU is configured to send a lock request and a data write command to the storage server based on the data synchronization request. The data write command instructs the storage server to write the data into the storage space corresponding to a first address, and the lock request instructs the storage server to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is being executed.
  • In other words, while the data access device writes data to the storage server, the storage space to be written (such as the storage space corresponding to the first address) cannot be accessed by other data access devices. For a given section of storage space, the data access devices that write data to it are mutually exclusive (multi-write mutual exclusion): only one data access device can access the storage space at a time. This prevents multiple data access devices from writing to the same section of storage space and subsequently reading back different data, i.e., it avoids data inconsistency across multiple data access devices.
  • In addition, the data access device can bypass the controller (or processor) included in the storage server: it does not need to wait for that controller to interact with the disk backing the storage space before writing data from memory to the storage server. This shortens the IO path for writing data into the storage server and increases the data access efficiency between the data access device and the storage server.
  • The DPU is also configured to send an unlock request to the storage server after the data has been successfully written to the first address. The unlock request instructs the storage server to set the storage space corresponding to the first address so that it can again be accessed by other data access devices.
  • In this way, the storage server can unlock the access state of the data or file, so that the updated data or file can be accessed or modified by other data access devices. This avoids the situation where data or files in the storage server can be used by only a single data access device, which would reduce the data access efficiency of the other devices. The full lock/write/unlock sequence is sketched below.
  • The DPU is also configured to send an invalidation indication message to other data access devices. The invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address.
  • If the data access device did not send such a message, then any other data access device that has cached or mapped the old data stored in the storage space corresponding to the first address would execute its tasks based on that old data, causing task errors. In contrast, in this embodiment, after the data access device writes data into the storage space corresponding to the first address in the storage server, it sends an invalidation indication message to the other data access devices, which then invalidate their cached old data at the first address.
  • When another data access device subsequently needs the new data written at the first address, it re-reads the data in the storage space at the first address from the storage server. This avoids the problem of multiple data access devices reading inconsistent data from the same storage space of the storage server, which would lead to data access errors or reduced data access efficiency.
  • The DPU is also configured to invalidate the data stored in the memory at a second address indicated by invalidation information sent by other data access devices.
  • In one case, the second address is different from the aforementioned first address. When the DPU receives the invalidation information sent by another data access device, it invalidates the old data stored at the second address, preventing the data access device from performing tasks with old data that is inconsistent with the new data stored in the storage space at the second address in the storage server. This avoids both access errors caused by multiple data access devices caching inconsistent data for the same storage space, and the reduced data access efficiency that would result if the storage server had to synchronize the data at the second address across multiple data access devices after each interaction.
  • In another case, the second address is the same as the aforementioned first address. It should be understood that, for a section of storage space in the storage server, multiple data access devices can modify the data in that section at different times. This avoids the problem of a section of storage space being modifiable by only a single data access device during data access, and improves the performance of the data access services the storage server can provide.
  • The processor is also configured to send a read request to the DPU when the memory misses the data. The DPU is then configured to read the data stored at the first address from the storage server, based on the first address carried in the read request. It should be understood that, for data written to the storage server by one data access device, other data access devices can read the newly written data (new data), and the writing device can read it as well, making the new data consistent across multiple data access devices and improving the data access performance of the data access devices with respect to the storage server.
  • In a second aspect, a data access method is provided. The data access method is executed by a data access system that includes a data access device and a storage server, where the data access device includes a processor, a memory, and a DPU.
  • The data access method provided by this embodiment includes: the processor writes data into the memory and sends a data synchronization request to the DPU, the data synchronization request being used to instruct that the data be stored in the storage server; and the DPU sends a lock request and a data write command to the storage server based on the data synchronization request. The data write command instructs the storage server to write the data into the storage space corresponding to a first address, and the lock request instructs the storage server to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is being executed.
  • The data access method provided in this embodiment further includes: after the data is successfully written to the first address, the DPU sends an unlock request to the storage server. The unlock request instructs the storage server to set the storage space corresponding to the first address so that it can again be accessed by other data access devices.
  • The data access method provided in this embodiment further includes: the DPU sends an invalidation indication message to other data access devices. The invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address.
  • The data access method provided by this embodiment further includes: the DPU invalidates the data stored in the memory at the second address indicated by invalidation information sent by other data access devices.
  • The data access method provided in this embodiment further includes: when the memory misses the data, the processor sends a read request to the DPU, and the DPU reads the data stored at the first address from the storage server based on the first address carried in the read request.
  • In a third aspect, a data access system is provided, including a storage server and the data access device described in any implementation of the first aspect. The storage server is configured to store the data to be synchronized by the data access device, and to set the storage space corresponding to the first address into which the data will be written so that it cannot be accessed by other data access devices.
  • In a fourth aspect, a DPU is provided, including a control circuit and an interface circuit. The interface circuit is used to receive data from devices other than the DPU and transmit it to the control circuit, or to send data from the control circuit to devices other than the DPU. The control circuit executes code instructions through logic circuits and, together with the interface circuit, performs the functions of the DPU in any possible implementation of the second aspect.
  • In a fifth aspect, a network card is provided, including the DPU provided in the fourth aspect and a communication interface. The communication interface is used to send data issued by the DPU, or to receive data sent to the DPU by other devices.
  • In a sixth aspect, a computer-readable storage medium is provided. The storage medium stores computer programs or instructions which, when executed, perform the operational steps of the method in any implementation of the second aspect.
  • In a seventh aspect, a computer program product is provided. When the computer program product runs on a computer, it causes the computer to perform the operational steps of the method described in any implementation of the second aspect. Here, the computer may refer to a data access device, a host, a DPU or a DPU card, etc.
  • Figure 1 is a schematic structural diagram of a data access system provided by this application.
  • Figure 2 is a first schematic diagram of a storage mapping provided by this application.
  • Figure 3 is a second schematic diagram of a storage mapping provided by this application.
  • Figure 4 is a first schematic flowchart of the data access method provided by this application.
  • Figure 5 is a second schematic flowchart of the data access method provided by this application.
  • Figure 6 is a schematic diagram of data invalidation provided by this application.
  • This application provides a data access method. When a data access device writes data to a storage server, the storage space in the storage server into which the data is written cannot be accessed by other data access devices. That is to say, for a section of storage space of the storage server, the data access devices that write data to it are mutually exclusive (multi-write mutual exclusion): only one data access device can access that section of storage space at a time. This prevents multiple data access devices from writing data into one section of storage space in the storage server and then reading different data from it, i.e., it avoids data inconsistency across multiple data access devices.
  • Furthermore, the data access device can bypass the controller (or processor) included in the storage server: it does not need to wait for the controller in the storage server to interact with the disk backing the storage space before writing data from memory to the storage server. This shortens the IO path for writing data into the storage server and increases the data access efficiency between the data access device and the storage server.
  • Figure 1 is a schematic structural diagram of a data access system provided by this application. The data access system 100 includes a storage system 110 and multiple data access devices that access the storage system 110.
  • One or more servers in the storage system 110 may also be connected to a computing device. The computing device may be used to provide the server with more computing resources, or computing functions on the server may be offloaded to this external acceleration device in order to improve the data access performance of the storage system 110.
  • A data access device can access a server in the storage system 110 over a network to access data; the communication function of the network can be implemented by a switch or a router. A data access device can also communicate with a server through a wired connection, such as a peripheral component interconnect express (PCIe) bus, compute express link (CXL), the universal serial bus (USB) protocol, or buses of other protocols.
  • PCIe: peripheral component interconnect express
  • CXL: compute express link
  • USB: universal serial bus
  • The data access device includes a host and a computing device. For example, the data access device 1 includes a host 1 and a computing device 131, and the data access device 2 includes a host 2 and a computing device 132.
  • In Figure 1, the computing device is represented by a DPU card, but this should not be understood as limiting the application. The computing device may include one or more processing units, and a processing unit may be not only a DPU but also a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • CPU: central processing unit
  • DSP: digital signal processor
  • ASIC: application-specific integrated circuit
  • FPGA: field-programmable gate array
  • A general-purpose processor can be a microprocessor or any conventional processor.
  • The computing device may also be a dedicated processor for artificial intelligence (AI), such as a neural processing unit (NPU) or a graphics processing unit (GPU).
  • AI: artificial intelligence
  • NPU: neural processing unit
  • GPU: graphics processing unit
  • One or more processing units included in the computing device can be packaged as a card, such as the DPU card in Figure 1. The DPU card can be connected to the host through a PCIe interface, CXL interface, unified bus (UB) interface, NVLink interface or other communication interface, and the host can offload some data processing functions to the DPU card.
  • UB: unified bus
  • A host is a computer running an application program. For example, the computer running the application program is a physical computing device, which may be a server or a terminal.
  • A terminal can also be called terminal equipment, user equipment (UE), a mobile station (MS), a mobile terminal (MT), etc. The terminal can be a mobile phone, a tablet computer, a laptop computer, a desktop computer, a personal communication service (PCS) phone, a wireless terminal in a smart city, a wireless terminal in a smart home, etc.
  • PCS: personal communication service
  • The embodiments of this application do not limit the specific technology and specific device form used by the host. In addition, the host shown in Figure 1 may also refer to a client.
  • The storage system provided by the embodiments of the present application may be a distributed storage system or a centralized storage system.
  • The storage system 110 shown in Figure 1 may be a distributed storage system. The distributed storage system provided by this embodiment includes a storage cluster that integrates computing and storage. The storage cluster includes one or more servers (such as server 110A and server 110B shown in Figure 1), and the servers can communicate with each other. The servers included in the storage system 110 are also called storage servers.
  • The server 110A shown in Figure 1 is used here for explanation. The server 110A is a device that has both computing and storage capabilities, such as a server or a desktop computer. For example, an advanced RISC machines (ARM) server or an X86 server can be used as the server 110A.
  • In hardware, the server 110A at least includes a processor 112, a memory 113, a network card 114 and a hard disk 105, which are connected through a bus. The processor 112 and the memory 113 are used to provide computing resources.
  • The processor 112 is a CPU used to process data access requests (such as data write requests or data read requests) from outside the server 110A (from an application server or another server), and also to process requests generated inside the server 110A. For example, when the processor 112 receives write requests (such as log write requests), the data carried in these requests is temporarily stored in the memory 113; the processor 112 then sends the data stored in the memory 113 to the hard disk 105 for persistent storage.
  • The processor 112 is also used for data calculation or processing. Only one processor 112 is shown in Figure 1; in actual applications there are often multiple processors 112, and one processor 112 has one or more CPU cores. This embodiment does not limit the number of CPUs or CPU cores.
  • Memory 113 refers to internal memory that exchanges data directly with the processor. It can read and write data at any time and very quickly, and it serves as temporary data storage for the operating system or other running programs. Memory includes at least two types: for example, it can be random access memory, such as dynamic random access memory (DRAM) or storage class memory (SCM).
  • DRAM is a semiconductor memory which, like most random access memory (RAM), is a volatile memory device. SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory: storage-class memory provides faster read and write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost.
  • DRAM and SCM are only exemplary illustrations in this embodiment; the memory may also include other random access memory, such as static random access memory (SRAM). The memory 113 can also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state drive (SSD).
  • DIMM: dual in-line memory module
  • In practice, the storage server 110A may be configured with multiple memories 113 and with memories 113 of different types. This embodiment does not limit the number or type of memories 113.
  • The memory 113 can be configured to have a power-protection function, which means that the data stored in the memory 113 is not lost when the system is powered off and then powered on again. Memory with a power-protection function is called non-volatile memory.
  • The hard disk 105 is used to provide storage resources, for example to store data and information such as the data access status of each data access device (or host). The data may be stored in the hard disk 105 or the memory 113 in the form of an object or a file.
  • The hard disk can be a magnetic disk or another type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive. The hard disk 105 may also be a solid state drive based on the non-volatile memory express (NVMe) interface specification, such as an NVMe SSD.
  • NVMe: non-volatile memory express
  • The network card 114 in the server 110A is used to communicate with the host or with other application servers (such as the server 110B shown in Figure 1).
  • Optionally, functions of the processor 112 may be offloaded to the network card 114. In other words, the processor 112 then does not perform service data processing; instead, the network card 114 completes the processing of service data, address translation and other computing functions.
  • The network card 114 may also have a persistent storage medium, such as persistent memory (PM), non-volatile random access memory (NVRAM), or phase change memory (PCM). In this case, the network card may include a CPU and a memory: the CPU is used to perform operations such as address translation and reading and writing logs, and the memory is used to temporarily store data to be written to the hard disk 105, or data read from the hard disk 105 that is to be sent to the controller.
  • The network card may also be a programmable electronic component, such as a data processing unit (DPU). The DPU has the generality and programmability of a CPU, but is more specialized and can operate efficiently on network packets, storage requests or analysis requests. DPUs are distinguished from CPUs by a greater degree of parallelism (the need to handle a large number of requests). Optionally, the DPU here can also be replaced with a processing chip such as a GPU or NPU.
  • The network card 114 can access any hard disk 105 in the server where it is located, so it is convenient to expand hard disks when storage space is insufficient.
  • FIG. 1 is only an example provided by the embodiment of this application.
  • the storage system 110 may also include more servers, memories, hard disks and other devices. This application does not limit the number and specific forms of servers, memories and hard disks.
  • Optionally, the storage system provided by the embodiments of the present application can also be a storage cluster in which computing and storage are separated. The storage cluster includes a computing device cluster and a storage device cluster. The computing device cluster includes one or more computing devices, which can communicate with each other.
  • A computing device may be a server, a desktop computer, a controller of a storage array, or the like. In hardware, a computing device can include a processor, memory, a network card, etc. The processor is a CPU used to process data access requests from outside the computing device, or requests generated within the computing device. For example, when the processor receives write requests sent by a user, the data carried in these write requests is temporarily stored in the memory; the processor then sends the data stored in the memory to a storage device for persistent storage.
  • The processor is also used for data calculation or processing, such as metadata management, data deduplication, data compression, storage-space virtualization, and address translation.
  • The storage system provided by the embodiments of the present application may also be a centralized storage system. The characteristic of a centralized storage system is that it has a unified entry point through which all data from external devices must pass. This entry point is the engine of the centralized storage system; the engine is the core component of the centralized storage system, and many advanced functions of the storage system are implemented in it.
  • The engine also includes a front-end interface and a back-end interface. The front-end interface is used to communicate with the computing devices in the centralized storage system, providing storage services for them. The back-end interface is used to communicate with hard disks to expand the capacity of the centralized storage system: through the back-end interface the engine can connect more hard disks, thus forming a very large storage resource pool (referred to as a memory pool).
  • Figure 2 is a first schematic diagram of a storage mapping provided by this application. The storage system 110 may also include a server 110C; for the hardware implementation of this server, refer to the description of Figure 1, which is not repeated here.
  • DPU 1 is inserted into the motherboard of host 1, DPU 2 is inserted into the motherboard of host 2, and DPU 3 is inserted into the motherboard of host 3. A host with a DPU card inserted on its motherboard is called a data access device of the memory pool.
  • Taking host 1 as an example, host 1 includes a processor 11 and a memory 12. The specific connection medium between the processor 11 and the memory 12 is not limited in the embodiments of this application. In Figure 2, the processor 11 and the memory 12 are connected through a bus, which can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one line is drawn in Figure 2, but this does not mean that there is only one bus or one type of bus.
  • Optionally, host 1 may also include a communication interface for communicating with other devices through a transmission medium, so that host 1 can communicate with other devices.
  • The memory 12 is used to store program instructions and/or data, and the processor 11 and the memory 12 are coupled. Coupling in the embodiments of this application means an indirect coupling or communication connection between devices, units or modules, which may be electrical, mechanical or of another form, and is used for information interaction between devices, units or modules. The processor 11 may cooperate with the memory 12 and may execute the program instructions stored in the memory 12.
  • The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which can implement or execute each method, step and logical block diagram disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application can be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • The memory may be a non-volatile memory, such as a hard disk drive (HDD) or an SSD, or a volatile memory, such as RAM. Memory is, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory in the embodiments of the present application can also be a circuit or any other device capable of realizing a storage function, used to store program instructions and/or data.
  • In Figure 2, servers 110A to 110C together provide a memory pool for storing objects or files. The memory pool is used to save the objects or files in the storage system 110, where an object or file refers to a group of correlated data, such as file 1 shown in Figure 2. The storage space occupied by file 1 is allocated from the memory pool (memory such as DRAM or PMEM).
  • The storage system 110 externally provides a distributed physical address (DPA) space. A DPA is mapped to a distributed virtual address (DVA) through a distributed page table (DPT), and user files or objects are constructed on the basis of DVAs. A sketch of such a page-table entry is given below.
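For intuition only, a distributed page-table entry might be represented as follows in C; the field names are assumptions, since the application only names the DPA, DPT and DVA concepts:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical distributed page table (DPT) entry: maps one page of
 * the distributed virtual address (DVA) space to its distributed
 * physical address (DPA) in the memory pool. */
struct dpt_entry {
    uint64_t dva;    /* distributed virtual address of the page */
    uint64_t dpa;    /* distributed physical address backing it */
    bool     valid;  /* cleared when the mapping is invalidated */
};

/* Translate a DVA to a DPA by scanning a (toy) flat table; a real
 * system would use a hierarchical or hashed structure. */
static int dpt_translate(const struct dpt_entry *tbl, int n,
                         uint64_t dva, uint64_t *dpa_out) {
    for (int i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].dva == dva) {
            *dpa_out = tbl[i].dpa;
            return 0;
        }
    }
    return -1;  /* miss: the page must be fetched via the DPU */
}
```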
  • An application in the host can map a file or object into the address space of its local process through a distributed memory map (distributed mmap), such as file cache 1, whose storage space is provided by memory 12, and can then access it through load/store instructions. In this embodiment, the storage resource used by the file is a page resource, such as the data pages shown in Figure 2; file cache 1 can therefore also be called a page cache. On a single host, this access pattern resembles ordinary POSIX mmap, as sketched below.
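The following single-host C sketch illustrates the "map then load/store" pattern with ordinary POSIX mmap; it is an analogy, not the distributed mmap itself. The file name "file1" is a placeholder, and the file is assumed to exist and be at least one page long:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* "file1" stands in for a memory-pool file. */
    int fd = open("file1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map the file into the process address space; the distributed
     * mmap in the text maps a memory-pool file in the same spirit. */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    char first = p[0];     /* load: read through the page cache */
    memcpy(p, "new", 3);   /* store: dirties the cached page    */
    printf("old first byte: %c\n", first);

    /* Flush dirty pages back to the file, analogous to the
     * "Memory sync" command mentioned later in the text. */
    msync(p, 4096, MS_SYNC);

    munmap(p, 4096);
    close(fd);
    return 0;
}
```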
  • Different files require different storage resources, and multiple files can be distinguished by file type.
  • File cache 1 includes the metadata of file 1 and some of the data in file 1. The metadata indicates the address, in the storage server, of the data included in file 1; this address can be the aforementioned DVA or DPA. If a data access device or host holds this address, it can read the data directly from the storage server based on it.
  • Similarly, file cache 2 includes the metadata of file 1 and part of the data in file 1; this part of the data may be the same as or different from the data currently in file 1. Likewise, host 3 maps file 1 and obtains file cache 3, which includes metadata of file 1 and part of the data in file 1; this part of the data may also be the same as or different from the data in file 1.
  • It should be noted that the memory pool shown in Figure 2 is virtualized from one or more memories or hard disks in the storage system 110, but in some possible examples the memory pool can also be implemented by other storage media in the storage system 110, which is not limited in this application.
  • The host can realize the file mapping above through an application program, such as mapping management software. Here, memory map (mmap) management software is taken as an example for explanation, as shown in Figure 3. Figure 3 is a second schematic diagram of a storage mapping provided by this application.
  • The data access system shown in Figure 3 includes a data access device 31 and a storage server 32. The data access device 31 can implement the functions of the host 1 and the computing device 131 shown in Figure 1, or the functions of the host 1 and the DPU 1 shown in Figure 2. The storage server 32 may refer to any server, or a combination of multiple servers, in the storage system 110, which is not described again here.
  • File 1 in the storage server 32 includes data stored in multiple data pages, such as the data pages P1 to P6 shown in Figure 3. Taking the case where the data access device 31 includes host 1 and DPU 1 as an example, the data access device 31 maps a global file or object (such as file 1) into the address space of host 1, and through the page-fault process loads the mapped data into the local page cache (such as file cache 1).
  • In addition, the storage server 32 can add host 1 to a host list. The host list indicates the multiple hosts to which file 1 is mapped; these hosts can read the data of file 1 from the storage server 32 by means of the metadata of file 1. There is no need to wait for the controller in the storage server to interact with the disk backing the storage space before the data is read from the storage server into the host's memory, which shortens the IO path for the data access device to read data from the storage server and increases the data access efficiency between the data access device and the storage server.
  • The storage server 32 provides the storage space of file 1 in the form of virtual addresses (VA). The mapping between a VA and a physical address (PA) can be expressed in the form of a page table. The page table stores the mapping relationship between the VAs that the storage server 32 provides for file 1 and the physical addresses that the data access device 31 provides for file cache 1. For example, the VAs of P1 to P6 in file 1 of the storage server 32 are Virt1 to Virt6, respectively, and the physical addresses of P1 to P6 are Phys1 to Phys6, respectively.
  • The mapping management software can update the address mapping relationship maintained by the page table, so that the data access device 31 can synchronize the modified content of the file cache to the storage server 32 according to the updated address mapping relationship indicated by the page table, thereby achieving cache consistency between the storage server 32 and the data access device 31.
  • After mmap, the data access device reads and writes the data in file cache 1 through load/store: the load operation is used to read data from file cache 1, and the store operation is used to write data to file cache 1. In the embodiments of this application, a store operation triggers write protection; for example, the mapping management software saves the old data in file cache 1 into a copy-on-write log (Cow Log). If the process of synchronizing the data written to file cache 1 by the data access device 31 to the storage server 32 fails, the data access device 31 can roll back the failed data based on the contents stored in the copy-on-write log, which prevents data loss after a failed synchronization and improves data safety. A sketch of this mechanism follows.
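A minimal sketch, in C, of the copy-on-write log just described; the record layout, capacity, and the cow_log_save / cow_log_rollback helpers are assumptions, since the application only names the mechanism:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define COW_LOG_CAP 64

/* Hypothetical copy-on-write log record: before the first store to a
 * clean page, the old page contents are saved so that a failed sync
 * to the storage server can be rolled back. */
struct cow_record {
    uint64_t page_addr;             /* which cached page was modified */
    uint8_t  old_data[PAGE_SIZE];   /* its contents before the store  */
};

struct cow_log {
    struct cow_record rec[COW_LOG_CAP];
    int count;
};

/* Called from the write-protection fault handler before the store is
 * allowed to proceed. */
static int cow_log_save(struct cow_log *log, uint64_t page_addr,
                        const void *page) {
    if (log->count >= COW_LOG_CAP)
        return -1;
    log->rec[log->count].page_addr = page_addr;
    memcpy(log->rec[log->count].old_data, page, PAGE_SIZE);
    log->count++;
    return 0;
}

/* On sync failure, restore the saved contents; page_of maps a page
 * address back to its location in the local file cache. */
static void cow_log_rollback(const struct cow_log *log,
                             void *(*page_of)(uint64_t)) {
    for (int i = log->count - 1; i >= 0; i--)
        memcpy(page_of(log->rec[i].page_addr),
               log->rec[i].old_data, PAGE_SIZE);
}
```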
  • Figure 4 is a first schematic flowchart of the data access method provided by this application. This data access method can be applied to the data access system shown in Figure 3, where the data access device 31 includes a processor 311, a memory 312 and a DPU 313. For the hardware implementation of the processor 311, the memory 312 and the DPU 313, refer to the description of host 1 and DPU 1 in Figure 2, which is not repeated here. The data access method includes the following steps S410 to S440.
  • The data access device 31 can write data into the storage space in the memory 312 and synchronize the data to the memory pool provided by the storage server 32 according to the address mapping relationship. For the content of the address mapping relationship, refer to the relevant explanation of Figure 3 above, which is not repeated here.
  • For example, if file cache 1 and the first address in the storage server 32 have already established an address mapping relationship, the processor 311 writes the data (called new data in the following embodiments) into the storage space in file cache 1 corresponding to the first address. Otherwise, the processor 311 first establishes an address mapping relationship between the storage space at the first address and the storage space in file cache 1, reads the old data stored at the first address in the storage server 32 through that mapping, and then writes the new data into the storage space in file cache 1 corresponding to the first address.
  • The processor 311 may write data into the memory 312 by overwriting or by appending. The storage space corresponding to the first address may refer to one or more data pages, such as P1 to P6 shown in Figure 4; take the case where the storage space corresponding to the first address is P1 as an example.
  • If the processor 311 writes the new data to P1 in file cache 1 by overwriting, the new data overwrites the old data, and the old data stored in P1 of file cache 1 becomes invalid. If the processor 311 writes the new data to P1 in file cache 1 by appending, the old data is kept; for example, P1 provides 4KB of storage space, the old data occupies 2KB, and the new data occupies the remaining 2KB. Both modes are sketched below.
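A toy C sketch of the two write modes on a single 4KB page; the struct and function names are assumptions for illustration:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy data page with a fill cursor, modelling P1 in file cache 1. */
struct data_page {
    uint8_t  bytes[PAGE_SIZE];
    uint32_t used;   /* bytes currently occupied by valid data */
};

/* Overwrite: the new data replaces the old data, which becomes
 * invalid (the first case in the text). */
static int page_overwrite(struct data_page *p, const void *d, uint32_t n) {
    if (n > PAGE_SIZE)
        return -1;
    memcpy(p->bytes, d, n);
    p->used = n;
    return 0;
}

/* Append: the new data is written after the old data, which is kept;
 * e.g. 2KB of old data plus 2KB of new data fill a 4KB page. */
static int page_append(struct data_page *p, const void *d, uint32_t n) {
    if (p->used + n > PAGE_SIZE)
        return -1;   /* page full */
    memcpy(p->bytes + p->used, d, n);
    p->used += n;
    return 0;
}
```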
  • The data synchronization request is used to instruct that the data be stored in the storage server 32. For example, this data synchronization request is implemented through the "Memory sync" command.
  • The DPU 313 sends a lock request and a data write command to the storage server 32 based on the data synchronization request. The data write command instructs the storage server 32 to write the data into the storage space corresponding to the first address. The lock request instructs the storage server 32 to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is being executed. In other words, the storage space corresponding to the first address is set to be accessible only by the data access device 31, where access includes writing data, reading data, and so on.
  • The aforementioned other data access devices are the hosts, other than host 1 of the data access device 31, recorded in the host list of the storage server 32.
  • While the data access device writes data to the storage server, the storage space to be written (such as the storage space corresponding to the first address) cannot be accessed by other data access devices. That is to say, for a section of storage space of a storage server, the data access devices that write data to it are mutually exclusive (multi-write mutual exclusion): only one data access device can access that section of storage space at a time. This prevents multiple data access devices from writing to one section of storage space in the storage server and then reading different data from it, avoiding data inconsistency across multiple data access devices.
  • When the processor 311 writes the new data stored in the memory 312 into the storage space corresponding to the first address in the storage server 32, the data access device can bypass the controller (or processor) included in the storage server; that is, it does not need to wait for the controller in the storage server to interact with the disk backing the storage space before writing the data from memory to the storage server. This shortens the IO path for writing data into the storage server and increases the data access efficiency between the data access device and the storage server.
  • In some embodiments, when the memory 312 misses the data, the processor 311 can also send a read request to the DPU 313, and the DPU 313 reads the data stored at the first address from the storage server 32 based on the first address carried in the read request. It should be understood that, for data written by a data access device into the storage server, other data access devices can read the newly written data (new data), and the data access device that wrote it can read the new data as well, so the new data is consistent across multiple data access devices, improving the data access performance of the data access devices with respect to the storage server. The read path is sketched below.
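A self-contained toy sketch, in C, of the miss-then-fetch read path just described; the single-entry cache and the dpu_read_remote stub are assumptions standing in for memory 312 and the DPU 313's read from storage server 32:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy stand-in for the DPU reading the storage space at the first
 * address from the storage server. */
static void dpu_read_remote(uint64_t first_addr, uint8_t *buf) {
    (void)first_addr;
    memset(buf, 0xAB, PAGE_SIZE);   /* pretend remote page contents */
}

/* One cached page of local memory; valid == 0 models a cache miss. */
static struct {
    uint64_t addr;
    uint8_t  data[PAGE_SIZE];
    int      valid;
} cache;

/* Serve hits from local memory; on a miss, hand the first address to
 * the DPU, which fetches the page from the storage server. */
static const uint8_t *read_first_addr(uint64_t first_addr) {
    if (cache.valid && cache.addr == first_addr)
        return cache.data;                    /* hit in local memory */
    dpu_read_remote(first_addr, cache.data);  /* miss: DPU fetches   */
    cache.addr = first_addr;
    cache.valid = 1;
    return cache.data;
}

int main(void) {
    const uint8_t *p = read_first_addr(0x1000);
    printf("first byte read: 0x%02x\n", p[0]);
    return 0;
}
```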
  • Figure 5 is a second schematic flowchart of the data access method provided by this application. Compared with Figure 4, the data access system to which this method applies further includes a data access device 33. The DPU 333 included in the data access device 33 can be implemented by the DPU 3 shown in Figure 2, and the data access device 33 may also include the host 3 shown in Figure 2. The data access method provided in this embodiment includes the following four stages.
  • Stage 1: The DPU 313 sends a lock request to the storage server 32. The lock request instructs the storage server 32 to lock the storage space into which new data is to be written.
  • The mapping management software in the data access device 31 can obtain the data pages in file cache 1 whose data has been modified (dirty pages), and thereby determine the dirty page list of file cache 1. The aforementioned lock request may carry the dirty page list, so that the storage server 32 determines the storage space to be locked based on the dirty page list; a sketch of such a request follows the example below.
  • For example, if the new data to be written is the data stored in P1, P3 and P5 of file cache 1, then P1, P3 and P5 in file cache 1 can be called the dirty pages of the data access device 31, and the storage space to be written with new data in the storage server 32 is the storage space corresponding to P1, P3 and P5 in the memory pool (shown in gray in Figure 5).
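As a hedged illustration in C, a lock request carrying the dirty page list might be assembled as follows; the struct layout, the 16-page cap, and build_lock_request are assumptions, since the application does not define the request format:

```c
#include <stdint.h>

#define MAX_DIRTY_PAGES 16

/* Hypothetical lock request carrying the dirty page list from
 * stage 1: the storage server locks exactly the pages the device is
 * about to write back (P1, P3 and P5 in the example). */
struct lock_request {
    uint32_t host_id;                     /* requesting device     */
    uint32_t n_pages;                     /* dirty pages listed    */
    uint64_t page_addr[MAX_DIRTY_PAGES];  /* memory-pool addresses */
};

/* Collect dirty pages from a toy dirty-flag array into the request. */
static void build_lock_request(struct lock_request *req, uint32_t host,
                               const uint64_t *addrs, const int *dirty,
                               int n) {
    req->host_id = host;
    req->n_pages = 0;
    for (int i = 0; i < n && req->n_pages < MAX_DIRTY_PAGES; i++)
        if (dirty[i])
            req->page_addr[req->n_pages++] = addrs[i];
}
```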
  • The storage server 32 maintains the lock and unlock status of multiple data access devices through a queue. The lock request queue maintains the lock status of one or more data pages in the memory pool. For example, the aforementioned P1, P3 and P5 can only be accessed by the data access device 31, while P2 can only be accessed by the data access device 33, as in the sketch below.
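For intuition, the per-page exclusivity could be modelled in C as follows; the application only says lock/unlock status is maintained through a queue, so this flat table is an illustrative simplification with assumed names:

```c
#include <stdint.h>

/* Hypothetical per-page lock state on the storage server. */
struct page_lock {
    uint64_t page_addr;   /* memory-pool address of the data page */
    uint32_t owner;       /* id of the holding device; 0 == free  */
};

/* Grant the lock only if the page is free or already held by the
 * requester; all other devices are excluded (multi-write mutual
 * exclusion). */
static int page_try_lock(struct page_lock *l, uint32_t host) {
    if (l->owner == 0 || l->owner == host) {
        l->owner = host;
        return 0;
    }
    return -1;   /* busy: queue the request until the owner unlocks */
}

static void page_unlock(struct page_lock *l, uint32_t host) {
    if (l->owner == host)
        l->owner = 0;
}
```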
  • Stage 2: The data access device 31 writes the data stored in the dirty pages into the storage space in the memory pool corresponding to the addresses of the dirty pages (such as the storage space at the first address mentioned above). For example, the DPU 313 writes the data stored in the dirty pages back to the memory pool through one-sided writes, based on the memory-pool addresses of the dirty pages recorded in file cache 1.
  • Stage 3: The DPU 313 reads, through a one-sided read, the host list of the multiple hosts to which file 1 is mapped from the storage server 32, and sends an invalidation indication message to the other hosts or data access devices in the host list, so as to achieve cache coherence among multiple data access devices. The invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address.
  • For example, file cache 3 of the data access device 33 maps the storage space and data corresponding to P1, P2, P3, P4 and P6 in the memory pool. Since the data access device 31 has written data to P1 and P3, after the DPU 333 receives the invalidation indication message sent by the DPU 313, the DPU 333 invalidates the data in file cache 3 that shares an address with the dirty pages of file cache 1, namely the old data stored in P1 and P3 of file cache 3.
  • Specifically, the DPU 333 can query the page table maintained by the data access device 33 according to the invalidation indication message sent by the DPU 313, thereby determining the physical addresses of the data pages to be invalidated in file cache 3. If the invalidation indication message carries the VAs of P1, P3 and P5 (Virt1, Virt3, Virt5), then after querying the page table the DPU 333 determines that the physical addresses to be invalidated in file cache 3 include Phys1 (corresponding to P1) and Phys3 (corresponding to P3), and the DPU 333 sets the status of Virt1-Phys1 and Virt3-Phys3 in the page table to invalid, as sketched below.
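A minimal C sketch of the invalidation handler just described, assuming a flat page table; the entry layout and the invalidate_entries name are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Page table entry as in Figure 3: the VA of a page of file 1 mapped
 * to the physical address of its copy in the local file cache. */
struct pte {
    uint64_t va;     /* e.g. Virt1 */
    uint64_t pa;     /* e.g. Phys1 */
    bool     valid;
};

/* Handle an invalidation indication listing the VAs the writer
 * dirtied (Virt1, Virt3, Virt5 in the example). Entries this host has
 * cached (Virt1, Virt3) are marked invalid; a VA it never mapped
 * (Virt5) is simply not found and is skipped. */
static void invalidate_entries(struct pte *tbl, int n,
                               const uint64_t *vas, int n_vas) {
    for (int j = 0; j < n_vas; j++)
        for (int i = 0; i < n; i++)
            if (tbl[i].valid && tbl[i].va == vas[j])
                tbl[i].valid = false;  /* next access re-reads from server */
}
```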
  • In this case, the modified data will not be synchronized to the memory pool. When the other data access devices need the new data written at the first address, they re-read the data in the storage space at the first address from the storage server, which avoids the problem that multiple data access devices read inconsistent data from the same storage space of the storage server, leading to data access errors or reduced data access efficiency.
  • Figure 6 is a schematic diagram of data invalidation provided by this application. The data access device 33 includes the host 3 and the DPU 333; for the hardware implementation of host 3, refer to the description of host 1 in Figure 2, which is not repeated here. The DPU 333 includes a processor 333A and a memory 333B; for example, the processor 333A can be a CPU (the DPU CPU shown in Figure 6), and the memory 333B can be a DRAM.
  • Host 3 maintains multiple page tables, where one page table corresponds to one file in a storage server; for example, page table 1 corresponds to file 1.
  • The DPU 333 locally maintains a table that records the starting positions of all page tables of this node. When the DPU 333 receives an invalidation indication request sent by another DPU, it obtains the starting address of the page table corresponding to the file identifier (obj ID) carried in the request. For example, the physical address of the page table determined from the file identifier "1" is "0x34adf", and the data length of page table 1 corresponding to file 1 is 64B. A sketch of such a directory lookup follows.
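A toy C sketch of the node-local directory from obj ID to page-table start; the struct and pt_lookup are assumed names, with the example values taken from the text:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node-local directory from file identifier (obj ID) to
 * the starting address and length of that file's page table. */
struct pt_dir_entry {
    uint32_t obj_id;
    uint64_t pt_base;   /* physical address of the page table */
    uint32_t pt_len;    /* length of the page table in bytes  */
};

static const struct pt_dir_entry *pt_lookup(const struct pt_dir_entry *dir,
                                            int n, uint32_t obj_id) {
    for (int i = 0; i < n; i++)
        if (dir[i].obj_id == obj_id)
            return &dir[i];
    return NULL;   /* no page table for this file on this node */
}

/* Example directory matching the text: obj ID 1 -> 0x34adf, 64B. */
static const struct pt_dir_entry dir_example[] = {
    { 1, 0x34adf, 64 },   /* file 1 -> page table 1 */
};
```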
  • The DPU 333 reads page table 1 into the local memory 333B through CXL.cache, and the processor 333A modifies the page table entries corresponding to page table 1: as shown in Figure 6, the two entries Virt1-Phys1 and Virt3-Phys3 are set to invalid. Since the modified page table entries are associated with P1 and P3 in file cache 3, once the entries become invalid, the data access device 33 must read the data corresponding to P1 and P3 of file 1 from the storage server when it needs to access it, thus completing the cache invalidation.
  • CXL: Compute Express Link
  • In some embodiments, the DPU 313 can also receive invalidation information from other data access devices, and thereby invalidate the data stored in the memory 312 at the second address indicated by the invalidation information.
  • In one case, the second address is different from the aforementioned first address. When the DPU 313 receives the invalidation information sent by another data access device, it invalidates the old data stored at the second address (such as P2), preventing the data access device from performing tasks with old data that is inconsistent with the new data stored in the storage space at the second address in the storage server. This avoids access errors caused by multiple data access devices caching inconsistent data for the same storage space, and also avoids the reduced data access efficiency that would result if the storage server had to synchronize the data at the second address across multiple data access devices after interacting with a data access device.
  • In another case, the second address is the same as the aforementioned first address. It should be understood that, for a section of storage space in the storage server, multiple data access devices can modify the data in that section at different times. This avoids the problem of a section of storage space being modifiable by only a single data access device during data access, and improves the performance of the data access services the storage server can provide.
  • The data access method provided by this embodiment also includes the following stage 4.
  • Stage 4: After the new data is successfully written to the first address, the DPU 313 sends an unlock request to the storage server 32. The unlock request instructs the storage server 32 to set the storage space corresponding to the first address so that it can again be accessed by other data access devices.
  • In this way, the storage server can unlock the access status of the data or file, allowing the updated data or file to be accessed or modified by other data access devices. This avoids the situation where the data or files in the storage server can be used by only a single data access device, which would reduce the data access efficiency of the other data access devices.
  • In order to realize the functions in the above embodiments, the data access device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that the units and method steps of the examples described in conjunction with the embodiments disclosed in this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application scenario and the design constraints of the technical solution.
  • For example, the data access device may include a communication module, a storage module and a lock module. The storage module is used to write data into the memory, and the communication module is used to send a data synchronization request to the DPU, the data synchronization request being used to instruct that the data be stored in the storage server. The lock module, applied to the DPU, sends a lock request to the storage server based on the data synchronization request, and the communication module is also used to send a data write command to the storage server. The data write command instructs the storage server to write the data into the storage space corresponding to the first address, and the lock request instructs the storage server to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is being executed.
  • The data access device in the embodiments of the present application can be implemented by a DPU. The data access device according to the embodiments of the present application may correspondingly perform the methods described in the embodiments of the present application, and the above and other operations and/or functions of the units and modules in the data access device are respectively intended to realize the corresponding processes of the methods in the preceding figures; for brevity, they are not repeated here.
  • An embodiment of the present application also provides a DPU, which includes a control circuit and an interface circuit. The interface circuit is used to receive data from devices other than the DPU and transmit it to the control circuit, or to send data from the control circuit to devices other than the DPU. The control circuit executes code instructions through logic circuits and, together with the interface circuit, performs the functions of the DPU in the aforementioned data access method.
  • An embodiment of the present application also provides a network card, including the DPU described in the previous embodiment and a communication interface. The communication interface is used to send data issued by the DPU, or to receive data sent to the DPU by other devices, so that the DPU implements the operational steps of the data access method provided by this application.
  • The method steps in this embodiment can be implemented by hardware, or by a processor executing software instructions.
  • Software instructions may consist of corresponding software modules, and the software modules may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium.
  • The storage medium may also be an integral part of the processor.
  • The processor and the storage medium may be located in an ASIC.
  • The ASIC may be located in a computing device.
  • Alternatively, the processor and the storage medium may exist as discrete components in a network device or terminal device.
  • This application also provides a chip system, which includes a processor and is used to implement the functions of the data processing unit in the foregoing methods.
  • The chip system further includes a memory for storing program instructions and/or data.
  • The chip system may consist of a chip, or may include a chip and other discrete devices.
  • The computer program product includes one or more computer programs or instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable device.
  • The computer program or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another.
  • For example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media such as floppy disks, hard disks, and magnetic tapes; optical media such as digital video discs (DVDs); or semiconductor media such as solid-state drives (SSDs).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data access device, method and system, a data processing unit, and a network interface card, relating to the field of data storage. When a data access device writes data to a storage server, the storage space in the storage server into which the data is to be written cannot be accessed by other data access devices; that is, only one data access device can access the storage space at a time. This avoids the problem of the data read by each data access device from a storage space being inconsistent because multiple data access devices write data to the same storage space in the storage server. In addition, the data access device can write the data from a memory into the storage server without having to wait for a controller in the storage server to interact with the disk corresponding to the storage space, so that the length of the I/O path for writing data into the storage server is reduced, thereby improving the data access efficiency between the data access device and the storage server.
PCT/CN2023/089442 2022-08-31 2023-04-20 Data access device, method and system, data processing unit, and network interface card WO2024045643A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211054647.3A 2022-08-31 2022-08-31 Data access device, method and system, data processing unit, and network card
CN202211054647.3 2022-08-31

Publications (1)

Publication Number Publication Date
WO2024045643A1 true WO2024045643A1 (fr) 2024-03-07

Family

ID=90085005

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089442 2022-08-31 2023-04-20 Data access device, method and system, data processing unit, and network interface card WO2024045643A1 (fr)

Country Status (2)

Country Link
CN (1) CN117667761A (fr)
WO (1) WO2024045643A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110225335A1 (en) * 2010-03-15 2011-09-15 International Business Machines Corporation Using a dual mode reader writer lock
CN107807797A (zh) * 2017-11-17 2018-03-16 北京联想超融合科技有限公司 Data writing method, apparatus, and server
CN110399227A (zh) * 2018-08-24 2019-11-01 腾讯科技(深圳)有限公司 Data access method and apparatus, and storage medium
CN110691062A (zh) * 2018-07-06 2020-01-14 浙江大学 Data writing method and apparatus, and device thereof

Also Published As

Publication number Publication date
CN117667761A (zh) 2024-03-08

Similar Documents

Publication Publication Date Title
US11500689B2 (en) Communication method and apparatus
US10747673B2 (en) System and method for facilitating cluster-level cache and memory space
US9092426B1 (en) Zero-copy direct memory access (DMA) network-attached storage (NAS) file system block writing
US7620784B2 (en) High speed nonvolatile memory device using parallel writing among a plurality of interfaces
US8433888B2 (en) Network boot system
US10733101B2 (en) Processing node, computer system, and transaction conflict detection method
US20190026225A1 (en) Multiple chip multiprocessor cache coherence operation method and multiple chip multiprocessor
CN110119304B (zh) 一种中断处理方法、装置及服务器
US11544812B2 (en) Resiliency schemes for distributed storage systems
WO2023035646A1 (fr) Memory expansion method and apparatus, and related device
US11240306B2 (en) Scalable storage system
WO2023125524A1 (fr) Data storage method and system, storage access configuration method, and related device
WO2019089057A1 (fr) Scalable storage system
US20240211136A1 (en) Service system and memory management method and apparatus
WO2022033269A1 (fr) Data processing method, device, and system
WO2024051292A1 (fr) Data processing system, memory mirroring method and apparatus, and computing device
WO2024045643A1 (fr) Data access device, method and system, data processing unit, and network interface card
WO2022073399A1 (fr) Storage node, storage device, and network chip
JP2017033375A (ja) Parallel computing system, migration method, and migration program
CN116594551A (zh) Data storage method and apparatus
WO2022222523A1 (fr) Log management method and apparatus
WO2023231572A1 (fr) Container creation method and apparatus, and storage medium
WO2024060710A1 (fr) Page swapping method and apparatus
WO2023000784A1 (fr) Data access method and related device
WO2022262623A1 (fr) Data exchange method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858684

Country of ref document: EP

Kind code of ref document: A1