WO2024045643A1 - Data access device, method and system, data processing unit, and network interface card - Google Patents

Data access device, method and system, data processing unit, and network interface card

Info

Publication number
WO2024045643A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data access
storage server
address
access device
Prior art date
Application number
PCT/CN2023/089442
Other languages
French (fr)
Chinese (zh)
Inventor
钟刊
崔文林
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024045643A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0808 Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0842 Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G06F12/0882 Page mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1072 Decentralised address translation, e.g. in distributed shared memory systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores

Definitions

  • This application relates to the field of data storage, and in particular to a data access device, method, system, data processing unit and network card.
  • Shuffle is used to describe the process of shuffling data and then aggregating it to different nodes. Take applications where distributed systems run storage-intensive tasks as an example.
  • Mapper: mapping node
  • Reducer: reduction node (also translated as simplification node)
  • RPC: remote procedure call
  • Data transmission based on RPC requests requires the cooperation of local nodes (such as simplification nodes) and remote nodes (such as mapping nodes), which consumes a large amount of network resources and memory. In addition, if the local nodes and remote nodes perform too much disk IO, data access efficiency is affected.
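  • As a rough illustration of the shuffle step described above, the following sketch groups the key-value pairs emitted by several mappers by destination reducer. It is a generic illustration, not the method of this application: the function names `partition` and `shuffle`, the sample data, and the use of CRC32 as a stable hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Group (key, value) pairs from all mappers by destination reducer."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    return reducers


# Hypothetical output of two mappers.
mapper_outputs = [
    [("apple", 1), ("pear", 1)],
    [("apple", 1), ("plum", 1)],
]
reducers = shuffle(mapper_outputs, num_reducers=2)
```

  • Every value for a given key ends up at the same reducer, which is why each reducer normally has to fetch data from every mapper.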
  • This application provides a data access device, method, system, data processing unit and network card, which solves the problem of low data access efficiency of local nodes and remote nodes.
  • In a first aspect, a data access device is provided. The data access device includes a processor, a memory, and a data processing unit (DPU).
  • The DPU may be a pluggable accelerator card, such as a DPU card.
  • The data access device may include a server (or host) and the DPU card.
  • The server may include the aforementioned processor and memory.
  • The processor is used to write data into the memory and send a data synchronization request to the DPU; the data synchronization request is used to instruct that the data be stored in the storage server.
  • The DPU is used to send a lock request and a data write command to the storage server based on the data synchronization request.
  • The data write command is used to instruct the storage server to write the data to the storage space corresponding to the first address.
  • The lock request is used to instruct the storage server to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is executed.
  • In this way, while the data is being written, the storage space in the storage server to which the data is to be written (such as the storage space corresponding to the first address) cannot be accessed by other data access devices.
  • That is, for a section of storage space in the storage server, multiple data access devices that write data to this section are mutually exclusive (referred to as multi-write mutual exclusion): only one data access device can access the storage space at a time. This prevents multiple data access devices from writing to the same section of storage space at the same time, which would cause each data access device to read different data from that section and leave the data inconsistent across the data access devices.
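  • The multi-write mutual exclusion described above can be sketched as a per-address lock table on the storage server. This is a minimal model under stated assumptions, not the implementation of this application: the class `StorageServer`, its methods, the device identifiers, and the address value are all hypothetical.

```python
class StorageServer:
    """Toy storage server with one lock owner per address (hypothetical model)."""

    def __init__(self):
        self.data = {}    # address -> stored bytes
        self.owner = {}   # address -> id of the device holding the lock

    def lock(self, address, device_id):
        """Grant the lock only if no other device already holds it."""
        holder = self.owner.setdefault(address, device_id)
        return holder == device_id

    def write(self, address, device_id, payload):
        """Reject writes from any device that is not the current lock owner."""
        if self.owner.get(address) != device_id:
            raise PermissionError(f"address {address:#x} is not locked by {device_id}")
        self.data[address] = payload


server = StorageServer()
first_addr = 0x1000                             # hypothetical first address
got_1 = server.lock(first_addr, "device-1")     # first writer obtains the lock
got_2 = server.lock(first_addr, "device-2")     # second writer is refused
server.write(first_addr, "device-1", b"new data")
```

  • In this model a second device that asks for the same address is refused until the owner releases the lock, so only one writer can touch the storage space at a time.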
  • In addition, the data access device can bypass the controller (or processor) included in the storage server.
  • The data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing data from the memory to the storage server. This shortens the IO path for writing data to the storage server and improves the data access efficiency between the data access device and the storage server.
  • The DPU is also used to send an unlock request to the storage server after the data is successfully written to the first address.
  • The unlock request is used to instruct the storage server to set the storage space corresponding to the first address so that it can be accessed by other data access devices.
  • After the data or file is updated, the storage server can unlock its access status so that the updated data or file can be accessed or modified by other data access devices.
  • This avoids the situation in which the data or files in the storage server can be used by only a single data access device, which would reduce the efficiency with which other data access devices access them.
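  • The lock, write, and unlock sequence that the DPU drives after a data synchronization request can be sketched as follows. The classes `StorageServer` and `Dpu`, the method `handle_sync_request`, and the message log are hypothetical names introduced for illustration. Releasing the lock in a `finally` block, so that a failed write does not leave the address locked, is a design choice of this sketch rather than something the text specifies.

```python
class StorageServer:
    """Toy storage server that records the order of requests it receives."""

    def __init__(self):
        self.data = {}
        self.owner = {}
        self.log = []

    def lock(self, address, device_id):
        if self.owner.get(address, device_id) != device_id:
            return False                      # another device holds the lock
        self.owner[address] = device_id
        self.log.append("lock")
        return True

    def write(self, address, device_id, payload):
        if self.owner.get(address) != device_id:
            raise PermissionError("write without holding the lock")
        self.data[address] = payload
        self.log.append("write")

    def unlock(self, address, device_id):
        if self.owner.get(address) == device_id:
            del self.owner[address]
            self.log.append("unlock")


class Dpu:
    """Hypothetical DPU that serves data synchronization requests from its host."""

    def __init__(self, device_id, server):
        self.device_id = device_id
        self.server = server

    def handle_sync_request(self, address, payload):
        if not self.server.lock(address, self.device_id):
            return False                      # caller may retry later
        try:
            self.server.write(address, self.device_id, payload)
        finally:
            self.server.unlock(address, self.device_id)
        return True


server = StorageServer()
dpu_1 = Dpu("device-1", server)
dpu_2 = Dpu("device-2", server)
ok_1 = dpu_1.handle_sync_request(0x1000, b"v1")
ok_2 = dpu_2.handle_sync_request(0x1000, b"v2")   # succeeds because device-1 unlocked
```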
  • The DPU is also used to send an invalidation indication message to other data access devices.
  • The invalidation indication message is used to instruct the other data access devices to invalidate the old data stored at the first address. If the data access device did not send the invalidation indication message, any other data access device that had stored or mapped the old data in the storage space corresponding to the first address would execute tasks based on that old data, causing errors in task execution. In contrast, in this embodiment, after the data access device writes data to the storage space corresponding to the first address in the storage server, it sends an invalidation indication message to the other data access devices, and the other data access devices invalidate the old data they hold for the first address.
  • When other data access devices need to use the new data written at the first address, they re-read the new data from the storage space at the first address in the storage server. This avoids the problem of multiple data access devices reading inconsistent data from the same storage space of the storage server, which would lead to data access errors or reduced data access efficiency.
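  • A minimal sketch of this invalidation step follows, assuming each data access device keeps a local copy of data it has read and knows its peers. The class `DataAccessDevice`, its methods, and the use of a plain dictionary as the storage server are hypothetical simplifications; locking is omitted to keep the example short.

```python
class DataAccessDevice:
    """Hypothetical device with a local cache and a list of peer devices."""

    def __init__(self, name):
        self.name = name
        self.cache = {}   # address -> locally held copy
        self.peers = []   # other DataAccessDevice objects

    def write_and_invalidate(self, storage, address, payload):
        """Write new data, then tell every peer to drop its old copy."""
        storage[address] = payload
        self.cache[address] = payload
        for peer in self.peers:
            peer.on_invalidate(address)

    def on_invalidate(self, address):
        """Handle an invalidation indication message from another device."""
        self.cache.pop(address, None)

    def read(self, storage, address):
        """Serve from the local copy, or re-read from the storage server."""
        if address not in self.cache:
            self.cache[address] = storage[address]
        return self.cache[address]


storage = {0x1000: b"old"}                      # stands in for the storage server
dev_1, dev_2 = DataAccessDevice("device-1"), DataAccessDevice("device-2")
dev_1.peers.append(dev_2)
dev_2.peers.append(dev_1)

before = dev_2.read(storage, 0x1000)            # device-2 now holds the old data
dev_1.write_and_invalidate(storage, 0x1000, b"new")
stale_dropped = 0x1000 not in dev_2.cache       # old copy was invalidated
after = dev_2.read(storage, 0x1000)             # re-read returns the new data
```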
  • The DPU is also configured to invalidate, based on invalidation information sent by other data access devices, the data stored in the memory at the second address indicated by the invalidation information.
  • In one case, the second address is different from the aforementioned first address.
  • When the DPU receives the invalidation information sent by other data access devices, it invalidates the old data stored at the second address to prevent the data access device from using the old data to perform tasks.
  • Otherwise, the old data would be inconsistent with the new data stored in the storage space at the second address in the storage server, causing access errors due to inconsistent copies of the same storage space cached by multiple data access devices.
  • It also avoids the reduced data access efficiency that would result if the storage server had to synchronize the data at the second address across multiple data access devices after interacting with each of them.
  • In another case, the second address is the same as the aforementioned first address. It should be understood that, for a section of storage space in the storage server, multiple data access devices can modify the data in that section at different times. This avoids the problem that the data in this section of storage space can be modified by only a single data access device, and improves the performance of the data access services that the storage server can provide.
  • The processor is also used to send a read request to the DPU when the data misses in the memory.
  • The DPU is also used to read, based on the first address carried in the read request, the data stored at the first address from the storage server. It should be understood that, for data written to the storage server by the data access device, other data access devices can read the newly written data (new data), and the data access device that wrote the data can also read the new data. This keeps the new data consistent across multiple data access devices and improves the data access performance of the data access devices to the storage server.
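  • The read path on a memory miss can be sketched as below. The classes `Host` and `Dpu` and the counter `remote_reads` are hypothetical. Storing the fetched data in host memory after the miss is an assumption of this sketch; the text only states that the DPU reads the data from the storage server.

```python
class Dpu:
    """Hypothetical DPU that fetches data from the storage server by address."""

    def __init__(self, storage_server):
        self.storage_server = storage_server   # a dict stands in for the server
        self.remote_reads = 0

    def read(self, address):
        self.remote_reads += 1
        return self.storage_server[address]


class Host:
    """Hypothetical host whose processor checks memory before asking the DPU."""

    def __init__(self, dpu):
        self.memory = {}
        self.dpu = dpu

    def load(self, address):
        if address in self.memory:             # hit: no request to the DPU
            return self.memory[address]
        data = self.dpu.read(address)          # miss: read request goes to the DPU
        self.memory[address] = data            # assumed fill of host memory
        return data


storage_server = {0x1000: b"payload"}
dpu = Dpu(storage_server)
host = Host(dpu)
first = host.load(0x1000)    # miss, served through the DPU
second = host.load(0x1000)   # hit, served from host memory
```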
  • In a second aspect, a data access method is provided.
  • The data access method is executed by a data access system.
  • The data access system includes a data access device and a storage server.
  • The data access device includes a processor, a memory, and a DPU.
  • The data access method provided by this embodiment includes: the processor writes data into the memory and sends a data synchronization request to the DPU, where the data synchronization request is used to instruct that the data be stored in the storage server; and the DPU sends a lock request and a data write command to the storage server based on the data synchronization request.
  • The data write command is used to instruct the storage server to write the data to the storage space corresponding to the first address.
  • The lock request is used to instruct the storage server to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices while the data write command is executed.
  • The data access method provided in this embodiment also includes: after the data is successfully written to the first address, the DPU sends an unlock request to the storage server.
  • The unlock request is used to instruct the storage server to set the storage space corresponding to the first address so that it can be accessed by other data access devices.
  • The data access method provided in this embodiment also includes: the DPU sends an invalidation indication message to other data access devices.
  • The invalidation indication message is used to instruct the other data access devices to invalidate the old data stored at the first address.
  • The data access method provided by this embodiment also includes: the DPU invalidates, according to invalidation information sent by other data access devices, the data stored in the memory at the second address indicated by the invalidation information.
  • The data access method provided in this embodiment also includes: when the data misses in the memory, the processor sends a read request to the DPU; and, based on the first address carried in the read request, the DPU reads the data stored at the first address from the storage server.
  • In a third aspect, a data access system is provided, including a storage server and the data access device of any implementation of the first aspect.
  • The storage server is used to store the data to be synchronized by the data access device, and to set the storage space corresponding to the first address to which the data will be written so that it cannot be accessed by other data access devices.
  • In a fourth aspect, a DPU is provided, including a control circuit and an interface circuit.
  • The interface circuit is used to receive data from devices other than the DPU and transmit it to the control circuit, or to send data from the control circuit to devices other than the DPU.
  • The control circuit, through logic circuits or by executing code instructions, and the interface circuit perform the functions of the DPU in any possible implementation of the second aspect.
  • In a fifth aspect, a network card is provided, including the DPU provided in the fourth aspect and a communication interface.
  • The communication interface is used to send data sent by the DPU, or to receive data sent to the DPU by other devices.
  • A computer-readable storage medium is provided.
  • Computer programs or instructions are stored in the storage medium.
  • When the computer programs or instructions are executed, the steps of the method in any implementation of the second aspect are performed.
  • A computer program product is provided.
  • When the computer program product is run on a computer, it causes the computer to execute the operational steps of the method described in any implementation of the second aspect.
  • The computer may be a data access device, a host, a DPU, a DPU card, etc.
  • Figure 1 is a schematic structural diagram of a data access system provided by this application.
  • Figure 2 is a schematic diagram 1 of a storage mapping provided by this application.
  • Figure 3 is a schematic diagram 2 of a storage mapping provided by this application.
  • Figure 4 is a schematic flow chart of the data access method provided by this application.
  • Figure 5 is schematic flow chart 2 of the data access method provided by this application.
  • Figure 6 is a schematic diagram of data invalidation provided by this application.
  • This application provides a data access method.
  • While the data is being written, the storage space in the storage server to which the data is written cannot be accessed by other data access devices.
  • That is to say, for a section of storage space in the storage server, multiple data access devices that write data to this section are mutually exclusive (referred to as multi-write mutual exclusion): only one data access device can access this section of storage space at a time.
  • This prevents multiple data access devices from writing data to the same storage space in the storage server at the same time, which would cause each data access device to read different data from that storage space and leave the data inconsistent across the data access devices.
  • In addition, the data access device can bypass the controller (or processor) included in the storage server.
  • The data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing data from the memory to the storage server. This shortens the IO path for writing data to the storage server and improves the data access efficiency between the data access device and the storage server.
  • Figure 1 is a schematic structural diagram of a data access system provided by this application.
  • The data access system 100 includes a storage system 110 and multiple data access devices that access the storage system 110.
  • One or more servers in the storage system 110 may also be connected to a computing device.
  • The computing device may be used to provide the server with more computing resources, or the computing functions on the server may be offloaded to an external acceleration device, in order to improve the data access performance of the storage system 110.
  • The data access device can access the servers in the storage system 110 over a network to access data, and the communication function of the network can be implemented by a switch or a router.
  • The data access device can also communicate with the server through a wired connection, such as a peripheral component interconnect express (PCIe) high-speed bus, compute express link (CXL), universal serial bus (USB), or a bus of another protocol.
  • The data access device includes a host and a computing device.
  • For example, the data access device 1 includes a host 1 and a computing device 131, and the data access device 2 includes a host 2 and a computing device 132.
  • In Figure 1, the computing device is represented by a DPU card, but this should not be understood as limiting this application.
  • The computing device may include one or more processing units. A processing unit may be not only a DPU but also a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • A general-purpose processor can be a microprocessor or any conventional processor.
  • The computing device may also be a dedicated processor for artificial intelligence (AI), such as a neural processing unit (NPU) or a graphics processing unit (GPU).
  • One or more processing units included in the computing device can be packaged as a card, such as the DPU card in Figure 1.
  • The DPU card can be connected to the host through a PCIe interface, CXL interface, unified bus (UB) interface, NVLink interface or other communication interface, and the host can offload some data processing functions to the DPU card.
  • A host is a computer running an application.
  • The computer running the application is a physical computing device.
  • The physical computing device may be a server or a terminal.
  • The terminal can also be called terminal equipment, user equipment (UE), mobile station (MS), mobile terminal (MT), etc.
  • The terminal can be a mobile phone, tablet computer, laptop computer, desktop computer, personal communication service (PCS) phone, wireless terminal in a smart city, wireless terminal in a smart home, etc.
  • The embodiments of this application do not limit the specific technology and specific equipment form used by the host.
  • The host shown in Figure 1 may also refer to a client.
  • The storage system provided by the embodiments of this application may be a distributed storage system or a centralized storage system.
  • The storage system 110 shown in Figure 1 may be a distributed storage system.
  • The distributed storage system provided by this embodiment includes a storage cluster that integrates computing and storage.
  • The storage cluster includes one or more servers (such as server 110A and server 110B shown in Figure 1), and the servers can communicate with each other.
  • The servers included in the storage system 110 are also called storage servers.
  • The server 110A shown in Figure 1 is used for explanation here.
  • The server 110A is a device with both computing and storage capabilities, such as a server or a desktop computer.
  • For example, an Advanced RISC Machines (ARM) server or an x86 server can be used as the server 110A.
  • The server 110A includes at least a processor 112, a memory 113, a network card 114 and a hard disk 105.
  • The processor 112, memory 113, network card 114 and hard disk 105 are connected through a bus. The processor 112 and the memory 113 are used to provide computing resources.
  • The processor 112 is a CPU that is used to process data access requests (such as write data requests or read data requests) from outside the server 110A (from an application server or other servers), and is also used to process requests generated internally by the server 110A.
  • For example, when the processor 112 receives log writing requests, the data in these log writing requests is temporarily stored in the memory 113.
  • The processor 112 then sends the data stored in the memory 113 to the hard disk 105 for persistent storage.
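  • The staging of written data in memory before it is persisted to the hard disk can be sketched as a simple write buffer. The class `WriteBuffer` and its fields are hypothetical. Flushing when the buffered bytes reach a threshold is an assumption made for this sketch; the text does not state what triggers the transfer to the hard disk.

```python
class WriteBuffer:
    """Toy model: memory stages writes, the hard disk holds persisted data."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold   # assumed trigger, in bytes
        self.memory = []                         # payloads staged in memory
        self.hard_disk = []                      # payloads persisted so far

    def write(self, payload: bytes):
        """Stage the payload in memory and flush once enough has accumulated."""
        self.memory.append(payload)
        if sum(len(p) for p in self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move everything staged in memory to the hard disk."""
        self.hard_disk.extend(self.memory)
        self.memory.clear()


buf = WriteBuffer(flush_threshold=8)
buf.write(b"abcd")                 # 4 bytes staged, below the threshold
staged_after_first = list(buf.memory)
buf.write(b"efgh")                 # 8 bytes staged, flush is triggered
```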
  • the processor 112 is also used for data calculation or processing. Only one processor 112 is shown in FIG. 1 . In actual applications, there are often multiple processors 112 , and one processor 112 has one or more CPU cores. This embodiment does not limit the number of CPUs and CPU cores.
  • Memory 113 refers to the internal memory that directly exchanges data with the processor. It can read and write data at any time and very quickly, and serves as a temporary data storage for the operating system or other running programs.
  • The memory includes at least two types of memory.
  • For example, the memory can be random access memory, such as dynamic random access memory (DRAM) or storage class memory (SCM).
  • DRAM is a semiconductor memory that, like most random access memories (RAM), is a volatile memory device.
  • SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory. Storage class memory provides faster read and write speeds than hard disks, but is slower than DRAM in access speed and cheaper than DRAM in cost.
  • DRAM and SCM are only exemplary illustrations in this embodiment, and the memory may also include other random access memories, such as static random access memory (SRAM).
  • The memory 113 can also be a dual in-line memory module (DIMM), that is, a module composed of dynamic random access memory (DRAM), or a solid state drive (SSD).
  • The storage server 110A may be configured with multiple memories 113 and with different types of memories 113. This embodiment does not limit the number and type of memories 113.
  • The memory 113 can be configured to have a power-protection function.
  • The power-protection function means that the data stored in the memory 113 is not lost when the system is powered off and then powered on again. Memory with a power-protection function is called non-volatile memory.
  • The hard disk 105 is used to provide storage resources, for example to store data and information such as the data access status of each data access device (or host).
  • The data may be stored in the hard disk 105 or the memory 113 in the form of an object or a file.
  • The hard disk can be a magnetic disk or another type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive.
  • The hard disk 105 may be a solid state drive based on the Non-Volatile Memory Host Controller Interface Specification (Non-Volatile Memory Express, NVMe), such as an NVMe SSD.
  • The network card 114 in the server 110A is used to communicate with the host or other application servers (such as the server 110B shown in Figure 1).
  • Some functions of the processor 112 may be offloaded to the network card 114.
  • In that case, the processor 112 does not perform the processing of service data; instead, the network card 114 completes the processing of service data, address translation and other computing functions.
  • The network card 114 may also have a persistent memory medium, such as persistent memory (PM), non-volatile random access memory (NVRAM), or phase change memory (PCM).
  • The CPU is used to perform operations such as address translation and reading and writing logs.
  • The memory is used to temporarily store data to be written to the hard disk 105, or data read from the hard disk 105 that is to be sent to the controller.
  • The network card 114 can also be a programmable electronic component, such as a data processing unit (DPU).
  • The DPU has the generality and programmability of a CPU, but is more specialized and can run efficiently on network packets, storage requests, or analysis requests.
  • DPUs are distinguished from CPUs by their greater degree of parallelism (the need to handle large numbers of requests).
  • The DPU here can also be replaced with processing chips such as a GPU or an NPU.
  • The network card 114 can access any hard disk 105 in the server 110A where the network card 114 is located. Therefore, it is more convenient to add hard disks when the storage space is insufficient.
  • FIG. 1 is only an example provided by the embodiment of this application.
  • The storage system 110 may also include more servers, memories, hard disks and other devices. This application does not limit the number and specific forms of servers, memories and hard disks.
  • The storage system provided by the embodiments of this application can also be a storage cluster with separate computing and storage.
  • The storage cluster includes a computing device cluster and a storage device cluster.
  • The computing device cluster includes one or more computing devices, and the computing devices can communicate with each other.
  • A computing device may be, for example, a server, a desktop computer, or a controller of a storage array.
  • Computing devices can include processors, memory, network cards, etc.
  • The processor is a CPU used to process data access requests from outside the computing device, or requests generated within the computing device. For example, when the processor receives write requests sent by the user, the data carried in these write requests is temporarily stored in the memory.
  • The processor then sends the data stored in the memory to the storage device for persistent storage.
  • The processor is also used for data calculation or processing, such as metadata management, data deduplication, data compression, storage space virtualization, and address translation.
  • The storage system provided by the embodiments of this application may also be a centralized storage system.
  • The centralized storage system is characterized by a unified entrance through which all data from external devices must pass.
  • This entrance is the engine of the centralized storage system. The engine is the core component of the centralized storage system, and many advanced functions of the storage system are implemented in it.
  • The engine also includes a front-end interface and a back-end interface. The front-end interface is used to communicate with the computing devices in the centralized storage system to provide storage services for them.
  • The back-end interface is used to communicate with hard disks to expand the capacity of the centralized storage system. Through the back-end interface, the engine can connect more hard disks, forming a very large storage resource pool (referred to as a memory pool).
  • Figure 2 is a schematic diagram of a storage mapping provided by this application.
  • The storage system 110 may also include a server 110C.
  • For the hardware implementation of the server 110C, refer to the description of Figure 1; details are not repeated here.
  • DPU 1 is inserted into the motherboard of host 1.
  • DPU 2 is inserted into the motherboard of host 2.
  • DPU 3 is inserted into the motherboard of host 3.
  • A host with a DPU card inserted on its motherboard is called a data access device of the memory pool.
  • Host 1 includes a processor 11 and a memory 12.
  • The specific connection medium between the processor 11 and the memory 12 is not limited in the embodiments of this application. In Figure 2, as an example, the processor 11 and the memory 12 are connected through a bus, which can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one line is drawn in Figure 2, but this does not mean that there is only one bus or one type of bus.
  • The host 1 may also include a communication interface for communicating with other devices through a transmission medium, so that the components in the host 1 can communicate with other devices.
  • The memory 12 is used to store program instructions and/or data, and the processor 11 is coupled to the memory 12.
  • The coupling in the embodiments of this application is an indirect coupling or communication connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information interaction between devices, units or modules.
  • The processor 11 may cooperate with the memory 12.
  • The processor 11 may execute program instructions stored in the memory 12.
  • The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which may implement or execute each method, step and logical block diagram disclosed in the embodiments of this application.
  • A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in conjunction with the embodiments of this application can be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
  • the memory may be a non-volatile memory, such as a hard disk drive (HDD) or an SSD, or a volatile memory (volatile memory), such as a RAM.
  • Memory is, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory in the embodiment of the present application can also be a circuit or any other device capable of realizing a storage function, used to store program instructions and/or data.
  • servers 110A to 110C together provide a memory pool for storing objects/files.
  • the memory pool is used to save objects or files in the storage system 110.
  • the object or file refers to a group of correlated data, such as file 1 shown in Figure 2.
  • the storage space occupied by file 1 is allocated from the memory pool.
  • the memory such as DRAM, PMEM
  • the storage system 110 provides a distributed physical address (Distributed Physical Address, DPA) space (DPA Space) to the outside world.
  • The DPA is mapped to the Distributed Virtual Address (DVA) through the Distributed Page Table (DPT).
  • User files or objects are constructed based on DVA.
  • Applications in the host can map files/objects to the address space of the local process through distributed memory map (distributed mmap), such as file cache 1, whose storage space is provided by memory 12, and access them through load/store.
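The address translation described above can be illustrated with a minimal sketch. The page size, the table contents, and the names `DPT` and `translate` are assumptions made for illustration; the source does not describe the layout of the distributed page table.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size; the source does not specify one

# Hypothetical distributed page table (DPT):
# DVA page number -> (server, DPA page number).
DPT: Dict[int, Tuple[str, int]] = {
    0: ("server 110A", 7),
    1: ("server 110B", 2),
    2: ("server 110C", 5),
}


def translate(dva: int) -> Tuple[str, int]:
    """Resolve a distributed virtual address to (server, distributed physical address)."""
    page, offset = divmod(dva, PAGE_SIZE)
    if page not in DPT:
        raise KeyError(f"DVA page {page} is not mapped")
    server, dpa_page = DPT[page]
    return server, dpa_page * PAGE_SIZE + offset


# A load/store on a mapped DVA would resolve to a location in the memory pool.
server, dpa = translate(1 * PAGE_SIZE + 0x10)
print(server, hex(dpa))
```

In this sketch, a host that holds the translated address could read the data directly from the indicated server, which is the property the file metadata is said to provide.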
  • the storage resource used by the file is a page resource, such as the data page (page) shown in Figure 2.
  • the file cache 1 can also be called a page cache (page cache).
  • files require different storage resources. Multiple files can be distinguished based on file type.
  • File cache 1 includes the metadata of file 1 and some data in file 1. This metadata is used to indicate the address, in the storage server, of the data included in file 1. This address can be the aforementioned DVA or DPA, etc. If the data access device or host has this address, it can read the data directly from the storage server based on this address.
  • File cache 2 includes the metadata of file 1 and part of the data in file 1. This part of the data and the data in file 1 may be the same or different.
  • host 3 maps file 1 and determines file cache 3.
  • File cache 3 includes metadata of file 1 and part of the data in file 1. This part of the data may be the same as or different from the data in file 1.
  • the memory pool shown in Figure 2 is virtualized by one or more memories or hard disks in the storage system 110, but in some possible examples, the memory pool can also be implemented by other storage media in the storage system 110, which is not limited in this application.
  • the host can realize the mapping of the above files through an application program, such as mapping management software.
  • the following description takes storage mapping (memory map, mmap) management software as an example of the mapping management software, as shown in Figure 3.
  • Figure 3 is schematic diagram 2 of a storage mapping provided by this application.
  • the data access system shown in Figure 3 includes a data access device 31 and a storage server 32.
  • the data access device 31 can implement the functions of the host 1 and the computing device 131 shown in Figure 1, or the data access device 31 can implement the functions of the host 1 and the DPU 1 shown in Figure 2.
  • the storage server 32 may refer to any server or a combination of multiple servers in the storage system 110, which will not be described again here.
  • File 1 in storage server 32 includes data stored in multiple data pages, such as data pages P1 to P6 shown in FIG. 3 .
  • the data access device 31 includes host 1 and DPU 1 as an example.
  • the data access device 31 maps a global file or object (such as file 1) to the address space of host 1, and, through the page fault process, loads the data mapped to the address space of host 1 into the local page cache (such as file cache 1).
  • the storage server 32 can add the host 1 to the host list.
  • the host list is used to indicate multiple hosts to which the file 1 is mapped. By querying the metadata of file 1, these hosts can read the data of file 1 from the storage server 32.
  • There is no need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before reading the data from the storage server to the host's memory. The length of the IO path for the data access device to read data from the storage server is reduced, and the data access efficiency between the data access device and the storage server is increased.
  • the storage server 32 provides the storage space of file 1 in the form of a virtual address (Virtual Address, VA)
  • the mapping between the VA and the physical address can be determined in the form of a page table.
  • the page table stores the mapping relationship between the VA provided to file 1 in the storage server 32 and the physical address provided to the file cache 1 by the data access device 31.
  • the VAs of P1 to P6 in file 1 of the storage server 32 are Virt1 to Virt6, respectively.
  • the physical addresses of P1 to P6 are: Phys1 to Phys6 respectively.
  • the mapping management software can update the address mapping relationship maintained by the page table, so that the data access device 31 can, according to the updated page table, synchronize the content modified in file cache 1 to the storage server 32 through the address mapping relationship indicated by the page table, thereby achieving cache consistency between the storage server 32 and the data access device 31.
  • the data access device After mmap, the data access device reads and writes the data in file cache 1 through load/store. Among them, the load operation is used to read the data in the file cache 1, and the store operation is used to write the data to the file cache 1. In the embodiment of this application, the store operation will trigger write protection. For example, the mapping management software saves the old data in the file cache 1 to the copy on write log (copy on write Log, Cow Log). If the data access device 31 will write If the data synchronization process of the file cache 1 to the storage server 32 fails, the data access device 31 can roll back the failed data based on the content stored in the copy log to avoid the data being lost after the data synchronization fails and improve the data quality. safety.
  • Figure 4 is a flow diagram 1 of the data access method provided by this application.
  • This data access method can be applied to the data access system shown in Figure 3, where the data access device 31 includes a processor 311, a memory 312 and DPU 313.
  • the hardware implementation of the processor 311, the memory 312 and the DPU 313, please refer to the content of the host 1 and the DPU 1 in Figure 2, which will not be described again here.
  • the data access method includes the following steps S410 to S440.
  • the data access device 31 can write data into the storage space in the memory 312 and synchronize the data to the memory pool provided by storage server 32 according to the address mapping relationship.
  • For the content of the address mapping relationship, refer to the above explanation of Figure 3, which will not be described again here.
  • If the file cache 1 and the first address in the storage server 32 have established an address mapping relationship, the processor 311 writes the data (referred to as new data in the following embodiments) into the storage space in file cache 1 that corresponds to the first address.
  • the processor 311 establishes an address mapping relationship between the storage space of the first address and the storage space in the file cache 1, and reads the old data stored in the storage space of the first address in the storage server 32 through the address mapping relationship; the processor 311 then writes the new data into the storage space corresponding to the first address in the file cache 1.
  • the processor 311 may write data into the memory 312 by overwriting or appending.
  • the storage space corresponding to the first address may refer to one or more data pages, such as P1 to P6 as shown in FIG. 4 .
  • the storage space corresponding to the first address refers to P1 as an example.
  • If the processor 311 uses an overwrite method to write new data to P1 in the file cache 1, the new data will overwrite the old data.
  • the old data stored in P1 in the file cache 1 becomes invalid.
  • the processor 311 writes the new data into P1 in the file cache 1 using append writing.
  • P1 provides 4KB of storage space, old data occupies 2KB, and new data occupies the remaining 2KB.
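The difference between the two write modes can be shown with a small sketch based on the 4KB example above. The helper names `overwrite` and `append` and the use of a `bytearray` as a stand-in for data page P1 are assumptions made for illustration.

```python
PAGE_SIZE = 4 * 1024  # P1 provides 4KB of storage space in the example


def overwrite(page: bytearray, new: bytes) -> int:
    """Write new data from the start of the page; returns the number of valid bytes."""
    if len(new) > PAGE_SIZE:
        raise ValueError("data does not fit in the page")
    page[:len(new)] = new
    return len(new)


def append(page: bytearray, used: int, new: bytes) -> int:
    """Write new data after the bytes already in use; returns the new used length."""
    if used + len(new) > PAGE_SIZE:
        raise ValueError("not enough free space in the page")
    page[used:used + len(new)] = new
    return used + len(new)


old, new = b"o" * 2048, b"n" * 2048

page_a = bytearray(PAGE_SIZE)
used_a = overwrite(page_a, old)
used_a = overwrite(page_a, new)       # overwrite: the old data is replaced

page_b = bytearray(PAGE_SIZE)
used_b = overwrite(page_b, old)
used_b = append(page_b, used_b, new)  # append: old data kept, new data follows it

print(used_a, used_b)
```

With overwriting, only the 2KB of new data remains valid in the page; with append writing, the page holds the 2KB of old data followed by the 2KB of new data.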
  • the data synchronization request is used to instruct data to be stored in the storage server 32 .
  • this data synchronization request is implemented through the "Memory sync" command.
  • the DPU 313 sends a locking request and a data writing command to the storage server 32 based on the data synchronization request.
  • the data write command is used to instruct the storage server 32 to write data into the storage space corresponding to the first address.
  • the locking request is used to instruct the storage server 32 to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices during the process of executing the data write command.
  • the storage space corresponding to the first address is set to be accessible only by the data access device 31, and the access process includes writing data, reading data, and so on.
  • the aforementioned other data access devices refer to other hosts other than host 1 in the data access device 31 recorded in the host list of the storage server 32 .
  • the storage space in the storage server to which the data is written (such as the storage space corresponding to the aforementioned first address) cannot be accessed by other data access devices. That is to say, for a section of storage space of a storage server, multiple data access devices that write data to this section of storage space are mutually exclusive (referred to as: multi-write mutual exclusion), that is, only one data access device can access this storage space at a time. This avoids the problem that multiple data access devices write data to the same section of storage space in the storage server, so that each data access device reads different data from this section of storage space and the data is inconsistent across the data access devices.
  • the processor 311 writes the new data stored in the memory 312 into the storage space corresponding to the first address in the storage server 32.
  • the data access device can bypass the controller (or processing unit) included in the storage server, that is, the data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing data from the memory to the storage server. This reduces the length of the IO path for writing data to the storage server and increases the data access efficiency between the data access device and the storage server.
  • the processor 311 can also send a read request to the DPU 313.
  • the DPU 313 reads the data stored in the first address from the storage server 32 based on the first address carried by the read request.
  • For the data written to the storage server by the data access device, other data access devices can read the newly written data (new data), and the data access device that wrote the data can also read the new data, so that the new data is consistent among multiple data access devices, improving the data access performance of the data access devices to the storage server.
  • Figure 5 is a flow diagram 2 of the data access method provided by this application.
  • the data access system applied by the data access method also includes: data access device 33.
  • the DPU 333 included in the data access device 33 can be implemented by DPU 3 shown in Figure 2, and the data access device 33 may also include the host 3 shown in Figure 2.
  • the data access method provided in this embodiment includes the following four stages.
  • Stage 1: The DPU 313 sends a locking request to the storage server 32.
  • the locking request is used to instruct the storage server 32 to lock the storage space to be written with new data.
  • the mapping management software in the data access device 31 can obtain the data pages (dirty pages) in the file cache 1 with modified data, thereby determining the dirty page list (dirty page list) in the file cache 1.
  • the aforementioned locking request may carry the dirty page list, so that the storage server 32 determines the storage space to be locked based on the dirty page list.
  • the new data to be written refers to the data stored in P1, P3 and P5 in the file cache 1.
  • P1, P3 and P5 in the file cache 1 can be called dirty pages in the data access device 31. The storage space in the storage server 32 to be written with new data then refers to the storage space corresponding to P1, P3 and P5 in the memory pool (shown in gray in Figure 5).
  • the storage server 32 maintains the locking or unlocking status of multiple data access devices through a queue.
  • the lock request queue maintains the lock status of one or more data pages in the memory pool.
  • the aforementioned P1, P3 and P5 can only be accessed by the data access device 31, and P2 can only be accessed by the data access device 33.
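The lock state maintained by the storage server can be sketched as a table of page owners. The source states that the lock status is maintained through a queue; the dictionary below is a simplified substitute, and the class `LockTable`, the all-or-nothing granting policy, and the device names are assumptions made for illustration.

```python
class LockTable:
    """Toy per-page lock table kept by the storage server (names are illustrative)."""

    def __init__(self):
        self.owner = {}  # page id -> name of the data access device holding the lock

    def lock(self, device, pages):
        """Grant the lock only if no requested page is held by another device."""
        if any(self.owner.get(p, device) != device for p in pages):
            return False
        for p in pages:
            self.owner[p] = device
        return True

    def unlock(self, device, pages):
        for p in pages:
            if self.owner.get(p) == device:
                del self.owner[p]

    def can_access(self, device, page):
        # An unlocked page is accessible to everyone; a locked page only to its owner.
        return self.owner.get(page, device) == device


locks = LockTable()
dirty_page_list = ["P1", "P3", "P5"]            # carried by the locking request
granted_31 = locks.lock("device 31", dirty_page_list)
granted_33 = locks.lock("device 33", ["P2"])
conflict = locks.lock("device 33", ["P1", "P2"])  # P1 is already held by device 31
print(granted_31, granted_33, conflict)
```

In this state P1, P3 and P5 are accessible only to device 31 and P2 only to device 33, matching the example above; a conflicting request is refused as a whole rather than partially granted.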
  • Stage 2: The data access device 31 writes the data stored in the dirty page into the storage space corresponding to the address of the dirty page in the memory pool (such as the storage space at the first address mentioned above).
  • the DPU 313 writes the data stored in the dirty pages back to the memory pool by unilaterally writing data, based on the addresses of the dirty pages in the memory pool recorded in the file cache 1.
  • Stage 3: DPU 313 reads the host list of multiple hosts mapped to file 1 from the storage server 32 by unilaterally reading data, and sends an invalidation indication message to the other hosts or data access devices in the host list, to achieve cache coherence among multiple data access devices.
  • the invalidation indication message is used to instruct other data access devices to invalidate old data stored in the first address.
  • the file cache 3 of the data access device 33 maps the storage space and data corresponding to P1, P2, P3, P4 and P6 in the memory pool. Since P1 and P3 have been written with data by the data access device 31, after the DPU 333 receives the invalidation indication message sent by the DPU 313, the DPU 333 can invalidate the data in file cache 3 whose addresses are the same as those of the dirty pages in file cache 1, such as the old data stored in P1 and P3 in file cache 3.
  • the DPU 333 can query the page table maintained by the data access device 33 according to the invalidation indication message sent by the DPU 313, thereby determining the physical addresses of the data pages to be invalidated in the file cache 3. If the invalidation indication message carries the VAs of P1, P3 and P5 (Virt1, Virt3 and Virt5), the DPU 333, after querying the page table, determines that the physical addresses to be invalidated in file cache 3 include Phys1 corresponding to P1 and Phys3 corresponding to P3. The DPU 333 then modifies the status of Virt1-Phys1 and Virt3-Phys3 in the page table to an invalid state.
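The page table lookup in this example can be worked through with a short sketch. The dictionary representation of the page table, the valid flag, and the function name `invalidate` are assumptions for illustration; only the page names and the Virt/Phys labels come from the example above.

```python
# Page table of data access device 33 for file 1: VA -> [physical address, valid flag].
# File cache 3 maps P1, P2, P3, P4 and P6, so Virt5 has no entry here.
page_table = {
    "Virt1": ["Phys1", True],
    "Virt2": ["Phys2", True],
    "Virt3": ["Phys3", True],
    "Virt4": ["Phys4", True],
    "Virt6": ["Phys6", True],
}


def invalidate(table, dirty_vas):
    """Mark locally cached entries for the given VAs invalid.

    Returns the physical addresses whose entries were invalidated.
    VAs that are not mapped locally are skipped.
    """
    invalidated = []
    for va in dirty_vas:
        entry = table.get(va)
        if entry is not None and entry[1]:
            entry[1] = False
            invalidated.append(entry[0])
    return invalidated


# The invalidation indication message carries the VAs of the dirty pages P1, P3 and P5.
result = invalidate(page_table, ["Virt1", "Virt3", "Virt5"])
print(result)  # ['Phys1', 'Phys3']
```

Virt5 is ignored because P5 is not mapped in file cache 3, so only the entries for P1 and P3 change state.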
  • the modified data will not be synchronized to the memory pool.
  • the other data access devices re-read the new data in the storage space at the first address from the storage server, avoiding the problem that the data read by multiple data access devices from the same storage space of the storage server is inconsistent, which leads to data access errors or reduced data access efficiency.
  • FIG. 6 is a schematic diagram of data invalidation provided by this application.
  • the data access device 33 includes the host 3 and DPU 333. For the hardware implementation of host 3, refer to the content of host 1 in Figure 2, which will not be described again here.
  • the DPU 333 includes a processor 333A and a memory 333B.
  • the processor 333A can be a CPU (DPU CPU as shown in Figure 6), and the memory 333B can be a DRAM.
  • Host 3 maintains multiple page tables.
  • One page table corresponds to one file in a storage server.
  • page table 1 corresponds to file 1.
  • DPU 333 locally maintains a table that records the starting positions of all page tables of this node. When DPU 333 receives an invalidation indication message sent by another DPU, it obtains the starting address of the page table corresponding to the file identifier (obj ID) based on the file identifier. For example, the physical address of the page table determined by the file identifier "1" is "0x34adf", and the data length of page table 1 corresponding to file 1 is 64B.
  • the DPU 333 reads the page table 1 into local memory (memory 333B) through Compute Express Link (CXL) CXL.cache, and the processor 333A modifies the corresponding page table entries in page table 1, such as Virt1-Phys1 and Virt3-Phys3 shown in Figure 6, setting the two page table entries to invalid. Since the modified page table entries are associated with P1 and P3 in file cache 3, once the page table entries become invalid, the data access device 33 must read the data corresponding to P1 and P3 in file 1 from the storage server when it needs to access that data, thus completing the cache invalidation.
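The lookup-and-modify sequence above can be sketched as follows, under several loudly stated assumptions: host memory is simulated with a `bytearray` instead of a CXL.cache access, each page table entry is assumed to be a 64-bit little-endian word whose bit 0 is the valid flag, and the names `page_table_index` and `invalidate_entries` are hypothetical. Only the file identifier, the address 0x34adf, and the 64B length come from the example above.

```python
import struct

ENTRY = struct.Struct("<Q")  # assumed entry format: one 64-bit word per data page
VALID_BIT = 1                # assumed: bit 0 marks the entry as valid

# Hypothetical per-node table kept by the DPU:
# file identifier (obj ID) -> (page table start address, length in bytes).
page_table_index = {1: (0x34ADF, 64)}


def invalidate_entries(host_memory: bytearray, obj_id: int, entry_numbers):
    """Copy a page table into DPU-local memory, clear valid bits, and write it back."""
    start, length = page_table_index[obj_id]
    local = bytearray(host_memory[start:start + length])  # stands in for the CXL.cache read
    for n in entry_numbers:
        offset = n * ENTRY.size
        if offset + ENTRY.size > length:
            continue  # entry number lies outside this page table
        (entry,) = ENTRY.unpack_from(local, offset)
        ENTRY.pack_into(local, offset, entry & ~VALID_BIT)
    host_memory[start:start + length] = local


# Simulated host memory holding eight valid entries of page table 1.
host_memory = bytearray(0x40000)
base = page_table_index[1][0]
for n in range(8):
    ENTRY.pack_into(host_memory, base + n * ENTRY.size, (n << 12) | VALID_BIT)

invalidate_entries(host_memory, 1, [0, 2])  # entries assumed to describe P1 and P3
valid_flags = [ENTRY.unpack_from(host_memory, base + n * ENTRY.size)[0] & VALID_BIT
               for n in range(8)]
print(valid_flags)
```

With a 64B page table and 8-byte entries, the whole table fits in one transfer, which is one plausible reason for modifying it in DPU-local memory and writing it back in a single step.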
  • the DPU 313 can also receive invalidation information from other data access devices, thereby invalidating the data stored in the memory 312 at the second address indicated by the invalidation information.
  • the second address is different from the aforementioned first address.
  • When the DPU 313 receives the invalidation information sent by other data access devices, it invalidates the old data stored at the second address (such as P2). This prevents the data access device from performing tasks with old data that is inconsistent with the new data stored in the storage space of the second address in the storage server, which would cause access errors due to inconsistent data cached by multiple data access devices for the same storage space. It also avoids the need for the storage server to synchronize the data at the second address across multiple data access devices, which would reduce the data access efficiency of the data access devices to the storage server.
  • the second address is the same as the aforementioned first address. It should be understood that, for a section of storage space in the storage server, multiple data access devices can modify the data in that section of storage space at different times, which avoids the problem that, during data access to the storage server, the data in that section of storage space can only be modified by a single data access device, and improves the performance of the data access services that the storage server can provide.
  • the data access method provided by this embodiment also includes the following stage 4.
  • Stage 4: After the new data is successfully written into the first address, the DPU 313 sends an unlocking request to the storage server 32.
  • the unlocking request is used to instruct the storage server 32 to set the first address of the data so that it can be accessed by other data access devices.
  • the storage server can unlock the access status of the data or files. This allows the updated data or files to be accessed or modified by other data access devices, avoiding the problem that the data or files in the storage server can only be used by a single data access device, which reduces the efficiency with which other data access devices access the data or files.
  • the data access device includes corresponding hardware structures and/or software modules that perform each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software driving the hardware depends on the specific application scenarios and design constraints of the technical solution.
  • the data access device may include: a communication module, a storage module and a lock module.
  • the storage module is used to write data into the memory; the communication module is used to send a data synchronization request to the DPU; the data synchronization request is used to instruct data to be stored in the storage server.
  • the lock module applied to the DPU sends a locking request to the storage server based on the data synchronization request; the communication module is also used to send data writing commands to the storage server.
  • the data write command is used to instruct the storage server to write data to the storage space corresponding to the first address
  • the lock request is used to instruct the storage server, during the process of executing the data write command, to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices.
  • the data access device in the embodiment of the present application can be implemented by a DPU.
  • the data access device according to the embodiment of the present application may correspondingly perform the method described in the embodiment of the present application, and the above and other operations and/or functions of the units and modules in the data access device respectively implement the corresponding processes of the methods in the aforementioned figures, which will not be repeated here for the sake of brevity.
  • a DPU includes control circuits and interface circuits.
  • the interface circuit is used to receive data from other devices other than the DPU and transmit it to the control circuit, or to send data from the control circuit to other devices other than the DPU.
  • the control circuit executes code instructions through logic circuits, and the interface circuit performs the functions of the DPU in the aforementioned data access method.
  • An embodiment of the present application also provides a network card, including: the DPU and communication interface described in the previous embodiment.
  • the communication interface is used to send data sent by the DPU, or the communication interface is used to receive data sent to the DPU by other devices. Therefore, the DPU implements the operating steps of the data access method provided by this application.
  • the method steps in this embodiment can be implemented by hardware or by a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC.
  • the ASIC can be located in a computing device.
  • the processor and the storage medium can also exist as discrete components in network equipment or terminal equipment.
  • This application also provides a chip system, which includes a processor and is used to implement the functions of the data processing unit in the above method.
  • the chip system further includes a memory for storing program instructions and/or data.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
  • the computer program or instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center via wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media, such as floppy disks, hard disks, and magnetic tapes; they may also be optical media, such as digital video discs (DVDs); they may also be semiconductor media, such as solid state drives (SSDs).

Abstract

Disclosed are a data access device, method and system, a data processing unit, and a network interface card, relating to the field of data storage. When a data access device writes data into a storage server, the storage space in the storage server into which the data is to be written cannot be accessed by other data access devices. That is, only one data access device can access the storage space at a time, so that the problem that the data read by each data access device from one storage space is inconsistent because a plurality of data access devices write data into one storage space in the storage server is avoided. Moreover, the data access device can write the data into the storage server from a memory without the need to wait for a controller in the storage server to interact with a disk corresponding to the storage space, so that the length of an I/O path for writing data into the storage server is reduced, thereby improving the efficiency of data access between the data access device and the storage server.

Description

Data access device, method and system, data processing unit, and network interface card
Technical field
This application relates to the field of data storage, and in particular to a data access device, method, system, data processing unit and network card.
Background
Shuffle describes the process of shuffling data and then aggregating it to different nodes. Take an application in which a distributed system runs storage-intensive tasks as an example: in a Map/Reduce application, shuffle is the bridge between the mapping nodes (Mappers) and the reducing nodes (Reducers); for example, shuffle transmits data between mapping nodes and reducing nodes based on remote procedure call (RPC) requests. However, data transmission based on RPC requests requires the cooperation of the local node (such as a reducing node) and the remote node (such as a mapping node), which consumes a large amount of network resources and memory. In addition, the local node and the remote node perform a large amount of disk IO, which affects data access efficiency.
Summary of the invention
This application provides a data access device, method, system, data processing unit and network card, which solve the problem of low data access efficiency of local nodes and remote nodes.
In a first aspect, a data access device is provided. The data access device includes a processor, a memory, and a data processing unit (DPU). For example, the DPU may be a pluggable accelerator card, such as a DPU card; the data access device may include a server (or host) and the DPU card, and the server may include the aforementioned processor and memory. In the data access device provided by this embodiment, the processor is used to write data into the memory and send a data synchronization request to the DPU; the data synchronization request is used to instruct that the data be stored in the storage server. The DPU is used to send a locking request and a data write command to the storage server based on the data synchronization request. The data write command is used to instruct the storage server to write the data to the storage space corresponding to the first address, and the locking request is used to instruct the storage server, during the execution of the data write command, to set the storage space corresponding to the first address so that it cannot be accessed by other data access devices.
In this embodiment, while the data access device writes data to the storage server, the storage space in the storage server to which the data is to be written (such as the storage space corresponding to the aforementioned first address) cannot be accessed by other data access devices. That is to say, for a section of storage space of the storage server, multiple data access devices that write data to this section of storage space are mutually exclusive (referred to as: multi-write mutual exclusion), that is, only one data access device can access the storage space at a time. This avoids the problem that multiple data access devices write data to the same section of storage space in the storage server, so that each data access device reads different data from this section of storage space and the data is inconsistent across the data access devices. Moreover, since this section of storage space cannot be accessed by other data access devices, when the data access device provided in this embodiment writes data to the storage server, it can bypass the controller (or processor) included in the storage server. That is, the data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing data from the memory to the storage server, which reduces the length of the IO path for writing data to the storage server and increases the data access efficiency between the data access device and the storage server.
In an optional implementation, the DPU is further configured to send an unlock request to the storage server after the data is successfully written to the first address. For example, the unlock request instructs the storage server to set the first address of the data as accessible to other data access devices. In this way, after data or a file stored in the storage server has been updated by the data access device (for example, new data written, data deleted, or data modified), the storage server can unlock the access state of that data or file, so that the updated data or file can be accessed or modified by other data access devices. This avoids the problem in which data or a file in the storage server can be used by only a single data access device, which would reduce the data access efficiency of other data access devices for that data or file.
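The lock, write, and unlock sequence described above can be illustrated with a minimal sketch. The class and method names below (`StorageServer`, `Dpu`, `handle_sync_request`) are hypothetical and do not come from this application; the sketch models the storage server as an in-memory dictionary and ignores networking, persistence, and retries.

```python
class StorageServer:
    """Toy storage server (hypothetical): data per address plus a lock table."""

    def __init__(self):
        self.space = {}   # address -> stored data
        self.locks = {}   # address -> id of the device holding the lock

    def lock(self, address, device_id):
        # Grant the lock only if no other device already holds it.
        owner = self.locks.setdefault(address, device_id)
        return owner == device_id

    def write(self, address, data, device_id):
        # Reject writes from devices that do not hold the lock.
        if self.locks.get(address) != device_id:
            raise PermissionError(f"address {address:#x} is not locked by {device_id}")
        self.space[address] = data

    def unlock(self, address, device_id):
        # Only the lock holder can release the lock.
        if self.locks.get(address) == device_id:
            del self.locks[address]


class Dpu:
    """Toy DPU (hypothetical): turns a data synchronization request
    into a lock request, a data write command, and an unlock request."""

    def __init__(self, device_id, server):
        self.device_id = device_id
        self.server = server

    def handle_sync_request(self, address, data):
        if not self.server.lock(address, self.device_id):
            return False  # another device is writing; the caller may retry
        try:
            self.server.write(address, data, self.device_id)
        finally:
            # Assumption: the lock is also released if the write fails.
            self.server.unlock(address, self.device_id)
        return True
```

In this sketch a second device that issues a synchronization request for an address that is currently locked simply gets `False` back; how a real device waits or retries is not specified here.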
In another optional implementation, the DPU is further configured to send an invalidation indication message to other data access devices. The invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address. If the data access device did not send the invalidation indication message, another data access device that has already stored or mapped the old data held in the storage space corresponding to the first address would perform tasks based on that old data, causing errors in task execution. In contrast, in this embodiment, after the data access device writes data into the storage space corresponding to the first address in the storage server, it sends an invalidation indication message to the other data access devices, and each of them invalidates its stored old data of the first address. When another data access device needs the new data written at the first address, it reads the new data from the storage space of the first address in the storage server again. This avoids the problem in which multiple data access devices read inconsistent data from the same storage space of the storage server, which would lead to data access errors or reduced data access efficiency.
In another optional implementation, the DPU is further configured to invalidate, based on invalidation information sent by another data access device, the data stored in the memory at a second address indicated by the invalidation information.
In one case, the second address is different from the aforementioned first address. For example, after receiving invalidation information sent by another data access device, the DPU invalidates the old data stored at the second address, which prevents the data access device from performing tasks with old data that is inconsistent with the new data stored in the storage space of the second address in the storage server. This avoids access errors caused by inconsistent copies of the same storage space cached by multiple data access devices. It also avoids the loss of data access efficiency that would result if, after the data access device interacts with the storage server, the storage server had to synchronize the data of the second address stored in the multiple data access devices.
In another case, the second address is the same as the aforementioned first address. It should be understood that, for a given section of storage space in the storage server, multiple data access devices can modify the data in that section at different times. This avoids the problem in which the data in a section of storage space can be modified by only a single data access device during data access to the storage server, and improves the performance of the data access service that the storage server can provide.
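The invalidation behavior on both the sending side and the receiving side can be sketched as follows. This is an illustration under stated assumptions, not the implementation of this application: `DataAccessDevice`, `write_and_notify`, and `on_invalidate` are hypothetical names, the storage server is modeled as a plain dictionary, and locking is omitted for brevity.

```python
class DataAccessDevice:
    """Toy data access device (hypothetical) with a local cache of server data."""

    def __init__(self, name):
        self.name = name
        self.cache = {}   # address -> locally cached copy of the data
        self.peers = []   # other data access devices sharing the storage server

    def write_and_notify(self, storage, address, data):
        # Write the new data to the storage server, then tell every peer
        # that its cached copy of this address is stale.
        storage[address] = data
        self.cache[address] = data
        for peer in self.peers:
            peer.on_invalidate(address)

    def on_invalidate(self, address):
        # Receiving side: drop the old data; nothing happens if none is cached.
        self.cache.pop(address, None)

    def read(self, storage, address):
        # After invalidation, the next read fetches the new data from the server.
        if address not in self.cache:
            self.cache[address] = storage[address]
        return self.cache[address]
```

With two devices that are peers of each other, a write by one device removes the stale entry from the other device's cache, so the other device's next read returns the newly written data.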
As an optional implementation, the processor is further configured to send a read request to the DPU when the data misses in the memory. The DPU is further configured to read, based on the first address carried in the read request, the data stored at the first address from the storage server. It should be understood that the data written to the storage server by a data access device (the new data) can be read by other data access devices and also by the data access device that wrote it, so that the new data is consistent across multiple data access devices, which improves the data access performance of the data access devices with respect to the storage server.
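The read path on a memory miss can be sketched as below. The function names and the dictionary-based read request are hypothetical, and filling the local memory with the fetched data is an assumption made for the illustration rather than something stated in the text.

```python
def read_via_dpu(read_request, storage_server):
    """Hypothetical DPU step: read the data stored at the address
    carried in the read request from the storage server."""
    return storage_server[read_request["address"]]


def processor_read(local_memory, address, storage_server):
    """Hypothetical processor step: serve from memory on a hit,
    otherwise send a read request to the DPU."""
    if address in local_memory:
        return local_memory[address], "hit"
    data = read_via_dpu({"address": address}, storage_server)
    local_memory[address] = data  # assumption: keep a local copy after the miss
    return data, "miss"
```

Calling `processor_read` twice for the same address goes through the DPU the first time and is served from local memory the second time.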
In a second aspect, a data access method is provided. The data access method is performed by a data access system that includes a data access device and a storage server, and the data access device includes a processor, a memory, and a DPU. For example, the data access method provided by this embodiment includes: the processor writes data into the memory and sends a data synchronization request to the DPU, where the data synchronization request instructs that the data be stored to the storage server; and the DPU sends, based on the data synchronization request, a lock request and a data write command to the storage server. The data write command instructs the storage server to write the data into the storage space corresponding to a first address, and the lock request instructs the storage server to set the storage space corresponding to the first address as inaccessible to other data access devices while the data write command is being executed.
In an optional implementation, the data access method provided by this embodiment further includes: after the data is successfully written to the first address, the DPU sends an unlock request to the storage server. The unlock request instructs the storage server to set the first address of the data as accessible to other data access devices.
In an optional implementation, the data access method provided by this embodiment further includes: the DPU sends an invalidation indication message to other data access devices. The invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address.
In an optional implementation, the data access method provided by this embodiment further includes: the DPU invalidates, based on invalidation information sent by another data access device, the data stored in the memory at a second address indicated by the invalidation information.
In an optional implementation, the data access method provided by this embodiment further includes: when the data misses in the memory, the processor sends a read request to the DPU; and the DPU reads, based on the first address carried in the read request, the data stored at the first address from the storage server.
In a third aspect, a data access system is provided, including a storage server and the data access device of any implementation of the first aspect. The storage server is configured to store the data to be synchronized by the data access device, and to set the storage space corresponding to the first address to which the data is to be written as inaccessible to other data access devices, among other functions.
In a fourth aspect, a DPU is provided, including a control circuit and an interface circuit. The interface circuit is configured to receive data from devices other than the DPU and transmit the data to the control circuit, or to send data from the control circuit to devices other than the DPU. The control circuit, by means of logic circuits or by executing code instructions, together with the interface circuit performs the functions of the DPU in any possible implementation of the second aspect.
In a fifth aspect, a network interface card is provided, including the DPU provided in the fourth aspect and a communication interface. For example, the communication interface is used to send data issued by the DPU, or to receive data sent to the DPU by other devices.
In a sixth aspect, a computer-readable storage medium is provided. The storage medium stores a computer program or instructions that, when executed by a data access device or a DPU, perform the operation steps of the method of any implementation of the second aspect.
In a seventh aspect, a computer program product is provided. When run on a computer, the computer program product causes the computer to perform the operation steps of the method of any implementation of the second aspect. For example, the computer may be a data access device, a host, a DPU, or a DPU card.
For the beneficial effects of the second to seventh aspects, refer to the description of any implementation of the first aspect; they are not repeated here. On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.
Description of drawings
Figure 1 is a schematic structural diagram of a data access system provided by this application;
Figure 2 is a first schematic diagram of a storage mapping provided by this application;
Figure 3 is a second schematic diagram of a storage mapping provided by this application;
Figure 4 is a first schematic flowchart of the data access method provided by this application;
Figure 5 is a second schematic flowchart of the data access method provided by this application;
Figure 6 is a schematic diagram of data invalidation provided by this application.
Detailed description
This application provides a data access method. While a data access device writes data to a storage server, the storage space in the storage server to which the data is to be written cannot be accessed by other data access devices. In other words, for a given section of storage space in the storage server, the data access devices that write data into that section are mutually exclusive (multi-write mutual exclusion for short): only one data access device can access the storage space at a time. This avoids the problem in which multiple data access devices write data into the same section of storage space, each device then reads different data from that section, and the data becomes inconsistent across the devices. Moreover, because the section of storage space cannot be accessed by other data access devices, the data access device provided by this embodiment can bypass the controller (or processor) included in the storage server when writing data to the storage server. That is, the data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing the data from the memory to the storage server. This shortens the I/O path for writing data to the storage server and improves the data access efficiency between the data access device and the storage server.
The data access device and the corresponding data access method provided by this application are described below with reference to the accompanying drawings. An introduction to the related technology is given first.
Figure 1 is a schematic structural diagram of a data access system provided by this application. The data access system 100 includes a storage system 110 and multiple data access devices that access the storage system 110, such as data access device 1 and data access device 2 shown in Figure 1. A computing device may also be connected to one or more servers in the storage system 110. The computing device may be used to provide the server with more computing resources, or the computing functions of the server may be offloaded to an external acceleration apparatus, so as to improve the data access performance of the storage system 110.
A data access device can access the servers in the storage system 110 over a network to read and write data; the communication function of the network may be implemented by a switch or a router. In a possible example, the data access device may also communicate with the server through a wired connection, for example a Peripheral Component Interconnect Express (PCIe) high-speed bus, Compute Express Link (CXL), the Universal Serial Bus (USB) protocol, or a bus of another protocol.
A data access device includes a host and a computing apparatus. For example, data access device 1 includes host 1 and computing apparatus 131, and data access device 2 includes host 2 and computing apparatus 132. In Figure 1 the computing apparatus is represented as a DPU card, but this should not be understood as limiting this application. The computing apparatus may include one or more processing units. A processing unit may be a DPU, or it may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor. The computing apparatus may also be a dedicated processor for artificial intelligence (AI), such as a neural processing unit (NPU) or a graphics processing unit (GPU). In physical form, the one or more processing units included in the computing apparatus may be packaged as a card, such as the DPU card in Figure 1. The DPU card may be attached to the host through a PCIe interface, a CXL interface, a unified bus (UB) interface, an NVLink interface, or another communication interface, and the host may offload some data processing functions to the DPU card.
For example, a host is a computer running an application program. If the computer running the application program is a physical computing device, the physical computing device may be a server or a terminal. A terminal may also be called terminal equipment, user equipment (UE), a mobile station (MS), a mobile terminal (MT), and so on. The terminal may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, a personal communication service (PCS) phone, a wireless terminal in a smart city, a wireless terminal in a smart home, and so on. The embodiments of this application do not limit the specific technology or device form used by the host. In some optional implementations, the host shown in Figure 1 may also be a client.
The storage system provided by the embodiments of this application may be a distributed storage system or a centralized storage system.
In one possible case, the storage system 110 shown in Figure 1 may be a distributed storage system. As shown in Figure 1, the distributed storage system provided by this embodiment includes a storage cluster that integrates computing and storage. The storage cluster includes one or more servers (such as server 110A and server 110B shown in Figure 1), and the servers can communicate with one another.
In some optional implementations, the servers included in the storage system 110 are also called storage servers. Server 110A shown in Figure 1 is used as an example here. Server 110A is a device with both computing capability and storage capability, such as a server or a desktop computer. For example, an advanced reduced instruction set computer machines (ARM) server or an x86 server can serve as server 110A. In terms of hardware, as shown in Figure 1, server 110A includes at least a processor 112, a memory 113, a network card 114, and a hard disk 105, which are connected through a bus. The processor 112 and the memory 113 provide computing resources. Specifically, the processor 112 is a CPU that processes data access requests (such as write data requests or read data requests) from outside server 110A (from an application server or other servers) as well as requests generated inside server 110A. For example, when the processor 112 receives log write requests, it temporarily stores the data in these requests in the memory 113. When the total amount of data in the memory 113 reaches a certain threshold, the processor 112 sends the data stored in the memory 113 to the hard disk 105 for persistent storage. In addition, the processor 112 is used to compute on or process data. Only one processor 112 is shown in Figure 1; in practice there are often multiple processors 112, each of which has one or more CPU cores. This embodiment does not limit the number of CPUs or the number of CPU cores.
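The buffering behavior described for processor 112, memory 113, and hard disk 105 can be sketched as follows. `BufferedWriter` and `threshold_bytes` are hypothetical names, and the text does not say how the threshold is chosen or measured; the sketch assumes a simple byte count.

```python
class BufferedWriter:
    """Toy model (hypothetical): buffer writes in memory and flush them
    to disk once the buffered amount reaches a threshold."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.memory = []         # stands in for memory 113
        self.disk = []           # stands in for hard disk 105
        self.buffered_bytes = 0

    def write(self, data):
        # Temporarily keep the data in memory.
        self.memory.append(data)
        self.buffered_bytes += len(data)
        # Persist everything once the total reaches the threshold.
        if self.buffered_bytes >= self.threshold_bytes:
            self.flush()

    def flush(self):
        self.disk.extend(self.memory)
        self.memory.clear()
        self.buffered_bytes = 0
```

With a threshold of 8 bytes, a first 4-byte write stays in memory and a second 4-byte write triggers a flush of both entries to disk.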
The memory 113 is the internal memory that exchanges data directly with the processor. It can read and write data at any time and at high speed, and serves as temporary data storage for the operating system or other running programs. The memory includes at least two types of memory. For example, the memory may be random access memory, such as dynamic random access memory (DRAM) or storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device. SCM is a composite storage technology that combines characteristics of traditional storage devices and memory: it provides faster read and write speeds than a hard disk, but is slower than DRAM in access speed and cheaper than DRAM in cost. However, DRAM and SCM are only examples in this embodiment; the memory may also include other random access memory, such as static random access memory (SRAM).
In addition, the memory 113 may be a dual in-line memory module (DIMM), that is, a module composed of dynamic random access memory (DRAM), or a solid state disk (SSD). In practice, storage server 110A may be configured with multiple memories 113 and with memories 113 of different types. This embodiment does not limit the number or type of memories 113. Furthermore, the memory 113 may be configured to have a power-protection function, which means that the data stored in the memory 113 is not lost when the system loses power and is powered on again. Memory with a power-protection function is called non-volatile memory.
The hard disk 105 provides storage resources, for example to store data and information such as the data access state of each data access device (or host). Data may be stored in the hard disk 105 or the memory 113 in the form of objects or files. The hard disk may be a magnetic disk or another type of storage medium, such as a solid state disk or a shingled magnetic recording hard disk. For example, the hard disk 105 may be a solid state disk based on the Non-Volatile Memory Express (NVMe) host controller interface specification, such as an NVMe SSD.
The network card 114 in server 110A is used to communicate with hosts or other application servers (such as server 110B shown in Figure 1).
In one implementation, the functions of the processor 112 may be offloaded to the network card 114. In other words, in this implementation the processor 112 does not process service data; instead, the network card 114 performs service data processing, address translation, and other computing functions.
In some application scenarios, the network card 114 may also have a persistent memory medium, such as persistent memory (PM), non-volatile random access memory (NVRAM), or phase change memory (PCM). The CPU is used to perform operations such as address translation and reading and writing logs. The memory is used to temporarily store data to be written to the hard disk 105, or data read from the hard disk 105 that is to be sent to the controller. A programmable electronic component, such as a data processing unit (DPU), may also be used. A DPU has the generality and programmability of a CPU but is more specialized, and can run efficiently on network packets, storage requests, or analysis requests. A DPU is distinguished from a CPU by its greater degree of parallelism (it needs to handle a large number of requests). Optionally, the DPU here may be replaced with a processing chip such as a GPU or an NPU. There is no ownership relationship between the network card 114 and the hard disk 105; the network card 114 can access any hard disk 105 in the server 110B where the network card 114 is located, so it is convenient to add hard disks when storage space is insufficient.
Figure 1 is only an example provided by the embodiments of this application. The storage system 110 may include more servers, memories, hard disks, and other devices; this application does not limit the number or specific form of the servers, memories, and hard disks.
In another possible case, the storage system provided by the embodiments of this application may be a storage cluster in which computing and storage are separated. The storage cluster includes a computing device cluster and a storage device cluster. The computing device cluster includes one or more computing devices that can communicate with one another. A computing device may be, for example, a server, a desktop computer, or a controller of a storage array. In terms of hardware, a computing device may include a processor, a memory, a network card, and so on. The processor is a CPU that processes data access requests from outside the computing device or requests generated inside the computing device. For example, when the processor receives write requests sent by a user, it temporarily stores the data carried in these write requests in the memory. When the total amount of data in the memory reaches a certain threshold, the processor sends the data stored in the memory to a storage device for persistent storage. In addition, the processor is used to compute on or process data, for example metadata management, data deduplication, data compression, storage space virtualization, and address translation.
As an optional implementation, the storage system provided by the embodiments of this application may also be a centralized storage system. A centralized storage system is characterized by a single unified entry point through which all data from external devices must pass; this entry point is the engine of the centralized storage system. The engine is the core component of a centralized storage system, and many advanced functions of the storage system are implemented in it.
For example, the engine may contain one or more controllers. In a possible example, if the engine has multiple controllers, any two controllers may have a mirror channel between them so that they back each other up, which prevents a hardware failure from making the centralized storage system unavailable. The engine also includes a front-end interface and a back-end interface. The front-end interface is used to communicate with the computing devices in the centralized storage system and thereby provide storage services to them. The back-end interface is used to communicate with hard disks to expand the capacity of the centralized storage system. Through the back-end interface, the engine can connect more hard disks and thus form a very large storage resource pool (memory pool for short).
Based on the hosts shown in Figure 1 and the memory pool provided by the storage system 110, an implementation of storage mapping is described below. As shown in Figure 2, which is a first schematic diagram of a storage mapping provided by this application, the storage system 110 may further include a server 110C. For the hardware implementation of the servers, refer to the description of Figure 1; details are not repeated here.
A DPU card is inserted into the motherboard of each host: DPU 1 in host 1, DPU 2 in host 2, and DPU 3 in host 3. In this embodiment, a host with a DPU card on its motherboard is referred to as a data access device of the memory pool.
The hardware implementation of the host is briefly described below, taking host 1 as an example. Host 1 includes a processor 11 and a memory 12. The embodiments of this application do not limit the specific connection medium between the processor 11 and the memory 12. In Figure 2, the processor 11 and the memory 12 are connected through a bus, which may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single line in Figure 2, but this does not mean that there is only one bus or only one type of bus. Host 1 may further include a communication interface for communicating with other devices through a transmission medium, so that the apparatus in host 1 can communicate with other devices.
The memory 12 is used to store program instructions and/or data, and the processor 11 is coupled to the memory 12. Coupling in the embodiments of this application refers to an indirect coupling or communication connection between apparatuses, units, or modules, which may be electrical, mechanical, or of another form, and is used for information exchange between the apparatuses, units, or modules. The processor 11 may operate in cooperation with the memory 12 and may execute the program instructions stored in the memory 12.
In the embodiments of this application, the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of this application, the memory may be a non-volatile memory, such as a hard disk drive (HDD) or an SSD, or a volatile memory, such as a RAM. The memory may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, and is used to store program instructions and/or data.
As shown in Figure 2, servers 110A to 110C together provide a memory pool for storing the objects or files (object/file) of the storage system 110. An object or file is a set of related data, such as file 1 shown in Figure 2. In other words, the storage space occupied by file 1 is allocated from the memory pool. For example, the memory (such as DRAM or PMEM) or hard disks of the servers in the storage system 110 provide a global address space, and the storage system 110 exposes a distributed physical address (DPA) space to the outside. A DPA is mapped to a distributed virtual address (DVA) through a distributed page table (DPT), and user files or objects are built on DVAs. An application on a host can use a distributed memory map (distributed mmap) to map a file or object into the address space of a local process, for example into file cache 1, whose storage space is provided by the memory 12, and then access it through load/store operations.
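The relationship between the DVA space, the DPT, and the DPA space described above can be illustrated with a minimal sketch. The 4 KB page size, the names `DPA` and `DistributedPageTable`, and the (server, offset) layout of a DPA are assumptions made for illustration only; the application does not specify them.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size; not stated in the application


@dataclass(frozen=True)
class DPA:
    """Hypothetical distributed physical address: a server plus a byte offset."""
    server: str   # e.g. "110A"
    offset: int   # byte offset inside that server's share of the memory pool


class DistributedPageTable:
    """Hypothetical DPT that maps page-aligned DVAs to page-aligned DPAs."""

    def __init__(self):
        self._entries = {}  # DVA page number -> DPA of the start of the page

    def map_page(self, dva: int, dpa: DPA) -> None:
        if dva % PAGE_SIZE or dpa.offset % PAGE_SIZE:
            raise ValueError("both addresses must be page aligned")
        self._entries[dva // PAGE_SIZE] = dpa

    def translate(self, dva: int) -> DPA:
        # Raises KeyError when the DVA page has not been mapped.
        base = self._entries[dva // PAGE_SIZE]
        return DPA(base.server, base.offset + dva % PAGE_SIZE)
```

With such a table, a file built on DVAs can be resolved page by page to the server that physically holds each page, which is what allows a host to address the data directly.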
In one possible case, the storage resources used by a file are page resources, such as the data pages shown in Figure 2, and file cache 1 may also be called a page cache. The storage resources required by files differ across service scenarios and workloads. Multiple files may be distinguished from one another by file type.
File cache 1 includes the metadata of file 1 and part of the data of file 1. The metadata indicates the address, in the storage server, of the data that file 1 contains; this address may be the aforementioned DVA or DPA. When a data access device or host holds this address, it can read the data directly from the storage server according to the address.
It should be understood that other data access devices and hosts may also map file 1 into their local caches. For example, host 2 maps file 1 and obtains file cache 2, which includes the metadata of file 1 and part of the data of file 1; this partial data may be the same as or different from the data in file 1. As another example, host 3 maps file 1 and obtains file cache 3, which includes the metadata of file 1 and part of the data of file 1; this partial data may likewise be the same as or different from the data in file 1.
It is worth noting that the memory pool shown in Figure 2 is implemented by virtualizing one or more memories or hard disks in the storage system 110. In some possible examples, however, the memory pool may also be implemented by other storage media in the storage system 110, which is not limited in this application.
The host may implement the above file mapping through an application such as mapping management software. The following description takes memory map (mmap) management software as an example. As shown in Figure 3, which is a second schematic diagram of storage mapping provided by this application, the data access system includes a data access device 31 and a storage server 32. The data access device 31 may implement the functions of host 1 and the computing device 131 shown in Figure 1, or the functions of host 1 and DPU 1 shown in Figure 2. The storage server 32 may be any one server, or a combination of multiple servers, in the storage system 110; details are not repeated here.
File 1 in the storage server 32 includes data stored in multiple data pages, such as data pages P1 to P6 shown in Figure 3.
The following description assumes that the data access device 31 includes host 1 and DPU 1. The data access device 31 maps a global file or object (such as file 1) into the address space of host 1 and, through page fault handling, loads the data mapped into that address space into the local page cache (such as file cache 1).
After file 1 has been mapped from the storage server 32 to the data access device 31, the storage server 32 may add host 1 to a host list, which indicates the hosts to which file 1 is mapped. These hosts can read the data of file 1 from the storage server 32 by looking up the metadata of file 1, without waiting for the controller in the storage server to interact with the disk corresponding to the storage space before the data is read into the host's memory. This shortens the IO path along which the data access device reads data from the storage server and improves the efficiency of data access between the data access device and the storage server.
In one possible example, if the storage server 32 provides the storage space of file 1 in the form of virtual addresses (VA), then when the data access device 31 maps the file into the address space of host 1, the VAs must be mapped to physical addresses of host 1. This mapping may be recorded in a page table that stores the mapping between the VAs that the storage server 32 provides for file 1 and the physical addresses that the data access device 31 provides for file cache 1. Taking Figure 3 as an example, the VAs of P1 to P6 in file 1 on the storage server 32 are Virt1 to Virt6, respectively. After the data access device 31 establishes the address mapping between file 1 in the storage server 32 and file cache 1 in the memory 312, the physical addresses of P1 to P6 are Phys1 to Phys6, respectively.
If an address fault occurs when the data access device 31 writes data to file cache 1, the mapping management software may update the address mapping maintained in the page table. After modifying data in file cache 1, the data access device 31 can then synchronize the modified content to the storage server 32 through the address mapping indicated by the updated page table, which keeps the caches of the storage server 32 and the data access device 31 consistent.
After mmap, the data access device reads and writes the data in file cache 1 through load/store operations: a load operation reads data from file cache 1, and a store operation writes data to file cache 1. In the embodiments of this application, a store operation triggers write protection; for example, the mapping management software saves the old data in file cache 1 to a copy-on-write log (CoW log). If synchronizing the data written to file cache 1 to the storage server 32 fails, the data access device 31 can roll back the data that failed to synchronize based on the content of the copy-on-write log. This prevents the data from being lost after a failed synchronization and improves data security.
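The store-time write protection and rollback described above can be sketched as follows. This is a minimal illustration, not the implementation of the mapping management software: the `FileCache` class, its method names, and the use of `OSError` to signal a failed synchronization are all assumptions.

```python
class FileCache:
    """Hypothetical file cache with a copy-on-write (CoW) log."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page id -> bytes currently cached
        self.cow_log = {}         # page id -> content before the first store

    def load(self, page_id):
        return self.pages[page_id]

    def store(self, page_id, data):
        # Write protection: keep the oldest pre-image of the page, once.
        self.cow_log.setdefault(page_id, self.pages.get(page_id))
        self.pages[page_id] = data

    def sync(self, write_back):
        """Push modified pages with write_back(page_id, data); roll back on failure."""
        try:
            for page_id in list(self.cow_log):
                write_back(page_id, self.pages[page_id])
        except OSError:
            # Synchronization failed: restore every logged pre-image.
            for page_id, old in self.cow_log.items():
                if old is None:
                    self.pages.pop(page_id, None)  # page did not exist before
                else:
                    self.pages[page_id] = old
            self.cow_log.clear()
            return False
        self.cow_log.clear()
        return True
```

Logging only the first pre-image per page keeps the log small and is enough to restore the state that preceded the failed synchronization.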
Possible implementations of the process in which the data access device 31 synchronizes data to the storage server 32, and of the process in which the data access device 31 reads data from the storage server 32, are given in Figure 4 and Figure 5 below.
As shown in Figure 4, which is a first schematic flowchart of the data access method provided by this application, the method can be applied to the data access system shown in Figure 3. The data access device 31 includes a processor 311, a memory 312, and a DPU 313. For their hardware implementation, refer to the description of host 1 and DPU 1 in Figure 2; details are not repeated here.
The data access method includes the following steps S410 to S440.
S410: The processor 311 writes data into the memory 312.
For example, when an address mapping has been established between the storage space in the memory 312 and the storage space provided by the memory pool, the data access device 31 can write data into the storage space in the memory 312 and synchronize the data to the memory pool provided by the storage server 32 according to the address mapping. For details of the address mapping, refer to the description of Figure 3 above; they are not repeated here.
In one possible case, an address mapping has already been established between file cache 1 and a first address in the storage server 32. The processor 311 then writes the data (called new data in the following embodiments) into the storage space in file cache 1 that corresponds to the first address.
In another possible case, no address mapping has been established between file cache 1 and the first address in the storage server 32. The processor 311 then establishes an address mapping between the storage space at the first address and storage space in file cache 1, reads the old data stored at the first address in the storage server 32 through this mapping, and writes the new data into the storage space in file cache 1 that corresponds to the first address. The processor 311 may write the data into the memory 312 by overwriting or by appending.
In one feasible example, the storage space corresponding to the first address may be one or more data pages, such as P1 to P6 shown in Figure 4. The following description assumes that the storage space corresponding to the first address is P1.
For example, if the processor 311 writes the new data to P1 in file cache 1 by overwriting, the new data overwrites the old data: once the processor 311 has written the new data to P1, the old data stored in P1 of file cache 1 becomes invalid.
As another example, if the old data does not fill the storage space provided by one page (P1) and the new data also does not fill the storage space of P1, the processor 311 writes the new data to P1 in file cache 1 by appending. For instance, P1 provides 4 KB of storage space, the old data occupies 2 KB, and the new data occupies the remaining 2 KB.
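The difference between the two write modes can be shown with a short sketch that treats the used portion of a page as a byte string. The function names and the fixed 4 KB page size (taken from the example above) are illustrative assumptions.

```python
PAGE_SIZE = 4096  # 4 KB page, as in the example above


def overwrite(old: bytes, new: bytes) -> bytes:
    """Overwrite mode: the new data replaces the old content of the page."""
    if len(new) > PAGE_SIZE:
        raise ValueError("new data does not fit in one page")
    return new


def append(old: bytes, new: bytes) -> bytes:
    """Append mode: the new data is placed after the old data in the same page."""
    if len(old) + len(new) > PAGE_SIZE:
        raise ValueError("old and new data together exceed one page")
    return old + new
```

With 2 KB of old data and 2 KB of new data, `append` yields a full 4 KB page, whereas `overwrite` leaves only the 2 KB of new data in the page.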
S420: After writing the data into the memory 312, the processor 311 sends a data synchronization request to the DPU 313.
The data synchronization request instructs that the data be stored to the storage server 32. For example, the data synchronization request is implemented through a "Memory sync" command.
Taking P1 in Figure 4 as an example, after writing the data to P1 in file cache 1, the processor 311 sends a "Memory sync" command to the DPU 313 so that the DPU 313 stores the data in the storage server 32.
S430: Based on the data synchronization request, the DPU 313 sends a locking request and a data write command to the storage server 32.
The data write command instructs the storage server 32 to write the data into the storage space corresponding to the first address.
The locking request instructs the storage server 32 to make the storage space corresponding to the first address inaccessible to other data access devices while the data write command is being executed. For example, the storage space corresponding to the first address is set to be accessible only by the data access device 31, where access includes writing data, reading data, and so on.
As shown in Figure 4, the other data access devices mentioned above are the hosts recorded in the host list of the storage server 32 other than host 1 of the data access device 31.
While a data access device is writing data to the storage server, the storage space to which the data is written (such as the storage space corresponding to the first address) cannot be accessed by other data access devices. In other words, for a given section of storage space in the storage server, the data access devices that write to it are mutually exclusive (multi-write mutual exclusion for short): only one data access device can access the storage space at a time. This prevents multiple data access devices from writing to the same section of storage space concurrently, which would cause each device to read different data from that section and leave the data inconsistent across the data access devices.
S440: The processor 311 writes the new data stored in the memory 312 into the storage space corresponding to the first address in the storage server 32.
Because P1 cannot be accessed by other data access devices, the data access device of this embodiment can bypass the controller (or processor) of the storage server when writing data to the storage server. That is, the data access device does not need to wait for the controller in the storage server to interact with the disk corresponding to the storage space before writing the data from its memory to the storage server. This shortens the IO path for writing data to the storage server and improves the efficiency of data access between the data access device and the storage server.
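Steps S410 to S440 and the multi-write mutual exclusion can be sketched together in a small in-memory model. The class and method names are hypothetical, the device names "dev31" and "dev33" merely stand for two data access devices, and releasing the lock right after the write is an assumption of this sketch (the application describes unlocking as a separate, later stage).

```python
class StorageServer:
    """Hypothetical in-memory stand-in for the memory pool and its locks."""

    def __init__(self):
        self.pool = {}   # first address (page id) -> data
        self.locks = {}  # address -> name of the device holding the lock

    def lock(self, addr, device):
        # Grant the lock if it is free or already held by the same device.
        return self.locks.setdefault(addr, device) == device

    def write(self, addr, data, device):
        if self.locks.get(addr) != device:
            raise PermissionError(f"{addr} is not locked by {device}")
        self.pool[addr] = data

    def unlock(self, addr, device):
        if self.locks.get(addr) == device:
            del self.locks[addr]


class DataAccessDevice:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.file_cache = {}  # local cache provided by the device's memory

    def store(self, addr, data):
        # S410: write the new data into the local memory.
        self.file_cache[addr] = data

    def memory_sync(self, addr):
        # S420/S430: the synchronization request leads to a locking request.
        if not self.server.lock(addr, self.name):
            return False  # another device currently holds the storage space
        try:
            # S440: write the new data to the storage space at addr.
            self.server.write(addr, self.file_cache[addr], self.name)
        finally:
            self.server.unlock(addr, self.name)
        return True
```

In this model, a call to `memory_sync` fails while another device holds the lock on the same page and succeeds once that lock has been released, which is the behavior that multi-write mutual exclusion requires.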
It is worth noting that if the data access device 31 misses the data in the memory 312, the processor 311 may also send a read request to the DPU 313. Based on the first address carried in the read request, the DPU 313 reads the data stored at the first address from the storage server 32.
Data that a data access device has written to the storage server (the new data) can be read by other data access devices as well as by the device that wrote it. The new data is therefore consistent across multiple data access devices, which improves the performance of data access to the storage server.
It should be understood that, over its long-term use, the storage server 32 may also be accessed by other data access devices. An implementation of locking and unlocking storage space in the storage server is provided below. As shown in Figure 5, which is a second schematic flowchart of the data access method provided by this application, the data access system to which the method applies further includes a data access device 33. The DPU 333 included in the data access device 33 may be implemented by DPU 3 shown in Figure 2, and the data access device 33 may further include host 3 shown in Figure 2.
Referring to Figure 5, the data access method provided in this embodiment includes the following four stages.
Stage ①: The DPU 313 sends a locking request to the storage server 32. The locking request instructs the storage server 32 to lock the storage space into which new data is to be written.
As a feasible example, the mapping management software in the data access device 31 may identify the data pages in file cache 1 whose data has been modified (dirty pages) and thereby determine the dirty page list of file cache 1. The locking request may carry the dirty page list so that the storage server 32 can determine the storage space to be locked. For example, if the new data to be written is the data stored in P1, P3, and P5 of file cache 1, these pages are the dirty pages of the data access device 31, and the storage space in the storage server 32 into which new data is to be written is the storage space corresponding to P1, P3, and P5 in the memory pool (shown in gray in Figure 5).
In the storage server 32 shown in Figure 5, the locked or unlocked state of multiple data access devices is maintained in a queue. For example, a lock request queue maintains the lock state of one or more data pages in the memory pool: during a first time period, P1, P3, and P5 can be accessed only by the data access device 31, whereas P2 can be accessed only by the data access device 33.
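One way to picture such a lock request queue is a per-page owner plus a first-in-first-out list of waiting devices. The sketch below is an assumption about how the queue could behave; the application does not describe its internal structure, and the class, method, and device names are hypothetical.

```python
from collections import deque


class LockRequestQueue:
    """Hypothetical per-page lock state with FIFO waiters."""

    def __init__(self):
        self.owner = {}    # page id -> device currently holding the lock
        self.waiting = {}  # page id -> deque of devices waiting for the lock

    def request(self, device, page):
        """Return True if the device holds the lock after this call."""
        if page not in self.owner:
            self.owner[page] = device
            return True
        if self.owner[page] == device:
            return True
        queue = self.waiting.setdefault(page, deque())
        if device not in queue:
            queue.append(device)  # wait behind earlier requesters
        return False

    def release(self, device, page):
        """Release the lock; return the next owner, or None if the page is free."""
        if self.owner.get(page) != device:
            return None
        queue = self.waiting.get(page)
        if queue:
            next_device = queue.popleft()
            self.owner[page] = next_device
            return next_device
        del self.owner[page]
        return None
```

In the situation described above, one device would own P1, P3, and P5 while the other owns P2; a request by the second device for P1 is queued and granted only after the first device releases that page.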
Stage ②: The data access device 31 writes the data stored in each dirty page into the storage space in the memory pool that corresponds to the address of that dirty page (such as the storage space at the first address).
For example, after the storage server 32 has successfully locked the storage space in the memory pool that corresponds to the addresses of the dirty pages, the DPU 313 uses one-sided writes to write the data stored in the dirty pages back to the memory pool, according to the memory pool addresses of the dirty pages recorded in file cache 1.
Stage ③: The DPU 313 uses a one-sided read to obtain, from the storage server 32, the host list of the hosts to which file 1 is mapped, and sends an invalidation indication message to the other hosts or data access devices in the host list, so as to achieve cache consistency across the data access devices.
For example, the invalidation indication message instructs the other data access devices to invalidate the old data stored at the first address.
As shown in Figure 5, file cache 3 of the data access device 33 maps the storage space and data corresponding to P1, P2, P3, P4, and P6 in the memory pool. Because P1 and P3 have been overwritten with the new data written by the data access device 31, after the DPU 333 receives the invalidation indication message sent by the DPU 313, it can invalidate the data in file cache 3 whose addresses match those of the dirty pages in file cache 1, namely the old data stored in P1 and P3 of file cache 3.
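The invalidation step of stage ③ can be sketched as a broadcast over the host list. The `Device` class, the `broadcast_invalidation` function, and the choice to model invalidation as dropping the cached page are assumptions for illustration.

```python
class Device:
    """Hypothetical data access device holding a local file cache."""

    def __init__(self, name, cached_pages):
        self.name = name
        self.file_cache = dict(cached_pages)  # page id -> cached data

    def on_invalidate(self, pages):
        # Invalidate only the pages that this device actually caches.
        dropped = [p for p in pages if p in self.file_cache]
        for p in dropped:
            del self.file_cache[p]
        return dropped


def broadcast_invalidation(writer, host_list, dirty_pages):
    """Send the dirty page list to every device in the host list except the writer."""
    return {
        device.name: device.on_invalidate(dirty_pages)
        for device in host_list
        if device is not writer
    }
```

With the writer caching P1 to P6, the other device caching P1, P2, P3, P4, and P6, and dirty pages P1, P3, and P5, only P1 and P3 are invalidated on the other device, matching the example above.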
As a feasible example, the DPU 333 may query the page table maintained by the data access device 33 according to the invalidation indication message sent by the DPU 313, and thereby determine the physical addresses of the data pages in file cache 3 that are to be invalidated. For instance, if the invalidation indication message carries the VAs of P1, P3, and P5 (Virt1, Virt3, and Virt5), the DPU 333 queries the page table and determines that the physical addresses to be invalidated in file cache 3 are Phys1, corresponding to P1, and Phys3, corresponding to P3. The DPU 333 then marks the page table entries Virt1-Phys1 and Virt3-Phys3 as invalid, so that if the data access device 33 modifies P1 or P3 in file cache 3, the modified data is not synchronized to the memory pool. When another data access device needs the new data written at the first address, it reads the new data again from the storage space at the first address in the storage server. This avoids data access errors or reduced data access efficiency caused by multiple data access devices reading inconsistent data from the same storage space of the storage server.
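The page table lookup and invalidation described above can be sketched as follows. The `PageTable` class and its dictionary layout are hypothetical; the address labels Virt1 to Virt6 and Phys1 to Phys6 are taken from the example in the text.

```python
class PageTable:
    """Hypothetical page table: VA -> physical address plus a valid flag."""

    def __init__(self, mapping):
        self.entries = {
            va: {"phys": pa, "valid": True} for va, pa in mapping.items()
        }

    def invalidate(self, virtual_addresses):
        """Mark matching entries invalid; return the affected physical addresses."""
        invalidated = []
        for va in virtual_addresses:
            entry = self.entries.get(va)
            if entry is not None and entry["valid"]:
                entry["valid"] = False
                invalidated.append(entry["phys"])
        return invalidated

    def lookup(self, va):
        """Return the physical address, or None if the entry is missing or invalid."""
        entry = self.entries.get(va)
        return entry["phys"] if entry and entry["valid"] else None
```

A `None` result from `lookup` corresponds to the case in which the device must read the page again from the storage server instead of using its cached copy.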
For the above process in which the DPU 333 invalidates old data, this embodiment provides a possible implementation, shown in Figure 6, which is a schematic diagram of data invalidation provided by this application. The data access device 33 includes host 3 and the DPU 333. For the hardware implementation of host 3, refer to the description of host 1 in Figure 2; details are not repeated here. The DPU 333 includes a processor 333A and a memory 333B. The processor 333A may be a CPU (the DPU CPU shown in Figure 6), and the memory 333B may be a DRAM.
Host 3 maintains multiple page tables, each corresponding to one file in a storage server; for example, page table 1 corresponds to file 1. The DPU 333 locally maintains a table that records the start positions of all page tables of this node. When the DPU 333 receives an invalidation indication message sent by another DPU, it obtains, according to the file identifier (obj ID), the page table start address corresponding to that identifier. For example, the file identifier "1" resolves to the page table physical address "0x34adf", and page table 1, which corresponds to file 1, has a data length of 64 B.
The DPU 333 then reads page table 1 into local storage (the memory 333B) through Compute Express Link (CXL) CXL.cache, and the processor 333A modifies the relevant entries of page table 1, such as Virt1-Phys1 and Virt3-Phys3 in Figure 6, setting both entries to invalid. Because the modified page table entries are associated with P1 and P3 in file cache 3, once the entries are invalid the data access device 33 must read the data corresponding to P1 and P3 of file 1 from the storage server when it needs to access that data, which completes the cache invalidation.
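The lookup by file identifier and the modification of page table entries can be sketched on a byte buffer that stands in for host memory. The start address 0x34adf and the 64 B length come from the example above; everything else is an assumption of this sketch: the 8-byte little-endian entry format, the use of bit 0 as the valid flag, the explicit write-back of the modified table, and the function and variable names. A plain slice copy stands in for the CXL.cache read.

```python
import struct

ENTRY_SIZE = 8   # assumed: one 64-bit entry per page
VALID_BIT = 0x1  # assumed: bit 0 marks a valid entry

# Table kept by the DPU: obj ID -> (page table start address, length in bytes).
page_table_index = {1: (0x34ADF, 64)}


def invalidate_entries(host_memory: bytearray, obj_id: int, page_numbers) -> None:
    start, length = page_table_index[obj_id]
    # Copy the page table into DPU-local memory (stands in for the CXL.cache read).
    local = bytearray(host_memory[start:start + length])
    for n in page_numbers:
        offset = n * ENTRY_SIZE
        if offset + ENTRY_SIZE > length:
            continue  # page number lies outside this page table
        (entry,) = struct.unpack_from("<Q", local, offset)
        struct.pack_into("<Q", local, offset, entry & ~VALID_BIT)
    # Assumed write-back so that the host sees the invalidated entries.
    host_memory[start:start + length] = local
```

Under these assumptions, a 64 B page table holds eight entries, so invalidating the entries for P1 and P3 clears the valid bit of two of them and leaves the other six unchanged.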
It is worth noting that the above embodiment uses the DPU 333 as an example to describe how data in a file cache is invalidated. The DPU 313 may likewise receive invalidation information from other data access devices and invalidate the data stored at a second address in the memory 312 indicated by the invalidation information.
For example, the second address is different from the first address. After receiving invalidation information sent by another data access device, the DPU 313 invalidates the old data stored at the second address (such as P2), which prevents the data access device from performing tasks with old data that is inconsistent with the new data stored at the second address in the storage server. This avoids two problems: access errors caused by multiple data access devices caching inconsistent data for the same storage space, and reduced data access efficiency caused by the storage server having to synchronize the data at the second address across multiple data access devices after a data access device interacts with it.
As another example, the second address is the same as the aforementioned first address. It should be understood that, for a given section of storage space in the storage server, multiple data access devices can modify the data in that section at different times. This avoids the problem that, during data access to the storage server, the data in that section can be modified by only a single data access device, and it improves the performance of the data access service that the storage server can provide.
Referring again to Figure 5, the data access method provided by this embodiment further includes the following stage ④.
Stage ④: After the new data is successfully written to the first address, the DPU 313 sends an unlock request to the storage server 32. The unlock request is used to instruct the storage server 32 to set the first address of the data to be accessible to other data access devices.
For data or a file stored in the storage server, after the data or file has been updated by a data access device (for example, by writing new data, deleting, or modifying), the storage server can unlock the access state of that data or file, so that the updated data or file can be accessed or modified by other data access devices. This avoids the problem that data or a file in the storage server can be used by only a single data access device, which would reduce the efficiency with which other data access devices access that data or file.
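The lock, write, and unlock sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the class `StorageServer`, its methods, the helper `synchronize`, and the device identifiers are all hypothetical. The sketch also releases the lock when the write fails, which is a design choice of the sketch; the description above only states that the unlock request is sent after a successful write.

```python
class StorageServer:
    """Toy storage server with per-address locks (hypothetical interface)."""

    def __init__(self):
        self.data = {}        # address -> stored bytes
        self.lock_owner = {}  # address -> id of the device holding the lock

    def lock(self, address, device_id):
        """Grant the lock unless another device already holds it."""
        owner = self.lock_owner.get(address)
        if owner is not None and owner != device_id:
            return False
        self.lock_owner[address] = device_id
        return True

    def write(self, address, payload, device_id):
        if self.lock_owner.get(address) != device_id:
            raise PermissionError(f"{device_id} does not hold the lock on {address}")
        self.data[address] = payload

    def read(self, address, device_id):
        owner = self.lock_owner.get(address)
        if owner is not None and owner != device_id:
            raise PermissionError(f"{address} is locked by {owner}")
        return self.data[address]

    def unlock(self, address, device_id):
        # Only the lock holder can release the lock.
        if self.lock_owner.get(address) == device_id:
            del self.lock_owner[address]


def synchronize(server, device_id, address, payload):
    """Lock the address, write the data, then unlock (stage 4)."""
    if not server.lock(address, device_id):
        return False  # another device is updating this address
    try:
        server.write(address, payload, device_id)
    finally:
        server.unlock(address, device_id)  # sketch choice: release on failure too
    return True
```

With this model, two devices can update the same address one after the other, but a second device is refused while the first still holds the lock, which corresponds to the address being inaccessible to other data access devices during the write.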
It can be understood that, to implement the functions in the above embodiments, the data access device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a given function is performed by hardware or by computer software driving hardware depends on the particular application scenario and design constraints of the technical solution.
The data access method and the data access device provided by this embodiment have been described in detail above with reference to Figures 1 to 6. The data access apparatus provided by the embodiments of this application can also be implemented by software units. For example, the data access apparatus can be applied to the data access device described above, and may include a communication module, a storage module, and a lock module. The storage module is used to write data into the memory. The communication module is used to send a data synchronization request to the DPU, where the data synchronization request is used to instruct that the data be stored in the storage server. The lock module, which is applied to the DPU, sends a lock request to the storage server based on the data synchronization request. The communication module is further used to send a data write command to the storage server. The data write command is used to instruct the storage server to write the data into the storage space corresponding to the first address, and the lock request is used to instruct the storage server to set, while executing the data write command, the storage space corresponding to the first address to be inaccessible to other data access devices.
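One possible way to picture the module split described above is the following Python sketch. It is an assumption-laden illustration only: the class names, method names, and the tuple message format (`("LOCK", address)` and `("WRITE", address, payload)`) are hypothetical, and the list `outbox` merely records what would be sent to the storage server.

```python
class StorageModule:
    """Writes data into the device's local memory."""

    def __init__(self):
        self.memory = {}  # address -> bytes held locally

    def write(self, address, payload):
        self.memory[address] = payload


class LockModule:
    """Runs on the DPU: turns a data synchronization request into a lock request."""

    def __init__(self, send_to_server):
        self._send = send_to_server

    def on_sync_request(self, address):
        self._send(("LOCK", address))


class CommunicationModule:
    """Sends the synchronization request to the DPU and the write command to the server."""

    def __init__(self, lock_module, send_to_server):
        self._lock_module = lock_module
        self._send = send_to_server

    def send_sync_request(self, address, payload):
        # The sync request reaches the lock module on the DPU first,
        # so the lock request precedes the write command.
        self._lock_module.on_sync_request(address)
        self._send(("WRITE", address, payload))


class DataAccessApparatus:
    """Wires the three modules together; `outbox` stands in for the network."""

    def __init__(self):
        self.outbox = []
        self.storage = StorageModule()
        self.lock = LockModule(self.outbox.append)
        self.comm = CommunicationModule(self.lock, self.outbox.append)

    def write_and_sync(self, address, payload):
        self.storage.write(address, payload)
        self.comm.send_sync_request(address, payload)
```

Calling `write_and_sync("P1", b"new")` on a `DataAccessApparatus` stores the data locally and leaves a lock request followed by a write command in `outbox`, in that order.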
It should be understood that the data access apparatus of the embodiments of this application can be implemented by a DPU. The data access apparatus according to the embodiments of this application may correspondingly perform the methods described in the embodiments of this application, and the above and other operations and/or functions of the units and modules in the data access apparatus serve to implement the corresponding procedures of the methods in the foregoing figures. For brevity, they are not repeated here.
For example, the DPU includes a control circuit and an interface circuit. The interface circuit is used to receive data from devices other than the DPU and transmit it to the control circuit, or to send data from the control circuit to devices other than the DPU. The control circuit, by means of logic circuits or by executing code instructions, works with the interface circuit to perform the functions of the DPU in the foregoing data access method.
An embodiment of this application further provides a network interface card, including the DPU described in the foregoing embodiments and a communication interface. For example, the communication interface is used to send data issued by the DPU, or to receive data sent to the DPU by other devices. The DPU thereby implements the operation steps of the data access method provided by this application.
The method steps in this embodiment can be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a computing device. The processor and the storage medium may also exist as discrete components in a network device or a terminal device.
This application further provides a chip system. The chip system includes a processor for implementing the functions of the data processing unit in the above method. In one possible design, the chip system further includes a memory for storing program instructions and/or data. The chip system may consist of a chip, or may include a chip and other discrete components.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).
The foregoing describes only specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and such modifications or replacements shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (13)

  1. A data access device, characterized by comprising: a processor, a memory, and a data processing unit;
    the processor is configured to: write data into the memory, and send a data synchronization request to the data processing unit, wherein the data synchronization request is used to instruct that the data be stored in a storage server;
    the data processing unit is configured to: send a lock request and a data write command to the storage server based on the data synchronization request;
    wherein the data write command is used to instruct the storage server to write the data into the storage space corresponding to a first address, and the lock request is used to instruct the storage server to set, while executing the data write command, the storage space corresponding to the first address to be inaccessible to other data access devices.
  2. The data access device according to claim 1, characterized in that the data processing unit is further configured to: send an unlock request to the storage server after the data is successfully written to the first address;
    the unlock request is used to instruct the storage server to set the first address of the data to be accessible to other data access devices.
  3. The data access device according to claim 1 or 2, characterized in that the data processing unit is further configured to: send an invalidation indication message to the other data access devices;
    the invalidation indication message is used to instruct the other data access devices to invalidate the old data stored at the first address.
  4. The data access device according to any one of claims 1 to 3, characterized in that the data processing unit is further configured to: invalidate, according to invalidation information sent by the other data access devices, the data stored in the memory at a second address indicated by the invalidation information.
  5. The data access device according to any one of claims 1 to 4, characterized in that the processor is further configured to: send a read request to the data processing unit when the data is not hit in the memory;
    the data processing unit is further configured to: read, based on the first address carried in the read request, the data stored at the first address from the storage server.
  6. A data access method, characterized in that the data access method is performed by a data access system, the data access system comprises a data access device and a storage server, the data access device comprises a processor, a memory, and a data processing unit, and the method comprises:
    the processor writing data into the memory and sending a data synchronization request to the data processing unit, wherein the data synchronization request is used to instruct that the data be stored in the storage server;
    the data processing unit sending a lock request and a data write command to the storage server based on the data synchronization request;
    wherein the data write command is used to instruct the storage server to write the data into the storage space corresponding to a first address, and the lock request is used to instruct the storage server to set, while executing the data write command, the storage space corresponding to the first address to be inaccessible to other data access devices.
  7. The method according to claim 6, characterized in that the method further comprises:
    the data processing unit sending an unlock request to the storage server after the data is successfully written to the first address;
    the unlock request is used to instruct the storage server to set the first address of the data to be accessible to other data access devices.
  8. The method according to claim 6 or 7, characterized in that the method further comprises:
    sending an invalidation indication message to the other data access devices;
    the invalidation indication message is used to instruct the other data access devices to invalidate the old data stored at the first address.
  9. The method according to any one of claims 6 to 8, characterized in that the method further comprises:
    the data processing unit invalidating, according to invalidation information sent by the other data access devices, the data stored in the memory at a second address indicated by the invalidation information.
  10. The method according to any one of claims 6 to 9, characterized in that the method further comprises:
    the processor sending a read request to the data processing unit when the data is not hit in the memory;
    the data processing unit reading, based on the first address carried in the read request, the data stored at the first address from the storage server.
  11. A data access system, characterized by comprising: a storage server and the data access device according to any one of claims 1 to 5;
    the storage server is configured to store the data to be synchronized by the data access device, and to set the storage space corresponding to the first address to which the data is to be written to be inaccessible to other data access devices.
  12. A data processing unit, characterized by comprising: a control circuit and an interface circuit;
    the interface circuit is used to receive data from devices other than the data processing unit and transmit it to the control circuit, or to send data from the control circuit to devices other than the data processing unit;
    the control circuit, by means of logic circuits or by executing code instructions, works with the interface circuit to perform the functions of the data processing unit described in any one of claims 6 to 10.
  13. A network interface card, characterized by comprising: the data processing unit according to claim 12 and a communication interface;
    the communication interface is used to send data issued by the data processing unit, or the communication interface is used to receive data sent to the data processing unit by other devices.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211054647.3A CN117667761A (en) 2022-08-31 2022-08-31 Data access device, method, system, data processing unit and network card
CN202211054647.3 2022-08-31

Publications (1)

Publication Number Publication Date
WO2024045643A1 (en) 2024-03-07

Family

ID=90085005

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089442 WO2024045643A1 (en) 2022-08-31 2023-04-20 Data access device, method and system, data processing unit, and network interface card

Country Status (2)

Country Link
CN (1) CN117667761A (en)
WO (1) WO2024045643A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110225335A1 (en) * 2010-03-15 2011-09-15 International Business Machines Corporation Using a dual mode reader writer lock
CN107807797A (en) * 2017-11-17 2018-03-16 北京联想超融合科技有限公司 The method, apparatus and server of data write-in
CN110399227A (en) * 2018-08-24 2019-11-01 腾讯科技(深圳)有限公司 A kind of data access method, device and storage medium
CN110691062A (en) * 2018-07-06 2020-01-14 浙江大学 Data writing method, device and equipment

Also Published As

Publication number Publication date
CN117667761A (en) 2024-03-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858684

Country of ref document: EP

Kind code of ref document: A1