WO2023207492A1 - Data processing method and apparatus, device, and readable storage medium - Google Patents

Data processing method and apparatus, device, and readable storage medium - Download PDF

Info

Publication number
WO2023207492A1
WO2023207492A1 (PCT/CN2023/084830)
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
nodes
control flow
access control
Prior art date
Application number
PCT/CN2023/084830
Other languages
English (en)
French (fr)
Inventor
李雪生
李辉
张在贵
Original Assignee
济南浪潮数据技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 济南浪潮数据技术有限公司 filed Critical 济南浪潮数据技术有限公司
Publication of WO2023207492A1 publication Critical patent/WO2023207492A1/zh

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/062 - Securing storage systems
    • G06F 3/0622 - Securing storage systems in relation to access
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • The present application relates to the field of computer technology, and in particular to a data processing method, apparatus, device, and readable storage medium.
  • The existing data transmission path is long: a request must pass through the client application, the client system interface, the client network card, and the storage node network card before it reaches the node's back-end disk, which limits access performance and access efficiency.
  • The purpose of this application is to provide a data processing method, apparatus, device, and readable storage medium to improve data access efficiency.
  • The specific solution is as follows:
  • This application provides a data processing method that can be applied to any target node in a distributed storage system.
  • The other nodes and the target node determining the client's shared memory address based on the access control flow includes: the other nodes and the target node extracting the shared memory address from the access control flow.
  • If the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
  • If the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe (Non-Volatile Memory Express) disk.
  • Any node writing the read data to the corresponding NVMe disk includes: the node parses the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determines the target NVMe disk based on that identifier; and writes the read data to the target NVMe disk.
  • Any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
  • This application provides a data processing apparatus, applied to any target node in a distributed storage system, including:
  • a receiving module, used to receive the access control flow that is sent by the client and does not carry the accessed data, where part of the accessed data is stored in the target node;
  • a determination module, used to determine the other nodes where the accessed data is stored based on the access control flow;
  • a forwarding module, used to forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
  • This application provides an electronic device, including:
  • a memory, used to store a computer program;
  • a processor, used to execute the computer program to implement the data processing method disclosed above.
  • The present application provides a non-volatile readable storage medium for storing a computer program, wherein, when the computer program is executed by a processor, the data processing method disclosed above is implemented.
  • This application provides a data processing method, applied to any target node in a distributed storage system, including: receiving the access control flow that is sent by the client and does not carry the accessed data, where part of the accessed data is stored in the target node; determining, based on the access control flow, the other nodes where the accessed data is stored; and forwarding the access control flow to the other nodes, so that the other nodes and the target node determine the client's shared memory address based on the access control flow and then complete the reading and writing of the accessed data through the shared memory address.
  • The access control flow does not carry the accessed data, so it is transmitted efficiently. After the target node receives the access control flow, it can determine the other nodes where the accessed data is stored based on the access control flow and then forward the access control flow to those nodes, so that the other nodes and the target node determine the client's shared memory address based on the access control flow and complete the reading and writing of the accessed data through the shared memory address. It can be seen that this application separates the control flow from the data flow: only the access control flow needs to be forwarded, while the data flow does not need to be forwarded between nodes and instead travels directly from the client to the corresponding node.
  • Since each node that stores the accessed data can share the client's memory address, direct data access between the client and the nodes can be realized. Each node can then obtain the data stream directly from the client's memory, quickly completing the reading and writing of the accessed data, which improves data access efficiency and performance.
  • The data processing apparatus, device, and readable storage medium provided by this application have the same technical effects.
  • Figure 1 is a flow chart of a data processing method disclosed in this application.
  • FIG. 2 is a schematic diagram of a data stream transmission path disclosed in this application.
  • FIG. 3 is a flow chart of another data processing method disclosed in this application.
  • Figure 4 is a schematic diagram of a data processing device disclosed in this application.
  • Figure 5 is a schematic diagram of an electronic device disclosed in this application.
  • This application provides a data processing solution that realizes direct data access between the client and the nodes and improves data access efficiency.
  • The method is applied to any target node in a distributed storage system.
  • In a distributed storage system, owing to the nature of distributed storage, data is distributed and stored on different nodes, so the accessed data is likewise distributed across multiple nodes. How many data blocks the accessed data is divided into, and on which nodes those blocks are stored, depends on the distribution algorithm and strategy of the current distributed storage system; the existing related art can be consulted for details, which this embodiment does not repeat. On this basis, the target node that receives the access control flow sent by the client is also determined by the distribution algorithm and strategy of the current distributed storage system, although it can also be chosen at random.
  • The access control flow does not carry the specific accessed data; it only records the memory address in the client that can be shared by each node, the node information of the several nodes across which the accessed data is distributed and stored, and the corresponding disk global statistical identifier on each node. The disk global statistical identifier is used to distinguish the different NVMe disks across the entire system.
  • Since the access control flow records the node information of the several nodes across which the accessed data is distributed and stored, all nodes where the accessed data is stored can be determined from the access control flow.
  • The other nodes where the accessed data is stored may be any node in the current distributed storage system other than the target node.
  • Each node that receives the access control flow can determine the client's shared memory address from the access control flow and can therefore obtain the data stream directly from the client's memory.
  • The other nodes and the target node determining the client's shared memory address based on the access control flow includes: the other nodes and the target node extracting the shared memory address from the access control flow.
  • Each node that receives the access control flow accesses the client memory directly and can perform both read and write operations.
  • If the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
  • Specifically, the other nodes and the target node write the accessed data held on their corresponding NVMe disks into the shared memory address. Because the disk global statistical identifier is recorded in the access control flow, each node that receives the access control flow can determine from it the disk global statistical identifier that corresponds to the accessed data on that node; therefore, when the client performs a read operation, each node can read the data from the corresponding NVMe disk and send it into the client memory.
  • If the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
  • Likewise, each node that receives the access control flow can determine from it the disk global statistical identifier that corresponds to the accessed data on that node, so when the client performs a write operation, each node can read the data the client wants to store from the client's shared memory address and then write it to the corresponding NVMe disk.
  • Any node writing the read data to the corresponding NVMe disk includes: the node parses the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determines the target NVMe disk based on that identifier; and writes the read data to the target NVMe disk.
  • It can be seen that this application separates the control flow from the data flow: only the access control flow needs to be forwarded, while the data flow does not need to be forwarded between nodes and instead travels directly from the client to the corresponding node. Moreover, since each node that stores the accessed data can share the client's memory address, direct data access between the client and the nodes can be realized; each node can then obtain the data stream directly from the client's memory, quickly completing the reading and writing of the accessed data, which improves data access efficiency and performance.
  • After data is read or written, the corresponding metadata needs to be updated. Therefore, in some embodiments of the present application, after the reading and writing of the accessed data is completed, the method further includes updating the metadata of the accessed data.
  • Any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
  • The status information of an NVMe disk includes whether it is online or offline, its hardware address, its usage, its global statistical identifier, and so on.
  • The status information can be pushed to the client synchronously on a subscription basis.
  • This application can also implement distributed I/O rollback, re-storage, and the like, thereby guaranteeing data consistency. For example, during a write operation, if individual nodes fail to write, the nodes that wrote successfully are rolled back to recover the data. As another example, if a node in the system fails, the data is redistributed and stored on other nodes that have not failed.
  • The following embodiment uses a control/data separation architecture that divides the data requests between the client and the server into a control flow and a data flow.
  • The control flow governs the interconnection routing between the client and the storage nodes, while the data flow travels directly from the client to the storage nodes. See Figure 2, in which data flows from each client directly to the storage nodes.
  • The distributed storage system provided by this embodiment can manage the status of the NVMe disks across the entire system, including online/offline state, hardware address, usage, global statistical identifier, and so on. This information is collected, aggregated, and sent synchronously to each client on a subscription basis.
  • The client is the user-side component of the distributed storage system and can be installed on each host of a high-performance computing cluster.
  • Besides the data itself, the distributed storage system also stores the metadata of that data. Under the usual access logic, before accessing specific data one needs to acquire the corresponding file lock and read the data's metadata. In some embodiments of this application, acquiring the file lock and reading and writing the metadata can be performed according to the existing related art.
  • The data access process provided by this embodiment includes:
  • Step 1: the client synchronizes the status information of each NVMe disk.
  • Step 2: the client acquires the distributed file lock and accesses the metadata of the data.
  • Step 3: the client sends the control flow to the current node.
  • Step 4: the current node determines the other nodes based on the control flow and transmits the control flow to them.
  • Step 5: each node that receives the control flow extracts the client memory address from the control flow and completes the reading and writing of the client's data based on that address.
  • Step 6: after the data reading and writing is completed, each node updates metadata such as the data size and modification time.
  • Through control/data separation, this embodiment achieves direct data transmission from the client to each node, improves data access efficiency, and shortens the data flow transmission path.
  • A data processing apparatus provided by an embodiment of the present application is introduced below.
  • The data processing apparatus described below and the data processing method described above may be cross-referenced with each other.
  • An embodiment of the present application discloses a data processing apparatus, applied to any target node in a distributed storage system, including:
  • a receiving module 401, used to receive the access control flow that is sent by the client and does not carry the accessed data, where part of the accessed data is stored in the target node;
  • a determination module 402, used to determine the other nodes where the accessed data is stored based on the access control flow;
  • a forwarding module 403, used to forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
  • The other nodes and the target node extract the shared memory address from the access control flow.
  • If the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
  • If the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
  • Any node writing the read data to the corresponding NVMe disk includes: the node parses the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determines the target NVMe disk based on that identifier; and writes the read data to the target NVMe disk.
  • A metadata update unit is used to update the metadata of the accessed data.
  • Any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
  • This embodiment provides a data processing apparatus that realizes direct data access between the client and the nodes and improves data access efficiency.
  • The distributed storage system includes N storage nodes, and each storage node is fitted with N NVMe disks.
  • Any storage node can perform the following steps: receive the access control flow that is sent by the client and does not carry the accessed data, where part of the accessed data is stored in the target node; determine, based on the access control flow, the other nodes where the accessed data is stored; and forward the access control flow to the other nodes, so that the other nodes and the target node determine the client's shared memory address based on the access control flow and then complete the reading and writing of the accessed data through the shared memory address.
  • The other nodes and the target node extract the shared memory address from the access control flow.
  • If the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
  • If the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
  • Any node writing the read data to the corresponding NVMe disk includes: the node parses the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determines the target NVMe disk based on that identifier; and writes the read data to the target NVMe disk.
  • Any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
  • An electronic device provided by an embodiment of the present application is introduced below.
  • The electronic device described below and the data processing method and apparatus described above may be cross-referenced with each other.
  • An electronic device includes:
  • a memory 501, used to store a computer program;
  • a processor 502, used to execute the computer program to implement the method disclosed in any of the above embodiments.
  • A non-volatile readable storage medium provided by an embodiment of the present application is introduced below.
  • The non-volatile readable storage medium described below and the data processing method, apparatus, and device described above may be cross-referenced with each other.
  • A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a data processing method, apparatus, device, and readable storage medium in the field of computer technology. The application separates the control flow from the data flow: only the access control flow needs to be forwarded, while the data flow does not need to be forwarded between nodes and instead travels directly from the client to the corresponding node. Moreover, since each node that stores the accessed data can share the client's memory address, direct data access between the client and the nodes can be realized; each node can then obtain the data stream directly from the client's memory, quickly completing the reading and writing of the accessed data, which improves data access efficiency and performance. Correspondingly, the data processing apparatus, device, and readable storage medium provided by the present application have the same technical effects.

Description

Data processing method and apparatus, device, and readable storage medium
Cross-Reference to Related Application
This application claims priority to the Chinese patent application No. 202210468189.1, entitled "Data processing method and apparatus, device, and readable storage medium" and filed with the Chinese Patent Office on April 29, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a data processing method, apparatus, device, and readable storage medium.
Background
The existing data transmission path is long: a request must pass through the client application, the client system interface, the client network card, and the storage node network card before it reaches the node's back-end disk, which limits access performance and access efficiency.
Summary
In view of this, the purpose of the present application is to provide a data processing method, apparatus, device, and readable storage medium to improve data access efficiency. The specific solution is as follows.
The present application provides a data processing method, applied to any target node in a distributed storage system, including:
receiving an access control flow that is sent by a client and does not carry the accessed data, wherein part of the accessed data is stored in the target node;
determining, based on the access control flow, the other nodes where the accessed data is stored; and
forwarding the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
In some embodiments of the present application, the other nodes and the target node determining the client's shared memory address based on the access control flow includes:
the other nodes and the target node extracting the shared memory address from the access control flow.
In some embodiments of the present application, if the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
In some embodiments of the present application, if the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe (Non-Volatile Memory Express) disk.
In some embodiments of the present application, any node writing the read data to the corresponding NVMe disk includes:
the node parsing the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determining the target NVMe disk based on that identifier; and writing the read data to the target NVMe disk.
In some embodiments of the present application, after the reading and writing of the accessed data is completed, the method further includes:
updating the metadata of the accessed data.
In some embodiments of the present application, the method further includes:
any node collecting the status information of all of its NVMe disks and synchronizing the status information to the client.
The present application provides a data processing apparatus, applied to any target node in a distributed storage system, including:
a receiving module, used to receive the access control flow that is sent by the client and does not carry the accessed data, wherein part of the accessed data is stored in the target node;
a determination module, used to determine, based on the access control flow, the other nodes where the accessed data is stored; and
a forwarding module, used to forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
The present application provides an electronic device, including:
a memory, used to store a computer program; and
a processor, used to execute the computer program to implement the data processing method disclosed above.
The present application provides a non-volatile readable storage medium for storing a computer program, wherein, when the computer program is executed by a processor, the data processing method disclosed above is implemented.
It can be seen from the above solution that the present application provides a data processing method, applied to any target node in a distributed storage system, including: receiving an access control flow that is sent by a client and does not carry the accessed data, wherein part of the accessed data is stored in the target node; determining, based on the access control flow, the other nodes where the accessed data is stored; and forwarding the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
It can be seen that, in the present application, the access control flow does not carry the accessed data, so it is transmitted efficiently. After the target node receives the access control flow, it can determine the other nodes where the accessed data is stored based on the access control flow and then forward the access control flow to those nodes, so that the other nodes and the target node determine the client's shared memory address based on the access control flow and complete the reading and writing of the accessed data through the shared memory address. The present application thus separates the control flow from the data flow: only the access control flow needs to be forwarded, while the data flow does not need to be forwarded between nodes and instead travels directly from the client to the corresponding node. Moreover, since each node that stores the accessed data can share the client's memory address, direct data access between the client and the nodes can be realized; each node can then obtain the data stream directly from the client's memory, quickly completing the reading and writing of the accessed data, which improves data access efficiency and performance.
Correspondingly, the data processing apparatus, device, and readable storage medium provided by the present application have the same technical effects.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Figure 1 is a flow chart of a data processing method disclosed in the present application;
Figure 2 is a schematic diagram of a data stream transmission path disclosed in the present application;
Figure 3 is a flow chart of another data processing method disclosed in the present application;
Figure 4 is a schematic diagram of a data processing apparatus disclosed in the present application;
Figure 5 is a schematic diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
At present, the existing data transmission path is long: a request must pass through the client application, the client system interface, the client network card, and the storage node network card before it reaches the node's back-end disk, which limits access performance and access efficiency. To this end, the present application provides a data processing solution that realizes direct data access between the client and the nodes and improves data access efficiency.
Referring to Figure 1, an embodiment of the present application discloses a data processing method, applied to any target node in a distributed storage system, including:
S101. Receive an access control flow that is sent by a client and does not carry the accessed data, wherein part of the accessed data is stored in the target node.
In a distributed storage system, owing to the nature of distributed storage, data is distributed and stored on different nodes, so the accessed data is likewise distributed across multiple nodes. How many data blocks the accessed data is divided into, and on which nodes those blocks are stored, depends on the distribution algorithm and strategy of the current distributed storage system; the existing related art can be consulted for details, which this embodiment does not repeat. On this basis, the target node that receives the access control flow sent by the client is also determined by the distribution algorithm and strategy of the current distributed storage system, although it can also be chosen at random.
In some embodiments of the present application, the access control flow does not carry the specific accessed data; it only records the memory address in the client that can be shared by each node, the node information of the several nodes across which the accessed data is distributed and stored, and the corresponding disk global statistical identifier on each node. The disk global statistical identifier is used to distinguish the different NVMe disks across the entire system.
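For illustration only, the following is a minimal Python sketch of the kind of record an access control flow could be; the class name, field names, and example values are the editor's assumptions and are not specified by the application:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AccessControlFlow:
    """Control-flow message sent by the client; it carries no payload data."""
    op: str                                                        # "read" or "write"
    shared_memory_addr: int                                        # client memory address shared with all nodes
    node_info: List[str] = field(default_factory=list)             # nodes across which the accessed data is stored
    disk_global_ids: Dict[str, int] = field(default_factory=dict)  # node -> disk global statistical identifier
    offsets: Dict[str, int] = field(default_factory=dict)          # node -> offset of that node's part in client memory
    sizes: Dict[str, int] = field(default_factory=dict)            # node -> byte length of that node's part

# Example: a read request whose data is spread over two nodes.
flow = AccessControlFlow(
    op="read",
    shared_memory_addr=0x7F3A00000000,
    node_info=["node-a", "node-b"],
    disk_global_ids={"node-a": 17, "node-b": 42},
    offsets={"node-a": 0, "node-b": 4 << 20},
    sizes={"node-a": 4 << 20, "node-b": 4 << 20},
)
```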
S102. Determine, based on the access control flow, the other nodes where the accessed data is stored.
Since the access control flow records the node information of the several nodes across which the accessed data is distributed and stored, all nodes where the accessed data is stored can be determined from the access control flow. The other nodes where the accessed data is stored may be any node in the current distributed storage system other than the target node.
S103. Forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
Since the access control flow records the memory address in the client that can be shared by each node, every node that receives the access control flow can determine the client's shared memory address from it and thereby obtain the data stream directly from the client's memory.
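As a hedged sketch of S101 to S103 (not the application's actual implementation), the target node's handling could look like the following; `forward` and `handle_locally` are hypothetical placeholders for the real transport and I/O layers:

```python
def handle_control_flow(flow, local_node, forward, handle_locally):
    """S101-S103 on the target node: forward the control flow to the other nodes
    recorded in it and serve the locally stored part; no payload data is forwarded."""
    # S102: every node holding part of the accessed data is listed in the control flow.
    other_nodes = [node for node in flow.node_info if node != local_node]

    # S103: only the control flow travels between nodes.
    for node in other_nodes:
        forward(node, flow)

    # The target node also reads or writes its own part through the shared memory address.
    if local_node in flow.node_info:
        handle_locally(flow)
```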
In some embodiments of the present application, the other nodes and the target node determining the client's shared memory address based on the access control flow includes: the other nodes and the target node extracting the shared memory address from the access control flow.
Each node that receives the access control flow accesses the client memory directly and can perform both read and write operations.
In some embodiments of the present application, if the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data. Specifically, the other nodes and the target node write the accessed data held on their corresponding NVMe disks into the shared memory address. Because the disk global statistical identifier is recorded in the access control flow, each node that receives the access control flow can determine from it the disk global statistical identifier that corresponds to the accessed data on that node; therefore, when the client performs a read operation, each node can read the data from the corresponding NVMe disk and send it into the client memory.
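A minimal sketch of that read path on one node, under the same assumptions as the earlier sketches (`nvme_read` and `client_mem_write` are hypothetical stand-ins for the local disk read and the remote write into the client's shared memory):

```python
def serve_read(flow, local_node, nvme_read, client_mem_write):
    """Read path on one node: fetch the local part of the accessed data from the NVMe disk
    named by the disk global statistical identifier and place it directly into client memory."""
    disk_id = flow.disk_global_ids[local_node]                  # which local NVMe disk holds this part
    offset = flow.offsets[local_node]                           # where this part belongs in client memory
    part = nvme_read(disk_id)                                   # data flow starts at the node's disk...
    client_mem_write(flow.shared_memory_addr + offset, part)   # ...and ends in the client's memory
```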
In some embodiments of the present application, if the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk. Likewise, because the disk global statistical identifier is recorded in the access control flow, each node that receives the access control flow can determine from it the disk global statistical identifier that corresponds to the accessed data on that node, so when the client performs a write operation, each node can read the data the client wants to store from the client's shared memory address and then write it to the corresponding NVMe disk.
In some embodiments of the present application, any node writing the read data to the corresponding NVMe disk includes: the node parsing the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determining the target NVMe disk based on that identifier; and writing the read data to the target NVMe disk.
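Correspondingly, a sketch of the write path with the disk lookup described above; `client_mem_read` and `nvme_disks` are hypothetical placeholders (a remote read from the client's shared memory and a map from disk global statistical identifier to local disk handle):

```python
def serve_write(flow, local_node, client_mem_read, nvme_disks):
    """Write path on one node: parse the control flow for the local disk global statistical
    identifier, pick the target NVMe disk, then pull the data out of the client's shared
    memory and write it to that disk."""
    disk_id = flow.disk_global_ids[local_node]       # identifier recorded in the control flow
    target_disk = nvme_disks[disk_id]                # disk global statistical identifier -> local NVMe disk
    offset = flow.offsets[local_node]
    size = flow.sizes[local_node]
    part = client_mem_read(flow.shared_memory_addr + offset, size)
    target_disk.write(part)                          # data goes straight from client memory to the disk
```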
It can be seen that the present application separates the control flow from the data flow: only the access control flow needs to be forwarded, while the data flow does not need to be forwarded between nodes and instead travels directly from the client to the corresponding node. Moreover, since each node that stores the accessed data can share the client's memory address, direct data access between the client and the nodes can be realized; each node can then obtain the data stream directly from the client's memory, quickly completing the reading and writing of the accessed data, which improves data access efficiency and performance.
Based on the above embodiment, it should be noted that after data is read or written, the corresponding metadata needs to be updated. Therefore, in some embodiments of the present application, after the reading and writing of the accessed data is completed, the method further includes: updating the metadata of the accessed data.
In some embodiments of the present application, any node collects the status information of all of its NVMe disks and synchronizes the status information to the client. Specifically, the status information of an NVMe disk includes whether it is online or offline, its hardware address, its usage, its global statistical identifier, and so on; the status information can be pushed to the client synchronously on a subscription basis.
It should be noted that the present application can also implement distributed I/O rollback, re-storage, and the like, thereby guaranteeing data consistency. For example, during a write operation, if individual nodes fail to write, the nodes that wrote successfully are rolled back to recover the data. As another example, if a node in the system fails, the data is redistributed and stored on other nodes that have not failed.
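A minimal sketch of that rollback behaviour, assuming hypothetical per-node `write_part` and `rollback_part` operations:

```python
def distributed_write(parts, write_part, rollback_part):
    """If any node fails to write its part, roll back the nodes that already succeeded
    so that the data is recovered and consistency is preserved."""
    succeeded = []
    try:
        for node, data in parts.items():
            write_part(node, data)            # may raise on failure
            succeeded.append(node)
    except Exception:
        for node in reversed(succeeded):      # undo the writes that had already completed
            rollback_part(node)
        raise
```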
The following embodiment adopts a control/data separation architecture that divides the data requests between the client and the server into a control flow and a data flow; the control flow governs the interconnection routing between the client and the storage nodes, while the data flow travels directly from the client to the storage nodes. Referring to Figure 2, data flows from each client directly to the storage nodes.
The distributed storage system provided by this embodiment can manage the status of the NVMe disks across the entire system, including online/offline state, hardware address, usage, global statistical identifier, and so on. This information is collected, aggregated, and sent synchronously to each client on a subscription basis. Here, the client is the user-side component of the distributed storage system and can be installed on each host of a high-performance computing cluster.
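A small sketch of such subscription-based status synchronization (the registry class and callback interface below are assumptions, not the application's actual design):

```python
class DiskStatusRegistry:
    """Aggregates the status of every NVMe disk in the system (online/offline, hardware
    address, usage, global statistical identifier) and pushes updates to subscribed clients."""

    def __init__(self):
        self._status = {}         # disk global statistical identifier -> status record
        self._subscribers = []    # client callbacks invoked with the full status table

    def subscribe(self, client_callback):
        self._subscribers.append(client_callback)
        client_callback(dict(self._status))            # initial synchronization on subscription

    def report(self, disk_id, online, hw_addr, usage):
        self._status[disk_id] = {"online": online, "hw_addr": hw_addr, "usage": usage}
        for notify in self._subscribers:               # push the change to every subscribed client
            notify(dict(self._status))

# Example: one client subscribes, then a storage node reports one of its disks.
registry = DiskStatusRegistry()
registry.subscribe(lambda status: print("client view:", status))
registry.report(disk_id=17, online=True, hw_addr="0000:3b:00.0", usage=0.42)
```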
It should be noted that, besides the data itself, the distributed storage system also stores the metadata of that data. Under the usual access logic, before accessing specific data one needs to acquire the corresponding file lock and read the data's metadata. In some embodiments of this application, acquiring the file lock and reading and writing the metadata can be performed according to the existing related art.
Referring to Figure 3, the data access process provided by this embodiment includes:
Step 1: the client synchronizes the status information of each NVMe disk.
Step 2: the client acquires the distributed file lock and accesses the metadata of the data.
Step 3: the client sends the control flow to the current node.
Step 4: the current node determines the other nodes based on the control flow and transmits the control flow to them.
Step 5: each node that receives the control flow extracts the client memory address from the control flow and completes the reading and writing of the client's data based on that address.
Step 6: after the data reading and writing is completed, each node updates metadata such as the data size and modification time.
It can be seen that, through control/data separation, this embodiment achieves direct data transmission from the client to each node, improves data access efficiency, and shortens the data flow transmission path.
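Putting Steps 3 to 6 together from the client's side, a hedged sketch (the three callables are placeholders for the real messaging, completion, and metadata layers, which the embodiment does not spell out):

```python
def client_access(flow, send_control_flow, wait_for_completion, update_metadata):
    """Steps 3-6 as seen by the client: only the control flow is sent to the current node;
    the data itself moves directly between client memory and the nodes' NVMe disks, and
    the metadata is updated once every involved node has finished."""
    send_control_flow(flow)                              # Step 3: control flow to the current node
    # Steps 4-5 happen on the nodes: forwarding, then direct reads/writes of client memory.
    wait_for_completion(flow.node_info)                  # wait until every involved node reports completion
    update_metadata(size=sum(flow.sizes.values()))       # Step 6: e.g. the new size and modification time
```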
A data processing apparatus provided by an embodiment of the present application is introduced below. The data processing apparatus described below and the data processing method described above may be cross-referenced with each other.
Referring to Figure 4, an embodiment of the present application discloses a data processing apparatus, applied to any target node in a distributed storage system, including:
a receiving module 401, used to receive the access control flow that is sent by the client and does not carry the accessed data, wherein part of the accessed data is stored in the target node;
a determination module 402, used to determine, based on the access control flow, the other nodes where the accessed data is stored; and
a forwarding module 403, used to forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
In some embodiments of the present application, the other nodes and the target node extract the shared memory address from the access control flow.
In some embodiments of the present application, if the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
In some embodiments of the present application, if the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
In some embodiments of the present application, any node writing the read data to the corresponding NVMe disk includes: the node parsing the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determining the target NVMe disk based on that identifier; and writing the read data to the target NVMe disk.
In some embodiments of the present application, the apparatus further includes:
a metadata update unit, used to update the metadata of the accessed data.
In some embodiments of the present application, any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
For the more specific working processes of the modules and units in this embodiment, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
It can be seen that this embodiment provides a data processing apparatus that realizes direct data access between the client and the nodes and improves data access efficiency.
A distributed storage system provided by an embodiment of the present application is introduced below. The distributed storage system described below and the data processing method and apparatus described above may be cross-referenced with each other.
The distributed storage system provided by this embodiment includes N storage nodes, and each storage node is fitted with N NVMe disks. Any storage node can perform the following steps: receive the access control flow that is sent by the client and does not carry the accessed data, wherein part of the accessed data is stored in the target node; determine, based on the access control flow, the other nodes where the accessed data is stored; and forward the access control flow to the other nodes, so that the other nodes and the target node, after determining the client's shared memory address based on the access control flow, complete the reading and writing of the accessed data through the shared memory address.
In some embodiments of the present application, the other nodes and the target node extract the shared memory address from the access control flow.
In some embodiments of the present application, if the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client can read the accessed data.
In some embodiments of the present application, if the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
In some embodiments of the present application, any node writing the read data to the corresponding NVMe disk includes: the node parsing the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determining the target NVMe disk based on that identifier; and writing the read data to the target NVMe disk.
In some embodiments of the present application, any node collects the status information of all of its NVMe disks and synchronizes the status information to the client.
An electronic device provided by an embodiment of the present application is introduced below. The electronic device described below and the data processing method and apparatus described above may be cross-referenced with each other.
Referring to Figure 5, an embodiment of the present application discloses an electronic device, including:
a memory 501, used to store a computer program; and
a processor 502, used to execute the computer program to implement the method disclosed in any of the above embodiments.
A non-volatile readable storage medium provided by an embodiment of the present application is introduced below. The non-volatile readable storage medium described below and the data processing method, apparatus, and device described above may be cross-referenced with each other.
A non-volatile readable storage medium is used to store a computer program, wherein, when the computer program is executed by a processor, the data processing method disclosed in the foregoing embodiments is implemented. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The terms "first", "second", "third", "fourth", and the like (if any) in the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, or device.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as non-existent and as falling outside the protection scope claimed by the present application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced with each other.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of readable storage medium known in the technical field.
Specific examples are used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims (20)

  1. A data processing method, characterized in that it is applied to any target node in a distributed storage system and comprises:
    receiving an access control flow that is sent by a client and does not carry accessed data, wherein part of the accessed data is stored in the target node;
    determining, based on the access control flow, other nodes where the accessed data is stored; and
    forwarding the access control flow to the other nodes, so that the other nodes and the target node, after determining a shared memory address of the client based on the access control flow, complete reading and writing of the accessed data through the shared memory address.
  2. The method according to claim 1, characterized in that determining, based on the access control flow, the other nodes where the accessed data is stored comprises:
    determining the other nodes where the accessed data is stored based on node information, recorded in the access control flow, of the nodes across which the accessed data is distributed and stored.
  3. The method according to claim 1, characterized in that the other nodes and the target node determining the shared memory address of the client based on the access control flow comprises:
    the other nodes and the target node extracting the shared memory address from the access control flow.
  4. The method according to claim 1, characterized in that the other nodes and the target node determining the shared memory address of the client based on the access control flow comprises:
    the other nodes and the target node determining the shared memory address of the client based on a memory address, recorded in the access control flow, of the client that can be shared by each node.
  5. The method according to claim 1, characterized in that,
    if the access control flow corresponds to a read operation, the other nodes and the target node write the accessed data into the shared memory address so that the client reads the accessed data.
  6. The method according to claim 5, characterized in that the other nodes and the target node writing the accessed data into the shared memory address comprises:
    the other nodes and the target node writing the accessed data held on their own corresponding NVMe disks into the shared memory address.
  7. The method according to claim 6, characterized in that the other nodes and the target node writing the accessed data held on their own corresponding NVMe disks into the shared memory address comprises:
    the other nodes and the target node determining, based on the access control flow, the disk global statistical identifier corresponding to the accessed data on themselves, reading the accessed data from the corresponding NVMe disk, and writing it into the shared memory address.
  8. The method according to claim 1, characterized in that,
    if the access control flow corresponds to a write operation, the other nodes and the target node read the accessed data from the shared memory address and write the read data to the corresponding NVMe disk.
  9. The method according to claim 8, characterized in that the other nodes and the target node reading the accessed data from the shared memory address and writing the read data to the corresponding NVMe disk comprises:
    the other nodes and the target node determining, based on the access control flow, the disk global statistical identifier corresponding to the accessed data on themselves, reading the accessed data from the shared memory address, and writing the read data to the corresponding NVMe disk.
  10. The method according to claim 8, characterized in that any node writing the read data to the corresponding NVMe disk comprises:
    any node parsing the access control flow to obtain the disk global statistical identifier that corresponds, on the current node, to the part of the accessed data stored there; determining a target NVMe disk based on the disk global statistical identifier; and writing the read data to the target NVMe disk.
  11. The method according to any one of claims 1 to 10, characterized in that, after completing the reading and writing of the accessed data, the method further comprises:
    updating metadata of the accessed data.
  12. The method according to any one of claims 1 to 10, characterized by further comprising:
    any node collecting status information of all of its NVMe disks and synchronizing the status information to the client.
  13. The method according to claim 12, characterized in that synchronizing the status information to the client comprises:
    synchronizing the status information to the client on a subscription basis.
  14. The method according to any one of claims 1 to 10, characterized by further comprising:
    if some nodes fail to write during a write operation, rolling back the nodes that wrote successfully, so as to recover the data.
  15. The method according to any one of claims 1 to 10, characterized by further comprising:
    if a node in the distributed storage system fails, redistributing and storing the corresponding data on nodes that have not failed.
  16. The method according to any one of claims 1 to 10, characterized by further comprising:
    determining the target node according to the distribution algorithm and strategy of the distributed storage system.
  17. The method according to any one of claims 1 to 10, characterized in that the access control flow only records: a memory address in the client that can be shared by each node, node information of the nodes across which the accessed data is distributed and stored, and the corresponding disk global statistical identifier on each node.
  18. A data processing apparatus, characterized in that it is applied to any target node in a distributed storage system and comprises:
    a receiving module, used to receive an access control flow that is sent by a client and does not carry accessed data, wherein part of the accessed data is stored in the target node;
    a determination module, used to determine, based on the access control flow, other nodes where the accessed data is stored; and
    a forwarding module, used to forward the access control flow to the other nodes, so that the other nodes and the target node, after determining a shared memory address of the client based on the access control flow, complete reading and writing of the accessed data through the shared memory address.
  19. An electronic device, characterized by comprising:
    a memory, used to store a computer program; and
    a processor, used to execute the computer program to implement the method according to any one of claims 1 to 17.
  20. A non-volatile readable storage medium, characterized in that it is used to store a computer program, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 17 is implemented.
PCT/CN2023/084830 2022-04-29 2023-03-29 Data processing method and apparatus, device, and readable storage medium WO2023207492A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210468189.1 2022-04-29
CN202210468189.1A CN114827178A (zh) 2022-04-29 2022-04-29 一种数据处理方法、装置、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2023207492A1 true WO2023207492A1 (zh) 2023-11-02

Family

ID=82510439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084830 WO2023207492A1 (zh) 2022-04-29 2023-03-29 一种数据处理方法、装置、设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN114827178A (zh)
WO (1) WO2023207492A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827178A (zh) * 2022-04-29 2022-07-29 济南浪潮数据技术有限公司 一种数据处理方法、装置、设备及可读存储介质
CN115904253B (zh) * 2023-01-09 2023-06-13 苏州浪潮智能科技有限公司 一种数据传输方法、装置、一种存储系统及设备和介质
CN116886719B (zh) * 2023-09-05 2024-01-23 苏州浪潮智能科技有限公司 存储系统的数据处理方法、装置、存储系统、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110004732A1 (en) * 2007-06-06 2011-01-06 3Leaf Networks, Inc. DMA in Distributed Shared Memory System
CN106406764A (zh) * 2016-09-21 2017-02-15 郑州云海信息技术有限公司 一种分布式san块存储的高效能数据访问系统及方法
CN109889561A (zh) * 2017-12-25 2019-06-14 新华三大数据技术有限公司 一种数据处理方法及装置
CN110199270A (zh) * 2017-12-26 2019-09-03 华为技术有限公司 存储系统中存储设备的管理方法及装置
CN114827178A (zh) * 2022-04-29 2022-07-29 济南浪潮数据技术有限公司 一种数据处理方法、装置、设备及可读存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354250A (zh) * 2015-10-16 2016-02-24 浪潮(北京)电子信息产业有限公司 一种面向云存储的数据存储方法及装置
US10423568B2 (en) * 2015-12-21 2019-09-24 Microsemi Solutions (U.S.), Inc. Apparatus and method for transferring data and commands in a memory management environment
EP3985949A1 (en) * 2017-12-26 2022-04-20 Huawei Technologies Co., Ltd. Method and apparatus for managing storage device in storage system
WO2019127018A1 (zh) * 2017-12-26 2019-07-04 华为技术有限公司 存储系统访问方法及装置
CN113535068A (zh) * 2020-04-21 2021-10-22 华为技术有限公司 数据读取方法和系统

Also Published As

Publication number Publication date
CN114827178A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2023207492A1 (zh) 一种数据处理方法、装置、设备及可读存储介质
US10496669B2 (en) System and method for augmenting consensus election in a distributed database
US10853182B1 (en) Scalable log-based secondary indexes for non-relational databases
CN103885895B (zh) 容错集群存储系统中的写入性能
US7467259B2 (en) System and method to protect data stored in a storage system
US8996611B2 (en) Parallel serialization of request processing
US7293145B1 (en) System and method for data transfer using a recoverable data pipe
US8583885B1 (en) Energy efficient sync and async replication
US7373470B2 (en) Remote copy control in a storage system
US10133673B2 (en) Cache optimization based on predictive routing
CN106104502B (zh) 用于存储系统事务的系统、方法和介质
CN104580439B (zh) 一种云存储系统中使数据均匀分布的方法
JP6225262B2 (ja) 分散データグリッドにおいてデータを同期させるためにパーティションレベルジャーナリングをサポートするためのシステムおよび方法
CN106126374B (zh) 数据写入方法、数据读取方法及装置
US20140337457A1 (en) Using network addressable non-volatile memory for high-performance node-local input/output
US20120078844A1 (en) System and method for distributed processing of file volume
CN107329704B (zh) 一种缓存镜像方法及控制器
US9984139B1 (en) Publish session framework for datastore operation records
US20130326150A1 (en) Process for maintaining data write ordering through a cache
JP6133396B2 (ja) 計算機システム、サーバ、及び、データ管理方法
US20160308965A1 (en) Storage node, storage node administration device, storage node logical capacity setting method, program, recording medium, and distributed data storage system
CN113268472B (zh) 一种分布式数据存储系统及方法
CN108540510B (zh) 一种云主机创建方法、装置及云服务系统
CN107817951A (zh) 一种实现Ceph集群融合的方法及装置
CN106897288B (zh) 数据库的服务提供方法和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794927

Country of ref document: EP

Kind code of ref document: A1