WO2017096942A1 - File storage system, data scheduling method, and data node - Google Patents

File storage system, data scheduling method, and data node

Info

Publication number
WO2017096942A1
WO2017096942A1 (PCT/CN2016/095532)
Authority
WO
WIPO (PCT)
Prior art keywords
data
written
node
data node
distributed storage
Prior art date
Application number
PCT/CN2016/095532
Other languages
French (fr)
Chinese (zh)
Inventor
Yang DONG
Weihua SHAN
Hui YIN
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2017096942A1 publication Critical patent/WO2017096942A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput

Definitions

  • The present invention relates to the field of file systems, and in particular, to a file storage system, a data scheduling method, and a data node.
  • The Hadoop Distributed File System (HDFS) is a distributed file system suitable for running on commodity hardware: it is highly scalable, allowing capacity to be expanded dynamically without downtime, and highly reliable, providing automatic data detection and replication as well as high-throughput access that eliminates access bottlenecks.
  • In the prior art, the system architecture of HDFS is shown in FIG. 1 and includes a client 11 and a server group 12.
  • The client 11 includes a DistributedFileSystem module 111 and a file system data output stream (FSDataOutputStream) module 112.
  • The server group adopts a master-slave structure and consists of a name node (NN) 121 and a plurality of data nodes (DN) 122.
  • The name node 121 is a master server that manages the file system namespace and regulates client access to files; the data nodes 122 store data. Generally, one data node corresponds to one server, and each data node corresponds to its own distributed storage subsystem, using distributed storage.
  • Before data is written with the above HDFS system, the client first initiates an RPC request to the remote NN node through the DistributedFileSystem module; the NN node creates a new file in the file system namespace; and the DistributedFileSystem module returns a DFSOutputStream to the HDFS client, after which the client starts writing data.
  • As the client writes data, the DFSOutputStream divides it into blocks and writes them into a data queue.
  • The data queue is read by the DataStreamer, which asks the name node to allocate data nodes for storing the data blocks (each data block corresponds to three data nodes by default).
  • The DataStreamer writes the data into the allocated data nodes sequentially through a pipeline, so that data blocks are mutually backed up among multiple data nodes: for example, a data block is written to the first data node, which sends it to the second data node, which in turn sends it to the third.
  • In addition, each data node corresponds to a distributed storage device, which in practice consists of a plurality of physical disks.
  • The data node forwards the written block data to the distributed storage device through I/O, triggering the device's write process: the distributed storage device writes the data to the primary physical disk and simultaneously sends replication requests to the standby physical disks, so that multiple backup copies (three by default) are written on the distributed storage device.
  • When the client finishes writing data, the DataStreamer closes the write stream and notifies the name node that the data has been written.
  • With this existing file read/write method, the write operation for the next data block proceeds only after all data nodes have finished writing the current data, so the data writing speed is slow.
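  • For orientation, the prior-art client-side flow described above corresponds to the standard HDFS write API. A minimal sketch (the path and payload are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Cluster address and the default replication factor (3) come from the configuration
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // DistributedFileSystem issues the RPC to the name node, which creates the
        // file in the namespace and hands back an output stream for writing
        try (FSDataOutputStream out = fs.create(new Path("/demo/block.dat"))) {
            // Internally, DFSOutputStream splits the bytes into packets, queues them,
            // and the DataStreamer pushes them through the data-node pipeline
            out.write("example payload".getBytes(StandardCharsets.UTF_8));
        }
        // close() flushes remaining packets, waits for acks, and notifies the name node
    }
}
```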
  • The present invention provides a file storage system, a data scheduling method, and a data node that can speed up data writing.
  • In a first aspect, the present invention provides a file storage system whose server side includes:
  • a name node, a primary data node, and at least one backup data node;
  • the primary data node and the at least one backup data node share a first distributed storage subsystem, the first distributed storage subsystem including a primary storage device and at least one backup storage device;
  • The primary data node is configured to receive a write operation instruction sent by the client, the write operation instruction including data to be written; to write the data to be written into the first distributed storage subsystem; and to send an update request to the first backup data node, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.
  • The backup data node is configured to receive the update request; to search the first distributed storage subsystem for the data to be written according to the storage location and the attribute information carried in the update request; and, when the data to be written is found, to save the attribute information of the data to be written.
  • Optionally, the primary data node's operation permission on the first distributed storage subsystem allows read and write operations, while the backup data node's operation permission on the first distributed storage subsystem allows read operations only.
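  • As a rough illustration of this permission split, the shared subsystem could expose per-node access modes. This is a minimal sketch under hypothetical names, not an interface defined by the patent:

```java
/** Hypothetical access modes a data node holds on the shared subsystem. */
enum AccessMode { READ_WRITE, READ_ONLY }

final class SubsystemHandle {
    private final String nodeId;
    private final AccessMode mode;

    SubsystemHandle(String nodeId, AccessMode mode) {
        this.nodeId = nodeId;
        this.mode = mode;
    }

    void writeBlock(byte[] block) {
        // Only the primary data node is granted READ_WRITE; backup nodes are READ_ONLY
        if (mode != AccessMode.READ_WRITE) {
            throw new IllegalStateException(nodeId + " may only read the shared subsystem");
        }
        // ... forward the block to the primary storage device ...
    }

    byte[] readBlock(long location, int length) {
        // Read access is allowed for every node, which is all a backup node needs
        return new byte[length]; // placeholder for an actual device read
    }
}
```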
  • In a second aspect, an embodiment of the present invention further provides a data scheduling method applied to the file storage system of the first aspect. The method includes:
  • the primary data node receives a write operation instruction sent by the client, the write operation instruction including data to be written;
  • the primary data node writes the data to be written into the first distributed storage subsystem; and
  • the primary data node sends an update request to the first backup data node, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.
  • Optionally, the attribute information of the data to be written includes the name and size of the data to be written.
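  • To make the shape of such a request concrete, the following is a minimal sketch; the class and field names are hypothetical illustrations, not identifiers defined by the patent:

```java
/**
 * Hypothetical index-only update request: it carries where the data already
 * sits in the shared subsystem plus its attributes, never the payload itself.
 */
final class UpdateRequest {
    final long storageLocation;  // location of the block in the first distributed storage subsystem
    final String name;           // attribute information: name of the data to be written
    final long size;             // attribute information: size of the data in bytes

    UpdateRequest(long storageLocation, String name, long size) {
        this.storageLocation = storageLocation;
        this.name = name;
        this.size = size;
    }
}
```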
  • Optionally, the method further includes: when a first data node fails, recovering the system files of the failed data node to obtain a restored data node, the first data node being any one of the data nodes; and mounting the first distributed storage subsystem to the restored data node.
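  • Because the blocks already live in the shared subsystem, recovery amounts to restoring the failed node's system files and remounting; no block data is copied. A minimal sketch under hypothetical interfaces:

```java
/** Hypothetical interfaces; the point is that recovery is a remount, not a data copy. */
interface DataNodeHost {
    DataNodeHost restoreSystemFiles(); // rebuild only the node's own system files
}

interface SharedSubsystem {
    void mountOn(DataNodeHost node);   // attach the shared storage as a virtual disk
}

final class NodeRecovery {
    static DataNodeHost recover(DataNodeHost failed, SharedSubsystem subsystem) {
        DataNodeHost restored = failed.restoreSystemFiles();
        subsystem.mountOn(restored); // blocks are already present in the shared subsystem
        return restored;
    }
}
```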
  • In a third aspect, an embodiment of the present invention further provides a data scheduling method, including:
  • a backup data node receives an update request, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;
  • the backup data node searches the first distributed storage subsystem for the data to be written according to the storage location and the attribute information; and
  • when the data to be written is found, the backup data node saves the attribute information of the data to be written.
  • In a fourth aspect, an embodiment of the present invention provides a data node, including:
  • a receiving module, configured to receive a write operation instruction sent by the client, the write operation instruction including data to be written;
  • a write circuit, configured to write the data to be written into the first distributed storage subsystem; and
  • a sending module, configured to send an update request to the first backup data node, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.
  • Optionally, the attribute information of the data to be written includes the name and size of the data to be written.
  • In a fifth aspect, an embodiment of the present invention further provides a data node, including:
  • a receiving module, configured to receive an update request, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;
  • a processing module, configured to search the first distributed storage subsystem for the data to be written according to the storage location and the attribute information; and
  • a storage module, configured to save the attribute information of the data to be written when the data to be written is found.
  • In the file storage system provided by the present invention, a plurality of data nodes share one distributed storage subsystem, and the distributed storage subsystem includes a primary storage device and at least one backup storage device, so that data can be mutually backed up between the storage devices.
  • When data is written through this file storage system, the primary data node writes the data to be written into the first distributed storage subsystem and then sends an update request to the first backup data node to notify it of the attribute information and storage location of the data to be written. Based on the update request, the first backup data node only needs to inspect the first distributed storage subsystem, confirm that the data to be written has already been written there, and save the attribute information carried in the update request; the write process for that node is then complete.
  • In the prior art, every data node must write the data to be written into its own corresponding distributed storage system. With the data scheduling method provided by the present invention, although multiple data nodes exist, only one data node actually performs the write into the first distributed storage subsystem; the remaining data nodes exploit data locality and simply inspect the shared distributed storage subsystem. This reduces the network transmission and storage time of data between the data nodes and thereby speeds up data writing.
  • FIG. 1 is a schematic structural diagram of an HDFS in the prior art
  • FIG. 2 is a schematic structural diagram of an HDFS according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a data scheduling method according to an embodiment of the present invention;
  • FIG. 4 is a schematic flowchart of another data scheduling method according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of a data node according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of another data node according to an embodiment of the present invention.
  • An embodiment of the invention provides a file storage system, shown in FIG. 2, comprising a client 21 and a server group 22.
  • The client includes a DistributedFileSystem module 211 and a file system data output stream (FSDataOutputStream) module 212.
  • The server group adopts a master-slave structure and includes a name node 221, a primary data node 222, and at least one backup data node 223. The primary data node and the at least one backup data node share a first distributed storage subsystem 224, which includes a primary storage device 2241 and at least one backup storage device 2242.
  • In physical implementation, each node in the server group (including the name node, the primary data node, and the backup data nodes) can correspond to one server.
  • The distributed storage subsystem is presented to each data node as a virtual device, appearing on each data node as a virtual disk; reading from and writing to the distributed subsystem are similar to reading from and writing to a local physical disk.
  • Physically, the distributed subsystem comprises multiple physical storage devices, for example multiple hard disks, whose data can be mutually backed up.
  • The first distributed storage subsystem can be shared among the data nodes.
  • Each data node's operation permission on the first distributed storage subsystem may be left unrestricted, or special restrictions may be imposed.
  • Optionally, the primary data node 222 is permitted to perform read and write operations on the first distributed storage subsystem 224, while the backup data nodes 223 are permitted to perform read operations only.
  • Since multiple data nodes share the first distributed storage subsystem, when a data node fails it suffices to recover the failed node's system files to obtain a restored data node and then mount the first distributed storage subsystem to the restored node; the data in the first distributed storage subsystem under that data node is thereby recovered without copying data, which improves recovery efficiency.
  • It should be noted that, in the file storage system provided by this embodiment, besides sharing the first distributed storage subsystem, each data node may additionally have its own independent distributed storage subsystem. Unlike the first distributed storage subsystem, which can be shared by all data nodes, a distributed storage subsystem corresponding to a single data node can be read and written only by that data node itself.
  • Based on the above file storage system, an embodiment of the present invention provides a data scheduling method, shown in FIG. 3, which includes the following steps:
  • 301: The primary data node receives a write operation instruction sent by the client. The write operation instruction includes the data to be written.
  • 302: The primary data node writes the data to be written into the first distributed storage subsystem.
  • In a specific implementation of this step, after receiving the data to be written, the primary data node forwards it to the distributed storage subsystem, triggering the subsystem's write procedure: the subsystem writes the data to the primary physical disk and sends a replication request to the backup disk, and the backup disk then copies and saves the primary disk's data, so that the backup disk and the primary physical disk back each other up.
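  • A minimal sketch of that primary-then-backup device path, using hypothetical types (the patent does not prescribe a programming interface):

```java
import java.util.List;

/** Hypothetical sketch of the subsystem's write path: primary disk first, then replicas. */
final class SubsystemWritePath {
    interface PhysicalDisk {
        long append(byte[] block);                       // returns the storage location
        void replicateFrom(PhysicalDisk source, long location, int length);
    }

    private final PhysicalDisk primary;
    private final List<PhysicalDisk> backups;            // at least one backup device

    SubsystemWritePath(PhysicalDisk primary, List<PhysicalDisk> backups) {
        this.primary = primary;
        this.backups = backups;
    }

    /** Write to the primary physical disk, then ask each backup disk to copy it. */
    long write(byte[] block) {
        long location = primary.append(block);
        for (PhysicalDisk backup : backups) {
            backup.replicateFrom(primary, location, block.length);
        }
        return location;
    }
}
```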
  • 303: The primary data node sends a notification message to the name node and a response message to the client.
  • Once the primary data node has successfully written the data into the distributed storage subsystem, its write is complete: it sends a notification message to the name node to report that the data to be written has been written, and a response message to the client to report that the data has been written as instructed.
  • 304: The primary data node sends an update request to the first backup data node.
  • This is one difference between the data scheduling method of this embodiment and the prior art. In the prior art, after writing the data into the distributed storage system, the primary data node sends a write request carrying the data to be written to the first backup data node; upon receiving it, the first backup data node must write that data into its own independent distributed storage subsystem before its write process is considered complete.
  • In the present invention, by contrast, the update request includes only the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written, such as its name and size. The information in the update request is effectively index information for the data to be written and does not include the data itself.
  • In this way, the first backup data node can locate the data to be written at the indicated location based on the update request alone, eliminating both the data transfer between the primary data node and the first backup data node and the first backup data node's entire process of writing the data.
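  • Putting steps 302 and 304 together, the primary node's role might be sketched as follows; all names are hypothetical:

```java
/** Hypothetical sketch of the primary data node's role in the proposed write path. */
final class PrimaryNodeFlow {
    interface SharedSubsystem { long write(byte[] data); }            // the shared first subsystem
    interface BackupNodeClient { void sendUpdate(long location, String name, long size); }

    private final SharedSubsystem shared;
    private final BackupNodeClient firstBackup;

    PrimaryNodeFlow(SharedSubsystem shared, BackupNodeClient firstBackup) {
        this.shared = shared;
        this.firstBackup = firstBackup;
    }

    void onWriteInstruction(String name, byte[] data) {
        // The only real block write in the system: into the shared subsystem
        long location = shared.write(data);
        // Only index information travels to the backup node, never the payload
        firstBackup.sendUpdate(location, name, data.length);
    }
}
```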
  • 305: The backup data node receives the update request.
  • The update request includes the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written.
  • The backup data node referred to in this step includes the first backup data node mentioned in the preceding steps. If the data to be written needs to be backed up across only two data nodes, the file storage system includes just the primary data node and the first backup data node; if backup across more than two data nodes is required (typically three), the file storage system may further include a second backup data node, a third backup data node, and so on.
  • In the latter case, pipelined processing is used: once one data node has finished writing, it sends an update request to the next. For example, after the primary data node writes the data to be written into the distributed storage system, it sends an update request to the first backup data node; after the first backup data node completes its write, it sends an update request to the second backup data node, and so on.
  • 306: The backup data node searches the first distributed storage subsystem for the data to be written, according to the storage location of the data to be written in the first distributed storage subsystem and its attribute information.
  • 307: When the data to be written is found, the backup data node saves the attribute information of the data to be written.
  • 308: The backup data node sends a notification message to the name node and a response message to the client.
  • The backup data node only needs to save the attribute information of the data to be written for its write process to be complete. Once it has done so, it sends a notification message to the name node and a response message to the client.
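  • Steps 305 to 308, including the pipelined forwarding to any further backup nodes, might be sketched as follows; all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a backup data node handling an update request (steps 305-308). */
final class BackupNodeFlow {
    interface SharedSubsystem {
        /** Read-only lookup of a block by location and expected name/size. */
        Optional<byte[]> find(long location, String name, long size);
    }
    interface NextBackup { void sendUpdate(long location, String name, long size); }

    private final SharedSubsystem shared;
    private final NextBackup next;                       // null for the last node in the pipeline
    private final Map<String, Long> attributes = new HashMap<>();

    BackupNodeFlow(SharedSubsystem shared, NextBackup next) {
        this.shared = shared;
        this.next = next;
    }

    void onUpdateRequest(long location, String name, long size) {
        // 306: look the data up in the shared subsystem; nothing is copied
        if (shared.find(location, name, size).isPresent()) {
            attributes.put(name, size);                  // 307: saving attributes completes the write
            if (next != null) {
                next.sendUpdate(location, name, size);   // pipeline the update onward
            }
            // 308: ... notify the name node and answer the client ...
        }
    }
}
```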
  • In the file storage system provided by this embodiment of the present invention, a plurality of data nodes share one distributed storage subsystem, and the distributed storage subsystem includes a primary storage device and at least one backup storage device, so that data can be mutually backed up between the storage devices.
  • When data is written through this file storage system, the primary data node writes the data to be written into the first distributed storage subsystem and then sends an update request to the first backup data node to notify it of the attribute information and storage location of the data to be written. The first backup data node only needs to inspect the first distributed storage subsystem according to the update request, confirm that the data to be written has already been written there, and save the attribute information carried in the update request to complete its write process.
  • Whereas in the prior art every data node must write the data into its own corresponding distributed storage system, in the data scheduling method provided by the present invention only one data node actually writes the data into the first distributed storage subsystem, even though multiple data nodes exist; the remaining data nodes exploit data locality and inspect the shared distributed storage subsystem, reducing network transmission and storage time between data nodes and speeding up data writing.
  • Moreover, in the prior art, data to be written must travel both between the multiple data nodes and, for backup, between the multiple storage devices inside each distributed storage subsystem, which readily creates network hotspots and bottlenecks, and every data node must read and store the full data. In this embodiment, the remaining data nodes merely exploit data locality and access the data locally through the shared distributed storage system, which reduces the amount of data transmitted between data nodes and the transmission overhead; in addition, since only the first data node reads and writes the data to be written and the data is saved in the first distributed storage system, the remaining data nodes need not store it, reducing the occupied storage space and saving storage overhead for the server group.
  • In connection with practical applications, an embodiment of the present invention further provides a specific implementation of data scheduling, shown in FIG. 4, including the following steps:
  • 401: The client's DistributedFileSystem module initiates an RPC request to the name node. The Remote Procedure Call (RPC) request is used to create a new file in the file system namespace.
  • 402: After receiving the RPC request, the name node creates the new file.
  • It should be noted that, before performing this step, the name node first checks whether the file to be created already exists and whether the creator has permission to operate. This step and the subsequent steps are executed only if the file does not yet exist and the creator is authorized; otherwise the client throws an exception and the file read/write process ends.
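  • A minimal sketch of that pre-creation check, under hypothetical types:

```java
/** Hypothetical sketch of the name node's pre-creation check in step 402. */
final class NameNodeCheck {
    interface Namespace {
        boolean exists(String path);
        boolean mayCreate(String creator, String path);
    }

    /** The file must not exist yet and the creator must be authorized; otherwise
        the client throws an exception and the read/write process ends. */
    static boolean canCreate(Namespace ns, String creator, String path) {
        return !ns.exists(path) && ns.mayCreate(creator, path);
    }
}
```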
  • The following steps 403 to 407 constitute the data writing process.
  • 403: The client's DFSOutputStream module divides the data into blocks, writes them into a data queue, and asks the name node to allocate data nodes.
  • The data queue is read by the DataStreamer submodule of the DFSOutputStream module. The data nodes are used to store the data blocks, and the allocated data nodes are placed in a pipeline.
  • 404: The client's DataStreamer submodule writes the data block to the primary data node in the pipeline; the primary data node is the first data node in the pipeline.
  • The client's DFSOutputStream module keeps each sent data block in an ack queue, waiting for every data node in the pipeline to confirm that the data has been written successfully.
  • The primary data node then triggers the write process of the distributed storage subsystem: data is first written to the primary physical disk, and a replication request is sent to the standby disks, so that multiple backup copies (three by default) are written on the distributed storage device.
  • Unlike in the prior art, the primary data node no longer needs to send the data block to the backup data node; it sends only an update request, thereby entering the "update layer" process.
  • The backup data node referred to in this step is the first backup data node; FIG. 4 also shows a second backup data node, to which the first backup data node likewise sends an update request after it completes its own data write operation.
  • Besides the processing of the following steps, the update layer may include other processing procedures.
  • After receiving the update request, the backup data node refreshes its view of the shared distributed storage subsystem according to the message content in the update request; once the data block's information can be read, it saves the data block's attribute information, which completes the writing of that data block. When the writing of the data block is complete, the backup data node sends a notification message to the name node and returns a response message to the client.
  • The ack queue then removes the corresponding data packet.
  • The DataStreamer flushes the remaining data packets into the pipeline and waits for their acks; after receiving the last ack, it notifies the name node that the writing is complete.
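  • A minimal sketch of the ack-queue bookkeeping described above, with hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the client-side ack queue described above. */
final class AckQueue {
    private final Deque<byte[]> pending = new ArrayDeque<>();

    /** A packet was pushed into the pipeline; hold it until every node acks. */
    void onPacketSent(byte[] packet) {
        pending.addLast(packet);
    }

    /** All data nodes in the pipeline acknowledged the oldest packet: drop it. */
    void onAck() {
        pending.removeFirst();
    }

    /** After the last ack the DataStreamer notifies the name node that writing is done. */
    boolean allAcked() {
        return pending.isEmpty();
    }
}
```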
  • When the client has completed the write operations for all data blocks, it calls the stream's close method to close the write stream.
  • As shown in FIG. 5, an embodiment of the present invention provides a data node, including:
  • a receiving module 501, configured to receive a write operation instruction sent by the client, the write operation instruction including data to be written;
  • a write circuit 502, configured to write the data to be written into the first distributed storage subsystem; and
  • a sending module 503, configured to send an update request to the first backup data node, and further to send a notification message to the name node and a response message to the client.
  • Optionally, the attribute information of the data to be written includes the name and size of the data to be written.
  • An embodiment of the present invention further provides a data node, shown in FIG. 6, comprising:
  • a receiving module 601, configured to receive an update request, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;
  • a processing module 602, configured to search the first distributed storage subsystem for the data to be written according to the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written;
  • a storage module 603, configured to save the attribute information of the data to be written when the data to be written is found; and
  • a sending module 604, configured to send a notification message to the name node and a response message to the client.
  • In the data nodes provided by these embodiments, a plurality of data nodes share one distributed storage subsystem, and the distributed storage subsystem includes a primary storage device and at least one backup storage device, so that data can be mutually backed up between the storage devices.
  • The primary data node writes the data to be written into the first distributed storage subsystem and then sends an update request to the first backup data node to notify it of the attribute information and storage location of the data to be written; based on the update request, the first backup data node only needs to inspect the first distributed storage subsystem, confirm that the data has already been written there, and save the attribute information carried in the update request to complete its write process.
  • Whereas in the prior art every data node must write the data into its own corresponding distributed storage system, here only one data node actually writes the data into the first distributed storage subsystem; the remaining data nodes use data locality to inspect the shared distributed storage subsystem, reducing network transmission and storage time between data nodes and speeding up data writing.
  • The present invention can be implemented by means of software plus the necessary general-purpose hardware, or of course entirely by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes over the prior art, can be embodied in the form of a software product stored in a readable storage medium, such as a computer's floppy disk, hard disk, or optical disk, the software product including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of file systems. Disclosed are a file storage system, a data scheduling method, and a data node. The data scheduling method comprises: a primary data node receives a write operation instruction sent by a client, the write operation instruction comprising data to be written; the primary data node writes the data to be written into a first distributed storage subsystem, sends a notification message to a name node, and sends a response message to the client; and the primary data node sends an update request to a first backup data node, the update request comprising a storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written. The present invention is applicable to the process of storing a file.

Description

File storage system, data scheduling method, and data node

Technical Field

The present invention relates to the field of file systems, and in particular, to a file storage system, a data scheduling method, and a data node.

Background

The Hadoop Distributed File System (HDFS) is a distributed file system suitable for running on commodity hardware. It offers high scalability, allowing capacity to be expanded dynamically without downtime, and high reliability, providing automatic data detection and replication as well as high-throughput access that eliminates access bottlenecks.

In the prior art, the system architecture of HDFS is shown in FIG. 1 and includes a client 11 and a server group 12. The client 11 includes a DistributedFileSystem module 111 and a file system data output stream (FSDataOutputStream) module 112. The server group adopts a master-slave structure and consists of a name node (NN) 121 and a plurality of data nodes (DN) 122. The name node 121 is a master server that manages the file system namespace and regulates client access to files; the data nodes 122 store data. Generally, one data node corresponds to one server, and each data node corresponds to its own distributed storage subsystem, using distributed storage.

Before data is written with the above HDFS system, the client first initiates an RPC request to the remote NN node through the DistributedFileSystem module; the NN node creates a new file in the file system namespace; and the DistributedFileSystem module returns a DFSOutputStream to the HDFS client, after which the client starts writing data. As the client writes, the DFSOutputStream divides the data into blocks and writes them into a data queue. The data queue is read by the DataStreamer, which asks the name node to allocate data nodes for storing the data blocks (each data block corresponds to three data nodes by default). The DataStreamer writes the data into the allocated data nodes sequentially through a pipeline, so that data blocks are mutually backed up among multiple data nodes. For example, a data block is written to the first data node; the first data node sends it to the second data node; and the second data node sends it to the third data node. In addition, each data node corresponds to a distributed storage device, which in practice consists of a plurality of physical disks. The data node forwards the written block data to the distributed storage device through I/O, triggering the device's write process: the distributed storage device writes the data to the primary physical disk and simultaneously sends replication requests to the standby physical disks, so that multiple backup copies (three by default) are written on the distributed storage device. When the client finishes writing data, the DataStreamer closes the write stream and notifies the name node that the data has been written.

With this existing file read/write method, the write operation for the next data block proceeds only after all data nodes have finished writing the current data, so the data writing speed is slow.
Summary of the Invention

The present invention provides a file storage system, a data scheduling method, and a data node that can speed up data writing.

To achieve the above object, the present invention adopts the following technical solutions.

In a first aspect, an embodiment of the present invention provides a file storage system whose server side includes:

a name node, a primary data node, and at least one backup data node;

the primary data node and the at least one backup data node sharing a first distributed storage subsystem, the first distributed storage subsystem including a primary storage device and at least one backup storage device;

where the primary data node is configured to receive a write operation instruction sent by the client, the write operation instruction including data to be written; to write the data to be written into the first distributed storage subsystem; and to send an update request to the first backup data node, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;

and the backup data node is configured to receive the update request; to search the first distributed storage subsystem for the data to be written according to the storage location and the attribute information carried in the update request; and, when the data to be written is found, to save the attribute information of the data to be written.

With reference to the first aspect, in a first implementation of the first aspect, the primary data node's operation permission on the first distributed storage subsystem allows read and write operations, while the backup data node's operation permission on the first distributed storage subsystem allows read operations only.

In a second aspect, an embodiment of the present invention further provides a data scheduling method applied to the file storage system of the first aspect, the method including:

receiving, by the primary data node, a write operation instruction sent by the client, the write operation instruction including data to be written;

writing, by the primary data node, the data to be written into the first distributed storage subsystem; and

sending, by the primary data node, an update request to the first backup data node, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.

With reference to the second aspect, in a first implementation of the second aspect, the attribute information of the data to be written includes the name and size of the data to be written.

With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect, the method further includes:

when a first data node fails, recovering the system files of the failed data node to obtain a restored data node, the first data node being any one of the data nodes; and

mounting the first distributed storage subsystem to the restored data node.

In a third aspect, an embodiment of the present invention further provides a data scheduling method, including:

receiving, by a backup data node, an update request, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;

searching, by the backup data node, the first distributed storage subsystem for the data to be written according to the storage location and the attribute information; and

saving, by the backup data node, the attribute information of the data to be written when the data to be written is found.

In a fourth aspect, an embodiment of the present invention provides a data node, including:

a receiving module, configured to receive a write operation instruction sent by the client, the write operation instruction including data to be written;

a write circuit, configured to write the data to be written into the first distributed storage subsystem; and

a sending module, configured to send an update request to the first backup data node, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.

With reference to the fourth aspect, in a first implementation of the fourth aspect, the attribute information of the data to be written includes the name and size of the data to be written.

In a fifth aspect, an embodiment of the present invention further provides a data node, including:

a receiving module, configured to receive an update request, the update request including the storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written;

a processing module, configured to search the first distributed storage subsystem for the data to be written according to the storage location and the attribute information; and

a storage module, configured to save the attribute information of the data to be written when the data to be written is found.

In the file storage system provided by the present invention, a plurality of data nodes share one distributed storage subsystem, and the distributed storage subsystem includes a primary storage device and at least one backup storage device, so that data can be mutually backed up between the storage devices. When data is written through this file storage system, the primary data node writes the data to be written into the first distributed storage subsystem and then sends an update request to the first backup data node to notify it of the attribute information and storage location of the data to be written. Based on the update request, the first backup data node only needs to inspect the first distributed storage subsystem, confirm that the data to be written has already been written there, and save the attribute information carried in the update request; the write process for the data to be written is then complete. Whereas in the prior art every data node must write the data to be written into its own corresponding distributed storage system, in the data scheduling method provided by the present invention only one data node actually writes the data into the first distributed storage subsystem, even though multiple data nodes exist; the remaining data nodes exploit data locality and inspect the shared distributed storage subsystem, which reduces the network transmission and storage time of data between the data nodes and thereby speeds up data writing.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an HDFS in the prior art;

FIG. 2 is a schematic structural diagram of an HDFS according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a data scheduling method according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of another data scheduling method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a data node according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of another data node according to an embodiment of the present invention.

Detailed Description
The technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a file storage system, shown in FIG. 2, comprising a client 21 and a server group 22. The client includes a DistributedFileSystem module 211 and a file system data output stream (FSDataOutputStream) module 212. The server group adopts a master-slave structure and includes a name node 221, a primary data node 222, and at least one backup data node 223. The primary data node and the at least one backup data node share a first distributed storage subsystem 224, which includes a primary storage device 2241 and at least one backup storage device 2242.

In physical implementation, each node in the server group (including the name node, the primary data node, and the backup data nodes) can correspond to one server. The distributed storage subsystem is presented to each data node as a virtual device, appearing on each data node as a virtual disk; reading from and writing to the distributed subsystem are similar to reading from and writing to a local physical disk. Physically, the distributed subsystem comprises multiple physical storage devices, for example multiple hard disks, whose data can be mutually backed up.

The first distributed storage subsystem can be shared among the data nodes. Each data node's operation permission on the first distributed storage subsystem may be left unrestricted, or special restrictions may be imposed. Optionally, the primary data node 222 is permitted to perform read and write operations on the first distributed storage subsystem 224, while the backup data nodes 223 are permitted to perform read operations only.

Since multiple data nodes share the first distributed storage subsystem, when a data node fails it suffices to recover the failed node's system files to obtain a restored data node and then mount the first distributed storage subsystem to the restored node; the data in the first distributed storage subsystem under that data node is thereby recovered without copying data, which improves recovery efficiency.

It should be noted that, in the file storage system provided by this embodiment, besides sharing the first distributed storage subsystem, each data node may additionally have its own independent distributed storage subsystem. Unlike the first distributed storage subsystem, which can be shared by all data nodes, a distributed storage subsystem corresponding to a single data node can be read and written only by that data node itself.
Based on the above file storage system, an embodiment of the present invention provides a data scheduling method, shown in FIG. 3, which includes the following steps.

301: The primary data node receives a write operation instruction sent by the client. The write operation instruction includes the data to be written.

302: The primary data node writes the data to be written into the first distributed storage subsystem.

In a specific implementation of this step, after receiving the data to be written, the primary data node forwards it to the distributed storage subsystem, triggering the subsystem's write procedure: the subsystem writes the data to the primary physical disk and sends a replication request to the backup disk, and the backup disk then copies and saves the primary disk's data, so that the backup disk and the primary physical disk back each other up.

303: The primary data node sends a notification message to the name node and a response message to the client.

Once the primary data node has successfully written the data into the distributed storage subsystem, its write is complete: it sends a notification message to the name node to report that the data to be written has been written, and a response message to the client to report that the data has been written as instructed.

304: The primary data node sends an update request to the first backup data node.

This is one difference between the data scheduling method of this embodiment and the prior art. In the prior art, after writing the data into the distributed storage system, the primary data node sends a write request carrying the data to be written to the first backup data node; upon receiving it, the first backup data node must write that data into its own independent distributed storage subsystem before its write process is considered complete.

In the present invention, by contrast, the update request includes only the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written, such as its name and size. The information in the update request is effectively index information for the data to be written and does not include the data itself. In this way, the first backup data node can locate the data to be written at the indicated location based on the update request alone, eliminating both the data transfer between the primary data node and the first backup data node and the first backup data node's entire process of writing the data.

305: The backup data node receives the update request. The update request includes the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written.

The backup data node referred to in this step includes the first backup data node mentioned in the preceding steps. If the data to be written needs to be backed up across only two data nodes, the file storage system includes just the primary data node and the first backup data node. If backup across more than two data nodes is required (typically three), the file storage system may further include a second backup data node, a third backup data node, and so on. In that case, pipelined processing is used: once one data node has finished writing, it sends an update request to the next. For example, after the primary data node writes the data to be written into the distributed storage system, it sends an update request to the first backup data node; after the first backup data node completes its write, it sends an update request to the second backup data node, and so on.

306: The backup data node searches the first distributed storage subsystem for the data to be written, according to the storage location of the data to be written in the first distributed storage subsystem and its attribute information.

307: When the data to be written is found, the backup data node saves the attribute information of the data to be written.

308: The backup data node sends a notification message to the name node and a response message to the client.

The backup data node only needs to save the attribute information of the data to be written for its write process to be complete. Once the backup data node has completed the data write, it sends a notification message to the name node and a response message to the client.

It should be noted that FIG. 3 shows the specific data scheduling procedure only for a file storage system comprising a primary data node and a first backup data node; when the file storage system further includes other backup data nodes, those nodes operate in the same way as the first backup data node, which is not shown in FIG. 3.
In the file storage system provided by this embodiment of the present invention, a plurality of data nodes share one distributed storage subsystem, and the distributed storage subsystem includes a primary storage device and at least one backup storage device, so that data can be mutually backed up between the storage devices. When data is written through this file storage system, the primary data node writes the data to be written into the first distributed storage subsystem and then sends an update request to the first backup data node to notify it of the attribute information and storage location of the data to be written. The first backup data node only needs to inspect the first distributed storage subsystem according to the update request, confirm that the data to be written has already been written there, and save the attribute information carried in the update request to complete its write process. Whereas in the prior art every data node must write the data to be written into its own corresponding distributed storage system, in the data scheduling method provided by the present invention only one data node actually writes the data into the first distributed storage subsystem, even though multiple data nodes exist; the remaining data nodes exploit data locality and inspect the shared distributed storage subsystem, reducing the network transmission and storage time of data between the data nodes and thereby speeding up data writing.

Furthermore, in the prior art, when data backup is implemented through multiple data nodes, the data to be written must be transmitted between the multiple data nodes and, because backup is also implemented inside each distributed storage subsystem, between the multiple storage devices within the subsystem as well; this readily creates network hotspots and bottlenecks. Every data node must also read and store all of the data to be written, occupying considerable storage space and incurring a large storage overhead.

In this embodiment of the present invention, by contrast, although multiple mutually backed-up data nodes exist, only one data node actually performs the write; the remaining data nodes merely exploit data locality and access the data locally through the shared distributed storage system. This reduces the amount of data transmitted between data nodes and the transmission overhead. In addition, since only the first data node reads and writes the data to be written and the data is saved in the first distributed storage system, the remaining data nodes need not store the data, so the occupied storage space is reduced, saving storage overhead for the server group.
With reference to a practical application, an embodiment of the present invention further provides a specific data scheduling procedure, shown in FIG. 4, which includes the following steps.
401: The client's DistributedFileSystem module initiates an RPC request to the name node.
The remote procedure call protocol (RPC) request is used to create a new file in the namespace of the file system.
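Steps 401 and 402 mirror the standard HDFS client write path, so the stock Hadoop client API can illustrate them. This is a sketch of the general mechanism only, not of the patented system itself; the namenode URI and file path are placeholders.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // For an hdfs:// URI this returns the client's DistributedFileSystem.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        // create() issues the RPC of step 401: the name node allocates the
        // new file in its namespace and hands back a write stream.
        FSDataOutputStream out = fs.create(new Path("/demo/blocks.bin"));
        out.close();
        fs.close();
    }
}
```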
402: After receiving the RPC request, the name node creates the new file.
It should be noted that before performing this step, the name node first checks whether the file to be created already exists and whether the creator has permission to perform the operation. Only if the file does not already exist and the creator is authorized are this step and the subsequent steps executed; otherwise, the client throws an exception and the file read/write process ends.
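A minimal sketch of those pre-creation checks, assuming a hypothetical Namespace abstraction for the name node's metadata (the text describes only the checks, not the data structures behind them):

```java
// Hypothetical view of the name node's metadata.
interface Namespace {
    boolean exists(String path);
    boolean mayCreate(String creator, String path);
    void create(String path, String creator);
}

public class NameNodeChecks {
    private final Namespace namespace;

    public NameNodeChecks(Namespace namespace) {
        this.namespace = namespace;
    }

    // Step 402 is executed only if both checks pass; otherwise the failure
    // is reported back and surfaces as an exception on the client.
    public void createFile(String path, String creator) {
        if (namespace.exists(path)) {
            throw new IllegalStateException("file already exists: " + path);
        }
        if (!namespace.mayCreate(creator, path)) {
            throw new SecurityException("creator lacks permission: " + creator);
        }
        namespace.create(path, creator);
    }
}
```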
Steps 403 to 407 below constitute the data writing process.
403: The client's DFSOutputStream module splits the data into blocks, writes them into a data queue, and notifies the name node to allocate data nodes.
The data queue is read by the Data Streamer submodule of the DFSOutputStream module. The data nodes are used to store the data blocks, and the allocated data nodes are placed in a pipeline.
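The block-splitting of step 403 can be pictured as follows. The block size and the queue wiring are illustrative assumptions, not the actual DFSOutputStream internals:

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DataQueueSketch {
    // Illustrative size only; 128 MiB is a common HDFS block-size default,
    // but the text does not fix a value.
    static final int BLOCK_SIZE = 128 * 1024 * 1024;

    final BlockingQueue<byte[]> dataQueue = new LinkedBlockingQueue<>();

    // Step 403: split the payload into blocks and enqueue them; a Data
    // Streamer thread would drain this queue into the pipeline.
    void enqueueBlocks(byte[] payload) {
        for (int off = 0; off < payload.length; off += BLOCK_SIZE) {
            int end = Math.min(off + BLOCK_SIZE, payload.length);
            dataQueue.add(Arrays.copyOfRange(payload, off, end));
        }
    }
}
```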
404: The client's Data Streamer submodule writes the data block to the primary data node in the pipeline.
The primary data node is the first data node in the pipeline.
At the same time, the client's DFSOutputStream module maintains an ack queue for the data blocks that have been sent out, waiting for every data node in the pipeline to report that the data has been written successfully.
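A sketch of that ack-queue bookkeeping, again as an illustrative assumption rather than the real DFSOutputStream code: a sent packet is parked on the ack queue and discarded only when the whole pipeline has confirmed it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AckQueueSketch {
    record Packet(long seq, byte[] body) {}

    final BlockingQueue<Packet> ackQueue = new LinkedBlockingQueue<>();

    // A packet moves onto the ack queue as soon as it has been sent.
    void onSent(Packet p) {
        ackQueue.add(p);
    }

    // It is removed only when every node in the pipeline has acknowledged it.
    void onPipelineAck(long seq, int acksReceived, int pipelineSize) {
        Packet head = ackQueue.peek();
        if (head != null && head.seq() == seq && acksReceived == pipelineSize) {
            ackQueue.poll();
        }
    }
}
```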
405: The primary data node triggers the write procedure of the distributed storage subsystem.
In the specific implementation of this step, the data is first written to the primary physical disk while replication requests are sent to the backup disks, so that multiple backup copies of the data (three copies by default) are written across the distributed storage devices.
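A sketch of step 405 under a hypothetical Disk abstraction; only the write-primary-then-replicate order and the default copy count of three come from the text.

```java
import java.util.List;

public class ReplicatedWriteSketch {
    // Hypothetical abstraction for the storage devices inside the
    // first distributed storage subsystem.
    interface Disk {
        void write(String location, byte[] block);
    }

    static final int DEFAULT_COPIES = 3; // default copy count named in the text

    // Step 405: write the primary physical disk first, then fan replication
    // requests out to the backup disks until the copy count is reached.
    void writeWithReplication(Disk primary, List<Disk> backups, String loc, byte[] block) {
        primary.write(loc, block);
        backups.stream()
               .limit(DEFAULT_COPIES - 1)
               .forEach(d -> d.write(loc, block));
    }
}
```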
406: After writing the data into the distributed storage subsystem, the primary data node sends a notification message to the name node, a response message to the client, and an update request to the backup data node.
In the specific implementation of this step, the primary data node no longer needs to send the data block to the backup data node; it sends only an update request, which enters the processing of the "update layer".
It should be noted that the backup data node referred to in this step is the first backup data node. FIG. 4 also shows a second backup data node; after completing its own write operation for the data, the first backup data node in turn sends an update request to the second backup data node.
It should also be noted that, besides the processing in the following step, the update layer may include other processing.
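From the primary data node's side, step 406 reduces to forwarding metadata instead of data. The sketch below reuses the hypothetical UpdateRequest/FileAttributes shapes from the sketch after step 308, with illustrative stub endpoints:

```java
public class PrimaryUpdateSender {
    record FileAttributes(String name, long size) {}
    record UpdateRequest(String storageLocation, FileAttributes attributes) {}

    // Step 406: once the block is durable in the distributed storage
    // subsystem, forward only its metadata, never the block itself.
    void onBlockWritten(String storageLocation, String name, long size) {
        notifyNameNode(name);
        ackClient(name);
        sendToFirstBackup(new UpdateRequest(storageLocation, new FileAttributes(name, size)));
    }

    private void notifyNameNode(String name)          { /* notification message */ }
    private void ackClient(String name)               { /* response message */ }
    private void sendToFirstBackup(UpdateRequest req) { /* enters the "update layer" */ }
}
```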
407: After receiving the update request, the backup data node refreshes its view of the shared distributed storage subsystem according to the message content in the update request. When it reads the information of the data block, it saves the data block's attribute information and the like, which completes the write process for the data block. Once the write is complete, it sends a notification message to the name node and returns a response message to the client.
When all the data nodes in the pipeline have completed writing the data, the ack queue removes the corresponding packet.
This process repeats: the Data Streamer flushes the remaining packets into the pipeline and waits for the ack messages; after receiving the last ack, it notifies the metadata node that writing is complete.
When the client has finished writing all the data blocks, it calls the stream's close method to close the write stream.
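Continuing the stock Hadoop API example from step 401, writing and closing the stream looks as follows; the path is again a placeholder:

```java
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAndCloseExample {
    // fs is an open handle as obtained in the step-401 sketch.
    static void writeAll(FileSystem fs, byte[] payload) throws Exception {
        try (FSDataOutputStream out = fs.create(new Path("/demo/blocks.bin"))) {
            out.write(payload); // packets drain through the data queue and pipeline
        } // close() flushes the last packets, waits for the final acks, and completes the file
    }
}
```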
As a specific application of the data scheduling method, an embodiment of the present invention provides a data node, shown in FIG. 5, including:
a receiving module 501, configured to receive a write operation instruction sent by the client, where the write operation instruction includes the data to be written;
a write circuit 502, configured to write the data to be written into the first distributed storage subsystem; and
a sending module 503, configured to send a notification message to the name node, send a response message to the client, and send an update request to the first backup data node, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written.
The attribute information of the data to be written includes the name and the size of the data to be written.
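A skeleton mapping these modules onto hypothetical Java interfaces; the module numbers follow FIG. 5, and everything else is assumed for illustration:

```java
public class PrimaryNodeSkeleton {
    // Module 502: writes the payload and reports where it landed.
    interface WriteCircuit {
        String write(byte[] data); // returns the storage location
    }

    // Module 503: all outbound messaging.
    interface Sender {
        void notifyNameNode(String name);
        void ackClient(String name);
        void updateBackup(String location, String name, long size);
    }

    record WriteOp(String name, byte[] data) {}

    private final WriteCircuit writeCircuit;
    private final Sender sender;

    PrimaryNodeSkeleton(WriteCircuit writeCircuit, Sender sender) {
        this.writeCircuit = writeCircuit;
        this.sender = sender;
    }

    // Module 501's role: accept the write operation instruction and drive the rest.
    public void onWriteOp(WriteOp op) {
        String location = writeCircuit.write(op.data());
        sender.notifyNameNode(op.name());
        sender.ackClient(op.name());
        sender.updateBackup(location, op.name(), op.data().length);
    }
}
```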
An embodiment of the present invention further provides a data node, as shown in FIG. 6, including:
a receiving module 601, configured to receive an update request, where the update request includes the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written;
a processing module 602, configured to search the first distributed storage subsystem for the data to be written according to the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written;
a storage module 603, configured to save the attribute information of the data to be written when the data to be written is found; and
a sending module 604, configured to send a notification message to the name node and a response message to the client.
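The modules of FIG. 6 admit the same kind of skeleton; the handling logic itself was sketched after step 308 above, so only the module decomposition is shown here, with hypothetical names throughout:

```java
public class BackupNodeSkeleton {
    record Update(String storageLocation, String name, long size) {}

    interface Receiver  { Update receive(); }                        // module 601
    interface Processor { boolean find(Update req); }                // module 602
    interface Store     { void saveAttributes(String n, long s); }   // module 603
    interface Sender    { void notifyNameNode(); void ackClient(); } // module 604

    void handleOne(Receiver rx, Processor proc, Store store, Sender tx) {
        Update req = rx.receive();
        if (proc.find(req)) { // lookup in the shared subsystem
            store.saveAttributes(req.name(), req.size());
            tx.notifyNameNode();
            tx.ackClient();
        }
    }
}
```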
From the description of the foregoing implementations, a person skilled in the art will clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or certainly by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention.

Claims (9)

  1. A file storage system, wherein the server side of the file storage system comprises:
    a name node, a primary data node, and at least one backup data node;
    wherein the primary data node and the at least one backup data node share a first distributed storage subsystem, and the first distributed storage subsystem comprises one primary storage device and at least one backup storage device;
    the primary data node is configured to: receive a write operation instruction sent by a client, where the write operation instruction includes data to be written; write the data to be written into the first distributed storage subsystem; and send an update request to a first backup data node, where the update request includes a storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written; and
    the backup data node is configured to: receive the update request; search the first distributed storage subsystem for the data to be written according to the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written that are carried in the update request; and, when the data to be written is found, save the attribute information of the data to be written.
  2. The file storage system according to claim 1, wherein the operation permission of the primary data node on the first distributed storage subsystem allows both read and write operations, and the operation permission of the backup data node on the first distributed storage subsystem allows read operations only.
  3. A data scheduling method, applied to the file storage system according to claim 1 or 2, the method comprising:
    receiving, by a primary data node, a write operation instruction sent by a client, where the write operation instruction includes data to be written;
    writing, by the primary data node, the data to be written into a first distributed storage subsystem; and
    sending, by the primary data node, an update request to a first backup data node, where the update request includes a storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.
  4. The method according to claim 3, wherein the attribute information of the data to be written includes a name and a size of the data to be written.
  5. The method according to claim 3 or 4, further comprising:
    when a first data node fails, restoring the system files of the failed data node to obtain a restored data node, where the first data node is any one of all the data nodes; and
    mounting the first distributed storage subsystem to the restored data node.
  6. A data scheduling method, applied to the file storage system according to claim 1 or 2, comprising:
    receiving, by a backup data node, an update request, where the update request includes a storage location of data to be written in a first distributed storage subsystem and attribute information of the data to be written;
    searching, by the backup data node, the first distributed storage subsystem for the data to be written according to the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written; and
    when the data to be written is found, saving, by the backup data node, the attribute information of the data to be written.
  7. A data node, comprising:
    a receiving module, configured to receive a write operation instruction sent by a client, where the write operation instruction includes data to be written;
    a write circuit, configured to write the data to be written into a first distributed storage subsystem; and
    a sending module, configured to send an update request to a first backup data node, where the update request includes a storage location of the data to be written in the first distributed storage subsystem and attribute information of the data to be written.
  8. The data node according to claim 7, wherein the attribute information of the data to be written includes a name and a size of the data to be written.
  9. A data node, comprising:
    a receiving module, configured to receive an update request, where the update request includes a storage location of data to be written in a first distributed storage subsystem and attribute information of the data to be written;
    a processing module, configured to search the first distributed storage subsystem for the data to be written according to the storage location of the data to be written in the first distributed storage subsystem and the attribute information of the data to be written; and
    a storage module, configured to save the attribute information of the data to be written when the data to be written is found.
PCT/CN2016/095532 2015-12-11 2016-08-16 File storage system, data scheduling method, and data node WO2017096942A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510922155.5 2015-12-11
CN201510922155.5A CN106873902B (en) 2015-12-11 2015-12-11 File storage system, data scheduling method and data node

Publications (1)

Publication Number Publication Date
WO2017096942A1 true WO2017096942A1 (en) 2017-06-15

Family

ID=59012648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/095532 WO2017096942A1 (en) 2015-12-11 2016-08-16 File storage system, data scheduling method, and data node

Country Status (2)

Country Link
CN (1) CN106873902B (en)
WO (1) WO2017096942A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110337633A (en) * 2017-06-30 2019-10-15 华为技术有限公司 A kind of date storage method and equipment
CN109358813B (en) * 2018-10-10 2022-03-04 郑州云海信息技术有限公司 Capacity expansion method and device for distributed storage system
CN111881107B (en) * 2020-08-05 2022-09-06 北京计算机技术及应用研究所 Distributed storage method supporting mounting of multi-file system
CN114024979A (en) * 2021-10-25 2022-02-08 深圳市高德信通信股份有限公司 Distributed edge computing data storage system
CN115826879B (en) * 2023-02-14 2023-05-23 北京派网软件有限公司 Data updating method for storage nodes in distributed storage system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070156964A1 (en) * 2005-12-30 2007-07-05 Sistla Krishnakanth V Home node aware replacement policy for caches in a multiprocessor system
CN101741911A (en) * 2009-12-18 2010-06-16 中兴通讯股份有限公司 Multi-copy collaboration-based write operation method, system and node
CN104598568A (en) * 2015-01-12 2015-05-06 浪潮电子信息产业股份有限公司 Efficient and low-power-consumption offline storage system and method
CN104917788A (en) * 2014-03-11 2015-09-16 中国移动通信集团公司 Data storage method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011157156A2 (en) * 2011-06-01 2011-12-22 华为技术有限公司 Operation method and device for data storage system
CN103853612A (en) * 2012-12-04 2014-06-11 中山大学深圳研究院 Method for reading data based on digital family content under distributed storage
CN103714014B (en) * 2013-11-18 2016-12-07 华为技术有限公司 Process data cached method and device

Also Published As

Publication number Publication date
CN106873902B (en) 2020-04-28
CN106873902A (en) 2017-06-20


Legal Events

Code — Title/Description
121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16872138; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — Ep: PCT application non-entry in European phase (Ref document number: 16872138; Country of ref document: EP; Kind code of ref document: A1)