WO2017028721A1 - Data update method and device in distributed file system - Google Patents

Data update method and device in distributed file system

Info

Publication number
WO2017028721A1
WO2017028721A1 PCT/CN2016/094322 CN2016094322W WO2017028721A1 WO 2017028721 A1 WO2017028721 A1 WO 2017028721A1 CN 2016094322 W CN2016094322 W CN 2016094322W WO 2017028721 A1 WO2017028721 A1 WO 2017028721A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
update
node
index
information
Prior art date
Application number
PCT/CN2016/094322
Other languages
French (fr)
Chinese (zh)
Inventor
段兵
Original Assignee
阿里巴巴集团控股有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2017028721A1 publication Critical patent/WO2017028721A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a data update method and apparatus for a distributed file system.
  • with M data nodes and N check nodes, a prior-art Erasure Code update reads M-1 pieces of data from disk and writes M+N pieces of data to disk;
  • the main purpose of the present application is to provide a data update method and apparatus for a distributed file system, which overcomes the problems of high cost, low performance, and difficulty in ensuring consistency of data distribution in the distributed file system existing in the prior art.
  • a data update method for a distributed file system is provided, where the distributed file system includes a plurality of data nodes and at least one check node, the method comprising: obtaining update data, writing the update data to the tail of a current data node, and updating corresponding index information in the data node; and writing the update data to the tail of the check node, and updating corresponding index information in the check node.
  • the updating the corresponding index information in the data node includes: updating the end position of the occupied storage space of the data node; and finding the index of the original data corresponding to the update data and modifying that index into the index of the update data.
  • the method further includes: sending the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
  • the updating the corresponding index information in the check node includes: updating the end position of the occupied storage space in the check node.
  • the method further includes: querying, according to the identification information of the data node, the index information corresponding to the data node; and finding, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modifying that index into the index of the update data.
  • a data update apparatus of a distributed file system is further provided, where the distributed file system includes a plurality of data nodes and at least one check node, and the apparatus includes:
  • an obtaining module, configured to obtain update data; a data node update module, configured to write the update data to the tail of a current data node and update corresponding index information in the data node; and a check node update module, configured to write the update data to the tail of the check node and update corresponding index information in the check node.
  • the data node update module is further configured to: update the end position of the occupied storage space of the data node; and find the index of the original data corresponding to the update data and modify that index into the index of the update data.
  • the apparatus further includes: a data sending module, configured to send the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
  • the check node update module is further configured to update the end position of the occupied storage space in the check node.
  • the check node update module is further configured to: query, according to the identification information of the data node, the index information corresponding to the data node; and find, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modify that index into the index of the update data.
  • according to the above technical solution, by writing the update data to the tails of the data node and the check node, the defects of applying the Erasure Code update algorithm to data updates are resolved at the cost of disk space, and the data update process of the distributed file system achieves high performance, no consumption of computing resources, and data consistency.
  • FIG. 1 is an architectural diagram of a distributed file system in accordance with an embodiment of the present application
  • FIG. 2 is a schematic diagram of storage management of a data node according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of storage management of a check node according to an embodiment of the present application.
  • FIG. 4 is a flowchart of a data update method of a distributed file system according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of data update according to an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a data update apparatus of a distributed file system according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a data updating apparatus of a distributed file system according to another embodiment of the present application.
  • a data update method for a distributed file system is provided according to an embodiment of the present application.
  • a distributed file system is composed of at least one master node (or a control node), at least one client, and multiple storage nodes.
  • the devices communicate with each other through a network.
  • Each node is an independent physical machine.
  • the main control node is mainly used to locate storage nodes, and each storage node is responsible for managing one disk.
  • the storage nodes are of two types: storage nodes that store original data and storage nodes that store check data:
  • a storage node that stores original data, referred to as a data node, is used to store the original data.
  • the original data is divided into data blocks of a predetermined size (for example, 1M), and each block has a number (ID) that is unique within the current data node.
  • the functions of the data node mainly include: (1) receiving and forwarding network data from the master node, other data nodes and/or clients; (2) managing the data and index information on the disk.
  • the index information of the data node specifically includes: a disk ID, an encoding start position and an encoding end position, an ending position of the occupied storage space (incremented sequentially), and an index of all the data blocks.
  • each data block 21 corresponds to an index 22, and the index of the data block includes: a data block identifier (ID), a data block start position, and a data block end position.
  • the location of the data file (data block) in the data node can be quickly located by the index information of the data block.
  • a storage node that stores check data, referred to as a check node, stores check data 31 generated by the Erasure Code algorithm and update data 32 produced by file updates, where the update data is written to the tail of the check node.
  • the check node and the data node differ not only in the type of data they store but also in the type of index information they store.
  • the check node usually records, at the initial position 33 of the index data, the encoding start position, the encoding end position, and the end position of the occupied storage space (incremented sequentially) of the check node, and then stores the index information of all the other data nodes at position 44 after the initial position 33.
  • the check node records the disk identifier, IP address, and port information (disk information) of each data node, and the index of the check data block corresponding to each data block of that data node.
  • the index of the check data block may include: a check data block identifier (ID), a check data block start position, and a check data block end position.
  • FIG. 4 illustrates a flow chart of a data update method of a distributed file system according to an embodiment of the present application.
  • Step S402: acquiring update data, writing the update data to the tail of the current data node, and updating corresponding index information in the data node;
  • specifically: the start position of the update data is set to N, its end position is set to N plus the update data length, and the end position of the occupied storage space is updated to N plus the update data length, where N is the end position of the occupied storage space before the update.
  • the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node are sent to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
  • Step S404: writing the update data to the tail of the check node, and updating the corresponding index information in the check node.
  • after receiving the update data, the check node appends the data to its own tail and then updates its index information, specifically: querying the index information corresponding to the data node according to the identification information of the data node; and finding, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modifying that index into the index of the update data.
  • in more detail, the end position of the occupied storage space is updated to N plus the new data length; then the index information related to the data node is queried according to the identification information of the data node, the start position of the update data is set to N, and its end position is set to N plus the update data length.
  • the distributed file system may be TFS (Taobao File System); the TFS system manages the storage of data files in units of data blocks, each of which has globally unique identification information (ID).
  • the TFS system supports update operations on files.
  • when the TFS system stores a file, it establishes index information for the file. The index information is not Erasure Code encoded and is copied directly to the data node (check node) where the check block is located.
  • in the encoding process, only the real data of each data block is Erasure Code encoded; the index information of each data block is not encoded.
  • the index information of each data block is copied to each check block (check node) and stored in the index file of the check block.
  • the distributed file system is assumed to include three data nodes and one check node (only two data nodes are shown in FIG. 5 for simplicity); the encoding start position is 0, the encoding end position is M, and the end position of the occupied storage space is N. It is assumed that data block No. 2 of data node 1 is to be updated and that the update data (new data) has been acquired in advance.
  • the update procedure for data block No. 2 of data node 1 (i.e., the original data) is as follows:
  • the data node 1 appends the new data to its own tail and updates the index information related to data block No. 2.
  • the index of data block 2 is index 2. Before the update, index 2 points to data block 2 (shown as the dotted line segment in the figure); after the update, index 2 must point to the update data, that is, the start position of index 2 is updated to N, its end position is updated to N + new data length (shown as the solid line segment in the figure), and the end position of the occupied storage space is updated to N + new data length;
  • the data node 1 transmits the new data, the identification information of the original data, and the related information of the data node 1 (disk ID, IP address, port number PORT) to the check node 1 through the network;
  • on the check node 1, before the update, index 2 points to data block 2 (shown as the dotted line segment in the figure). After receiving the new data, the check node 1 first appends the new data to its own tail and updates the end position of the occupied storage space to N + new data length; it then queries the index information related to the data node 1 by the disk ID, IP address, and port number, and updates the index information of data block 2 so that index 2 points to the update data (shown as the solid line segment in the figure).
  • a data update apparatus of a distributed file system where the distributed file system includes a plurality of data nodes and at least one check node.
  • FIG. 6 is a structural block diagram of a data update apparatus of a distributed file system according to an embodiment of the present application. As shown in FIG. 6, the apparatus includes:
  • the obtaining module 610 is configured to obtain update data.
  • a data node update module 620 configured to write the update data to a tail of a current data node, and update corresponding index information in the data node;
  • the data node update module 620 updates the end position of the occupied storage space of the data node, finds the index of the original data corresponding to the update data, and modifies that index into the index of the update data. Specifically, the start position of the update data is set to N, its end position is set to N plus the update data length, and the end position of the occupied storage space is updated to N plus the update data length, where N is the end position of the occupied storage space before the update.
  • the check node update module 630 is configured to write the update data to the tail of the check node, and update the corresponding index information in the check node.
  • the check node update module 630 updates the end position of the occupied storage space in the check node; queries the index information corresponding to the data node according to the identification information of the data node; and finds, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modifies that index into the index of the update data.
  • specifically, the end position of the occupied storage space is updated to N plus the new data length; the index information related to the data node is queried according to the identification information of the data node; and the start position of the update data is set to N and its end position is set to N plus the update data length.
  • FIG. 7 is a structural block diagram of a data updating apparatus of a distributed file system according to another embodiment of the present application, where the apparatus includes: an obtaining module 710, a data node updating module 720, a check node updating module 730, Data sending module 740.
  • the obtaining module 710, the data node updating module 720, and the check node updating module 730 are similar to the obtaining module 610, the data node updating module 620, and the check node updating module 630, respectively, and are not described herein.
  • the data sending module 740 is configured to send the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
  • by writing the update data to the tails of the data node and the check node, disk space is traded to resolve the defects of applying the Erasure Code update algorithm to data updates, so that the data update process of the distributed file system achieves high performance, no consumption of computing resources, and data consistency.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data update method and device in a distributed file system. The method comprises: acquiring update data, writing the update data to a tail portion of a current data node, and updating corresponding index information in the data node (S402); and writing the update data to a tail portion of a check node, and updating corresponding index information in the check node (S404). The present invention achieves the effects of high performance, less computational resource occupation and data consistency for a data update process in a distributed file system.
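
The two steps named in the abstract (S402 and S404) can be illustrated with a minimal sketch, offered only as one possible reading of the method. The names `Node`, `BlockIndex`, and `update_block`, the in-memory `bytearray` standing in for a disk, and the dictionary-based index are all assumptions made for illustration; the application does not prescribe any of them.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple


@dataclass
class BlockIndex:
    """Index entry for one block: where its current bytes are stored."""
    block_id: int
    start: int
    end: int


@dataclass
class Node:
    """Hypothetical storage node: an append-only buffer plus index information."""
    storage: bytearray = field(default_factory=bytearray)
    index: Dict[Hashable, BlockIndex] = field(default_factory=dict)
    used_end: int = field(init=False, default=0)  # end of occupied storage space

    def __post_init__(self) -> None:
        self.used_end = len(self.storage)

    def append(self, payload: bytes) -> Tuple[int, int]:
        """Write payload at the tail; return its (start, end) positions."""
        start = self.used_end                 # N, the pre-update end position
        self.storage.extend(payload)
        self.used_end = start + len(payload)  # N plus update data length
        return start, self.used_end


def update_block(data_node: Node, check_node: Node,
                 data_node_id: Tuple[str, str, int],
                 block_id: int, new_data: bytes) -> None:
    """Append-based update; data_node_id is (disk id, IP address, port)."""
    # S402: append to the data node tail and repoint the block's index.
    start, end = data_node.append(new_data)
    data_node.index[block_id] = BlockIndex(block_id, start, end)
    # S404: append to the check node tail and repoint the index entry
    # that the check node keeps for this data node's block.
    c_start, c_end = check_node.append(new_data)
    check_node.index[(data_node_id, block_id)] = BlockIndex(block_id, c_start, c_end)
```

In this sketch the original bytes are never overwritten and no check data is recomputed; only the index entries and the end position of the occupied space change, which is the trade of disk space for update cost that the abstract describes.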

Description

Data update method and device for distributed file system
The present application claims priority to Chinese Patent Application No. 201510512344.5, filed on August 19, 2015 and entitled "Data update method and device for distributed file system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data update method and apparatus for a distributed file system.
Background
With the continuous development of the Internet, the volume of data on the Internet has grown explosively, and storage costs have risen with it. Many distributed file systems encode data with an Erasure Code algorithm to reduce cost. However, applying the Erasure Code update algorithm in a distributed file system raises many problems that prevent its use in production environments, yet business implementations cannot do without file update operations, so solving the problems caused by the Erasure Code update algorithm has become increasingly urgent.
Assume that the Erasure Code encoding parameters are M data nodes and N check nodes. The disadvantages of the prior-art algorithm are then:
(1) Low performance:
a. Many disk reads and writes: M-1 pieces of data are read from disk, and M+N pieces of data are written to disk;
b. Large volume of network traffic: M-1 pieces of data are received from the network, and M-1+N pieces of data are sent to the network;
(2) Serious waste of computing resources: the CPU must perform computation over M pieces of data;
(3) Data cannot be recovered if the update partially fails; for example, this problem arises if a write fails while data is being written to disk after the computation completes.
In summary, applying the Erasure Code update algorithm to data updates in a distributed file system leads to high cost, low performance, and difficulty in guaranteeing consistency. It is therefore necessary to propose improved technical means to solve these problems.
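
The per-update costs listed above can be tabulated with a small helper. This is only an arithmetic restatement of the counts given in the background; the function name `prior_art_update_cost` and the dictionary keys are hypothetical.

```python
def prior_art_update_cost(m: int, n: int) -> dict:
    """Per-update costs of the prior-art Erasure Code update, as stated in
    the background, for m data nodes and n check nodes."""
    if m < 1 or n < 1:
        raise ValueError("m and n must both be at least 1")
    return {
        "disk_reads": m - 1,         # pieces of data read from disk
        "disk_writes": m + n,        # pieces of data written to disk
        "network_received": m - 1,   # pieces received from the network
        "network_sent": m - 1 + n,   # pieces sent to the network
        "pieces_computed": m,        # pieces the CPU must process
    }
```

For instance, with M = 3 and N = 1 the helper reports 2 disk reads, 4 disk writes, 2 pieces received, 3 pieces sent, and 3 pieces computed for a single update.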
Summary of the Invention
The main purpose of the present application is to provide a data update method and apparatus for a distributed file system, so as to overcome the problems of high cost, low performance, and difficulty in guaranteeing consistency that arise when data is updated in prior-art distributed file systems.
According to an embodiment of the present application, a data update method for a distributed file system is provided, where the distributed file system includes a plurality of data nodes and at least one check node, the method comprising: obtaining update data, writing the update data to the tail of a current data node, and updating corresponding index information in the data node; and writing the update data to the tail of the check node, and updating corresponding index information in the check node.
The updating the corresponding index information in the data node includes: updating the end position of the occupied storage space of the data node; and finding the index of the original data corresponding to the update data and modifying that index into the index of the update data.
The method further includes: sending the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
The updating the corresponding index information in the check node includes: updating the end position of the occupied storage space in the check node.
The method further includes: querying, according to the identification information of the data node, the index information corresponding to the data node; and finding, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modifying that index into the index of the update data.
According to an embodiment of the present application, a data update apparatus for a distributed file system is further provided, where the distributed file system includes a plurality of data nodes and at least one check node, and the apparatus includes:
an obtaining module, configured to obtain update data; a data node update module, configured to write the update data to the tail of a current data node and update corresponding index information in the data node; and a check node update module, configured to write the update data to the tail of the check node and update corresponding index information in the check node.
The data node update module is further configured to: update the end position of the occupied storage space of the data node; and find the index of the original data corresponding to the update data and modify that index into the index of the update data.
The apparatus further includes: a data sending module, configured to send the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: a disk identifier, an IP address, and port information.
The check node update module is further configured to update the end position of the occupied storage space in the check node.
The check node update module is further configured to: query, according to the identification information of the data node, the index information corresponding to the data node; and find, according to the identification information of the original data corresponding to the update data, the index of the original data in the index information of the data node, and modify that index into the index of the update data.
According to the above technical solution of the present application, by writing the update data to the tails of the data node and the check node, the defects of applying the Erasure Code update algorithm to data updates are resolved at the cost of disk space, and the data update process of the distributed file system achieves high performance, no consumption of computing resources, and data consistency.
Brief Description of the Drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is an architectural diagram of a distributed file system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of storage management of a data node according to an embodiment of the present application;
FIG. 3 is a schematic diagram of storage management of a check node according to an embodiment of the present application;
FIG. 4 is a flowchart of a data update method of a distributed file system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of data update according to an embodiment of the present application;
FIG. 6 is a structural block diagram of a data update apparatus of a distributed file system according to an embodiment of the present application;
FIG. 7 is a structural block diagram of a data update apparatus of a distributed file system according to another embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
According to an embodiment of the present application, a data update method for a distributed file system is provided.
The present application is built on a distributed file system. Referring to FIG. 1, a distributed file system is composed of at least one master node (also called a control node), at least one client, and multiple storage nodes; these three types of devices communicate with each other through a network. Each node is an independent physical machine. The master node is mainly used to locate storage nodes, and each storage node is responsible for managing one disk.
The storage nodes are of two types: storage nodes that store original data and storage nodes that store check data.
A storage node that stores original data, referred to as a data node, is used to store the original data. The original data is divided into data blocks of a predetermined size (for example, 1M), and each block has a number (ID) that is unique within the current data node.
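
As a rough illustration of dividing original data into numbered blocks, consider the following sketch. The function name `split_into_blocks` is hypothetical, and two details are assumptions rather than statements of the application: that "1M" means 1 MiB (1024 × 1024 bytes) and that block IDs are assigned sequentially starting from 1.

```python
from typing import Dict


def split_into_blocks(raw: bytes, block_size: int = 1024 * 1024) -> Dict[int, bytes]:
    """Split raw data into fixed-size blocks keyed by a per-node block ID.

    The final block may be shorter than block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: Dict[int, bytes] = {}
    # Assumed numbering: IDs start at 1 and increase in storage order.
    for block_id, offset in enumerate(range(0, len(raw), block_size), start=1):
        blocks[block_id] = raw[offset:offset + block_size]
    return blocks
```

Calling `split_into_blocks(b"abcdefg", 3)` yields three blocks with IDs 1, 2, and 3, the last holding the single remaining byte.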
The functions of a data node mainly include: (1) receiving and forwarding network data from the master node, other data nodes, and/or clients; (2) managing the data and index information on the disk.
In practice, the index information of a data node includes: the disk ID, the encoding start position and encoding end position, the end position of the occupied storage space (which only increases), and the indexes of all data blocks. Referring to FIG. 2, each data block 21 corresponds to one index 22, and the index of a data block includes: the data block identifier (ID), the data block start position, and the data block end position. In one embodiment of the present application, the index information of a data block makes it possible to quickly locate the data file (data block) within the data node.
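One possible in-memory shape for this index information is sketched below. The field names, the `locate` method, and the sample values are hypothetical; the source only lists which pieces of information the index contains, not how they are laid out.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BlockIndex:
    """Index entry for one data block (index 22 in FIG. 2)."""
    block_id: int
    start: int  # data block start position
    end: int    # data block end position


@dataclass
class DataNodeIndex:
    """Index information kept by a data node (field names are assumed)."""
    disk_id: str
    encode_start: int  # encoding start position
    encode_end: int    # encoding end position
    used_end: int      # end position of the occupied storage space
    blocks: Dict[int, BlockIndex] = field(default_factory=dict)

    def locate(self, block_id: int) -> Tuple[int, int]:
        """Return the (start, end) positions of a block on this node."""
        entry = self.blocks[block_id]
        return entry.start, entry.end


index = DataNodeIndex(disk_id="disk-1", encode_start=0, encode_end=300, used_end=300)
index.blocks[2] = BlockIndex(block_id=2, start=100, end=200)
position = index.locate(2)
```

Keeping the per-block entries in a dictionary keyed by block ID is one way to get the fast lookup the text describes; any keyed structure would serve the same purpose.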
Referring to FIG. 3, a storage node that stores check data, referred to as a check node, stores the check data 31 generated by the Erasure Code algorithm as well as the update data 32 produced by file updates; the update data is written to the tail of the check node.
The check node differs from the data node not only in the type of data it stores but also in the type of index information it stores. As shown in FIG. 3, the check node usually records, at the very beginning 33 of the index data, the check node's encoding start position, encoding end position, and the end position of the occupied storage space (which only increases); then, at position 44 following the start position 33, it stores the index information of all the other data nodes. As shown in FIG. 3, the check node records, for each data node, the disk identifier, IP address, and port information (disk information), together with the indexes of the check data blocks corresponding to that data node's data blocks. The index of a check data block may include: the check data block identifier (ID), the check data block start position, and the check data block end position. In practice, the index of a check data block makes it possible to quickly locate the check data block within the check node.
Referring to FIG. 4, FIG. 4 shows a flowchart of a data update method for a distributed file system according to an embodiment of the present application.
Step S402: obtain update data, write the update data to the tail of the current data node, and update the corresponding index information in the data node.
After the update data is written to the tail of the data node, the occupied storage space of the data node changes, so the end position of the occupied storage space must be updated. In addition, the index of the original data must be changed to point to the new location. Specifically: the start position of the update data is set to N, its end position is set to N plus the length of the update data, and the end position of the occupied storage space is set to N plus the length of the update data, where N is the end position of the occupied storage space before the update.
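The data-node side of step S402 can be sketched as below. The `DataNode` class, its in-memory `bytearray` standing in for the disk, and the method names are assumptions for illustration; only the position arithmetic (start = N, end = N + length, occupied end = N + length) comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DataNode:
    """Hypothetical in-memory stand-in for one data node's disk and index."""
    storage: bytearray = field(default_factory=bytearray)
    used_end: int = 0  # end position of the occupied storage space
    index: Dict[int, Tuple[int, int]] = field(default_factory=dict)  # id -> (start, end)

    def _append(self, block_id: int, data: bytes) -> None:
        n = self.used_end                       # N: occupied end before the write
        self.storage += data                    # write at the tail
        self.index[block_id] = (n, n + len(data))
        self.used_end = n + len(data)

    def write_block(self, block_id: int, data: bytes) -> None:
        """Initial write of a block."""
        self._append(block_id, data)

    def update_block(self, block_id: int, new_data: bytes) -> None:
        """Append the update and repoint the existing index entry at it."""
        if block_id not in self.index:
            raise KeyError(f"no original data for block {block_id}")
        self._append(block_id, new_data)

    def read_block(self, block_id: int) -> bytes:
        start, end = self.index[block_id]
        return bytes(self.storage[start:end])


node = DataNode()
node.write_block(1, b"aaaa")
node.write_block(2, b"bbbb")
node.update_block(2, b"BBBBBB")  # N is 8 at this point
```

In this sketch the old bytes of block 2 stay on disk at positions 4 to 8; only the index entry moves, which is what makes the update a pure append.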
Then, the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node are sent to the check node, where the identification information of the data node includes: disk identifier, IP address, and port information.
Step S404: write the update data to the tail of the check node, and update the corresponding index information in the check node.
After receiving the update data, the check node appends the data to its own tail and then updates its index information. Specifically: it looks up the index information corresponding to the data node according to the data node's identification information; then, according to the identification information of the original data corresponding to the update data, it finds the index of the original data within that data node's index information and changes it into the index of the update data.
First, the end position of the occupied storage space is updated to N plus the length of the new data; then the index information related to the data node is looked up according to the data node's identification information, and the start position of the update data is set to N and its end position to N plus the length of the update data.
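The check-node side of step S404 can be sketched in the same spirit. `UpdateMessage`, `CheckNode`, and `apply_update` are hypothetical names, and the sketch assumes that N here is the check node's own occupied end position before the update; the network transport between the nodes is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

NodeId = Tuple[str, str, int]  # (disk ID, IP address, port)


@dataclass
class UpdateMessage:
    """What the data node sends to the check node (names are assumed)."""
    node_id: NodeId         # identification information of the data node
    original_block_id: int  # identification of the original data
    data: bytes             # the update data


@dataclass
class CheckNode:
    """Hypothetical in-memory stand-in for a check node."""
    storage: bytearray = field(default_factory=bytearray)
    used_end: int = 0
    # Per data node: block ID -> (start, end) within this check node.
    index: Dict[NodeId, Dict[int, Tuple[int, int]]] = field(default_factory=dict)

    def apply_update(self, msg: UpdateMessage) -> Tuple[int, int]:
        # Resolve the index entry first so a bad message changes nothing.
        blocks = self.index[msg.node_id]
        if msg.original_block_id not in blocks:
            raise KeyError(f"unknown block {msg.original_block_id}")
        n = self.used_end                   # assumed meaning of N
        self.storage += msg.data            # append to the tail
        self.used_end = n + len(msg.data)   # new occupied end position
        blocks[msg.original_block_id] = (n, n + len(msg.data))
        return blocks[msg.original_block_id]


node_1: NodeId = ("disk-1", "10.0.0.1", 9000)
check = CheckNode(storage=bytearray(b"p" * 10), used_end=10,
                  index={node_1: {2: (0, 10)}})
entry = check.apply_update(UpdateMessage(node_1, 2, b"new!"))
```

Keying the index by the (disk ID, IP address, port) triple mirrors the lookup the text describes, in which the check node first finds the sending data node's index information and only then the entry for the original data.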
An example of data update in a distributed file system is described in detail below with reference to FIG. 5.
In one embodiment of the present application, the distributed file system may be TFS (Taobao File System). TFS manages the storage of data files in units of data blocks (Block), and each data block has a globally unique identifier (ID). When small files are stored, multiple small files can be placed in the same data block, and TFS supports update operations on files. When storing a file, TFS builds index information for that file. The index information is not Erasure Code encoded; it is copied directly to the data node where the check block resides (the check node).
During encoding, only the actual data of each data block is Erasure Code encoded; the index information of each data block is not encoded. The index information of each data block is copied to every check block (check node) and stored in the check block's index file.
To illustrate the update flow simply, assume the distributed file system includes 3 data nodes and 1 check node (for brevity, FIG. 5 shows only 2 data nodes), with an encoding start position of 0, an encoding end position of M, and an occupied-storage end position of N. Suppose the data block numbered 2 on data node 1 is to be updated and the update data (new data) has been obtained in advance. Referring to FIG. 5, the update flow for data block 2 of data node 1 (that is, the original data) is as follows:
(1) Data node 1 appends the new data to its own tail and updates the index information related to data block 2. In data node 1, the index of data block 2 is index 2. Before the update, index 2 points to data block 2 (shown by the dashed line in the figure); after the update, index 2 must point to the update data, that is, the start position of index 2 is updated to N and its end position to N + new data length (shown by the solid line in the figure), and the end position of the occupied storage space is updated to N + new data length;
(2) Data node 1 sends the new data, the identification information of the original data, and the information about data node 1 (disk ID, IP address, port number PORT) to check node 1 over the network;
(3) In check node 1, before the update, index 2 points to data block 2 (shown by the dashed line in the figure). After receiving the new data, check node 1 first appends the new data to its own tail and updates the end position of the occupied storage space to N + new data length; it then uses the disk ID, IP address, and port number to look up the index information related to data node 1 and updates the index information of data block 2 so that index 2 points to the update data (shown by the solid line in the figure).
The embodiments of the present application can achieve the following technical effects:
(1) High performance:
The number of disk reads and writes is only 2/(M-1+M+N) = 2/(3-1+3+1) = 2/6 = 33.33% of that of the original scheme, with M = 3 and N = 1 matching the three data nodes and one check node of the example above;
The network traffic is likewise only 2/(M-1+M+N) = 2/(3-1+3+1) = 2/6 = 33.33% of that of the original scheme;
(2) No computing resources are consumed; data only needs to be appended at the end of the disk;
(3) If the update partially fails, the data seen by the user remains consistent.
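The 33.33% figure quoted under effect (1) can be checked with a few lines of arithmetic. The function name is hypothetical, and the sketch simply evaluates the expression 2/(M-1+M+N) as given; the source does not break the denominator down into individual operations, so no such breakdown is assumed here.

```python
from fractions import Fraction


def append_update_cost_ratio(m: int, n: int) -> Fraction:
    """Evaluate 2 / (M - 1 + M + N) exactly, as stated in the text."""
    baseline_ops = (m - 1) + m + n  # denominator quoted for the original scheme
    return Fraction(2, baseline_ops)


ratio = append_update_cost_ratio(3, 1)   # M = 3, N = 1
percentage = f"{float(ratio):.2%}"       # formatted to two decimal places
```

For M = 3 and N = 1 the exact ratio is 1/3, which rounds to the 33.33% stated above.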
An embodiment of the present application further provides a data update apparatus for a distributed file system, where the distributed file system includes multiple data nodes and at least one check node.
FIG. 6 is a structural block diagram of a data update apparatus for a distributed file system according to an embodiment of the present application. As shown in FIG. 6, the apparatus includes:
an obtaining module 610, configured to obtain update data;
a data node update module 620, configured to write the update data to the tail of the current data node and update the corresponding index information in the data node;
Further, the data node update module 620 updates the end position of the occupied storage space of the data node, finds the index of the original data corresponding to the update data, and changes that index into the index of the update data. Specifically, the start position of the update data is set to N, its end position is set to N plus the length of the update data, and the end position of the occupied storage space is set to N plus the length of the update data, where N is the end position of the occupied storage space before the update.
a check node update module 630, configured to write the update data to the tail of the check node and update the corresponding index information in the check node.
Further, the check node update module 630 updates the end position of the occupied storage space in the check node; looks up the index information corresponding to the data node according to the data node's identification information; and, according to the identification information of the original data corresponding to the update data, finds the index of the original data within that data node's index information and changes it into the index of the update data. Specifically, the end position of the occupied storage space is updated to N plus the length of the new data; the index information related to the data node is looked up according to the data node's identification information; and the start position of the update data is set to N and its end position to N plus the length of the update data.
Referring to FIG. 7, FIG. 7 is a structural block diagram of a data update apparatus for a distributed file system according to another embodiment of the present application. The apparatus includes: an obtaining module 710, a data node update module 720, a check node update module 730, and a data sending module 740.
The obtaining module 710, the data node update module 720, and the check node update module 730 are similar to the obtaining module 610, the data node update module 620, and the check node update module 630 shown in FIG. 6, respectively, and are not described again.
As shown in FIG. 7, the data sending module 740 is configured to send the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, where the identification information of the data node includes: disk identifier, IP address, and port information.
The operational steps of the method of the present invention correspond to the structural features of the apparatus; the two can be read with reference to each other and are not described one by one.
In summary, by writing update data to the tails of the data node and the check node, the above technical solution of the present application trades disk space to overcome the drawbacks of updating data with the Erasure Code update algorithm, and effectively achieves high performance, no consumption of computing resources, and data consistency during data updates in a distributed file system.
Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include, among computer-readable media, non-persistent memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above describes only embodiments of the present application and is not intended to limit the present application. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

  1. A data update method for a distributed file system, the distributed file system comprising multiple data nodes and at least one check node, wherein the method comprises:
    obtaining update data, writing the update data to the tail of the current data node, and updating the corresponding index information in the data node;
    writing the update data to the tail of the check node, and updating the corresponding index information in the check node.
  2. The method according to claim 1, wherein updating the corresponding index information in the data node comprises:
    updating the end position of the occupied storage space of the data node;
    finding the index of the original data corresponding to the update data, and changing that index into the index of the update data.
  3. The method according to claim 1, further comprising:
    sending the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, wherein the identification information of the data node comprises: disk identifier, IP address, and port information.
  4. The method according to claim 3, wherein updating the corresponding index information in the check node comprises:
    updating the end position of the occupied storage space in the check node.
  5. The method according to claim 4, further comprising:
    looking up the index information corresponding to the data node according to the identification information of the data node;
    according to the identification information of the original data corresponding to the update data, finding the index of the original data in the index information of the data node, and changing that index into the index of the update data.
  6. A data update apparatus for a distributed file system, the distributed file system comprising multiple data nodes and at least one check node, wherein the apparatus comprises:
    an obtaining module, configured to obtain update data;
    a data node update module, configured to write the update data to the tail of the current data node and update the corresponding index information in the data node;
    a check node update module, configured to write the update data to the tail of the check node and update the corresponding index information in the check node.
  7. The apparatus according to claim 6, wherein the data node update module is further configured to update the end position of the occupied storage space of the data node, find the index of the original data corresponding to the update data, and change that index into the index of the update data.
  8. The apparatus according to claim 6, further comprising:
    a data sending module, configured to send the update data, the identification information of the original data corresponding to the update data, and the identification information of the current data node to the check node, wherein the identification information of the data node comprises: disk identifier, IP address, and port information.
  9. The apparatus according to claim 8, wherein the check node update module is further configured to update the end position of the occupied storage space in the check node.
  10. The apparatus according to claim 6, wherein the check node update module is further configured to look up the index information corresponding to the data node according to the identification information of the data node, and, according to the identification information of the original data corresponding to the update data, find the index of the original data in the index information of the data node and change that index into the index of the update data.
PCT/CN2016/094322 2015-08-19 2016-08-10 Data update method and device in distributed file system WO2017028721A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510512344.5A CN106469172B (en) 2015-08-19 2015-08-19 The data-updating method and device of distributed file system
CN201510512344.5 2015-08-19

Publications (1)

Publication Number Publication Date
WO2017028721A1 true WO2017028721A1 (en) 2017-02-23

Family

ID=58050718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/094322 WO2017028721A1 (en) 2015-08-19 2016-08-10 Data update method and device in distributed file system

Country Status (2)

Country Link
CN (1) CN106469172B (en)
WO (1) WO2017028721A1 (en)

Also Published As

Publication number Publication date
CN106469172B (en) 2019-07-23
CN106469172A (en) 2017-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16836585

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16836585

Country of ref document: EP

Kind code of ref document: A1