WO2023098048A1 - Method and apparatus for expanding erasure code storage system - Google Patents

Method and apparatus for expanding erasure code storage system

Info

Publication number
WO2023098048A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
nodes
data
extended
stripe
Prior art date
Application number
PCT/CN2022/101302
Other languages
French (fr)
Chinese (zh)
Inventor
沈志荣
杜知城
范瑞彬
张开翔
李辉忠
李成博
Original Assignee
深圳前海微众银行股份有限公司
厦门大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司, 厦门大学
Publication of WO2023098048A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614: Improving the reliability of storage systems
    • G06F3/0619: Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • V = LCM(K, K+D+1)(K+D)(K+1)/K
  • the present invention provides a device for expanding an erasure code storage system, the device comprising:
  • LCM() denotes the function that returns the least common multiple
  • K denotes the number of nodes storing data blocks on each stripe before expansion
  • D denotes the number of newly added nodes.
  • Fig. 10 is a schematic diagram of the result graph of the test in different bandwidth impact experiments provided by the embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a test result diagram of an experiment on the influence of different numbers of newly added nodes provided by an embodiment of the present invention.
  • the spatial distribution scheme for placing the data blocks and check blocks of the same stripe on K+M different nodes is set as follows: the first K+1 nodes store one check block and K data blocks, with the check block positions arranged diagonally across K+1 stripes; the remaining check blocks (all except that one check block) are stored on the last M-1 nodes.
  • based on the spatial location distribution information, determine the first number of nodes, which store data blocks on each stripe, and the second number of nodes, which store check blocks on each stripe; add the first number of nodes and the number of newly added nodes to obtain the third number of nodes, and use the third number of nodes as the number of data blocks stored on each stripe after expansion; and use the second number of nodes as the number of check blocks stored on each stripe after expansion, so as to determine the extended node information on each stripe.
  • the foregoing process can be expressed as: for the i-th basic stripe undergoing logical position replacement, swap the first check block of that stripe with the i-th data block of the corresponding node's adjustment stripe; after the swap, the above data block migration algorithm is also executed.
  • the value range of i is greater than 0 and less than D.
  • the expansion time when the network bandwidth changes from 1 Gb/s to 2 Gb/s can be measured.
  • the test results are shown in FIG. 10 .
  • the method provided by the embodiment of the present invention requires the least extension traffic, and improves the parallelism of transmission compared with the other two mechanisms.
  • the present invention reduces the average by 49.8% and 58.9%.
  • when the network bandwidth increases to 2 Gb/s, the average reduction is 50.8% and 58.8%, respectively.
  • a traffic simulation test under a general configuration may be performed.
  • This test evaluates the traffic of successive expansions under different expansion mechanisms, considers the two cases RS(6,3) and RS(10,4), and sets the parameter d to 2.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions
  • the device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Error Detection And Correction (AREA)

Abstract

Disclosed in the present invention are a method and apparatus for expanding an erasure code storage system. The method comprises: determining data in a storage system, encoding the data, dispersedly storing the data in nodes, and obtaining spatial position distribution information of the nodes; determining the number of newly added nodes on each stripe on the basis of expansion demand information, and determining expansion node information on each stripe on the basis of the number of newly added nodes and the spatial position distribution information, wherein each stripe comprises a data block and a check block which have an encoding relationship; determining an expansion group on the basis of the expansion node information and the rule of the least common multiple, and splitting the expansion group, so as to obtain a target group comprising a plurality of selected stripes; and executing an expansion algorithm on the target group, so as to obtain a corresponding target expanded group, wherein the target expanded group comprises an expanded data block and an expanded check block. On the basis of the method, the expansion efficiency of an erasure code storage system can be improved.

Description

A method and apparatus for expanding an erasure code storage system
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111459202.9, entitled "A method and apparatus for expanding an erasure code storage system", filed with the China Patent Office on December 2, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a method and apparatus for expanding an erasure code storage system.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. The security and real-time requirements of the financial industry, however, also place higher demands on these technologies.
At present, storage systems are deployed across large numbers of storage nodes and form the backbone that supports upper-layer applications such as information retrieval, machine learning, and video streaming. To keep stored data reliable, storage systems commonly rely on replication and erasure coding. Both store additional redundancy in advance so that the system can use it to recover lost data. Compared with replication, erasure coding achieves higher data reliability at the same storage overhead.
Moreover, as data keeps growing, higher demands are placed on the scalability of storage systems. Specifically, expanding storage requires the storage system to perform two operations: data relocation and check block update. In the solutions provided in the prior art, however, expanding the storage system inevitably triggers a large amount of data transmission during data relocation and check block update, which leads to poor transmission parallelism and a long expansion process. In other words, expansion efficiency and effectiveness are poor.
Summary of the Invention
The present invention provides a method and apparatus for expanding an erasure code storage system, to solve the problem of low expansion efficiency of erasure code storage systems in the prior art.
In a first aspect, the present invention provides a method for expanding an erasure code storage system, the method comprising: determining data in the storage system, encoding the data, storing the data dispersedly across nodes, and obtaining spatial position distribution information of the nodes; determining, based on expansion demand information, the number of newly added nodes on each stripe, and determining, based on the number of newly added nodes and the spatial position distribution information, expansion node information on each stripe, wherein each stripe comprises data blocks and check blocks that have an encoding relationship; determining an expansion group based on the expansion node information and a least common multiple rule, and splitting the expansion group to obtain a target group comprising a plurality of selected stripes, wherein the expansion group consists of a plurality of stripes that satisfy the condition that the expansion demand can be met while the spatial position distribution pattern remains unchanged; and executing an expansion algorithm on the target group to obtain a corresponding target expanded group, wherein the target expanded group comprises expanded data blocks and expanded check blocks.
The above method proposes a new expansion mechanism that aims to reduce traffic and to exploit transmission parallelism across successive expansions. The mechanism uses a new stripe layout in which check blocks are updated using locally stored data blocks, which reduces the data transmitted for check block updates and thereby improves expansion efficiency.
Optionally, encoding the data, storing the data dispersedly across nodes, and obtaining the spatial position distribution information of the nodes comprises: dividing the data into K data blocks of equal size, K being a positive integer greater than 1; performing a matrix operation over a finite field on the K data blocks and a preset encoding matrix to obtain M check blocks, M being a positive integer greater than 1 and less than K, the K data blocks and the M check blocks forming a plurality of stripes; and distributing the data blocks and check blocks of the same stripe across K+M different nodes, determining distribution information of the K data blocks and the M check blocks on the nodes, and obtaining the spatial position distribution information based on the distribution information.
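The encoding step can be illustrated with a minimal Python sketch. The source does not specify the field or the preset encoding matrix, so GF(2^8) with the polynomial 0x11D and a Cauchy-style matrix are assumptions made here for illustration, as are the function names `gf_mul`, `gf_inv`, `cauchy_matrix`, `split_into_blocks`, and `encode_stripe`.

```python
# Illustrative sketch only. The field GF(2^8), the reduction polynomial 0x11D,
# and the Cauchy-style matrix are assumptions; the source only says that a
# "preset encoding matrix" is applied with finite-field arithmetic.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero byte, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def cauchy_matrix(k: int, m: int) -> list:
    """m x k matrix with entries 1 / (i + (m + j)); field addition is XOR."""
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]

def split_into_blocks(data: bytes, k: int) -> list:
    """Split data into k equal-size blocks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def encode_stripe(data: bytes, k: int, m: int):
    """Return (data_blocks, check_blocks, placement) for one stripe."""
    blocks = split_into_blocks(data, k)
    checks = []
    for row in cauchy_matrix(k, m):
        check = bytearray(len(blocks[0]))
        for coef, block in zip(row, blocks):
            for pos, byte in enumerate(block):
                check[pos] ^= gf_mul(coef, byte)
        checks.append(bytes(check))
    # One block per node: nodes 0..k-1 hold data, nodes k..k+m-1 hold checks.
    placement = ["data"] * k + ["check"] * m
    return blocks, checks, placement
```

For example, `encode_stripe(b"abcdefghij", 4, 2)` yields four 3-byte data blocks, two 3-byte check blocks, and a six-entry placement list. The flat placement here is deliberately simple and does not reproduce the stripe layout of the invention.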
The above method specifies how the data are processed and how data blocks and check blocks are stored dispersedly. This provides a sound basis for the subsequent expansion and update of check blocks and data blocks under the new stripe layout, thereby improving expansion efficiency.
Optionally, determining the expansion node information on each stripe based on the number of newly added nodes and the spatial position distribution information comprises: determining, based on the spatial position distribution information, a first node count, namely the number of nodes storing data blocks on each stripe, and a second node count, namely the number of nodes storing check blocks on each stripe; adding the first node count and the number of newly added nodes to obtain a third node count, and taking the third node count as the number of data blocks stored on each stripe after expansion; and taking the second node count as the number of check blocks stored on each stripe after expansion, so as to determine the expansion node information on each stripe.
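The node-count arithmetic above is simple enough to show directly. The following is a minimal sketch in which the per-stripe layout is represented as a list of `"data"` / `"check"` labels, one per node; that representation and the names `StripeShape` and `expanded_shape` are hypothetical and not taken from the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StripeShape:
    data_nodes: int   # nodes storing data blocks on a stripe
    check_nodes: int  # nodes storing check blocks on a stripe

def expanded_shape(layout: list, added_nodes: int) -> StripeShape:
    """Derive the post-expansion stripe shape from a per-node layout.

    `layout` is an assumed representation: one "data" or "check" label per node.
    """
    first = sum(1 for kind in layout if kind == "data")    # first node count
    second = sum(1 for kind in layout if kind == "check")  # second node count
    third = first + added_nodes                            # third node count
    # Data width grows by the number of added nodes; check width is unchanged.
    return StripeShape(data_nodes=third, check_nodes=second)
```

With a stripe of two data nodes and one check node and one added node, `expanded_shape(["data", "data", "check"], 1)` gives `StripeShape(data_nodes=3, check_nodes=1)`.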
Based on the above method, the expansion node information on each stripe, that is, the numbers of data blocks and check blocks stored on each stripe after expansion, can be determined accurately and quickly. This provides the basis for subsequently filling the data blocks and check blocks, so that data block migration and check block update can be carried out quickly and expansion efficiency is improved.
Optionally, determining an expansion group based on the expansion node information and the least common multiple rule, and splitting the expansion group to obtain a target group comprising stripes that have a correspondence, comprises: determining the expansion group based on the expansion node information and the least common multiple rule, the expansion group comprising V expansion stripes; splitting the V expansion stripes to determine P basic groups and R adjustment groups, each basic group comprising Vp basic stripes and each adjustment group comprising Vr adjustment stripes, P and R being positive integers greater than 1; and selecting K basic stripes from the basic groups and D adjustment stripes from the adjustment groups, and determining the target group based on the K basic stripes and the D adjustment stripes, the target group comprising K+D stripes.
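The final selection step (K basic stripes plus D adjustment stripes per target group) can be sketched as follows. This passage does not state how stripes are chosen from the basic and adjustment groups, so the consecutive selection below is purely an assumption, and `form_target_groups` is a hypothetical helper name.

```python
def form_target_groups(basic: list, adjustment: list, k: int, d: int) -> list:
    """Pair K basic stripes with D adjustment stripes to form target groups.

    Assumption: stripes are taken consecutively from each list. The source
    only states that each target group contains K + D stripes.
    """
    if len(basic) % k or len(adjustment) % d:
        raise ValueError("stripe counts must be multiples of K and D")
    if len(basic) // k != len(adjustment) // d:
        raise ValueError("basic and adjustment stripes do not pair up evenly")
    groups = []
    for g in range(len(basic) // k):
        groups.append(basic[g * k:(g + 1) * k] + adjustment[g * d:(g + 1) * d])
    return groups
```

For instance, with K = 2 and D = 1, basic stripes `[0, 1, 2, 3]` and adjustment stripes `[10, 11]` produce two target groups of three stripes each.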
Based on the above method, the expansion group can be split into basic groups, whose stripes need to be updated according to the data blocks on the newly added nodes, and adjustment groups, whose stripes send data blocks to the basic groups. Data blocks can then be migrated quickly and check blocks updated quickly on the basis of the adjustment groups and basic groups.
Optionally, the least common multiple rule is determined by the following formula:
V = LCM(K, K+D+1)(K+D)(K+1)/K
where LCM() denotes the function that returns the least common multiple, K denotes the number of nodes storing data blocks on each stripe before expansion, and D denotes the number of newly added nodes.
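As a worked illustration, the formula can be evaluated with a short Python sketch. It reads the expression as the product of LCM(K, K+D+1), (K+D) and (K+1), divided by K; that reading and the function name `expansion_group_size` are assumptions. Because the least common multiple is itself a multiple of K, the division is exact.

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def expansion_group_size(k: int, d: int) -> int:
    """Number of stripes V in an expansion group, per the formula as read above."""
    if k < 2 or d < 1:
        raise ValueError("expected K > 1 and D >= 1")
    # LCM(K, K+D+1) is divisible by K, so integer division loses nothing.
    return lcm(k, k + d + 1) * (k + d) * (k + 1) // k
```

For K = 2 and D = 1 this gives LCM(2, 4) · 3 · 3 / 2 = 4 · 9 / 2 = 18 stripes; for K = 2 and D = 2 it gives LCM(2, 5) · 4 · 3 / 2 = 60 stripes.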
Based on the above method, the number of stripes in the expansion group can be determined accurately and quickly.
Optionally, executing the expansion algorithm on the target group to obtain the corresponding target expanded group, the target expanded group comprising expanded data blocks and expanded check blocks, comprises: numbering the K+D stripes in any one target group, and numbering the K+M+D nodes of the expanded storage system; computing, on the first K+1 nodes, delta check blocks from the data blocks of the adjustment stripes, and updating the first check block of the basic stripe on the same node based on the delta check block; transferring the data blocks of the adjustment stripes, in round-robin fashion, into the basic stripes on the same node to obtain an initial expanded group; and performing a preset operation on the initial expanded group to obtain the corresponding target expanded group.
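The delta-based check block update can be sketched in a few lines of Python. The sketch assumes that the first check block is the plain XOR of the stripe's data blocks (all encoding coefficients equal to 1), which the source does not state; with general coefficients each incoming block would first be multiplied by its coefficient in the finite field. The function names are hypothetical.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def delta_check_block(incoming_blocks: list) -> bytes:
    """Combine the data blocks arriving from adjustment stripes into one delta."""
    delta = bytes(len(incoming_blocks[0]))
    for block in incoming_blocks:
        delta = xor_blocks(delta, block)
    return delta

def update_first_check(old_check: bytes, incoming_blocks: list) -> bytes:
    """Update an XOR check block locally, without re-reading the old data blocks.

    Assumption: the first check block is the XOR of all data blocks in the stripe.
    """
    return xor_blocks(old_check, delta_check_block(incoming_blocks))
```

The point of the delta is that only the newly arriving data blocks are needed: XOR-ing the delta into the old check block gives the same result as recomputing the check block over the widened stripe.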
Based on the above method, data block migration and check block update are executed in parallel during expansion of the erasure code storage system: some nodes are scheduled to perform data block migration while transmission tasks are assigned to other nodes to perform check block updates, which improves expansion efficiency.
Optionally, after the target expanded group is obtained, the method further comprises: determining the logical relationship of the stripes corresponding to the target expanded group, and first spatial distribution information corresponding to the expanded data blocks and expanded check blocks; and adjusting the order of the logical relationship according to the spatial distribution information, so that the first spatial distribution information has the same logical layout as the spatial distribution information.
Based on the above method, the spatial distribution does not need to be adjusted when the erasure code storage system performs its next expansion, which avoids unnecessary overhead. The method thus also supports successive expansions of the erasure code storage system.
In a second aspect, the present invention provides an apparatus for expanding an erasure code storage system, the apparatus comprising:
a first processing unit, configured to determine data in the storage system, encode the data, store the data dispersedly across nodes, and obtain spatial position distribution information of the nodes; a second processing unit, configured to determine, based on expansion demand information, the number of newly added nodes on each stripe, and to determine, based on the number of newly added nodes and the spatial position distribution information, expansion node information on each stripe, wherein each stripe comprises data blocks and check blocks that have an encoding relationship; a third processing unit, configured to determine an expansion group based on the expansion node information and a least common multiple rule, and to split the expansion group to obtain a target group comprising a plurality of selected stripes, wherein the expansion group consists of a plurality of stripes that satisfy the condition that the expansion demand can be met while the spatial position distribution pattern remains unchanged; and an obtaining unit, configured to execute an expansion algorithm on the target group to obtain a corresponding target expanded group, wherein the target expanded group comprises expanded data blocks and expanded check blocks.
Optionally, the first processing unit is configured to: divide the data into K data blocks of equal size, K being a positive integer greater than 1; perform a matrix operation over a finite field on the K data blocks and a preset encoding matrix to obtain M check blocks, M being a positive integer greater than 1 and less than K, the K data blocks and the M check blocks forming a plurality of stripes; and distribute the data blocks and check blocks of the same stripe across K+M different nodes, determine distribution information of the K data blocks and the M check blocks on the nodes, and obtain the spatial position distribution information based on the distribution information.
Optionally, the second processing unit is configured to: determine, based on the spatial position distribution information, a first node count, namely the number of nodes storing data blocks on each stripe, and a second node count, namely the number of nodes storing check blocks on each stripe; add the first node count and the number of newly added nodes to obtain a third node count, and take the third node count as the number of data blocks stored on each stripe after expansion; and take the second node count as the number of check blocks stored on each stripe after expansion, so as to determine the expansion node information on each stripe.
Optionally, the third processing unit is configured to: determine the expansion group based on the expansion node information and the least common multiple rule, the expansion group comprising V expansion stripes; split the V expansion stripes to determine P basic groups and R adjustment groups, each basic group comprising Vp basic stripes and each adjustment group comprising Vr adjustment stripes, P and R being positive integers greater than 1; and select K basic stripes from the basic groups and D adjustment stripes from the adjustment groups, and determine the target group based on the K basic stripes and the D adjustment stripes, the target group comprising K+D stripes.
Optionally, the least common multiple rule is determined by the following formula:
V = LCM(K, K+D+1)(K+D)(K+1)/K
where LCM() denotes the function that returns the least common multiple, K denotes the number of nodes storing data blocks on each stripe before expansion, and D denotes the number of newly added nodes.
Optionally, the obtaining unit is specifically configured to: number the K+D stripes in any one target group, and number the K+M+D nodes of the expanded storage system; compute, on the first K+1 nodes, delta check blocks from the data blocks of the adjustment stripes, and update the first check block of the basic stripe on the same node based on the delta check block; transfer the data blocks of the adjustment stripes, in round-robin fashion, into the basic stripes on the same node to obtain an initial expanded group; and perform a preset operation on the initial expanded group to obtain the corresponding target expanded group.
Optionally, the apparatus further comprises an adjustment unit, configured to: determine the logical relationship of the stripes corresponding to the target expanded group, and first spatial distribution information corresponding to the expanded data blocks and expanded check blocks; and adjust the order of the logical relationship according to the spatial distribution information, so that the first spatial distribution information has the same logical layout as the spatial distribution information.
For the beneficial effects of the second aspect and its optional apparatuses, refer to the beneficial effects of the first aspect and its optional methods; they are not repeated here.
In a third aspect, the present invention provides a computer device comprising a program or instructions which, when executed, perform the method of the first aspect and any of its optional methods.
In a fourth aspect, the present invention provides a storage medium comprising a program or instructions which, when executed, perform the method of the first aspect and any of its optional methods.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of the data block migration phase when a conventional storage system expands erasure code RS(2,1,4);
FIG. 2 is a schematic diagram of the check block update phase when a conventional storage system expands erasure code RS(2,1,4);
FIG. 3 is a schematic diagram of an optional application scenario provided by an embodiment of the present invention;
FIG. 4 is a schematic architecture diagram of an optional erasure code storage system provided by an embodiment of the present invention;
FIG. 5 is a schematic flowchart of the steps of a method for expanding an erasure code storage system provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of the encoding process of erasure code RS(k,m) in a stripe provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the storage distribution of erasure codes RS(2,2) and RS(3,2) in the erasure code storage system provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of the algorithm that parallelizes check block update and data block relocation, applied to erasure code RS(2,1,4), provided by an embodiment of the present invention;
FIG. 9 is a schematic workflow diagram of the expansion process for erasure code (2,2,3) provided by an embodiment of the present invention;
FIG. 10 is a schematic diagram of the results of the experiment on the impact of different bandwidths provided by an embodiment of the present invention;
FIG. 11 is a schematic diagram of the results of the experiment on the impact of different data block sizes provided by an embodiment of the present invention;
FIG. 12 is a schematic diagram of the results of the experiment on the impact of different numbers of newly added nodes provided by an embodiment of the present invention;
FIG. 13 is a schematic diagram of the results of the numerical analysis of expansion traffic under general configurations of the erasure code storage system provided by an embodiment of the present invention;
FIG. 14 is a schematic diagram of the results of the numerical analysis experiment on the impact on traffic bandwidth, under different numbers of newly added nodes, of the expansion process of the erasure code storage system provided by an embodiment of the present invention;
FIG. 15 is a schematic diagram of the results of the numerical analysis of bandwidth utilization in different expansion processes of the erasure code storage system provided by an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an apparatus for expanding an erasure code storage system provided by an embodiment of the present invention.
Detailed Description
For a better understanding of the above technical solution, it is described in detail below with reference to the accompanying drawings and specific implementations. It should be understood that the embodiments of the present invention and the specific features in them are detailed illustrations of the technical solution of the present invention rather than limitations on it. Where no conflict arises, the embodiments of the present invention and the technical features in them may be combined with one another.
It should be noted that the terms "first", "second", and the like in the description and claims of the present invention are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that objects so designated are interchangeable where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described here. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.
To facilitate understanding of the technical solutions provided by the embodiments of the present invention, some key terms and processes used in the embodiments are explained first:
1. Erasure code (EC): a forward error correction (FEC) technique that adds m pieces of redundant data to n pieces of original data, so that the original data can be recovered from any n of the n+m pieces. In other words, if any m or fewer pieces fail, the original data can still be recovered from the remaining pieces. Erasure codes are mainly used in network transmission to avoid packet loss, and storage systems use them to improve storage reliability.
2. Erasure coding in distributed storage systems falls into three main classes: array erasure codes, Reed-Solomon (RS) erasure codes, and low-density parity-check (LDPC) erasure codes. The embodiments of the present invention mainly describe the expansion of distributed storage systems that use RS erasure codes.
The design idea of the embodiments of the present invention is briefly introduced below:
FIG. 1 is a schematic diagram of the data block migration stage when a traditional storage system in the prior art is expanded with erasure code parameters (2,1,4). FIG. 2 is a schematic diagram of the parity block update stage when a traditional storage system in the prior art is expanded with erasure code parameters (2,1,4). In FIG. 1 and FIG. 2, S denotes a stripe, N denotes a node, D denotes a data block, and P denotes a parity block.
In the prior art, migrating data blocks and updating parity blocks inevitably causes a large amount of data transmission, which results in poor transmission parallelism and a long expansion process; that is, expansion is inefficient and performs poorly.
In view of this, the present invention provides a method for expanding an erasure code storage system. The method proposes a new expansion mechanism that aims to reduce traffic and to exploit transmission parallelism across successive scaling operations. The mechanism uses a new stripe layout in which parity blocks are updated from locally stored data blocks, which reduces the data transmitted for parity block updates. The method provided by the present invention can therefore reduce the data transmitted for parity block updates and thus improve expansion efficiency.
Having introduced the design idea of the embodiments of the present invention, the application scenarios to which the technical solution for expanding an erasure code storage system applies are briefly introduced below. It should be noted that the application scenarios described here serve to explain the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will appreciate that, as new application scenarios emerge, the technical solutions provided by the embodiments of the present invention apply equally to similar technical problems.
The solution provided in the embodiments of the present invention is applicable to most storage systems that require storage expansion, for example a business order storage system or a transaction data storage system. FIG. 3 is a schematic diagram of a scenario provided by an embodiment of the present invention. The scenario includes a plurality of electronic devices 301, each deployed with a proxy node, and a metadata server 302 deployed with a global coordinator. The electronic devices 301 can communicate with the metadata server 302 through a network 303, for example through a direct or indirect connection over wired or wireless communication, which is not limited by the present invention. The electronic devices 301-1, 301-2, ..., 301-n may be deployed with different proxy nodes.
In the embodiments of the present invention, the electronic device 301 may be, for example, a server, but is not limited to this. Each electronic device 301 may include one or more processors 3011, a memory 3012, an I/O interface 3013 for interacting with other servers, and the like.
In the embodiments of the present invention, the metadata server 302 deployed with the global coordinator may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In this scenario, the metadata server 302 deployed with the global coordinator manages the stripe metadata. It can also issue each round of transfer tasks to the electronic devices 301, which perform the data block migration or parity block update operations. Once every electronic device 301 has returned its acknowledgment for the current round to the metadata server 302, the metadata server 302 sends the next round of transfer commands to the electronic devices 301.
In this scenario, each electronic device 301 receives the transfer commands sent by the coordinator, parses them, and executes the tasks they describe. Specifically, after an electronic device 301 has sent the required data block or parity block to the target electronic device 301, it sends an acknowledgment to the metadata server 302 to report that the transfer is complete, so that the next round of transfer commands can be prepared.
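The round-by-round exchange between the coordinator and the proxy nodes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Agent` class, the `run_rounds` function, and the tuple format `(src, dst, block)` are hypothetical names introduced for illustration and do not appear in the embodiments; real transfers would go over the network 303 rather than through in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical proxy node running on one electronic device."""
    device_id: int
    received: list = field(default_factory=list)

    def execute(self, round_no, dst, block, agents):
        # Deliver the block to the destination device, then acknowledge.
        agents[dst].received.append(block)
        return (round_no, self.device_id)  # acknowledgment for this round


def run_rounds(rounds, agents):
    """Issue one round at a time; advance only after every sender has acknowledged.

    `rounds` is a list of rounds, each a list of (src, dst, block) commands.
    Returns the list of completed round numbers.
    """
    completed = []
    for round_no, commands in enumerate(rounds, start=1):
        acks = set()
        for src, dst, block in commands:
            acks.add(agents[src].execute(round_no, dst, block, agents))
        expected = {(round_no, src) for src, _, _ in commands}
        if acks != expected:
            raise RuntimeError(f"round {round_no}: missing acknowledgments")
        completed.append(round_no)
    return completed


if __name__ == "__main__":
    agents = {i: Agent(i) for i in (1, 2, 3)}
    plan = [[(1, 3, "D1")], [(2, 3, "D2")]]
    print(run_rounds(plan, agents), agents[3].received)
```

The sketch keeps the coordinator strictly synchronous per round, which mirrors the description that the next round of commands is sent only after the acknowledgments for the current round have arrived.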
FIG. 4 is a schematic architecture diagram of the erasure code storage system provided by an embodiment of the present invention. The metadata server can issue commands to update the parity blocks held by the proxy nodes on existing nodes, and commands to migrate data blocks from the proxy nodes on existing nodes to the proxy nodes on newly added nodes.
Of course, the method provided by the embodiments of the present invention is not limited to the application scenario shown in FIG. 3 and may also be used in other possible application scenarios, which the embodiments of the present invention do not limit.
To further explain the method for expanding an erasure code storage system provided by the embodiments of the present invention, it is described in detail below with reference to the accompanying drawings and specific implementations. Although the embodiments of the present invention provide the method steps shown in the following embodiments or drawings, the method may include more or fewer steps based on routine work or work that requires no creative effort. For steps that have no necessary logical causal relationship, the execution order is not limited to the order provided in the embodiments. In actual processing, or when executed by an apparatus, the method may be executed sequentially in the order shown in the embodiments or drawings, or in parallel (for example, in an environment with parallel processors or multi-threaded processing).
The method for expanding an erasure code storage system in the embodiments of the present invention is described below with reference to the flowchart shown in FIG. 5.
FIG. 5 is an implementation flowchart of the method for expanding an erasure code storage system provided by an embodiment of the present invention. The method may be executed by the metadata server, and the specific flow is as follows:
Step 501: Determine the data in the storage system, encode the data, store the encoded data across the nodes, and obtain the spatial location distribution information of the nodes.
In the embodiments of the present invention, the metadata server may select, according to the reliability requirements and storage overhead limits of the storage system, an RS erasure code that meets the system's fault tolerance and storage efficiency requirements, and apply that RS erasure code to the data in the storage system.
In the embodiments of the present invention, the metadata server may divide the data into K data blocks of equal size, where K is a positive integer greater than 1. The K data blocks are then combined with a preset encoding matrix through matrix operations in a finite field to obtain M parity blocks, where M is a positive integer greater than 1 and less than K. The K data blocks and M parity blocks may form multiple stripes.
Further, the data blocks and parity blocks of the same stripe may be spread across K+M different nodes. The distribution of the K data blocks and M parity blocks over the nodes is determined, and the spatial location distribution information is obtained from that distribution.
In the embodiments of the present invention, an RS erasure code has three parameters, denoted for example K, M, and W: K is the number of data blocks, M is the number of parity blocks, and W is the word size in bits of the RS erasure code. W typically takes the value 4, 8, 16, or 32. The description below uses W=8 as an example.
In the embodiments of the present invention, the metadata server may obtain the parity blocks from the parameters of the RS erasure code and the preset encoding matrix. Specifically, the metadata server may combine the K data blocks with the generated preset encoding matrix, whose entries are restricted to a Galois field, through matrix operations in that field to obtain the M parity blocks. For example, the parity blocks can be obtained by bitwise operations between the data blocks and the entries of the preset encoding matrix. The preset encoding matrix may be a Vandermonde matrix or a Cauchy matrix, which is not limited in the embodiments of the present invention.
For example, FIG. 6 is a schematic diagram of the process of encoding with an RS erasure code provided in an embodiment of the present invention. The metadata server may determine an encoding matrix from an identity matrix and a generator matrix and use it as the preset encoding matrix. It then multiplies the preset encoding matrix by the K data blocks to obtain the M parity blocks, so that the K data blocks and M parity blocks can be stored on K+M nodes.
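The encoding step can be illustrated with a short sketch for W=8. The following is a minimal example under stated assumptions, not the implementation of the embodiments: it assumes the field GF(2^8) with reduction polynomial 0x11D, and a Cauchy matrix with entries 1/(x_i + y_j) where x_i = i and y_j = M + j. The function names `gf_mul`, `gf_inv`, `cauchy_matrix`, and `encode` are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two elements of GF(2^8): shift-and-add with polynomial reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254 (a^255 == 1 for a != 0)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def cauchy_matrix(k, m):
    """M x K Cauchy matrix; x_i = i and y_j = m + j are disjoint, so x_i ^ y_j != 0."""
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(data_blocks, m):
    """Return M parity blocks for K equal-length data blocks given as bytes."""
    k = len(data_blocks)
    size = len(data_blocks[0])
    parity = []
    for row in cauchy_matrix(k, m):
        block = bytearray(size)
        for coef, data in zip(row, data_blocks):
            for pos, byte in enumerate(data):
                block[pos] ^= gf_mul(coef, byte)  # addition in GF(2^8) is XOR
        parity.append(bytes(block))
    return parity


if __name__ == "__main__":
    print(encode([b"ab", b"cd"], 2))
```

Only the generator rows are materialized here; the identity rows of the full encoding matrix simply reproduce the data blocks, so they are omitted from the sketch.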
In the embodiments of the present invention, the K data blocks are encoded according to the parameters of the erasure code and the preset encoding matrix to generate the M corresponding parity blocks, which is denoted by the pair (K, M).
In the embodiments of the present invention, once a data block and the parity blocks that have an encoding relationship with it are determined, a stripe can be determined from that data block and those parity blocks. The data blocks and parity blocks of the same stripe can then be stored on different nodes.
Specifically, the embodiments of the present invention spread the data blocks and parity blocks of the same stripe across K+M different nodes as follows: the first K+1 nodes store one parity block and the K data blocks, with the position of that parity block arranged diagonally across K+1 stripes; the last M-1 nodes store the remaining parity blocks.
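The layout rule above can be sketched as a small function that lists, for each stripe, which block sits on which node. This is an illustration only: the direction of the diagonal (the first parity block of stripe s placed on node s, counting from 0) and the labels "D1", "P1", and so on are assumptions made for the sketch, and `stripe_layout` is a hypothetical name.

```python
def stripe_layout(k, m, num_stripes=None):
    """Return one row per stripe; row[n] is the block label stored on node n.

    Nodes 0..k hold the K data blocks plus the first parity block "P1";
    nodes k+1..k+m-1 hold the remaining parity blocks "P2".."Pm".
    Assumption: the first parity block of stripe s sits on node s mod (k+1),
    which yields a diagonal over any K+1 consecutive stripes.
    """
    if num_stripes is None:
        num_stripes = k + 1
    layout = []
    for s in range(num_stripes):
        parity_node = s % (k + 1)
        row = []
        data_index = 1
        for node in range(k + 1):
            if node == parity_node:
                row.append("P1")
            else:
                row.append(f"D{data_index}")
                data_index += 1
        row.extend(f"P{j}" for j in range(2, m + 1))
        layout.append(row)
    return layout


if __name__ == "__main__":
    for row in stripe_layout(2, 2):
        print(row)
```

With K=2 and M=2 the sketch produces three stripes over four nodes, with "P1" moving one node per stripe and "P2" fixed on the last node.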
In the embodiments of the present invention, the metadata server may determine the spatial location distribution information from the distribution of data blocks and parity blocks across the nodes. Specifically, the spatial location distribution information can be understood as the stripe and node at which each data block is located, together with the stripe and node at which each parity block is located.
For example, FIG. 7 is a schematic diagram, provided in an embodiment of the present invention, of RS erasure codes with parameters (2,2) and (3,2). S denotes a stripe, N denotes a node, D denotes a data block, and P denotes a parity block. FIG. 7 shows the storage distribution of the data blocks and their corresponding parity blocks in the embodiments of the present invention.
Step 502: Based on the expansion requirement information, determine the number of nodes to be added to each stripe, and based on that number and the spatial location distribution information, determine the expansion node information for each stripe. A stripe includes data blocks and parity blocks that have an encoding relationship.
In the embodiments of the present invention, a first node count, the number of nodes storing data blocks on each stripe, and a second node count, the number of nodes storing parity blocks on each stripe, are determined from the spatial location distribution information. The first node count is added to the number of newly added nodes to obtain a third node count, which is used as the number of data blocks stored on each stripe after expansion. The second node count is used as the number of parity blocks stored on each stripe after expansion. Together these determine the expansion node information for each stripe.
For example, according to the system's requirements for adjusting reliability and stripe length, the number of newly added nodes is determined to be D, the first node count is K, and the second node count is M. The number of data blocks stored on each stripe after expansion is therefore K+D, and the number of parity blocks stored on each stripe after expansion is M.
Step 503: Based on the expansion node information and the least common multiple rule, determine an expansion group, and split the expansion group to obtain target groups that each include multiple selected stripes. An expansion group consists of multiple stripes that together can complete the expansion requirement while leaving the spatial location distribution pattern unchanged.
In the embodiments of the present invention, the metadata server may determine the expansion group from the expansion node information and the least common multiple rule. The expansion group includes V expansion stripes, and these V stripes satisfy the condition that the expansion requirement can be completed while the spatial location distribution pattern remains unchanged.
Specifically, the least common multiple rule is given by the following formula:
V = LCM(K, K+D+1) × (K+D) × (K+1) / K
where LCM() denotes the function that returns the least common multiple, K denotes the number of nodes storing data blocks on each stripe before expansion, and D denotes the number of newly added nodes.
In the embodiments of the present invention, after determining the V expansion stripes, the metadata server may split them into P basic groups and R adjustment groups. Each basic group includes Vp basic stripes, and each adjustment group includes Vr adjustment stripes; P and R are positive integers greater than 1.
In the embodiments of the present invention, the metadata server may divide the V stripes into two kinds of groups, the basic groups and adjustment groups mentioned above, according to the different roles played by their data blocks and parity blocks. The parity blocks of stripes in a basic group are updated according to the data blocks on the newly added nodes, and the data blocks of stripes in an adjustment group are sent to stripes in a basic group.
Specifically, the Vp basic stripes and Vr adjustment stripes must satisfy Vp:Vr = K:D, so Vp and Vr can be determined from the following formulas:
Vp = LCM(K, K+D+1) × (K+1); Vr = LCM(K, K+D+1) × D × (K+1) / K.
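The three formulas can be checked numerically with a few lines of code. The sketch below simply evaluates V, Vp, and Vr as written above; the function name `expansion_group_sizes` is a hypothetical label, and the helper `lcm` is defined locally so the example does not depend on a particular Python version.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def expansion_group_sizes(k, d):
    """Return (V, Vp, Vr) for K data nodes per stripe and D newly added nodes."""
    base = lcm(k, k + d + 1)  # divisible by k, so the divisions below are exact
    v = base * (k + d) * (k + 1) // k
    vp = base * (k + 1)
    vr = base * d * (k + 1) // k
    return v, vp, vr


if __name__ == "__main__":
    print(expansion_group_sizes(2, 1))  # K=2, D=1
```

For K=2 and D=1, LCM(2, 4) = 4, so V = 4 × 3 × 3 / 2 = 18, Vp = 4 × 3 = 12, and Vr = 4 × 1 × 3 / 2 = 6. The values satisfy Vp + Vr = V and Vp:Vr = K:D, as the text requires.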
In the embodiments of the present invention, during expansion, the data blocks of the adjustment group may be transferred to the corresponding stripes of the basic group. Specifically, the metadata server may determine, in stripe order, that K(K+1) basic stripes in the basic group correspond to D(K+1) stripes in the adjustment group. The correspondence can be expressed as follows: the K(K+1) stripes {(i-1)K(K+1)+1, (i-1)K(K+1)+2, …, iK(K+1)} in the basic group correspond to the D(K+1) stripes {(i-1)D(K+1)+1, (i-1)D(K+1)+2, …, iD(K+1)} in the adjustment group, where 0<i<LCM(K, K+D+1)/K. Every V stripes thus contain LCM(K, K+D+1)/K pairs of corresponding stripe sets.
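The correspondence between basic stripes and adjustment stripes can be written out directly from the two index sets. The sketch below is illustrative only; `corresponding_stripes` is a hypothetical name, indices are 1-based as in the text, and the pair index i is supplied by the caller.

```python
def corresponding_stripes(i, k, d):
    """Return (basic, adjustment) stripe indices for the i-th corresponding pair.

    basic      = {(i-1)K(K+1)+1, ..., iK(K+1)}   within the basic group
    adjustment = {(i-1)D(K+1)+1, ..., iD(K+1)}   within the adjustment group
    """
    basic = list(range((i - 1) * k * (k + 1) + 1, i * k * (k + 1) + 1))
    adjustment = list(range((i - 1) * d * (k + 1) + 1, i * d * (k + 1) + 1))
    return basic, adjustment


if __name__ == "__main__":
    print(corresponding_stripes(1, 2, 1))
    print(corresponding_stripes(2, 2, 1))
```

For K=2 and D=1, each pair matches 6 basic stripes with 3 adjustment stripes.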
In the embodiments of the present invention, once the P basic groups, the R adjustment groups, and their respective stripes are determined, K basic stripes are selected for each group, in stripe order, from the K(K+1) stripes of the basic group. From the D(K+1) adjustment stripes that correspond to those K(K+1) stripes, D adjustment stripes are selected in stripe order, with an interval of K adjustment stripes between consecutive selections. The (K+D)(K+1) stripes are thus arranged into (K+1) subgroups, and each target group is determined from K basic stripes and D adjustment stripes, so that each target group includes K+D stripes.
Any target group therefore includes the K basic stripes {(i-1)K+1, (i-1)K+2, …, iK} selected from the K(K+1) stripes of the basic group, together with the D adjustment stripes selected at positions {D(K+1)-i, (D-1)(K+1)-i, …, (K+1)-i} of the corresponding adjustment group.
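The selection of target groups can be sketched as follows. One assumption is made loudly here: with a 1-based group index i = 1..K+1, the expression (K+1)-i would reach 0 for the last group, so the sketch uses j(K+1)-i+1 for the adjustment stripes to keep every index within 1..D(K+1). That one-position shift is an interpretation for illustration, not something the text states, and `target_groups` is a hypothetical name.

```python
def target_groups(k, d):
    """Split K(K+1) basic and D(K+1) adjustment stripes into K+1 target groups.

    Each group holds K consecutive basic stripes and D adjustment stripes
    spaced K+1 apart (an interval of K stripes between selections).
    Assumption: adjustment index is j*(K+1) - i + 1 for j = D, D-1, ..., 1.
    """
    groups = []
    for i in range(1, k + 2):
        basic = list(range((i - 1) * k + 1, i * k + 1))
        adjustment = [j * (k + 1) - i + 1 for j in range(d, 0, -1)]
        groups.append((basic, adjustment))
    return groups


if __name__ == "__main__":
    for basic, adjustment in target_groups(3, 2):
        print(basic, adjustment)
```

Under this assumption every basic stripe and every adjustment stripe lands in exactly one target group, and each group contains K+D stripes.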
In the embodiments of the present invention, after the target groups are determined, step 504 is executed for each target group: an expansion algorithm is applied to the target group to obtain the corresponding target expansion group, which includes expanded data blocks and expanded parity blocks.
Specifically, the corresponding target expansion group may be determined by, but is not limited to, the following steps:
Step a: Number the K+D stripes in any target group, and number the K+M+D nodes of the storage system after expansion.
In the embodiments of the present invention, the K+D stripes in any target group may be numbered {1, 2, ..., K+D}, and the K+M+D nodes may be numbered {1, 2, ..., K+M+D}.
Step b: Compute the difference parity blocks of the data blocks on the adjustment stripes in the first K+1 nodes, and use the difference parity blocks to update the first parity block of the basic stripe on the same node.
In the embodiments of the present invention, the first parity block of each of the first K stripes may be updated first, by computing difference parity blocks. Specifically, for the first K+1 nodes, the difference parity blocks of the data blocks on the adjustment stripes {K+1, K+2, …, K+D} are computed, and the first parity block of the basic stripe on the same node is updated from those difference parity blocks.
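The idea of updating a parity block from a difference parity block, rather than re-encoding the whole stripe, can be illustrated as follows. This is a sketch under stated assumptions: it works in GF(2^8) with reduction polynomial 0x11D, the coefficients passed in are hypothetical stand-ins for the entries of the expanded encoding matrix, and the names `gf_mul`, `apply_delta`, and `full_parity` are introduced only for this example.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def apply_delta(parity, coef, new_data):
    """Return parity + coef * new_data, bytewise in GF(2^8) (addition is XOR).

    coef * new_data is the difference parity block contributed by one data
    block that joins the stripe; the existing data blocks are not re-read.
    """
    return bytes(p ^ gf_mul(coef, x) for p, x in zip(parity, new_data))


def full_parity(coefs, blocks):
    """Reference computation: encode the parity block from scratch."""
    out = bytes(len(blocks[0]))
    for coef, block in zip(coefs, blocks):
        out = apply_delta(out, coef, block)
    return out


if __name__ == "__main__":
    coefs = [3, 7, 11]             # hypothetical matrix entries
    blocks = [b"ab", b"cd", b"ef"]
    old = full_parity(coefs[:2], blocks[:2])    # parity before expansion (K=2)
    new = apply_delta(old, coefs[2], blocks[2])  # add one relocated block (D=1)
    print(new == full_parity(coefs, blocks))
```

Because the code is linear, the incrementally updated parity equals the parity recomputed from all K+D data blocks. When the data block and the parity block reside on the same node, as in the layout described above, the update needs no network transfer.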
Step c: Transfer the data blocks on the adjustment stripes, in round-robin fashion, to the basic stripes on the same node to obtain the initial expansion group.
In the embodiments of the present invention, for the data block migration part, the data blocks on the adjustment stripes {K+1, K+2, …, K+D} may be transferred in round-robin fashion to the basic stripes {1, 2, …, K}. Specifically, in each round, D data blocks on D nodes are selected in order from the K nodes that store data blocks. In round i, the D nodes {i, i+1, …, i+D-1} are selected, and the D data blocks on the D adjustment stripes {K+1, K+2, …, K+D} are transferred to the D newly added nodes. When i+D-1>K, selection wraps around and continues from the first node that stores data blocks.
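The round-robin node selection with wrap-around can be sketched in a few lines. The function names `round_robin_senders` and `migration_schedule` are hypothetical; node numbers are 1-based and refer only to the K nodes that store data blocks.

```python
def round_robin_senders(round_no, k, d):
    """Nodes {i, i+1, ..., i+D-1} for round i, wrapping back to node 1 after node K."""
    return [(round_no - 1 + offset) % k + 1 for offset in range(d)]


def migration_schedule(k, d):
    """Sender nodes for each of the K rounds of data block migration."""
    return {i: round_robin_senders(i, k, d) for i in range(1, k + 1)}


if __name__ == "__main__":
    for rnd, senders in migration_schedule(4, 2).items():
        print(rnd, senders)
```

With K=4 and D=2, round 4 selects nodes 4 and 1, which shows the wrap-around. Over the K rounds each data node sends exactly D blocks, and in any single round only D of the K data nodes are busy.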
Step d: Perform preset operations on the initial expansion group to obtain the corresponding target expansion group.
In the embodiments of the present invention, a global counter tracks the number of data blocks transferred to the new nodes. Each time (K+1)D data blocks have been sent to the newly added nodes, the first parity block of each of the following D basic stripes swaps logical positions with the parity block of the adjustment stripe on the corresponding node.
Specifically, this process can be expressed as follows: in the i-th basic stripe that undergoes logical position replacement, the first parity block of that stripe swaps positions with the i-th data block of the adjustment stripe on the corresponding node, and the data block migration algorithm described above is then executed in the same way. The value of i is greater than 0 and less than D.
The process ends after K rounds. While the data block migration part of the algorithm runs, only D of the K nodes that originally stored data blocks are occupied in each round, and the remaining (K-D) nodes are idle.
In the embodiments of the present invention, the (M-1) parity blocks of a basic stripe other than the first parity block may be updated by the following operations:
Step1: In an adjustment stripe, M nodes store parity blocks, and (M-1) of them overlap with the nodes that store only parity blocks in the basic stripe. In each round, these (M-1) nodes transmit linear combinations of the (M-1) parity blocks to the other (M-1) nodes.
Specifically, the process can be expressed as follows: in round i, each of these (M-1) nodes transmits its linear combination of parity blocks to the node i positions away from it. When the node position exceeds (M-1), selection wraps around and continues from the first node. The process ends after (M-2) rounds, where i is greater than 0 and less than M-2.
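The rotation pattern of Step1 can be sketched as a sender-to-receiver map per round. This is an illustration under an assumption: the sketch runs rounds i = 1, ..., M-2 so that the number of rounds matches the "(M-2) rounds" stated above; the names `rotation_round` and `rotation_schedule` are hypothetical, and nodes are numbered 1..M-1 within the parity-only node set.

```python
def rotation_round(i, n):
    """In round i, node s (1-based, among n nodes) sends to the node i positions ahead."""
    return {s: (s - 1 + i) % n + 1 for s in range(1, n + 1)}


def rotation_schedule(m):
    """Sender -> receiver maps for rounds 1..M-2 among the M-1 parity-only nodes."""
    n = m - 1
    return [rotation_round(i, n) for i in range(1, n)]


if __name__ == "__main__":
    for rnd, mapping in enumerate(rotation_schedule(4), start=1):
        print(rnd, mapping)
```

In every round each node sends one block and receives one block, which fits the full-duplex, one-block-per-round model, and after M-2 rounds each node has sent to every other node exactly once.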
Step 2: In the adjustment stripe, one node storing a check block and (K-M) nodes storing data blocks do not overlap with the remaining (M-1) check-block nodes of the basic stripe. In each round, (M-1) of these (K-M+1) nodes are selected and transmit linear combinations of data blocks or linear combinations of check blocks to the remaining (M-1) check-block nodes of the basic stripe.
Specifically, in the i-th round, nodes {i, i+1, …, i+M-1} are selected from these (K-M+1) nodes and transmit linear combinations of data blocks or of check blocks to the corresponding check-block nodes {1, 2, …, M-1} of the basic stripe. The process ends after (K-M+1) rounds, where i is greater than 0 and less than K-M+1.
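One possible reading of the Step 2 selection is a sliding window over the (K-M+1) non-overlapping nodes, sketched below. The text does not spell out the window boundaries unambiguously, so this sketch makes several assumptions: the window holds (M-1) consecutive senders so that it matches the (M-1) receivers, it wraps around cyclically, rounds run from 1 to K-M+1 inclusive, and the name `step2_schedule` is hypothetical.

```python
def step2_schedule(k: int, m: int) -> list[list[tuple[int, int]]]:
    """Return, for each round, (sender, receiver) pairs.

    Senders are the K-M+1 non-overlapping nodes of the adjustment stripe
    (numbered 1 .. K-M+1); receivers are the M-1 remaining check-block
    nodes of the basic stripe (numbered 1 .. M-1). Numbering is assumed.
    """
    senders = k - m + 1
    receivers = m - 1
    if senders < receivers:
        raise ValueError("need at least M-1 senders for distinct pairs")
    rounds = []
    for i in range(1, senders + 1):
        # Receiver t is served by the (t-1)-th node of the window that
        # starts at sender i, wrapping around cyclically (assumption).
        rounds.append(
            [((i - 1 + t - 1) % senders + 1, t) for t in range(1, receivers + 1)]
        )
    return rounds


if __name__ == "__main__":
    for number, pairs in enumerate(step2_schedule(6, 3), start=1):
        print(f"round {number}: {pairs}")
```

Under these assumptions each receiver hears from every one of the (K-M+1) senders exactly once over the (K-M+1) rounds.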
In summary, after (K-1) rounds the check block update operation has collected all the linear combinations of data blocks and linear combinations of check blocks that it needs, and the check blocks can then be updated by the update algorithm designed in the present invention.
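The figure of (K-1) rounds is consistent with adding the round counts of the two steps, assuming Step 1 and Step 2 run one after the other:

```latex
\[
\underbrace{(M-2)}_{\text{Step 1}} \;+\; \underbrace{(K-M+1)}_{\text{Step 2}} \;=\; K-1
\]
```

For example, with K = 6 and M = 3 this gives 1 + 4 = 5 rounds.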
It should be noted that, in the embodiment of the present invention, the check block update is subject to the following restriction: the D data-block nodes occupied in each round of data block migration and the at most (M-1) data-block nodes needed in Step 2 of the check block update must not overlap. That is, the inequality K ≥ D+M-1 must be satisfied.
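A parameter check for this restriction might look like the following minimal sketch. The function name is an assumption introduced for illustration; only the inequality itself comes from the text.

```python
def satisfies_expansion_constraint(k: int, m: int, d: int) -> bool:
    """Check K >= D + M - 1, i.e. the D migration nodes and the up to
    (M-1) Step 2 nodes fit among the K data-block nodes without overlap."""
    return k >= d + m - 1


if __name__ == "__main__":
    # RS(6,3) with two added nodes: 6 >= 2 + 3 - 1 holds.
    print(satisfies_expansion_constraint(6, 3, 2))
    # RS(6,3) with five added nodes: 6 >= 5 + 3 - 1 does not hold.
    print(satisfies_expansion_constraint(6, 3, 5))
```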
In the embodiment of the present invention, each node is restricted to full-duplex operation: in each round, a node can receive only one block and send only one block at the same time, so a preset algorithm is needed to maximize node utilization within each round. In practice, when data center administrators decide how many nodes to add, they need to make sure that the parameters K, M, and D satisfy this restriction.
In the embodiment of the present invention, the last (M-1) nodes of the basic stripe, which store only check blocks, receive (K-M) linear combinations of data blocks and (M-1) linear combinations of check blocks from other nodes, and also compute locally the linear combination of the check blocks of the adjustment stripe. From these, an erasure-code decoding algorithm yields the difference check blocks corresponding to the last (M-1) check blocks of the basic stripe. XORing each check block of the basic stripe with its computed difference check block then gives the updated check block, that is, the extended check block.
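The final XOR step can be illustrated with a short sketch. It covers only the last operation, combining an existing check block with an already computed difference check block; the decoding step that produces the difference block is not shown, and the function name `apply_difference_block` is an assumption.

```python
def apply_difference_block(check_block: bytes, difference_block: bytes) -> bytes:
    """XOR a basic-stripe check block with its difference check block,
    byte by byte, to obtain the extended (updated) check block."""
    if len(check_block) != len(difference_block):
        raise ValueError("blocks must have the same size")
    return bytes(c ^ d for c, d in zip(check_block, difference_block))


if __name__ == "__main__":
    old_check = bytes([0x0F, 0xF0])
    difference = bytes([0xFF, 0x00])
    print(apply_difference_block(old_check, difference).hex())
```

Because XOR is its own inverse, applying the same difference block a second time restores the original check block, which is a convenient sanity check.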
In the embodiment of the present invention, once all the stripes of an extended group have been updated, the corresponding target extended group is obtained. In addition, the order of the logical relationships of the stripes can be adjusted according to the spatial distribution information so that it matches the overall spatial distribution scheme used before expansion. In this way, the next expansion of the storage system does not incur unnecessary overhead from adjusting the spatial distribution.
In the embodiment of the present invention, refer to FIG. 8, which is a schematic diagram of the parallel algorithm for check block update and data block relocation for RS(2,1,4) provided by the present invention, and to FIG. 9, which is a schematic diagram of the process of expanding RS(2,1,4) provided by the embodiment of the present invention.
In the embodiment of the present invention, the logical relationships of the stripes corresponding to the target extended group, and the first spatial distribution information corresponding to each extended data block and extended check block, can also be determined. The order of the logical relationships is then adjusted according to the spatial distribution information so that the first spatial distribution information has the same logical layout as the spatial distribution information. In this way, the next expansion of the storage system does not incur unnecessary overhead from adjusting the spatial distribution.
In the embodiment of the present invention, once all stripes of the basic groups have completed the expansion operation, all data blocks and check blocks of the adjustment groups can be deleted from the storage system. This minimizes the waste and consumption of resources.
It can be seen that the method for expanding an erasure code storage system provided by the embodiment of the present invention has low input/output (I/O) overhead during expansion: it reduces the amount of data that must be read and written and the amount of data transmitted over the network. The expansion process also has low latency: building on the low I/O overhead and full-duplex communication, the new check block update algorithm increases the usable bandwidth in the storage system, and scheduling the expansion algorithm in parallel reduces the time the expansion takes. In addition, continuous expansion is supported: after a single expansion finishes, the overall spatial distribution is the same as before the expansion, so the next expansion of the storage system does not incur unnecessary overhead from adjusting the spatial distribution.
In a specific implementation, the solution proposed in the embodiment of the present invention was tested in two ways: on a real platform and through simulation experiments.
Mode 1: testing the solution proposed in the embodiment of the present invention on a real platform.
In the embodiment of the present invention, the experimental environment consists of 19 virtual servers of type ecs.g6.large, each configured with 2 vCPUs (2.5 GHz Intel Xeon Platinum 8269CY), 8 GB of memory, and 40 GB of storage, and running Ubuntu 18.04. The maximum network bandwidth between any two servers is about 3 Gb/s. One of the 19 servers acts as the global coordinator, and the remaining 18 servers act as agents running the server program of the present invention. By default, the block size is 64 MB, the erasure code schemes are RS(6,3) and RS(10,4), and the number of newly added nodes varies from experiment to experiment.
Each experiment is repeated multiple times. The measured quantity is the time consumed by the expansion process, that is, the time until all blocks have been transmitted to their destination nodes. The expansion time is defined as the average expansion time per stripe; the shorter it is, the more efficient the expansion process.
In addition, the tests are comparative: the method is compared with Scale-RS and NCScale, two advanced expansion mechanisms for erasure code storage systems. In actual implementation, the method may also be tested or used in other experimental environments, which is not limited in the embodiment of the present invention.
In a specific implementation, the expansion time can be measured as the network bandwidth varies from 1 Gb/s to 2 Gb/s; the results are shown in FIG. 10. Among the three expansion mechanisms, the method provided by the embodiment of the present invention requires the least expansion traffic and achieves higher transmission parallelism than the other two. Overall, at a network bandwidth of 1 Gb/s, the present invention reduces the expansion time by 49.8% and 58.9% on average compared with Scale-RS and NCScale, respectively. At 2 Gb/s, the average reductions are 50.8% and 58.8%, respectively.
Evidently, as the bandwidth increases, the average expansion time of the present invention remains shorter than that of Scale-RS and NCScale, so the expansion performance of the provided method is better than that of Scale-RS and NCScale.
In a specific implementation, the expansion time can also be studied for different block sizes, for example from 32 MB to 64 MB. In this test, the network bandwidth can be set to 3 Gb/s. As shown in FIG. 11, the expansion time grows with the block size, and the method provided by the present invention shortens the expansion time by 49.1-53.0% and 24.1-76.9% compared with Scale-RS and NCScale, respectively. Moreover, both the method provided by the present invention and Scale-RS achieve fairly stable performance across successive expansions, whereas the expansion time of NCScale increases significantly in the second expansion operation, that is, the (8,3,10) expansion.
In a specific implementation, the effect of the number of newly added nodes (the parameter D) on the expansion time can also be studied. Specifically, the network bandwidth can be fixed at 3 Gb/s and D varied from 2 to 3. As shown in FIG. 12, the average expansion time of the three mechanisms is not significantly affected by the number of added nodes. The underlying reason is that, for a fair comparison, all methods were given transmission parallelism, so the D newly added nodes can receive the migrated data in parallel. The method provided by the present invention reduces the expansion time of Scale-RS and NCScale by 49.8-51.4% and 23.6-76.3%, respectively, which is a significant improvement in expansion efficiency.
Mode 2: testing the solution proposed in the embodiment of the present invention by simulation.
In a specific implementation, a traffic simulation under a general configuration can be carried out. As an example, refer to FIG. 13: this test evaluates the traffic of successive expansions under the different expansion mechanisms, considers the two cases RS(6,3) and RS(10,4), and sets the parameter D to 2.
As FIG. 13 further shows, under the different expansion parameters the solution provided by the present invention performs well across successive expansions. Compared with Scale-RS, it reduces the expansion traffic by 22.9-26.7% when starting from RS(6,3) and by 19.4-21.7% when starting from RS(10,4). Compared with NCScale, it reduces the expansion traffic by 8.3-62.8%. In other words, the solution provided by the present invention reduces resource consumption.
In a specific implementation, a simulation of how the number of added nodes affects the expansion bandwidth can be carried out; this experiment measures how adding different numbers of nodes affects the efficiency of the expansion process. As an example, refer to FIG. 14: RS(6,3) and RS(10,4) are used as the two pre-expansion configurations, and the number of newly added nodes (the parameter D) is varied from 2 to 10. The expansion traffic grows with the number of added nodes, because adding more nodes requires transmitting more blocks for relocation and check block update. Nevertheless, the solution provided by the present invention keeps its advantage in reducing expansion traffic: compared with Scale-RS and NCScale, it reduces the expansion traffic by 35.2% and 38.1% on average, respectively. In other words, the solution provided by the present invention reduces resource consumption.
In a specific implementation, a simulation experiment on average bandwidth utilization can be carried out. The quantity evaluated is the average bandwidth utilization, defined as the ratio of the average amount of data transmitted per time unit to the theoretical maximum amount of data that data block relocation and check block update could transmit per time unit.
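The ratio defined above can be written down directly. The sketch below is illustrative only: the function and parameter names are assumptions, and the numbers in the usage example are made up to show the arithmetic, not taken from the experiments.

```python
def average_bandwidth_utilization(
    total_transferred: float,
    time_units: int,
    max_per_time_unit: float,
) -> float:
    """Ratio of the average data volume moved per time unit to the
    theoretical maximum volume that could be moved per time unit."""
    if time_units <= 0 or max_per_time_unit <= 0:
        raise ValueError("time_units and max_per_time_unit must be positive")
    average_per_time_unit = total_transferred / time_units
    return average_per_time_unit / max_per_time_unit


if __name__ == "__main__":
    # Hypothetical figures: 30 blocks moved over 10 time units, with a
    # theoretical ceiling of 4 blocks per time unit.
    print(average_bandwidth_utilization(30, 10, 4))
```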
As shown in FIG. 15, compared with Scale-RS and NCScale, the solution provided by the present invention achieves near-optimal bandwidth utilization. In particular, it reaches a bandwidth utilization of 96.7% in the RS(18,4,20) expansion. On average, the bandwidth utilization of the present invention is 41.7-46.7% higher than that of Scale-RS and 61.9-78.3% higher than that of NCScale.
In summary, the present invention proposes a fast continuous expansion mechanism that addresses the high I/O consumption, low bandwidth utilization, and growing cost of successive expansions in erasure code storage systems. The present invention analyzes the expansion of an erasure code storage system from the perspective of continuous expansion, and designs a new spatial distribution scheme and a check block update algorithm that increase node bandwidth utilization and the parallelism of block transmission. While maintaining system reliability, the invention reduces the time and the bandwidth traffic consumed by the expansion process.
As shown in FIG. 16, the present invention provides an apparatus for expanding an erasure code storage system, the apparatus comprising: a first processing unit 1601, configured to determine the data in the storage system, encode the data, store the data in a distributed manner on the nodes, and obtain spatial position distribution information of the nodes; a second processing unit 1602, configured to determine, based on expansion requirement information, the number of nodes newly added to each stripe, and to determine extended node information for each stripe based on the number of newly added nodes and the spatial position distribution information, where a stripe comprises data blocks and check blocks that have an encoding relationship; a third processing unit 1603, configured to determine an extended group based on the extended node information and a least common multiple rule, and to split the extended group to obtain a target group comprising a plurality of selected stripes, where the extended group consists of a plurality of stripes that satisfy the condition that the expansion requirement can be met while the spatial position distribution pattern remains unchanged; and an obtaining unit 1604, configured to execute an expansion algorithm on the target group to obtain a corresponding target extended group, the target extended group comprising extended data blocks and extended check blocks.
Optionally, the first processing unit 1601 is configured to: divide the data into K data blocks of the same size, where K is a positive integer greater than 1; perform a finite-field matrix operation on the K data blocks and a preset coding matrix to obtain M check blocks, where M is a positive integer greater than 1 and less than K, and the K data blocks and the M check blocks form a plurality of stripes; and distribute the data blocks and check blocks of the same stripe over K+M different nodes, determine the distribution information of the K data blocks and the M check blocks on the nodes, and obtain the spatial position distribution information based on the distribution information.
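The encoding step described for the first processing unit, multiplying K data blocks by a coding matrix over a finite field to obtain M check blocks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the field is assumed to be GF(2^8) with the reduction polynomial 0x11D, the small coding matrix in the usage example is hypothetical rather than the preset coding matrix of the invention, and the names `gf_mul` and `encode_stripe` are invented for this sketch.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1
    (0x11D, an assumed but commonly used polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def encode_stripe(data_blocks: list[bytes], coding_matrix: list[list[int]]) -> list[bytes]:
    """Compute one check block per matrix row as the GF(2^8) linear
    combination of the K equally sized data blocks."""
    size = len(data_blocks[0])
    if any(len(block) != size for block in data_blocks):
        raise ValueError("all data blocks must have the same size")
    check_blocks = []
    for row in coding_matrix:
        if len(row) != len(data_blocks):
            raise ValueError("each matrix row needs K coefficients")
        check = bytearray(size)
        for coefficient, block in zip(row, data_blocks):
            for position, byte in enumerate(block):
                check[position] ^= gf_mul(coefficient, byte)
        check_blocks.append(bytes(check))
    return check_blocks


if __name__ == "__main__":
    # K = 3 data blocks and a hypothetical 2 x 3 coding matrix (M = 2).
    data = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
    matrix = [[1, 1, 1], [1, 2, 4]]
    for check in encode_stripe(data, matrix):
        print(check.hex())
```

A row of all ones reduces to a plain XOR of the data blocks, which makes the first check block in the example easy to verify by hand.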
Optionally, the second processing unit 1602 is configured to: determine, based on the spatial position distribution information, a first node count, which is the number of nodes storing data blocks on each stripe, and a second node count, which is the number of nodes storing check blocks on each stripe; add the first node count and the number of newly added nodes to obtain a third node count, and use the third node count as the number of data blocks stored on each stripe after expansion; and use the second node count as the number of check blocks stored on each stripe after expansion, so as to determine the extended node information for each stripe.
Optionally, the third processing unit 1603 is configured to: determine an extended group based on the extended node information and the least common multiple rule, the extended group comprising V extended stripes; split the V extended stripes to determine P basic groups and R adjustment groups, where each basic group comprises Vp basic stripes, each adjustment group comprises Vr adjustment stripes, and P and R are positive integers greater than 1; and select K basic stripes from the basic groups and D adjustment stripes from the adjustment groups, and determine a target group based on the K basic stripes and the D adjustment stripes, the target group comprising K+D stripes.
Optionally, the least common multiple rule is determined by the following formula:
V=LCM(K,K+D+1)(K+D)(K+1)/K
where LCM() denotes the function that returns the least common multiple, K denotes the number of nodes storing data blocks on each stripe before expansion, and D denotes the number of newly added nodes.
Optionally, the obtaining unit 1604 is specifically configured to: number the K+D stripes in any one of the target groups, and number the K+M+D nodes of the expanded storage system; compute, on the first K+1 nodes, the difference check blocks of the data blocks on the adjustment stripes, and update the first check block of the basic stripe on the same node based on the difference check blocks; transmit the data blocks on the adjustment stripes, in round-robin fashion, into the basic stripes on the same nodes to obtain an expanded initial extended group; and perform a preset operation on the initial extended group to obtain the corresponding target extended group.
Optionally, the apparatus further comprises an adjustment unit, configured to: determine the logical relationships of the stripes corresponding to the target extended group, and the first spatial distribution information corresponding to each extended data block and extended check block; and adjust the order of the logical relationships according to the spatial distribution information, so that the first spatial distribution information has the same logical layout as the spatial distribution information.
An embodiment of the present invention provides a computer device comprising a program or instructions which, when executed, carry out the method for expanding an erasure code storage system provided by the embodiment of the present invention and any of its optional methods.
An embodiment of the present invention provides a storage medium comprising a program or instructions which, when executed, carry out the method for expanding an erasure code storage system provided by the embodiment of the present invention and any of its optional methods.
Finally, it should be noted that those skilled in the art will understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

  1. 一种扩展纠删码存储系统的方法,其特征在于,所述方法包括:A method for expanding an erasure code storage system, characterized in that the method comprises:
    确定所述存储系统中的数据,对所述数据进行编码,并将所述数据分散存储在各个节点,获得所述各个节点的空间位置分布信息;Determining the data in the storage system, encoding the data, and dispersely storing the data in each node, and obtaining the spatial location distribution information of each node;
    基于扩展需求信息,确定每个条带上新增的节点个数,并基于所述新增的节点个数和所述空间位置分布信息,确定每个条带上的扩展节点信息;所述条带包括具有编码关系的数据块和校验块;Determine the number of newly added nodes on each strip based on the extended demand information, and determine the expanded node information on each strip based on the number of newly added nodes and the spatial location distribution information; The band includes a data block and a check block with an encoding relationship;
    基于所述扩展节点信息和最小公倍数规则,确定扩展组,并对所述扩展组进行拆分处理,获得包括多个被选择的条带的目标组;所述扩展组由满足可完成扩展需求且空间位置分布规律不变的条件的多个条带所组成的;Based on the extended node information and the least common multiple rule, an extended group is determined, and the extended group is split to obtain a target group including a plurality of selected strips; Composed of multiple strips under the condition that the distribution of spatial positions remains unchanged;
    对所述目标组执行扩展算法,获得对应的目标扩展组,所述目标扩展组包括扩展数据块和扩展校验块。Executing an expansion algorithm on the target group to obtain a corresponding target expansion group, where the target expansion group includes an extended data block and an extended check block.
  2. 如权利要求1所述的方法,其特征在于,对所述数据进行编码,并将所述数据分散存储在各个节点,获得所述各个节点空间位置分布信息,包括:The method according to claim 1, wherein said data is encoded, and said data is distributedly stored in each node, and said each node spatial position distribution information is obtained, comprising:
    将所述数据划分为K个大小相同的数据块;K为大于1的正整数;Divide the data into K data blocks of the same size; K is a positive integer greater than 1;
    将所述K个数据块与预设编码矩阵做域内矩阵运算,获得M个校验块;M为大于1且小于K的正整数;所述K个数据块与所述M个校验块构成多个条带;performing an intra-domain matrix operation on the K data blocks and the preset encoding matrix to obtain M check blocks; M is a positive integer greater than 1 and less than K; the K data blocks and the M check blocks constitute multiple strips;
    将同一条带上的数据块和校验块分散在不同的K+M个节点上,确定所述K个数据块与所述M个校验块在各个节点的分布信息,并基于所述分布信息获得所述空间位置分布信息。Distribute the data blocks and check blocks on the same strip on different K+M nodes, determine the distribution information of the K data blocks and the M check blocks on each node, and based on the distribution The information obtains the spatial location distribution information.
  3. 如权利要求1或2所述的方法,其特征在于,基于所述新增的节点个数和所述空间位置分布信息,确定每个条带上的扩展节点信息,包括:The method according to claim 1 or 2, wherein, based on the newly added number of nodes and the spatial location distribution information, determining the extended node information on each stripe includes:
    基于所述空间位置分布信息,确定每个条带上存储数据块的第一节点个数,以及每个条带上存储校验块的第二节点个数;Based on the spatial position distribution information, determine the first number of nodes storing data blocks on each stripe, and the second number of nodes storing check blocks on each stripe;
    将所述第一节点个数和所述新增的节点个数相加,获得第三节点个数,将所述第三节点个数,作为每个条带上扩展后的存储数据块的个数;以及,将所述第二节点个数,作为每个条带上扩展后的存储校验块的个数,以确定每个条带上的扩展节点信息。Adding the first number of nodes and the number of newly added nodes to obtain a third number of nodes, and using the third number of nodes as the number of expanded storage data blocks on each stripe number; and, using the second number of nodes as the number of expanded storage check blocks on each stripe, so as to determine the expanded node information on each stripe.
  4. 如权利要求1所述的方法,其特征在于,基于所述扩展节点信息和最小公倍数规则,确定扩展组,并对所述扩展组进行拆分处理,获得包括具有对应关系的条带的目标组,包括:The method according to claim 1, wherein an extension group is determined based on the extension node information and the least common multiple rule, and the extension group is split to obtain a target group including strips with corresponding relationships ,include:
    基于所述扩展节点信息和最小公倍数规则,确定扩展组;所述扩展组包括V个扩展条带;Determine an extension group based on the extension node information and the least common multiple rule; the extension group includes V extension strips;
    对所述V个扩展条带进行拆分,确定P个基本组和R个调整组;每个所述基本组包括Vp个基本条带,每个所述调整组中包括Vr个调整条带;P和R为大于1的正整数;Splitting the V extended strips to determine P basic groups and R adjustment groups; each of the basic groups includes Vp basic strips, and each of the adjustment groups includes Vr adjustment strips; P and R are positive integers greater than 1;
    从所述基本组中选择K个基本条带,以及从所述调整组中选择D个调整条带,并基于所述K个基本条带和所述D个调整条带确定目标组;所述目标组中包括K+D个条带。Select K basic strips from the basic group, and select D adjustment strips from the adjustment group, and determine a target group based on the K basic strips and the D adjustment strips; the The target group includes K+D stripes.
  5. 如权利要求4所述的方法,其特征在于,所述最小公倍数规则采用以下公式确定:The method according to claim 4, wherein the least common multiple rule is determined by the following formula:
    V=LCM(K,K+D+1)(K+D)(K+1)/KV=LCM(K, K+D+1)(K+D)(K+1)/K
    其中,所述LCM()用于表征求取最小公倍数的函数,k用于表征每个条带上的扩展前的存储数据块的节点个数;d用于表征新增节点个数。Wherein, the LCM() is used to represent a function for obtaining the least common multiple, k is used to represent the number of nodes storing data blocks before extension on each stripe; d is used to represent the number of newly added nodes.
  6. The method according to claim 4, wherein executing an extension algorithm on the target group to obtain a corresponding target extension group, the target extension group comprising extended data blocks and extended check blocks, comprises:
    numbering the K+D stripes in any one of the target groups, and numbering the K+M+D nodes of the storage system after expansion;
    calculating difference check blocks for the data blocks on the adjustment stripes in the first K+1 nodes, and updating, based on the difference check blocks, the first check block of the basic stripe on the same node;
    transmitting the data blocks on the adjustment stripes, in round-robin mode, to the basic stripes on the same node to obtain an initial extension group; and
    performing a preset operation on the initial extension group to obtain the corresponding target extension group.
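The idea of updating a check block from a difference check block, rather than re-encoding the whole stripe, can be sketched in Python as follows. This is a simplified illustration under a strong assumption that the source does not state: the first check block is taken to be the plain XOR of the stripe's data blocks. With general Reed-Solomon coefficients, each moved block would first be multiplied by its Galois-field coefficient. The function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    return reduce(lambda x, y: bytes(p ^ q for p, q in zip(x, y)), blocks)


def update_first_check_block(old_check: bytes, moved_blocks: List[bytes]) -> bytes:
    """Fold the difference check block of the moved data blocks into the old check block.

    Assumption: the first check block is a plain XOR parity.
    """
    delta = xor_blocks(moved_blocks)       # difference check block
    return xor_blocks([old_check, delta])  # updated first check block


# A basic stripe with two data blocks receives one block from an adjustment stripe.
base_data = [b"\x01\x02", b"\x10\x20"]
moved = [b"\xff\x0f"]
old_check = xor_blocks(base_data)
new_check = update_first_check_block(old_check, moved)

# The incremental update matches re-encoding the enlarged stripe from scratch.
assert new_check == xor_blocks(base_data + moved)
```

In this simplified setting, only the difference check block needs to be combined with the existing check block; the data blocks already in the basic stripe do not have to be read again.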
  7. The method according to claim 1, wherein after the target extension group is obtained, the method further comprises:
    determining the logical relationship of the stripes corresponding to the target extension group, and first spatial distribution information corresponding to each extended data block and extended check block; and
    adjusting the order of the logical relationship according to the spatial distribution information, so that the first spatial distribution information has the same logical layout as the spatial distribution information.
  8. An apparatus for expanding an erasure code storage system, wherein the apparatus comprises:
    a first processing unit, configured to determine data in the storage system, encode the data, store the data dispersedly across the nodes, and obtain spatial location distribution information of the nodes;
    a second processing unit, configured to determine, based on extension requirement information, the number of newly added nodes on each stripe, and to determine the extended node information on each stripe based on the number of newly added nodes and the spatial location distribution information, the stripe comprising data blocks and check blocks that have an encoding relationship;
    a third processing unit, configured to determine an extension group based on the extended node information and a least common multiple rule, and to split the extension group to obtain a target group comprising a plurality of selected stripes, the extension group consisting of a plurality of stripes that satisfy the condition that the extension requirement can be met while the spatial location distribution pattern remains unchanged; and
    an obtaining unit, configured to execute an extension algorithm on the target group to obtain a corresponding target extension group, the target extension group comprising extended data blocks and extended check blocks.
  9. A computer device, comprising a program or instructions which, when executed, cause the method according to any one of claims 1 to 7 to be performed.
  10. A storage medium, comprising a program or instructions which, when executed, cause the method according to any one of claims 1 to 7 to be performed.
PCT/CN2022/101302 2021-12-02 2022-06-24 Method and apparatus for expanding erasure code storage system WO2023098048A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111459202.9A CN114237970A (en) 2021-12-02 2021-12-02 Method and device for expanding erasure code storage system
CN202111459202.9 2021-12-02

Publications (1)

Publication Number Publication Date
WO2023098048A1 (en) 2023-06-08

Family

ID=80752786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101302 WO2023098048A1 (en) 2021-12-02 2022-06-24 Method and apparatus for expanding erasure code storage system

Country Status (2)

Country Link
CN (1) CN114237970A (en)
WO (1) WO2023098048A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237970A (en) * 2021-12-02 2022-03-25 深圳前海微众银行股份有限公司 Method and device for expanding erasure code storage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630423A (en) * 2015-12-25 2016-06-01 华中科技大学 Erasure code cluster storage expansion method based on data caching
CN108536396A (en) * 2018-04-08 2018-09-14 华中科技大学 A kind of storage extended method based on network code
US20180293265A1 (en) * 2017-04-06 2018-10-11 International Business Machines Corporation Enhanced FSCK Mechanism for Improved Consistency in Case of Erasure Coded Object Storage Architecture Built Using Clustered File System
CN111831223A (en) * 2020-06-19 2020-10-27 华中科技大学 Fault-tolerant coding method, device and system for improving expandability of data deduplication system
CN114237970A (en) * 2021-12-02 2022-03-25 深圳前海微众银行股份有限公司 Method and device for expanding erasure code storage system

Also Published As

Publication number Publication date
CN114237970A (en) 2022-03-25

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22899868

Country of ref document: EP

Kind code of ref document: A1