CN105610921A - Erasure code filing method based on data cache in cluster

Info

Publication number: CN105610921A
Application number: CN201510979326.8A
Authority: CN
Inventors: 黄建忠; 曹强; 谢长生; 蔡颖; 代尔卫; 夏杰
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2015-12-23
Filing date: 2015-12-23
Publication date: 2016-05-25
Anticipated expiration: 2035-12-23
Also published as: CN105610921B

Abstract

The invention discloses an erasure code archiving method based on data caching in a cluster, comprising the following steps: (1) according to a user access request, the required data blocks are read from the production node where they reside into the node buffer and sent to the access node, and are also forwarded to the archiving node according to the archived data status table; (2) the archiving node updates the number of data blocks held for each data node according to the data blocks received, and judges whether it has received all the data blocks of the current stripe; if so, the current stripe is archived; if not, cold data blocks are read from the production cluster to complete the stripe, which is then archived. Compared with ordinary erasure code archiving, the method provided by the invention greatly reduces the number of times the archiving node reads data blocks from the production cluster, so that erasure code archiving can proceed essentially without affecting user access, improving archiving efficiency.

Description

An erasure code archiving method based on data caching in a cluster

Technical field

The invention belongs to the technical field of computer storage and, more specifically, relates to an erasure code archiving method based on data caching in a cluster.

Background

In today's information age, data volumes are growing rapidly and large-scale storage clusters are ever more widely used. By access frequency, the data in a storage cluster can be divided into hot data, warm data and cold data. As the system runs, the access frequency of hot data falls; it becomes warm data and finally cold data. Hot data is usually stored in the production cluster as three or more replicas. One way to improve the utilization of the production cluster is to migrate replicated data with low access frequency to an erasure-coded archival cluster, which organizes its stored data with RS (Reed-Solomon) codes. Erasure-coded archival is the operation of migrating data from replica storage to erasure-coded storage.

In the prior art, erasure-coded archival and user access to the replica cluster are two mutually independent processes. Existing erasure-coded archival comprises: (1) obtaining the data bitmap, in the production cluster, of the data to be archived; (2) reading replica data from the production cluster; (3) erasure-coding the replica data that was read to generate check blocks; (4) writing one copy of the replica data and the corresponding check blocks to the erasure-coded archival cluster; (5) reclaiming the remaining cold data on the production cluster by deleting all data replicas that took part in the archival. User access to the replica cluster comprises: (1) obtaining the metadata of the required data block in the production cluster; (2) issuing a read request to the node holding the data block; (3) that node reading the data block into its node buffer; (4) the data node sending the data block, which is forwarded to the client.

With this kind of erasure-coded archival, offline archiving is used when archiving requests are excessive, so that data access and archiving are independent of each other. In the current big-data era, however, there is almost no dedicated archiving window, and online archiving is needed. With online archiving, the nodes of the production cluster hold both hot and cold data at all times, and the archiving method described above has the following problems:

(1) Archiving requests are issued directly to the production cluster while it is also serving user accesses; they bring extra external-storage accesses and occupy memory cache and network resources, greatly increasing the response time of user requests;

(2) For the archiving task, nodes in the production cluster must handle user access requests and data archiving requests at the same time, so the two compete for resources; when a storage node receives user access requests at a high rate, its archiving requests go unanswered for a long time, the archiving node cannot obtain data blocks in time, and overall archiving efficiency suffers.

Summary of the invention

In view of the above defects of, or needs for improvement in, the prior art, the invention provides an erasure code archiving method based on data caching in a cluster, with the aim of improving the efficiency of erasure code archiving.

To achieve the above aim, according to one aspect of the invention, an erasure code archiving method based on data caching in a cluster is provided, which comprises the following steps:

(1) According to a user access request, the required data block is read from the production node where it resides into the node buffer and sent to the access node; at the same time it is forwarded to the archiving node according to the archived data status table;

(2) The archiving node updates the number of data blocks held for each data node according to the data blocks received, and judges whether it has received all the data blocks of the current stripe; if so, the current stripe is archived; if not, cold data blocks are read from the production cluster to complete the stripe, which is then archived.

Because the method uses the data blocks cached by user accesses, it greatly reduces, compared with ordinary erasure code archiving, the number of times the archiving node reads data blocks from the production cluster, so archiving can proceed essentially without affecting user access, improving archiving efficiency;

In addition, forwarding the data blocks cached by user accesses to the archiving node avoids duplicate read requests for the same data block on a production node, one from user access and one from ordinary erasure code archiving, which effectively reduces the amount of data transmitted over the network in the production cluster and lightens the load on the whole network.

Preferably, step (1) comprises the following sub-steps:

(1.1) The access node obtains from the management server the address, in the production cluster, of the required data block;

(1.2) A read request is issued to the node in the production cluster that corresponds to that address;

(1.3) The data block is read from the production node where it resides into the node buffer;

(1.4) According to the archived data status table, it is judged whether the data block has already been forwarded to the archiving node; if so, the data block is forwarded only to the access node; if not, it is forwarded to both the access node and the archiving node. This step uses the switch in the cluster directly to forward the data accessed by the user to the archiving node.
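The forwarding decision of sub-step (1.4) can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the dictionary `status_table`, the function `forward_targets`, the block ID `"blk-7"` and the node labels are all hypothetical, and the table is assumed to be marked at forwarding time.

```python
def forward_targets(block_id, status_table):
    """Return the nodes that should receive a block read for a user request.

    status_table is a hypothetical dict: block_id -> 0 (not yet sent to the
    archiving node) or 1 (already sent). Unknown blocks are treated as 0.
    """
    targets = ["access_node"]            # the user request is always served
    if status_table.get(block_id, 0) == 0:
        targets.append("archiving_node")
        status_table[block_id] = 1       # assumption: mark at forwarding time
    return targets


status_table = {}
first = forward_targets("blk-7", status_table)   # first read of the block
second = forward_targets("blk-7", status_table)  # repeated read of the block
```

Under these assumptions a block reaches the archiving node at most once, however often users read it, so repeated hot reads add no further archiving traffic.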

Preferably, step (2) comprises the following sub-steps:

(2.1) The archiving node receives data blocks (hot data) and updates the number of data blocks it holds for each data node;

(2.2) The archiving node initializes the current stripe, analyses which nodes make up the stripe, determines which data node each received data block belongs to, and initializes the storage variables; in this step, the nodes comprise data nodes and check nodes;

(2.3) The archiving node judges whether it has received all the data blocks of the current stripe; if so, go to step (2.7); if not, go to step (2.4);

(2.4) During a time interval of length T, the archiving node receives data blocks from the user-access cache;

(2.5) It is judged whether the k data blocks of the current stripe have all been collected; if so, go to step (2.7); if not, go to step (2.6); here k equals the number of data nodes;

(2.6) The archiving node reads cold data directly from the production cluster to make up μ stripes, then goes to step (2.3);

Here the number of data blocks read from data node j is Num_j = max{μ - a_j - λ(j), 0}, where μ is the maximum number of stripes the archiving node can archive within time T, a_j is the number of data blocks from data node j held in the archiving node, and λ(j) is the number of data blocks the archiving node will receive from data node j in the next time interval T;

(2.7) The current stripe is archived, the stripe number i is increased by 1, and the archived data status table is updated;

(2.8) According to the archived data status table, it is judged whether all stripes have been archived; if not, go to step (2.3); if so, archiving ends.

In the above process, the archiving node obtains the blocks that users access from cached data without affecting user access, which improves archiving efficiency.
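The two quantities that drive step (2), the completeness test on the current stripe and the cold-read count Num_j, can be sketched as below. The function names and the sample values are hypothetical. The completeness test assumes, as the per-node counts in this description suggest, that one held block per data node is enough to fill a stripe.

```python
def stripe_ready(a):
    """True when every data node has contributed at least one block.

    a[j] is the number of blocks from data node j held by the archiving node.
    Assumption: any held block of node j can fill node j's slot in the stripe.
    """
    return all(a_j >= 1 for a_j in a)


def cold_reads(mu, a, lam):
    """Num_j = max(mu - a_j - lam_j, 0) for every data node j."""
    return [max(mu - a_j - lam_j, 0) for a_j, lam_j in zip(a, lam)]


# Made-up sample values, not taken from the patent:
held = [1, 0, 3, 0]        # a_j
expected = [0, 1, 0, 0]    # lambda(j) for the next interval T
ready = stripe_ready(held)
to_read = cold_reads(2, held, expected)
```

The clamp at zero means a node that already holds, or is about to deliver, at least μ blocks is never read directly.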

In general, compared with the prior art, the above technical solution conceived by the invention achieves the following beneficial effects:

(1) The erasure code archiving method based on data caching in a cluster provided by the invention uses the cached data of user accesses to improve archiving efficiency without affecting user access. By sending the cached data blocks of user accesses directly to the archiving node, the method reduces the number of disk reads compared with ordinary erasure code archiving, and thereby speeds up three stages: reading data blocks from the production node's external storage, occupying the production node's memory, and sending archived data blocks through the production node's network card. Data blocks are read directly from the production cluster only when a stripe has not been fully received, and the number of such reads and of blocks read is far lower than the volume and frequency of user accesses, so user access is essentially unaffected;

(2) The erasure code archiving method based on data caching in a cluster provided by the invention reduces the amount of data transmitted over the network in the production cluster. In ordinary erasure code archiving, a data block must be sent from the node buffer to the switch in the cluster and then forwarded to the archiving node. In the method provided by the invention, the switch in the cluster forwards the data accessed by the user directly to the archiving node, which avoids the separate transfer from the node buffer to the switch that ordinary archiving requires, effectively reduces the amount of data transmitted over the network in the production cluster, and lightens the load on the whole network.

Description of drawings

Fig. 1 is a flow chart of the erasure code archiving method based on data caching in a cluster according to the invention;

Fig. 2 is a schematic diagram of the erasure code archiving method based on data caching in a cluster provided by the embodiment;

Fig. 3 is a schematic diagram of user access and ordinary erasure code archiving;

Fig. 4 is a schematic diagram of user access and data archiving based on data caching.

Detailed description

To make the aim, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. In addition, the technical features of the various embodiments described below may be combined with one another as long as they do not conflict.

For ease of understanding, the terms used in the invention are explained as follows:

Production cluster: the collective name for the nodes that, while supplying data to be archived during archiving, must still provide data access services to upper-layer applications.

Erasure code archiving cluster: the cluster that stores complete data stripes after archiving is finished; the free space of the production cluster itself may serve in its place.

Management server: manages the metadata of the whole cluster, including the node locations of the data blocks and their replicas in the cluster.

Encoding node: the node that performs the encoding computation during the archiving of a stripe.

Node data block and stripe: in the encoding process of archiving, the unit in which data is read is the data block. In a storage cluster, a stripe consists of the data block units at the same offset address on different nodes, and is a set of information from which failed data can be recovered independently.

Data bitmap: stored in the management server of the storage cluster; it records the correspondence between data blocks and data nodes.

Num_j: during archiving, when the data blocks needed for the current stripe have not all been collected, the archiving node reads Num_j = max{μ - a_j - λ(j), 0} data blocks directly from data node j of the production cluster, where μ is the number of stripes the archiving node can archive within time T, a_j is the number of data blocks from data node j held in the archiving node, and λ(j) is the number of data blocks the archiving node receives from node j in the next time period T.

Archived data status table: a global table maintained by the management server that records whether each data block in the production cluster has been archived. In the embodiment, '0' and '1' indicate that the data block is 'not archived' and 'archived' respectively.

The invention provides an erasure code archiving method based on data caching in a cluster, which mainly uses the cached data of user accesses to accelerate erasure code archiving. Its flow is shown in Fig. 1 and comprises the following steps:

(1) According to a user access request, the required data block is read from the production node where it resides into the node buffer and then sent to the access node; at the same time it is forwarded to the archiving node according to the archived data status table. This step comprises the following sub-steps:

(1.1) According to the user request, the access node queries the management server and obtains the node location, in the production cluster, of the data block the user requested;

(1.2) According to the node location obtained in step (1.1), the node of the production cluster reads the data block into its node buffer;

(1.3) The production node queries the management server and obtains the archived data status table; according to the table, it determines the archiving status of the data block the user requested;

(1.4) The production node sends the data block held in its buffer;

(1.5) The switch sends the data block to the access node to respond to the user request and, at the same time, forwards any data block not yet archived to the archiving node to respond to the archiving request;

(1.6) It is judged whether all user requests have been processed; if so, the user access ends; if not, go to step (1.1) and process the next user request;

(2) The archiving node updates the number of data blocks held for each data node according to the user-access data received. If the archiving node has received all the data blocks of the current stripe, it archives the stripe directly; if user-access data cannot supply all the data blocks of the current stripe, it reads cold data from the production cluster and then archives. This step comprises the following sub-steps:

(2.1) The archiving node initializes the stripe to be archived; the current stripe number is i, with i = 0. It initializes the set a of counts a_j, the number of data blocks from data node j held in the archiving node, to [0, 0, 0, 0];

(2.2) The archiving node receives user-access data (hot data) and updates the set a;

(2.3) The archiving node judges whether it has received all the data blocks of stripe i; if so, go to step (2.7); if not, go to step (2.4);

(2.4) The archiving node waits for a time interval T, during which it keeps receiving data blocks from the user-access cache; in this embodiment, T is 1 second;

(2.5) The archiving node updates the set a and again judges whether it has received all the data blocks of stripe i; if so, go to step (2.7); if not, go to step (2.6);

(2.6) The archiving node reads cold data directly from the production cluster; the number of data blocks read from data node j is Num_j = max{μ - a_j - λ(j), 0}, so as to make up μ stripes; then go to step (2.3);

(2.7) The current stripe is archived and the stripe number i is increased by 1;

(2.8) The archived data status table in the management server is updated;

(2.9) It is judged whether all stripes in the production cluster have been archived; if not, go to step (2.3); if so, archiving ends.
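The control flow of sub-steps (2.1) to (2.9) can be sketched as a small simulation over per-node block counts. This is a hedged illustration only: `run_archiving`, its parameters and the `max_intervals` guard are hypothetical, each element of `hot_arrivals` stands for the hot blocks received during one interval T, actual encoding and the status-table update are omitted, and one held block per data node is assumed to complete a stripe.

```python
def run_archiving(total_stripes, mu, hot_arrivals, predict, max_intervals=1000):
    """Simulate steps (2.1)-(2.9); return (stripes archived, cold blocks read).

    hot_arrivals: non-empty iterable of per-interval lists, one count per data node.
    predict: callable returning lambda(j) for the next interval, as a list.
    """
    arrivals = iter(hot_arrivals)
    first = next(arrivals)
    k = len(first)
    a = [0] * k                                   # (2.1) initialise counts
    i = 0                                         # (2.1) current stripe number
    cold_total = 0
    a = [x + h for x, h in zip(a, first)]         # (2.2) first hot data
    intervals = 0
    while i < total_stripes:                      # (2.9) all stripes archived?
        if min(a) >= 1:                           # (2.3) stripe i complete?
            a = [x - 1 for x in a]                # (2.7) archive stripe i
            i += 1                                # (2.8) table update omitted
            continue
        intervals += 1
        if intervals > max_intervals:             # guard added for the sketch
            raise RuntimeError("no progress")
        hot = next(arrivals, [0] * k)             # (2.4) wait one interval T
        a = [x + h for x, h in zip(a, hot)]       # (2.5) update and re-check
        if min(a) >= 1:
            continue
        lam = predict()                           # (2.6) read cold data
        need = [max(mu - x - l, 0) for x, l in zip(a, lam)]
        a = [x + n for x, n in zip(a, need)]
        cold_total += sum(need)
    return i, cold_total


result = run_archiving(3, 2, [[2, 0], [0, 1]], lambda: [0, 0])
```

With the made-up inputs above (two data nodes, μ = 2), the first stripe is completed from hot data alone and the archiving node falls back to direct reads only once both the hot blocks and the waiting interval have been exhausted.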

The erasure code archiving method based on data caching provided by the invention is described below with reference to a specific embodiment. The example uses (k+r, k) RS coding, where k is the number of original data blocks and r is the number of check blocks generated by encoding; any k of the k data blocks and r check blocks suffice to decode the k original data blocks.

Fig. 2 shows an example of the erasure code archiving method based on data caching in a cluster using RS(6,4) coding in the embodiment. The erasure code archiving cluster has six nodes in total, four data nodes and two check nodes; the (6,4) RS code is used to ensure data integrity, and the nodes are numbered {1, 2, 3, 4, 5, 6};

The logical address of any data block from the production cluster corresponds to one of the four data nodes in the erasure code cluster, and the number of stripes the archiving node archives within time T is μ = 3.
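As a rough illustration of why replicated cold data is moved to RS storage, the storage overhead and loss tolerance of a (k+r, k) code can be compared with the three-replica layout mentioned in the background. The helper names are hypothetical; the tolerance figure follows from the property stated above that any k of the k+r blocks decode the original data.

```python
def rs_profile(k, r):
    """Storage overhead and tolerated block losses of a (k+r, k) RS code."""
    return {"overhead": (k + r) / k, "tolerated_losses": r}


def replica_profile(copies):
    """Same figures for plain replication with the given number of copies."""
    return {"overhead": float(copies), "tolerated_losses": copies - 1}


rs_6_4 = rs_profile(4, 2)            # the RS(6,4) layout of the embodiment
three_replicas = replica_profile(3)  # typical hot-data layout
```

Both layouts survive the loss of two blocks, but RS(6,4) stores 1.5 bytes per byte of user data where three replicas store 3.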

The erasure code archiving method based on data caching in a cluster provided by this embodiment proceeds as follows:

(1) In the embodiment, the set of data block counts for the first user access is [2, 4, 5, 3], that is, 2, 4, 5 and 3 data blocks are obtained from data node 1, data node 2, data node 3 and data node 4 respectively;

The access node first queries the management server and obtains all the address information of this set of data blocks. At the same time, the archiving node initializes stripe i, where the initial value of i is 0; at this point the number of data blocks from each data node held in the archiving node is [0, 0, 0, 0];

(2) The production cluster reads the set of data blocks into the buffer; on the first access, the switch forwards the set of data blocks to both the access node and the archiving node;

(3) The access node obtains the set of data blocks and responds to the user request; at the same time, the archiving node obtains the set of data blocks and updates the per-data-node block counts to [2, 4, 5, 3];

Steps (1), (2) and (3) above correspond to the flow marked ① in Fig. 2;

(4) The archiving node performs the encoding computation for stripe i and sends 4 data blocks and 2 check blocks to the erasure code archiving cluster; i is increased by 1, so i is now 1; the archiving node clears the data blocks of the archived stripe 0, leaving per-node counts of [1, 3, 4, 2];

(5) If the archiving node has received all the data blocks of stripe i, it archives the stripe and sends 4 data blocks and 2 check blocks to the erasure code archiving cluster; i is increased by 1, so i is now 2;

The data blocks of the archived stripe 1 are deleted from the archiving node; the per-node block counts in the archiving node are now [0, 2, 3, 1];

Steps (4) and (5) above correspond to the flow marked ② in Fig. 2;

(6) If the archiving node has not received all the data blocks of stripe i, it enters the waiting state. Meanwhile the user makes a second access, with the set of data block counts [0, 3, 3, 4]; assuming that this set contains no data block repeated from the first access, the switch forwards the set of data blocks to both the access node and the archiving node;

(7) The access node responds to the user's second access; at the same time, the archiving node receives the set of data blocks [0, 3, 3, 4] and updates the per-node block counts to [0, 5, 6, 5];

Steps (6) and (7) above correspond to the flow marked ③ in Fig. 2;

(8) After waiting for the time period T, if the archiving node has still not received all the data blocks of stripe i, it reads data blocks directly from the production cluster;

In the next time period T, the numbers of data blocks λ accessed by users are [1, 1, 1, 1]; from Num_j = max{μ - a_j - λ(j), 0}, the numbers of data blocks to read from the four data nodes are [2, 0, 0, 0], that is, two corresponding data blocks must be read from data node 1. Here the number of stripes the archiving node can archive within time T is μ = 3, and the set a of per-data-node block counts a_j held in the archiving node is [0, 5, 6, 5];

(9) The set of data block counts for the third user access is [1, 1, 2, 1], and the switch forwards the set of data blocks to both the access node and the archiving node; at the same time, the archiving node reads data blocks directly from the production cluster according to the set [2, 0, 0, 0];

(10) The archiving node receives the set of data blocks [1, 1, 2, 1] from the user-access cache and the directly read set [2, 0, 0, 0], and updates the per-node block counts to [3, 6, 8, 6];

Steps (8), (9) and (10) above correspond to the flow marked ④ in Fig. 2;

(11) The archiving node completes the archiving computation for 3 stripes, sends the data blocks and check blocks of each stripe to the erasure code archiving cluster, and deletes the archived data blocks from the archiving node; i is now 5, and the per-node block counts in the archiving node are [0, 3, 5, 3];

Step (11) above corresponds to the flow marked ⑤ in Fig. 2;

(12) Steps (1) to (11) are repeated until all stripes in the production cluster have been archived.
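The block counts in the walkthrough above can be replayed in a few lines, which makes the arithmetic easy to check. The helper names `add` and `archive_full_stripes` are hypothetical; the input vectors are the ones given in steps (1) to (11), and one block per data node is taken to form one stripe.

```python
def add(a, b):
    """Element-wise sum of two per-node block-count vectors."""
    return [x + y for x, y in zip(a, b)]


def archive_full_stripes(a):
    """Archive every complete stripe; return (remaining counts, stripes archived)."""
    n = min(a)
    return [x - n for x in a], n


mu = 3                                   # stripes archivable per interval T
a, i = [0, 0, 0, 0], 0                   # initial counts and stripe number

a = add(a, [2, 4, 5, 3])                 # steps (1)-(3): first user access
a, n = archive_full_stripes(a)           # steps (4)-(5): stripes 0 and 1
i += n
after_first = list(a)

a = add(a, [0, 3, 3, 4])                 # steps (6)-(7): second user access
after_second = list(a)

lam = [1, 1, 1, 1]                       # step (8): expected next-interval blocks
num = [max(mu - x - l, 0) for x, l in zip(a, lam)]

a = add(add(a, [1, 1, 2, 1]), num)       # steps (9)-(10): third access + cold reads
after_third = list(a)

a, n = archive_full_stripes(a)           # step (11): three more stripes
i += n
```

The replay reproduces the counts stated in the text at each stage and ends with stripe number i = 5.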

In the prior art, the process of ordinary erasure code archiving is shown in Fig. 3: the archiving node reads the k data blocks a stripe needs directly from the production cluster, then computes the r check blocks of the stripe, and finally sends the k data blocks and r check blocks to the erasure code archiving cluster for storage. Every archiving pass reads data blocks from the production cluster, computes the check blocks, and sends the data blocks and check blocks to the erasure code archiving cluster, until archiving is complete;

The erasure code archiving method based on data caching provided by the invention, by contrast, proceeds as follows:

(1) The archiving node initializes the stripe and receives the data blocks cached by user accesses;

(2) If all k data blocks of the current stripe have been received, go to step (4); otherwise wait for the data sent from the user-access data cache during the next time period T;

(3) It is judged whether all the data blocks of the current stripe have been received; if so, go to step (4); if not, the corresponding data blocks are read from the production cluster to make up μ stripes;

(4) The archiving node performs the encoding computation for the current stripe and sends the k data blocks and r check blocks to the erasure code archiving cluster for storage;

(5) Steps (2) to (4) are repeated until all stripes in the production cluster have been archived.

In the embodiment, receiving the data blocks cached by user accesses in step (1) greatly reduces, compared with ordinary erasure code archiving, the number of times the archiving node reads data blocks from the production cluster, so archiving can proceed essentially without affecting user access, improving archiving efficiency. In addition, forwarding the data blocks cached by user accesses to the archiving node through the switch avoids duplicate read requests for the same data block on a production node, one from user access and one from ordinary erasure code archiving, which effectively reduces the amount of data transmitted over the network in the production cluster and lightens the load on the whole network.
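A back-of-the-envelope count, using only the figures of the RS(6,4) walkthrough (5 stripes archived, direct reads of [2, 0, 0, 0]), illustrates the size of the reduction. It is an illustration derived from that example, not a measured result, and the function name is hypothetical.

```python
def direct_reads_conventional(stripes, k):
    """Ordinary archiving reads all k data blocks of every stripe from production."""
    return stripes * k


k = 4                                   # data blocks per stripe in RS(6,4)
stripes_archived = 5                    # stripes 0..4 in the walkthrough
cache_based_reads = sum([2, 0, 0, 0])   # cold blocks read directly in step (8)

conventional_reads = direct_reads_conventional(stripes_archived, k)
reads_avoided = conventional_reads - cache_based_reads
```

For this example the archiving node issues 2 direct reads instead of 20; the other 18 data blocks arrive as a by-product of user accesses.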

Those skilled in the art will readily understand that the above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An erasure code archiving method based on data caching in a cluster, characterized in that the erasure code archiving method comprises the following steps:

(1) According to a user's access request, read the required data block from the production node where it resides into the node buffer, send the data block to the access node, and forward the data block to the archiving node according to the archive data status table;

(2) The archiving node updates the count of data blocks for each data node according to the received data blocks, and determines whether it has received all the data blocks of the current stripe; if so, it archives the current stripe; if not, it reads the cold data blocks from the production cluster, assembles the stripe, and then archives it.

2. The erasure code archiving method as claimed in claim 1, characterized in that step (1) comprises the following sub-steps:

(1.1) The access node obtains the address of the required data block in the production cluster from the management server;

(1.2) Send a read request to the node in the production cluster that corresponds to the address of the data block;

(1.3) Read the data block from the production node where it resides in the production cluster into the node buffer;

(1.4) Determine whether the data block has already been forwarded to the archiving node; if so, forward the data block only to the access node; if not, forward the data block to both the access node and the archiving node.
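The forwarding decision in sub-steps (1.3) and (1.4) can be sketched in Python as follows. This is an illustrative sketch only: the function `serve_read`, the dictionary `disk`, and the use of a plain set `forwarded` as a stand-in for the archive data status table are assumptions, not structures defined by the claims.

```python
from typing import Dict, List, Set, Tuple


def serve_read(
    block_id: str,
    disk: Dict[str, bytes],
    node_buffer: Dict[str, bytes],
    forwarded: Set[str],
) -> List[Tuple[str, bytes]]:
    """Return the (destination, data) deliveries for one user read request."""
    if block_id not in node_buffer:
        # Sub-step (1.3): load the block from the production node into its buffer.
        node_buffer[block_id] = disk[block_id]
    data = node_buffer[block_id]
    deliveries = [("access", data)]  # the access node always receives the block
    if block_id not in forwarded:
        # Sub-step (1.4): forward to the archiving node only the first time.
        deliveries.append(("archive", data))
        forwarded.add(block_id)
    return deliveries
```

Calling `serve_read` twice for the same block yields an archive delivery only on the first call, so the archiving node never receives the same block twice.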

3. The erasure code archiving method as claimed in claim 1 or 2, characterized in that step (2) comprises the following sub-steps:

(2.1) The archiving node receives data blocks and updates the count of data blocks it holds for each data node;

(2.2) The archiving node initializes the current stripe, analyzes the node composition of the stripe, determines which data node each received data block belongs to, and initializes the storage variables; the nodes comprise data nodes and parity nodes;

(2.3) The archiving node determines whether all the data blocks of the current stripe have been received; if so, go to step (2.7); if not, go to step (2.4);

(2.4) Within a time interval T, the archiving node receives data blocks from the user cache;

(2.5) Determine whether all k data blocks of the current stripe have been collected; if so, go to step (2.7); if not, go to step (2.6); where k equals the number of data nodes;

(2.6) The archiving node reads the cold data directly from the production cluster to complete μ stripes, and then returns to step (2.3); where μ is the maximum number of stripes that the archiving node can archive within the time interval T;

(2.7) Archive the current stripe, increment the stripe number i by 1, and update the archive data status table;

(2.8) Determine from the archive data status table whether all the stripes have been archived; if not, return to step (2.3); if so, end the archiving.
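The control flow of sub-steps (2.3) to (2.8), including the μ-stripe cold read of sub-step (2.6), can be sketched in Python as below. The sketch tracks block identifiers only and omits the data and the encoding. The function name `run_archiver`, the list `status` standing in for the archive data status table, and the assumption that each item of `windows` is the set of block identifiers received from the user cache during one interval T are all illustrative choices rather than details given in the claims.

```python
from typing import Iterator, Set, Tuple

BlockId = Tuple[int, int]  # (stripe number i, block index 0..k-1)


def run_archiver(
    total_stripes: int, k: int, mu: int, windows: Iterator[Set[BlockId]]
) -> int:
    """Archive all stripes; return the number of cold blocks read from production."""
    have: Set[BlockId] = set()        # blocks currently held by the archiving node
    status = [False] * total_stripes  # stand-in for the archive data status table
    cold_reads = 0
    i = 0                             # current stripe number
    while not all(status):                                # (2.8) stripes remain
        stripe_blocks = {(i, j) for j in range(k)}
        if not stripe_blocks <= have:                     # (2.3) stripe incomplete
            have |= next(windows, set())                  # (2.4) receive during T
            if not stripe_blocks <= have:                 # (2.5) still incomplete
                # (2.6) read cold blocks so that up to mu stripes are complete
                for s in range(i, min(i + mu, total_stripes)):
                    for j in range(k):
                        if (s, j) not in have:
                            have.add((s, j))
                            cold_reads += 1
                continue                                  # return to (2.3)
        have -= stripe_blocks                             # (2.7) archive stripe i
        status[i] = True
        i += 1
    return cold_reads
```

For example, with three stripes, k = 2, μ = 2, and cache windows delivering `{(0, 0), (0, 1)}` and then `{(1, 0)}`, the first stripe is archived entirely from cached blocks and the remaining three blocks are read cold.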