CN106708653A - Mixed tax administration data security protecting method based on erasure code and multi-copy - Google Patents
Mixed tax administration data security protecting method based on erasure code and multi-copy
- Publication number
- CN106708653A
- Authority
- CN
- China
- Prior art keywords
- data
- tax
- correcting
- copy
- deleting codes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hybrid security protection method for tax big data based on erasure codes and multiple copies. While the tax data in the distributed tax data storage system is intact, the storage flow that combines multi-copy storage and erasure code storage is run; when tax data in the system fails, the tax data fault-tolerance flow is run. The invention stores tax data in different modes according to its age, distributes erasure coding tasks across different nodes, and replicates data first and erasure-codes it later. Together these measures improve the security and repair performance of the tax data as a whole, improve the overall encoding performance of the system, and keep data safe until erasure coding has completed.
Description
Technical field
The invention relates to the technical field of computer data management, and in particular to a hybrid security protection method for tax big data based on erasure codes and multiple copies.
Background art
With economic globalization and the continued development of China's economy, the number of taxpayers in China has grown rapidly and the range of tax types has widened. For an ever larger volume of tax data, distributed storage is a mainstream solution that offers good cost-effectiveness and scalability. The security of tax data in a distributed storage environment is therefore a key research question. A distributed storage system contains a large number of nodes, and either node failure or external intrusion can leave data incomplete. To avoid data loss, fault tolerance based on redundant data is normally used. There are two main kinds: multi-copy fault tolerance, which creates redundancy by replicating the data, and erasure code fault tolerance, which creates redundancy by encoding the data.
The most widely used approach today is replication-based multi-copy fault tolerance: the original data is copied into c copies, and the c copies are distributed to c different storage nodes, so that when any c-1 nodes fail at least one copy of every data item still exists. Multi-copy fault tolerance is simple to implement, has low computational overhead, and gives good data access performance. Its prominent drawback is a very high storage overhead. For tax data, which is already very large and keeps growing rapidly, replication-based multi-copy fault tolerance is not suitable.
With the explosive growth of data, erasure code fault tolerance has become a research focus in recent years, because it can provide the same or even higher data reliability at a much lower storage overhead. The erasure code strategy is to split a data item into c data blocks, encode the c data blocks into n (n > c) coded blocks, and distribute these to n different disks; when nodes fail, the original data can be decoded as long as c coded blocks of the data item survive. Compared with the widely used three-copy scheme, RS (Reed-Solomon) erasure coding can reduce storage consumption by 53% while doubling the fault tolerance. The drawback of erasure codes is poor performance during data reconstruction, especially in distributed storage: reconstruction requires several nodes to cooperate, which inevitably consumes large amounts of network and computing resources. For distributed data such as tax data, this becomes a key bottleneck for the performance of the whole system.
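The comparison between three copies and RS coding can be checked with simple arithmetic. The text does not state which Reed-Solomon parameters it has in mind; the values n = 14 and c = 10 used below are an assumption, chosen only because they reproduce both quoted figures. The function names are illustrative.

```python
from fractions import Fraction


def replication_profile(copies: int) -> tuple[Fraction, int]:
    """Return (storage overhead factor, tolerated node failures) for c copies."""
    return Fraction(copies), copies - 1


def erasure_profile(n: int, c: int) -> tuple[Fraction, int]:
    """Return the same two figures for a code that turns c data blocks into
    n coded blocks and can decode from any c of them."""
    if n <= c:
        raise ValueError("n must be greater than c")
    return Fraction(n, c), n - c


rep_overhead, rep_failures = replication_profile(3)
ec_overhead, ec_failures = erasure_profile(14, 10)  # assumed parameters
saving = 1 - ec_overhead / rep_overhead              # fraction of storage saved

print(f"3 copies : {float(rep_overhead):.1f}x storage, tolerates {rep_failures} failures")
print(f"RS(14,10): {float(ec_overhead):.1f}x storage, tolerates {ec_failures} failures")
print(f"storage saved relative to 3 copies: {float(saving):.1%}")
```

Under these assumed parameters the saving is 8/15, about 53%, and the number of tolerated failures rises from 2 to 4.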
Summary of the invention
In view of this, and to solve the above problems in the prior art, the invention proposes a hybrid security protection method for tax big data based on erasure codes and multiple copies.
The invention solves the above problems by the following technical means:
A hybrid security protection method for tax big data based on erasure codes and multiple copies, wherein, while the tax data in the distributed tax data storage system is intact, the multi-copy and erasure code storage flow for the tax data is run;
when tax data in the distributed tax data storage system fails, the tax data fault-tolerance flow is run;
the multi-copy and erasure code storage flow comprises the following steps:
Step S11: divide the tax data by time into historical data and recent data, the recent data comprising a plurality of different recent data packets;
Step S12: store the recent data in the multi-copy storage module using the multi-copy storage mode, and store the historical data in the erasure code storage module using the erasure code storage mode;
Step S13: when a recent data packet is marked as completed, transfer that packet to the erasure code storage module, where it becomes historical data;
the tax data fault-tolerance flow comprises the following steps:
Step S21: using the data management node of the multi-copy storage module, determine whether the failed tax data is stored in the multi-copy storage module or in the erasure code storage module;
Step S22: if the failed tax data is stored in the multi-copy storage module, send test messages to the relevant multi-copy storage nodes according to the records of the multi-copy management node, select a copy corresponding to the failed tax data according to the response latency of the test messages, and restore valid tax data from that copy;
Step S23: if the failed tax data is stored in the erasure code storage module, additionally look up the records of the erasure code management node, send test messages to the relevant erasure code storage nodes, and select coded blocks in order of test message response latency; once enough coded blocks have been obtained, the tax data can be restored;
the distributed tax data storage system provides storage and fault-tolerance services for tax data;
the tax data is data entered through a client of the distributed tax data storage system;
the historical data is data dated before the time division point of the distributed tax data storage system, and is stored in the erasure code storage module;
the recent data is data dated after the time division point of the distributed tax data storage system, and is stored in the multi-copy storage module;
the multi-copy storage module stores and processes recent data, and comprises one multi-copy storage module data management node and at least one multi-copy storage node;
the multi-copy management node manages the replication, distribution and storage of data within the multi-copy storage module, and records information about the data;
the multi-copy storage node stores recent data;
the erasure code storage module stores and processes historical data, and comprises one erasure code management node and at least one erasure code storage node;
the erasure code management node manages the encoding, distribution and storage of data within the erasure code storage module, and records information about the data;
the erasure code storage node stores historical data;
the multi-copy storage mode is used to read, store, record and restore recent data through the distributed tax data storage system;
the erasure code storage mode is used to transfer, read, record and restore historical data through the distributed tax data storage system;
a coded block is formed by splitting recent data awaiting transfer into blocks and encoding them; coded blocks are stored on erasure code storage nodes and are used to restore tax data during the fault-tolerance flow.
Further, the erasure code storage mode described in step S12 comprises the following steps:
Step S1221: the erasure code management node determines whether the frequency of external access to the erasure code storage module is below an access frequency threshold, and thus whether the module is currently idle; if it is, all erasure code storage nodes are activated;
Step S1222: for each activated erasure code storage node, check whether its storage load exceeds the storage full-load threshold and whether its network load exceeds the network full-load threshold; if neither threshold is exceeded, request the data awaiting transfer from the multi-copy management node;
Step S1223: encode the data awaiting transfer, distribute and save the result on erasure code storage nodes, and record the distribution information on the erasure code management node;
Step S1224: after confirming that the transfer has succeeded, delete the transferred tax data and all of its copies from the multi-copy storage module;
the data awaiting transfer is one copy, among those recorded by the multi-copy management node, of the data whose transfer has been requested; the copy is chosen so as to balance load, and it is this copy that is encoded, distributed and saved on erasure code storage nodes;
the distribution information records which coded blocks were distributed to which erasure code storage nodes, and is used to guide the restoration of tax data from the coded blocks.
Further, in the multi-copy storage mode described in step S12, the flow for writing recent data comprises the following steps:
Step S1231: when a client issues a write request for recent data, the multi-copy management node responds;
Step S1232: replicate the written tax data to form copies, and store the written tax data and its copies separately on different multi-copy storage nodes;
Step S1233: record the storage information of the written tax data on the multi-copy management node.
Further, in the multi-copy storage mode described in step S12, the flow for reading recent data comprises the following steps:
Step S1241: when a client issues a read request for recent data, the multi-copy management node responds, sends test messages to the relevant multi-copy storage nodes according to its records, and requests their computing load;
Step S1242: select the multi-copy storage node to serve the request by jointly considering the response latency of the test messages and the computing load of the relevant multi-copy storage nodes;
Step S1243: as directed by the multi-copy management node, the selected multi-copy storage node sends the tax data directly to the client;
the client is a client of the distributed storage system and is used to write and read tax data.
Further, in step S1232, the written tax data and its copies are stored so that different copies of the same tax data are physically isolated, being placed in different cabinets or machine rooms.
Further, the coded blocks are formed by the following steps:
Step 61: split the recent data awaiting transfer into C data blocks;
Step 62: encode the C data blocks into N coded blocks, where N is greater than C;
Step 63: distribute the N coded blocks to N different erasure code storage modules;
Step 64: record the distribution information of the N coded blocks on the erasure code management node responsible for the coded blocks.
The invention stores tax data in different modes according to its age, which improves both the security and the repair performance of the tax data as a whole. The access frequency of tax data changes in phases: recent data is accessed most often, and historical data comparatively rarely. Using erasure codes for rarely accessed data improves storage space utilization, while using multiple copies for frequently accessed data improves data repair performance.
Second, the invention distributes erasure coding tasks across different nodes and takes node load fully into account when choosing nodes, so that computing load and network transmission load are shared among several nodes and the overall encoding performance of the system improves.
Third, because the invention replicates first and erasure-codes later, data is kept safe until erasure coding has completed, which covers the data loss during encoding that can easily occur when erasure coding is used alone.
Description of drawings
To explain the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a workflow diagram of the hybrid security protection method for tax big data based on erasure codes and multiple copies according to the invention;
Fig. 2 is a schematic structural diagram of the multi-copy storage module of the invention;
Fig. 3 is a schematic structural diagram of the erasure code storage module of the invention.
Detailed description
To make the above objects, features and advantages of the invention easier to understand, the technical solutions of the invention are described in detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained from them by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, in a hybrid security protection method for tax big data based on erasure codes and multiple copies, while the tax data in the distributed tax data storage system is intact, the multi-copy and erasure code storage flow for the tax data is run;
when tax data in the distributed tax data storage system fails, the tax data fault-tolerance flow is run;
the multi-copy and erasure code storage flow comprises the following steps:
Step S11: divide the tax data by time into historical data and recent data, the recent data comprising a plurality of different recent data packets;
Step S12: store the recent data in the multi-copy storage module using the multi-copy storage mode, and store the historical data in the erasure code storage module using the erasure code storage mode;
Step S13: when a recent data packet is marked as completed, transfer that packet to the erasure code storage module, where it becomes historical data;
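Steps S11 to S13 can be sketched as a small two-tier store. This is a minimal illustration rather than the patented implementation: the class names, the `period_end` field and the use of plain dictionaries in place of real storage modules are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Packet:
    packet_id: str
    period_end: date         # last date covered by the packet (assumed field)
    completed: bool = False  # set once the packet is "marked as completed"


@dataclass
class HybridStore:
    split_point: date                                  # time division point
    replica_tier: dict = field(default_factory=dict)   # recent data
    erasure_tier: dict = field(default_factory=dict)   # historical data

    def put(self, packet: Packet) -> str:
        """S11/S12: route a packet to a tier by comparing it with the split point."""
        if packet.period_end < self.split_point:
            self.erasure_tier[packet.packet_id] = packet
            return "erasure"
        self.replica_tier[packet.packet_id] = packet
        return "replica"

    def migrate_completed(self) -> list[str]:
        """S13: move every completed recent packet to the erasure code tier."""
        done = [pid for pid, p in self.replica_tier.items() if p.completed]
        for pid in done:
            self.erasure_tier[pid] = self.replica_tier.pop(pid)
        return done


store = HybridStore(split_point=date(2016, 1, 1))
store.put(Packet("2015-Q4", date(2015, 12, 31)))
store.put(Packet("2016-Q1", date(2016, 3, 31)))
store.put(Packet("2016-Q2", date(2016, 6, 30)))
store.replica_tier["2016-Q1"].completed = True
moved = store.migrate_completed()
print(moved, sorted(store.erasure_tier), sorted(store.replica_tier))
```

In this sketch a packet changes tier only after it is marked as completed, so data that is still being written stays in the multi-copy tier.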
the tax data fault-tolerance flow comprises the following steps:
Step S21: using the data management node of the multi-copy storage module, determine whether the failed tax data is stored in the multi-copy storage module or in the erasure code storage module;
Step S22: if the failed tax data is stored in the multi-copy storage module, send test messages to the relevant multi-copy storage nodes according to the records of the multi-copy management node, select a copy corresponding to the failed tax data according to the response latency of the test messages, and restore valid tax data from that copy;
Step S23: if the failed tax data is stored in the erasure code storage module, additionally look up the records of the erasure code management node, send test messages to the relevant erasure code storage nodes, and select coded blocks in order of test message response latency; once enough coded blocks have been obtained, the tax data can be restored;
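The selection logic of steps S21 to S23 can be sketched as follows. The sketch assumes that the test messages have already been sent and that their response latencies are available as a dictionary, with `None` standing for a node that did not reply. The record layout and all function names are hypothetical, and decoding the selected blocks is outside its scope.

```python
def locate(data_id, replica_records, erasure_records):
    """S21: decide which storage module holds the failed data."""
    if data_id in replica_records:
        return "replica"
    if data_id in erasure_records:
        return "erasure"
    raise KeyError(data_id)


def pick_replica(nodes, latency_ms):
    """S22: among nodes that replied, choose the one with the lowest latency."""
    alive = [n for n in nodes if latency_ms.get(n) is not None]
    if not alive:
        raise RuntimeError("no copy is reachable")
    return min(alive, key=latency_ms.__getitem__)


def pick_blocks(block_nodes, latency_ms, needed):
    """S23: take coded blocks in order of increasing latency until enough are found."""
    alive = sorted(
        (n for n in block_nodes if latency_ms.get(n) is not None),
        key=latency_ms.__getitem__,
    )
    if len(alive) < needed:
        raise RuntimeError("not enough coded blocks are reachable")
    return alive[:needed]


def plan_recovery(data_id, replica_records, erasure_records, latency_ms, needed):
    """Return the module that holds the data and the nodes to read from."""
    where = locate(data_id, replica_records, erasure_records)
    if where == "replica":
        return where, [pick_replica(replica_records[data_id], latency_ms)]
    return where, pick_blocks(erasure_records[data_id], latency_ms, needed)


replica_records = {"pkt-recent": ["r1", "r2", "r3"]}
erasure_records = {"pkt-old": ["e1", "e2", "e3", "e4", "e5"]}
latency_ms = {"r1": None, "r2": 12.0, "r3": 7.5,
              "e1": 30.0, "e2": None, "e3": 9.0, "e4": 15.0, "e5": 22.0}

recent_plan = plan_recovery("pkt-recent", replica_records, erasure_records, latency_ms, needed=3)
old_plan = plan_recovery("pkt-old", replica_records, erasure_records, latency_ms, needed=3)
print(recent_plan)
print(old_plan)
```

Here `needed` plays the role of the "sufficient number" of coded blocks, which for a code of the kind described in the background section is the number of data blocks c.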
the distributed tax data storage system provides storage and fault-tolerance services for tax data;
the tax data is data entered through a client of the distributed tax data storage system;
the historical data is data dated before the time division point of the distributed tax data storage system, and is stored in the erasure code storage module;
the recent data is data dated after the time division point of the distributed tax data storage system, and is stored in the multi-copy storage module.
As shown in Fig. 2, the multi-copy storage module stores and processes recent data, and comprises one multi-copy storage module data management node and at least one multi-copy storage node;
the multi-copy management node manages the replication, distribution and storage of data within the multi-copy storage module, and records information about the data;
the multi-copy storage node stores recent data.
As shown in Fig. 3, the erasure code storage module stores and processes historical data, and comprises one erasure code management node and at least one erasure code storage node;
the erasure code management node manages the encoding, distribution and storage of data within the erasure code storage module, and records information about the data;
the erasure code storage node stores historical data.
The multi-copy storage mode is used to read, store, record and restore recent data through the distributed tax data storage system;
the erasure code storage mode is used to transfer, read, record and restore historical data through the distributed tax data storage system;
a coded block is formed by splitting recent data awaiting transfer into blocks and encoding them; coded blocks are stored on erasure code storage nodes and are used to restore tax data during the fault-tolerance flow.
The erasure code storage mode described in step S12 comprises the following steps:
Step S1221: the erasure code management node determines whether the frequency of external access to the erasure code storage module is below an access frequency threshold, and thus whether the module is currently idle; if it is, all erasure code storage nodes are activated;
Step S1222: for each activated erasure code storage node, check whether its storage load exceeds the storage full-load threshold and whether its network load exceeds the network full-load threshold; if neither threshold is exceeded, request the data awaiting transfer from the multi-copy management node;
Step S1223: encode the data awaiting transfer, distribute and save the result on erasure code storage nodes, and record the distribution information on the erasure code management node;
Step S1224: after confirming that the transfer has succeeded, delete the transferred tax data and all of its copies from the multi-copy storage module;
the data awaiting transfer is one copy, among those recorded by the multi-copy management node, of the data whose transfer has been requested; the copy is chosen so as to balance load, and it is this copy that is encoded, distributed and saved on erasure code storage nodes;
the distribution information records which coded blocks were distributed to which erasure code storage nodes, and is used to guide the restoration of tax data from the coded blocks.
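The gating logic of steps S1221 and S1222, together with the load-balanced choice of the source copy, can be sketched as below. The text does not give concrete threshold values or say how load is measured, so the fractions used here, the `ECNode` type and the function names are assumptions. Encoding (S1223) and deleting the copies after confirmation (S1224) are left out.

```python
from dataclasses import dataclass


@dataclass
class ECNode:
    name: str
    storage_load: float  # fraction of capacity in use, 0.0 to 1.0 (assumed unit)
    network_load: float  # fraction of bandwidth in use, 0.0 to 1.0 (assumed unit)


def module_is_idle(accesses_per_min: float, access_threshold: float) -> bool:
    """S1221: the module counts as idle when access frequency is below the threshold."""
    return accesses_per_min < access_threshold


def eligible_nodes(nodes, storage_full: float, network_full: float):
    """S1222: keep the nodes on which neither load exceeds its full-load threshold."""
    return [n for n in nodes
            if n.storage_load <= storage_full and n.network_load <= network_full]


def pick_source_copy(replica_loads: dict) -> str:
    """Load balancing: read the copy held by the least loaded multi-copy node."""
    return min(replica_loads, key=replica_loads.get)


def plan_migration(accesses_per_min, access_threshold, nodes,
                   storage_full, network_full, replica_loads):
    """Return the source copy and target nodes, or None if migration should wait."""
    if not module_is_idle(accesses_per_min, access_threshold):
        return None
    targets = eligible_nodes(nodes, storage_full, network_full)
    if not targets:
        return None
    return {"source_copy": pick_source_copy(replica_loads),
            "targets": [n.name for n in targets]}


nodes = [ECNode("e1", 0.40, 0.20), ECNode("e2", 0.95, 0.10),
         ECNode("e3", 0.50, 0.85), ECNode("e4", 0.30, 0.30)]
replica_loads = {"r1": 0.7, "r2": 0.2, "r3": 0.5}

plan = plan_migration(5, 20, nodes, 0.9, 0.8, replica_loads)
busy = plan_migration(50, 20, nodes, 0.9, 0.8, replica_loads)
print(plan)
print(busy)
```

A real system would delete the copies in the multi-copy module only after the erasure code management node has confirmed that every coded block was stored, as step S1224 requires.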
In the multi-copy storage mode described in step S12, the flow for writing recent data comprises the following steps:
Step S1231: when a client issues a write request for recent data, the multi-copy management node responds;
Step S1232: replicate the written tax data to form copies, and store the written tax data and its copies separately on different multi-copy storage nodes;
Step S1233: record the storage information of the written tax data on the multi-copy management node.
In the multi-copy storage mode described in step S12, the flow for reading recent data comprises the following steps:
Step S1241: when a client issues a read request for recent data, the multi-copy management node responds, sends test messages to the relevant multi-copy storage nodes according to its records, and requests their computing load;
Step S1242: select the multi-copy storage node to serve the request by jointly considering the response latency of the test messages and the computing load of the relevant multi-copy storage nodes;
Step S1243: as directed by the multi-copy management node, the selected multi-copy storage node sends the tax data directly to the client;
the client is a client of the distributed storage system and is used to write and read tax data.
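Step S1242 says that latency and computing load are considered together but does not say how they are combined. One possible reading is a weighted sum of normalized latency and load, sketched below; the weighting scheme, the default weights and the function name are assumptions, not part of the described method.

```python
def choose_read_node(candidates: dict, weight_latency: float = 0.5,
                     weight_load: float = 0.5) -> str:
    """Pick the node with the lowest combined score.

    candidates maps node name -> (latency_ms, computing_load), where
    computing_load is a fraction between 0.0 and 1.0.
    """
    if not candidates:
        raise ValueError("no candidate nodes")
    # Normalize latency by the slowest reply so both terms lie in [0, 1].
    max_latency = max(latency for latency, _ in candidates.values()) or 1.0

    def score(node: str) -> float:
        latency_ms, load = candidates[node]
        return weight_latency * latency_ms / max_latency + weight_load * load

    return min(candidates, key=score)


candidates = {"r1": (10.0, 0.90), "r2": (20.0, 0.20), "r3": (40.0, 0.10)}
balanced = choose_read_node(candidates)
latency_only = choose_read_node(candidates, weight_latency=1.0, weight_load=0.0)
print(balanced, latency_only)
```

With equal weights the sketch avoids the fastest but heavily loaded node; with latency as the only criterion it falls back to the fastest reply.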
In step S1232, the written tax data and its copies are stored so that different copies of the same tax data are physically isolated, being placed in different cabinets or machine rooms.
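The write flow with physically isolated copies can be sketched as a placement function that never puts two copies in the same cabinet. The inventory layout (cabinet name mapped to node name mapped to used fraction), the preference for the least used node, and the class and function names are assumptions made for illustration.

```python
def place_copies(nodes_by_rack: dict, copies: int) -> dict:
    """Choose one node in each of `copies` different cabinets.

    nodes_by_rack maps cabinet name -> {node name: used fraction}.
    Cabinets whose best node is least used are preferred.
    """
    if copies > len(nodes_by_rack):
        raise ValueError("not enough cabinets to isolate every copy")
    racks = sorted(nodes_by_rack, key=lambda r: min(nodes_by_rack[r].values()))
    return {rack: min(nodes_by_rack[rack], key=nodes_by_rack[rack].get)
            for rack in racks[:copies]}


class ManagementNode:
    """Hypothetical stand-in for the multi-copy management node."""

    def __init__(self):
        self.records = {}  # data id -> nodes holding a copy

    def write(self, data_id: str, nodes_by_rack: dict, copies: int = 3) -> dict:
        placement = place_copies(nodes_by_rack, copies)     # S1232
        self.records[data_id] = sorted(placement.values())  # S1233
        return placement


inventory = {
    "rack-A": {"a1": 0.6, "a2": 0.3},
    "rack-B": {"b1": 0.5},
    "rack-C": {"c1": 0.2, "c2": 0.9},
    "rack-D": {"d1": 0.8},
}
manager = ManagementNode()
placement = manager.write("pkt-2016-Q2", inventory)
print(placement)
print(manager.records)
```

Because each copy is drawn from a different cabinet, losing one cabinet leaves at least two copies of the packet in this example.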
The coded blocks are formed by the following steps:
Step 61: split the recent data awaiting transfer into C data blocks;
Step 62: encode the C data blocks into N coded blocks, where N is greater than C;
Step 63: distribute the N coded blocks to N different erasure code storage modules;
Step 64: record the distribution information of the N coded blocks on the erasure code management node responsible for the coded blocks.
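Steps 61 to 64 can be illustrated with the simplest erasure code, a single XOR parity block, for which N = C + 1 and any C of the N blocks are enough to rebuild the data. This is a stand-in chosen for brevity and is not the RS code mentioned in the background section, which tolerates more losses. The function names and the placement dictionary are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, c: int):
    """Steps 61-62: split data into c blocks and append one parity block."""
    if c < 1:
        raise ValueError("c must be at least 1")
    size = max(1, -(-len(data) // c))        # ceiling division
    padded = data.ljust(size * c, b"\0")     # zero padding, removed on decode
    blocks = [padded[i * size:(i + 1) * size] for i in range(c)]
    return blocks + [xor_blocks(blocks)], len(data)


def decode(blocks, length: int) -> bytes:
    """Rebuild the data from the coded blocks, at most one of which is None."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    blocks = list(blocks)
    if missing:
        # XOR of all surviving blocks equals the lost block.
        blocks[missing[0]] = xor_blocks([b for b in blocks if b is not None])
    return b"".join(blocks[:-1])[:length]


original = b"tax record 2016-Q1"
coded, length = encode(original, c=4)

# Steps 63-64: one block per node, with the placement kept as distribution info.
distribution = {index: f"ec-node-{index}" for index in range(len(coded))}

damaged = list(coded)
damaged[2] = None                            # simulate one failed node
restored = decode(damaged, length)
print(len(coded), restored == original)
```

The `distribution` dictionary corresponds to the distribution information of step 64: during recovery it tells the management node which storage node to ask for which block.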
本发明利用不同时间的税务数据特点进行分模式存储,综合提高了整个税务数据的安全性和数据修复性能。由于税务数据的访问频度具有阶段性变化的特点,近期的数据访问频度是最高的,历史数据的访问频度则相对较低。在访问频度低的数据上使用纠删码,可以提高存储空间使用率,在访问频度高的数据上使用多副本,提高了数据修复性能。The present invention utilizes the characteristics of tax data at different times to store in different modes, and comprehensively improves the security and data restoration performance of the entire tax data. Because the access frequency of tax data has the characteristics of periodic changes, the access frequency of recent data is the highest, while the access frequency of historical data is relatively low. Using erasure codes on data with low access frequency can improve storage space utilization, and using multiple copies on data with high access frequency improves data repair performance.
Second, the present invention distributes erasure coding tasks across different nodes and takes each node's load fully into account when selecting nodes, so that the computing load and the network transmission load are shared among multiple nodes and the overall encoding performance of the system improves.
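A load-aware assignment of encoding tasks might look like the sketch below. The greedy "largest task to the least-loaded node" policy, the cost estimates, and the name `assign_encoding_tasks` are assumptions for illustration; the source does not prescribe a scheduling algorithm.

```python
import heapq


def assign_encoding_tasks(node_loads, task_costs):
    """Give each task, largest first, to the node with the lowest current load.

    node_loads: dict of node name -> current load (hypothetical units)
    task_costs: dict of task name -> estimated encoding cost (same units)
    """
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_loads}
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, name = heapq.heappop(heap)        # least-loaded node right now
        assignment[name].append(task)
        heapq.heappush(heap, (load + cost, name))
    return assignment


loads = {"ec-1": 5, "ec-2": 1, "ec-3": 3}
tasks = {"stripe-a": 4, "stripe-b": 3, "stripe-c": 2, "stripe-d": 1}
assignment = assign_encoding_tasks(loads, tasks)
print(assignment)
# {'ec-1': ['stripe-c'], 'ec-2': ['stripe-a', 'stripe-d'], 'ec-3': ['stripe-b']}
```

After the assignment the three nodes end at loads 7, 6 and 6, so no single node carries the whole encoding burden.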
Third, the present invention replicates first and erasure-codes afterwards, which keeps the data safe until erasure coding has completed and avoids the data loss during encoding that can occur when erasure codes are the only fault-tolerance mechanism.
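The copy-first, erasure-code-later ordering can be sketched as a small in-memory model. The class `HybridStore`, its methods, and both encoder functions are hypothetical; the point shown is only that the replicas are released after the coded blocks have been stored, never before.

```python
class HybridStore:
    """Toy model: data lives as replicas until erasure coding has succeeded."""

    def __init__(self, copies=3):
        self.copies = copies
        self.replicas = {}  # key -> list of replica payloads
        self.coded = {}     # key -> list of coded blocks

    def write(self, key, data):
        """New data is always written as multiple copies first."""
        self.replicas[key] = [data] * self.copies

    def migrate(self, key, encode):
        """Encode from a replica; drop the replicas only once the blocks are stored."""
        data = self.replicas[key][0]
        try:
            blocks = encode(data)
        except Exception:
            return False  # encoding failed, the replicas still protect the data
        self.coded[key] = blocks
        del self.replicas[key]
        return True

    def is_protected(self, key):
        return key in self.replicas or key in self.coded


def broken_encoder(data):
    raise RuntimeError("node failed during encoding")


def toy_encoder(data):
    # Stand-in for a real erasure encoder: two halves plus a placeholder parity.
    half = len(data) // 2
    return [data[:half], data[half:], b"parity"]


store = HybridStore()
store.write("2016-Q4", b"ledger")
first_try = store.migrate("2016-Q4", broken_encoder)   # False, replicas kept
protected_after_failure = store.is_protected("2016-Q4")
second_try = store.migrate("2016-Q4", toy_encoder)     # True, replicas released
```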
The embodiments described above represent only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make modifications and improvements without departing from the concept of the present invention, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent shall be determined by the appended claims.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611252092.8A CN106708653B (en) | 2016-12-29 | 2016-12-29 | A hybrid tax big data security protection method based on erasure coding and multiple copies |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106708653A (en) | 2017-05-24 |
CN106708653B (en) | 2020-06-30 |
Family
ID=58904096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611252092.8A Active CN106708653B (en) | 2016-12-29 | 2016-12-29 | A hybrid tax big data security protection method based on erasure coding and multiple copies |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106708653B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103118133A (en) * | 2013-02-28 | 2013-05-22 | 浙江大学 | Mixed cloud storage method based on file access frequency |
CN105472047A (en) * | 2016-02-03 | 2016-04-06 | 天津书生云科技有限公司 | Storage system |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108196978B (en) * | 2017-12-22 | 2021-03-09 | 新华三技术有限公司 | Data storage method, device, data storage system and readable storage medium |
CN108196978A (en) * | 2017-12-22 | 2018-06-22 | 新华三技术有限公司 | Data storage method, device, data storage system and readable storage medium |
CN108255432A (en) * | 2018-01-12 | 2018-07-06 | 郑州云海信息技术有限公司 | Write operation control method, system, device and storage medium based on tiered storage |
CN110196682A (en) * | 2018-06-15 | 2019-09-03 | 腾讯科技(深圳)有限公司 | Data management method, apparatus, computing device and storage medium |
CN111381767B (en) * | 2018-12-28 | 2024-03-26 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN111381767A (en) * | 2018-12-28 | 2020-07-07 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN110209670B (en) * | 2019-05-09 | 2022-03-25 | 北京猫盘技术有限公司 | Data processing method and device based on network storage device cluster |
CN110209670A (en) * | 2019-05-09 | 2019-09-06 | 北京猫盘技术有限公司 | Data processing method and device based on network storage equipment cluster |
CN111782582A (en) * | 2019-06-14 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Data conversion method, system and name node |
CN111008181A (en) * | 2019-10-31 | 2020-04-14 | 苏州浪潮智能科技有限公司 | A distributed file system storage policy switching method, system, terminal and storage medium |
CN114764425A (en) * | 2021-01-13 | 2022-07-19 | 北京金山云网络技术有限公司 | Object updating method and device, storage medium and electronic equipment |
CN112965660A (en) * | 2021-02-09 | 2021-06-15 | 山东英信计算机技术有限公司 | Method, system, device and medium for feeding back information of double storage pools |
CN112965660B (en) * | 2021-02-09 | 2023-08-08 | 山东英信计算机技术有限公司 | Method, system, equipment and medium for double storage pool information feedback |
CN114398006A (en) * | 2021-12-24 | 2022-04-26 | 中国电信股份有限公司 | Distributed storage mode control method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106708653B (en) | 2020-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106708653A (en) | Mixed tax administration data security protecting method based on erasure code and multi-copy | |
US10956276B2 (en) | System state recovery in a distributed, cloud-based storage system | |
US9535790B2 (en) | Prioritizing data reconstruction in distributed storage systems | |
CN101667181B (en) | Method, device and system for data disaster tolerance | |
CN103118133B (en) | Mixed cloud storage method based on file access frequency | |
CN103780638B (en) | Method of data synchronization and system | |
CN108196978A (en) | Data storage method, device, data storage system and readable storage medium | |
CN107885612A (en) | Data processing method and system and device | |
CN103051681B (en) | Collaborative type log system facing to distribution-type file system | |
CN107291889A (en) | A data storage method and system | |
CN102142006A (en) | File processing method and device of distributed file system | |
CN106066896A (en) | An application-aware big data deduplication storage system and method | |
CN101692226A (en) | Storage method of mass filing stream data | |
CN104899117A (en) | Memory database parallel logging method for nonvolatile memory | |
CN109582213A (en) | Data reconstruction method and device, data-storage system | |
CN102142032A (en) | Method and system for reading and writing data of distributed file system | |
CN104424052A (en) | Automatic redundant distributed storage system and method | |
CN104965835B (en) | A file read/write method and device for a distributed file system | |
KR101254179B1 (en) | Method for effective data recovery in distributed file system | |
CN103530206A (en) | Data recovery method and device | |
CN106027638A (en) | Hadoop data distribution method based on hybrid coding | |
US20140040574A1 (en) | Resiliency with a destination volume in a replication environment | |
CN103399943A (en) | Communication method and communication device for parallel query of clustered databases | |
CN113051428B (en) | A method and device for backing up camera front-end storage | |
CN104866245B (en) | Method and apparatus for synchronizing snapshots between a cache device and a storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||