CN106708653A - Mixed tax administration data security protecting method based on erasure code and multi-copy - Google Patents

Info

Publication number
CN106708653A
Authority
CN
China
Prior art keywords
data
tax
erasure codes
copy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611252092.8A
Other languages
Chinese (zh)
Other versions
CN106708653B (en)
Inventor
崔莹
陈升东
陈健彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Software Application Technology Guangzhou GZIS
Original Assignee
Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority to CN201611252092.8A
Publication of CN106708653A
Application granted
Publication of CN106708653B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hybrid tax big data security protection method based on erasure codes and multiple copies. When the tax data of a tax data distributed storage system is normal, a multi-copy and erasure code storage process for the tax data is started; when the tax data fails, a tax data fault-tolerant processing process is started. The invention stores tax data in different modes according to the characteristics of the data at different times, distributes erasure coding tasks across different nodes, and replicates data first and erasure-codes it later. This improves the security and repair performance of the tax data as a whole, improves the overall encoding performance of the system, and keeps data safe before erasure coding is complete.

Description

A hybrid tax big data security protection method based on erasure codes and multiple copies

Technical field

The invention relates to the technical field of computer data management, and in particular to a hybrid tax big data security protection method based on erasure codes and multiple copies.

Background

With economic globalization and the continued development of China's economy, the number of taxpayers in China has grown rapidly and the range of tax types has widened. For the ever-larger volume of tax data, distributed storage is a mainstream solution with good cost-effectiveness and scalability. Data security in a distributed storage environment is therefore a key issue for tax data. A distributed storage system contains a large number of nodes, and node failure or external intrusion may leave data incomplete. To avoid data loss, fault tolerance based on redundant data is usually adopted, in two main forms: multi-copy fault tolerance, which replicates the data, and erasure code fault tolerance, which generates redundant data by encoding.

The most widely used method today is replication-based multi-copy fault tolerance: the original data is copied into c copies, which are distributed to c different storage nodes, so that when any c-1 nodes fail, at least one copy of each piece of data still exists. Multi-copy fault tolerance is simple to implement, has low computational overhead, and offers good data access performance. Its prominent drawback is a high storage overhead. For tax data, which is already very large and keeps growing at high speed, replication-based multi-copy fault tolerance is not suitable.

With the explosive growth of data, erasure code fault tolerance has become a research focus in recent years because it can provide the same or even higher data reliability at much lower storage overhead. The strategy is as follows: a piece of data is divided into c data blocks, the c data blocks are encoded into n (n > c) coded blocks, and the coded blocks are distributed to n different disks. When nodes fail, the original data can be decoded as long as c coded blocks of that data survive. Compared with the widely used three-copy scheme, RS (Reed-Solomon) erasure codes can reduce storage consumption by 53% while doubling the fault tolerance. The drawback of erasure codes is poor performance during data reconstruction, especially in distributed storage: reconstruction requires several nodes to cooperate, which inevitably consumes large amounts of network and computing resources. For distributed data such as tax data, this becomes a key bottleneck for overall system performance.
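
The storage figures quoted above can be checked with a little arithmetic. The source does not say which RS parameters it has in mind; the sketch below assumes c = 10 data blocks and n = 14 coded blocks, a commonly cited configuration, and compares it with three-copy replication. The function names are illustrative only.

```python
from typing import Tuple


def replication_profile(copies: int) -> Tuple[float, int]:
    """Return (storage overhead factor, tolerated node failures) for replication."""
    return float(copies), copies - 1


def erasure_profile(c: int, n: int) -> Tuple[float, int]:
    """Return (storage overhead factor, tolerated node failures) for a code
    that can rebuild the data from any c of its n coded blocks."""
    if n <= c:
        raise ValueError("n must be greater than c")
    return n / c, n - c


rep_overhead, rep_failures = replication_profile(3)
ec_overhead, ec_failures = erasure_profile(c=10, n=14)  # assumed parameters
saving = 1 - ec_overhead / rep_overhead

print(f"three copies: {rep_overhead:.1f}x storage, tolerates {rep_failures} failures")
print(f"erasure code: {ec_overhead:.1f}x storage, tolerates {ec_failures} failures")
print(f"storage saving: {saving:.0%}")
```

Under these assumed parameters the erasure code stores 1.4 times the original data instead of 3 times, a saving of about 53%, and tolerates four lost blocks instead of two lost copies, which matches the figures in the text.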

Summary of the invention

In view of this, to solve the above problems in the prior art, the present invention proposes a hybrid tax big data security protection method based on erasure codes and multiple copies.

The present invention solves the above problems by the following technical means:

A hybrid tax big data security protection method based on erasure codes and multiple copies, wherein, when the tax data of a tax data distributed storage system is normal, a multi-copy and erasure code storage process for the tax data is started;

when the tax data of the tax data distributed storage system fails, a tax data fault-tolerant processing process is started;

the multi-copy and erasure code storage process includes the following steps:

Step S11: dividing the tax data by time into historical data and recent data, the recent data comprising a plurality of different recent data packets;

Step S12: storing the recent data in a multi-copy storage module using the multi-copy storage mode, and storing the historical data in an erasure code storage module using the erasure code storage mode;

Step S13: when a recent data packet is marked as completed, transferring that packet to the erasure code storage module, where it becomes historical data;

the tax data fault-tolerant processing process includes the following steps:

Step S21: determining, from the data management node of the multi-copy storage module, whether the failed tax data is stored in the multi-copy storage module or in the erasure code storage module;

Step S22: if the failed tax data is stored in the multi-copy storage module, sending test messages to the relevant multi-copy storage nodes according to the records of the multi-copy management node, selecting a copy corresponding to the failed tax data according to the response delay of the test messages, and restoring valid tax data from that copy;

Step S23: if the failed tax data is stored in the erasure code storage module, further looking up the records of the erasure code management node, sending test messages to the relevant erasure code storage nodes, selecting coded blocks in order of test message response delay, and restoring the tax data once a sufficient number of coded blocks has been obtained;

the tax data distributed storage system provides storage and fault-tolerance services for tax data;

the tax data is data entered by clients of the tax data distributed storage system;

the historical data is data from before the time division point of the tax data distributed storage system, and is stored in the erasure code storage module;

the recent data is data from after the time division point of the tax data distributed storage system, and is stored in the multi-copy storage module;

the multi-copy storage module stores and processes recent data, and comprises one multi-copy storage module data management node and at least one multi-copy storage node;

the multi-copy management node manages the replication, distribution and storage of data within the multi-copy storage module and records the data information;

the multi-copy storage node stores recent data;

the erasure code storage module stores and processes historical data, and comprises one erasure code management node and at least one erasure code storage node;

the erasure code management node manages the encoding, distribution and storage of data within the erasure code storage module and records the data information;

the erasure code storage node stores historical data;

the multi-copy storage mode is used to read, store, record and restore recent data through the tax data distributed storage system;

the erasure code storage mode is used to transfer, read, record and restore historical data through the tax data distributed storage system;

the coded blocks are formed by splitting the recent data to be transferred into blocks and encoding them; they are stored in the erasure code storage nodes and are used to restore the tax data during tax data fault-tolerant processing.

Further, the erasure code storage mode described in step S12 includes the following steps:

Step S1221: the erasure code management node determines whether the frequency of external access to the erasure code storage module is below an access frequency threshold, and hence whether the erasure code storage module is currently idle; if so, all erasure code storage nodes are activated;

Step S1222: for each activated erasure code storage node, determining whether its storage load exceeds a storage full-load threshold and whether its network load exceeds a network full-load threshold; if neither is exceeded, requesting the data to be transferred from the multi-copy management node;

Step S1223: encoding the data to be transferred, distributing and saving it to the erasure code storage nodes, and recording the distribution information in the erasure code management node;

Step S1224: after confirming that the transfer succeeded, deleting the transferred tax data and all of its copies from the multi-copy storage module;

the data to be transferred is one copy, recorded by the multi-copy management node, of the data for which transfer is requested; the copy is selected according to load balancing, and is encoded, distributed and saved to the erasure code storage nodes;

the distribution information records how the coded blocks were distributed to the erasure code storage nodes, and guides the restoration of tax data from the coded blocks.

Further, in the multi-copy storage mode described in step S12, the process of writing recent data includes the following steps:

Step S1231: when a client issues a request to write recent data, the multi-copy management node responds;

Step S1232: copying the written tax data to form copies, and storing the written tax data and its copies separately on different multi-copy storage nodes;

Step S1233: recording the storage information of the written tax data in the multi-copy management node.

Further, in the multi-copy storage mode described in step S12, the process of reading recent data includes the following steps:

Step S1241: when a client issues a request to read recent data, the multi-copy management node responds, sends test messages to the relevant multi-copy storage nodes according to its records, and requests their computing load;

Step S1242: selecting the multi-copy storage node to serve the request by jointly considering the response delay of the test messages and the computing load of the relevant multi-copy storage nodes;

Step S1243: as directed by the multi-copy management node, sending the tax data held on the selected multi-copy storage node directly to the client;

the client is a client of the distributed storage system and is used to write and read tax data.

Further, in step S1232, the written tax data and its copies are stored so that different copies of the same tax data are physically isolated, by choosing different cabinets or machine rooms for storage.

Further, the process of forming the coded blocks includes the following steps:

Step 61: splitting the recent data to be transferred into C data blocks;

Step 62: encoding the C data blocks into N coded blocks, where N is greater than C;

Step 63: distributing the N coded blocks to N different erasure code storage modules;

Step 64: recording the distribution information of the N coded blocks in the erasure code management node where the coded blocks are located.

The present invention stores tax data in different modes according to the characteristics of the data at different times, improving both the security and the repair performance of the tax data as a whole. The access frequency of tax data changes in stages: recent data is accessed most often, while historical data is accessed relatively rarely. Using erasure codes for rarely accessed data improves storage utilization, and using multiple copies for frequently accessed data improves data repair performance.

Second, the present invention distributes erasure coding tasks across different nodes and takes node load fully into account when selecting nodes, sharing the computing load and network transmission load among several nodes and improving the overall encoding performance of the system.

Third, the present invention replicates data first and erasure-codes it later, which keeps data safe until erasure coding is complete and avoids the loss of data during encoding that can easily occur when erasure codes alone are used for fault tolerance.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a workflow diagram of the hybrid tax big data security protection method based on erasure codes and multiple copies according to the present invention;

Fig. 2 is a schematic structural diagram of the multi-copy storage module of the present invention;

Fig. 3 is a schematic structural diagram of the erasure code storage module of the present invention.

Detailed description

To make the above objects, features and advantages of the present invention easier to understand, the technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As shown in Fig. 1, in a hybrid tax big data security protection method based on erasure codes and multiple copies, when the tax data of a tax data distributed storage system is normal, a multi-copy and erasure code storage process for the tax data is started;

when the tax data of the tax data distributed storage system fails, a tax data fault-tolerant processing process is started;

the multi-copy and erasure code storage process includes the following steps:

Step S11: dividing the tax data by time into historical data and recent data, the recent data comprising a plurality of different recent data packets;

Step S12: storing the recent data in a multi-copy storage module using the multi-copy storage mode, and storing the historical data in an erasure code storage module using the erasure code storage mode;

Step S13: when a recent data packet is marked as completed, transferring that packet to the erasure code storage module, where it becomes historical data;

the tax data fault-tolerant processing process includes the following steps:

Step S21: determining, from the data management node of the multi-copy storage module, whether the failed tax data is stored in the multi-copy storage module or in the erasure code storage module;

Step S22: if the failed tax data is stored in the multi-copy storage module, sending test messages to the relevant multi-copy storage nodes according to the records of the multi-copy management node, selecting a copy corresponding to the failed tax data according to the response delay of the test messages, and restoring valid tax data from that copy;

Step S23: if the failed tax data is stored in the erasure code storage module, further looking up the records of the erasure code management node, sending test messages to the relevant erasure code storage nodes, selecting coded blocks in order of test message response delay, and restoring the tax data once a sufficient number of coded blocks has been obtained;

the tax data distributed storage system provides storage and fault-tolerance services for tax data;

the tax data is data entered by clients of the tax data distributed storage system;

the historical data is data from before the time division point of the tax data distributed storage system, and is stored in the erasure code storage module;

the recent data is data from after the time division point of the tax data distributed storage system, and is stored in the multi-copy storage module.
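
The storage process in steps S11 to S13 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` class, its `data_date` field, the mode names and the example split point are all hypothetical, since the source does not say how the time division point or the completed flag is represented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Packet:
    packet_id: str
    data_date: date          # hypothetical field: the date the packet's data belongs to
    completed: bool = False  # set once the packet is marked as completed


def storage_mode(packet: Packet, split_point: date) -> str:
    """Steps S11/S12: data before the split point is historical, the rest is recent."""
    return "erasure_code" if packet.data_date < split_point else "multi_copy"


def due_for_transfer(multi_copy_packets: List[Packet]) -> List[Packet]:
    """Step S13: completed recent packets are transferred to the erasure code module."""
    return [p for p in multi_copy_packets if p.completed]


split_point = date(2016, 1, 1)  # assumed time division point
packets = [
    Packet("p1", date(2015, 6, 30)),
    Packet("p2", date(2016, 3, 31), completed=True),
    Packet("p3", date(2016, 9, 30)),
]
recent = [p for p in packets if storage_mode(p, split_point) == "multi_copy"]
print([p.packet_id for p in recent])                    # recent packets
print([p.packet_id for p in due_for_transfer(recent)])  # packets to transfer
```

Here `p1` goes straight to erasure code storage, `p2` and `p3` are replicated, and `p2` is queued for transfer because it has been marked as completed.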

As shown in Fig. 2, the multi-copy storage module stores and processes recent data, and comprises one multi-copy storage module data management node and at least one multi-copy storage node;

the multi-copy management node manages the replication, distribution and storage of data within the multi-copy storage module and records the data information;

the multi-copy storage node stores recent data.

As shown in Fig. 3, the erasure code storage module stores and processes historical data, and comprises one erasure code management node and at least one erasure code storage node;

the erasure code management node manages the encoding, distribution and storage of data within the erasure code storage module and records the data information;

the erasure code storage node stores historical data.

The multi-copy storage mode is used to read, store, record and restore recent data through the tax data distributed storage system;

the erasure code storage mode is used to transfer, read, record and restore historical data through the tax data distributed storage system;

the coded blocks are formed by splitting the recent data to be transferred into blocks and encoding them; they are stored in the erasure code storage nodes and are used to restore the tax data during tax data fault-tolerant processing.
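
The delay-based selection used during recovery (steps S22 and S23) can be sketched as follows. This is an illustration under assumptions: probe results are represented as a mapping from node name to response delay in milliseconds, with `None` for a node that did not answer, and the "sufficient number" of coded blocks is taken to be c, the number of data blocks, as in the background section. The function and node names are hypothetical.

```python
from typing import Dict, List, Optional


def fastest_replica(probe_ms: Dict[str, Optional[float]]) -> str:
    """Step S22: choose the reachable replica node with the lowest probe delay."""
    alive = {node: ms for node, ms in probe_ms.items() if ms is not None}
    if not alive:
        raise RuntimeError("no replica reachable")
    return min(alive, key=alive.get)


def blocks_for_decoding(probe_ms: Dict[str, Optional[float]], c: int) -> List[str]:
    """Step S23: take coded-block nodes in order of increasing delay until c are found."""
    alive = sorted((ms, node) for node, ms in probe_ms.items() if ms is not None)
    if len(alive) < c:
        raise RuntimeError("fewer than c coded blocks reachable; data cannot be decoded")
    return [node for _, node in alive[:c]]


replica_probe = {"r1": 12.0, "r2": None, "r3": 5.5}
block_probe = {"e1": 9.0, "e2": 3.0, "e3": None, "e4": 6.0}
print(fastest_replica(replica_probe))         # node to restore the copy from
print(blocks_for_decoding(block_probe, c=2))  # nodes to fetch coded blocks from
```

Recovery from copies needs only one responsive node, while recovery from coded blocks needs c of them, which is why the erasure code path costs more network traffic.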

The erasure code storage mode described in step S12 includes the following steps:

Step S1221: the erasure code management node determines whether the frequency of external access to the erasure code storage module is below an access frequency threshold, and hence whether the erasure code storage module is currently idle; if so, all erasure code storage nodes are activated;

Step S1222: for each activated erasure code storage node, determining whether its storage load exceeds a storage full-load threshold and whether its network load exceeds a network full-load threshold; if neither is exceeded, requesting the data to be transferred from the multi-copy management node;

Step S1223: encoding the data to be transferred, distributing and saving it to the erasure code storage nodes, and recording the distribution information in the erasure code management node;

Step S1224: after confirming that the transfer succeeded, deleting the transferred tax data and all of its copies from the multi-copy storage module;

the data to be transferred is one copy, recorded by the multi-copy management node, of the data for which transfer is requested; the copy is selected according to load balancing, and is encoded, distributed and saved to the erasure code storage nodes;

the distribution information records how the coded blocks were distributed to the erasure code storage nodes, and guides the restoration of tax data from the coded blocks.
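
The checks in steps S1221 and S1222, and the load-balanced choice of the source copy, can be sketched as below. The source gives no threshold values or load units, so the fractions, the example thresholds and the "least-loaded node" rule are assumptions made for illustration; `EcNode` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EcNode:
    name: str
    storage_load: float  # fraction of capacity in use, 0.0 to 1.0 (assumed unit)
    network_load: float  # fraction of bandwidth in use, 0.0 to 1.0 (assumed unit)


def module_is_idle(accesses_per_min: float, access_threshold: float) -> bool:
    """Step S1221: the module is idle when access frequency is below the threshold."""
    return accesses_per_min < access_threshold


def eligible_nodes(nodes: List[EcNode], storage_full: float, network_full: float) -> List[EcNode]:
    """Step S1222: keep nodes whose storage and network loads exceed neither threshold."""
    return [n for n in nodes
            if n.storage_load <= storage_full and n.network_load <= network_full]


def pick_source_copy(copy_node_load: Dict[str, float]) -> str:
    """Read the copy to be encoded from the least-loaded replica node (load balancing)."""
    return min(copy_node_load, key=copy_node_load.get)


nodes = [EcNode("e1", 0.40, 0.20), EcNode("e2", 0.95, 0.10), EcNode("e3", 0.50, 0.90)]
if module_is_idle(accesses_per_min=3, access_threshold=10):
    targets = eligible_nodes(nodes, storage_full=0.90, network_full=0.80)
    source = pick_source_copy({"r1": 0.7, "r2": 0.2, "r3": 0.5})
    print([n.name for n in targets], source)
```

Deferring the transfer until the module is idle keeps encoding traffic from competing with client reads of historical data.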

In the multi-copy storage mode described in step S12, the process of writing recent data includes the following steps:

Step S1231: when a client issues a request to write recent data, the multi-copy management node responds;

Step S1232: copying the written tax data to form copies, and storing the written tax data and its copies separately on different multi-copy storage nodes;

Step S1233: recording the storage information of the written tax data in the multi-copy management node.

In the multi-copy storage mode described in step S12, the process of reading recent data includes the following steps:

Step S1241: when a client issues a request to read recent data, the multi-copy management node responds, sends test messages to the relevant multi-copy storage nodes according to its records, and requests their computing load;

Step S1242: selecting the multi-copy storage node to serve the request by jointly considering the response delay of the test messages and the computing load of the relevant multi-copy storage nodes;

Step S1243: as directed by the multi-copy management node, sending the tax data held on the selected multi-copy storage node directly to the client;

the client is a client of the distributed storage system and is used to write and read tax data.
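
Step S1242 combines response delay and computing load but the source does not say how. One possible reading, shown here purely as an assumption, is a weighted sum in which a lower score wins; the `load_weight` parameter and its default are hypothetical.

```python
from typing import Dict, Tuple


def choose_read_node(stats: Dict[str, Tuple[float, float]], load_weight: float = 100.0) -> str:
    """Pick the node with the lowest combined score.

    stats maps node name -> (probe delay in ms, computing load in 0.0 to 1.0).
    load_weight converts load into the same scale as delay (assumed weighting).
    """
    def score(node: str) -> float:
        delay_ms, load = stats[node]
        return delay_ms + load_weight * load

    return min(stats, key=score)


stats = {"r1": (5.0, 0.9), "r2": (20.0, 0.1), "r3": (15.0, 0.5)}
print(choose_read_node(stats))
```

With these sample values the scores are 95, 30 and 65, so `r2` is chosen even though `r1` answered the probe fastest, because `r1` is heavily loaded.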

In step S1232, the written tax data and its copies are stored so that different copies of the same tax data are physically isolated, by choosing different cabinets or machine rooms for storage.
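
The physical isolation rule can be illustrated with a simple placement routine that never puts two copies of the same data in the same cabinet. This is a sketch under assumptions: the node-to-cabinet mapping and the names below are hypothetical, and a real placement policy would also weigh load and capacity.

```python
from typing import Dict, List


def place_copies(node_cabinet: Dict[str, str], copies: int) -> List[str]:
    """Choose `copies` storage nodes, each in a different cabinet."""
    chosen: List[str] = []
    used_cabinets = set()
    for node, cabinet in sorted(node_cabinet.items()):
        if cabinet not in used_cabinets:
            chosen.append(node)
            used_cabinets.add(cabinet)
        if len(chosen) == copies:
            return chosen
    raise RuntimeError("not enough distinct cabinets for the requested number of copies")


node_cabinet = {"n1": "cabinet-A", "n2": "cabinet-A", "n3": "cabinet-B", "n4": "cabinet-C"}
placement = place_copies(node_cabinet, copies=3)

# Step S1233: the management node records where each copy was stored.
storage_record = {"tax-packet-001": placement}
print(storage_record)
```

Because `n1` and `n2` share a cabinet, only one of them receives a copy, so the loss of a single cabinet cannot destroy every copy of the packet.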

The process of forming the coded blocks includes the following steps:

Step 61: splitting the recent data to be transferred into C data blocks;

Step 62: encoding the C data blocks into N coded blocks, where N is greater than C;

Step 63: distributing the N coded blocks to N different erasure code storage modules;

Step 64: recording the distribution information of the N coded blocks in the erasure code management node where the coded blocks are located.
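
Steps 61 and 62 can be demonstrated with the simplest erasure code, a single XOR parity block, for which N = C + 1 and any one lost block can be rebuilt. This is a stand-in chosen for brevity, not the RS code mentioned in the background, which tolerates more losses; the function names and the zero-padding scheme are assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, c: int) -> List[bytes]:
    """Steps 61/62: split data into c zero-padded blocks and append one parity block."""
    size = -(-len(data) // c)  # ceiling division
    padded = data.ljust(size * c, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(c)]
    return blocks + [xor_blocks(blocks)]


def decode(blocks: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one of the N blocks is missing (None)."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    repaired = list(blocks)
    if missing:
        present = [b for b in blocks if b is not None]
        repaired[missing[0]] = xor_blocks(present)
    return b"".join(repaired[:-1])[:length]


data = b"tax-record-2016"
coded = encode(data, c=4)   # N = 5 coded blocks
lost = list(coded)
lost[2] = None              # simulate the failure of one storage node
print(decode(lost, len(data)) == data)
```

The original length is passed to `decode` so that the zero padding added in step 61 can be stripped; in the method described above that length would naturally belong to the distribution information recorded in step 64.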

本发明利用不同时间的税务数据特点进行分模式存储,综合提高了整个税务数据的安全性和数据修复性能。由于税务数据的访问频度具有阶段性变化的特点,近期的数据访问频度是最高的,历史数据的访问频度则相对较低。在访问频度低的数据上使用纠删码,可以提高存储空间使用率,在访问频度高的数据上使用多副本,提高了数据修复性能。The present invention utilizes the characteristics of tax data at different times to store in different modes, and comprehensively improves the security and data restoration performance of the entire tax data. Because the access frequency of tax data has the characteristics of periodic changes, the access frequency of recent data is the highest, while the access frequency of historical data is relatively low. Using erasure codes on data with low access frequency can improve storage space utilization, and using multiple copies on data with high access frequency improves data repair performance.

Second, the present invention distributes erasure-coding tasks across different nodes and takes each node's load fully into account when selecting nodes, so that the computing load and the network transmission load are shared among multiple nodes and the overall coding performance of the system improves.

Third, the present invention adopts a copy-first, erasure-code-later mode, which keeps the data safe until erasure coding has completed and covers the data loss during encoding that can occur when erasure coding is the only fault-tolerance mechanism.
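
The ordering that makes this safe can be sketched as below: the copies are deleted only after the coded form has been written and checked. The dictionaries standing in for the two storage modules, the `encode` and `verify` callbacks, and the function name are hypothetical placeholders.

```python
def migrate_to_erasure_coding(packet_id, replica_store, erasure_store, encode, verify):
    """Move one completed data packet from multi-copy to erasure-coded storage.

    Returns True on success. On failure the copies are left untouched, so
    the data stays protected while encoding is retried.
    """
    data = replica_store[packet_id]
    coded_blocks = encode(data)
    erasure_store[packet_id] = coded_blocks
    if not verify(coded_blocks, data):
        # Roll back the partial result; the multi-copy version remains valid.
        del erasure_store[packet_id]
        return False
    # Only now is it safe to drop the copies.
    del replica_store[packet_id]
    return True
```

At every point in this flow at least one complete, verified representation of the packet exists, which is the property the copy-first mode is meant to provide.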

The embodiments above describe only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. Those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and all of these fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.

Claims (6)

1. A hybrid tax big data security protection method based on erasure codes and multiple copies, characterized in that, when the tax data of the tax data distributed storage system is normal, a multi-copy and erasure-code storage flow for the tax data is started;
when the tax data of the tax data distributed storage system fails, a tax data fault-tolerance processing flow is started;
the multi-copy and erasure-code storage flow comprises the following steps:
Step S11: the tax data is divided by time into historical data and recent data, the recent data comprising multiple different recent data packets;
Step S12: the recent data is stored in the multi-copy storage module according to the multi-copy storage mode, and the historical data is stored in the erasure-code storage module according to the erasure-code storage mode;
Step S13: when a recent data packet is marked as complete, the recent data packet is transferred to the erasure-code storage module, thereby forming historical data;
the tax data fault-tolerance processing flow comprises the following steps:
Step S21: according to the data management node of the multi-copy storage module, determine whether the failed tax data is stored in the multi-copy storage module or in the erasure-code storage module;
Step S22: if the failed tax data is stored in the multi-copy storage module, send test messages to the relevant multi-copy storage nodes according to the records of the multi-copy management node, select the copy corresponding to the failed tax data according to the delay reported by the test messages, and restore the copy as valid tax data;
Step S23: if the failed tax data is stored in the erasure-code storage module, further look up the records of the erasure-code management node, send test messages to the relevant erasure-code storage nodes, and then select the corresponding coded blocks in order of the delay reported by the test messages; once a sufficient number of coded blocks has been obtained, the tax data can be reconstructed.
2. The hybrid tax big data security protection method based on erasure codes and multiple copies according to claim 1, characterized in that the erasure-code storage mode in step S12 comprises the following steps:
Step S1221: the erasure-code management node determines whether the external access frequency of the erasure-code storage module is below an access-frequency threshold, and thereby whether the erasure-code storage module is currently idle; if so, all erasure-code storage nodes are activated;
Step S1222: for each activated erasure-code storage node, determine whether its storage load exceeds the storage-load threshold and whether its network load exceeds the network full-load threshold; if neither threshold is exceeded, the node requests data to be transferred from the multi-copy management node;
Step S1223: the data to be transferred is encoded, then distributed to and stored in the erasure-code storage nodes, and the distribution information is recorded in the erasure-code management node;
Step S1224: after the data transfer is confirmed to have succeeded, the transferred tax data and all of its copies in the multi-copy storage module are deleted.
3. The hybrid tax big data security protection method based on erasure codes and multiple copies according to claim 1, characterized in that, in the multi-copy storage mode in step S12, the processing flow for writing recent data comprises the following steps:
Step S1231: when the client sends a request to write recent data, the multi-copy management node responds;
Step S1232: the written tax data is replicated to form copies, and the written tax data and its copies are stored separately in different multi-copy storage nodes;
Step S1233: the storage information of the written tax data is recorded in the multi-copy management node.
4. The hybrid tax big data security protection method based on erasure codes and multiple copies according to claim 1, characterized in that, in the multi-copy storage mode in step S12, the processing flow for reading recent data comprises the following steps:
Step S1241: when the client sends a read request for recent data, the multi-copy management node responds and, according to its records, sends test messages to the relevant multi-copy storage nodes and requests their computing load;
Step S1242: the corresponding multi-copy storage node is selected by weighing the delay reported by the test messages together with the computing load of the relevant multi-copy storage nodes;
Step S1243: as directed by the multi-copy management node, the tax data in the selected multi-copy storage node is sent directly to the client.
5. The hybrid tax big data security protection method based on erasure codes and multiple copies according to claim 3, characterized in that, in step S1232, the written tax data and its copies are placed so that different copies of the same tax data are physically isolated and stored in different cabinets or machine rooms.
6. The hybrid tax big data security protection method based on erasure codes and multiple copies according to claim 1, characterized in that the coded blocks are formed by the following steps:
Step 61: the recent data to be transferred is split into C data blocks;
Step 62: the C data blocks are encoded into N coded blocks, where N is greater than C;
Step 63: the N coded blocks are distributed to N different erasure-code storage modules;
Step 64: the distribution information of the N coded blocks is recorded in the erasure-code management node where the coded blocks are located.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611252092.8A CN106708653B (en) 2016-12-29 2016-12-29 A hybrid tax big data security protection method based on erasure coding and multiple copies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611252092.8A CN106708653B (en) 2016-12-29 2016-12-29 A hybrid tax big data security protection method based on erasure coding and multiple copies

Publications (2)

Publication Number Publication Date
CN106708653A true CN106708653A (en) 2017-05-24
CN106708653B CN106708653B (en) 2020-06-30

Family

ID=58904096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611252092.8A Active CN106708653B (en) 2016-12-29 2016-12-29 A hybrid tax big data security protection method based on erasure coding and multiple copies

Country Status (1)

Country Link
CN (1) CN106708653B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108196978A (en) * 2017-12-22 2018-06-22 新华三技术有限公司 Data storage method, device, data storage system and readable storage medium
CN108255432A (en) * 2018-01-12 2018-07-06 郑州云海信息技术有限公司 Write operation control method, system, device and storage medium based on bedding storage
CN110196682A (en) * 2018-06-15 2019-09-03 腾讯科技(深圳)有限公司 Data managing method, calculates equipment and storage medium at device
CN110209670A (en) * 2019-05-09 2019-09-06 北京猫盘技术有限公司 Data processing method and device based on network storage equipment cluster
CN111008181A (en) * 2019-10-31 2020-04-14 苏州浪潮智能科技有限公司 A distributed file system storage policy switching method, system, terminal and storage medium
CN111381767A (en) * 2018-12-28 2020-07-07 阿里巴巴集团控股有限公司 Data processing method and device
CN111782582A (en) * 2019-06-14 2020-10-16 北京京东尚科信息技术有限公司 Data conversion method, system and name node
CN112965660A (en) * 2021-02-09 2021-06-15 山东英信计算机技术有限公司 Method, system, device and medium for feeding back information of double storage pools
CN114398006A (en) * 2021-12-24 2022-04-26 中国电信股份有限公司 Distributed storage mode control method, device, equipment and storage medium
CN114764425A (en) * 2021-01-13 2022-07-19 北京金山云网络技术有限公司 Object updating method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
CN105472047A (en) * 2016-02-03 2016-04-06 天津书生云科技有限公司 Storage system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108196978B (en) * 2017-12-22 2021-03-09 新华三技术有限公司 Data storage method, device, data storage system and readable storage medium
CN108196978A (en) * 2017-12-22 2018-06-22 新华三技术有限公司 Data storage method, device, data storage system and readable storage medium
CN108255432A (en) * 2018-01-12 2018-07-06 郑州云海信息技术有限公司 Write operation control method, system, device and storage medium based on bedding storage
CN110196682A (en) * 2018-06-15 2019-09-03 腾讯科技(深圳)有限公司 Data managing method, calculates equipment and storage medium at device
CN111381767B (en) * 2018-12-28 2024-03-26 阿里巴巴集团控股有限公司 Data processing method and device
CN111381767A (en) * 2018-12-28 2020-07-07 阿里巴巴集团控股有限公司 Data processing method and device
CN110209670B (en) * 2019-05-09 2022-03-25 北京猫盘技术有限公司 Data processing method and device based on network storage device cluster
CN110209670A (en) * 2019-05-09 2019-09-06 北京猫盘技术有限公司 Data processing method and device based on network storage equipment cluster
CN111782582A (en) * 2019-06-14 2020-10-16 北京京东尚科信息技术有限公司 Data conversion method, system and name node
CN111008181A (en) * 2019-10-31 2020-04-14 苏州浪潮智能科技有限公司 A distributed file system storage policy switching method, system, terminal and storage medium
CN114764425A (en) * 2021-01-13 2022-07-19 北京金山云网络技术有限公司 Object updating method and device, storage medium and electronic equipment
CN112965660A (en) * 2021-02-09 2021-06-15 山东英信计算机技术有限公司 Method, system, device and medium for feeding back information of double storage pools
CN112965660B (en) * 2021-02-09 2023-08-08 山东英信计算机技术有限公司 Method, system, equipment and medium for double storage pool information feedback
CN114398006A (en) * 2021-12-24 2022-04-26 中国电信股份有限公司 Distributed storage mode control method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106708653B (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN106708653A (en) Mixed tax administration data security protecting method based on erasure code and multi-copy
US10956276B2 (en) System state recovery in a distributed, cloud-based storage system
US9535790B2 (en) Prioritizing data reconstruction in distributed storage systems
CN101667181B (en) Method, device and system for data disaster tolerance
CN103118133B (en) Based on the mixed cloud storage means of the file access frequency
CN103780638B (en) Method of data synchronization and system
CN108196978A (en) Data storage method, device, data storage system and readable storage medium
CN107885612A (en) Data processing method and system and device
CN103051681B (en) Collaborative type log system facing to distribution-type file system
CN107291889A (en) A kind of date storage method and system
CN102142006A (en) File processing method and device of distributed file system
CN106066896A (en) A kind of big Data duplication applying perception deletes storage system and method
CN101692226A (en) Storage method of mass filing stream data
CN104899117A (en) Memory database parallel logging method for nonvolatile memory
CN109582213A (en) Data reconstruction method and device, data-storage system
CN102142032A (en) Method and system for reading and writing data of distributed file system
CN104424052A (en) Automatic redundant distributed storage system and method
CN104965835B (en) A kind of file read/write method and device of distributed file system
KR101254179B1 (en) Method for effective data recovery in distributed file system
CN103530206A (en) Data recovery method and device
CN106027638A (en) Hadoop data distribution method based on hybrid coding
US20140040574A1 (en) Resiliency with a destination volume in a replication environment
CN103399943A (en) Communication method and communication device for parallel query of clustered databases
CN113051428B (en) A method and device for backing up camera front-end storage
CN104866245B (en) The method and apparatus of synchronisation snapshot between buffer memory device and storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant