CN115361401A - Data encoding and decoding method and system for copy certification - Google Patents


Info

Publication number
CN115361401A
CN115361401A (application CN202210829009.8A)
Authority
CN
China
Prior art keywords
layer
data
node
nodes
decoding
Prior art date
Legal status
Granted
Application number
CN202210829009.8A
Other languages
Chinese (zh)
Other versions
CN115361401B (en)
Inventor
万胜刚 (Wan Shenggang)
董子豪 (Dong Zihao)
易成龙 (Yi Chenglong)
朱捷瑞 (Zhu Jierui)
何绪斌 (He Xubin)
谢长生 (Xie Changsheng)
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210829009.8A
Publication of CN115361401A
Application granted
Publication of CN115361401B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5011: Allocation of resources, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016: Allocation of resources, the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a data encoding and decoding method and system for proof of replication, belonging to the technical field of computer storage. After the encoded data of each upper-layer node is obtained, based on the relative dependencies of the lower-layer nodes on the upper-layer nodes in the stacked depth robust graph, the encoded data of each lower-layer node's parent nodes in the upper layer are written to external storage in sequence, and the encoded data of the upper-layer nodes are deleted from memory. By storing in external storage the upper-layer node data that would otherwise reside in memory, and by studying a suitable layout of the nodes in external storage so that the reads issued by the lower-layer child nodes for the upper-layer parent nodes they depend on are sequential, the invention greatly mitigates the loss of algorithm execution performance caused by the performance gap between memory and external storage, and greatly reduces the overhead of memory space resources.

Description

Data encoding and decoding method and system for copy certification
Technical Field
The invention belongs to the technical field of computer storage, and in particular relates to a data encoding and decoding method and system for proof of replication.
Background
A distributed storage network achieves horizontal capacity expansion by adding anonymous, idle storage devices on the Internet. To ensure data reliability, distributed storage networks mainly adopt two data redundancy mechanisms: multi-copy and erasure coding. Under the multi-copy mechanism, the data is replicated to generate multiple copies, which are stored dispersed across different devices, so that when any copy is lost it can be repaired from any other copy. However, in a distributed storage network, the anonymity of the storage devices makes malicious attacks against the replica data, namely Sybil attacks, possible.
To resist Sybil attacks, a time-bound proof of replication is often adopted. Under this time assumption, the encoding process from the original data to a data copy must satisfy certain assumptions: the encoding must take a long time, and the computation must be parallel-resistant (difficult to accelerate with multiple cores). Thus, when a challenge arrives, an honest node can pass it simply by generating a proof from the data it already holds, whereas a malicious node can reconstruct the challenged data copy only through decoding followed by the long encoding process, so malicious nodes are easily distinguished from honest ones. Researching a data encoding method for proof of replication that resists Sybil attacks is therefore of great significance.
Existing time-bound proof-of-replication methods generally encode data with the SDR algorithm, which introduces a depth robust graph (DRG) and an expander graph into the encoding to resist Sybil attacks and their derivative attacks. In this process, if the data of any node is lost, it can only be recomputed from all of the node's parent nodes, causing recomputation amplification; two layers of nodes must therefore be kept in memory during execution, with all intermediate results cached in memory for computation, resulting in high memory overhead.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a data encoding and decoding method and system for proof of replication, so as to solve the technical problem of the high memory resource overhead of existing data encoding/decoding methods for proof of replication.
In order to achieve the above object, in a first aspect, the present invention provides a data encoding method for proof of replication, comprising:
performing L layers of encoding on the data to be encoded in series:
First-layer encoding: obtain a random seed from the information of the data to be encoded, and generate random data of the same size as the data to be encoded from the seed; divide the random data sequentially into M data blocks, each data block corresponding one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; based on the dependencies among the nodes in the first layer of the stacked depth robust graph, encode the data blocks in sequence, generating M pieces of encoded data, to obtain the encoded data of each node of the first layer; based on the relative dependencies of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, write the encoded data of each second-layer node's parent nodes in the first layer to external storage in sequence, and delete the encoded data of the first-layer nodes from memory;
i-th-layer encoding: based on the relative dependencies of the i-th-layer nodes on the (i-1)-th-layer nodes in the stacked depth robust graph, read the encoded data of each i-th-layer node's parent nodes in the (i-1)-th layer from external storage; based on the dependencies among the nodes in the i-th layer, obtain in sequence the encoded data of all parent nodes of each i-th-layer node, concatenate them, and encode further to obtain the encoded data of each i-th-layer node; based on the relative dependencies of the (i+1)-th-layer nodes on the i-th-layer nodes, write the encoded data of each (i+1)-th-layer node's parent nodes in the i-th layer to external storage in sequence, and delete the encoded data of the i-th-layer nodes from memory; i = 2, 3, …, L-1;
L-th-layer encoding: based on the relative dependencies of the L-th-layer nodes on the (L-1)-th-layer nodes in the stacked depth robust graph, read the encoded data of each L-th-layer node's parent nodes in the (L-1)-th layer from external storage; based on the dependencies among the nodes in the L-th layer, obtain in sequence the encoded data of all parent nodes of each L-th-layer node, concatenate them, and encode further to obtain the encoded data of each L-th-layer node; XOR the encoded data of the L-th-layer nodes bitwise with the data to be encoded to obtain the final encoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M; randomly generated dependencies exist between nodes within the same layer and between nodes of two adjacent layers.
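The layered encoding above can be sketched in a few lines. This is a simplified illustration, not the patent's exact construction: the hash function, seed derivation, and parent lists (`same_parents`, `prev_parents`) are assumptions of the sketch.

```python
import hashlib

def _h(*parts: bytes) -> bytes:
    """Concatenate-and-hash helper (SHA-256 is one admissible encoding step)."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def encode(data, same_parents, prev_parents, L):
    """Serial L-layer encoding over a stacked DRG (simplified sketch).

    data          -- list of M 32-byte blocks to encode
    same_parents  -- same_parents[i]: parents of node i within a layer (indices < i)
    prev_parents  -- prev_parents[i]: parents of node i in the layer above
    """
    M = len(data)
    seed = _h(b"seed", *data)  # random seed from the data's information
    # Layer 1: encode M pseudorandom blocks derived from the seed, in node order.
    prev = []
    for i in range(M):
        dep = [prev[p] for p in same_parents[i]]  # intra-layer dependencies
        prev.append(_h(seed, i.to_bytes(8, "big"), *dep))
    # Layers 2..L: each node's label hashes its spliced parent labels.
    for _layer in range(2, L + 1):
        cur = []
        for i in range(M):
            dep = [cur[p] for p in same_parents[i]]    # parents in this layer
            dep += [prev[p] for p in prev_parents[i]]  # parents in the layer above
            cur.append(_h(*dep))
        prev = cur  # in the patented method, `prev` is what gets spilled to external storage
    # Final step: bitwise XOR of the L-th-layer labels with the data to be encoded.
    return [bytes(a ^ b for a, b in zip(lab, blk)) for lab, blk in zip(prev, data)]
```

Because every label is a deterministic function of the seed and the graph, the same parents always reproduce the same labels, which is what makes the write-back-and-reread strategy safe.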
Further preferably, after the encoded data of each node of the j-th layer is obtained, the encoded data of the j-th-layer nodes are evenly distributed into different computation windows, and an external-storage file is allocated to each computation window; based on the relative dependencies of the (j+1)-th-layer nodes on the j-th-layer nodes in the stacked depth robust graph and the dependencies among the nodes within the (j+1)-th layer, the encoded data in the different computation windows corresponding to the parent nodes of each (j+1)-th-layer node are written in sequence to the corresponding external-storage files, and the encoded data of the j-th-layer nodes are deleted from memory; where j = 1, 2, …, L-1;
correspondingly, the encoded data of each (j+1)-th-layer node is computed as follows:
when computing the encoded data of a (j+1)-th-layer node, a preset number of encoded data items are first prefetched, by sequential (non-rewindable) reads from the head of each external-storage file, into the corresponding in-memory window; one encoded data item is then read, in order and without replacement, from the head of each in-memory window holding encoded data of that node's parent nodes, yielding the encoded data of all parent nodes of the node, and the read items are popped from the window; when computing the encoded data of the next (j+1)-th-layer node, another preset number of encoded data items are read sequentially from the head of each external-storage file and appended to the tail of the corresponding in-memory window; repeating this process, the encoded data of all parent nodes of each (j+1)-th-layer node are obtained in sequence, concatenated, and further encoded to obtain the encoded data of each (j+1)-th-layer node.
In a second aspect, the present invention provides a data encoding method for proof of replication, comprising: sequentially dividing the original data into a plurality of continuous data sets, the encoding of each continuous data set corresponding to one execution stage, the stages being executed in the order of division; in each execution stage, the corresponding continuous data set is taken as the data to be encoded and the data encoding method provided in the first aspect of the present invention is executed; and memory space is allocated to the data to be processed in the next execution stage only after the current execution stage has finished and the next execution stage is entered.
In a third aspect, the present invention provides a data encoding method for proof of replication, for encoding a plurality of pieces of original data simultaneously, comprising: sequentially dividing each piece of original data into a plurality of continuous data sets, the encoding of each continuous data set corresponding to one execution stage; processing each piece of original data stage by stage, with the execution stages of the different pieces of original data run in a pipelined manner; in each execution stage, the corresponding continuous data set is taken as the data to be encoded and the data encoding method provided in the first aspect of the present invention is executed; and memory space is allocated to the data to be processed in the next execution stage only after the current execution stage has finished and the next execution stage is entered.
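The immediate-allocation idea of the staged execution above can be sketched as follows; `encode_stage` stands in for the per-stage encoder and, like the stage size, is an assumption of the sketch rather than the patent's interface.

```python
def staged_encode(original: bytes, stage_size: int, encode_stage) -> bytes:
    """Encode `original` in consecutive stages, allocating each stage's buffer
    only when that stage begins and releasing it before the next one starts
    (progressive, immediate memory allocation instead of pre-allocation)."""
    out = []
    for off in range(0, len(original), stage_size):
        stage = bytearray(original[off:off + stage_size])  # allocated only now
        out.append(encode_stage(bytes(stage)))
        del stage  # released before the next stage's buffer is allocated
    return b"".join(out)
```

With several pieces of original data, the stages of different pieces can be overlapped in a pipeline so that peak memory stays at roughly one stage's worth per piece.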
In a fourth aspect, the present invention provides a data encoding system for proof of replication, comprising: a memory storing a computer program, and a processor that executes the computer program to perform the data encoding method provided in the first, second and/or third aspect of the present invention.
In a fifth aspect, the present invention provides a data decoding method for proof of replication, comprising:
performing L layers of decoding on the data to be decoded in series:
First-layer decoding: obtain a random seed from the information of the data to be decoded, and generate random data of the same size as the data to be decoded from the seed; divide the random data sequentially into M' data blocks, each data block corresponding one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; based on the dependencies among the nodes in the first layer of the stacked depth robust graph, decode the data blocks in sequence, generating M' pieces of decoded data, to obtain the decoded data of each node of the first layer; based on the relative dependencies of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, write the decoded data of each second-layer node's parent nodes in the first layer to external storage in sequence, and delete the decoded data of the first-layer nodes from memory;
i-th-layer decoding: based on the relative dependencies of the i-th-layer nodes on the (i-1)-th-layer nodes in the stacked depth robust graph, read the decoded data of each i-th-layer node's parent nodes in the (i-1)-th layer from external storage; based on the dependencies among the nodes in the i-th layer, obtain in sequence the decoded data of all parent nodes of each i-th-layer node, concatenate them, and decode further to obtain the decoded data of each i-th-layer node; based on the relative dependencies of the (i+1)-th-layer nodes on the i-th-layer nodes, write the decoded data of each (i+1)-th-layer node's parent nodes in the i-th layer to external storage in sequence, and delete the decoded data of the i-th-layer nodes from memory; i = 2, 3, …, L-1;
L-th-layer decoding: based on the relative dependencies of the L-th-layer nodes on the (L-1)-th-layer nodes in the stacked depth robust graph, read the decoded data of each L-th-layer node's parent nodes in the (L-1)-th layer from external storage; based on the dependencies among the nodes in the L-th layer, obtain in sequence the decoded data of all parent nodes of each L-th-layer node, concatenate them, and decode further to obtain the decoded data of each L-th-layer node; XOR the decoded data of the L-th-layer nodes bitwise with the data to be decoded to obtain the final decoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; randomly generated dependencies exist between nodes within the same layer and between nodes of two adjacent layers.
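The final-layer bitwise XOR is what makes the scheme invertible: since the node labels depend only on the seed and the graph, the decoder can regenerate the same label stream and XOR it with the stored data. A minimal sketch of just this step (the helper name is illustrative, not from the patent):

```python
def xor_blocks(labels, blocks):
    """Bitwise XOR of corresponding 32-byte label and data blocks."""
    return [bytes(x ^ y for x, y in zip(l, b)) for l, b in zip(labels, blocks)]

# XOR is an involution: applying the same labels twice is the identity,
# so encoding (data -> replica) and decoding (replica -> data) share this step.
```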
Further preferably, after the decoded data of each node of the j-th layer is obtained, the decoded data of the j-th-layer nodes are evenly distributed into different computation windows, and an external-storage file is allocated to each computation window; based on the relative dependencies of the (j+1)-th-layer nodes on the j-th-layer nodes in the stacked depth robust graph and the dependencies among the nodes within the (j+1)-th layer, the decoded data in the different computation windows corresponding to the parent nodes of each (j+1)-th-layer node are written in sequence to the corresponding external-storage files, and the decoded data of the j-th-layer nodes are deleted from memory; where j = 1, 2, …, L-1;
correspondingly, the decoded data of each (j+1)-th-layer node is computed as follows:
when computing the decoded data of a (j+1)-th-layer node, a preset number of decoded data items are first prefetched, by sequential (non-rewindable) reads from the head of each external-storage file, into the corresponding in-memory window; one decoded data item is then read, in order and without replacement, from the head of each in-memory window holding decoded data of that node's parent nodes, yielding the decoded data of all parent nodes of the node, and the read items are popped from the window; when computing the decoded data of the next (j+1)-th-layer node, another preset number of decoded data items are read sequentially from the head of each external-storage file and appended to the tail of the corresponding in-memory window; repeating this process, the decoded data of all parent nodes of each (j+1)-th-layer node are obtained in sequence, concatenated, and further decoded to obtain the decoded data of each (j+1)-th-layer node.
In a sixth aspect, the present invention provides a data decoding method for proof of replication, comprising: sequentially dividing the original data into a plurality of continuous data sets, the decoding of each continuous data set corresponding to one execution stage, the stages being executed in the order of division; in each execution stage, the corresponding continuous data set is taken as the data to be decoded and the data decoding method provided in the fifth aspect of the present invention is executed; and memory space is allocated to the data to be processed in the next execution stage only after the current execution stage has finished and the next execution stage is entered.
In a seventh aspect, the present invention provides a data decoding method for proof of replication, for decoding a plurality of pieces of original data simultaneously, comprising: sequentially dividing each piece of original data into a plurality of continuous data sets, the decoding of each continuous data set corresponding to one execution stage; processing each piece of original data stage by stage, with the execution stages of the different pieces of original data run in a pipelined manner; in each execution stage, the corresponding continuous data set is taken as the data to be decoded and the data decoding method provided in the fifth aspect of the present invention is executed; and memory space is allocated to the data to be processed in the next execution stage only after the current execution stage has finished and the next execution stage is entered.
In an eighth aspect, the present invention provides a data decoding system for proof of replication, comprising: a memory storing a computer program, and a processor that executes the computer program to perform the data decoding method provided in the fifth, sixth and/or seventh aspect of the present invention.
In a ninth aspect, the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program is executed by a processor, it controls a device on which the storage medium resides to execute the data encoding method provided in the first, second or third aspect of the present invention, and/or the data decoding method provided in the fifth, sixth or seventh aspect of the present invention.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
1. After the encoded data of each upper-layer node is obtained, based on the relative dependencies of the lower-layer nodes on the upper-layer nodes in the stacked depth robust graph, the encoded data of each lower-layer node's parent nodes in the upper layer are written to external storage in sequence, and the encoded data of the upper-layer nodes are deleted from memory. The present invention stores in external storage the upper-layer node data that would otherwise reside in memory, and studies a suitable layout of the nodes in external storage, so that when the lower-layer child nodes read the upper-layer parent nodes they depend on, the reads are sequential; this greatly mitigates the loss of algorithm execution performance caused by the performance gap between memory and external storage, and greatly reduces the overhead of memory space resources.
2. The present invention provides a data encoding method for proof of replication that divides the large-scale node set of a single layer into multiple small windows and writes the nodes of each window to external storage, one file per window, in the relative order of the nodes, thereby ensuring that the parent nodes within a single file are stored relatively contiguously: any number of parent nodes can be read from the file corresponding to a single window, and subsequent parent nodes are read from the file after the previously read ones have been consumed. Reading data in large blocks thus reduces the impact of read operations on the overall execution time of the algorithm. In addition, reads and writes can be executed concurrently with node computation, further reducing the time overhead of the algorithm.
3. During the execution of the SDR algorithm, memory resources are conventionally pre-allocated: memory space is allocated before the SDR algorithm starts computing nodes. However, because most of this memory space remains unused for a long time after allocation as the SDR algorithm executes progressively, the data encoding methods for proof of replication provided in the second and third aspects of the present invention both adopt an immediate-allocation strategy for progressive memory allocation: the large-scale data within a layer is decomposed into small data sets, each called a window, and memory is allocated only just before the current window starts to be computed, avoiding long-term occupation of large amounts of memory. In addition, multiple instances of the encoding method provided in the first aspect of the present invention can be executed concurrently, keeping the total memory usage of the concurrently executed encoding methods unchanged and reducing the average memory space overhead.
Drawings
Fig. 1 is a schematic diagram of a data encoding method for proof of replication according to Embodiment 1 of the present invention;
Fig. 2 is a stacked depth robust graph provided in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the relative-order distribution of upper-layer parent nodes according to Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of reading parent nodes stored in relative order, provided in Embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of a four-stage progressive memory allocation method according to Embodiment 2 of the present invention;
Fig. 6 is a schematic diagram of the memory space occupied when encoding four pieces of original data concurrently, according to Embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
It should be noted that, through a comprehensive analysis of the SDR algorithm of Filecoin, currently the most representative distributed storage system in the world, the invention identifies two ubiquitous but largely overlooked characteristics: (1) fixed node dependencies: the parent nodes on which each node depends can be determined before computation; (2) progressive node generation: in the hash computation of the nodes to be generated, the hash computation of the next node must wait for the previous one to complete. More importantly, despite these two characteristics, in order to avoid the recomputation amplification that losing a parent node would cause during execution, the SDR algorithm still needs to keep all node data of two adjacent layers in memory, up to 64 GB in total, which poses a great obstacle to bringing distributed storage to ordinary users. Considering that such devices usually have large-capacity external storage (hundreds of GB to several TB), if some of the nodes that would otherwise be kept in memory can be stored in external storage, the amount of memory used can be effectively reduced. However, in modern computer systems there is a large performance gap between external storage and memory: if the node data were laid out in external storage exactly as in memory, node read latency would differ by 4 to 6 orders of magnitude (for example, hard-disk access latency is about 10 ms, while memory access latency is about 10 ns), greatly degrading the execution performance of the algorithm.
To solve the above problems, the present invention provides an encoding and decoding method that saves the memory required for computation during the replica conversion of stored data, specifically as follows:
Embodiment 1
A data encoding method for proof of replication, the corresponding schematic diagram being as shown in Fig. 1, comprising: performing L layers of encoding on the data to be encoded in series:
First-layer encoding: obtain a random seed from the information of the data to be encoded, and generate random data of the same size as the data to be encoded from the seed; divide the random data sequentially into M data blocks, each data block corresponding one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; based on the dependencies among the nodes in the first layer of the stacked depth robust graph, encode the data blocks in sequence, generating M pieces of encoded data, to obtain the encoded data of each node of the first layer; based on the relative dependencies of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, write the encoded data of each second-layer node's parent nodes in the first layer to external storage in sequence, and delete the encoded data of the first-layer nodes from memory;
i-th-layer encoding: based on the relative dependencies of the i-th-layer nodes on the (i-1)-th-layer nodes in the stacked depth robust graph, read the encoded data of each i-th-layer node's parent nodes in the (i-1)-th layer from external storage; based on the dependencies among the nodes in the i-th layer, obtain in sequence the encoded data of all parent nodes of each i-th-layer node, concatenate them, and encode further to obtain the encoded data of each i-th-layer node; based on the relative dependencies of the (i+1)-th-layer nodes on the i-th-layer nodes, write the encoded data of each (i+1)-th-layer node's parent nodes in the i-th layer to external storage in sequence, and delete the encoded data of the i-th-layer nodes from memory; i = 2, 3, …, L-1;
L-th-layer encoding: based on the relative dependencies of the L-th-layer nodes on the (L-1)-th-layer nodes in the stacked depth robust graph, read the encoded data of each L-th-layer node's parent nodes in the (L-1)-th layer from external storage; based on the dependencies among the nodes in the L-th layer, obtain in sequence the encoded data of all parent nodes of each L-th-layer node, concatenate them, and encode further to obtain the encoded data of each L-th-layer node; XOR the encoded data of the L-th-layer nodes bitwise with the data to be encoded to obtain the final encoding result;
It should be noted that the encoding operation in the foregoing process may be any encoding method, for example the SHA-256 hash algorithm.
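The layered encoding described above can be sketched as follows. This is a minimal illustration, not the actual SDR implementation: the helper names `encode_replica`, `drg_parents` and `exp_parents` are hypothetical, the block size and hashing details are simplified, and the memory/external-memory write-back is omitted.

```python
import hashlib

def encode_replica(data_blocks, layers, drg_parents, exp_parents):
    """Sketch of layered SDR-style encoding (hypothetical helpers).

    data_blocks: list of 32-byte blocks (the data to be encoded).
    drg_parents(i): indices of intra-layer parents of node i (all < i).
    exp_parents(i): indices of upper-layer parents of node i.
    """
    n = len(data_blocks)
    # First layer: derive pseudo-random blocks from a seed tied to the data.
    seed = hashlib.sha256(b"".join(data_blocks)).digest()
    layer = [hashlib.sha256(seed + i.to_bytes(8, "big")).digest() for i in range(n)]
    for i in range(n):  # intra-layer DRG dependencies, encoded in sequence
        pre = b"".join(layer[p] for p in drg_parents(i))
        layer[i] = hashlib.sha256(layer[i] + pre).digest()
    for _ in range(1, layers):  # layers 2..L
        prev, layer = layer, []
        for i in range(n):
            pre = b"".join(prev[p] for p in exp_parents(i))      # upper-layer parents
            pre += b"".join(layer[p] for p in drg_parents(i))    # same-layer parents
            layer.append(hashlib.sha256(pre or b"\0").digest())
    # Final layer is XORed block-wise with the original data.
    return [bytes(a ^ b for a, b in zip(k, d)) for k, d in zip(layer, data_blocks)]
```

Decoding would regenerate the final key layer from the same seed and graph, then XOR again; the sketch only shows the encoding direction.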
In summary, to address the excessive memory overhead of the SDR algorithm, the present invention proposes an upper-layer node write-back strategy. The strategy stores the upper-layer node data, which would otherwise reside in memory, in external memory, thereby reducing the memory footprint. A suitable external node distribution is further designed so that, when a lower-layer child node reads the upper-layer parents it depends on, those parents are read sequentially, which greatly mitigates the performance loss caused by the speed gap between internal and external memory. Experimental results show that, for a 32 GB replica in an Intel test environment, the strategy reduces memory overhead by 50% compared with the original SDR algorithm, while the algorithm's execution time increases by no more than 6.9%.
Further, the stacked depth robust graph is constructed from a depth robust graph structure and an expander graph structure, and comprises L layers, each with M nodes. Each layer is a depth robust graph, and an expander graph is formed between any two adjacent layers. Randomly generated parent-child dependencies exist both within a layer and between adjacent layers: within a layer, the dependencies follow the depth robust graph; between adjacent layers, they follow the expander graph.
Specifically, a depth robust graph (DRG) is a directed acyclic graph, commonly denoted (N, α, β, d)-DRG, where N is the total number of nodes, every node has the same in-degree d, and α and β are two robustness coefficients between 0 and 1. It has the following property: if any (1-α)N nodes of the (N, α, β, d)-DRG are deleted, together with all edges that start or end at those nodes, then the graph formed by the remaining nodes and edges necessarily contains a node v0 with depth(v0) ≥ βN.
The expander graph is a directed bipartite graph, commonly denoted (N, α, β, d)-expander. A directed bipartite graph consists of two node sets (a source node set and a target node set) and one edge set. In the expander graph the source node set and the target node set together contain N nodes each; the out-degree of any source node is d, the in-degree of any target node is d, and α and β are two expansion coefficients between 0 and 1 with α < β. It has the following property: any αN target nodes in the target set of the (N, α, β, d)-expander are connected to at least βN source nodes in its source set.
Specifically, a stacked depth robust graph in one embodiment is shown in FIG. 2. It comprises two layers of nodes, each layer containing 6 nodes, 12 nodes in total. Among the 6 nodes of a layer, a (6, 5/6, 1/2, 2)-DRG is formed: except for the first node of each layer (N1 in the first layer and N7 in the second layer), every node has in-degree 2. Thus each node depends on 2 parent nodes located in the same layer and before it, and the indices of these 2 parents are given by the stacked depth robust graph. In addition, a (6, 1/6, 1/2, 3)-expander is formed between the second-layer nodes and the first-layer nodes: the out-degree of each first-layer source node is 3 and the in-degree of each second-layer target node is 3. Hence every node below the first layer depends on 3 parents in the layer above it, and these 3 parents are given by the expander structure. In summary, in the stacked depth robust graph constructed from the DRG and the expander graph, the parent-child dependencies are fixed: a node in the first layer depends on d_L preceding parents within the first layer; a node below the first layer depends on d_L preceding parents within its own layer and d_E parents in the layer above. The index positions of the dependent parents are fixed.
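The fixed parent indices can be sampled deterministically from a graph seed, as in the following sketch. These sampling functions (`drg_parents`, `expander_parents`) and the `graph_seed` parameter are illustrative assumptions; the actual DRG and expander constructions in the patent are only characterized by their (N, α, β, d) properties, not by this particular hash-based sampling.

```python
import hashlib

def drg_parents(i, d_l, graph_seed=b"drg"):
    """Hypothetical intra-layer sampling: node i > 0 depends on d_l
    pseudo-randomly chosen earlier nodes (index < i) of the same layer."""
    if i == 0:
        return []
    parents = []
    for k in range(d_l):
        h = hashlib.sha256(graph_seed + i.to_bytes(8, "big") + bytes([k])).digest()
        parents.append(int.from_bytes(h[:8], "big") % i)
    return parents

def expander_parents(i, d_e, n, graph_seed=b"exp"):
    """Hypothetical inter-layer sampling: node i depends on d_e nodes
    of the n-node layer above it."""
    return [
        int.from_bytes(
            hashlib.sha256(graph_seed + i.to_bytes(8, "big") + bytes([k])).digest()[:8],
            "big",
        ) % n
        for k in range(d_e)
    ]
```

Because the sampling is a pure function of the node index, the dependency structure is fixed across runs, which is the property the write-back strategies below rely on.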
It should be noted that the sequential access performance of external memory (especially hard disks) is far higher than its random access performance; for example, a hard disk with a sequential read/write bandwidth of 250 MB/s may achieve only 556 KB/s for 4 KB random reads/writes. To address this, and exploiting the fact that the parent-child dependencies of the stacked depth robust graph are fixed, the upper-layer parents of all nodes to be generated are written back to external memory in the node generation order, which guarantees that parent accesses are sequential and thus greatly mitigates the performance loss caused by the internal/external memory speed gap. Since each upper-layer parent has d_E children in the lower layer, each parent must be written to external memory d_E times in order to preserve the sequential read/write property of the parent data. On this basis, the present invention proposes an upper-layer data write-back strategy: the intermediate data are sorted in memory according to their relative order and then written to external memory, and d_E times the replica size of data is written and read sequentially in external memory. This avoids keeping the upper-layer parent data in memory, reducing the memory overhead by 50%, while the guaranteed sequential writes make full use of the external memory bandwidth.
Specifically, FIG. 1 depicts the original distribution of the parent nodes and their sequential distribution. In the original distribution, N7 reads the nodes at positions 1, 2 and 5, and N8 reads the nodes at positions 2-4; reading the parents of a node to be generated is therefore a random-read process. The sequential distribution is constructed by traversing all parents of the nodes to be generated, N7 through N12: the parents of N7 (N1, N2 and N5) are placed at positions 1-3, and so on, until the parents of N12 (N2, N3 and N6) are placed at positions 16-18. When parents are read under the sequential distribution, N7 reads positions 1-3, N8 reads positions 4-6, and so on up to N12, which reads positions 16-18; reading the parents of a node to be generated thus becomes a sequential-read process. By sorting the parents in memory according to this rule and writing them to external memory in the sequential distribution, sequential access to the parent nodes is guaranteed.
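The sequential distribution of FIG. 1 can be built with a single traversal, as sketched below. The function name and the parent lists used in the test are illustrative (only the parents of N7 and N12 are stated in the text; the others are placeholders).

```python
def sequential_parent_layout(n_lower, parents_of):
    """Build the sequential parent distribution: traverse the nodes to be
    generated in order and lay out each node's upper-layer parent indices
    consecutively, so that reading parents becomes a sequential scan.
    Each upper-layer parent is duplicated once per child (d_E copies)."""
    layout = []
    for i in range(n_lower):
        layout.extend(parents_of(i))  # parent indices of lower-layer node i
    return layout
```

The duplication is the price of sequential reads: with in-degree d_E per lower-layer node, the layout holds d_E times the layer size, matching the d_E-fold write amplification discussed above.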
Further, the execution process of the data write-back policy can be divided into the following 4 steps:
1) Reading the coded data corresponding to the father node: and sequentially reading the coded data corresponding to the parent node of the node to be generated from the external memory to a read buffer RBuf, and stopping when the read buffer is full.
2) Computing the encoded data of a lower-layer node: the encoded data of d_E parent nodes are read sequentially from the read buffer, and the encoded data of d_L parent nodes are read randomly from memory. The encoded data of the node is then computed from the d_E + d_L parent values and some replica-related auxiliary information via a hash algorithm.
3) Sorting the encoded data corresponding to the parent nodes: after the hash computation of all nodes in the layer is finished, the nodes are sorted in memory according to the node distribution used in external memory, and the sorted result is written into the write buffer WBuf.
4) Writing the coded data corresponding to the father node into the external memory: and when the writing buffer area is full, sequentially writing the data in the buffer area into the external memory.
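Steps 1 and 2 above form the producer-consumer pair described next. The following sketch shows that pair with a bounded read buffer; the function name, buffer size, and the use of Python's `queue.Queue`/`threading` are illustrative assumptions (a real implementation would read from disk and also run the sort/write-back pair of steps 3-4).

```python
import hashlib
import queue
import threading

def pipelined_layer(parent_stream, d, rbuf_size=1024):
    """Sketch of steps 1-2: a reader thread prefetches parent data
    sequentially into a bounded read buffer (RBuf) while a worker thread
    consumes d parents per node and hashes them into the node's data."""
    rbuf = queue.Queue(maxsize=rbuf_size)

    def reader():                       # step 1: sequential parent reads
        for blk in parent_stream:
            rbuf.put(blk)               # blocks when RBuf is full
        rbuf.put(None)                  # end-of-stream sentinel

    out = []

    def worker():                       # step 2: hash d parents per node
        group = []
        while True:
            blk = rbuf.get()
            if blk is None:
                break
            group.append(blk)
            if len(group) == d:
                out.append(hashlib.sha256(b"".join(group)).digest())
                group.clear()

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=worker)
    t1.start(); t2.start(); t1.join(); t2.join()
    return out                          # steps 3-4 (sort, write-back) follow
```

The bounded queue is what makes the two threads overlap usefully: the reader stays ahead of the worker without ever holding more than `rbuf_size` parent blocks in memory.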
Among these 4 steps, reading parents and computing nodes form a typical producer-consumer problem. In an optional implementation, one thread is therefore assigned to reading the encoded data of the upper-layer nodes and another to computing the encoded data of the lower-layer nodes, denoted the first and second threads; the first thread reads encoded data into a buffer for the second thread to consume, and the two threads execute concurrently, greatly reducing the execution time under the upper-layer write-back strategy. Likewise, sorting the parents and writing them to external memory form another producer-consumer problem, so a third and a fourth thread can be assigned to these two processes respectively; the third and fourth threads also execute concurrently, further reducing the execution time.
Further, although the data encoding method of this embodiment adopts a write-back scheme, it must still wait for a whole layer of node data to be computed before writing it back, so one layer of data has to be kept in main memory: all nodes in the same layer as the node to be generated. When the replica is large, e.g. 32 GB, main memory usage is still 32 GB, which cannot be satisfied on devices with abundant idle storage but limited memory. To further reduce the memory overhead of the SDR algorithm, a subset of nodes (a set of such nodes is called a window) should be written back to external memory as soon as it is computed; for example, with N nodes per layer, every time N/8 nodes are computed they are written to external memory in some node distribution, so that memory only needs to hold the nodes of one window, greatly reducing main memory usage. However, because the parents are random and their index span is very large (far more than N/2), especially in the inter-layer parent-child dependencies of the SDR algorithm, no choice of window size makes the distribution produced by a window a sequential node distribution, so reading those distributions inevitably involves random reads. To address this, the present invention studies the node distribution further. If only part of the current layer, i.e. a single window, is to be kept in memory, all nodes in the window must be written to external memory once computed. But the distribution of the written nodes cannot in general be sorted into the node sequential distribution, or even a contiguous sub-range of it, whereas sequential reading requires the parents of the nodes to be generated to follow the sequential distribution. Hence, whatever distribution is used to store the window in external memory, reading the nodes back is necessarily a random-read process. Moreover, the nodes of each window correspond to two node distributions, and a single layer may be split into many windows, so keeping only part of a layer in memory yields a large number of node distributions, each corresponding to one file in external memory; reading the parents then amounts to randomly reading a large number of files. Note, however, that the random-read bandwidth of external memory (especially hard disks) depends on the amount of data read per request: on a hard disk with 250 MB/s sequential read bandwidth, random reads of 32 KB achieve about 4 MB/s, random reads of 1 MB about 75 MB/s, and random reads of 32 MB about 240 MB/s. If the external nodes are read with large random requests, in particular 32 MB per read, the random-read bandwidth approaches or equals the sequential-read bandwidth.
If a parent node distribution can be generated in which every node keeps its relative reading order with respect to the other nodes in the distribution — that is, each node is necessarily used (as an input to the hash of its children) after its predecessor and before its successor in the distribution — the distribution is called a relative-order node distribution. A relative-order distribution guarantees that any number of nodes can be read at once, with reading continuing from the same distribution after that batch is consumed. Storing the nodes in external memory in relative order therefore allows large random reads when the nodes are fetched back, mitigating the performance loss caused by random reading. Note that each parent node has d_E children in the adjacent layer and d_L children in its own layer, and most parents and children fall in the same window. To preserve the relative order of the parent data, each parent must therefore be written to external memory at most d_E + d_L times (the worst case; the real case is closer to d_E), i.e. the data write amplification on external memory is at most d_E + d_L. Writing the nodes in relative order and reading them with large random reads of at most d_E times the replica size thus requires keeping only the nodes of one window in memory, greatly reducing the memory overhead.
The SDR algorithm generates nodes in index order during execution, so the relative-order node distribution can be generated window by window following the parents' relative order, and is split at write time into an upper-layer parent distribution and a current-layer parent distribution. For the upper-layer parent distribution, the indices i = 0 ... n of the lower-layer nodes are traversed in generation order, the SDR_E(i) function outputs the indices of all upper-layer parents of node i, and the parents are then located by these indices. For the current-layer parent distribution, the indices i = k ... n of the subsequent nodes of the current layer are traversed in generation order (k denotes the index of the last node in the current window), and the indices of all same-layer parents of node i are output according to the SDR. Writing the data to external memory according to the upper-layer and current-layer parent distributions obtained by these rules guarantees sequential writes during the write process, and random reads of any size can be used in the subsequent node computation.
Specifically, in an optional embodiment, after the encoded data of the j-th layer nodes are obtained, they are divided into different computing windows and an external file is allocated to each window. Based on the relative dependencies of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependencies among the (j+1)-th layer nodes themselves, the encoded data of all parents of each (j+1)-th layer node, grouped by computing window, are written sequentially into the corresponding external files, and the encoded data of the j-th layer nodes are deleted from memory; here j = 1, 2, ..., L-1. FIG. 3 illustrates the relative-order distribution of the upper-layer parents: a single layer contains 6 nodes, divided into 2 windows, each corresponding to one file in external memory. Taking window 1 as an example, its nodes are N1, N2 and N3. The sorting method traverses all target nodes N7, N8, ..., N12 and collects, for each, its parents within the window. The parents of N7 in window 1 are N1 and N2, so N1 and N2 are placed at the head of file 1; the parents of N8 in window 1 are N2 and N3, so N2 and N3 are placed after them; and so on, until the parents of N12 in window 1 (N2 and N3) are placed at the end of file 1. After the node data in window 1 are computed, they are sorted by this rule and the sorted data are written to external memory.
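The per-window file layout of FIG. 3 can be sketched as a single traversal of the children. The function name is illustrative, and the test's middle child lists are placeholders (only the in-window parents of N7, N8 and N12 are stated in the text).

```python
def relative_order_file(window_nodes, child_parent_lists):
    """Build one window's relative-order file: traverse the target (child)
    nodes in generation order and append each parent that lies in this
    window, duplicating a parent every time a child uses it."""
    in_window = set(window_nodes)
    layout = []
    for parents in child_parent_lists:          # children in generation order
        layout.extend(p for p in parents if p in in_window)
    return layout
```

Because parents are appended in the order the children will consume them, a reader that only ever pops from the head of this file sees every dependency exactly when it is needed.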
Correspondingly, the method for calculating the coded data corresponding to each node of the j +1 th layer comprises the following steps:
When computing the encoded data of a node of the (j+1)-th layer, a preset number of encoded data are first read, without replacement, from the head of each external file into the corresponding memory window. (A batch of encoded data is prefetched from each file head; the exact amount depends on the window size, the parameters of the robust graph and its actual connectivity. In this embodiment, 1000 encoded data of 256 bits each are read from each file head without replacement, i.e. 32 KB in total.) Then, one encoded datum is read, without replacement, from the head of each memory window that holds a parent of the node, yielding the encoded data of all its parents, and the consumed data are evicted from the memory window. When the encoded data of the next (j+1)-th layer node are computed, another preset number of encoded data are read without replacement from each file head and appended to the tail of the corresponding memory window. Repeating this process yields, in sequence, the encoded data of all parents of each (j+1)-th layer node, which are concatenated and further encoded to obtain the encoded data of the (j+1)-th layer nodes.
It should be noted that in reading the dependencies, the relative-order node distribution is effectively rearranged back into a fully ordered state before the next hash computation; reading dependencies is thus a reordering of the node data stored in the files. FIG. 4 illustrates reading parents from a relative-order distribution: the total number of nodes is 6 and the number of windows is 2, so there are two files in external memory, one holding the relative-order distribution of N1, N2 and N3, the other that of N4, N5 and N6. On a read, the file containing a dependent parent is found from the parent's index, e.g. parent 3 is in file 1 and parent 5 is in file 2; the dependent node is then taken from the current position of that file. Taking N7 as an example: from its dependencies, nodes N1 and N2 must be read from window 1 and node N5 from window 2, so two nodes (N1 and N2) are read sequentially from the file corresponding to window 1 and one node (N5) sequentially from the file corresponding to window 2, after which all upper-layer nodes that N7 depends on are available and the computation of N7 can begin. Concretely, N7 depends on nodes N1, N2 and N5 in the expander graph, so its file-number sequence is 1, 1, 2: the first read from file 1 yields exactly N1, the next read from file 1 yields exactly N2, and the first read from file 2 yields exactly N5, so the values of N1, N2 and N5 are obtained correctly. By analogy, the dependency data of N8, N9 and the remaining nodes can be obtained.
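The head-only, no-replacement reads of FIG. 4 amount to keeping one cursor per window file and popping from it in dependency order. The sketch below assumes files are already in memory as lists; the function names are illustrative.

```python
from collections import deque

def make_cursors(files):
    """One read cursor per relative-order file (one file per window)."""
    return [deque(f) for f in files]

def read_node_parents(cursors, parent_windows):
    """Pop one value from the head of the window cursor of each parent,
    in dependency order — each entry is consumed exactly once."""
    return [cursors[w].popleft() for w in parent_windows]
```

Because each file was written in the children's consumption order, the head of each cursor is always the next parent that child needs, so no index lookups inside the file are required.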
In summary, the present invention allocates memory for the nodes in a windowed manner: the node data required for each computation is prefetched into the memory window in advance, and data already consumed is evicted from the window and overwritten by other node data. After all nodes in the current window are computed, the intermediate nodes in the window are written to external memory according to the relative-order distributions of the stacked depth robust graph and of the expander graph, and a mapping table from parent nodes to file indices is stored. After the upper-layer node data are computed, the M upper-layer nodes required by each lower-layer node are grouped, in order, into a prefetch data set, and the N prefetch data sets are written to external memory in the order of the nodes to be generated. During lower-layer node computation, a memory window of size M is maintained; for each computation a prefetch data set is read sequentially from external memory, and after the computation it is evicted from the window and overwritten by the next prefetch data set, saving memory. Further, the window can slide to the next batch of nodes after the nodes in the current window are computed. Compared with the original SDR algorithm, which must allocate memory for two complete layers of nodes, the method only needs to allocate memory of one window size.
In the process, the mapping relation table of the father node and the file index is stored, the coded data corresponding to the father node in the file is read randomly through the mapping relation table, and the coded data corresponding to all father nodes of any child node to be generated can be obtained from the distribution of a plurality of relative sequence nodes. The nodes in the relative sequence node distribution have spatial locality in the access process, and a large number of nodes can be read in advance and buffer areas can be allocated to the nodes in the relative sequence node distribution in the process of reading the nodes, so that the random read bandwidth is improved.
Furthermore, writing the current layer's nodes to external memory in a windowed manner means that only part of the nodes need to be kept in memory, the large single-layer node set being divided into many small windows. The nodes of each window are written to the corresponding file in external memory in the nodes' relative order, which guarantees that the parents within a single file are stored relatively contiguously: any number of parents can be read from the file of a single window, and subsequent parents are read from the file once those already read are consumed. Random reads of large data blocks then reduce the impact of the read operations on the overall execution time of the algorithm, and reads and writes can further be executed concurrently with node computation, reducing the time overhead even more. Experimental results show that, for a 32 GB replica in an Intel test environment, the current-layer node write-back strategy reduces the memory usage of the algorithm from 64 GB to 192 MB, i.e. a 99.7% reduction of memory overhead, while the execution time remains unchanged.
Embodiment 2
This embodiment provides another data encoding method for proof of replication. It should be noted that existing implementations of the SDR algorithm pre-allocate memory: when execution begins, a memory space twice the replica size is allocated. Because the encoded data of the nodes are generated progressively during SDR execution, most of the allocated space sits idle most of the time; for example, while the SDR algorithm is computing the encoded data of node i, the space allocated for node i+1 and the subsequent nodes has not yet been used. To address this, a progressive memory allocation strategy is proposed: memory is allocated incrementally, reducing the total memory overhead.
Specifically, the data encoding method of this embodiment comprises: dividing the original data sequentially into several contiguous data sets, the encoding of each set corresponding to one execution stage, the stages being executed in the division order; in each stage, taking the corresponding data set as the data to be encoded and executing the data encoding method of Embodiment 1; and allocating memory for the data of the next stage only after the current stage has finished and the next stage begins. The execution of each encoding instance is controlled so that memory allocation and processing of the current window begin and end together. For example, when processing original data of 1000 nodes, each layer consists of 1000 nodes; nodes 1-500 form window 1 and nodes 501-1000 form window 2, and processing the nodes of one window with the above encoding method is called a stage. Memory is allocated to the window of the next stage only after the current stage ends. This allocation method is called progressive memory allocation. FIG. 5 shows a four-stage progressive allocation: after the nodes of window 1 are processed in stage 1, memory is allocated for the next window and execution moves to stage 2.
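The stage-by-stage allocation can be sketched as follows. The function and callback names are illustrative, and the 32-byte-per-node buffer size is an assumption matching the 256-bit encoded data used elsewhere in this description.

```python
def staged_encode(node_count, window_size, compute_window):
    """Sketch of progressive memory allocation: the buffer for each window
    is allocated only when its stage begins and released when it ends,
    instead of reserving space for the whole layer up front."""
    results = []
    for start in range(0, node_count, window_size):
        size = min(window_size, node_count - start)
        window = bytearray(size * 32)       # allocated at stage entry only
        results.append(compute_window(start, window))
        del window                          # released before the next stage
    return results
```

At any moment only one window buffer is live, so peak memory is proportional to the window size rather than to the replica size.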
The related technical scheme is the same as embodiment 1, and is not described herein.
Embodiment 3
This embodiment provides another data encoding method for proof of replication. It should be noted that the memory footprint of the SDR algorithm with progressive memory allocation is not fixed during execution: it is smallest while the algorithm processes the first window and largest while it processes the last window. If an SDR instance processing its last window and another processing its first window start their windows at the same time, the combined footprint reaches twice the average footprint of progressive allocation; the footprint still varies as the windows are processed. More generally, when K SDR instances (K being the number of windows) run concurrently and each starts processing a different window at the same time, the memory consumption reaches K times the average consumption of progressive allocation. If, instead, all window stages end simultaneously, the memory overhead of a single SDR instance still varies, but the overall memory overhead does not change.
To solve the above problem, the data encoding method provided by this embodiment encodes a plurality of original data simultaneously, and includes: sequentially dividing each original data into a plurality of continuous data sets, the encoding of each continuous data set corresponding to one execution stage; processing each original data stage by stage, with the execution stages of the different original data arranged in a pipeline; in each execution stage, taking the corresponding continuous data set as the data to be encoded and executing the data encoding method provided in embodiment 1; and allocating memory space for the data to be processed in the next execution stage only after the current execution stage has finished and the next stage is entered. During concurrent execution, the execution of each instance of the data encoding method is controlled so that all instances start the memory allocation and processing of their current windows at the same time, and all instances finish processing their current windows at the same time.
Specifically, as shown in Fig. 6, taking the concurrent encoding of 4 original data as an example, the large set of nodes in a single layer is divided into 4 windows and progressive memory allocation is adopted during execution: memory windows belonging to different stages are computed concurrently; after the current stage completes, memory space is allocated for the window to be processed in the next stage; and the execution of each instance is controlled so that the memory allocation and processing of the current window start simultaneously, and the processing of the current window ends simultaneously, across all instances. After windows 4-1 in the replica-conversion processes of replicas 1-4 are finished, all instances enter the next stage together. Although the memory footprint of a single instance changes at this point, the total memory footprint of the 4 concurrently executing instances remains constant from stage to stage, and the overall memory usage is reduced.
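The "start together, finish together" rule above is a barrier synchronization pattern. The sketch below is a minimal illustration under assumed toy parameters (4 copies, 4 windows, placeholder per-window work), not the patent's actual scheduler.

```python
# Hedged sketch: enforce "all instances enter and leave each window together"
# with a threading.Barrier. The 4 copies and 4 windows mirror the Fig. 6
# example; the per-window work is a placeholder list append.
import threading

COPIES, WINDOWS = 4, 4
barrier = threading.Barrier(COPIES)
log = []
lock = threading.Lock()

def replicate(copy_id):
    for w in range(WINDOWS):
        barrier.wait()                # all copies allocate/start window w together
        with lock:
            log.append((w, copy_id))  # placeholder for the window-w work
        barrier.wait()                # all copies finish window w before moving on

threads = [threading.Thread(target=replicate, args=(c,)) for c in range(COPIES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every window's 4 log entries are contiguous: no copy runs ahead of another.
assert [w for w, _ in log] == sorted(w for w, _ in log)
```

Because both barriers gate every iteration, the log always lists all four copies' window-0 entries before any window-1 entry, which is exactly the lockstep the embodiment requires for the memory total to stay constant.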
The related technical solution is the same as in embodiment 1 and is not described here again.
It should be noted that both embodiment 2 and embodiment 3 are directed at the immediate memory allocation policy. In the original SDR algorithm execution, memory resources are pre-allocated: memory space is reserved before the SDR starts computing nodes. However, because the SDR algorithm executes progressively, most of this memory remains unused for a long time after allocation, so the present invention proposes a progressive memory allocation strategy. First, the large-scale data within a layer is decomposed into small data sets, each called a window, and memory is allocated only just before the current window starts computation, avoiding a large, long-lived memory footprint. In addition, multiple instances of the encoding method of embodiment 1 can be executed concurrently, which keeps the combined memory footprint of the executing instances constant and reduces the average memory overhead. Experimental results show that when 4 instances of the encoding method of embodiment 1 are executed concurrently to process 32 GB replicas under the progressive memory allocation policy, the memory overhead is reduced by 18.75% compared with the original SDR algorithm while the execution time remains unchanged.
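The difference between the two policies can be made concrete with a toy accounting sketch. All sizes here are illustrative stand-ins, and "allocation" just tracks byte counts rather than real buffers; the patent additionally writes finished windows to external storage.

```python
# Illustrative comparison of immediate pre-allocation vs. progressive
# (per-window) allocation. Toy sizes, assumed for the example only.

LAYER_NODES = 8          # nodes per layer (toy value)
NODE_SIZE = 32           # bytes per node label (toy value)
WINDOWS = 4
WINDOW_NODES = LAYER_NODES // WINDOWS

def preallocate():
    # Immediate policy: the whole layer is resident before any node is computed.
    return LAYER_NODES * NODE_SIZE

def progressive_peak():
    # Progressive policy: only the current window is resident at a time.
    peak = 0
    for _ in range(WINDOWS):
        current = WINDOW_NODES * NODE_SIZE  # allocated just before computing
        peak = max(peak, current)
    return peak

print(preallocate(), progressive_peak())  # 256 vs 64: a 4x smaller peak here
```

The ratio between the two peaks is simply the window count; the patent's reported 18.75% saving comes from the pipelined multi-instance setting, not from this single-instance toy.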
Embodiment 4
A data encoding system for proof of replication, comprising: a memory storing a computer program and a processor executing the computer program to perform the data encoding method provided in embodiment 1, embodiment 2 and/or embodiment 3 of the present invention.
The related technical solutions are the same as in embodiments 1 to 3 and are not described here again.
Embodiment 5
A data decoding method for proof of replication, comprising:
performing L layers of decoding on the data to be decoded in series:
first-layer decoding: obtaining a random seed from the information of the data to be decoded, and generating random data of the same size as the data to be decoded from the random seed; sequentially dividing the random data into M' data blocks, each data block corresponding one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; decoding the data blocks in sequence based on the dependencies among the nodes within the first layer of the stacked depth robust graph, sequentially generating M' pieces of decoded data to obtain the decoded data corresponding to each node of the first layer; based on the relative dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, sequentially writing to external memory the decoded data corresponding to the parents, in the first layer, of each node of the second layer, and deleting from internal memory the decoded data corresponding to the nodes of the first layer;
i-th-layer decoding: based on the relative dependency of the layer-i nodes on the layer-(i-1) nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parents, in layer i-1, of the layer-i nodes; based on the dependencies among the nodes within layer i of the stacked depth robust graph, sequentially obtaining the decoded data corresponding to all parents of each node of layer i, concatenating it, and further decoding to obtain the decoded data corresponding to each node of layer i; based on the relative dependency of the layer-(i+1) nodes on the layer-i nodes in the stacked depth robust graph, sequentially writing to external memory the decoded data corresponding to the parents, in layer i, of each node of layer i+1, and deleting from internal memory the decoded data corresponding to the nodes of layer i; i = 2, 3, …, L-1;
layer-L decoding: based on the relative dependency of the layer-L nodes on the layer-(L-1) nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parents, in layer L-1, of each node of layer L; based on the dependencies among the nodes within layer L of the stacked depth robust graph, sequentially obtaining the decoded data corresponding to all parents of each node of layer L, concatenating it, and further decoding to obtain the decoded data corresponding to each node of layer L; and performing a bitwise XOR of the decoded data corresponding to each node of layer L with the data to be decoded to obtain the final decoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; and randomly generated dependencies exist both between nodes within the same layer and between nodes of two adjacent layers.
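The layered structure above can be sketched end to end on a toy instance. Everything in this sketch is an assumption for illustration only: the graph gives each node at most two parents (its predecessor in the same layer and the same-index node in the layer below), SHA-256 stands in for the labeling function, and L, M, and the seed handling are not the patent's actual SDR parameters. It shows the key property the embodiment relies on: regenerating the layer-L labels and XORing them with the data inverts the encoding.

```python
# Toy stacked-DRG key generation and final-layer XOR (illustrative only).
import hashlib

L, M = 3, 4           # layers and nodes per layer (toy values)
NODE = 32             # bytes per node label / data block (SHA-256 digest size)

def label(*parents):
    return hashlib.sha256(b"".join(parents)).digest()

def key_layers(seed):
    prev = None
    for layer in range(1, L + 1):
        labels = []
        for n in range(M):
            parents = [seed + bytes([layer, n])]  # ties every label to the seed
            if n > 0:
                parents.append(labels[n - 1])     # intra-layer dependency
            if prev is not None:
                parents.append(prev[n])           # inter-layer dependency
            labels.append(label(*parents))
        prev = labels       # only the previous layer is kept; earlier layers
    return prev             # (written out / deleted in the real method)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, seed):
    key = b"".join(key_layers(seed))  # layer-L labels form the key
    return xor(data, key)

decode = encode  # XOR with the same regenerated key is its own inverse
```

Decoding is the same label computation as encoding with the XOR applied to the replica instead of the original data, which is exactly the symmetry the later embodiments point out.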
As in embodiment 1, this embodiment may also allocate one thread each to the process of reading the decoded data corresponding to the upper-layer nodes and to the process of computing the decoded data corresponding to the lower-layer nodes, denoted the fifth thread and the sixth thread: the fifth thread reads the corresponding decoded data into a buffer for the sixth thread to use when computing the decoded data of the lower-layer nodes, and the fifth and sixth threads execute concurrently, greatly reducing the execution time under the upper-layer data write-back strategy. Similarly, one thread each may be allocated to the sorting of the decoded data corresponding to the parent nodes and to the writing of that decoded data to external memory, denoted the seventh thread and the eighth thread respectively; the seventh and eighth threads execute concurrently, further reducing the execution time under the upper-layer data write-back strategy. The related analysis is the same as in embodiment 1.
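The fifth/sixth-thread split is a producer-consumer pipeline. The sketch below is a minimal illustration under assumptions: a list stands in for the external-memory file, a bounded queue stands in for the buffer area, and a hash stands in for the per-node decoding computation.

```python
# Hedged sketch of the reader/compute thread pair: the reader streams
# upper-layer labels into a bounded buffer while the compute thread consumes
# them concurrently. Data and computation are placeholders.
import hashlib
import queue
import threading

external = [bytes([i]) * 32 for i in range(64)]   # upper-layer labels "on disk"
buf = queue.Queue(maxsize=8)                      # the shared buffer area

def reader():                                     # "fifth thread"
    for item in external:
        buf.put(item)                             # sequential external read
    buf.put(None)                                 # end-of-layer sentinel

def compute(out):                                 # "sixth thread"
    while (item := buf.get()) is not None:
        out.append(hashlib.sha256(item).digest()) # decode a lower-layer node

results = []
t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=compute, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
```

The bounded queue is what lets I/O and computation overlap without the buffer growing unboundedly; the seventh/eighth-thread pair for sorting and write-back follows the same pattern in the opposite direction.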
Preferably, after the decoded data corresponding to each node of layer j is obtained, that decoded data is evenly distributed into different computation windows, and an external storage file is allocated to each computation window; based on the relative dependency of the layer-(j+1) nodes on the layer-j nodes in the stacked depth robust graph and the dependencies among the nodes within layer j+1, the decoded data, located in the different computation windows, corresponding to all parents of each node of layer j+1 is sequentially written to the corresponding external storage files, and the decoded data corresponding to the nodes of layer j is deleted from internal memory; where j = 1, 2, …, L-1;
correspondingly, the decoded data corresponding to each node of layer j+1 is computed as follows:
when computing the decoded data corresponding to a node of layer j+1, a preset number of items of decoded data are first read, without replacement, from the head of each external storage file into the corresponding memory window (how many items to read ahead depends on the window size, the parameters of the robust graph, and its actual connectivity; in this embodiment, 1000 items of decoded data are read without replacement from the head of each external storage file, each item being 256 bits, i.e., 32 KB of decoded data in total). Then one item of decoded data is read, without replacement, from the head of each memory window holding decoded data corresponding to a parent of that node, yielding the decoded data corresponding to all parents of the layer-(j+1) node, and the items that were read are popped from their memory windows. When computing the decoded data corresponding to the next node of layer j+1, a preset number of items are again read, without replacement, from the head of each external storage file and appended in order to the tail of the corresponding memory window. Through this process, the decoded data corresponding to all parents of each node of layer j+1 is obtained in sequence, concatenated, and further decoded to obtain the decoded data corresponding to each node of layer j+1.
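The read-ahead scheme above is a per-file streaming window. The sketch below illustrates it under toy assumptions: `io.BytesIO` objects stand in for the external storage files, and batches of 4 items of 8 bytes replace the embodiment's 1000 items of 256 bits.

```python
# Sketch of the windowed, without-replacement read-ahead: each external file
# is consumed a fixed batch at a time through a per-file memory window.
import io
from collections import deque

ITEM, BATCH = 8, 4
files = [io.BytesIO(bytes([f]) * (ITEM * 16)) for f in range(3)]  # 16 items each
windows = [deque() for _ in files]               # one memory window per file

def refill(i):
    # Read BATCH items from the head of file i (no replacement) and append
    # them to the tail of its memory window.
    for _ in range(BATCH):
        item = files[i].read(ITEM)
        if item:
            windows[i].append(item)

def next_parent(i):
    # Pop one parent label from the head of window i, refilling when empty.
    if not windows[i]:
        refill(i)
    return windows[i].popleft()

# Gather one parent label from each window, as for a node with 3 parents:
parents = [next_parent(i) for i in range(len(files))]
```

Because every read advances a file head and every consumed item is popped, the files are scanned purely sequentially, which is what preserves external-memory bandwidth in the write-back strategy.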
Embodiment 6
A data decoding method for proof of replication, comprising: sequentially dividing the original data into a plurality of continuous data sets, the decoding of each continuous data set corresponding to one execution stage, the stages being executed in the order of division; in each execution stage, taking the corresponding continuous data set as the data to be decoded and executing the data decoding method provided in embodiment 5 of the present invention; and allocating memory space for the data to be processed in the next execution stage only after the current execution stage has finished and the next stage is entered.
The related technical solution is the same as embodiment 5, and is not described herein.
Embodiment 7
A data decoding method for proof of replication, for decoding a plurality of original data simultaneously, comprising: sequentially dividing each original data into a plurality of continuous data sets, the decoding of each continuous data set corresponding to one execution stage; processing each original data stage by stage, with the execution stages of the different original data arranged in a pipeline; in each execution stage, taking the corresponding continuous data set as the data to be decoded and executing the data decoding method provided in embodiment 5 of the present invention; and allocating memory space for the data to be processed in the next execution stage only after the current execution stage has finished and the next stage is entered.
The related technical solution is the same as embodiment 5, and is not described herein.
For embodiments 5 to 7 above, it should be noted that the data decoding methods of the present invention are implemented in exactly the same way as the corresponding data encoding methods, with the data to be encoded simply replaced by the data to be decoded; the related technical solutions and analyses are the same as those of embodiments 1 to 3 and are not repeated here.
In summary, the present invention innovatively exploits two important features of the stacked depth robust graph structure and of the algorithm's execution: (1) the node dependencies are fixed; and (2) the nodes are generated progressively. It proposes a new technical framework for reducing the memory overhead of the SDR algorithm. Exploiting the fixed node dependencies, intermediate data is stored to external memory in order of its time of use, greatly reducing memory overhead, and is accessed sequentially so as to make maximal use of the external-memory bandwidth and preserve high algorithm performance. Exploiting the progressive generation of nodes, a progressive memory allocation mechanism reduces the total memory footprint of multiple concurrently executing SDR instances, so that the SDR memory overhead is greatly reduced while high algorithm performance is maintained.
Embodiment 8
A data decoding system for proof of replication, comprising: a memory storing a computer program and a processor executing the computer program to perform the data decoding method provided in embodiment 5, embodiment 6 and/or embodiment 7 of the present invention.
The related technical solutions are the same as in embodiments 5 to 7 and are not described here again.
Embodiment 9
A computer-readable storage medium, which includes a stored computer program, wherein when the computer program is executed by a processor, the apparatus on which the storage medium is located is controlled to execute the data encoding method provided in embodiment 1, embodiment 2, or embodiment 3 of the present invention, and/or the data decoding method provided in embodiment 5, embodiment 6, or embodiment 7 of the present invention.
The related technical solutions are the same as in embodiments 1 to 3 and embodiments 5 to 7 and are not described here again.
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention, and is not intended to limit the invention to the particular forms disclosed, since various modifications, substitutions and improvements within the spirit and scope of the invention are possible and within the scope of the appended claims.

Claims (9)

1. A data encoding method for proof of replication, comprising: performing L-layer encoding on the data to be encoded in series:
and (3) first layer coding: obtaining a random seed according to the information of the data to be coded, and generating random data with the same size as the data to be coded according to the random seed; dividing random data into M data blocks in sequence, wherein each data block corresponds to each node position in a first layer of a pre-built stack type depth robust graph one by one; based on the dependency relationship among all nodes in the first layer of the stack-type depth robust graph, sequentially encoding all data blocks, and sequentially generating M encoded data to obtain the encoded data corresponding to all nodes in the first layer; based on the relative dependency relationship of a second layer node to a first layer node in a stacked depth robust graph, sequentially writing the coded data corresponding to the father node of each node of the second layer in the upper layer of the second layer node into an external memory, and deleting the coded data corresponding to each node of the first layer in the internal memory;
layer i encoding: reading coded data corresponding to father nodes of the ith layer in the i-1 th layer from an external memory based on the relative dependency relationship of the nodes of the ith layer in the stacked depth robust graph to the nodes of the i-1 th layer, sequentially obtaining the coded data corresponding to all the father nodes of the ith layer based on the dependency relationship among the nodes of the ith layer in the stacked depth robust graph, splicing the coded data and then further coding the coded data to obtain the coded data corresponding to the nodes of the ith layer; based on the relative dependency relationship of the i +1 th layer node to the i-th layer node in the stacked depth robust graph, sequentially writing the coded data corresponding to the father node of each node of the i +1 th layer in the i-th layer into an external memory, and deleting the coded data corresponding to each node of the i-th layer in the internal memory; i =2,3, …, L-1;
l-layer coding: reading coded data corresponding to father nodes of all nodes of the L-th layer in the L-1 th layer from an external memory based on the relative dependency relationship of the L-th layer nodes in the stacked depth robustness graph on the L-1 th layer nodes, sequentially obtaining the coded data corresponding to all father nodes of all nodes of the L-th layer based on the dependency relationship among all nodes in the L-th layer of the stacked depth robustness graph, further coding after splicing to obtain the coded data corresponding to all nodes of the L-th layer, and carrying out bitwise XOR on the coded data corresponding to all nodes of the L-th layer and the coded data to be coded to obtain a final coding result;
wherein the stacked depth robust graph comprises an L layer; the number of nodes in each layer is M; randomly generated dependency relationships exist between nodes in the same layer and between nodes in two adjacent layers.
2. The data encoding method of claim 1, wherein after the encoded data corresponding to each node of the jth layer is obtained, the encoded data corresponding to each node of the jth layer are divided into different calculation windows, and each calculation window is allocated with an external storage file; based on the relative dependency relationship of the j +1 th layer node to the j layer node in the stacked depth robust graph and the dependency relationship between the nodes in the j +1 th layer, sequentially writing the coded data, which are in different computing windows and correspond to all father nodes of each node in the j +1 th layer, into corresponding external storage files respectively, and deleting the coded data corresponding to each node in the j layer in the internal storage; wherein j =1,2, …, L-1;
correspondingly, the method for calculating the coded data corresponding to each node of the j +1 th layer comprises the following steps:
when computing the coded data corresponding to a node of the (j+1)-th layer, first reading, without replacement, a preset number of items of coded data from the head of each external storage file into the corresponding memory window in advance; reading, without replacement, one item of coded data from the head of each memory window holding coded data corresponding to a parent of the node, to obtain the coded data corresponding to all parents of the node of the (j+1)-th layer, and popping the items that were read from their memory windows; when computing the coded data corresponding to the next node of the (j+1)-th layer, again reading, without replacement, a preset number of items of coded data from the head of each external storage file and appending them in order to the tail of the corresponding memory window; based on the above process, the coded data corresponding to all parents of each node of the (j+1)-th layer is obtained in sequence, concatenated, and further encoded to obtain the coded data corresponding to each node of the (j+1)-th layer.
3. A data encoding method for proof of replication, comprising: sequentially dividing original data into a plurality of continuous data sets, wherein the encoding process of each continuous data set corresponds to one execution stage and is sequentially executed according to the dividing sequence; each execution stage takes the corresponding continuous data set as data to be encoded and executes the data encoding method of claim 1 or 2; and only after the current execution stage finishes processing and when the next execution stage is entered, allocating memory space to the data required to be processed in the next execution stage.
4. A data encoding method for proof of copy, for encoding a plurality of original data simultaneously, comprising: sequentially dividing each original data into a plurality of continuous data sets, wherein the coding process of each continuous data set corresponds to an execution stage; executing each original data according to the execution stage, and executing the execution stage of each original data in a pipeline mode; each execution phase takes the corresponding continuous data set as data to be encoded and executes the data encoding method of claim 1 or 2; and only after the current execution stage finishes processing and when the next execution stage is entered, allocating memory space to the data required to be processed in the next execution stage.
5. A data decoding method for proof of replication, comprising: performing L-layer decoding on data to be decoded in series:
decoding of the first layer: obtaining a random seed according to the information of the data to be decoded, and generating random data with the same size as the data to be decoded according to the random seed; dividing random data into M' data blocks in sequence, wherein each data block corresponds to each node position in a first layer of a pre-built stack-type depth robust graph one by one; decoding each data block in sequence based on the dependency relationship among each node in the first layer of the stack-type depth robust graph, and sequentially generating M' pieces of decoding data to obtain the decoding data corresponding to each node of the first layer; based on the relative dependency relationship of a second layer node to a first layer node in a stacked depth robust graph, sequentially writing decoding data corresponding to a father node of each node of the second layer in the upper layer of the second layer into an external memory, and deleting the decoding data corresponding to each node of the first layer in the internal memory;
decoding the ith layer: reading decoding data corresponding to father nodes of the ith layer in the i-1 layer from an external memory based on the relative dependency relationship of the nodes of the ith layer in the stacked deep robust graph on the nodes of the i-1 layer, sequentially obtaining the decoding data corresponding to all the father nodes of the ith layer based on the dependency relationship among the nodes of the ith layer in the stacked deep robust graph, splicing and then further decoding to obtain the decoding data corresponding to the nodes of the ith layer; based on the relative dependency relationship of the i +1 th layer node to the i-th layer node in the stacked depth robust graph, sequentially writing decoding data corresponding to the father node of each node of the i +1 th layer in the i-th layer into an external memory, and deleting the decoding data corresponding to each node of the i-th layer in the internal memory; i =2,3, …, L-1;
decoding at an L layer: reading decoding data corresponding to father nodes of all nodes of the L-1 layer from an external memory based on the relative dependency relationship of the L-layer nodes in the stacked depth robust graph on the L-1 layer nodes, sequentially obtaining the decoding data corresponding to all the father nodes of all the nodes of the L-layer nodes based on the dependency relationship among all the nodes in the L-layer of the stacked depth robust graph, further decoding after splicing to obtain the decoding data corresponding to all the nodes of the L-layer, and carrying out bitwise XOR on the decoding data corresponding to all the nodes of the L-layer and the data to be decoded to obtain a final decoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; randomly generated dependency relationships exist between nodes in the same layer and between nodes in two adjacent layers.
6. The data decoding method of claim 5, wherein after the decoded data corresponding to each node of the jth layer is obtained, the decoded data corresponding to each node of the jth layer are divided into different calculation windows, and each calculation window is allocated with an external storage file; based on the relative dependency relationship of the j +1 th layer node to the j layer node in the stacked depth robust graph and the dependency relationship between the nodes in the j +1 th layer, sequentially writing the decoding data, which are in different calculation windows and correspond to all father nodes of each node in the j +1 th layer, into corresponding external storage files respectively, and deleting the decoding data corresponding to each node in the j layer in the internal storage; wherein j =1,2, …, L-1;
correspondingly, the method for calculating the decoding data corresponding to each node of the j +1 th layer comprises the following steps:
when computing the decoded data corresponding to a node of the (j+1)-th layer, first reading, without replacement, a preset number of items of decoded data from the head of each external storage file into the corresponding memory window in advance; reading, without replacement, one item of decoded data from the head of each memory window holding decoded data corresponding to a parent of the node, to obtain the decoded data corresponding to all parents of the node of the (j+1)-th layer, and popping the items that were read from their memory windows; when computing the decoded data corresponding to the next node of the (j+1)-th layer, again reading, without replacement, a preset number of items of decoded data from the head of each external storage file and appending them in order to the tail of the corresponding memory window; based on the above process, the decoded data corresponding to all parents of each node of the (j+1)-th layer is obtained in sequence, concatenated, and further decoded to obtain the decoded data corresponding to each node of the (j+1)-th layer.
7. A data decoding method for proof of replication, comprising: sequentially dividing original data into a plurality of continuous data sets, wherein the decoding process of each continuous data set corresponds to one execution stage and is sequentially executed according to the dividing sequence; each execution stage takes the corresponding continuous data set as data to be decoded, and executes the data decoding method of claim 5 or 6; and only after the current execution stage finishes processing and when the next execution stage is entered, allocating memory space to the data required to be processed in the next execution stage.
8. A data decoding method for proof of replication, for decoding a plurality of original data simultaneously, comprising: sequentially dividing each original data into a plurality of continuous data sets, wherein the decoding process of each continuous data set corresponds to an execution stage; executing each original data according to the execution stage, and executing the execution stage of each original data in a pipeline mode; each execution stage takes the corresponding continuous data set as data to be decoded, and executes the data decoding method of claim 5 or 6; and only after the current execution stage finishes processing and when the next execution stage is entered, allocating memory space to the data required to be processed in the next execution stage.
9. A computer-readable storage medium comprising a stored computer program, wherein the computer program, when executed by a processor, controls an apparatus in which the storage medium is located to perform the data encoding method of any one of claims 1-2, the data encoding method of claim 3, the data encoding method of claim 4, the data decoding method of any one of claims 5-6, the data decoding method of claim 7, and/or the data decoding method of claim 8.
CN202210829009.8A 2022-07-14 2022-07-14 Data encoding and decoding method and system for copy certification Active CN115361401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210829009.8A CN115361401B (en) 2022-07-14 2022-07-14 Data encoding and decoding method and system for copy certification


Publications (2)

Publication Number Publication Date
CN115361401A true CN115361401A (en) 2022-11-18
CN115361401B CN115361401B (en) 2024-04-05

Family

ID=84032494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210829009.8A Active CN115361401B (en) 2022-07-14 2022-07-14 Data encoding and decoding method and system for copy certification

Country Status (1)

Country Link
CN (1) CN115361401B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070177739A1 (en) * 2006-01-27 2007-08-02 Nec Laboratories America, Inc. Method and Apparatus for Distributed Data Replication
CN104904202A (en) * 2012-09-28 2015-09-09 三星电子株式会社 Video encoding method and apparatus for parallel processing using reference picture information, and video decoding method and apparatus for parallel processing using reference picture information
CN103188048A (en) * 2013-02-01 2013-07-03 北京邮电大学 Network coding method oriented to peer-to-peer communication in tree topology structure
CN108540306A (en) * 2018-02-28 2018-09-14 博尔联科(厦门)智能技术有限公司 A kind of network node management method and its communicating control method
US20190268235A1 (en) * 2018-02-28 2019-08-29 PLCT System Ltd. Method for managing network nodes and communication control method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUNTAO FANG: "Early Identification of Critical Blocks: Making Replicated Distributed Storage Systems Reliable Against Node Failures", IEEE Transactions on Parallel and Distributed Systems, 7 May 2018 (2018-05-07) *
LUO Hongyu: "Design and Implementation of a Multi-level Fault Tolerance Mechanism in Decentralized Storage Systems", China Masters' Theses Full-text Database, Information Science and Technology, 15 January 2022 (2022-01-15) *
TAO Jun; SHA Jichang; WANG Hui: "Research on Data Replication Based on a Coding Mechanism in Grid Environments", Computer Science, no. 02, 25 February 2008 (2008-02-25) *

Also Published As

Publication number Publication date
CN115361401B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
US11706020B2 (en) Circuit and method for overcoming memory bottleneck of ASIC-resistant cryptographic algorithms
CN107066393B (en) Method for improving mapping information density in address mapping table
US9471500B2 (en) Bucketized multi-index low-memory data structures
US10983955B2 (en) Data unit cloning in memory-based file systems
US8732538B2 (en) Programmable data storage management
US9665485B2 (en) Logical and physical block addressing for efficiently storing data to improve access speed in a data deduplication system
US20090249004A1 (en) Data caching for distributed execution computing
CN112579602B (en) Multi-version data storage method, device, computer equipment and storage medium
JP6608468B2 (en) Storage apparatus and control method thereof
US11709596B2 (en) Method, device and computer program for data storage
CN109558456A (en) File migration method, apparatus, and device, and readable storage medium
US11226798B2 (en) Information processing device and information processing method
US11409798B2 (en) Graph processing system including different kinds of memory devices, and operation method thereof
CN110851434A (en) Data storage method, device and equipment
KR101123335B1 (en) Method and apparatus for configuring hash index, and apparatus for storing data having the said apparatus, and the recording media storing the program performing the said method
US8959309B2 (en) Skip list generation
WO2024093090A1 (en) Metadata management method and apparatus, computer device, and readable storage medium
CN115361401A (en) Data encoding and decoding method and system for copy certification
CN110119245B (en) Method and system for operating NAND flash memory physical space to expand memory capacity
KR20130089324A (en) Data i/o controller and system having the same
TWI755168B (en) Flash memory controller mechanism capable of generating host-based cache information or flash-memory-based cache information to build and optimize binary tree with fewer nodes when cache stores data from host
JPWO2019008715A1 (en) Data load program, data load method and data load device
CN116820333B (en) SSDRAID-5 continuous writing method based on multithreading
JP7239827B2 (en) Information processing device and compiler program
Cho et al. A Study of Burst Transfer Generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant